The Importance of Data Integrity in a Pharmaceutical R&D Environment

Apr 5, 2023 | Cybersecurity

The FDA is responsible for protecting public health by ensuring that human and veterinary drugs, biological products, and medical devices are safe, effective and of high quality. Verification of data integrity is a critical aspect of this mission, and the FDA therefore expects that all data submitted to the Agency is both reliable and accurate.

On September 16th, 2021, the Food and Drug Administration (FDA) informed several pharmaceutical companies that they would need to repeat some of the clinical trials they had conducted for their drug applications. The announcement came after the FDA discovered significant data integrity violations in two Indian contract research organizations (Synchron Research Services and Panexcell Clinical Lab) that were used by these companies during drug development, resulting in the submission of invalid study data. According to the FDA, this order applied to over 100 applications for drugs that had been approved, tentatively approved, or were currently under review.

Data integrity is a broad term that describes the need for accuracy, consistency, and completeness in a dataset. Enforcement actions by the FDA for data integrity violations can result in a wide variety of negative consequences for pharmaceutical companies – facility shutdown, delayed/denied drug approvals, remediation costs, product recalls, loss of customers due to lack of trust, etc.

One of the most important assets of a pharmaceutical company is the data created during the discovery and development of a novel therapeutic drug. R&D data integrity is essential for making good decisions about which target compounds should enter preclinical trials, selecting candidate molecules for clinical trials effectively, maintaining GCP (Good Clinical Practice) compliance, retaining market exclusivity through the patenting process, and ultimately delivering safe, effective, and potentially lifesaving products to patients.

While data integrity has not been a primary focus for pharmaceutical R&D laboratories in the past, it is now seen as an important issue that should be considered throughout the whole lifecycle of the drug – from drug discovery through manufacturing. The FDA expects data integrity to be prioritized by scientists from the moment they believe their research may contribute to the production of a drug approved for use by patients.

The massive amounts of data gathered by R&D laboratories from a variety of different sources during drug discovery and development can make maintaining data integrity challenging. In addition, many organizations have legacy systems that rely on shared logins and capture too little detail to support data reviews and required audit trails, which only adds to these challenges. In this blog, we will discuss data integrity best practices for the pharmaceutical R&D environment.

Data Integrity Principles – ALCOA CCEA

In order to make sure data integrity is preserved over the full drug development lifecycle, regulators developed the ALCOA principle, which was later revised to ALCOA+. According to the FDA, data should be attributable, legible, contemporaneously recorded, original (or a true copy), and accurate (ALCOA). In addition, data should be complete, consistent, enduring and available throughout its life cycle — characteristics that make up the ALCOA+ (or ALCOA CCEA) principles.

Attributable Data are traceable to the originator, which could be a person and/or a computerized system. Data records should therefore include information about who or what recorded the data and when it was recorded. This includes tracking any changes made to data (i.e., who made the change and when).

Legible Data are readable and understandable. The original data record, along with its attributions and metadata, and any changes to these must remain legible throughout the life cycle of the record.

Contemporaneous Data are recorded at the time they are generated or observed, and the record should include a time entry matching the time of the observation.

Original Data records must be maintained as the file or format in which the data was first generated (e.g., first paper record of manual observation, or electronic raw data file from a computerized system). Although verified copies may be used in place of the original, this should follow an SOP for creating such copies, including thorough documentation of the process.

Accurate Data are correct, truthful and made with appropriate precision. This includes any error corrections and/or edits.

Complete Data include all descriptions, metadata and associated information necessary to reconstruct the full meaning and context of the data record.

Consistent Data are gathered such that all elements of the data record follow in an expected sequence, using a system that enforces the use of approved data acquisition and analysis methods, reporting templates, and laboratory workflows.

Enduring Data do not change or disappear over time. Data records are recorded in a permanent medium (paper or electronic) and continue to be retained in a human-readable format for as long as specified in applicable record retention requirements.

Available Data are maintained in such a way that they are accessible and retrievable in a reasonable time when needed.
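
To make a few of these principles concrete, the following minimal Python sketch shows one way an electronic record system might capture attributable, contemporaneous, and enduring entries. It models an append-only audit trail in which each entry stores who acted, when, and why, and is linked to the previous entry with a SHA-256 hash so that later alteration can be detected. The class and field names (AuditTrail, AuditEntry, record_id, and so on) are illustrative assumptions and are not taken from FDA guidance or from any specific product.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEntry:
    """One immutable entry in a hypothetical append-only audit trail."""
    record_id: str   # which data record was touched
    user_id: str     # Attributable: an individual login, not a shared account
    action: str      # e.g. "create" or "correct"
    value: str       # the value recorded by this action
    reason: str      # why a change was made (empty for the original entry)
    timestamp: str   # Contemporaneous: set at the moment of writing (UTC)
    prev_hash: str   # hash of the previous entry, linking the chain
    entry_hash: str  # hash of this entry's fields, including prev_hash


def _digest(fields: dict) -> str:
    """Return a SHA-256 hex digest of the fields in a stable key order."""
    payload = json.dumps(fields, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class AuditTrail:
    """Entries are only ever appended; corrections are new entries."""

    def __init__(self):
        self._entries = []

    def append(self, record_id, user_id, action, value, reason=""):
        prev_hash = self._entries[-1].entry_hash if self._entries else ""
        fields = {
            "record_id": record_id,
            "user_id": user_id,
            "action": action,
            "value": value,
            "reason": reason,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": prev_hash,
        }
        entry = AuditEntry(**fields, entry_hash=_digest(fields))
        self._entries.append(entry)
        return entry

    def entries(self):
        return tuple(self._entries)

    def verify(self) -> bool:
        """Recompute every hash; False means an entry was altered or reordered."""
        prev_hash = ""
        for entry in self._entries:
            fields = asdict(entry)
            stored_hash = fields.pop("entry_hash")
            if fields["prev_hash"] != prev_hash or _digest(fields) != stored_hash:
                return False
            prev_hash = stored_hash
        return True


trail = AuditTrail()
trail.append("assay-001", "jdoe", "create", "12.4")
trail.append("assay-001", "asmith", "correct", "12.6", reason="transcription error")
print(trail.verify())
```

Note that the correction does not overwrite the original value. Both entries remain in the trail, which keeps the original record legible and the change attributable. A production system would also need validated storage, access control, and backups, none of which this sketch attempts.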

Data Integrity in Research

As the NIH has stated, “Good science requires good record keeping. Good record keeping promotes both accountability and integrity in research.” Experimental data gathered via instruments is used for the initial identification of target compounds, the fine-tuning of potential new drugs, and the demonstration of efficacy necessary to begin preclinical trials. Poor records management and data integrity issues in research records can result in an unsuccessful or delayed patent filing process, difficult collaboration and project handoffs between research teams, poor decisions on future investments in preclinical studies, and even rejection of an Investigational New Drug (IND) application, New Drug Application (NDA), or Biologics License Application (BLA) by regulatory agencies.

Research labs don’t need to meet the comprehensive regulatory guidelines mandated by the FDA for manufacturing. However, in order to assure research leaders, patent reviewers, and regulatory agencies that studies were conducted carefully and with scientific integrity, it is important that labs follow Good Documentation Practices (GDP). The ALCOA+ principles defined above are essentially an abridged version of core GDP principles.

Data Integrity in Clinical Studies

Data produced by bioanalytical laboratories is critical for supporting BLA and NDA submissions, and also Phase 1-4 clinical trials. Yet, ensuring data integrity in a bioanalytical laboratory goes beyond just non-clinical and clinical data capture. There are many other processes that need to be considered – sample and analyte shipping and stability, method and instrument validation, sample and study management, documentation, and training. In addition, laboratories must comply with the latest requirements of regulatory agencies in each country where sponsors plan to market their products.

The FDA’s Guidance for Bioanalytical Method Validation provides details on validation of analytical methods used in human clinical pharmacology, bioavailability (BA), and bioequivalence (BE) studies that require pharmacokinetic, toxicokinetic, or biomarker concentration evaluation. This Guidance also provides information that can help inform the development of bioanalytical methods used for nonclinical studies that require toxicokinetic or biomarker concentration data. If a method that does not comply with this Guidance is used, auditors are unlikely to trust the results of the assay.

Computer system validation (CSV) is another important aspect of ensuring that data produced or generated by a system is reliable and accurate. Federal regulation 21 CFR Part 11 requires laboratories to conduct system validations to produce documented evidence that the system does what is intended and that users of the system can detect when the system is not working properly.

Integrity of the analyte or drug product needs to be maintained from manufacturing of the drug product through handling, shipment, and storage. This includes ensuring that the product certificate of analysis is attached to each analytical run and that the same lot of drug product is used from method development throughout sample analysis, providing consistency in results. Without complete knowledge of the conditions that an analyte has been through, the accuracy of assay data will be called into question by an inspector.

To support this, labs must keep an accurate record of the conditions the analyte has been through by monitoring and tracking all movements into and out of storage (e.g., temperature monitoring records should be included with shipping records). Maintaining accurate chain of custody information on the analyte helps ensure that its composition is not unintentionally altered and that the credibility of the study is not threatened.
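
As a rough illustration of what reviewing chain of custody records can involve, the Python sketch below checks a list of custody events for two problems: temperature readings outside an allowed range, and long periods with no recorded event. The CustodyEvent fields, the -80 to -60 °C range, and the 12-hour gap limit are all hypothetical values chosen for the example. Real limits would come from the analyte's stability data and the lab's own SOPs.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CustodyEvent:
    """One hypothetical custody record for an analyte or sample."""
    timestamp: datetime
    location: str         # e.g. a freezer ID or a courier name
    handler: str          # who had custody at this point
    temperature_c: float  # temperature logged at this event


def find_excursions(events, min_c, max_c):
    """Return events whose logged temperature falls outside [min_c, max_c]."""
    return [e for e in events if not (min_c <= e.temperature_c <= max_c)]


def find_custody_gaps(events, max_gap_hours):
    """Return (earlier, later) pairs separated by more than max_gap_hours."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        hours = (later.timestamp - earlier.timestamp).total_seconds() / 3600
        if hours > max_gap_hours:
            gaps.append((earlier, later))
    return gaps


# Example data (invented for illustration only).
events = [
    CustodyEvent(datetime(2023, 1, 1, 8, 0), "Freezer A", "jdoe", -72.0),
    CustodyEvent(datetime(2023, 1, 1, 9, 0), "Courier", "acme", -55.0),
    CustodyEvent(datetime(2023, 1, 2, 10, 0), "Freezer B", "asmith", -70.0),
]

for event in find_excursions(events, min_c=-80.0, max_c=-60.0):
    print("Temperature excursion:", event.location, event.temperature_c)

for earlier, later in find_custody_gaps(events, max_gap_hours=12):
    print("Unrecorded interval between", earlier.location, "and", later.location)
```

Either finding would prompt a documented investigation rather than an automatic rejection of the sample. The point of the check is to surface gaps in the record before an inspector does.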

Integrity of documentation is another crucial aspect of data integrity in an R&D environment. Each process in a bioanalytical laboratory that is involved in a bioanalytical study must be fully documented and follow ALCOA+ standards. This includes training records on all SOPs for instrument use, reagent creation, data analysis, and analytical methods. This allows researchers to prove to auditors that everyone involved in the study is properly trained, the steps of the study can be completely replicated, and the full life cycles of all clinical samples are being monitored.

Data Integrity Best Practices for Pharmaceutical R&D

Data integrity in R&D laboratories is best ensured by application of a best practice methodology, which includes the following:

R&D Data Integrity Policy Scope. Data integrity policies should be defined that cover:

  • The complete R&D data lifecycle regardless of method of generation or capture (paper, electronic, hybrid).
  • Data created by both internal and external (e.g., CROs, partnerships, collaborations, etc.) R&D activities.
  • Data generated through re-use activities, where re-use is defined as data used for a purpose other than that for which it was originally collected/generated.
  • Both regulated (e.g., GxP) and non-regulated R&D activities.

Focus on People. Company management must work to build a culture which supports the preservation of data integrity.

  • The culture should be one where mistakes are seen as learning opportunities, so staff feel comfortable raising data integrity concerns and issues.
  • Staff should receive training to enable them to understand the importance of data integrity, how to properly manage research data within their field of specialization, and how their work directly contributes to preserving data integrity.
  • Staff should know the process for escalating actual or potential data integrity issues so corrective and preventive action plans can be designed and implemented where needed.

Proper Planning. Data integrity in an R&D environment relies on proper collection, organization, documentation, standardization, storage, sharing, and archiving of research data. To accomplish effective execution of these fundamentals, a comprehensive data management plan should be in place for every research project. This plan should be written during the design phase of the project, before any data is collected. A good data management plan will contain detailed information about the types of data that will be produced/used during the research project, procedures that define how data will be properly handled during the project, and risk-based management monitoring and quality assurance processes. All procedures should have defined ownership and accountability to support maintenance of data integrity throughout the data lifecycle. Finally, the data management plan should detail any relevant legal and/or contractual data handling requirements.

Focus on Technology. Technology greatly benefits laboratories through increased efficiency and reduced human error, allowing them to save time and money. R&D labs should ensure that:

  • quality Electronic Laboratory Notebooks (ELNs) are implemented to ensure scientific data is captured, documented, stored, witnessed, and archived in a safe and secure manner.
  • requirements for new computerized systems are built with a goal to preserve data integrity and reduce manual process steps where possible.
  • computerized system updates include an evaluation to support improvements in data integrity.
  • where necessary, computerized systems undergo and maintain adequate validation.

Focus on Third Parties. In order to maintain data integrity, labs should ensure the following:

  • All contractual terms with third parties are aligned with regulatory requirements.
  • Procedures for supervising third parties are in place, ensuring that all data delivered complies with quality standards, data ownership terms, and data integrity principles defined in the contract.
  • All data received from third parties is evaluated by trained employees to determine whether it follows contractual guidelines.
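
Part of evaluating third-party data can be automated before a trained reviewer looks at it. The Python sketch below is one minimal way to flag delivered records that are missing metadata a contract might require. The field names in REQUIRED_FIELDS are assumptions made for this example, and an actual list would be taken from the data integrity terms agreed with the CRO or partner.

```python
# Hypothetical metadata fields that a contract might require on every record.
REQUIRED_FIELDS = ("sample_id", "analyst", "recorded_at", "instrument", "result")


def review_delivery(records, required=REQUIRED_FIELDS):
    """Map each incomplete record's index to the list of fields it lacks.

    A field counts as missing when it is absent, None, or an empty string.
    A numeric result of 0 is a valid value and is not flagged.
    """
    findings = {}
    for index, record in enumerate(records):
        missing = [name for name in required if record.get(name) in (None, "")]
        if missing:
            findings[index] = missing
    return findings


# Example delivery (invented for illustration only).
delivery = [
    {"sample_id": "S1", "analyst": "jdoe", "recorded_at": "2023-04-05T10:00:00Z",
     "instrument": "LCMS-01", "result": 12.4},
    {"sample_id": "S2", "analyst": "", "recorded_at": "2023-04-05T10:05:00Z",
     "result": 0},
]

print(review_delivery(delivery))
```

A check like this only confirms that required fields are present. Whether the values are accurate and were recorded contemporaneously still has to be judged by trained staff.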


The bottom line: data integrity violations can have personal, legal, financial, and reputational costs for an organization. Scientific laboratories must be fully aware of the ways in which data integrity is maintained throughout the entire data lifecycle. This can be a challenging task for R&D laboratories due to the immense amounts of data involved in drug discovery and development and the constantly changing standards and data integrity guidelines. Incorporating data integrity best practices into research and clinical studies allows R&D labs to face these challenges effectively and ensure that data is accurate, thorough, and properly archived. This ultimately helps ensure that pharmaceutical drugs are compliant with GMP regulations and that patients are receiving the safest and most effective treatments possible.


Dana Karen

About the Author

Mary Beth Walsh

Mary Beth Walsh is the founder and president of Kalleid, a consulting firm based in Cambridge, Massachusetts that provides an integrated portfolio of services supporting IT implementations for clients. She has been in the biopharmaceutical industry for over 15 years and has played various roles in both pharma and software serving the R&D community. She spent about 10 years in product management with BIOVIA/Accelrys, working on products in both the cheminformatics and bioinformatics portfolio, from ELNs to next-gen sequencing analytics on Pipeline Pilot. Her education includes a Master of Science (M.S.) in Biophysical Chemistry from Yale University (2001) and Master of Business Administration (MBA) from Babson College (2013).

About Kalleid

Kalleid, Inc. is a boutique IT consulting firm that has served the scientific community since 2014. We work across the value chain in R&D, clinical and quality areas to deliver support services for software implementations in highly complex, multi-site organizations. We pride ourselves on supporting the success of your IT projects and overall organizational transformation efforts with a wide range of interconnected services. At Kalleid, we understand that cybersecurity is a critical concern for research organizations, and we use the vast knowledge accumulated by specialists on our team to deliver customizable security plans that are aligned with our clients’ goals. If you are interested in exploring how Kalleid professional services can benefit your organization, please don’t hesitate to contact us today.