Demonstrating the Data Integrity of routinely collected healthcare systems data for Clinical Trials (DEDICaTe)
This project is a collaboration between the MRC Clinical Trials Unit at UCL, NHS England (formerly NHS Digital), and the University of Oxford, and was funded by HDR UK (Director’s Innovation Fund).
We used a data intelligence platform (Collibra) to record the provenance and integrity of NHS datasets, including Hospital Episode Statistics (HES) Admitted Patient Care, Outpatients, and Critical Care, and the Civil Registration of Deaths (CRD).
The process of ingesting the metadata and lineage information was semi-automated in NHS England’s Central Metastore, also known as the "single source of truth". We have developed an operating manual that provides guidance on how to ingest metadata.
Regulatory guidance from the MHRA states that real-world data (such as HES) must be shown to be of "sufficient quality" when used in clinical trials. It states that "processes are established to ensure the integrity of the data from acquisition through to archiving and sufficient detail captured to allow for the verification of these activities."
The MHRA’s GXP guidance on data integrity describes a risk-based approach to data management, which covers data integrity risk, criticality and the data lifecycle. The guidance states that the system must be documented, showing an acceptable state of control based on data integrity risk.
Consequently, we described the integrity of HES Admitted Patient Care and CRD datasets in a 50-page Zenodo publication, and DEDICaTe builds on this work to make it available through the Central Metastore.