Demonstrating the Data Integrity of routinely collected healthcare systems data for Clinical Trials (DEDICaTe)

This project is a collaboration between the MRC Clinical Trials Unit at UCL, NHS England (formerly NHS Digital), and the University of Oxford, and was funded by HDR UK (Director’s Innovation Fund).
We used a data intelligence platform (Collibra) to record the provenance and integrity of NHS datasets, including Hospital Episode Statistics (HES) Admitted Patient Care, Outpatients and Critical Care, and the Civil Registration of Deaths (CRD).
The ingestion of metadata and lineage information was semi-automated in NHS England’s Central Metastore, also known as the "single source of truth". We have developed an operating manual that provides guidance on how to ingest metadata.
Regulatory guidance from the MHRA states that real-world data (such as HES) must be shown to be of "sufficient quality" when used in clinical trials. It states that "processes are established to ensure the integrity of the data from acquisition through to archiving and sufficient detail captured to allow for the verification of these activities."
The MHRA’s GXP guidance on data integrity describes a risk-based approach to data management, which covers data integrity risk, criticality and the data lifecycle. The guidance states that the system must be documented, showing an acceptable state of control based on data integrity risk.
Consequently, we described the integrity of HES Admitted Patient Care and CRD datasets in a 50-page Zenodo publication, and DEDICaTe builds on this work to make it available through the Central Metastore.

Overview of NHS England’s DEDICaTe work

Whole Data Set Business Lineage Diagrams

| Name | Lineage View PDF | Central Metastore Link |
| --- | --- | --- |
| Hospital Episode Statistics (HES) – Admitted Patient Care (APC), Outpatients (OP) and Critical Care (CC) | HES APC, OP and CC Whole Data Set Business Lineage Summary View | |
| | HES APC, OP and CC Whole Data Set Business Lineage Fully Expanded View | |
| | HES APC, OP and CC Whole Data Set Business Lineage Partially Expanded View | |
| Civil Registration – Deaths (CRD) | CRD Whole Data Set Business Lineage | |

Field Level Lineage Example Diagrams

| Name | Lineage View PDF | Central Metastore Link |
| --- | --- | --- |
| With Code List | CRD Field Level Lineage with Code List for SEX | |
| With Derivation | CRD Field Level Lineage with Derivation for Confirmed NHS Number | |
| | HES APC Field Level Lineage with Derivation for BEDYEAR | |
| With Removal | HES CC Field Level Lineage with Removal for Invalid Discharge Date | |
| With Validation | HES OP Field Level Lineage with Validation Expanded View for Ethnic Category | |
| | HES OP Field Level Lineage with Validation Summary View for Ethnic Category | |

Slide Deck Overview

Overview of NHS England DEDICaTe Work

Video Demo Overview

Civil Registration – Deaths data lineage demo (mp4, 18:15)
Hospital Episode Statistics data lineage demo (mp4, 27:12)