Data Provenance

Overview

Data provenance serves as an audit trail for data transformations, helping to ensure and demonstrate data quality at every stage. For each data element that is transformed, merged, or normalized into Particle’s persistence layer to build a comprehensive longitudinal patient record, Particle creates a detailed provenance record. These records are readily accessible to customers and give full transparency into how the data was changed on its way into the record.

Value

  • Data integrity and trust: Provenance records support data integrity by providing a detailed history of data modifications, additions, and deletions. Because any change can be traced back to its source, they help establish trust in the accuracy and reliability of the data.
  • Regulatory compliance: Compliance with regulations such as the Health Insurance Portability and Accountability Act (HIPAA) is crucial in healthcare. Provenance records assist in demonstrating compliance by providing evidence of data handling practices, data access, and data sharing activities.
  • Error detection: Provenance records help detect and diagnose errors or inconsistencies in healthcare data by capturing a comprehensive record of data lineage. This lineage shows where and when an error occurred, which makes corrective action easier and helps maintain data quality.
  • Accountability: Provenance records strengthen accountability by enabling healthcare organizations to attribute responsibility for data changes. This supports investigations, audits, and, if necessary, legal proceedings.
  • Data management: Provenance records support effective data governance and management by helping to monitor data access, sharing, and usage. They enable organizations to enforce policies, track compliance, and maintain data privacy and security.

How does it work?

Whenever we process the data we receive (for example, by deduplicating, normalizing, or enriching it), we generate provenance resources that document the actions taken.

The provenance resource provides detailed information about:

  1. the activity that happened (e.g., deduplication, normalization, or enrichment)

  2. the agent who performed the activity

  3. the entity (i.e., the data element) upon which the activity was performed

  4. the target resource affected by the data transformation; for instance, in the case of deduplication, the resource that was either 'deleted' or 'merged'
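To make the four parts above concrete, here is a minimal sketch that assembles a provenance record as a plain Python dict. It assumes the record follows the general shape of the HL7 FHIR R4 Provenance resource (`target`, `recorded`, `activity`, `agent`, `entity`); this document does not confirm Particle's exact schema, and the helper names `make_provenance` and `history_for`, the agent names, and the resource references are all hypothetical. The second helper shows how such records can serve as an audit trail: it lists, oldest first, every activity recorded against one target resource.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any


def make_provenance(
    activity: str,
    agent: str,
    entity_ref: str,
    target_ref: str,
    recorded: datetime,
) -> dict[str, Any]:
    """Return a FHIR R4-style Provenance resource as a plain dict.

    Hypothetical helper: field choices are illustrative assumptions,
    not Particle's actual schema.
    """
    return {
        "resourceType": "Provenance",
        # target: the resource affected by the transformation
        "target": [{"reference": target_ref}],
        # when the provenance record was written (timezone-aware timestamp)
        "recorded": recorded.isoformat(),
        # activity: what happened (free text here; a real system would likely use a code)
        "activity": {"text": activity},
        # agent: who or what performed the activity
        "agent": [{"who": {"display": agent}}],
        # entity: the data element the activity was performed on
        "entity": [{"role": "source", "what": {"reference": entity_ref}}],
    }


def history_for(records: list[dict[str, Any]], target_ref: str) -> list[str]:
    """Return, oldest first, the activities recorded against one target resource."""
    matching = [
        r for r in records
        if any(t["reference"] == target_ref for t in r["target"])
    ]
    # Sort by the parsed timestamp rather than the raw string
    matching.sort(key=lambda r: datetime.fromisoformat(r["recorded"]))
    return [r["activity"]["text"] for r in matching]


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    t1 = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
    # Hypothetical records, deliberately listed out of chronological order
    records = [
        make_provenance("deduplication", "dedup-service",
                        "Condition/incoming-2", "Condition/dup-7", t1),
        make_provenance("normalization", "normalizer",
                        "Condition/incoming-1", "Condition/dup-7", t0),
    ]
    print(history_for(records, "Condition/dup-7"))  # ['normalization', 'deduplication']
```

Keeping each record self-describing (who, what, on which entity, affecting which target, and when) is what lets a simple filter-and-sort reconstruct a resource's history without consulting any other system.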