Data Quality: Data Processing

Our core data processing platform was built a couple of years ago, and we have recently added significant new functionality. This page describes a few key aspects of the current generation. We continue to invest heavily in improving the platform based on the last few years of experience, and will roll out many new features over the next few months.

CCDA to FHIR Conversion

Data from the networks comes in as CCDA (Consolidated Clinical Document Architecture) documents, an XML format created by HL7 to enable EHR systems to exchange data. The format is complex but very rich. We evaluated multiple open source libraries that convert CCDA to FHIR, some of which are still used by other vendors, but did not find that any of them met our quality standards. Instead, we chose a proprietary approach that allowed us to meet our customers' initial needs.
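To make the conversion concrete, here is a minimal Python sketch of one small piece of it: mapping the patient header of a CCDA document to a FHIR R4 Patient resource using only the standard library. It is not our converter. The function name `ccda_patient_to_fhir` is hypothetical, and the sketch handles only name, gender, and birth date, ignoring the many clinical sections a real document carries.

```python
import xml.etree.ElementTree as ET
from typing import Any

# CCDA elements live in the HL7 v3 XML namespace.
NS = {"v3": "urn:hl7-org:v3"}

# HL7 AdministrativeGender codes -> FHIR Patient.gender values.
GENDER_MAP = {"M": "male", "F": "female"}


def ccda_patient_to_fhir(ccda_xml: str) -> dict[str, Any]:
    """Hypothetical helper: map the CCDA header patient to a minimal FHIR R4 Patient."""
    root = ET.fromstring(ccda_xml)
    patient = root.find("v3:recordTarget/v3:patientRole/v3:patient", NS)
    if patient is None:
        raise ValueError("no recordTarget/patientRole/patient element found")

    resource: dict[str, Any] = {"resourceType": "Patient"}

    # Name: CCDA <family> maps to FHIR name.family, each <given> to name.given.
    name_el = patient.find("v3:name", NS)
    if name_el is not None:
        name: dict[str, Any] = {}
        family = name_el.findtext("v3:family", namespaces=NS)
        given = [g.text for g in name_el.findall("v3:given", NS) if g.text]
        if family:
            name["family"] = family
        if given:
            name["given"] = given
        if name:
            resource["name"] = [name]

    # Gender: unmapped codes fall back to FHIR "unknown".
    gender_el = patient.find("v3:administrativeGenderCode", NS)
    if gender_el is not None:
        resource["gender"] = GENDER_MAP.get(gender_el.get("code", ""), "unknown")

    # Birth date: HL7 timestamp "YYYYMMDD..." -> FHIR date "YYYY-MM-DD".
    birth_el = patient.find("v3:birthTime", NS)
    value = birth_el.get("value", "") if birth_el is not None else ""
    if len(value) >= 8 and value[:8].isdigit():
        resource["birthDate"] = f"{value[:4]}-{value[4:6]}-{value[6:8]}"

    return resource
```

Even this small slice shows why the conversion is hard to get right: every field needs its own mapping rule and a decision about what to do when the source value is missing or unexpected.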

Improved CCDA Parsing

Our development team has enhanced our CCDA parsing capabilities, improving the quality of the extracted data and making it easier to convert to other formats. Since most of the data from the networks comes in CCDA, better parsing lets us extract the maximum useful information.

FLAT Format for Analytics

FHIR is designed for data interoperability, not analytics. Its nested, resource-oriented structure makes it difficult for analysts to import the data into their data marts and run additional analytics. This is why we created a simplified flat format that’s more convenient for analysts and data scientists to use.
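As a rough illustration of what flattening involves, the sketch below turns nested FHIR JSON into one row per resource with dotted-path column names and writes the rows as CSV. This is a generic technique, not the actual FLAT schema; `flatten` and `to_csv` are hypothetical names, and a production format would use stable, named columns instead of array indexes.

```python
import csv
import io
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Recursively flatten nested dicts/lists into a single-level dict
    whose keys are dotted paths, e.g. 'code.coding.0.code'."""
    items: dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            items.update(flatten(child, f"{prefix}.{key}" if prefix else key))
    elif isinstance(value, list):
        for i, child in enumerate(value):
            items.update(flatten(child, f"{prefix}.{i}" if prefix else str(i)))
    else:
        items[prefix] = value
    return items


def to_csv(resources: list[dict[str, Any]]) -> str:
    """Write one CSV row per resource; columns are the sorted union of all paths."""
    rows = [flatten(r) for r in resources]
    columns = sorted({col for row in rows for col in row})
    buf = io.StringIO()
    # Paths missing from a given row are written as empty cells (restval="").
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The result loads directly into a spreadsheet or a warehouse table, which is the convenience analysts lose when they have to walk nested FHIR resources themselves.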

Data Deduplication

CCDA files contain many repeated elements, and some of them, such as the patient and provider entries, repeat many times. We implemented simple deduplication logic as part of the FHIR conversion. For some element types, this reduced the count by over 90%!
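The sketch below shows one simple form of deduplication, assuming exact duplicates: resources that are identical apart from their `id` collapse into the first copy, and references that pointed at the dropped copies are rewritten. The names `deduplicate` and `rewrite_references` are hypothetical, and real provider and patient entries often need fuzzier matching than exact equality.

```python
import json
from typing import Any


def dedup_key(resource: dict[str, Any]) -> str:
    """Serialize a resource without its 'id' (keys sorted) so copies that
    differ only in id produce the same key."""
    content = {k: v for k, v in resource.items() if k != "id"}
    return json.dumps(content, sort_keys=True)


def deduplicate(
    resources: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], dict[str, str]]:
    """Keep the first copy of each distinct resource.

    Returns the unique resources (in first-seen order) and a map from
    'Type/droppedId' to 'Type/keptId' for rewriting references."""
    kept: dict[str, dict[str, Any]] = {}
    ref_map: dict[str, str] = {}
    for res in resources:
        key = dedup_key(res)
        if key in kept:
            first = kept[key]
            ref_map[f"{res['resourceType']}/{res['id']}"] = (
                f"{first['resourceType']}/{first['id']}"
            )
        else:
            kept[key] = res
    return list(kept.values()), ref_map


def rewrite_references(value: Any, ref_map: dict[str, str]) -> Any:
    """Recursively replace 'reference' strings that point at dropped duplicates."""
    if isinstance(value, dict):
        return {
            k: (ref_map.get(v, v) if k == "reference" and isinstance(v, str)
                else rewrite_references(v, ref_map))
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [rewrite_references(item, ref_map) for item in value]
    return value
```

Rewriting references is the step that makes deduplication safe: without it, encounters and observations would still point at resources that no longer exist.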

Deltas & Incremental Data

This is not strictly a data quality feature, but the usability of data is closely related to its quality. Deltas enable customers to request only net new data rather than the full data set, in both FHIR and FLAT formats.
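One way to compute a delta, sketched here under the assumption that we remember a content hash for every resource already delivered to a customer, is to return only resources whose hash is new or different. `compute_delta` and the `Type/id`-keyed state are hypothetical and do not describe the actual API.

```python
import hashlib
import json
from typing import Any


def content_hash(resource: dict[str, Any]) -> str:
    """Stable SHA-256 of a resource's canonical JSON (sorted keys, no spaces)."""
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compute_delta(
    current: list[dict[str, Any]], delivered: dict[str, str]
) -> tuple[list[dict[str, Any]], dict[str, str]]:
    """Return (delta, new_state).

    `delivered` maps 'Type/id' to the hash of the version the customer already
    has. The delta holds only new or changed resources; the input state is
    not mutated."""
    delta: list[dict[str, Any]] = []
    state = dict(delivered)
    for res in current:
        key = f"{res['resourceType']}/{res['id']}"
        digest = content_hash(res)
        if state.get(key) != digest:
            delta.append(res)
            state[key] = digest
    return delta, state
```

Hashing the content rather than relying on timestamps means a resource that is re-sent by a network without changes does not show up again in the delta.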

Provenance of Data

We implemented the ability to trace data back to its original source files. This is crucial for our customers to understand where the data came from, and it enables clinicians to go to the source to look up additional context about the patient.
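A lightweight way to record provenance in FHIR is the `meta.source` element, a URI available on every R4 resource. The sketch below tags each converted resource with its originating file and builds a reverse index from file to resources. Whether our platform uses `meta.source`, a `Provenance` resource, or another mechanism is not stated here; the helper names and file paths are hypothetical.

```python
import copy
from collections import defaultdict
from typing import Any


def tag_source(resource: dict[str, Any], source_uri: str) -> dict[str, Any]:
    """Return a copy of a converted FHIR resource whose meta.source points at
    the CCDA file it was derived from; the input is left unchanged."""
    tagged = copy.deepcopy(resource)
    tagged.setdefault("meta", {})["source"] = source_uri
    return tagged


def index_by_source(resources: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Group 'Type/id' references by originating file, so a user can go
    from a source document to everything extracted from it."""
    index: dict[str, list[str]] = defaultdict(list)
    for res in resources:
        source = res.get("meta", {}).get("source")
        if source:
            index[source].append(f"{res['resourceType']}/{res['id']}")
    return dict(index)
```

Reading `meta.source` on any resource answers "where did this come from?", and the reverse index answers "what did we extract from this document?".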


Additional Data Sources and Data Enrichment

We plan to expand the range of data sources available on our platform. This expansion will broaden the scope of data analysis, support a wider range of healthcare applications, and provide deeper insight into patient care.

Clinical Concepts & Normalization

We can normalize code sets, for example by mapping equivalent codes from different coding systems to a common standard concept, which makes the data easier to work with for analytics.
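The sketch below shows the basic idea of code normalization: when a coding matches an entry in a crosswalk, a standard LOINC coding is appended while the original codes are kept. The crosswalk and the `urn:example:local-lab` system are hypothetical placeholders; real normalization relies on maintained terminology mappings rather than a hand-written dictionary.

```python
from typing import Any

LOINC = "http://loinc.org"

# Hypothetical crosswalk: (source system, source code) -> LOINC code.
CROSSWALK = {
    ("urn:example:local-lab", "GLU"): "2345-7",
    ("urn:example:local-lab", "A1C"): "4548-4",
}


def normalize_codeable_concept(cc: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a FHIR CodeableConcept with a LOINC coding appended
    when any existing coding has a crosswalk entry. Original codings are
    kept, and concepts that already carry LOINC are returned unchanged."""
    codings = cc.get("coding", [])
    if any(c.get("system") == LOINC for c in codings):
        return cc
    for c in codings:
        loinc_code = CROSSWALK.get((c.get("system"), c.get("code")))
        if loinc_code:
            return {**cc, "coding": codings + [{"system": LOINC, "code": loinc_code}]}
    return cc
```

Appending rather than replacing keeps the source code available for provenance while giving analysts a single system to group by.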

Facility Enrichment

We use additional data sources to better understand facilities. Often the raw data includes only the practice address, and we need to determine which hospital system the facility belongs to and look up its NPI.
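Matching a raw practice address against a reference directory usually starts with normalizing both sides so that formatting differences do not block a match. The sketch below assumes a hypothetical `FACILITY_DIRECTORY` keyed by normalized address and 5-digit ZIP code; the abbreviation table is deliberately tiny, and the NPI shown is a format example, not a real facility.

```python
import re
from typing import Optional

# A few USPS-style street suffix abbreviations (illustrative, not exhaustive).
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD",
            "BOULEVARD": "BLVD", "DRIVE": "DR", "SUITE": "STE"}


def normalize_address(line: str, zip_code: str) -> str:
    """Uppercase, replace punctuation with spaces, abbreviate suffixes,
    and append the 5-digit ZIP."""
    tokens = re.sub(r"[^\w\s]", " ", line.upper()).split()
    tokens = [SUFFIXES.get(t, t) for t in tokens]
    return " ".join(tokens) + " " + zip_code[:5]


# Hypothetical reference directory keyed by normalized address.
FACILITY_DIRECTORY = {
    normalize_address("123 Main St, Ste 4", "02139"): {
        "health_system": "Example Health System",
        "npi": "1234567893",
    },
}


def enrich_facility(address_line: str, zip_code: str) -> Optional[dict]:
    """Return health system and NPI for a raw address, or None if unmatched."""
    return FACILITY_DIRECTORY.get(normalize_address(address_line, zip_code))
```

Exact lookup on normalized keys is only a starting point; unmatched addresses would typically fall through to fuzzier geocoding or manual review.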

Provider Match and Affiliation Information

Our Provider Enrichment feature improves how we match providers and identify their affiliations.

Medication Fill Data through Surescripts

We partnered with Surescripts to add medication fill data to our data set.

Patient Demographics Enrichment

We recently added capabilities to enrich patient data with details such as phone numbers, alternate name spellings, and previous addresses. This is useful in transitions-of-care scenarios and during patient outreach. We can also use the enriched demographics to improve patient matching when querying the networks, which returns much more data.
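As a hypothetical illustration of how enriched demographics can widen network queries, the sketch below builds one query per combination of known names and addresses, so a record filed under an older address or an alternate spelling can still match. The field names and the `query_variants` helper are assumptions, not the platform's data model.

```python
from itertools import product
from typing import Any


def query_variants(patient: dict[str, Any]) -> list[dict[str, str]]:
    """Build one demographic query per (name, address) combination.

    The primary name and address come first; alternate names and previous
    addresses (both optional) add further combinations."""
    names = [patient["name"], *patient.get("alternate_names", [])]
    addresses = [patient["address"], *patient.get("previous_addresses", [])]
    return [
        {"name": n, "address": a, "birth_date": patient["birth_date"]}
        for n, a in product(names, addresses)
    ]
```

The number of queries grows multiplicatively with each enriched attribute, so in practice the variants would likely be prioritized or capped.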