Data Quality During Data Processing

We built our core data processing platform a couple of years ago and have recently added substantial functionality. This page describes a few key aspects of the current generation. We continue to invest heavily in improving it based on that experience, and will roll out many new features over the next few months.

CCDA to FHIR Conversion

Data from the networks is stored as CCDA documents, an XML format created by HL7 to enable EHR systems to exchange data. The format is complex but very rich.
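To give a feel for the format, the sketch below reads a patient name out of a tiny, hand-written CCDA fragment using only Python's standard library. The sample document and the `patient_name` helper are assumptions made for illustration. Real CCDA files are far larger, and this does not represent the converter described on this page.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# CCDA documents use the HL7 v3 XML namespace.
NS = {"hl7": "urn:hl7-org:v3"}

# Hypothetical, heavily trimmed sample; real documents carry many more sections.
SAMPLE_CCDA = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <recordTarget>
    <patientRole>
      <patient>
        <name><given>Jane</given><family>Doe</family></name>
      </patient>
    </patientRole>
  </recordTarget>
</ClinicalDocument>"""


def patient_name(xml_text: str) -> Optional[str]:
    """Return "given family" for the first recordTarget, or None if absent."""
    root = ET.fromstring(xml_text)
    name = root.find("hl7:recordTarget/hl7:patientRole/hl7:patient/hl7:name", NS)
    if name is None:
        return None
    given = name.findtext("hl7:given", default="", namespaces=NS)
    family = name.findtext("hl7:family", default="", namespaces=NS)
    return f"{given} {family}".strip()


print(patient_name(SAMPLE_CCDA))
```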

We evaluated multiple open source libraries that convert CCDA to FHIR, some still used by other players, but did not feel any of them met our quality standards. We chose a proprietary approach that allowed us to meet the initial needs of our customers.

Flat Format for Analytics

FHIR is designed for data interoperability, not analytics. Its deeply nested resources are difficult for analysts to import into their data marts and analyze further. This is why we created a simplified flat format that’s more convenient for analysts and data scientists to use.
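The difference between nested and flat data can be sketched in a few lines of Python. The `flatten` helper and the dotted column names below are assumptions chosen for illustration; they show the general idea of flattening, not the actual layout of our flat format.

```python
from typing import Any, Dict


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Collapse nested dicts and lists into one level with dotted keys."""
    flat: Dict[str, Any] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            path = f"{prefix}.{index}" if prefix else str(index)
            flat.update(flatten(child, path))
    else:
        flat[prefix] = value
    return flat


# Hypothetical, minimal FHIR Patient resource.
patient = {
    "resourceType": "Patient",
    "id": "example-1",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "birthDate": "1980-01-01",
}

row = flatten(patient)
for column, cell in row.items():
    print(column, "=", cell)
```

Each key in `row` can then map directly to a column in a table or data frame.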

Data Deduplication
CCDA files contain many repeated elements, and some, such as patient and providers, repeat many times over. We implemented simple deduplication logic that runs as we convert to FHIR. For some elements, we saw a reduction of over 90%.
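One common way to deduplicate repeated elements is to hash their content and keep the first occurrence of each hash. The sketch below takes that approach. The choice to ignore the `id` field, and the `content_key` and `deduplicate` names, are assumptions for illustration and not a description of our actual logic.

```python
import hashlib
import json
from typing import Dict, List


def content_key(resource: Dict) -> str:
    """Hash a resource's content, ignoring its id (an illustrative choice)."""
    body = {k: v for k, v in resource.items() if k != "id"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(resources: List[Dict]) -> List[Dict]:
    """Keep the first resource seen for each distinct content hash."""
    seen: Dict[str, Dict] = {}
    for resource in resources:
        seen.setdefault(content_key(resource), resource)
    return list(seen.values())


# Hypothetical input: the same provider emitted twice under different ids.
resources = [
    {"resourceType": "Practitioner", "id": "a", "name": [{"family": "Smith"}]},
    {"resourceType": "Practitioner", "id": "b", "name": [{"family": "Smith"}]},
    {"resourceType": "Practitioner", "id": "c", "name": [{"family": "Jones"}]},
]

unique = deduplicate(resources)
print(len(resources), "->", len(unique))
```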

Deltas / Incremental Data
This is not strictly a data quality feature, but the usability of data matters as well. We recently implemented the ability for customers to request only net-new data.
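Conceptually, an incremental request filters records against a checkpoint. The sketch below assumes each record carries a hypothetical `ingested_at` timestamp and that the caller remembers when it last received data. Both are illustrative assumptions, not a description of the request interface.

```python
from datetime import datetime, timezone
from typing import Dict, List


def net_new(records: List[Dict], last_delivered_at: datetime) -> List[Dict]:
    """Return only records ingested after the previous delivery."""
    return [r for r in records if r["ingested_at"] > last_delivered_at]


# Hypothetical records with an assumed ingestion timestamp.
records = [
    {"id": "obs-1", "ingested_at": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"id": "obs-2", "ingested_at": datetime(2024, 2, 10, tzinfo=timezone.utc)},
    {"id": "obs-3", "ingested_at": datetime(2024, 3, 1, tzinfo=timezone.utc)},
]

checkpoint = datetime(2024, 2, 1, tzinfo=timezone.utc)
delta = net_new(records, checkpoint)
print([r["id"] for r in delta])
```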

Provenance of Data
We implemented the ability to trace each piece of data back to the original file it came from.
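A simple way to picture this kind of traceability is to stamp every derived record with a reference to its source file. In the sketch below, the `provenance` field, its keys, and the `tag_with_provenance` helper are all hypothetical names chosen for illustration.

```python
import hashlib
from typing import Dict, List


def tag_with_provenance(
    records: List[Dict], source_file: str, source_bytes: bytes
) -> List[Dict]:
    """Return copies of the records annotated with their source file and its hash."""
    digest = hashlib.sha256(source_bytes).hexdigest()
    stamp = {"source_file": source_file, "source_sha256": digest}
    # Build new dicts so the input records are left untouched.
    return [{**record, "provenance": dict(stamp)} for record in records]


# Hypothetical file name and contents.
raw = b"<ClinicalDocument xmlns='urn:hl7-org:v3'/>"
extracted = [{"resourceType": "Patient", "id": "example-1"}]

tagged = tag_with_provenance(extracted, "patient-123.xml", raw)
print(tagged[0]["provenance"]["source_file"])
```

Storing a hash alongside the file name makes it possible to confirm later that the file a record points to is byte-for-byte the one it was derived from.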

We are currently working on many new features to improve our data quality and to make it even easier to do data analysis with Particle data.


A Lot of New Features on the Way
The features above provide a baseline that makes the data easier to work with. However, this is just the beginning. We have an aggressive roadmap that we believe will take Particle data to the next level, from additional data sources, to more robust mapping of clinical concepts, to improvements in data quality. Please reach out to your account representative to understand availability.

Improved CCDA Parsing
Our development team is enhancing our CCDA parsing capabilities to improve data quality and make it easier to convert to other formats. Since most of the data from the networks arrives as CCDA, we want to extract as much useful information from it as possible.

Additional Data Sources
We plan to expand the range of data sources available on our platform. This expansion will broaden the scope of data analysis and support a wider range of healthcare applications, enhancing insight into patient care.

Schema Optimized for Analytics
While we will continue supporting FHIR, we know it’s not the best tool for analytics. We are evaluating the open source analytics schema from the Tuva Project. We are very excited about the possibility of joining that ecosystem, including gaining access to the analytics that the community has built.

Clinical Concepts & Normalization
Development is underway to better capture and organize clinical concepts within our platform. We plan to normalize our data to standardize clinical code sets, which will facilitate improved data consistency and usability for analytics purposes.
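As an illustration of what code set normalization can involve, the sketch below maps the OID identifiers that CCDA uses for common code systems to the canonical URIs that FHIR uses, and trims stray whitespace from codes. The `normalize_coding` helper and the small lookup table are assumptions for illustration; a real normalization layer would cover many more systems and rules.

```python
from typing import Dict

# CCDA identifies code systems by OID; FHIR identifies them by URI.
OID_TO_URI: Dict[str, str] = {
    "2.16.840.1.113883.6.1": "http://loinc.org",
    "2.16.840.1.113883.6.96": "http://snomed.info/sct",
    "2.16.840.1.113883.6.88": "http://www.nlm.nih.gov/research/umls/rxnorm",
    "2.16.840.1.113883.6.90": "http://hl7.org/fhir/sid/icd-10-cm",
}


def normalize_coding(code: str, system: str) -> Dict[str, str]:
    """Return a coding with a canonical system URI and a trimmed code."""
    key = system.strip()
    if key.startswith("urn:oid:"):
        key = key[len("urn:oid:"):]
    # Unknown systems pass through unchanged rather than being guessed at.
    uri = OID_TO_URI.get(key, system.strip())
    return {"system": uri, "code": code.strip()}


# The same sample LOINC code expressed two ways normalizes to one form.
a = normalize_coding(" 2345-7 ", "2.16.840.1.113883.6.1")
b = normalize_coding("2345-7", "urn:oid:2.16.840.1.113883.6.1")
print(a == b, a)
```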

Provider Match and Affiliation Information
We are developing features that will improve provider matching and affiliation information. This information will be useful in a variety of places, starting with better care coordination around referrals.

Rigorous Data Quality Checks with the Great Expectations Library
We are adopting the open source Great Expectations library to perform rigorous data quality checks. This work is crucial for maintaining high standards of data integrity and accuracy, supporting reliable healthcare decision-making.
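The kind of rule such checks express can be sketched in plain Python. The example below deliberately does not use the Great Expectations API; the helper names, the sample rows, and the two rules are assumptions meant only to show what a declarative data quality check looks like.

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]


def rows_with_null(rows: List[Row], column: str) -> List[int]:
    """Indices of rows where the column is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(column) is None]


def rows_outside_set(rows: List[Row], column: str, allowed: Set[str]) -> List[int]:
    """Indices of rows whose column value is not in the allowed set."""
    return [i for i, row in enumerate(rows) if row.get(column) not in allowed]


# Hypothetical flat rows; the second and third each break one rule.
rows = [
    {"patient_id": "p1", "gender": "female"},
    {"patient_id": None, "gender": "male"},
    {"patient_id": "p3", "gender": "F"},
]

report = {
    "patient_id is not null": rows_with_null(rows, "patient_id"),
    "gender in allowed set": rows_outside_set(
        rows, "gender", {"male", "female", "other", "unknown"}
    ),
}

for rule, failing in report.items():
    status = "PASS" if not failing else f"FAIL at rows {failing}"
    print(f"{rule}: {status}")
```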