Data Quality During Data Retrieval Process

How we process data when retrieving it from Health Information Exchanges

Our goal as a company is to become an analytics provider that builds on the strengths of Health Information Exchanges (HIEs). Good analytics depend on good data, so we have invested heavily in data quality. This is a multi-faceted effort: it starts with designing our data retrieval processes to handle several potential gotchas, continues with internal data processing, and extends to robust data quality processes for our analytics. In other words, we care about quality at the input, during processing, and at the output. This post explains what we do at the input stage: how we ensure quality during retrieval.

Geographic / Facility Coverage
Currently, we are connected to the three national networks: Carequality, CommonWell, and eHealth Exchange. We recently added Healthix, the public HIE for downstate New York, after noticing that some downstate New York facilities had poorer coverage on the national networks, and we will likely expand to additional state HIEs. This wide net lets us give healthcare providers a more complete view of a patient's medical history, regardless of where the patient received care.

Streamlining Data Retrieval with Record Locator Service
One of the core components of our data quality strategy is our proprietary Record Locator Service (RLS), which helps pinpoint where a patient's records are held across the various networks. We have invested heavily in this area to maximize our ability to locate records while being careful not to over-query the networks. In evaluations with customers, we typically retrieve about 10-15% more data than other providers.

Enhancing Patient Identification with EMPI
The Enterprise Master Patient Index (EMPI) is a key service we use as part of the query process. By accurately identifying patients, the EMPI prevents duplicate records and ensures that all data retrieved is associated with the correct individual. This is essential for both the integrity of the data and the safety of patient care.

Address Verification
Accurate data starts with accurate inputs. We use a service that helps us verify and normalize patients' addresses. This ensures that all location data we collect and store is precise, which is especially important when tracking patient information across multiple care settings.
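
We rely on an external verification service for this, which we don't reproduce here. As a rough illustration of what normalization involves, the sketch below standardizes street lines and ZIP codes. The abbreviation table is a small hypothetical subset of USPS conventions, and unlike a real verification service, the sketch does not check that an address actually exists.

```python
import re

# Small, illustrative subset of USPS street-suffix, unit, and directional
# abbreviations. A real service uses the full tables and a delivery-point
# database; this sketch only normalizes text.
ABBREVIATIONS = {
    "street": "ST",
    "avenue": "AVE",
    "road": "RD",
    "boulevard": "BLVD",
    "apartment": "APT",
    "suite": "STE",
    "north": "N",
    "south": "S",
    "east": "E",
    "west": "W",
}


def normalize_street(line: str) -> str:
    """Uppercase, strip punctuation, collapse whitespace, and abbreviate common words."""
    cleaned = re.sub(r"[^\w\s#]", " ", line)  # replace periods, commas, etc. with spaces
    tokens = cleaned.split()  # split() also collapses repeated whitespace
    return " ".join(ABBREVIATIONS.get(tok.lower(), tok.upper()) for tok in tokens)


def normalize_zip(zip_code: str) -> str:
    """Return the 5-digit ZIP from a ZIP or ZIP+4 string; raise ValueError otherwise."""
    match = re.match(r"^\s*(\d{5})(?:-?\d{4})?\s*$", zip_code)
    if not match:
        raise ValueError(f"invalid ZIP code: {zip_code!r}")
    return match.group(1)
```

For example, "123 North Main Street, Apartment 4B" and "123 n. main st. apt 4b" both normalize to "123 N MAIN ST APT 4B", so the two spellings compare equal.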

Record Validation
We have observed situations where the networks returned records in error, such as documents for the wrong patient. This poses a risk to patient safety and confidentiality. To guard against it, we perform an additional check that verifies the patient demographics sent by the customer match the patient demographics in the clinical documents obtained from the EMRs.
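
A simplified version of such a check might look like the sketch below. The field set, the exact-match rules for date of birth and sex, and the 0.85 name-similarity threshold (computed with Python's `difflib.SequenceMatcher`) are illustrative assumptions, not our production rules.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class PatientDemographics:
    first_name: str
    last_name: str
    dob: str  # ISO 8601 date (assumed format)
    sex: str


def _similar(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] via difflib."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def document_matches_request(
    requested: PatientDemographics,
    in_document: PatientDemographics,
    name_threshold: float = 0.85,  # hypothetical cutoff, not a tuned value
) -> bool:
    """Accept a retrieved document only if its demographics agree with the request.

    Date of birth and the first letter of sex must match exactly; names may
    differ slightly but must clear the similarity threshold.
    """
    if requested.dob != in_document.dob:
        return False
    if requested.sex.strip().upper()[:1] != in_document.sex.strip().upper()[:1]:
        return False
    return (
        _similar(requested.last_name, in_document.last_name) >= name_threshold
        and _similar(requested.first_name, in_document.first_name) >= name_threshold
    )
```

In this sketch, a document that fails the check would be set aside instead of being returned to the customer.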

Monitoring Network Health
To make sure we always get all the necessary data, we closely monitor network error rates and carefully tune our retries and timeouts. We also reach out to other network implementers when we see issues.
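
The retry logic can be sketched as follows, assuming a caller-supplied `send(timeout)` function and a hypothetical `TransientNetworkError` for retryable failures. The default attempt count, timeout, and backoff delays are placeholders, not our tuned values. Each attempt and failure is counted, so the failure ratio doubles as an error-rate metric.

```python
import random
import time
from collections import Counter
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientNetworkError(Exception):
    """Hypothetical marker for retryable failures such as timeouts or 5xx responses."""


def query_with_retries(
    send: Callable[[float], T],
    *,
    max_attempts: int = 3,
    timeout_s: float = 30.0,
    base_delay_s: float = 1.0,
    metrics: Optional[Counter] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(timeout_s), retrying transient failures with exponential backoff and jitter.

    Every attempt and failure is counted in `metrics` so that error rates
    can be monitored over time.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if metrics is None:
        metrics = Counter()
    for attempt in range(1, max_attempts + 1):
        metrics["attempts"] += 1
        try:
            return send(timeout_s)
        except TransientNetworkError:
            metrics["failures"] += 1
            if attempt == max_attempts:
                raise  # out of retries: surface the last error
            # Full jitter: wait a random time up to base * 2^(attempt - 1).
            sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

Dividing `metrics["failures"]` by `metrics["attempts"]` for each network gives the error rate to watch, and injecting `sleep` keeps the function easy to test.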

Adopting New Network APIs Promptly
To get the most from the networks, we participate in their committees and adopt new APIs promptly after they are released.

Robust Logging & Metrics
A data quality program is not only about safeguards. It also requires the ability to respond to customers and other network participants, debug issues, and improve the service. Detailed logs support this, and internal dashboards and other tools help us understand the metrics we see, which ultimately helps us deliver good service.

Getting As Much Data As Possible
We tune our timeouts and retries to get as much data as possible.

While not strictly a data quality concern, availability is hugely important to our customers, as we retrieve tens of millions of files per month. Query patterns vary, from one patient at a time to batches of several hundred patients or more, such as when a customer queries all of their patients with appointments the next day. Our event-based architecture is highly scalable and absorbs high query volumes without slowing down processing.
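
One way to picture how a queue absorbs bursts is the asyncio sketch below. It is a hypothetical illustration, not our architecture: `process_batch`, `handle`, and the default of 10 workers are assumptions. The whole batch is enqueued at once, while the fixed worker pool caps how many queries run concurrently.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def process_batch(
    patient_ids: Iterable[str],
    handle: Callable[[str], Awaitable[None]],
    workers: int = 10,
) -> None:
    """Enqueue every patient query, then drain the queue with a fixed pool of workers.

    The queue absorbs a burst (for example, a few hundred next-day
    appointments at once) while the worker count caps concurrent queries.
    """
    queue: asyncio.Queue[str] = asyncio.Queue()
    for pid in patient_ids:
        queue.put_nowait(pid)

    async def worker() -> None:
        # Each worker pulls queries until the queue is empty.
        while True:
            try:
                pid = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            await handle(pid)

    await asyncio.gather(*(worker() for _ in range(workers)))
```

Raising `workers` increases throughput at the cost of more concurrent load on the networks, which is the same balance the Record Locator Service strikes against over-querying.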


When discussing data and data quality, it’s always important to note limitations. Here are a few:

  • While the overwhelming majority of providers are part of health networks, not all are yet, and not all EHRs support network connectivity. This means we can’t always access every piece of data through HIEs.
  • Some data exists in less accessible formats like PDF or TIFF files, which can be challenging to integrate seamlessly.
  • Population mobility also poses a significant challenge: patients who move from one location to another may have health records fragmented across various systems. Since we can’t query every provider for every patient, this may leave gaps, even with our sophisticated Record Locator Service.

In conclusion, maintaining high-quality data when retrieving records from health information exchanges involves a combination of advanced technology, careful monitoring, and constant adaptation to new challenges.