Process for Conformance Checking a CDA Document

Sep 8, 2013

One of the things I’ve done a lot of this year is conformance checking CDA documents in several different contexts. Since someone asked, here’s my basic methodology for conformance checking a CDA document:

1. Read it by hand

In this step, I read the XML directly in an XML editor. I’m doing the following things:

  • Getting an overall sense of the content of the document
  • Checking for gross structural errors that might prevent automated checks
  • Checking that the document metadata (realm, templateId, id, code) makes basic sense

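The header checks in the last bullet are mechanical enough to sketch in code. Here’s a minimal illustration using Python’s standard library — the sample document, the function name, and the rules are simplified for illustration, not a complete conformance check (the realm is carried by the realmCode element):

```python
# Hedged sketch of the step-1 metadata sanity checks, automated with
# Python's stdlib. Sample content is illustrative only.
import xml.etree.ElementTree as ET

NS = {"cda": "urn:hl7-org:v3"}

SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <realmCode code="US"/>
  <templateId root="2.16.840.1.113883.10.20.22.1.1"/>
  <id root="2.16.840.1.113883.19.5" extension="12345"/>
  <code code="34133-9" codeSystem="2.16.840.1.113883.6.1"/>
</ClinicalDocument>"""

def check_header(xml_text):
    """Return a list of gross metadata problems in a CDA header."""
    # fromstring raises ParseError on gross structural errors (step 1, bullet 2)
    root = ET.fromstring(xml_text)
    problems = []
    for name in ("realmCode", "templateId", "id", "code"):
        if root.find(f"cda:{name}", NS) is None:
            problems.append(f"missing <{name}>")
    tid = root.find("cda:templateId", NS)
    if tid is not None and not tid.get("root"):
        problems.append("templateId has no root")
    return problems

print(check_header(SAMPLE))  # []
```

Reading by hand still matters — code like this only catches absences, not content that is present but nonsensical.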
2. Validate the document

In this step I do the following things:

  • Check that the document conforms to the base CDA schema
  • Use appropriate schematrons (if available) to check the document against the applicable CDA IG (if there’s no schematron, then I’ll have to do a manual comparison)

For each error reported, the first thing to investigate is whether the error is a true error or not. There are valid CDA documents that don’t conform to the schema, and whether that matters or not depends on the context. There are always areas where the schematrons themselves may falsely report errors, so everything has to be checked.

If there are schematrons, I always double-check the document anyway, and verify that it’s valid against the specification, since the schematrons cannot check everything. I particularly keep my eyes open for co-occurrence constraints, since these are often missed in schematrons.
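Schematron rules are essentially XPath assertions, so when an IG’s schematron misses a co-occurrence constraint, a hand-rolled check is easy to write. Here’s a sketch — the rule itself is invented for illustration (an observation with a value must also carry a statusCode), and the sample is simplified:

```python
# Illustration of a co-occurrence check of the kind schematrons often
# miss. The rule and sample content are invented for illustration.
import xml.etree.ElementTree as ET

NS = {"cda": "urn:hl7-org:v3"}

SAMPLE = """<section xmlns="urn:hl7-org:v3">
  <observation><value/></observation>
  <observation><value/><statusCode code="completed"/></observation>
</section>"""

def check_cooccurrence(xml_text):
    """Flag observations that have a value but no statusCode."""
    root = ET.fromstring(xml_text)
    errors = []
    for i, obs in enumerate(root.findall("cda:observation", NS), 1):
        has_value = obs.find("cda:value", NS) is not None
        has_status = obs.find("cda:statusCode", NS) is not None
        if has_value and not has_status:
            errors.append(f"observation {i}: value without statusCode")
    return errors

print(check_cooccurrence(SAMPLE))  # ['observation 1: value without statusCode']
```

For the schema and schematron runs themselves I use standard tooling rather than anything hand-rolled; this kind of script is only for the constraints the published artifacts don’t cover.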

3. Check the data 

The next step is to manually review a number of specific types of data in the document:

  • Dates - are the document and event dates internally coherent? Do intervals finish after they start? Are the timezones coherent? (They often aren’t.) Do the precisions make sense? (I particularly investigate any date/times with 000000 for the time portion)
  • Codes - are the code systems registered? Are the codes valid? (Private codes can’t be checked, but public ones are often wrong - check code & display name). Check for display names with no codes, mismatches between codes and originalText. Check version information if provided. Some of these checks can be automated, but most can’t
  • Identifiers - do the root values make sense? Are the OIDs registered? Are UUIDs used properly? are any of the identifiers re-used in the document? should they be? (often the same participant gets different identifiers in different places in the document when they shouldn’t, or vice versa)
  • Quantities - are the UCUM units valid? (If they have to be)
  • RIM structural codes - are these correct?

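The date checks above can be sketched in code. This is a hedged illustration, not a complete implementation: the helper names are mine, timezone offsets are ignored, and precision handling is simplified to padding:

```python
# Sketch of the step-3 date checks: parse HL7 v3 TS values, flag
# suspicious 000000 time parts, and check interval coherence.
# Simplified: ignores timezone offsets and fractional seconds.
import re
from datetime import datetime

TS_RE = re.compile(r"^(\d{4,14})(?:\.\d+)?([+-]\d{4})?$")

def parse_ts(value):
    """Parse an HL7 v3 TS into (datetime, precision digits)."""
    m = TS_RE.match(value)
    if not m:
        raise ValueError(f"not a TS: {value!r}")
    digits = m.group(1)
    if len(digits) < 8:                             # pad partial dates:
        digits = digits + "0101"[len(digits) - 4:]  # month/day default to 01
    digits = digits.ljust(14, "0")                  # pad missing time with zeros
    return datetime.strptime(digits, "%Y%m%d%H%M%S"), len(m.group(1))

def suspicious_midnight(value):
    """A full-precision TS ending in 000000 is often a faked time."""
    digits = TS_RE.match(value).group(1)
    return len(digits) == 14 and digits.endswith("000000")

def interval_coherent(low, high):
    """Does the interval finish at or after it starts?"""
    return parse_ts(low)[0] <= parse_ts(high)[0]

print(interval_coherent("20130908", "201309081030"))  # True
print(suspicious_midnight("20130908000000"))          # True
```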
I nearly always find errors in these areas - it’s really hard to get this stuff correct. This is useful: https://hl7connect.healthintersections.com.au/svc/ids
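Some of the identifier checks are also automatable. This is a rough sketch under stated assumptions: it only checks OID/UUID syntax (a simplified OID grammar, not the full arc rules), and whether a root+extension pair recurs — registration and the should-they-match question still need the registry and human judgment:

```python
# Sketch of step-3 identifier checks: is each II root a syntactically
# plausible OID or UUID, and is any identifier reused in the document?
# Simplified OID grammar; registration can't be checked offline.
import re
import uuid
from collections import Counter

OID_RE = re.compile(r"^[0-2](\.(0|[1-9]\d*))+$")  # no leading zeros in arcs

def root_kind(root):
    """Classify an II root as 'oid', 'uuid', or 'invalid'."""
    if OID_RE.match(root):
        return "oid"
    try:
        uuid.UUID(root)
        return "uuid"
    except ValueError:
        return "invalid"

def reused(ids):
    """Return (root, extension) pairs that appear more than once."""
    return [k for k, n in Counter(ids).items() if n > 1]

print(root_kind("2.16.840.1.113883.19.5"))  # oid
print(root_kind("not-an-oid"))              # invalid
```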

4. Extensions

Check for extensions. What extensions have been added? Are they valid against the rules laid down in the relevant IG? (there are pretty much no rules in the CDA standard itself)
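Finding the extensions is straightforward, since they are the elements outside the CDA namespace. A minimal sketch — the sample uses the SDTC extension namespace as an example, but whatever turns up has to be checked by hand against the IG:

```python
# Sketch of the step-4 extension scan: list every element that is not
# in the CDA namespace, so each can be checked against the IG's rules.
import xml.etree.ElementTree as ET

CDA_NS = "{urn:hl7-org:v3}"

SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3"
    xmlns:sdtc="urn:hl7-org:sdtc">
  <recordTarget><patientRole><patient>
    <sdtc:raceCode code="2106-3"/>
  </patient></patientRole></recordTarget>
</ClinicalDocument>"""

def find_extensions(xml_text):
    """Return the tags of all non-CDA-namespace elements."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter() if not el.tag.startswith(CDA_NS)]

print(find_extensions(SAMPLE))  # ['{urn:hl7-org:sdtc}raceCode']
```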


5. Check narrative

I render the document using an appropriate/applicable stylesheet. I check it for basic coherence - one thing to particularly look for is information that likely comes from pre-formatted ASCII that hasn’t been appropriately styled. Test data is often short, whereas real clinical data is longer; this is easy for developers to miss. Then I systematically check the narrative twice:

  • I read the narrative, and ensure that the information in the narrative doesn’t disagree with the data in the entries
  • I read the entries, and check that the data in each entry is reflected in the narrative

While I’m doing this, I make a list of information that’s in the narrative but not the data, or vice versa. Whether anything on this list is a problem or not depends on the IG and the rules it makes.
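The two-pass check itself can’t be automated, but gathering the raw material for it can. A hedged sketch — section structure is simplified, the sample content is illustrative, and in real documents the narrative is structured XHTML-like markup rather than bare text:

```python
# Sketch of gathering step-5 material: pull a section's narrative text
# and its coded entries side by side, for manual comparison.
# Paths are simplified; real section/entry structures are deeper.
import xml.etree.ElementTree as ET

NS = {"cda": "urn:hl7-org:v3"}

SAMPLE = """<section xmlns="urn:hl7-org:v3">
  <text>Penicillin allergy, noted 2012.</text>
  <entry><observation>
    <code code="91936005" displayName="Allergy to penicillin"/>
  </observation></entry>
</section>"""

def narrative_and_entries(xml_text):
    """Return (narrative text, list of entry code display names)."""
    sec = ET.fromstring(xml_text)
    narrative = "".join(sec.find("cda:text", NS).itertext()).strip()
    names = []
    for entry in sec.findall("cda:entry", NS):
        for code in entry.iter(f"{{{NS['cda']}}}code"):
            names.append(code.get("displayName"))
    return narrative, names

text, codes = narrative_and_entries(SAMPLE)
print(text)   # Penicillin allergy, noted 2012.
print(codes)  # ['Allergy to penicillin']
```

The actual comparison — does the narrative disagree with the entries, and is each entry reflected in the narrative — stays a human job.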

6. Review the Clinical Picture

Finally, I review the clinical picture described by the data in the document. Does it make sense at all? Very often, actually, it doesn’t, because the document is filled with test data that doesn’t match a real clinical picture. But in spite of that, this is a very useful step - I’ve caught some fundamental errors in implementation or even understanding by querying things in the document that don’t make sense.