Identifiers in CDA Documents - Reporting Tool
Jul 26, 2012
This post is prompted by the intersection of two issues:
- Conclusions from quality checking CDA documents in the Australian National EHR Program
- A series of questions I took privately about how identifiers work in Consolidated CDA
This post explains how identifiers are supposed to work in a CDA document, and introduces a reporting tool to help implementers assess the quality of identifier usage in a CDA document.
@ID attribute
The first kind of identifier in a CDA document is the “ID” attribute that can appear in the following places:
- Section
- ObservationMedia
- RegionOfInterest
These are added so they can be the target of a linkHtml or renderMultiMedia reference - i.e. a reference from a narrative element.
The ID attribute also exists on most of the narrative elements, so that they can be the target of an originalText reference in a CD data type, to indicate that the source of this code is this particular text. This is a very advanced usage. It would also be possible, using this method, to make any narrative element the target of a linkHtml reference, but to my knowledge the CDA specification doesn’t say whether this is legal (I think it’s intended that it’s not).
The ID attribute is also used for references from footnoteRefs to footnotes.
These are the only allowed uses of ID attributes in a CDA document. They can’t be used to indicate that this [thing] here is the same as that [thing] over there (i.e. this section and that section share the same author). To do that, you have to use logical object references using the id element.
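As a rough illustration of these reference rules, here is a minimal Python sketch that lists references pointing at an ID attribute that doesn’t exist in the document. The function name is made up for this example, and it assumes the references are carried in linkHtml/@href and reference/@value (as "#id"), renderMultiMedia/@referencedObject and footnoteRef/@IDREF; a schema validator will already catch some of these, so treat it as a sketch rather than a conformance check.

```python
import xml.etree.ElementTree as ET

NS = "{urn:hl7-org:v3}"  # CDA namespace


def unresolved_id_references(cda_xml):
    """Return the sorted list of referenced IDs that no element declares."""
    root = ET.fromstring(cda_xml)
    # Every ID attribute declared anywhere in the document
    declared = {el.get("ID") for el in root.iter() if el.get("ID") is not None}

    wanted = []
    for el in root.iter():
        if el.tag == NS + "renderMultiMedia":
            # referencedObject may list several IDs separated by whitespace
            wanted.extend((el.get("referencedObject") or "").split())
        elif el.tag == NS + "footnoteRef":
            if el.get("IDREF"):
                wanted.append(el.get("IDREF"))
        elif el.tag in (NS + "linkHtml", NS + "reference"):
            # Only local "#id" targets are checked; external URLs are ignored
            attr = "href" if el.tag == NS + "linkHtml" else "value"
            target = el.get(attr) or ""
            if target.startswith("#"):
                wanted.append(target[1:])
    return sorted({w for w in wanted if w not in declared})
```

An empty list means every local reference resolves; it says nothing about whether the reference points at the right thing.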
id element
Many elements in the CDA document have a child “id” element that serves to identify the class that contains them. Technically, these are the RIM classes Entity, Role, and Act, which generally are allowed to carry one or more identifiers in the id element.
Note that this means that some CDA elements have both a child element “id” and an attribute “ID”. Some tools struggle with this. I would’ve thought that such tools were long fixed - it’s not that uncommon to have duplicate names between attributes and elements, since it’s not wrong, but I’ve found a few dev tools that don’t cope with this in the last 12 months. The only solution is to get back to the maintainer of the tools, and screech loudly at them till they fix their tool.
The id element has two important attributes: root and extension. The root has to be either an OID or a UUID, and an extension - any string not including whitespace - may be present. The identifier (either the root alone if no extension, or the root+extension) must be globally unique.
I’ve found that this root with optional extension business is a lot harder to grasp than it sounds, partly because OIDs have an internal root/extension structure, and so it’s really unclear whether your leaf concept should be in the root or the extension. Say, for example, you have a medical record number, a six digit number, and you assign an OID for it, 2.16.840.1.113883.19.1. Should you represent your MRNs as
<id root="2.16.840.1.113883.19.1.45235"/>
or as
<id root="2.16.840.1.113883.19.1" extension="45235"/>
Generally, I prefer the second (it allows leading zeros, alpha characters if they become required, and is easier to pull out just the MRN), but both forms are valid, and the decision rests with the person who first registers the OID. And if you look in the OID registry, registered OIDs rarely explain which form is correct.
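To make the two forms concrete, here is a small Python sketch. The function names and regular expressions are mine, not from any HL7 library: root_kind classifies a root as an OID or a UUID, and leaf_value pulls the MRN out of either representation, given the OID that was assigned for MRNs.

```python
import re

# OID: dotted decimal arcs, no leading zeros inside an arc
OID_RE = re.compile(r"[0-2](\.(0|[1-9][0-9]*))+")
# UUID: 8-4-4-4-12 hex digits (either case is accepted in this sketch)
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")


def root_kind(root):
    """Classify a root attribute as 'OID', 'UUID' or 'invalid'."""
    if OID_RE.fullmatch(root):
        return "OID"
    if UUID_RE.fullmatch(root):
        return "UUID"
    return "invalid"


def leaf_value(root, extension, namespace_oid):
    """Return the leaf (e.g. the MRN) from either form, or None if the
    identifier does not belong to namespace_oid."""
    if root == namespace_oid and extension is not None:
        return extension              # form 2: leaf carried in the extension
    prefix = namespace_oid + "."
    if extension is None and root.startswith(prefix):
        return root[len(prefix):]     # form 1: leaf appended to the OID
    return None


MRN_OID = "2.16.840.1.113883.19.1"
print(leaf_value("2.16.840.1.113883.19.1.45235", None, MRN_OID))
print(leaf_value("2.16.840.1.113883.19.1", "45235", MRN_OID))
```

Note that a receiver has to be told which form is in use: the sketch only works because MRN_OID is passed in. It also shows the leading-zero point, since root_kind("2.16.840.1.113883.19.1.045235") is "invalid" while extension="045235" is perfectly fine.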
Another confusing thing is whether an extension is allowed or required if the root is a UUID. It’s allowed - and whether it’s required depends on where the unique part comes from. If I’m going to use a stream of unique numbers to actually make the value unique, and I’m just using the UUID to provide a globally unique space for them, then there’ll be an extension:
<id root="655f67b1-2b11-4038-b82f-f6ab2f566f87" extension="1234"/>
As a rule of thumb, if the UUID is registered in the HL7 OID registry (as 655F67B1-2B11-4038-B82F-F6AB2F566F87 is) then you need an extension for the actual unique part. (Note that the UUID is supposed to be represented in lowercase even though the schema doesn’t say so - and irrespective of what case is registered in the registry).
For any Australian readers, if you aren’t sure about this: consult the new Australian handbook on representing identifiers, see my earlier blog post, or ask me.
Unique Identifiers
The fact that identifiers are required to be unique means two things:
- The identifier uses a properly allocated OID, or a generated UUID, so that no one else would accidentally use it. This sounds hard, but it’s actually relatively easy: generate a GUID (Ctrl-Alt-G in most IDEs), or just register an OID at the HL7 OID registry. But register it carefully, at a fine enough scope that it identifies exactly what you want to use it for.
- You have to use it in a disciplined fashion, so that you only use it for one thing.
The second part turns out to be harder than it sounds. The problem is that there’s no tool to alert you when you copy paste an identifier from one part of the document to another (or from one part of your code to another). I see too many documents that contain duplicate identifiers - that is, the same identifier is used on different elements that represent different objects.
One of my correspondents asked why we don’t simply make a rule that you can’t have duplicated identifiers in a CDA document, like we have with the ID attribute. This would prevent accidental or lazy use of the same identifier again - but it’s not possible, because there are valid cases for using the same identifier more than once.
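Since duplicates can’t simply be banned, the next best thing is to list them so a human can decide whether each one is intended. A minimal Python sketch of that idea follows; the function name is hypothetical, it compares root and extension exactly as written, and it only looks at id elements (so templateId is ignored).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = "{urn:hl7-org:v3}"  # CDA namespace


def repeated_identifiers(cda_xml):
    """Map (root, extension) -> names of the elements carrying that id,
    for every identifier that appears more than once."""
    root = ET.fromstring(cda_xml)
    uses = defaultdict(list)
    for parent in root.iter():
        for child in parent:
            if child.tag == NS + "id" and child.get("root"):
                # A missing extension is recorded as an empty string
                key = (child.get("root"), child.get("extension", ""))
                uses[key].append(parent.tag.replace(NS, ""))
    return {key: names for key, names in uses.items() if len(names) > 1}
```

An identifier that shows up on, say, both an observation and an organizer deserves a second look; one that shows up on two roles played by the same person may be entirely legitimate.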
Identifiers are not unique in a document
This occurs when the same concept can appear multiple times in the document. For example:
- When the same template is used multiple times
- When the same person is both author and legalAuthenticator
- When the same organisation employs all the personnel and scopes the patient for the document
So these are all common cases. Template ids aside, a natural question is what the relationship is between two instances of the same object in the same document. Take, for example, this fragment of an author from an Australian CDA example:
<author>
  <assignedAuthor>
    <id root="7FCB0EC4-0CD0-11E0-9DFC-8F50DFD72085" />
    <id root="1.2.36.1.2001.1003.0.8003611234567890" />
    <addr use="WP">
      <streetAddressLine>1 Clinician Street</streetAddressLine>
      <city>Nehtaville</city>
      <state>QLD</state>
      <postalCode>5555</postalCode>
    </addr>
    <telecom use="WP" value="tel:0712341234" />
    <assignedPerson>
      <name>
        <prefix>Dr.</prefix>
        <given>Good</given>
        <family>Doctor</family>
      </name>
    </assignedPerson>
  </assignedAuthor>
</author>
Note to alert Australian readers: yes, I moved HPI-I from its normal place, since this is for international readers.
This author has two identifiers: what we might call a technical identifier (the UUID) and the real-world identifier, which is the number by which the author is registered with the national authority. That’s an arbitrary distinction that’s not made in the document itself - the only way to know this is to consult the definitions of the identifiers.
For most documents, the author is also the legal authenticator, so we’re going to repeat all the same information there too:
<legalAuthenticator>
  <assignedEntity>
    <id root="7FCB0EC4-0CD0-11E0-9DFC-8F50DFD72085" />
    <id root="1.2.36.1.2001.1003.0.8003611234567890" />
    <addr use="WP">
      <streetAddressLine>1 Clinician Street</streetAddressLine>
      <city>Nehtaville</city>
      <state>QLD</state>
      <postalCode>5555</postalCode>
    </addr>
    <telecom use="WP" value="tel:0712341234" />
    <assignedPerson>
      <name>
        <prefix>Dr.</prefix>
        <given>Good</given>
        <family>Doctor</family>
      </name>
    </assignedPerson>
  </assignedEntity>
</legalAuthenticator>
Note that the element is different, but everything else is the same. However, you could argue that this is redundant - we already provided all the information about the person the first time, and the second time, all we need to do is provide an identifier:
<legalAuthenticator>
  <assignedEntity>
    <id root="7FCB0EC4-0CD0-11E0-9DFC-8F50DFD72085" />
  </assignedEntity>
</legalAuthenticator>
On reaching the second case, you go and resolve the first identifier, know that this is referring to the same actual object as the first case, and fill in all the details accordingly. However this is complicated by the fact that in some cases where you can do this, the kind of information you can represent is different in each case (author and custodian, for instance), so you mightn’t be able to provide all the details in the first instance. So what do you do if the second case contains different details from the first? Is that an accident, or the correct way to represent it? Unfortunately, the only way to know is to examine the details on each instance, and reason from the underlying RIM classes - there’s no easy rule of thumb.
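A tool can’t answer that question, but it can at least point at the places where it needs asking. The Python sketch below (names are hypothetical) groups elements by their identifiers, treats an element with nothing but id children as a bare reference, and reports identifiers whose fully populated instances carry different details. It compares structure and text literally, so it assumes that any difference at all is worth a look.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = "{urn:hl7-org:v3}"  # CDA namespace


def _freeze(el):
    """Hashable snapshot of an element: tag, attributes, text, children."""
    return (el.tag, tuple(sorted(el.attrib.items())),
            (el.text or "").strip(), tuple(_freeze(c) for c in el))


def conflicting_instances(cda_xml):
    """Return identifiers whose instances carry differing non-id details."""
    root = ET.fromstring(cda_xml)
    variants = defaultdict(set)
    for parent in root.iter():
        ids = [c for c in parent if c.tag == NS + "id" and c.get("root")]
        # Details are everything on the element except its id children
        details = tuple(_freeze(c) for c in parent if c.tag != NS + "id")
        if not ids or not details:
            continue  # no identifier, or a bare reference with no details
        for c in ids:
            variants[(c.get("root"), c.get("extension", ""))].add(details)
    return sorted(key for key, seen in variants.items() if len(seen) > 1)
```

Run against the author and legalAuthenticator fragments above, nothing is reported, because the element names differ but the details are identical. Whether a reported difference is an accident or a legitimate consequence of the underlying RIM classes still has to be decided by a person.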
One notion that this section suggests is that you can extract these RIM entities, roles and acts out to a persistent data store, and use the identifiers to trace the objects across various documents as you see them. This should be safe, after all, because the identifiers are unique. Only, not so much.
Re-using identifiers between documents
Firstly, there’s no guarantee that a given object will have the same identifier across different CDA documents from the same source. Commonly, CDA documents are generated from some intermediary XML or v2 object that doesn’t have the underlying identifiers in it, even if they exist in the original source. In these cases, the objects may acquire a transient identifier that is used multiple times within each document, but is not maintained across the documents. It’s very difficult to consistently identify an object across documents in this case.
Another problem is that some identifiers actually identify the business process that the object represents, and may end up being attached to multiple different objects that all relate to the same real-world process. Lab Order Ids are a classic case here - they’ll be associated with the object that represents the acknowledgement response to a request for tests, and with the results that represent the outcomes of the request. Driver’s licenses are another example - they’re used to identify multiple different objects that represent the same person (usually from different institutions).
The upshot of this is that even when done well by the author, you can’t simply rely on the identifiers behaving in any particular way.
Reporting Tool
But very often, identifiers aren’t done well. And there’s no conformance tooling that can automatically figure out whether identifiers are being done properly in a document. So I’ve created a little reporting service that takes a CDA document, scans all the identifiers in it, and produces a report that helps visualise the identifiers, and see whether they are being used properly. We’ll be using it in the Australian national program to help check that a document has good identifiers in it. Feel free to use it in other contexts, and I’d welcome suggestions for how to make it more useful (and crash reports for how to break it).
Follow this link to http://hl7connect.healthintersections.com.au/svc/ids, paste your CDA document into the page, click the button, and then read the report… all the steps up to the last one are really easy. Good luck and happy CDA writing/reading…