Data quality requirements in v3 data types are both necessary and spurious

Mar 8, 2012

There are three design features in the v3 data types that help make v3 very hard to implement. They’re so low level that they undercut all the attempts at simplification by greenCDA, etc., so none of that stuff makes much difference. The three features I have in mind are:

  • CD(etc).codeSystem must be an OID or a UUID
  • II.root must be an OID or a UUID
  • PQ.unit must be a UCUM unit

Along with these features come the requirements around the interaction between the codeSystem/root values and the HL7 OID Registry.
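To make the first two rules concrete, here is a minimal Python sketch of a syntactic check for a codeSystem or II.root value. The regular expressions and function names are assumptions made for illustration, not text taken from the specification. The check is purely syntactic: it says nothing about whether an OID is registered, or whether it identifies what you think it does. PQ.unit is left out because a real check needs a UCUM grammar or validation service.

```python
import re

# Assumed patterns for illustration only, not copied from the v3 specification.
# OID: first arc 0-2, then one or more dot-separated arcs with no leading zeros.
OID_RE = re.compile(r"[0-2](\.(0|[1-9][0-9]*))+")
# UUID: the usual 8-4-4-4-12 hexadecimal layout.
UUID_RE = re.compile(
    r"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}"
)


def is_oid(value: str) -> bool:
    """True if the whole string looks like a dotted-decimal OID."""
    return OID_RE.fullmatch(value) is not None


def is_uuid(value: str) -> bool:
    """True if the whole string looks like a UUID."""
    return UUID_RE.fullmatch(value) is not None


def is_valid_root(value: str) -> bool:
    """Syntactic check for CD.codeSystem or II.root: an OID or a UUID."""
    return is_oid(value) or is_uuid(value)
```

A local name such as "LOINC" or a URL fails this check, while the LOINC OID 2.16.840.1.113883.6.1 passes. That is exactly the gap a secondary-use system has to bridge when its source data only carries local names.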

They really make it hard to implement v3, particularly if you are in a secondary use situation - you’re getting codes, units, or identifiers from somewhere else, and you don’t really know authoritatively what their scope and/or meaning is, or in the case of units, you can’t change them, and they’re not UCUM units. You can’t register OIDs on someone else’s behalf - or if you do, the HL7 OID registry is so far out of control that no one will notice or know (on that subject, 200+ OIDs are registered daily, and any curation is on volunteer time, i.e. it doesn’t happen).

I’ve spent an inordinate amount of time this year working on the problems caused by these three features - they just consume so much time when generating proper CDA content. And when I look at the CDA documents that I get sent for review, it’s clear that these requirements are beyond the average implementer who knows v2 well.

And often, we just have to fold on the units, because this is not resolvable until the primary sources can adopt UCUM - and they have their own standards that work to prohibit UCUM adoption. For example, the Australian prescribing recommendations - which are followed directly by many people - prohibit using ug for micrograms, since it is easily confused with mg. Instead, mcg is required. That’s a handwriting-based concern, but the recommendation doesn’t differentiate between handwritten and electronic use. I think that this is resolvable, but it’s going to take years of work with the various communities before they’ll go to UCUM.
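When the source can’t change, the usual workaround is a translation table maintained by the receiver. Here is a minimal Python sketch, assuming a hand-curated lookup; the table contents and the function name are hypothetical. The important design choice is that an unmapped unit returns None rather than a guess, so that "folding" is an explicit decision and not a silent one.

```python
from typing import Optional

# Hypothetical, hand-curated table of local unit strings to UCUM codes.
# Entries are illustrative only; a real table needs review by someone
# who knows what the source system actually means by each string.
LOCAL_TO_UCUM = {
    "mcg": "ug",  # prescribing convention for micrograms -> UCUM microgram
    "mg": "mg",
    "g": "g",
    "mL": "mL",
}


def to_ucum(local_unit: str) -> Optional[str]:
    """Return the UCUM code for a local unit string, or None if unmapped."""
    return LOCAL_TO_UCUM.get(local_unit.strip())
```

With this table, to_ucum("mcg") gives "ug", while something like "tablet" gives None and has to be handled some other way.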

Necessary Requirements

The problem is that the requirements are thoroughly based on requirements that are necessary to establish an interoperable healthcare record. If you don’t consistently identify codes and identifiers, then you can’t collate all the health information into a single big logical repository (no matter how distributed it is architecturally). If you don’t use UCUM, then units are not computable or reliable - and this is important. So these are necessary requirements. Here in Australia, we are using CDA to build the distributed (pc)EHR. That’s been controversial - there are still people claiming that we should have used v2. Well, if we had used v2, then we’d still have to solve the data quality requirements somehow - in fact, several other posts I’ve made are about that, because fixing the data quality in v2 messages is worthwhile anyway.

So these requirements for base data quality are necessary - but they sure add hugely to the project cost. And the costs aren’t particularly visible. And there’s a huge amount of legacy data out there for which it is difficult to bring the base data up to the required level.

Spurious Requirements

The problem is that the requirements are also spurious in a point-to-point messaging context. In this context, it’s easier to resolve the data quality issues retrospectively, by local agreement, instead of having to sort these things out unambiguously in advance. But v3 imposes these costs anyway, even when the requirements are spurious. I wonder how much this data quality issue - which I haven’t really heard a lot about - contributes to the resistance to migrating from v2 messaging to v3 messaging, since the benefits aren’t there in the short term.

In particular, these data quality requirements are part of ISO 21090, and when that gets used for internal exchange within a very focused community (my Peter and George example), these data quality requirements are just a tax.

RFH

In the RFH data types, I’m going to back off the pressure - it will be possible to represent data with less quality than the v3 data types allow (though the RFH data types will also allow the same high quality data).