NEHTA Clinical Documents: UCUM alert
Sep 20, 2012
The PQ data type has two important properties, value and unit: <value xsi:type="PQ" value="1.3" unit="mg/mL"/>
The unit must be a UCUM code. UCUM is a formal representation of coded units that allows a code to be parsed and understood by a computer. UCUM codes are also easy for a person to read - but they are not the same as the normal human representation. Mostly, this is because humans use convenient shorthand for units in a given context, and are careless about case and formality. That works for humans - we are context-aware processors who can almost always determine what is meant. Computers can’t do that.
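As a minimal sketch of what "parsed by a computer" starts with, the following reads the value and unit out of a PQ element like the one shown above. The function name `read_pq` is made up for illustration, and the sample is a standalone fragment: a real CDA document would carry the HL7 v3 namespace and much more surrounding structure. Note that this only extracts the unit string; it does not check that it is a UCUM code.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Standard XML Schema instance namespace, needed to read xsi:type.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

def read_pq(xml_text):
    """Return (value, unit) from a single PQ-typed element (hypothetical helper)."""
    elem = ET.fromstring(xml_text)
    xsi_type = elem.get(f"{{{XSI}}}type")
    if xsi_type != "PQ":
        raise ValueError(f"expected xsi:type PQ, got {xsi_type!r}")
    # Decimal keeps the value exactly as written, without float rounding.
    return Decimal(elem.get("value")), elem.get("unit")

# The example from this post, with the xsi prefix declared so it parses standalone.
sample = (
    '<value xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:type="PQ" value="1.3" unit="mg/mL"/>'
)
value, unit = read_pq(sample)
print(value, unit)
```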
For a variety of reasons, UCUM units are not the same as human units as used in medicine, particularly diagnostic reports:
- mcg/mL is ug/mL. This is fixing the grammar to use only SI prefixes. Note that there are legal requirements in Australia to use mcg instead of ug due to the potential to get mg and µg mixed up in hand-writing. Eventually this will get unwound now that we’re all using computers, but it’s kind of a generational change required.
- U/L becomes IU/L - this is distinguishing between the various uses of “U”
- There are a few bizarre arbitrary units used in diagnostic medicine that can’t be represented in UCUM, such as a unit that has a power with decimal points.
- it’s common to find something like this in a pathology report: leucocytes/mL. This unit doesn’t have a direct equivalent in UCUM, though you could do {leucocytes}/mL
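The differences in the list above could be handled with a simple lookup, sketched here. The table and the name `to_ucum` are hypothetical, and the right-hand sides are copied from the examples in this post rather than checked against the UCUM specification. Units with no entry return `None`, so the caller can still keep the human form instead of dropping the data.

```python
# Hypothetical lookup table built only from the examples in this post.
HUMAN_TO_UCUM = {
    "mcg/mL": "ug/mL",                   # SI prefix instead of "mcg"
    "U/L": "IU/L",                       # as written in this post
    "leucocytes/mL": "{leucocytes}/mL",  # annotation in curly braces
}

def to_ucum(human_unit):
    """Return the mapped form for a known human unit, or None if unmapped."""
    return HUMAN_TO_UCUM.get(human_unit)

for unit in ("mcg/mL", "leucocytes/mL", "cells/HPF"):
    print(unit, "->", to_ucum(unit))
```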
PQ has a problem, then: the unit attribute doesn’t differentiate between a computable representation and the human representation, though these are different things. In principle, CDA has a framework for this - you would use the human readable form in the narrative, and the computable form in the structured data. However, in practice, this has a couple of problems:
- The feed frameworks to CDA - the primary diagnostic applications, their reporting formats, and the clinical information stores - all store only a single unit. They are presented with a binary choice: human or computable form. For legacy reasons, their choice must be the human-readable form.
- The point of providing the structured data is so that it can be extracted and shown to the user in some presentable form - that’s why CD has originalText, for example. PQ doesn’t have an equivalent.
In practice, none of the clinical systems that create CDA documents for the pcEHR are in a position to provide valid UCUM codes in the documents, either for medications or diagnostic data items. So for now, the NEHTA validation framework does not validate UCUM codes, and pcEHR connected systems should not expect UCUM codes to be valid. (We’d rather have lots of partially useable data instead of vanishingly small amounts of more useable data). Note that various parts of the community are working towards having more computable units (shout out to the RCPA PUTS project).
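For a system reading these documents, "don't expect UCUM codes to be valid" might look something like the sketch below: keep every quantity, and just record whether the unit happened to validate. All names here are hypothetical, and `toy_validator` is a stand-in that only knows two units; a real check would implement the UCUM grammar.

```python
from typing import Callable, NamedTuple

class Quantity(NamedTuple):
    value: str
    unit: str
    unit_is_ucum: bool  # False means: treat the unit as display text only

def read_leniently(value: str, unit: str,
                   is_valid_ucum: Callable[[str], bool]) -> Quantity:
    """Keep the quantity whether or not the unit validates as UCUM."""
    return Quantity(value, unit, is_valid_ucum(unit))

# Stand-in validator for illustration only.
KNOWN_UCUM = {"mg/mL", "ug/mL"}

def toy_validator(unit: str) -> bool:
    return unit in KNOWN_UCUM

print(read_leniently("1.3", "mg/mL", toy_validator))
print(read_leniently("5", "mcg/mL", toy_validator))
```

The second quantity is still returned, only flagged as not computable, which matches the preference above for partially useable data over no data.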
I’m making this post because I’ve become aware in the last couple of days that some systems are choosing not to provide data because they can’t do UCUM codes.