Extending UCUM?
Feb 14, 2023
Over on the openEHR discussion forum, there’s a discussion about extending UCUM. Here’s a small suggestion as to what that could look like.
UCUM provides a syntax and grammar for dealing with Units of Measure, which makes all the units computable.
This is important for a variety of reasons, and it also comes as a surprise to lots of non-programmers, because SI units are very close to being computable. Unfortunately, they’re not quite computable. Or maybe that’s fortunate - it’s a question of perspective. The looseness that makes SI units non-computable is very convenient for human users, who have context around the units. Examples are units with overlapping or varying codes, such as s/sec for seconds versus S for siemens, and all the flexibility around case and grammar. This is good for humans, but computers aren’t like that. Even with AI where it is now, it’s still not quite like that - it’s getting closer by the month, but there are all sorts of gnarly corner cases with clinical safety ramifications, so it’s going to be a while before AI gets us to the point that we don’t care about UCUM.
UCUM is really nice because you can compare units and decide whether they’re compatible, which isn’t the same as them being identical! E.g. if I know UCUM, I can know that 1 mg/kg is the same as 1 g/t (grams per tonne) without needing a human to decide that. Really useful for decision support.
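To make the difference between compatible and equal concrete, here is a minimal Python sketch. It is not a UCUM implementation: the tiny `UNITS` table, the helper names and the restriction to simple `a` or `a/b` expressions are all assumptions made for illustration. Each unit reduces to a scale factor and a dimension vector. Two units are compatible when the dimensions match, and two quantities are equal when the canonical values match as well.

```python
from fractions import Fraction

# Hypothetical mini-table: code -> (factor, (mass exponent, time exponent)).
# Mass factors are relative to the gram. A real UCUM library derives all of
# this from the UCUM definitions instead of a hand-written table.
UNITS = {
    "mg": (Fraction(1, 1000), (1, 0)),
    "g": (Fraction(1), (1, 0)),
    "kg": (Fraction(1000), (1, 0)),
    "t": (Fraction(1_000_000), (1, 0)),  # tonne
    "s": (Fraction(1), (0, 1)),
}


def canonical(value, expr):
    """Reduce `value expr` (expr is 'a' or 'a/b') to (canonical value, dimension)."""
    num, _, den = expr.partition("/")
    factor, dim = UNITS[num]
    if den:
        den_factor, den_dim = UNITS[den]
        factor = factor / den_factor
        dim = tuple(a - b for a, b in zip(dim, den_dim))
    return Fraction(value) * factor, dim


def compatible(expr_a, expr_b):
    """True when both expressions have the same dimension."""
    return canonical(1, expr_a)[1] == canonical(1, expr_b)[1]


def equal(quantity_a, quantity_b):
    """True when two (value, expr) pairs denote the same amount."""
    return canonical(*quantity_a) == canonical(*quantity_b)
```

In this sketch `compatible("mg", "kg")` holds even though 1 mg and 1 kg are not equal, while `equal((1, "mg/kg"), (1, "g/t"))` holds because both reduce to the same dimensionless ratio.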
But UCUM is complex, that’s for sure. There’s a bunch of libraries out there to implement it (I wrote two of them):
Note that although some of these are embedded in a FHIR context, I don’t believe that there’s any actual FHIR dependency in them.
UCUM has a long history of use in the health IT space - v3/CDA, openEHR and FHIR all use UCUM for units of measure for quantities. And it’s served us well. However, there are a few limits to UCUM:
- UCUM maintenance is very slow - there’s very old requests going nowhere
- UCUM takes an extremely conservative approach to units, and people don’t. It’s very hard to decide whether proposed units are meaningfully contextual variants of existing units, or genuinely new units (sound loudness in particular contexts - how particular does it get before you need something more than just decibels)
- UCUM is extremely careful around formal definitions (example)
- UCUM takes a very hardline approach to arbitrary units. In particular, there are lots of things in medicine, like ‘tablets’, that we think of as units of measure but that aren’t really (see below for more discussion)
- Standing advice around UCUM (e.g. in FHIR (example)) is to use something else, such as SNOMED CT, for units like tablets. That’s all well and good, but there’s no other code system that defines a usable grammar for this kind of use, e.g. for 1.5 tablets/day
Note: none of those things should be understood as criticisms of UCUM. I like most of those things, but they are limits.
UCUM does have a way to do countable things using annotations - {tablet}. That’s kind of nice, but has some real limitations. The first limitation is that annotations are ignored when analysing the semantics of the unit. And they’re not standardised either. I might use {tabs} and you might use {tablets}.
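The annotation problem can be shown in a few lines. The Python sketch below is an assumption about how a processor might treat annotations, not code taken from any particular UCUM library: it removes every {...} annotation before comparing, and a term that consisted only of an annotation becomes the unity unit 1.

```python
import re

# An annotation is anything between curly braces.
ANNOTATION = re.compile(r"\{[^{}]*\}")


def strip_annotations(expr: str) -> str:
    """Remove {...} annotations; a term left empty becomes the unity unit '1'."""
    cleaned = ANNOTATION.sub("", expr)
    # Split on the UCUM operators '.' and '/', keeping the operators themselves.
    parts = re.split(r"([./])", cleaned)
    return "".join(p if p in (".", "/") else (p or "1") for p in parts)
```

With this, `{tabs}/d` and `{tablets}/d` both reduce to `1/d`, but so would `{capsules}/d`: the thing being counted is invisible to the semantics.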
We can work around all of these by defining a way to extend UCUM. A way to extend UCUM that would work from my perspective would have the following properties:
- it would have a controlled way to define arbitrary units like tablets
- it wouldn’t be used to define real units
- it would allow mixing the arbitrary units with real UCUM units using the existing UCUM grammar
- it wouldn’t create syntactic or semantic confusion for real UCUM units - e.g. a UCUM library wouldn’t accidentally misunderstand an extended UCUM expression, it would identify that it contained a non-UCUM unit
- UCUM libraries could be enhanced to know and understand the non-UCUM units, and could reason with them to some degree (but not the same degree, or else they’d be real UCUM units)
We can meet the technical requirements by defining units carefully in another code system that states that it uses the UCUM grammar, including all the existing UCUM units, and prefixes all the units it defines with either $ (for currencies) or @ (for countable units like tablets). Both characters are legal in UCUM unit codes, but no existing UCUM code uses either of them, and UCUM doesn’t seem at all likely to start using them unless it decides to include currencies, which it won’t.
Then we’d be able to use units like:
2 @tablets/day
1 $US/mg
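A minimal Python sketch of how a library might spot the extended units, assuming simple expressions made of atoms joined by . and /. The function names and the category labels are hypothetical; only the $ and @ prefixes come from the proposal. The sketch also does not check that the unprefixed atoms are real UCUM codes.

```python
import re

# Prefixes from the proposal; the labels are illustrative only.
PREFIX_KINDS = {"$": "currency", "@": "countable"}


def classify_atoms(expr):
    """Return [(atom, kind)] for each atom in an extended-UCUM expression."""
    atoms = [a for a in re.split(r"[./]", expr) if a]
    # Anything without a known prefix is assumed (not verified) to be plain UCUM.
    return [(a, PREFIX_KINDS.get(a[0], "ucum")) for a in atoms]


def is_plain_ucum(expr):
    """True when no atom carries one of the extension prefixes."""
    return all(kind == "ucum" for _, kind in classify_atoms(expr))
```

An unmodified UCUM library could use a check like `is_plain_ucum` to reject an extended expression up front, rather than misreading it.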
In FHIR and openEHR etc where this would be used, we’d define a different code system. Say, http://hl7.org/fhir/ucumplus (if HL7 chose to host it), but it could be elsewhere. I wouldn’t have any problems extending my libraries to cover this additional use of UCUM.
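As an illustration only, this is roughly what a FHIR Quantity could look like under such a scheme. The ucumplus URL is the hypothetical one suggested above, not a real code system, and `make_quantity` is a made-up helper. Choosing the system by looking for $ or @ anywhere in the code is a simplifying assumption.

```python
UCUM = "http://unitsofmeasure.org"          # the UCUM system URI used in FHIR
UCUM_PLUS = "http://hl7.org/fhir/ucumplus"  # hypothetical, from the proposal


def make_quantity(value, code, unit=None):
    """Build a FHIR Quantity as a dict, picking the code system from the code."""
    # Simplification: any '$' or '@' in the code means an extended unit.
    system = UCUM_PLUS if any(ch in code for ch in "$@") else UCUM
    return {
        "value": value,
        "unit": unit or code,  # human-readable label, defaults to the code
        "system": system,
        "code": code,
    }
```

Plain UCUM codes keep the standard system here, so existing consumers are unaffected. Since the extended code system would include all UCUM units, using it for every quantity would be an equally valid choice.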
If you want to comment on this idea - see the openEHR forum