Data types and irrelevant features

Dec 1, 2011

There’s been quite a bit of criticism of the ISO 21090 data types because they include features that aren’t relevant in every case where they are used, and it’s annoying to have to deal with the features when they don’t apply to the use case at hand. See here for an example, or the comments here, and there was more less-informed criticism of this at the CIMI meeting yesterday. The trouble with this criticism is that it doesn’t make sense in principle.

The data types are nothing more than basic re-usable patterns that occur throughout healthcare. The whole point of defining a re-usable type is that you choose a set of features that commonly recur and that have some inherent behavioral complexity. Then you define a model that represents these things and use it everywhere else. The cost of using the more complex re-usable type is repaid many times over by the fact that you get to re-use it nearly for free everywhere.

So it follows that there’s going to be a trade-off between the cost of using the type and the number of times you can re-use it: as the features on the type grow, it fits more contexts, but it also costs more to use in each one. So the fact that a type carries features that aren’t used in all cases is a natural consequence of making it re-usable, and it’s evidence that it’s worth spending more time getting the type right. If you only defined types that perfectly fit their re-use context, you would get hardly any re-use at all - and people have argued for exactly that to me (“Death to datatypes. Stick to UML primitives”, to quote someone from yesterday).
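To make the trade-off concrete, here is a minimal sketch in Python. The `Quantity` type and its field names are hypothetical and chosen only for illustration; they are not the ISO 21090 definitions. The point is that one type serves two contexts, and the context that doesn't need the extra features simply ignores them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Quantity:
    """Hypothetical re-usable type: a value and unit plus optional extras."""
    value: float
    unit: str
    # Optional features: only some contexts need these.
    uncertainty: Optional[float] = None    # e.g. the +/- range on a lab result
    original_text: Optional[str] = None    # e.g. what the user actually typed

    def describe(self) -> str:
        """Render the quantity, including only the features that are set."""
        text = f"{self.value} {self.unit}"
        if self.uncertainty is not None:
            text += f" (+/- {self.uncertainty})"
        if self.original_text is not None:
            text += f' [entered as "{self.original_text}"]'
        return text


# Context 1: a body weight needs only the core features.
weight = Quantity(72.5, "kg")

# Context 2: a lab result uses the optional features as well.
glucose = Quantity(5.4, "mmol/L", uncertainty=0.2, original_text="5.4 mmol/l")

print(weight.describe())
print(glucose.describe())
```

The weight context pays a small price (two fields it never sets), while the shared rendering logic is written once. How many such optional fields a type can carry before that price gets too high is the judgement call at issue.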

Data types that include features that aren’t applicable everywhere are a good thing to have.

Of course, you can take it too far, and create a monster that is no longer in a sweet spot because there’s just too much in it. So the question isn’t “why do data types have this extra stuff that I don’t need in my context?”, but “how do you judge the sweet spot when trading between usability and re-use?”. We actually have some consensus on that across the various frameworks (HL7 v2, 13606, openEHR, v3/CDA, etc.), but some people seem to think that this is part of the problem, not the solution, and want to revisit the whole question.

p.s. Tom’s FOPP principle, which I quoted above, is actually trying to answer what I think is the right question (what’s the sweet spot?), but I couldn’t find good examples of the less-informed thinking to link to.