On the subject of original text for Codes
Oct 14, 2013
In the various coded data types defined by HL7 across v2, v3, and FHIR, there’s a property named text or originalText that is defined using some variant of these words:
The text as seen and/or selected by the user who entered the data which represents the intended meaning of the user.
Original text can be used in a structured user interface to capture what the user saw as a representation of the code or expression on the data input screen, or in a situation where the user dictates or directly enters text, it is the text entered or uttered by the user.
Unfortunately, exactly what this means is a matter of interpretation. The key question is: to what degree does the context affect the interpretation of the text that represents the code, and therefore, to what degree does the context contribute to the original text?
I’ll illustrate the discussion with an example. In SNOMED CT, there’s a (large) hierarchy for organism type. Part of the hierarchy contains codes for viruses. A subset of this is found in the PHINVADS value set “Virus types answer list specific to Arbovirus/ArboNet reporting”. This lists 17 codes for type of virus. So you could easily imagine some kind of UI where users would select one of the codes from a pick list:
[Image: pick list labelled “Virus”, with each entry shown by its full SNOMED CT preferred name]
In this case, the original text is the same as the SNOMED CT preferred name, and it’s pretty straightforward to understand. If, for instance, the user picked “Eastern equine encephalitis virus”, then that’s the original text, and nothing further is needed.
However, a lot of system designers will look at this and say, the word “virus” is repeated in every entry, and that’s just a tax on the users. We should get rid of it. That would give you an entry like this:
[Image: “Virus” pick list with the word “virus” removed from each entry]
Actually, in this case, the example is pretty trivial. “Virus” isn’t hard to read. But how about this SNOMED CT preferred term: “Cholecystectomy with exploration of common bile duct and choledochoenterostomy” - there’s quite a lot of potential for useful simplification there, especially where the codes in the set are all siblings, such as the variations of strength in a particular medication:
[Image: four Synthamine codes from AMT, shown with their full terms]
This somewhat extreme example is from AMT. I doubt any reader can even figure out the differences between those 4 codes. How much easier this is:
[Image: the four Synthamine codes, shown with simplified labels]
Hopefully that example will serve to illustrate that this isn’t just a UI best practices issue - as the codes become finer, it starts to become a clinical safety issue too.
Back to the virus case: if the user picks “Eastern equine encephalitis”, is the original text “Virus: Eastern equine encephalitis” or just “Eastern equine encephalitis”? What works best depends on exactly how the original text is going to be used. If it is used as a faithful reproduction of the user’s meaning, in a context similar to the one in which the user entered it, then the minimal text the user actually picked is the useful original text - but how similar does the context need to be? If, on the other hand, the original text is used out of context, then the full context of the data entry should be represented - but this could be a combination of the text the user actually picked, the field name, additional words taken from the explicit context on the screen, and even some text that is implicit in the clinical context.
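To make the two readings concrete, here’s a minimal sketch in Python. The class and function names are made up for illustration, and joining the field label and the picked text with a colon is just an assumption borrowed from the “Virus: Eastern equine encephalitis” example above; nothing in v2, v3, or FHIR prescribes this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PickListEntry:
    """What the user saw on screen (hypothetical structure)."""
    field_label: str  # label shown next to the pick list, e.g. "Virus"
    picked_text: str  # text of the entry the user actually selected


def minimal_original_text(entry: PickListEntry) -> str:
    # Reading 1: only the text the user picked, which is useful when the
    # reader sees it in a context similar to the data entry screen.
    return entry.picked_text


def contextual_original_text(entry: PickListEntry) -> str:
    # Reading 2: fold the explicit on-screen context (here just the field
    # label) into the text so it still makes sense out of context.
    return f"{entry.field_label}: {entry.picked_text}"


entry = PickListEntry(field_label="Virus", picked_text="Eastern equine encephalitis")
minimal = minimal_original_text(entry)
contextual = contextual_original_text(entry)
```

Note that the second function only captures the field label; any context that is implicit in the clinical setting has no obvious source in the UI at all, which is part of the problem.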
To make things even more fun, a contributor on the HL7 vocabulary mailing list offered this example:
I’m not sure what the best way to resolve this is. How do you make original text reliably useful for both uses when the user interface isn’t nailed down?
Well, one way is to rely on the value set - the value set description should contain the information that is implicit in the context. So the true original text would be the value set description + the user picked text. Though I don’t think that any particular field in the value set (either v3 or FHIR) is defined with this purpose in mind. Perhaps that’s something we should address?
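As a rough sketch of what that combination could look like, assuming a value set carried a short description suitable for this purpose (which, as noted, no existing field is defined to do), something like the following would work. The `ValueSet` class here is a hypothetical stand-in, not the v3 or FHIR structure, and the “Virus type” description and the colon separator are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValueSet:
    """Hypothetical stand-in for a value set, not the v3 or FHIR definition."""
    name: str
    description: Optional[str] = None  # assumed to hold the implicit context


def original_text_with_context(value_set: ValueSet, picked_text: str) -> str:
    # Prefix the user's picked text with the value set description, falling
    # back to the picked text alone when no description is available.
    context = (value_set.description or "").strip()
    if not context:
        return picked_text
    return f"{context}: {picked_text}"


arbovirus_types = ValueSet(name="arbovirus-virus-types", description="Virus type")
full_text = original_text_with_context(arbovirus_types, "Eastern equine encephalitis")
bare_text = original_text_with_context(ValueSet(name="undescribed"), "Eastern equine encephalitis")
```

The fallback branch matters: if the description is missing, the result quietly degrades to the minimal text, and a receiver has no way to tell which form it got.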