CD Question: using codes in multiple related fields


Oct 25, 2011

This question comes from Singapore: how do you manage displayName, originalText and, most of all, translations when you have multiple related, overlapping fields? This isn't a simple question - just explaining it is going to take quite a bit of content, let alone the answer (such as it is). Note that this answer assumes ISO 21090 (Data types R2), though there's some R1/R2 discussion at some points. Also note that this is really a discussion about how CD works, not about Snomed CT, though I don't think you could quite have this discussion without Snomed CT.

Scenario

The scenario is about a problem list - a very common case around the world. Multiple different systems are exchanging their problem lists using this common model:

 Problem
   problem : CD
   site : CD
   status : CD

(obviously you need more fields than just those, but these are the bits I’m interested in).

All 3 fields are bound to SNOMED CT reference sets, and the reference sets are large enough that their actual content doesn't matter here. Each sending system uses one of these three models:

 First system:   code : a Snomed CT code that spans both site and status
 Second system:  code : a local code that also spans status
                 site : a local code for site
 Third system:   code : a local code for problem
                 site : a local code for site
                 status : a local code for status
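To keep the later steps concrete, here is a minimal Python sketch of the common model. It covers only the CD properties this post uses; the class and field names (`CD`, `Problem`, `code_system` and so on) are my own illustrative choices, not an ISO 21090 API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CD:
    """Hypothetical, cut-down CD: only the properties discussed in this post."""
    code: Optional[str] = None
    code_system: Optional[str] = None
    display_name: Optional[str] = None
    original_text: Optional[str] = None
    null_flavor: Optional[str] = None
    translations: List["CD"] = field(default_factory=list)


@dataclass
class Problem:
    """The shared exchange model; all the other fields are omitted."""
    problem: CD
    site: CD
    status: CD


# First system: one pre-coordinated code, so site and status are null.
first = Problem(
    problem=CD("162573006", "[sct]", "Suspected lung cancer", "Suspected lung cancer"),
    site=CD(null_flavor="NA"),
    status=CD(null_flavor="NA"),
)

# Third system: a separate local code in each field.
third = Problem(
    problem=CD("cancer", "[local3]", "Cancer", "Cancer"),
    site=CD("lung", "[local4]", "Lungs", "Lungs"),
    status=CD("suspected", "[local5]", "Suspected", "Suspected"),
)
```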

Problem

In all the examples, we will use the example case “suspected lung cancer”. We will also assume a smart terminology server is available on the interface engine that is able to interconvert between snomed forms and between snomed and the local code systems.

Note that this is a common problem - several different fields where codes in one of the fields may cover more than one of them.

Here are example UIs for the three systems:

(Courtesy of Linda Bird, thanks)

Question A: How is the problem correctly represented for each of the 3 systems?

For the first system:

 <problem code="162573006" codeSystem="[sct]">
  <displayName value="Suspected lung cancer"/>
  <originalText value="Suspected lung cancer"/>
 </problem>
 <site nullFlavor="NA"/>
 <status nullFlavor="NA"/>

Whether this is correct depends on the rules around the 3 fields. In a case like this, NEHTA specifications usually say that site and status need only have a value if they are not already specified as part of the main code. Let's assume, for now, that this is the case, since it lets us put some problems off. But looking at the example above, I don't think nullFlavor=NA is quite the right nullFlavor - we kind of need NullFlavor.MadeIrrelevantByOtherValue (as opposed to, say, NullFlavor.NotPartOfSystemScope).
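The rule we are assuming can be written down as a small function. This is a sketch only: the function name and the `covered_by_problem` set (the fields a problem code is known to span, which in practice would have to come from the terminology server) are hypothetical.

```python
from typing import Optional, Set


def modifier_null_flavor(field_name: str, covered_by_problem: Set[str],
                         has_value: bool) -> Optional[str]:
    """Return the nullFlavor for a modifier field (site or status), or None.

    Assumed rule: the field only needs a value when the main problem code
    does not already cover it.
    """
    if has_value:
        return None  # the field carries its own code, so no nullFlavor
    if field_name in covered_by_problem:
        # Something like MadeIrrelevantByOtherValue is what is really meant;
        # NA is returned because that is what the examples in this post use.
        return "NA"
    raise ValueError(f"{field_name} needs a value: the problem code does not cover it")
```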

For the second system:

 <problem code="?cancer" codeSystem="[local1]">
   <displayName value="Query Cancer"/>
   <originalText>Investigation for lung cancer</originalText>
 </problem>
 <site code="lung" codeSystem="[local2]">
   <displayName value="Lungs"/>
   <originalText>Investigation for lung cancer</originalText>
 </site>
 <status nullFlavor="NA"/>

It's a bit hard to say what the originalText is - in this example, I've assumed that both codes were assigned from the same fragment of text (perhaps via some assisted code entry).

For the third system:

 <problem code="cancer" codeSystem="[local3]">
   <displayName value="Cancer"/>
   <originalText>Cancer</originalText>
 </problem>
 <site code="lung" codeSystem="[local4]">
   <displayName value="Lungs"/>
   <originalText>Lungs</originalText>
 </site>
 <status code="suspected" codeSystem="[local5]">
   <displayName value="Suspected"/>
   <originalText>Suspected</originalText>
 </status>

In this case, the original texts are the same as the display names, because the user simply picked the codes straight off the lists.

Question B: How do you add the Snomed codes?

The previous question was straightforward - just getting us going. In the scenario, we have a terminology server that is able to translate between various snomed codes and local codes. Of course, there's no need for this for the first system.

For the second system:

 <problem code="162572001" codeSystem="[sct]">
   <displayName value="Suspected cancer"/>
   <originalText>Investigation for lung cancer</originalText>
   <translation code="?cancer" codeSystem="[local1]">
     <displayName value="Query Cancer"/>
   </translation>
 </problem>
 <site code="39607008" codeSystem="[sct]">
   <displayName value="Lung structure"/>
   <originalText>Investigation for lung cancer</originalText>
   <translation code="lung" codeSystem="[local2]">
     <displayName value="Lungs"/>
   </translation>
 </site>
 <status nullFlavor="NA"/>

Note the order of the translations - in R2, the root code is the one that meets the conformance criteria, which in this case means the snomed codes. In R1, there was confusion about whether the root codes should be these, or the original local codes, since those are the source (R1 says both). If you really want to say which is the source, you can do this:

 <problem code="162572001" codeSystem="[sct]" codingRationale="R">
   <displayName value="Suspected cancer"/>
   <originalText>Investigation for lung cancer</originalText>
   <translation code="?cancer" codeSystem="[local1]" codingRationale="O">
     <displayName value="Query Cancer"/>
   </translation>
 </problem>
 <site code="39607008" codeSystem="[sct]" codingRationale="R">
   <displayName value="Lung structure"/>
   <originalText>Investigation for lung cancer</originalText>
   <translation code="lung" codeSystem="[local2]" codingRationale="O">
     <displayName value="Lungs"/>
   </translation>
 </site>
 <status nullFlavor="NA"/>
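The R2 ordering rule is mechanical enough to sketch. This Python example uses the standard library's `xml.etree.ElementTree` to put the SNOMED CT code (the one that satisfies the binding) at the root and the local source code in `translation`, optionally marking them with codingRationale R and O. The function and its parameters are illustrative assumptions, and it only emits the elements used in this post, not a complete ISO 21090 XML instance.

```python
import xml.etree.ElementTree as ET


def cd_with_translation(tag: str, sct_code: str, sct_display: str,
                        original_text: str, local_code: str,
                        local_system: str, local_display: str,
                        mark_rationale: bool = False) -> ET.Element:
    """Hypothetical helper: build an R2-style CD with the SNOMED CT code at
    the root and the original local code as its translation."""
    cd = ET.Element(tag, code=sct_code, codeSystem="[sct]")
    if mark_rationale:
        cd.set("codingRationale", "R")  # R: the code required by the binding
    ET.SubElement(cd, "displayName", value=sct_display)
    ET.SubElement(cd, "originalText").text = original_text
    tr = ET.SubElement(cd, "translation", code=local_code, codeSystem=local_system)
    if mark_rationale:
        tr.set("codingRationale", "O")  # O: the code the source system held
    ET.SubElement(tr, "displayName", value=local_display)
    return cd


site = cd_with_translation("site", "39607008", "Lung structure",
                           "Investigation for lung cancer",
                           "lung", "[local2]", "Lungs", mark_rationale=True)
print(ET.tostring(site, encoding="unicode"))
```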

For the third system:

 <problem code="363346000" codeSystem="[sct]">
   <displayName value="Malignant neoplastic disease"/>
   <originalText>Cancer</originalText>
   <translation code="cancer" codeSystem="[local3]">
      <displayName value="Cancer"/>
   </translation>
 </problem>
 <site code="39607008" codeSystem="[sct]">
   <displayName value="Lung structure"/>
   <originalText>Lungs</originalText>
   <translation code="lung" codeSystem="[local4]">
      <displayName value="Lungs"/>
   </translation>
 </site>
 <status code="415684004" codeSystem="[sct]">
   <displayName value="Suspected"/>
   <originalText>Suspected</originalText>
   <translation code="suspected" codeSystem="[local5]">
       <displayName value="Suspected"/>
   </translation>
 </status>

Question C: How do you combine the codes?

In this case, the third system is sending to the first. Somewhere in the process (i.e. on some interface engine), the three fields are going to be combined. First of all, this could be done in the definitions:

 Problem
   problem : CD
   site : CD {code | 363698007} (finding site)
   status : CD {code | 408729009} (finding context)

This means that site refines the problem code via the attribute 363698007 (and status via 408729009). Note that it doesn't have to be done that way - someone could just code that relationship into the interface engine, but capturing it in the definitions offers much more leverage (aside: I don't think this is possible in the RIM, but the more implementation-focused the static model becomes, the more useful it is likely to be).

Given those relationships, you can easily compose this snomed expression:

363346000:363698007=39607008,408729009=415684004
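Given the definitions, composing the expression is mechanical. A minimal Python sketch, assuming the model definitions have been captured as a field-to-attribute map (`FIELD_ATTRIBUTES` is hypothetical) and that the field values are already SNOMED CT codes. It emits the bare compositional grammar form used above, with no terms and no attribute groups.

```python
from typing import Dict

# Hypothetical capture of the model definitions: which SNOMED CT attribute
# each modifier field supplies when it refines the problem code.
FIELD_ATTRIBUTES: Dict[str, str] = {
    "site": "363698007",    # finding site
    "status": "408729009",  # finding context
}


def compose_expression(focus: str, modifiers: Dict[str, str]) -> str:
    """Refine `focus` with each non-empty modifier, in the order given."""
    refinements = [
        f"{FIELD_ATTRIBUTES[name]}={value}"
        for name, value in modifiers.items()
        if value  # a null field contributes nothing
    ]
    if not refinements:
        return focus
    return focus + ":" + ",".join(refinements)


expression = compose_expression("363346000", {"site": "39607008", "status": "415684004"})
# expression == "363346000:363698007=39607008,408729009=415684004"
```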

A really smart snomed system could determine that this is equivalent to 162573006 (Suspected lung cancer) and therefore just replace the expression with the single code.

Actually, of course, I lie. Snomed CT just never delivers on stuff like that. In order to make that equivalence, you'd have to determine all the defining properties of 162573006 (Suspected lung cancer), which include subject relationship = "Subject of record" and temporal context = "current or specified". You could provide these somehow, through the definitions or through knowledge in the interface engine. But Suspected Lung Cancer has an "associated finding" of "Malignant tumour of lung". I think this is wrong - once you have a definite finding of a malignant tumour of the lung, I think you have a definite lung cancer, though I suppose someone's going to point out some edge case where this isn't true. But is it always true that you must have a known malignant tumour before you have a suspected cancer? I don't think so. And the finding site is only connected to "Suspected Lung Cancer" through the associated finding. So I don't think a smart snomed system could make that equivalence - a person is going to have to do that. So much for snomed.

Anyhow, let's imagine someone made this rule, and we know that in this context, 162573006 = 363346000:363698007=39607008,408729009=415684004.
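If someone does make that rule by hand, the interface engine only needs to look it up. A minimal Python sketch, assuming the hand-authored rules live in a table (`EQUIVALENCES` is hypothetical) and that expressions are flat refinements with no attribute groups or nesting, so sorting the attribute=value pairs is enough to make the comparison order-independent. This is a string lookup, not subsumption testing: it cannot find an equivalence nobody has written down.

```python
from typing import Dict

# Hypothetical table of hand-authored, context-specific rules.
EQUIVALENCES: Dict[str, str] = {
    "363346000:363698007=39607008,408729009=415684004": "162573006",  # Suspected lung cancer
}


def normalise(expression: str) -> str:
    """Order-independent key for a flat 'focus:a=b,c=d' expression."""
    focus, _, refinements = expression.partition(":")
    if not refinements:
        return focus
    return focus + ":" + ",".join(sorted(refinements.split(",")))


_LOOKUP = {normalise(expr): code for expr, code in EQUIVALENCES.items()}


def precoordinate(expression: str) -> str:
    """Return the equivalent pre-coordinated code if a rule exists,
    otherwise the expression unchanged."""
    return _LOOKUP.get(normalise(expression), expression)
```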

We can represent this as a code either way:

  <problem code="162573006" codeSystem="[sct]"/>

  <problem code="363346000:363698007=39607008,408729009=415684004" codeSystem="[sct]"/>

That’s the easy part. What should the originalText and displayName be for those two?

With regard to displayName, that's easy for the pre-coordinated case - it's the preferred name (Suspected lung cancer, as above). But what about for the expression? What's the displayName for an expression? Snomed CT doesn't define a displayName for an expression, though there are various approaches around. I guess that means you could just make something up:

  "[Suspected] [Malignant neoplastic disease] in [Lung Structure]"

Here these are the preferred names for the three codes, and "in" is supplied by code. But you'd have to know Snomed CT pretty well to do that - you'd have to be a world expert, in fact, to have the confidence to put that in production (it's a good thing there are so many world experts then). And if you don't have the display name, how is it supposed to be clinically usable? (i.e. how are the users of the first system supposed to understand an expression?)
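For what it's worth, the made-up rule above is easy to mechanise; the hard part is trusting it. This Python sketch assumes a preferred-term lookup (`PREFERRED_TERMS` is hypothetical; the terms are the ones used above) and hard-codes the "[status] [focus] in [site]" pattern. Nothing in Snomed CT defines this pattern, which is exactly the problem.

```python
from typing import Dict, Optional

# Hypothetical preferred-term lookup; a real one would query the terminology server.
PREFERRED_TERMS: Dict[str, str] = {
    "363346000": "Malignant neoplastic disease",
    "39607008": "Lung Structure",
    "415684004": "Suspected",
}


def expression_display_name(focus: str, site: Optional[str] = None,
                            status: Optional[str] = None) -> str:
    """Render '[status] [focus] in [site]' from preferred terms."""
    text = PREFERRED_TERMS[focus]
    if status:
        text = f"{PREFERRED_TERMS[status]} {text}"
    if site:
        text = f"{text} in {PREFERRED_TERMS[site]}"
    return text
```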

So my first conclusion is that until IHTSDO publishes a consistent agreed method to produce display names for expressions, expressions will continue to be a research curio, not a production option.

The second problem is original text. The original text of the concept we are trying to build is spread across three fields:

 problem: Cancer
 site: Lung
 status: Suspected

Given that, we can easily compose a meaningful original text:

 problem="Cancer",site="Lung",status="Suspected"

That's very straightforward for a human to understand, with one caveat: the words for the parts (problem, etc.) need to be the UI names that the human saw, not the interoperability names.

We’ve never clarified this anywhere with regard to CD.originalText. I’m going to propose that we document this approach in the next version of the data types (whatever “next version” means!).

There was some concern expressed to me that this original text above isn’t computer processible - that’s right, it’s not. Original text isn’t meant to be computer processible, it’s for a human.
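The composition itself is trivial; this Python sketch just makes the caveat explicit. The function name is hypothetical, and in this example the labels are the ones used above, which stand in for whatever the UI actually showed.

```python
from typing import List, Tuple


def compose_original_text(fields: List[Tuple[str, str]]) -> str:
    """Join (UI label, text the user saw) pairs as label="text", comma separated.

    The labels must be the names the user saw on screen, not the model's
    field names. Embedded double quotes are not escaped, so the result is
    for human reading only, not for parsing back.
    """
    return ",".join(f'{label}="{text}"' for label, text in fields)


original = compose_original_text([("problem", "Cancer"), ("site", "Lung"), ("status", "Suspected")])
# original == 'problem="Cancer",site="Lung",status="Suspected"'
```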

This does give us two CDs now:

  <problem code="162573006" codeSystem="[sct]">
    <displayName value="Suspected lung cancer"/>
    <originalText value="problem=&quot;Cancer&quot;,site=&quot;Lung&quot;,status=&quot;Suspected&quot;"/>
  </problem>
  <problem code="363346000:363698007=39607008,408729009=415684004" codeSystem="[sct]">
    <displayName value="Suspected Malignant neoplastic disease in Lung Structure"/>
    <originalText value="problem=&quot;Cancer&quot;,site=&quot;Lung&quot;,status=&quot;Suspected&quot;"/>
  </problem>
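If the CDs are built with an XML library rather than by string concatenation, the quotes inside the composed original text get escaped as &quot; automatically. A Python sketch using the standard library's ElementTree; the `problem_cd` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

ORIGINAL_TEXT = 'problem="Cancer",site="Lung",status="Suspected"'


def problem_cd(code: str, display_name: str) -> str:
    """Hypothetical helper: serialise a problem CD carrying the conflated original text."""
    cd = ET.Element("problem", code=code, codeSystem="[sct]")
    ET.SubElement(cd, "displayName", value=display_name)
    # ElementTree escapes the embedded double quotes as &quot; on output.
    ET.SubElement(cd, "originalText", value=ORIGINAL_TEXT)
    return ET.tostring(cd, encoding="unicode")


pre = problem_cd("162573006", "Suspected lung cancer")
post = problem_cd("363346000:363698007=39607008,408729009=415684004",
                  "Suspected Malignant neoplastic disease in Lung Structure")
```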

That's still not quite it, though, because now we run into the translation issue. The original set was:

 <problem code="cancer" codeSystem="[local3]">
   <displayName value="Cancer"/>
   <originalText>Cancer</originalText>
 </problem>
 <site code="lung" codeSystem="[local4]">
   <displayName value="Lungs"/>
   <originalText>Lungs</originalText>
 </site>
 <status code="suspected" codeSystem="[local5]">
   <displayName value="Suspected"/>
   <originalText>Suspected</originalText>
 </status>

and now we have to get the final form in somehow. It simplifies our approach to say that the immediate snomed translations we added in question B are only transient, and we aren’t going to add them here. So let’s just stick the translation in the problem:

 <problem code="162573006" codeSystem="[sct]">
    <displayName value="Suspected lung cancer"/>
    <originalText value="problem=&quot;Cancer&quot;,site=&quot;Lung&quot;,status=&quot;Suspected&quot;"/>
    <translation code="cancer" codeSystem="[local3]">
      <displayName value="Cancer"/>
    </translation>
 </problem>
 <site code="lung" codeSystem="[local4]">
   <displayName value="Lungs"/>
   <originalText>Lungs</originalText>
 </site>
 <status code="suspected" codeSystem="[local5]">
   <displayName value="Suspected"/>
   <originalText>Suspected</originalText>
 </status>

That particular structure gives me all sorts of problems:

The original text of the problem is wrong - it should be "Cancer", but then we don't have a place for the conflated original text.
The problem is in tension with the site and status - should they now have nullFlavor=UNK, with the local code as a translation? But how does that make sense?
It gets worse - much worse - if the transients are added back in (and since it would simplify things to leave them out, all good terminologists will insist that they are in).
It's pretty hard to argue that the translation in problem is valid.

My conclusion: you can’t use CD translations to capture translations across multiple data types - you’ll have to create structure to enable that in the model that uses the data types and contains problem, site, and status (or whatever your cases are).

Note that if I were talking about R1 data types, the order of the translations etc. would be reversed, but that wouldn't really change the problem at all.
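What might that model-level structure look like? One possible shape, sketched in Python: keep each field's code as sent, and record each cross-field translation separately, stating which fields it was derived from and carrying its own conflated original text. The class names are hypothetical, and this is one option among several, not something any specification defines.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Coding:
    code: str
    code_system: str
    display_name: Optional[str] = None


@dataclass
class CrossFieldTranslation:
    """Hypothetical: one code that stands for several source fields combined."""
    target: Coding
    source_fields: List[str]          # which fields of the entry it replaces
    original_text: Optional[str] = None


@dataclass
class ProblemEntry:
    fields: Dict[str, Coding]         # "problem", "site", "status" as sent
    combined: List[CrossFieldTranslation] = field(default_factory=list)

    def fields_covered_by(self, code: str) -> List[str]:
        """Which source fields a given combined code was derived from."""
        for t in self.combined:
            if t.target.code == code:
                return t.source_fields
        return []


entry = ProblemEntry(
    fields={
        "problem": Coding("cancer", "[local3]", "Cancer"),
        "site": Coding("lung", "[local4]", "Lungs"),
        "status": Coding("suspected", "[local5]", "Suspected"),
    },
    combined=[CrossFieldTranslation(
        target=Coding("162573006", "[sct]", "Suspected lung cancer"),
        source_fields=["problem", "site", "status"],
        original_text='problem="Cancer",site="Lung",status="Suspected"',
    )],
)
```

The point of the design is that no individual CD has to pretend a translation spans fields it doesn't own.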

Question D: How do you split the code?

In this case, the first system is sending to the third. Somewhere in the process (i.e. on some interface engine), the problem field is going to be split up.

This process is the reverse of the problem discussed above. Theoretically, it could be done using the snomed definitions, but in practice, this would hardly ever work because of Snomed being Snomed.

Anyhow, we start with

 <problem code="162573006" codeSystem="[sct]">
  <displayName value="Suspected lung cancer"/>
  <originalText value="Suspected lung cancer"/>
 </problem>
 <site nullFlavor="NA"/>
 <status nullFlavor="NA"/>

And now we have to split this up. Let’s assume that we have some magic terminology server that can do that, and tell us which snomed codes we are interested in - and then translate them to the local code systems used by the third system.
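The magic terminology server can be stubbed out to show the shape of the split. In this Python sketch the two lookup tables (`DECOMPOSITIONS`, `SCT_TO_LOCAL`) are hypothetical, hand-filled stand-ins for what the server would return; as discussed above, deriving the decomposition from the Snomed CT definitions is the hard part, and the sketch does not attempt it.

```python
from typing import Dict, Tuple

# Hypothetical: pre-coordinated code -> the SNOMED CT code for each target field.
DECOMPOSITIONS: Dict[str, Dict[str, str]] = {
    "162573006": {"problem": "363346000", "site": "39607008", "status": "415684004"},
}

# Hypothetical: (field, SNOMED CT code) -> (local code, local code system).
SCT_TO_LOCAL: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("problem", "363346000"): ("cancer", "[local3]"),
    ("site", "39607008"): ("lung", "[local4]"),
    ("status", "415684004"): ("suspected", "[local5]"),
}


def split_for_third_system(code: str) -> Dict[str, Tuple[str, str]]:
    """Split one pre-coordinated code into a local code for each field."""
    parts = DECOMPOSITIONS.get(code)
    if parts is None:
        raise LookupError(f"no decomposition known for {code}")
    return {name: SCT_TO_LOCAL[(name, sct)] for name, sct in parts.items()}
```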

We could build this, perhaps:

 <problem code="162573006" codeSystem="[sct]">
  <displayName value="Suspected lung cancer"/>
  <originalText value="Suspected lung cancer"/>
  <translation code="cancer" codeSystem="[local3]">
    <displayName value="Cancer"/>
  </translation>
 </problem>
 <site nullFlavor="NA">
   <originalText value="Suspected lung cancer"/>
   <translation code="lung" codeSystem="[local4]">
     <displayName value="Lungs"/>
   </translation>
 </site>
 <status code="suspected" codeSystem="[local5]">
   <displayName value="Suspected"/>
   <originalText>Suspected lung cancer</originalText>
   <translation nullFlavor="NA"/>
 </status>

You’ll note that for site and status the translations are reversed. I don’t have a clue which is right there. Technically, in R2, the site example is correct. But it still gives me all sorts of other problems, and I come back to the same conclusion: you can’t use CD translations to capture translations across multiple data types - you’ll have to create structure to enable that in the model that uses the data types and contains problem, site, and status (or whatever your cases are).

P.S. A bonus: this is the R1 representation:

 <problem code="363346000" codeSystem="[sct]" displayName="Cancer">
  <qualifier>
    <name code="363698007" displayName="finding site"/>
    <value code="39607008" displayName="Lung"/>
  </qualifier>
  <qualifier>
    <name code="408729009" displayName="finding context"/>
    <value code="415684004" displayName="suspected"/>
  </qualifier>
 </problem>

Concerning this, the R1 specification says: "Qualifiers constrain the meaning of the primary code, but cannot negate it or change its meaning to that of another value in the primary coding system". I'll leave argument to the comments about whether the finding context of "suspected" counts as negation or not (and also whether it matters that the qualifiers actually do change its meaning to that of another value in the primary coding system).
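For comparison, the R1 qualifiers flatten into the same compositional grammar string as the R2 expression used earlier. A Python sketch using ElementTree; it assumes the R1 XML element names `qualifier`, `name` and `value`, ignores the `inverted` flag and nested qualifiers, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

R1_PROBLEM = """
<problem code="363346000" codeSystem="[sct]" displayName="Cancer">
  <qualifier>
    <name code="363698007" displayName="finding site"/>
    <value code="39607008" displayName="Lung"/>
  </qualifier>
  <qualifier>
    <name code="408729009" displayName="finding context"/>
    <value code="415684004" displayName="suspected"/>
  </qualifier>
</problem>
"""


def r1_qualifiers_to_expression(xml_text: str) -> str:
    """Flatten a CD's direct qualifiers into 'focus:name=value,...'."""
    cd = ET.fromstring(xml_text.strip())
    refinements = [
        f'{q.find("name").get("code")}={q.find("value").get("code")}'
        for q in cd.findall("qualifier")
    ]
    focus = cd.get("code")
    return focus + ":" + ",".join(refinements) if refinements else focus
```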