In which I visit Snomed-CT, and find myself in a tight place

May 17, 2011

I’m going to assume that you, the reader, are familiar with Snomed-CT. And that you’re well aware that Snomed-CT is marketed as the premier way to achieve consistent, unambiguous, reproducible expression in medical records. So why isn’t it being used in practice? Across the world, I’m aware of a scattering of successful pilots, and some somewhat grudging wide-scale use in England (specific comments about such use welcome below, but please nothing less-or-equally specific than what I just said). Oh, and Kaiser Permanente. This isn’t the kind of wide-scale adoption that it should have, if it was the best way to achieve consistent, unambiguous, reproducible expression in medical records.

So why not?

First of all, to make it work, implementers need to have:

The ability to do logical Snomed subsumption testing
Frameworks for defining and using subsetting & mapping
UI widgets for presentation and input
A way to integrate logic with other search and analysis functionality

All of these things require advanced knowledge of Snomed. There are no open source or cheap libraries that provide these things (or even expensive ones, really), and very few people have the skills, or the required way of thinking. It’s really hard for a vendor to do sustainable product development in this space. It’s going to be really hard to get good outcomes given that background.
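To give a feel for the first item on that list, here is a minimal sketch of subsumption testing over a stated is-a hierarchy. It is only an illustration: the concept names are invented for the example (they are not real Snomed-CT content), and real subsumption testing also has to deal with defining relationships and post-coordinated expressions, which this ignores entirely.

```python
from collections import defaultdict

# Toy is-a pairs (child, parent). These labels are invented for
# illustration and are NOT taken from Snomed-CT.
TOY_IS_A = [
    ("adhesive_bandage", "bandage"),
    ("bandage", "dressing"),
    ("gauze", "dressing"),
    ("dressing", "physical_object"),
]


def build_parents(is_a_pairs):
    """Index the is-a pairs as child -> set of direct parents."""
    parents = defaultdict(set)
    for child, parent in is_a_pairs:
        parents[child].add(parent)
    return parents


def subsumes(parents, ancestor, descendant):
    """True if `ancestor` is `descendant` itself or a transitive is-a ancestor of it."""
    stack = [descendant]
    seen = set()
    while stack:
        concept = stack.pop()
        if concept == ancestor:
            return True
        if concept in seen:
            continue
        seen.add(concept)
        # .get avoids adding empty entries to the defaultdict
        stack.extend(parents.get(concept, ()))
    return False


parents = build_parents(TOY_IS_A)
print(subsumes(parents, "dressing", "adhesive_bandage"))
print(subsumes(parents, "bandage", "gauze"))
```

Walking up from the descendant keeps the check cheap for a single query. A production implementation would more likely precompute a transitive closure table, and that is exactly the kind of library that is hard to find.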

Unfortunately, that’s not the only problem. There are problems that are inherent in Snomed-CT itself. Take, for example, this subset of Snomed-CT:

There are some strange things in this little hierarchy. First of all, some of the things that are not “by type” belong under “by type”. Why is Jones Motes given such special treatment? I couldn’t figure it out, and the definition doesn’t say. In fact, that’s a primary issue with Snomed-CT: the only definitions are in the relationships - and while that’s a powerful technique, it often leaves you at a loss.

What is tree resin, if not a plant allergen? Or maybe that’s not what “plant” means? How do you tell the difference between “Adhesive Bandage”, “Adhesive”, and “Gauze”? More generally, if Snomed-CT is about consistent, unambiguous, and reproducible expression, how does this little hierarchy help?

(And this hierarchy - though based on an older version of Snomed-CT - is not at all unusual. The internal quality of Snomed varies wildly. Some is really good, and some is… not really useable.)

The final issue with Snomed-CT is sort of not actually about Snomed-CT. The problem is that in order to use it, you need to subset it, into “refsets” that may be (or may be mapped to) “interface terminologies”. This makes perfect sense - after all, you don’t just give a user a drop down combo with 350,000+ terms in it. So one user chooses a set of terms, and then another user picks from the set of terms.

In this case, the meaning of the choice isn’t always the same as the meaning of the term. I’ll illustrate using a simple example. Say a doctor has choice A of the following terms:

163497009: Obstetric Examination
271992004: Obstetric Investigation
108108009: Obstetrics Manipulation

And choice B of the following terms:

163497009: Obstetric Examination
271992004: Obstetric Investigation
108106008: Obstetrics Destructive Procedure

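In that case, the meaning of “Obstetric Investigation” varies greatly between the two lists when the doctor is choosing a code to describe what he did: it is picked as the closest fit among whatever alternatives were on offer, and those alternatives differ.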

Aha, everyone says, those are both really bad reference sets, so what’s your point? Well, they are “bad” reference sets - which mostly means, they are being used for something other than what they were intended to be. Which happens all the time. In fact, this problem is so ubiquitous that we (HL7) added valueSet/valueSetVersion to CD (coded datatype) in the v3 data types/ISO 21090 just for it. (btw, I think that’s a terribly bad idea - because of bad coding practices, we’ll try and patch it using really advanced coding techniques and try to reverse engineer the user’s choice. But just cause I think it’s a bad idea doesn’t mean people agree).
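To make that mechanism concrete, here is a rough sketch of what carrying the pick list along with the code looks like. The class and field names are my own invention for illustration (they are not the ISO 21090 attribute definitions), and the value set identifiers "choice-A" and "choice-B" are hypothetical; only the codes and terms come from the two lists above.

```python
from dataclasses import dataclass

# The two pick lists from the example, keyed by a hypothetical
# (value set id, version) pair.
PICK_LISTS = {
    ("choice-A", "1"): {
        "163497009": "Obstetric Examination",
        "271992004": "Obstetric Investigation",
        "108108009": "Obstetrics Manipulation",
    },
    ("choice-B", "1"): {
        "163497009": "Obstetric Examination",
        "271992004": "Obstetric Investigation",
        "108106008": "Obstetrics Destructive Procedure",
    },
}


@dataclass(frozen=True)
class CodedChoice:
    """A code plus the pick list it was chosen from (illustrative only)."""
    code: str
    value_set: str
    value_set_version: str


def alternatives_not_chosen(choice):
    """Return the terms the user could have picked instead, but did not."""
    terms = PICK_LISTS[(choice.value_set, choice.value_set_version)]
    if choice.code not in terms:
        raise ValueError(f"{choice.code} is not in {choice.value_set}")
    return {c: d for c, d in terms.items() if c != choice.code}


from_a = CodedChoice("271992004", "choice-A", "1")
from_b = CodedChoice("271992004", "choice-B", "1")
print(from_a.code == from_b.code)  # same code in both records
print(alternatives_not_chosen(from_a))
print(alternatives_not_chosen(from_b))
```

The code is identical in both records; the only thing that tells them apart is the list of rejected alternatives, and working out what the user meant from that list is the reverse engineering I’m complaining about.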

If there’s known quality criteria to prevent the mis-use of reference sets like that, I don’t know what they are, and even if they exist, they’re even more specialised knowledge than the rest of Snomed-CT. And this problem matters.

That’s my big 3 reasons why Snomed-CT isn’t ready for production clinical use. I don’t know how to solve them (which isn’t so bad because no one depends on me to do so, and IHTSDO has some very smart people thinking about this). But I know that Snomed-CT is the only game in town, so it’s not like there’s anywhere else to turn. Hopefully we can grow the community fast and overcome these teething issues.

There’s one final thing, one which dwarfs the other issues. When I talk to clinicians about using Snomed-CT - even histopathologists, who are the home of Snomed-CT - they very much tend to respond that they won’t use Snomed-CT ever, and they provide one or more of the following statements:

My use of language is sufficiently precise that Snomed-CT encoding is unneeded
Natural Language Processing will overtake Snomed-CT anyway
Even if I try, it’s so damn hard to code well - and to be confident that it’s being done well - that all these downstream usages people tell us about won’t actually happen.

This is called defense in depth.

The first: language changes over time. I think people will gradually move past this. The second - I’m highly sceptical, but I’m going to discuss NLP in another post. The third - that deserves focus and consideration, which will happen through meaningful use - though I’m not sure how well that will be able to be leveraged outside USA.

Btw, as a bonus, here’s the Snomed-CT definitions for hyponatraemia. It’s a commonly misinterpreted thing by hospital physicians (clinical biochemistry textbooks are full of notes about that). So good luck figuring out from the Snomed definitions which variants are of physiological significance:

Note: This content was first given as a presentation at the AEHRC colloquium 2011