Interoperability Requirements #2: Common Terminology

May 4, 2011

In order for people to understand each other, not only must they be able to exchange words, they must also agree about what those words mean. In natural language, dictionaries define the agreed meanings of the words that we use. In the same way, two information systems exchanging information need underlying formal definitions that they can depend on in order to use the information. The systems that define these meanings are variously referred to as “Terminologies”, “Vocabularies”, “Code Systems”, “Classifications”, “Semantic Webs”, and “Ontologies”, though a whole lot of other words are also used. None of these words have quite the same meaning; for simplicity, this post will use “terminology” as a general descriptor for structured definitions of meaning. This is a huge subject, and it’s hard to know how much to try to cover in a post of reasonable length. The first place to start is that everyone doing interoperability should read ISO 704 and Cimino’s landmark paper (Cimino JJ. Desiderata for Controlled Medical Vocabularies in the Twenty-First Century. Methods Inform Med 1998; 37: 394-403). These are heavy going, but you won’t really understand terminology unless you read them.

Rather than trying to cover the basics, I’m going to make several observations about terminologies.

1. It’s all backwards for semantic interoperability

There’s an important conceptual difference between human and computer terminologies. Dictionaries are word based: they consist of lists of words with their multiple meanings. Not only that, they are reactive, recording how words are actually used (for example, “Green” has 11 meanings as of this post). Terminologies for information systems work very differently. Instead of retrospectively defining the words that people actually use, they prospectively define underlying concepts and then assign code words that are used to refer to those concept definitions. The point of this is to restrict how things are described.
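To make the contrast concrete, here is a minimal sketch of the two lookup directions. The dictionary entries, the codes `C001`/`C002`, and the `Concept` structure are all hypothetical and chosen only for illustration. They are not taken from any real dictionary or standard code system. A dictionary goes from a word to many meanings. A terminology goes from a code to exactly one defined meaning, and human-readable terms are just designations that point at concepts.

```python
from dataclasses import dataclass

# Word-based (hypothetical entries): one word maps to many meanings.
dictionary: dict[str, list[str]] = {
    "green": [
        "the colour between blue and yellow",
        "inexperienced or naive",
        "environmentally friendly",
    ],
}


@dataclass(frozen=True)
class Concept:
    code: str                      # opaque identifier, never reused for another meaning
    definition: str                # the single agreed meaning
    terms: tuple[str, ...] = ()    # human-readable designations that point at this concept


# Concept-based (hypothetical codes): each code identifies exactly one defined meaning.
code_system: dict[str, Concept] = {
    c.code: c
    for c in [
        Concept("C001", "Colour between blue and yellow in the visible spectrum", ("green",)),
        Concept("C002", "Lacking experience", ("green", "inexperienced")),
    ]
}


def lookup_word(word: str) -> list[str]:
    """Dictionary-style lookup: one word, possibly many meanings."""
    return dictionary.get(word.lower(), [])


def lookup_code(code: str) -> Concept:
    """Terminology-style lookup: one code, exactly one meaning (KeyError if unknown)."""
    return code_system[code]


def codes_for_term(term: str) -> list[str]:
    """A word can still point at several concepts; the sender has to pick a code."""
    return sorted(c.code for c in code_system.values() if term.lower() in c.terms)


print(len(lookup_word("green")))       # 3
print(lookup_code("C002").definition)  # Lacking experience
print(codes_for_term("green"))         # ['C001', 'C002']
```

The restriction is the point. Once a sender has chosen `C002`, the receiver cannot read it as a colour, whatever word appeared on screen.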

Note that in order to achieve computing interoperability, we forgo expressiveness (i.e. semantics) in the real world. (See also “We don’t need no semantic interoperability”.)

Note: I often think that a well-designed terminology has the following two properties: it lets me say what I want, and it doesn’t let you say what I don’t want you to.

2. Managing Terminologies

It seems to me that the central problem when defining a terminology is balancing expressiveness, reproducibility, and manageability (like Interoperability rule #3).

Expressiveness is the degree to which a terminology provides users with the ability to say exactly what they want. This can be done by providing more concepts with finely differentiated meanings, or by allowing users to extend and/or combine concepts in ways that build new or refined meanings.
Reproducibility is the degree to which the definitions of the terminology lead different users to make a predictable choice of concept for the same use. This can be done by defining fewer concepts with more coarsely differentiated meanings, or by restricting the extension and combination options allowed. The quality of the definitions is a significant contributor to reproducibility.
Manageability is the degree to which users and information systems can support the concept definitions. The more stringent the definitions, the more internal relationships there are, and the more extension and combination of concepts is allowed, the less manageable the concept definition system is.
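The trade-off can be sketched in a few lines, with one strong assumption: the `Expression` structure, the code `FRACTURE-OF-LEFT-FEMUR`, and the `site`/`laterality` attributes below are hypothetical. They do not follow any real terminology’s compositional grammar. Allowing concepts to be combined adds expressiveness. It also lets two users record the same meaning in two different ways, which costs reproducibility. Recognising that the two are equivalent then needs normalisation logic in every system, which costs manageability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expression:
    focus: str                           # base concept code
    qualifiers: frozenset = frozenset()  # (attribute, value) pairs refining the focus


# Hypothetical pre-coordinated concept: a single code that already carries the refinement.
PRECOORDINATED = {
    "FRACTURE-OF-LEFT-FEMUR": Expression(
        "FRACTURE", frozenset({("site", "FEMUR"), ("laterality", "LEFT")})
    ),
}


def normalise(expr: Expression) -> Expression:
    """Expand pre-coordinated focus codes so equivalent expressions compare equal."""
    base = PRECOORDINATED.get(expr.focus)
    if base is None:
        return expr
    return Expression(base.focus, base.qualifiers | expr.qualifiers)


# Two users record the same meaning in different ways.
a = Expression("FRACTURE-OF-LEFT-FEMUR")
b = Expression("FRACTURE", frozenset({("laterality", "LEFT"), ("site", "FEMUR")}))

print(a == b)                        # False: the raw expressions differ
print(normalise(a) == normalise(b))  # True: equal only after normalisation
```

Every extra combination rule widens what users can say. It also adds a case that `normalise` (or its real-world equivalent) has to get right.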

The correct balance depends on the purpose for which the terminology is intended, though you can never get enough of all three in a single terminology.

3. Snomed-CT, LOINC, etc

Snomed-CT is going to get its own post later. So much promise… so few outcomes yet. The same goes for LOINC.

For now, I’ll just say: I don’t think either of them is ready for actual clinical use yet.

4. Terminology of Terminologies

One of the most confusing features of the existing work on terminologies is the terminology of terminologies themselves. The various maintainers of standard terminologies, including IHTSDO, WHO, ISO, HL7, NCI, and NLM have all chosen different words to describe related or similar concepts. For instance, the words “term”, “concept” and “code” are used for overlapping concepts by IHTSDO, ISO, and HL7. If the terminologists, the supposed experts in defining things clearly, cannot agree between themselves on consistent language in their own space, how can they add value in any other domain?

This is a clear case of the blind leading the blind. If terminologists can’t constrain their own language consistently, what can they offer the rest of us? We should just stop listening to them. Except, then what?

(I should note that many of the terminologists are good friends of mine, and they struggle mightily with this. And I’m not consistent myself either.)

5. Alright then, what?

Terminologies are a problem. You have to have them to get anything done, but making them up for yourself is a formidable task, and the existing standards are far from ideal. It’s a problem.

You can always try to get by without them. You can take an ad-hoc approach and hope that your lack of rigor never catches up with you. A lot of people do this, especially in applications, where knowledge and meaning can often be taken for granted: a single person defines the meaning, that meaning is mostly operational rather than explicitly declared anywhere, and only a small circle of users has to buy into the model, through a mix of face-to-face indoctrination and institutional experience and/or memory.

But when it comes to interoperability, this simple model breaks down, and there are all sorts of problems trying to map between two partially known definitions of meaning. Still, in spite of this, a lot of interoperability happens this way, especially old administrative and simple clinical reports, and it works well, for a given value of “works well”†. In the end, however, this is not going to cut it for real clinical information in the real world, where people actually die. Sooner or later it’s going to be a problem, because the second law of interoperability still applies: you can’t get rid of complexity. The most you can hope to achieve is to externalize it into the future, when it will be Somebody Else’s Problem.
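Here is what mapping between two partially known definitions of meaning tends to look like in code. This is a minimal sketch, and the local codes (`GLU`, `NA`, `K`) and target codes are hypothetical. Neither side has written definitions, so the map records one person’s reading of what each code means. The only design choice shown is to surface codes with no agreed mapping instead of silently dropping or guessing them. That does not remove the complexity; it just keeps it visible.

```python
# Hypothetical code sets from two systems. Neither side has formal definitions,
# so this map encodes one person's understanding of what each code "means".
SYSTEM_A_TO_B: dict[str, str] = {
    "GLU": "GLUCOSE-SERUM",
    "NA": "SODIUM",
    # "K" deliberately left out: nobody was sure whether A's "K" is serum or whole blood
}


def translate(codes: list[str], mapping: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (translated codes, codes with no agreed mapping) rather than guessing."""
    translated: list[str] = []
    unmapped: list[str] = []
    for code in codes:
        target = mapping.get(code)
        if target is None:
            unmapped.append(code)
        else:
            translated.append(target)
    return translated, unmapped


done, gaps = translate(["GLU", "K", "NA"], SYSTEM_A_TO_B)
print(done)  # ['GLUCOSE-SERUM', 'SODIUM']
print(gaps)  # ['K']
```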

If you can’t defer it to the future, or you actually take pride in your work, then you end up using one of the standard terminologies. Every one of them is less than perfect, and every one of them has a huge learning curve. But it’s a worthwhile investment, if only because you’ll be able to reuse your knowledge. Also, of course, it’s worth using a terminology that has been well inspected and much argued about.

Note: This is why I go to HL7, not IHTSDO. In theory, HL7’s problem space is at least tractable.

† “Works” is a meaningless word in its own right. In fact, quite possibly a meaningless concept. When I write my own terminology for the work of healthcare interoperability, it’ll be the first word up against the wall. If I had a dollar for every time some user came to me and said, “your program doesn’t work”…