Interoperability Requirements #4: Information Structures

May 7, 2011

Given an agreed way to exchange data, clearly defined concepts of meaning, and common ground for how to identify entities, systems are able to exchange meaningful data. But what data? What do the systems say to each other? Some structure is required that assembles these things into meaningful packets of information so that business functionality can be implemented. Continuing the language comparison, this is equivalent to grammar and structures such as lists and paragraphs. We start with a list of data elements. Each data element needs at least a name, and a data type that specifies what the values can be (the “value domain”). That’s enough to get by, but other things about the data element can and should be specified (such as whether it is required or optional); the more that is properly defined, the more stable and easier to maintain the interoperability becomes.
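
As a minimal sketch of that starting point (in Python; the names and shape are my own invention, not anything a standard mandates):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DataElement:
    """A bare-bones data element definition: just enough to get by."""
    name: str                         # e.g. "date_of_birth"
    value_domain: str                 # what the values can be
    required: bool = False            # worth specifying, and easy to forget
    definition: Optional[str] = None  # human-readable meaning

# The simple flat list of data elements we start with.
elements = [
    DataElement("family_name", "string", required=True),
    DataElement("date_of_birth", "ISO 8601 date",
                definition="The subject of care's date of birth"),
]
```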

ISO 11179 defines a good list of things to say about a data element, though it seriously mixes up concept and representation for data types. But there’s a bigger problem with ISO 11179 and a whole slew of related specifications: they assume that the meaning of data elements is not affected by their grammatical context. It doesn’t take long before the simple list of data elements breaks down and becomes unmanageable. You can be sure of that when you see an element named “Second given name of Third Next of Kin”. Data elements need structure, and as soon as you introduce this structure, it takes on a life of its own. The data element definitions depend on the grammar/structure, and the structure/grammar influences the data element definitions. They are a single whole, which is why I include data elements as part of information structures in my requirements breakdown.
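
Here’s a hypothetical sketch of that shift from flat list to grammar (Python again, invented classes); once the structure exists, the monster element name dissolves into a path:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class HumanName:
    family: str
    given: List[str] = field(default_factory=list)  # first given name, second, ...

@dataclass
class NextOfKin:
    name: HumanName
    relationship: str

@dataclass
class Patient:
    name: HumanName
    next_of_kin: List[NextOfKin] = field(default_factory=list)

# "Second given name of Third Next of Kin" is no longer a data element
# name - it's a path through the structure:
#   patient.next_of_kin[2].name.given[1]
```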

Once we get to this point - data elements controlled by a proper grammar - the process of developing them takes on a life of its own, called “modeling”. It’s a form of addiction, I think; it occupies far too much of our focus. I say this both from the perspective of the floor, even where there are no standards involved, and from that of standards governance, where the question of a “fresh look” leads to seriously long articles about whether there should be/can be “one model to rule them all” (Tom, I am coming, I’m going to join in!).

It’s this word called “control”. It’s like we can’t control the rest of the stack, so we’re going to do our damnedest to get this one right!

I’m not going to comment more - I could go on for many megabytes - beyond saying that I see a fundamental issue at the heart of the modeling efforts of various standards committees and other open source modeling efforts: do you model healthcare information, and then interoperability flows from that, or do you model interoperability, and then healthcare information flows from that? I don’t think we have grappled with that question yet.

Many definitional forms

Information structures exist in two forms: the definition and the instance. The instance is the form that is actually exchanged between the systems. The definition is some formal declaration of data elements in a form that provides humans or computers with the information they need to read the instance. The single most common form of the definition is a Microsoft Word document (sometimes even converted to a .PDF), but many other forms may be encountered, including UML diagrams and XML schemas. Some of the definitions are intended to be computer readable, and may be involved directly as the instances are written or read (for instance, HL7 MIF in the HL7 JavaSIG tooling, or archetypes in openEHR tooling). This duality between definition and instance should be borne in mind whenever working with information models. Though the duality exists for all of the interoperability elements discussed in this chapter, it is strongest for information structures, because the definition of the information model is actually an instance of some other information model (a meta-model, a model of models).
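
To make the duality concrete, here is a hypothetical sketch (the dictionary shapes are invented, standing in loosely for something like a MIF fragment or an archetype):

```python
# The definition is itself an instance of a meta-model (a model of models).
definition = {
    "class": "Patient",
    "elements": [
        {"name": "family_name", "type": "string", "required": True},
        {"name": "date_of_birth", "type": "date", "required": False},
    ],
}

# An instance of the information model that the definition describes.
instance = {"family_name": "Smith", "date_of_birth": "1970-01-01"}

# A computer-readable definition can be involved directly as instances
# are read, the way MIF or archetypes are used in tooling.
def validate(defn, inst):
    allowed = {e["name"] for e in defn["elements"]}
    for e in defn["elements"]:
        if e["required"] and e["name"] not in inst:
            raise ValueError(f"missing required element: {e['name']}")
    for name in inst:
        if name not in allowed:
            raise ValueError(f"unknown element: {name}")

validate(definition, instance)  # passes quietly
```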

Which definitional form should we use? You know, I don’t really care. I’m probably not really representative here, because I’m pretty used to most of them, and I like writing code. One of the things I do is co-edit the openEHR Pathology test and Imaging Report archetypes for NEHTA. (I suspect that will probably mean I end up co-editing the main CKM ones too).

The first thing I learnt from that is that to properly understand, implement and maintain a model, you must know the intent, paradigms and features of the underlying definition framework well (openEHR in this case). But that’s no different to the HL7 RIM, or schema, or UML, or HL7 v2, or web services… or anything else anyone comes up with. These are all domain languages. I prefer a mid-level language; it’s not too souped up with hidden implications, but it’s still useful enough to say things in.

The second thing I learnt is that ideas can be translated across from one domain to another reasonably well (but not automatically). When I model those openEHR archetypes, I “think” in terms of v2 and CDA as I’m doing it. I gather it’s like being multilingual (which I otherwise am not).

So I don’t care which form is used. Of course, you have to learn it, so it’s best to use an existing one, but after that, it’s all the same really. There, that’ll upset everybody ;-).

One of the reasons that’s an upsetting statement is because people like to generate code from some of the technical artefacts associated with these definitional frameworks - whole stacks of tooling exist (I wrote some of it). I understand that. It’s certainly an easy way to get going - it reduces the initial cost of development. But later, it costs more. Over the lifecycle of the product, I reckon that generated code costs more than it saves (especially as versions change). So I won’t use it in code I write. Sort of. What I do is copy and paste out of the definition framework into my preferred editor, and then work the text over into the programming code that I want using keyboard macros. (UML is hardest for this, because it doesn’t really have a text format - so I wrote one myself). Is that generated code or not?
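
For what it’s worth, that workflow amounts to a one-off textual transform. A hypothetical sketch of the same idea in Python (the field list and target shape are invented):

```python
# A definition's field list, pasted out of the definitional framework...
fields = [
    ("family_name", "str"),
    ("date_of_birth", "date"),
]

# ...worked over into the target code, one line per field - the scripted
# equivalent of keyboard macros.
lines = ["@dataclass", "class Patient:"]
for name, type_name in fields:
    lines.append(f"    {name}: {type_name}")
print("\n".join(lines))
```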

One last note: I really like Microsoft Word as a definitional form. I’ll tell you why:

  • It actually lets you do proper change tracking between versions. (I can’t overstate how important this is.)
  • Anyone can author - it’s not a programming precursor that requires specialised learning skills to be able to author at all (though some people - and/or programmers - might see this as a feature)
  • The tools claim that they single-source things - but they don’t. With Word, the editor just looks after that, and always knows how to fix it

(Ok, I don’t really like Word that much, but those points are really worth paying attention to. I did ISO 21090 in Word, along with a set of technical artefacts and a whole series of transforms and other programmatic error-detection tools.)
