Explaining the v3 II Type again

Previous: Exchanging Codes with Translations in practice »

Explaining the v3 II Type again

Jul 4, 2014

Several conversations in varying places make me think that it’s worth explaining the v3 II type again, since there’s some important subtleties about how it works that are often missed (though by now the essence of it is well understood by many implementers). The II data type is “An identifier that uniquely identifies a thing or object”, and it has 2 important properties: |root|A unique identifier that guarantees the global uniqueness of the instance identifier. The root alone may be the entire instance identifier. |extension|A character string as a unique identifier within the scope of the identifier root.

The combination of root and extension must be globally unique - in other words, if any system ever sees the same identifer elsewhere, it can know beyond any doubt that this it is the same identifier. In practice, this fairly simple claim causes a few different issues:

How can you generate a globally unique identifier?
The difference between root only vs. root+extension trips people up
How do you know what the identifier is?
What if different roots are used for the same thing?
The difference between the identifier and the identified thing is an end-less source of confusion
How to handle incompletely known identifiers

How can you generate a globally unique identifier?

At first glance, the rule that you have to generate a globally unique identifier that will never be used elsewhere is a bit daunting. How do you do that? However, in practice, it’s actually not that hard. The HL7 standard says that a root must be either a UUID, or an OID.

UUIDs

Wikipedia has this to say about UUIDs:

Anyone can create a UUID and use it to identify something with reasonable confidence that the same identifier will never be unintentionally created by anyone to identify something else. Information labeled with UUIDs can therefore be later combined into a single database without needing to resolve identifier (ID) conflicts.UUIDs were originally used in the Apollo Network Computing System and later in the Open Software Foundation’s (OSF) Distributed Computing Environment (DCE), and then in Microsoft Windows platforms as globally unique identifiers (GUIDs).

Most operating systems include a system facility to generate a UUID, or, alternatively, there are robust open source libraries available. Note that you can generate this with (more than) reasonable confidence that the same UUID will never be generated again - see Wikipedia or Google.

Note that UUIDs should be represented as uppercase (53C75E4C-B168-4E00-A238-8721D3579EA2), but lots of examples use lowercase (“53c75e4c-b168-4e00-a238-8721d3579ea2”). This has caused confusion in v3, because we didn’t say anything about this, so you should always use case-insensitive comparison. (Note: FHIR uses lowercase only, per the IETF OID URI registration).

For people for whom a 10^50 chance of duplication is still too much, there’s an alternative method: use an OID

OIDs

Again, from Wikipedia:

In computing, an object identifier or OID is an identifier used to name an object (compare URN). Structurally, an OID consists of a node in a hierarchically-assigned namespace, formally defined using the ITU-T’s ASN.1 standard, X.690. Successive numbers of the nodes, starting at the root of the tree, identify each node in the tree. Designers set up new nodes by registering them under the node’s registration authority.

The easiest way it illustrate this is to show it works for my own OID, 1.2.36.146595217:

1	ISO OID root
2	ISO issue each member (country) a number in this space
36	Issued to Australia. Seehttp://www.alvestrand.no/objectid/1.2.36.html- you can use your Australian Company Number (ACN) in this OID space
146595217	Health Intersections ACN

All Australian companies automatically have their own OID, then. But if you aren’t an Australian company, you can ask HL7 or HL7 Australia to issue you an OID - HL7 charges $100 for this, but if you’re Australian (or you have reasons to use an OID in Australia), HL7 Australia will do it for free. Or you can get an OID from anyone else who’ll issue OIDs. In Canada, for instance, Canada Health Infoway issues them.

Within that space, I (as the company owner) can use it how I want. Let’s say that I decide that I’ll use .1 for my clients, I’ll give each customer a unique number, and then I’ll use .0 in that space for their patient MRN, then an MRN from my client will be in the OID space “1.2.36.146595217.1.42.0”

So as long as each node within the tree of nodes is administered properly (each number only assigned/used once), then there’ll never be any problems with uniqueness. (Actually, I think that the chances of the OID space being properly administered are a lot lower than 1-1/10^50, so UUIDs are probably better, in fact)

Root vs Root + extension

In the II data type, you must have a root attribute (except in a special case, see the last point below). The extension is optional. The combination of the 2 must be unique. So, for example, if you the root is a GUID, and each similar object simply gets a new UUID, then there’s no need for a extension:

<id root="53C75E4C-B168-4E00-A238-8721D3579EA2"/>

If, on the other hand, you’re going to use “53C75E4C-B168-4E00-A238-8721D3579EA2” to identify the set of objects, then you need to assign each object an extension:

<id root="53C75E4C-B168-4E00-A238-8721D3579EA2" extension="42"/>

You’ll need some other system for uniquely assigning the extension to an object. Customarily, a database primary key is used.

With OIDs, the same applies. If you assign a new OID to each object (as is usually done in DICOM systems for images), then all you need is a root:

<id root="1.2.840.113663.1100.16233472.1.911832595.119981123.1153052"/> <!-- picked at random from a sample DICOM image -->

If, on the other hand, you use the same OID for the set of objects, then you use an extension:

<id root="1.2.840.113663.1100.16233472.1.911832595.119981123" extension="1153052"/>

And you assign a new value for the extension for each object, again, probably using a database primary key.

Alert readers will now be wondering, what’s the difference between those two? And the answer is, well, those 2 identifiers are not the same, even though they’re probably intended to be. When you use an OID, you have to decide: are sub-objects of the id represented as new nodes on the OID, or as values in the extension. That decision can only be made once for the OID. So it’s really useful how only about 0.1% of OIDs actually say in their registration (see below for discussing registration). So why even allow both forms? Why not simply say, you can’t use an extension with an OID? The answer is because not every extension is a simple number than can be in an OID. It’s common to see composite identifiers like “E34234” or “14P232344” or even simply MRNs with 0 prefixed, such as 004214 - none of these are valid OID node values.

Note for Australian readers: the Health Identifier OID requires that IHIs be represented as an OID node, not an extension, like this: 1.2.36.1.2001.1003.0.8003602346555520. Not everyone likes this (including me), but that’s how it is.

The notion that root + extension should be globally unique sounds simple, but I’ve found that it’s real tricky in practice for many implementers.

Identifier Type

When you look at an identifier, you have no idea what type of identifier it is:

<id root="53C75E4C-B168-4E00-A238-8721D3579EA2" extension="42"/>

What is this? You don’t know either what kind of object it identifies, or what kind of identifier it is. You’ll have to determine that from something else than the identifier - the II is only the identifier, not anything else. However it might be possible to look it up in a registry. For instance, the IHI OID is registered in the HL7 OID registry. You can look it up here. That can tell you what it is, but note two important caveats:

There’s no consistent definition of the type - it’s just narrative
There’s no machine API to the OID registry

For this reason, that’s a human mediated process, not something done on the fly in the interface. I’d like to change that, but the infrastructure for this isn’t in place.

So for a machine, there’s no way to determine what type of identifier this is from just reading the identifier - it has to be inferred from context - either the content around the identifier, or reading an implementation guide.

Different Roots for the same thing?

So what stops two different people assigning different roots to the same object? In this case, you’ll get two different identifiers:

<id root="53C75E4C-B168-4E00-A238-8721D3579EA2" extension="42"/>
<id root="1.2.3.4.5.6.7.8" extension="42"/>

Even though these are same thing. What stops this happening?

Well, the answer is, not much. The idea of registering identifiers is actually to try and discourage this - if the first person registers the identifier, and the second person looks for it before registering it themselves, then they should find it, and use the same value. but if they’re lazy, or the description and/or the search is bad, then there’ll be duplicates. And, in fact, there are many duplicates in the OID registry, and fixing them (or even just preventing them) is beyond HL7.

Difference between identifier and identified thing

In theory, any time you see the same object, it should have the same identity, and any time you see the same identifier, it should be the same object. Unfortunately, however, life is far from that simple:

The original system might not store the identifier, and it’s just assigned on the fly by middleware. Each time you see the same object, it’ll get a new identity. Personally, I think this is terrible, but it’s widely done.
The same identifier (e.g. lab report id) might be used for multiple different representations of the same report object
The same identifier (e.g. Patient MRN) might be used for multiple different objects that all represent the same real world thing
The same identifier (e.g. lab order number) might be used for multiple different real world entities that all connect to the same business process

R2 of the datatypes introduced a “scope” property so you could identify which of these the identifier is, but this can’t be used with CDA.

Incomplete identifiers

What if you know the value of the identifier, but you don’t know what identifier it actually is? This is not uncommon - here’s two scenarios:

A patient registration system that knows the patient’s driver license number, but not which country/state issued it
A point of care testing device that knows what the patient bar-code is, but doesn’t know what institution it’s in

In both these cases, the root is not known by the information provider. So in this case, you mark the identifer as unknown, but provide the extension:

<id nullFlavor="UNK" extension="234234234"/>

Note: there’s been some controversy about this over the years, but this is in fact true, even if you read something else somewhere else.