FHIR Ballot Issue: Representation of Identifiers
Oct 15, 2012
Theory
Most identifiers have two logical parts: the part that differs for each thing being identified, and the constant part that identifies the identifier itself.
The second part, the part that identifies the identifier itself, is often taken as read, particularly in old-style work practices where each institution is an island unto itself. Here, an institution assigns an MRN (medical record number) to a patient, and everyone exchanges the MRN, and just knows, by context, that it’s the local institution’s MRN.
This pattern breaks down as soon as more than one institution starts exchanging information. The normal initial response is to simply add a new field for each institution, so they all know each other’s identifiers, but it rapidly becomes clear that you can’t go on like this, adding a new field for each participating institution.
The next step is to convert to a list of identifiers, where each has a name that identifies the local identifier. Using local names (“Acme Hospital”) won’t scale either, so the thing to do is to assign a namespace to the identifier: a formal naming system that identifies the identifier. To do this, you can either assign an opaque namespace (an OID or a UUID) and keep a central registry of the namespaces and their usage, or assign a self-identifying namespace, which in practice means a URI. Aside: in v3, we decided that self-identifying namespaces wouldn’t work, and that formal registration would be required in order to make this reliable. But in practice, curation hasn’t been funded, and isn’t working for the purpose that was intended. URIs will work better than OIDs/UUIDs.
But once you start thinking in terms of URIs, why not simply make the identifier explicitly a URI, and always keep the whole identifier as a URL? In fact, that’s explicitly the way that the W3C and the semantic web are going, and it’s certainly solid and scalable. (Well, at least as solid as your choice of specific URI.)
Exchange
So, how to exchange identifiers? I know of 3 options for this:
- Exchange local identifiers without any namespace. This is by far the most efficient in a local institution context, and still how we do things in v2, and still the majority practice, but it’s starting to become a legacy way to think.
- Exchange identifiers in two parts: namespace + identifier. This works really well where the context is in transition: legacy work practices in a wider context.
- Exchange identifiers as single global identifiers.
In v2, HL7 allows #1, but #2 and #3 are hard: they require local agreement, which is kind of odd (you need local agreement to use identifiers that fall outside your local agreements…).
In v3, we required #2 or #3, though we only allowed #3 with OIDs and UUIDs, not URIs. #1 is not possible, which is an issue.
In FHIR, so far, we have a mix. Some things, primarily technical identifiers, are specified as URIs. They can be either absolute or relative URLs (cases #3 and #1 respectively), and the absolute URLs can be urn:uuid: or urn:oid:. In other contexts, for identifiers that are not part of the implementation framework but are external identifiers, we have used the Identifier type, which has two parts: system and id. System is the namespace for the identifier (the system under which it was published), and id is the identifier itself. The system is a uri, which can be an OID or a UUID, for alignment with v3, or it can be anything else.
This type handles case #1 (just an id; a change that has been agreed but is not yet published on the FHIR site) and case #2 (system and id). It also handles #3 by setting system to “urn:ietf:rfc:3986”, which identifies the id as a full uri.
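As a rough sketch of how the two-part type covers all three cases, the Python below models it with a small class. The class, the function name and the sample values (including the OID and the example.org URL) are made up for illustration and are not taken from the FHIR specification; only the “urn:ietf:rfc:3986” marker comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional

# Marker described in the text: this system value says the id is itself a full URI.
URI_SYSTEM = "urn:ietf:rfc:3986"


@dataclass(frozen=True)
class Identifier:
    """Hypothetical stand-in for the two-part Identifier type (system + id)."""
    id: str
    system: Optional[str] = None


def exchange_case(identifier: Identifier) -> int:
    """Return which of the three exchange options (#1, #2, #3) is in use."""
    if identifier.system is None:
        return 1  # local identifier; the namespace is known only by context
    if identifier.system == URI_SYSTEM:
        return 3  # the id is a single global identifier (a full URI)
    return 2  # namespace + identifier


# Sample values are invented for illustration only.
local_mrn = Identifier(id="123456")
namespaced_mrn = Identifier(id="123456", system="urn:oid:1.2.3.4.5")
global_id = Identifier(id="http://example.org/patients/123456", system=URI_SYSTEM)
```

A receiver that only understands one of the cases can use a check like exchange_case to decide whether it can process an identifier at all.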
The same pattern is used in the Coding type, where the code is identified by a code and a system. However, in the Coding type the system is not just the namespace; it’s the logical definition of the terminology/classification/codes/enumeration, so you can’t use “urn:ietf:rfc:3986” here.
Open Issue
One regular comment, primarily from W3C/semantic web folks, asks why we bother with the double form (system:id). Why not simply use a URI, and allow either absolute or relative URLs? (This comment has been made specifically as part of the FHIR ballot.)
If you did this, you would handle #2 by defining a URI form for the concept so that it’s a single identifier, and then systems that want or need to handle case #2 can simply extract the local identifier out of the URL, following the rules for that URL.
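A minimal sketch of what that proposal asks of a receiving system, assuming a made-up URI form in which the local identifier is appended to a fixed base. The base URL and both function names are hypothetical; nothing here is defined by FHIR.

```python
from typing import Optional, Tuple

# Hypothetical URI form for one institution's MRNs (an assumption for this sketch).
ACME_MRN_BASE = "http://acme.example.org/mrn/"


def to_uri(base: str, local_id: str) -> str:
    """Fold namespace and local identifier into a single URI."""
    return base + local_id


def split_uri(uri: str, base: str) -> Optional[Tuple[str, str]]:
    """Recover (namespace, local id), but only if the URI follows the known form."""
    if uri.startswith(base) and len(uri) > len(base):
        return base, uri[len(base):]
    return None


single = to_uri(ACME_MRN_BASE, "123456")
recovered = split_uri(single, ACME_MRN_BASE)
unknown = split_uri("http://other.example.org/mrn/1", ACME_MRN_BASE)
```

Note that split_uri only works because the receiver already knows the rule for this particular base; a URI from a namespace it has no rule for comes back as None.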
This has the advantage of making things simpler for case #3 (no longer any need for “urn:ietf:rfc:3986”) and of being consistent with the W3C / web approach.
However, I see several issues with this approach:
Variability within URIs
It’s common practice to conflate “identification” and “access” in URIs. Indeed, this is a primary advantage of them, but it’s also a problem. A typical example is Twitter, where I can be identified by:
- http://twitter.com/@GrahameGrieve
- http://twitter.com/GrahameGrieve
- https://twitter.com/@GrahameGrieve
- https://twitter.com/GrahameGrieve
All different URLs, but the same concept. If there’s a way to know which of these is the formally correct one that should be used to identify me, I didn’t find it in 5 minutes of googling, though my Twitter preferences page indicates that https://twitter.com/GrahameGrieve is the preferred id.
But this does rather complicate matters for the URI approach.
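To make the problem concrete, here is a small Python sketch. Compared as plain strings, the four URLs are four different identifiers; they only collapse to one after applying a rule that is specific to this one site. The rule used here (force https, drop a leading “@”) is an assumption based on the preferred form mentioned above, not a documented Twitter rule.

```python
from urllib.parse import urlsplit

variants = [
    "http://twitter.com/@GrahameGrieve",
    "http://twitter.com/GrahameGrieve",
    "https://twitter.com/@GrahameGrieve",
    "https://twitter.com/GrahameGrieve",
]


def canonical_twitter_id(url: str) -> str:
    """Site-specific rule (an assumption): force https and drop a leading '@'."""
    parts = urlsplit(url)
    handle = parts.path.lstrip("/").lstrip("@")
    return f"https://{parts.netloc}/{handle}"


# Plain string comparison sees four different identifiers...
distinct_as_strings = len(set(variants))
# ...and only the site-specific rule brings them together.
distinct_after_rule = len({canonical_twitter_id(u) for u in variants})
```

Every site that mixes identification and access this way would need its own equivalent of canonical_twitter_id before identifiers could be compared reliably.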
Behavior is specific to a URL
It’s not as simple as just pulling the terminal portion of the URL off. Consider the forthcoming specification from IHTSDO for identifying a concept by a URI. In this, concepts are identified by the general URI:
http://snomed.info/id/{sctid}
However there’s also a form
http://snomed.info/id/{sctid}/{aspect}
which also identifies the concept, but further identifies how it’s used. Still, it’s the same concept. There’s also this form:
http://snomed.info/sct/{sctid}/version/{timestamp}
I’m not saying that these forms shouldn’t be defined, or should be defined differently. They are each defined for a purpose (though I do think that if you’re going to use http: in a URI, you’d better arrange for the URL to mean something). What this does show, however, is that processing the URI form is URI specific.
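A sketch of what “URI specific” means in practice, using the three templates quoted above. The regular expressions are my reading of those templates (the specification was still forthcoming), and the concept id, aspect and timestamp values in the samples are placeholders.

```python
import re
from typing import Optional

# One pattern per template quoted in the text; details are assumptions.
_PATTERNS = [
    re.compile(r"^http://snomed\.info/id/(?P<sctid>\d+)$"),
    re.compile(r"^http://snomed\.info/id/(?P<sctid>\d+)/(?P<aspect>[^/]+)$"),
    re.compile(r"^http://snomed\.info/sct/(?P<sctid>\d+)/version/(?P<timestamp>[^/]+)$"),
]


def naive_local_id(uri: str) -> str:
    """The 'just pull the terminal portion off' approach."""
    return uri.rsplit("/", 1)[-1]


def snomed_sctid(uri: str) -> Optional[str]:
    """Extract the concept id using rules that only apply to these URI forms."""
    for pattern in _PATTERNS:
        match = pattern.match(uri)
        if match:
            return match.group("sctid")
    return None


samples = [
    "http://snomed.info/id/22298006",
    "http://snomed.info/id/22298006/some-aspect",
    "http://snomed.info/sct/22298006/version/20120731",
]
naive_results = [naive_local_id(u) for u in samples]
specific_results = [snomed_sctid(u) for u in samples]
```

The naive approach returns the concept id only for the first form; for the other two it returns the aspect and the timestamp, while the scheme-specific parser returns the same concept id for all three.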
There’s no general solution
If we took away the split system:id approach currently supported in Coding, and insisted on a full URL, we’d need to get a URL system defined for everything. It’s just not feasible. It has taken as much time as I can afford simply to get aligned on the use of http://loinc.org and http://unitsofmeasure.org as URIs to identify LOINC and UCUM codes respectively. Aside: the initial versions of FHIR defined http://hl7.org/fhir/sid/loinc and http://hl7.org/fhir/sid/ucum respectively for these. I have received some strong comments that we shouldn’t use these, because we don’t own the LOINC and UCUM concepts. Well, of course we don’t own them. But nor do we own loinc.org and unitsofmeasure.org, such that we could simply assign these as the correct URIs (and create an expectation that they’ll resolve to an actual reference in a browser). So it has to be by negotiation. We still have some things using http://hl7.org/fhir/sid/… But this is absolutely not an assertion that we own them, only an assertion that this is how we identify what they are.
Discussion
This is a FHIR ballot issue. Comments on this blog post are discouraged – unless to point out outright errors in this analysis. Discussion will be taken up on the FHIR email list, with substantial contributions added to this wiki page. A doodle poll will be held for the final vote – this will be announced on the FHIR email list.