Possible alternative syntaxes for #FHIR extensions
Feb 13, 2016
At the last few HL7 working meetings, the FHIR core team has gathered on Thursday evening for a deep technical session. In Orlando, we talked about various open issues to do with profiling (and then we all went to Howl at the Moon). In Atlanta, we explored alternative syntaxes for extensions, which led to a proposal that we put to a fairly large committee meeting (~50 people), a combined cross-section of FHIR stakeholders in Orlando. That proposal died immediately - it got a very hostile reception, actually. A couple of the participants at the meeting thought that this was a reflection of insider bias, and asked me to present the proposal on my blog so that it gets a wider set of eyes looking at it. So here it is.
Current Situation
Here’s an example resource with several extensions:
<Patient xmlns="http://hl7.org/fhir">
<id value="patient-example"/>
<extension url="http://hl7.org/fhir/StructureDefinition/us-core-race">
<valueCodeableConcept>
<coding>
<system value="http://hl7.org/fhir/v3/Race"/>
<code value="2106-3"/>
</coding>
</valueCodeableConcept>
</extension>
<extension url="http://hl7.org/fhir/StructureDefinition/us-core-ethnicity">
<valueCodeableConcept>
<coding>
<system value="http://hl7.org/fhir/v3/Ethnicity"/>
<code value="2135-2"/>
</coding>
</valueCodeableConcept>
</extension>
<extension url="http://hl7.org/fhir/StructureDefinition/patient-clinicalTrial">
<extension url="clinicalTrialNCT">
<valueString value="NCT01647425"/>
</extension>
<extension url="clinicalTrialPeriod">
<valuePeriod>
<start value="2012-04-01"/>
<end value="2013-09-30"/>
</valuePeriod>
</extension>
<extension url="clinicalTrialReason">
<valueCodeableConcept>
<coding>
<system value="http://snomed.info/sct"/>
<code value="254637007"/>
<display value="NSCLC - Non-small cell lung cancer"/>
</coding>
</valueCodeableConcept>
</extension>
</extension>
<extension url="http://hl7.org/fhir/StructureDefinition/patient-birthTime">
<valueDateTime value="2012-06-07T06:12:45-05:00"/>
</extension>
</Patient>
This is taken from one of the standard DAF examples, and excludes all the normal patient content (not related to this discussion). We - the core team - are not considering changing how extensions work, only how they are represented. Fundamentally, an extension is a pair: a URL that references a definition, and a value. But actually, on the wire, extensions are a triple: the URL, the value, and the type, so that parsers know how to parse the value without having to find the definition of the extension (and so that extensions can allow a choice of type for the value).
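To make the "triple" concrete, here is a minimal Python sketch (standard library only) that walks the current wire format and pulls out the URL, the type, and the value element for each simple extension. The function names and the trimmed-down sample are just for illustration; the type is read straight off the value[x] element name, so it comes back with a capitalised first letter.

```python
import xml.etree.ElementTree as ET

FHIR_NS = "http://hl7.org/fhir"

# Trimmed-down version of the example above (two simple extensions only).
SAMPLE = """<Patient xmlns="http://hl7.org/fhir">
  <id value="patient-example"/>
  <extension url="http://hl7.org/fhir/StructureDefinition/patient-birthTime">
    <valueDateTime value="2012-06-07T06:12:45-05:00"/>
  </extension>
  <extension url="http://hl7.org/fhir/StructureDefinition/us-core-race">
    <valueCodeableConcept>
      <coding>
        <system value="http://hl7.org/fhir/v3/Race"/>
        <code value="2106-3"/>
      </coding>
    </valueCodeableConcept>
  </extension>
</Patient>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def read_extensions(parent):
    """Return (url, type_name, value_element) for each simple extension child.

    No extension definition is consulted: the type comes from the
    value[x] element name, exactly as the wire format intends.
    """
    triples = []
    for ext in parent.findall(f"{{{FHIR_NS}}}extension"):
        url = ext.get("url")
        for child in ext:
            name = local_name(child.tag)
            if name.startswith("value"):
                triples.append((url, name[len("value"):], child))
    return triples


patient = ET.fromstring(SAMPLE)
triples = read_extensions(patient)
for url, type_name, _ in triples:
    print(url, type_name)
```

Complex extensions (like patient-clinicalTrial) would need the same function applied recursively to the nested extension elements; that is left out here to keep the sketch short.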
What we wanted to know was whether we could find a better syntax to represent extensions. What’s wrong with the current form? Well, feedback from implementers is consistent: the existing form is verbose, and the identity of the extension - the URL - is in an attribute, which is a step away from where you’d like it to be - in the element name.
Aside: it’s exactly the same issue in JSON as XML. And this blog entry will exclusively show XML, not JSON. That’s not because we don’t care about JSON - we do, greatly - but because as far as we are aware, the JSON issues are the same as XML, minus namespaces (yay!). So other than the namespace hacking option below, the same argument applies to JSON.
Prelude: XML Schema
The current approach to extensions is certainly the optimal approach to extensions if you want to use XML schema to describe the general wire format. As long as we want to support schema so that you can generate working code from the schema, then there’s no discussion to be had - we’re going to be sticking with the current extension format (at least in XML). This whole discussion presumes that we might change that if there was a compelling advantage from dropping it. And if there is an advantage, it would be here.
Option #1
So the first option we considered looked like this:
<Patient xmlns="http://hl7.org/fhir" definition="http://...qicore-patient">
<id value="patient-example"/>
<us-core-race>
<coding>
<system value="http://hl7.org/fhir/v3/Race"/>
<code value="2106-3"/>
</coding>
</us-core-race>
<us-core-ethnicity>
<coding>
<system value="http://hl7.org/fhir/v3/Ethnicity"/>
<code value="2135-2"/>
</coding>
</us-core-ethnicity>
<clinicalTrial>
<NCT value="NCT01647425"/>
<clinicalTrialPeriod>
<start value="2012-04-01"/>
<end value="2013-09-30"/>
</clinicalTrialPeriod>
<clinicalTrialReason>
<coding>
<system value="http://snomed.info/sct"/>
<code value="254637007"/>
<display value="NSCLC - Non-small cell lung cancer"/>
</coding>
</clinicalTrialReason>
</clinicalTrial>
</Patient>
In this scheme, an applicable profile is declared on the root. The profile defines the wire format names for the extensions. In order to parse this, a parser needs to access the definition so it can parse the content. There are several problems with this approach:
- if you can’t get the definition, you can’t parse the (unknown) content. This can be an issue of networks, but also of time (e.g. looking at a record 20 years later). It would essentially force systems to store the definition with the data
- you have to pick one definition - one profile, and fall back to the old extension syntax for extensions defined in other profiles (and mixing profiles is a common thing to do)
- if you do this, it’s basically a ‘local syntax’, and you have to have perimeter exchange methods, or use a reference implementation to read the content. The practical ramifications of this are less than ideal
Note: The idea of perimeter exchange methods comes from the idea that a group of implementers could form their own private club, and use FHIR just among themselves. In this scheme, they can assume that they know each other’s secrets, and take all sorts of short cuts. But as soon as resources leak outside their little club (as will almost certainly happen), the secrets won’t be known. Hence, on the perimeter of the club, the secrets have to be unwound. In this case, the secret is the ‘known’ definition profile.
In effect, then, this would be a local optimization of FHIR - green FHIR, if you want. That’s fine, but that’s not the solution we’re looking for.
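A small Python sketch of the first two problems, under loud assumptions: the PROFILE dictionary is a hypothetical in-memory stand-in for whatever the definition attribute points at, CORE_ELEMENTS is a tiny stand-in for the base Patient elements, and the placeholder profile URI is invented for the example. The point is only that every non-core name is opaque unless the one declared profile happens to define it.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the elements defined on Patient itself (assumption).
CORE_ELEMENTS = {"id", "name", "gender", "birthDate"}

# Hypothetical contents of the single profile named on the root element.
PROFILE = {
    "us-core-race": (
        "http://hl7.org/fhir/StructureDefinition/us-core-race",
        "CodeableConcept",
    ),
}

# 'urn:example:some-profile' is a made-up placeholder, not a real profile.
SAMPLE = """<Patient xmlns="http://hl7.org/fhir" definition="urn:example:some-profile">
  <id value="patient-example"/>
  <us-core-race><coding><code value="2106-3"/></coding></us-core-race>
  <clinicalTrial><NCT value="NCT01647425"/></clinicalTrial>
</Patient>"""


def classify_children(root, profile):
    """Split non-core children into those the profile explains and those it doesn't.

    profile=None models the case where the definition can't be fetched.
    """
    resolved, unknown = [], []
    for child in root:
        name = child.tag.rsplit("}", 1)[-1]
        if name in CORE_ELEMENTS:
            continue
        if profile is not None and name in profile:
            url, type_name = profile[name]
            resolved.append((name, url, type_name))
        else:
            unknown.append(name)
    return resolved, unknown


root = ET.fromstring(SAMPLE)
with_profile = classify_children(root, PROFILE)
without_profile = classify_children(root, None)
print(with_profile)
print(without_profile)
```

With the profile available, clinicalTrial is still unparseable because it comes from a different profile; without it, nothing beyond the core elements can be interpreted at all.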
Option #2:
<Patient xmlns="http://hl7.org/fhir">
<id value="patient-example"/>
<us-core-race xmlns="http://hl7.org/fhir/StructureDefinition/">
<valueCodeableConcept>
<coding>
<system value="http://hl7.org/fhir/v3/Race"/>
<code value="2106-3"/>
</coding>
</valueCodeableConcept>
</us-core-race>
</Patient>
This option misuses the namespace technique in XML to inline the extension name (for JSON, you’d just inline the name by itself). We didn’t seriously talk about this for very long. The biggest problem is that names are variable because there’s no imposed uniqueness on them. Unless you ban name clashes (which we have no ability to do, and we wouldn’t have any structure that we could use to prevent implementers getting caught between two uses of the same name), the names have to change depending on the local context. This is also the form that JSON implementers didn’t like (and, btw, there’s no way to deal with modifiers). This option just wasn’t a good idea. Namespaces aren’t the problem, and they’re not the solution either.
Option #3:
<Patient xmlns="http://hl7.org/fhir">
<schema>
<item name="us-core-race"
url="http://hl7.org/fhir/StructureDefinition/us-core-race" type="CodeableConcept"/>
<item name="clinicalTrial"
url="http://hl7.org/fhir/StructureDefinition/patient-clinicalTrial">
<item name="clinicalTrialNCT"
url="clinicalTrialNCT" type="string"/>
<item name="clinicalTrialPeriod"
url="clinicalTrialPeriod" type="Period"/>
<item name="clinicalTrialReason"
url="clinicalTrialReason" type="CodeableConcept"/>
</item>
</schema>
<us-core-race>
<valueCodeableConcept>
<coding>
<system value="http://hl7.org/fhir/v3/Race"/>
<code value="2106-3"/>
</coding>
</valueCodeableConcept>
</us-core-race>
<clinicalTrial>
<clinicalTrialNCT value="NCT01647425"/>
<clinicalTrialPeriod>
<start value="2012-04-01"/>
<end value="2013-09-30"/>
</clinicalTrialPeriod>
<clinicalTrialReason>
<coding>
<system value="http://snomed.info/sct"/>
<code value="254637007"/>
<display value="NSCLC - Non-small cell lung cancer"/>
</coding>
</clinicalTrialReason>
</clinicalTrial>
</Patient>
This form starts with a manifest that declares the names, types, and URLs for all the extensions. Then you just use the name on the wire.
The ramification of this is that you have to iterate the content twice when you write, so the schema can be complete upfront - and the names are still variable too. It’s also really hard to build with XSLT or equivalent - you need serious logic to build the schema on the fly. This would impose that serious logic on anyone writing resources that included extensions. Also, if you were migrating content from one resource to another, you’d have to track the manifest, and potentially rename everything. And you could never use stable names for extensions in your code - you’d still end up accessing extensions by their URLs.
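Here is a rough Python sketch of what "iterate the content twice" means for a writer. Everything in it is an assumption made for illustration: the clash-renaming rule (appending -2, -3, ...), the restriction to primitive values, the omission of the FHIR namespace, and the second extension URL under example.org, which is invented purely to force a name clash.

```python
import xml.etree.ElementTree as ET


def assign_names(extensions):
    """First pass: pick one wire name per extension URL, renaming on clashes."""
    manifest = {}  # url -> (wire name, type)
    used = set()
    for url, type_name, _ in extensions:
        if url in manifest:
            continue
        base = url.rstrip("/").rsplit("/", 1)[-1]
        name, n = base, 2
        while name in used:  # same tail, different URL: the name has to vary
            name = f"{base}-{n}"
            n += 1
        used.add(name)
        manifest[url] = (name, type_name)
    return manifest


def write_patient(extensions):
    """Second pass: emit the complete manifest first, then the named elements."""
    manifest = assign_names(extensions)
    root = ET.Element("Patient")  # namespace left out to keep the sketch short
    schema = ET.SubElement(root, "schema")
    for url, (name, type_name) in manifest.items():
        ET.SubElement(schema, "item", {"name": name, "url": url, "type": type_name})
    for url, _, value in extensions:
        ET.SubElement(root, manifest[url][0], {"value": value})
    return root


extensions = [
    ("http://hl7.org/fhir/StructureDefinition/patient-birthTime",
     "dateTime", "2012-06-07T06:12:45-05:00"),
    # Hypothetical second definition whose URL ends in the same name.
    ("http://example.org/fhir/StructureDefinition/patient-birthTime",
     "string", "early morning"),
]
root = write_patient(extensions)
print(ET.tostring(root, encoding="unicode"))
```

The second extension ends up on the wire under a different name than it would have in a resource without the clash, which is why code still has to look extensions up by URL.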
Summary:
Well, by the time we got here, we were a bit glum. We hadn’t come up with anything that was remotely a candidate alternative. None of these approaches represents a net benefit over the current approach - while they look good (or, at least, the extensions look better), the practical ramifications for readers and writers are unhappy.
Final option:
The final option we considered was radically different to the others. Here’s what it looks like:
<Patient xmlns="http://hl7.org/fhir">
<id value="patient-example"/>
<birthTime value="2012-06-07T06:12:45-05:00"/>
<active value="true"/>
<name>
<use value="official"/>
<family value="Lerr"/>
<given value="Todd"/>
<given value="G."/>
<suffix value="Jr"/>
</name>
<telecom>
<system value="phone"/>
<value value="(555) 555 1212"/>
<use value="work"/>
</telecom>
<telecom>
<system value="email"/>
<value value="[email protected]"/>
<use value="work"/>
</telecom>
<gender value="male"/>
<birthDate value="2012-06-07"/>
<deceased type="boolean" value="false"/>
<address>
<use value="home"/>
<line value="123 North 102nd Street"/>
<line value="Apt 4d"/>
<city value="Harrisburg"/>
<state value="PA"/>
<postalCode value="17102"/>
<country value="USA"/>
</address>
<us-core-race type="CodeableConcept" list="true">
<coding>
<system value="http://hl7.org/fhir/v3/Race"/>
<code value="2106-3"/>
</coding>
</us-core-race>
<us-core-ethnicity type="CodeableConcept">
<coding>
<system value="http://hl7.org/fhir/v3/Ethnicity"/>
<code value="2135-2"/>
</coding>
</us-core-ethnicity>
<clinicalTrial>
<clinicalTrial.NCT type="string" value="NCT01647425"/>
<clinicalTrial.period type="Period">
<start value="2012-04-01"/>
<end value="2013-09-30"/>
</clinicalTrial.period>
<clinicalTrial.reason type="CodeableConcept">
<coding>
<system value="http://snomed.info/sct"/>
<code value="254637007"/>
<display value="NSCLC - Non-small cell lung cancer"/>
</coding>
</clinicalTrial.reason>
</clinicalTrial>
</Patient>
You can tell that we liked this enough to consider it seriously because we went ahead and filled out the rest of the resource.
Note that there is a second thing going on in this resource: we changed the way that choice types work. Instead of using value[x], and appending the type name to the element, we used a type attribute (of course, in practice, in spite of the difficulties it presents, if we actually did this, we’d use xsi:type in the XML; in JSON, you’d have a property named “type” or “dataType”). This is basically a separate notion, but is required to eliminate the “value” element.
This is a very nice representation for extensions. Sweet. But in order to make this work - e.g. to avoid name clashes - we would need a global registry that contained a list of wire format names, along with, for each name, the following information:
- name - the name in the instance
- url - the canonical url for the extension
- modifier - whether this is a modifier
- list - whether this repeats or not
- type[] - what types this name can be
- path[] - where the name can appear
- children[]
  - name
  - type
  - list
  - children[] (etc)
And this global registry would need to be very carefully curated so that the names are useful, and remain useful as it grows. That’s a tough challenge.
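For a feel of what one registry entry would carry, here is a sketch of the field list above as Python dataclasses. The class names, the Python field names, and the sample values (including the "Patient" path) are assumptions for illustration; no such registry exists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChildEntry:
    """A nested part of a complex extension: name, type, list, children."""
    name: str
    type: str
    is_list: bool = False
    children: List["ChildEntry"] = field(default_factory=list)


@dataclass
class RegistryEntry:
    """One globally unique wire format name and what it stands for."""
    name: str        # the name in the instance
    url: str         # the canonical url for the extension
    modifier: bool   # whether this is a modifier
    is_list: bool    # whether this repeats or not
    types: List[str] = field(default_factory=list)   # what types this name can be
    paths: List[str] = field(default_factory=list)   # where the name can appear
    children: List[ChildEntry] = field(default_factory=list)


# Hypothetical entry for the clinicalTrial extension from the example above.
clinical_trial = RegistryEntry(
    name="clinicalTrial",
    url="http://hl7.org/fhir/StructureDefinition/patient-clinicalTrial",
    modifier=False,
    is_list=True,
    paths=["Patient"],
    children=[
        ChildEntry("clinicalTrial.NCT", "string"),
        ChildEntry("clinicalTrial.period", "Period"),
        ChildEntry("clinicalTrial.reason", "CodeableConcept"),
    ],
)

# The registry is keyed by wire name, which is what forces global uniqueness.
registry = {clinical_trial.name: clinical_trial}
print([child.name for child in registry["clinicalTrial"].children])
```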
Implementations would take one of two approaches to reading the instances:
- download a snapshot of the global registry occasionally, and use that snapshot when reading and writing resources (e.g. the reference implementations would do this)
- download a snapshot of the global registry at design time, and generate schema / code / something etc for whatever tool/code reads and writes the resources
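The first approach might look something like the following Python sketch, which uses a downloaded snapshot to turn a named element back into the existing extension form. The SNAPSHOT dictionary and its shape are hypothetical, and the sketch only handles a single element with no nested children entries.

```python
import xml.etree.ElementTree as ET

# Hypothetical snapshot of the registry, keyed by wire format name.
SNAPSHOT = {
    "birthTime": {
        "url": "http://hl7.org/fhir/StructureDefinition/patient-birthTime",
        "types": ["dateTime"],
    },
    "us-core-ethnicity": {
        "url": "http://hl7.org/fhir/StructureDefinition/us-core-ethnicity",
        "types": ["CodeableConcept"],
    },
}


def to_extension(elem, snapshot):
    """Rebuild <extension url=...><valueX .../></extension> from a named element."""
    entry = snapshot.get(elem.tag)
    if entry is None:
        raise KeyError(f"'{elem.tag}' is not in this snapshot; it may be out of date")
    # Use the type attribute if present, else the only type the entry allows.
    type_name = elem.get("type") or entry["types"][0]
    if type_name not in entry["types"]:
        raise ValueError(f"type '{type_name}' is not allowed for '{elem.tag}'")
    ext = ET.Element("extension", {"url": entry["url"]})
    value = ET.SubElement(ext, "value" + type_name[0].upper() + type_name[1:])
    if "value" in elem.attrib:      # primitive: carry the value attribute over
        value.set("value", elem.get("value"))
    value.extend(list(elem))        # complex: carry the child elements over
    return ext


named = ET.fromstring('<birthTime value="2012-06-07T06:12:45-05:00"/>')
ext = to_extension(named, SNAPSHOT)
print(ET.tostring(ext, encoding="unicode"))
```

The KeyError branch is the interesting one: a reader holding an old snapshot cannot interpret a name registered after it was downloaded, which is the extra step a general purpose implementation would have to manage.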
So, if we established such a registry, and made general purpose implementations (the ones that have to deal with multiple profiles) more complicated, we’d have simpler instances, and it would be easier for single purpose applications. Is that worth it? Well, that’s the question we put to the combined committee group at Orlando. Nearly everyone in the group - except for a small but keen minority - was strongly against this option, for the following reasons:
- it complicates general purpose applications
- Although most individuals start with single purpose applications, most will quickly migrate to multi-purpose applications (or even to general purpose applications)
- Implementers like having a single general purpose schema.
- Running the registry well - particularly with a good SLA on turnaround - would be onerous and expensive
OK. So perhaps that group over-represented the general implementers, and didn’t represent the single-purpose implementers - though the room was weighted towards vendors and consultants who support single-purpose implementers extensively, so it possibly wasn’t as unbalanced as it first looked. So here’s a write-up of the discussion so that participants not in the room can consider and have their say.
p.s. My own opinion is that I’d much prefer not to have the registry, since running it would be my concern. And it doesn’t sound like fun. Actually updating the reference implementations to use the global registry would be straightforward, but it does introduce a step to manage - downloading the latest definitions. I think it would be hardest for server authors who don’t use the reference implementations - sufficiently hard that I wonder whether it would become infeasible to write servers without the reference implementations. So in the end, my personal preference is to keep things as they are.