# FHIR JSON format questions

Jun 6, 2018

There’s a thread running on chat.fhir.org about the FHIR JSON format. This thread needs some background, so here goes…

Background

In FHIR, the base types are called ‘primitive types’ - things such as ‘string’, ‘integer’, ‘boolean’ etc. Like many other ‘primitive’ data types in healthcare specs (e.g. HL7 v2, HL7 v3, CDA, OpenEHR, EN13606) what are called ‘primitives’ are not actually quite as simple as ‘primitives’ in a pure 3GL sense (kind of like Boolean vs bool in C# or the equivalent in Java). In FHIR, the primitive data types all specialize “Element” which means that they are actually objects with an ‘id’ and extensions.

That’s not a solution to everyone’s liking… but all of these things are attempts to find the least worst balance between convenience and flexibility. If we changed the primitives so that they didn’t have id and extensions, they’d have to be a whole lot bigger and heavier things… that still wouldn’t actually be primitives, they’d still be objects with lots of things in them, but things that hardly anyone would use.

So when we represent the primitives in XML, they look something like this:

<string value="xxx">
  <extension url="yyyy"><valueString value="zzz"/></extension>
</string>

For XML users, this kind of XML is pretty standard. Originally, the JSON representation matched this exactly:

  "string" : {
    "value" : "xxx",
    "extension" : [{
       "url" : "yyyy",
       "valueString" : {
           "value" : "zzz"
       }
    }]
  }

If there’s no extension, then it looks like:

(XML)

<string value="xxx"/>

(JSON)

  "string" : {
    "value" : "xxx"
  }

That’s much more natural in XML (the content actually has to go somewhere) than in JSON, where it looks… not so good. And, in fact, the strong feedback we got was that we did the JSON this way because we didn’t use JSON and we were just replicating the XML in JSON without any understanding… which wasn’t true, but was something we heard.

Current JSON Form

So back when we balloted Release 1 of FHIR (DSTU 1), a ballot comment was made that we should represent primitives using the natural syntax in JSON:

  "string" : "xxx"

In the rare case where the element has an id or extensions - these are rare, but important, and they arise in a few places, mostly translations (a subject I’ll take up later) - the representation is rather more complex:

  "string" : "xxx",
  "_string" : {
     "extension" : [{
       "url" : "yyyy",
       "valueString" : "zzz"
     }]
  }
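
Reading a primitive in this form means looking in two places. Here is a minimal sketch in Python, assuming the JSON has already been parsed into plain dicts; `read_primitive` is a hypothetical helper for illustration, not part of any FHIR library:

```python
def read_primitive(parent: dict, name: str):
    """Return (value, id, extensions) for one primitive element.

    The value lives under `name`; any id or extensions live under the
    sibling property `_name`, which is usually absent.
    """
    value = parent.get(name)
    extras = parent.get("_" + name) or {}
    return value, extras.get("id"), extras.get("extension", [])


# The example from the text, as a parsed Python dict.
element = {
    "string": "xxx",
    "_string": {"extension": [{"url": "yyyy", "valueString": "zzz"}]},
}
value, element_id, extensions = read_primitive(element, "string")
```

The common case (no `_string` property) falls through the same code path and simply yields no id and an empty extension list.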

It’s a basic rule in informatics that complexity won’t go away, you can only move it around. And in this case, we simplified 99% of all elements significantly, but complicated the representation of 1% of them significantly. Check out this worst case outcome:

 "name": {
    "family" : "du Marché",
    "_family": {
      "extension": [{
          "url": "http://example.org/fhir/StructureDefinition/qualifier",
          "valueString": "VV"
        }, {
          "url": "http://hl7.org/fhir/StructureDefinition/iso-21090#nullFlavor",
          "valueCode": "ASKU"
        }
      ]
    },
    "_given": [
        null,
        {
            "id": "a3",
            "extension": [{
                "url": "http://hl7.org/fhir/StructureDefinition/qualifier",
                "valueCode": "MID"
            }]
        },
        null
    ],
    "given": [
        "Bénédicte",
        "Denise",
        "Marie"
    ]
}
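
For repeating elements, the two arrays have to be lined up by position, with null marking the slots that have nothing to add. Here is a rough sketch of that pairing in Python, again assuming plain parsed dicts; `read_repeating` is a hypothetical name, and the output shape is just one convenient choice:

```python
from itertools import zip_longest


def read_repeating(parent: dict, name: str):
    """Pair each entry of `name` with the same-index entry of `_name`."""
    values = parent.get(name) or []
    extras = parent.get("_" + name) or []
    merged = []
    # zip_longest pads the shorter list with None, so a missing `_name`
    # array behaves like an array of nulls.
    for value, extra in zip_longest(values, extras):
        extra = extra or {}
        merged.append({
            "value": value,
            "id": extra.get("id"),
            "extension": extra.get("extension", []),
        })
    return merged


name = {
    "given": ["Bénédicte", "Denise", "Marie"],
    "_given": [
        None,
        {
            "id": "a3",
            "extension": [{
                "url": "http://hl7.org/fhir/StructureDefinition/qualifier",
                "valueCode": "MID",
            }],
        },
        None,
    ],
}
given = read_repeating(name, "given")
```

Only the second entry ("Denise") picks up an id and an extension; the other two come back with just their value.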

This was somewhat controversial but in the end, after significant discussion, we made this change. The community believed that the net simplification was a gain.

Revisiting the decision

The problem with this decision is that if you’re writing piecemeal client applications, this looks like a good trade-off. But if you’re using the JSON format as your storage format at scale in some JSON-orientated database, then you have to deal with this complex outcome infrastructurally.

Aside: This is just what the reference implementations all had to do back when we made the decision. The maintainers of the reference implementations all voted against the JSON outcome or abstained, but were outvoted by the rest of the community, so we sucked it up and did the implementation - it sure complicated our code generators a lot….

If you have to deal with this everywhere, since it might happen very occasionally, this really doesn’t look like such a great outcome after all. And there are quite a lot of people doing that. And so requests for us to revisit this decision bubble away in a very narrow part of the FHIR community, but they never actually get traction with the wider community. And the JSON thread that I referenced is similar - it’s not getting wider traction.
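
To give a feel for what "dealing with this everywhere" involves, here is a rough Python sketch that walks a parsed resource and folds every `_name` sibling back into `name`, producing the uniform object-per-primitive shape. It is an illustration under loud assumptions, not how any real server does it: `to_uniform`, `_convert` and `RAW_KEYS` are hypothetical names, and the walk guesses from key names alone, where a real converter would consult the FHIR definitions.

```python
from itertools import zip_longest

# Assumption for this sketch: these keys stay plain JSON values. A
# schema-free walk cannot know which properties are really attributes,
# so it guesses by name (and will guess wrong for some resources).
RAW_KEYS = {"resourceType", "id", "url"}


def to_uniform(node):
    """Fold each `_name` sibling into `name`, wrapping primitives as objects."""
    if not isinstance(node, dict):
        return node
    # Collect each element name once, whether it appears as `name`,
    # `_name`, or both; dict.fromkeys keeps first-seen order.
    names = dict.fromkeys(k[1:] if k.startswith("_") else k for k in node)
    result = {}
    for name in names:
        value, extra = node.get(name), node.get("_" + name)
        if name in RAW_KEYS:
            result[name] = value
        elif isinstance(value, list) or isinstance(extra, list):
            pairs = zip_longest(value or [], extra or [])
            result[name] = [_convert(v, e) for v, e in pairs]
        else:
            result[name] = _convert(value, extra)
    return result


def _convert(value, extra):
    """Convert one element: recurse into complex types, wrap primitives."""
    if isinstance(value, dict):
        return to_uniform(value)
    wrapped = {} if value is None else {"value": value}
    wrapped.update(to_uniform(extra or {}))
    return wrapped


uniform = to_uniform({
    "string": "xxx",
    "_string": {"extension": [{"url": "yyyy", "valueString": "zzz"}]},
})
```

Even this toy version has to handle scalars, arrays, value-less elements and recursion into extensions, which is the kind of complexity a storage layer ends up carrying for the sake of the rare case.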

Note that the JSON thread throws several issues into the mix:

  1. revisiting the JSON decision above to the more consistent form
  2. changing the way polymorphic elements are represented
  3. changing the way internal references are represented

Each of these has its own pros and cons (though #2 depends on #1). #1 and #3 are strongly attractive in the database storage context, but less compelling for interoperability. #2 is probably context neutral (and really, should be a separate post about why we chose to do polymorphism the same way - it’s got its own long history).

So, what do we do about this? We could create a new … secondary … JSON format (though it would look quite a lot like the JSON-LD format that we decided to drop in favor of simply using Turtle, partly because we didn’t want a second JSON format). Or we could just fix the JSON format, but that would pretty much mean JSON wouldn’t go normative along with everything else (and I can’t think of a bigger way to say ‘we don’t think JSON is as important as XML’, which is exactly not what we think). Or we can just say ‘for good or bad, we’ve made this decision’.

For me, I think this is last call on this issue - hence this blog post, to give it some airtime. (Unless we’re already past the point of last call - I suspect we are, but let’s be clear about that).

If you want to join the discussion on this - for or against - you can comment here on the blog - but better to reserve that for clarification questions, and go join the discussion here on chat.fhir.org