#FHIR, RDF, and JSON-LD

Previous: #FHIR DSTU ballots this year »

#FHIR, RDF, and JSON-LD

Apr 11, 2015

FHIR doesn’t use JSON-LD. Some people are pretty critical of that:

It’s a pity#FHIRhasn’t been made#JSONLDcompatible. Enormous missed opportunity for#ehealthinterop & simplicity.

That was from David Metcalfe by Twitter. The outcome of our exchange after that was that David came down to Melbourne from Sydney to spend a few hours with me discussing FHIR, rdf, and json-ld (I was pretty amazed at that, thanks David).

So I’ve spent a few weeks investigating this, and the upshot is, I don’t think that FHIR should use json-ld.

Linked Data

It’s not that the FHIR team doesn’t believe in linked data - we do, passionately. From the beginning, we designed FHIR around the concept of linked data - the namespace we use is http://hl7.org/fhir and that resolves right to the spec. Wherever we can, we ensure that the names we use in that namespace are resolvable and meaningful on the hl7.org server (though I see that recent changes in the hosting arrangements have somehow broken some of these links). The FHIR spec, as a RESTful API, imposes a linked data framework on all implementations.

It’s just a framework though - using the framework to do fully linked data requires a set of additional behaviours that we don’t make implementers do. Not all FHIR implementers care about linked data - many don’t, and the more closely linked to institutional healthcare, the more important specific trading partner agreements become. One of the major attractions FHIR has in the healthcare space is that it can serve as a common format across the system, so supporting these kind of implementers is critical to the FHIR project. Hence, we like linked data, we encourage it’s use, but it’s not mandatory.

JSON-LD

This is where json-ld comes into the picture - the idea is that you mark up you json with a some lightweight links, which link the information in the json representation to it’s formal definitions so that the data and it’s context can be easily understood outside the specific trading partner agreements.

We like that idea. It’s a core notion for what we’re doing in FHIR, so it sounds like that’s how we should do things. Unfortunately, for a variety of reasons, it appears that it doesn’t make sense for us to use json-ld.

RDF

Many of the reasons that json-ld is not a good fit for FHIR arise because of RDF, which sits in the background of json-ld. From the JSON-LD spec:

JSON-LD is designed to be usable directly as JSON, with no knowledge of RDF. It is also designed to be usable as RDF, if desired, for use with other Linked Data technologies like SPARQL.

FHIR has never had an RDF representation, and it’s a common feature request. There’s a group of experts looking at RDF for FHIR (technically, the ITS WGM RDF project) and so we’ve finally got around to defining RDF for FHIR. Note that this page is editors draft for committee discussion - there’s some substantial open issues. We’re keen, though, for people to test this, particular the generated RDF definitions.

RDF for FHIR has 2 core parts:

An RDF based definition of the specification itself - the class definitions of the resources, the vocabulary definitions, and all the mappings and definitions associated with them
A method for representing instances of resources as RDF

Those two things are closely related - the instances are represented in terms of the class model defined in the base RDF, and the base RDF uses the instance representation in a variety of ways.

Working through the process of defining the RDF representation for FHIR has exposed a number of issues for an RDF representation of FHIR resources:

Dealing with missing data: a number of FHIR elements have a default value, or, instead, have an explicit meaning for a missing element (e.g. MedicationAdministration: if there is no “notGiven” flag, then the medication as given as stated). In the RDF world (well, the ontology world built on top of it) you can’t reason about missing data, since it’s missing. So an RDF representation for FHIR has to make the meaning explicit by requiring default values to be explicit, and providing positive assertions about some missing elements
Order does matter, and RDF doesn’t have a good solution for it. This is an open issue, but one that can’t be ducked
It’s much more efficient, in RDF, to change the way extensions are represented; in XML and JSON, being hierarchies (and, in XML, and ordered one), having a manifest where mandatory extension metadata (url, type) is represented is painful, and, for schema reasons, difficult. So this data is inlined into the extension representation. In RDF, however, being triple based with an inferred graph, it’s much more effective to separate these into a manifest
for a variety of operational reasons, ‘concepts’ - references to other resources or knowledge in ontologies such as LOINC or SNOMED CT - are done indirectly. For Coding, for instance, rather than simply having a URL that refers directly to the concept, we have system + code + version. If you want to reason about the concept that represents, it has to be mapped to the concept directly. That level of indirection exists for good operational reasons, and we couldn’t take it out. However the mapping process isn’t trivial

In the FHIR framework, RDF is another representation like XML and JSON. Client’s can ask servers to return resources or sets of resources using RDF instead of JSON or XML. Servers or clients that convert between XML/JSON and RDF will have to handle these issues - and the core reference implementations that many clients and servers choose to use will support RDF natively (at least, that’s what the respective RI maintainers intend to do).

Why not to use JSON-LD

So, back to json-ld. The fundamental notion of json-ld is that you can add context references to your json, and then the context points to a conversion template that defines how to convert the json to RDF.

From a FHIR viewpoint, then, either the definition of the conversion process is sophisticated enough to handle the kinds of issues discussed above, or you have to compromise either the JSON or the RDF or both.

And the JSON –> RDF conversion defined by the JSON-LD specification is pretty simple. In fact, we don’t even get to the issues discussed above before we run into a problem. The most basic problem has to do with names - JSON-LD assumes that everywhere a JSON property name is used, it has the same meaning. So, take this snippet of JSON:

{ 
  "person" : {
    "dob" : "1975-01-01",
    "name" : {
      "family" : "Smith",
      "given" : "Joe"
    }
  },
  "organization" : {
     "name" : "Acme"
  } 
}

Here, the json property ‘name’ is used in 1 or 2 different ways. It depends on what you mean by ‘meaning’. Both properties associate a human usable label to a concept, one that humans use in conversation to identify an entity, though it’s ambiguous. That’s the same meaning in both cases. However the semantic details of the label - meaning at a higher level - are quite different. Organizations don’t get given names, family names, don’t change their names when they get married or have a gender change. And humans don’t get merged into other humans, or have their names changed for marketing reasons (well, mostly ;-) ).

JSON-LD assumes that anywhere that a property ‘name’ appears, it has the same RDF definition. So that snippet above can’t be converted to json-ld by a simple addition of a json-ld @context. Instead, you would have to rename the name properties to ‘personName’ and ‘organizationName’ or similar. In FHIR, however, we’ve worked on the widely accepted practice that names are scoped by their type (that’s what types do). The specification defines around 2200 elements, with about 1500 names - so 700 of them or so use names that other elements also use. We’re not going to rename all these elements to pre-coordinate their type context into the property name. (Note that JSON-LD discussed supporting having names scoped by context - but this is an ‘outstanding’ request that seems unlikely to get adopted anytime soon).

Beyond that, the other issues are not addressed by json-ld, and unlikely to be soon. Here’s what JSON-LD says about ordered arrays:

Since graphs do not describe ordering for links between nodes, arrays in JSON-LD do not provide an ordering of the contained elements by default. This is exactly the opposite from regular JSON arrays, which are ordered by default

and

List of lists in the form of list objects are not allowed in this version of JSON-LD. This decision was made due to the extreme amount of added complexity when processing lists of lists.

But the importance of ordering objects doesn’t go away just because the RDF graph definitions and/or syntax makes it difficult. We can’t ignore it, and no one getting healthcare would be happy with the outcomes if we managed to get healthcare process to ignore it. The same applies to the issue with missing elements - there’s no facilty to insert default values in json-ld, let alone to do so conditionally.

So we could either

Complicate the json format greatly to make the json-ld RDF useful
Accept the simple RDF produced by json-ld and just say that all the reasoning you would want to do isn’t actually necessary
(or some combination of those two)
Or accept that there’s a transform between the regular forms of FHIR (JSON and XML which are very close) and the optimal RDF form, and concentrate on making implementations of that transform easy to use in practice

I think it’s inevitable that we’ll be going for the 3rd.

p.s. should json-ld address these issues? I think JSON-LD has to address the ‘names scoped by types’ issue, but for the rest, I don’t know. The missing element problem is ubiquitous across interfaces - elements with default values are omitted for efficiency everywhere - but there is a lot of complexity in these things. Perhaps there could be an @conversion which is a reference to a server that will convert the content to RDF instead of a @context. That’s not so nice from a client’s perspective, but it avoids specifying a huge amount of complexity in the conversion process.

p.p.s there’s further analysis about this on the FHIR wiki.