Should you use #FHIR resources as your storage format?

Mar 4, 2018

In discussions with implementers, one question that has come up quite a lot recently is whether you should store FHIR resources natively in your database or not. In principle, FHIR resources (like all HL7 specifications) are designed for exchange between systems, rather than as a database storage format. That doesn’t mean that we won’t consider utility or design requests intended to meet storage use cases, just that robust exchange will always take priority in our design decisions. One practical consequence of this is that FHIR resources are highly denormalised, so that granular exchanges are fairly standalone.

To illustrate this, consider using an RxNorm code as a medication code. In a prescription resource (MedicationRequest), that will look like this:

<MedicationRequest xmlns="http://hl7.org/fhir">
  <code>
   <coding>
     <system value="http://www.nlm.nih.gov/research/umls/rxnorm"/>
     <code value=""/>
   </coding>
  </code>
</MedicationRequest>

If we were prioritizing storage efficiency, we would normalise this so that the code was just some primary key reference:

<MedicationRequest xmlns="http://hl7.org/fhir">
  <code key="123431232"/>
</MedicationRequest>

..well, we probably would do that. Whether to normalise or not is an engineering design question that has multiple different considerations, and very often product designers change their decision about this for engineering reasons with no associated change in functionality or requirements.

When designing exchange formats, on the other hand, you almost always end up choosing robustness and stability over efficiency. Using a key reference like the second example would require all the participants in an exchange to synchronize the key, which is often not possible and rarely stable. So most of the time (not always) we denormalise, and we choose reference points only after considerable discussion about reliability. Performance considerations come a distant second.
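
To make the trade-off concrete, here is a minimal Python sketch of what a system that normalises internally has to do at the exchange boundary. Everything in it is hypothetical: the CODE_TABLE contents, the to_exchange helper, the placeholder code value, and the simplified "code" element borrowed from the XML examples above.

```python
# Hypothetical local code table. The keys mean something only inside this
# one system, which is why they cannot be used in an exchange.
CODE_TABLE = {
    123431232: {
        "system": "http://www.nlm.nih.gov/research/umls/rxnorm",
        "code": "PLACEHOLDER",  # illustrative value, not a real RxNorm code
    },
}


def to_exchange(stored_row):
    """Expand a normalised storage row into a self-contained structure."""
    coding = CODE_TABLE[stored_row["code_key"]]
    return {
        "resourceType": "MedicationRequest",
        # Simplified element name, mirroring the XML examples above.
        "code": {"coding": [dict(coding)]},
    }


stored = {"code_key": 123431232}   # compact, but meaningless to other systems
resource = to_exchange(stored)     # verbose, but stands on its own
```

The receiver never sees the local key, only the system and code, so nothing has to be synchronised between the participants.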

Engineering Considerations

So you might immediately assume that it would always be a mistake to store resources natively, but that’s not always the case.

Many implementers assume that a general purpose REST-based implementation stack will always be less efficient, reliable and manageable than a purpose-built application stack… but there are some pretty awesome stacks for handling generic resources out there, with deep support for application development - code generation, UI binding, management reporting… so it’s no longer a given that you can do a better engineering job than just storing the resources directly.

I’ve seen implementers storing resources natively in:

Classic SQL servers with JSON support
NoSQL servers like MongoDB / Couch etc (or Hadoop)
Google’s BigQuery
An RDF-based store using RDF in the Turtle format

So if you can consider something like this for your storage, you can consider storing resources natively. If you can’t… then you won’t be asking the question.

But just because you can doesn’t mean you should.

Requirements

The most important determinant of whether you should store resources natively is your requirements.

If your information requirements are nailed down: well understood, clearly expressed, and not subject to uncontrolled change - then you can easily design a data storage schema that is completely fit for purpose and magically efficient compared to storing FHIR resources. Classic Enterprise Information Systems tend to behave like this.

If, on the other hand, your information requirements are not at all nailed down, and you have to deal with whatever comes, and you only find out what’s coming on an ongoing basis - then FHIR resources are actually a very sound way to store your data - the extensibility approach makes FHIR a particularly robust choice. In fact, FHIR might already be your best choice outright. Clinical Data Repositories tend to behave like this.

The really interesting thing, though, is that every information system I’ve ever worked with is somewhere between these two extremes. And so the choice of whether to store FHIR resources natively or not is not straightforward.

In fact, I’m starting to see a lot of systems taking a hybrid approach - storing FHIR resources natively, and storing a well controlled subset of information in some expressly designed schema. The balance between these is driven by the stability of information requirements for different parts of the system.
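
A hybrid store could look something like the following Python sketch, which uses the standard library sqlite3 module. The table layout, the column names, the example extension URL, and the choice of which elements count as stable are all assumptions made for illustration, not a recommended schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE medication_request (
        id            TEXT PRIMARY KEY,
        patient_ref   TEXT NOT NULL,  -- stable requirement: designed column
        status        TEXT NOT NULL,  -- stable requirement: designed column
        resource_json TEXT NOT NULL   -- everything else: the resource itself
    )
    """
)


def store(resource):
    """Store the whole resource, plus the few fields we have designed for."""
    conn.execute(
        "INSERT INTO medication_request VALUES (?, ?, ?, ?)",
        (
            resource["id"],
            resource["subject"]["reference"],
            resource["status"],
            json.dumps(resource),
        ),
    )


store({
    "resourceType": "MedicationRequest",
    "id": "example",
    "status": "active",
    "intent": "order",
    "subject": {"reference": "Patient/123"},
    # Data nobody planned for still survives, inside the stored resource.
    "extension": [{
        "url": "http://example.org/fhir/StructureDefinition/late-requirement",
        "valueString": "not planned for",
    }],
})

# Queries run against the designed columns; the resource comes back intact.
row = conn.execute(
    "SELECT resource_json FROM medication_request"
    " WHERE patient_ref = ? AND status = ?",
    ("Patient/123", "active"),
).fetchone()
restored = json.loads(row[0])
```

How many columns get promoted out of the JSON is exactly the balance described above: the more stable a requirement is, the more it earns a designed column.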

Managing Joins

Going back to the question about normalisation and references - most systems that store resources natively find the way references work in FHIR to be a pain point; references can be absolute or relative URLs, they can be version specific or not, they may or may not follow FHIR’s well defined RESTful interface pattern, and they may or may not resolve in the local system (and the kind of systems where you’re storing resources directly are the kind of systems where you can’t always enforce referential integrity). All this flexibility is required in various exchange scenarios, but it’s a real tax if you’re filling a coherent data store with resources.
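
As a rough illustration of how many cases a store has to tell apart, here is a Python sketch that classifies a reference string. The classify function, the returned keys, and the example base URL are hypothetical. The regular expression is my approximation of the [base]/[type]/[id]/_history/[vid] shape, not an official pattern, and it ignores other legal forms such as contained references.

```python
import re

# Approximation of the RESTful shape: [base/]Type/id[/_history/vid]
REST_PATTERN = re.compile(
    r"^(?:(?P<base>https?://.+)/)?"
    r"(?P<type>[A-Z][A-Za-z]+)/(?P<id>[A-Za-z0-9\-.]{1,64})"
    r"(?:/_history/(?P<version>[A-Za-z0-9\-.]{1,64}))?$"
)


def classify(reference, local_base):
    """Describe a reference string; local_base is this server's base URL."""
    match = REST_PATTERN.match(reference)
    if match is None:
        # For example urn:uuid:..., or a URL that is not RESTful FHIR.
        return {"restful": False}
    base = match.group("base")
    return {
        "restful": True,
        "absolute": base is not None,
        "version_specific": match.group("version") is not None,
        # Only a candidate: the target may still be missing locally.
        "maybe_local": base is None or base == local_base,
        "type": match.group("type"),
        "id": match.group("id"),
    }


LOCAL = "http://example.org/fhir"
relative = classify("Patient/123", LOCAL)
versioned = classify("http://example.org/fhir/Patient/123/_history/2", LOCAL)
foreign = classify("http://other.example.org/fhir/Patient/123", LOCAL)
opaque = classify("urn:uuid:0f6a9c2e-0000-4000-8000-000000000000", LOCAL)
```

Even when a reference looks local, the store still has to check whether the target actually exists, since referential integrity is not guaranteed.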

For that reason, I recommend that if you’re storing resources natively, you consider extending the storage format of your choice with a resolved link in addition to the existing reference/identifier in the Reference data type. The RDF format does this explicitly for references and codings (fhir:reference and fhir:concept) and Google’s protobuf implementation does something similar. Note that if you’re storing JSON, you’ll have to remove the extension for exchange, unless you use a native FHIR extension - which isn’t the most efficient way to do it.
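
For JSON storage, one way this might be sketched in Python is below. The resolvedKey property, the two helper functions, and the LOCAL_INDEX lookup are all hypothetical, and treating every object with a string "reference" property as a Reference is a heuristic rather than a rule from the specification.

```python
RESOLVED_KEY = "resolvedKey"  # hypothetical storage-only property, not FHIR


def add_resolved_links(node, resolve):
    """Annotate, in place, each reference that resolves in the local store."""
    if isinstance(node, dict):
        ref = node.get("reference")
        if isinstance(ref, str):
            key = resolve(ref)
            if key is not None:
                node[RESOLVED_KEY] = key
        for value in list(node.values()):
            add_resolved_links(value, resolve)
    elif isinstance(node, list):
        for item in node:
            add_resolved_links(item, resolve)


def strip_resolved_links(node):
    """Return a copy without the storage-only property, ready for exchange."""
    if isinstance(node, dict):
        return {
            key: strip_resolved_links(value)
            for key, value in node.items()
            if key != RESOLVED_KEY
        }
    if isinstance(node, list):
        return [strip_resolved_links(item) for item in node]
    return node


# Hypothetical index from reference string to local primary key.
LOCAL_INDEX = {"Patient/123": 42}

stored = {
    "resourceType": "MedicationRequest",
    "subject": {"reference": "Patient/123"},
    "basedOn": [{"reference": "http://other.example.org/fhir/CarePlan/9"}],
}
add_resolved_links(stored, LOCAL_INDEX.get)
outgoing = strip_resolved_links(stored)
```

The original reference is never overwritten, so a reference that does not resolve locally is simply left without a resolved link, and the outgoing copy is the same as what was received.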