JSON, property order, polymorphism, and streaming based parsers

Feb 25, 2014

On the FHIR email list, we’ve been discussing a proposal to simplify the FHIR JSON representation. The core problem is how to represent a point in the structure where there’s a choice of types. Originally, we did it like this:

{  property-name : { 
  choice-name : { 
   choice-properties }  
  } 
}

There’s a list of what choice-name can be for each point where we have this structure. And the different choices share common properties - this is a case of polymorphism. This format doesn’t suit consumers who navigate the JSON directly, though. In order to navigate to the common choice properties of the thing that is represented by choice-name, you have to skip over the choice-name. So it would be something like this in javascript:

property-name[Object.keys(property-name)[0]].choice-property-name
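Spelled out as runnable JavaScript, the same navigation looks roughly like this. The `subject`, `Patient`, `display`, and `id` names and the `commonProperty` helper are made up for illustration; they are not taken from the FHIR specification.

```javascript
// Hypothetical instance of the first representation: the choice name
// ("Patient") wraps the properties of the chosen type.
const resource = {
  subject: {
    Patient: { display: "Peter", id: "p1" }
  }
};

// Reads a property shared by all choices without knowing which choice was used.
function commonProperty(wrapper, name) {
  const choiceName = Object.keys(wrapper)[0]; // the only key is the choice name
  return wrapper[choiceName][name];
}

console.log(commonProperty(resource.subject, "display")); // prints "Peter"
```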

Yuck. So after discussion about this at the HL7 face to face meeting, we changed the representation to this:

{  property-name : { 
   type : choice,  
   choice-properties  
  } 
}
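For a client working on the parsed object, this form is much easier to use. A minimal sketch, again with made-up names (`subject`, `Patient`, `Group`, `display`, and the `describe` helper are illustrative, not FHIR definitions):

```javascript
// Hypothetical instance of the second representation: the choice is named
// by a "type" property that sits alongside the choice's own properties.
const resource = {
  subject: { display: "Peter", type: "Patient", id: "p1" }
};

// Common properties are reachable directly, with no wrapper to skip.
const display = resource.subject.display;

// Type-specific handling dispatches on the value of "type".
function describe(subject) {
  switch (subject.type) {
    case "Patient": return "patient " + subject.display;
    case "Group": return "group " + subject.display;
    default: throw new Error("unknown type: " + subject.type);
  }
}

console.log(describe(resource.subject)); // prints "patient Peter"
```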

This form of representation has two problems:

  • As far as I can read JSON schema, it works like XML schema: content models are tied to property names, and there’s no way to make the content model depend on the value of a property. We decided we didn’t care. No one had mentioned JSON schema to us as something implementers want (though someone has now). (Later update: this may not be true now. JSON schema investigation is on the to do list.)
  • You can’t easily use a streaming JSON parser (and there are lots of these) with the second approach, because the type property may not come first, and so you’re reading the thing before you know what type it is. Well, you can, but the scheme roughly works like this: cache the stream, saving the events until you come to the type property. Then reparse the cached stream now that you know the type, and then go on. And… this can happen in a nested fashion. YUCK.
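The cache-and-replay scheme from the second point can be sketched as follows. This is a simplified illustration, not a real parser: the events are hand-written arrays standing in for what a streaming parser might emit, the `Dog` and `Cat` classes, `builders`, and `readObject` are hypothetical, and only a single flat object is handled (nested polymorphic objects would need the same caching at each level).

```javascript
// Hypothetical typed classes, chosen once the "type" value is known.
class Dog {}
class Cat {}
const builders = { dog: () => new Dog(), cat: () => new Cat() };

// Consumes the events for one flat object. Events are simulated as
// ["start"], ["key", k], ["value", v], ["end"].
function readObject(events) {
  const pending = []; // properties seen before "type" arrived
  let target = null;  // the typed object, once "type" is known
  let key = null;
  for (const [kind, data] of events) {
    if (kind === "key") {
      key = data;
    } else if (kind === "value") {
      if (target) {
        target[key] = data;        // type known: stream straight through
      } else if (key === "type") {
        const make = builders[data];
        if (!make) throw new Error("unknown type: " + data);
        target = make();           // type found: create the typed object
        for (const [k, v] of pending) target[k] = v; // replay the cache
        pending.length = 0;
      } else {
        pending.push([key, data]); // type unknown: cache the property
      }
    } else if (kind === "end") {
      break;
    }
  }
  if (!target) throw new Error("no type property found");
  return target;
}

// Simulated events for {"name":"Spike","type":"dog","age":3}
const events = [
  ["start"], ["key", "name"], ["value", "Spike"],
  ["key", "type"], ["value", "dog"],
  ["key", "age"], ["value", 3], ["end"]
];
const animal = readObject(events);
console.log(animal instanceof Dog, animal.name, animal.age); // prints: true Spike 3
```

If the type property happened to come first, `pending` would stay empty and nothing would need to be cached, which is why property order matters to this style of parser.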

So we kind of have a choice:

  1. A JSON syntax that works well for clients, but is a bitch to parse efficiently for a server
  2. A JSON syntax that isn’t hard for streaming based parsers, but isn’t so nice for client type processing
  3. Fix the property order so that the type property comes first.

Note, btw, that this isn’t a problem for parsers based on an object model - the majority of parsers - because then you can access the data in an object structure, and order simply doesn’t matter. JSON defines an object as an unordered collection of name/value pairs, and therefore it doesn’t really cater to streaming based parsers (and option #3 above would be problematic in the extreme, since many libraries can’t control the order of the properties).
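A quick illustration of why order doesn’t matter to an object-model parser, using JavaScript’s built-in `JSON.parse`. The `speak` function and the sample data are made up for the example:

```javascript
// The same object serialized with "type" first and with "type" last.
const typeFirst = JSON.parse('{"type":"dog","name":"Spike"}');
const typeLast = JSON.parse('{"name":"Spike","type":"dog"}');

// Once the whole object is in memory, "type" is just another property lookup.
function speak(animal) {
  return animal.type === "dog"
    ? animal.name + " says woof"
    : animal.name + " says meow";
}

console.log(speak(typeFirst) === speak(typeLast)); // prints true
```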

It seems to me that this problem is not specific to FHIR - it’s going to arise anywhere you mix polymorphism, JSON, and streaming parsers. But googling around, I can’t find any information about this. Take, for example, this JSON example:

{
  "animals":
   [
     {"type":"dog","name":"Spike"},
     {"type":"cat","name":"Fluffy"}
   ]
}

This is from a code snippet where the author is writing about deserialization into Java classes - exactly the problem I’m talking about here: what do you do if type doesn’t come first? That’s never discussed, and I can’t be bothered reading the Jackson implementation in detail to figure out what the right questions to ask are. But note this, from the Jackson advantage list:

incremental (“streaming”) parsing and generation: high-performance, low-overhead sequential access. This is the lowest-level processing method, comparable to SAX and Stax APIs for XML processing. All packages must have such a parser internally, but not all expose it.

The alternate approach is here: http://stackoverflow.com/questions/5186973/json-serialization-of-array-with-polymorphic-objects - this is not so good for client based processors of the data, but nice for SAX based parsers.
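To see why a wrapper style suits event-based parsers, here is a sketch that assumes the alternate form names the type as a wrapping property, like the first FHIR form above (for example `{"dog":{"name":"Spike"}}`). The events are hand-written stand-ins for parser output, and `Dog`, `Cat`, `builders`, and `readWrapped` are hypothetical names.

```javascript
// Hypothetical typed classes.
class Dog {}
class Cat {}
const builders = { dog: () => new Dog(), cat: () => new Cat() };

// Consumes simulated events for one wrapped object. The type name arrives
// as a key before any of the object's own properties, so nothing is cached.
// Assumes well-formed input with exactly one wrapper key.
function readWrapped(events) {
  let target = null;
  let key = null;
  let depth = 0;
  for (const [kind, data] of events) {
    if (kind === "start") {
      depth += 1;
      if (depth === 2) target = builders[key](); // last key seen is the type name
    } else if (kind === "key") {
      key = data;
    } else if (kind === "value") {
      target[key] = data;
    } else if (kind === "end") {
      depth -= 1;
    }
  }
  return target;
}

// Simulated events for {"dog":{"name":"Spike"}}
const events = [
  ["start"], ["key", "dog"],
  ["start"], ["key", "name"], ["value", "Spike"], ["end"],
  ["end"]
];
const animal = readWrapped(events);
console.log(animal instanceof Dog, animal.name); // prints: true Spike
```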

It seems that JSON isn’t that simple after all.

Update: Json.NET, the most popular JSON parser for .NET, is dependent on order: http://stackoverflow.com/questions/17032769/json-net-seems-to-rely-on-type-being-the-first-property-is-there-a-way-to-lift