FHIR Issue: Invariants based on dataAbsentReason

Sep 17, 2012

This is a ballot comment made against the FHIR specification:

The presence of a DAR (dataAbsentReason) is used in several places in the datatypes as a means of loosening the rules about which datatype properties need to be present. However, this mixes two things. DAR is relevant when there’s a use-case for knowing why an element is missing. This is a distinct use-case from choosing to allow partial or incomplete data. For example, I might want to allow a humanId that doesn’t allow unique resolution without wanting to capture “why the id isn’t fully specified”. We need to separate “partial data allowed” from “reason for absent/incomplete data allowed”.

Background

In the v3 data types (+ISO 21090) you can label a data type as “mandatory”. If you do so, it must be present, and it must have a proper value. Specifically, this means that it must not be null: there must be no nullFlavor, either applied explicitly, or implied by simply leaving the attribute out of the XML representation. Each type definition can hook into this rule and add extra rules about which other data type attributes must have values if there’s no nullFlavor. For instance, the type II has a root and an extension; if there’s no nullFlavor, there must be at least a root:

invariant(II x) where x.nonNull { root.nonNull; };
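To make the v3 rule concrete, here is a minimal Python sketch. The `II` class and the `ii_invariant` and `mandatory_ok` functions are hypothetical names for illustration, and root/extension are simplified to plain strings rather than the full ISO 21090 UID/ST types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class II:
    """Hypothetical, simplified model of the v3 II (instance identifier) type."""
    null_flavor: Optional[str] = None  # e.g. "UNK"; None means the value is non-null
    root: Optional[str] = None         # simplified to a plain string, not a UID
    extension: Optional[str] = None

    @property
    def non_null(self) -> bool:
        return self.null_flavor is None


def ii_invariant(x: II) -> bool:
    """invariant(II x) where x.nonNull { root.nonNull; }"""
    # The invariant only applies when there is no nullFlavor.
    return (not x.non_null) or x.root is not None


def mandatory_ok(x: Optional[II]) -> bool:
    """A 'mandatory' attribute must be present, non-null, and satisfy the type invariant."""
    return x is not None and x.non_null and ii_invariant(x)
```

Labelling `id : II` as mandatory then amounts to requiring `mandatory_ok(id)`: a missing attribute, a nullFlavor, or an extension without a root all fail.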

By implication, the root or root/extension must also be globally unique: this must be a proper identifier. This system makes it easy to say that an instance has to have a proper identifier for something: simply label the id : II attribute as mandatory.

FHIR follows this same pattern, though the presentation is different. When you include an element in a resource, you can indicate a minimum cardinality, and say whether a dataAbsentReason (which equates to a nullFlavor) is allowed.

 <identifier d?><!-- 1..1 Identifier A system id for this resource --></identifier>

This says that the resource must have an identifier, but it can have a dataAbsentReason. So you could do something like this:

 <identifier>
   <system>http://acme.org/patient</system>
   <id>12345</id>
 </identifier>

Ok, an identifier. But you could also do this:

 <identifier dataAbsentReason="notasked">
   <system>http://gov.country/health/id</system>
 </identifier>

This indicates that the identifier (a national healthcare one in this case) simply wasn’t asked. So how does a resource definition say that there must be an identifier, and that you can’t get away with providing an incomplete one? Like this:

 <identifier><!-- 1..1 Identifier A system id for this resource --></identifier>

Because the identifier doesn’t allow a dataAbsentReason (no “d?”), the second form is not allowed. But what stops the following form from being allowed?

 <identifier>
   <system>http://gov.country/health/id</system>
 </identifier>

The answer is this constraint made on the Identifier type:

Unless an Identifier element has a dataAbsentReason flag, it must contain an id (xpath: exists(@dataAbsentReason) or exists(f:id))
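The two mechanisms work together: the element definition controls whether a dataAbsentReason is permitted at all, and the Identifier invariant demands an id whenever there is no dataAbsentReason. The following Python sketch checks both against the three identifier forms above, using the standard library’s `xml.etree.ElementTree`. It assumes the elements are in the `http://hl7.org/fhir` namespace (bound to the `f:` prefix, as in the XPath); `check_identifier` and its `dar_allowed` parameter are hypothetical stand-ins for the “d?” flag.

```python
import xml.etree.ElementTree as ET
from typing import List

FHIR_NS = {"f": "http://hl7.org/fhir"}  # assumed binding for the f: prefix


def identifier_invariant(identifier: ET.Element) -> bool:
    """Python equivalent of: exists(@dataAbsentReason) or exists(f:id)"""
    has_dar = identifier.get("dataAbsentReason") is not None
    has_id = identifier.find("f:id", FHIR_NS) is not None
    return has_dar or has_id


def check_identifier(identifier: ET.Element, dar_allowed: bool) -> List[str]:
    """Apply the element-level d? rule, then the Identifier type invariant."""
    errors = []
    if identifier.get("dataAbsentReason") is not None and not dar_allowed:
        errors.append("dataAbsentReason not permitted for this element")
    if not identifier_invariant(identifier):
        errors.append("Identifier must have an id unless it has a dataAbsentReason")
    return errors


COMPLETE = """<identifier xmlns="http://hl7.org/fhir">
  <system>http://acme.org/patient</system><id>12345</id></identifier>"""
NOT_ASKED = """<identifier xmlns="http://hl7.org/fhir" dataAbsentReason="notasked">
  <system>http://gov.country/health/id</system></identifier>"""
PARTIAL = """<identifier xmlns="http://hl7.org/fhir">
  <system>http://gov.country/health/id</system></identifier>"""

for name, xml in [("complete", COMPLETE), ("not asked", NOT_ASKED), ("partial", PARTIAL)]:
    el = ET.fromstring(xml)
    print(name, "| d? allowed:", check_identifier(el, True),
          "| no d?:", check_identifier(el, False))
```

Without “d?”, only the complete identifier passes; the partial form fails whether or not “d?” is present, which is exactly the coupling the commenter objects to.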

Response to Comment

The commenter’s issue is that two separate ideas are conflated: whether you can allow incomplete data, and whether you need to say why incomplete data is provided. These are two different things, but we always conflated them in v3. We did that because it’s easy: if you unhook these things, it becomes much more difficult to say that a proper value (i.e. a real identifier) must be provided. Instead of just saying that you can’t provide a dataAbsentReason, you also (or instead) have to define what a ‘proper’ value is, and potentially how that relates to the expected use of dataAbsentReason. This will be much more complicated than the current system.

So there are two separate things to discuss with relation to this comment:

  • Does the case - providing incomplete data without having to/being able to provide a reason for incomplete data - justify making the implementation experience much more complicated?
  • If it does, how would these rules be specified most effectively?

1. Is the case justified?

I don’t think it is. It’s never come up in all my v3 experience, whether in my own implementations, in my experience as the go-to guy for the v3 data types, or in committee. I’m pretty sure I would remember it if it had. Why not just use a dataAbsentReason of “unknown”? I guess there’s some marginal use case where “unknown” isn’t accurate, but you can’t say what the reason is, and you can’t use some other dataAbsentReason. Maybe we should add a dataAbsentReason “?” for use in this case?

2. How else to specify the rules

Well, in the end, the rules are specified by XPath (exists(@dataAbsentReason) or exists(f:id)); this is what enforcement is based on. The most obvious thing is to take this rule out of the Identifier datatype and push it into the resource definitions. Then we’re going to get XPath all over the place, and I think that is a real cost for implementers.

An alternative approach is to define a profile for each data type that says what a “proper” value is. This offers re-use and flexibility, but it would move a key aspect of many resources, the basic pre-conditions for validity, somewhere more obscure, which adds complexity. (Note that you can’t profile data types at the moment. The commenter also asked to be able to profile data types, which we had not allowed so far because the potential complexity seemed unjustified.)
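For comparison, here is a rough Python sketch of what unhooking the two ideas might look like: a per-datatype “proper value” profile, plus two independent flags on each element. Everything here is hypothetical: `PROPER_VALUE_PROFILES`, `ElementDefinition`, `partial_allowed` and `reason_allowed` are illustrative names, values are plain dicts rather than XML, and the rule that a permitted dataAbsentReason also excuses an incomplete value is an assumption (it mirrors the current behaviour, but a real design would have to decide it explicitly).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical per-datatype profiles defining what a "proper" value is.
PROPER_VALUE_PROFILES: Dict[str, Callable[[dict], bool]] = {
    "Identifier": lambda v: bool(v.get("id")),
}


@dataclass
class ElementDefinition:
    datatype: str
    partial_allowed: bool  # may an incomplete value be sent without a reason?
    reason_allowed: bool   # may a dataAbsentReason be sent?


def validate(defn: ElementDefinition, value: dict) -> List[str]:
    errors = []
    has_reason = value.get("dataAbsentReason") is not None
    if has_reason and not defn.reason_allowed:
        errors.append("dataAbsentReason not permitted")
    proper = PROPER_VALUE_PROFILES[defn.datatype](value)
    # Assumption: an allowed dataAbsentReason excuses an incomplete value,
    # as it does today; partial_allowed excuses it with no reason at all.
    excused = defn.partial_allowed or (has_reason and defn.reason_allowed)
    if not proper and not excused:
        errors.append("a proper value is required")
    return errors


# The commenter's case: partial data allowed, but no reason may be given.
commenter = ElementDefinition("Identifier", partial_allowed=True, reason_allowed=False)
print(validate(commenter, {"system": "http://gov.country/health/id"}))  # []
```

Even in this toy form, each element needs two flags plus a profile lookup, and the interaction between them (the `excused` line) is a design decision in its own right.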

Discussion

This is a FHIR ballot issue. Comments on this blog post are discouraged, except to point out outright errors in this analysis. Discussion will be taken up on the FHIR email list, with substantial contributions added to this wiki page. A doodle poll will be held for the final vote; this will be announced on the FHIR email list.