HL7 Standards and rules for handling errors

Mar 20, 2014

It’s a pretty common question:

What’s the required behavior if a (message document resource) is not valid? Do you have to validate it? What are you supposed to do?

That question can be - and has been - asked about HL7 v2 messaging, v3 messaging, CDA documents, and now the FHIR API.

The answer is that HL7 itself doesn’t say. There are several reasons for that:

HL7’s main focus is to define what is and isn’t valid, and how implementation guides and trading partners can define what is and isn’t valid in their contexts
HL7 generally doesn’t even say what your obligations are when the content is valid either - that’s nearly always delegated to implementation guides and trading partner agreements (such as, say, XDS, but I can’t even remember XDS making many rules about this)
We often discuss this - the problem is that there’s no right rule around what action to take. A system is allowed to choose to accept invalid content, and most choose to do that (some won’t report an error no matter what you send them). Others, on the other hand, reject the content outright. All that HL7 says is that you can choose to reject the content
In fact, you’re allowed to reject the content even if it’s valid with regard to the specification, because of other reasons (e.g. ward not known trying to admit a patient)
We believe in Postel’s law:

Be conservative in what you do, be liberal in what you accept from others

HL7 doesn’t know what you can accept - so it doesn’t try to make rules about that.

So what’s good advice?

Validation

I don’t think that there’s any single piece of advice on this. Whether you should validate instances in practice depends on whether you can deal with the consequences of failure, and whether you can’t deal with the consequences of not validating. Here’s an incident to illustrate this point:

We set up our new (HL7 v2) ADT feed to validate all incoming messages, and tested the interface thoroughly during pre-production. There were no problems, and it all looked good. However, as soon as we put the system into production mode, we started getting messages rejected because the clerical staff were inputting data that was not valid. On investigation we found that the testing had used a set of data based on the formally documented practices of the institution, but the clerical staff had informal policies around date entry that we didn’t test. Rejected messages left the applications out of sync, and caused much worse issues. Rather than try to change the clerical staff, we ended up turning validation off

That was a real case. The question here is, what happens to the illegal dates now that you no longer reject them? If you hadn’t been validating, what would have happened? In fact, you have to validate the data. The only question is whether you validate everything up-front, or only what you need as you handle it.
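
The difference between validating everything up-front and validating only what you need as you handle it can be sketched roughly as follows. This is a minimal illustration, not a real interface: the flat message layout, the field names and the YYYYMMDD date format are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical flat message: the field names and the YYYYMMDD format are
# assumptions for illustration, not taken from any HL7 specification.
DATE_FIELDS = ("birth_date", "admit_date")


def parse_date(value):
    """Parse a YYYYMMDD string, raising ValueError if it is not a real date."""
    return datetime.strptime(value, "%Y%m%d").date()


def validate_up_front(message):
    """Check every date field before any processing; fail on the first bad one."""
    for field in DATE_FIELDS:
        parse_date(message[field])  # raises ValueError on bad data
    return message


def admit_patient(message):
    """Validate only what this step actually uses: the admit date."""
    return parse_date(message["admit_date"])


# 30 February is not a real date, but the admit date is fine.
message = {"birth_date": "19700230", "admit_date": "20140320"}

try:
    validate_up_front(message)
    up_front_result = "accepted"
except ValueError:
    up_front_result = "rejected"

# Succeeds: the bad birth date is never examined by this step.
lazy_result = admit_patient(message)
```

With up-front validation the whole message is rejected and the two systems drift out of sync. With as-you-go validation the admission goes through, and the bad birth date only becomes a problem for whichever later step tries to use it.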

Now consider the Australian PCEHR. It’s an XDS based system, and every submitted document is subjected to the full array of schema and schematron validation that we are able to devise. We do this because downstream processing of the documents - which happens to a limited degree (at the moment) - cannot be proven safe if the documents might be invalid. And we continuously add further validation around identified safety issues (at least, where we can, though many of the safety issues are not things that automated checks can do anything about).

But it has its problems too - because of Australian privacy laws, it’s really very difficult for vendors, let alone 3rd parties, to investigate incidents on site in production systems. The PCEHR has additional rules built around privacy and security which make it tougher (e.g. accidentally sharing the patient identifier with someone who is not providing healthcare services for the patient is a criminal offence). So in practice, when a document is rejected by the PCEHR, it’s the user’s problem. And the end-user has no idea what the problem is, or what to do about it (schematron errors are hard enough for programmers…).

So validation is a vexed question with no right answer. You have to do it to a degree, but you (or your users) will suffer for it too.

Handling Errors

You have to be able to reject content. You might choose to handle failed content in line (let the sender know) or out of line (put it in a queue for a system administrator). Both actions are thoroughly wrong and unsafe. And the unsafest thing about either is that they’ll both be ignored in practice - just another process failure in the degenerate process called “healthcare”.
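
As a rough sketch of the two options, assuming a deliberately trivial validity check and made-up names throughout (nothing here comes from an HL7 specification or a real interface engine):

```python
from collections import deque

# Stands in for a system administrator's work queue (hypothetical).
error_queue = deque()


def is_valid(message):
    """Placeholder check: assume a message needs a non-empty patient id."""
    return bool(message.get("patient_id"))


def handle_in_line(message):
    """In line: the sender is told about the failure in the response."""
    if not is_valid(message):
        return {"status": "rejected", "reason": "patient_id is missing"}
    return {"status": "accepted"}


def handle_out_of_line(message):
    """Out of line: acknowledge receipt, park failed content for a person."""
    if not is_valid(message):
        error_queue.append({"message": message, "reason": "patient_id is missing"})
    return {"status": "received"}


in_line_response = handle_in_line({"patient_id": ""})
out_of_line_response = handle_out_of_line({"patient_id": ""})
```

In the first case the sender at least knows something went wrong. In the second the sender sees nothing unusual, and the failure only exists for whoever reads the queue.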

When you reject content, provide as specific and verbose a message as you can, loaded with context, details, paths, reasons, etc - that’s for the person who debugs it. And also provide a human readable version for the users, something they could use to describe the problem to a patient (or even a manager).
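
One way to carry both versions is to keep them together in a single rejection record. The sketch below is only an illustration: the `Rejection` class and its field names are invented for the example, and the v2 field reference (PID-7, date of birth) is there simply to make the path concrete.

```python
from dataclasses import dataclass


@dataclass
class Rejection:
    """Hypothetical rejection record carrying two renderings of one failure."""
    path: str       # where in the content the problem was found
    value: str      # the offending value, exactly as received
    rule: str       # the rule that failed
    user_text: str  # plain-language summary for the end user

    def diagnostics(self):
        """Verbose, context-heavy text for whoever debugs the interface."""
        return f"{self.rule} failed at {self.path}: received {self.value!r}"


rejection = Rejection(
    path="PID-7",
    value="19700230",
    rule="date must be a real calendar date (YYYYMMDD)",
    user_text="The patient's date of birth is not a valid date. "
              "Please correct it and send the record again.",
)

debug_text = rejection.diagnostics()  # for the log or the interface team
screen_text = rejection.user_text     # for the person at the keyboard
```

Keeping the two texts in one record means the user-facing message can never be sent without the detail a programmer needs to trace it.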

If you administer systems: it’s really good to be right on top of this and follow up every error, because they’re all serious - but my experience is that administrative teams are swamped under a stream of messages where the signal-to-noise ratio is low, and the real problems are beyond addressing anyway.