Interoperability and Safety: Testing your healthcare integration

Mar 4, 2014

John Moehrke has a new post up about the importance of testing:

Testing is critical, both early and often. Please learn from others’ failures. The Apple “goto fail” provides a chance to learn a testing governance lesson. It is just one in a long line of failures from which one can learn the following governance lesson. Learning these lessons is more than just knowing the right thing to do; it is also putting it into practice.

Starting with the Apple Goto Fail: I was personally astounded that this bug existed for so long. John notes that this is an open-source library, though I think of this as “published source” not “open source”. And btw, NSA, thanks for letting Apple know about the bug when you found out about it - I’d hate to think that you preferred for us all to be insecure…
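For anyone who hasn’t looked at the bug itself: the heart of it is one duplicated goto. Here’s a minimal compilable sketch of the pattern - my own reconstruction with stub functions, not Apple’s actual source:

```c
#include <stdio.h>

/* Stand-ins for the real hashing/verification calls (hypothetical). */
static int hash_update(void) { return 0; }   /* succeeds */
static int final_check(void) { return -1; }  /* this check would fail */

static int verify_signature(void)
{
    int err;
    if ((err = hash_update()) != 0)
        goto fail;
    if ((err = hash_update()) != 0)
        goto fail;
        goto fail;                  /* the duplicated line */
    if ((err = final_check()) != 0) /* never reached */
        goto fail;
fail:
    return err;
}

int main(void)
{
    /* Prints 0: "verified", even though final_check would have failed. */
    printf("verify_signature() = %d\n", verify_signature());
    return 0;
}
```

Since C pays no attention to the indentation, the second goto fail always executes, the final check is never reached, and err still holds zero from the last call that did run - so verification “succeeds”.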

Anyway, the key thing for me is: why wasn’t this tested? Surely such a key library, on which so much depends, is tested every which way until it’s guaranteed to be correct? Well, no - and it’s not the only security library that has problems, even very similar ones. Though it’s probably properly tested by Apple now, or soon will be (in fact, I figure that’s probably how they found the issue).

The interesting thing about this is how hard this bug is to test for - because it’s actually a bug in the interoperability code. Testing this code with a set of functional tests isn’t exactly hard; it just needs a custom-written test server against which to exercise all the myriad variations that must be covered. That is, it’s not hard, it’s just a huge amount of work - and it’s work that programmers hate doing, because it’s repetitive variation with little useful functionality, and it’s terribly hard to keep the details straight.
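To make that concrete: such a harness mostly reduces to a big table of ways the test server can misbehave, one expected outcome per row - something like this sketch (the variation names and the client_accepts stub are hypothetical stand-ins for driving a real client against a real test server):

```c
#include <stdio.h>
#include <string.h>

/* One row per handshake variation the custom test server would be
   configured to produce (all names here are hypothetical). */
typedef struct {
    const char *variation;   /* how the server misbehaves */
    int expect_accept;       /* should the client accept it? */
} test_case;

static const test_case cases[] = {
    { "valid-handshake",        1 },
    { "bad-signature",          0 },
    { "expired-certificate",    0 },
    { "wrong-hostname",         0 },
    { "truncated-key-exchange", 0 },
};

/* Stub: a real harness would point the client library at a test
   server emitting the named variation and report whether the
   connection was accepted. */
static int client_accepts(const char *variation)
{
    return strcmp(variation, "valid-handshake") == 0;
}

int main(void)
{
    int failures = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        if (client_accepts(cases[i].variation) != cases[i].expect_accept) {
            printf("FAIL: %s\n", cases[i].variation);
            failures++;
        }
    }
    printf("%d failure(s) in %zu cases\n", failures,
           sizeof cases / sizeof cases[0]);
    return failures != 0;
}
```

The loop is trivial; the work is in making the test server actually produce each variation, and in keeping the table complete and honest as the protocol evolves.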

Well, we can poke fun at Apple all we like, but the reality is that interfaces that integrate between different products are rarely tested in any meaningful sense. The only ongoing healthcare interoperability testing I know of that regularly happens in Australia is that the larger laboratory companies maintain suites of GP products so that they can check that their electronic reports appear correctly in them. Beyond that, I’ve not seen any testing at all (well, of course, we sometimes do acceptance testing when an interface is first installed).

Interface engines typically include features that allow the integration code - the transforms that run in the interface engine - to be tested, and these often are tested. But I’m not aware of any that provide framework support for testing the integration between two products that exchange information over the interface engine. Testing that kind of integration is really hard, because effective test cases are real-world test cases - they test real-world processes, and they need business-level support.
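To illustrate the gap, here’s roughly what the well-supported half looks like - a transform tested in isolation (the mapping and the codes are invented for the example):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* A hypothetical transform of the kind that runs in an interface
   engine: map the lab's local facility codes to the codes the
   receiving system expects. */
static const char *map_facility(const char *local_code)
{
    if (strcmp(local_code, "LAB-NORTH") == 0) return "N01";
    if (strcmp(local_code, "LAB-SOUTH") == 0) return "S01";
    return "UNK";
}

int main(void)
{
    /* Testing the transform in isolation is easy... */
    assert(strcmp(map_facility("LAB-NORTH"), "N01") == 0);
    assert(strcmp(map_facility("LAB-SOUTH"), "S01") == 0);
    assert(strcmp(map_facility("LAB-EAST"),  "UNK") == 0);
    puts("transform tests pass");
    /* ...but nothing here tells you whether the products on either
       side of the engine agree on what "N01" actually means. */
    return 0;
}
```

That part is easy and often done; what’s missing is anything that checks whether the products at both ends interpret “N01” the same way once the message has flowed through.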

And just as programmers have discovered, maintaining test cases is a lot of work - a lot of overhead. It’s the same for organizations maintaining business-level test cases. What I do see almost everywhere is that production systems contain test records (Donald Duck is a favourite name for these in Australia) that allow people to create test cases in the production system; most staff automatically recognise the standard test patients and ignore references to them in worklists, etc. Interestingly, the pcEHR has no such arrangement, and end-users find that a very difficult aspect of the pcEHR: how do they find out how things work? Sure, they can read the doco, but that doesn’t contain the detail they need. In practice, many users experiment with their own patient record, and I wonder how many of the 15,000 real clinical documents the system contains are actually test records against the author’s own patient record.
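If the conventional test identities were ever actually agreed and configured somewhere, the filtering that staff currently do by eye could be done by the system - a minimal sketch, with invented names:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: the conventional test identities that staff learn to
   recognise by eye, made explicit so the system can filter them. */
static const char *test_patients[] = {
    "DUCK^DONALD", "MOUSE^MICKEY", "TEST^PATIENT"
};

static int is_test_patient(const char *name)
{
    for (size_t i = 0; i < sizeof test_patients / sizeof test_patients[0]; i++)
        if (strcmp(name, test_patients[i]) == 0)
            return 1;
    return 0;
}

int main(void)
{
    const char *worklist[] = { "SMITH^JANE", "DUCK^DONALD", "JONES^BOB" };
    for (size_t i = 0; i < sizeof worklist / sizeof worklist[0]; i++)
        if (!is_test_patient(worklist[i]))
            printf("%s\n", worklist[i]); /* show only the real patients */
    return 0;
}
```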

HL7 v2 contains a field in which “test” messages can be flagged. It’s not intended for use with the test records I discussed in the previous paragraph, but to indicate that the message itself is intended for test purposes. The field is MSH-11 (Processing ID), and in v2.7 it has the following values:

D - Debugging
P - Production
T - Training
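For illustration, here’s a sketch of what honouring the field might look like on the receiving side (my own code; the routing decisions are invented, and the field counting assumes the default | separator):

```c
#include <stdio.h>

/* Find MSH-11 in a raw v2 message. MSH-1 is the field separator
   character itself, so MSH-11 is the text following the 10th '|'
   in the MSH segment. */
static char processing_id(const char *msh)
{
    int separators = 0;
    for (const char *p = msh; *p; p++)
        if (*p == '|' && ++separators == 10)
            return p[1]; /* first character of MSH-11 ('\0' if absent) */
    return '\0';
}

int main(void)
{
    const char *msh =
        "MSH|^~\\&|SENDER|LAB|RECEIVER|CLINIC|20140304||ORU^R01|00001|T|2.7";

    switch (processing_id(msh)) {
    case 'P': puts("production: process normally");            break;
    case 'T': puts("training: keep away from live worklists");  break;
    case 'D': puts("debugging: log and discard");               break;
    default:  puts("no/unknown processing ID");
    }
    return 0;
}
```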

I’ve never seen this field used - instead, the test system (in the few cases where one exists) thinks it’s the production system; it’s just at a different address, or maybe on an entirely physically separate network. So there’s no equivalent for this in CDA/FHIR.

So, real testing is hard. We’ll continue to get exposed by this, though sometimes it’s simply cheaper to pick up the pieces than to prevent the accidents. The degree to which that’s true depends on the level of human fall-back the systems have - but that’s gradually reducing, and it’s really hard to quantify. I don’t think we’re “safe” in this area at all.

What do you think? I’d welcome comments that describe actual testing regimes, or war stories about failures that happened due to lack of testing… I have more than a few myself.