Straw Poll: Choice of a CDA packaging specification

Jul 27, 2011

One of the more difficult issues facing the NEHTA CTI team is choosing a technical CDA packaging strategy. We need a coherent strategy because CDA documents have assorted attachments. These attachments can include things like:

  • Images
  • Alternative document format representations (such as pdf / rtf)
  • Digital signatures

The CDA specification itself says (section 3) that when a CDA document is transferred from one place to another, it must be possible to include all of its components in a single exchange package, even if the transfer crosses a firewall, without having to change any of the references in the CDA document. In practice, in NEHTA specifications, this leads to the following rules:

  1. All ED.references to simple images or data must be relative URLs
  2. The reference name must be a GUID followed by a standard file extension where one exists (e.g. jpg, gif, png)
  3. The document must be stored in the same location as its attachments
  4. When a document is moved from one location to another, its attachments must be moved with it
  5. If an image is too big to be moved around with the document, it cannot be part of the attested content, and other referencing methods must be used instead
  6. The digital signature(s) are part of the document package

The second rule (GUID names) exists for a different reason, discussed below. An example of rules 1 and 2:

      <value xsi:type="ED" mediaType="image/jpeg">
        <reference value="0E7AD252-9C55-4499-8F11-75B2F4F4E584.jpg"/>
      </value>
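For the programmers: rules 1, 2 and 4 can be checked mechanically. The Python sketch below is an illustration only, not a NEHTA tool. It lists the ED reference values in a document (the parts that must travel with it) and flags names that aren’t relative, GUID-based file names. The extension allow-list and all the function names are assumptions.

```python
import re
import uuid
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

HL7_NS = "urn:hl7-org:v3"
# Assumed allow-list of "standard" extensions; the real NEHTA list may differ.
KNOWN_EXTENSIONS = {"jpg", "jpeg", "gif", "png", "pdf", "rtf", "xml"}
# 36 characters of hex digits and hyphens, a dot, then an extension.
NAME_PATTERN = re.compile(r"([0-9A-Fa-f-]{36})\.([A-Za-z0-9]+)")


def new_part_name(extension: str) -> str:
    """Make a GUID-based part name, e.g. '0E7AD252-...-75B2F4F4E584.jpg' (rule 2)."""
    return f"{str(uuid.uuid4()).upper()}.{extension}"


def check_reference(value: str) -> list[str]:
    """Return the rule violations for one ED reference value (empty list = OK)."""
    problems = []
    parts = urlsplit(value)
    if parts.scheme or parts.netloc or value.startswith("/"):
        problems.append("rule 1: reference must be a relative URL")
    match = NAME_PATTERN.fullmatch(value)
    if not match:
        problems.append("rule 2: name must be a GUID plus a file extension")
        return problems
    try:
        uuid.UUID(match.group(1))  # rejects e.g. 36 hyphens
    except ValueError:
        problems.append("rule 2: name is not a well-formed GUID")
    if match.group(2).lower() not in KNOWN_EXTENSIONS:
        problems.append("rule 2: unrecognised file extension")
    return problems


def referenced_parts(cda_xml: str) -> list[str]:
    """List the attachment names a CDA document refers to (rule 4: move them with it).

    Assumes attachments are referenced via <reference value="..."/> elements in the
    HL7 v3 namespace; '#...' values point into the narrative block rather than at
    attachments, so they are skipped.
    """
    root = ET.fromstring(cda_xml)
    return [
        ref.get("value")
        for ref in root.iter(f"{{{HL7_NS}}}reference")
        if ref.get("value") and not ref.get("value").startswith("#")
    ]
```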

These rules are fine, and relatively uncontroversial once the full breadth of the NEHTA specifications is understood (for techie readers, what is above is the abstract syntax of a CDA package). What is controversial is the choice of technology for the package form (for techies, the concrete syntax).

Broadly speaking, we have the following choices:

Package

  1. Mime packages
  2. Zip archives

Metadata

  A. Implicit in part names (very lightweight)
  B. Custom light metadata
  C. The full IHE XDS metadata

1-A is what the v2 standard describes. 2-C is IHE XDM. 1+2-B / 1+2-C is what the DIRECT project is using (along with S/MIME).

We’ve been churning internally in NEHTA on what to choose – it’s not an obvious choice. And requirements don’t really seem to help. This is my list of requirements:

  • Easy / ubiquitous implementations
  • Acceptable to implementers & standards committees
  • Can determine from outside the package
  • Can be represented as base64 in an xml document for SMD
  • Can easily be inserted into an HL7 v2 message
  • Can contain multiple xml documents without having to remediate unique ids or escape xml tags
  • Doesn’t load up the structure with needless features that cost to implement and aren’t relevant to implementers
  • Not overly inefficient (speed over storage requirements)
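Two of these requirements (base64 inside an XML document for SMD, and insertion into an HL7 v2 message) mostly come down to encoding, whichever package format wins. A minimal Python sketch, assuming the package is already a byte string: the `Payload` element and its `mediaType` attribute are hypothetical names, not taken from the SMD specification, and the delimiter check assumes the default v2 encoding characters.

```python
import base64
import xml.etree.ElementTree as ET

# Default HL7 v2 encoding characters (field, component, repetition, escape,
# subcomponent) plus the segment terminator.
V2_RESERVED = set("|^~\\&\r")


def encode_package(package: bytes) -> str:
    """Base64-encode the raw package bytes (mime or zip makes no difference)."""
    return base64.b64encode(package).decode("ascii")


def wrap_in_xml(package: bytes, media_type: str) -> str:
    """Embed the package as base64 text inside a hypothetical <Payload> element."""
    payload = ET.Element("Payload", mediaType=media_type)
    payload.text = encode_package(package)
    return ET.tostring(payload, encoding="unicode")


def unwrap_from_xml(xml_text: str) -> bytes:
    """Recover the original package bytes from the XML wrapper."""
    return base64.b64decode(ET.fromstring(xml_text).text or "")


def is_v2_safe(text: str) -> bool:
    """True if the text contains none of the default v2 delimiters."""
    return not (set(text) & V2_RESERVED)
```

Because the base64 alphabet contains neither XML markup characters nor the default v2 delimiters, the encoded package needs no further escaping in either setting.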

For me, the single most important of these requirements is that whatever we choose has to be palatable when it comes forward to the IT-14 standards committees for approval. But committees are committees, and we get multiple different opinions from influential members, just as we do from implementers.

Hence this straw poll. After reading the analysis below, please vote for your preference in the comments. Along with your vote, please include your role. Are you a programmer? An analyst? A standards person? Something else?

Thanks

Mime vs Zip

Mime is the classic packaging format, used most of all in email (for attachments). It’s also used in some other internet protocols, including the MHTML web archives saved by Internet Explorer, and (sometimes) for SOAP attachments. It’s also used in the DIRECT project, which is widely regarded as the bee’s knees at the moment.
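To make this concrete, here is roughly what building and reading a mime package looks like with Python’s standard `email` package. It’s a sketch under assumptions: the `multipart/related` subtype, the use of a `Content-Location` header to carry each relative part name, and base64 for every part are choices for illustration, not something any NEHTA specification mandates.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart


def build_mime_package(parts: list[tuple[str, str, bytes]]) -> bytes:
    """Build a multipart/related package from (name, media type, content) tuples.

    The first tuple is assumed to be the CDA document itself.
    """
    package = MIMEMultipart("related")
    for name, media_type, data in parts:
        maintype, subtype = media_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)  # base64 keeps binary parts 7-bit safe
        # Assumed convention: Content-Location holds the relative name the CDA refers to
        part.add_header("Content-Location", name)
        package.attach(part)
    return package.as_bytes()


def read_mime_package(raw: bytes) -> dict[str, bytes]:
    """Parse a package back into {Content-Location: decoded content}."""
    message = message_from_bytes(raw)
    return {
        part["Content-Location"]: part.get_payload(decode=True)
        for part in message.get_payload()
    }
```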

Mime has problems though:

  • It’s common to encounter broken mime packages (any regular email user will have seen them)
  • Mime is very flexible – too flexible for this task, which only uses a subset of its functionality
  • The important development platforms (DotNet, Java, Delphi, 4G, foxpro – list based on my knowledge of what is being used in Australia for clinical systems) don’t have libraries that support mime.

Zip has a different set of pros and cons. It’s almost ubiquitous for transferring groups of files between people (though we’re all getting good at renaming file extensions). There are standard zip libraries on many development platforms, including the common languages listed above.

IHE XDM uses the .zip format, as does the docx file format. IHE chose it because of inconsistent support for mime packages, though I’ve also had problems with zip file consistency across libraries.

Zip has problems:

  • Zip is very flexible – too flexible for this task, which only uses a subset of its functionality
  • Zip doesn’t have any real name-value headers for simple metadata (if that matters, see below)
  • The only relevant standard using Zip is XDM – see metadata discussion below
  • Zip imposes a fixed compression/decompression cost on the software, even when compression isn’t justified (i.e. it’s not delegated to the hardware) (and I bet some readers thought I was going to call zip compression an advantage ;-))
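Building and reading a zip package with Python’s standard `zipfile` module looks like this. It’s illustrative only, and the function names are hypothetical. Python’s `zipfile` can also write entries uncompressed (`ZIP_STORED`), so the sketch makes compression a switch; whether every receiving library handles stored entries equally well is a separate question.

```python
import io
import zipfile


def build_zip_package(parts: dict[str, bytes], compress: bool = True) -> bytes:
    """Write each {part name: content} entry into an in-memory zip archive."""
    # ZIP_DEFLATED compresses each entry; ZIP_STORED writes it as-is.
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=method) as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buffer.getvalue()


def read_zip_package(raw: bytes) -> dict[str, bytes]:
    """Read every entry of a zip package back into a dict."""
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        return {info.filename: zf.read(info) for info in zf.infolist()}
```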

So the real question with zip is about metadata – and actually, the same is true of Mime. The choice of mime vs zip largely depends on the metadata choice (even though the two aren’t strictly technically linked).

Metadata

There are two kinds of metadata that might be in the package. The first kind is pretty much essential: which of the parts is the CDA document, and which is the digital signature. Those two parts independently point to all the other parts and give them meaning, so they need to be clearly identified somehow. One way to do this is to just fix the part names (i.e. cda.xml and digsig.xml or something like that). Alternatively, a light XML manifest file can identify them more explicitly (and it turns out that there is a custom-built zip profile with digital signature and content support, called OPC, which is used in Office documents and is an ISO spec, but it wasn’t a popular choice).
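The two approaches can coexist in a reader. The Python sketch below looks for a manifest first and falls back to fixed names. Everything in it is hypothetical: the fixed names follow the example above, and the `manifest.xml` layout (`<part role="…" name="…"/>`) is invented for illustration; it is neither OPC nor XDM.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixed names (option A), following the example in the text.
FIXED_NAMES = {"document": "cda.xml", "signature": "digsig.xml"}
# Hypothetical manifest (option B), e.g.
# <manifest><part role="document" name="..."/><part role="signature" name="..."/></manifest>
MANIFEST_NAME = "manifest.xml"


def locate_key_parts(parts: dict[str, bytes]) -> dict[str, str]:
    """Return {'document': name, 'signature': name} for a package's parts."""
    if MANIFEST_NAME in parts:
        root = ET.fromstring(parts[MANIFEST_NAME])
        found = {p.get("role"): p.get("name") for p in root.iter("part")}
    else:
        found = {role: name for role, name in FIXED_NAMES.items() if name in parts}
    missing = [role for role in FIXED_NAMES if role not in found]
    if missing:
        raise ValueError(f"package does not identify: {', '.join(missing)}")
    for role in FIXED_NAMES:
        # A manifest could name a part that isn't actually in the package
        if found[role] not in parts:
            raise ValueError(f"{role} part {found[role]!r} is not in the package")
    return {role: found[role] for role in FIXED_NAMES}
```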

The other kind of metadata reproduces information from inside the CDA document – patient, author, document type, etc. This is what IHE does in the XDS metadata. The problem is that this metadata can become quite extensive, and quite onerous to populate. Some of the obligatory metadata may not even be contained in the document (such as practiceSettingCode).

This creates problems with synchronisation – is it wrong for the metadata to disagree with the document? That might be legitimate because the patient’s details have changed between writing the document (and freezing it with a digital signature) and actually submitting the document. But it might also be a mistake – the fields differ in error. What do you do about that?
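Whichever policy is chosen, the mechanical part is easy: compare the fields that the metadata and the document share, and report the differences. In this Python sketch the field names (`patient_id`, `document_type`) are hypothetical rather than XDS attribute names, and only two CDA header items are read (the document `code` and the `recordTarget/patientRole/id` extension). Fields that exist only in the metadata are skipped, since there is nothing to compare them against.

```python
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}


def header_values(cda_xml: str) -> dict[str, str]:
    """Pull a couple of header values out of a CDA document for comparison."""
    root = ET.fromstring(cda_xml)
    values = {}
    code = root.find("hl7:code", NS)
    if code is not None:
        values["document_type"] = code.get("code", "")
    patient_id = root.find("hl7:recordTarget/hl7:patientRole/hl7:id", NS)
    if patient_id is not None:
        values["patient_id"] = patient_id.get("extension", "")
    return values


def find_discrepancies(cda_xml: str, metadata: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {field: (value in document, value in metadata)} for every mismatch."""
    in_document = header_values(cda_xml)
    return {
        field: (in_document[field], value)
        for field, value in metadata.items()
        if field in in_document and in_document[field] != value
    }
```

Deciding whether a non-empty result means “reject” or “accept, the patient’s details changed after signing” is exactly the open question; the code only surfaces it.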

IHE compounds this problem because the metadata is controlled by an affinity domain – which may differ for each XDS repository. Someone has to define the affinity domain. If we aren’t careful, we might end up with multiple affinity domains in Australia specifying different metadata, which would mean different metadata packages for different repositories – definitely a situation we want to avoid.

Finally, the XDS metadata format is frankly horrifying. CDA is a bit yuck, but that’s nothing compared to the metadata – it’s nearly the worst xml I’ve seen (nothing will catch up with XMI though in the worst XML stakes!).

XDM (Zip + XDS metadata) assumes an affinity domain (the IHE spec is confusing: it claims XDM doesn’t need one, but refers to XDS metadata that does). The DIRECT project defines an affinity domain based on the HITSP code sets fixed in C64 (which is US specific). XDM also has the interesting property of allowing multiple documents in the package (and, obviously, in the repository) as long as there are no name clashes. That is why one of the rules above is that all names must be GUIDs, so that packages may include multiple documents and their attachments if required.
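The value of GUID names is easy to see when two packages are combined. This Python sketch treats a package as a dictionary of part names to content (a simplification) and refuses to merge on any name clash rather than renaming parts, since renaming would mean editing the references inside already-signed documents. Both function names are hypothetical.

```python
import uuid


def guid_part_name(extension: str) -> str:
    """Hypothetical helper: a GUID-based part name such as '<GUID>.xml'."""
    return f"{str(uuid.uuid4()).upper()}.{extension}"


def merge_packages(*packages: dict[str, bytes]) -> dict[str, bytes]:
    """Combine several {part name: content} packages into one.

    Any duplicate name is an error: renaming a part would break the
    relative references inside the (signed) document that uses it.
    """
    merged: dict[str, bytes] = {}
    for package in packages:
        for name, data in package.items():
            if name in merged:
                raise ValueError(f"name clash on part {name!r}")
            merged[name] = data
    return merged
```

With fixed names such as `cda.xml`, the second document in a package would clash immediately; with GUID names a clash indicates a broken package rather than normal use.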

Making the base package use XDM is great for an XDS-based central repository provider and the gateways to it: the metadata they want is already populated (especially if the metadata includes things not in the document). But it’s quite onerous for all the perimeter end-user systems, which must populate it whether or not the document is going to a repository.

So, we can:

  • Just use parts with fixed names (this doesn’t scale into XDS – though we’re not sure whether this is an issue yet)
  • Define a simple XML form for a set of metadata that may or may not be the XDS metadata
  • Just bite the bullet and use the XDS metadata, and agree on an affinity domain for all of Australia (at least the parts that affect the metadata)

Confused? That’s why we’ve been churning.

We need to pick an approach very soon, and whatever we pick has to be acceptable to the relevant standards committees when it comes forward. ETP used OPC-Zip, but this was badly received, and it is probably not the best of the options described above.

What is the best choice? Vote in the comments. I will approve all non-abusive comments, anonymous or otherwise, though I’d prefer non-anonymous ones. Please indicate in your comment what type of interest you have (implementer, analyst, standards, etc). The results of this straw poll won’t be binding, but they will genuinely make a difference.

P.S. I understand that many of the interested parties aren’t allowed to comment on blogs, so I’ll also accept contributions by email at [email protected]