Question: v2 Encoding type field

Feb 19, 2014

Question:

As I understand it, field 18 of the MSH segment defines the encoding of the complete HL7 message. But when receiving an HL7 message, e.g. via TCP/IP, how can we know which encoding to use to interpret the received message, extract the MSH segment and analyse field 18 to find out which encoding is used?

Answer:

Yes, this is a tricky thing - you have to know what the encoding is in order to read the encoding. At least that’s how it seems at first glance. However, all the characters up until that point are ASCII, or safe to treat as ASCII. So this is how my parser works:

  • Check to see if the stream is XML (use the XML/character detection routines described in Appendix F of the XML specification)
  • Not XML? assume it’s vertical bars, and use equivalent logic to determine whether it’s 1, 2, or 4 byte encoding (first 3 characters are “MSH” or “FHS”)
  • Now that I know the number of bytes per character, read up to MSH-18 and read the encoding
  • Reset the stream to the start of the message, set up the right character encoding based on MSH-18, and reparse the message
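
As a rough illustration of the two middle steps, here is a minimal Python sketch. It is not the parser's actual code: the names `detect_provisional_codec` and `read_msh18` are made up for this example, it assumes the message arrives without a byte-order mark, and it only looks at field 18 of the first segment (so a message starting with FHS would need further handling to reach the MSH).

```python
# Provisional codecs to try; none of these writes a byte-order mark.
CANDIDATES = ("ascii", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
HEADERS = ("MSH", "FHS")


def detect_provisional_codec(data: bytes) -> str:
    """Guess a codec good enough to read the header from the first 3 characters."""
    for codec in CANDIDATES:
        if any(data.startswith(h.encode(codec)) for h in HEADERS):
            # 1 byte per character: decode as latin-1 so that any
            # non-ASCII byte maps to *some* character instead of failing.
            return "latin-1" if codec == "ascii" else codec
    raise ValueError("does not look like a vertical-bar HL7 v2 message")


def read_msh18(data: bytes) -> str:
    """Return the raw value of field 18 of the first segment, or "" if absent."""
    text = data.decode(detect_provisional_codec(data), errors="replace")
    segment = text.split("\r", 1)[0]    # segments end with a carriage return
    if len(segment) < 4:
        return ""
    separator = segment[3]              # MSH-1 is the field separator itself
    fields = segment.split(separator)   # fields[0] == "MSH", fields[n-1] == MSH-n
    return fields[17] if len(fields) > 17 else ""
```

The value returned is whatever the sender put in MSH-18 (for example `UNICODE UTF-8` or `8859/1`); mapping that to a codec name in your language's class library is a separate lookup.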

Resetting the stream and re-parsing is actually not necessary - given that you can read to MSH-18, it’s (almost) safe to keep what you’ve read and just do character conversion of any strings read subsequently. Why don’t I do that? Two reasons:

  • It’s vaguely possible to get non-ASCII characters in MSH fields 3-6, and it might somehow matter (I’ve never seen it myself, but my parser is used in all sorts of contexts I’m not aware of)
  • The standard way to do things in most languages is to use the stream -> character routines (InputStreamReader in Java, StreamReader in .NET, etc.). It’s easier, and I’ve found it simpler overall to maintain a very simple string-based processor that reads MSH-18, and then to reset and reparse using the standard class libraries - the performance hit is minimal
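
For what it's worth, the reset-and-reparse approach can be sketched in a few lines of Python using the standard `io.TextIOWrapper`. This is an illustration under stated assumptions rather than the parser itself: `peek_msh18`, `read_message` and the `CODECS` table are hypothetical names, the table covers only a handful of MSH-18 values, the stream must be seekable, and the peek assumes a 1 byte per character encoding with the first segment inside the first 4096 bytes.

```python
import io

# Partial, illustrative mapping from MSH-18 values to Python codec names.
CODECS = {
    "": "ascii",            # assumption: treat a missing MSH-18 as ASCII
    "ASCII": "ascii",
    "8859/1": "iso-8859-1",
    "UNICODE UTF-8": "utf-8",
}


def peek_msh18(stream) -> str:
    """Very simple string-based read of MSH-18 from the start of the stream."""
    head = stream.read(4096).decode("latin-1")  # latin-1 never fails to decode
    segment = head.split("\r", 1)[0]
    if len(segment) < 4:
        return ""
    fields = segment.split(segment[3])          # segment[3] is the field separator
    return fields[17] if len(fields) > 17 else ""


def read_message(stream) -> str:
    """Peek at MSH-18, rewind, then let the standard library do the decoding."""
    declared = peek_msh18(stream)
    encoding = CODECS[declared]  # KeyError for values outside the illustrative table
    stream.seek(0)               # reset the stream to the start of the message
    # newline="" leaves the carriage returns between segments untouched.
    return io.TextIOWrapper(stream, encoding=encoding, newline="").read()
```

Everything before the rewind is thrown away, so any non-ASCII characters in the earlier MSH fields are decoded correctly on the second pass.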

I think this is only an issue for general parsers that read messages without first knowing, by arrangement, what the character encoding is. XML works the same way, btw - you have to read the XML character stream in order to read what the encoding is.