How do I represent end of lines in text content in HL7 v2?

Jul 6, 2011

This is a fairly common question, because the HL7 v2 standard doesn’t explicitly describe how to do this. That’s because there are several options. But first, some background: an HL7 message consists of a series of “segments”, separated by ASCII character 13 (0x0D, carriage return). In practice you sometimes also see 0x0A, or more commonly the pair 0x0D 0x0A, though these are not conformant. The segments are broken up into fields, components, etc. by special syntax characters (usually |, ^, & etc.), and for each of these there is an escape sequence for representing the special character when it is actually part of the proper content of the field. (Note, btw, that some implementations don’t escape these characters in some types, such as IS, because of a lack of clarity in old versions of the v2 specification, but this is wrong.)
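To make that background concrete, here is a minimal Python sketch of segment splitting and delimiter escaping. It assumes the default delimiters (| ^ ~ \ &); a real parser reads the actual ones from MSH-1 and MSH-2. The function names are hypothetical, and the optional lenient mode simply tolerates the non-conformant terminators described above.

```python
import re

# Default HL7 v2 delimiters (an assumption: real messages declare them in MSH-1/MSH-2).
FIELD_SEP, COMPONENT_SEP, REPETITION_SEP, ESCAPE_CHAR, SUBCOMPONENT_SEP = "|", "^", "~", "\\", "&"

# Standard escape sequence for each delimiter character.
ESCAPES = {
    ESCAPE_CHAR: "\\E\\",
    FIELD_SEP: "\\F\\",
    COMPONENT_SEP: "\\S\\",
    SUBCOMPONENT_SEP: "\\T\\",
    REPETITION_SEP: "\\R\\",
}


def escape_text(value: str) -> str:
    """Replace each delimiter character with its escape sequence (hypothetical helper)."""
    # Mapping character by character means the backslashes we add are never re-escaped.
    return "".join(ESCAPES.get(ch, ch) for ch in value)


def split_segments(message: str, lenient: bool = False) -> list[str]:
    """Split a message into segments (hypothetical helper).

    Conformant messages use 0x0D alone; lenient mode also accepts the
    non-conformant 0x0A and 0x0D 0x0A terminators seen in practice.
    """
    parts = re.split(r"\r\n|\r|\n", message) if lenient else message.split("\r")
    return [seg for seg in parts if seg]  # drop the empty tail after the last terminator
```

Escaping happens per value before the value is placed between delimiters, so a receiver can split on the raw delimiter characters first and unescape afterwards.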

However, there’s no escape sequence for character 13 (0x0D). Why not?

The first part of the answer is that line breaks aren’t usually part of a proper field value, except when the content is narrative text (i.e. word processor content). So there’s no expectation that line break characters will appear in normal fields. Word processor text is represented by fields with one of two particular data types: TX and FT.

TX is plain text. Generally, TX fields repeat, and the expectation is that each line is sent as a separate repetition, with the repetition separator (usually the ~ character) between lines (see, for instance, section 2.A.78 of v2.6 chapter 2). However, not all TX fields repeat. In those cases the expectation really is that there won’t be multiple lines, but that’s just an expectation, not a rule. If you have to represent a line break in such a field, the way to do it is with the \X..\ sequence.
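A minimal sketch of both TX options follows. It assumes the default delimiters and picks CR LF (0D 0A) as the bytes sent in \X..\, which is a local-agreement choice rather than something the standard fixes; `encode_tx` and its `repeating` flag are hypothetical names, not part of any library.

```python
# Escape sequences for the default delimiters (assumed: | ^ ~ \ &).
ESCAPES = {"\\": "\\E\\", "|": "\\F\\", "^": "\\S\\", "&": "\\T\\", "~": "\\R\\"}
REPETITION_SEP = "~"
# Hex escape for CR LF; which bytes to send is a matter of local agreement.
HEX_LINE_BREAK = "\\X0D0A\\"


def escape_text(value: str) -> str:
    """Escape delimiter characters inside a single line of text."""
    return "".join(ESCAPES.get(ch, ch) for ch in value)


def encode_tx(text: str, repeating: bool = True) -> str:
    """Encode multi-line text as a TX field value (hypothetical helper).

    For a repeating TX field each line becomes one repetition; otherwise
    line breaks fall back to the \\X0D0A\\ hex escape.
    """
    lines = [escape_text(line) for line in text.splitlines()]
    separator = REPETITION_SEP if repeating else HEX_LINE_BREAK
    return separator.join(lines)
```

Each line is escaped before the lines are joined, so a literal ~ inside the text becomes \R\ and cannot be mistaken for a repetition separator.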

The \X..\ sequence introduces a series of bytes given in hexadecimal notation, such as \X0D0A\. The standard carefully defines these as “bytes”, not characters, and says that their interpretation is up to local agreement. But in my experience it’s pretty much universal practice that a short sequence made of some combination of the values 0D and 0A is interpreted as a line break. In fact, this sequence can be used in any ST data type as well, including e.g. names, but most receiving systems are likely to reject names like that, or otherwise behave badly.
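On the receiving side, a decoder along these lines reflects the common practice just described. It is only a sketch: decoding any non-line-break bytes as Latin-1 is an assumption standing in for whatever local agreement applies, `decode_hex_escapes` is a hypothetical name, and every other escape sequence is passed through untouched for a later unescaping step. It scans escape sequences left to right, so an escaped backslash (\E\) followed by the letters X0D is not misread as a hex escape.

```python
import re

# Any escape sequence: a backslash, a code without backslashes, a backslash.
ESCAPE_SEQUENCE = re.compile(r"\\([^\\]*)\\")
# An even number of hex digits, i.e. whole bytes.
HEX_BYTES = re.compile(r"(?:[0-9A-Fa-f]{2})*")


def decode_hex_escapes(value: str, encoding: str = "latin-1") -> str:
    """Expand \\X..\\ sequences, treating CR/LF bytes as line breaks (hypothetical helper)."""

    def expand(match) -> str:
        code = match.group(1)
        if not code.startswith("X") or not HEX_BYTES.fullmatch(code[1:]):
            return match.group(0)  # not a well-formed hex escape: leave it untouched
        raw = bytes.fromhex(code[1:])
        # Normalise CR LF, lone CR and lone LF to a single "\n" per break.
        raw = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
        # Remaining bytes use the locally agreed encoding (assumed Latin-1 here).
        return raw.decode(encoding)

    return ESCAPE_SEQUENCE.sub(expand, value)
```

Normalising to "\n" keeps \X0D0A0D0A\ as two breaks (a blank line) rather than collapsing it, which leaves the line-break versus paragraph-break question to the application.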

FT is formatted (“rich”) text, and behaves slightly differently. In FT, there’s a special escape sequence \.br\ which means “Begin new output line”. This isn’t actually the same thing as a line break; it’s really a terminal instruction from the grand old days of yore (v2 really is old), but in my experience it’s universally mapped to a line break now. I’ve used “line break” with its ambiguous meaning deliberately, btw: if you want to be sure which particular meaning is intended (a simple line break or a paragraph break), you’ll have to discuss that with each particular application.
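Here is a minimal Python sketch of FT handling that treats \.br\ as a plain line break, which, as noted, is the usual mapping rather than a guarantee. It assumes the default delimiters; `text_to_ft` and `ft_to_lines` are hypothetical names, and the other FT formatting commands are simply dropped.

```python
import re

BR = "\\.br\\"  # FT formatting command: begin new output line
ESCAPES = {"\\": "\\E\\", "|": "\\F\\", "^": "\\S\\", "&": "\\T\\", "~": "\\R\\"}
UNESCAPES = {"E": "\\", "F": "|", "S": "^", "T": "&", "R": "~"}
ESCAPE_SEQUENCE = re.compile(r"\\([^\\]*)\\")


def text_to_ft(text: str) -> str:
    """Encode multi-line text as an FT value, one \\.br\\ per line break."""
    return BR.join("".join(ESCAPES.get(ch, ch) for ch in line) for line in text.splitlines())


def ft_to_lines(value: str) -> list[str]:
    """Decode an FT value into lines, scanning escape sequences left to right."""
    lines = [""]
    pos = 0
    for match in ESCAPE_SEQUENCE.finditer(value):
        lines[-1] += value[pos:match.start()]  # literal text before the sequence
        code = match.group(1)
        if code == ".br":
            lines.append("")  # begin new output line
        elif code in UNESCAPES:
            lines[-1] += UNESCAPES[code]
        # Other commands (\.sp\, \.in\, \H\, \N\, ...) are ignored in this sketch.
        pos = match.end()
    lines[-1] += value[pos:]
    return lines
```

Decoding in a single left-to-right pass, rather than splitting on \.br\ first, matters: text that literally contains “\.br\” is sent as \E\.br\E\, and a naive split would find a spurious break inside it.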

(This is from a question first asked on an HL7 mailing list.)