v2 escaping question

Aug 2, 2011

When a v2 message is created, 5 characters are reserved as special separators in the syntax. The usual characters are:

  • | - field delimiter
  • ^ - component delimiter
  • & - subcomponent delimiter
  • ~ - repeat delimiter
  • \ - escape delimiter

The last 4 characters can be changed - the actual characters used are found in the first field of the first segment. The escape delimiter is used when the other characters are part of the actual content of a field. \F\ represents that a “|” is part of the character content, for instance, and \E\ represents that a “\” is part of the character content.

Note: This looks like really simple stuff: those 5 characters are replaced by the equivalent escape sequences when you write the message. But in practice, this is woefully done…
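As a rough illustration of the writing side, here is a minimal Python sketch. It assumes the default delimiters listed above and the standard v2 escape codes (\F\, \S\, \T\, \R\, \E\); the names ESCAPES and escape_v2 are made up for this example, and a real implementation would take the delimiters from the message itself.

```python
# Default v2 delimiters mapped to their escape sequences.
# Assumption: the message uses the usual characters | ^ & ~ \
ESCAPES = {
    "|": "\\F\\",   # field delimiter
    "^": "\\S\\",   # component delimiter
    "&": "\\T\\",   # subcomponent delimiter
    "~": "\\R\\",   # repeat delimiter
    "\\": "\\E\\",  # escape delimiter
}


def escape_v2(text: str) -> str:
    """Escape field content one character at a time.

    Working per character means the backslashes introduced by one
    replacement are never escaped a second time.
    """
    return "".join(ESCAPES.get(ch, ch) for ch in text)
```

With this sketch, escape_v2("a|b") gives a\F\b. Chaining str.replace calls instead only works if the escape delimiter is replaced first, which is exactly the kind of detail that gets missed.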

So the question is, if you escape a “\” as \E\, how is “\E\” represented? Each “\” becomes \E\ and the E in the middle is left alone, so the answer is

\E\E\E\

Nice huh? The real point of the question is how you parse this - and the answer is, you must parse left to right. Regexes are dangerous. Be warned.
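To make "left to right" concrete, here is a minimal Python sketch of the reading side. It assumes the default delimiters and handles only the five delimiter escapes; other sequences (hex \X..\, formatting commands like \.br\) are passed through untouched. The names UNESCAPES and unescape_v2 are hypothetical, not from any library.

```python
# Escape codes mapped back to the default v2 delimiters (assumed defaults).
UNESCAPES = {"F": "|", "S": "^", "T": "&", "R": "~", "E": "\\"}


def unescape_v2(text: str) -> str:
    """Decode escape sequences in a single left-to-right scan."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != "\\":
            out.append(text[i])
            i += 1
            continue
        # Found an opening escape delimiter: look for the closing one.
        end = text.find("\\", i + 1)
        if end == -1:
            # Unterminated sequence: keep the rest as-is.
            out.append(text[i:])
            break
        code = text[i + 1:end]
        # Unknown sequences are copied through verbatim.
        out.append(UNESCAPES.get(code, text[i:end + 1]))
        # Skip past the whole sequence so its closing delimiter
        # is never reused as the start of the next one.
        i = end + 1
    return "".join(out)
```

Fed \E\E\E\, this returns \E\. The case that trips up chained replacements is \E\F\E\, which is the content \F\ escaped: replacing \F\ first turns it into \E|E\, while the scan above gives back \F\.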

And don’t be too critical - this stuff is more than 20 years old. I’d like people to stop using v2, but so many people are remarkably reluctant!