v2 escaping question
Aug 2, 2011
When a v2 message is created, 5 characters are reserved as special separators in the syntax. The usual characters are:
- | - field delimiter
- ^ - component delimiter
- & - subcomponent delimiter
- ~ - repeat delimiter
- \ - escape delimiter
The last 4 characters can be changed - the actual characters used are found in the first field of the first segment. The escape delimiter is used when the other characters are part of the actual content of a field. \F\ represents that a "|" is part of the character content, for instance, and \E\ represents that a "\" is part of the character content.
Note: This looks like really simple stuff: those 5 characters are replaced by the equivalent escape sequences when you write the message. But in practice, this is woefully done…
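To make the writing side concrete, here is a minimal sketch in Python. It assumes the default delimiter set listed above; the names ESCAPES and escape_v2 are made up for illustration, and the \S\, \T\ and \R\ sequences come from the v2 standard rather than from this post. A real writer would take the delimiters from the message being built instead of hard-coding them.

```python
# Default v2 delimiters mapped to their escape sequences.
# Assumption: the message uses the default delimiter set.
ESCAPES = {
    "|": "\\F\\",   # field delimiter
    "^": "\\S\\",   # component delimiter
    "&": "\\T\\",   # subcomponent delimiter
    "~": "\\R\\",   # repeat delimiter
    "\\": "\\E\\",  # escape delimiter
}


def escape_v2(text: str) -> str:
    """Escape field content one character at a time.

    Each input character is looked at exactly once, so the backslashes
    produced by an earlier substitution are never escaped a second time.
    """
    return "".join(ESCAPES.get(ch, ch) for ch in text)
```

With this sketch, escape_v2("a|b") gives a\F\b. Chaining five global search-and-replace calls instead only works if the escape character is handled first, which is exactly the kind of thing that gets done wrong in practice.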
So the question is, if you escape a "\" as \E\, how is "\E\" represented? The answer is
\E\E\E\
Nice huh? The real point of the question is how you parse this - and the answer is, you must parse left to right. Regexes are dangerous. Be warned.
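Here is a rough sketch of what left-to-right parsing can look like, next to a naive replace-based version for comparison. It assumes the default delimiters; unescape_v2 and naive_unescape are hypothetical names, and the handling of unknown or unterminated sequences (passing them through unchanged) is a choice made for this sketch, not something the standard dictates.

```python
# Escape codes for the default delimiter set (an assumption of this sketch).
UNESCAPES = {"F": "|", "S": "^", "T": "&", "R": "~", "E": "\\"}


def unescape_v2(text: str) -> str:
    """Decode escape sequences in a single left-to-right pass."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        end = text.find("\\", i + 1)
        if end == -1:
            # Unterminated escape: keep the remainder literally.
            out.append(text[i:])
            break
        code = text[i + 1:end]
        # Unknown codes are passed through unchanged in this sketch.
        out.append(UNESCAPES.get(code, text[i:end + 1]))
        i = end + 1  # decoded output is never scanned again
    return "".join(out)


def naive_unescape(text: str) -> str:
    """Two global replaces: the second one re-reads the first one's output."""
    return text.replace("\\E\\", "\\").replace("\\F\\", "|")
```

Take the content \F\ (a literal backslash, F, backslash), which is written as \E\F\E\. The left-to-right version decodes it back to \F\. The naive version first turns it into \F\ and then "decodes" that into |, so the data is silently corrupted. Swapping the order of the two replaces does not help: it gives \E|E\ instead.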
And don’t be too critical - this stuff is more than 20 years old. I’d like people to stop using v2, but so many people are remarkably reluctant!