Transforming Pathology PIT reports into NEHTA Clinical Documents

Jun 11, 2012

This is the second in a series of posts dealing with representing pathology reports in NEHTA Discharge summaries. The first post dealt with transforming the TX format into CDA. This post deals with the PIT format. Follow-up posts will deal with the FT format, and then with the general context of representing pathology reports in clinical documents.

The PIT format is an old format for sending pathology reports to the requesting doctor. It’s thoroughly outdated, everybody dislikes it, and no one is supposed to send it anymore. So, of course, you run into it all over the place. The documentation can be found here.

Here’s a sample of the core of a report (just the bits I’m interested in for this post):

301 ~SBLK~FINAL REPORT~EBLK~
301 ------------
301      Prothrombin Time    ~SBLD~ 17~EBLD~  seconds
301      I.N.R.              ~SBLD~1.6~EBLD~  (International Normalised Ratio)
301 __________________________________________________________________
301 Cumulative INR Report       Date        INR        Req.No.
301                          08/06/2010     ~FG05~1.4~FG15~        254084
301                          14/06/2010     2.0        259680
301                          28/06/2010     2.5        272727
301                          05/07/2010     ~FG04~4.8~FG99~        279365
301                          12/07/2010     3.0        286035
301                          19/07/2010     1.6        292225~DFLT~
301
301  ÚÄÄÄINRÄÄÄÂÄÄÄÄÄÄÄÄÄCONDITIONÄÄÄÄÄÄÄÄÄÄÄÄÄÄÂÄÄLENGTH OF THERAPYÄÄ¿
301  ³ 2.0-3.0 ³ Atrial Fibrillation            ³    Long Term        ³
301  ³ 2.0-3.0 ³ Bioprosthetic Valve            ³    3 months         ³
301  ³ 2.0-3.0 ³ Acute Myocardial Infarction    ³ 3 months ( > if AF) ³
301  ³ 2.0-3.0 ³ Cardioembolic CVA, Rec. DVT/PE ³    Long Term        ³
301  ³         ³ Dilated Cardiomyopathy         ³                     ³
301  ³ 2.0-3.0 ³ Venous Thrombosis and PE       ³    3-6 months~EUND~       ³
301  ³ 2.5-3.5 ³ Mechanical Heart Valve         ³    ~SUND~Long Term~DFLT~        ³
301  ÀÄÄÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ
309

The full PIT source is available here. The first part of the line is a number which tells you what the line means. The rest is text with formatting commands identified by tilde characters. Here’s a summary of the formatting commands:

  • BGnn    Specify a background colour (see colour table below). The default colour is BG99, and the colour should always be reset at the end of the line
  • FGnn    Specify a text colour (see colour table below). The default colour is FG99, and the colour should always be reset at the end of the line
  • SBLD/EBLD    Start and end bolding
  • SUND/EUND    Start and end underlining
  • SBLK/EBLK    Start and end blinking - most targets do not support blinking
  • PIpp    Pitch control - meaning unclear
  • FOff    Meaning unknown
  • DFLT    Set everything to default

For further details, consult the PIT doco link above.

Converting PIT to CDA Narrative

In principle, the conversion process is relatively straightforward:

  1. Strip the leading number and spaces
  2. Replace the old table border characters if they appear
  3. Replace the formatting commands with CDA equivalents (see below)
  4. Wrap the content in a CDA paragraph with style xPre

In practice, several things complicate the conversion process:

  • PIT formatting commands are not always paired. (They’re supposed to be, but most systems will overlook such errors, so they’re often not detected by the sender)
  • the commands do not need to be well formed (i.e. start underlining, then start bold, then stop underlining, then stop bold)
  • the DFLT command resets everything
  • Limitations around the CDA handling of narrative and whitespace

Step #1: leading spaces

The first task is to strip the line prefix (a three-digit line code followed by a space):

~SBLK~FINAL REPORT~EBLK~
 ------------
 Prothrombin Time    ~SBLD~ 17~EBLD~  seconds
 I.N.R.              ~SBLD~1.6~EBLD~  (International Normalised Ratio)
 __________________________________________________________________
 Cumulative INR Report       Date        INR        Req.No.
 08/06/2010     ~FG05~1.4~FG15~        254084
 14/06/2010     2.0        259680
 28/06/2010     2.5        272727
 05/07/2010     ~FG04~4.8~FG99~        279365
 12/07/2010     3.0        286035
 19/07/2010     1.6        292225~DFLT~ 

 ÚÄÄÄINRÄÄÄÂÄÄÄÄÄÄÄÄÄCONDITIONÄÄÄÄÄÄÄÄÄÄÄÄÄÄÂÄÄLENGTH OF THERAPYÄÄ¿
 ³ 2.0-3.0 ³ Atrial Fibrillation            ³    Long Term        ³
 ³ 2.0-3.0 ³ Bioprosthetic Valve            ³    3 months         ³
 ³ 2.0-3.0 ³ Acute Myocardial Infarction    ³ 3 months ( > if AF) ³
 ³ 2.0-3.0 ³ Cardioembolic CVA, Rec. DVT/PE ³    Long Term        ³
 ³         ³ Dilated Cardiomyopathy         ³                     ³
 ³ 2.0-3.0 ³ Venous Thrombosis and PE       ³    3-6 months~EUND~       ³
 ³ 2.5-3.5 ³ Mechanical Heart Valve         ³    ~SUND~Long Term~DFLT~        ³
 ÀÄÄÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÁÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ
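As a rough sketch, this step could look like the following in Python. The function names are made up for illustration, and two things are assumptions rather than anything taken from the PIT documentation: that the prefix is always exactly three digits followed by at most one space, and that the report body is carried on the lines coded 301 (as in the sample above). Only the single separator space is removed, so any further leading spaces survive for xPre to preserve.

```python
import re

# Assumption: a three-digit line code, then an optional single space, then text.
PREFIX = re.compile(r"^(\d{3}) ?(.*)$")


def strip_prefix(line):
    """Return (code, text) for one PIT line, or None if it has no line code."""
    match = PREFIX.match(line.rstrip("\r\n"))
    if match is None:
        return None
    return match.group(1), match.group(2)


def report_text(lines, wanted="301"):
    """Join the text of every line whose code equals `wanted` (assumed 301)."""
    parts = []
    for line in lines:
        parsed = strip_prefix(line)
        if parsed is not None and parsed[0] == wanted:
            parts.append(parsed[1])
    return "\n".join(parts)
```

Lines with other codes (such as the closing 309 in the sample) are simply skipped here; a real converter would need to decide what to do with them.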

Step #2: table border characters

Some PIT reports - not all - include old text tables drawn with border characters from old Windows code pages (or even DOS!). Old, but then the systems sending PIT are old.

Here’s a lookup table for the old DOS OEM code page:

DOS Char Unicode Char
DA 250C
C4 2500
C2 252C
BF 2510
B3 2502
C0 2514
C1 2534
D9 2518

This table is missing some characters (for inner horizontal lines) - I’ll add them when I find out what they are (if I can).

After that, it looks like this in UTF-8. (I have no idea whether this will render in the blog for your browser, and I don’t know how to control how WordPress renders it):

~SBLK~FINAL REPORT~EBLK~
 ------------
 Prothrombin Time    ~SBLD~ 17~EBLD~  seconds
 I.N.R.              ~SBLD~1.6~EBLD~  (International Normalised Ratio)
 __________________________________________________________________
 Cumulative INR Report       Date        INR        Req.No.
 08/06/2010     ~FG05~1.4~FG15~        254084
 14/06/2010     2.0        259680
 28/06/2010     2.5        272727
 05/07/2010     ~FG04~4.8~FG99~        279365
 12/07/2010     3.0        286035
 19/07/2010     1.6        292225~DFLT~
 ┌───INR───┬─────────CONDITION──────────────┬──LENGTH OF THERAPY──┐
 │ 2.0-3.0 │ Atrial Fibrillation            │    Long Term        │
 │ 2.0-3.0 │ Bioprosthetic Valve            │    3 months         │
 │ 2.0-3.0 │ Acute Myocardial Infarction    │ 3 months ( > if AF) │
 │ 2.0-3.0 │ Cardioembolic CVA, Rec. DVT/PE │    Long Term        │
 │         │ Dilated Cardiomyopathy         │                     │
 │ 2.0-3.0 │ Venous Thrombosis and PE       │    3-6 months~EUND~       │
 │ 2.5-3.5 │ Mechanical Heart Valve         │    ~SUND~Long Term~DFLT~        │
 └─────────┴────────────────────────────────┴─────────────────────┘
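The lookup table in this step can be applied with a simple character translation. This is a sketch only: it assumes the report has already been decoded in a way that turns each DOS byte into the Latin-1 character with the same value (which is how the Ú, Ä and ³ characters in the earlier sample arise), and `replace_borders` is a hypothetical name. It covers only the eight characters listed in the lookup table.

```python
# DOS OEM byte value -> Unicode box-drawing character, copied from the
# lookup table in this step (which is known to be incomplete).
DOS_TO_UNICODE = {
    0xDA: "\u250C",  # top-left corner
    0xC4: "\u2500",  # horizontal line
    0xC2: "\u252C",  # tee pointing down
    0xBF: "\u2510",  # top-right corner
    0xB3: "\u2502",  # vertical line
    0xC0: "\u2514",  # bottom-left corner
    0xC1: "\u2534",  # tee pointing up
    0xD9: "\u2518",  # bottom-right corner
}


def replace_borders(text):
    """Swap mis-decoded DOS border characters for Unicode box-drawing ones."""
    # str.translate accepts a dict keyed by code point.
    return text.translate(DOS_TO_UNICODE)
```

One caveat with this approach: a genuine accented character such as À in a patient or doctor name would be replaced as well, so it is probably safest to apply it only to lines that look like table borders or rows.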

Step #3: replace the format tags

In principle, the mapping table goes like this:

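SBLD    <content styleCode="Bold">
EBLD    </content>
SUND    <content styleCode="Underline">
EUND    </content>
SBLK    <content styleCode="Bold Underline Italics xFgColour800000">
EBLK    </content>
FGnn    <content styleCode="xFgColourHHHHHH"> where the HHHHHH is taken from the table below
FG99    </content>
BGnn    <content styleCode="xBgColourHHHHHH"> where the HHHHHH is taken from the table below
BG99    </content>
PIpp    ignore
FOff    ignore
DFLT    </content>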

Comments:

  • There’s no support for blink in CDA/XHTML. Since the purpose of blink is to draw attention, we’ll have to settle for bold, underline, and italic, and Maroon (see note on use of colour below). That should make it stand out.
  • Because the control codes can be interlaced, the easiest way to manage the conversion process is to keep a set of flags for the various format options, start the paragraph with a <content> tag, just lay down a </content><content> (with the current style codes) each time the styling changes, and then finish with a </content>. Not pure, but this is only for presentation, after all.
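The flag-based approach described in the second comment might be sketched like this in Python. All names here are hypothetical, the command pattern assumes every command is exactly four characters between tildes, and only two colours are included to keep the example short. BGnn, PIpp and FOff are ignored. Unlike the sample output further down, this sketch only closes and reopens the content run when the effective style actually changes, and it escapes XML special characters in the text, which the post does not discuss.

```python
import re
from xml.sax.saxutils import escape

# Assumption: commands look like ~SBLD~, ~FG05~, ~DFLT~ (four characters).
COMMAND = re.compile(r"~([A-Z]{2}[A-Z0-9]{2})~")

# Two entries from the colour table, for illustration only.
COLOURS = {"04": "FF0000", "05": "FF00FF"}


def new_state():
    return {"bold": False, "underline": False, "blink": False, "fg": None}


def apply_command(state, cmd):
    """Update the formatting flags for one PIT command."""
    if cmd == "DFLT":
        state.update(new_state())  # DFLT resets everything
    elif cmd in ("SBLD", "EBLD"):
        state["bold"] = cmd == "SBLD"
    elif cmd in ("SUND", "EUND"):
        state["underline"] = cmd == "SUND"
    elif cmd in ("SBLK", "EBLK"):
        state["blink"] = cmd == "SBLK"
    elif cmd.startswith("FG"):
        state["fg"] = COLOURS.get(cmd[2:])  # FG99 and unknown numbers -> default


def style_code(state):
    """Build the styleCode value for the current flags ('' if none are set)."""
    codes = []
    if state["bold"] or state["blink"]:
        codes.append("Bold")
    if state["underline"] or state["blink"]:
        codes.append("Underline")
    if state["blink"]:  # blink is rendered as bold, underline, italic, maroon
        codes += ["Italics", "xFgColour800000"]
    elif state["fg"]:
        codes.append("xFgColour" + state["fg"])
    return " ".join(codes)


def open_tag(state):
    style = style_code(state)
    return '<content styleCode="%s">' % style if style else "<content>"


def convert(text):
    """Convert PIT text (line prefixes already stripped) to CDA content runs."""
    state = new_state()
    out = [open_tag(state)]
    pos = 0
    for match in COMMAND.finditer(text):
        out.append(escape(text[pos:match.start()]))
        pos = match.end()
        before = style_code(state)
        apply_command(state, match.group(1))
        if style_code(state) != before:
            out.append("</content>" + open_tag(state))
    out.append(escape(text[pos:]))
    out.append("</content>")
    return "".join(out)
```

Because the output is always a flat sequence of sibling content runs, unpaired or interlaced commands cannot produce badly nested XML: an unmatched EUND, for instance, just leaves the flags as they were.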

Colour table:

n  name  html name  code
00  Black  Black  000000
01  Blue  Blue  0000FF
02  Green  Green  008000
03  Cyan  Cyan  00FFFF
04  Red  Red  FF0000
05  Magenta  Magenta  FF00FF
06  Brown  Brown  A52A2A
07  Light Grey  Light Grey  D3D3D3
08  Dark Grey  Dark Grey  A9A9A9
09  Light Blue  Light Blue  ADD8E6
10  Light Green  Light Green  90EE90
11  Light Cyan  Light Cyan  E0FFFF
12  Light Red  Salmon  FA8072
13  Light Magenta  Violet  EE82EE
14  Yellow  Yellow  FFFF00
15  White  White  FFFFFF

99 is the default colour, which means white for background, and black for text. However there’s no reason from the PIT specification that the colours shouldn’t be entirely backwards as would suit an old console display (I heard of that once, but I don’t know if it’s still ever done - so check). Just to make the point, in this example, FG15 means default, not white. Don’t get caught doing white on white or black on black.
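One way to guard against invisible text is to treat any foreground colour that matches the page background as "default". The sketch below does that; the function name and the `page_background` parameter are invented for illustration, and the white default is an assumption about how the document will be rendered.

```python
# The colour table above, keyed by the two-digit PIT colour number.
PIT_COLOURS = {
    "00": "000000", "01": "0000FF", "02": "008000", "03": "00FFFF",
    "04": "FF0000", "05": "FF00FF", "06": "A52A2A", "07": "D3D3D3",
    "08": "A9A9A9", "09": "ADD8E6", "10": "90EE90", "11": "E0FFFF",
    "12": "FA8072", "13": "EE82EE", "14": "FFFF00", "15": "FFFFFF",
}


def foreground_style(number, page_background="FFFFFF"):
    """Return an xFgColour style code, or None to fall back to the default."""
    hex_code = PIT_COLOURS.get(number)  # "99" and unknown numbers give None
    if hex_code is None or hex_code == page_background:
        return None  # avoids e.g. white text on a white page
    return "xFgColour" + hex_code
```

With the assumed white page, `foreground_style("15")` returns None, which matches the way FG15 is treated as the default in the example report.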

That gives the following content:

<content> </content><content styleCode="Bold Underline Italics xFgColour800000">FINAL REPORT</content><content>
 ------------
 Prothrombin Time    </content><content styleCode="Bold"> 17</content><content>  seconds
 I.N.R.              </content><content styleCode="Bold">1.6</content><content>   (International Normalised Ratio)
 __________________________________________________________________
 Cumulative INR Report       Date        INR        Req.No.
 08/06/2010     </content><content styleCode="xFgColourFF00FF">1.4</content><content>        254084
 14/06/2010     2.0        259680
 28/06/2010     2.5        272727
 05/07/2010     </content><content styleCode="xFgColourFF0000">4.8</content><content>        279365
 12/07/2010     3.0        286035
 19/07/2010     1.6        292225</content><content>
 ┌───INR───┬─────────CONDITION──────────────┬──LENGTH OF THERAPY──┐
 │ 2.0-3.0 │ Atrial Fibrillation            │    Long Term        │
 │ 2.0-3.0 │ Bioprosthetic Valve            │    3 months         │
 │ 2.0-3.0 │ Acute Myocardial Infarction    │ 3 months ( > if AF) │
 │ 2.0-3.0 │ Cardioembolic CVA, Rec. DVT/PE │    Long Term        │
 │         │ Dilated Cardiomyopathy         │                     │
 │ 2.0-3.0 │ Venous Thrombosis and PE       │    3-6 months</content><content>       │
 │ 2.5-3.5 │ Mechanical Heart Valve         │    </content><content styleCode="Underline">Long Term</content><content>        │
 └─────────┴────────────────────────────────┴─────────────────────┘

Step #4: wrap into CDA Narrative

This bit is easy: just a paragraph with style xPre:

  <text>
   <paragraph styleCode="xPre">
 <content> </content><content styleCode="Bold Underline Italics xFgColour800000">FINAL REPORT</content><content>
 ------------
 Prothrombin Time    </content><content styleCode="Bold"> 17</content><content>  seconds
 I.N.R.              </content><content styleCode="Bold">1.6</content><content>   (International Normalised Ratio)
 __________________________________________________________________
 Cumulative INR Report       Date        INR        Req.No.
 08/06/2010     </content><content styleCode="xFgColourFF00FF">1.4</content><content>        254084
 14/06/2010     2.0        259680
 28/06/2010     2.5        272727
 05/07/2010     </content><content styleCode="xFgColourFF0000">4.8</content><content>        279365
 12/07/2010     3.0        286035
 19/07/2010     1.6        292225</content><content>
 ┌───INR───┬─────────CONDITION──────────────┬──LENGTH OF THERAPY──┐
 │ 2.0-3.0 │ Atrial Fibrillation            │    Long Term        │
 │ 2.0-3.0 │ Bioprosthetic Valve            │    3 months         │
 │ 2.0-3.0 │ Acute Myocardial Infarction    │ 3 months ( > if AF) │
 │ 2.0-3.0 │ Cardioembolic CVA, Rec. DVT/PE │    Long Term        │
 │         │ Dilated Cardiomyopathy         │                     │
 │ 2.0-3.0 │ Venous Thrombosis and PE       │    3-6 months</content><content>       │
 │ 2.5-3.5 │ Mechanical Heart Valve         │    </content><content styleCode="Underline">Long Term</content><content>        │
 └─────────┴────────────────────────────────┴─────────────────────┘
   </content></paragraph>
  </text>

So we claim that this is all one paragraph. Strictly, it’s wrong to claim that all this is one paragraph with line breaks, but in practice, it doesn’t matter. You could spend days refining an algorithm for recognising the end of a paragraph (it’s never as simple as a double end of line), and who’s going to care? This is only ever going to be for display.

Note that we have used xPre, for text containing whitespace and carriage returns which may not be ignored. It would be possible to use xFixed, and add <br/> tags for line breaks. But the problem with xFixed and <br/> is that leading spaces will be lost, and you can’t use &nbsp; to preserve them, since &nbsp; is not defined for CDA. So xPre it is, and the conversion process has to be careful with whitespace. (Of course, in this example, the whitespace probably isn’t that big a deal, but it serves to make the point).
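Since whitespace matters inside xPre, the wrapping code should not add any indentation or line breaks of its own around the content. A small sketch, with hypothetical function names, that wraps the converted runs and checks that the result is at least well-formed XML (it does not validate against the CDA schema, and it leaves out the CDA namespace):

```python
import xml.etree.ElementTree as ET


def wrap_narrative(content_runs):
    """Wrap already-escaped <content> runs in a CDA text/paragraph element."""
    # No extra whitespace is added: everything inside xPre is significant.
    return ('<text><paragraph styleCode="xPre">'
            + content_runs
            + "</paragraph></text>")


def check_well_formed(xml_text):
    """Parse the fragment; raises ET.ParseError if it is not well-formed."""
    return ET.fromstring(xml_text)
```

Parsing the output back like this is a cheap way to catch an unescaped < or & in the report text, or a content run that was never closed.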

Conclusion

So there you go - that’s how to convert the PIT formatted report into a NEHTA CDA document without losing any formatting. Two things to note:

  • I’ve not dealt with the issue of line wrapping here - that’s just hard whether you’re displaying the PIT directly or whether it’s being displayed through CDA
  • these extra style codes starting with x are all defined in the NEHTA stylesheet, and aren’t in the normal HL7 stylesheet. Ensure that the documents will only be sent to NEHTA-conformant rendering systems for safety here.