Reader vs Writer

Jun 19, 2011

There’s a trade-off between reader and writer in most interoperability specifications, though it’s not always widely appreciated. What suits the reader is sometimes the exact opposite of what suits the writer. As an example, let’s consider exchanging a patient address. Generally speaking, in English-speaking countries, an address has two lines of text, a city, a postcode (usually), a country, and sometimes a state or province. Around the world, some countries do not have postcodes, some have states or provinces, and the amount of structure in the other lines and the usual order of the parts vary wildly.

There are a variety of approaches to handling addresses across different systems. Some systems simply treat the address as four lines of text; many systems pull out the city, state and postcode separately, either to validate them or to allow their use for statistical analysis. Such systems usually need a field for country too, though they do not always have one. Recently, postal services have started focusing on improved data handling to support high-volume automated postal delivery, and different countries have created different structured definitions for the street and delivery details. Given the amount of mail that healthcare organizations send, there has been strong interest in adopting these solutions.

Now, consider the general concept of defining a format so that a variety of systems can exchange addresses with each other. The first question is, what are we trying to achieve? Briefly, there are five main reasons for exchanging addresses:

  1. To allow the address to be used to help identify persons
  2. So physical mail can be sent to the person/organization associated with the address
  3. So physical mail can be sent to the person/organization associated with the address using high volume mailing methods
  4. To allow a correct billing address to be used as part of a financial exchange
  5. To allow for address-based data analysis (usually by postcode, though some public-health monitoring applications do more fine-grained analysis; GPS coordinates would be more useful for this, but they are mostly not available)

Here are a few example choices for how we could represent an address:

  1. As plain text with line breaks. Suitable for uses #1 and #2, maybe #4 (depending on the financial standard)
  2. Two lines of text, plus city, state, postcode and country. Suitable for uses #1, #2, #4 and #5
  3. A data element for each known part. Suitable for all uses
  4. A series of parts, each with a type code. Suitable for all uses
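To make the four options concrete, here is a minimal Python sketch of each structure as a data type. The class names, field names and part type codes are hypothetical, chosen only for illustration; they are not taken from any particular standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Structure #1: plain text with line breaks.
@dataclass
class TextAddress:
    text: str  # e.g. "12 High St\nSpringfield 3000\nAustralia"

# Structure #2: two lines of text, plus city, state, postcode and country.
@dataclass
class LinesAddress:
    line1: str
    line2: str = ""
    city: str = ""
    state: Optional[str] = None     # not every country has states
    postcode: Optional[str] = None  # not every country has postcodes
    country: str = ""

# Structure #3: one data element per known part (hypothetical field set).
@dataclass
class ElementAddress:
    unit: Optional[str] = None
    street_number: Optional[str] = None
    street_name: Optional[str] = None
    street_type: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    postcode: Optional[str] = None
    country: Optional[str] = None

# Structure #4: an ordered series of parts, each tagged with a type code.
@dataclass
class AddressPart:
    type_code: str  # hypothetical codes such as "streetName" or "city"
    value: str

@dataclass
class PartsAddress:
    parts: List[AddressPart] = field(default_factory=list)  # order is significant
```

Note the difference between #3 and #4: the fixed fields of #3 impose their own order, while the list in #4 preserves whatever order the writer used, which is why reading #4 into a #3-style store can lose information.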

Each of these approaches has advantages and disadvantages for the different use cases, but that is not the focus of this post. Now, imagine that we have three different systems:

  • System A: stores an address as 4 lines of plain text
  • System B: stores an address as 2 lines of text, with city, state and postcode
  • System C: stores an address as a set of data elements

This table summarizes how easy it is for each system to be the reader and writer for each structure:

  Writing  | Structure #1 | Structure #2 | Structure #3 | Structure #4
  System A | Easy: straight in | Very hard: try to parse city/state/postcode | Too hard? | Too hard?
  System B | Easy: convert to text | Easy: straight in | Very hard: try to parse lines of text? | Very hard: try to parse lines of text?
  System C | Easy: convert to text | Easy: convert parts to text | Easy: just fill out the data elements (as long as the parts match) | Easy: just create parts

  Reading  | Structure #1 | Structure #2 | Structure #3 | Structure #4
  System A | Easy: straight in | Easy: convert to text | Easy: convert to text | Easy: convert to text
  System B | Very hard: try to parse city/state/postcode? | Easy: straight in | Easy: read parts and build first two lines | Easy: read parts and build first two lines
  System C | Too hard | Very hard: try to parse lines of text? | Read elements into parts | Read elements into parts (order is lost)

The effects of entropy are clearly visible in this table: introducing structure is far harder than removing it. A more structured model is easier to read and harder to write. Likewise, more choices about how things are written make it easier to write and harder to read. The key point of this table is that the different systems have different trade-offs for whether reading or writing addresses is easy.
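The asymmetry can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `StructuredAddress` class is a hypothetical simplification of structure #2, and the parser assumes the second line always looks like "CITY STATE POSTCODE" with a 4 or 5 digit postcode, a convention that only holds in some countries. Removing structure is a simple join; putting it back requires guessing.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class StructuredAddress:
    # Hypothetical, simplified structure #2 address.
    line1: str
    city: str
    state: Optional[str]
    postcode: Optional[str]
    country: str

def to_text(addr: StructuredAddress) -> str:
    """Removing structure: join the parts, skipping any that are missing."""
    last = " ".join(p for p in (addr.city, addr.state, addr.postcode) if p)
    return "\n".join([addr.line1, last, addr.country])

# Assumed convention: "CITY STATE POSTCODE", e.g. "Port Melbourne VIC 3207".
LAST_LINE = re.compile(r"^(?P<city>.+?) (?P<state>[A-Z]{2,3}) (?P<postcode>\d{4,5})$")

def from_text(text: str) -> Optional[StructuredAddress]:
    """Introducing structure: guess where the parts are; returns None on failure."""
    lines = text.split("\n")
    if len(lines) != 3:
        return None
    m = LAST_LINE.match(lines[1])
    if m is None:
        return None  # cannot recover city/state/postcode from this line
    return StructuredAddress(lines[0], m["city"], m["state"], m["postcode"], lines[2])
```

An Australian-style address survives the round trip, but a UK-style address (no state, alphanumeric postcode) converts to text just as easily and then cannot be parsed back:

```python
au = StructuredAddress("12 Example St", "Port Melbourne", "VIC", "3207", "Australia")
uk = StructuredAddress("10 Example Road", "London", None, "SW1A 1AA", "United Kingdom")
print(from_text(to_text(au)) == au)  # True
print(from_text(to_text(uk)))        # None
```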

If you sit representatives of these different systems down to hammer out a format for exchanging data, they’re each going to want something different, and who wins will be as much about personalities and influence as it is about the right choice (see the 1st and 2nd laws of interoperability).