Question: Intervals and Boundary Imprecision

Nov 20, 2011

Introduction

This page addresses a long-standing issue in the HL7 v3/CDA community about the impact of imprecision on boundaries on the meaning of an interval. Specifically, if an interval is given as from 20100404 to 20100406, is 10:30 am on 6-Apr 2010 in the interval or not?

Some people claim it should be: that 201004061030 is “in” the value of 20100406, and as long as 20100406 is in the boundary, so is 201004061030. Other people claim that no, although the boundary does have imprecision, it has to be ignored when determining what values are in the set specified by the interval.

Executive Summary: The answer is the second - imprecision is not considered on the boundaries of intervals, and 201004061030 is not in the interval from 20100404 to 20100406.

This page explains the reasoning in some detail, and clarifies some apparent ambiguity in the specifications. This discussion applies equally to R1 and R2. In addition, this page documents a discovered issue in the R2 abstract specification which will probably result in a technical correction by HL7.

Background

The type IVL<T> is defined in the V3 Abstract data types as a specialization of QSET<T>, where T can be any kind of quantity. The two kinds of quantities normally encountered in the real world are PQ (physical quantity - a floating point value with a coded unit) and TS (timestamp - an instant in time with specified imprecision).

A QSET is some specification of an ordered set of values that specifies which values are in the set, and which are outside the set. One simple way to specify a QSET is to specify it as a simple interval - all the values between [low] and [high] are included in the set, and values outside that range are not included.

IVL has properties other than low and high. The properties lowClosed and highClosed specify whether the boundaries themselves are actually included in the set of values that are in the interval. For instance, you can specify that the interval includes all the values from 2 to 5, **but not including 5**. Of course, if the interval can only contain integers, that's not tremendously useful - it's not different from the interval from 2 to 4. But if the type that the interval is describing has a continuous distribution range - floating point numbers and times - then this is useful and important.

The Abstract data types specification also describes a literal form, which is a textual presentation of the interval. Multiple literal forms are defined; in this discussion we mainly use the first, simple form: the interval form using square brackets, e.g., “[3.5; 5.5[”, where the direction of each square bracket denotes whether that boundary is closed. Pointing in means closed, pointing out means not closed. That is, [3.5; 5.5[ means all the numbers from 3.5 to 5.5, not including 5.5 itself. We also use the hull form (discussed later). Note: The rest of the details of IVL are not explored further here. The rest of this discussion assumes that the features and usage of IVL are relatively well understood by the reader. See “Where can I get information about the datatypes?”
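To make the bracket convention concrete, here is a minimal Python sketch that reads the simple interval literal form and tests membership. The names `Interval` and `parse_interval` are hypothetical helpers invented for this illustration (they are not part of any HL7 library), and the parser assumes both boundaries are present and numeric.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Interval:
    """Hypothetical stand-in for a numeric IVL; not an HL7 API."""
    low: Decimal
    high: Decimal
    low_closed: bool
    high_closed: bool

    def contains(self, e: Decimal) -> bool:
        # A closed boundary is itself a member; an open boundary is not.
        above_low = self.low <= e if self.low_closed else self.low < e
        below_high = e <= self.high if self.high_closed else e < self.high
        return above_low and below_high


def parse_interval(literal: str) -> Interval:
    """Parse the simple bracket form, e.g. "[3.5; 5.5[" (both bounds assumed given)."""
    text = literal.strip()
    opening, closing = text[0], text[-1]
    if opening not in "[]" or closing not in "[]":
        raise ValueError(f"not an interval literal: {literal!r}")
    low_text, high_text = text[1:-1].split(";")
    return Interval(
        low=Decimal(low_text.strip()),
        high=Decimal(high_text.strip()),
        low_closed=(opening == "["),   # "[" points in: low is included
        high_closed=(closing == "]"),  # "]" points in: high is included
    )


ivl = parse_interval("[3.5; 5.5[")
print(ivl.contains(Decimal("3.5")))  # True: the low boundary is closed
print(ivl.contains(Decimal("5.5")))  # False: the high boundary is open
```

`Decimal` is used rather than `float` so that the written digits are kept exactly; that choice is a convenience of this sketch, not something the specification prescribes.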

Discussion

Although IVL<INT> and IVL<REAL> are not often encountered in real world usage, they are the easiest place to start the discussion.

IVL<INT>

The simplest case is an interval of integers. The meaning of [3; 5] is very clear: the numbers 3, 4 and 5 are included in the interval. There is no question of the imprecision of the boundary, since integers are discretely separated from each other.

In the abstract specification, formal invariants are used to establish meaning - they are the master definition of meaning. The meaning of the boundary of an interval is defined this way. For the simple case of integer, we’ll illustrate how this works, since we’ll be relying on these invariants later.

invariant(IVL<T> x; T e) where x.nonNull.and(x.contains(e)) { x.low.lessOrEqual(e); x.low.nullFlavor.implies(NullFlavor.PINF).not; }; 

Note: In this discussion we’ll focus exclusively on the low boundary; the exact argument applies to the high boundary (the invariant chain is simpler for the low boundary).

This invariant says that if the interval is not null, then it contains a value e only if low is less than or equal to e. If the interval is null - well, we make no rules. Note that we haven’t said that a non-null interval must have a non-null low property - only that if low is null, we cannot know whether the interval contains any particular value: since x.low.lessOrEqual(e) cannot be true for any value of e, neither can x.contains(e) (though we may be able to establish on other grounds, e.g. the high boundary, that the interval does not contain e). Note: the invariant says that if x contains e, then x.low <= e. It doesn’t say that if x.low <= e, then x contains e - it’s important to keep track of what implies what.

The meaning of lessOrEqual for integer is defined on QTY:

invariant (QTY x, y, z)
    where x.nonNull.and(y.nonNull).and(z.nonNull) {
  x.lessOrEqual(x);                        /* reflexive */
  x.equal(y).not.implies(x.lessOrEqual(y)
      .implies(y.lessOrEqual(x)).not);     /* asymmetric */
  x.lessOrEqual(y).and(y.lessOrEqual(z))
      .implies(x.lessOrEqual(z));          /* transitive */
};

The lessOrEqual operation must be reflexive, asymmetric (antisymmetric, in the usual mathematical terminology), and transitive (follow the links from this page on wikipedia for reasoning). This invariant doesn’t define how you determine what <= is (that’s done in text), but it does define how it behaves, and therefore what it means. The most interesting part for the rest of this discussion is the second one: if x != y, then x <= y implies that y <= x is not true. Note that we use implies. If x = y, we say nothing here (that’s said elsewhere). If x != y and not (x <= y), then we don’t say whether y <= x - why? Because x and y may not be “comparable”. Obviously integers, reals, etc., always are, but you can’t talk about whether 12 g is less than 14 m or not. However, if we can compare them, and x != y and x <= y, then it must also be true that y <= x is false.

So, in an interval of [3; 5], 3 is in the interval, because 3 <= 3, but 2 is not in the interval because 3 <= 2 is not true.
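The behaviour that the QTY invariant demands of lessOrEqual can be sketched in Python. This is only an illustration under stated assumptions: a quantity is modelled as a hypothetical `(value, unit)` tuple, two quantities are treated as comparable only when their unit strings are identical (a simplification of real unit handling), and `None` stands in for a null result.

```python
from itertools import product
from typing import Optional, Tuple

Quantity = Tuple[float, str]  # (value, unit); a hypothetical stand-in for a quantity


def less_or_equal(x: Quantity, y: Quantity) -> Optional[bool]:
    """Return None when the units differ, i.e. x and y are not comparable."""
    if x[1] != y[1]:
        return None
    return x[0] <= y[0]


samples = [(12, "g"), (14, "g"), (14, "m"), (20, "m")]

for x, y, z in product(samples, repeat=3):
    # Reflexive: every value is lessOrEqual to itself.
    assert less_or_equal(x, x) is True
    # Asymmetric: for x != y, x <= y rules out y <= x.
    if x != y and less_or_equal(x, y):
        assert less_or_equal(y, x) is not True
    # Transitive: x <= y and y <= z together give x <= z.
    if less_or_equal(x, y) and less_or_equal(y, z):
        assert less_or_equal(x, z) is True

print(less_or_equal((12, "g"), (14, "g")))  # True
print(less_or_equal((12, "g"), (14, "m")))  # None: not comparable
```

Note that the asymmetry check only fires when the comparison actually yields true, which mirrors the use of implies in the invariant: nothing is asserted about pairs that cannot be compared.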

Well, wow, you say, that bit about invariants was a waste of time. And for integer, it pretty much was - they’re simple beasts. But don’t skip it - we’ll be coming back to these below, and then they will start to become useful.

IVL<REAL>

An interval of real introduces two new considerations:

  • Unlike integers, which are discrete (you can always tell them apart), real numbers do not behave like this. What’s the next value after 4? This has no answer.
  • In addition, real numbers have a precision, which specifies the number of significant digits to which the actual value is represented. The inherent notion of precision is that the actual value may differ slightly from the represented value beyond the specified precision.

Operations and precision

Given that real numbers have precision, what impact does this have on operations? In mathematical operations, the precisions of the operands are combined. In multiplication/division, the precision of the outcome is generally the lower of the two precisions. For instance, 4.0 * 2.000 is 8.0, not 8.000. With addition, it’s more complicated: 4.0 + 0.200000 is 4.2, not 4.200000. But what is 4.0 + 0.000001? Intuitively, it’s 4.0, so that x + y = x… so actually, the precision isn’t part of the answer: 4.0 + 0.000001 is 4.000001 but the precision is still 2. (todo: follow up on this)

What about comparison? is 4.0 = 4.0000? Clearly, as stated, these numbers are different in intent. 4.0 represents an implicit boundary from 3.95 to 4.05, while 4.0000 represents an implicit boundary from 3.99995 to 4.00005. But are they equal? Well, the specification says:

Two nonNull REAL are equal if they have the same value and precision.

This text was added as part of defining equality unambiguously for all data types (wiki page with discussion).

Firstly, a clarification: the correct inference from the rule “Two nonNull REAL are equal if they have the same value and precision” is that

 (4.0).equals(4.000).isNull

That was certainly my intent when I wrote that rule, but it didn’t get stated.

But is this notion that REAL values with different precision are not equal actually right?

Unfortunately, no.

Let’s start with an invariant associated with isComparableTo:

invariant (QTY x, y, z) where x.nonNull.and(y.nonNull) { x.isComparableTo(y).equal(x.lessOrEqual(y).or(y.lessOrEqual(x))); };

So if x and y can be compared, then they must be equal, or one must be less than the other. Therefore 4.0 and 4.0000 are either comparable and equal, or not comparable. And note that this invariant uses equal, not implies, so it follows that if x < y, then x.isComparableTo(y) is true. So if 4.0 != 4.0000 because the precisions differ, then (3.8 < 4.00).not, since values with different precisions cannot be compared - but no, 3.8 is definitely less than 4.00. Clearly there’s a tension here, and either one of those invariants is wrong, or the rule that REAL values must have the same precision to be equal is wrong.

To add to this, when we go back to the invariants for QSET, we have this:

invariant(QSET<T> s) where s.nonNull { forall(QTY x, y) where s.contains(x).and(s.contains(y)) { x.isComparableTo(y); }; };

This is relatively simple, and perfectly reasonable: all members of a nonNull QSET must be comparable. You can’t have a valid QSET that contains 5 m and 4 g. It doesn’t make sense, and it’s not on. So, if 4.0 != 4.0000, then an interval [3.5; 5.5] cannot contain the value 4.00. But it obviously does and must. So the inevitable conclusion is that 4.0 = 4.0000, and that precision cannot be a factor in testing the equality of REAL values - and therefore the rule is wrong.

Note: we could alternately claim that the correct interpretation of equality for a REAL is to consider precision, and to say that 4.0 implies an implicit interval of 3.95 to 4.05, and that the implicit interval implied by 4.0000 is clearly within that boundary, so clearly 4.0000 is equal to 4.0. The problem is that under this scheme, 4.0 is not equal to 4.0000, since 4.0 implies possible values outside the boundaries of the interval implied by 4.0000. And equality must be symmetric (follow the links from this page on wikipedia for reasoning). So this can’t be the answer (though an equivalent of “implies” would be a logical addition to REAL, because (4.0).implies(4.0000) and (4.0000).implies(4.0).not, and this is perfectly sensible).
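The asymmetry of that “implies” relation, and the value-only equality argued for here, can be illustrated with Python's `decimal` module. This is a sketch only: `implied_range` and `implies` are hypothetical helpers, and the assumption that a literal implies half a unit of its last written digit on either side is taken from the 3.95 to 4.05 example above.

```python
from decimal import Decimal
from typing import Tuple


def implied_range(literal: str) -> Tuple[Decimal, Decimal]:
    """Range implied by the written precision: half a unit of the last digit each side."""
    value = Decimal(literal)
    exponent = value.as_tuple().exponent       # "4.0" -> -1, "4.0000" -> -4
    half_unit = Decimal(1).scaleb(exponent) / 2
    return value - half_unit, value + half_unit


def implies(a: str, b: str) -> bool:
    """True when the range implied by b lies within the range implied by a."""
    a_low, a_high = implied_range(a)
    b_low, b_high = implied_range(b)
    return a_low <= b_low and b_high <= a_high


print(implied_range("4.0"))                 # (Decimal('3.95'), Decimal('4.05'))
print(implied_range("4.0000"))              # (Decimal('3.99995'), Decimal('4.00005'))
print(implies("4.0", "4.0000"))             # True
print(implies("4.0000", "4.0"))             # False: not symmetric, so not an equality
print(Decimal("4.0") == Decimal("4.0000"))  # True: Decimal compares by value only
```

Python's `Decimal` happens to compare by value irrespective of the number of written digits, which is the behaviour this page argues REAL equality must have; the library is used here only as an analogy.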

This will be brought to HL7 as a technical correction to the R2 specification, to wit, that the equality rule should say: “Two nonNull REAL are equal if they have the same value irrespective of precision”. (Some additional example and discussion material should also be added)

Having established that precision cannot count for equality, it’s a straightforward conclusion that it can’t count on the border of an interval either. Given the rule:

invariant(IVL<T> x; T e) where x.nonNull.and(x.contains(e)) { x.low.lessOrEqual(e); x.low.nullFlavor.implies(NullFlavor.PINF).not; };

Value e can only be in interval x if x.low is less than or equal to e. For example, if low is 3.0, then 2.99995 < 3.0, so 2.99995 is not in the interval. We can say this with confidence because if the comparison of e and low cannot be null just because they are equal with different precisions, then the comparison cannot be null if their values are close with different precisions. (And even if the comparison were null, all we could say is “we don’t know whether they are in the interval”.)

So, the interval [3.5;5.5[ does contain the values 4, 4.0, 4.0000000000000000000000000, 3.5, 3.5000000, 3.500000000000000001, 5.49, and 5.49999999999999999999999999999999999, but not the values 3.49999999999999999999, 5.51 or 5.50000000000000000000000.
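The list above can be checked mechanically. The following Python sketch compares by value only, using `Decimal` so that every written digit is kept exactly; `in_interval` is a hypothetical helper for this one interval, not an API from any HL7 toolkit.

```python
from decimal import Decimal

low, high = Decimal("3.5"), Decimal("5.5")  # the interval [3.5;5.5[


def in_interval(literal: str) -> bool:
    # Low is closed, high is open; only the value of e matters, not its precision.
    return low <= Decimal(literal) < high


inside = ["4", "4.0", "4.0000000000000000000000000", "3.5", "3.5000000",
          "3.500000000000000001", "5.49", "5.49999999999999999999999999999999999"]
outside = ["3.49999999999999999999", "5.51", "5.50000000000000000000000"]

print(all(in_interval(v) for v in inside))   # True
print(any(in_interval(v) for v in outside))  # False
```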

IVL<PQ>

The situation is the same for IVL<PQ> - other than the fact that the units must all have the same canonical form in UCUM (to make x.isComparableTo(y) true), the behaviour of IVL<PQ> with regard to boundaries is based on the value of PQ, which is a REAL.

IVL<TS>

TS differs from REAL in that the precision is not equally distributed around the stated value. Instead, it starts at the stated value, and goes to the end of the implied period. To illustrate this, a value of 5.1 implies 5.05 to 5.15, equally distributed around 5.1. On the other hand, the TS value of 20100404 implies the day 4-Apr 2010, and the implicit time is from 00:00 to 23:59 on that day (actually, [201004040000;201004050000[).
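As an illustration of how the implied period of a TS runs forward from the stated value, here is a small Python sketch. The function `implied_period` and the `_FORMATS` table are hypothetical names made up for this example; the sketch assumes literals of 4 to 14 digits with no timezone or fractional seconds.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Literal length -> strptime format, for the precisions handled in this sketch.
_FORMATS = {4: "%Y", 6: "%Y%m", 8: "%Y%m%d",
            10: "%Y%m%d%H", 12: "%Y%m%d%H%M", 14: "%Y%m%d%H%M%S"}


def implied_period(ts: str) -> Tuple[datetime, datetime]:
    """Return (start, end) of the half-open period [start; end[ implied by a TS literal."""
    start = datetime.strptime(ts, _FORMATS[len(ts)])
    if len(ts) == 4:
        end = start.replace(year=start.year + 1)
    elif len(ts) == 6:
        # Roll over to January of the next year after December.
        end = start.replace(year=start.year + start.month // 12,
                            month=start.month % 12 + 1)
    else:
        step = {8: timedelta(days=1), 10: timedelta(hours=1),
                12: timedelta(minutes=1), 14: timedelta(seconds=1)}[len(ts)]
        end = start + step
    return start, end


start, end = implied_period("20100404")
print(start, end)  # 2010-04-04 00:00:00 2010-04-05 00:00:00
```

The period is returned as half-open, matching the [201004040000;201004050000[ form given above rather than the looser “00:00 to 23:59” wording.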

Other than this fact, the situation with regard to TS is the same as that with regard to REAL, and for exactly the same reasons: precision is not counted.

Of course, because of the way that the TS imprecision is distributed, the low boundary is not the interesting case; it’s the high boundary. Given an interval of [20100404;20100406], is 10pm on 6-April in that interval? A careless reading of the interval, “from the 4th to the 6th of April”, would imply that it is. But it isn’t, for the reasons described above. The interval [20100404;20100406] is not “from the 4th to the 6th”, but “from the start of the 4th to the start of the 6th”.
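The membership test described here can be sketched in Python. Each TS literal is reduced to the single instant at which its implied period starts, and the boundaries are compared on that instant alone. The helpers `instant` and `in_closed_interval` are hypothetical, and only the two precisions used in this example are handled.

```python
from datetime import datetime

_FORMATS = {8: "%Y%m%d", 12: "%Y%m%d%H%M"}  # only the two precisions used here


def instant(ts: str) -> datetime:
    """The point in time a TS literal denotes: the start of its implied period."""
    return datetime.strptime(ts, _FORMATS[len(ts)])


def in_closed_interval(low: str, high: str, e: str) -> bool:
    # Both boundaries are closed, and precision plays no part in the comparison.
    return instant(low) <= instant(e) <= instant(high)


print(in_closed_interval("20100404", "20100406", "201004052200"))  # True: 10pm on 5 April
print(in_closed_interval("20100404", "20100406", "20100406"))      # True: start of 6 April
print(in_closed_interval("20100404", "20100406", "201004062200"))  # False: 10pm on 6 April
```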

TS must be the same as REAL because precision cannot count towards the comparisons, whether equality, lessOrEqual, or greaterOrEqual. The R2 abstract specification says that for TS:

“Two nonNull TS are only equal if they have the same precision”

invariant(TS x, y) where x.nonNull.and(y.nonNull) { x.equal(y).equal(x.offset.equal(y.offset)).and(x.precision.equal(y.precision)); };

This is the same error as for REAL, and will be part of the technical correction discussed above. The invariant should say:

invariant(TS x, y) where x.nonNull.and(y.nonNull) { x.equal(y).equal(x.offset.equal(y.offset)); };

and therefore 20100404 = 20100404000000.000.

TS redefines lessOrEqual in R2. I’m the editor, and I can’t say that there’s any coherence in that redefinition at all. The definition is nonsensical in parts (a copy/paste error), and wrong where it differs from the definition of lessOrEqual on QTY, in that it says:

“The outcome of lessOrEqual between two TS is NULL unless they have the same precision”

This is wrong, for the same reasons as the equality tests on REAL and TS as discussed above.

Even worse is this invariant:

invariant(TS x, y) where x.nonNull.and(y.nonNull) { x.lessOrEqual(y).nonNull.implies(x.offset.equal(y.offset)); };

This is an outright typo. It should say, x.lessOrEqual(y).nonNull.implies(x.precision.equal(y.precision)), but as we have discussed, even that would be wrong. This whole section (QTY.lessOrEqual) should be removed in the technical correction - it doesn’t say anything useful at all, even when corrected.

The Hull Literal Form

Much of the confusion around this area comes from the existence of the hull literal form, and some careless language associated with its definition. Quoting from the Abstract specification (same in R1 and R2):

Example: May 12, 1987 from 8 to 9:30 PM is “[198705122000;198705122130]”. NOTE: **The precision of a stated interval boundary is irrelevant for the interval. One might wrongly assume that the interval “[19870901;19870930]” stands for the entire September 1987 until end of the day of September 30. However, this is not so!, The proper way to denote an entire calendar cycle (e.g., hour, day, month, year, etc.) in the interval notation with is to use an open high boundary. For example, all of September 1987 is denoted as “[198709;198710[”. The “hull-form” of the literal is defined as the convex hull (see IVL.hull) of interval-promotions from two time stamps. For example, “19870901..19870930” is a valid literal using the hull form. The value is equivalent to the interval form “[19870901;19871001[”.**

Though the note in the quote above agrees with this document in regard to the interpretation of an interval, it does not make clear whether it concerns the interpretation of the interval itself, or just that particular literal form. The waters are further muddied by the comment immediately after regarding the definition of the hull form, where the interpretation of the literal form is dependent on the boundary precision.

So, to clarify: the hull literal form is **not** a simple interval: it’s the convex hull of two intervals implied by the imprecision of the stated boundaries. The literal hull is not actually an interval. It’s a QSCH<IVL> where QSCH is QSetConvexHull - a type that we missed defining in R2 (and will add in R3) - and which will have a DSET of sets as its operands (probably).
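The example quoted from the specification, where “19870901..19870930” has the same value as “[19870901;19871001[”, can be reproduced with a short Python sketch. The helpers `promote` and `hull_literal_to_interval` are hypothetical names, and the sketch assumes both time stamps are written to day precision; it illustrates only the promote-then-hull idea, not how any HL7 implementation represents the result.

```python
from datetime import datetime, timedelta

DAY = "%Y%m%d"


def promote(day: str):
    """Interval promotion of a day-precision TS: [start of day; start of next day[."""
    start = datetime.strptime(day, DAY)
    return start, start + timedelta(days=1)


def hull_literal_to_interval(literal: str) -> str:
    """Convert "low..high" (day precision only) to the equivalent interval literal."""
    low_text, high_text = literal.split("..")
    low_start, low_end = promote(low_text)
    high_start, high_end = promote(high_text)
    # Convex hull: earliest start to latest end; the end stays open.
    start, end = min(low_start, high_start), max(low_end, high_end)
    return f"[{start.strftime(DAY)};{end.strftime(DAY)}["


print(hull_literal_to_interval("19870901..19870930"))  # [19870901;19871001[
```

Under the same assumptions, the hull literal 20100404..20100406 comes out as [20100404;20100407[, which does cover 10pm on 6-April, whereas the interval [20100404;20100406] does not. That difference is the source of the confusion this section describes.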

Since we are having a technical correction, we will clarify the uncertainty introduced by this definition of the literal hull at the same time, by being more explicit that the note concerns the definition of Interval, not the literal, and making more of the fact that the hull form is a convex hull of intervals, not an interval itself.

Status

This page is awaiting final approval by MnM (HL7 committee).