Design by Constraint - not as useful as people think (#2)

Apr 30, 2011

This post is part of the Design by Constraint series (first post)

**More structured Representations**

OCL

An English representation of the constraint (as found in the previous post) is good, but a structured representation is even better, because it can be leveraged in various ways.

UML provides OCL (Object Constraint Language) as a means of expressing constraints against the reference model. Here’s the same set of constraints in OCL:

context Business
 inv: isFamilyBusinessRestaurant
 def let isFamilyBusinessRestaurant : Boolean =
  franchise.isNull and
  address->size() >= 3 and address->size() <= 4 and
  corporation.isNull and person->size() > 2 and
  person->size() <= 7 and
  -- select (not collect) keeps only the persons that match
  person->select(isAccountant)->size() = 1 and
  person->forAll(isAccountant or isFamily)

-- how to express "address order matters"?

context Person
  def let isAccountant : Boolean = role = "ACC" and
   corporation->size() = 0 and sex.isNull and id.isNull
  -- how to express "taxId is renamed businessId"?

  def let isFamily : Boolean = role = "EMP" and
    id = null and taxId = null and
    corporation->size() = 0

This OCL could be converted to code, and used to check that a particular instance of classes from the reference model conforms to these constraints and is therefore a proper “Family Business Restaurant”. That’s good and useful: a black box that gives a yes/no answer. (What isn’t so useful is that the error messages from an OCL engine in the case of failure are beyond cryptic and obtuse.)
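To make the “black box” idea concrete, here is a hand-written Python sketch of the kind of checker that could be derived from the OCL above. It is not the output of any OCL tool. The class and field names (Business, Person, tax_id and so on) are my assumptions about the reference model, taken from the attribute names used in the OCL, and the messages are just one way of doing better than a bare yes/no.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Assumed shape of the reference model, inferred from the OCL attribute names.
@dataclass
class Person:
    role: str
    id: Optional[str] = None
    sex: Optional[str] = None
    tax_id: Optional[str] = None
    corporation: List[str] = field(default_factory=list)


@dataclass
class Business:
    franchise: Optional[str] = None
    corporation: Optional[str] = None
    address: List[str] = field(default_factory=list)
    person: List[Person] = field(default_factory=list)


def is_accountant(p: Person) -> bool:
    return p.role == "ACC" and not p.corporation and p.sex is None and p.id is None


def is_family(p: Person) -> bool:
    return p.role == "EMP" and p.id is None and p.tax_id is None and not p.corporation


def check_family_business_restaurant(b: Business) -> List[str]:
    """Return readable violations; an empty list is the 'yes' answer."""
    problems = []
    if b.franchise is not None:
        problems.append("franchise must be absent")
    if b.corporation is not None:
        problems.append("corporation must be absent")
    if not 3 <= len(b.address) <= 4:
        problems.append(f"expected 3..4 address lines, found {len(b.address)}")
    if not 3 <= len(b.person) <= 7:
        problems.append(f"expected 3..7 persons, found {len(b.person)}")
    accountants = [p for p in b.person if is_accountant(p)]
    if len(accountants) != 1:
        problems.append(f"expected exactly 1 accountant, found {len(accountants)}")
    if not all(is_accountant(p) or is_family(p) for p in b.person):
        problems.append("every person must be the accountant or a family employee")
    return problems
```

Note that this only validates an instance that already exists. It does nothing to help a programmer build a valid one in the first place.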

But there’s a problem with OCL: there’s no way to transform a set of OCL constraints into software modules that know how to produce valid instances, that is, into classes that a compiler can use to force the programmer to build the instance properly. (It’s arguably possible, but certainly extremely difficult - a research project for a big multinational with a 3 letter name, perhaps.)

Table Syntaxes

Because of this, various forms of tabular representation are popular. Here’s an example table:

Attribute    Cardinality  Null      Notes
name         0..1         No        no constraint
franchise    0..0         Yes
address      3..4         Optional  order matters
corporation  0..0         Yes
accountant   1..1         No        Person “Accountant”
family       2..6         No        Person “Employee”

These tables are widespread because it’s relatively easy to use table forms like these to generate useful code.

And that’s a problem with the table form - there’s a lot of subtle interactions between columns (cardinality, collections, and empty values are a big source of subtlety), and variations in how patterns are linked together between tables. In addition, there’s all sorts of interesting extensions to the tables to deal with things like co-occurrence constraints.

As a consequence, there’s a lot of different table syntaxes. XML schema is one form of table presentation that constrains the XML reference model (elements, attributes, text, etc). As a rule, syntax doesn’t matter - syntaxes are interconvertible. It’s the semantics underlying the syntax that make it hard to interconvert between forms. So there’s a lot of different forms, each with slightly different meaning and limitations.
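As a rough illustration of why tables are attractive to code generators, here is a minimal Python sketch that drives a check straight from the Cardinality column of the example table. The dictionary layout and the function names are mine, not part of any real table syntax. The sketch deliberately ignores the Null and Notes columns, which is exactly where the subtle interactions live.

```python
from typing import Dict, List, Tuple

# Rows copied from the example table: attribute -> cardinality string.
TABLE: Dict[str, str] = {
    "name": "0..1",
    "franchise": "0..0",
    "address": "3..4",
    "corporation": "0..0",
    "accountant": "1..1",
    "family": "2..6",
}


def parse_cardinality(text: str) -> Tuple[int, int]:
    """Turn a string such as '3..4' into the pair (3, 4)."""
    low, high = text.split("..")
    return int(low), int(high)


def check_instance(instance: Dict[str, list], table: Dict[str, str] = TABLE) -> List[str]:
    """Compare the number of values per attribute with the table's bounds."""
    problems = []
    for attribute, cardinality in table.items():
        low, high = parse_cardinality(cardinality)
        count = len(instance.get(attribute, []))
        if not low <= count <= high:
            problems.append(f"{attribute}: expected {cardinality}, found {count}")
    return problems
```

An instance here is just a dictionary of lists. Whether a missing key, an empty list, and an explicit null all mean the same thing is one of the questions each table syntax ends up answering differently.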

ADL

ADL (Archetype Definition Language) is a formally defined constraint language purpose built for constraining reference models. It’s easy to write an ADL statement that is equivalent to the OCL statement above, and you can use this to generate code:

definition
	RESTAURANT[at0001] matches { -- family restaurant
		franchise occurrences matches {0}
		address cardinality matches {3..4; ordered} matches {*}
		corporation occurrences matches {0}
		employees matches {
			PERSON[at0002] occurrences matches {1} matches { -- the accountant
				role matches {[at0020]}		-- ACC code
				id occurrences matches {0}
				sex occurrences matches {0}
				corporation occurrences matches {0}
			}
			PERSON[at0003] occurrences matches {2..6} matches {	-- family
				role matches {[at0021]}	-- EMP code
				id occurrences matches {0}
				taxId occurrences matches {0}
				corporation occurrences matches {0}
			}
		}
	}

ontology
	term_definitions = <
		["en"] = <
			[at0001] = <
				text = <"family restaurant">
				description = <"some longer description of family restaurant">
			>
			[at0002] = <
				text = <"accountant">
				description = <"some longer description of accountant">
			>
			[at0003] = <
				text = <"family">
				description = <"some longer description of family">
			>
			[at0020] = <
				text = <"ACC code">
				description = <"some longer description of ACC code">
			>
			[at0021] = <
				text = <"EMP code">
				description = <"some longer description of EMP code">
			>
		>
	>

Note: Tom Beale wrote the ADL for me (Thanks very much, Tom). Tom also noted that in ADL (like most of the other constraining frameworks discussed), it’s wrong to rename the taxId to businessId; or else the reference model would need to be vamped up to allow for it.

But ADL has its limitations. You can’t do everything in ADL that you can do in OCL (ADL includes an expression syntax quite like OCL for the kind of things that can’t be built into useful software).

Aside: HL7 v3 static models and ADL are tightly related. You can think of v3 static models as a graphical equivalent to ADL. They are nearly interconvertible - there’s a few semantic differences (ADL does binding between models with one less level of re-direction; static models allow choices at the root, and ADL doesn’t allow this; static models do terminology binding differently, etc). I’ll explore that similarity later in this series of posts.

Class Model

Or, you can say that really, the constraints aren’t very interesting. What would be really interesting - simple, easy to work with, and clear - would be a simplified class model that just describes the outcome of applying the constraints to the reference model. Like this:

That’s pretty simple - a class model you can do the normal things with. But what’s the relationship between these two class models?

Well, you can call the reference model a metamodel, and think of Restaurant as a metaclass, and FamilyBusinessRestaurant as an instance of this class. And this is the first response of all the OMG-type people who look at the diagram above. But it’s not - we originally said that the reference model was a class model that is intended to be treated as a PIM.

The relationship between these two models is summarized by this diagram:

There’s a reference model (UML Class model), and a constraint specification (some other syntax). You perform a transform that applies the constraint to the reference model, and produces a new constrained model (also a UML Class model).
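Here is a minimal Python sketch of that transform, with both models reduced to attribute cardinalities. The cardinalities I give the reference model are assumptions for illustration (the real reference model is the one in the previous post), and apply_constraint is a hypothetical name, not a function from any UML or HL7 tool.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]  # (minimum, maximum) occurrences
MANY = 10**9             # stand-in for an unbounded upper limit

# Assumed reference model cardinalities, for illustration only.
REFERENCE: Dict[str, Range] = {
    "name": (0, 1),
    "franchise": (0, 1),
    "address": (0, MANY),
    "corporation": (0, 1),
    "person": (0, MANY),
}

# The constraint specification: only the attributes that get narrowed.
CONSTRAINT: Dict[str, Range] = {
    "franchise": (0, 0),
    "address": (3, 4),
    "corporation": (0, 0),
    "person": (3, 7),
}


def apply_constraint(reference: Dict[str, Range],
                     constraint: Dict[str, Range]) -> Dict[str, Range]:
    """Produce a constrained model; a constraint may narrow but never widen."""
    unknown = set(constraint) - set(reference)
    if unknown:
        raise ValueError(f"not in the reference model: {sorted(unknown)}")
    constrained = {}
    for attribute, (ref_low, ref_high) in reference.items():
        low, high = constraint.get(attribute, (ref_low, ref_high))
        if low < ref_low or high > ref_high or low > high:
            raise ValueError(
                f"{attribute}: {low}..{high} does not fit inside {ref_low}..{ref_high}")
        if high > 0:  # attributes constrained to 0..0 vanish from the output
            constrained[attribute] = (low, high)
    return constrained
```

The output has the same shape as the input, so it can be fed back in as the reference model for a further round of constraint. It also no longer mentions franchise or corporation at all, so nothing in the output tells you what the original reference model allowed.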

The pattern is allowed to recurse in some contexts - the output constrained class model can be treated as a reference model in a new cycle. (Sometimes the constraint languages are different: RIM -> CDA -> Implementation Guide; and sometimes they are the same: RIM -> (Static Model)^n -> Schema)

This is design by constraint - widely used in healthcare.

Note that the transform is usually not reversible - the process of applying the constraint to the reference model is destructive.

Note also that the v3 static diagrams are neither wholly constraint model, nor wholly transformed model - they are a half hybrid. They are primarily an expression of the constraint, but they have the form of the constrained model.

Design by Contract

We should note at this point that this is a form of design by contract. The notion underlying design by contract is simple: a callable procedure/routine/function/operation/method etc or some other defined context takes a somewhat general parameter type - integer, or string, or some kind of class. Because the type allows a wider set of values (value domain) than the method can properly deal with, the context of use constrains the value domain to the set of values that are allowed (hopefully in some computable fashion, though it’s not unusual to find old 3GL code laced with comments capturing this kind of information).

All that we are doing here is design by contract with complex and deep parameter types. A generic set of classes that have a wide application are used in particular contexts, and we want to document their limitations in a useful way. It’s that simple - but OMG hasn’t given us a proper computable framework for this problem.
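For the simple-parameter case, the idea looks something like the following Python sketch. The function format_address is hypothetical; the point is only that the declared type (a list of strings) is wider than the value domain the routine can actually handle, and the contract makes the difference computable instead of leaving it in a comment.

```python
from typing import List


def format_address(lines: List[str]) -> str:
    """Join address lines for printing.

    Contract: the type allows any list of strings, but this routine
    only accepts 3 or 4 lines, none of them blank.
    """
    if not 3 <= len(lines) <= 4:
        raise ValueError(f"expected 3..4 address lines, got {len(lines)}")
    if any(not line.strip() for line in lines):
        raise ValueError("address lines must not be blank")
    return "\n".join(line.strip() for line in lines)
```

Design by constraint is the same move applied to a whole graph of classes rather than one parameter, which is why a couple of if statements stop being enough.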

Capturing Design By Contract in UML

So let’s see how far we can get in UML with this approach.

Here the constrained model has been marked up with stereotypes and property strings that capture the reference model links in the UML model. For instance, the FamilyBusinessRestaurant has the stereotype “Restaurant” which signifies that it’s a constraint on the class “Restaurant” in the reference model. There’s a lot more stereotypes and property strings than the visible ones, but the visible ones serve to show the purpose.

This is all fine - and in fact, there’s a UML profile for HL7 v3 that formalises this stuff. But no matter how “proper” a UML profile is, if in the end the UML tools don’t know what the stereotypes and property strings actually mean, and how they influence code generation, then what use is it? (In fact, it’s worse than that, as we’ll see below, because “how they influence code generation” turns out to be an impossible problem.) In order to get tooling support, we need to first get OMG to endorse the notion of design by contract, and publish an appropriate UML profile, one that has good engineering support.

Please OMG….

Instance Behaviour

But before we just punt the problem over to the OMG, let’s ask what it would mean for instances, this design by constraint thing. After all, standards bodies etc might publish models, but implementers work with instances. And it turns out that the instances have quite a bit to say about this design by constraint pattern. But I’ll leave this for the next post.

Next: Instances, designed by constraint

 
