#FHIR: Testing a new search mechanism

Sep 8, 2014

The FHIR search mechanism is based on the HTTP parameter mechanism - a series of named parameters with values. This works quite well for simple cases, but we’ve pushed it pretty far, and there are a few cases for which we’ve had to invent some pretty tricky hacks to get things done (modifiers!). Here’s an example of a moderately simple search: find all the observations for patients with a name including “peter” that have a LOINC code 1234-5:

````
GET [base]/Observation?name=http://loinc.org|1234-5&subject.name=peter
````

The [OData](http://docs.oasis-open.org/odata/odata/v4.0/os/part1-protocol/odata-v4.0-os-part1-protocol.html#_Toc372793692) and [SCIM](https://tools.ietf.org/html/draft-ietf-scim-api-10#section-3.2.2) specifications do search differently - a single HTTP parameter "filter", with its own search syntax. Following this pattern, the search would be:

GET [base]/Observation?filter=name eq http://loinc.org|1234-5 and subject.name co "peter"

That alternative syntax has different strengths and weaknesses - it's a little harder to get to work (another syntax to deal with) but it's also more capable, particularly at expressing nested related searches, which is the limitation of the existing approach. On the other hand, the existing approach is amenable to making up HTML forms fairly simply, and that's not true of the filter-based approach, so there's always a price to pay for complexity.

But since implementers regularly ask why we don't do search like OData/SCIM etc, I decided that for the connectathon this weekend, I'd add support for a _filter parameter ("_filter" not "filter" because of the way FHIR naming conventions work). Here's how what I've implemented works:
* A filter can be a logical one (x or x, or x and x, or not x)
* A filter can contain other filters in a set of parentheses : "()"
* A filter can be a test - path operation value, where operation is taken from the table below, and value is either "true", "false", a JSON string, or a token (any sequence of non-whitespace characters, excluding ")" and "]"). Values are never case sensitive
* A 'path' is a name, with chained searches done by name.name etc. as per the existing search. There can also be a filter: name[filter].name...
* The name is one of the defined search parameters that are used with the other search mechanism, with some special exceptions defined below.

Here's some example filters:
* Patient: name co "pet" - all patients with the characters "pet" in a given or family name
* Patient: given eq "peter" and birthdate ge 2014-10-10 - all patients with a given name of peter, born on or after 10-Oct 2014
* Observation: name eq http://loinc.org|1234-5 - all observations with the loinc code "1234-5"
* Observation: subject.name co "pet" - all observations on a patient with the characters "pet" in a given or family name
* Observation: related[type eq "has-component"].target pr true - all observations that have component observations (note: this uses one of the search parameters defined for this mechanism, see below)
* Observation: related[type eq has-component].target re Observation/4 - all observations that have Observation/4 as a component

Note that the only difference between a "string" value and a "token" value is that a string can contain spaces and ) and ]. There is otherwise no significant difference between them.
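
Filters contain spaces, quotes, "|" and brackets, so they need percent-encoding before they go into a URL. Here is a minimal Python sketch of how a client might build such a request URL. The base address `http://example.org/fhir` and the helper name `filter_url` are made up for illustration and are not part of FHIR or of my server.

```python
from urllib.parse import quote, unquote


def filter_url(base: str, resource_type: str, filter_expr: str) -> str:
    """Return a search URL that carries filter_expr in the _filter parameter."""
    # safe="" percent-encodes every reserved character, including "/", "|" and "[".
    return f"{base}/{resource_type}?_filter={quote(filter_expr, safe='')}"


BASE = "http://example.org/fhir"  # hypothetical endpoint, for illustration only

patient_url = filter_url(BASE, "Patient", 'name co "pet"')
observation_url = filter_url(BASE, "Observation", "name eq http://loinc.org|1234-5")
print(patient_url)
print(observation_url)
```

Encoding the whole expression as one parameter value keeps "&" or "=" inside a string value from being mistaken for HTTP parameter separators.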

Here's the formal grammar for the syntax:

````
filter        = paramExp / logExp / ("not") "(" filter ")"
logExp        = filter ("and" / "or" filter)+
paramExp      = paramPath SP compareOp SP compValue
compareOp     = (see table below)
compValue     = string / numberOrDate / token
string        = json string
token         = any sequence of non-whitespace characters (by Unicode rules) except "]" and ")"
paramPath     = paramName (("[" filter "]") "." paramPath)
paramName     = ALPHA (nameChar)*
nameChar      = "_" / "-" / DIGIT / ALPHA
numberOrDate  = DIGIT (dateChar)*
dateChar      = DIGIT / "T" / "-" / "." / "+"
````

Some additional notes about this:

* Logical expressions are evaluated left to right, with no precedence between “and” and “or”. If there is ambiguity, use parentheses to be explicit.
* The compareOp is always evaluated against the set of values produced by evaluating the param path.
* The parameter names are those defined by the specification for search parameters, except for those defined below.
* The date format is a standard XML (i.e. XSD) dateTime (including timezone).
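
To make the left-to-right rule concrete, here is a rough Python sketch of a parser for a simplified subset of the syntax. It is not the code running on my server: the tuple-based tree, the function names and the error messages are all invented for illustration, and it deliberately leaves out name[filter] paths and any validation of path names.

```python
import json
import re

# One token: a JSON string, a parenthesis, or a run of other non-space characters.
TOKEN_RE = re.compile(r'\s*("(?:[^"\\]|\\.)*"|[()]|[^\s()"]+)')
COMPARE_OPS = {"eq", "ne", "co", "sw", "ew", "gt", "lt", "ge", "le",
               "pr", "po", "ss", "sb", "in", "re"}


def tokenize(text):
    """Split a filter into tokens, failing on anything unrecognised."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(text):
    """Parse a filter into nested tuples (hypothetical tree shape)."""
    tokens = tokenize(text)
    node, pos = _parse_logical(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return node


def _parse_logical(tokens, pos):
    left, pos = _parse_term(tokens, pos)
    # "and" and "or" have equal precedence, so simply fold left to right.
    while pos < len(tokens) and tokens[pos] in ("and", "or"):
        op = tokens[pos]
        right, pos = _parse_term(tokens, pos + 1)
        left = (op, left, right)
    return left, pos


def _parse_term(tokens, pos):
    if pos >= len(tokens):
        raise ValueError("unexpected end of filter")
    if tokens[pos] == "not":
        if tokens[pos + 1:pos + 2] != ["("]:
            raise ValueError("'not' must be followed by '('")
        inner, pos = _parse_term(tokens, pos + 1)
        return ("not", inner), pos
    if tokens[pos] == "(":
        inner, pos = _parse_logical(tokens, pos + 1)
        if tokens[pos:pos + 1] != [")"]:
            raise ValueError("missing ')'")
        return inner, pos + 1
    test = tokens[pos:pos + 3]
    if len(test) < 3 or test[1] not in COMPARE_OPS:
        raise ValueError(f"expected 'path op value' at token {pos}")
    path, op, raw = test
    # JSON strings are decoded; other values are kept as plain tokens.
    value = json.loads(raw) if raw.startswith('"') else raw
    return ("test", path, op, value), pos + 3


tree = parse("a eq 1 or b eq 2 and c eq 3")
print(tree)
```

Because the loop folds each new operand into the tree built so far, `a eq 1 or b eq 2 and c eq 3` groups as `(a or b) and c`. To get `a or (b and c)`, the parentheses have to be written out.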

This table summarises the comparison operations available:

| Operation | Definition |
| --- | --- |
| eq | An item in the set has an equal value |
| ne | An item in the set has an unequal value |
| co | An item in the set contains this value |
| sw | An item in the set starts with this value |
| ew | An item in the set ends with this value |
| gt / lt / ge / le | A value in the set is (greater than, less than, greater or equal, less or equal) the given value |
| pr | The set is empty or not (value is false or true) |
| po | True if a (implied) date period in the set overlaps with the implied period in the value |
| ss | True if the value subsumes a concept in the set |
| sb | True if the value is subsumed by a concept in the set |
| in | True if one of the concepts in the set is in the nominated value set by URI, either a relative, literal or logical vs |
| re | True if one of the references in set points to the given URL |
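
The important thing about these definitions is that every operation works on a set of values, not a single value. Here is a small Python sketch of that idea for string parameters only. The function name `matches` and the use of `casefold()` for case-insensitive comparison are my assumptions for illustration, not how the server implements it.

```python
# String comparisons applied to one item of the set (both already case-folded).
STRING_OPS = {
    "eq": lambda item, value: item == value,
    "ne": lambda item, value: item != value,
    "co": lambda item, value: value in item,
    "sw": lambda item, value: item.startswith(value),
    "ew": lambda item, value: item.endswith(value),
}


def matches(values, op, value):
    """Return True if the set of string values satisfies 'op value'."""
    if op == "pr":
        # "pr true" means the set is not empty, "pr false" means it is empty.
        return bool(values) == (value.lower() == "true")
    test = STRING_OPS[op]
    needle = value.casefold()
    # The test succeeds as soon as any one item in the set satisfies it.
    return any(test(item.casefold(), needle) for item in values)


names = ["Peter", "James"]
print(matches(names, "eq", "peter"), matches(names, "ne", "peter"))
```

Note that "eq" and "ne" can both be true for the same set: one item equals the value while another does not. So "ne" is not the same thing as wrapping "eq" in a "not".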

The interpretation of the operation depends on the type of the search parameter it is being evaluated against. This table contains those details:

| Operation | String | Number | Date | Token | Reference | Quantity |
| --- | --- | --- | --- | --- | --- | --- |
| eq | Character sequence is the same (case insensitive) | Number is the same incl same precision | Date is the same including same precision and timezone if provided | Token is the same, including namespace if specified (case insensitive) | n/a | Unit and value are the same |
| ne | (same) | (same) | (same) | (same) | (same) | (same) |
| co | Character sequence matches somewhere (case insensitive) | An item in the set’s implicit imprecision includes the stated value | An item in the set’s implicit period includes the stated value | n/a | n/a | n/a? |
| sw | Character sequence matches from first digit (left most, when L->R) (case insensitive) | n/a | n/a | n/a | n/a | n/a |
| ew | Character sequence matches up to last digit (right most, when L->R) (case insensitive) | n/a | n/a | n/a | n/a | n/a |
| gt / lt / ge / le | Based on Integer comparison of Unicode code points of starting character (trimmed) (case insensitive) | Based on numerical comparison | Based on date period comparison per 2.2.2.3 | n/a | n/a | Based on numerical comparison if units are the same (or are canonicalised) |
| pr |  |  |  |  |  |  |
| po | n/a | n/a | Based on date period comparison per 2.2.2.3 |  | n/a | n/a |
| ss | n/a | n/a | n/a | Based on logical subsumption; potentially catering for mapping between tx | n/a | n/a |
| sb | n/a | n/a | n/a | Based on logical subsumption; potentially catering for mapping between tx | n/a | n/a |
| in | n/a | n/a | n/a | Based on logical subsumption; potentially catering for mapping between tx | n/a | n/a |
| re | n/a | n/a | n/a | n/a | Relative or absolute url | n/a |

Notes:

For token, the format is the same as the existing search parameter. For convenience, the codes “loinc”, “snomed”, “rxnorm” and “ucum” are predefined and can be used in place of the full namespace
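
As a rough illustration of the shorthand, a server could expand the predefined codes before matching. In this Python sketch only `http://loinc.org` comes from the examples above. The other three URIs are the ones FHIR conventionally uses for those code systems, and I'm assuming they are what the shorthand maps to. The function name `expand_token` is made up.

```python
# Shorthand code -> full namespace. Only the LOINC URI appears in this post;
# the other three are assumed from common FHIR usage.
SYSTEM_ALIASES = {
    "loinc": "http://loinc.org",
    "snomed": "http://snomed.info/sct",
    "rxnorm": "http://www.nlm.nih.gov/research/umls/rxnorm",
    "ucum": "http://unitsofmeasure.org",
}


def expand_token(token):
    """Replace a predefined shorthand namespace with its full URI."""
    system, separator, code = token.partition("|")
    if not separator:
        return token  # no namespace given, nothing to expand
    full_system = SYSTEM_ALIASES.get(system.lower(), system)
    return f"{full_system}|{code}"


print(expand_token("loinc|1234-5"))
```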

Additional Parameters

I needed to define some additional parameters in order to get this to work well. This table summarises the search parameters I added:

| Resource Type | Parameter Name | Children |
| --- | --- | --- |
| Observation | related | target = related-target, type = related-type |
| Group | characteristic | value = value, code = characteristic |
| DocumentReference | relatesTo | code = relation |
|  |  | code = item-code, site = bodysite, event = item-event |
| DiagnosticOrder | item-event | status = item-past-status, date = item-date, actor = actor |

Explanation:

* Any time these names are used in a parameter, they must have a filter and a chained name under them
* The first column is the resource type against which this name can be used
* The second column is the name that is used
* The third column defines the names that can be used in the chained parameter, and in the filter, and shows which existing search parameters they equate to
For example, you could search on Observation for related[type eq has-component].target re url. “type” here refers to the search parameter “related-type”, and “target” to the search parameter “related-target”. Note that the names are not always aligned like this - FHIR itself may be revised to make it so (a gForge task already exists to do so)
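
One way to picture the table is as a lookup from (resource type, parameter name, child name) to an existing search parameter. This Python sketch is only an illustration: the names `NESTED_PARAMS` and `resolve` are invented, and it includes only the three rows whose children are unambiguous.

```python
# (resource type, parameter name) -> {child name: existing search parameter}
NESTED_PARAMS = {
    ("Observation", "related"): {
        "target": "related-target",
        "type": "related-type",
    },
    ("Group", "characteristic"): {
        "value": "value",
        "code": "characteristic",
    },
    ("DiagnosticOrder", "item-event"): {
        "status": "item-past-status",
        "date": "item-date",
        "actor": "actor",
    },
}


def resolve(resource_type, parameter, child):
    """Map a child name used under 'parameter' to the search parameter it equates to."""
    try:
        return NESTED_PARAMS[(resource_type, parameter)][child]
    except KeyError:
        raise ValueError(
            f"{child!r} is not defined under {resource_type}.{parameter}"
        ) from None


# related[type eq has-component].target uses these two existing parameters:
print(resolve("Observation", "related", "type"))
print(resolve("Observation", "related", "target"))
```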

Implementation

This is implemented on my server at http://fhir.healthintersections.com.au, though the implementation has got some things that aren’t done yet (none of the really cool operations in “ss”, “sb” or “in”, for instance). Hopefully I’ll be able to fill some of these out by this weekend.

I’ve done this so that we can get a feel for how this approach works. It may become a candidate to add to the specification, either in addition to the existing search, or instead of it. I’m dubious about the second option though - what I’ve discovered about this search is that it isn’t degenerate, like the existing search - either you implement all the search, or you return an HTTP error code. With the existing search, you just add parameters in case they apply. That’s pretty useful, actually.