#FHIR: Testing a new search mechanism
Sep 8, 2014
The FHIR search mechanism is based on the HTTP parameter mechanism - a series of named parameters with values. This works quite well for simple cases, but we’ve pushed it pretty far, and there are a few cases for which we’ve had to invent some pretty tricky hacks to get things done (modifiers!). Here’s an example of a moderately simple search: find all the observations for a patient with a name including “peter” that have the LOINC code 1234-5:
````
GET [base]/Observation?name=http://loinc.org|1234-5&subject.name=peter
````
The [OData](http://docs.oasis-open.org/odata/odata/v4.0/os/part1-protocol/odata-v4.0-os-part1-protocol.html#_Toc372793692) and [SCIM](https://tools.ietf.org/html/draft-ietf-scim-api-10#section-3.2.2) specifications do search differently - a single HTTP parameter "filter", with its own search syntax. Following this pattern, the search would be:
````
GET [base]/Observation?filter=name eq http://loinc.org|1234-5 and subject.name co "peter"
````
That alternative syntax has different strengths and weaknesses - it's a little harder to get to work (another syntax to deal with) but it's also more capable, particularly at expressing nested related searches, which is the limitation of the existing approach. On the other hand, the existing approach is amenable to making up HTML forms fairly simply, and that's not true of the filter based approach, so there's always a price to pay for complexity.
But since implementers regularly ask why we don't do search like OData/SCIM etc, I decided that for the connectathon this weekend, I'd add support for a _filter parameter ("_filter" not "filter" because of the way FHIR naming conventions work). Here's how what I've implemented works:
* A filter can be a logical one (x or x, or x and x, or not x)
* A filter can contain other filters in a set of parentheses: "()"
* A filter can be a test - path operation value - where operation is taken from the table below, and value is either "true", "false", a JSON string, or a token (any sequence of non-whitespace characters, excluding ")" and "]"). Values are never case sensitive
* A 'path' is a name, with chained searches done by name.name etc. as per the existing search. There can also be a filter: name[filter].name...
* The name is one of the defined search parameters that are used with the other search mechanism, with some special exemptions defined below.
Here are some example filters:
* Patient: name co "pet" - all patients with the characters "pet" in a given or family name
* Patient: given eq "peter" and birthdate ge 2014-10-10 - all patients with a given name of peter, born on or after 10-Oct 2014
* Observation: name eq http://loinc.org|1234-5 - all observations with the LOINC code "1234-5"
* Observation: subject.name co "pet" - all observations on a patient with the characters "pet" in a given or family name
* Observation: related[type eq "has-component"].target pr true - all observations that have component observations (note: this uses one of the search parameters defined for this mechanism, see below)
* Observation: related[type eq has-component].target re Observation/4 - all observations that have Observation/4 as a component
Note that the only difference between a "string" value and a "token" value is that a string can contain spaces and ) and ]. There is otherwise no significant difference between them.
Here's the formal grammar for the syntax:
````
filter       = paramExp / logExp / ("not") "(" filter ")"
logExp       = filter (("and" / "or") filter)+
paramExp     = paramPath SP compareOp SP compValue
compareOp    = (see table below)
compValue    = string / numberOrDate / token
string       = json string
token        = any sequence of non-whitespace characters (by Unicode rules) except "]" and ")"
paramPath    = paramName (("[" filter "]") "." paramPath)
paramName    = ALPHA (nameChar)*
nameChar     = "_" / "-" / DIGIT / ALPHA
numberOrDate = DIGIT (dateChar)*
dateChar     = DIGIT / "T" / "-" / "." / "+"
````
Some additional notes about this:
- Logical expressions are evaluated left to right, with no precedence between “and” and “or”. If there is ambiguity, use parentheses to be explicit.
- The compareOp is always evaluated against the set of values produced by evaluating the param path.
- The parameter names are those defined by the specification for search parameters, except for those defined below.
- The date format is a standard XML (i.e. XSD) dateTime (including timezone).
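To make the grammar and the left-to-right rule concrete, here is a minimal recursive-descent parser sketch in Python. It is not the code running on the server: the names `FilterParser` and `parse_filter` and the tuple-based tree shape are assumptions made for illustration. It is also slightly more lenient than the grammar, since it accepts `name[filter]` with no trailing `.name`.

```python
import json
import re

NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")   # paramName, also used for operator codes
TOKEN = re.compile(r"[^\s\])]+")               # non-whitespace except "]" and ")"
NAME_CHAR = re.compile(r"[A-Za-z0-9_-]")
OPS = {"eq", "ne", "co", "sw", "ew", "gt", "lt", "ge", "le",
       "pr", "po", "ss", "sb", "in", "re"}


class FilterParser:
    def __init__(self, text):
        self.text, self.pos = text, 0

    def skip_ws(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def match(self, pattern):
        m = pattern.match(self.text, self.pos)
        if not m:
            raise ValueError(f"syntax error at offset {self.pos}")
        self.pos = m.end()
        return m.group()

    def keyword(self, word):
        # Consume `word` only when it stands alone ("or" must not match "organization").
        self.skip_ws()
        end = self.pos + len(word)
        if self.text.startswith(word, self.pos) and not NAME_CHAR.match(self.text, end):
            self.pos = end
            return True
        return False

    def expect(self, ch):
        self.skip_ws()
        if not self.text.startswith(ch, self.pos):
            raise ValueError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def filter(self):
        # Left to right, no precedence: each new operator wraps everything parsed so far.
        node = self.term()
        while True:
            for op in ("and", "or"):
                if self.keyword(op):
                    node = (op, node, self.term())
                    break
            else:
                return node

    def term(self):
        negated = self.keyword("not")
        self.skip_ws()
        if self.text.startswith("(", self.pos):
            self.pos += 1
            node = self.filter()
            self.expect(")")
        elif negated:
            raise ValueError("'not' must be followed by '('")
        else:
            node = self.test()
        return ("not", node) if negated else node

    def test(self):
        path = self.path()
        self.skip_ws()
        op = self.match(NAME)
        if op not in OPS:
            raise ValueError(f"unknown operation {op!r}")
        self.skip_ws()
        return ("test", path, op, self.value())

    def path(self):
        # A path is a list of (name, nested filter or None) steps joined by ".".
        steps = []
        while True:
            name = self.match(NAME)
            nested = None
            if self.text.startswith("[", self.pos):
                self.pos += 1
                nested = self.filter()
                self.expect("]")
            steps.append((name, nested))
            if not self.text.startswith(".", self.pos):
                return steps
            self.pos += 1

    def value(self):
        if self.text.startswith('"', self.pos):
            # JSON string: let the json module handle escapes.
            val, self.pos = json.JSONDecoder().raw_decode(self.text, self.pos)
            return val
        return self.match(TOKEN)


def parse_filter(text):
    parser = FilterParser(text)
    node = parser.filter()
    parser.skip_ws()
    if parser.pos != len(text):
        raise ValueError(f"unexpected input at offset {parser.pos}")
    return node
```

With this sketch, `parse_filter('a eq 1 or b eq 2 and c eq 3')` yields an "and" node whose left operand is the "or" node, which is the left-to-right behaviour described above. Writing `a eq 1 or (b eq 2 and c eq 3)` changes the grouping.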
This table summarises the comparison operations available:
| Operation | Definition |
|---|---|
| eq | An item in the set has an equal value |
| ne | An item in the set has an unequal value |
| co | An item in the set contains this value |
| sw | An item in the set starts with this value |
| ew | An item in the set ends with this value |
| gt / lt / ge / le | A value in the set is (greater than, less than, greater or equal, less or equal) the given value |
| pr | The set is empty or not (value is false or true) |
| po | True if a (implied) date period in the set overlaps with the implied period in the value |
| ss | True if the value subsumes a concept in the set |
| sb | True if the value is subsumed by a concept in the set |
| in | True if one of the concepts in the set is in the nominated value set by URI, either a relative, literal or logical vs |
| re | True if one of the references in set points to the given URL |
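Because every operation is applied to a set of values rather than a single value, the semantics are worth seeing in code. The following is a rough sketch for string parameters only; the function name `compare_strings` is hypothetical and this is not the server's implementation. Values are compared case-insensitively, as described above.

```python
def compare_strings(op, values, target):
    """Apply a string operation to the set of values produced by a path."""
    if op == "pr":
        # "pr true" matches a non-empty set, "pr false" an empty one.
        return bool(values) == (target.lower() == "true")
    wanted = target.casefold()
    tests = {
        "eq": lambda v: v == wanted,
        "ne": lambda v: v != wanted,
        "co": lambda v: wanted in v,
        "sw": lambda v: v.startswith(wanted),
        "ew": lambda v: v.endswith(wanted),
    }
    if op not in tests:
        raise ValueError(f"operation {op!r} is not handled in this sketch")
    # One matching item in the set is enough.
    return any(tests[op](v.casefold()) for v in values)
```

One consequence of the "an item in the set" wording: for a patient with the given names "Peter" and "James", both `given eq peter` and `given ne peter` match, so "ne" is not simply the negation of "eq".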
The interpretation of the operation depends on the type of the search parameter it is being evaluated against. This table contains those details:
| Operation | String | Number | Date | Token | Reference | Quantity |
|---|---|---|---|---|---|---|
| eq | Character sequence is the same (case insensitive) | Number is the same incl same precision | Date is the same including same precision and timezone if provided | Token is the same, including namespace if specified (case insensitive) | n/a | Unit and value are the same |
| ne | (same as eq) | (same as eq) | (same as eq) | (same as eq) | (same as eq) | (same as eq) |
| co | Character sequence matches somewhere (case insensitive) | An item in the set’s implicit imprecision includes the stated value | An item in the set’s implicit period includes the stated value | n/a | n/a | n/a? |
| sw | Character sequence matches from first character (left most, when L->R) (case insensitive) | n/a | n/a | n/a | n/a | n/a |
| ew | Character sequence matches up to last character (right most, when L->R) (case insensitive) | n/a | n/a | n/a | n/a | n/a |
| gt / lt / ge / le | Based on Integer comparison of Unicode code points of starting character (trimmed) (case insensitive) | Based on numerical comparison | Based on date period comparison per 2.2.2.3 | n/a | n/a | Based on numerical comparison if units are the same (or are canonicalised) |
| pr | | | | | | |
| po | n/a | n/a | Based on date period comparison per 2.2.2.3 | n/a | n/a | |
| ss | n/a | n/a | n/a | Based on logical subsumption; potentially catering for mapping between tx | n/a | n/a |
| sb | n/a | n/a | n/a | Based on logical subsumption; potentially catering for mapping between tx | n/a | n/a |
| in | n/a | n/a | n/a | Based on logical subsumption; potentially catering for mapping between tx | n/a | n/a |
| re | n/a | n/a | n/a | n/a | Relative or absolute url | n/a |
Notes:
- For token, the format is the same as for the existing search parameters. For convenience, the codes “loinc”, “snomed”, “rxnorm” and “ucum” are predefined and can be used in place of the full namespace.
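As an illustration of the predefined codes, here is a small sketch of how a token value might be split and expanded. The function name `expand_token` is hypothetical. Only the LOINC URI appears in this post; the other three are the system URIs FHIR conventionally uses for those terminologies, and that the server maps the short codes to exactly these values is an assumption.

```python
# Short codes from the note above mapped to full namespaces.
# Only the LOINC entry is taken from this post; the rest are assumed.
PREDEFINED = {
    "loinc": "http://loinc.org",
    "snomed": "http://snomed.info/sct",
    "rxnorm": "http://www.nlm.nih.gov/research/umls/rxnorm",
    "ucum": "http://unitsofmeasure.org",
}


def expand_token(token):
    """Split 'namespace|code' and expand a predefined short namespace.

    Returns (namespace, code); namespace is None when no '|' is present.
    """
    namespace, sep, code = token.rpartition("|")
    if not sep:
        return None, token
    return PREDEFINED.get(namespace.lower(), namespace), code
```

Under these assumptions, `name eq loinc|1234-5` and `name eq http://loinc.org|1234-5` would resolve to the same namespace and code.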
Additional Parameters
I needed to define some additional parameters in order to get this to work well. This table summarises the search parameters I added:
| Resource Type | Parameter Name | Children |
|---|---|---|
| Observation | related | target = related-target; type = related-type |
| Group | characteristic | value = value; code = characteristic |
| DocumentReference | relatesTo | code = relation; target = relatesTo |
| DiagnosticOrder | event | status = event-status; date = event-date |
| DiagnosticOrder | item | status = item-status; code = item-code; site = bodysite; event = item-event |
| DiagnosticOrder | item-event | status = item-past-status; date = item-date; actor = actor |
Explanation:
- Any time these names are used in a parameter, they must have a filter and a chained name under them
- The first column is the resource type against which this name can be used
- The second column is the name that is used
- The third column defines the names that can be used in the chained parameter, and in the filter, and shows which existing search parameters they equate to
- For example, you could search on Observation for related[type eq has-component].target re url. “type” here refers to the search parameter “related-type”, and “target” to the search parameter “related-target”. Note that the names are not always aligned like this - FHIR itself may be revised to make it so (a gForge task already exists to do so)
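The mapping in the table can be read as a simple lookup from (resource type, parameter name, child name) to an existing search parameter. Here is a sketch of that lookup; the names `CHILD_PARAMS` and `resolve_child` are hypothetical, and the data is transcribed from the table above rather than taken from the server.

```python
# (resource type, parameter name) -> {child name: existing search parameter}
CHILD_PARAMS = {
    ("Observation", "related"): {"target": "related-target", "type": "related-type"},
    ("Group", "characteristic"): {"value": "value", "code": "characteristic"},
    ("DocumentReference", "relatesTo"): {"code": "relation", "target": "relatesTo"},
    ("DiagnosticOrder", "event"): {"status": "event-status", "date": "event-date"},
    ("DiagnosticOrder", "item"): {
        "status": "item-status", "code": "item-code",
        "site": "bodysite", "event": "item-event",
    },
    ("DiagnosticOrder", "item-event"): {
        "status": "item-past-status", "date": "item-date", "actor": "actor",
    },
}


def resolve_child(resource_type, parameter, child):
    """Return the existing search parameter that a child name refers to."""
    children = CHILD_PARAMS.get((resource_type, parameter))
    if children is None:
        raise KeyError(f"{parameter!r} is not an additional parameter on {resource_type}")
    if child not in children:
        raise KeyError(f"{child!r} is not a child of {resource_type}.{parameter}")
    return children[child]
```

For `related[type eq has-component].target re url` on Observation, the two lookups give "related-type" for the filter and "related-target" for the chained name.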
Implementation
This is implemented on my server at http://fhir.healthintersections.com.au, though some parts of the implementation aren’t done yet (none of the really cool operations “ss”, “sb” or “in”, for instance). Hopefully I’ll be able to fill some of these out by this weekend.
I’ve done this so that we can get a feel for how this approach works. It may become a candidate to add to the specification, either in addition to the existing search, or instead of it. I’m dubious about the second option though - what I’ve discovered about this search is that it can’t degrade gracefully the way the existing search can: either you implement all of the search, or you return an HTTP error code. With the existing search, you just add parameters in case they apply. That’s pretty useful, actually.
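For anyone who wants to try this from a client: filter expressions contain spaces, quotes and "|", so they need URL-encoding before they go into the query string. Here is a minimal sketch; the function name `build_filter_url` and the base address in the test are placeholders, not the actual endpoint path on my server.

```python
from urllib.parse import quote


def build_filter_url(base, resource_type, expression):
    """Build a search URL that carries `expression` in the _filter parameter."""
    # safe="" percent-encodes "/", ":" and "|" as well as spaces and quotes.
    encoded = quote(expression, safe="")
    return f"{base.rstrip('/')}/{resource_type}?_filter={encoded}"
```

The result can be passed to any HTTP client. The server is then expected to decode the parameter before parsing the filter.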