# FHIR and Character encoding in URLs

Aug 23, 2016

One issue that causes confusion for FHIR implementers is the question of which characters need to be escaped in an http: URL. The general shape of an http: URL is `http://[server name]/[path]?[name]=[value]&[name]=[value]`


Examples:
* http://acme.org/fhir/Patient/1
* http://acme.org/fhir/Patient?gender=male&address-postalcode=12345
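To make the shape concrete, here is a minimal sketch that splits the second example into its parts using Python's standard `urllib.parse` module. The choice of Python is mine, for illustration only; any URL library will do the same job.

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://acme.org/fhir/Patient?gender=male&address-postalcode=12345"
parts = urlsplit(url)

print(parts.netloc)  # the [server name] part
print(parts.path)    # the [path] part

# parse_qsl splits the query on '&' and '=' and returns (name, value) pairs,
# percent-decoding each name and value along the way.
params = parse_qsl(parts.query)
print(params)
```

Note that the splitting on `&` and `=` happens before any decoding, which is why those characters matter when they appear inside a value.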

For a FHIR implementer, we will assume that there is no need to do escaping in the [server name] and [path] parts (there may be corner cases where you need to, but these are either rare or non-existent in the FHIR community). On the other hand, there are certainly circumstances where the parameter value is specified to contain characters that may need escaping. For example, one possible URL is:

GET fhir/ValueSet?url=http%3A%2F%2Fhl7.org%2Ffhir%2FValueSet%2Fclinical-findings


In this URL, the characters : and / have been encoded using % encoding, as specified in the URI standard (RFC 3986). (See the encoding table [here](http://www.w3schools.com/tags/ref_urlencode.asp), but I prefer [this tool](http://meyerweb.com/eric/tools/dencoder/) for normal use.) But which characters do you have to encode like that? Well, that's where it gets a little slippery. Quoting [from Wikipedia](https://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters):

> When a character from the reserved set (a "reserved character") has special meaning (a "reserved purpose") in a certain context, and a URI scheme says that it is necessary to use that character for some *other* purpose, then the character must be *percent-encoded*.

The key thing here is that which characters have to be encoded depends on which characters have special meaning. In a parameter value, the character '&' has special meaning (it separates one parameter from the next), so you **have** to escape that. The same goes for '#', which starts the URL fragment, and for '%' itself. Escaping most of the rest is optional. So this URL is equal to the one above:

GET fhir/ValueSet?url=http://hl7.org/fhir/ValueSet/clinical-findings


as is this:

GET fhir/ValueSet?url=%68%74%74%70%3A%2F%2F%68%6C%37%2E%6F%72%67%2F%66%68%69%72%2F%56%61%6C%75%65S%65%74%2F%63%6C%69%6E%69%63%61%6C%2D%66%69%6Ed%69%6E%67%73

These are all valid, and servers should support all the possible variants. Generally, we try to keep away from specifying characters that need escaping; they just cause problems for everyone. Yes, they’re resolvable, but no, we don’t want people losing time over them, so we don’t, for example, define parameter names with ‘=’ in them.
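As a quick check that the variants really are equivalent, the sketch below decodes three spellings of the same parameter with Python's standard `urllib.parse.parse_qs`. The fully escaped form is generated in the code rather than typed by hand, and using Python's parser as a stand-in for a FHIR server is an assumption for illustration.

```python
from urllib.parse import parse_qs

value = "http://hl7.org/fhir/ValueSet/clinical-findings"

# Three spellings of the same query string:
# reserved characters escaped, nothing escaped, and every byte escaped.
partly_escaped = "url=http%3A%2F%2Fhl7.org%2Ffhir%2FValueSet%2Fclinical-findings"
unescaped = "url=" + value
fully_escaped = "url=" + "".join(f"%{byte:02X}" for byte in value.encode("utf-8"))

# Decode each variant the way a server-side parser would.
decoded = [parse_qs(q)["url"][0] for q in (partly_escaped, unescaped, fully_escaped)]

# All three variants come back as the same parameter value.
assert all(d == value for d in decoded)
print(decoded[0])
```

A server built on a standard query-string parser gets this behaviour for free; the variants only become a problem when a server compares the raw, undecoded query text.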

So, as a client, you only need to escape in a very few places. There’s one place in the FHIR spec where this escaping arises as an explicit issue, where we need escaping within the value of the parameter itself. This edge case is discussed explicitly in the spec.

Note that there’s one other case where you absolutely have to escape the parameter values: if they contain characters outside the printable ASCII range 33 - 126, typically spaces or Unicode characters.
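On the client side, the simplest way to cover all of these cases is to let a library escape every parameter value. Here is a minimal sketch using Python's `urllib.parse.quote`; the helper name `escape_value` and the search value are hypothetical, and it assumes the server expects UTF-8 for non-ASCII characters (the default for `quote`).

```python
from urllib.parse import quote, parse_qs

def escape_value(value: str) -> str:
    """Percent-encode a parameter value (hypothetical helper name)."""
    # safe="" leaves only letters, digits and "_.-~" unescaped, so '&', '=',
    # '#', '%', spaces and non-ASCII characters are all percent-encoded.
    return quote(value, safe="")

name = "Müller & Söhne"  # hypothetical search value

query = "name=" + escape_value(name)
print(query)  # name=M%C3%BCller%20%26%20S%C3%B6hne

# The escaped form survives a round trip through a query-string parser...
round_trip = parse_qs(query)["name"][0]

# ...while the unescaped form is cut short at the '&'.
broken = parse_qs("name=" + name)["name"][0]
print(round_trip == name, broken == name)
```

This escapes more than is strictly required, but as noted above, over-escaping is harmless, and it is simpler than tracking which characters have special meaning in which context.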