Security Appliances and FHIR Servers

May 13, 2019

The FHIR Standard doesn’t say much about security. Given the critical importance of security for healthcare data, readers are sometimes surprised by this. There are, however, many different valid approaches to making a server secure, so the FHIR standard delegates making rules about security to other specifications such as the Smart App Launch Specification. Note that there are many aspects of security, of which the most important are:

resistance to malicious actors - firewalls, basic security discipline
authentication
authorization
access control

For the purpose of this post, security means ‘exercising control over which queries are allowed, and what information they return’.

When explaining security, the standard includes a diagram on its security page.

Security can be applied:

in the client
between the client and the server
and inside the server itself.

In most real world applications I look at, some security will exist in all those places.

On the client

The least important place for security is on the client – though since the client does have access to data, security does still matter (particularly in regard to side-channel attacks).

Initially app developers don’t bother about security, and assume that the infrastructure will look after side-channel attacks. But when you don’t worry about access control, you can get hard error messages that look like bugs in the application. Since developers are motivated to avoid these, they end up applying security on the client.

Of course it’s a very bad idea to rely solely on the client to solve all your security needs.

Between the client and server

In this approach, there’s a façade server between the client and server that focuses entirely on security. These façade servers are often called a ‘security appliance’. The security appliance checks that the requests coming from outside are valid and applies authentication / authorization / access control, and then passes the request on to the actual FHIR Server, still as a FHIR Request. Then it inspects the response and filters the returned information against the security policy before returning it to the original client.
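The flow described above can be sketched in a few lines of Python. This is only an illustration, not a real appliance: the `Response` type and the `authorize`, `backend` and `filter_response` hooks are hypothetical names I've made up to stand in for whatever policy engine and HTTP plumbing an actual product would use.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    status: int
    body: dict


# Hypothetical hooks; the names and signatures are assumptions for illustration.
Authorize = Callable[[str, str, str], bool]   # (user, method, path) -> allowed?
Backend = Callable[[str, str], Response]      # (method, path) -> FHIR server response
Filter = Callable[[str, Response], Response]  # (user, response) -> filtered response


def outcome(code: str, text: str) -> dict:
    """Build a minimal FHIR OperationOutcome describing an error."""
    return {
        "resourceType": "OperationOutcome",
        "issue": [{"severity": "error", "code": code, "diagnostics": text}],
    }


def handle(user: Optional[str], method: str, path: str,
           authorize: Authorize, backend: Backend,
           filter_response: Filter) -> Response:
    # 1. Authentication: reject requests with no established identity.
    if user is None:
        return Response(401, outcome("login", "Authentication required"))
    # 2. Authorization / access control on the request itself.
    if not authorize(user, method, path):
        return Response(403, outcome("forbidden", "Request not permitted"))
    # 3. Pass the request on to the actual FHIR server, still as a FHIR request.
    upstream = backend(method, path)
    # 4. Filter the returned information against the security policy.
    return filter_response(user, upstream)
```

Note that the backend never sees `user` in this sketch, which is exactly the arrangement that causes the problems discussed later in this post.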

Because of the importance of security, most real world applications use some kind of security appliance. At the least, the security appliance will perform perimeter tasks like preventing obvious intrusions. But the appliance can do a whole lot more than that - it can authenticate the user, handle OAuth authorization/certificate validation, and apply access control to the requests and responses.

Using a security appliance like this is a standard part of a Defence-In-Depth strategy.

A security appliance is not enough

However, real world systems also need to implement access control into the FHIR server, because of the way FHIR works.

As an example, take the situation where the authenticated user is not permitted to see episodes marked as Mental Health (identified either by a particular encounter type code or by a security label). The patient at hand has 2 episodes: a normal admission (a-n) and a mental health admission (b-psy). The appliance is enforcing this policy in front of a general purpose server that has no information about the user or their permissions.

For a request such as

GET [base]/Encounter/b-psy

the appliance will see that the response is marked as a mental health encounter, and change the response to a 404 Not Found with an appropriate error message. When it gets a list request

GET [base]/Encounter

the appliance will see that the response contains 2 encounters, and it will remove b-psy from the list and set the count of responses to 1 instead of 2. So far, all good. However, consider a request like:

GET [base]/Encounter?class=inpatient&_summary=count

Enforcing the user permissions on this request - a simple request to count the inpatient encounters, which is a simple join on the server - now depends on information that is not explicit in the response, so the appliance can’t apply the policy to the response. In order to enforce the policy, the appliance must itself perform a full search on the encounters, determine which meet the policy, and then return the count.
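To make the three cases concrete, here is a rough Python sketch of the response filtering an appliance might do. It assumes the mental health episode is flagged with a security label whose code is `PSY`, that the search result fits in one page, and that the appliance can issue its own full search through a hypothetical `run_full_search` callback; none of those details come from a specification, they are just choices made for the example.

```python
RESTRICTED_LABEL = "PSY"  # assumption: mental health data carries this label code


def is_restricted(resource: dict) -> bool:
    """True if the resource carries the restricted security label."""
    labels = resource.get("meta", {}).get("security", [])
    return any(label.get("code") == RESTRICTED_LABEL for label in labels)


def filter_read(resource: dict):
    """Single read: hide a restricted resource behind a 404."""
    if is_restricted(resource):
        return 404, {
            "resourceType": "OperationOutcome",
            "issue": [{"severity": "error", "code": "not-found"}],
        }
    return 200, resource


def filter_search(bundle: dict) -> dict:
    """List request: drop restricted entries and adjust the total.

    Assumes a single-page result, so the total can be corrected by
    subtracting the number of entries removed.
    """
    original = bundle.get("entry", [])
    kept = [e for e in original if not is_restricted(e["resource"])]
    filtered = dict(bundle, entry=kept)
    if "total" in bundle:
        filtered["total"] = bundle["total"] - (len(original) - len(kept))
    return filtered


def filter_count(count_bundle: dict, run_full_search) -> dict:
    """_summary=count: the response has a total but no entries to inspect,
    so the appliance has to run the full search itself and recount."""
    visible = filter_search(run_full_search()).get("entry", [])
    return dict(count_bundle, total=len(visible))
```

Running `filter_search` on a count-only response changes nothing, because there are no entries to examine; that is the leak. `filter_count` fixes it only by repeating the server's work, and that trick gets much harder once chained parameters, includes and operations are involved.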

In practice, the FHIR standard includes many search parameters and other features (operations, reference resolution) that make the security appliance’s task infeasible. To make the queries work, a server must be aware of the access control rules when it iterates its indexes etc.

A security appliance that cannot depend on the FHIR server to implement access control will end up prohibiting most of these queries as unimplementable, but they are standard features that are common / necessary for clients to use to deliver effective user experiences.

Note that there’s another important consequence of applying all the security on the appliance: the server does not know the user, and can’t record the user identity - a key fact - in any audit trail it generates.

Integrated in the server

For these reasons, most real world applications end up enforcing access control in the server that performs the actual work of handling the FHIR request (resolving references, iterating internal indexes, etc.), and the security appliance is mostly used for perimeter security.

Most of the production servers deployed today use the Smart App Launch Specification - the FHIR Community’s standard profile for using OAuth - as their primary security approach. This is a great solution for user level authentication/authorization, but doesn’t yet provide the classic B2B security connections with system level trust that the healthcare community is mostly used to.

The Smart App Launch spec reinforces the importance of server side integrated security by not describing a standard interface between the authorization server and the resource server. This makes it natural to implement strongly coupled Authorization Servers (AS) and Resource servers (RS), and more generally, strongly coupled security systems. Note that this is not at all required - standard interfaces between RS and AS are allowed, but the absence of an accepted way to perform the decoupling encourages a strong integration of the security inside the server.

Note that the only current candidate for a full open standard between the Authorization and Resource Servers is the UMA/Heart specification, which does a whole lot of other things and hasn’t attracted much interest from the community. A lighter weight approach is for the authorization server to offer token introspection, so that a resource server can query the authorization server for details about the authorization. However, both of these approaches are limited to expressing constraints that can be expressed using scopes and resource sets, while real security systems may require a richer language to meet requirements.
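As an illustration of the token introspection option, here is a minimal Python sketch of what the resource server side might look like, following the general shape of OAuth 2.0 Token Introspection (RFC 7662): POST the token as a form parameter, then read `active` and `scope` from the JSON reply. The endpoint URL is made up, the request is built but not sent, the client authentication that a real resource server needs when calling the endpoint is left out, and the scope check is a simplified take on SMART style `patient/Resource.action` scopes.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_introspection_request(endpoint: str, token: str) -> Request:
    """Build (but do not send) the POST a resource server would make."""
    body = urlencode({"token": token}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


def allowed_scopes(introspection_json: str) -> set:
    """Return the granted scopes, or an empty set if the token is inactive."""
    info = json.loads(introspection_json)
    if not info.get("active", False):
        return set()
    return set(info.get("scope", "").split())


def permits(scopes: set, resource_type: str, action: str) -> bool:
    """Simplified check against patient-level scopes, including wildcards."""
    candidates = {
        f"patient/{resource_type}.{action}",
        f"patient/{resource_type}.*",
        f"patient/*.{action}",
        "patient/*.*",
    }
    return bool(scopes & candidates)
```

The limitation mentioned above shows up immediately: `permits` can say whether the user may read Encounter resources at all, but nothing in a scope string can say "all encounters except the mental health ones".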

Mixed deployments

A single integrated server that includes all the security features internally is fragile in other respects. In practice, servers like this are hard to manage in terms of upgrades: security upgrades can be required very quickly in response to newly discovered issues, while the application side typically requires a great deal of testing prior to upgrade. In addition, closed server systems can be hard to adapt to shifting and diverse business requirements around the FHIR server.

This is driving interest in the community around deploying a mixed security system - using a mix of both security appliance and secure server. But to make that work, the two systems have to work together.

OpenID Connect

The first obvious approach for integrating appliance and server is for the server to collaborate with the appliance to enforce the join/integrity rules that are hard for the appliance while leaving all the rest of the security to the appliance. The appliance trusts the server to enforce the appropriate rules and the server trusts the appliance to correctly identify the user etc by whatever method is appropriate for the business.

In order to make this work, the security appliance has to communicate the details of the request to the FHIR Server. The most obvious way to do this is to pass a JWT in the Authorization header in the request from the security appliance to the FHIR Server. The JWT needs to communicate at least a user identity (for which the natural choice is to use OpenID Connect tokens), though additional details around roles, groups, and authorizations may be required.
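Since no specification for this exists yet, the following Python sketch is purely hypothetical: it shows an appliance signing a small JWT and a FHIR server verifying it, using only the standard library. To keep it self-contained it uses a shared secret (HS256), whereas real OpenID Connect deployments typically sign with asymmetric keys; the `roles` claim and all the example values are my own inventions, while `iss`, `sub` and `exp` are the standard JWT claims.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Appliance side: produce a compact HS256-signed JWT."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_jwt(token: str, secret: bytes, now=None) -> dict:
    """Server side: check algorithm, signature and expiry, return the claims."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode("ascii"),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    # Tokens without an exp claim are treated as already expired.
    if claims.get("exp", 0) <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

The appliance would send the result as `Authorization: Bearer <token>` on the forwarded request. The server can then use `sub` both to apply its own access control rules and to record the user in its audit trail.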

Obviously this approach requires trust, which would be established by contractual relationships. In addition, it requires a technical specification around the use of the JWT and/or OpenID Connect token, but I haven’t yet seen enough interest in this approach to justify developing such a document. I will continue to look for an opportunity to develop that.

Bulk Data

The forthcoming bulk data specification offers a different solution for organizations looking to integrate appliance and server. The basis of this solution is that the bulk data client security is established at the system level, and can access significant amounts of data. This makes it possible for the appliance to perform interesting new functions.

For instance, when a user logs in with patient level access on the security appliance, instead of the appliance enforcing access control on each request, the appliance could perform the following request on the FHIR Server in the background during the login process:

GET [base]/Patient/[id]/$everything

The appliance holds the user inside the authorization process until the $everything request is completed, and then uses an internal captive FHIR Server to provide complete services to the client based on the resource set returned by the FHIR Server. This allows the appliance to offer several improved services over the base server such as support for FHIR features not supported by the base server, or integration of record sets from multiple servers.
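A toy version of that captive server, in Python, might look like the following. It is only a sketch of the idea: `CaptiveStore` is a name I've invented, the input is assumed to be the Bundle returned by `$everything` already parsed into a dict, and the search only supports equality on top-level elements, which is far short of real FHIR search.

```python
class CaptiveStore:
    """In-memory store holding a snapshot of one patient's record."""

    def __init__(self, everything_bundle: dict):
        # Index every resource in the snapshot by (type, id).
        self._resources = {}
        for entry in everything_bundle.get("entry", []):
            resource = entry["resource"]
            key = (resource["resourceType"], resource["id"])
            self._resources[key] = resource

    def read(self, resource_type: str, resource_id: str):
        """Return the resource, or None if it is not in the snapshot."""
        return self._resources.get((resource_type, resource_id))

    def search(self, resource_type: str, **criteria) -> dict:
        """Return a searchset Bundle of resources whose top-level
        elements equal every given criterion (a big simplification)."""
        matches = [
            resource
            for (rtype, _), resource in self._resources.items()
            if rtype == resource_type
            and all(resource.get(k) == v for k, v in criteria.items())
        ]
        return {
            "resourceType": "Bundle",
            "type": "searchset",
            "total": len(matches),
            "entry": [{"resource": r} for r in matches],
        }
```

Because every query is answered from the snapshot, counts and joins are always consistent with what the user is allowed to see. The price is that the snapshot is frozen at login time.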

This approach is sometimes referred to as ‘decoupling’ the authorization server and resource server, but readers familiar with the details of OAuth will note that this is not decoupling in the direct OAuth sense.

Note that there are some very evident limitations of this approach:

it doesn’t provide integrated audit trail in the base application
the information available to the client is frozen to what is available when the bulk data query is performed
it doesn’t easily provide for write access to the base server

All those problems are solvable to some degree or other, but require specific agreement between appliance and server. And alert users will note that it’s not really the bulk data access that makes this possible, it’s the system level trust that matters.

For this reason, some bulk data interfaces might include specific blocking arrangements to enforce the importance of the server’s authorization server; managing this would be a matter for contracts.

I’m sure there are other ways to solve this problem. Comments are welcome, but rather than commenting on this post, I’d prefer it if people comment here.