# FHIR and R (Stats / graphing language)
Oct 27, 2017
I’ve spent the last 2 days at the 2017 Australian R Unconference working on an R client for FHIR. For my FHIR readers, R is a language and environment for statistical computing and graphics. (I’ve spent the last couple of days explaining what FHIR is to R people.) My goal for the 2 days was to implement a FHIR client in R so that anyone wishing to perform statistical analysis of information available in FHIR could quickly and easily do so. I was invited to the R Unconference by Prof Rob Hyndman (a family friend) as it would be the best environment to work on that.
My work was made a lot easier when, earlier this week, Sander Laverman from Furore released an R package that does just what I intended to do. We started by learning the package, and testing it. Here’s a graph generated by R using data from test.fhir.org:
Once we had the R Package (and the server) all working, I added a few additional methods to it (the other read methods). For sample code and additional graphs, see my rOpenSci write up.
I think it’s important to make FHIR data easily available to R because it opens up a connection between two different communities - that’s good for both. Many of the participants at the Unconference were involved in health, and aware of how hard it is to make data available for analysis.
## Restructuring the data
FHIR data is nested, hierarchical, and focused on operational use. I’ve written about the difference between operational and analytical use before. Once we had the data being imported from a FHIR server into a set of R data frames, Rob and I looked at the data and agreed that the most important area to focus on was tools to help reshape the data down to a usable form. This is not a single problem: the ‘usable form’ depends entirely on the question being asked of the data.
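To make the reshaping problem concrete, here is a minimal sketch in Python (not R, and not the code from the Unconference) that turns a nested FHIR Bundle of Patient resources into flat rows. The sample Bundle is hand-written illustrative data, and `patient_rows` is a hypothetical helper. Keeping only the first name of each patient is just one possible ‘usable form’; a different question would call for a different shape.

```python
import json

# Hand-written, illustrative Bundle standing in for a server response.
BUNDLE_JSON = """
{
  "resourceType": "Bundle",
  "type": "searchset",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1", "gender": "female",
                  "birthDate": "1980-02-14",
                  "name": [{"family": "Example", "given": ["Ann", "Marie"]},
                           {"family": "Sample", "given": ["Ann"]}]}},
    {"resource": {"resourceType": "Patient", "id": "p2", "gender": "male",
                  "name": [{"family": "Test", "given": ["Bob"]}]}}
  ]
}
"""


def patient_rows(bundle):
    """Flatten each Patient in a Bundle into one flat dict (one row per patient)."""
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Patient":
            continue
        # Patient.name repeats; this sketch keeps only the first occurrence.
        names = resource.get("name", [])
        name = names[0] if names else {}
        rows.append({
            "id": resource.get("id"),
            "family": name.get("family"),
            # HumanName.given is itself a list; join it into one string.
            "given": " ".join(name.get("given", [])),
            "gender": resource.get("gender"),
            "birthDate": resource.get("birthDate"),
        })
    return rows


rows = patient_rows(json.loads(BUNDLE_JSON))
for row in rows:
    print(row)
```

Each row has the same flat set of columns, which is the kind of shape a data frame expects. Missing elements (such as the second patient's birth date) come through as `None`.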
So I spent most of the time at the Unconference extending my graphQL implementation to allow reshaping of the data (in addition to its existing function of assembling and filtering the data). I defined 4 new directives:
- @flatten (see graphQL Issue)
- @first
- @singleton
- @slice(path)
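As a rough illustration of the idea behind two of these directives, the Python sketch below emulates them on a plain dictionary. It assumes that @first keeps only the first element of a repeating field, and that @flatten removes a field and promotes its children to the parent. The function names `first` and `flatten` are hypothetical, this is not the server implementation, and the authoritative semantics (including @singleton and @slice, which are not covered here) are in the write-up.

```python
def first(value):
    """Emulate the idea of @first: keep only the first element of a list."""
    if isinstance(value, list):
        return value[0] if value else None
    return value


def flatten(parent, field):
    """Emulate the idea of @flatten: drop `field`, promote its children to `parent`."""
    out = {k: v for k, v in parent.items() if k != field}
    child = parent.get(field)
    if isinstance(child, dict):
        out.update(child)
    elif isinstance(child, list):
        # Flattening a repeating field: each promoted child becomes a list of values.
        promoted = {}
        for item in child:
            for key, val in item.items():
                promoted.setdefault(key, []).append(val)
        out.update(promoted)
    return out


# Illustrative data only.
patient = {
    "id": "example",
    "name": [
        {"family": "Example", "given": ["Ann", "Marie"]},
        {"family": "Sample", "given": ["Ann"]},
    ],
}

# Roughly what `name @flatten @first { family given }` is assumed to produce.
first_only = dict(patient, name=first(patient["name"]))
flat_first = flatten(first_only, "name")
print(flat_first)

# Flattening without @first keeps every repetition, as lists.
flat_all = flatten(patient, "name")
print(flat_all)
```

The point of doing this on the server is that the client receives the already-flattened shape and does not need reshaping code of its own.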
I’ve documented the details of this over on the rOpenSci write up, along with examples of how they work. They don’t solve all data restructuring problems by a very long shot, but they do represent a very efficient and reusable way to shift the restructuring work to the server.
There was some interest at the unconference in taking my graphQL code and building it into an R package to allow graphQL query of graphs of data frames in R, to assemble, filter, and restructure them - it’s an idea that’s useful anytime you want to do analysis of a graph of data. But we were all too busy for that - perhaps another time.
## Where to now?
I think we should add R support to the AMIA FHIR datathon series, and maybe it’s time to encourage the research track at the main FHIR connectathons to try using R - I think it’s a very powerful tool to add to our FHIR toolkits.
Thanks to Adam Gruer from Melbourne’s Royal Children’s Hospital for helping - those graphs are his. Thanks also to the organisers - particularly Nick Tierney (gitmeister extraordinaire). I picked up some ideas from the Unconference that will improve the FHIR connectathons.