Question: How to store a CDA document in a relational database
Aug 16, 2011
A question (by the ask me a question link above): I am trying to design a RDBMS to store the CDA documents, I have couple of questions
- If I create tables using the HMD, is that a good approach to do that?
- How to store the data types eg. CD in database, for example create a table CS and link it to Act or any table where it is used.
Well, taking these questions in turn:
Should I create the tables using the HMD?
Umm, well, you can. Sort of. But the HMD is abstract. That means that it’s an in principle statement of the content of a CDA document, not an in practice statement. You have to account, in addition to what’s in the HMD, for nullFlavor, realmCode and templateId, though only the latter is commonly encountered. Also, there’s the mapping from the abstract data type defined in the HMD to the actual way the XML works (one day I’ll have time to do a post about the abstract / ITS data type mapping). And you have to bridge between these models in your parser. But you can do this - it’s just work. But is that the best way?
For me, no, I’d use the schema. It’s just more tangible and final, and there’s too many pitfalls for a non-hl7 guru to fall into creating the tables off the HMD. (well, to tell the truth, I used the MIF, which is kind of like the HMD, but very definitely an insider only way).
How to store the data types
Well, if you do it from the schema, that answers your question - whatever way the schema -> RDMS schemata tool does the conversion, and don’t fiddle with it. But if you insist on doing it any other way, then you’ll become an expert in the abstract/its mapping. As I said, it can be done. I personally kill the recursion in the CD data type, and normalise it - a table for code systems, a table for codesystem references, a table for concepts (original text + code system references). But whether this is good for you depends on what you’re tying to do.
More generally, the process of splitting a CDA document into a fully structured relational database - why would you do that? It’s hard to query the individual parts without looking at the context in which they appear (you kind of query for clinical statement parts as the smallest level of granularity, and think about things like moodCode). And are you going to break the narrative into database parts too?
Personally, I’d keep the documents whole, and build specific secondary indexes from them based on the particular pieces I want to do queries against.