By this week I had hoped to have started on my modelling, but it became pretty clear that in order to test and verify my hypotheses, I would need to be able to quickly look at and spot-check the data.
This means translating ICD-9-CM and ICD-10-CM codes into their human-readable nomenclature. To do this, we just need to head over to CMS.gov and download the code files. From there, it takes some pretty basic SQL commands to load them into some additional columns in our database.
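A minimal sketch of the loading step follows. It is written in Python with SQLite rather than the R and SQL setup used here. It assumes that each line of a downloaded CMS code file holds a code, then whitespace, then the description, so check that against the files you actually download. The `icd_lookup` table and the helper names are hypothetical.

```python
import sqlite3

def parse_code_file(lines):
    """Split lines like 'CODE   Description text' into (code, description) pairs.

    Assumption: the code comes first, followed by whitespace (spaces or a tab)
    and then the description. Blank lines are skipped.
    """
    rows = []
    for line in lines:
        parts = line.strip().split(None, 1)  # split once, on any whitespace
        if not parts:
            continue
        code = parts[0]
        description = parts[1].strip() if len(parts) > 1 else ""
        rows.append((code, description))
    return rows

def load_lookup(conn, rows):
    """Create (if needed) and fill the hypothetical icd_lookup table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS icd_lookup (code TEXT PRIMARY KEY, description TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO icd_lookup (code, description) VALUES (?, ?)", rows
    )
    conn.commit()

def describe(conn, code):
    """Return the human-readable description for a code, or None if unknown."""
    row = conn.execute(
        "SELECT description FROM icd_lookup WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None
```

Because `parse_code_file` accepts any iterable of strings, an open file object can be passed to it directly.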
(NOTE: It is a good idea to test your batch inserts on a small subset, lest you waste two nights thinking the script is working while it is, in fact, not.)
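Here is a hedged sketch of a batched update that can be pointed at a small subset first. The `discharges` table and its `dx1`/`dx1_desc` columns are hypothetical stand-ins for the real NIS tables, and the `limit` argument is how the "small subset" trial run would be done. The second function is an independent check. It counts the rows that actually received a description, rather than trusting that a running script is a working one.

```python
import sqlite3

def update_in_batches(conn, lookup, batch_size=500, limit=None):
    """Fill discharges.dx1_desc from a {code: description} dict in batches.

    `limit` caps how many codes are processed, so the same script can be
    tried on a small subset first. Returns the number of rows changed.
    """
    items = list(lookup.items())
    if limit is not None:
        items = items[:limit]
    changed = 0
    for start in range(0, len(items), batch_size):
        params = [(desc, code) for code, desc in items[start:start + batch_size]]
        cur = conn.executemany(
            "UPDATE discharges SET dx1_desc = ? WHERE dx1 = ?", params
        )
        conn.commit()  # commit per batch so finished work survives a crash
        changed += cur.rowcount  # sqlite3 sums rowcount across executemany
    return changed

def count_filled(conn):
    """Independent check: how many rows actually have a description now?"""
    return conn.execute(
        "SELECT COUNT(*) FROM discharges WHERE dx1_desc IS NOT NULL"
    ).fetchone()[0]
```

Running `update_in_batches(conn, lookup, limit=10)` followed by `count_filled(conn)` should show a nonzero count within seconds. If it does not, the problem surfaces long before the second night.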
In the meantime, while we’re waiting for the script to run, let’s pop open another RStudio instance and go back to the NIS database. When I sent my advisor my Week 5 Notebook, she asked if we could also display the breakout by sex. We sure can!
This first chart is striking. It appears that females contract C. diff at a much higher rate than their male counterparts: overall, males contract C. diff almost 30% less often than females. However, males also seem to contract it at younger ages. The blue bars mark the males’ first and third quartiles on either side, with the median in the middle; the red bars show the same for females. The two groups’ interquartile ranges are off by 3 years, and their medians by 4.
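The numbers behind the quartile bars can be reproduced with a small sketch in Python (the real chart was made in R). It assumes the records are `(sex, age)` pairs. Passing `method="inclusive"` makes `statistics.quantiles` interpolate the same way as R's default `quantile()` (type 7). The "30% less" figure is the kind of ratio `relative_difference` computes. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import quantiles

def age_summary(ages):
    """First quartile, median, third quartile and IQR of a list of ages.

    method="inclusive" matches R's default quantile() interpolation (type 7).
    Needs at least two ages per group.
    """
    q1, median, q3 = quantiles(ages, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}

def summarize_by_sex(records):
    """Group hypothetical (sex, age) records and summarize each group."""
    groups = defaultdict(list)
    for sex, age in records:
        groups[sex].append(age)
    return {sex: age_summary(ages) for sex, ages in groups.items()}

def relative_difference(smaller, larger):
    """Fraction by which `smaller` falls short of `larger` (0.3 means 30% less)."""
    return 1 - smaller / larger
```

Comparing `summary["M"]["median"]` with `summary["F"]["median"]` then gives the kind of median gap described above.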
Here we can see the sexes broken out by year, as we did in Week 5. Nothing terribly interesting here: both males and females seem to follow the overall pattern of expanding into younger age groups.
Finally, we look at the time series for each sex. Both have risen since 2001, but females seem to be on a higher trajectory. The jagged lines represent the data, while the smooth lines represent a LOESS regression (local smoothing), with the gray band representing the standard error.
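LOESS fits a separate weighted regression around every point, giving nearby observations more weight than distant ones. The sketch below is a simplified local linear version with tricube weights and a `span` of 0.75. R's `loess()` defaults to a local quadratic (degree 2) fit and can also supply the standard-error band, which this sketch omits. It only illustrates the idea and is not a drop-in replacement.

```python
import math

def loess(xs, ys, span=0.75):
    """Local linear LOESS smoother (degree 1, tricube weights).

    For each x0, fit a weighted straight line to roughly the span * n
    nearest points and return that line's value at x0.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    k = min(n, max(2, math.ceil(span * n)))  # points in each local window
    fitted = []
    for x0 in xs:
        dxs = [x - x0 for x in xs]  # center on x0 for numerical stability
        dists = [abs(d) for d in dxs]
        h = sorted(dists)[k - 1]  # bandwidth: distance to k-th nearest point
        weights = []
        for d in dists:
            if h == 0:
                w = 1.0 if d == 0 else 0.0
            elif d < h:
                w = (1 - (d / h) ** 3) ** 3  # tricube kernel
            else:
                w = 0.0
            weights.append(w)
        # Weighted least squares for y = a + b * dx; the fit at x0 is a.
        sw = sum(weights)
        sx = sum(w * d for w, d in zip(weights, dxs))
        sy = sum(w * y for w, y in zip(weights, ys))
        sxx = sum(w * d * d for w, d in zip(weights, dxs))
        sxy = sum(w * d * y for w, d, y in zip(weights, dxs, ys))
        denom = sw * sxx - sx * sx
        if denom <= 1e-12 * sw * sxx:
            fitted.append(sy / sw)  # degenerate window: weighted mean
        else:
            slope = (sw * sxy - sx * sy) / denom
            fitted.append((sy - slope * sx) / sw)
    return fitted
```

With yearly counts, `xs` would be the years and `ys` the counts for one sex. A smaller `span` follows the jagged series more closely, and a larger one smooths it more.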
I expect the update script to take a couple of days to run. RStudio occasionally crashes, so I started saving a cursor position to a flat file so I know where to resume, but this is a very time-consuming process.
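The checkpointing idea can be sketched as follows. This assumes the work is a list of items processed in order and that the cursor is an integer index stored in a text file. The file name and helper names are hypothetical. Writing the checkpoint every `every` items, rather than after each one, keeps file I/O from adding much to an already slow run. The trade-off is that up to `every` items may be redone after a crash, so each step should be safe to repeat.

```python
import os
import tempfile

def read_checkpoint(path):
    """Return the saved cursor position, or 0 if there is no checkpoint yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        text = f.read().strip()
    return int(text) if text else 0

def write_checkpoint(path, position):
    """Save the cursor via a temp file plus rename, so a crash mid-write
    cannot leave a truncated checkpoint behind."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(position))
    os.replace(tmp, path)

def process_with_checkpoints(items, handle, path, every=1000):
    """Call handle(item) for items[start:], where start is the saved cursor.

    The position is saved every `every` items and once at the end.
    Returns the position this run started from.
    """
    start = read_checkpoint(path)
    for i in range(start, len(items)):
        handle(items[i])
        if (i + 1) % every == 0:
            write_checkpoint(path, i + 1)
    write_checkpoint(path, len(items))
    return start

if __name__ == "__main__":
    # Demo: process 10 items, then pretend we crashed after item 6 and resume.
    path = os.path.join(tempfile.mkdtemp(), "cursor.txt")
    process_with_checkpoints(list(range(10)), lambda x: None, path, every=3)
    write_checkpoint(path, 6)
    redone = []
    start = process_with_checkpoints(list(range(10)), redone.append, path, every=3)
    print(start, redone)
```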
I do believe it will be worth it in the long run. Being able to spot-check analyses and predictions provides a sanity check that goes beyond the numbers alone.