At a glance

More exploratory data analysis

Since I’m still waiting on the NRD dataset, I thought I would do some further exploration of the NIS dataset, this time taking a closer look at the ages of the patients diagnosed with C. diff.

Simply plotting patient age, we get the following distribution.

We have an interquartile range of 25, with \(q_1\) = 57 and \(q_3\) = 82, and a mode of 82. The blue lines represent the interquartile range.

The mode is interesting here because it represents the peak of the distribution: 82 is the most common age at which C. diff is diagnosed in this dataset.

Because the distribution is highly left-skewed, the mean (\(\overline{x} \approx\) 67) is pulled down by the long tail of younger patients and is not very useful. In such skewed distributions, the median (median = 72) is a better summary of the typical age. The red line represents the median.
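As a minimal sketch of how these summary statistics can be computed, here is a version using only Python's standard library. The ages below are made-up toy values chosen to be left-skewed, not NIS records, and `summarize_ages` is a hypothetical helper name. The quartile method ("inclusive") is also an assumption, and other tools may interpolate slightly differently.

```python
import statistics

def summarize_ages(ages):
    """Return quartiles, IQR, median, mode, and mean for a list of ages."""
    # quantiles(n=4) returns the three cut points q1, q2 (median), q3.
    q1, _, q3 = statistics.quantiles(ages, n=4, method="inclusive")
    return {
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "median": statistics.median(ages),
        "mode": statistics.mode(ages),
        "mean": statistics.mean(ages),
    }

# Hypothetical toy sample (NOT real NIS data): a long tail of younger ages.
toy_ages = [23, 41, 55, 58, 63, 70, 73, 78, 81, 81, 81, 85, 90]
summary = summarize_ages(toy_ages)

# In a left-skewed sample we expect mean < median < mode.
print(summary)
```

On a left-skewed sample like this one, the mean falls below the median, which in turn falls below the mode. That is the same ordering seen in the real distribution above.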

Over time

But C. diff infections have changed over time. There are numerous reports that C. diff cases are on the rise (PDF), so the age distribution may well have shifted too. Let’s take a look at how the rise in C. diff over time is affecting various age groups.

Here we see a clear trend. C. diff once affected mostly people in their 70s and 80s, but the peak is getting fatter and widening out to people in their 50s and 60s.

Over time by age groups

Breaking this down further into 5-year age bins, we start to see some interesting trends. The 75+ age groups appear to be in at least a short-term decline, while all other age groups continue to rise.

Those particularly affected lie in the 50-75 age range. If the trends continue, it is possible that the 70-75 age group could overtake the groups between 75 and 85 as the predominantly affected group.
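The breakdown by year and 5-year age bin could be tabulated along these lines. This is only a sketch: the `(year, age)` record layout, the helper names `age_bin` and `count_by_year_and_bin`, and the toy records are all assumptions for illustration, not the actual NIS schema or data.

```python
from collections import Counter

def age_bin(age, width=5):
    """Label the bin an age falls in, e.g. 72 -> '70-75' (upper bound exclusive)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width}"

def count_by_year_and_bin(records, width=5):
    """Count cases per (year, age bin) from an iterable of (year, age) pairs."""
    counts = Counter()
    for year, age in records:
        counts[(year, age_bin(age, width))] += 1
    return counts

# Hypothetical toy records (NOT real NIS data): (discharge year, patient age).
toy_records = [
    (2001, 82), (2001, 77), (2001, 71),
    (2010, 82), (2010, 72), (2010, 74), (2010, 58),
]
counts = count_by_year_and_bin(toy_records)

# Print one line per (year, bin), sorted by year and then bin label.
for (year, label), n in sorted(counts.items()):
    print(year, label, n)
```

Plotting each bin's count against the year then gives one line per age group, which makes it easy to compare which groups are rising and which are declining.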

Next Steps

I hope to get the NRD dataset this week. It will require the same amount of preprocessing and importing as the NIS dataset, so I anticipate that will keep me busy for at least a week.

Until then, I will continue to examine the NIS dataset and look for any other nuggets of wisdom.

In particular, I would like to model the case counts for each age group as an ARIMA time series and forecast future trends.