Last week, I built a model that included all comorbidities, intending to use stepwise regression to trim away the unnecessary coefficients.
After more research on the topic, I found that stepwise regression is generally frowned upon in the statistics community. See "Stopping stepwise: Why stepwise and similar selection methods are bad, and what you should use" by Flom and Cassell.
One of the suggestions mentioned by Cassell is to have a fuller model, leaving in statistically insignificant variables. While this seems to contradict the idea of model parsimony, I can also see why blindly throwing different variables against the wall and seeing what sticks would probably be a bad idea, particularly in medicine.
However, renal failure as a significant comorbidity did catch my eye, and indeed, there is research on the relationship between renal failure and C. diff. When I talked with Dr. Vivekanandan, she suggested I break out renal failure by type.
In the ICD-9-CM codings, we have the following:
Code | Description |
---|---|
584 | Acute kidney failure |
584.5 | Acute kidney failure with lesion of tubular necrosis |
584.6 | Acute kidney failure with lesion of renal cortical necrosis |
584.7 | Acute kidney failure with lesion of renal medullary [papillary] necrosis |
584.8 | Acute kidney failure with other specified pathological lesion in kidney |
584.9 | Acute kidney failure, unspecified |
585 | Chronic kidney disease (ckd) |
585.1 | Chronic kidney disease, Stage I |
585.2 | Chronic kidney disease, Stage II (mild) |
585.3 | Chronic kidney disease, Stage III (moderate) |
585.4 | Chronic kidney disease, Stage IV (severe) |
585.5 | Chronic kidney disease, Stage V |
585.6 | End stage renal disease |
585.9 | Chronic kidney disease, unspecified |
586 | Renal failure, unspecified |
I grouped all acute kidney failure codes (584.x) into a single category. It is important to note that 585.6, end stage renal disease, is the point at which a patient goes on dialysis.
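The grouping described above can be sketched roughly as follows. This is only an illustration, not the code used for the analysis: the function name `renal_category` and the category labels are hypothetical, and it assumes each chronic kidney disease stage is kept as its own category. It also assumes codes may arrive with or without the decimal point, so it strips the point before matching.

```python
from typing import Optional

# Hypothetical labels for the chronic kidney disease codes in the table above.
# Keys are ICD-9-CM codes with the decimal point removed.
CKD_CATEGORIES = {
    "5851": "ckd_stage_1",
    "5852": "ckd_stage_2",
    "5853": "ckd_stage_3",
    "5854": "ckd_stage_4",
    "5855": "ckd_stage_5",
    "5856": "end_stage_renal_disease",
    "5859": "ckd_unspecified",
}


def renal_category(icd9_code: str) -> Optional[str]:
    """Map one ICD-9-CM diagnosis code to a renal category, or None."""
    # Accept both "584.5" and "5845" (assumption about the input format).
    code = icd9_code.strip().replace(".", "")
    if code.startswith("584"):
        # All acute kidney failure codes collapse into one category.
        return "acute_kidney_failure"
    if code in CKD_CATEGORIES:
        return CKD_CATEGORIES[code]
    if code == "585":
        # Bare three-digit code; treating it as unspecified is an assumption.
        return "ckd_unspecified"
    if code == "586":
        return "renal_failure_unspecified"
    return None  # Not one of the renal codes listed in the table.
```

Returning `None` for everything else keeps the renal indicators separate from the other comorbidity flags already in the model.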
Printed below are the fitting times.
## Time difference of 1.933551 mins
## Time difference of 2.106292 mins
## Time difference of 2.120834 mins
## Time difference of 2.819717 mins
## Time difference of 2.8135 mins
Now, we’ll plot each coefficient by year, showing its effect with a confidence interval and labeling it with its \(p\)-value to indicate whether it was significant (\(p < 0.05\)) or not.
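For readers who want to see where an interval and \(p\)-value of this kind come from, here is a minimal sketch of a Wald-style summary for a single coefficient. It assumes a normal reference distribution and a two-sided test; the function name `wald_summary` is hypothetical, and a survey-weighted fit may use a different reference distribution, so treat this as an approximation of what the modeling software reports.

```python
from statistics import NormalDist


def wald_summary(estimate: float, std_error: float, alpha: float = 0.05) -> dict:
    """Two-sided Wald test and confidence interval for one coefficient."""
    normal = NormalDist()  # standard normal, mean 0 and sd 1
    z = estimate / std_error
    # Two-sided p-value: probability of a |z| at least this large under the null.
    p_value = 2.0 * (1.0 - normal.cdf(abs(z)))
    # Critical value, roughly 1.96 for a 95% interval.
    crit = normal.inv_cdf(1.0 - alpha / 2.0)
    return {
        "z": z,
        "p_value": p_value,
        "lower": estimate - crit * std_error,
        "upper": estimate + crit * std_error,
        "significant": p_value < alpha,
    }
```

A coefficient is flagged as significant at the 0.05 level exactly when its 95% interval excludes zero, which is why the plot can show both pieces of information consistently.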
Most interesting here is how consistently large and statistically significant the effect of end stage renal disease (dialysis) is on 90-day readmissions.
The influence of age as a predictor seems to be declining almost linearly over the years. I believe this is because the age distribution of C. diff patients is becoming less skewed and more platykurtic over time. This can be seen in the chart below.
Here we see that the C. diff age distribution is becoming less left-skewed and more platykurtic (flatter) over the years, meaning younger people make up a growing share of C. diff cases. This would explain the decline in the age coefficient as a predictor.
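The two shape measures mentioned above can be computed per year to back up what the chart shows. The following is a minimal sketch using the plain moment-based definitions, ignoring survey weights (a simplification); the function names are hypothetical. Negative skewness indicates a left-skewed distribution, and negative excess kurtosis indicates a platykurtic one.

```python
from statistics import fmean


def central_moment(values, k):
    """k-th central moment of a sequence of numbers (population form)."""
    mu = fmean(values)
    return fmean([(v - mu) ** k for v in values])


def skewness(values):
    """Moment-based skewness: negative means a longer left tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Kurtosis minus 3: negative means flatter than a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0
```

If the interpretation in the text holds, skewness computed on patient ages should move toward zero from below across years while excess kurtosis drifts downward.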
Now, why did I model each year separately? Two reasons. First, in the NRD, each year is considered a separate sample. Second, when I tried running a regression on all of the data at once, it ate up all of my resources and locked up my machine. I may try this again, using the year as a regression variable, but I’ll need to find a way to get it to actually complete. However, I feel I am still justified in running separate models for each year, provided I keep the model specification constant across years and explicitly mention this.
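The idea of holding the model constant while fitting one year at a time can be sketched as below. This is an illustration only: `fit_by_year` and `readmission_rate` are hypothetical names, the record fields `year` and `readmitted` are assumed, and `readmission_rate` is a trivial stand-in for the real regression fit so that the example stays self-contained.

```python
from collections import defaultdict


def fit_by_year(records, fit, year_key="year"):
    """Apply the same fitting function to each year's records separately."""
    by_year = defaultdict(list)
    for rec in records:
        by_year[rec[year_key]].append(rec)
    # One result per year, always produced by the identical `fit` callable,
    # so differences between years cannot come from a changed specification.
    return {year: fit(rows) for year, rows in sorted(by_year.items())}


def readmission_rate(rows):
    """Stand-in for a real model fit: the share of rows flagged as readmitted."""
    return sum(r["readmitted"] for r in rows) / len(rows)


# Toy records, made up for illustration only.
toy = [
    {"year": 2011, "readmitted": 1},
    {"year": 2010, "readmitted": 0},
    {"year": 2010, "readmitted": 1},
    {"year": 2011, "readmitted": 1},
]
rates = fit_by_year(toy, readmission_rate)
```

Because only one year's records are handed to `fit` at a time, the fitting step never has to work on the full multi-year dataset at once, which is the resource problem described above.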
Finally, this was only for 90-day readmissions because the other datasets were still being built, but I now have 60- and 30-day readmission datasets and will be running the same models on those for my paper.
I believe I have everything I need to begin my paper, and with less than a month to go, this will be my last update, as the rest of my time will be spent compiling my paper and presentation. I will reach out with questions or for opinions as needed.