Having been unable to find FMTs in the NRD dataset, we need to take a different route. If we can’t examine the effect of FMTs on readmissions, what can we look at?
We have a rich dataset of C. diff index admissions within the NRD, including whether each patient was readmitted within \(d\) days. Each index admission comes with a primary diagnosis and up to 29 secondary diagnoses. However, the diagnoses do not come with a present-on-admission indicator, so we cannot tell directly which ICD-9-CM codes were present on admission and which were acquired during the hospital stay. The NRD does, however, provide us with a list of 29 comorbidity indicators:
# | Column | Description |
---|---|---|
1 | CM_AIDS | Acquired immune deficiency syndrome |
2 | CM_ALCOHOL | Alcohol abuse |
3 | CM_ANEMDEF | Deficiency anemias |
4 | CM_ARTH | Rheumatoid arthritis/collagen vascular diseases |
5 | CM_BLDLOSS | Chronic blood loss anemia |
6 | CM_CHF | Congestive heart failure |
7 | CM_CHRNLUNG | Chronic pulmonary disease |
8 | CM_COAG | Coagulopathy |
9 | CM_DEPRESS | Depression |
10 | CM_DM | Diabetes, uncomplicated |
11 | CM_DMCX | Diabetes with chronic complications |
12 | CM_DRUG | Drug abuse |
13 | CM_HTN_C | Hypertension, uncomplicated and complicated |
14 | CM_HYPOTHY | Hypothyroidism |
15 | CM_LIVER | Liver disease |
16 | CM_LYMPH | Lymphoma |
17 | CM_LYTES | Fluid and electrolyte disorders |
18 | CM_METS | Metastatic cancer |
19 | CM_NEURO | Other neurological disorders |
20 | CM_OBESE | Obesity |
21 | CM_PARA | Paralysis |
22 | CM_PERIVASC | Peripheral vascular disorders |
23 | CM_PSYCH | Psychoses |
24 | CM_PULMCIRC | Pulmonary circulation disorders |
25 | CM_RENLFAIL | Renal failure |
26 | CM_TUMOR | Solid tumor without metastasis |
27 | CM_ULCER | Peptic ulcer disease excluding bleeding |
28 | CM_VALVE | Valvular disease |
29 | CM_WGHTLOSS | Weight loss |
We can examine which of these indicators, along with other factors, have the greatest effect on readmissions.
Most of the variables are categorical. Before we build a model, we need to assess the independent variables for multicollinearity. We can use \(\chi^2\) tests of independence coupled with Cramér’s V, also known as Cramér’s phi (\(\phi_c\)), which rescales the \(\chi^2\) statistic to a measure of association between 0 and 1.
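As a minimal sketch of the statistic itself (not the code used for this analysis, which was run separately), Cramér’s V for a contingency table of counts can be computed as \(V = \sqrt{\chi^2 / (n \cdot \min(r - 1, c - 1))}\), where \(n\) is the total count and \(r\) and \(c\) are the numbers of rows and columns. The function name `cramers_v` and the toy tables are assumptions made for illustration only, and no continuity correction is applied.

```python
from math import sqrt


def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows of counts.

    Illustrative helper (hypothetical name); uses the plain Pearson chi-square
    statistic with no continuity correction.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if min(len(row_totals), len(col_totals)) < 2:
        raise ValueError("need at least a 2 x 2 table")
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one count")

    # Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # Rescale by n and the smaller table dimension so the result lies in [0, 1].
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


# Toy tables, not NRD data.
print(cramers_v([[50, 0], [0, 50]]))    # perfectly associated -> 1.0
print(cramers_v([[25, 25], [25, 25]]))  # independent -> 0.0
```

Applying a function like this to every pair of categorical variables and keeping the pairs above a chosen cutoff gives a table of the kind shown next.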
Selecting only the pairs with a V of 0.70 or greater, we find:
Column A | Column B | Cramer’s V |
---|---|---|
hosp_ur_teach_metro | pay_nc | 0.9378910 |
hosp_ur_teach_metro | pay_other | 0.8950175 |
hosp_ur_teach_metro | pay_self | 0.8911674 |
hosp_hcontrl_priv_np | pay_nc | 0.7979664 |
hosp_ur_teach_metro | pay_medicaid | 0.7779238 |
hosp_hcontrl_priv_np | pay_other | 0.7508931 |
cm_htn_c | cm_ulcer | 0.7496489 |
cm_lytes | cm_ulcer | 0.7473666 |
hosp_hcontrl_priv_np | pay_self | 0.7467059 |
cm_htn_c | pay_nc | 0.7418716 |
cm_lytes | pay_nc | 0.7395715 |
female | cm_ulcer | 0.7318469 |
female | pay_nc | 0.7239267 |
female | cm_aids | 0.7222253 |
cm_htn_c | cm_lymph | 0.7085926 |
Pay type seems to correlate strongly with the hospital urban/rural teaching indicator (hosp_ur_teach_metro) and the hospital control indicator (hosp_hcontrl_priv_np). We are interested in the hospital types more than the pay types, so we’ll remove the pay types from the model.
That leaves us with a few comorbidities and the female indicator being correlated.
Column A | Column B | Cramer’s V |
---|---|---|
cm_htn_c | cm_ulcer | 0.7496489 |
cm_lytes | cm_ulcer | 0.7473666 |
female | cm_ulcer | 0.7318469 |
female | cm_aids | 0.7222253 |
cm_htn_c | cm_lymph | 0.7085926 |
If we dig into this a bit, we find that cm_aids accounts for 0.4575544% of the dataset, cm_lymph accounts for 1.841609%, and cm_ulcer accounts for 0.0727784%. With prevalences this low, any correlation should have minimal impact, and we will probably end up removing these indicators from the model altogether, along with the pay types.
Now, we’ll fit the full model with all variables of interest for each year and print out the fitting times below.
## [1] "Year: 2010"
## Time difference of 3.080439 mins
## [1] "Year: 2011"
## Time difference of 3.532216 mins
## [1] "Year: 2012"
## Time difference of 3.714203 mins
## [1] "Year: 2013"
## Time difference of 4.297769 mins
## [1] "Year: 2014"
## Time difference of 4.473339 mins
## Time difference of 19.11628 mins
Now, we’ll plot each coefficient by year, showing its effect with a confidence interval and labeling it with its \(p\)-value to indicate whether it was significant (\(p < 0.05\)) or not.
At this point, we must do some model selection. We will strive for parsimony while still trying to explain as much of the variance as possible. To do this, we will remove the least significant variable and refit the model. We do this iteratively (step-wise) until there are only significant coefficients left.
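The elimination loop described above can be sketched as follows. This is only an outline under stated assumptions: `backward_eliminate` and `fit_pvalues` are hypothetical names, `fit_pvalues` stands in for whatever routine refits the model on a given set of variables and returns one \(p\)-value per variable, and the toy \(p\)-values are invented placeholders rather than results from the NRD models.

```python
def backward_eliminate(variables, fit_pvalues, alpha=0.05):
    """Drop the least significant variable and refit until all p-values < alpha.

    fit_pvalues is a hypothetical callable: given a list of variable names, it
    fits the model on those variables and returns a dict {name: p_value}.
    """
    kept = list(variables)
    dropped = []
    while kept:
        pvalues = fit_pvalues(kept)  # refit on the current variable set
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] < alpha:
            break  # every remaining coefficient is significant
        kept.remove(worst)
        dropped.append(worst)
    return kept, dropped


# Stand-in fitter with fixed, made-up p-values (a real fitter would refit the
# model each time, so p-values would change between iterations).
def toy_fit(names):
    toy = {"x1": 0.001, "x2": 0.40, "x3": 0.08}
    return {name: toy[name] for name in names}


kept, dropped = backward_eliminate(["x1", "x2", "x3"], toy_fit)
print(kept, dropped)  # ['x1'] ['x2', 'x3']
```

Removing one variable at a time matters because \(p\)-values shift after every refit; a variable that looks insignificant in the full model may become significant once a correlated variable has been dropped.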
Model selection, analysis, and interpretation will be the goal for next week.