At a glance

Contingency plan

Having been unable to find FMTs (fecal microbiota transplants) in the NRD dataset, we need to take a different route. If we can’t look at the effect of FMTs on readmissions, what can we look at?

We have a rich dataset of C. diff index admissions within the NRD, along with whether each was readmitted within \(d\) days. Each index admission comes with a primary diagnosis and up to 29 secondary diagnoses. However, these do not come with a present-on-admission indicator, so we cannot tell directly which ICD-9-CM codes were recorded at admission and which were acquired during the hospital stay. The NRD does, however, provide a list of 29 comorbidity indicators:

   Column        Description
1  CM_AIDS       Acquired immune deficiency syndrome
2  CM_ALCOHOL    Alcohol abuse
3  CM_ANEMDEF    Deficiency anemias
4  CM_ARTH       Rheumatoid arthritis/collagen vascular diseases
5  CM_BLDLOSS    Chronic blood loss anemia
6  CM_CHF        Congestive heart failure
7  CM_CHRNLUNG   Chronic pulmonary disease
8  CM_COAG       Coagulopathy
9  CM_DEPRESS    Depression
10 CM_DM         Diabetes, uncomplicated
11 CM_DMCX       Diabetes with chronic complications
12 CM_DRUG       Drug abuse
13 CM_HTN_C      Hypertension, uncomplicated and complicated
14 CM_HYPOTHY    Hypothyroidism
15 CM_LIVER      Liver disease
16 CM_LYMPH      Lymphoma
17 CM_LYTES      Fluid and electrolyte disorders
18 CM_METS       Metastatic cancer
19 CM_NEURO      Other neurological disorders
20 CM_OBESE      Obesity
21 CM_PARA       Paralysis
22 CM_PERIVASC   Peripheral vascular disorders
23 CM_PSYCH      Psychoses
24 CM_PULMCIRC   Pulmonary circulation disorders
25 CM_RENLFAIL   Renal failure
26 CM_TUMOR      Solid tumor without metastasis
27 CM_ULCER      Peptic ulcer disease excluding bleeding
28 CM_VALVE      Valvular disease
29 CM_WGHTLOSS   Weight loss

We can examine which of these indicators, along with other factors, are associated with higher readmission rates.
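Because the comorbidity indicators all share the `cm_` prefix, they are easy to isolate programmatically. A minimal Python sketch (the record layout and column names here are hypothetical stand-ins; the analysis itself appears to be in R):

```python
# Hypothetical record: lower-case NRD-style column names plus a readmission flag.
record = {
    "cm_aids": 0, "cm_alcohol": 1, "cm_anemdef": 0,
    "female": 1, "pay_nc": 0, "readmit": 1,
}

# The comorbidity indicators all share the cm_ prefix, so they are easy to isolate.
cm_cols = sorted(k for k in record if k.startswith("cm_"))
print(cm_cols)  # → ['cm_aids', 'cm_alcohol', 'cm_anemdef']
```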

Checking for independence

Most of the variables are categorical. Before we build a model, we need to assess the independent variables for multicollinearity. We can use \(\chi^2\) tests of independence coupled with Cramér’s V (\(\phi_c\)), also known as Cramér’s contingency coefficient or Cramér’s phi.
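As a sketch of the computation (in pure Python for self-containedness; the original analysis appears to be in R), Cramér’s V is derived from the \(\chi^2\) statistic of the contingency table: \(\phi_c = \sqrt{\chi^2 / (n\,(k - 1))}\), where \(k\) is the smaller table dimension.

```python
from collections import Counter
from math import sqrt

def cramers_v(x, y):
    """Cramér's V for two equal-length categorical vectors (each with 2+ levels)."""
    n = len(x)
    joint = Counter(zip(x, y))        # observed cell counts
    mx, my = Counter(x), Counter(y)   # marginal counts
    chi2 = 0.0
    for a in mx:
        for b in my:
            expected = mx[a] * my[b] / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(mx), len(my))         # smaller table dimension
    return sqrt(chi2 / (n * (k - 1)))

# Identical vectors are perfectly associated; orthogonal ones are independent.
print(cramers_v([0, 0, 1, 1], [0, 0, 1, 1]))  # → 1.0
print(cramers_v([0, 0, 1, 1], [0, 1, 0, 1]))  # → 0.0
```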

Selecting only those pairs with a V of 0.70 or greater, we find:

Column A               Column B       Cramér’s V
hosp_ur_teach_metro    pay_nc         0.9378910
hosp_ur_teach_metro    pay_other      0.8950175
hosp_ur_teach_metro    pay_self       0.8911674
hosp_hcontrl_priv_np   pay_nc         0.7979664
hosp_ur_teach_metro    pay_medicaid   0.7779238
hosp_hcontrl_priv_np   pay_other      0.7508931
cm_htn_c               cm_ulcer       0.7496489
cm_lytes               cm_ulcer       0.7473666
hosp_hcontrl_priv_np   pay_self       0.7467059
cm_htn_c               pay_nc         0.7418716
cm_lytes               pay_nc         0.7395715
female                 cm_ulcer       0.7318469
female                 pay_nc         0.7239267
female                 cm_aids        0.7222253
cm_htn_c               cm_lymph       0.7085926

Pay type seems to correlate strongly with the hospital urban/rural and teaching indicators. Since we are more interested in hospital type than in pay type, we’ll remove the pay types from the model.

That leaves us with a few correlated pairs among the comorbidity indicators and the female indicator.

Column A   Column B   Cramér’s V
cm_htn_c   cm_ulcer   0.7496489
cm_lytes   cm_ulcer   0.7473666
female     cm_ulcer   0.7318469
female     cm_aids    0.7222253
cm_htn_c   cm_lymph   0.7085926

If we dig into this a bit, we find that cm_aids accounts for 0.4575544% of the dataset, cm_lymph for 1.841609%, and cm_ulcer for 0.0727784%, so any correlation should have minimal impact, and we will probably end up removing these variables from the model altogether, along with the pay types.
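The prevalence figures above are just the percentage of admissions with each binary indicator set. A minimal sketch (toy data, not the actual NRD counts):

```python
def prevalence(flags):
    """Share of admissions with a binary indicator set, as a percentage."""
    return 100.0 * sum(flags) / len(flags)

# Toy data: 1 flagged admission out of 200 → 0.5%, the same order of
# magnitude as cm_aids in the real dataset.
print(prevalence([1] + [0] * 199))  # → 0.5
```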

Now, we’ll fit the full model, with all variables of interest, for each year and print the fitting times below (the final line is the total).

## [1] "Year: 2010"
## Time difference of 3.080439 mins
## [1] "Year: 2011"
## Time difference of 3.532216 mins
## [1] "Year: 2012"
## Time difference of 3.714203 mins
## [1] "Year: 2013"
## Time difference of 4.297769 mins
## [1] "Year: 2014"
## Time difference of 4.473339 mins
## Time difference of 19.11628 mins
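The per-year fits above appear to be R glm calls on the full NRD extract. As a self-contained illustration of the timed per-year loop, here is a minimal gradient-descent logistic fit in Python; `fit_logistic` and the toy data are stand-ins, not the actual model:

```python
import math
import time

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Minimal gradient-descent logistic regression; returns [intercept, *weights]."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# Stand-in for one year's index admissions: one indicator -> readmission flag.
X, y = [[0], [0], [1], [1]], [0, 0, 1, 1]
for year in (2010, 2011):
    start = time.perf_counter()
    coefs = fit_logistic(X, y)
    print(f"Year: {year}, fit in {time.perf_counter() - start:.3f} s")
```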

Now, we’ll plot each coefficient by year, showing its effect with a confidence interval and labeled with its \(p\)-value, indicating whether or not it was significant (\(p < 0.05\)).

At this point, we must do some model selection. We will strive for parsimony while still trying to explain as much of the variance as possible. To do this, we will remove the least significant variable and refit the model, iterating (step-wise backward elimination) until only significant coefficients remain.
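The elimination loop itself is independent of the fitting machinery. A sketch, where `pvalues_fn` stands in for refitting the model and extracting coefficient \(p\)-values (the variable names and mock \(p\)-values below are illustrative only):

```python
def backward_eliminate(variables, pvalues_fn, alpha=0.05):
    """Drop the least significant variable and refit until all are significant.

    pvalues_fn takes the current variable list, refits the model, and returns
    a {variable: p-value} dict; here it is a stand-in for the real refit.
    """
    kept = list(variables)
    while kept:
        pvals = pvalues_fn(kept)
        worst = max(kept, key=pvals.get)  # least significant remaining variable
        if pvals[worst] < alpha:
            break                          # everything left is significant
        kept.remove(worst)
    return kept

# Mock p-values: 'cm_obese' never reaches significance, so it is dropped first.
mock = {"female": 0.01, "cm_chf": 0.03, "cm_obese": 0.40}
print(backward_eliminate(mock, lambda kept: {v: mock[v] for v in kept}))
# → ['female', 'cm_chf']
```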

This will be the goal for next week.

Next steps

Model selection, analysis, and interpretation.