At a glance

No longer looking for FMTs
Evaluating other variables of interest and looking for strong predictors, notably renal failure comorbidities
Began model fitting and evaluation

Contingency plan

Having been unable to find FMTs in the NRD dataset, we need to take a different route. If we can’t examine the effect of FMTs on readmissions, what can we examine?

We have a rich dataset of C. diff index admissions within the NRD, along with whether each patient was readmitted within \(d\) days. Each index admission comes with a primary diagnosis and up to 29 secondary diagnoses. However, the diagnoses do not come with a present-on-admission indicator, so we cannot tell directly which ICD-9-CM codes were recorded on admission and which were acquired during the hospital stay. The NRD does, however, provide us with a list of 29 comorbidity indicators:

	Column	Description
1	CM_AIDS	Acquired immune deficiency syndrome
2	CM_ALCOHOL	Alcohol abuse
3	CM_ANEMDEF	Deficiency anemias
4	CM_ARTH	Rheumatoid arthritis/collagen vascular diseases
5	CM_BLDLOSS	Chronic blood loss anemia
6	CM_CHF	Congestive heart failure
7	CM_CHRNLUNG	Chronic pulmonary disease
8	CM_COAG	Coagulopathy
9	CM_DEPRESS	Depression
10	CM_DM	Diabetes, uncomplicated
11	CM_DMCX	Diabetes with chronic complications
12	CM_DRUG	Drug abuse
13	CM_HTN_C	Hypertension, uncomplicated and complicated
14	CM_HYPOTHY	Hypothyroidism
15	CM_LIVER	Liver disease
16	CM_LYMPH	Lymphoma
17	CM_LYTES	Fluid and electrolyte disorders
18	CM_METS	Metastatic cancer
19	CM_NEURO	Other neurological disorders
20	CM_OBESE	Obesity
21	CM_PARA	Paralysis
22	CM_PERIVASC	Peripheral vascular disorders
23	CM_PSYCH	Psychoses
24	CM_PULMCIRC	Pulmonary circulation disorders
25	CM_RENLFAIL	Renal failure
26	CM_TUMOR	Solid tumor without metastasis
27	CM_ULCER	Peptic ulcer disease excluding bleeding
28	CM_VALVE	Valvular disease
29	CM_WGHTLOSS	Weight loss

We can examine which of these indicators, along with other factors, have the strongest effect on readmissions.

Checking for independence

Most of the variables are categorical. Before we build a model, we need to assess the independent variables for multicollinearity. We can use \(\chi^2\) tests for independence coupled with Cramér’s contingency coefficient, also known as Cramér’s phi (\(\phi_c\)) or Cramér’s V.
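For reference, Cramér’s V for an \(r \times c\) contingency table with \(n\) observations is \(V = \sqrt{\chi^2 / (n \cdot \min(r - 1, c - 1))}\). The sketch below is a minimal, dependency-free Python illustration of that formula and of a pairwise screen over a set of columns. It is not the code used to produce the results in this report, the names `cramers_v` and `high_v_pairs` are hypothetical, and it applies no continuity or bias correction.

```python
import math
from collections import Counter
from itertools import combinations


def cramers_v(xs, ys):
    """Cramér's V for two equal-length sequences of categorical values."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    observed = Counter(zip(xs, ys))  # missing cells count as 0
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k < 1:
        return float("nan")  # undefined when either variable is constant
    # Pearson chi-square statistic, uncorrected
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))


def high_v_pairs(columns, threshold):
    """Return (name_a, name_b, V) for every column pair with V >= threshold.

    `columns` is assumed to be a dict mapping column name to a list of values.
    Pairs are sorted from strongest to weakest association.
    """
    pairs = []
    for a, b in combinations(sorted(columns), 2):
        v = cramers_v(columns[a], columns[b])
        if v >= threshold:  # NaN compares False, so constant columns drop out
            pairs.append((a, b, v))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Taking the threshold as an explicit argument keeps the cutoff a visible analysis decision rather than a hidden default.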

Selecting only those pairs with a V of 0.70 or greater, we find:

Column A	Column B	Cramer’s V
hosp_ur_teach_metro	pay_nc	0.9378910
hosp_ur_teach_metro	pay_other	0.8950175
hosp_ur_teach_metro	pay_self	0.8911674
hosp_hcontrl_priv_np	pay_nc	0.7979664
hosp_ur_teach_metro	pay_medicaid	0.7779238
hosp_hcontrl_priv_np	pay_other	0.7508931
cm_htn_c	cm_ulcer	0.7496489
cm_lytes	cm_ulcer	0.7473666
hosp_hcontrl_priv_np	pay_self	0.7467059
cm_htn_c	pay_nc	0.7418716
cm_lytes	pay_nc	0.7395715
female	cm_ulcer	0.7318469
female	pay_nc	0.7239267
female	cm_aids	0.7222253
cm_htn_c	cm_lymph	0.7085926

Pay type seems to correlate strongly with the hospital urban/rural and teaching indicators. We are interested in the hospital types more than the pay types, so we’ll remove the pay types from the model.

That leaves us with a few comorbidities and the female indicator being correlated.

Column A	Column B	Cramer’s V
cm_htn_c	cm_ulcer	0.7496489
cm_lytes	cm_ulcer	0.7473666
female	cm_ulcer	0.7318469
female	cm_aids	0.7222253
cm_htn_c	cm_lymph	0.7085926

If we dig into this a bit, we find that cm_aids accounts for 0.4575544% of the dataset, cm_lymph accounts for 1.841609%, and cm_ulcer accounts for 0.0727784%. Because these indicators are so rare, any correlation should have minimal impact, and we will probably end up removing them from the model altogether, along with the pay types.
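The prevalence figures above are simply the share of index admissions with each indicator set. A minimal Python sketch of that arithmetic follows, assuming each indicator is a sequence of 0/1 values. The names `prevalence_pct` and `rare_indicators` are hypothetical, and the cutoff is a parameter the caller must choose, not a value taken from this report.

```python
def prevalence_pct(indicator):
    """Percentage of records where a 0/1 indicator equals 1."""
    values = list(indicator)
    if not values:
        raise ValueError("indicator is empty")
    return 100.0 * sum(1 for v in values if v == 1) / len(values)


def rare_indicators(columns, cutoff_pct):
    """Names of indicator columns whose prevalence is below cutoff_pct.

    `columns` is assumed to be a dict mapping column name to a list of 0/1 values.
    """
    return sorted(
        name for name, values in columns.items()
        if prevalence_pct(values) < cutoff_pct
    )
```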

Now, we’ll run the full model with all variables of interest for each year and print out the fitting times below.

## [1] "Year: 2010"
## Time difference of 3.080439 mins
## [1] "Year: 2011"
## Time difference of 3.532216 mins
## [1] "Year: 2012"
## Time difference of 3.714203 mins
## [1] "Year: 2013"
## Time difference of 4.297769 mins
## [1] "Year: 2014"
## Time difference of 4.473339 mins

## Time difference of 19.11628 mins

Now, we’ll plot each coefficient by year, showing its effect with a confidence interval and labeled with its \(p\)-value, indicating whether it was significant (\(p < 0.05\)) or not.

At this point, we must do some model selection. We will strive for parsimony while still trying to explain as much of the variance as possible. To do this, we will remove the least significant variable (the one with the largest \(p\)-value) and refit the model. We do this iteratively (step-wise) until only significant coefficients are left.

This will be the goal for next week.
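The step-wise procedure described above can be sketched as a small loop. This is a Python illustration under stated assumptions, not the planned implementation. The `fit_pvalues` callable is hypothetical: it stands in for whatever routine refits the model on a given list of variables and returns one \(p\)-value per variable. The default `alpha` of 0.05 matches the significance level used in this report.

```python
def backward_eliminate(fit_pvalues, variables, alpha=0.05):
    """Drop the least significant variable and refit until all are significant.

    fit_pvalues: hypothetical callable taking a list of variable names and
                 returning a dict of {variable: p_value} for a refit model.
    Returns (kept, dropped), with `dropped` in order of removal.
    """
    remaining = list(variables)
    dropped = []
    while remaining:
        pvalues = fit_pvalues(remaining)  # refit on the current variable set
        worst = max(remaining, key=lambda v: pvalues[v])
        if pvalues[worst] < alpha:
            break  # every remaining variable is significant
        remaining.remove(worst)
        dropped.append(worst)
    return remaining, dropped
```

The sketch assumes one \(p\)-value per variable. A categorical variable with several levels produces several coefficients, so it would first need a rule for combining them into a single value.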

Next steps

Model selection, analysis, and interpretation.

Grad Project: Week 12

Brian Detweiler

April 1, 2018
