A Longitudinal Study of the Effect of Renal Failure on Readmission Rates of Patients with Clostridium Difficile

Brian Detweiler

May 4, 2018

Hello and thank you for coming!

Alternative titles

  • How to lose friends and infect people
  • Epidemiology! For fun and for profit
  • C. diff in 60 Minutes

  • Longitudinal
    Study
    On the Effects of
    Renal Failure
    On Readmission Rates of Patients
    With Clostridium Difficile

About me

  • University of Nebraska, Omaha
    • B.S. Computer Science and Mathematics (2009)
    • M.S. Mathematics, Data Science (May, 2018)
  • Software Engineer (2004-present)
  • Flight Operations, U.S. Army National Guard (2000-2009)
    Army National Guard
    Aurora Cooperative

Agenda

  1. Introduction and motivation
  2. C. diff and renal failure
  3. The data
  4. Methodology
  5. Results

Introduction and motivation

Why are we here?

Pr(You)

  • \(Pr(\text{boy meets girl}) = \frac{1}{20000}\)
  • \(Pr(\text{same boy knocks up same girl}) = \frac{1}{2000}\)
  • \(Pr(\text{right sperm meets right egg})\) = 1 in 400 quadrillion
  • \(Pr(\text{lineage})\): 1 in \(10^{45000}\)
  • \(Pr(\text{you})\) = 1 in \(10^{2685000}\)

So what’s the probability of your existing? It’s the probability of 2 million people getting together – about the population of San Diego – each to play a game of dice with trillion-sided dice. They each roll the dice, and they all come up the exact same number – say, 550,343,279,001.

A miracle is an event so unlikely as to be almost impossible. By that definition, I’ve just shown that you are a miracle.

Binazir, A. What are the chances of your coming into being? (2011)

Data Science

Data Science Venn Diagram

Big Data

  • Volume
    • How big is BIG? Gigabytes? Petabytes? Exabytes?
  • Velocity
  • Variety
    • Plain text, XML, JSON, video, audio, etc.
  • And sometimes veracity
    • Questionable data quality

Finding a project

  • 🚫 Union Pacific PTC data (2016-2017) - SCRAPPED
  • ✅ HCUP project through Creighton

C. diff and renal failure

C. diff

  • Clostridium difficile
  • Gram-positive, anaerobic, rod-shaped, endospore-forming bacterium
  • Has surpassed MRSA as the most common nosocomial (hospital-acquired) infection

Clostridium difficile bacillus

Scientific Classification

Kingdom  Bacteria
Phylum   Firmicutes
Class    Clostridia
Order    Clostridiales
Family   Clostridiaceae
Genus    Clostridium
Species  C. difficile

Where it lives

  • Intestinal tract of healthy people
  • Soil
  • Water
  • Feces of infected animals and humans
  • Surfaces for up to 5 months

Signs / Symptoms

  • Diarrhea
  • Fever
  • Nausea
  • Abdominal pain
  • Pseudomembranous colitis
  • Toxic megacolon
  • Perforation of the colon
  • Sepsis

CDI risk

Clostridium Difficile risk

CDC

How CDI spreads

Clostridium Difficile risk

CDC

CDI treatments

  • Antibiotics
    • Flagyl (metronidazole) - Cheap, no longer recommended
    • Vanco (vancomycin) - Expensive but effective
    • Dificid (fidaxomicin) - Most expensive, most effective
      Louie, T., et al. Fidaxomicin versus Vancomycin for Clostridium difficile Infection. (2011)
  • Fecal Microbiota Transplants (FMT)
    • via NG tube or colonoscopy (similar outcomes)
      Postigo, R. and Kim, J.H. Colonoscopic versus nasogastric fecal transplantation for the treatment of Clostridium difficile infection: a review and pooled analysis. (2012)
    • via OpenBiome capsules
  • Probiotics are not recommended

Renal (kidney) disease

  • Acute kidney injury (AKI)
  • Chronic kidney disease (CKD)
    • Stages 1-4
    • Stage 5 - End-stage renal disease (ESRD)
      • Dialysis or transplant

Renal disease signs / symptoms

  • Nausea
  • Vomiting
  • Loss of appetite
  • Fatigue and weakness
  • Sleep problems
  • Changes in urine volume
  • …much more

Renal disease treatments

  • Treat the underlying cause

AKI Causes

  • Decreased blood flow
  • Direct damage to kidneys
  • Urinary tract blockage

AKI Risks

  • Hospitalization
  • Advanced age
  • Blood vessel blockage in arms/legs
  • Diabetes
  • High blood pressure
  • Heart failure
  • Kidney diseases
  • Liver diseases

CKD Causes

  • Type I/II diabetes
  • High blood pressure
  • …much more

CKD risk

  • Cardiovascular disease
  • Smoking
  • Obesity
  • Race
    • African-, Native-, or Asian-American
  • Family history of kidney disease
  • Abnormal kidney structure
  • Older age

Measuring kidney function

  • Glomerular Filtration Rate (GFR)
  • MDRD

    \[ GFR = 175 \times S_{cr}^{-1.154} \times \text{Age}^{-0.203} \times 0.742 \cdot I(\text{F}) \times 1.212 \cdot I(\text{AA}) \]

  • CKD-EPI

    \[ GFR = 141 \times min\bigg(\frac{S_{cr}}{\kappa}, 1\bigg)^{\alpha} \times max\bigg(\frac{S_{cr}}{\kappa}, 1\bigg)^{-1.209} \\ \times 0.993^{\text{Age}} \times 1.018 \cdot \text{I}(\text{F}) \times 1.159 \cdot \text{I}(\text{AA}) \]

  • F is female sex
  • AA is African American race
  • I is 1 if the condition is true; otherwise it is the reciprocal of the preceding coefficient, so the factor reduces to 1
  • \(S_{cr}\) is serum creatinine in mg/dL
  • \(\kappa\) is 0.7 for females and 0.9 for males
  • \(\alpha\) is -0.329 for females and -0.411 for males
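
The two estimating equations can be sketched in R as follows. This is an illustration only: the function and argument names are made up for this example, serum creatinine is assumed to enter MDRD as a power (\(S_{cr}^{-1.154}\), the standard form of that equation), and the sex and race factors are applied with `ifelse()` so that they reduce to 1 when the condition is false.

```r
# MDRD estimate of GFR.
# scr: serum creatinine in mg/dL; age in years; female, aa: logical flags.
gfr_mdrd <- function(scr, age, female, aa) {
  175 * scr^(-1.154) * age^(-0.203) *
    ifelse(female, 0.742, 1) *
    ifelse(aa, 1.212, 1)
}

# CKD-EPI estimate of GFR, with kappa and alpha chosen by sex.
gfr_ckd_epi <- function(scr, age, female, aa) {
  kappa <- ifelse(female, 0.7, 0.9)
  alpha <- ifelse(female, -0.329, -0.411)
  141 * pmin(scr / kappa, 1)^alpha *
    pmax(scr / kappa, 1)^(-1.209) *
    0.993^age *
    ifelse(female, 1.018, 1) *
    ifelse(aa, 1.159, 1)
}

# Hypothetical patient: 50-year-old male, creatinine 1.0 mg/dL.
gfr_mdrd(scr = 1.0, age = 50, female = FALSE, aa = FALSE)
gfr_ckd_epi(scr = 1.0, age = 50, female = FALSE, aa = FALSE)
```

Both functions are vectorized, so they can be applied to whole columns of a data frame.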

CKD Stages

Stage  Description            GFR (mL/min/1.73 m²) / Kidney function
1      Normal function        90+ / 90%+
2      Mild loss              60-89 / 60-89%
3      Mild to severe         30-59 / 30-59%
4      Severe                 15-29 / 15-29%
5      Kidney failure (ESRD)  Less than 15 / less than 15%

End-stage renal disease (ESRD)

  • When stage 5 is reached
  • Dialysis or kidney transplant

Readmissions

  • If hospital has “excess readmissions”, penalties are assessed
  • 30-day risk standardized measure to calculate Payment Readjustment Factor (PRF)

    All-cause unplanned readmissions to the same or another applicable acute care hospital, occurring within 30 days - for any reason, regardless of principal diagnosis - from the index admission are counted in this measure. Some planned readmissions are not counted. HRRP

    \[ \text{PRF} = 1 - min\bigg(0.03, \sum_{dx} \frac{\text{Payment}(dx) \cdot max\big((\text{ERR}(dx) - 1.0), 0\big)}{\text{All payments}}\bigg) \]

  • Where \(dx\) is one of six measure cohorts, including heart failure, pneumonia, and others.
  • \(\text{ERR}(dx)\) is the hospital's excess readmission ratio for measure \(dx\), and payment refers to base operating DRG payments.
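
As a rough illustration of how the PRF formula behaves, here is a small R sketch. The function name, the cohort labels, and every dollar figure and ratio are invented for the example; only the formula itself comes from the slide.

```r
# Payment adjustment per the PRF formula above.
# payments:     base operating DRG payments per measure cohort
# err:          excess readmission ratio per measure cohort
# all_payments: total base operating DRG payments for the hospital
prf <- function(payments, err, all_payments) {
  # Only cohorts with ERR above 1.0 contribute to the penalty.
  excess <- sum(payments * pmax(err - 1.0, 0)) / all_payments
  # The penalty is capped at 3%.
  1 - min(0.03, excess)
}

# Hypothetical hospital with three cohorts (made-up numbers).
payments <- c(heart_failure = 2e6, pneumonia = 1.5e6, other = 1e6)
err      <- c(heart_failure = 1.10, pneumonia = 0.95, other = 1.02)
prf(payments, err, all_payments = 5e7)
```

A cohort with an ERR at or below 1.0 (pneumonia here) adds nothing to the penalty, and the `min()` keeps the factor from dropping below 0.97.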

The data

A fun experiment

  • Step 1: Pick a random percentage. e.g. 54%, 28%, 77%, etc.
  • Step 2: Type that number into Google followed by “of Americans”
  • Step 3: Follow rabbit hole for hours

Simple random sample

  • Pólya urn model
  • With (SRSWR) or Without Replacement (SRSWOR)
    • With replacement - makes use of i.i.d. assumption
    • Without replacement - not i.i.d. but still exchangeable
  • Requires access to the entire population
Polya Urn Model
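
The two flavors of simple random sampling can be tried out with base R's `sample()`. This is a toy sketch: the population of 51 unit IDs, the sample size, and the seed are arbitrary choices for the example.

```r
set.seed(2018)       # arbitrary seed, only so the draw is reproducible
population <- 1:51   # hypothetical population of 51 unit IDs

# With replacement: the same unit can be drawn more than once.
srswr <- sample(population, size = 10, replace = TRUE)

# Without replacement: every drawn unit is distinct.
srswor <- sample(population, size = 10, replace = FALSE)

anyDuplicated(srswor)  # 0 means no unit appears twice
```

Both calls need the full vector `population`, which is the practical meaning of "requires access to the entire population".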

Sampling design

Sampling plan           Design-based inference  Model-based inference
Probability sample      A                       C
Model-dependent sample  B                       D
Quota sampling          E                       F
Convenience sampling    G                       H
Snowball sampling       I                       J
Peer nomination         K                       L

Design effects

  • “deft”
  • Similar to variance inflation factor (VIF)
  • Effective sample size

\[ D^2(\hat{\theta}) = \frac{SE(\hat{\theta})^2_{complex}}{SE(\hat{\theta})^2_{srs}} = \frac{var(\hat{\theta})_{complex}}{var(\hat{\theta})_{srs}} \]

\[ n_{eff} = \frac{n_{complex}}{D^2(\hat{\theta})} \]

Clustering

  • Grouping people by geographic regions
  • SRS to choose a geographic region
    Cluster Sampling

Stratification

Stratified Sampling

Weighting

  • \(N = 51\)
  • \(N_{men} = 30\)
  • \(p_{men} = \frac{30}{51} = 0.588\)
  • \(N_{women} = 21\)
  • \(p_{women} = \frac{21}{51} = 0.412\)
  • Women Odds Ratio: \(\frac{p_{men}}{p_{women}} = \frac{0.588}{0.412} = 1.427\)
  • Men Odds Ratio: \(\frac{p_{women}}{p_{men}} = \frac{0.412}{0.588} = 0.701\)
    Stratified Sampling
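
The weighting arithmetic can be reproduced in a few lines of R. The variable names are made up for the example; the counts are the ones on the slides.

```r
# Group counts from the slides.
n_group <- c(men = 30, women = 21)

# Share of each group in the sample of 51.
p <- n_group / sum(n_group)

# Ratios of the group proportions, one per group.
ratios <- c(women = p[["men"]] / p[["women"]],
            men   = p[["women"]] / p[["men"]])

round(p, 3)
round(ratios, 3)
```

Computed from unrounded proportions, the two ratios are 30/21 and 21/30, which round to 1.429 and 0.700. The slide values of 1.427 and 0.701 come from dividing the already rounded proportions 0.588 and 0.412.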

HCUP Databases

  • Healthcare Cost and Utilization Project
  • Includes NIS and NRD
  • Must be purchased
  • Data usage agreement required for analysts

NIS Sampling Design

  • Nationwide Inpatient Sample
    • 1988-2011: 100% of discharges from a 20% sample of HCUP hospitals
  • National Inpatient Sample
    • 2012-present: 20% sample of discharges from 100% of HCUP hospitals

HCUP Complex Survey Design

  • Clustered on hospital ID
  • Weights included in discwt field for national estimates
  • 1988-2011: Stratified by census region and bed size
  • 2012-present: Stratified by census division and bed size
    • Region 1 (Northeast)
      • Division 1 (New England)
      • Division 2 (Mid Atlantic)
    • Region 2 (Midwest)
      • Division 3 (East North Central)
      • Division 4 (West North Central) (incl. Nebraska)
    • Region 3 (South)
      • Division 5 (South Atlantic)
      • Division 6 (East South Central)
      • Division 7 (West South Central)
    • Region 4 (West)
      • Division 8 (Mountain)
      • Division 9 (Pacific)

Importance of survey design

  • Treating as SRS
summary(lm(los~age, data=cdiff))
## 
## Call:
## lm(formula = los ~ age, data = cdiff)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -14.19  -7.12  -3.98   2.27 349.01 
## 
## Coefficients:
##              Estimate Std. Error t value            Pr(>|t|)    
## (Intercept) 14.190744   0.187955   75.50 <0.0000000000000002 ***
## age         -0.043275   0.002687  -16.11 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.94 on 73264 degrees of freedom
## Multiple R-squared:  0.003528,   Adjusted R-squared:  0.003514 
## F-statistic: 259.4 on 1 and 73264 DF,  p-value: < 0.00000000000000022

Importance of survey design

  • Accounting for survey design with R survey package
library('survey')

cdiff.design <- svydesign(ids = ~hospid, data = cdiff, weights = ~discwt,  strata = ~nis_stratum, nest=TRUE)
summary(svyglm(los~age, design=cdiff.design))
## 
## Call:
## svyglm(formula = los ~ age, design = cdiff.design)
## 
## Survey design:
## svydesign(ids = ~hospid, data = cdiff, weights = ~discwt, strata = ~nis_stratum, 
##     nest = TRUE)
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept) 13.95231    0.55033  25.353 < 0.0000000000000002 ***
## age         -0.04657    0.00637  -7.311    0.000000000000627 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 180.579)
## 
## Number of Fisher Scoring iterations: 2
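
The standard errors of the `age` slope in the two fits give a rough feel for the design effect. The sketch below plugs the printed values into the design-effect formulas. It is only an approximation, since the `lm()` fit is unweighted and so is not exactly the same estimator under SRS; the variable names are made up for the example.

```r
se_srs     <- 0.002687   # SE of the age slope from lm()
se_complex <- 0.00637    # SE of the age slope from svyglm()
n          <- 73264 + 2  # residual df plus the two estimated coefficients

# Design effect: ratio of the variances under the two designs.
deff <- (se_complex / se_srs)^2

# Effective sample size: what the complex sample is "worth" under SRS.
n_eff <- n / deff

c(deff = deff, n_eff = n_eff)
```

Under these assumptions the design effect is roughly 5.6, so the 73,266 records carry about as much information on the slope as an SRS of about 13,000.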

SRS vs. complex design

SRS line in red, complex design in blue

Research design checklist

  • Only detect disease conditions, procedures, and diagnostic tests in hospital settings
  • “Encounters” not patients
  • No volume-specific assessments for:
    • Geographic units, like U.S. states
    • Healthcare facilities (after 2011)
    • Individual health care providers
  • Determine diseases and procedures using validated administrative codes
  • Limit assessment to in-hospital outcomes
  • Distinguish between complications and comorbidities or state where you cannot
  • Account for NIS/NRD survey design
  • Address changes in data structure over time for trend analysis
Khera, R. and Krumholz, H.M. With Great Power Comes Great Responsibility: Big Data Research From the National Inpatient Sample. (2017)

NIS Features

  • 2009 - Added DX and DXCCS codes 16-25
  • 2012 - NIS redesign (removed hospital-specific features)
  • 2014 - Added DX and DXCCS codes 26-30

NIS dimensions

  • Big data?
  • Definitely large data
  • ~3 GB per year (raw CSV)
  • 108,683,763 total rows
  • ~10,868,376 rows per year

NRD Features

  • Years 2010-2013 had 116 features
  • 2014 had 126 features
    • Added DX and DXCCS 26-30
  • ~10 GB per year (raw CSV)
  • 87,699,909 total rows
  • ~17,539,982 rows per year

Features of interest

  • Age
  • Female
  • Length of Stay
  • Died
  • Hospital categorization
  • DXn codes (ICD-9-CM codes)
    • C. diff - 00845
    • AKI - 584, 584.5-584.9
    • CKD - 585, 585.1-585.6, 585.9
    • Renal Failure, unspecified - 586

Methodology

Dealing with big data

MonetDB

  • Column-store relational database
    • Data warehouse

Getting data into the database

  • Lots of Excel work

    Weary

  • NIS CSV: ~3GB
  • read_csv(…NIS…): ~64 GB in RAM
  • NRD CSV: ~6GB
  • read_csv(…NRD…): CRASHED
    • Had to split data

Getting data back out

SELECT *
  FROM nis
 WHERE nis.dx1  = '00845' 
    OR nis.dx2  = '00845' 
    OR nis.dx3  = '00845' 
    OR nis.dx4  = '00845' 
    OR nis.dx5  = '00845' 
    OR nis.dx6  = '00845' 
    OR nis.dx7  = '00845' 
    OR nis.dx8  = '00845' 
    OR nis.dx9  = '00845' 
    OR nis.dx10 = '00845' 
    OR nis.dx11 = '00845' 
    OR nis.dx12 = '00845' 
    OR nis.dx13 = '00845' 
    OR nis.dx14 = '00845' 
    OR nis.dx15 = '00845'
    OR nis.dx16 = '00845' 
    OR nis.dx17 = '00845' 
    OR nis.dx18 = '00845' 
    OR nis.dx19 = '00845' 
    OR nis.dx20 = '00845' 
    OR nis.dx21 = '00845' 
    OR nis.dx22 = '00845' 
    OR nis.dx23 = '00845' 
    OR nis.dx24 = '00845' 
    OR nis.dx25 = '00845' 
    OR nis.dx26 = '00845' 
    OR nis.dx27 = '00845' 
    OR nis.dx28 = '00845' 
    OR nis.dx29 = '00845'
    OR nis.dx30 = '00845'
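
A query with thirty near-identical conditions is easy to mistype by hand. One option, sketched here in R, is to generate the text instead. The variable names are made up for the example, and the resulting string would still have to be sent to the database through whatever connection is in use.

```r
# One condition per diagnosis column, dx1 through dx30.
conditions <- sprintf("nis.dx%d = '00845'", 1:30)

# Assemble the same SELECT ... WHERE ... OR ... statement as above.
query <- paste0(
  "SELECT *\n  FROM nis\n WHERE ",
  paste(conditions, collapse = "\n    OR ")
)

cat(query, "\n")
```

Because the column numbers come from `1:30`, each diagnosis column appears exactly once.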

Readmissions

  • nrd_visitlink allows analyst to link visits
  • los and nrd_daystoevent used to determine sequence and length of stay
  • Up to analyst to determine “index” admission and readmissions

Readmission Algorithm
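
A minimal sketch of how visits might be linked, assuming a readmission is any later stay by the same `nrd_visitlink` that starts within 30 days of the previous discharge. The toy data, the 30-day window, and the choice to treat every stay as a candidate index are assumptions for illustration, not necessarily the algorithm used in the study.

```r
# Toy discharge records; column names follow the NRD fields named above,
# the values are made up.
visits <- data.frame(
  nrd_visitlink   = c("a", "a", "a", "b", "b"),
  nrd_daystoevent = c(100, 110, 200, 50, 120),
  los             = c(5, 3, 2, 4, 6)
)

window <- 30  # readmission window in days (assumption)

# Order each patient's stays chronologically.
visits <- visits[order(visits$nrd_visitlink, visits$nrd_daystoevent), ]

# Discharge timing = admission timing + length of stay.
discharge <- visits$nrd_daystoevent + visits$los

# Look ahead to the next row and check it belongs to the same patient.
n <- nrow(visits)
same_patient <- c(visits$nrd_visitlink[-1] == visits$nrd_visitlink[-n], FALSE)
next_admit   <- c(visits$nrd_daystoevent[-1], NA)

# Days between this discharge and the same patient's next admission.
gap <- ifelse(same_patient, next_admit - discharge, NA)

# Flag stays that are followed by a readmission inside the window.
visits$readmitted <- !is.na(gap) & gap >= 0 & gap <= window
visits
```

In this toy data only patient "a"'s first stay is flagged: the next admission comes 5 days after discharge, while every other gap exceeds 30 days or has no later stay.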

Other considerations

  • 2010-2014: (1 ≤ DMONTH ≤ 12 − ceil(d/30))
    • Cut off index events with enough time to track readmissions
  • DIED = 0
    • A death on index does not allow for readmission
  • Length of stay > 0
    • LOS == zero represents transfers and same-day stays (more complex)
  • AGE > 0
    • Infants are often asymptomatic carriers of C. diff
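
These inclusion rules can be written as a single filter. The sketch below assumes lowercase column names (`dmonth`, `died`, `los`, `age`), a readmission window `d` in days, and HCUP's coding in which `died` equal to 0 means the patient was discharged alive. The function name and the toy data are made up.

```r
# Keep only stays that can serve as index admissions.
index_candidates <- function(df, d = 30) {
  # Last discharge month that still leaves d days of follow-up in the year.
  last_month <- 12 - ceiling(d / 30)
  keep <- df$dmonth >= 1 & df$dmonth <= last_month &
    df$died == 0 &  # patient survived the index stay
    df$los > 0 &    # drop transfers and same-day stays
    df$age > 0      # drop infants
  df[which(keep), ]  # which() also drops rows with missing values
}

# Toy records: only the first row passes every rule.
toy <- data.frame(
  dmonth = c(3, 12, 6, 7, 8),
  died   = c(0, 0, 1, 0, 0),
  los    = c(4, 2, 9, 0, 5),
  age    = c(67, 54, 80, 45, 0)
)
index_candidates(toy)
```

With the default 30-day window, December discharges are dropped because their follow-up would run into the next data year.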

Models

  • Linear logistic regression
  • Start with all C. diff patients
  • Fit each year separately
    • Accounts for year-to-year changes in the data
    • Allowed fitting to complete under hardware limitations
readmitted ~ hosp_hcontrl_govt +
             hosp_hcontrl_priv_np +
             hosp_urcat4 +
             hosp_ur_teach_metro +
             hosp_ur_teach_metro_teaching +
             hosp_bedsize +
             female +
             acute_kidney_failure +
             chronic_kidney_disease2 +
             chronic_kidney_disease3 +
             chronic_kidney_disease4 +
             chronic_kidney_disease5 +
             chronic_kidney_disease6 +
             chronic_kidney_disease_unk +
             renal_failure_unspecified

Results

Model fit

Conclusions

  • CDI spreading into younger demographics
  • Age - correlation or causation?
    • Other comorbidities associated with age
  • ESRD, AKI, and some CKD stages strong predictors for CDI readmission

Future work

  • Age as a risk factor?
  • What is causing spread into younger groups?
  • Why are females more likely to get CDI?
  • Mortality?
  • Treatment studies?

Acknowledgements

Very special thanks

  • Renuga Vivekanandan, M.D.
  • Ryan Walters, Ph.D.
  • Dora Matache, Ph.D.

Thank you!

Graduation!