Data for Waitlist Mortality

Pediatric HR Candidates

Author

R. Jerome Dixon, Michael D. Porter

Published

July 5, 2024

This notebook generates the waitlist mortality data.

The required input files are:

The outputs are:

Get Population

Thoracic Data

The thoracic data has information collected on transplant candidates and recipients. The data was retrieved from UNOS on 2022-04-01.

Show the code
thoracic = read_rds(file.path(dir_data, "thoracic.rds"))
Show the code
censoring_date = as.Date("2022-04-01")

Outcomes

The REM_CD is the coded reason for removal from the waitlist. The details are in the STAR documentation.

  • The created outcome variable corresponds to an adverse outcome defined as death (REM_CD = 8) or Candidate too sick to transplant (REM_CD = 13, “Cand. cond. deteriorated,too sick to tx”).
    • outcome = 1 is adverse.
    • outcome = 0 is everything else (i.e., candidate was removed from the waitlist for anything besides death or deteriorization).
Show the code
REM_CD_code = readxl::read_excel(file.path(dir_data, "STAR File Documentation.xls"),
                                sheet = "THORACIC_FORMATS_FLATFILE",
                                skip=1) %>%
  filter(`SAS ANALYSIS FORMAT` == "REMCD") %>%
  select(
    REM_CD = `Data Field Value`,
    REMOVAL_REASON = `Data Field Formatted Value`
  ) %>% 
  mutate(
    outcome = 1L*(REM_CD %in% c("8", "13")),
    outcome_full = case_match(REM_CD,
                         #: adverse outcomes
                         "8" ~ "Death",
                         "13" ~ "Too sick", # Cand. cond. deteriorated,too sick to tx
                         #: Still alive or Tx'ed
                         c("2", "3", "4", "14", "15", "21", "23") ~ "Tx",
                         "6" ~ "Alive",  # Refused transplant
                         "7" ~ "Alive",  # Transferred to another center 
                         "12" ~ "Alive", # Cand. condition improved, tx not needed
                         "16" ~ "Alive", # Candidate Removed in Error
                         #: other
                         "10" ~ "Listed in Error",
                         "24" ~ "Lost contact",
                         .default = "Other"
                         )
  )

Population

1. Pediatric Candidate (Age < 18) on waitlist for heart

Show the code
thor1 = thoracic %>% 
  filter(
    WL_ORG == "HR",    
    INIT_AGE < 18, 
  )

2. Added to waitlist between 2010 - 2020

Show the code
date_rng = c("2010-01-01", "2020-12-31") %>% as.Date()
thor2 = thor1 %>% 
  filter(between(as.Date(INIT_DATE), date_rng[1], date_rng[2]))

There are 6244 pediatric waitlists between 2010-01-01 and 2020-12-31. There were 5972 unique candidates (since some were on multiple waitlists during this period).

3. No prior heart transplants

Show the code
thor3 = thor2 %>% 
  filter(
    NUM_PREV_TX == 0,    
    !(PREV_TX %in% "Y"),
    !(THORACIC_DGN %in% c(1700, 1100:1199)) # codes related to RE-TX/GF
    )  

This removed 399 and leaves 5845.

4. Patient’s first heart waitlist

Keep only the data from the first time a patient was listed. This will exclude the waitlists corresponding to patients i) who had previous heart transplants, ii) are listed multiple times, and iii) were transferred to another center.

Show the code
first_wl = thoracic %>% 
  group_by(PT_CODE) %>% 
    slice_min(INIT_DATE, with_ties = FALSE) %>% 
  ungroup()

thor4 = thor3 %>% semi_join(first_wl, by = "WL_ID_CODE")

This removed 176 and leaves 5669.

5. Remove waitlists with mistakes

The removal code corresponding to listing mistakes (REM_CD = 10 “Candidate listed in error”) are removed from the data.

Show the code
thor5 = thor4 %>% filter( !(REM_CD %in% c("10")) )

This removed 0 and leaves 5669.

6. Remove waitlists with unknown outcomes

The removal codes corresponding to candidates lost (REM_CD = 24 “Unable to contact candidate”) and candidates still alive at data collection/censored (REM_CD = NA) are removed from the data.

Show the code
thor6 = thor5 %>% filter( REM_CD != 24, !is.na(REM_CD))

This removed 146 and leaves 5523.

Summary of removal reasons for our population

This gives our final population

Show the code
population = thor6 %>% select(WL_ID_CODE, REM_CD) %>%   
  mutate(REM_CD = as.character(REM_CD)) %>% 
  left_join(REM_CD_code, by = "REM_CD")

write_csv(population, file.path(dir_save, "population.csv"))
Show the code
population %>% 
  count(REM_CD, sort=TRUE) %>% 
  mutate(p = n/sum(n)) %>% 
  left_join(REM_CD_code, by = "REM_CD") %>% 
  print_table()
REM_CD n p REMOVAL_REASON outcome outcome_full
4 4495 0.814 Deceased Donor tx, removed by tx center 0 Tx
8 338 0.061 Died 1 Death
12 317 0.057 Cand. condition improved, tx not needed 0 Alive
13 202 0.037 Cand. cond. deteriorated,too sick to tx 1 Too sick
9 75 0.014 Other 0 Other
7 71 0.013 Transferred to another center 0 Alive
6 11 0.002 Refused transplant 0 Alive
14 8 0.001 Tx at another center (multiple-listing) 0 Tx
16 3 0.001 Candidate Removed in Error 0 Alive
21 3 0.001 Patient died during TX procedure 0 Tx

And overall waitlist survival for our population (\(0\) is survive, \(1\) is death or deterioration):

Show the code
population %>%   
  count(outcome) %>% 
  mutate(p = n/sum(n)) %>% 
  print_table()
outcome n p
0 4983 0.902
1 540 0.098

Make Data for Modeling

Functions for variable recoding:

Show the code
## Functions for recoding variables.

#: Recoding Race/Ethnicity
recode_race <- function(ETHCAT){
  recode(ETHCAT, 
          `1` = "White", 
          `2` = "Black", 
          `4` = "Hispanic", 
          `5` = "Asian", 
          # `6` = "Amer Ind/Alaska Native", 
          # `7` = "Native Hawaiian/other Pacific Islander",
          # `9` = "Multiracial",
          .default = "Other", .missing = "Unknown"
          ) %>% factor()  
}

#: Recoding Blood Type
recode_ABO <- function(ABO){
  recode(ABO, A1 = "A", A2 = "A", A1B = "AB", A2B = "AB", 
         .missing = "Unknown") %>% factor()  
}

#: Recode Y = Yes, N = No, U = Unknown
recode_YNU <- function(YNU){
  recode(YNU, N = "No", Y = "Yes", U = "Unknown", .missing = "Unknown") %>% factor()
}

#: Recode P = Yes, N = No, anything else = Unknown
recode_PNO <- function(PNO){
  recode(PNO, N = "No", P = "Yes", .default = "Unknown", .missing = "Unknown") %>% factor()
}

recode_STATUS <- function(STATUS){
  case_match(STATUS, 
             2010 ~ "Status 1A", 
             2020 ~ "Status 1B",
             2030 ~ "Status 2", 
             2999 ~ "Inactive",
             .default = "Other"
             ) %>% 
    factor(levels = c("Status 1A", "Status 1B", "Status 2", "Inactive"), 
           ordered = TRUE)
}

Waitlist Data

Load waitlist data

Show the code
WL = read_rds(file.path(dir_data, "cand_wl_hist.rds"))

Get waitlist predictor variables

Show the code
DATA_WL = WL %>% 
  filter(CHG_TY == "A") %>% # collect data when added to waitlist
  semi_join(population, by = "WL_ID_CODE") %>% 
  transmute(
    WL_ID_CODE, 
    WL_DT = CHG_DT,   # Date-time candidate first added to waitlist
    PRELIM_XMATCH_REQ = recode_YNU(PRELIM_XMATCH_REQ), 
  ) 

EDA: Waitlist Data

Show the code
model_data = DATA_WL %>% full_join(population, by = "WL_ID_CODE")
vars = colnames(model_data) %>% setdiff(colnames(population))
walk(vars, ~plot_data(model_data, var = .x) %>% print)

Candidate Demographic Attributes

Get candidate demographic predictor variables

Show the code
candidate_demographic_data = thoracic %>% 
  semi_join(population, by="WL_ID_CODE") %>% 
  transmute(
    WL_ID_CODE, 
    #: Age at time of registration or listing
    AGE = pmax(INIT_AGE, 0),  #set -1 to age 0 
    #: Gender {M,F}
    GENDER = factor(GENDER),
    #: Race
    RACE = recode_race(ETHCAT),    
    #: Weight in kg
    WEIGHT_KG = coalesce(INIT_WGT_KG_CALC, WGT_KG_TCR), 
    #: Height in cm
    HEIGHT_CM = coalesce(INIT_HGT_CM_CALC, HGT_CM_TCR),
    #: BMI
    BMI = coalesce(INIT_BMI_CALC, BMI_TCR),
    #: Body Surface Area (BSA) using Mosteller's formula
    BSA = sqrt(HEIGHT_CM * WEIGHT_KG / 3600), 
    #: Blood Type
    ABO = recode_ABO(ABO),
    #: Citizenship
    CITIZENSHIP = case_match(CITIZENSHIP, 
                      c(1,2) ~ "US", # 2 ~ "Resident Alien",
                      #3 ~ "Non-resident Alien",
                      #c(4, 5) ~ "Non US",
                      6 ~ "Travel for Tx", 
                      .default = "Other"
                    )
  )

EDA: Candidate Demographics

Show the code
model_data = candidate_demographic_data %>% 
  full_join(population, by = "WL_ID_CODE")
vars = colnames(model_data) %>% setdiff(colnames(population))
walk(vars, ~plot_data(model_data, var = .x) %>% print)

Candidate Risk Level Attributes

Code to convert functional status code into description

Show the code
# Functional Status
library(readxl)
FUNC_STAT_dict = read_excel(
  file.path(dir_data, "STAR File Documentation.xls"), 
  sheet = "THORACIC_FORMATS_FLATFILE",
  skip = 1
) %>% 
  filter(`SAS ANALYSIS FORMAT` == "FUNCSTAT") %>% 
  transmute(
    code = `Data Field Value`,
    descr = `Data Field Formatted Value`
  ) %>% 
  # convert to integer code; drop non-numeric values
  filter(!is.na(code), code != "**OTHER**", code != "Null or Missing") %>% 
  mutate(code = as.integer(code)) 


recode_FUNC_STAT <- function(FUNC_STAT){
  levs = FUNC_STAT_dict %>% 
    filter(code %in% c(996, 998, 4000:4900)) %>% 
    pull(descr)
  ind = match(FUNC_STAT, FUNC_STAT_dict$code)
  factor(FUNC_STAT_dict$descr[ind], levels = levs)
}

Get candidate risk predictor variables

Show the code
candidate_risk_data = thoracic %>% 
  semi_join(population, by="WL_ID_CODE") %>% 
  transmute(
    WL_ID_CODE,
    STATUS = recode_STATUS(INIT_STAT),
    #: Life Support in general 
    LIFE_SUPPORT_CAND_REG = recode_YNU(LIFE_SUP_TCR),
    LIFE_SUPPORT_OTHER = OTH_LIFE_SUP_TCR,  # Binary
    PGE_TCR, # LIFE_SUPPORT_PGE = PGE_TCR             # Binary 
    #: ECMO (Life Support)
    ECMO_CAND_REG = ECMO_TCR,  # ifelse(ECMO_TCR == 1, "Yes", "No"),
    #: VAD 
    VAD_CAND_REG = case_when(
      VAD_DEVICE_TY_TCR == 1 ~ "No", 
      VAD_DEVICE_TY_TCR  > 1 ~ "Yes", 
      .default = "No"  # not many missing
    ),
    VAD_DEVICE_TY_TCR = case_match(VAD_DEVICE_TY_TCR,
                                   1~"NONE",
                                   2~"LVAD",
                                   3~"RVAD",
                                   4~"TAH",
                                   5~"LVAD+RVAD",
                                   6~"LVAD/RVAD/TAH Unspecified"
                                   ) %>% factor(),
    #: PATIENT ON LIFE SUPPORT - VENTILATOR
    VENTILATOR_CAND_REG = VENTILATOR_TCR, #ifelse(VENTILATOR_TCR == 1, "Yes", "No"),
    #: Life Support (Other)
    # IABP_TCR
    # IABP_TRR
    # PGE_TCR
    # PGE_TRR
    
    #: Functional Status 
    FUNC_STAT_CAND_REG = recode_FUNC_STAT(FUNC_STAT_TCR),

    # on other waitlists?
    WL_OTHER_ORG = case_when( 
      MULTIORG == "Y" ~ "Y",
      WLHL == "Y" ~ "Y",
      WLIN == "Y" ~ "Y",
      WLKI == "Y" ~ "Y",
      WLKP == "Y" ~ "Y",
      WLLI == "Y" ~ "Y",
      WLLU == "Y" ~ "Y",
      .default = "N"
    ) %>% recode_YNU(),
    
    #: transplant history
    # NUM_PREV_TX,         # number of prev HR TX (self-reported?)
    # PREV_TX_HR = PREV_TX,# {Y,N} previous HR TX (from data); NA if no Tx (data leakage warning: this is missing if not transplanted)
    # PREV_TX_ANY_ORGAN = PREV_TX_ANY,
    # DAYS_SINCE_PREV_TX_HR = PRVTXDIF, this is at time of TX
    
    #: Other
    CEREB_VASC = recode_YNU(CEREB_VASC), 
    DIAB = case_match(DIAB, 1~"None", 2~"Type I", 3~"Type II", c(4,5) ~ "Type Unknown", .default = "Unknown") %>% factor(),
    DIALYSIS_CAND = case_match(DIAL_TY_TCR, 1~"No", c(2,3)~"Yes", .default="Unknown") %>% factor(),
    HEMODYNAMICS_CO = HEMO_CO_TCR, 
    IMPL_DEFIBRIL = recode_YNU(IMPL_DEFIBRIL),
    # INHALED_NO,
    INOTROP_VASO_CO_REG = recode_YNU(INOTROP_VASO_CO_TCR), 
    INOTROPES_TCR,
    MOST_RCNT_CREAT, # CREATININE_REG = 
    eGFR = 0.412 * coalesce(INIT_HGT_CM_CALC, HGT_CM_TCR) / pmax(MOST_RCNT_CREAT, .001),
    TOT_SERUM_ALBUM,
  )

EDA: Candidate Risk

Show the code
model_data = candidate_risk_data %>% full_join(population, by = "WL_ID_CODE")
vars = colnames(model_data) %>% setdiff(colnames(population))
walk(vars, ~plot_data(model_data, var = .x) %>% print)

Candidate Diagnosis

  • The candidate diagnosis is found in the thoracic data.
    • THORACIC_DGN: Waitlist Candidate Diagnosis
    • TCR_DGN: Candidate Diagnosis at Listing
    • DIAG: Diagnosis from at Transplant and if missing, from TCR
    • TCR_DGN_OSTXT and DIAG_OSTXT give text diagnosis

Recode candidate diagnosis into: Myocarditis, Congenital Heart Disease Without Surgery, Congenital Heart Disease With Surgery, Dilated Cardiomyopathy, Restrictive Cardiomyopathy, Hypertrophic Cardiomyopathy, Valvular Heart Disease, or Other.

Show the code
#: From McCulloch. If code == 999 (Other), then check free text
Other_Cardiomyopathy = 
c("ARRHYTHMOGENIC RIGHT VENTRICULAR CARDIOMYOPATHY", "ARRHYTHMOGENIC RV DYSPLASIA", 
  "ARRYTHMOGENIC BIVENTRICULAR DYSFUNCTION", "BARTH SYNDROME", 
  "BARTH SYNDROME, AND DCM", "BIVENTRICULAR NON-COMPACTION CARDIOMYOPATHY", 
  "CARDIOMYOPATHY", "CONGENITAL ARRHYTHMIA", "CONGENITAL COMPLETE HEART BLOCK", 
  "CONGENITAL HEART BLOCK", "CONGENITAL LONG QT SYNDROME", "HYPERTROPHIC", 
  "HYPERTROPHIC/DILATED CARDIOMYOPATHY", "KAWASAKI'S SYNDROME, CORONARY ARTERY ANEURYSMS", 
  "LEFT VENTRICULAR NON-COMPACTION", "LONG Q-T SYNDROME TYPE 3", 
  "LONG QT SYNDROME", "LV NON-COMPACTION", "LVNC/MIXED PHENOTYPE CARDIOMYOPATHY", 
  "MALIGANT VENTRICULAR TACHYCARDIA", "MALIGNANT ARRHYTHMIA", "MALIGNANT LONG QT SYNDROME, TYPE 3", 
  "MATERNAL SJOGRENS", "MIXED HYPERTROPHIC AND DILATED CARDIOMYOPATHY", 
  "MYOCARDITIS", "NON COMPACTION CARDIOMYOPATHY", "NON-COMPACTION", 
  "NON-COMPACTION CARDIOMYOPATHY", "NON-COMPACTION WITH RESTRICTIVE PHYSIOLOGY", 
  "POLYMORPHIC VENTRICULAR TACHYCARDIA", "PROLONGED QT SYNDROME", 
  "REFRACTORY VENTRICULAR ARRHYTHMIA", "SUPRAVENTRICULAR TACHYCARDIA, WOLFF-PARKINSON WHIT", 
  "VENTRICULAR ARRYTHMIA", "VTACH/SVT")

Other_CHD = 
c("CHD, TRICUSPID ATRESIA", "CONGENITAL HEART DISEASE, PULMONARY ATRESIA", 
  "CONGESTIVE HEART FAILURE", "EBSTEINS ANOMALY", "HYPOPLASTIC AORTIC ARCH", 
  "TETRALOGY OF FALLOT", "TETRALOGY OF FALLOT - CONGENITAL", "TETRALOGY OF FALLOT WITH VSD", 
  "TRICUSPID ATRESIA", "UNBALANCED AV CANAL", "UNBALANCED AV CANAL WITH HYPOPLASTIC AORTIC ARCH"
)

recode_DIAG <- function(CODE, TXT){
  case_when(
      CODE %in% c(1004,1006) ~ "Myocarditis",
      CODE %in% c(1205, 1206) ~ "Congenital Heart Disease Without Surgery",
      CODE == 1207 ~ "Congenital Heart Disease With Surgery",
      CODE %in% c(1000, 1001, 1002, 1003, 1005, 1007, 1049, 1209) ~ "Dilated Cardiomyopathy",
      CODE %in% c(1050, 1052, 1054, 1099) ~ "Restrictive Cardiomyopathy",
      CODE == 1201 ~ "Hypertrophic Cardiomyopathy",
      CODE == 1202 ~ "Valvular Heart Disease",
      CODE == 999 & (TXT %in% Other_Cardiomyopathy) ~ "Dilated Cardiomyopathy",
      #CAND_DIAG_CODE == 999 & CAND_DIAG_TXT %in% Other_CHD ~ "Congenital Heart Disease",
      .default =  "Other" ) %>% 
    factor()  
}

Get candidate diagnosis predictors

Show the code
candidate_diagnosis = thoracic %>% 
  semi_join(population, by="WL_ID_CODE") %>% 
  transmute(
    WL_ID_CODE,
    CAND_DIAG = recode_DIAG(THORACIC_DGN, TCR_DGN_OSTXT),
    CAND_DIAG_LISTING = recode_DIAG(TCR_DGN, TCR_DGN_OSTXT),
    CAND_DIAG_CODE = str_c(THORACIC_DGN,": ", CAND_DIAG) %>% factor()
    ) 

EDA: Candidate Diagnosis

Show the code
model_data = candidate_diagnosis %>% full_join(population, by = "WL_ID_CODE")
vars = colnames(model_data) %>% setdiff(colnames(population))
walk(vars, ~plot_data(model_data, var = .x) %>% print)

Outcomes using the individual diagnosis codes:

Show the code
candidate_diagnosis %>% 
  left_join(population, by = "WL_ID_CODE") %>% 
  count(CAND_DIAG_CODE, CAND_DIAG, outcome) %>% 
  spread(outcome, n, fill = 0) %>% 
  mutate(p = (`0` + .9*5) / (`0` + `1` + 5)) %>% # smooth toward 0.90
  arrange(p)%>% 
  print_table()
CAND_DIAG_CODE CAND_DIAG 0 1 p
1202: Valvular Heart Disease Valvular Heart Disease 9 4 0.750
1054: Restrictive Cardiomyopathy Restrictive Cardiomyopathy 7 2 0.821
1206: Congenital Heart Disease Without Surgery Congenital Heart Disease Without Surgery 280 55 0.837
1205: Congenital Heart Disease Without Surgery Congenital Heart Disease Without Surgery 35 6 0.859
1207: Congenital Heart Disease With Surgery Congenital Heart Disease With Surgery 2124 326 0.867
999: Other Other 57 8 0.879
1203: Other Other 7 1 0.885
1002: Dilated Cardiomyopathy Dilated Cardiomyopathy 8 1 0.893
1004: Myocarditis Myocarditis 180 19 0.904
999: Dilated Cardiomyopathy Dilated Cardiomyopathy 32 3 0.912
1005: Dilated Cardiomyopathy Dilated Cardiomyopathy 1 0 0.917
1010: Other Other 1 0 0.917
1204: Other Other 1 0 0.917
1099: Restrictive Cardiomyopathy Restrictive Cardiomyopathy 48 4 0.921
1008: Other Other 2 0 0.929
1052: Restrictive Cardiomyopathy Restrictive Cardiomyopathy 2 0 0.929
1001: Dilated Cardiomyopathy Dilated Cardiomyopathy 42 3 0.930
1201: Hypertrophic Cardiomyopathy Hypertrophic Cardiomyopathy 174 12 0.935
1049: Dilated Cardiomyopathy Dilated Cardiomyopathy 280 17 0.942
1050: Restrictive Cardiomyopathy Restrictive Cardiomyopathy 233 14 0.942
1209: Dilated Cardiomyopathy Dilated Cardiomyopathy 5 0 0.950
1000: Dilated Cardiomyopathy Dilated Cardiomyopathy 1178 58 0.953
1006: Myocarditis Myocarditis 26 1 0.953
1200: Other Other 10 0 0.967
1003: Dilated Cardiomyopathy Dilated Cardiomyopathy 191 6 0.968
1208: Other Other 19 0 0.979
1007: Dilated Cardiomyopathy Dilated Cardiomyopathy 31 0 0.986

Center Level Predictors

Listing Center Volume

Calculate the number of pediatric heart transplants each year by listing center.

Show the code
#: Number of pediatric hr transplants in previous year
TX_YR = thoracic %>% 
  filter(
    coalesce(AGE, INIT_AGE) < 18,     # Age of transplant recipient
    ORGAN == "HR"
  ) %>% 
  count(LISTING_CTR_CODE, YR = TX_YEAR) %>% 
  complete(LISTING_CTR_CODE, YR, fill = list(n=0)) %>% 
  group_by(LISTING_CTR_CODE) %>% 
    arrange(YR) %>% 
    mutate(
      LISTING_CTR_PEDHRTX_PREV_YR = dplyr::lag(n, default=0)
    ) %>% 
  ungroup() %>% 
  select(-n)

Listing Center Practice

  • median_refusals: median number of offer refusals per candidate at listing center. Averaged over 2010-2019.

  • p_refusals: proportion of offers at listing center that are refused. Averaged over 2010-2019.

  • LISTING_CTR_PEDHRTX_PREV_YR: number of pediatric heart transplants at listing center in the previous year.

  • LIST_YR: year of waitlist

  • REGION: UNOS Region

Load data to calculate center stats

Show the code
refusals = read_csv(file.path(dir_data, "refusals.csv"))
LC_effects = read_csv(file.path(dir_data, "LC_effects.csv"))

Make center level predictors

Show the code
center_data = thoracic %>% 
  semi_join(population, by="WL_ID_CODE") %>% 
  transmute(
    WL_ID_CODE, 
    LISTYR,       # year of listing
    #: Center
    LISTING_CTR_CODE,
    #: UNOS Region
    REGION, 
  ) %>% 
  # Listing center volume (in previous year)
  left_join(
    TX_YR,
    by = c("LISTYR" = "YR", "LISTING_CTR_CODE")
  ) %>% 
  # Refusal rates
  left_join(
    refusals %>% select(LISTING_CTR_CODE, median_refusals, p_refusals),
    by = "LISTING_CTR_CODE"
    ) %>% 
  # add listing center effects from cox-ph
  left_join(LC_effects, by = "LISTING_CTR_CODE") %>% 
  # format
  transmute(
    WL_ID_CODE, 
    LISTING_CTR_CODE = factor(LISTING_CTR_CODE),
    LIST_YR = LISTYR,
    REGION = factor(REGION, 1:11), 
    pedhrtx_prev_yr = coalesce(LISTING_CTR_PEDHRTX_PREV_YR, 0), 
    median_refusals,
    p_refusals,
    LC_effect,
  )

EDA: Listing Center Variables

Show the code
model_data = center_data %>% full_join(population, by = "WL_ID_CODE")
vars = colnames(model_data) %>% setdiff(colnames(population))
walk(vars, ~plot_data(model_data, var = .x) %>% print)

Final Survival Analysis Dataset

Combine all predictor variables into model_data:

Show the code
model_data = list(
  population %>% select(outcome, WL_ID_CODE), 
  candidate_demographic_data,
  candidate_risk_data,
  candidate_diagnosis, 
  center_data,
  DATA_WL
) %>% 
  purrr::reduce(left_join, by = "WL_ID_CODE")

Save model_data.{csv, rds}

model_data %>%
  write_csv(file.path(dir_save, "model_data.csv"))

model_data %>%
  write_rds(file.path(dir_save, "model_data.rds"))