Statistical issues in survival analysis (Part XVVII)

Usha Govindarajulu
Feb 5, 2024

January 31, 2024

In an article published in Biometrical Journal, Hu (2023) described a new random-intercept accelerated failure time model with Bayesian additive regression trees (riAFT-BART), which can be used to draw causal inferences about population treatment effects on patient survival from clustered and censored survival data. The work showed how riAFT-BART can address two important statistical questions with clustered survival data: estimating treatment effect heterogeneity and selecting variables. The model builds on an accelerated failure time model with a random intercept, with a Metropolis-within-Gibbs sampler used to draw posterior inferences about the population average treatment effect on patient survival. Individual survival treatment effects for each individual in each cluster are then computed from draws from the posterior predictive distribution. After this, each candidate predictor was separately added to the random forests model as an outcome.
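For orientation, a random-intercept accelerated failure time model of the kind described above is often written in the following generic form. The notation and the normality assumptions here are illustrative choices made for this post, not taken from the paper, which should be consulted for the exact specification and priors.

```latex
\log T_{ik} = f(A_{ik}, \mathbf{X}_{ik}) + b_k + \varepsilon_{ik},
\qquad b_k \sim N(0, \tau^2),
\qquad \varepsilon_{ik} \sim N(0, \sigma^2)
```

Here T_{ik} is the survival time of individual i in cluster k, A_{ik} is the treatment, X_{ik} are the covariates, b_k is the cluster-level random intercept, and f is an unknown function, which in a BART-based model would be represented by a sum of regression trees.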

They then used their model for variable selection through a permutation-based approach built on the variable inclusion proportion (VIP) of each predictor, which is supplied by the BART model. They permuted event times together with censoring indicators and established thresholds for variable selection by comparing the VIPs from the observed data with those from the permuted data. With this in place, they compared their method to a piecewise exponential additive mixed model, a frailty model with regularization, and a backward stepwise selection approach for a random-intercept Cox model. Finally, they also discussed handling missing data through bootstrap imputation.
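The permutation idea can be sketched in a few lines of Python. This is only an illustration of the general logic, not the authors' implementation: the function `importance_proportions` is a hypothetical stand-in for the VIPs that a BART fit would supply (it uses normalized absolute correlations with log event time among uncensored subjects), and the number of permutations, the 95th-percentile threshold, and the simulated data are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def importance_proportions(X, outcome):
    """Hypothetical stand-in for BART VIPs (NOT a BART fit): normalized
    |correlation| between each predictor and log event time, computed
    among uncensored subjects only. Proportions sum to 1."""
    rows = [(x, math.log(t)) for x, (t, d) in zip(X, outcome) if d == 1]
    log_t = [lt for _, lt in rows]
    p = len(X[0])
    raw = [abs(pearson([x[j] for x, _ in rows], log_t)) for j in range(p)]
    total = sum(raw)
    return [r / total if total > 0 else 1.0 / p for r in raw]


def permutation_select(X, outcome, n_perm=200, quantile=0.95, seed=0):
    """Select predictors whose observed proportion exceeds a quantile
    of its permutation (null) distribution."""
    rng = random.Random(seed)
    observed = importance_proportions(X, outcome)
    p = len(observed)
    null = [[] for _ in range(p)]
    for _ in range(n_perm):
        # Shuffle (time, indicator) pairs jointly so each event time
        # stays attached to its own censoring indicator.
        shuffled = outcome[:]
        rng.shuffle(shuffled)
        for j, v in enumerate(importance_proportions(X, shuffled)):
            null[j].append(v)
    thresholds = []
    for j in range(p):
        s = sorted(null[j])
        thresholds.append(s[min(len(s) - 1, int(quantile * len(s)))])
    selected = [j for j in range(p) if observed[j] > thresholds[j]]
    return observed, thresholds, selected


# Simulated example (assumed setup): only predictor 0 affects survival time.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(4)] for _ in range(300)]
outcome = []
for x in X:
    t = math.exp(1.0 + 1.5 * x[0] + data_rng.gauss(0, 0.5))  # event time
    c = data_rng.expovariate(0.05)                           # censoring time
    outcome.append((min(t, c), 1 if t <= c else 0))

observed, thresholds, selected = permutation_select(X, outcome)
print("selected predictor indices:", selected)
```

In a real analysis, the importance function would be replaced by the VIPs from the fitted model, and the threshold rule would follow whatever the paper specifies.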

In their simulations, they used inverse probability weighting methods along with Super Learner (Van der Laan et al., 2007). They simulated a COVID-19 data set with remdesivir and dexamethasone treatments. They found that their method, riAFT-BART, worked best across all scenarios, as did restricted mean survival time (RMST). Their method also performed well on variable selection and missing data compared to the other methods of interest. They also applied it to a real COVID-19 dataset of patients seen in the Mount Sinai Hospital system in New York from March 2020 to February 2021 who were in the ICU at some point and had received the treatments mentioned above. Their method suggested that the combination of remdesivir and dexamethasone offered enhanced treatment benefit for healthier patients with lower oxygen saturation.
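For readers unfamiliar with RMST, it is the area under the survival curve up to a chosen time horizon. The sketch below computes it from a Kaplan-Meier curve in plain Python. It shows the standard nonparametric definition, not how the paper computed RMST; the function names, the toy data, and the horizon `tau = 6` are assumptions for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.
    events[i] is 1 for an observed event and 0 for a censored time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def rmst(times, events, tau):
    """Restricted mean survival time: area under the Kaplan-Meier
    step function on the interval [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)
    return area


# Toy data (made up for illustration): times in days, 1 = event, 0 = censored.
times = [2, 3, 3, 5, 8, 10]
events = [1, 1, 0, 1, 0, 1]
print(round(rmst(times, events, tau=6), 4))
```

With no censoring and a horizon beyond the last event, this quantity reduces to the ordinary sample mean of the survival times, which is a handy sanity check.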

In summary, their method was developed to estimate population average treatment effects on patient survival while accounting for variation in institutional effects. The authors did not discuss any limitations of their method, noting only that it could be extended to incorporate more than it currently handles. Nor did they discuss the computational complexity of their model, which can be a very important topic, especially for big datasets. Their discussion of causal inference was also limited.

Written by,

Usha Govindarajulu

Keywords: survival, accelerated failure time model, Bayesian, random forests, clustering, population average


Hu L. (2023). “A new method for clustered survival data: Estimation of treatment effect heterogeneity and variable selection.” Biometrical Journal. Open Access.