August 2, 2023
In this article, published online in June, Lancker et al. describe a way to obtain valid inference for hazard ratios from a Cox regression after variable selection. The problem is that it is unclear how many covariates to adjust for in order to control confounding and informative censoring, and what impact that choice has on sample size. Regularized techniques such as lasso regression (Tibshirani, 1997) were developed to make these adjustments. They reliably pick up covariates strongly associated with the outcome, but may miss those strongly associated with the exposure. Informative censoring bias can also arise when covariates are associated with censoring.
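For orientation, the lasso-penalized Cox fit referred to above can be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: $A$ is the exposure, $L$ the covariate vector, $h_0$ the baseline hazard, $\ell$ the Cox log partial likelihood, and $\lambda$ the penalty weight. Leaving the exposure coefficient unpenalized is also an assumption, though a common choice.

```latex
h(t \mid A, L) = h_0(t)\exp\left(\beta A + \gamma^\top L\right),
\qquad
(\hat\beta, \hat\gamma) = \arg\max_{\beta,\,\gamma}\; \ell(\beta, \gamma) - \lambda \lVert \gamma \rVert_1 .
```

Covariates with $\hat\gamma_j = 0$ are dropped. Because the penalty is driven by the outcome model alone, a covariate that is weakly related to the outcome but strongly related to the exposure can be dropped this way.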
The authors then provide two strategies for obtaining valid post-selection inference. They point out that they also considered the causal interpretation of the hazard ratio. Their first proposal, called the “poor man’s approach”, uses three Lasso selection steps: the first tries to pick up important confounders, the second tries to pick up covariates that explain censoring, and the third gives a second chance to pick up confounders. A final step then tries to ensure proper estimation of the exposure effect.
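The selection logic described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The covariate names and coefficient values are hypothetical stand-ins for the output of three separate Lasso fits, and the sketch assumes the final model adjusts for the union of the selected covariates. That final Cox fit is not shown.

```python
from typing import Dict, List, Set


def selected(coefs: Dict[str, float], tol: float = 1e-8) -> Set[str]:
    """Return the covariates whose penalized coefficient is non-zero."""
    return {name for name, value in coefs.items() if abs(value) > tol}


def union_of_selections(candidates: List[str], *fits: Dict[str, float]) -> List[str]:
    """Keep every covariate selected by at least one fit, in candidate order."""
    keep: Set[str] = set()
    for coefs in fits:
        keep |= selected(coefs)
    return [name for name in candidates if name in keep]


# Hypothetical Lasso coefficients (assumed values, for illustration only).
candidates = ["age", "sex", "stage", "bmi", "site"]
outcome_fit = {"age": 0.41, "sex": 0.0, "stage": 0.87, "bmi": 0.0, "site": 0.0}
censoring_fit = {"age": 0.0, "sex": 0.0, "stage": 0.0, "bmi": 0.0, "site": 0.33}
exposure_fit = {"age": 0.0, "sex": 0.25, "stage": 0.0, "bmi": 0.0, "site": 0.0}

# Covariates that would enter the final, unpenalized Cox model.
final_covariates = union_of_selections(
    candidates, outcome_fit, censoring_fit, exposure_fit
)
print(final_covariates)  # ['age', 'sex', 'stage', 'site']
```

In this toy example, "sex" and "site" would have been dropped by a selection step based on the outcome alone. They are retained only because the exposure and censoring steps pick them up.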
The second strategy they published is their triple selection approach. It refits a Cox model on the exposure plus the covariates selected by the Lasso in step 1, then uses the estimated coefficients of the exposure and covariates to fit a linear model for the Schoenfeld residuals of both the exposure and the covariates, again using the Lasso to select covariates with non-zero estimated coefficients. Finally, they fit a Cox model for survival time given the exposure and all covariates selected in any of the first three steps to obtain final estimates for the exposure and covariates. The authors claim that this method avoids the problem of invalid inference for the exposure effect after variable selection because it uses a different score test statistic for this parameter, one that is asymptotically normal and de-correlates the score functions of the exposure and the covariates.
In simulations, the poor man’s approach came closest to maintaining the nominal Type I error rate. The triple selection method also did well but showed some deviations in extreme settings, such as when the effect of the covariates on censoring was rather weak. Based on their theoretical and Monte Carlo simulation results, the authors recommend using either their poor man’s approach or triple selection over the previously published methods they considered. They also applied their methods to a real dataset, but the discussion of those results was not clearly laid out. In their discussion, they say that the simple poor man’s approach emerged as the true winner in the simulations. One might have expected the authors to recommend their triple selection method, but they have given it a fair review and criticism. Finally, at the very end of their discussion, they admit that their proposed test has the disadvantage that it may not converge to something readily interpretable when the proportional hazards assumption fails. This seems like a fairly large issue, yet it receives only minimal consideration at the end. Perhaps this, along with other factors, is something researchers applying these methods will need to weigh. Furthermore, their treatment of causal inference throughout the manuscript is weak.
Keywords: survival, Cox model, hazard ratio, confounding, censoring, causal inference, variable selection
Lancker KV, Dukes O, and Vansteelandt S. (2023). “Ensuring valid inference for Cox hazard ratios after variable selection” Biometrics. https://doi.org/10.1111/biom.13889.
Tibshirani, R. (1997) The lasso method for variable selection in the Cox model. Statistics in Medicine, 16(4), 385–395.