Statistical issues in survival analysis (Part VII)

Usha Govindarajulu


May 24, 2023

Kats and Gorfine (2023) developed a semiparametric maximum likelihood estimation procedure based on a kernel-smoothing-aided expectation-maximization (EM) algorithm, with variances estimated by a weighted bootstrap. They applied it to the illness-death model, fitting an accelerated failure time (AFT) model to each transition among three states: state 0 (healthy), state 1 (disease), and state 2 (death). There is not much in the literature on AFT frailty models in this setting.

The authors laid out the AFT frailty models for each transition, which are multiplicative. In a table prior to this, they summarized the literature on Cox frailty and AFT frailty models, but it only goes back to 2010, even though Cox frailty models existed well before then. Finally, they set up their likelihood with all transitions and treated the frailties as a missing data problem, using the EM algorithm of Dempster, Laird, and Rubin (1977). In the M-step, they used a kernel-smoothing approach to accommodate their semicompeting risks setting.
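To make the "frailty as missing data" idea concrete, here is a minimal sketch of an E-step for a gamma frailty. It assumes a frailty with mean 1 and variance theta that multiplies each transition hazard, in which case the frailty given the observed data is again gamma distributed. The function name and arguments are hypothetical, and this is the textbook gamma-frailty identity rather than the authors' actual implementation, which the review above says also relied on numerical integration.

```python
def gamma_frailty_posterior_mean(n_events, cum_hazards, theta):
    """E-step sketch: posterior mean of each subject's gamma frailty.

    Assumptions (not taken from the paper's code):
      * the frailty is Gamma(shape=1/theta, rate=1/theta), so mean 1, variance theta
      * the frailty multiplies the hazard of every transition
    n_events:    observed transitions per subject (0, 1 or 2 in an
                 illness-death model: disease and/or death)
    cum_hazards: per-subject sum of the frailty-free cumulative hazards over
                 all transitions the subject was at risk for
    """
    means = []
    for d, h in zip(n_events, cum_hazards):
        shape = 1.0 / theta + d   # posterior gamma shape
        rate = 1.0 / theta + h    # posterior gamma rate
        means.append(shape / rate)
    return means
```

In an EM loop these posterior means would stand in for the unobserved frailties when the regression coefficients and baseline quantities are updated in the M-step. A subject with more events than their accumulated hazard predicts gets a posterior mean above 1, and one with fewer gets a value below 1.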

The authors used the aftgee function from the R package of the same name to obtain naïve, rank-based starting estimates of the coefficients, and carried out adaptive quadrature through the integrate function. Finally, the “code”, as they put it (it is unclear whether they meant integrate), applies the EM algorithm. For the bandwidth parameters, they used modified versions of the optimal bandwidths of Jones (1990) and Jones and Sheather (1991). For variance estimation, they employed a weighted bootstrap, in which each observation is assigned a weight drawn from a standard exponential distribution. The authors also employed a goodness-of-fit (GOF) method based on randomized survival probabilities (RSPs), which replaces the survival probability of a censored failure time with a uniform random number between 0 and the survival probability at the censoring time. The distribution of the RSPs is then compared to the standard uniform distribution. They admitted that there is a lack of reference distributions for the RSPs, but they went ahead anyway and extended the approach to their illness-death model.
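The weighted bootstrap is easy to illustrate on a toy problem. The sketch below is not the authors' estimator: as a stand-in it uses the weighted maximum likelihood estimate of an exponential rate under right censoring, and both function names are hypothetical. What it does share with the description above is the resampling scheme, in which every observation stays in the sample and is re-weighted by an independent standard exponential draw in each replicate.

```python
import random
import statistics


def weighted_exp_rate(times, events, weights):
    """Weighted MLE of an exponential rate under right censoring
    (a toy stand-in for a real weighted estimator)."""
    num = sum(w * d for w, d in zip(weights, events))
    den = sum(w * t for w, t in zip(weights, times))
    return num / den


def weighted_bootstrap_se(times, events, estimator, n_boot=500, seed=0):
    """Standard error obtained by re-weighting each observation with
    independent Exp(1) weights and re-running the estimator."""
    rng = random.Random(seed)
    n = len(times)
    replicates = []
    for _ in range(n_boot):
        weights = [rng.expovariate(1.0) for _ in range(n)]
        replicates.append(estimator(times, events, weights))
    return statistics.stdev(replicates)
```

Unlike the ordinary nonparametric bootstrap, no observation is ever dropped from a replicate, which can be convenient when the estimator involves kernel smoothing or is sensitive to tied observations.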

In a simulation study, the authors found that their method performed well in terms of bias, coverage, and empirical standard deviations under an assumed gamma-distributed frailty. It also performed well under a misspecified model. For a real data analysis, they used the Rotterdam tumor bank dataset, available in the survival package in R, of 1546 breast cancer patients who underwent tumor removal surgery between 1978 and 1993. They applied their AFT model with gamma frailty, the gamma-frailty Cox model of Lee et al (2015), and the marginalized gamma-frailty model of Gorfine et al (2021). They again employed their GOF method and found that their method performed best compared with the other two, but they gave little detail on which aspects were truly better.
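For readers unfamiliar with RSPs, the following sketch shows the basic construction for a single failure time, not the illness-death extension. An observed failure keeps its fitted survival probability, while a censored one is replaced by a uniform draw between 0 and the fitted survival probability at the censoring time. The function names are hypothetical, and the Kolmogorov-Smirnov distance to the uniform distribution is used here only as one convenient summary, which is an assumption and not necessarily the comparison the authors made.

```python
import random


def randomized_survival_probs(times, events, surv, seed=0):
    """Compute RSPs from a fitted survival function `surv`.

    events[i] is 1 for an observed failure and 0 for a censored time.
    """
    rng = random.Random(seed)
    rsp = []
    for t, d in zip(times, events):
        s = surv(t)
        # Observed failure: keep S(t). Censored: draw uniformly on (0, S(t)).
        rsp.append(s if d else rng.uniform(0.0, s))
    return rsp


def ks_distance_to_uniform(values):
    """Kolmogorov-Smirnov distance between the empirical distribution of
    `values` and the standard uniform distribution."""
    xs = sorted(values)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
```

If the fitted survival function is close to the truth and censoring is independent, the RSPs should look roughly uniform and the distance should be small. A badly misspecified model pushes the RSPs toward 0 or 1 and inflates the distance.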

To summarize, their estimation method can be used with or without frailty. They used a shared frailty to handle the dependency between the nonterminal and terminal failure times in their multi-state transition modeling. Their method, as they said, retains the interpretability of AFT models with observed covariates while still allowing a simple and intuitive interpretation of the hazard functions. As a limitation, the authors mentioned only that they had not yet considered time-dependent covariates; they did not mention any other disadvantages of their method. They also never discussed potential computational challenges in this article. One would have thought this would have come up during review before publication, but it was not mentioned.

Written by,

Usha Govindarajulu

Keywords: survival, accelerated failure time models, kernel density, frailty, Cox, multi-state


Dempster, A.P., Laird, N.M. & Rubin, D.B. (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39(1), 1–22.

Gorfine, M., Keret, N., Ben Arie, A., Zucker, D. & Hsu, L. (2021) Marginalized frailty-based illness-death model: Application to the UK-Biobank survival data. Journal of the American Statistical Association, 116(535), 1155–1167.

Jones, M.C. (1990) The performance of kernel density function in kernel distribution function estimation. Statistics & Probability Letters, 9(2), 129–132.

Jones, M.C. & Sheather, S.J. (1991) Using non-stochastic terms to advantage in kernel-based estimation of integrated squared density derivatives. Statistics & Probability Letters, 11(6), 511–514.

Kats, L. & Gorfine, M. (2023) An accelerated failure time regression model for illness-death data: A frailty approach. Biometrics.