Statistical issues in survival analysis (BMC Public Health article 519)

Usha Govindarajulu
May 13, 2024

May 8, 2024

The main goal of this article was to assess COVID-19 rumor patterns and the causes of their persistence using survival analysis methods, and thereby to help reduce misinformation during the pandemic. The authors' dataset consisted of 754 rumors collected from January 18, 2020 to January 17, 2023 from the “Jiao Zhen” fact-checking platform, which used deep learning, image recognition, and algorithms for popularity assessment. To add time stamps, they used the Baidu platform, identifying the start time as N0 and the death time as N1. The survival time of each rumor was then calculated by subtracting the start time from the death time. The independent variables were factors affecting rumor dissemination. They also employed two coders, whose interrater reliability ranged from 0.874 to 0.938. The authors then stated that they would use the survival analysis methods of Kaplan-Meier, the log-rank test, and Cox proportional hazards regression. What they did not address at all, especially for the log-rank test and the Cox model, was testing the assumption of proportional hazards prior to their use.
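
To make the survival-time definition concrete, here is a minimal sketch in plain Python of computing durations as N1 minus N0 and then a Kaplan-Meier estimate. It is not the authors' code; the function names, the dates, and the toy durations are all hypothetical and serve only as an illustration.

```python
from datetime import date

def durations_in_days(records):
    """Survival time = death time (N1) minus start time (N0), in days."""
    return [(n1 - n0).days for n0, n1 in records]

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times  : observed durations
    events : 1 if the rumor's end was observed, 0 if censored
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all observations that share the same time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

# Hypothetical (N0, N1) pairs, not taken from the article's data.
records = [
    (date(2020, 1, 18), date(2020, 1, 25)),
    (date(2020, 2, 1), date(2020, 2, 4)),
]
print(durations_in_days(records))

# Toy durations with one censored observation (events == 0).
print(kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 1]))
```

The Kaplan-Meier step makes no proportional hazards assumption, which is why those estimates can stand even when the tests built on top of them are in doubt.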

In Figure 1, each theme had crossing curves, so it was concerning that they used the log-rank test or even the Wilcoxon test to compare the curves within each theme, some with significant results and others with none. Clearly the authors misunderstood the methods and used them incorrectly, so one cannot fully trust their log-rank or Wilcoxon test results. The Kaplan-Meier results are okay. However, another important point is that rumors are fueled by time and are not time-invariant, so the authors should have used methods that take this into account. The authors' claims based on their Cox model results, while seemingly important because the results are adjusted, are not valid given the lack of proportionality. They determined from their results that content themes and emotional needs were significant and influential in determining the survival time of rumors, and they added that these results diverged from the Kaplan-Meier results. They then incorrectly interpreted hazard ratios from the Cox model as relative risks. Hazard ratios are not relative risks: a hazard ratio compares instantaneous event rates, not cumulative risks. It is unclear whether there was any statistical reviewer for this article. At least one can more or less trust their Kaplan-Meier results for survival times, but that is about it. This is an example of yet another article that misuses survival analytic tools. Let’s not start any rumors.
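
The difference between a hazard ratio and a relative risk can be seen in a small sketch. It assumes a constant (exponential) hazard in each group, which is my simplifying assumption and not something taken from the article; the baseline hazard of 0.1 and the hazard ratio of 2 are made-up numbers, and the function names are hypothetical.

```python
import math

def risk_by_time(hazard, t):
    """Cumulative risk 1 - S(t) under a constant hazard (exponential model)."""
    return 1.0 - math.exp(-hazard * t)

def relative_risk(baseline_hazard, hazard_ratio, t):
    """Ratio of cumulative risks by time t for two groups whose hazards
    differ by a fixed hazard ratio."""
    exposed = risk_by_time(baseline_hazard * hazard_ratio, t)
    reference = risk_by_time(baseline_hazard, t)
    return exposed / reference

# Made-up inputs: the hazard ratio is held fixed at 2 throughout.
for t in (1, 5, 30):
    print(t, relative_risk(0.1, 2.0, t))
```

Even with the hazard ratio held fixed at 2, the relative risk computed this way is below 2 at every follow-up time and shrinks toward 1 as follow-up lengthens, so reading a hazard ratio as a relative risk misstates the effect.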

Written by,

Usha Govindarajulu, PhD

Keywords: survival analysis, hazard, Kaplan-Meier, Cox model, log-rank test, COVID-19

References

Liu, X., Zhang, L., Sun, L. et al. Survival analysis of the duration of rumors during the COVID-19 pandemic. BMC Public Health 24, 519 (2024). https://doi.org/10.1186/s12889-024-17991-3
