Exploring Survival Analysis Designs for Clinical Trials
Power for TTE related to several choices:
Power driven primarily by number of events (E) not sample size (N):
Calculating E separate from N:
Accrual/Follow-up
Survival Distribution/Effect Size
Other Consideration:
Reference
Introduction
One reason log-rank tests are useful is that they provide an objective criterion (statistical significance) around which to plan a study.
To plan such a study in survival analysis, we need to specify the censoring mechanism and the particular survival distributions under the null and alternative hypotheses.
We assume that patients enter the trial over an accrual period of length \(a\). After accrual closes, they are followed for an additional period of length \(f\), known as the follow-up time, so a patient enrolled at time \(u\) is observed for at most \(a + f - u\). Patients still alive at the end of follow-up are censored.
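To make this censoring mechanism concrete, the sketch below simulates one arm under two assumptions the text has not introduced yet: uniform accrual over \([0, a]\) and exponential event times with constant hazard \(\lambda\). Under those assumptions the probability that a patient's event is observed before the study ends is \(P(\text{event}) = 1 - \frac{e^{-\lambda f} - e^{-\lambda (a+f)}}{a\lambda}\). The function names and the inputs (a 12-month median, \(a = 24\), \(f = 12\)) are hypothetical.

```python
import math
import random


def simulate_arm(n, lam, a, f, rng):
    """Simulate n patients: uniform entry on [0, a], exponential event times
    with hazard `lam`, administrative censoring at calendar time a + f."""
    data = []
    for _ in range(n):
        entry = rng.uniform(0.0, a)        # calendar time of enrollment
        t_event = rng.expovariate(lam)     # time from entry to event
        t_admin = a + f - entry            # time from entry to study end
        if t_event <= t_admin:
            data.append((t_event, 1))      # event observed
        else:
            data.append((t_admin, 0))      # still alive at study end: censored
    return data


def prob_event(lam, a, f):
    """Closed-form probability of an observed event under the same assumptions."""
    return 1.0 - (math.exp(-lam * f) - math.exp(-lam * (a + f))) / (a * lam)


rng = random.Random(2024)
lam = math.log(2) / 12.0    # hypothetical 12-month median survival
a, f = 24.0, 12.0           # hypothetical accrual and follow-up periods (months)
arm = simulate_arm(100_000, lam, a, f, rng)
observed = sum(event for _, event in arm) / len(arm)
print(f"simulated: {observed:.3f}  closed form: {prob_event(lam, a, f):.3f}")
```

The simulated event fraction should match the closed form up to Monte Carlo error, and lengthening \(f\) raises the fraction of patients whose event is observed.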
Exponential Approximation
For simplicity, we generally assume constant hazards (i.e., exponential survival distributions). This is reasonable because work in the literature has indicated that the power and sample size obtained under constant hazards are fairly close to the empirical power of the log-rank test, provided that the ratio of the two hazard functions is constant. A power analysis usually seeks only the approximate number of subjects the study requires, and it already involves many approximations and guesses, so formulas based on the exponential distribution are often good enough.
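One common recipe under the exponential approximation is sketched below. It uses Schoenfeld's formula for the required number of events, \(E = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\pi(1-\pi)(\log \text{HR})^2}\) with allocation fraction \(\pi\), which is an assumption here since the text does not name a specific formula. It then divides \(E\) by the average probability of an observed event under uniform accrual over \([0, a]\) and follow-up \(f\) to get the total sample size \(N\). The function names and design inputs are hypothetical.

```python
import math
from statistics import NormalDist


def events_required(hr, alpha=0.05, power=0.80, alloc=0.5):
    """Schoenfeld's approximation: events needed for a two-sided log-rank
    test at level `alpha`, with fraction `alloc` randomized to treatment."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return math.ceil(z ** 2 / (alloc * (1 - alloc) * math.log(hr) ** 2))


def prob_event(lam, a, f):
    """P(event observed): uniform accrual on [0, a], follow-up f after
    accrual closes, exponential survival with constant hazard `lam`."""
    return 1.0 - (math.exp(-lam * f) - math.exp(-lam * (a + f))) / (a * lam)


def sample_size(hr, median_control, a, f, alpha=0.05, power=0.80, alloc=0.5):
    """Return (E, N): required events and total patients, assuming exponential
    survival in both arms and a constant hazard ratio `hr` (treatment/control)."""
    lam_c = math.log(2) / median_control   # exponential: median = ln 2 / hazard
    lam_t = hr * lam_c                     # proportional hazards
    p_event = alloc * prob_event(lam_t, a, f) + (1 - alloc) * prob_event(lam_c, a, f)
    events = events_required(hr, alpha, power, alloc)
    return events, math.ceil(events / p_event)


# Hypothetical design: HR 0.7, control median 12 months, 24-month accrual,
# 12 months of additional follow-up.
E, N = sample_size(hr=0.7, median_control=12.0, a=24.0, f=12.0)
print(f"events needed E = {E}, total sample size N = {N}")
```

Note that \(E\) depends only on the hazard ratio, \(\alpha\), power and allocation. The accrual and follow-up periods only determine how many patients \(N\) must be enrolled to observe those events, which is why \(E\) is usually calculated separately from \(N\).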
Method | Description / Examples |
---|---|
Log-Rank | “Average Hazard Ratio” – same as from univariate Cox Regression model |
Linear-Rank (Weighted) | Gehan-Breslow-Wilcoxon, Tarone-Ware, Fleming-Harrington, Peto-Peto, Threshold Lag, Modestly Weighted Linear-Rank (MWLRT) |
Piecewise Linear-Rank | Piecewise Parametric, Weighted Piecewise Model (e.g. APPLE), Change Point Models |
Combination | Maximum Combination (MaxCombo) Test Procedure |
Survival Time | Milestone Survival (KM), Restricted Mean Survival Time, Landmark Analysis |
Relative Time | Ratio of Times to Reach Event Proportion, Accelerated Failure Time Models |
Others | Responder-Based, Frailty Models, Renyi Models, Net Benefit (Buyse) |
1. Concept: The MaxCombo test combines several candidate linear-rank (weighted log-rank) tests and uses the largest of their standardized statistics, in effect selecting the “best” test for the data at hand. Because the p-value is computed from the joint distribution of the correlated candidate statistics, this selection does not inflate the Type I error rate, while the choice of tests stays flexible.
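2. Test Variants: Various members of the Fleming-Harrington family of weighted log-rank tests (denoted F-H G(p,q) tests) are used. Each test weights the observed-minus-expected event count at event time \(t\) by \(w(t) = \hat S(t^-)^{p}\,[1 - \hat S(t^-)]^{q}\), where \(\hat S(t^-)\) is the pooled Kaplan-Meier estimate just before \(t\), so different parameterizations emphasize different portions of the survival curve: \(p > 0\) gives more weight to early failures, \(q > 0\) gives more weight to late failures, and G(0,0) is the ordinary log-rank test. Each entry in the table below lists the (p,q) pairs of one proposed candidate set.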
F-H G(p,q) Candidate Tests | Proposed By |
---|---|
G(0,1; 1,0) | Lee (2007) |
G(0,0*; 0,1; 1,0) | Karrison (2016) |
G(0,0; 0,1; 1,0; 1,1) | Lin et al (2020) |
G(0,0; 0,0.5; 0.5,0; 0.5,0.5) | Roychoudhury et al (2021) |
G(0,0; 0,0.5) | Mukhopadhyay et al (2022) |
G(0,0; 0,0.5; 0.5,0) | Mukhopadhyay et al (2022) |
3. Common Usage: Typically, 2-4 candidate tests are considered, with the Fleming-Harrington family popular because of its flexibility: it includes the log-rank test (G(0,0)) and a test close to the Peto-Peto test (G(1,0)), among others, allowing researchers to tailor the analysis to the specific characteristics of their survival data.
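The procedure in items 1-3 can be sketched as follows, under several assumptions: (time, event) data for two arms, weights computed from the pooled Kaplan-Meier estimate \(\hat S(t^-)\), the four-test candidate set G(0,0; 0,1; 1,0; 1,1) from the Lin et al (2020) row of the table, and a one-sided test in which a positive Z favors treatment. Under the null hypothesis the standardized statistics are approximately multivariate normal, with a correlation matrix estimated from the same weights, and the p-value is P(max Z_k ≥ observed max). Here that probability is approximated by Monte Carlo simulation rather than the numerical integration a validated package would typically use. All function names and the simulated delayed-effect data are hypothetical.

```python
import math
import random


def fh_statistics(control, treatment, weights):
    """Standardized F-H G(p,q) statistics for each (p, q) in `weights` and their
    null correlation matrix. Inputs are lists of (time, event) pairs."""
    pooled = [(t, e, 0) for t, e in control] + [(t, e, 1) for t, e in treatment]
    event_times = sorted({t for t, e, _ in pooled if e == 1})
    k = len(weights)
    u = [0.0] * k                          # weighted sums of (O - E), treatment arm
    cov = [[0.0] * k for _ in range(k)]    # null covariance of the u's
    s_minus = 1.0                          # pooled Kaplan-Meier S(t-)
    for t in event_times:
        at_risk = [g for ti, _, g in pooled if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e, _ in pooled if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in pooled if ti == t and e == 1 and g == 1)
        o_minus_e = d1 - d * n1 / n
        var = d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1) if n > 1 else 0.0
        w = [s_minus ** p * (1 - s_minus) ** q for p, q in weights]
        for i in range(k):
            u[i] += w[i] * o_minus_e
            for j in range(k):
                cov[i][j] += w[i] * w[j] * var
        s_minus *= 1 - d / n               # update KM only after using S(t-)
    z = [-u[i] / math.sqrt(cov[i][i]) for i in range(k)]   # positive favors treatment
    corr = [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(k)]
            for i in range(k)]
    return z, corr


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a small positive semi-definite a."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(a[i][i] - s, 0.0))
            elif low[j][j] > 0:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def maxcombo_pvalue(z, corr, n_sim=20_000, seed=1):
    """Monte Carlo estimate of P(max_k Z_k >= max(z)) for Z ~ N(0, corr)."""
    low = cholesky(corr)
    rng = random.Random(seed)
    z_max, k, hits = max(z), len(z), 0
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        sim = [sum(low[i][j] * e[j] for j in range(i + 1)) for i in range(k)]
        if max(sim) >= z_max:
            hits += 1
    return hits / n_sim


def simulate(n, lam, rng, hr=1.0, delay=0.0, tau=36.0):
    """Hypothetical arm: hazard `lam` until `delay`, then `hr * lam`;
    everyone still event-free is censored at `tau` (no staggered entry)."""
    out = []
    for _ in range(n):
        t = rng.expovariate(lam)
        if t > delay:                      # memorylessness: restart after delay
            t = delay + rng.expovariate(hr * lam)
        out.append((t, 1) if t <= tau else (tau, 0))
    return out


rng = random.Random(7)
lam = math.log(2) / 12.0                        # hypothetical control median: 12 months
control = simulate(300, lam, rng)
treatment = simulate(300, lam, rng, hr=0.6, delay=6.0)
candidates = [(0, 0), (0, 1), (1, 0), (1, 1)]   # Lin et al (2020) set from the table
z, corr = fh_statistics(control, treatment, candidates)
p = maxcombo_pvalue(z, corr)
for (pp, qq), zk in zip(candidates, z):
    print(f"G({pp},{qq}): Z = {zk:.2f}")
print(f"MaxCombo one-sided p-value: {p:.4f}")
```

Because the p-value accounts for having taken the maximum, in exact computation it is never smaller than the one-sided p-value of the single best candidate test; that adjustment is what keeps the Type I error rate controlled.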
Issues with MaxCombo Tests
1. Type I Error and Estimand: Critics point out that MaxCombo tests, while versatile, can produce significant results even when the treatment is not better than the control at all times, for example when it is worse early and beneficial only late in follow-up (late efficacy). A rejection of the null hypothesis therefore does not by itself show that the treatment is beneficial overall, which can mislead conclusions about its efficacy.
2. Interpretability: Using an average hazard ratio as the estimand is hard to interpret because it may not reflect how the treatment effect changes over time, particularly under non-proportional hazards.
3. Alternatives for Improvement: Modifying the Fleming-Harrington weights (the G(p,q) parameters) has been suggested to better handle non-proportional hazards. For example, the emphasis can be shifted from early to late survival times by increasing q relative to p.
4. Communication of Results: It is recommended to use MaxCombo for hypothesis testing but to communicate the results with more interpretable measures such as the restricted mean survival time (RMST), the area under the survival curve up to a pre-specified time \(\tau\), which provides a direct, clinically meaningful measure of survival benefit.
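A minimal sketch of the RMST comparison, assuming the Kaplan-Meier estimator in each arm and a hypothetical pre-specified truncation time \(\tau = 24\) months. The RMST is \(\int_0^{\tau} \hat S(t)\,dt\), the mean event-free time over the first \(\tau\) units of follow-up. Variance and confidence intervals are omitted, and the simulated exponential data and function names are illustrative only.

```python
import math
import random


def km_steps(data):
    """Kaplan-Meier estimate as (time, S(time)) steps, starting at (0, 1)."""
    data = sorted(data)
    n_at_risk, s, steps, i = len(data), 1.0, [(0.0, 1.0)], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:   # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            s *= 1 - deaths / n_at_risk
            steps.append((t, s))
        n_at_risk -= leaving
    return steps


def rmst(data, tau):
    """Restricted mean survival time: area under the KM curve on [0, tau]."""
    steps = km_steps(data) + [(math.inf, None)]    # sentinel closes the last step
    area = 0.0
    for (t0, s0), (t1, _) in zip(steps, steps[1:]):
        if t0 >= tau:
            break
        area += s0 * (min(t1, tau) - t0)
    return area


rng = random.Random(11)
lam = math.log(2) / 12.0    # hypothetical control median of 12 months
tau = 24.0                  # hypothetical pre-specified truncation time


def arm(n, hazard, censor=36.0):
    """Exponential event times, administratively censored at `censor`."""
    return [(t, 1) if t <= censor else (censor, 0)
            for t in (rng.expovariate(hazard) for _ in range(n))]


control, treatment = arm(500, lam), arm(500, 0.7 * lam)
diff = rmst(treatment, tau) - rmst(control, tau)
print(f"RMST difference up to {tau:.0f} months: {diff:.2f} months")
```

A positive difference reads as the average event-free time gained with treatment during the first \(\tau\) months, which stays interpretable whether or not the hazards are proportional. \(\tau\) should be chosen so that enough patients remain at risk in both arms.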