Statistical Considerations for Group Sequential Design
Interim analysis, performed according to a predefined analysis plan before a study is complete, is a crucial process in clinical research. It is integral to group sequential trials (GST), where the findings from interim analyses can dictate whether a trial should be terminated early. A design can incorporate one or several interim analyses, and each additional analysis increases the chance of stopping the trial early. Compared with a fixed design, this increases the maximum sample size needed to maintain the type I error rate and power, but it reduces the average sample size.
In the realm of clinical trials, interim analyses are critical components that allow for the potential early termination of trials based on efficacy or futility findings. This detailed summary explores the rationale and implications of these analyses, particularly within group-sequential trials in drug development.
Group-sequential trials are designed to allow for early trial termination to save time and resources. For instance, to detect a hypothetical hazard ratio (HR) of 0.75, a traditional single-stage trial might require 380 events. Adding interim analyses for futility and efficacy changes this number. A futility analysis might be scheduled when 30% of the events have occurred, stopping the trial if the observed HR is above 1, and an efficacy analysis at about 66% of events, using an O’Brien-Fleming alpha-spending approach. This could increase the maximum number of events to 408 but offers the possibility of stopping earlier if the criteria are met. Some trials will stop at these interim points, so the average number of events needed is lower than the maximum.
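The 380 events quoted above are consistent with Schoenfeld's approximation for a log-rank test, d = 4 (z_{1-alpha/2} + z_{1-beta})^2 / (log HR)^2. The sketch below assumes a two-sided significance level of 5%, 80% power, and 1:1 randomization. None of these are stated in the text; they are assumptions chosen because they reproduce the number. The function name `required_events` is hypothetical.

```r
# Schoenfeld approximation for the required number of events (1:1 allocation).
# alpha and power are assumed values; the text only gives HR = 0.75.
required_events <- function(hr, alpha = 0.05, power = 0.80) {
  4 * (qnorm(1 - alpha / 2) + qnorm(power))^2 / log(hr)^2
}

events_fixed <- ceiling(required_events(hr = 0.75))
events_fixed  # 380
```

Under these assumptions, the 408 events of the group-sequential version correspond to an inflation of roughly 7% over the fixed design (408 / 380).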
The effect size that a trial is powered at, such as an HR of 0.75, directly relates to the thresholds used at interim stops for efficacy. For a trial to stop early for efficacy, the observed effect must be strong enough to cross the predefined efficacy boundary, which is typically not drastically higher than the powered effect. This ensures that stopping early does not necessarily mean expecting a much larger effect than initially powered for.
Early stopping is sometimes viewed skeptically, as if it were a way to “cheat” the trial process. However, group-sequential methodologies are designed to maintain the overall type I error rate (familywise-error rate, FWER), adjusting the significance thresholds for each interim look. This methodological rigor ensures that the integrity of the trial’s statistical conclusions is preserved, not compromised.
Concerns about bias when stopping trials early are valid, especially regarding the accuracy of the estimated treatment effect. However, methodologies have been developed to adjust for such early stopping, providing unbiased estimates of the treatment effect. For example, a trial might stop for efficacy if an interim analysis shows an HR significantly below the threshold, with adjusted analyses confirming the robustness of this finding.
From an operational standpoint, stopping a trial early for efficacy doesn’t mean all activities cease immediately. Data continues to be collected on primary and secondary outcomes to enrich the findings and ensure the durability of the treatment effect over time. Moreover, certain data collection efforts that are crucial for regulatory filing might continue, while others may be halted. The trial’s unblinding at this point necessitates careful management and analysis of the continuing data collection.
1. Blinding: To prevent bias, it’s essential that all personnel involved in the trial, except those conducting the interim analysis, remain blinded to the unmasked data and results. Researchers are typically informed only about whether to continue the trial or make amendments to the study protocol.
2. Rigorous Pre-Design: A poorly designed interim analysis can compromise the reliability of the trial results. Thus, it’s vital to avoid unplanned interim analyses unless absolutely necessary. If an unplanned analysis is conducted, the study report should detail the reason for the analysis, the necessity of unblinding, the potential bias introduced, and its impact on the interpretation of results.
3. Frequency and Timing of Analyses:
   - Frequency: For instance, a trial might plan for three interim analyses.
   - Timing: These analyses can be evenly spaced (e.g., after 25%, 50%, and 75% of data collection) or at irregular intervals, depending on various factors.
4. Error Control:
   - Overall Error Rate: Typically set at a significance level of 0.05.
   - Stopping Boundaries: These are critical for deciding if a trial should be halted early.
     + Efficacy Boundary: Derived from an alpha-spending function, which controls the Type I error across the looks and affects the total sample size. Crossing this boundary means the null hypothesis (H0) is rejected.
     + Futility Boundary: Derived from a beta-spending function, which controls the Type II error across the looks and affects the total sample size. Crossing this boundary means the trial stops without rejecting H0 (H0 is accepted), because a positive result at the end is unlikely.
5. Adaptive Design: Depending on the results of the interim analysis, modifications may be made to the trial’s design. This could involve adjusting the sample size or altering the trial’s objective (e.g., shifting from a superiority trial to a non-inferiority trial).
Group sequential designs (GSDs) are a sophisticated methodology in clinical trials that allow for interim analyses of the data at pre-specified points during the study. These designs can provide the opportunity to stop the trial early for efficacy, futility, or in some cases, safety concerns. This flexibility makes group sequential designs very appealing for both ethical and economic reasons.
Fully Sequential Designs: These designs allow for continuous monitoring of data as they become available. Each incoming piece of data can potentially trigger an interim analysis, which might lead to early termination of the trial. This method is very flexible but requires rigorous control of the Type I error rate due to the multiple analyses conducted.
Group Sequential Designs: Unlike fully sequential designs, GSDs involve analyses at specified “looks” or points within the collection of data. These points are usually predetermined and occur after a cohort of subjects has been assessed. The data are evaluated in groups rather than continuously, which can simplify the logistics and statistical handling of interim analyses.
Key Features of Group Sequential Designs
Early Stopping: The trial may be terminated early for efficacy if the treatment effect is stronger than expected, or for futility if preliminary results suggest that continuing the trial is unlikely to show a treatment benefit.
Adjustments at Interim Analyses: Adjustments may be made to the trial parameters based on interim findings, such as sample size re-estimation or protocol modifications.
Multiple Opportunities for Success: These designs allow multiple looks at the data, thereby increasing the chances of stopping early if the treatment proves effective or avoiding unnecessary exposure to ineffective or harmful treatments.
1. Error Spending
   - Definition: This method involves ‘spending’ alpha (Type I error, related to falsely declaring efficacy) and/or beta (Type II error, related to falsely declaring futility) across multiple interim looks.
   - Application: Highly flexible in both the design and analysis stages of a clinical trial, allowing adjustments based on accumulating data.
2. Haybittle-Peto
   - Definition: Sets boundaries for efficacy in terms of unadjusted p-values. This method adjusts the final p-value to retain the correct Type I error rate.
   - Application: Typically involves a more conservative approach to stopping for efficacy, often using a stringent p-value threshold like 0.001 at interim analyses.
3. Wang-Tsiatis
   - Definition: Includes designs that focus solely on efficacy or both efficacy and futility. The Pampallona-Tsiatis Parameter Method aims to optimize the trial by minimizing the average sample size.
   - Application: Provides a structured approach to balance between the number of interim looks and the conservation of statistical power.
4. “Classic” Designs
   - Definition: Includes well-known methods like O’Brien-Fleming and Pocock designs, which use fixed statistical boundaries for interim analyses.
   - Application: These designs are less flexible during monitoring but are well-established for controlling Type I error across multiple looks. They are not to be confused with the O’Brien-Fleming and Pocock error spending functions, which describe how the error probability is allocated across the interim analyses.
5. Unified Family
   - Definition: A unified method that encompasses the Wang-Tsiatis and “Classic” designs using two parameters to define the stopping boundaries.
   - Application: Offers less flexibility than error spending approaches but provides a structured framework that can incorporate different statistical considerations.
6. Whitehead
   - Definition: An extension of the fully sequential probability ratio test (SPRT) adapted to group sequential settings. This method is known for its “Triangular” or “Christmas Tree” design patterns.
   - Application: Useful in scenarios where the traditional SPRT needs adaptation to handle grouped sequential data, providing a balance between early stopping for efficacy and overall trial integrity.
7. Others
   - Definition: Includes custom designs for futility such as Conditional Power, Adaptive GSD (which may include sample size re-estimation and other adaptive features), and Multi-Arm Multi-Stage (MAMS) GSD.
   - Application: These methods allow for a high degree of customization and can adapt to the specific needs of a trial, offering flexibility in how interim data are used to make decisions about the trial’s continuation.
Group Sequential Design (GSD) simulation is an essential tool in the design and analysis of clinical trials, allowing researchers to assess the properties of complex GSDs before actual trial data are collected.
GSD simulation uses hypothetical (simulated) study data to predict various characteristics of a group sequential design. It is particularly valuable for:
Key Benefits of Simulation in GSD
Regulatory Perspective
Regulatory bodies often view simulation positively as it provides a rigorous framework for evaluating and justifying trial designs. Simulated data can demonstrate to regulators that a trial is likely to meet its objectives and that the statistical methods employed are appropriate and robust.
Steps in Designing a Simulated GSD
Step 1: Generating Model Assumptions - Define the clinical context and parameters for the simulated data, such as the expected rates of events, variability, and the effect size of the intervention.
Step 2: Design Assumptions - Establish the structure of the trial, including the number of arms, enrollment criteria, and planned duration.
Step 3: Boundary/Information Assumptions - Set the criteria for interim analyses, including statistical boundaries for efficacy and futility, and decide how much data (information fraction) will be considered at each analysis point.
Step 4: Simulation Control Assumptions - Determine the number of simulations, the randomization seed, and other technical parameters that control the simulation process. This step ensures that the simulation is both robust and reproducible.
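The four steps can be sketched with a small base-R simulation. This is a minimal illustration, not rpact's simulation engine: it works directly on z-statistics, assumes a one-sided two-look design with the interim at 60% information, and borrows the efficacy boundaries 2.669 and 1.981 from the rpact output for the O'Brien & Fleming type alpha-spending design shown later in this document. The function name `simulate_gsd`, the seed, and the number of runs are hypothetical choices.

```r
# Step 4: simulation control (number of runs and seed are arbitrary choices)
set.seed(2024)
n_sim <- 200000

# Step 3: information fraction at the interim and efficacy bounds (z scale)
t1 <- 0.6
bounds <- c(2.669, 1.981)

# Steps 1 and 2 are collapsed into 'drift', the expected z-statistic at full
# information (0 under H0, positive under the alternative).
simulate_gsd <- function(drift, t1, bounds, n_sim) {
  incr1 <- rnorm(n_sim)  # standardized noise from stage 1 data
  incr2 <- rnorm(n_sim)  # standardized noise from stage 2 data
  z1 <- incr1 + drift * sqrt(t1)
  # Cumulative statistic: weighted sum of stage-wise noise plus full drift
  z2 <- sqrt(t1) * incr1 + sqrt(1 - t1) * incr2 + drift
  stop1 <- z1 >= bounds[1]
  stop2 <- !stop1 & z2 >= bounds[2]
  c(stop_stage1 = mean(stop1), reject_overall = mean(stop1 | stop2))
}

# Type I error check under H0
res_h0 <- simulate_gsd(drift = 0, t1, bounds, n_sim)
# Roughly the drift of a fixed design with 80% power (assumed alternative)
res_h1 <- simulate_gsd(drift = qnorm(0.975) + qnorm(0.80), t1, bounds, n_sim)
res_h0
res_h1
```

Under H0 the overall rejection rate should land close to the one-sided level of 2.5%, with only a small share of the rejections occurring at the interim. Repeating the run with a different seed gives a feel for the Monte Carlo error.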
The design of many clinical trials includes some strategy for early stopping if an interim analysis reveals large differences between treatment groups, or shows obvious futility such that there is no chance that continuing to the end would show a clinically meaningful effect. In addition to saving time and resources, such a design feature can reduce study participants’ exposure to an inferior or useless treatment. However, when repeated significance testing on accumulating data is done, some adjustment of the usual hypothesis testing procedure must be made to maintain an overall significance level. The methods described by Pocock and O’Brien & Fleming, among others, are popular implementations of group sequential testing for clinical trials.
The Pocock boundary is a method for determining whether to stop a clinical trial prematurely. The typical clinical trial compares two groups of patients. One group is given a placebo or conventional treatment, while the other group is given the treatment that is being tested. The investigators running the clinical trial will wish to stop the trial early for ethical reasons if the treatment group clearly shows evidence of benefit. In other words, “when early results proved so promising it was no longer fair to keep patients on the older drugs for comparison, without giving them the opportunity to change.”
The Pocock boundary is simple to use in that the p-value threshold is the same at each interim analysis. The disadvantages are that the number of interim analyses must be fixed at the start and it is not possible under this scheme to add analyses after the trial has started. Another disadvantage is that investigators and readers frequently do not understand how the p-values are reported: for example, if there are five interim analyses planned, but the trial is stopped after the third interim analysis because the p-value was 0.01, then the overall p-value for the trial is still reported as <0.05 and not as 0.01.
With the Pocock design early rejections are more likely.
Pocock (1977) first proposed that the crossing boundary be constant for all equally spaced analyses. O’Brien and Fleming (1979) suggested that the crossing boundary for the \(k\)th analysis, \(z_{c}(k)\), change over the total number of analyses \(K\) such that
\[ z_{c}(k)=z_{O B F} \sqrt{K / k} \]
The O’Brien-Fleming boundaries have been used more frequently because they preserve a nominal significance level at the final analysis that is close to that of a single test procedure.
With the O’Brien & Fleming design the early rejection bounds are large and early rejections are rather unlikely, at least under \(H_0\). The last rejection bound is close to the unadjusted bound \(z_{\alpha / 2}\), which is not the case for Pocock’s design. In practice, the O’Brien & Fleming design is generally preferred, since early rejection without overwhelming evidence may not be convincing.
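The contrast between the two boundary shapes can be checked by simulation. The sketch below assumes K = 3 equally spaced looks and a two-sided level of 5%. The constants 2.289 (Pocock) and 2.004 (O'Brien & Fleming) are commonly tabulated values for this setting and are not taken from this document, so treat them as assumptions. The helper `crossing_prob` is hypothetical.

```r
set.seed(1)
K <- 3
n_sim <- 200000

# z-statistics at K equally spaced looks under H0:
# Z_k = (sum of the first k independent N(0,1) stage statistics) / sqrt(k)
stage <- matrix(rnorm(n_sim * K), nrow = n_sim, ncol = K)
cum <- stage
for (j in 2:K) cum[, j] <- cum[, j - 1] + stage[, j]
z <- sweep(cum, 2, sqrt(seq_len(K)), "/")

# Probability that |Z_k| reaches its bound at any look (two-sided)
crossing_prob <- function(z, bounds) {
  crossed <- sweep(abs(z), 2, bounds, ">=")
  mean(rowSums(crossed) > 0)
}

k <- seq_len(K)
bounds_naive  <- rep(qnorm(0.975), K)  # unadjusted 1.96 at every look
bounds_pocock <- rep(2.289, K)         # constant bound (assumed tabulated value)
bounds_obf    <- 2.004 * sqrt(K / k)   # z_OBF * sqrt(K / k) (assumed tabulated value)

results <- c(naive  = crossing_prob(z, bounds_naive),
             pocock = crossing_prob(z, bounds_pocock),
             obf    = crossing_prob(z, bounds_obf))
round(results, 3)
```

Testing at 1.96 on every look inflates the overall error rate well above 5%, while both adjusted designs should come out near 5%. The O'Brien & Fleming bounds start high and end close to 1.96, which is the behaviour described above.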
Wang & Tsiatis (1987) suggest the rejection boundaries
\[ u_{k}=\mathrm{c}_{\mathrm{WT}}(K, \alpha, \beta, \Delta) k^{\Delta-0.5} \]
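Two special cases show how the shape parameter \(\Delta\) links this family to the Pocock and O’Brien & Fleming boundaries described above (a short check using the bounds \(u_k\) as written here):
\[ \Delta=0.5: \quad u_{k}=c_{\mathrm{WT}} k^{0}=c_{\mathrm{WT}} \quad \text{(constant bounds, Pocock)}, \qquad \Delta=0: \quad u_{k}=c_{\mathrm{WT}} / \sqrt{k} \quad \text{(O'Brien \& Fleming)} \]
Values of \(\Delta\) between 0 and 0.5 give boundary shapes between these two.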
Rejection bounds
Rejection regions with rejection bounds \(u_{k}, k=1, \ldots, K\):
\[ \begin{array}{c} \mathcal{R}_{k}^{*}=\left\{\left|Z_{k}^{*}\right| \geq u_{k}\right\}=\left(-\infty,-u_{k}\right] \cup\left[u_{k}, \infty\right) \\ \mathcal{A}_{k}^{*}=\emptyset, \quad \mathcal{C}_{k}^{*}=\left(-u_{k}, u_{k}\right) \text { for } k<K, \text { and } \mathcal{A}_{K}^{*}=\left(-u_{K}, u_{K}\right) \end{array} \]
Design of Pocock (1977): Constant rejection bounds, i.e. \[u_{1}=\cdots=u_{K}=c_{\mathrm{P}} \text { with } c_{\mathrm{P}}=c_{\mathrm{P}}(K, \alpha)\] such that \[\mathbf{P}_{\mu_{0}}\left(\bigcup_{k=1}^{K}\left\{\left|Z_{k}^{*}\right| \geq c_{\mathrm{P}}\right\}\right)=\alpha\]
Design of O’Brien & Fleming (1979): Decreasing rejection bounds \(u_{k}=c_{\mathrm{OBF}} / \sqrt{k}\) with \(c_{\mathrm{OBF}}=c_{\mathrm{OBF}}(K, \alpha)\) such that \[\mathbf{P}_{\mu_{0}}\left(\bigcup_{k=1}^{K}\left\{\left|Z_{k}^{*}\right| \geq c_{\mathrm{OBF}} / \sqrt{k}\right\}\right)=\alpha\]
Local p-values
\[ \begin{array}{c} p_{k}^{*}:=2\left(1-\Phi\left(\left|Z_{k}^{*}\right|\right)\right), \quad k=1, \ldots, K \\ \text { Namely: } \quad\left|Z_{k}^{*}\right| \geq u_{k} \quad \Longleftrightarrow \quad p_{k}^{*} \leq \alpha_{k}:=2\left(1-\Phi\left(u_{k}\right)\right) \end{array} \]
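As a quick illustration of the relation between bounds and local levels, the sketch below converts the z-scale efficacy bounds from the three-look rpact summary in this document (3.731, 2.504, 1.994) into nominal levels. That design is one-sided, so the one-sided analogue \(1-\Phi(u_k)\) is the one that should reproduce the "Stage levels (one-sided)" row; the two-sided formula above simply doubles it. The helper names are hypothetical.

```r
# Efficacy bounds on the z scale, copied from the rpact summary in this document
u <- c(3.731, 2.504, 1.994)

# Local levels implied by a bound u_k
local_level_one_sided <- function(u) 1 - pnorm(u)
local_level_two_sided <- function(u) 2 * (1 - pnorm(u))

levels_one_sided <- local_level_one_sided(u)
levels_two_sided <- local_level_two_sided(u)

data.frame(stage = seq_along(u),
           bound = u,
           one_sided = round(levels_one_sided, 4),
           two_sided = round(levels_two_sided, 4))
```

A local p-value at stage k is then compared with the level in the same row; it leads to rejection exactly when the z-statistic reaches the bound.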
As argued before, the maximum sample size \(N_{K}\) required for power \(1-\beta\) is a multiple of the fixed size \[ \begin{array}{l} n_{f}=\left(z_{\alpha / 2}+z_{\beta}\right)^{2} / \delta_{1}^{2} \\ N_{K}=n_{f} \cdot I(K, \alpha, \beta) \end{array} \] The inflation factor \(I(K, \alpha, \beta)\) depends on the type of design, e.g. whether Pocock or O’Brien & Fleming.
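A small numeric sketch of these two formulas, under stated assumptions: the standardized effect is taken as 10 / 24 (the mean difference and standard deviation used in the continuous-endpoint examples of this document), the total for two equally sized arms is taken as four times \(n_f\) under a normal approximation, and the inflation factor 1.013 is the value reported in the rpact summary of the three-look O'Brien & Fleming type alpha-spending design. Variable names are hypothetical.

```r
alpha <- 0.05        # two-sided significance level
beta <- 0.20         # 1 - power
delta1 <- 10 / 24    # standardized effect (mean difference 10, SD 24)

# Fixed-design size from the formula above
n_f <- (qnorm(1 - alpha / 2) + qnorm(1 - beta))^2 / delta1^2

# Total for two equally sized arms (normal approximation, assumed 1:1 allocation)
n_total_fixed <- 4 * n_f

# Maximum size of the group sequential design: fixed size times inflation factor
inflation <- 1.013   # copied from the rpact summary in this document
n_total_max <- n_total_fixed * inflation

round(c(fixed = n_total_fixed, maximum = n_total_max), 1)
```

The fixed total of about 181 is slightly below the 182.8 subjects reported by `getSampleSizeMeans` in this document, which is based on a t-test rather than the normal approximation.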
Average Sample Size
The required stage-wise sample size is \[n=N_{K} / K=n_{f} \cdot I(K, \alpha, \beta) / K\] Consequently, we get for the ASN \[\frac{\operatorname{ASN}(\mu)}{n_{f}}=\frac{I(K, \alpha, \beta)}{K} \underbrace{\left(1+\sum_{k=2}^{K} \mathbf{P}_{\mu}\left(Z_{1}^{*} \in \mathcal{C}_{1}^{*}, \ldots, Z_{k-1}^{*} \in \mathcal{C}_{k-1}^{*}\right)\right)}_{\text {average number of stages }}\]
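The same idea can be checked by hand on the two-look rpact example in this document, where stages are unequal (60% and 100% information). The sketch below copies the cumulative sample sizes and the interim stopping probability under H1 from that summary output and computes the expected sample size as "stage 1 size plus the probability of continuing times the stage 2 increment". Variable names are hypothetical.

```r
# Values copied from the rpact summary of the two-look design in this document
n_cum <- c(124.3, 207.1)   # cumulative number of subjects at looks 1 and 2
p_stop1_h1 <- 0.3123       # probability of stopping for efficacy at look 1 under H1

# ASN = n_1 + P(continue past look 1) * (n_2 - n_1)
asn_h1 <- n_cum[1] + (1 - p_stop1_h1) * (n_cum[2] - n_cum[1])
asn_h1
```

The result is about 181.2, which agrees with the reported "Expected number of subjects under H1" of 181.3 up to rounding of the inputs.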
Pampallona & Tsiatis (1994) suggest using a GSD with a symmetric interim acceptance region: \[ \mathcal{A}_{k}^{*}=\left[-u_{k}^{0}, u_{k}^{0}\right], \quad \mathcal{C}_{k}^{*}=\left(-u_{k}^{1},-u_{k}^{0}\right) \cup\left(u_{k}^{0}, u_{k}^{1}\right), \quad k=1, \ldots, K \] for specific \(0<u_{k}^{0}<u_{k}^{1}\) \((k<K)\) and \(0<u_{K}^{0}=u_{K}^{1}\). This means accepting \(H_{0}\) at stage \(k\) if \(\left|Z_{k}^{*}\right| \leq u_{k}^{0}\), and rejecting \(H_{0}\) at stage \(k\) if \(\left|Z_{k}^{*}\right| \geq u_{k}^{1}\).
The method can be implemented by using the acceptance bounds \[ u_{k}^{0}:=\max (\underbrace{\left[\sqrt{k / K}\left(c^{0}+c^{1}\right)-c^{0}\right] K^{\Delta-0.5}}_{\text {the original } u_{k}^{0}}, 0) \]
Without early acceptance we would now choose: \[ \begin{array}{c} \mathcal{C}_{k}^{*}=\left(-\infty, u_{k}\right), \quad \mathcal{A}_{k}^{*}=\emptyset, \quad k=1, \ldots, K-1 \\ \mathcal{R}_{k}^{*}=\left[u_{k}, \infty\right), \quad k=1, \ldots, K, \quad \mathcal{A}_{K}^{*}=\left(-\infty, u_{K}\right) \end{array} \]
Type I error rate control: \[ \mathbf{P}_{\mu_{0}}\left(\bigcup_{k=1}^{K}\left\{Z_{k}^{*} \geq u_{k}\right\}\right)=\alpha \]
One-sided GSD with early acceptance: DeMets & Ware (1980, 1982) suggest using a constant futility boundary \(u^{L}<u_{k}\) for all \(k \leq K-1\). With this futility boundary the decision regions become \[ \begin{array}{c} \mathcal{C}_{k}^{*}=\left(u^{L}, u_{k}\right), \quad \mathcal{A}_{k}^{*}=\left(-\infty, u^{L}\right), \quad k=1, \ldots, K-1 \\ \mathcal{R}_{k}^{*}=\left(u_{k}, \infty\right), \quad k=1, \ldots, K, \quad \mathcal{A}_{K}^{*}=\left(-\infty, u_{K}\right) \end{array} \]
GSD with unequally sized stages
Assume that we have planned the GSD with equally spaced stages, but the stage-wise sample sizes \(n_k\) are unequal. We can calculate all boundaries (e.g. Pocock or O’Brien & Fleming) also with unequal stages.
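If the only goal were to control false positives, the interim comparisons could be treated as multiple comparisons and corrected with a standard multiple-testing method. However, the data at successive looks are strongly correlated, so such a correction would inflate the false-negative rate and is not appropriate. The idea of an alpha spending function is to allocate the false-positive error to the individual comparisons according to some scheme: each comparison spends part of the false-positive budget, and the parts add up exactly to the preset level. When a clinical study makes its overall decision in several stages (for example, interim analyses for efficacy or futility), each stage spends some \(\alpha\). As the study progresses, the completed proportion of the study (e.g., 1/3, 1/2, 60%) and the cumulative type I error rate are linked by a function. The amount spent at each stage is \[ \alpha_{1}^{*}=\alpha\left(t_{1}^{*}\right)-\alpha(0),\] \[\alpha_{2}^{*}=\alpha\left(t_{2}^{*}\right)-\alpha\left(t_{1}^{*}\right), \ldots,\] \[\alpha_{n}^{*}=\alpha(1)-\alpha\left(t_{n-1}^{*}\right)\] By definition, \(\alpha_{1}^{*}=\alpha\left(t_{1}^{*}\right)\) and \(\sum \alpha^{*}=\alpha\).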
It is usually difficult or even impossible to achieve the pre-planned sample sizes exactly. One reason is that a date for the DMC meeting has to be fixed in advance. To solve this issue, Lan & DeMets suggest fixing the maximum sample size \(N\) and a spending function at level \(\alpha\), \(\alpha^{*}(t), \quad t \in[0,1]\), that is strictly increasing with \(\alpha^{*}(0)=0\) and \(\alpha^{*}(1)=\alpha\). With \(t_{k}\) denoting the information fraction at stage \(k\), the bound \(u_{k}\) is determined stage by stage from
\[ \alpha^{*}\left(t_{k-1}\right)+\mathbf{P}\left(\left|Z_{1}^{*}\right|<u_{1}, \ldots,\left|Z_{k-1}^{*}\right|<u_{k-1},\left|Z_{k}^{*}\right| \geq u_{k}\right)=\alpha^{*}\left(t_{k}\right) \] This means that the cumulative level \(\alpha^{*}\left(t_{k}\right)\) is used up to stage \(k\).
Examples for spending functions
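Two widely used Lan-DeMets spending functions can be written in a few lines of base R. The sketch below uses the O'Brien & Fleming type form \(2(1-\Phi(z_{\alpha/2}/\sqrt{t}))\) and the Pocock type form \(\alpha \log(1+(e-1)t)\), applied to the one-sided level 0.025 and the information rates 0.33, 0.67, 1 used in the rpact examples of this document. That the first form is the right one for a one-sided level is an assumption, made because it reproduces the "Cumulative alpha spent" row of the rpact output. The function names are hypothetical.

```r
# Lan-DeMets spending functions; t is the information fraction in (0, 1]
# O'Brien & Fleming type: alpha*(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))
spend_obf <- function(t, alpha) 2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))
# Pocock type: alpha*(t) = alpha * log(1 + (e - 1) * t)
spend_pocock <- function(t, alpha) alpha * log(1 + (exp(1) - 1) * t)

t_k <- c(0.33, 0.67, 1)
alpha <- 0.025

cum_obf <- spend_obf(t_k, alpha)        # cumulative alpha spent at each look
cum_pocock <- spend_pocock(t_k, alpha)
incr_obf <- diff(c(0, cum_obf))         # alpha spent at each individual look

round(rbind(obf = cum_obf, pocock = cum_pocock), 4)
```

The O'Brien & Fleming type function spends almost nothing at the first look and most of the level at the end, while the Pocock type function spends alpha much earlier. Both reach the full level at t = 1.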
R Package for Adaptive Clinical Trials
Design Functions
Sample Size Calculation Functions
Power Calculation Functions
Simulation Functions
Dataset and Analysis Results Functions
getDesignGroupSequential
Defining efficacy boundaries
typeOfDesign = c("OF", "P", "WT", "HP", "WToptimum", "asP", "asOF", "asKD", "asHSD", "asUser")
library(rpact)
# Standard O’Brien & Fleming boundary
design <- getDesignGroupSequential(sided = 1, alpha = 0.025,
informationRates = c(0.33, 0.67, 1), typeOfDesign = "OF")
# User-defined alpha-spending functions
# User-defined alpha-spending functions (typeOfDesign = "asUser") can be obtained via the
# argument userAlphaSpending, which must contain a numeric vector with elements
# 0 < alpha_1 < ... < alpha_kMax = alpha that define the values of the cumulative
# alpha-spending function at each interim analysis.
# Example: User-defined alpha-spending function which is very conservative at the first
# interim (spend alpha = 0.001), conservative at the second (spend an additional
# alpha = 0.01, i.e., total cumulative alpha spent is 0.011 up to the second interim),
# and spends the remaining alpha at the final analysis (i.e., cumulative alpha = 0.025)
design <- getDesignGroupSequential(sided = 1, alpha = 0.025,
informationRates = c(0.33, 0.67, 1),
typeOfDesign = "asUser",
userAlphaSpending = c(0.001, 0.01 + 0.001, 0.025))
# O’Brien & Fleming type α-spending
design <- getDesignGroupSequential(sided = 1, alpha = 0.025,
informationRates = c(0.33, 0.67, 1),
typeOfDesign = "asOF")
# plot the design with default type 1 (Boundary Plot)
plot(design)
## plot.TrialDesign
## 1: creates a 'Boundaries' plot
## 3: creates a 'Stage Levels' plot
## 4: creates a 'Error Spending' plot
## 5: creates a 'Power and Early Stopping' plot
## 6: creates an 'Average Sample Size and Power / Early Stop' plot
## 7: creates an 'Power' plot
## 8: creates an 'Early Stopping' plot
## 9: creates an 'Average Sample Size' plot
## "all": creates all available plots and returns it as a grid plot or list
# summary() creates a nice presentation of the design:
summary(design)
Sequential analysis with a maximum of 3 looks (group sequential design)
O’Brien & Fleming type alpha spending design, one-sided overall significance level 2.5%, power 80%, undefined endpoint, inflation factor 1.013, ASN H1 0.8657, ASN H01 0.9827, ASN H0 1.0109.
Stage | 1 | 2 | 3 |
---|---|---|---|
Planned information rate | 33% | 67% | 100% |
Cumulative alpha spent | <0.0001 | 0.0062 | 0.0250 |
Stage levels (one-sided) | <0.0001 | 0.0061 | 0.0231 |
Efficacy boundary (z-value scale) | 3.731 | 2.504 | 1.994 |
Cumulative power | 0.0174 | 0.4227 | 0.8000 |
# display the design characteristics
# Stopping probabilities and expected sample size reduction
getDesignCharacteristics(design)
Group sequential design characteristics
## Comparison of multiple designs
# O'Brien & Fleming, 3 equally spaced stages
d1 <- getDesignGroupSequential(typeOfDesign = "OF", kMax = 3)
# Pocock
d2 <- getDesignGroupSequential(typeOfDesign = "P", kMax = 3)
designSet <- getDesignSet(designs = c(d1, d2), variedParameters = "typeOfDesign")
plot(designSet, type = 1)
## futilityBounds
defining futility boundaries
# Example: non-binding futility boundary at each interim in case
# estimated treatment effect is null or goes in "the wrong direction"
# See Futility boundary (z-value scale)
design <- getDesignGroupSequential(sided = 1, alpha = 0.025,
informationRates = c(0.33, 0.67, 1), typeOfDesign = "asOF",
futilityBounds = c(0,0), bindingFutility = FALSE)
summary(design)
Sequential analysis with a maximum of 3 looks (group sequential design)
O’Brien & Fleming type alpha spending design, non-binding futility, one-sided overall significance level 2.5%, power 80%, undefined endpoint, inflation factor 1.0605, ASN H1 0.8628, ASN H01 0.8689, ASN H0 0.6589.
Stage | 1 | 2 | 3 |
---|---|---|---|
Planned information rate | 33% | 67% | 100% |
Cumulative alpha spent | <0.0001 | 0.0062 | 0.0250 |
Stage levels (one-sided) | <0.0001 | 0.0061 | 0.0231 |
Efficacy boundary (z-value scale) | 3.731 | 2.504 | 1.994 |
Futility boundary (z-value scale) | 0 | 0 | |
Cumulative power | 0.0191 | 0.4430 | 0.8000 |
Futility probabilities under H1 | 0.049 | 0.003 |
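The first futility probability in this output can be reproduced approximately by hand. The sketch below assumes the usual drift parametrization: under H1 the expected z-statistic at full information is \((z_{1-\alpha}+z_{1-\beta})\) times the square root of the inflation factor, and at information rate t it is that value times \(\sqrt{t}\). The inflation factor 1.0605 and the information rate 0.33 are copied from the summary above; the variable names are hypothetical.

```r
alpha <- 0.025        # one-sided significance level
beta <- 0.20          # 1 - power
inflation <- 1.0605   # inflation factor from the summary above
t1 <- 0.33            # information rate at the first interim

# Expected z-statistic at full information under H1 (assumed drift form)
drift <- (qnorm(1 - alpha) + qnorm(1 - beta)) * sqrt(inflation)

# Probability that the first interim z-statistic falls below the futility bound 0
p_futility_stage1 <- pnorm(0, mean = drift * sqrt(t1))
p_futility_stage1
```

The value is close to the 0.049 reported in the table, so under H1 roughly one trial in twenty would hit the futility bound at the first look.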
Figure: Sample Sizes for Different Types of Endpoints without IA
## without interim analyses, with equal sample sizes in the 2 arms
## Assumption: Targeted mean difference is >0 under the alternative hypothesis
# Example of a standard trial:
# - targeted mean difference is 10 (alternative = 10)
# - standard deviation in both arms is assumed to be 24 (stDev = 24)
# - two-sided test (sided = 2), Type I error 0.05 (alpha = 0.05) and power 80%
# - (beta = 0.2)
sampleSizeResult <- getSampleSizeMeans(alternative = 10, stDev = 24, sided = 2,
alpha = 0.05, beta = 0.2)
# sampleSizeResult
summary(sampleSizeResult)
Sample size calculation for a continuous endpoint
Fixed sample analysis, two-sided significance level 5%, power 80%. The results were calculated for a two-sample t-test, H0: mu(1) - mu(2) = 0, H1: effect = 10, standard deviation = 24.
Stage | Fixed |
---|---|
Stage level (two-sided) | 0.0500 |
Efficacy boundary (z-value scale) | 1.960 |
Lower efficacy boundary (t) | -7.006 |
Upper efficacy boundary (t) | 7.006 |
Number of subjects | 182.8 |
## Unequal randomization between the treatment groups
# - 2(intervention):1(control) randomization (allocationRatioPlanned = 2)
summary(getSampleSizeMeans(alternative = 10, stDev = 24,
allocationRatioPlanned = 2, sided = 2, alpha = 0.05, beta = 0.2))
Sample size calculation for a continuous endpoint
Fixed sample analysis, two-sided significance level 5%, power 80%. The results were calculated for a two-sample t-test, H0: mu(1) - mu(2) = 0, H1: effect = 10, standard deviation = 24, planned allocation ratio = 2.
Stage | Fixed |
---|---|
Stage level (two-sided) | 0.0500 |
Efficacy boundary (z-value scale) | 1.960 |
Lower efficacy boundary (t) | -7.004 |
Upper efficacy boundary (t) | 7.004 |
Number of subjects | 205.4 |
## Calculate power for the 2:1 randomized trial with total sample size 206
## (as above) assuming a larger difference of 12
powerResult <- getPowerMeans(alternative = 12, stDev = 24, sided = 2,
allocationRatioPlanned = 2, maxNumberOfSubjects = 206, alpha = 0.05)
summary(powerResult)
Power calculation for a continuous endpoint
Fixed sample analysis, two-sided significance level 5%. The results were calculated for a two-sample t-test, H0: mu(1) - mu(2) = 0, H1: effect = 12, standard deviation = 24, number of subjects = 206, planned allocation ratio = 2.
Stage | Fixed |
---|---|
Stage level (two-sided) | 0.0500 |
Efficacy boundary (z-value scale) | 1.960 |
Lower efficacy boundary (t) | -6.994 |
Upper efficacy boundary (t) | 6.994 |
Power | 0.9203 |
Number of subjects | 206.0 |
## Plot the overall power
# Example: Calculate power for design with sample size 206 as above
# alternative values ranging from 5 to 15
powerResult <- getPowerMeans(alternative = 5:15, stDev = 24, sided = 2,
    allocationRatioPlanned = 2, maxNumberOfSubjects = 206, alpha = 0.05)
plot(powerResult, type = 7) # one of several possible plots
## Two groups continuous endpoint (non-inferiority)
# - One-sided alpha = 0.025, 1:1 randomization
# - H0: treatment difference <= -12 (i.e., = -12 for calculations, thetaH0 = -12)
# vs. alternative H1: treatment difference = 0 (alternative = 0)
# - standard deviation 14 (stDev = 14), power 80% (beta = 0.2)
sampleSizeNoninf <- getSampleSizeMeans(thetaH0 = -12, alternative = 0,
    stDev = 14, alpha = 0.025, beta = 0.2, sided = 1)
sampleSizeNoninf
Design plan parameters and output for means
Design parameters
User defined parameters
Default parameters
Sample size and output
Legend
# - probability 25% in control (pi2 = 0.25) vs 40% (pi1 = 0.4) in intervention
# - one-sided test (sided = 1)
# - Type I error 0.025 (alpha = 0.025) and power 80% (beta = 0.2)
sampleSizeResult <- getSampleSizeRates(pi2 = 0.25, pi1 = 0.4,
sided = 1, alpha = 0.025, beta = 0.2)
summary(sampleSizeResult)
Sample size calculation for a binary endpoint
Fixed sample analysis, one-sided significance level 2.5%, power 80%. The results were calculated for a two-sample test for rates (normal approximation), H0: pi(1) - pi(2) = 0, H1: pi(1) = 0.4, control rate pi(2) = 0.25.
Stage | Fixed |
---|---|
Stage level (one-sided) | 0.0250 |
Efficacy boundary (z-value scale) | 1.960 |
Efficacy boundary (t) | 0.103 |
Number of subjects | 303.7 |
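The 303.7 subjects reported above can be reproduced with a standard normal-approximation formula for comparing two rates, using the pooled rate under H0 and the separate rates under H1. Whether rpact uses exactly this expression internally is an assumption; the sketch only shows that the formula leads to the same number for these inputs. Variable names are hypothetical.

```r
pi1 <- 0.40     # event rate in the intervention arm
pi2 <- 0.25     # event rate in the control arm
alpha <- 0.025  # one-sided significance level
beta <- 0.20    # 1 - power

# Pooled rate under H0 (1:1 allocation)
p_bar <- (pi1 + pi2) / 2

# Per-arm sample size: pooled variance under H0, unpooled variance under H1
n_per_arm <- (qnorm(1 - alpha) * sqrt(2 * p_bar * (1 - p_bar)) +
              qnorm(1 - beta) * sqrt(pi1 * (1 - pi1) + pi2 * (1 - pi2)))^2 /
  (pi1 - pi2)^2

n_total <- 2 * n_per_arm
round(n_total, 1)
```

In practice the total would be rounded up to an even number of subjects, which is why the power example that follows uses 304.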
# Example: Calculate power for a simple trial with total sample size 304
# (as in the example above) in case of pi2 = 0.25 (control) and
# pi1 = 0.37 (intervention), here with 2:1 allocation (allocationRatioPlanned = 2)
powerResult <- getPowerRates(pi2 = 0.25, pi1 = 0.37, allocationRatioPlanned = 2,
maxNumberOfSubjects = 304, sided = 1,alpha = 0.025)
summary(powerResult)
Power calculation for a binary endpoint
Fixed sample analysis, one-sided significance level 2.5%. The results were calculated for a two-sample test for rates (normal approximation), H0: pi(1) - pi(2) = 0, power directed towards larger values, H1: pi(1) = 0.37, control rate pi(2) = 0.25, number of subjects = 304, planned allocation ratio = 2.
Stage | Fixed |
---|---|
Stage level (one-sided) | 0.0250 |
Efficacy boundary (z-value scale) | 1.960 |
Efficacy boundary (t) | 0.112 |
Power | 0.5571 |
Number of subjects | 304.0 |
# Example: Calculate power for simple design (with sample size 304 as above)
# for probabilities in intervention ranging from 0.3 to 0.5
powerResult <- getPowerRates(pi2 = 0.25,pi1 = seq(0.3,0.5,by = 0.01),
maxNumberOfSubjects = 304,sided = 1,alpha = 0.025)
# one of several possible plots, this one plotting true effect size vs power
plot(powerResult,type = 7)
## for a single arm trial without interim analyses
# Example: Sample size for a single arm trial which tests
# H0: pi = 0.1 vs. H1: pi = 0.25
# (use conservative exact binomial calculation)
samplesSizeResults <- getSampleSizeRates(groups = 1, thetaH0 = 0.1, pi1 = 0.25,
normalApproximation = FALSE, sided = 1, alpha = 0.025, beta = 0.2)
summary(samplesSizeResults)
Sample size calculation for a binary endpoint
Fixed sample analysis, one-sided significance level 2.5%, power 80%. The results were calculated for a one-sample test for rates (exact test, conservative solution), H0: pi = 0.1, H1: pi = 0.25.
Stage | Fixed |
---|---|
Stage level (one-sided) | 0.0250 |
Efficacy boundary (z-value scale) | 1.960 |
Efficacy boundary (t) | 0.181 |
Number of subjects | 53.0 |
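Sample size calculation for a group-sequential trial is performed in two steps: first, define the group-sequential design with getDesignGroupSequential(); second, pass that design to a sample size function such as getSampleSizeMeans() or getSampleSizeRates() together with the endpoint assumptions.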
# Example: Group-sequential design with O'Brien & Fleming type alpha-spending
# and one interim at 60% information
design <- getDesignGroupSequential(sided = 1, alpha = 0.025, beta = 0.2,
informationRates = c(0.6,1), typeOfDesign = "asOF")
# Trial assumes an effect size of 10 as above, a stDev = 24, and an allocation
# ratio of 2
sampleSizeResultGS <- getSampleSizeMeans(
design, alternative = 10, stDev = 24, allocationRatioPlanned = 2)
# Standard rpact output (sample size object only, not design object)
summary(sampleSizeResultGS)
Sample size calculation for a continuous endpoint
Sequential analysis with a maximum of 2 looks (group sequential design), one-sided overall significance level 2.5%, power 80%. The results were calculated for a two-sample t-test, H0: mu(1) - mu(2) = 0, H1: effect = 10, standard deviation = 24, planned allocation ratio = 2.
Stage | 1 | 2 |
---|---|---|
Planned information rate | 60% | 100% |
Cumulative alpha spent | 0.0038 | 0.0250 |
Stage levels (one-sided) | 0.0038 | 0.0238 |
Efficacy boundary (z-value scale) | 2.669 | 1.981 |
Efficacy boundary (t) | 12.393 | 7.050 |
Cumulative power | 0.3123 | 0.8000 |
Number of subjects | 124.3 | 207.1 |
Expected number of subjects under H1 | 181.3 | |
Exit probability for efficacy (under H0) | 0.0038 | |
Exit probability for efficacy (under H1) | 0.3123 |
# Example: Group-sequential design with O'Brien & Fleming type alpha-spending and
# one interim at 60% information
design <- getDesignGroupSequential(
    sided = 1, alpha = 0.025, beta = 0.2,
    informationRates = c(0.6, 1), typeOfDesign = "asOF"
)
# Sample size calculation assuming event probabilities are 25% in control
# (pi2 = 0.25) vs 40% (pi1 = 0.4) in intervention
sampleSizeResultGS <- getSampleSizeRates(design, pi2 = 0.25, pi1 = 0.4)
# Standard rpact output (sample size object only, not design object)
sampleSizeResultGS
[Printed output not reproduced: Design plan parameters and output for rates, with the sections Design parameters, User defined parameters, Default parameters, Sample size and output, and Legend.]
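As a rough plausibility check for the rates example, the fixed-design sample size can be approximated by hand. The sketch below uses the textbook normal-approximation formula for comparing two proportions with 1:1 allocation. It is not claimed to be the exact computation rpact performs, and the helper name nPerGroupRates is made up for this illustration.

```r
# Normal-approximation sample size per group for a two-sample comparison of
# rates (1:1 allocation, one-sided level alpha, power 1 - beta).
# nPerGroupRates is a hypothetical helper name, not an rpact function.
nPerGroupRates <- function(pi1, pi2, alpha = 0.025, beta = 0.2) {
  piBar <- (pi1 + pi2) / 2
  zAlpha <- qnorm(1 - alpha)
  zBeta <- qnorm(1 - beta)
  # Variance pooled under H0 for the alpha term, unpooled under H1 for the beta term
  numerator <- (zAlpha * sqrt(2 * piBar * (1 - piBar)) +
    zBeta * sqrt(pi1 * (1 - pi1) + pi2 * (1 - pi2)))^2
  numerator / (pi1 - pi2)^2
}

# Same assumptions as the example: 25% in control vs 40% in intervention
nFixedPerGroup <- nPerGroupRates(pi1 = 0.4, pi2 = 0.25)
print(nFixedPerGroup)
print(2 * nFixedPerGroup)
```

This gives roughly 152 subjects per group for a design without interim analysis. The maximum sample size of the group-sequential design will be somewhat larger, because the interim look at 60% information costs a small amount of power that has to be bought back.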
Time-to-event (survival) designs are used in trials where the endpoint is the time until a specific event occurs (such as death or disease progression). The complexity of these designs comes from several factors:
Patient Follow-Up: A decision is needed on whether to follow patients until an event occurs or only for a fixed period; this choice affects the ability to detect treatment effects and the overall study duration.
Non-Proportional Hazards: This occurs when the hazard ratio between treatment groups changes over time, for example when the treatment effect is delayed. Handling non-proportional hazards requires more sophisticated statistical techniques to ensure accurate interpretation of the treatment effect over the study period.
Complex Accrual, Survival, Dropout: Managing varying rates of patient accrual, different survival rates, and dropout rates can complicate the analysis and interpretation of the trial data.
Stratified Analyses: These are used to control for factors that might affect the outcome, allowing for more precise estimates of the treatment effect within subgroups of patients.
The relevant rpact functions for survival are:
getPowerSurvival(): The analogue of getSampleSizeSurvival() for calculating power rather than sample size.
getEventProbabilities(): Calculates the probability of an event depending on the time and type of accrual, follow-up time, and survival distribution. This is useful for aligning interim analyses for different time-to-event endpoints.
getSimulationSurvival(): Simulates group-sequential trials. For example, it can be used to assess the power of trials with delayed treatment effects, or the data-dependent variability of the timing of interim analyses even when the protocol assumptions are perfectly fulfilled. It can also simulate hypothetical datasets for trials stopped early.
Exponential survival distributions
Event probabilities at a fixed time point can be specified, for example eventTime = 24, pi2 = 0.3, pi1 = 0.2.
Weibull survival distributions
The additional shape parameter kappa needs to be provided; kappa = 1 corresponds to the exponential distribution.
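For the exponential case, the different ways of stating the survival assumptions are interchangeable. The sketch below shows the standard conversions, assuming constant hazards, so that S(t) = exp(-lambda * t). The helper name hazardFromEventProb is made up for this illustration.

```r
# Convert an event probability at a fixed time point into a constant hazard,
# assuming exponential survival: 1 - eventProb = exp(-lambda * eventTime).
# hazardFromEventProb is a hypothetical helper name, not an rpact function.
hazardFromEventProb <- function(eventProb, eventTime) {
  -log(1 - eventProb) / eventTime
}

eventTime <- 24
lambda2 <- hazardFromEventProb(0.3, eventTime) # control
lambda1 <- hazardFromEventProb(0.2, eventTime) # intervention
hazardRatio <- lambda1 / lambda2

# A median survival time maps to a hazard via lambda = log(2) / median
lambdaFromMedian <- log(2) / 60

print(c(lambda1 = lambda1, lambda2 = lambda2, hazardRatio = hazardRatio))
print(lambdaFromMedian)
```

Under these assumptions, event probabilities of 30% and 20% at time 24 correspond to a hazard ratio of about 0.63, and a median of 60 time units corresponds to a hazard of about 0.012.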
Exponential survival, flexible accrual intensity, no interim analyses
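# Two-sided 5% test with 80% power and no interim analyses
sampleSize1 <- getSampleSizeSurvival(
    sided = 2, alpha = 0.05, beta = 0.2,
    lambda2 = log(2) / 60, # control hazard for a median of 60 time units
    hazardRatio = 0.74,
    dropoutRate1 = 0.025, dropoutRate2 = 0.025, dropoutTime = 12,
    # Accrual ramps up over the first six intervals; the end of the last
    # interval is derived from maxNumberOfSubjects
    accrualTime = c(0, 1, 2, 3, 4, 5, 6),
    accrualIntensity = c(6, 12, 18, 24, 30, 36, 42),
    maxNumberOfSubjects = 1200
)
summary(sampleSize1)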
Sample size calculation for a survival endpoint
Fixed sample analysis, two-sided significance level 5%, power 80%. The results were calculated for a two-sample logrank test, H0: hazard ratio = 1, H1: hazard ratio = 0.74, control lambda(2) = 0.012, number of subjects = 1200, accrual time = c(1, 2, 3, 4, 5, 6, 31.571), accrual intensity = c(6, 12, 18, 24, 30, 36, 42), dropout rate(1) = 0.025, dropout rate(2) = 0.025, dropout time = 12.
Stage | Fixed |
---|---|
Stage level (two-sided) | 0.0500 |
Efficacy boundary (z-value scale) | 1.960 |
Lower efficacy boundary (t) | 0.810 |
Upper efficacy boundary (t) | 1.234 |
Number of subjects | 1200.0 |
Number of events | 346.3 |
Analysis time | 53.11 |
Expected study duration under H1 | 53.11 |
Legend: (t): treatment effect scale
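The number of events in the output above can be reproduced with Schoenfeld's formula, which depends only on the error rates, the allocation ratio, and the hazard ratio. This is a minimal sketch; the helper name schoenfeldEvents is made up for this illustration.

```r
# Schoenfeld's approximation for the required number of events in a
# two-sample logrank test.
# schoenfeldEvents is a hypothetical helper name, not an rpact function.
schoenfeldEvents <- function(hazardRatio, alpha = 0.05, beta = 0.2,
                             sided = 2, allocationRatio = 1) {
  zAlpha <- qnorm(1 - alpha / sided)
  zBeta <- qnorm(1 - beta)
  (1 + allocationRatio)^2 / allocationRatio *
    (zAlpha + zBeta)^2 / log(hazardRatio)^2
}

events <- schoenfeldEvents(hazardRatio = 0.74)
print(round(events, 1))
```

With a hazard ratio of 0.74, a two-sided 5% level, and 80% power, this gives about 346.3 events, matching the summary. Accrual, dropout, and the control hazard do not change the required number of events; they determine how many subjects and how much calendar time are needed to observe them.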
Adaptive designs provide flexibility to modify trial parameters based on interim data without undermining the validity and integrity of the trial. These adjustments include:
Treatment Arms - MAMS GSD (Multi-Arm Multi-Stage Group Sequential Design): This approach allows several treatment arms to be tested simultaneously. Based on interim results, ineffective treatments can be dropped (“drop the loser”), and promising ones can be continued (“pick the winner”).
Sample Size - BSSR/USSR (Blinded Sample Size Reestimation/Unblinded Sample Size Reestimation): The sample size is adjusted based on interim data to maintain statistical power or the precision of estimates. BSSR uses pooled interim data without knowledge of treatment assignment, whereas USSR uses the unblinded interim data by treatment group.
Patient Subgroups: Adaptations can also include focusing on specific patient subgroups that show more significant benefits or lesser side effects, which can be identified as the trial progresses.
Bayesian Approach: Uses prior distributions and updates probabilities with accumulating data, allowing for continuous learning and adaptation within the trial. This approach is particularly useful in adaptive GSDs for handling uncertainty and incorporating real-time data effectively.
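To make the Bayesian updating step concrete, the sketch below uses a conjugate beta-binomial model for a response rate. The uniform prior, the interim data (5 responders out of 20), and the threshold of 0.1 are all hypothetical values chosen for illustration; they are not taken from any trial or design discussed above.

```r
# Beta-binomial updating for a response rate (all inputs are hypothetical).
priorA <- 1 # Beta(1, 1) prior, i.e. uniform on [0, 1]
priorB <- 1
responders <- 5 # hypothetical interim data
nInterim <- 20

# Conjugate update: add responders to the first parameter and
# non-responders to the second
postA <- priorA + responders
postB <- priorB + (nInterim - responders)

# Posterior probability that the response rate exceeds a threshold of interest
threshold <- 0.1
priorProb <- 1 - pbeta(threshold, priorA, priorB)
postProb <- 1 - pbeta(threshold, postA, postB)

print(c(prior = priorProb, posterior = postProb))
```

In a Bayesian group-sequential design, a quantity like postProb would be compared with a prespecified cutoff at each interim look. The cutoffs are typically calibrated by simulation so that the design has acceptable frequentist operating characteristics.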
Rufibach, K. (n.d.). Why do we do interim analyses in clinical trials? Retrieved from https://www.linkedin.com/pulse/why-do-we-interim-analyses-clinical-trials-kaspar-rufibach/?trackingId=4m6sV4oaRFeyrfPyQw%2Bdaw%3D%3D
Haybittle, J. L. (1971). Repeated assessment of results in clinical trials of cancer treatment. The British Journal of Radiology, 44, 793-797.
Peto, R., Pike, M. C., Armitage, P., Breslow, N. E., Cox, D. R., Howard, S. V., Mantel, N., McPherson, K., Peto, J., & Smith, P. G. (1976). Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I. Introduction and design. British Journal of Cancer, 34(6), 585-612.
O’Brien, P. C., & Fleming, T. R. (1979). A multiple testing procedure for clinical trials. Biometrics, 35, 549-556.
Pocock, S. J. (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika, 64(2), 191-199.
Whitehead, J., & Stratton, I. (1983). Group Sequential clinical trials with triangular continuation regions. Biometrics, 39, 227-236.
Whitehead, J. (2001). Use of the Triangular Test in Sequential Clinical Trials. In Handbook of Statistics in Clinical Oncology (pp. 211-228). New York: Marcel Dekker.
Lan, K. K. G., & DeMets, D. L. (1983). Discrete Sequential Boundaries for Clinical Trials. Biometrika, 70(3), 659-663.
Kim, K., & DeMets, D. L. (1987). Design and Analysis of Group Sequential Tests Based on the Type I Error Spending Rate Function. Biometrika, 74, 149-154.
Hwang, I. K., Shih, W. J., & DeCani, J. S. (1990). Group sequential designs using a family of type I error probability spending functions. Statistics in Medicine, 9, 1439-1445.
Lan, K. K. G., Rosenberger, W. F., & Lachin, J. M. (1993). Use of Spending Functions for Occasional or Continuous Monitoring of Data in Clinical Trials. Statistics in Medicine, 12(23), 2219-2231.
DeMets, D. L., & Lan, K. K. G. (1994). Interim analysis: the alpha spending function approach. Statistics in Medicine, 13(13-14), 1341-1352.
Wang, S. K., & Tsiatis, A. A. (1987). Approximately optimal one-parameter boundaries for group sequential trials. Biometrics, 43, 193-199.
Pampallona, S., Tsiatis, A. A., & Kim, K. (2001). Interim monitoring of group sequential trials using spending functions for the type I and type II error probabilities. Drug Information Journal, 35, 1113-1121.
Emerson, S. S., & Fleming, T. R. (1989). Symmetric Group Sequential Designs. Biometrics, 45, 905-923.
Kittelson, J. M., & Emerson, S. S. (1999). A Unifying Family of Group Sequential Test Designs. Biometrics, 55, 874-882.
Jennison, C., & Turnbull, B. W. (1999). Group sequential methods with applications to clinical trials. CRC Press.
Gsponer, T., Gerber, F., Bornkamp, B., Ohlssen, D., Vandemeulebroecke, M., & Schmidli, H. (2014). A Practical Guide to Bayesian Group Sequential Designs. Pharmaceutical Statistics, 13(1), 71-80.
Berry, S. M., Carlin, B. P., Lee, J. J., & Muller, P. (2010). Bayesian adaptive methods for clinical trials. CRC Press.
ICH E20 Concept Paper. Retrieved from https://database.ich.org/sites/default/files/E20_FinalConceptPaper_2019_1107_0.pdf
Bauer, P., Bretz, F., Dragalin, V., König, F., & Wassmer, G. (2016). Twenty‐five years of confirmatory adaptive designs: opportunities and pitfalls. Statistics in Medicine, 35(3), 325-347.
Kelly, P. J., Sooriyarachchi, M. R., Stallard, N., & Todd, S. (2005). A practical comparison of group-sequential and adaptive designs. Journal of Biopharmaceutical Statistics, 15(4), 719-738.
Cui, L., Hung, H. J., & Wang, S. J. (1999). Modification of sample size in group sequential clinical trials. Biometrics, 55(3), 853-857.
Mehta, C. R., & Tsiatis, A. A. (2001). Flexible Sample Size Considerations under Information Based Interim Monitoring. Drug Information Journal, 35, 1095-1112.
Chen, Y. J., DeMets, D. L., & Lan, K. K. (2004). Increasing the sample size when the unblinded interim result is promising. Statistics in Medicine, 23(7), 1023-1038.
Freidlin, B., & Korn, E. L. (2017). Sample size adjustment designs with time-to-event outcomes: a caution. Clinical Trials, 14(6), 597-604.
Muller, H-H., & Schafer, H. (2001). Adaptive group sequential designs for clinical trials: Combining the advantages of adaptive and of classical group sequential approaches. Biometrics, 57, 886-891.
Wassmer, G. (2006). Planning and analyzing adaptive group sequential survival trials. Biometrical Journal: Journal of Mathematical Methods in Biosciences, 48(4), 714-729.
Zelen, M. (1969). Play the winner rule and the controlled clinical trial. Journal of the American Statistical Association, 64(325), 131-146.
Ghosh, P., Liu, L., Senchaudhuri, P., Gao, P., & Mehta, C. (2017). Design and monitoring of multi‐arm multi‐stage clinical trials. Biometrics, 73(4), 1289-1299.
Ghosh, P., Liu, L., & Mehta, C. (2020). Adaptive multiarm multistage clinical trials. Statistics in Medicine, 39(8), 1084-1102.