1 Introduction

Interim analysis, performed according to a predefined analysis plan before a study is complete, is a crucial process in clinical research. It is integral to Group Sequential Trials (GST), where the findings from interim analyses can determine whether a trial should be terminated early. A design can incorporate one or several interim analyses, and each additional analysis adds an opportunity to stop the trial early. Adding looks increases the maximum sample size, which must be inflated to preserve the error rates, but it reduces the average (expected) sample size of the trial.

1.1 Why do We Perform Interim Analyses

In the realm of clinical trials, interim analyses are critical components that allow for the potential early termination of trials based on efficacy or futility findings. This detailed summary explores the rationale and implications of these analyses, particularly within group-sequential trials in drug development.

Purpose of Group-Sequential Trials in Drug Development

Group-sequential trials are designed to allow for early trial termination to save time and resources. For instance, if a trial with a time-to-event endpoint is powered for a hypothetical hazard ratio (HR) of 0.75, a traditional single-stage design might require 380 events. Incorporating interim analyses for futility and efficacy changes this number. A futility analysis might be scheduled when 30% of the events have occurred (stopping if the observed HR exceeds 1), and an efficacy analysis at about 66% of the events, using an O’Brien-Fleming alpha-spending approach. This could increase the maximum number of events to 408 but offers the possibility of stopping earlier if the criteria are met. Because some trials will stop at these interim points, the number of events needed on average is reduced.
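The 380-event figure can be reproduced with Schoenfeld's approximation for the log-rank test, \(d = 4\left(z_{\alpha/2}+z_{\beta}\right)^{2} / (\log \mathrm{HR})^{2}\). The sketch below assumes a two-sided 5% significance level, 80% power and 1:1 randomization. None of these are stated in the text, so they are illustrative assumptions only, and `required_events` is a hypothetical helper name.

```r
# Required number of events for a fixed (single-stage) log-rank test,
# using Schoenfeld's approximation with 1:1 allocation.
# alpha, power and allocation are assumptions, not taken from the text.
required_events <- function(hr, alpha = 0.05, power = 0.80) {
  z_alpha <- qnorm(1 - alpha / 2)
  z_beta  <- qnorm(power)
  4 * (z_alpha + z_beta)^2 / log(hr)^2
}

events_fixed <- ceiling(required_events(hr = 0.75))
events_fixed

# Implied inflation of the maximum number of events in the sequential design
inflation_events <- 408 / events_fixed
inflation_events
```

Under these assumptions the group-sequential maximum of 408 events corresponds to an inflation of roughly 7% over the single-stage design.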

Relationship Between Powered Effect and Stopping Criteria

The effect size that a trial is powered at, such as an HR of 0.75, directly relates to the thresholds used at interim stops for efficacy. For a trial to stop early for efficacy, the observed effect must be strong enough to cross the predefined efficacy boundary, which is typically not drastically higher than the powered effect. This ensures that stopping early does not necessarily mean expecting a much larger effect than initially powered for.

Ethical Considerations: Is Early Stopping “Cheating”?

Early stopping is sometimes viewed skeptically, as if it were a way to “cheat” the trial process. However, group-sequential methodologies are designed to maintain the overall type I error rate (familywise-error rate, FWER), adjusting the significance thresholds for each interim look. This methodological rigor ensures that the integrity of the trial’s statistical conclusions is preserved, not compromised.
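To see why adjusted thresholds are needed, a small simulation can show what happens without them. The sketch below assumes five equally spaced looks and a naive rule that declares significance whenever \(|Z_k| \geq 1.96\). Both choices are illustrative and not taken from the text.

```r
set.seed(1)
n_sim <- 20000   # simulated trials under H0
K     <- 5       # equally spaced looks (illustrative)

# Cumulative z-statistics: Z_k = (sum of k standard normal increments) / sqrt(k)
increments <- matrix(rnorm(n_sim * K), nrow = n_sim, ncol = K)
z_stats    <- t(apply(increments, 1, cumsum)) /
  matrix(sqrt(1:K), nrow = n_sim, ncol = K, byrow = TRUE)

# Naive rule: reject the first time |Z_k| >= 1.96, with no adjustment for repeated looks
naive_reject <- apply(abs(z_stats) >= qnorm(0.975), 1, any)
naive_type1  <- mean(naive_reject)
naive_type1
```

The simulated rejection rate under the null hypothesis is well above the nominal 5%. Group-sequential boundaries remove exactly this inflation.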

Bias in Early Stopping

Concerns about bias when stopping trials early are valid, especially regarding the accuracy of the estimated treatment effect: a trial that stops early for efficacy tends to overestimate the effect. Methods have been developed to adjust for early stopping and to provide bias-adjusted estimates and confidence intervals for the treatment effect. For example, a trial might stop for efficacy if an interim analysis shows an HR below the efficacy threshold, with the adjusted analysis then used to confirm the robustness of this finding.

Operational Implications of Early Stopping

From an operational standpoint, stopping a trial early for efficacy doesn’t mean all activities cease immediately. Data continues to be collected on primary and secondary outcomes to enrich the findings and ensure the durability of the treatment effect over time. Moreover, certain data collection efforts that are crucial for regulatory filing might continue, while others may be halted. The trial’s unblinding at this point necessitates careful management and analysis of the continuing data collection.

1.2 Key Components of Interim Analysis

1. Blinding: To prevent bias, it’s essential that all personnel involved in the trial, except those conducting the interim analysis, remain blinded to the unmasked data and results. Researchers are typically informed only about whether to continue the trial or make amendments to the study protocol.

2. Rigorous Pre-Design: A poorly designed interim analysis can compromise the reliability of the trial results. Thus, it’s vital to avoid unplanned interim analyses unless absolutely necessary. If an unplanned analysis is conducted, the study report should detail the reason for the analysis, the necessity of unblinding, the potential bias introduced, and its impact on the interpretation of results.

3. Frequency and Timing of Analyses:
- Frequency: For instance, a trial might plan for three interim analyses.
- Timing: These analyses can be evenly spaced (e.g., after 25%, 50%, and 75% of data collection) or at irregular intervals, depending on various factors.

4. Error Control:
- Overall Error Rate: Typically set at a significance level of 0.05.
- Stopping Boundaries: These are critical for deciding if a trial should be halted early.
  + Efficacy Boundary: Uses an alpha-spending function to control the Type I error across looks, which also affects the maximum sample size. Crossing this boundary means the null hypothesis (H0) is rejected.
  + Futility Boundary: Uses a beta-spending function to control the Type II error across looks, which also affects the maximum sample size. Crossing this boundary means the trial stops without rejecting H0, i.e., H0 is accepted and no treatment benefit is claimed.

5. Adaptive Design: Depending on the results of the interim analysis, modifications may be made to the trial’s design. This could involve adjusting the sample size or altering the trial’s objective (e.g., shifting from a superiority trial to a non-inferiority trial).

1.3 Fully Sequential vs. Group Sequential Designs

Group sequential designs (GSDs) are a sophisticated methodology in clinical trials that allow for interim analyses of the data at pre-specified points during the study. These designs can provide the opportunity to stop the trial early for efficacy, futility, or in some cases, safety concerns. This flexibility makes group sequential designs very appealing for both ethical and economic reasons.

Fully Sequential Designs: These designs allow for continuous monitoring of data as they become available. Each incoming piece of data can potentially trigger an interim analysis, which might lead to early termination of the trial. This method is very flexible but requires rigorous control of the Type I error rate due to the multiple analyses conducted.
Group Sequential Designs: Unlike fully sequential designs, GSDs involve analyses at specified “looks” or points within the collection of data. These points are usually predetermined and occur after a cohort of subjects has been assessed. The data are evaluated in groups rather than continuously, which can simplify the logistics and statistical handling of interim analyses.

Key Features of Group Sequential Designs

Early Stopping: The trial may be terminated early for efficacy if the treatment effect is stronger than expected, or for futility if preliminary results suggest that continuing the trial is unlikely to show a treatment benefit.
Adjustments at Interim Analyses: Adjustments may be made to the trial parameters based on interim findings, such as sample size re-estimation or protocol modifications.
Multiple Opportunities for Success: These designs allow multiple looks at the data, thereby increasing the chances of stopping early if the treatment proves effective or avoiding unnecessary exposure to ineffective or harmful treatments.

1.4 Types of Group Sequential Design

1. Error Spending
- Definition: This method involves ‘spending’ alpha (Type I error, related to falsely declaring efficacy) and/or beta (Type II error, related to falsely declaring futility) across multiple interim looks.
- Application: Highly flexible in both the design and analysis stages of a clinical trial, allowing adjustments based on accumulating data.

2. Haybittle-Peto
- Definition: Sets boundaries for efficacy in terms of unadjusted p-values. This method adjusts the final p-value to retain the correct Type I error rate.
- Application: Typically involves a more conservative approach to stopping for efficacy, often using a stringent p-value threshold like 0.001 at interim analyses.

3. Wang-Tsiatis
- Definition: Includes designs that focus solely on efficacy or both efficacy and futility. The Pampallona-Tsiatis Parameter Method aims to optimize the trial by minimizing the average sample size.
- Application: Provides a structured approach to balance between the number of interim looks and the conservation of statistical power.

4. “Classic” Designs
- Definition: Includes well-known methods like O’Brien-Fleming and Pocock designs, which use fixed statistical boundaries for interim analyses.
- Application: These designs are less flexible during monitoring but are well-established for controlling Type I error across multiple looks. They are not to be confused with O’Brien-Fleming and Pocock error spending functions, which are more about how the error probability is allocated across the interim analyses.

5. Unified Family
- Definition: A unified method that encompasses the Wang-Tsiatis and “Classic” designs using two parameters to define the stopping boundaries.
- Application: Offers less flexibility than error spending approaches but provides a structured framework that can incorporate different statistical considerations.

6. Whitehead
- Definition: An extension of the fully sequential probability ratio test (SPRT) adapted to group sequential settings. This method is known for its “Triangular” or “Christmas Tree” design patterns.
- Application: Useful in scenarios where the traditional SPRT needs adaptation to handle grouped sequential data, providing a balance between early stopping for efficacy and overall trial integrity.

7. Others
- Definition: Includes custom designs for futility such as Conditional Power, Adaptive GSD (which may include sample size re-estimation and other adaptive features), and Multi-Arm Multi-Stage (MAMS) GSD.
- Application: These methods allow for a high degree of customization and can adapt to the specific needs of a trial, offering flexibility in how interim data are used to make decisions about the trial’s continuation.

1.5 GSD Simulation

Group Sequential Design (GSD) simulation is an essential tool in the design and analysis of clinical trials, allowing researchers to assess the properties of complex GSDs before actual trial data are collected.

GSD simulation uses hypothetical (simulated) study data to predict various characteristics of a group sequential design. It is particularly valuable for:

Understanding Complex GSDs: Simulations help clarify how GSDs will perform under various scenarios, which can be complex and not intuitive, especially with adaptations and interim analyses.
Verification of Design: By simulating trial outcomes based on the original design specifications, researchers can verify if the design meets the intended statistical properties (e.g., maintaining Type I error rates).
Exploration of Alternative Scenarios: Simulations allow researchers to test different design modifications, such as changes in the timing of interim analyses or the effect of different sample sizes and information fractions.

Key Benefits of Simulation in GSD

Type I and II Error Verification: Ensures that the trial is correctly powered to detect a treatment effect and maintains the correct Type I error rate across multiple interim analyses.
Expected Sample Size and Stopping Time: Helps predict the likely sample size needed and the potential for early stopping, which are crucial for budgeting and logistical planning.
Boundary and Information Fraction Calculations: Assists in setting appropriate boundaries for interim analyses and deciding how much information to review at each interim look.
Evaluation of Alternative Hypotheses: Simulations can test how robust the trial design is against various clinical scenarios, such as weaker or stronger than expected treatment effects.
Flexibility in Design Adjustments: Allows for the exploration of substantial modifications to the trial without the risk and expense of making these changes in a real-world setting.

Regulatory Perspective

Regulatory bodies often view simulation positively as it provides a rigorous framework for evaluating and justifying trial designs. Simulated data can demonstrate to regulators that a trial is likely to meet its objectives and that the statistical methods employed are appropriate and robust.

Steps in Designing a Simulated GSD

Step 1: Generating Model Assumptions - Define the clinical context and parameters for the simulated data, such as the expected rates of events, variability, and the effect size of the intervention.

Step 2: Design Assumptions - Establish the structure of the trial, including the number of arms, enrollment criteria, and planned duration.

Step 3: Boundary/Information Assumptions - Set the criteria for interim analyses, including statistical boundaries for efficacy and futility, and decide how much data (information fraction) will be considered at each analysis point.

Step 4: Simulation Control Assumptions - Determine the number of simulations, the randomization seed, and other technical parameters that control the simulation process. This step ensures that the simulation is both robust and reproducible.
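The four steps can be sketched in base R as follows. The information rates, efficacy boundaries and the "Shift" value are copied from the rpact output quoted in Section 5.2. The Brownian-motion representation of the z-statistics, the seed, the number of simulations and the helper name `simulate_gsd` are assumptions made for this illustration.

```r
# Step 1: generating-model assumptions (drift of the z-statistic at full information)
drift_h0 <- 0
drift_h1 <- sqrt(7.951)          # square root of the "Shift" reported by rpact

# Steps 2 and 3: design and boundary/information assumptions
info_frac <- c(0.33, 0.67, 1)
eff_bound <- c(3.731, 2.504, 1.994)

# Step 4: simulation control
set.seed(123)
n_sim <- 100000

simulate_gsd <- function(drift, info_frac, eff_bound, n_sim) {
  K  <- length(info_frac)
  dt <- diff(c(0, info_frac))
  # Independent Brownian increments N(drift * dt, dt), one column per stage
  inc <- matrix(rnorm(n_sim * K,
                      mean = rep(drift * dt, each = n_sim),
                      sd   = rep(sqrt(dt), each = n_sim)),
                nrow = n_sim, ncol = K)
  b <- t(apply(inc, 1, cumsum))              # B(t_k)
  z <- sweep(b, 2, sqrt(info_frac), "/")     # Z_k = B(t_k) / sqrt(t_k)
  crossed <- sweep(z, 2, eff_bound, ">=")
  # Stage at which each trial stops: first crossing, or K if none
  stop_stage <- apply(crossed, 1, function(r) if (any(r)) which(r)[1] else K)
  list(reject_prob   = mean(apply(crossed, 1, any)),
       expected_info = mean(info_frac[stop_stage]))
}

h0 <- simulate_gsd(drift_h0, info_frac, eff_bound, n_sim)
h1 <- simulate_gsd(drift_h1, info_frac, eff_bound, n_sim)
rbind(H0 = unlist(h0), H1 = unlist(h1))
```

The rejection probability under the null hypothesis estimates the Type I error and the one under the alternative estimates the power. The expected information fraction, multiplied by the inflation factor, gives the expected sample size relative to a fixed design.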

2 Classical Designs without futility stopping

The design of many clinical trials includes some strategy for early stopping if an interim analysis reveals large differences between treatment groups, or shows obvious futility such that there is no chance that continuing to the end would show a clinically meaningful effect. In addition to saving time and resources, such a design feature can reduce study participants’ exposure to an inferior or useless treatment. However, when repeated significance testing on accumulating data is done, some adjustment of the usual hypothesis testing procedure must be made to maintain an overall significance level. The methods described by Pocock and O’Brien & Fleming, among others, are popular implementations of group sequential testing for clinical trials.

2.1 Pocock Method

The Pocock boundary is a method for determining whether to stop a clinical trial prematurely. The typical clinical trial compares two groups of patients. One group are given a placebo or conventional treatment, while the other group of patients are given the treatment that is being tested. The investigators running the clinical trial will wish to stop the trial early for ethical reasons if the treatment group clearly shows evidence of benefit. In other words, “when early results proved so promising it was no longer fair to keep patients on the older drugs for comparison, without giving them the opportunity to change.”

The Pocock boundary is simple to use in that the p-value threshold is the same at each interim analysis. The disadvantages are that the number of interim analyses must be fixed at the start and it is not possible under this scheme to add analyses after the trial has started. Another disadvantage is that investigators and readers frequently do not understand how the p-values are reported: for example, if there are five interim analyses planned, but the trial is stopped after the third interim analysis because the p-value was 0.01, then the overall p-value for the trial is still reported as <0.05 and not as 0.01.

With the Pocock design, early rejections are more likely than with the O’Brien & Fleming design, because the same boundary applies at every look.

2.2 O’Brien & Fleming Method

Pocock (1977) first proposed that the crossing boundary be constant for all equally spaced analyses. O’Brien and Fleming \((1979)\) suggested that the crossing boundaries for the kth analysis, \(z_{c}(k)\), be changed over the total number of analyses \(\mathrm{K}\) such that

\[ z_{c}(k)=z_{\mathrm{OBF}} \sqrt{K / k} \]

The O’Brien-Fleming boundaries have been used more frequently because they preserve a nominal significance level at the final analysis that is close to that of a single test procedure.

With the O’Brien & Fleming design the early rejection bounds are large and early rejections are rather unlikely, at least under \(H_{0}\). With the O’Brien & Fleming design the last rejection bound is close to the unadjusted bound \(z_{\alpha / 2}\). This is not the case for Pocock’s design. In practice, the O’Brien & Fleming design is generally preferred, since early rejection without overwhelming evidence may not be convincing.

2.3 Wang & Tsiatis Method

Wang & Tsiatis (1987) suggest the rejection boundaries

\[ u_{k}=\mathrm{c}_{\mathrm{WT}}(K, \alpha, \beta, \Delta) k^{\Delta-0.5} \]

For \(\Delta=0\) we obtain the O’Brien & Fleming boundaries.
For \(\Delta=0.5\) we obtain Pocock’s boundaries.
For \(0<\Delta<0.5\) we obtain a compromise between the Pocock and O’Brien & Fleming boundaries.
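The effect of \(\Delta\) can be inspected with rpact, where the Wang & Tsiatis family is selected with `typeOfDesign = "WT"` and \(\Delta\) is passed as `deltaWT`. The sketch below assumes three equally spaced looks and a two-sided 5% level, chosen only for illustration.

```r
library(rpact)

# Critical values (z-scale) for three values of Delta; each column is one design
wt_bounds <- sapply(c(0, 0.25, 0.5), function(delta) {
  getDesignGroupSequential(kMax = 3, alpha = 0.05, sided = 2,
                           typeOfDesign = "WT", deltaWT = delta)$criticalValues
})
colnames(wt_bounds) <- c("Delta = 0", "Delta = 0.25", "Delta = 0.5")
round(wt_bounds, 3)

# Reference designs for the two limiting cases
of_bounds <- getDesignGroupSequential(kMax = 3, alpha = 0.05, sided = 2,
                                      typeOfDesign = "OF")$criticalValues
p_bounds  <- getDesignGroupSequential(kMax = 3, alpha = 0.05, sided = 2,
                                      typeOfDesign = "P")$criticalValues
```

The first and last columns should coincide with the O’Brien & Fleming and Pocock critical values, while the middle column lies between them at the first look.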

2.4 Rejection bounds and Local p-values

Rejection bounds

Rejection regions with rejection bounds \(u_{k}, k=1, \ldots, K\):
\[ \begin{array}{c} \mathcal{R}_{k}^{*}=\left\{\left|Z_{k}^{*}\right| \geq u_{k}\right\}=\left(-\infty,-u_{k}\right] \cup\left[u_{k}, \infty\right) \\ \mathcal{A}_{k}^{*}=\emptyset, \quad \mathcal{C}_{k}^{*}=\left(-u_{k}, u_{k}\right) \text { for } k<K, \text { and } \mathcal{A}_{K}^{*}=\left(-u_{K}, u_{K}\right) \end{array} \]
Design of Pocock (1977): Constant rejection bounds, i.e. \[u_{1}=\cdots=u_{K}=c_{\mathrm{P}} \quad \text{with } c_{\mathrm{P}}=c_{\mathrm{P}}(K, \alpha)\] such that \[\mathbf{P}_{\mu_{0}}\left(\bigcup_{k=1}^{K}\left\{\left|Z_{k}^{*}\right| \geq c_{\mathrm{P}}\right\}\right)=\alpha\]
Design of O’Brien & Fleming (1979): Decreasing rejection bounds \(u_{k}=c_{\mathrm{OBF}} / \sqrt{k}\) with \(c_{\mathrm{OBF}}=c_{\mathrm{OBF}}(K, \alpha)\) (the boundary \(z_{\mathrm{OBF}} \sqrt{K / k}\) of Section 2.2 with \(c_{\mathrm{OBF}}=z_{\mathrm{OBF}} \sqrt{K}\)) such that
\[\mathbf{P}_{\mu_{0}}\left(\bigcup_{k=1}^{K}\left\{\left|Z_{k}^{*}\right| \geq c_{\mathrm{OBF}} / \sqrt{k}\right\}\right)=\alpha\]

Local p-values

\[ \begin{array}{c} p_{k}^{*}:=2\left(1-\Phi\left(\left|Z_{k}^{*}\right|\right)\right), \quad k=1, \ldots, K \\ \text { Namely: } \quad\left|Z_{k}^{*}\right| \geq u_{k} \quad \Longleftrightarrow \quad p_{k}^{*} \leq \alpha_{k}:=2\left(1-\Phi\left(u_{k}\right)\right) \end{array} \]

Constant local levels in the Pocock design: \[\alpha_{1}=\cdots=\alpha_{K}=2\left(1-\Phi\left(c_{\mathrm{P}}\right)\right)\]
Decreasing local levels of the O’Brien & Fleming design: \[\alpha_{k}=2\left(1-\Phi\left(c_{\mathrm{OBF}} / \sqrt{k}\right)\right), \quad k=1, \ldots, K\]
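The constants \(c_{\mathrm{P}}\) and \(c_{\mathrm{OBF}}\) have no closed form, but they can be found numerically. The sketch below is one possible base R approach, assuming \(K=5\) equally sized stages and a two-sided \(\alpha=0.05\): it propagates the sub-density of the cumulative score \(S_k=\sqrt{k} Z_k^{*}\) over the continuation regions with a trapezoid rule and solves for the constant with `uniroot`. The helper names `crossing_prob` and `solve_constant` and the grid size are assumptions of this illustration, not part of any package.

```r
# P(reject at any of K equally sized stages) under H0 for two-sided bounds u_1..u_K
crossing_prob <- function(u, n_grid = 801) {
  K <- length(u)
  b <- u * sqrt(seq_len(K))            # bounds for the score S_k = sqrt(k) * Z_k
  total <- 2 * pnorm(-b[1])            # rejection at stage 1
  if (K == 1) return(total)
  x <- seq(-b[1], b[1], length.out = n_grid)
  f <- dnorm(x)                        # sub-density of S_1 on the continuation region
  for (k in 2:K) {
    h <- x[2] - x[1]
    w <- c(h / 2, rep(h, n_grid - 2), h / 2)   # trapezoid weights
    # S_k = S_{k-1} + N(0, 1): probability of leaving (-b_k, b_k) given S_{k-1} = x
    p_exit <- pnorm(-b[k] - x) + pnorm(x - b[k])
    total <- total + sum(w * f * p_exit)
    if (k < K) {
      x_new <- seq(-b[k], b[k], length.out = n_grid)
      f <- as.vector(dnorm(outer(x_new, x, "-")) %*% (w * f))
      x <- x_new
    }
  }
  total
}

# Solve crossing_prob(const * shape) = alpha for the design constant
solve_constant <- function(shape, alpha) {
  uniroot(function(const) crossing_prob(const * shape) - alpha,
          interval = c(1, 6), tol = 1e-8)$root
}

K <- 5
alpha <- 0.05
c_p   <- solve_constant(rep(1, K), alpha)              # Pocock: u_k = c_P
c_obf <- solve_constant(1 / sqrt(seq_len(K)), alpha)   # OBF: u_k = c_OBF / sqrt(k)
u_obf <- c_obf / sqrt(seq_len(K))

# Local (nominal) two-sided levels
alpha_local_p   <- 2 * (1 - pnorm(c_p))
alpha_local_obf <- 2 * (1 - pnorm(u_obf))
round(c(c_p = c_p, u_obf = u_obf), 3)
round(c(pocock = alpha_local_p, obf = alpha_local_obf), 4)
```

For comparison, published tables give a Pocock constant of about 2.41 and a final O’Brien & Fleming bound of about 2.04 for this setting, so the last O’Brien & Fleming local level stays close to the unadjusted 0.05 while the Pocock level is much smaller at every look.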

2.5 Power and Sample size

As argued before, the maximum sample size \(N_{K}\) required for power \(1-\beta\) is a multiple of the fixed size \[ \begin{array}{l} n_{f}=\left(z_{\alpha / 2}+z_{\beta}\right)^{2} / \delta_{1}^{2} \\ N_{K}=n_{f} \cdot I(K, \alpha, \beta) \end{array} \] The inflation factor \(I(K, \alpha, \beta)\) depends on the type of design, e.g. whether Pocock or O’Brien & Fleming.

Average Sample Size

The required stage-wise sample size is \[n=N_{K} / K=n_{f} \cdot I(K, \alpha, \beta) / K\]
Consequently, the average sample number (ASN) under a true effect \(\mu\) satisfies \[\frac{\operatorname{ASN}(\mu)}{n_{f}}=\frac{I(K, \alpha, \beta)}{K} \underbrace{\left(1+\sum_{k=2}^{K} \mathbf{P}_{\mu}\left(Z_{1}^{*} \in \mathcal{C}_{1}^{*}, \ldots, Z_{k-1}^{*} \in \mathcal{C}_{k-1}^{*}\right)\right)}_{\text {average number of stages }}\]
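As a worked check, the numbers reported by `getDesignCharacteristics()` in Section 5.2 can be recombined by hand. The sketch below copies the inflation factor, information rates and rejection probabilities from that output. It uses the general form of the ASN (expected information fraction at stopping times the inflation factor) because the stages in that example are not exactly equal, and it sets \(\delta_1 = 1\), which is an assumption about how rpact standardizes the fixed sample size.

```r
alpha <- 0.025   # one-sided level of the rpact example (same critical value as z_{alpha/2} at 0.05)
beta  <- 0.2

# Fixed-design sample size for a standardized effect delta_1 = 1
n_fixed <- (qnorm(1 - alpha) + qnorm(1 - beta))^2

inflation <- 1.013                       # inflation factor reported by rpact
n_max     <- n_fixed * inflation         # maximum information of the sequential design

info_frac    <- c(0.33, 0.67, 1)
stop_prob_h1 <- c(0.01739, 0.40532)      # rejection probabilities at the interim looks under H1
stop_prob_h1 <- c(stop_prob_h1, 1 - sum(stop_prob_h1))   # remaining trials reach the final look

# Expected sample size relative to the fixed design, under H1
asn_ratio_h1 <- inflation * sum(info_frac * stop_prob_h1)
round(c(n_fixed = n_fixed, n_max = n_max, asn_ratio_h1 = asn_ratio_h1), 4)
```

The three results correspond to the "Number of subjects fixed", "Shift" and "Ratio expected vs fixed sample size under H1" lines of the rpact output.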

3 Classical Designs with binding futility stopping

3.1 Symmetric designs

Pampallona & Tsiatis (1994) suggest using a GSD with a symmetric interim acceptance region: \[ \mathcal{A}_{k}^{*}=\left[-u_{k}^{0}, u_{k}^{0}\right], \quad \mathcal{C}_{k}^{*}=\left(-u_{k}^{1},-u_{k}^{0}\right) \cup\left(u_{k}^{0}, u_{k}^{1}\right), \quad k=1, \ldots, K \] for specific \(0<u_{k}^{0}<u_{k}^{1}\) \((k<K)\) and \(0<u_{K}^{0}=u_{K}^{1}\). This means accepting \(H_{0}\) at stage \(k\) if \(\left|Z_{k}^{*}\right| \leq u_{k}^{0}\), and rejecting \(H_{0}\) at stage \(k\) if \(\left|Z_{k}^{*}\right| \geq u_{k}^{1}\).

The method can be implemented by using the acceptance bounds \[ u_{k}^{0}:=\max (\underbrace{\left[\sqrt{k / K}\left(c^{0}+c^{1}\right)-c^{0}\right] K^{\Delta-0.5}}_{\text {the original } u_{k}^{0}}, 0) \]

3.2 One-Sided Designs

Without early acceptance we would now choose: \[ \begin{array}{c} \mathcal{C}_{k}^{*}=\left(-\infty, u_{k}\right), \quad \mathcal{A}_{k}^{*}=\emptyset, \quad k=1, \ldots, K-1 \\ \mathcal{R}_{k}^{*}=\left[u_{k}, \infty\right), \quad k=1, \ldots, K, \quad \mathcal{A}_{K}^{*}=\left(-\infty, u_{K}\right) \end{array} \]
Type I error rate control: \[ \mathbf{P}_{\mu_{0}}\left(\bigcup_{k=1}^{K}\left\{Z_{k}^{*} \geq u_{k}\right\}\right)=\alpha \]
One-sided GSD with early acceptance: DeMets & Ware \((1980, 1982)\) suggest using a constant futility boundary \(u^{L}<u_{k}\) for all \(k \leq K-1\). With this futility boundary the decision regions become \[ \begin{array}{c} \mathcal{C}_{k}^{*}=\left(u^{L}, u_{k}\right), \quad \mathcal{A}_{k}^{*}=\left(-\infty, u^{L}\right], \quad k=1, \ldots, K-1 \\ \mathcal{R}_{k}^{*}=\left[u_{k}, \infty\right), \quad k=1, \ldots, K, \quad \mathcal{A}_{K}^{*}=\left(-\infty, u_{K}\right) \end{array} \]

4 Alpha Spending Function Approach

GSD with unequally sized stages

Assume that we have planned the GSD with equally spaced stages, but the stage-wise sample sizes \(n_k\) are unequal. We can calculate all boundaries (e.g. Pocock or O’Brien & Fleming) also with unequal stages.

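If the only goal were to control false positives, the interim looks could be treated as multiple comparisons and corrected with a standard multiple-testing procedure. However, the test statistics from accumulating data are strongly correlated, so such a correction is overly conservative and inflates the false-negative rate. The idea of the alpha spending function is to allocate the false-positive error across the looks according to a fixed scheme: each look spends part of the false-positive budget, and the amounts add up to exactly the pre-specified level. When a clinical study makes its overall decision in several stages (for example, interim analyses for efficacy or futility), each stage spends some \(\alpha\), and the cumulative type I error rate spent is a function of the proportion of the study completed (e.g., 1/3, 1/2, 60%). The amount spent at each look is \[ \alpha_{1}^{*}=\alpha\left(t_{1}^{*}\right)-\alpha(0),\] \[\alpha_{2}^{*}=\alpha\left(t_{2}^{*}\right)-\alpha\left(t_{1}^{*}\right), \ldots,\] \[\alpha_{n}^{*}=\alpha(1)-\alpha\left(t_{n-1}^{*}\right)\] By definition, \(\alpha_{1}^{*}=\alpha\left(t_{1}^{*}\right)\) and \(\sum \alpha^{*}=\alpha\).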

It is usually difficult or even impossible to achieve the pre-planned sample sizes exactly. One reason is that a date for the meeting of the data monitoring committee (DMC) has to be fixed in advance. To solve this issue, Lan & DeMets suggest fixing the maximum sample size \(N\) and a spending function at level \(\alpha\), \(\alpha^{*}(t), \quad t \in[0,1]\), that is strictly increasing with \(\alpha^{*}(0)=0\) and \(\alpha^{*}(1)=\alpha\).

In the 1st interim analysis (IA), calculate \(t_{1}=n_{1} / N\) and \(u_{1}\) such that \(\mathbf{P}\left(\left|Z_{1}^{*}\right| \geq u_{1}\right)=\alpha^{*}\left(t_{1}\right)\).
In the 2nd IA, calculate \(t_{2}=\left(n_{1}+n_{2}\right) / N\) and \(u_{2}\) such that \(\alpha^{*}\left(t_{1}\right)+\mathbf{P}\left(\left|Z_{1}^{*}\right|<u_{1},\left|Z_{2}^{*}\right| \geq u_{2}\right)=\alpha^{*}\left(t_{2}\right)\).
In the \(k^{\text {th }}\) IA, calculate \(t_{k}=\left(n_{1}+\cdots+n_{k}\right) / N\) and \(u_{k}\) such that

\[ \alpha^{*}\left(t_{k-1}\right)+\mathbf{P}\left(\left|Z_{1}^{*}\right|<u_{1}, \ldots,\left|Z_{k-1}^{*}\right|<u_{k-1},\left|Z_{k}^{*}\right| \geq u_{k}\right)=\alpha^{*}\left(t_{k}\right) \] This means using the cumulative level \(\alpha^{*}\left(t_{k}\right)\) up to stage \(k\).
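The first two steps of this recursion can be carried out in base R. The sketch below is a one-sided version (rejecting only for large \(Z_k^{*}\)) with \(\alpha = 0.025\), an O’Brien & Fleming type spending function and observed information fractions 0.33 and 0.67, chosen to match the rpact example in Section 5.2. The one-sided setting and the use of `integrate` over the conditional distribution of \(Z_2^{*}\) given \(Z_1^{*}\) are assumptions of this illustration.

```r
alpha <- 0.025
t     <- c(0.33, 0.67)    # information fractions observed at the first two looks

# O'Brien & Fleming type spending function, one-sided version
spend_obf <- function(t, alpha) 2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))

# Look 1: P(Z_1 >= u_1) = alpha*(t_1)
u1 <- qnorm(spend_obf(t[1], alpha), lower.tail = FALSE)

# Look 2: P(Z_1 < u_1, Z_2 >= u_2) = alpha*(t_2) - alpha*(t_1),
# using Z_2 | Z_1 = z ~ N(rho * z, 1 - rho^2) with rho = sqrt(t_1 / t_2)
rho <- sqrt(t[1] / t[2])
prob_cross_2 <- function(u2) {
  integrand <- function(z) {
    dnorm(z) * pnorm((u2 - rho * z) / sqrt(1 - rho^2), lower.tail = FALSE)
  }
  # Lower limit -8 stands in for -Inf; the integrand is negligible below it
  integrate(integrand, lower = -8, upper = u1, rel.tol = 1e-8)$value
}
increment <- spend_obf(t[2], alpha) - spend_obf(t[1], alpha)
u2 <- uniroot(function(u) prob_cross_2(u) - increment,
              interval = c(1, 5), tol = 1e-8)$root

round(c(u1 = u1, u2 = u2), 3)
```

The two bounds can be compared with the first two entries of the "Efficacy boundary (z-value scale)" row in the rpact summary for `typeOfDesign = "asOF"`.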

Examples for spending functions

To mimic Pocock’s design (Lan & DeMets, 1983): \(\alpha_{1}^{*}(t)=\alpha \log (1+(e-1) t), \quad t \in[0,1]\)
To mimic the O’Brien & Fleming design (Lan & DeMets, 1983): \(\alpha_{2}^{*}(t)=\left\{\begin{array}{ll}2\left\{1-\Phi\left[\Phi^{-1}(1-\alpha / 2) / \sqrt{t}\right]\right\} & \text { one-sided } \\ 4\left\{1-\Phi\left[\Phi^{-1}(1-\alpha / 4) / \sqrt{t}\right]\right\} & \text { two-sided }\end{array}\right.\)
Spending function family of Kim & DeMets (1987) \(\alpha_{3}^{*}(\rho, t)=\alpha t^{\rho}, \quad t \in[0,1], \quad\) for some \(\rho>0\)
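To compare how quickly these functions spend \(\alpha\), they can be tabulated in a few lines of base R. The sketch below assumes a one-sided \(\alpha = 0.025\), four equally spaced looks and \(\rho = 2\) for the Kim & DeMets family. All three choices are illustrative.

```r
alpha <- 0.025                       # one-sided overall level (illustrative)
t     <- c(0.25, 0.5, 0.75, 1)       # information fractions at the looks

spend_pocock <- function(t, alpha) alpha * log(1 + (exp(1) - 1) * t)
spend_obf    <- function(t, alpha) 2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(t)))
spend_kd     <- function(t, alpha, rho) alpha * t^rho

# Cumulative alpha spent up to each look
cumulative <- cbind(pocock     = spend_pocock(t, alpha),
                    obf        = spend_obf(t, alpha),
                    kim_demets = spend_kd(t, alpha, rho = 2))
rownames(cumulative) <- paste0("t = ", t)

# Alpha spent at each individual look (increments of the spending function)
incremental <- apply(cumulative, 2, function(a) diff(c(0, a)))
rownames(incremental) <- rownames(cumulative)

round(cumulative, 5)
round(incremental, 5)
```

The Pocock-type function spends a large share of \(\alpha\) early, the O’Brien & Fleming type spends almost nothing at the first look, and the Kim & DeMets family moves between these behaviours as \(\rho\) changes.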

5 R Implementation using rpact

R Package for Adaptive Clinical Trials

Shiny Apps

5.1 Basic Functions

Design Functions

getDesignGroupSequential()
getDesignInverseNormal()
getDesignFisher()
getDesignCharacteristics()

Sample Size Calculation Functions

getSampleSizeMeans()
getSampleSizeRates()
getSampleSizeSurvival()

Power Calculation Functions

getPowerMeans()
getPowerRates()
getPowerSurvival()

Simulation Functions

getSimulationMeans()
getSimulationRates()
getSimulationSurvival()
getSimulationMultiArmMeans()
getSimulationMultiArmRates()
getSimulationMultiArmSurvival()

Dataset and Analysis Results Functions

getDataset()
getAnalysisResults()
getStageResults()

5.2 `getDesignGroupSequential` defining efficacy boundaries

typeOfDesign = c("OF", "P", "WT", "HP", "WToptimum", 
                "asP", "asOF", "asKD", "asHSD", "asUser")

library(rpact)
# Standard O’Brien & Fleming boundary
design <- getDesignGroupSequential(sided = 1, alpha = 0.025, 
    informationRates = c(0.33, 0.67, 1), typeOfDesign = "OF")

# User-defined α-spending functions
# User-defined α-spending functions (typeOfDesign = "asUser") can be obtained via the argument userAlphaSpending, which must contain a numeric vector with elements 0 < α1 < … < αkMax = α that define the values of the cumulative alpha-spending function at each interim analysis.
# Example: User-defined alpha-spending function which is very conservative at first interim (spend alpha = 0.001), conservative at second (spend an additional alpha = 0.01, i.e., total cumulative alpha spent is 0.011 up to second interim), and spends the remaining alpha at the final analysis (i.e., cumulative alpha = 0.025)
design <- getDesignGroupSequential(sided = 1, alpha = 0.025, 
    informationRates = c(0.33, 0.67, 1),
    typeOfDesign = "asUser",
    userAlphaSpending = c(0.001, 0.01 + 0.001, 0.025))


# O’Brien & Fleming type α-spending
design <- getDesignGroupSequential(sided = 1, alpha = 0.025, 
                                   informationRates = c(0.33, 0.67, 1), 
                                   typeOfDesign = "asOF")


# plot the design with default type 1 (Boundary Plot)
plot(design)

## plot.TrialDesign
## 1: creates a 'Boundaries' plot
## 3: creates a 'Stage Levels' plot
## 4: creates a 'Error Spending' plot
## 5: creates a 'Power and Early Stopping' plot
## 6: creates an 'Average Sample Size and Power / Early Stop' plot
## 7: creates a 'Power' plot
## 8: creates an 'Early Stopping' plot
## 9: creates an 'Average Sample Size' plot
## "all": creates all available plots and returns it as a grid plot or list

# summary() creates a nice presentation of the design:
summary(design)

Sequential analysis with a maximum of 3 looks (group sequential design)

O’Brien & Fleming type alpha spending design, one-sided overall significance level 2.5%, power 80%, undefined endpoint, inflation factor 1.013, ASN H1 0.8657, ASN H01 0.9827, ASN H0 1.0109.

Stage	1	2	3
Planned information rate	33%	67%	100%
Cumulative alpha spent	<0.0001	0.0062	0.0250
Stage levels (one-sided)	<0.0001	0.0061	0.0231
Efficacy boundary (z-value scale)	3.731	2.504	1.994
Cumulative power	0.0174	0.4227	0.8000

# display the design characteristics
# Stopping probabilities and expected sample size reduction
getDesignCharacteristics(design)

Group sequential design characteristics

Number of subjects fixed: 7.8489
Shift: 7.9510
Inflation factor: 1.0130
Informations: 2.624, 5.327, 7.951
Power: 0.01739, 0.42271, 0.80000
Rejection probabilities under H1: 0.01739, 0.40532, 0.37729
Futility probabilities under H1: 0, 0
Ratio expected vs fixed sample size under H1: 0.8657
Ratio expected vs fixed sample size under a value between H0 and H1: 0.9827
Ratio expected vs fixed sample size under H0: 1.0109

## Comparison of multiple designs
# O'Brien & Fleming, 3 equally spaced stages
d1 <- getDesignGroupSequential(typeOfDesign = "OF", kMax = 3) 
# Pocock
d2 <- getDesignGroupSequential(typeOfDesign = "P", kMax = 3)
designSet <- getDesignSet(designs = c(d1, d2), variedParameters = "typeOfDesign")
plot(designSet, type = 1)

## futilityBounds defining futility boundaries

A futility bound of z=0 corresponds to an estimated treatment effect of zero or “null”, i.e., in this case futility stopping is recommended if the treatment effect estimate at the interim analysis is zero or “goes in the wrong direction”.
Futility bounds of z=−∞ (which are numerically equivalent to z=−6) correspond to no futility stopping at an interim.

# Example: non-binding futility boundary at each interim in case  
# estimated treatment effect is null or goes in "the wrong direction"
# See Futility boundary (z-value scale) 
design <- getDesignGroupSequential(sided = 1, alpha = 0.025, 
    informationRates = c(0.33, 0.67, 1), typeOfDesign = "asOF",
    futilityBounds = c(0,0), bindingFutility = FALSE)
summary(design)

Sequential analysis with a maximum of 3 looks (group sequential design)

O’Brien & Fleming type alpha spending design, non-binding futility, one-sided overall significance level 2.5%, power 80%, undefined endpoint, inflation factor 1.0605, ASN H1 0.8628, ASN H01 0.8689, ASN H0 0.6589.

Stage	1	2	3
Planned information rate	33%	67%	100%
Cumulative alpha spent	<0.0001	0.0062	0.0250
Stage levels (one-sided)	<0.0001	0.0061	0.0231
Efficacy boundary (z-value scale)	3.731	2.504	1.994
Futility boundary (z-value scale)	0	0
Cumulative power	0.0191	0.4430	0.8000
Futility probabilities under H1	0.049	0.003

6 Sample Size

6.1 Sample Sizes for Different Types of Endpoints without IA

Figure: Sample Sizes for Different Types of Endpoints without IA

6.2 Two groups continuous endpoint (without IA)

## Without interim analyses, with equal sample sizes in the 2 arms
## Assumption: Targeted mean difference is >0 under the alternative hypothesis
# Example of a standard trial:
# - targeted mean difference is 10 (alternative = 10)
# - standard deviation in both arms is assumed to be 24 (stDev = 24)
# - two-sided test (sided = 2), Type I error 0.05 (alpha = 0.05) and power 80% (beta = 0.2)
sampleSizeResult <- getSampleSizeMeans(alternative = 10, stDev = 24, sided = 2, 
         alpha = 0.05, beta = 0.2)  
# sampleSizeResult
summary(sampleSizeResult)

Sample size calculation for a continuous endpoint

Fixed sample analysis, two-sided significance level 5%, power 80%. The results were calculated for a two-sample t-test, H0: mu(1) - mu(2) = 0, H1: effect = 10, standard deviation = 24.

Stage	Fixed
Stage level (two-sided)	0.0500
Efficacy boundary (z-value scale)	1.960
Lower efficacy boundary (t)	-7.006
Upper efficacy boundary (t)	7.006
Number of subjects	182.8

Legend:

(t): treatment effect scale
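The fixed-design result can be cross-checked with base R. The sketch below uses `power.t.test` from the stats package with the same inputs (difference 10, standard deviation 24, two-sided 5% level, 80% power). It assumes that rpact's two-sample t-test calculation and `power.t.test` are close enough to agree to about one decimal place.

```r
# Per-group sample size for a two-sample t-test with the same assumptions
check <- power.t.test(delta = 10, sd = 24, sig.level = 0.05, power = 0.80,
                      type = "two.sample", alternative = "two.sided")

n_total <- 2 * check$n    # total over both arms, comparable to "Number of subjects"
n_total

# Normal-approximation formula for comparison (slightly smaller than the t-test result)
n_total_normal <- 4 * (qnorm(0.975) + qnorm(0.80))^2 * 24^2 / 10^2
n_total_normal
```

In practice the total would be rounded up to an even number so that both arms have the same integer size.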

## Unequal randomization between the treatment groups
# - 2(intervention):1(control) randomization (allocationRatioPlanned = 2) 
summary(getSampleSizeMeans(alternative = 10, stDev = 24, 
    allocationRatioPlanned = 2, sided = 2, alpha = 0.05, beta = 0.2))

Sample size calculation for a continuous endpoint

Fixed sample analysis, two-sided significance level 5%, power 80%. The results were calculated for a two-sample t-test, H0: mu(1) - mu(2) = 0, H1: effect = 10, standard deviation = 24, planned allocation ratio = 2.

Stage	Fixed
Stage level (two-sided)	0.0500
Efficacy boundary (z-value scale)	1.960
Lower efficacy boundary (t)	-7.004
Upper efficacy boundary (t)	7.004
Number of subjects	205.4

Legend:

(t): treatment effect scale

## Calculate power for the 2:1 randomized trial with total sample size 206 
## (as above) assuming a larger difference of 12
powerResult <- getPowerMeans(alternative = 12, stDev = 24, sided = 2, 
    allocationRatioPlanned = 2, maxNumberOfSubjects = 206, alpha = 0.05)
summary(powerResult)

Power calculation for a continuous endpoint

Fixed sample analysis, two-sided significance level 5%. The results were calculated for a two-sample t-test, H0: mu(1) - mu(2) = 0, H1: effect = 12, standard deviation = 24, number of subjects = 206, planned allocation ratio = 2.

Stage	Fixed
Stage level (two-sided)	0.0500
Efficacy boundary (z-value scale)	1.960
Lower efficacy boundary (t)	-6.994
Upper efficacy boundary (t)	6.994
Power	0.9203
Number of subjects	206.0

Legend:

(t): treatment effect scale

## Plot the overall power
# Example: Calculate power for design with sample size 206 as above
# alternative values ranging from 5 to 15
powerResult <- getPowerMeans(alternative = 5:15, stDev = 24, sided = 2, 
    allocationRatioPlanned = 2, maxNumberOfSubjects = 206, alpha = 0.05)
plot(powerResult,type = 7) # one of several possible plots

## Two groups continuous endpoint (non-inferiority)
# - One-sided alpha = 0.025, 1:1 randomization
# - H0: treatment difference <= -12 (i.e., = -12 for calculations, thetaH0 = -12) 
#    vs. alternative H1: treatment difference = 0 (alternative = 0)
sampleSizeNoninf <- getSampleSizeMeans(thetaH0 = -12,alternative = 0,
    stDev = 14,alpha = 0.025,beta = 0.2,sided = 1) 
sampleSizeNoninf

Design plan parameters and output for means

Design parameters

Critical values: 1.960
Significance level: 0.0250
Type II error rate: 0.2000
Test: one-sided

User defined parameters

Theta H0: -12
Alternatives: 0
Standard deviation: 14

Default parameters

Mean ratio: FALSE
Normal approximation: FALSE
Treatment groups: 2
Planned allocation ratio: 1

Sample size and output

Number of subjects fixed: 44.7
Number of subjects fixed (1): 22.4
Number of subjects fixed (2): 22.4
Critical values (treatment effect scale): -3.556

Legend

(i): values of treatment arm i

6.3 Two groups binary endpoint (without IA)

# - probability 25% in control (pi2 = 0.25) vs 40% (pi1 = 0.4) in intervention
# - one-sided test (sided = 1)
# - Type I error 0.025 (alpha = 0.025) and power 80% (beta = 0.2)
sampleSizeResult <- getSampleSizeRates(pi2 = 0.25, pi1 = 0.4,
    sided = 1, alpha = 0.025, beta = 0.2) 
summary(sampleSizeResult)

Sample size calculation for a binary endpoint

Fixed sample analysis, one-sided significance level 2.5%, power 80%. The results were calculated for a two-sample test for rates (normal approximation), H0: pi(1) - pi(2) = 0, H1: pi(1) = 0.4, control rate pi(2) = 0.25.

Stage	Fixed
Stage level (one-sided)	0.0250
Efficacy boundary (z-value scale)	1.960
Efficacy boundary (t)	0.103
Number of subjects	303.7

Legend:

(t): treatment effect scale
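
The figure of 303.7 subjects can be reproduced by hand with a common normal-approximation formula for comparing two rates, which uses the pooled variance under H0 and the unpooled variance under H1. The sketch below is base R; the helper name `n_total_rates` is hypothetical, and whether rpact uses exactly this formula internally is an assumption, although it matches the printed result for this example.

```r
# Normal-approximation total sample size (1:1 allocation) for two rates.
# Hypothetical helper for illustration; not an rpact function.
n_total_rates <- function(pi1, pi2, alpha = 0.025, beta = 0.2) {
  p_bar <- (pi1 + pi2) / 2  # pooled rate under H0
  numerator <- qnorm(1 - alpha) * sqrt(2 * p_bar * (1 - p_bar)) +
    qnorm(1 - beta) * sqrt(pi1 * (1 - pi1) + pi2 * (1 - pi2))
  n_per_group <- (numerator / (pi1 - pi2))^2
  2 * n_per_group
}

n_rates <- n_total_rates(pi1 = 0.4, pi2 = 0.25)
round(n_rates, 1)
```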

# Example: Calculate power for a simple trial with total sample size 304 
# (as in the example above, but here with 2:1 allocation, 
# allocationRatioPlanned = 2) in case of pi2 = 0.25 (control) and 
# pi1 = 0.37 (intervention)
powerResult <- getPowerRates(pi2 = 0.25, pi1 = 0.37, allocationRatioPlanned = 2,
    maxNumberOfSubjects = 304, sided = 1,alpha = 0.025) 
summary(powerResult)

Power calculation for a binary endpoint

Fixed sample analysis, one-sided significance level 2.5%. The results were calculated for a two-sample test for rates (normal approximation), H0: pi(1) - pi(2) = 0, power directed towards larger values, H1: pi(1) = 0.37, control rate pi(2) = 0.25, number of subjects = 304, planned allocation ratio = 2.

Stage	Fixed
Stage level (one-sided)	0.0250
Efficacy boundary (z-value scale)	1.960
Efficacy boundary (t)	0.112
Power	0.5571
Number of subjects	304.0

Legend:

(t): treatment effect scale

# Example: Calculate power for simple design (with sample size 304 as above)
# for probabilities in intervention ranging from 0.3 to 0.5
powerResult <- getPowerRates(pi2 = 0.25,pi1 = seq(0.3,0.5,by = 0.01),
    maxNumberOfSubjects = 304,sided = 1,alpha = 0.025) 
# one of several possible plots, this one plotting true effect size vs power
plot(powerResult,type = 7)

## for a single arm trial without interim analyses
# Example: Sample size for a single arm trial which tests
# H0: pi = 0.1 vs. H1: pi = 0.25
# (use conservative exact binomial calculation)
samplesSizeResults <- getSampleSizeRates(groups = 1, thetaH0 = 0.1, pi1 = 0.25, 
    normalApproximation = FALSE, sided = 1, alpha = 0.025, beta = 0.2)
summary(samplesSizeResults)

Sample size calculation for a binary endpoint

Fixed sample analysis, one-sided significance level 2.5%, power 80%. The results were calculated for a one-sample test for rates (exact test, conservative solution), H0: pi = 0.1, H1: pi = 0.25.

Stage	Fixed
Stage level (one-sided)	0.0250
Efficacy boundary (z-value scale)	1.960
Efficacy boundary (t)	0.181
Number of subjects	53.0

Legend:

(t): treatment effect scale

6.4 Group-sequential designs for continuous and binary endpoints

Sample size calculation for a group-sequential trial is performed in two steps:

Define the (abstract) group-sequential design using the function getDesignGroupSequential(). For details regarding this step, see the R markdown file “Defining group-sequential boundaries with rpact”.
Calculate the sample size by feeding the abstract design into the function getSampleSizeMeans() for a continuous endpoint or getSampleSizeRates() for a binary endpoint.

# Example: Group-sequential design with  O'Brien & Fleming type alpha-spending 
# and one interim at 60% information
design <- getDesignGroupSequential(sided = 1, alpha = 0.025, beta = 0.2,
    informationRates = c(0.6,1), typeOfDesign = "asOF")

# Trial assumes an effect size of 10 as above, a stDev = 24, and an allocation 
# ratio of 2
sampleSizeResultGS <- getSampleSizeMeans(
    design, alternative = 10, stDev = 24, allocationRatioPlanned = 2) 
# Standard rpact output (sample size object only, not design object)
summary(sampleSizeResultGS)

Sample size calculation for a continuous endpoint

Sequential analysis with a maximum of 2 looks (group sequential design), one-sided overall significance level 2.5%, power 80%. The results were calculated for a two-sample t-test, H0: mu(1) - mu(2) = 0, H1: effect = 10, standard deviation = 24, planned allocation ratio = 2.

Stage	1	2
Planned information rate	60%	100%
Cumulative alpha spent	0.0038	0.0250
Stage levels (one-sided)	0.0038	0.0238
Efficacy boundary (z-value scale)	2.669	1.981
Efficacy boundary (t)	12.393	7.050
Cumulative power	0.3123	0.8000
Number of subjects	124.3	207.1
Expected number of subjects under H1		181.3
Exit probability for efficacy (under H0)	0.0038
Exit probability for efficacy (under H1)	0.3123

Legend:

(t): treatment effect scale
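
The first-stage values in this output can be checked by hand. Assuming the O'Brien & Fleming type spending function takes the Lan-DeMets form \(\alpha^*(t) = 2\,(1 - \Phi(z_{1-\alpha/2}/\sqrt{t}))\) for a one-sided level \(\alpha\), the alpha spent at 60% information and the corresponding first-stage critical value follow directly. This is a base R sketch with illustrative variable names; the second-stage boundary requires numerical integration over the joint distribution of the test statistics and is not reproduced here.

```r
# O'Brien & Fleming type alpha spending (Lan-DeMets form), one-sided alpha
alpha <- 0.025
info_rate <- 0.6  # information fraction at the interim analysis

# Cumulative alpha spent at the interim
alpha_spent_1 <- 2 * (1 - pnorm(qnorm(1 - alpha / 2) / sqrt(info_rate)))

# At the first look the boundary is simply the normal quantile of that level
crit_1 <- qnorm(1 - alpha_spent_1)

round(c(alpha_spent = alpha_spent_1, critical_value = crit_1), 4)
```

The rounded results agree with the printed "Cumulative alpha spent" (0.0038) and "Efficacy boundary (z-value scale)" (2.669) for stage 1.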

# Example: Group-sequential design with  O'Brien & Fleming type alpha-spending and 
# one interim at 60% information
design <- getDesignGroupSequential(sided = 1, alpha = 0.025, beta = 0.2,
    informationRates = c(0.6, 1), typeOfDesign = "asOF")

# Sample size calculation assuming event probabilities are 25% in control 
# (pi2 = 0.25) vs 40% (pi1 = 0.4) in intervention
sampleSizeResultGS <- getSampleSizeRates(design,pi2 = 0.25,pi1 = 0.4) 
# Standard rpact output (sample size object only, not design object)
sampleSizeResultGS

Design plan parameters and output for rates

Design parameters

Information rates: 0.600, 1.000
Critical values: 2.669, 1.981
Futility bounds (non-binding): -Inf
Cumulative alpha spending: 0.003808, 0.025000
Local one-sided significance levels: 0.003808, 0.023798
Significance level: 0.0250
Type II error rate: 0.2000
Test: one-sided

User defined parameters

Assumed treatment rate: 0.400
Assumed control rate: 0.250

Default parameters

Risk ratio: FALSE
Theta H0: 0
Normal approximation: TRUE
Treatment groups: 2
Planned allocation ratio: 1

Sample size and output

Direction upper: TRUE
Maximum number of subjects: 306.3
Maximum number of subjects (1): 153.2
Maximum number of subjects (2): 153.2
Number of subjects [1]: 183.8
Number of subjects [2]: 306.3
Reject per stage [1]: 0.3123
Reject per stage [2]: 0.4877
Early stop: 0.3123
Expected number of subjects under H0: 305.9
Expected number of subjects under H0/H1: 299.3
Expected number of subjects under H1: 268.1
Critical values (treatment effect scale) [1]: 0.187
Critical values (treatment effect scale) [2]: 0.104

Legend

(i): values of treatment arm i
[k]: values at stage k
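
The expected sample sizes in this output follow from the stage-wise sample sizes and the probability of stopping at the interim: \(E[N] = p_{\text{stop}}\,N_1 + (1 - p_{\text{stop}})\,N_2\). The base R sketch below plugs in the rounded values printed above, so small deviations from the rpact figures are due to rounding; the helper name `expected_n` is hypothetical.

```r
# Expected sample size of a two-stage design with a single early-stopping look.
# Hypothetical helper for illustration; not an rpact function.
expected_n <- function(n_stage, p_stop_stage1) {
  p_stop_stage1 * n_stage[1] + (1 - p_stop_stage1) * n_stage[2]
}

n_stage_rates <- c(183.8, 306.3)  # cumulative subjects at stages 1 and 2

expected_h1 <- expected_n(n_stage_rates, p_stop_stage1 = 0.3123)    # under H1
expected_h0 <- expected_n(n_stage_rates, p_stop_stage1 = 0.003808)  # under H0

# Price of the interim look: maximum sample size relative to the fixed design
max_inflation <- 306.3 / 303.7

round(c(H1 = expected_h1, H0 = expected_h0, inflation = max_inflation), 3)
```

The maximum sample size is inflated by less than 1% relative to the fixed design (303.7), while the expected sample size under H1 drops to roughly 268.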

7 Survival Group Sequential Designs

These designs are used in trials where the endpoint is time until the occurrence of a specific event (like death or disease progression). The complexity in these designs often comes from several factors:

Patient Follow-Up: Decisions need to be made whether to follow patients until an event occurs or only for a fixed period, which can impact the detection of treatment effects and the overall study duration.
Non-Proportional Hazards: This occurs when the hazard ratio between treatment groups is not constant over time, for example with a delayed treatment effect. Handling non-proportional hazards requires more sophisticated statistical techniques to ensure accurate interpretations of the treatment effect over the study period.
Complex Accrual, Survival, Dropout: Managing varying rates of patient accrual, different survival rates, and dropout rates can complicate the analysis and interpretation of the trial data.
Stratified Analyses: These are used to control for factors that might affect the outcome, allowing for more precise estimates of the treatment effect within subgroups of patients.

7.1 rpact for Survival endpoint

The relevant rpact functions for survival are:

getSampleSizeSurvival(): Calculates the required number of events and subjects, and the timing of the analyses, for a given design and given survival, accrual, and dropout assumptions.
getPowerSurvival(): This function is the analogue to getSampleSizeSurvival() for the calculation of power rather than the sample size.
getEventProbabilities(): Calculates the probability of an event depending on the time and type of accrual, follow-up time, and survival distribution. This is useful for aligning interim analyses for different time-to-event endpoints.
getSimulationSurvival(): This function simulates group-sequential trials. For example, it can be used to assess the power of trials with delayed treatment effects or the data-dependent variability of the timing of interim analyses even if the protocol assumptions are perfectly fulfilled. It can also simulate hypothetical datasets for trials stopped early.

7.1.1 Specifying survival distributions

Exponential survival distributions

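Event probability at a specific time point known: specify eventTime together with pi2 and pi1 (e.g., eventTime = 24, pi2 = 0.3, pi1 = 0.2).
Exponential parameter \(\lambda\) known: specify it directly (e.g., lambda2).
Median survival known: convert it via \(\lambda = \log(2)/\text{median}\).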

Weibull survival distributions

An additional shape parameter kappa needs to be provided; kappa = 1 corresponds to the exponential distribution.
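
The three ways of specifying an exponential distribution are interchangeable. Assuming the survival function \(S(t) = \exp(-\lambda t)\), an event probability \(\pi\) at time \(t\) corresponds to \(\lambda = -\log(1 - \pi)/t\). The base R sketch below converts the example values from the list above; the helper names are hypothetical and not rpact functions.

```r
# Conversions for exponential survival, S(t) = exp(-lambda * t).
# Hypothetical helpers for illustration; not rpact functions.
lambda_from_event_prob <- function(p, event_time) -log(1 - p) / event_time
lambda_from_median <- function(median) log(2) / median

# Event probabilities at month 24: 30% in control, 20% in intervention
lambda2 <- lambda_from_event_prob(0.3, event_time = 24)  # control hazard
lambda1 <- lambda_from_event_prob(0.2, event_time = 24)  # intervention hazard

# Under exponential survival the hazard ratio is the ratio of the two hazards
hazard_ratio <- lambda1 / lambda2

# Median survival of 60 months expressed as a hazard
lambda_median_60 <- lambda_from_median(60)

signif(c(lambda2 = lambda2, lambda1 = lambda1,
         hazard_ratio = hazard_ratio, lambda_median_60 = lambda_median_60), 4)
```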

7.1.2 Without interim analyses

Exponential survival, flexible accrual intensity, no interim analyses

Exponential PFS with a median PFS of 60 months in control (lambda2 = log(2)/60) and a target hazard ratio of 0.74 (hazardRatio = 0.74).
Log-rank test at the two-sided 5%-significance level (sided = 2, alpha = 0.05), power 80% (beta = 0.2).
Annual drop-out of 2.5% in both arms (dropoutRate1 = 0.025, dropoutRate2 = 0.025, dropoutTime = 12).
Recruitment reaches 42 patients/month from month 6 onwards after a linear ramp-up (accrualTime = c(0,1,2,3,4,5,6), accrualIntensity = c(6,12,18,24,30,36,42)).
Randomization ratio 1:1 (allocationRatioPlanned = 1). This is the default and is thus not explicitly set in the function call below.
Two sample size choices will be initially explored:
- A fixed total sample size of 1200 (maxNumberOfSubjects = 1200).
- Alternatively, the total sample size will be implicitly determined by specifying that every subject must have a minimal follow-up duration of at least 12 months at the time of the analysis (followUpTime = 12).

sampleSize1 <- getSampleSizeSurvival(sided = 2,alpha = 0.05,beta = 0.2,
    lambda2 = log(2)/60,hazardRatio = 0.74,
    dropoutRate1 = 0.025, dropoutRate2 = 0.025, dropoutTime = 12,
    accrualTime = c(0,1,2,3,4,5,6), 
    accrualIntensity = c(6,12,18,24,30,36,42),
    maxNumberOfSubjects = 1200)
summary(sampleSize1)

Sample size calculation for a survival endpoint

Fixed sample analysis, two-sided significance level 5%, power 80%. The results were calculated for a two-sample logrank test, H0: hazard ratio = 1, H1: hazard ratio = 0.74, control lambda(2) = 0.012, number of subjects = 1200, accrual time = c(1, 2, 3, 4, 5, 6, 31.571), accrual intensity = c(6, 12, 18, 24, 30, 36, 42), dropout rate(1) = 0.025, dropout rate(2) = 0.025, dropout time = 12.

Stage	Fixed
Stage level (two-sided)	0.0500
Efficacy boundary (z-value scale)	1.960
Lower efficacy boundary (t)	0.810
Upper efficacy boundary (t)	1.234
Number of subjects	1200.0
Number of events	346.3
Analysis time	53.11
Expected study duration under H1	53.11

Legend:

(t): treatment effect scale
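
Several numbers in this output can be verified by hand. For a 1:1 log-rank test, Schoenfeld's approximation gives the required number of events as \(d = 4\,(z_{1-\alpha/2}+z_{1-\beta})^2 / \log(\text{HR})^2\), and the critical hazard ratios are \(\exp(\pm 2\,z_{1-\alpha/2}/\sqrt{d})\). The accrual duration follows from the piecewise-constant recruitment rates. The sketch below is base R with illustrative variable names; that rpact uses exactly Schoenfeld's formula here is an assumption, although the results match the printed output.

```r
alpha <- 0.05   # two-sided significance level
beta  <- 0.2    # type II error rate
hr    <- 0.74   # target hazard ratio

# Required number of events (Schoenfeld approximation, 1:1 allocation)
events <- 4 * (qnorm(1 - alpha / 2) + qnorm(1 - beta))^2 / log(hr)^2

# Critical values on the hazard ratio scale (lower, upper)
crit_hr <- exp(c(-1, 1) * 2 * qnorm(1 - alpha / 2) / sqrt(events))

# Accrual: six one-month ramp-up periods, then 42 patients/month
accrual_intensity <- c(6, 12, 18, 24, 30, 36, 42)
ramp_up_subjects  <- sum(accrual_intensity[1:6])  # recruited in months 0 to 6
accrual_end <- 6 + (1200 - ramp_up_subjects) / accrual_intensity[7]

round(c(events = events, lower_hr = crit_hr[1], upper_hr = crit_hr[2],
        accrual_end = accrual_end), 3)
```

The results agree with the printed number of events (346.3), the efficacy boundaries on the treatment effect scale (0.810 and 1.234), and the final accrual time (31.571 months).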

7.2 Adaptive Group Sequential Designs

Adaptive designs provide flexibility to modify trial parameters based on interim data without undermining the validity and integrity of the trial. These adjustments include:

Treatment Arms - MAMS GSD (Multi-Arm Multi-Stage Group Sequential Design): This approach allows several treatment arms to be tested simultaneously. Based on interim results, ineffective treatments can be dropped (“drop the loser”), and promising ones can be continued (“pick the winner”).
Sample Size - BSSR/USSR (Blinded Sample Size Reestimation/Unblinded Sample Size Reestimation): Adjustments to the sample size based on interim data to maintain statistical power or precision of estimates. BSSR uses pooled interim data without revealing treatment assignments, whereas USSR uses the unblinded data of the treatment groups.
Patient Subgroups: Adaptations can also include focusing on specific patient subgroups that show greater benefit or fewer side effects, which can be identified as the trial progresses.
Bayesian Approach: Uses prior distributions and updates probabilities with accumulating data, allowing for continuous learning and adaptation within the trial. This approach is particularly useful in adaptive GSDs for handling uncertainty and incorporating real-time data effectively.

8 References

Rufibach, K. (n.d.). Why do we do interim analyses in clinical trials? Retrieved from https://www.linkedin.com/pulse/why-do-we-interim-analyses-clinical-trials-kaspar-rufibach/?trackingId=4m6sV4oaRFeyrfPyQw%2Bdaw%3D%3D
Haybittle, J. L. (1971). Repeated assessment of results in clinical trials of cancer treatment. The British Journal of Radiology, 44, 793-797.
Peto, R., Pike, M. C., Armitage, P., Breslow, N. E., Cox, D. R., Howard, S. V., Mantel, N., McPherson, K., Peto, J., & Smith, P. G. (1976). Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I. Introduction and design. British Journal of Cancer, 34(6), 585-612.
O’Brien, P. C., & Fleming, T. R. (1979). A multiple testing procedure for clinical trials. Biometrics, 35, 549-556.
Pocock, S. J. (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika, 64(2), 191-199.
Whitehead, J., & Stratton, I. (1983). Group Sequential clinical trials with triangular continuation regions. Biometrics, 39, 227-236.
Whitehead, J. (2001). Use of the Triangular Test in Sequential Clinical Trials. In Handbook of Statistics in Clinical Oncology (pp. 211-228). New York: Marcel Dekker.
Lan, K. K. G., & DeMets, D. L. (1983). Discrete Sequential Boundaries for Clinical Trials. Biometrika, 70(3), 659-663.
Kim, K., & DeMets, D. L. (1987). Design and Analysis of Group Sequential Tests Based on the Type I Error Spending Rate Function. Biometrika, 74, 149-154.
Hwang, I. K., Shih, W. J., & DeCani, J. S. (1990). Group sequential designs using a family of type I error probability spending functions. Statistics in Medicine, 9, 1439-1445.
Lan, K. K. G., Rosenberger, W. F., & Lachin, J. M. (1993). Use of Spending Functions for Occasional or Continuous Monitoring of Data in Clinical Trials. Statistics in Medicine, 12(23), 2219-2231.
DeMets, D. L., & Lan, K. K. G. (1994). Interim analysis: the alpha spending function approach. Statistics in Medicine, 13(13-14), 1341-1352.
Wang, S. K., & Tsiatis, A. A. (1987). Approximately optimal one-parameter boundaries for group sequential trials. Biometrics, 43, 193-199.
Pampallona, S., Tsiatis, A. A., & Kim, K. (2001). Interim monitoring of group sequential trials using spending functions for the type I and type II error probabilities. Drug Information Journal, 35, 1113-1121.
Emerson, S. S., & Fleming, T. R. (1989). Symmetric Group Sequential Designs. Biometrics, 45, 905-923.
Kittelson, J. M., & Emerson, S. S. (1999). A Unifying Family of Group Sequential Test Designs. Biometrics, 55, 874-882.
Jennison, C., & Turnbull, B. W. (1999). Group sequential methods with applications to clinical trials. CRC Press.
Gsponer, T., Gerber, F., Bornkamp, B., Ohlssen, D., Vandemeulebroecke, M., & Schmidli, H. (2014). A Practical Guide to Bayesian Group Sequential Designs. Pharmaceutical Statistics, 13(1), 71-80.
Berry, S. M., Carlin, B. P., Lee, J. J., & Muller, P. (2010). Bayesian adaptive methods for clinical trials. CRC Press.
ICH E20 Concept Paper. Retrieved from https://database.ich.org/sites/default/files/E20_FinalConceptPaper_2019_1107_0.pdf
Bauer, P., Bretz, F., Dragalin, V., König, F., & Wassmer, G. (2016). Twenty‐five years of confirmatory adaptive designs: opportunities and pitfalls. Statistics in Medicine, 35(3), 325-347.
Kelly, P. J., Sooriyarachchi, M. R., Stallard, N., & Todd, S. (2005). A practical comparison of group-sequential and adaptive designs. Journal of Biopharmaceutical Statistics, 15(4), 719-738.
Cui, L., Hung, H. J., & Wang, S. J. (1999). Modification of sample size in group sequential clinical trials. Biometrics, 55(3), 853-857.
Mehta, C. R., & Tsiatis, A. A. (2001). Flexible Sample Size Considerations under Information Based Interim Monitoring. Drug Information Journal, 35, 1095-1112.
Chen, Y. J., DeMets, D. L., & Lan, K. K. (2004). Increasing the sample size when the unblinded interim result is promising. Statistics in Medicine, 23(7), 1023-1038.
Freidlin, B., & Korn, E. L. (2017). Sample size adjustment designs with time-to-event outcomes: a caution. Clinical Trials, 14(6), 597-604.
Muller, H-H., & Schafer, H. (2001). Adaptive group sequential designs for clinical trials: Combining the advantages of adaptive and of classical group sequential approaches. Biometrics, 57, 886-891.
Wassmer, G. (2006). Planning and analyzing adaptive group sequential survival trials. Biometrical Journal: Journal of Mathematical Methods in Biosciences, 48(4), 714-729.
Zelen, M. (1969). Play the winner rule and the controlled clinical trial. Journal of the American Statistical Association, 64(325), 131-146.
Ghosh, P., Liu, L., Senchaudhuri, P., Gao, P., & Mehta, C. (2017). Design and monitoring of multi‐arm multi‐stage clinical trials. Biometrics, 73(4), 1289-1299.
Ghosh, P., Liu, L., & Mehta, C. (2020). Adaptive multiarm multistage clinical trials. Statistics in Medicine, 39(8), 1084-1102.
