Power for Complex Hypotheses
Summary
Choosing an effect size for sample size determination (SSD) depends on scientific and clinical considerations, uncertainty about the effect size, and the practical trial resources available.
While estimated effect sizes have traditionally been used, guidance increasingly favors the minimal clinically important difference (MCID) for more scientifically relevant SSD (see the DELTA guidance).
Sensitivity analysis and assurance help evaluate how effect size uncertainty affects power and sample size, and can quantify the robustness of the study design to effect size deviations.
Promising zone designs offer an adaptive approach that can bridge the conventional and MCID perspectives: the trial is initially powered on the expected effect size, but the sample size can be increased at an interim analysis when results are lower than expected yet still promising.
Objective of Clinical Trials: The main goal is to evaluate the efficacy and safety of a new treatment.
Definition of Efficacy: The term “efficacy” can have different meanings depending on the clinical and regulatory context. For instance, it could be the basis for:
Changing Hypotheses: Depending on the regulatory scenario, the hypothesis regarding the efficacy of a treatment may vary from proving superiority (a new treatment is better than existing options) to proving other aspects like equivalence or non-inferiority.
Types of Hypotheses in Clinical Trials:
Statistical Definitions:
Pater C. Equivalence and noninferiority trials - are they viable alternatives for registration of new drugs? (III). Curr Control Trials Cardiovasc Med. 2004 Aug 17;5(1):8. doi: 10.1186/1468-6708-5-8. PMID: 15312236; PMCID: PMC514891.
Equivalence testing in clinical trials aims to establish that the efficacy and safety of a new treatment are equivalent to those of an existing treatment within pre-defined margins. This is critical in generic drug development and biosimilar approval processes, where demonstrating similarity to an established product is necessary for regulatory approval.
Equivalence testing plays a crucial role in ensuring that new or alternative treatments provide therapeutic results consistent with existing options, without significant deviations that could affect efficacy or safety. This testing framework supports regulatory and clinical decisions, helping to maintain high standards in drug development and approval processes, and ensuring that patients receive effective and safe therapeutic alternatives.
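The idea of equivalence "within pre-defined margins" can be sketched as a decision rule: equivalence is concluded when the confidence interval for the treatment difference lies entirely inside the margins. The snippet below is a minimal illustration using a normal approximation; the function name and the numbers in the usage lines are hypothetical and do not come from any of the trials discussed here.

```python
from statistics import NormalDist


def tost_equivalent(diff, se, lower, upper, alpha=0.05):
    """Two one-sided tests via a (1 - 2*alpha) confidence interval.

    diff  : estimated treatment difference (new minus reference)
    se    : standard error of that difference
    lower : lower equivalence margin (negative)
    upper : upper equivalence margin (positive)
    alpha : level of each one-sided test (normal approximation assumed)
    """
    z = NormalDist().inv_cdf(1 - alpha)
    ci_low = diff - z * se
    ci_high = diff + z * se
    # Both one-sided nulls are rejected exactly when the interval
    # sits strictly inside the margins.
    return (lower < ci_low and ci_high < upper), (ci_low, ci_high)


# Hypothetical inputs for illustration only.
ok, ci = tost_equivalent(diff=0.02, se=0.05, lower=-0.135, upper=0.135)
not_ok, _ = tost_equivalent(diff=0.08, se=0.05, lower=-0.135, upper=0.135)
print(ok, ci, not_ok)
```

With small samples a t quantile would normally replace the normal quantile; the normal version is used here only to keep the sketch dependency-free.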
Overview of Equivalence Testing
Common Applications
Setting Equivalence Margins
Defining Equivalence
These methods are tailored to the type of data and the statistical test used:
- Means: Schuirmann’s two one-sided tests (TOST) procedure is the standard approach for continuous outcomes; the sample size is chosen so that the study has adequate power to conclude equivalence within the specified margins.
- Proportions: Methods by Miettinen & Nurminen and Farrington & Manning calculate the sample size required to show that the difference in proportions lies within the predefined equivalence margins.
- Survival/Time-to-Event: Schoenfeld and others provide formulas based on the number of events, ensuring that enough events occur during the study to assess equivalence with confidence.
- Counts/Incidence Rates: Zhu, Tang, and others have developed methods suitable for count data, which are often seen in epidemiological studies.
The example formulas below use standard parameters for hypothesis testing:
- For Means: \[ n = \frac{(Z_\alpha + Z_{\beta/2})^2 \sigma^2}{(\delta - |e|)^2} \] Here, \(Z_\alpha\) and \(Z_{\beta/2}\) are the upper standard normal quantiles for the type I error rate \(\alpha\) and half the type II error rate \(\beta\), \(\sigma^2\) is the variance, \(\delta\) is the equivalence margin, and \(|e|\) is the absolute expected difference. For a two-arm parallel design with equal allocation, the per-group sample size uses \(2\sigma^2\) in place of \(\sigma^2\).
- For Proportions: \[ n = \frac{(Z_\alpha + Z_{\beta/2})^2 p(1-p)}{(\delta - |e|)^2} \] where \(p\) is the proportion in the reference group.
- For Survival/Time-to-Event: \[ n = \frac{(Z_\alpha + Z_{\beta/2})^2}{(\delta - |b|)^2 p_1 p_2 d} \] where \(b\) is the expected log hazard ratio, \(p_1\) and \(p_2\) are the proportions of patients allocated to the two groups, and \(d\) is the overall probability of observing an event during the study.
- For Counts/Incidence Rates: \[ n = \frac{(Z_\alpha \sqrt{V_0} + Z_\beta \sqrt{V_1})^2}{\delta^2} \] where \(V_0\) and \(V_1\) are the variance terms computed from the treatment and control rates under the null and alternative hypotheses, respectively.
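As a rough illustration of how the means and proportions formulas above turn into numbers, the sketch below evaluates them directly with the Python standard library. The function names and all input values are hypothetical, and the formulas are coded with a single variance term exactly as written; a two-arm design would need the variance adjusted accordingly.

```python
from math import ceil
from statistics import NormalDist


def z(p):
    """Upper-tail standard normal quantile: P(Z > z) = p."""
    return NormalDist().inv_cdf(1.0 - p)


def n_equiv_means(alpha, beta, sigma, delta, e=0.0):
    """n = (Z_alpha + Z_{beta/2})^2 * sigma^2 / (delta - |e|)^2, rounded up."""
    raw = (z(alpha) + z(beta / 2)) ** 2 * sigma ** 2 / (delta - abs(e)) ** 2
    return ceil(raw)


def n_equiv_props(alpha, beta, p, delta, e=0.0):
    """Same structure, with p(1 - p) playing the role of the variance."""
    raw = (z(alpha) + z(beta / 2)) ** 2 * p * (1 - p) / (delta - abs(e)) ** 2
    return ceil(raw)


# Hypothetical inputs: alpha = 0.05, 80% power (beta = 0.2).
print(n_equiv_means(alpha=0.05, beta=0.2, sigma=1.0, delta=0.5))          # 35
print(n_equiv_means(alpha=0.05, beta=0.2, sigma=1.0, delta=0.5, e=0.1))   # 54
print(n_equiv_props(alpha=0.05, beta=0.2, p=0.5, delta=0.15))             # 96
```

The second call shows the main sensitivity in these formulas: as the expected difference \(|e|\) approaches the margin \(\delta\), the denominator shrinks and the required sample size grows quickly.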
This case study covers non-inferiority testing in a comparison of two types of stents, sirolimus-eluting and paclitaxel-eluting, in diabetic patients. The endpoint is in-segment late luminal loss, a measure used to assess the efficacy of stents in preventing re-narrowing of the artery after implantation.
Objective: To determine whether paclitaxel-eluting stents are non-inferior to sirolimus-eluting stents, within a specified margin, with respect to in-segment late luminal loss.
Non-Inferiority Margin:
Statistical Design:
Standard Deviation (SD):
Sample Size:
Interpretation:
The clinical trial aims to compare the efficacy of ketamine to electroconvulsive therapy (ECT) for the treatment of nonpsychotic treatment-resistant major depression. The primary endpoint of interest is the proportion of patients who respond to treatment.
Key Parameters
- Non-Inferiority Margin (Δ₀): -10 percentage points. This margin defines the maximum allowable inferiority of ketamine compared to ECT: ketamine’s response rate must not be more than 10 percentage points lower than that of ECT for ketamine to be considered non-inferior.
- Expected Difference (Δ): 5 percentage points. This is the hypothesized actual difference in the response rate between ketamine and ECT, favoring ketamine (a 5-point difference favoring ECT would require a far larger sample than the 346 stated below).
- Standard Proportion (π₂): 50%. This is the expected response rate for ECT based on previous studies or expert opinion.
- Significance Level (α): 2.5% one-sided. A one-sided 2.5% level corresponds to the conventional two-sided 5% level and is the usual choice for non-inferiority testing.
- Sample Size: 346 participants in total. This size is calculated to achieve the desired statistical power given the expected difference and the non-inferiority margin.
Methodology
The Farrington-Manning method for power calculation was used. This approach is specifically tailored for non-inferiority and equivalence trials involving two proportions. Because the null hypothesis specifies a non-zero difference (the margin), the variance under the null is estimated by maximum likelihood restricted to that difference, rather than from the pooled proportion used in conventional tests of no difference.
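The calculation can be sketched as follows. This is not the trial's own code: it assumes 80% power (the power is not stated above), equal allocation, and response rates of 55% for ketamine and 50% for ECT. The restricted maximum likelihood estimates are found here by simple bisection on the score equation rather than by the closed-form solution in Farrington and Manning's paper, and the function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def restricted_mle(p1, p2, margin, tol=1e-12):
    """Rates (q1, q2) maximising the two-binomial likelihood subject to
    q1 - q2 = margin, for equal group sizes (found by bisection)."""
    lo = max(0.0, -margin) + 1e-9
    hi = min(1.0, 1.0 - margin) - 1e-9

    def score(q2):
        q1 = q2 + margin
        return p1 / q1 - (1 - p1) / (1 - q1) + p2 / q2 - (1 - p2) / (1 - q2)

    # The score is strictly decreasing in q2, so bisection finds its root.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    q2 = (lo + hi) / 2
    return q2 + margin, q2


def n_per_group_fm(p_new, p_ref, margin, alpha, power):
    """Per-group size for H0: p_new - p_ref <= margin (one-sided alpha)."""
    z_a = NormalDist().inv_cdf(1 - alpha)
    z_b = NormalDist().inv_cdf(power)
    q1, q2 = restricted_mle(p_new, p_ref, margin)
    sd_null = sqrt(q1 * (1 - q1) + q2 * (1 - q2))
    sd_alt = sqrt(p_new * (1 - p_new) + p_ref * (1 - p_ref))
    raw = (z_a * sd_null + z_b * sd_alt) ** 2 / (p_new - p_ref - margin) ** 2
    return ceil(raw)


# Assumed inputs: 55% vs 50% response, margin -0.10, alpha 0.025, power 0.80.
n = n_per_group_fm(p_new=0.55, p_ref=0.50, margin=-0.10, alpha=0.025, power=0.80)
print(n, 2 * n)
```

Under these assumptions the sketch gives 173 participants per group, or 346 in total, which agrees with the sample size reported for the trial.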
Statistical Considerations
Clinical Implications
This study design allows clinicians and researchers to evaluate whether ketamine, which might be less invasive or have a different side-effect profile compared to ECT, can be a viable treatment option without significantly compromising on efficacy. The choice of a -10 percentage point margin as the non-inferiority criterion balances clinical judgment and statistical rigor, ensuring that any clinically meaningful deterioration in efficacy (from the perspective of patient outcomes) is detected.
This case study describes a clinical trial designed to test the superiority of adjustable intragastric balloons (aIGB) for obesity treatment over a control, using non-adjustable intragastric balloons (IGBs). The trial targets a measure called total body loss (TBL) and is structured to determine whether the difference in TBL between the two groups is both statistically significant and clinically meaningful.
Study Design
Statistical Parameters
Sample Size Calculation
This case study describes the design of a Phase 3 equivalence trial comparing the efficacy of a new drug (MW032) with an innovator drug (Denosumab) in treating solid tumor-related bone metastases. It focuses on a specific pharmacodynamic marker: the change in the logarithm of the urine N-telopeptide to creatinine ratio (log uNTx/uCr) from baseline to week 13. The key components and setup of the trial are as follows:
Trial Objectives and Design
- Objective: To demonstrate that the new drug, MW032, is equivalent to Denosumab in terms of its effect on uNTx/uCr, a marker of bone resorption.
- Primary Endpoint: The mean difference between the two treatments in the change in log uNTx/uCr from baseline to week 13.
Statistical Setup
- Equivalence Margins: These are set at -0.135 and 0.135. The confidence interval for the difference between the two treatments must fall within these limits for the treatments to be considered equivalent. The margins are based on half of the upper limit of the 50% confidence interval of the difference observed in a pivotal study.
- Expected Mean Difference: Set at 0, meaning the sample size is calculated on the assumption that there is no true difference between the new drug and the innovator drug. (In an equivalence trial the null hypothesis is non-equivalence, so this is an assumption under the alternative, not the null.)
- Significance Level: 5% (two-sided), which is standard for clinical trials, providing a balance between type I error control and statistical power.
- Standard Deviation: 0.58, reflecting variability in the measurement of log uNTx/uCr across participants.
- Sample Size: 317 patients per group are required to achieve 80% power, giving a high probability of concluding equivalence if it truly exists.
- Power: 80%, typical for clinical trials: the probability of correctly rejecting the null hypothesis of non-equivalence if the new drug is indeed equivalent to the standard treatment.
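The reported sample size can be checked against the usual normal-approximation formula for two one-sided tests with a zero expected difference and equal allocation. The sketch below is an illustration, not the trial's documented calculation: the function name is hypothetical, and the level used for each one-sided test is treated as an input because the figure depends on it.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_tost(sd, margin, alpha_each, power):
    """Per-group n for equivalence of two means, assuming a true
    difference of zero, equal allocation and a normal approximation:
    n = 2 * sd^2 * (z_alpha + z_{beta/2})^2 / margin^2."""
    beta = 1 - power
    z_a = NormalDist().inv_cdf(1 - alpha_each)
    z_b = NormalDist().inv_cdf(1 - beta / 2)
    return ceil(2 * sd ** 2 * (z_a + z_b) ** 2 / margin ** 2)


# Values from the setup above; alpha_each is the level of EACH one-sided test.
print(n_per_group_tost(sd=0.58, margin=0.135, alpha_each=0.05, power=0.80))   # 317
print(n_per_group_tost(sd=0.58, margin=0.135, alpha_each=0.025, power=0.80))  # 388
```

With each one-sided test at the 5% level (equivalently, a 90% confidence interval) the formula returns the stated 317 per group, whereas 2.5% per test (a 95% confidence interval) would give 388. This suggests the stated 5% level may apply to each one-sided test, although that reading is an inference from the arithmetic rather than something stated in the setup.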
Interpretation of Results
- 95% Confidence Interval of Difference: The reported interval from the pivotal study is [-0.444, -0.188]. It excludes zero, indicating a statistically significant difference favoring Denosumab in the earlier study. For the purpose of this trial, the upper limit of this interval is used to establish a conservative equivalence margin.
- Sample Size Justification: The required sample size of 317 per group is calculated from the standard deviation, the equivalence margins, and the desired power, so that the trial is adequately powered to confirm or refute equivalence.
- Equivalence Testing: This is crucial in the context of biosimilars or second-generation formulations, where therapeutic equivalence to an established treatment must be demonstrated without significant reductions in efficacy or safety.
- Regulatory Approval: Successfully demonstrating equivalence within the defined margins can lead to regulatory approval for the new drug, offering a similar therapeutic option to patients and potentially affecting market dynamics with a new competitor to Denosumab.
Food and Drug Administration. Non-inferiority clinical trials to establish effectiveness. Guidance for industry [EB/OL]. November 2016[2024-10-05]. https://www.fda.gov/downloads/Drugs/Guidances/UCM202140.pdf.
European Medicines Agency. GUIDELINE ON THE CHOICE OF THE NON-INFERIORITY MARGIN [EB/OL]. January 2006[2024-10-05]. https://www.ema.europa.eu/en/documents/scientific-guideline/guideline-choice-non-inferiority-margin_en.pdf.
International Conference on Harmonisation. Choice of control group and related issues in clinical trials; availability [N]. Federal Register, 2001, 66(93): 24390–24391.
Chow S, Shao J, Wang H. Sample size calculations in clinical research [M]. Chapman and Hall/CRC, 2007.
Chow S, Shao J. On Non-Inferiority Margin and Statistical Tests in Active Control Trial [J]. Statistics in Medicine, 2006, 25: 1101–1113.
Senn S. Cross-over trials in clinical research (2nd Edition) [M]. John Wiley & Sons, 2002.
Farrington CP, Manning G. Test statistics and sample size formulae for comparative binomial trials with null hypothesis of non-zero risk difference or non-unity relative risk [J]. Statistics in Medicine, 1990, 9(12): 1447–1454.
Miettinen O, Nurminen M. Comparative analysis of two rates [J]. Statistics in Medicine, 1985, 4(2): 213–226.
Gart JJ, Nam JM. Approximate Interval Estimation of the Difference in Binomial Parameters: Correction for Skewness and Extension to Multiple Tables [J]. Biometrics, 1990, 46(3): 637.
Dixon WJ, Massey FJ. Introduction to Statistical Analysis (4th ed.) [M]. McGraw-Hill, 1983: 123–126.
O’Brien RG, Muller KE. Unified power analysis for t-tests through multivariate hypotheses [M]//Edwards LK. Statistics: Textbooks and monographs, Vol. 137. Applied analysis of variance in behavioral science. Marcel Dekker, 1993: 297–344.
Schuirmann DJ. A comparison of the Two One-Sided Tests Procedure and the Power Approach for assessing the equivalence of average bioavailability [J]. Journal of Pharmacokinetics and Biopharmaceutics, 1987, 15(6): 657–680.
Phillips KF. Power of the two one-sided tests procedure in bioequivalence [J]. Journal of Pharmacokinetics and Biopharmaceutics, 1990, 18(2): 137–144.
Owen DB. A Special Case of a Bivariate Non-Central t-Distribution [J]. Biometrika, 1965, 52(3/4): 437–446.
Fleming TR. Current Issues in Non-inferiority Trials [J]. Statistics in Medicine, 2008, 27: 317–332.
Chaplin S. Biosimilars in the EU: a new guide for health professionals [J]. Prescriber, 2017, 28(10): 27–31.
Dibra A, et al. Paclitaxel-eluting or sirolimus-eluting stents to prevent restenosis in diabetic patients [J]. New England Journal of Medicine, 2005, 353(7): 663–670.
Anand A, Mathew SJ, Sanacora G, Murrough JW, Goes FS, Altinay M, et al. Ketamine versus ECT for Nonpsychotic Treatment-Resistant Major Depression [J]. New England Journal of Medicine, 2023, 388(25): 2315–2325.
Dayyeh BKA, Maselli DB, Rapaka B, Lavin T, Noar M, Hussan H, et al. Adjustable intragastric balloon for treatment of obesity: a multicentre, open-label, randomised clinical trial [J]. The Lancet, 2021, 398(10315): 1965–1973.
Zhang S, Yin Y, Xiong H, Wang J, Liu H, Lu J, et al. Efficacy, safety, and population pharmacokinetics of MW032 compared with denosumab for Solid Tumor–Related bone metastases [J]. JAMA Oncology, 2024. https://doi.org/10.1001/jamaoncol.2023.6520.