Summary
Sample Size Determination (SSD) is a critical component of study
design that ensures the study is capable of making valid and reliable
inferences. The primary goal of SSD is to ensure that the study is
sufficiently powered to detect a meaningful effect if one exists,
thereby minimizing the risk of Type II errors.
The process of SSD is not merely a mathematical exercise; it involves
critical design decisions that influence the study’s overall validity
and feasibility. The assumptions made during SSD, such as effect size
and variability estimates, are as important as the statistical methods
used. Inaccuracies in these assumptions can lead to underpowered studies
or unnecessary participant burden due to overestimation.
Sample Size Determination is a vital process that requires careful
attention to detail at every step. By thoroughly planning the study,
specifying accurate parameters, choosing an appropriate effect size,
computing the correct sample size, and exploring uncertainty,
researchers can ensure that their studies are well-designed to provide
meaningful and valid results. These considerations are not just
technical requirements but essential practices that uphold the integrity
of the research and its potential impact on clinical practice and
policy.
Summary of the 5 Essential Steps for Sample Size
Determination:
Plan the Study: This initial step involves
outlining the study’s objectives, identifying key research questions,
and defining the trial’s design and randomization scheme. The planning
phase sets the foundation for all subsequent decisions in the
study.
Specify Parameters: Parameters such as
significance level, effect size, and variability (nuisance parameters)
must be carefully chosen. These parameters should be based on existing
literature, expert opinion, or preliminary data. Accurate specification
of parameters is crucial for reliable SSD.
Choose Effect Size: Deciding on the effect size
is central to SSD. The effect size should reflect a clinically
meaningful difference, often defined as the Minimum Clinically Important
Difference (MCID). The choice of effect size impacts the sample size
required to achieve adequate power.
Compute Sample Size: With the study design and
parameters in place, the next step is to calculate the required sample
size to achieve the desired statistical power. This involves using
appropriate statistical formulas or software tools, and considering
potential adjustments for factors such as dropout rates.
Explore Uncertainty: Finally, it’s essential to
assess how uncertainties in the assumptions made during SSD could affect
the study’s power and conclusions. Sensitivity analysis and assurance
methods can be used to explore the impact of varying assumptions and
ensure the robustness of the study design.
Sample Size Determination
Detailed Explanation of Sample Size Determination (SSD)
What is Sample Size Determination (SSD)?
Sample Size Determination (SSD) is a sophisticated statistical
approach employed in planning research studies, particularly clinical
trials, to calculate the minimum number of participants necessary to
achieve valid scientific results. The goal is to determine the smallest
sample size that provides a high probability of detecting a true effect,
should one exist, under predefined conditions. This process is
especially critical in confirmatory trials, where the typical criterion
for success is a statistically significant p-value, often judged against
FDA standards at a one-sided Type I error rate of 0.025.
- Objective:
- Ensuring Appropriate Power: SSD aims to ascertain a
sample size that balances the risk of Type I errors (false positives)
and Type II errors (false negatives). This balance is crucial to not
only avoid invalid conclusions but also to uphold the ethical obligation
to participants who contribute their time and expose themselves to
potential risks.
- Methods and Metrics:
- Using Various Techniques: Researchers employ
formulae, exact methods, and simulations to estimate the sample size.
These tools help predict the likelihood of trial success based on
assumptions about the anticipated effects and the study’s design.
- Metrics Considered: While statistical power is
paramount, other factors like the width of confidence intervals and cost
considerations (cost-effectiveness or decision-theoretic approaches)
also play significant roles.
- Importance of SSD:
- Ethical and Practical Dimensions: Proper SSD
prevents underpowered studies that cannot offer conclusive results and
avoids the ethical dilemma of unnecessarily large studies that may waste
resources and potentially harm participants.
- Regulatory Compliance: SSD is a standard
requirement for clinical trial approval to ensure that a study can
reasonably achieve its objectives with a justified sample size.
Importance of Statistical Power in SSD
Statistical Power Defined:
- Power of a Study: The probability that the study
will reject a false null hypothesis, thereby correctly identifying a
true effect. It is typically set at 80% or 90% in clinical trials to
ensure robustness in findings.
Implementation and Challenges:
- Setting and Reaching Desired Power Levels: To
achieve desired power levels, researchers might need to increase the
sample size, affecting the study’s duration and cost, particularly in
fields dealing with rare diseases or specific demographics.
- Balancing Statistical and Practical Realities:
Increasing a study’s sample size enhances its ability to detect smaller
effects but also raises ethical concerns, such as exposing more
participants to unproven treatments.
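To make the power–sample size relationship concrete, the sketch below computes approximate power for a two-arm parallel-group comparison of means under a normal approximation. The assumed difference (5 points), common SD (15), and one-sided alpha of 0.025 are illustrative values, not figures from any specific trial.

```python
from scipy.stats import norm

def power_two_sample(n_per_arm, delta, sigma, alpha=0.025):
    """Approximate power of a two-arm z-test of means (one-sided alpha)."""
    se = sigma * (2.0 / n_per_arm) ** 0.5      # SE of the difference in means
    z_crit = norm.ppf(1 - alpha)               # one-sided critical value
    return norm.cdf(delta / se - z_crit)       # P(reject H0 | true difference = delta)

# Assumed planning values: difference of 5 points, common SD of 15
for n in (50, 100, 150, 200):
    print(f"n per arm = {n:3d}  ->  power = {power_two_sample(n, delta=5.0, sigma=15.0):.2f}")
```

Under these assumptions, power climbs from roughly 65% at 100 per arm to over 90% at 200 per arm, which is why the chosen power level largely drives the required sample size.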

Expanded Challenges and Considerations in SSD
Beyond Simple Calculations:
- Dealing with Type M (magnitude) and Type S (sign) Errors: Small sample sizes can lead to errors where the detected effect size is vastly different or even opposite to the true effect, exacerbating issues like publication bias and challenges in replicability.
- Ethical and Logistical Concerns: In clinical trials, there is always a trade-off between reaching conclusive results and the practical limitations of recruiting enough participants, especially when dealing with new treatments with unknown efficacy or safety profiles.
Regulatory and Publication Practices:
- Standard Requirement in Trial Design: Detailed sample size justification is required in study protocols, and the quality of these justifications can vary, affecting the reliability of the research.
- Impact on Research Quality and Replicability: Insufficient sample sizes contribute to the ongoing replicability crisis in scientific research, where studies fail to produce consistent results when repeated under similar conditions.
Step-by-Step Guide to Sample Size Determination

1. Plan Study
Design Considerations:
- Context and Constraints: Every study must consider
the specific context—such as the disease, target population, and
treatment modalities—and constraints like budget, time, and available
technology.
- Research Questions and Trial Phase: The phase of
the trial (e.g., Phase I-III) influences the design, as early phases
focus more on safety while later phases test efficacy on larger
populations.
- Study Design and Randomization: Essential to
minimize bias and ensure the reliability of the results. The choice of
randomization scheme is crucial for maintaining the integrity of the
statistical analysis.
- Endpoints and Estimators: Selection of primary and
secondary endpoints and how they are measured (e.g., total symptom
score, survival rate) directly impacts the type of data collected and
the statistical methods used.
- Regulatory Guidance: Adherence to guidelines such
as ICH E9 and FDA guidances ensures that the trial design meets
international standards for clinical research.
Logical Sequence:
- Sample size and statistical power considerations should naturally
follow the initial design decisions, providing a basis to compare
different study designs for efficacy and efficiency.

2. Specify Parameters
Fixed Parameters
- Definition: Fixed parameters are known before the
trial begins and do not change throughout the study. They are
predetermined based on regulatory requirements, past research, or the
specific goals of the study.
- Examples:
- Significance Level: This is often set at 0.05 for
two-sided tests or 0.025 for one-sided tests, especially in Phase III
confirmatory trials, as commonly required by the FDA. This parameter
defines the threshold at which results will be considered statistically
significant.
- Study Duration: The length of the study is usually
fixed and based on the natural progression of the condition being
studied or the expected change due to interventions.
- Sample Allocation: How participants are distributed
across treatment groups can also be a fixed parameter, determined by the
study’s design.
- Importance: Fixed parameters are essential for
defining the framework within which the study operates. They help
maintain the integrity and comparability of the results, ensuring that
the conclusions drawn are based on predefined criteria.
Nuisance Parameters
- Definition: Nuisance parameters are those elements
of a trial that are not of primary interest but must be accounted for in
the statistical analysis to ensure accurate results.
- Examples:
- Variance: Variability in the data that affects the
precision and power of the study’s outcomes.
- Dropout Rates: Expected rates at which participants
may leave the study, which can impact the required sample size and the
study’s conclusions.
- Background Noise: Other variabilities in
measurements that could affect data interpretation.
- Importance: Nuisance parameters need careful
estimation and adjustment in the analysis. Misestimation of these
parameters can lead to biased results, underpowered studies, or false
conclusions.
Parameter Sources:
- Utilizing previous research, pilot studies, and literature reviews
helps in estimating these parameters accurately, ensuring that the trial
is properly powered to detect meaningful effects.
Estimation of Nuisance Parameters
- Prior Studies: Historical data from previous
research can provide estimates for many nuisance parameters. This
includes data on variability, standard deviations, and common
complications or side effects.
- Expert Elicitation: Subject matter experts can
provide insights based on their experience and understanding of the
condition or treatment under study. This method is particularly useful
when historical data are sparse or when studying novel treatments.
- Pilot Studies: Small, preliminary studies can be
conducted before the main trial to estimate crucial nuisance parameters.
These studies are especially valuable for testing assumptions, refining
protocols, and ensuring that the full-scale study is well-founded.
- Literature Review: A thorough review of existing
literature can provide estimates for both fixed and nuisance parameters,
particularly through meta-analyses of similar studies.
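As a small, hypothetical illustration of turning pilot data into nuisance-parameter estimates, the sketch below computes a pooled SD and an observed dropout rate from invented pilot numbers; both would then feed into the main-trial calculation.

```python
import numpy as np

# Invented pilot data (hypothetical values)
pilot_control = np.array([12.1, 9.8, 14.3, 11.0, 10.5, 13.2, 8.9, 12.7])
pilot_treated = np.array([14.0, 11.5, 15.2, 13.8, 12.9, 16.1, 10.4])
n_enrolled, n_completed = 18, 15                 # pilot enrolment vs completion

# Pooled SD across arms: a nuisance parameter for the main-trial calculation
s1, s2 = pilot_control.std(ddof=1), pilot_treated.std(ddof=1)
n1, n2 = len(pilot_control), len(pilot_treated)
pooled_sd = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Observed dropout rate: another nuisance parameter
dropout_rate = 1 - n_completed / n_enrolled

print(f"pooled SD estimate : {pooled_sd:.2f}")
print(f"dropout estimate   : {dropout_rate:.0%}")
```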
3. Choose Effect Size
Choosing the effect size is a crucial aspect of designing a clinical
trial, as it directly influences the study’s power and the precision
needed to detect clinically meaningful changes brought about by a new
treatment. This section focuses on the two primary methods of determining
it, the Minimum Clinically Important Difference (MCID) and the Expected
Effect Size (Expected ES), and on the importance of parameterization in
expressing these effect sizes.
Understanding Effect Size Parameterization
Parameterization refers to the way in which the
effect size is expressed mathematically in the context of a study.
Common forms include:
- Difference: Used when the outcome is measured on a
continuous scale, representing the absolute change between the control
and treatment groups.
- Ratio: Often used for time-to-event data, as with the
hazard ratio in survival analysis, indicating the relative rate at which
the event occurs in one group compared to another.
- Odds Ratio: Commonly used in binary outcomes (e.g.,
improved vs. not improved), showing the odds of an event occurring in
the treatment group relative to the control group.
These parameterizations help to tailor the statistical analysis to
the specific characteristics of the data and the clinical questions at
hand.
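The parameterization also determines which formula applies. The sketch below contrasts, under assumed planning values, a per-arm sample size for a difference in means with one for a binary outcome expressed as an odds ratio, both via standard normal approximations.

```python
import math
from scipy.stats import norm

alpha, power = 0.025, 0.80                      # one-sided alpha, target power
z = norm.ppf(1 - alpha) + norm.ppf(power)

# (a) Difference in means: assumed difference of 5 points, SD of 15
delta, sigma = 5.0, 15.0
n_means = 2 * (z * sigma / delta) ** 2          # per arm

# (b) Odds ratio: assumed control response 30%, target odds ratio 2.0
p0 = 0.30
odds_treat = 2.0 * p0 / (1 - p0)                # treatment odds under OR = 2.0
p1 = odds_treat / (1 + odds_treat)              # implied treatment response rate
n_or = z**2 * (p0*(1 - p0) + p1*(1 - p1)) / (p1 - p0) ** 2   # per arm

print(f"per-arm n, mean difference : {math.ceil(n_means)}")
print(f"per-arm n, odds ratio      : {math.ceil(n_or)}")
```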
Two Methods to Choose Effect Size
- Minimum Clinically Important Difference (MCID):
- Definition: MCID is the smallest change in a
treatment outcome that would result in a noticeable difference in
patient experience or clinical outcome. It is a threshold that reflects
a meaningful improvement as perceived by patients, clinicians, or other
stakeholders.
- Determination: MCID is often determined through
clinical expertise, patient input, previous studies, or consensus
panels. It bridges clinical relevance with statistical analysis,
ensuring that the study is designed to detect a change that is
significant not just statistically but also clinically.
- Importance: Utilizing MCID allows researchers to
focus on outcomes that are tangible and valuable to patients,
potentially enhancing the clinical applicability and impact of the trial
results.
- Expected Effect Size (Expected ES):
- Definition: Expected ES is an estimate of the
actual effect that the new treatment will have on the outcome based on
prior evidence. This estimate is typically used when planning a study to
calculate the necessary sample size to achieve adequate power.
- Determination: Like nuisance parameters, the
Expected ES can be estimated using data from previous studies, pilot
studies, meta-analyses, or expert opinion. This approach considers
historical data and predictive analytics to forecast the likely impact
of the treatment.
- Use in Trial Design: Incorporating an Expected ES
helps in designing a trial that is realistically powered to detect a
statistically and clinically significant effect, thereby aligning the
study’s objectives with its statistical design.
Delta2 Guidance
For a more comprehensive understanding and application of effect size
in clinical trials, the Delta2 Guidance offers extensive resources. It
provides detailed methodologies for estimating MCID and Expected ES,
aligning them with study objectives, and integrating clinical
significance with statistical precision.
The Delta2 Guidance is a pivotal resource designed to improve the
planning and analysis of randomized controlled trials (RCTs) by
providing comprehensive guidelines on determining the target difference,
also known as the minimum clinically important difference (MCID), for
clinical trials. This guidance aims to enhance the robustness and
relevance of trial findings by ensuring that the trials are
appropriately powered to detect differences that are not only
statistically significant but also clinically meaningful.
Purpose:
- The Delta2 Guidance specifically addresses the challenges in specifying the target difference or effect size in clinical trials, which is crucial for calculating the required sample size and designing an effective study.
Development:
- Delta2 was developed through a collaboration among statisticians, clinicians, and trialists, incorporating extensive research, expert opinion, and practical trial experience. The guidance is part of a broader initiative to improve the quality of health research.
Reference:
- Cook, J. A., Julious, S. A., Sones, W., Hampson, L. V., Hewitt, C., Berlin, J. A., … & Walters, S. J. (2018). DELTA2 guidance on choosing the target difference and undertaking and reporting the sample size calculation for a randomised controlled trial. BMJ, 363, k3750. doi:10.1136/bmj.k3750
Key Components of Delta2 Guidance
- Defining the Target Difference:
- The guidance provides methods and considerations for determining
what constitutes a meaningful difference in outcomes between treatment
groups in clinical trials. It emphasizes the importance of basing this
target difference on clinical relevance rather than merely statistical
parameters.
- Involvement of Stakeholders:
- Delta2 advocates for involving a range of stakeholders, including
patients, clinicians, and policymakers, in determining the MCID. This
approach ensures that the trial outcomes are relevant and valuable to
all parties interested in the results.
- Statistical Considerations:
- It outlines statistical techniques for estimating the MCID,
including anchor-based and distribution-based methods. The guidance also
discusses the implications of different estimations of the target
difference for study power and sample size.
- Sensitivity Analyses:
- Delta2 recommends conducting sensitivity analyses to understand how
variations in the target difference might affect trial outcomes and
interpretations. This is crucial for ensuring that the conclusions drawn
from the trial are robust under different plausible scenarios.
- Documentation and Reporting:
- Proper documentation of the rationale, methods, and assumptions used
in determining the target difference is emphasized. Delta2 suggests that
all these details should be transparently reported in the trial protocol
and publications to enhance the credibility and replicability of the
trial.
- Examples and Case Studies:
- The guidance includes examples and case studies to illustrate how to
apply its principles in real-world scenarios. These examples help
clarify the processes and decisions involved in determining the target
difference for various types of clinical trials.
4. Compute Sample Size
- Define Desired Statistical Power:
- Desired Power: This is typically set as a fixed
parameter, commonly at 80% or 90%. It represents the probability of
correctly rejecting the null hypothesis (i.e., detecting an effect if
there is one). Higher power decreases the likelihood of a Type II error
(failing to detect a true effect).
- Fixed Sample Size: If the sample size is
constrained by budget, time, or participant availability, calculate what
power the trial will have with that fixed number of participants.
- Effect Size Determination:
- Based on Previous Studies: Use data from previous
research to estimate the effect size. This could be the difference in
means between treatment groups, odds ratios, or any other relevant
measure depending on the outcome type.
- Minimum Effect Size: For a given sample size and
power, determine the smallest effect size that the study could reliably
detect. This is particularly useful for checking whether the planned
design can still detect subtle but clinically meaningful changes.
- Adjustments for Dropout and Treatment Switching:
- Dropout Rate Estimation: Anticipate the proportion
of participants who might drop out before completing the study based on
historical data from similar studies. Dropout rates can significantly
affect the study’s power because they effectively reduce the sample
size.
- Treatment Switching: Consider the possibility that
participants might switch treatments during the trial. This can dilute
the treatment effect, requiring a larger initial sample size to maintain
power.
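Putting the pieces together, the sketch below solves for the per-arm sample size given an assumed MCID of 5 points, SD of 15, one-sided alpha of 0.025, 90% power, and 15% anticipated dropout. These inputs are illustrative; dedicated software or exact methods should be used for the protocol calculation.

```python
import math
from scipy.stats import norm

alpha, power = 0.025, 0.90          # one-sided alpha, target power
delta, sigma = 5.0, 15.0            # assumed MCID and SD
dropout = 0.15                      # assumed dropout rate

z = norm.ppf(1 - alpha) + norm.ppf(power)
n_per_arm = 2 * (z * sigma / delta) ** 2        # completers required per arm
n_enrolled = n_per_arm / (1 - dropout)          # inflate enrolment for expected dropout

print(f"completers needed per arm   : {math.ceil(n_per_arm)}")
print(f"enrol per arm (15% dropout) : {math.ceil(n_enrolled)}")
```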
Practical Considerations
- Financial Constraints: Often, the budget may limit
the maximum feasible sample size, so calculations must be adjusted
accordingly.
- Ethical Considerations: Especially in clinical
trials, it’s unethical to expose more participants to experimental
treatments than necessary. Calculations should ensure that the sample
size is large enough to be informative but not excessively large.
- Sensitivity Analysis: Assess how sensitive the
required sample size is to changes in assumptions such as effect size,
variance, dropout rates, etc. This helps understand the robustness of
the study design.
- Assurance Calculations: Instead of fixing
parameters at a single value, use a range of plausible values to
calculate the probability that the study will have sufficient power.
This provides a more comprehensive picture of the study’s potential
success under various scenarios.
5. Explore Uncertainty
Sensitivity Analysis
- Sensitivity analysis is used to assess how changes
in assumptions impact the calculated sample size and the statistical
power of the trial. This method helps to understand the stability of
trial results under different scenarios.
- Conducting sensitivity analysis involves recalculating the required
sample size or power using a range of plausible values for each key
parameter (e.g., effect size, variance, dropout rates).
- This analysis helps identify which parameters are most sensitive and
could significantly affect trial outcomes if their estimated values
deviate from reality.
- Including sensitivity analyses and assurance calculations in the
protocol is crucial. These analyses demonstrate that the trial is
designed to be robust across a range of possible conditions, not just
under a single set of assumptions.
- Limitations: The analysis is confined to the range of
values chosen for each parameter. If these ranges are not representative
of possible real-world variations, the analysis may not fully capture
the potential variability in trial outcomes.
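A minimal sensitivity analysis can be as simple as recomputing the required sample size over a grid of plausible effect sizes and SDs, as sketched below with assumed planning values; reading across the grid shows which assumption the design is most sensitive to.

```python
import math
from scipy.stats import norm

def n_per_arm(delta, sigma, alpha=0.025, power=0.90):
    """Normal-approximation per-arm sample size for a two-arm comparison of means."""
    z = norm.ppf(1 - alpha) + norm.ppf(power)
    return math.ceil(2 * (z * sigma / delta) ** 2)

sds = (12.0, 15.0, 18.0)            # plausible range for the nuisance SD
print("delta \\ SD " + "".join(f"{s:>8.0f}" for s in sds))
for delta in (4.0, 5.0, 6.0):       # plausible range for the effect size
    print(f"{delta:>10} " + "".join(f"{n_per_arm(delta, s):>8}" for s in sds))
```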
Assurance
- Assurance provides a probabilistic assessment of a
trial’s success, incorporating uncertainty about parameter estimates
into the sample size calculation.
- Power is calculated not just for a single value of each parameter
but averaged over a range of values, often using a statistical prior
that reflects the uncertainty about these parameters.
- This approach gives a more comprehensive measure of the likelihood
that the trial will achieve its objectives, taking into account the
variability in potential parameter values.
- By averaging power across a range of plausible parameter values,
assurance accounts for the uncertainty inherent in the estimates used
for sample size calculations. This provides a more realistic assessment
of the trial’s potential for success.
- Assurance can be particularly valuable when there is considerable
uncertainty about key parameters such as effect size, helping to ensure
that the trial is robust to a variety of possible realities.
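The sketch below illustrates the assurance idea by Monte Carlo: instead of fixing the true difference at a single value, it is drawn from an assumed Normal(5, 2) prior and power is averaged over those draws. The prior and design values are assumptions for illustration; note that the assurance comes out lower than the power computed at the prior mean.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n_per_arm, sigma, alpha = 190, 15.0, 0.025
z_crit = norm.ppf(1 - alpha)

# Assumed prior belief about the true difference: Normal(mean 5, SD 2)
delta_draws = rng.normal(loc=5.0, scale=2.0, size=100_000)

se = sigma * np.sqrt(2 / n_per_arm)
power_given_delta = norm.cdf(delta_draws / se - z_crit)   # power at each prior draw

print(f"power at prior mean (delta = 5): {norm.cdf(5.0 / se - z_crit):.2f}")
print(f"assurance (prior-averaged)     : {power_given_delta.mean():.2f}")
```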
Documentation and Transparency:
- Detailed documentation of the SSD process, including all estimates and assumptions, should be included in the trial protocol. This transparency helps in peer review and regulatory evaluation, ensuring that the study design is both robust and capable of yielding meaningful results.
“The method by which the sample size is calculated should be given in
the protocol, together with the estimates of any quantities used in the
calculations (such as variances, mean values, response rates, event
rates, difference to be detected)… It is important to investigate the
sensitivity of the sample size estimate to a variety of deviations from
these assumptions…” (ICH E9: Statistical Principles for Clinical Trials)
Common Sample Size Pitfalls
Navigating the process of Sample Size Determination (SSD) is fraught
with potential pitfalls, each of which can significantly impact the
validity and success of a clinical trial. Understanding these pitfalls
and how they relate to the broader context of trial design is
crucial.
Common Pitfalls in Sample Size Determination
- Overemphasis on Formulas Over Assumptions:
- Formula Focus: Researchers often worry excessively
about which statistical formula to use for SSD. While important, this
concern is usually less critical than the assumptions underlying these
formulas.
- Assumption Criticality: The assumptions made about
the study parameters (e.g., effect size, variance) before calculating
the sample size are pivotal. Incorrect or unrealistic assumptions can
lead to an underpowered or overly large study, regardless of the formula
used.
- Prior Information: In Bayesian statistics, prior
information is updated with actual data as it comes in. Before the
trial, these priors are used to estimate the likelihood of various
outcomes to determine the initial sample size.
- Pre-Trial Guesswork:
- Inherent Uncertainty: SSD happens before the trial
starts, necessitating a degree of guesswork regarding study outcomes and
variability.
- Principled Approach: Employing a principled
approach to SSD, even with some guesswork involved, is preferable to not
performing a thorough SSD. A systematic approach helps in laying down a
foundation that can be adjusted with interim analyses if needed.
- Integration with Trial Design:
- Design Emphasis: Many issues attributed to sample
size are often deeply rooted in trial design choices. Ensuring robust
trial design is crucial—hence the mantra, “Design, Design, Design.”
- Holistic Approach: Effective trial design
encompasses everything from the selection of endpoints to the choice of
statistical models and the consideration of potential data issues. This
holistic approach can mitigate many problems that might later manifest
during SSD.
- Study (Trial and Data):
- Endpoint Selection: Choosing inappropriate
endpoints can render even a well-sized study ineffective if the
endpoints are not clinically relevant.
- Data Collection Methods: Poor data collection
methodologies can introduce biases or variability that no amount of
correct sizing can correct.
- Model (Statistical Choices):
- Model Fit: Selecting an incorrect statistical model
for the data and the question at hand can lead to misinterpretations of
the data and, subsequently, to erroneous conclusions about the efficacy
or safety of an intervention.
- Overfitting/Underfitting: Using overly complex
models that do not align with the simplicity or complexity of the data
can lead to overfitting or underfitting, affecting the study’s
outcomes.
- Assumptions (SSD Inputs):
- Effect Size Estimates: Overly optimistic or
pessimistic effect size estimates can lead to unrealistic expectations
and study designs that are either too small to detect true effects or
unnecessarily large, wasting resources.
- Variability Underestimation: Underestimating the
variability within the population can result in underpowered studies
that are unable to achieve conclusive results.
- Dynamic Nature of Sample Size: The initially
determined sample size may not hold as the trial progresses and reality
departs from the planning assumptions. This is reflected in adaptive
trial designs, where sample sizes are adjusted based on interim
results.
Study Pitfalls
1. Simplifying Data Unnecessarily
Problem:
- Researchers often simplify complex data types, such as converting continuous outcomes into binary outcomes (known as “dichotomania”), treating time-to-event data as single endpoint events, or misclassifying ordinal data. This approach can significantly distort the analysis, leading to loss of information and potentially requiring larger sample sizes to detect the same effects.
Solutions:
- Preserve Data Complexity: Analyze data in its most detailed form whenever possible. Only simplify data when there is a compelling justification and it enhances the interpretability or relevance of the results without compromising statistical power.
- Demonstrate Cost of Simplification: Use sample size determination to explicitly show how simplifying data (e.g., dichotomizing continuous variables) can inflate the required sample size, often by more than 50%. This can serve as a persuasive argument against unnecessary data reduction (see the sketch after this list).
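As a rough, assumption-laden illustration of that cost, the sketch below compares the per-arm sample size for analysing a continuous outcome (standardized difference of 0.4) with the size needed after dichotomizing it at the control-group mean.

```python
import math
from scipy.stats import norm

alpha, power = 0.025, 0.90
z = norm.ppf(1 - alpha) + norm.ppf(power)

# Continuous analysis: assumed standardized difference of 0.4
d = 0.4
n_cont = 2 * (z / d) ** 2

# Dichotomized at the control-group mean: responder rates 50% vs ~65%
p0 = 0.5
p1 = norm.cdf(d)                     # treatment responder rate after the 0.4-SD shift
n_bin = z**2 * (p0*(1 - p0) + p1*(1 - p1)) / (p1 - p0) ** 2

print(f"per-arm n, continuous outcome  : {math.ceil(n_cont)}")
print(f"per-arm n, dichotomized outcome: {math.ceil(n_bin)}  ({n_bin / n_cont - 1:.0%} larger)")
```

Under these assumed values the dichotomized analysis needs on the order of 60% more participants to detect the same underlying shift.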
2. Ignoring Real Trial Constraints
Problem:
- Often, sample size calculations ignore practical constraints such as budget, resources, and time available for the study. This results in theoretical sample sizes that are unfeasible in practice, leading to what can be referred to as “Sample Size Theatre” — where calculations appear robust but are not practically applicable.
Solutions:
- Account for Constraints in SSD: When determining sample size, incorporate real-world limitations from the outset. Adjust effect size and power calculations to reflect the maximum feasible sample size.
- Cost-Based SSD Approaches: Consider integrating cost-effectiveness analyses into sample size determination. This approach helps ensure that the sample size chosen not only meets statistical requirements but is also economically viable, enhancing the overall sustainability of the trial.
3. Know the “True” Sample Size
Problem:
- A common pitfall in clinical trials is misunderstanding what constitutes the “true” sample size needed for adequate power. The effective sample size must reflect the level of treatment randomization and only count the subjects (or events) that actually contribute information about the outcome of interest. Misunderstandings can lead to underpowered studies due to inappropriate sample size calculations.
Solutions:
- Education on Sample Size Foundations: Clearly communicate and teach that the sample size should align with the level of randomization (e.g., individual, cluster, sequence in crossover trials). This understanding is crucial for accurately calculating the power of the study.
- Clarify Which Outcomes and Subjects Contribute to Power: Educate researchers on which specific outcomes and which subset of subjects should be considered when determining the sample size. For example, in survival analysis, power is driven by the number of subjects who experience the event of interest (the number of events), not by the number enrolled (see the sketch after this list).
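The sketch below makes the events-drive-power point concrete using the approximate Schoenfeld formula for a time-to-event trial with 1:1 randomization; the target hazard ratio (0.75) and the assumed probability of observing an event (60%) are illustrative values.

```python
import math
from scipy.stats import norm

alpha, power = 0.025, 0.90
hr = 0.75                                   # assumed target hazard ratio
p_event = 0.60                              # assumed probability a participant has an event

z = norm.ppf(1 - alpha) + norm.ppf(power)
events_needed = 4 * z**2 / math.log(hr) ** 2    # total events, 1:1 allocation (Schoenfeld approximation)
n_total = events_needed / p_event               # enrolment needed to observe that many events

print(f"events required             : {math.ceil(events_needed)}")
print(f"total enrolment (60% events): {math.ceil(n_total)}")
```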
Model Pitfalls
The pitfalls associated with model selection in the context of Sample
Size Determination (SSD) are substantial and can significantly impact
the validity and reliability of a clinical trial. Each of these pitfalls
arises from common errors in choosing or applying statistical models and
estimators. Understanding these pitfalls and their solutions is
essential for ensuring accurate and meaningful research outcomes.
1. Selecting the Inappropriate Estimator and
Model
Problem:
- Researchers may default to using a “standard” model or estimator without considering whether it is the most appropriate for their specific data and research question. This includes both the choice of the model for analysis and the underlying model used in sample size calculations. Using the wrong model can lead to inaccuracies in estimating the necessary sample size, potentially resulting in underpowered or overpowered studies.
Solutions:
- Model Consideration at Design Stage: From the outset, carefully evaluate the most suitable model and estimator for the study. Consider the nature of the data and the specific hypotheses being tested.
- Appropriate SSD Formulae: Ensure that the formulae used for calculating sample size are based on the correct estimator, especially in cases of non-normal data distributions or for specific study designs like non-inferiority or equivalence trials.
- Advanced Modeling Techniques: Utilize model selection methods such as Multiple Comparisons Procedures and Modeling (MCP-Mod) or MaxCombo, which can help in choosing the most appropriate statistical model based on the data characteristics.
2. Not Converting to Correct Parameter Scale
Problem:
- In sample size calculations, it’s crucial that all parameters are on a consistent scale, particularly in studies where time-dependent outcomes are involved. Failing to convert parameters like coefficient of variation (CV) to standard error (SE), or not aligning time units across data inputs, can lead to incorrect calculations.
Solutions:
- Standardize Units: Convert all parameters that depend on scale to a consistent unit (e.g., converting all time-related measures to months if some are initially in years by multiplying by 12).
- Check for Conversions: Always verify whether there are known conversions that should be applied to the parameters used in the model, to ensure that all inputs are compatible and correctly scaled (a small sketch follows this list).
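A hypothetical sketch of such conversions is shown below: aligning a follow-up period quoted in years with month-based inputs, and converting a coefficient of variation to a log-scale SD via the standard lognormal relationship (the specific numbers are invented).

```python
import math

followup_years = 1.5
followup_months = followup_years * 12             # years -> months: multiply by 12

cv = 0.35                                         # assumed coefficient of variation
sd_log = math.sqrt(math.log(1 + cv**2))           # SD on the natural-log scale (lognormal data)

print(f"follow-up               : {followup_months:.0f} months")
print(f"log-scale SD (CV = 0.35): {sd_log:.3f}")
```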
3. Misunderstanding Potential Value of Additional
Data
Problem:
- The misuse or underuse of additional information, such as prognostic covariates, can result in less efficient models, thereby reducing the study’s power. For example, using change scores instead of analysis of covariance (ANCOVA) can be less effective if pre-treatment scores are available and informative.
Solutions:
- Sensitivity Analysis for Model Choice: Conduct sensitivity analyses to evaluate how different model choices (e.g., ANOVA vs ANCOVA) affect the study’s outcomes, especially in terms of power and the precision of estimates.
- Leverage Covariate Information: When applicable, include relevant covariates in the model to increase efficiency and power. Covariates that explain variability in the outcome can significantly enhance the accuracy of the effect estimates (see the sketch after this list).
- Educational Outreach: Provide training and resources on the importance and methods of integrating additional data into statistical models. This helps ensure that researchers understand the best practices for utilizing all available data.
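The sketch below quantifies this under a simple assumed model with a baseline–outcome correlation of 0.6: ANCOVA shrinks the relevant outcome variance by a factor of \((1 - \rho^2)\), whereas change scores use \(2(1 - \rho)\), so ANCOVA needs the fewest participants and change scores only beat a post-score-only analysis when \(\rho\) exceeds 0.5.

```python
import math
from scipy.stats import norm

alpha, power = 0.025, 0.90
z = norm.ppf(1 - alpha) + norm.ppf(power)
delta, sigma, rho = 5.0, 15.0, 0.6      # assumed difference, SD, baseline-outcome correlation

def n_per_arm(outcome_var):
    """Per-arm sample size for a given effective outcome variance."""
    return math.ceil(2 * z**2 * outcome_var / delta**2)

print(f"post-score only : {n_per_arm(sigma**2)}")
print(f"change score    : {n_per_arm(2 * (1 - rho) * sigma**2)}")     # Var = 2(1 - rho) * sigma^2
print(f"ANCOVA          : {n_per_arm((1 - rho**2) * sigma**2)}")      # Var = (1 - rho^2) * sigma^2
```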
Assumption Pitfalls
1. Misunderstanding the Effect Size Choice
Problem:
- Neglecting the MCID: Researchers may overlook the Minimum Clinically Important Difference (MCID), instead opting for “standardized” effect sizes or relying on rough estimates with minimal justification. This approach can lead to the selection of effect sizes that are not clinically meaningful or realistic, resulting in an underpowered or irrelevant study.
Solutions:
- Define MCID Collaboratively: Work with the research team to define the MCID, utilizing relevant literature, expert opinions, and elicitation techniques. This ensures that the chosen effect size is grounded in clinical significance and reflects the true impact on patient outcomes.
- Use Sensitivity Analysis and Assurance: Conduct sensitivity analysis and assurance to explore how changes in the effect size impact the study’s power and overall success probability. This can help in understanding the robustness of the study design.
- Avoid Standardized Effect Sizes: Resist the temptation to use generic or standardized effect sizes. Instead, tailor the effect size to the specific clinical context of the study, ensuring it is meaningful and realistic.
- Consider Adaptive Designs: Implement adaptive designs such as the promising zone or unblinded Sample Size Re-estimation (SSR) designs to adjust the sample size based on interim data, keeping the study responsive to actual findings.
2. Misestimating Nuisance Parameters
Problem:
- Inaccurate Estimates of Nuisance Parameters:
Nuisance parameters, such as the standard deviation, are crucial for
accurate SSD but are often estimated with minimal information or
adjusted post-hoc based on constraints (referred to as “Sample Size
Theatre”). This can lead to inappropriate sample size calculations and
affect the study’s validity.
Solutions:
- Conduct Proper Pilot Studies: Use properly sized
pilot studies to gather empirical data on nuisance parameters. This
provides a solid foundation for accurate SSD.
- Blinded Sample Size Re-estimation: In cases where a
pilot study isn’t feasible, consider using blinded sample size
re-estimation during the trial to refine nuisance parameter estimates
without compromising study integrity.
- Expert Elicitation and Literature Review: Where
direct data is unavailable, consult with experts and review existing
literature to obtain reliable estimates of nuisance parameters.
- Use Sensitivity Analysis and Assurance: Evaluate
the impact of varying nuisance parameters on study power using
sensitivity analysis or assurance techniques. This can help identify
critical parameters that require more accurate estimation.
3. Not Accounting for Dropout (Properly)
Problem:
- Ignoring or Miscalculating Dropout Effects: The
impact of dropout on sample size is often either ignored or improperly
calculated, which can result in an underpowered study. Simplified
calculations or incorrect formulas are sometimes used when more accurate
methods are available.
Solutions:
- Accurate Dropout Rate Estimates: Obtain accurate
dropout rate estimates through historical data, pilot studies, or expert
opinions. Make sure these estimates are specific to the study’s context
and population.
- Correct Calculation Methods: Use the correct
formula for adjusting sample size due to dropout: \(N_d = N / (1 - p_d)\), where \(p_d\) is the dropout rate. This method
accounts for the true impact of dropout on the study’s power.
- Model-Specific Adjustments: For time-to-event (TTE)
or count models, incorporate dropout into the sample size calculations
using parameters specific to these models. This ensures that the study
remains adequately powered despite participant attrition.
- Consider Drop-in/Crossover Scenarios: Plan for
scenarios where participants might switch treatments (drop-in) or move
between study arms (crossover). These events can affect the analysis and
should be factored into the SSD.
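The sketch below applies the \(N_d = N / (1 - p_d)\) adjustment above to an assumed requirement of 380 completers with 20% expected dropout, and contrasts it with the multiplicative shortcut \(N (1 + p_d)\) that is sometimes seen and under-enrols.

```python
import math

N, p_d = 380, 0.20                      # assumed completers required and expected dropout rate

correct = N / (1 - p_d)                 # N_d = N / (1 - p_d)
shortcut = N * (1 + p_d)                # sometimes seen; enrols too few

print(f"N / (1 - p_d): {math.ceil(correct)}")
print(f"N * (1 + p_d): {math.ceil(shortcut)}")
```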
Other Common Pitfalls
1. Sample Size “Rules of Thumb”
Problem:
- Reliance on Rules of Thumb: Researchers sometimes
use simple rules of thumb for determining sample size (e.g., “30
participants per group”) without considering whether these rules are
statistically justified. These rules often rely on unrealistic
assumptions and can lead to underpowered or overly large studies.
Solutions:
- Use Proper SSD Methods: Always use formal sample
size determination methods that are tailored to the specifics of the
study design, expected effect size, and variance. There are many SSD
methods available that account for the complexity of different study
designs.
- Justify the Sample Size Statistically: Ensure that
the sample size is statistically justified based on the study’s goals
and not just a rule of thumb. This will improve the likelihood of
detecting a true effect if it exists.
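As a quick check on the “30 per group” rule, the sketch below computes approximate power at 30 per arm across small, medium, and large standardized effect sizes (one-sided alpha of 0.025); only the largest effect is adequately powered under these assumptions.

```python
from scipy.stats import norm

n, alpha = 30, 0.025                        # "30 per group" rule, one-sided alpha
z_crit = norm.ppf(1 - alpha)
se = (2 / n) ** 0.5                         # SE of the standardized mean difference

for d in (0.2, 0.5, 0.8):                   # small, medium, large standardized effects
    power = norm.cdf(d / se - z_crit)
    print(f"effect size d = {d}: power = {power:.2f}")
```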
2. Post-hoc Power Analysis
Problem:
- Misleading Post-hoc Power: Performing a power
analysis after the study (post-hoc) adds little value and can be
misleading. If a study finds a non-significant result, the post-hoc
power is, by definition, low. It doesn’t provide additional insight
beyond what the p-value already indicates.
Solutions:
- Focus on Pre-study Power Calculation: Emphasize the
importance of calculating power before the study begins. This helps
ensure that the study is designed with adequate power to detect a
clinically meaningful effect.
- Avoid Post-hoc Power: Instead of conducting
post-hoc power analysis, focus on interpreting the results within the
context of the study design, effect sizes, and p-values.
3. Multiple Studies/Endpoints
Problem:
- Challenges with Multiple Endpoints: When a trial
involves multiple studies or endpoints, it can be challenging to
determine the appropriate power and testing strategy. There may be a
need to decide whether to use a conjunctive (all endpoints must succeed)
or disjunctive (only one endpoint must succeed) power approach.
Solutions:
- Define a Clear Testing Strategy: Before the trial
begins, clearly define the testing strategy. Decide whether the trial
will use conjunctive or disjunctive power, and determine how you will
handle multiple comparisons to control for Type I error.
- Evaluate Power for Secondary and Safety Endpoints:
Consider the power for secondary and safety endpoints as well,
especially if these are critical to the study’s overall conclusions.
Ensure that the trial is adequately powered for these endpoints if they
are of importance.
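The simulation sketch below illustrates the difference between conjunctive and disjunctive power for two co-primary endpoints whose test statistics are correlated; the assumed marginal power (80% each) and correlation (0.5) are illustrative, and no multiplicity adjustment is applied.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
alpha = 0.025
z_crit = norm.ppf(1 - alpha)
shift = z_crit + norm.ppf(0.80)          # mean of each test statistic -> 80% marginal power

rho = 0.5                                # assumed correlation between the endpoints' statistics
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[shift, shift], cov=cov, size=200_000)

reject = z > z_crit
print(f"marginal power    : {reject[:, 0].mean():.2f}")
print(f"conjunctive power : {reject.all(axis=1).mean():.2f}")   # both endpoints succeed
print(f"disjunctive power : {reject.any(axis=1).mean():.2f}")   # at least one succeeds
```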
4. Failing to Consider Complex Designs
Problem:
- Ignoring Benefits of Complex Designs: Traditional
fixed-term designs may not be the most efficient approach. Failing to
consider adaptive designs (e.g., group sequential designs) can result in
larger than necessary sample sizes and longer trials.
Solutions:
- Explore Adaptive Designs: Investigate the potential
benefits of adaptive designs, such as group sequential designs, which
can significantly reduce the expected sample size. For instance,
Jennison’s work shows that three looks in a sequential design can reduce
the expected sample size by approximately 30%, with only a 5% maximum
increase in sample size.
- Value and Evaluate Designs: When considering
complex designs, compare the value they bring in terms of sample size
reduction and efficiency. Also, evaluate the potential impacts on Type I
error rates and other statistical properties, often requiring simulation
studies to fully understand the implications.
- Consult FDA Guidance: Use the FDA’s Adaptive Design
Guidance as a resource when planning complex trial designs, ensuring
that your design choices meet regulatory standards and optimize study
efficiency.
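As a rough illustration of the efficiency argument, the simulation sketch below evaluates a three-look design with commonly cited approximate O'Brien-Fleming efficacy boundaries. The drift, maximum sample size, and boundaries are assumptions for illustration only; a real design should be constructed and validated with dedicated group sequential software.

```python
import numpy as np

rng = np.random.default_rng(3)
n_max_per_arm = 196                      # assumed max per-arm size, slightly above a ~190-per-arm fixed design
looks = np.array([1/3, 2/3, 1.0])        # equally spaced information fractions
bounds = np.array([3.47, 2.45, 2.00])    # approximate O'Brien-Fleming efficacy boundaries
theta = 3.24                             # assumed drift: E[Z] at the final look under H1
n_sims = 100_000

# Brownian-motion construction of the correlated interim z-statistics
incr = rng.normal(size=(n_sims, 3)) * np.sqrt(np.diff(np.r_[0.0, looks]))
b = np.cumsum(incr, axis=1) + theta * looks     # score process with drift
z = b / np.sqrt(looks)                          # z-statistic at each look

crossed = z > bounds
stop_look = np.where(crossed.any(axis=1), crossed.argmax(axis=1), 2)
n_used = n_max_per_arm * looks[stop_look]

print(f"power (any look)   : {crossed.any(axis=1).mean():.2f}")
print(f"expected n per arm : {n_used.mean():.0f}  (maximum {n_max_per_arm})")
```

Under these assumptions the expected per-arm sample size falls well below the maximum, which is the efficiency gain the guidance refers to.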
References
Sample Size Books
- Julious, S. A. (2023). Sample sizes for clinical trials
(2nd ed.). CRC Press.
- Chow, S. C., Shao, J., Wang, H., & Lokhnygina, Y. (2017).
Sample size calculations in clinical research. Chapman and
Hall/CRC.
- Machin, D., Campbell, M. J., Tan, S. B., & Tan, S. H. (2018).
Sample sizes for clinical, laboratory and epidemiology studies.
John Wiley & Sons.
- Ryan, T. P. (2013). Sample size determination and power.
John Wiley & Sons.
- Cohen, J. (2013). Statistical power analysis for the behavioral
sciences. Academic Press.
- Cohen, J. (1992). A power primer. Psychological Bulletin,
112(1), 155-159.
Regulatory Bodies + Trial Guidances
Trial Design Overviews
- Senn, S. S. (2021). Statistical issues in drug development
(3rd ed.). John Wiley & Sons.
- Meinert, C. L. (2012). Clinical Trials: Design, conduct and
analysis (Vol. 39). OUP USA.
- Friedman, L. M., Furberg, C. D., DeMets, D. L., Reboussin, D. M.,
& Granger, C. B. (2015). Fundamentals of clinical trials.
Springer.
- International Council for Harmonisation of Technical Requirements
for Pharmaceuticals for Human Use (ICH). (n.d.). STATISTICAL
PRINCIPLES FOR CLINICAL TRIALS E9. Retrieved from https://database.ich.org/sites/default/files/E9_Guideline.pdf
Prevalence of Underpowered Research
- Cohen, J. (1962). The statistical power of abnormal-social
psychological research: A review. The Journal of Abnormal and Social
Psychology, 65, 145-153.
- Button, K. S., Ioannidis, J. P. A., Mokrysz, C., Nosek, B. A., Flint,
J., Robinson, E. S. J., & Munafò, M. R. (2013). Power failure: Why
small sample size undermines the reliability of neuroscience. Nature
Reviews Neuroscience, 14(5), 365-376.
- Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of
statistical power have an effect on the power of studies?
Psychological Bulletin, 105(2), 309–316.
- Maxwell, S. E. (2004). The Persistence of Underpowered Studies in
Psychological Research: Causes, Consequences, and Remedies.
Psychological Methods, 9(2), 147–163.