Power and sample size calculations
As statisticians, we often have to deal with power or sample size calculations. It can be very beneficial to visualise the impact that different factors have on the sample size or power. Produce data visualisation(s) or share tools that help your audience understand how power and sample size calculations work and what impacts the actual results of these calculations.
Power and sample size calculations are fundamental components in the design of any clinical trial or scientific study. These calculations ensure that a study has a sufficient number of participants (sample size) to detect a statistically significant effect if one exists, while controlling for the risk of Type I (false positive) and Type II (false negative) errors. Visualizing these calculations can enhance understanding and decision-making in study planning by demonstrating the relationships between power, sample size, effect size, and significance level.
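The interplay between these four quantities can be illustrated with base R's `power.t.test()`, which solves for whichever quantity is left unspecified. The sketch below is a minimal illustration, assuming a two-sample t-test with a standardised effect of 0.5; the effect sizes, sample sizes and the helper `power_at()` are arbitrary choices for the example and are not taken from any study.

```r
# Illustrative values only: delta, sd and the sample sizes are assumptions

# Solve for the per-group sample size given effect size, alpha and power
res_n <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)
n_per_group <- ceiling(res_n$n)

# Hypothetical helper: power for a fixed per-group sample size
power_at <- function(n, delta = 0.5, alpha = 0.05) {
  power.t.test(n = n, delta = delta, sd = 1, sig.level = alpha)$power
}

# Vary one factor at a time and keep the others fixed
power_by_n     <- sapply(c(20, 40, 80), power_at)
power_by_delta <- sapply(c(0.3, 0.5, 0.8), function(d) power_at(40, delta = d))
power_by_alpha <- sapply(c(0.01, 0.05, 0.10), function(a) power_at(40, alpha = a))

print(n_per_group)
print(round(rbind(power_by_n, power_by_delta, power_by_alpha), 3))
```

Each row of the printed matrix increases from left to right: power rises with a larger sample, a larger effect, or a less stringent significance level.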
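library(pwr)
library(dplyr)    # provides %>%, tibble() and mutate() used below
library(ggplot2)
# Sample size per group to detect 60% vs 50% with 80% power (two-sided alpha = 0.05)
power1 <- pwr.2p.test(h = ES.h(p1 = 0.60, p2 = 0.50), sig.level = 0.05, power = 0.80)
plot(power1)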
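To make the effect size in the call above less of a black box: `ES.h()` returns Cohen's h, the difference between the arcsine-transformed proportions. The sketch below recomputes h in base R and plugs it into the usual normal-approximation sample size formula. The helper name `cohen_h` is hypothetical, and the formula ignores the negligible probability of rejecting in the wrong direction, so it should land close to, but not necessarily exactly on, the `pwr.2p.test()` result.

```r
# Cohen's h for two proportions (arcsine transformation); helper name is illustrative
cohen_h <- function(p1, p2) 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
h <- cohen_h(0.60, 0.50)

# Normal-approximation sample size per group for a two-sided test
alpha <- 0.05
power <- 0.80
n_approx <- 2 * ((qnorm(1 - alpha / 2) + qnorm(power)) / h)^2

print(round(h, 4))
print(ceiling(n_approx))
```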
# Power (in %) of a two-sided, two-sample t-test with n per group and effect size d
fn_pwr <- function(n, d) {
  power <- pwr.t.test(n = n,
                      d = d,
                      sig.level = 0.05,
                      alternative = "two.sided",
                      type = "two.sample")$power
  return(power * 100)
}
# Candidate sample sizes per group
ssize <- 2:12
# Standardised effect (delta) for each effect size and SD inflation factor; baseline SD = 4
msd <- tibble(effect = rep(c(6, 8, 10), each = 3),
              sd_inc = rep(c(1, 1.25, 1.5), 3)) %>%
  mutate(sd = 4 * sd_inc) %>%
  mutate(delta = effect / sd)
# One row per scenario and sample size
pwr_tab <- tibble(effect = factor(rep(msd$effect, each = length(ssize)),
                                  levels = c(6, 8, 10),
                                  labels = c("6%", "8%", "10%")),
                  sd_inc = as.factor(rep(msd$sd_inc, each = length(ssize))),
                  n = rep(ssize, length(msd$delta)),
                  d = rep(msd$delta, each = length(ssize))) %>%
  mutate(power = fn_pwr(n, d))
pwr_fig <- ggplot(data = pwr_tab,
                  aes(x = n, y = power, colour = sd_inc)) +
  geom_hline(yintercept = 80) +   # 80% power target
  geom_line(linewidth = 1) +
  scale_x_continuous(breaks = ssize, name = "Sample size per group") +
  scale_y_continuous(breaks = seq(0, 100, by = 20)) +
  labs(
    y = "Power (%)",
    title = "Power to detect deltas of 6, 8 and 10% in ejection fraction",
    subtitle = "Two-sided, two-sample t-test, alpha=0.05, baseline sd=4",
    colour = "SD increase"
  ) +
  geom_point(size = 2, shape = 21, fill = "white") +
  theme_bw() +
  theme(
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 12),
    panel.grid.minor = element_blank(),
    legend.position = "bottom") +
  scale_colour_viridis_d() +
  facet_wrap(effect ~ .)
pwr_fig
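The figure can also be read as a table: for each combination of effect and standard deviation, what is the smallest per-group sample size that reaches 80% power? The sketch below answers this with base R's `power.t.test()` rather than `pwr`, assuming the same two-sided, two-sample t-test at alpha = 0.05 and the same baseline SD of 4. The object names are illustrative.

```r
# Same scenarios as the figure: effects of 6, 8, 10 and SD inflated by 1, 1.25, 1.5
scenarios <- expand.grid(effect = c(6, 8, 10), sd_inc = c(1, 1.25, 1.5))
scenarios$sd <- 4 * scenarios$sd_inc

# Smallest per-group n giving at least 80% power in each scenario
scenarios$n_per_group <- mapply(function(effect, sd) {
  ceiling(power.t.test(delta = effect, sd = sd,
                       sig.level = 0.05, power = 0.80)$n)
}, scenarios$effect, scenarios$sd)

print(scenarios[order(scenarios$effect, scenarios$sd), ])
```

Within each effect size the required sample size grows as the SD is inflated, which is the same message the coloured curves convey.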
The pwrss R package enables statistical power and minimum required sample size calculations for a range of tests, numbered (1) to (14) in the package description. The alternative hypothesis can be formulated as “not equal,” “less,” “greater,” “non-inferior,” “superior,” or “equivalent” in (1), (2), (3), and (4); as “not equal,” “less,” or “greater” in (5), (6), (7), and (8); and always as “greater” in (9), (10), (11), (12), (13), and (14).
library(pwrss)
library(binom)
library(dplyr)
library(knitr)       # kable()
library(kableExtra)  # add_footnote(), kable_styling()
n_chronos <- seq(250, 350, by = 5)
# Create a data frame with n_chronos values and their corresponding confidence intervals
data_listing <- data.frame(n_chronos)
# Calculate and store the exact (Clopper-Pearson) confidence limits using binom.confint
data_listing <- data_listing %>%
  rowwise() %>%
  mutate(
    lower_ci = binom.confint(x = ceiling(0.25 * n_chronos), n = n_chronos,
                             conf.level = 0.95, methods = "exact")$lower,
    upper_ci = binom.confint(x = ceiling(0.25 * n_chronos), n = n_chronos,
                             conf.level = 0.95, methods = "exact")$upper,
    ci_width = upper_ci - lower_ci
  ) %>%
  ungroup()
data_listing %>%
  kable(col.names = c("Sample Size CHRONOS US", "CI Lower Limit", "CI Upper Limit", "CI Width"),
        caption = "95% Clopper-Pearson confidence interval (two-sided) for 25% ORR") %>%
  add_footnote(c("Confidence Interval based on Clopper-Pearson method")) %>%
  kable_styling(latex_options = "striped")
Sample Size CHRONOS US | CI Lower Limit | CI Upper Limit | CI Width |
---|---|---|---|
250 | 0.1994115 | 0.3105726 | 0.11116113 |
255 | 0.1989777 | 0.3088724 | 0.10989471 |
260 | 0.1985643 | 0.3072330 | 0.10866871 |
265 | 0.2016417 | 0.3096395 | 0.10799774 |
270 | 0.2012030 | 0.3080379 | 0.10683492 |
275 | 0.2007834 | 0.3064906 | 0.10570719 |
280 | 0.2003819 | 0.3049948 | 0.10461288 |
285 | 0.2032368 | 0.3072509 | 0.10401417 |
290 | 0.2028140 | 0.3057864 | 0.10297232 |
295 | 0.2024083 | 0.3043681 | 0.10195979 |
300 | 0.2020187 | 0.3029940 | 0.10097526 |
305 | 0.2046807 | 0.3051175 | 0.10043677 |
310 | 0.2042731 | 0.3037695 | 0.09949636 |
315 | 0.2038808 | 0.3024615 | 0.09858073 |
320 | 0.2035029 | 0.3011918 | 0.09768883 |
325 | 0.2059962 | 0.3031974 | 0.09720118 |
330 | 0.2056029 | 0.3019497 | 0.09634681 |
335 | 0.2052234 | 0.3007370 | 0.09551360 |
340 | 0.2048569 | 0.2995576 | 0.09470072 |
345 | 0.2072013 | 0.3014577 | 0.09425642 |
350 | 0.2068215 | 0.3002973 | 0.09347576 |
Note: Confidence interval based on Clopper-Pearson method.
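The Clopper-Pearson limits in the table can be reproduced without the `binom` package, because they are quantiles of beta distributions. The sketch below uses a hypothetical helper, `clopper_pearson()`, and checks it against base R's `binom.test()` for the first row (n = 250). It also suggests why the lower limit is not monotone in n: the number of responders is rounded up with `ceiling()`, so the observed rate is exactly 25% at n = 260 (65 responders) but about 25.3% at n = 265 (67 responders).

```r
# Exact (Clopper-Pearson) interval from beta quantiles; helper name is illustrative
clopper_pearson <- function(x, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  lower <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
  upper <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
  c(lower = lower, upper = upper)
}

n <- 250
x <- ceiling(0.25 * n)   # number of responders for the n = 250 row
ci <- clopper_pearson(x, n)
print(ci)

# Cross-check against base R's exact binomial confidence interval
ci_check <- as.numeric(binom.test(x, n, conf.level = 0.95)$conf.int)
print(ci_check)
```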
# Plot the results
# `Power` is assumed to be a data frame with columns n, power and nARES from an earlier step not shown here
ggplot(Power, aes(x = n, y = power, color = nARES)) +
  geom_point() +
  labs(x = "Sample Size", y = "Power") +
  scale_x_continuous(breaks = seq(100, 200, by = 10)) +
  scale_y_continuous(breaks = seq(0.75, 0.99, by = 0.01)) +
  geom_hline(yintercept = 0.8, linetype = "dashed", color = "red") +
  ggtitle("Power Analysis vs Sample Size (25% ORR)") +
  theme_bw() +
  guides(color = guide_legend(title = "Scenario",
                              title.position = "top",
                              title.hjust = 0.5))
library(scales)  # percent()
SSC_60 %>%
  mutate(CHRONOS_p = percent(CHRONOS_p),
         ARES_p = percent(ARES_p)) %>%
  `rownames<-`(NULL) %>%
  kable(caption = "Sample Size Calculation for ORR 60%", format = "html") %>%
  kable_styling(latex_options = "striped")
CHRONOS_p | ARES_p | Power | CHRONOS_n | ARES_n | ARES_EU_n | ARES_US_n |
---|---|---|---|---|---|---|
20.00% | 60.00% | 80% | 27 | 21 | 75 | 0 |
21.00% | 60.00% | 80% | 29 | 22 | 75 | 0 |
22.00% | 60.00% | 80% | 31 | 23 | 75 | 0 |
23.00% | 60.00% | 80% | 33 | 25 | 75 | 0 |
24.00% | 60.00% | 80% | 35 | 27 | 75 | 0 |
25.00% | 60.00% | 80% | 38 | 28 | 75 | 0 |
26.00% | 60.00% | 80% | 40 | 30 | 75 | 0 |
27.00% | 60.00% | 80% | 43 | 33 | 75 | 0 |
28.00% | 60.00% | 80% | 46 | 35 | 75 | 0 |
29.00% | 60.00% | 80% | 50 | 37 | 75 | 0 |
30.00% | 60.00% | 80% | 53 | 40 | 75 | 0 |
31.00% | 60.00% | 80% | 57 | 43 | 75 | 0 |
32.00% | 60.00% | 80% | 62 | 47 | 75 | 0 |
33.00% | 60.00% | 80% | 67 | 51 | 75 | 0 |
34.00% | 60.00% | 80% | 73 | 55 | 75 | 0 |
35.00% | 60.00% | 80% | 79 | 60 | 75 | 0 |
36.00% | 60.00% | 80% | 86 | 65 | 75 | 0 |
37.00% | 60.00% | 80% | 94 | 71 | 75 | 0 |
38.00% | 60.00% | 80% | 103 | 78 | 75 | 3 |
39.00% | 60.00% | 80% | 114 | 86 | 75 | 11 |
40.00% | 60.00% | 80% | 126 | 95 | 75 | 20 |
41.00% | 60.00% | 80% | 140 | 105 | 75 | 30 |
42.00% | 60.00% | 80% | 156 | 118 | 75 | 43 |
43.00% | 60.00% | 80% | 176 | 132 | 75 | 57 |
44.00% | 60.00% | 80% | 199 | 150 | 75 | 75 |
45.00% | 60.00% | 80% | 227 | 171 | 75 | 96 |
# Create the scatter plot
# CHRONOS ORR whose required ARES sample size is closest to 138
corresponding_x <- as.numeric(SSC_60$CHRONOS_p[which.min(abs(SSC_60$ARES_n - 138))])
SSC_60 %>% ggplot(aes(x = CHRONOS_p, y = ARES_n)) +
  geom_point() +
  scale_x_continuous(breaks = seq(0.20, 0.45, by = 0.01)) +
  scale_y_continuous(breaks = seq(0, 400, by = 50)) +
  geom_hline(yintercept = 138, linetype = "dashed", color = "red") +
  geom_vline(xintercept = corresponding_x, linetype = "dashed", color = "blue") +
  annotate("text", x = corresponding_x + 0.005, y = 138, label = "138", color = "red") +
  xlab("CHRONOS ORR") +
  ylab("Total Sample Size in ARES") +
  ggtitle("Scatter Plot ARES Total Sample Size vs ORR") +
  theme_bw()
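The steep rise in required sample size as the CHRONOS response rate approaches 60% follows from the shrinking difference between the two proportions. The sketch below illustrates that trend with base R's `power.prop.test()`. It is not the calculation behind the table above: equal allocation, a two-sided alpha of 0.05 and 80% power are assumptions made for this example, and `chronos_orr` and `n_per_arm` are illustrative names.

```r
# Assumptions for illustration: equal allocation, two-sided alpha = 0.05, 80% power
chronos_orr <- seq(0.20, 0.45, by = 0.05)

# Per-arm sample size to distinguish each CHRONOS ORR from a 60% ORR
n_per_arm <- sapply(chronos_orr, function(p) {
  ceiling(power.prop.test(p1 = p, p2 = 0.60,
                          sig.level = 0.05, power = 0.80)$n)
})

print(data.frame(chronos_orr, n_per_arm))
```

The required sample size grows roughly with the inverse square of the difference in proportions, so halving the difference roughly quadruples the number of participants needed.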
## Power Analysis Software: G*Power
Results are provided from a simulation of a phase III study in schizophrenia patients, investigating efficacy at 3 dose levels (Low, Medium, High) versus placebo. The study includes a primary endpoint and two key secondary endpoints. The study includes a total of 9 hypotheses, and multiplicity adjustments are required to control for type I error. The endpoints can be considered as 3 families of hypotheses as defined below:
Hypotheses to be tested (labelled as H1 to H9) are tested using a serial gatekeeping procedure, as illustrated below:
Below is a detailed description of the multiple testing procedure:
p-values are produced for each of the 9 hypotheses and adjusted using an appropriate method [1] that maintains the overall type I error rate for the study at 0.05. Simulation work has been carried out to find optimum values of the parameters gamma1 and gamma2 in terms of operating characteristics (i.e. maximising power while controlling type I error). The simulations covered a range of true treatment effects and implemented the multiple testing procedure described above using a range of values for gamma1 and gamma2.
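To see why an adjustment is needed with 9 hypotheses, a small simulation under the global null is enough. The sketch below is not the mixture-based gatekeeping procedure of [1]: it swaps in a plain Bonferroni adjustment and assumes independent p-values, purely to illustrate how the familywise type I error rate can be estimated by simulation. With 9 independent tests at 0.05, the unadjusted rate is theoretically 1 - 0.95^9, about 0.37.

```r
set.seed(2023)
n_sim <- 10000   # number of simulated studies
n_hyp <- 9       # hypotheses per study
alpha <- 0.05

# Under the global null, p-values are uniform on (0, 1); independence is an assumption
p_mat <- matrix(runif(n_sim * n_hyp), nrow = n_sim)

# Familywise error: at least one false rejection within a simulated study
fwer_unadjusted <- mean(apply(p_mat, 1, function(p) any(p < alpha)))
fwer_bonferroni <- mean(apply(p_mat, 1, function(p) {
  any(p.adjust(p, method = "bonferroni") < alpha)
}))

print(c(unadjusted = fwer_unadjusted, bonferroni = fwer_bonferroni))
```

The same structure extends to power: simulate p-values under a non-null scenario, apply the chosen procedure, and record how often each hypothesis is rejected.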
Produce data visualisation(s) to provide insights into the relationships between the variables. For example:
Bulus, M., & Polat, C. (2023). pwrss R paketi ile istatistiksel güç analizi [Statistical power analysis with pwrss R package]. Ahi Evran Üniversitesi Kırşehir Eğitim Fakültesi Dergisi, 24(3), 2207-2328. https://doi.org/10.29299/kefad.1209913
SIG (2023, Nov. 8). VIS-SIG Blog: Wonderful Wednesdays November 2023. Retrieved from https://graphicsprinciples.github.io/posts/2023-12-17-wonderful-wednesdays-november-2023/
[1] Dmitrienko, A. et al. (2016). Mixture-based gatekeeping procedures for multiplicity problems with multiple sequences of hypotheses. Journal of Biopharmaceutical Statistics, 26, 758–780.