Sample Size and Power for Non-Parametric Analysis
Parametric assumptions are foundational in statistical analysis: they specify the form or distribution of the underlying data. These assumptions allow statisticians and researchers to model data using a specified, finite set of parameters, which simplifies analysis and aids in making predictions and inferences.
Examples and Implications of Parametric Assumptions
Limitations of Parametric Tests
Despite their widespread use, parametric tests have limitations, particularly when the assumptions they rely on do not hold. For example, real clinical data often do not follow simple parametric distributions; they may be skewed, heavy-tailed (excess kurtosis), or contain extreme outliers. In such cases, applying parametric models without adjusting for these features can lead to biased and misleading results.
Non-Parametric and Semi-Parametric Methods
In contrast to purely parametric methods, non-parametric and semi-parametric methods provide flexibility to deal with data that do not meet the strict assumptions of parametric tests.
Non-parametric methods are statistical techniques that make few or no assumptions about the underlying distribution of the data. This feature distinguishes them from parametric methods, which require specific distributional assumptions like normality. Non-parametric methods are especially useful in handling data types that are difficult to fit into classical parametric frameworks, such as data with extreme outliers, infinite variance, or ordinal and interval scales.
Characteristics and Uses of Non-Parametric Methods
Minimal Assumptions: Non-parametric methods require fewer assumptions about the data’s distribution. This characteristic can be beneficial when data do not conform to the stringent requirements of parametric methods, such as normal distribution or homoscedasticity.
Interpretability and Power: The lack of assumptions can sometimes result in lower statistical power compared to parametric tests, which means non-parametric methods might need larger sample sizes to achieve similar power. Additionally, because these methods do not assume a specific distribution, the results can be less interpretable in terms of parameters that describe the population.
Application to Complex Data:
Rank Tests:
Median Tests:
Log-Rank Test:
Non-Parametric Regression:
Parametric Test | Non-Parametric Equivalent |
---|---|
One-Sample/Paired t-test | Wilcoxon Signed Rank Test |
Two-Sample t-test | (Wilcoxon-)Mann-Whitney U-test |
ANOVA | Kruskal-Wallis (One-way), Friedman (Repeated Measures) |
Pearson Correlation | Spearman Correlation |
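As a rough illustration of the table above, the sketch below calls the base R function that corresponds to each non-parametric test. All data are simulated and hypothetical (the variable names are made up for this example); only the function calls themselves matter here.

```r
# Hypothetical skewed data, simulated only so that the calls are runnable.
set.seed(1)
before  <- rexp(20, rate = 1)                          # paired measurement 1
after   <- before + rnorm(20, mean = 0.3, sd = 0.5)    # paired measurement 2
group_a <- rexp(20, rate = 1)                          # independent group A
group_b <- rexp(20, rate = 0.7)                        # independent group B
three_groups <- data.frame(
  value = rexp(30),
  group = factor(rep(c("a", "b", "c"), each = 10))
)
repeated <- matrix(rexp(30), nrow = 10)  # 10 subjects (rows) x 3 conditions (columns)

results <- list(
  signed_rank    = wilcox.test(before, after, paired = TRUE),       # paired t-test analogue
  mann_whitney   = wilcox.test(group_a, group_b),                   # two-sample t-test analogue
  kruskal_wallis = kruskal.test(value ~ group, data = three_groups),# one-way ANOVA analogue
  friedman       = friedman.test(repeated),                         # repeated-measures analogue
  spearman       = cor.test(before, after, method = "spearman")     # Pearson analogue
)

# Collect the p-values of all five tests in one named vector.
sapply(results, function(r) r$p.value)
```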
Calculating the sample size required for non-parametric tests involves different considerations and methodologies depending on the type of data (continuous or ordinal) and the specifics of the hypothesis being tested. When the parametric assumptions hold, non-parametric tests typically need somewhat larger samples to achieve comparable power: under normality, the asymptotic relative efficiency of the Wilcoxon-Mann-Whitney test relative to the two-sample t-test is 3/π ≈ 0.955, so the penalty is small. For heavy-tailed or skewed data, the non-parametric test can be more powerful than its parametric counterpart.
Data Type | Power Methods |
---|---|
Continuous | Asymptotic Relative Efficiency (A.R.E.), Calculation of Moment(s), Direct Input of Moments (e.g. p1 = P[X>Y]), Exemplary Dataset Approach, Simulation-based power |
Ordinal | Calculation of Moment(s) (e.g. Kolassa or O’Brien & Castelloe for Mann-Whitney U-test), Exemplary Dataset Approach, Simulation-based power |
When designing a study involving continuous data, non-parametric tests often assume a location shift hypothesis—this means the distributions of two groups are identical except for a shift in location. The required sample size can depend heavily on this assumption:
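One common shortcut under the location-shift assumption is the asymptotic relative efficiency (A.R.E.) approach: compute the t-test sample size, then divide by the A.R.E. of the Mann-Whitney U-test relative to the t-test. The sketch below is a minimal illustration, assuming a hypothetical standardized effect size of 0.5. It uses 3/π (the A.R.E. when the data are normal) and 0.864 (the known lower bound of the A.R.E. over continuous location-shift models) as two reference points.

```r
# Per-group n for a two-sided two-sample t-test (base R).
# delta = 0.5 with sd = 1 is a hypothetical standardized effect size.
n_t <- power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)$n

are_normal <- 3 / pi   # A.R.E. of Mann-Whitney vs. t-test when data are normal
are_worst  <- 0.864    # lower bound of the A.R.E. under a location shift

# Inflate the t-test sample size by 1 / A.R.E. and round up.
n_mw_normal <- ceiling(n_t / are_normal)
n_mw_worst  <- ceiling(n_t / are_worst)

c(t_test = ceiling(n_t), mw_normal = n_mw_normal, mw_worst_case = n_mw_worst)
```

Because the A.R.E. is an asymptotic quantity, this adjustment is only an approximation for finite samples; a simulation is a useful check.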
Ordinal data, which involve rankings or ordered categories, require specific methods that consider the nature of the data:
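One such method is the formula of Whitehead (1993), which assumes a proportional odds model and equal allocation. The total sample size is approximately N = 12 (z_{1-α/2} + z_{1-β})² / (θ² (1 − Σ p̄ᵢ³)), where θ is the log odds ratio and p̄ᵢ is the mean of the two groups' probabilities for category i. The sketch below is an illustration only: the function name `whitehead_n`, the control-group category probabilities, and the odds ratio of 2 are hypothetical.

```r
# Total sample size (both groups combined, 1:1 allocation) for ordered
# categorical data under proportional odds, following Whitehead (1993).
whitehead_n <- function(p_control, odds_ratio, alpha = 0.05, power = 0.80) {
  k <- length(p_control)
  # Cumulative probabilities for the first k - 1 categories in the control group.
  cum_c <- cumsum(p_control)[-k]
  # Proportional odds: every cumulative odds is multiplied by the odds ratio.
  odds_t <- odds_ratio * cum_c / (1 - cum_c)
  cum_t  <- c(odds_t / (1 + odds_t), 1)
  p_treat <- diff(c(0, cum_t))
  # Mean category probabilities across the two groups.
  p_bar <- (p_control + p_treat) / 2
  z_sum <- qnorm(1 - alpha / 2) + qnorm(power)
  12 * z_sum^2 / (log(odds_ratio)^2 * (1 - sum(p_bar^3)))
}

p_control <- c(0.2, 0.3, 0.3, 0.2)  # hypothetical four-category outcome
n_total <- whitehead_n(p_control, odds_ratio = 2)
ceiling(n_total)
```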
Case Study
The study sample was set … 70 per group were fully assessable. This number was estimated by assuming that all patients would achieve ulcer healing by the end of week 24±1, and that mean (SD) time to healing would be 84 (42) days in the placebo group and 63 (42) days in the mesoglycan group. The specified significance level was 0.05 (two-tailed) and statistical power was 0.80.
Understand the Effect Size: For the Mann-Whitney U-test, the effect size can be approximated by converting the difference between groups into a standardized effect size (Cohen’s d for example), or by estimating the probability that a randomly picked score from one group will be higher than a randomly picked score from another group.
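Conversion to Z-Score: If the data are assumed to be approximately normal with a common standard deviation, a standardized effect size d corresponds to the z-score d/√2, and the probability that a randomly picked observation from one group exceeds one from the other is p = P(X > Y) = Φ(d/√2), where Φ is the standard normal distribution function. This probability can be used directly in non-parametric sample size formulas such as Noether's (1987), or the standardized effect size can be passed to R’s power calculation functions as an approximation.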
Using Simulation: Since direct calculation methods for non-parametric tests are not always straightforward in standard R packages, simulating data based on the assumed distributions and then applying the Mann-Whitney U-test repeatedly to estimate power is a more accurate approach.
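The first two steps can be sketched as follows, using the case-study means and standard deviation. The sketch assumes normality and equal allocation, and applies Noether's (1987) formula for the Mann-Whitney U-test, which with equal group sizes reduces to N = (z_{1-α/2} + z_{1-β})² / (3 (p − 0.5)²) for the total sample size. This is an illustrative calculation, not necessarily the one the study authors used.

```r
# Standardized effect size from the case-study assumptions.
d <- (84 - 63) / 42

# Under normality with a common SD, P(X > Y) = pnorm(d / sqrt(2)).
p_xy <- pnorm(d / sqrt(2))

alpha <- 0.05
power <- 0.80
z_sum <- qnorm(1 - alpha / 2) + qnorm(power)

# Noether's formula: N = z_sum^2 / (12 * c * (1 - c) * (p - 0.5)^2),
# where c is the allocation fraction; c = 0.5 gives 12 * c * (1 - c) = 3.
n_total <- z_sum^2 / (3 * (p_xy - 0.5)^2)
n_per_group <- ceiling(n_total / 2)

c(p_xy = p_xy, n_per_group = n_per_group)
```

Under these assumptions the result is about 69 per group, slightly more than the t-test requires for the same effect size.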
For comparison, the parametric calculation using a two-sample t-test, with Cohen’s d = (84 − 63) / 42 = 0.5, gives the following:
##
## Two-sample t test power calculation
##
## n = 63.76561
## d = 0.5
## sig.level = 0.05
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
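The output above has the print format of `pwr.t.test` from the pwr package. The original code is not shown, so the call below is an assumption about how such a result could be obtained, with the effect size taken from the case-study means and standard deviation.

```r
library(pwr)  # assumed to be installed; provides pwr.t.test

# Cohen's d from the case-study assumptions: (84 - 63) / 42 = 0.5.
d <- (84 - 63) / 42

# Solve for n per group (n is left unspecified so that it is computed).
res <- pwr.t.test(d = d, sig.level = 0.05, power = 0.80,
                  type = "two.sample", alternative = "two.sided")
res

# Round up to whole patients per group.
ceiling(res$n)
```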
The approach above assumes that the data are approximately normally distributed, both in calculating Cohen’s d and in using the t-test. Since the planned analysis is a non-parametric test, the next step is to simulate data that match the design assumptions and apply the Mann-Whitney U-test many times to determine the power empirically. A simple simulation of this kind gives:
## [1] 0.823
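The simulation described above can be sketched as follows. The original code is not shown, so the seed, the number of simulations, the group size of 70, and the use of normal distributions are all assumptions made for illustration. The estimate will therefore be close to, but not necessarily identical to, the value reported above.

```r
set.seed(2024)        # arbitrary seed, for reproducibility only
n_per_group <- 70     # assumed: the assessable number per group in the case study
n_sims <- 2000        # assumed number of simulated trials
alpha <- 0.05

# For each simulated trial, draw both groups and record whether the
# two-sided Mann-Whitney U-test rejects at the chosen significance level.
rejections <- replicate(n_sims, {
  placebo    <- rnorm(n_per_group, mean = 84, sd = 42)
  mesoglycan <- rnorm(n_per_group, mean = 63, sd = 42)
  wilcox.test(placebo, mesoglycan)$p.value < alpha
})

# Empirical power: the proportion of simulated trials that reject.
power_hat <- mean(rejections)
power_hat
```

Normal draws can produce negative healing times, which is unrealistic. Swapping `rnorm` for a skewed distribution with the same mean and standard deviation is a natural next step, and is where simulation shows its advantage over closed-form formulas.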
References
Noether, G. E. (1987). Sample size determination for some common nonparametric tests. Journal of the American Statistical Association, 82(398), 645–647.
Shieh, G., Jan, S. L., & Randles, R. H. (2007). Power and sample size determinations for the Wilcoxon signed-rank test. Journal of Statistical Computation and Simulation, 77(8), 717–724.
Wang, H., Chen, B., & Chow, S. C. (2003). Sample size determination based on rank tests in clinical trials. Journal of Biopharmaceutical Statistics, 13(4), 735–751.
Kolassa, J. E. (1995). A comparison of size and power calculations for the Wilcoxon statistic for ordered categorical data. Statistics in Medicine, 14(14), 1577–1581.
Whitehead, J. (1993). Sample size calculations for ordered categorical data. Statistics in Medicine, 12(24), 2257–2271.
Dixon, W. J., & Massey, F. J. (1983). Introduction to Statistical Analysis (4th ed.). New York: McGraw-Hill.
O’Brien, R. G., & Muller, K. E. (1993). Unified power analysis for t-tests through multivariate hypotheses. In L. K. Edwards (Ed.), Statistics: Textbooks and Monographs, Vol. 137. Applied Analysis of Variance in Behavioral Science (pp. 297–344). New York: Marcel Dekker.
Divine, G., Kapke, A., Havstad, S., & Joseph, C. L. (2009). Exemplary data set sample size calculation for Wilcoxon–Mann–Whitney tests. Statistics in Medicine, 29(1), 108–115.
Tang, Y. (2011). Size and power estimation for the Wilcoxon–Mann–Whitney test for ordered categorical data. Statistics in Medicine, 30(29), 3461–3470.
Arosio, E., Ferrari, G., Santoro, L., Gianese, F., & Coccheri, S. (2001). A placebo-controlled, double-blind study of mesoglycan in the treatment of chronic venous ulcers. European Journal of Vascular and Endovascular Surgery, 22(4), 365–372.
Segal, I., Khamis, S., Sagie, L., Genizi, J., Azriel, D., Katzenelenbogen, S., & Fattal‐Valevski, A. (2023). Functional benefit and orthotic effect of dorsiflexion-FES in children with hemiplegic cerebral palsy. Children (Basel), 10(3), 531.
Schaller, S., Kiselev, J., Loidl, V., Quentin, W., Schmidt, K., Mörgeli, R., Rombey, T., Busse, R., Mansmann, U., Spies, C., Marschall, U., Eckardt-Felmberg, R., Landgraf, I., Schwantes, U., Busse, R., & Mansmann, U. (2022). Prehabilitation of elderly frail or pre-frail patients prior to elective surgery (PRAEP-GO): Study protocol for a randomized, controlled, outcome assessor-blinded trial. Trials, 23(1).
Harrell, F. E., Jr. (n.d.). Biostatistics for Biomedical Research. Retrieved from https://hbiostat.org/bbr/
Harrell, F. (n.d.). Statistical thinking - What does a statistical method assume? Retrieved from https://www.fharrell.com/post/assume/
Conroy, R. (2012). What hypotheses do “nonparametric” two-group tests actually test? The Stata Journal, 12(2), 182–192.