Bioequivalence and Biosimilar Trials
Biological medicines, commonly referred to as biologics, are drugs that are derived from natural, living sources such as animal and plant cells, or microorganisms like bacteria and yeast. These sources can contribute various biological substances, including proteins, sugars, or combinations of these, which are used to create the medicines.
Biologics are typically more complex than chemically synthesized drugs. This complexity arises from their biological origins and the intricate processes involved in their production, which include purification and manufacturing. Due to their complexity, biologics often require specialized handling and administration methods.
Biosimilars are a specific type of biologic drug designed to be highly similar to an existing FDA-approved biologic, known as the original biologic or the reference product. Although biosimilars are similar to their reference products, they are not exact replicas, which is a key distinction from generic drugs in the context of traditional, non-biological medications.
The development of biosimilars involves ensuring that there are no clinically meaningful differences from the reference product in terms of safety, purity, and potency. This means that a biosimilar is expected to perform in the same way as its reference product in any given treatment, providing the same therapeutic benefits and safety profile.
“In clinical trials, the objective is to evaluate the efficacy & safety of a new treatment”
In a clinical trial, researchers primarily aim to determine whether a new treatment is effective and safe. This is the universal goal, regardless of the specific disease or treatment.
“However, the definition of ‘efficacy’ will depend on the clinical and regulatory scenario e.g. new treatment approval vs generic trt. approval”
“Efficacy” isn’t always about being better than the current treatment.
“Equivalence: ‘equivalent to control’ by being within lower & upper margins”
Equivalence means the treatment effect is not meaningfully different from the control.
Researchers define two margins, a lower margin (\(Δ_L\)) and an upper margin (\(Δ_U\)). If the treatment’s effect relative to control falls between these two, it is considered equivalent.
This means unlike superiority testing (where better is better), equivalence testing is direction-neutral. You’re not trying to prove better or worse — just similar enough.
“Test to show a new treatment is equivalent to an existing one. Uses two one-sided tests (TOST)”
TOST stands for Two One-Sided Tests — a statistical method for testing equivalence.
You simultaneously test two hypotheses:
Null Hypothesis H₀: The treatment is not equivalent, i.e.,
\[ Δ ≤ Δ_L \quad \text{or} \quad Δ ≥ Δ_U \]
Alternative Hypothesis H₁: The treatment is equivalent, i.e.,
\[ Δ_L < Δ < Δ_U \]
where \(Δ\) is the difference in effect between the new and control treatments.
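Define the margins \(Δ_L\) and \(Δ_U\) before the trial, then test each of the two one-sided null hypotheses (\(Δ ≤ Δ_L\) and \(Δ ≥ Δ_U\)) at a one-sided significance level α. Equivalence is concluded only if both are rejected, which is the essence of the TOST approach.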
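As a sketch of how the two tests are usually carried out, assume an estimate \(\hat{Δ}\) with standard error \(\mathrm{SE}(\hat{Δ})\) and ν degrees of freedom (this notation is introduced here for illustration and is not taken from the source). The two one-sided test statistics are:

\[ T_L = \frac{\hat{Δ} - Δ_L}{\mathrm{SE}(\hat{Δ})}, \qquad T_U = \frac{\hat{Δ} - Δ_U}{\mathrm{SE}(\hat{Δ})} \]

The hypothesis \(Δ ≤ Δ_L\) is rejected if \(T_L > t_{1-α,ν}\), and the hypothesis \(Δ ≥ Δ_U\) is rejected if \(T_U < -t_{1-α,ν}\). Both rejections occur exactly when the two-sided \(100(1-2α)\%\) confidence interval lies entirely inside the margins:

\[ Δ_L < \hat{Δ} - t_{1-α,ν}\,\mathrm{SE}(\hat{Δ}) \quad \text{and} \quad \hat{Δ} + t_{1-α,ν}\,\mathrm{SE}(\hat{Δ}) < Δ_U \]

This is why a TOST at α = 0.05 is commonly reported as a 90% confidence interval.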
💊 What is Bioequivalence?
Bioequivalence (BE) refers to the relationship between two drug products that contain the same active ingredient and, when given in the same molar dose, demonstrate similar bioavailability (BA) — meaning that the rate and extent of drug absorption into the bloodstream are not significantly different between the two.
According to the International Council for Harmonisation (ICH, 2024), two products are considered bioequivalent if their BA parameters fall within predefined acceptable limits, ensuring they can be expected to act the same way in the body.
BE is crucial in drug development and regulatory science because it allows:
In essence, BE studies serve as a shortcut to demonstrate therapeutic equivalence—we assume that if two drugs behave similarly in terms of pharmacokinetics, they will have the same clinical effect.
🧪 How is Bioequivalence Assessed?
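The most common and regulatory-accepted method is through pharmacokinetic (PK) studies. The key parameters include the area under the concentration-time curve (AUC), which reflects the extent of absorption, and the maximum concentration (Cmax), which reflects the rate of absorption.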
In BE studies, these metrics are compared between the test product (e.g., a generic drug) and the reference product (e.g., a brand-name drug).
To be considered bioequivalent, the 90% confidence interval for the geometric mean ratio (GMR) of each PK measure between the test and reference products, usually estimated on the log scale and then back-transformed, must fall entirely within the 80%–125% range. This is the standard acceptance range set by most regulatory agencies.
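The acceptance rule can be illustrated with a minimal Python sketch. It assumes the log-scale difference in means, its standard error, and the t critical value are already available from the fitted model; the function names and the numbers below are hypothetical and serve only to show the back-transformation.

```python
import math

def gmr_confidence_interval(log_diff, se, t_crit):
    """Back-transform a log-scale mean difference and its CI to the ratio scale."""
    lower = math.exp(log_diff - t_crit * se)
    upper = math.exp(log_diff + t_crit * se)
    return math.exp(log_diff), lower, upper

def is_bioequivalent(lower, upper, limits=(0.80, 1.25)):
    """True if the whole confidence interval lies inside the acceptance range."""
    return limits[0] <= lower and upper <= limits[1]

# Hypothetical inputs: log(test) - log(reference) = 0.02, SE = 0.05,
# and the 95th percentile of the t distribution with 22 degrees of freedom.
log_diff, se, t_crit = 0.02, 0.05, 1.717

gmr, lower, upper = gmr_confidence_interval(log_diff, se, t_crit)
print(f"GMR = {gmr:.3f}, 90% CI = ({lower:.3f}, {upper:.3f})")
print("Bioequivalent:", is_bioequivalent(lower, upper))
```

Note that the decision is based on the interval, not on the point estimate: a GMR close to 1 with a wide interval can still fail.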
In special cases where PK measures are insufficient or unavailable, other methods may be used:
📊 Types of BE Assessments
Different statistical models may be used depending on the study population or variability:
A successful BE study enables:
To demonstrate biosimilarity, bioequivalence in the pharmacokinetics between the candidate biosimilar and the reference product is required. For both the FDA and EMA it is necessary to demonstrate that the 90% confidence interval (CI) for the Geometric Mean Ratio (GMR) for the PK endpoint(s) of AUC and/or Cmax falls within the equivalence limits of 0.8 to 1.25.
The key differences between the regulatory requirements arise in the clinical equivalence that must be demonstrated between the candidate biosimilar and the reference product:
Other differences, which may vary according to the specific biologic medicine and indication, include:
In most cases, separate US-sourced and EU-sourced reference products are available and it will be necessary to show bioequivalence of the candidate biosimilar to both sources of the reference. Bioequivalence can be demonstrated in either healthy volunteers or in patients. In addition, demonstration of equivalence in relevant clinical or pharmacodynamic (PD) endpoints in a comparative patient study is required. Whilst many biological medicines are approved for use in several indications, there is no requirement to evaluate the candidate biosimilar in each of the approved indications. When it can be assumed that the mechanism of action is the same across indications, as is often the case for auto-immune diseases, for example, evaluation in a single indication will suffice. The choice of indication is often a pragmatic one, weighing the number of patients that would need to be recruited to demonstrate equivalence on the endpoint of interest against the available pool of patients.
Bioequivalence (BE) should usually be demonstrated in healthy volunteers, as this is considered the most homogeneous and sensitive population, i.e., the one in which differences between the biosimilar candidate and reference are most likely to be detected. If a biologic medicine is not considered safe to administer to healthy volunteers, bioequivalence will need to be demonstrated in patients, often as part of a Phase I/III study.
Study Designs: Two study designs can be used for such studies: crossover or parallel group. A crossover study offers advantages in terms of sample size requirements. However, if the half-life of the biologic medicine is long, such a study is not practical and a parallel group study is required.
Primary Pharmacokinetic Endpoints: The primary pharmacokinetic (PK) endpoints are usually the area under the concentration-time curve (AUC) and the maximum concentration (Cmax), with the specific choice of AUC parameter dependent on the biological medicine and informed by clinical pharmacology experts. AUC from time zero to infinity (AUC0-inf) is usually preferred by the regulatory authorities, with AUC0-last as a key secondary endpoint. Other PK parameters will usually be determined and reported, but are not subject to formal comparison, with no requirement to show equivalence.
Sample Size and Statistical Power:
When the sample size in each group is 106, a three-group design will have 90% power to reject both the null hypothesis that the ratio of the test mean to the reference mean is below 0.8 and the null hypothesis that the ratio of test mean to the reference mean is above 1.25; i.e., that the test and standard are not equivalent, in favor of the alternative hypothesis that the means of the two groups are equivalent, assuming that the expected ratio of means is 1, the CV is 0.522, that data will be analyzed in the log-scale using t-tests for differences in means, and that each t-test is made at the 5% level. Allowing for 10% losses to follow-up (dropout), at least 354 subjects will be enrolled.
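The figures in this statement can be roughly reproduced with a normal-approximation sketch in Python. This is an approximation under stated assumptions (true ratio of 1, log-normal data, known variance), not the t-test-based calculation that produced the quoted 106 per group, so it is expected to come out slightly lower. The function names are illustrative.

```python
import math
from statistics import NormalDist

def parallel_n_per_group(cv, alpha=0.05, power=0.90, limit=1.25):
    """Normal-approximation n per group for TOST on the log scale (true ratio = 1)."""
    sigma2 = math.log(1.0 + cv ** 2)            # log-scale variance implied by the CV
    z_alpha = NormalDist().inv_cdf(1 - alpha)   # each one-sided test at level alpha
    z_beta = NormalDist().inv_cdf(1 - (1 - power) / 2)  # beta is split when the ratio is 1
    n = 2 * sigma2 * (z_alpha + z_beta) ** 2 / math.log(limit) ** 2
    return math.ceil(n)

def enrolled_total(n_per_group, groups, dropout):
    """Inflate the evaluable sample size to allow for the expected dropout rate."""
    return math.ceil(n_per_group * groups / (1 - dropout))

approx_n = parallel_n_per_group(cv=0.522)
print("Approximate n per group:", approx_n)

# Using the 106 per group quoted in the text, three groups and 10% dropout:
print("Total to enrol:", enrolled_total(106, groups=3, dropout=0.10))  # 354
```

The enrolment step shows where 354 comes from: 3 × 106 = 318 evaluable subjects, divided by 0.9 and rounded up.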
The aim of the comparative Phase 3 clinical equivalence study is not to demonstrate clinical benefit, but to confirm clinical equivalence of the proposed biosimilar to reference product (EMA, 2014; FDA, 2016). The study population should be sufficiently sensitive for detecting potential product-related differences while at the same time minimizing the influence of patient- or disease-related factors.
The most sensitive study population would typically be one in which a robust treatment effect has been shown, as this facilitates the detection of small differences in efficacy. Prior lines of therapy and the effect of concomitant medications are also important considerations. Ideally, the Phase 3 study should be performed with first-line or monotherapy, in a homogeneous patient population (e.g., in terms of disease severity) using a short-term clinical efficacy endpoint. From the perspective of feasibility and study conduct it is also important that the condition is relatively common, particularly if there are likely to be a number of competing studies.
The choice of primary endpoint will be driven by the indication under study, the endpoints used in the pre-licensing studies for the reference product, and any indication-specific regulatory guidance. When the clinically relevant endpoint would require treatment over a long duration, it is common to use a PD measure as the primary endpoint in the study. For example, for denosumab biosimilars the endpoint of interest is usually bone mineral density (BMD) rather than fracture rates.
To establish clinical equivalence, similarity margins must be defined, usually to ensure that a proportion of the effect of the reference product versus placebo is preserved. The similarity margin is usually derived from a meta-analysis of studies performed with the reference product. In some cases, where other biosimilars have already been given approval, the choice of margin will already have been established.
The FDA and EMA differ in their requirements for the similarity margin. The FDA requires the margin to be based on approximately 50% of the lower bound of the 95% CI for the treatment effect of the reference product, whereas the EMA requires preservation of 50% of the point estimate. Furthermore, the FDA accepts demonstration of equivalence based on the 90% CI, whereas the EMA requires the 95% CI (see Section 3.3). Illustrative text for a sample size determination is given below:
When applying the FDA requirements, a sample size of 325 subjects in each group yields 80.03% power for a Z-test (Pooled) to reject the null hypothesis that the difference in the response rate in the treatment group and the response rate in the reference group is outside the margin of equivalence of -0.11 to 0.11, i.e. that the treatment and reference are not equivalent, in favor of the alternative hypothesis that the response rates of the treatment and reference groups are equivalent, assuming that the actual difference in response rates is 0, the response rate in the reference group is 0.36, and that each of the two one-sided tests is made at the 5% significance level (90% confidence interval).
When applying the EMA requirements, a sample size of 325 subjects in each group yields 86.64% power for a Z-test (Pooled) to reject the null hypothesis that the difference in the response rate in the treatment group and the response rate in the reference group is outside the margin of equivalence of -0.13 to 0.13, i.e. that the treatment and reference are not equivalent, in favor of the alternative hypothesis that the response rates of the treatment and reference groups are equivalent, assuming that the actual difference in response rates is 0, the response rate in the reference group is 0.36, and that each of the two one-sided tests is made at the 2.5% significance level (95% confidence interval).
Allowing for 10% losses to follow-up (dropout), at least 730 subjects will be enrolled.
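The two power statements above can be approximated with a short Python sketch. It assumes a true difference of 0, equal response rates in both groups, and a normal approximation; it is not the exact routine used to produce the quoted 80.03% and 86.64%, so small discrepancies are expected. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def tost_power_two_proportions(n_per_group, p_ref, margin, alpha):
    """Approximate TOST power for a difference in proportions when the true difference is 0."""
    # Standard error of the difference when both groups share the response rate p_ref
    se = math.sqrt(2 * p_ref * (1 - p_ref) / n_per_group)
    z = NormalDist().inv_cdf(1 - alpha)
    # Both one-sided tests must reject; by symmetry the two rejection probabilities are equal
    power = 2 * NormalDist().cdf(margin / se - z) - 1
    return max(0.0, power)

fda_power = tost_power_two_proportions(325, p_ref=0.36, margin=0.11, alpha=0.05)
ema_power = tost_power_two_proportions(325, p_ref=0.36, margin=0.13, alpha=0.025)
print(f"FDA setting (90% CI, margin 0.11): {fda_power:.3f}")
print(f"EMA setting (95% CI, margin 0.13): {ema_power:.3f}")
```

The comparison illustrates the trade-off described above: the EMA setting uses a stricter confidence level but a wider margin, so the same 325 per group can yield higher power than under the FDA setting.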
In some cases it is necessary to include a single change of treatment from the reference to the candidate biosimilar following assessment of the primary endpoint. The aim is to assess the impact on development of neutralizing antibodies (nABs) and any safety concerns following a switch from the reference to the biosimilar, since it is likely that many patients already receiving the reference will have their treatment changed to the biosimilar once approved. This switch is usually handled by randomization in the reference group(s).
When it’s not possible to assess bioequivalence in healthy volunteers, bioequivalence must be established in patients. For convenience and to reduce the number of patients required, this evaluation is conducted within the comparative clinical equivalence study. In some studies sampling for PK endpoints is conducted in all patients, whereas in others it is limited to only the number of patients required to establish bioequivalence. The decision is informed by the acceptability of the PK sampling scheme to patients, taking into account any challenges to recruitment if only a subset of patients will be subject to the additional PK sampling.
Similarity Margin:
“The similarity margin is derived from a meta-analysis (random-effects model) of four randomized studies, SOLO 1, SOLO 2, R668-AD-1021 and CHRONOS, which resulted in a difference in the proportions achieving EASI-75 of 0.38, with 95% CI of 0.32 to 0.43. For FDA requirements the margin is based on approximately 50% of the lower bound of the 95% CI (0.32), yielding a margin of 16%. For the EMA requirements, the margin is based on 50% of the point estimate (0.38), which gives a margin of 19%. FDA accepts demonstration of equivalence based on the 90% CI, whereas EMA requires the 95% CI.”
Illustrative analysis text:
“Primary efficacy variable (FDA Analysis) The primary endpoint will be the proportion of subjects with EASI-75 (≥75% improvement from baseline) at Week 16. Equivalence will be concluded if the 90% confidence interval of the difference between the 2 proportions is completely contained within the interval [-16%; +16%].
Primary efficacy variable (EMA Analysis) The primary endpoint will be the proportion of subjects with EASI-75 (≥75% improvement from baseline) at Week 16. Equivalence will be concluded if the 95% confidence interval of the difference between the 2 proportions is completely contained within the interval [-19%; +19%].”
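The margin derivation and the two decision rules can be sketched in Python as follows. The responder counts are hypothetical, and the unpooled Wald interval is an assumption made for illustration; the analysis text above does not specify which interval method would be used.

```python
import math
from statistics import NormalDist

def difference_ci(x_test, n_test, x_ref, n_ref, level):
    """Wald confidence interval for the difference in response proportions (test - reference)."""
    p_t, p_r = x_test / n_test, x_ref / n_ref
    se = math.sqrt(p_t * (1 - p_t) / n_test + p_r * (1 - p_r) / n_ref)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p_t - p_r
    return diff - z * se, diff + z * se

def within_margin(ci, margin):
    """True if the interval is completely contained in [-margin, +margin]."""
    return -margin <= ci[0] and ci[1] <= margin

# Margins from the meta-analysis quoted above (difference 0.38, 95% CI 0.32 to 0.43)
fda_margin = 0.5 * 0.32   # 50% of the lower CI bound  -> 0.16
ema_margin = 0.5 * 0.38   # 50% of the point estimate  -> 0.19

# Hypothetical trial result: 150/300 responders on test, 156/300 on reference
fda_ci = difference_ci(150, 300, 156, 300, level=0.90)
ema_ci = difference_ci(150, 300, 156, 300, level=0.95)

print("FDA: 90% CI", tuple(round(b, 3) for b in fda_ci), within_margin(fda_ci, fda_margin))
print("EMA: 95% CI", tuple(round(b, 3) for b in ema_ci), within_margin(ema_ci, ema_margin))
```

Because the two regulators pair different confidence levels with different margins, the same data are simply evaluated twice, once under each rule.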
A crossover design is a type of clinical trial where each participant receives multiple treatments in a specific order (sequence), allowing direct within-subject comparisons.
📘 Key Features:
Subjects act as their own control, which reduces inter-subject variability.
Particularly useful in chronic or stable conditions, where treatment effects are reversible and short-term.
Most commonly used design: 2x2 crossover
✅ Advantages of Crossover Designs
Advantage | Explanation |
---|---|
Reduced variation | Since each subject receives both treatments, the design eliminates between-subject differences. This leads to more precise estimates. |
Higher efficiency | Typically requires a smaller sample size than a parallel-group design for the same power. |
Ethical benefit | Every patient gets active treatment, not just placebo, reducing ethical concerns especially in severe conditions. |
❌ Disadvantages of Crossover Designs
Disadvantage | Explanation |
---|---|
Not suitable for unstable/permanent effects | Can’t use when diseases are unstable or treatment effects are long-lasting or curative. |
Carryover effects | Treatment effects from one period can carry into the next, biasing results, even with a washout period. |
Dropout risk & complexity | Longer follow-up = more chance for dropout, missed visits, or patient fatigue, leading to data loss or complications in analysis. |
Designs with >2 sequences and/or >2 periods
Described using the notation: treatments × sequences × periods
Example: 2×4×2 = 2 treatments, 4 sequences, 2 periods (Balaam’s design)
Why Use Higher-Order Designs?
Estimate intra-subject variability independently
Better for handling carryover effects
Required for:
Common Higher-Order Designs
Design | Description |
---|---|
2×4×2 (Balaam’s) | Useful when comparing two treatments and getting replicate data within subjects. |
2×2×3 (Dual balanced) | Helps balance carryover effects and period effects. |
2×4×4 | Extended periods and sequences for better control of variability. |
Latin Square | Used for >2 treatments; ensures that each treatment appears exactly once in each sequence and once in each period, but does not by itself balance carryover. Example: ABC, BCA, CAB |
Williams Design | A special Latin Square where each treatment precedes and follows every other treatment equally. Balances out carryover effects efficiently. Example: ABCD, BDAC, CADB, DCBA |
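The difference between an ordinary Latin square and a Williams design can be checked mechanically. The following Python sketch (helper names are illustrative) tests the two example designs from the table: both are Latin squares, but only the Williams design has every treatment immediately followed by every other treatment equally often.

```python
from collections import Counter

def is_latin_square(sequences):
    """Each treatment appears once per sequence (row) and once per period (column)."""
    treatments = set(sequences[0])
    n = len(treatments)
    rows_ok = all(len(s) == n and set(s) == treatments for s in sequences)
    cols_ok = all(set(col) == treatments for col in zip(*sequences))
    return len(sequences) == n and rows_ok and cols_ok

def is_carryover_balanced(sequences):
    """Every ordered pair of distinct treatments occurs equally often in adjacent periods."""
    treatments = set("".join(sequences))
    pairs = Counter((a, b) for s in sequences for a, b in zip(s, s[1:]))
    counts = {pairs[(a, b)] for a in treatments for b in treatments if a != b}
    return len(counts) == 1 and 0 not in counts

latin_3 = ["ABC", "BCA", "CAB"]
williams_4 = ["ABCD", "BDAC", "CADB", "DCBA"]

for name, design in [("Latin square", latin_3), ("Williams", williams_4)]:
    print(name, is_latin_square(design), is_carryover_balanced(design))
```

In the 3×3 cyclic square, A is followed by B twice and never by C, which is why an odd number of treatments generally needs two squares (six sequences) to achieve Williams balance.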
“This bioequivalence study was a single-center, single-dose, randomized, open, two-way crossover study performed between the sildenafil 100 mg orodispersible film (IBSA, test) and the marketed Viagra 100 mg film-coated tablet (Pfizer, reference).
The highest coefficient of variance for the pharmacokinetic parameters Cmax and AUC was estimated to be 0.383… Fixing the significance level α at 5% and the hypothesized test/reference mean ratio to 1, 50 subjects were considered sufficient to attain a power of 80% to correctly conclude the bioequivalence between the two formulations within the range 80.00%–125.00% for all parameters (Cmax and AUC).”
Source: Radicioni, et al. (2017)
Parameter | Value |
---|---|
Significance Level | 0.05 |
Lower Equivalence Limit | 0.80 |
Upper Equivalence Limit | 1.25 |
Expected True Mean Ratio | 1 |
Coefficient of Variation | 0.383 |
Sample Size | 50 |
Power (%) | 80% |
“We assume a CV of 0.25, a T/R-ratio of 0.95, and target a power of 0.80 in a 6-sequence 3-period Williams’ design.
As usual in BE, alpha = 0.05 is employed (we will assess the study by a 100(1−2α)=90% confidence interval). The BE-limits are theta1 = 0.80 and theta2 = 1.25.”
Source: bebac.at
Parameter | Value |
---|---|
Significance Level | 0.05 |
Lower Equivalence Limit | 0.80 |
Upper Equivalence Limit | 1.25 |
Expected True Mean Ratio | 0.95 |
Coefficient of Variation | 0.25 |
Sample Size | 30 |
Target power (%) | 80% |
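The sample size in this table can be approximated with a short Python sketch. It uses a normal approximation to TOST power and assumes that the pairwise comparison in the Williams design has the same variance factor as a 2×2 crossover (SE = σ_w·√(2/N), with N the total number of subjects); dedicated software such as PowerTOST uses exact methods, so results can differ by a few subjects. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def tost_power_crossover(n_total, cv, theta0, theta1=0.80, theta2=1.25, alpha=0.05):
    """Normal-approximation TOST power for a crossover with n_total subjects."""
    sigma_w = math.sqrt(math.log(1 + cv ** 2))   # within-subject SD on the log scale
    se = sigma_w * math.sqrt(2 / n_total)
    z = NormalDist().inv_cdf(1 - alpha)
    phi = NormalDist().cdf
    reject_upper = phi((math.log(theta2) - math.log(theta0)) / se - z)
    reject_lower = phi((math.log(theta0) - math.log(theta1)) / se - z)
    return max(0.0, reject_upper + reject_lower - 1)

def smallest_n(cv, theta0, target=0.80, step=2, **kwargs):
    """Smallest total N, in multiples of the number of sequences, reaching the target power."""
    n = step
    while tost_power_crossover(n, cv, theta0, **kwargs) < target:
        n += step
    return n

# Williams design example: CV 0.25, T/R ratio 0.95, six sequences
print(smallest_n(cv=0.25, theta0=0.95, step=6))  # 30

# Two-way crossover example (CV 0.383, ratio 1); the approximation may fall
# slightly below the 50 subjects reported in the published study
print(smallest_n(cv=0.383, theta0=1.0, step=2))
```

Stepping in multiples of the number of sequences keeps the design balanced, which is why the Williams example lands on 30 rather than a smaller unbalanced number.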
“We assume a 2-sequence 4-period replicate design for the assessment of a NTID with a CV of 0.125, a T/R-ratio of 0.95, and target a power of 0.80.
As usual in BE, alpha = 0.05 is employed (we will assess the study by a 100(1−2α)=90% confidence interval). The BE-limits are theta1 = 0.90 and theta2 = 1.1111.”
Source: bebac.at
Parameter | Value |
---|---|
Significance Level | 0.05 |
Lower Equivalence Limit | 0.90 |
Upper Equivalence Limit | 1.1111 |
Expected True Mean Ratio | 0.95 |
Coefficient of Variation | 0.125 |
Sample Size | 34 |
Target power (%) | 80% |
Immunogenicity endpoints (ADA, ADA titers, and NAb) will typically be summarized descriptively for the safety analysis set. The number and percentage of subjects testing positive for ADAs, their titers, and the percentage of subjects with NAbs will be tabulated by actual treatment, scheduled visit, and overall when applicable.
Log-transformed PK primary endpoints (e.g., AUC0-inf, AUC0-last, and Cmax) will usually be analyzed on the PK Analysis Set using an ANOVA model with treatment and stratification factors as fixed effects. For the comparison of primary endpoints, the 90% confidence intervals (CIs) for the GMR will be derived by exponentiating the 90% CI obtained for the difference between the two treatments’ LS means resulting from the analysis of the log-transformed PK primary endpoints. If the 90% CIs for the GMR of all PK primary endpoints fall entirely within the 0.8 to 1.25 equivalence margins, PK equivalence between the two treatments will be declared. Other PK endpoints will usually be summarized by treatment.
The analysis approach will depend on the data type for the primary endpoint, i.e., binary, continuous, or time to event. In all cases, the point estimate of the treatment effect, whether a difference or ratio, and the associated 90% CI (for the FDA) and 95% CI (for the EMA) will be calculated. An adjustment for multiplicity in the analyses for the two main regulators is not required, since each regulator is uninterested in the result as presented to the other, i.e., multiple testing is not being conducted. If the calculated CI falls entirely within the defined equivalence or similarity margins, the candidate biosimilar is considered equivalent to the reference biologic. For example, in comparing the biomarker responses of two treatments based on CTX and P1NP, the area under the effect curve (AUEC0-wkxx) for percentage change from baseline of CTX and P1NP was analyzed using an analysis of covariance with log(AUEC0-wkxx) as the dependent variable, with treatment and stratification factors as fixed effects, and log(baseline serum CTX or P1NP concentration) as a covariate. The 90% CI for the GMR of AUEC for percentage change from baseline of CTX and P1NP was derived by exponentiating the 90% CI obtained for the difference between the two treatments’ LS means.
Most clinical pharmacology studies are considered exploratory rather than confirmatory, and as such, do not strictly require the definition of estimands. However, bioequivalence studies are considered confirmatory, and possible intercurrent events (ICEs) and strategies for handling these in the statistical analysis should be considered and detailed (ICH E9 (R1)). For example, potential intercurrent events which lead to subjects being excluded from the PK analysis set might be handled using a principal stratum strategy, and these subjects will not be included in the primary analysis. There is significant discussion about how ICEs in clinical equivalence studies should be handled, given the concerns that an analysis based on the intention-to-treat (ITT) analysis set will be more likely to conclude equivalence, i.e., is not the conservative approach. As with other types of studies, the indication, frequency of administration, and the endpoint all need to be considered when determining the appropriate strategy. In some circumstances, the regulators may require the primary analysis to be based on a per protocol (PP) analysis set, while in others, an analysis based on the ITT or full analysis set (FAS) may be required, with appropriate ICE strategies. Clear justification and early discussions with the regulators are advised.
ICH E9 (R1) addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials, 2020
Julious SA, McIntyre NE. Sample sizes for trials involving multiple correlated must-win comparisons. Pharm Stat. 2012 Mar-Apr;11(2):177-85. doi: 10.1002/pst.515. Epub 2012 Mar 1. PMID: 22383136.
Biosimilars Market by Drug Class (Monoclonal Antibodies (adalimumab, infliximab, rituximab, trastuzumab), Insulin, Erythropoietin, Anticoagulants, RGH), Indication, Region – Global Forecast to 2028. Biosimilars Market. Jun 2023, Report Code: PH7582
Chow, S., & Liu, J. (2008). Design and Analysis of Bioavailability and Bioequivalence Studies (3rd ed.). Chapman and Hall/CRC.
Hauschke, D., Steinijans, V., & Pigeot, I. (2007). Bioequivalence Studies in Drug Development: Methods and Applications. Wiley.
Niazi, S. K. (2007). Handbook of Bioequivalence Testing. CRC Press.
Chow, S. (2014). Bioavailability and bioequivalence in drug development. Wiley Interdisciplinary Reviews: Computational Statistics, 6(4), 304–312. https://doi.org/10.1002/wics.1310
Open Resources for Nursing (Open RN); Ernstmeyer K, Christman E (Eds.). (2023). Nursing Pharmacology (2nd ed.). Eau Claire (WI): Chippewa Valley Technical College. Chapter 1: Pharmacokinetics & Pharmacodynamics. Available from: https://www.ncbi.nlm.nih.gov/books/NBK595006/
Schütz, H. A short history of bioequivalence. Retrieved from: https://bebac.at/articles/A-Short-History-of-Bioequivalence.phtml#Introduction
Schütz, H. Sample size estimation for equivalence studies in a replicate design. Retrieved from: https://bebac.at/articles/Sample-Size-Estimation-for-Equivalence-Studies-in-a-Replicate-Design.phtml
U.S. Food and Drug Administration (FDA), Center for Drug Evaluation and Research (CDER). (n.d.). Bioequivalence Studies With Pharmacokinetic Endpoints for Drugs Submitted Under an ANDA: Guidance for Industry. Retrieved from: https://www.fda.gov/media/87219/download
Center for Drug Evaluation and Research (CDER), FDA. Helpful Webinars and Other Resources for Generic Drug Manufacturers. Retrieved from: https://www.fda.gov/drugs/abbreviated-new-drug-application-anda/helpful-webinars-and-other-resources-generic-drug-manufacturers
European Medicines Agency (EMA). (n.d.). Guideline on the Investigation of Bioequivalence. Retrieved from: https://www.ema.europa.eu/en/documents/scientific-guideline/guideline-investigation-bioequivalence-rev1_en.pdf
International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). (2024). Bioequivalence for Immediate Release Solid Oral Dosage Forms M13A. Retrieved from: https://database.ich.org/sites/default/files/ICH_M13A_Step4_Final_Guideline_2024_0723.pdf
Saville, D. (2007). Bioequivalence. Best Practice Journal, March 2007. Retrieved from: https://bpac.org.nz/bpj/2007/march/bioequiv.aspx
Chow, S., Shao, J., & Wang, H. (2007). Sample Size Calculations in Clinical Research. Chapman and Hall/CRC.
Senn, S. S. (1993). Cross-over Trials in Clinical Research. Wiley.
Jones, B., & Kenward, M. G. (1989). Design and Analysis of Cross-Over Trials. Chapman and Hall/CRC.
Fenta, H. M. (2014). Determination of sample size for two-stage sequential designs in bioequivalence studies under 2x2 crossover design. Science Journal of Clinical Medicine, 3(5), 82. https://doi.org/10.11648/j.sjcm.20140305.12
Radicioni, M., Castiglioni, C., Giori, A., Cupone, I., Frangione, V., & Rovati, S. (2017). Bioequivalence study of a new sildenafil 100 mg orodispersible film compared to the conventional film-coated 100 mg tablet administered to healthy male volunteers. Drug Design, Development and Therapy, 11, 1183–1192. https://doi.org/10.2147/dddt.s124034
Owen, D. B. (1965). A special case of a bivariate non-central t-distribution. Biometrika, 52(3/4), 437. https://doi.org/10.2307/2333696
Schütz, H. Sample size estimation for equivalence studies in higher-order crossover designs. Retrieved from: https://bebac.at/articles/Sample-Size-Estimation-for-Equivalence-Studies-in-Higher-Order-Crossover-Designs.phtml
Comprehensive R Archive Network (CRAN). Power and sample size for (bio)equivalence studies [R package PowerTOST version 1.5-6]. Retrieved from: https://cran.r-project.org/package=PowerTOST