1 Introduction

1.1 Estimands and Missing Data

Missing data in clinical studies is a major concern because it can compromise the validity, reliability, and generalizability of the study’s findings. In clinical trials, data is collected at various points to monitor the effects of interventions, track patient progress, and assess outcomes. However, missing data can occur for various reasons, and if not handled properly, it can lead to biased results and flawed conclusions.

Missing data may result from an intercurrent event (ICE) or simply from a missed assessment. An intercurrent event refers to an event that happens after the treatment has started and affects either the collection of data or the interpretation of the treatment’s effect. For example, a patient might stop participating in the study due to adverse effects or an unrelated medical condition that prevents further follow-up. These events can complicate the interpretation of the data because they reflect disruptions that were not anticipated at the start of the trial. In contrast, a missed assessment refers to situations where data collection fails at a particular visit or time point. This could be due to logistical issues, scheduling conflicts, or patient non-compliance, and while it may not directly affect how the treatment’s effect is interpreted, it still results in incomplete data.

Baseline data in clinical trials is usually complete. This is the data collected at the start of the study before any intervention begins. Baseline measurements are critical because they provide the reference point for understanding how patients change over time. Since baseline data is essential for trial validity, efforts are made to ensure that this data is thoroughly collected from all participants. However, as the study progresses, subsequent visits may have various missed assessments. These missing data points are more likely to occur during follow-up visits or data collection stages later in the study, particularly in long-term trials or studies involving frequent visits or tests.

The potential impact of missing data on the interpretation of study findings is significant and needs to be carefully considered when designing the study protocol. If not managed properly, missing data can introduce bias into the study. This occurs when the missing data is not random but instead is related to specific factors that could affect the outcome. For example, if participants who drop out of the study tend to have worse health outcomes, then the study results may overestimate the treatment’s effectiveness. Another issue caused by missing data is the loss of statistical power, which happens when fewer data points are available for analysis. This makes it more difficult to detect real differences between treatment groups, increasing the likelihood of false conclusions.

At the time of protocol development, researchers need to plan how to handle missing data and potential missing outcomes. Several strategies can be applied, including efforts to prevent missing data by ensuring robust follow-up processes or implementing strategies like reminder systems for patients. Additionally, statistical techniques such as imputation can be used to estimate missing values, or more advanced models that account for missing data can be employed. Sensitivity analysis, which tests how different assumptions about the missing data might affect the study’s results, is another important consideration. This ensures that the final conclusions are robust to various scenarios of missing data.

One of the first steps in managing missing data is to identify the cause of the missing data. This can involve understanding whether the missing data is Missing Completely at Random (MCAR), Missing at Random (MAR), or Missing Not at Random (MNAR). If data is MCAR, it means that the missing data is unrelated to any of the variables in the study, making the issue less concerning because it doesn’t introduce bias. MAR means that the missing data is related to observed variables, such as younger patients being more likely to miss follow-up visits. MNAR means that the missing data is related to unobserved data, such as patients with worse outcomes dropping out of the trial.

When designing a study, it is important to clearly define the research questions of interest and how missing data may affect these questions. In clinical studies, estimands are used to define the quantity being estimated and answer the research questions. Estimands provide a framework that helps define the effect of the treatment or intervention under study. Handling missing data will depend on how the estimands are defined. For example, if the goal is to estimate the effect of the treatment assuming full adherence, missing data from participants who discontinue treatment could pose a problem and would need to be appropriately accounted for. Alternatively, if the estimand reflects real-world use of the treatment, data from participants who drop out might be considered part of the natural course of the study and handled differently.

In conclusion, missing data is a complex issue that requires careful planning and consideration in clinical studies. The source of missing data, whether due to intercurrent events or missed assessments, needs to be clearly understood, and strategies for handling missing data should be built into the study design from the beginning. Identifying the causes of missing data, understanding the impact on study results, and aligning with the research objectives and estimands are essential for ensuring valid and interpretable findings in clinical research.

1.2 Estimand, Estimators and Estimation

1. Trial Objective to Estimand Flow
  • Trial Objective: This is the primary goal or question that the trial seeks to answer. It’s the starting point for defining what the trial will focus on.
  • Estimand: A precise definition of what is to be estimated in the trial. It translates the trial objective into a statistically quantifiable entity. It is essentially what the trial aims to measure or the specific effect that the trial intends to estimate.
  • Main Estimator: The statistical methodology or approach used to estimate the estimand. This could be a specific statistical model or analysis technique.
  • Main Estimate: The actual numerical estimate obtained from applying the main estimator to the trial data. This is the result that addresses the trial objective.
  • Sensitivity Estimators and Estimates: Additional analyses conducted to assess the robustness of the main estimate against different assumptions or conditions. These help validate the reliability of the findings under various scenarios.

2. Description of an Estimand: Attributes
  • Treatment: Identification of both the treatment of interest and the comparator (control treatment). It specifies what is being tested and against what it is being compared.
  • Variable: The primary measure or outcome variable that the study aims to evaluate. This could be a clinical metric, patient-reported outcome, or a biomarker.
  • Population: The specific group of patients targeted by the scientific question. This defines who the trial results will apply to.
  • Population-Level Summary: How the results are summarized across the entire study population, such as differences between means, ratios of proportions, or hazard ratios. This summarizes the treatment effect at the group level.

3. Intercurrent Events and Strategies to Address Them
  • Intercurrent events are occurrences during the trial that could affect the interpretation of the treatment effect, such as participants taking additional medication.
  • Treatment Policy: Considers the effect of the treatment as it is used in practice, including any additional medications taken.
  • Composite: Defines the endpoint to include whether or not intercurrent events occur, considering events like additional medication intake as part of the treatment assessment.
  • Hypothetical: Estimates what the treatment effect would have been if the intercurrent event had not occurred.
  • Principal Stratum: Focuses on a subset of participants who would not experience the intercurrent event, assessing the effect in this more controlled scenario.
  • While on Treatment: Looks at the treatment effect only before any intercurrent event occurs, isolating the effect of the treatment itself.

1.3 Causal Estimands

Causal estimands are a critical concept in the field of statistics, particularly when it comes to understanding the effect of interventions in clinical trials and observational studies. They are designed to estimate the impact of a treatment by considering what the outcome would have been under different treatment conditions.

  1. Concept of Causal Estimands

Causal estimands are aimed at answering “what if” questions in a formal, quantitative way. They focus on understanding the effect of a specific treatment by asking how the outcomes would differ if the treatment were applied versus not applied, or if an alternative treatment were used. This approach aligns with causal inference, which seeks to infer the cause-and-effect relationship from data.

  2. Framework of Potential Outcomes

The potential outcomes framework is fundamental to causal inference and was originally formalized by Donald Rubin. It considers every subject in a study to have potential outcomes under each treatment condition. For example:
  • \(Y(1)\): Outcome if the subject receives the treatment.
  • \(Y(0)\): Outcome if the subject does not receive the treatment.

These potential outcomes help to define the causal effect of the treatment, which cannot be observed directly since we can only observe one of these outcomes for each individual — the one corresponding to the treatment they actually received.

  3. Causal Estimand Formula

The basic causal estimand in a randomized controlled trial (RCT) can be expressed as \(E(Y(1)) - E(Y(0))\). This represents the expected difference in outcomes between subjects assigned to the treatment versus those assigned to the control, and this difference is what statisticians aim to estimate through the trial.
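To make this concrete, here is a minimal sketch (using simulated data, with all effect sizes and variable names chosen purely for illustration) showing how the estimand \(E(Y(1)) - E(Y(0))\) relates to the difference in arm means that an RCT actually estimates:

```python
# Minimal sketch: the causal estimand E[Y(1)] - E[Y(0)] in a simulated RCT.
# All effect sizes and variable names are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(seed=1)
n = 10_000

# Potential outcomes for every subject (only one is ever observed in practice).
y0 = rng.normal(loc=140.0, scale=10.0, size=n)      # outcome without treatment
y1 = y0 + rng.normal(loc=-8.0, scale=5.0, size=n)   # outcome with treatment

# Randomization assigns each subject to exactly one arm.
treated = rng.integers(0, 2, size=n).astype(bool)
y_obs = np.where(treated, y1, y0)

true_estimand = np.mean(y1 - y0)                            # knowable only in simulation
estimate = y_obs[treated].mean() - y_obs[~treated].mean()   # difference in arm means

print(f"true E[Y(1)] - E[Y(0)]  : {true_estimand:.2f}")
print(f"difference in arm means : {estimate:.2f}")
```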

  4. Challenges in Observational Studies

In observational studies, where treatments are not randomly assigned, estimating causal effects becomes more complex due to potential confounding factors. Here, additional models and assumptions about how treatments are assigned to patients (assignment models) and how outcomes are generated (outcome models) are necessary. These models help to adjust for factors that may influence both the treatment assignment and the outcomes.

  5. International Council for Harmonization (ICH) and Causal Estimands

The ICH guidelines emphasize the importance of causal estimands in clinical trials, suggesting that the trials should be designed to answer specific causal questions. Even though the term “causal” is not explicitly used, the guidelines align with causal reasoning principles to ensure that the results of clinical trials are robust and interpretable in terms of causal effects.

  6. Statistical Inference for Causal Estimands

Statistical methods are employed to estimate causal estimands from the observed data. In RCTs, this often involves comparing the observed outcomes between the treatment and control groups, leveraging the randomization to argue that these groups are comparable. In non-randomized studies, more sophisticated statistical techniques, such as instrumental variable analysis, propensity score matching, or structural models, are required.
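As a hedged illustration of why assignment models matter outside randomization, the sketch below simulates a single confounder and compares a naive group difference with an inverse-probability-of-treatment-weighted (IPTW) estimate; the data-generating model, the variable names, and the use of scikit-learn are assumptions made purely for illustration, not methods prescribed by this document.

```python
# Sketch: IPTW with an estimated propensity score in a simulated observational study.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(seed=2)
n = 20_000

age = rng.normal(60, 10, size=n)                     # confounder
p_treat = 1 / (1 + np.exp(-(age - 60) / 10))         # older patients treated more often
treated = rng.random(n) < p_treat
outcome = 0.5 * age - 5.0 * treated + rng.normal(0, 5, size=n)   # true effect = -5

# Naive comparison is confounded by age.
naive = outcome[treated].mean() - outcome[~treated].mean()

# Assignment model: estimate the propensity score, then weight by 1 / P(A = a | X).
ps = LogisticRegression().fit(age.reshape(-1, 1), treated).predict_proba(age.reshape(-1, 1))[:, 1]
weights = np.where(treated, 1 / ps, 1 / (1 - ps))
iptw = (np.average(outcome[treated], weights=weights[treated])
        - np.average(outcome[~treated], weights=weights[~treated]))

print(f"naive difference: {naive:.2f}")
print(f"IPTW estimate:    {iptw:.2f}   (true effect is -5)")
```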

  7. Importance for Regulatory Authorities

Regulatory authorities, such as the FDA or EMA, are particularly interested in causal estimands because they provide a clear basis for regulatory decisions regarding drug approvals. By focusing on causal estimands, regulators can better understand the true effect of a drug, independent of confounding treatments or patient characteristics.

1.4 Missing data vs intercurrent event

Intercurrent events are incidents that occur after the initiation of treatment and may impact the interpretation of the trial outcomes or affect the continuity of the measurements related to the clinical question of interest. These can include events such as patients starting additional therapies, experiencing side effects leading to treatment discontinuation, or any other circumstances that alter the course of standard treatment administration.

Missing data refers to information that was intended to be collected but was not, for various reasons such as patients dropping out of the study, missing visits, or failure to record certain outcomes. It’s important to distinguish between data that is missing because it was not collected (but could have been under different circumstances) and data that is considered not meaningful due to an intercurrent event.

Handling Strategies for Intercurrent Events and Missing Data

1. Treatment Policy Strategy:
  • Approach: Includes all data up to and possibly including the intercurrent event, considering the measurements of interest regardless of subsequent therapies or changes.
  • Missing Data Issue: Here, the missing data problem may arise when assuming values for unobserved outcomes based on the observed data. For example, if patients are lost to follow-up but are assumed to continue on their projected path without treatment changes.

2. Hypothetical Strategy:
  • Approach: Assumes a scenario where the intercurrent event, such as treatment discontinuation, does not occur.
  • Missing Data Issue: Focuses on hypothetical data. For instance, it would consider the data that would have been observed at follow-up visits had the patient not been lost to follow-up, imagining the patient remained in the trial under the initial treatment conditions.

3. Composite Strategy:
  • Approach: Combines multiple elements or outcomes into a single variable that incorporates the intercurrent event as part of the variable of interest.
  • Missing Data Issue: Typically, there is no missing data concern under this strategy as the intercurrent events are accounted for within the composite outcome measure.

4. While-on-Treatment Strategy:
  • Approach: Analyzes the response to treatment only up to the point of an intercurrent event.
  • Missing Data Issue: There is generally no missing data because the analysis only includes data collected while the patients were on treatment and before any intercurrent event.

5. Principal Stratum Strategy:
  • Approach: Focuses on specific subgroups (strata) that are not affected by the intercurrent events, based on their potential outcomes under different treatment scenarios.
  • Missing Data Issue: This strategy avoids missing data issues by defining the population such that the intercurrent event is not considered relevant for the stratum of interest. It inherently excludes patients from the analysis if they are outside the target strata.

1.5 Sensitivity versus Supplementary Analysis

Sensitivity Analysis

  • Purpose: Sensitivity analysis is performed to assess the robustness of the conclusions derived from the main analysis of a clinical trial. It involves testing how the main findings hold up under various assumptions or variations in the analysis model.
  • Process: This type of analysis typically involves using multiple sensitivity estimators that deviate in specific ways from the main estimator’s assumptions. For instance, sensitivity analyses may involve changing assumptions about the distribution of the data, the model used, or handling of missing data.
  • Objective: The goal is to explore the extent to which the main findings are dependent on the assumptions made in the statistical modeling. This is crucial for verifying that the conclusions are not unduly influenced by these assumptions and therefore can be considered reliable under a variety of scenarios.

Supplementary Analysis

  • Purpose: Supplementary analysis goes beyond sensitivity analysis to further explore and understand the treatment effects. These analyses are usually more exploratory in nature and are often conducted to address additional research questions or hypotheses that were not the primary focus of the main analysis.
  • Process: This could include additional analyses as requested by regulatory authorities, or analyses planned after reviewing initial findings to probe deeper into specific areas of interest. Supplementary analyses may investigate different subgroups of patients, additional endpoints, or longer-term outcomes that were not part of the original estimand.
  • Objective: The main aim is to provide a broader understanding of the data and treatment effects. This might involve confirming the findings from the main analysis, exploring areas where the main analysis was inconclusive, or generating new insights that could lead to further research questions.

Key Differences and Interplay

  • Scope: Sensitivity analysis is more focused on testing the stability and reliability of the results under different assumptions directly linked to the main outcome of interest defined by the estimand. In contrast, supplementary analysis often has a broader scope, potentially addressing new or secondary questions that extend beyond the original estimand.
  • Outcome Dependency: Sensitivity analyses are inherently tied to the outcomes of the main estimator and focus on the dependencies and variabilities around these outcomes. Supplementary analyses, however, might explore entirely new outcomes or expand on the findings of the main analysis in ways that provide additional context or insights.
  • Regulatory Impact: Sensitivity analyses are critical for regulatory review, providing evidence that the study findings are robust and not unduly influenced by specific assumptions. Supplementary analyses, while informative, may not always be crucial for regulatory approval but can be important for labeling, post-marketing commitments, or future clinical development strategies.

1.6 Disease Specific Guideline

Both the European Medicines Agency (EMA) and the U.S. Food and Drug Administration (FDA) include estimands in their disease-specific guidance documents.

  1. EMA Guidance:
    • Diseases Covered: The EMA has incorporated the use of estimands in the guidelines for several conditions and areas, including:
      • Diabetes
      • Alzheimer’s disease
      • Acute kidney injury
      • Chronic non-infectious liver diseases
      • Epileptic disorders
      • Medicinal products with genetically modified cells
      • Registry-based studies
  2. FDA Guidance:
    • Diseases Covered: The FDA also specifies the use of estimands in their guidance for diseases such as:
      • Eosinophilic esophagitis
      • Acute myeloid leukemia
      • Chronic rhinosinusitis with nasal polyps

2 Case Study Continuous Endocrine Condition

2.1 Study Details

Case Study Overview:

This case study provides guidance on the definition and analysis of primary and key secondary estimands, choice of sensitivity analyses, and handling of missing data. It focuses on a Phase 3 randomized, placebo-controlled study in patients with an endocrine condition. The study spans a 24-week period and includes two continuous endpoints.

Study Design:
  • Study Type: Phase 3 randomized, placebo-controlled study
  • Population: Patients with an endocrine condition
  • Endpoints: Two continuous endpoints measured over a 24-week period
    • Co-primary endpoint 1: Blood pressure control (change in average systolic blood pressure, SBP)
    • Co-primary endpoint 2: Glycemic control (change in HbA1c)

Note: Estimands are only defined for primary and key secondary endpoints.


Primary Objectives: The primary objective of the study is to evaluate the effect of the test drug (Drug X) compared to placebo in terms of:
  1. Blood Pressure Control: Patients with uncontrolled hypertension
  2. Glycemic Control: Patients with uncontrolled blood sugar


Co-primary Endpoints:

  1. Estimand 1: Blood Pressure Control
    • Treatment: Drug X vs. Placebo
    • Target Population: Patients with the endocrine disorder who have uncontrolled hypertension.
    • Variable: Change in average systolic blood pressure (SBP) based on 24-hour ABPM from Baseline to Week 24.
    Intercurrent Events and Strategy:
    • Treatment discontinuation due to lack of efficacy: Handled using the treatment policy strategy.
    • Treatment discontinuation due to safety concerns: Handled using the treatment policy strategy.
    • Treatment discontinuation due to other reasons: Handled using the treatment policy strategy.
    • Use of rescue medication for blood pressure control: Handled using the treatment policy strategy.
    • No dose of study treatment received: Handled using the treatment policy strategy.
    • Death: Handled using the “while alive” strategy.
    Population Level Summary:
    • Analysis Method: Difference in Least Square Means (LSM) of change from Baseline to Week 24 in average SBP based on 24-hour ABPM between Drug X and Placebo.
    Text Summary: The study will evaluate the effect of Drug X on blood pressure control compared to Placebo. This evaluation will consider participants while they are alive, regardless of treatment discontinuation due to lack of efficacy or safety, failure to receive a dose of study treatment, or the use of rescue medication for blood pressure control.

  2. Estimand 2: Glycemic Control
    • Treatment: Drug X vs. Placebo (as above)
    • Target Population: Patients with the endocrine disorder who have uncontrolled blood sugar.
    • Variable: Change in HbA1c from Baseline to Week 24.
    Intercurrent Events and Strategy:
    • Treatment discontinuation due to lack of efficacy: Handled using the treatment policy strategy.
    • Treatment discontinuation due to safety concerns: Handled using the treatment policy strategy.
    • Treatment discontinuation due to other reasons: Handled using the treatment policy strategy.
    • Use of rescue medication for glycemic control: Handled using the treatment policy strategy.
    • No dose of study treatment received: Handled using the treatment policy strategy.
    • Death: Handled using the “while alive” strategy.
    Population Level Summary:
    • Analysis Method: Difference in Least Square Means (LSM) of change from Baseline to Week 24 in HbA1c between Drug X and Placebo.
    Text Summary: The study will evaluate the effect of Drug X on glycemic control compared to Placebo. This evaluation will consider participants while they are alive, regardless of treatment discontinuation due to lack of efficacy or safety, failure to receive a dose of study treatment, or the use of rescue medication for glycemic control.

2.2 Other Points Considered

Is this related to the eligibility criteria, the target product profile or the analysis population?

The target population in clinical trials is a key concept, often tied to the study’s eligibility criteria, the target product profile, and the analysis population. According to regulatory guidelines such as the ICH E9 addendum, the target population is generally described as the study population. However, defining the target population for an estimand based solely on eligibility criteria can sometimes present difficulties.

For example, in this case study, the target population is defined as patients with the condition of interest (an endocrine disorder) along with associated health issues (uncontrolled hypertension or uncontrolled blood sugar). In some cases, it might be important to include additional details, such as belonging to a specific subgroup that would be relevant to the analysis and to which the study results would directly apply.

It’s important to note that the target population should not be defined in terms of the analysis population. Instead, the analysis population should be defined based on the target population, and the estimands should be established accordingly.

Handling Participants Who Were Randomized but Did Not Meet Eligibility Criteria:

Regulators typically expect that participants who were randomized, even if they did not meet all eligibility criteria, be included in the analysis population. This inclusion is particularly important when the study uses a treatment policy strategy, which implies that the effects of the treatment should be evaluated across all patients randomized, regardless of deviations from protocol or treatment adherence. This strategy assumes that all randomized participants at least met the basic condition requirements, such as having the endocrine disorder and either uncontrolled hypertension or uncontrolled blood sugar.

In certain circumstances, exclusion of ineligible participants from the analysis population may be warranted based on specific scientific questions. However, such exclusions must be clearly justified, and regulatory authorities should be consulted for agreement on the approach. For this specific case study, there is an additional risk of randomization errors since the trial involves two patient groups that may overlap. This overlap should be considered during the protocol’s risk assessment, and appropriate mitigation steps should be planned to handle potential issues.

Handling Participants Who Did Not Receive Study Treatment

Similarly, regulators generally expect that participants who were randomized but did not receive study treatment are still included in the analysis population, particularly when the treatment policy strategy is applied. The treatment policy strategy considers the treatment effect for all randomized participants, regardless of whether they completed the intervention according to protocol. In cases where excluding such participants is scientifically necessary, the justification should be clearly stated, and agreement sought with the regulators.

In summary, the target population is critical for defining the scope and relevance of a clinical study. It is important to align the analysis population with the estimand definition based on the study’s goals and regulatory expectations. Both the inclusion of ineligible or untreated participants in the analysis population and any deviations from this general rule need to be justified within the study’s framework and properly discussed with regulatory authorities.

Data Collection to Align with Intercurrent Event (ICE) Strategy

In clinical trials, it is crucial to collect adequate and accurate data regarding Intercurrent Events (ICEs) to properly implement the predefined strategies for handling such events. These events, which occur after treatment initiation and may influence the interpretation of treatment effects, need to be well-documented so that they can be incorporated into the analysis according to the chosen strategy (e.g., treatment policy strategy). Below are the key aspects to consider:

What data do we need to collect regarding the ICE?

To align data collection with the ICE strategy, the following information is essential:

  1. Identification of ICE:
    • Clearly document whether an ICE has occurred, based on the protocol definition. ICEs may include treatment discontinuation due to lack of efficacy, adverse events, or the use of rescue medication.
    • Investigators should have a comprehensive list of potential ICEs that could occur during the trial and precise criteria for identifying these events.
  2. Detailed Documentation:
    • Record the specific type of ICE (e.g., treatment discontinuation due to lack of efficacy, safety concerns, or the use of rescue medication).
    • Collect the date and time of the ICE occurrence to determine when the event took place in relation to the study timeline.
    • Document the reason for the ICE, including whether it was due to clinical judgment, participant choice, or another cause.
  3. Contextual Information:
    • Gather information about the participant’s status at the time of the ICE (e.g., clinical signs, symptoms, or any ongoing adverse events).
    • If relevant, capture information on any actions taken as a result of the ICE (e.g., whether rescue medication was administered or alternative treatments were introduced).

Can Lack of Efficacy Be Determined?

To determine lack of efficacy, investigators must collect data that reflects treatment outcomes in relation to the primary endpoints (e.g., blood pressure or HbA1c levels). The key elements to capture include:

  • Treatment Response Data: Track whether the patient is responding to the treatment as expected over time. This includes regular measurements of the primary endpoints (e.g., blood pressure and glycemic control).
  • Follow-up Assessments: Continue collecting endpoint data even after an ICE, especially if the treatment policy strategy is in place, to assess whether lack of efficacy might be contributing to the event.
  • Clinical Judgment: Ensure that investigators record any clinical decisions or observations that indicate lack of efficacy (e.g., worsening of the condition, or failure to achieve desired treatment thresholds).

Operational and Data Collection Issues for the Treatment Policy Strategy

When applying the treatment policy strategy, there are several operational and data collection challenges to consider:

  1. Continued Data Collection After ICE:
    • Protocol Clarity: It should be explicitly stated in the protocol that data collection must continue even after an ICE occurs. This is crucial for trials using the treatment policy strategy, which evaluates treatment effects regardless of intercurrent events.
    • Investigator Awareness: Investigators need to understand that post-ICE data collection is necessary and that stopping data collection after an ICE would violate the treatment policy strategy’s intent.
  2. Data Collection Burden:
    • Ensure that post-ICE follow-up assessments are practical for both investigators and participants. There may be logistical issues if participants discontinue the trial drug or shift to other treatments.
    • Systems should be in place to ensure that participants remain engaged and continue to provide outcome data, even if they are no longer receiving the study treatment.
  3. Maintaining Data Integrity:
    • Consistency: Data collection tools should be designed to ensure consistency in how ICEs and subsequent events are recorded.
    • Data Completeness: Investigators should be encouraged to collect complete data sets following an ICE, including all relevant outcomes, adverse events, and clinical interventions.
  4. Adjusting Analysis Plans:
    • ICEs introduce complexities in data analysis, particularly in treatment policy strategies. Proper documentation will allow the correct handling of such events during statistical analysis, ensuring that the estimands reflect the treatment effect despite the ICE.

ICE of Death

When considering death as an Intercurrent Event (ICE), even in studies where the number of deaths is anticipated to be small, it is essential to determine how these cases will be handled both in the study protocol and in the statistical analysis. Death, in many studies, impacts outcome availability and thus must often be treated as an ICE. However, when deaths are expected to be very rare or unimportant to the primary objectives of the study, there can be a decision not to treat death as an ICE, but this requires careful planning.

Is it Necessary to Consider Death as an ICE?

While death is always a potential occurrence in any study, whether or not to explicitly include it as an ICE depends on the study’s context and design. For indications where only a small number of deaths are anticipated, sponsors may opt not to treat death as an ICE, especially if the focus of the trial is on treatment efficacy rather than survival or safety. However, in these cases, it’s important to establish clear guidelines for how such participants are handled in the analysis. Simply ignoring deaths can lead to biased results if not addressed appropriately.

If death is not considered an ICE, it still impacts data availability (since no further outcome data can be collected for the deceased participants). Therefore, even in situations where death is infrequent, strategies for handling deaths in the analysis must be well thought out.

What Strategies Can Be Considered for Death as an ICE?

  1. Treatment Policy Strategy (Not Applicable): The treatment policy strategy generally assumes that data is collected and considered regardless of intercurrent events. However, in the case of death, this strategy becomes impractical because data cannot be collected post-mortem. As a result, applying the treatment policy strategy is not feasible for handling death as an ICE.

  2. Hypothetical Strategy (Not Recommended): The hypothetical strategy involves asking “what would have happened if the ICE had not occurred?” While this approach can be useful for some types of intercurrent events, it is generally considered irrelevant and not useful for death. In most cases, simulating a hypothetical scenario in which a deceased participant survives does not provide meaningful or practical insights for a clinical study, especially when the death is related to factors beyond the study’s primary objectives.

  3. Composite Endpoint Strategy (Problematic for Continuous Endpoints): In some cases, death is combined with other outcomes in a composite endpoint, where death and other serious events are considered together in one combined metric. However, this approach is problematic for continuous endpoints, such as blood pressure or glycemic control, because it is not feasible to incorporate death into the measurement of these types of outcomes. Continuous endpoints measure a change over time, which cannot be extended beyond the point of death. Defining a composite endpoint in this case is not practical and may lead to statistical complications.

  4. While Alive Strategy (Suggested Approach): The most appropriate strategy when dealing with death as an ICE for continuous endpoints is the while alive strategy. This approach excludes participants from the analysis set following their death, as it is no longer possible to measure their outcomes beyond that point.

    The “while alive” strategy essentially censors data after death, ensuring that only data collected while the participant was alive is considered. This strategy avoids the pitfalls of trying to analyze data beyond the point of death and allows for a realistic assessment of the treatment’s effect up to that point.

    • Key Considerations for the While Alive Strategy:
      • Exclude participants from analysis following their death while keeping their data prior to death.
      • Ensure proper documentation and explanation in the statistical analysis plan to avoid bias and misinterpretation of the results.
      • Provide transparent reporting on the number of deaths and how they were handled in the analysis to maintain the study’s credibility.

Study Withdrawal

In clinical trials, study withdrawal refers to situations where a participant withdraws from the study entirely, either due to personal choice, administrative reasons, or other non-treatment-related circumstances. According to the ICH E9 addendum, study withdrawal is not considered an Intercurrent Event (ICE). The addendum makes a clear distinction between intercurrent events, which affect the interpretation of the treatment effect, and other events, such as study withdrawal, which result in missing data.

Study Withdrawal vs. Intercurrent Events (ICE)

The ICH E9 addendum emphasizes that intercurrent events are to be handled by defining the estimand in a way that reflects the precise trial objective. These events are closely related to the treatment or study context, such as discontinuation due to lack of efficacy, adverse events, or the use of rescue medication. In contrast, study withdrawal results in missing data but does not directly impact the interpretation of the treatment’s effect because it is not linked to the treatment assigned.

When a participant withdraws from the study:
  • The outcome of interest remains relevant, but it becomes unobserved due to the participant no longer being part of the study.
  • The withdrawal is not related to the efficacy or safety of the treatment, so it does not affect the estimand.
  • Instead, study withdrawal introduces missing data, which should be managed during the statistical analysis, typically using methods for handling missing data such as imputation, last observation carried forward (LOCF), or other appropriate techniques based on the nature of the missing data.

Handling Study Withdrawal in the Analysis

Since study withdrawal leads to missing data, it should be addressed using appropriate statistical techniques. The ICH E9 addendum suggests that while a participant may withdraw from the study, the outcome of interest still exists in principle, even though it is not observed. Thus, the trial should handle this missing data in a way that preserves the integrity of the analysis without introducing bias. Strategies for addressing missing data include:
  • Multiple Imputation: Estimating the missing outcomes based on observed data to create a complete dataset for analysis.
  • Mixed Models: Incorporating available data up to the point of withdrawal and accounting for the fact that some data is missing.
  • Sensitivity Analyses: Testing how different assumptions about the missing data (e.g., assuming the data is missing at random vs. not at random) may influence the study’s conclusions.

Keeping Participants in the Study Despite Treatment Discontinuation

The ICH E9 addendum also stresses that discontinuing study treatment should not result in study withdrawal. Participants who stop the treatment due to personal reasons, adverse effects, or lack of efficacy should continue to be followed and remain in the study. Their data should still be collected and used in the analysis. This is critical for maintaining the integrity of the study and ensuring that the treatment’s effect is accurately assessed, even in those who discontinue treatment.

In cases where study withdrawal is unavoidable, the focus shifts to managing the resulting missing data rather than interpreting it as an ICE that alters the treatment effect.

2.3 Supplementary Estimands

  • The primary estimand takes a broad perspective by considering the effect of Drug X on blood pressure control across a variety of real-world scenarios (e.g., discontinuation, use of rescue medication). It provides an overall view of how the drug performs in practice, which is highly relevant to regulators and HTA.
  • Supplementary Estimand 1 narrows the focus by exploring the effect of Drug X without the availability of rescue medications, making it more relevant for healthcare systems with limited resources.
  • Supplementary Estimand 2 focuses on the ideal use of the treatment (while participants are on the study treatment and without the need for rescue medication), which is of interest to patients and clinicians who want to understand how the drug performs when taken as planned.
  • Supplementary Estimand 3 evaluates the practical use of the treatment, considering real-world factors such as patient-initiated treatment discontinuation, and is relevant to regulators and clinicians seeking to understand how the drug works in real-world scenarios.

Primary Estimand (Effect of Drug X on Blood Pressure Control)

  • Research Question and Relevance: The primary estimand focuses on the overall effect of Drug X on blood pressure control in comparison to a placebo. It considers participants while they are alive, regardless of whether they discontinued treatment due to lack of efficacy or safety, failed to receive the study treatment, or required rescue medication for blood pressure control.

    Relevance: This approach is particularly relevant to regulators and Health Technology Assessments (HTA), as it provides a broad assessment of how Drug X impacts blood pressure control under real-world conditions, where patients may discontinue the treatment or use rescue medication. It enables the assessment of Drug X’s practical effectiveness in a wide range of circumstances.


Supplementary Estimand 1 (Hypothetical Approach to Rescue Medication Unavailability)

  • Research Question and Relevance: The supplementary estimand examines the effect of Drug X on blood pressure control under the hypothetical scenario where rescue medication is not available. This differs from the primary estimand, where the availability and use of rescue medication are incorporated into the treatment policy strategy.

    Relevance: This approach is relevant to healthcare systems where rescue medication is not available as part of the potential treatment plan. It allows for an analysis of the drug’s effect in settings with limited resources or where the use of additional rescue medications is not part of standard care. By isolating the effect of Drug X without the intervention of rescue medications, this estimand addresses concerns specific to resource-limited environments, which may be of particular interest to healthcare planners and policymakers.


Supplementary Estimand 2 (Effect While on Study Treatment)

  • Research Question and Relevance: The second supplementary estimand focuses on the effect of Drug X on blood pressure control while the participant is still on the study treatment and without the use of rescue medication for blood pressure control. This estimand excludes data after treatment discontinuation, as it specifically looks at what happens while the treatment is taken as planned.

    Relevance: This estimand is particularly relevant when understanding the effect of treatment when taken as prescribed is of interest, which is a key question for both patients and clinicians. By only considering the data from the period when patients were still on the treatment, it gives insight into the efficacy of Drug X when patients adhere to the prescribed regimen. This helps answer questions about the effectiveness of the drug in ideal conditions, which can influence patient adherence strategies and clinical recommendations.


Supplementary Estimand 3 (Practical Use of Drug X in Real-World Settings)

  • Research Question and Relevance: The third supplementary estimand investigates the effect of Drug X on blood pressure control as compared with placebo, considering treatment discontinuation and rescue medication use. However, it is focused on the assessment of treatment in practical settings, where patients start the treatment voluntarily and make real-world decisions about their treatment adherence.

    Relevance: This estimand is relevant for evaluating the treatment effect in practice, particularly in real-world clinical settings where patients may start or stop treatment based on their own decisions or physician recommendations. This perspective is of interest to regulators, HTA, and clinicians because it evaluates how the drug performs when it is initiated and followed in practical scenarios. It aligns with the need to understand how treatments are used in less controlled environments, thus providing valuable data on the practical implementation of the treatment.

2.4 Statistical Analysis

2.4.1 Analysis Populations

The analysis populations for the co-primary endpoints are crucial to ensure that the study’s findings are both valid and relevant to the research objectives. For this study, two co-primary endpoints are being assessed: change in average systolic blood pressure (SBP) and change in HbA1c. The analysis populations for these endpoints are defined as follows:

  1. Change from Baseline in Average SBP:
    • The Intent to Treat SBP (ITTSBP) analysis set will be used. This analysis set includes all randomized participants who had uncontrolled blood pressure at the time of study screening, ensuring that all relevant subjects are included in the analysis.
    • The analysis will be conducted based on the randomized treatment arm, meaning participants will be analyzed according to the treatment group they were assigned to at the start of the study, regardless of any subsequent events (such as discontinuation or treatment changes).
    • This analysis population will be used for both the primary estimand and the supplementary estimands, ensuring consistency across the different strategies being applied to handle intercurrent events (ICEs).
  2. Change in HbA1c:
    • Similarly, the analysis population for the co-primary endpoint of change in HbA1c includes all randomized patients with poor glycemic control at baseline. The definition of the analysis population aligns with the endpoint being studied, ensuring that the analysis focuses on participants with the relevant health condition (poor glycemic control).
    • Like the SBP analysis, the randomized treatment arm will be the basis for the analysis, maintaining consistency with the intent-to-treat principle.

Supplementary Estimands 2 and 3

For supplementary estimands 2 and 3, it is acknowledged that data from some subjects may be excluded from the analysis. This is primarily due to the different ICE handling strategies applied in these estimands. For example:

  • In Supplementary Estimand 2, the focus is on participants while they are on the study treatment. This means that once a participant discontinues the treatment, their data may no longer be included in the analysis beyond that point.
  • In Supplementary Estimand 3, the aim is to evaluate the practical use of the treatment, which may exclude certain participants based on their adherence to the treatment regimen.

However, since the ICE strategies for these supplementary estimands are clearly defined (e.g., “while on study treatment” or “while treated”), there is no need to define additional analysis sets. The analysis populations remain consistent, and the exclusion of some data based on these strategies is a natural consequence of applying the specified handling methods for intercurrent events.

2.4.2 Exploring Missing Data

In any clinical trial, missing data can present a challenge in the interpretation of the results. While it’s not always possible to directly test for the underlying mechanisms causing missing data, it is essential to explore missing data patterns as part of the statistical analysis. Understanding these patterns helps to interpret the results more accurately and determine the robustness of the conclusions drawn from the study.

For this study, the exploration of missing data should be approached systematically, and at a minimum, the following steps should be taken:

  1. Exploration of the Missing Data Mechanism Using Descriptive Approaches

The first step in understanding missing data is to explore why data may be missing. Although we can’t definitively test for the mechanism behind the missingness (e.g., Missing Completely at Random (MCAR), Missing at Random (MAR), or Missing Not at Random (MNAR)), we can use descriptive statistics to get an idea of potential patterns; a code sketch illustrating such summaries appears at the end of this subsection. This could include:
  • Summarizing the amount of missing data for each variable, such as co-primary endpoints (e.g., SBP or HbA1c).
  • Looking at the distribution of missing data across treatment groups to determine if one group has more missing data than the other, which could signal a pattern related to the treatment.
  • Descriptive statistics like the percentage of missing data for each visit or time point to identify trends (e.g., more missing data in later visits).

These exploratory analyses help to hypothesize whether the data might be missing due to participant dropout, protocol deviations, or other reasons that could be linked to the treatment or study procedures.

  2. Tables of Missing Data for Primary and Key Secondary Endpoints by Visit

Creating tables of missing data for the primary and key secondary endpoints, broken down by visit, provides a clear visualization of where and when the missing data occurred. These tables should:
  • List the number and percentage of missing data points for each visit.
  • Break down the missing data by treatment group, as differences between groups may point to treatment-related causes of missingness.
  • Show the timing of missing data to see if it is concentrated around certain visits (e.g., near the end of the study) or is more random throughout the study.

These tables are a straightforward way to assess whether the pattern of missing data is consistent with expectations or if it might be biased toward certain visits or treatment arms.

  3. Lists of Intercurrent Events by Visit

Since intercurrent events (ICEs) are a key factor in this study, it is important to generate lists of ICEs by visit, especially for cases where ICEs might lead to missing data. These lists will:
  • Show the types and frequencies of ICEs (e.g., treatment discontinuation, use of rescue medication, death) at each visit.
  • Highlight any links between ICEs and missing data, for example, if participants who experience an ICE are more likely to miss subsequent data collection points.
  • Differentiate between ICEs handled according to the treatment policy strategy and those handled by other strategies (e.g., hypothetical, while alive), as different strategies might affect how missing data is treated in the analysis.

By reviewing these intercurrent events by visit, the analysis can identify where the study’s outcomes may be influenced by external events, helping to interpret the overall results more effectively.
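The sketch below, referenced earlier in this subsection, shows one way such descriptive summaries and by-visit missingness tables might be produced; the long-format layout and the column names (subject, arm, visit, sbp) are assumptions for illustration, not the study’s actual data structure.

```python
# Sketch: percentage of missing SBP values by visit and treatment arm.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=3)

# Toy long-format data standing in for the study extract.
df = pd.DataFrame({
    "subject": np.repeat(range(200), 5),
    "arm": np.repeat(rng.choice(["Drug X", "Placebo"], size=200), 5),
    "visit": list(range(1, 6)) * 200,
    "sbp": rng.normal(150, 15, size=1000),
})
# Inject more missingness at later visits to mimic dropout.
df.loc[rng.random(len(df)) < 0.05 * df["visit"], "sbp"] = np.nan

# Table of missing data by visit, split by treatment group.
missing_table = (df.assign(missing=df["sbp"].isna())
                   .groupby(["visit", "arm"])["missing"]
                   .mean()
                   .mul(100)
                   .round(1)
                   .unstack("arm"))
print(missing_table)
```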

2.4.3 Primary Efficacy Analysis

The primary efficacy analysis for the co-primary endpoint of change in average systolic blood pressure (SBP) from Baseline to Week 24 will be performed using an analysis of covariance (ANCOVA). This statistical method is suitable for comparing the treatment effects on blood pressure changes while controlling for baseline differences and other covariates.
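Purely as an illustration, an ANCOVA of this kind could be fitted as below, assuming a participant-level dataset with columns change_sbp, baseline_sbp, and arm; the names and toy data are assumptions, and the treatment coefficient plays the role of the LSM difference described for the estimand.

```python
# Sketch: Week 24 ANCOVA of change in SBP with baseline SBP as a covariate.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(seed=4)
n = 300
baseline = rng.normal(155, 12, size=n)
arm = rng.choice(["Drug X", "Placebo"], size=n)
change = -0.2 * (baseline - 155) - 10.0 * (arm == "Drug X") + rng.normal(0, 8, size=n)

data = pd.DataFrame({"change_sbp": change, "baseline_sbp": baseline, "arm": arm})

# Treatment effect adjusted for baseline SBP (Placebo as the reference level).
model = smf.ols("change_sbp ~ C(arm, Treatment(reference='Placebo')) + baseline_sbp",
                data=data).fit()
print(model.summary().tables[1])
```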

Handling of Missing Data

For missing data related to the primary endpoint caused by intercurrent events (ICEs) such as:
  • treatment discontinuation due to lack of efficacy or safety,
  • use of rescue medication, or
  • not receiving a dose of study treatment,

the analysis will use multiple imputation with a retrieved dropout approach, as described by Wang and Hu. This approach aims to account for the missing data while maintaining the validity of the statistical inference about the treatment effects.

Wang, S., Hu, H. Impute the missing data using retrieved dropouts. BMC Med Res Methodol 2022, 22: 82. https://doi.org/10.1186/s12874-022-01509-9

Retrieved Dropout Subgroups

Two retrieved dropout subgroups will be defined to handle missing data:

  1. Discontinued Treatment Retrieved Dropout Subgroup:
    • This subgroup will include all participants who have discontinued treatment but still have observations at the time of the endpoint (Week 24). These participants’ data will be used to estimate the missing data following treatment discontinuation.
  2. Rescue Medication Retrieved Dropout Subgroup:
    • This subgroup includes all participants who have received rescue medication and have observations at the time of the endpoint. The data from these participants will be used to impute missing values following the use of rescue medication.

Imputation Procedure

  • Imputation Frequency: Missing values will be imputed 1,000 times using a multiple imputation procedure. This creates multiple complete datasets that account for the uncertainty associated with the missing data.

  • Separate Models for Each Treatment Group: The imputation process will be performed separately for each treatment group to reflect potential differences in the missing data mechanism between the treatment arms.

  • Linear Model for Imputation:

    • The imputation model will use a linear regression approach.
    • Covariates for the imputation model will include:
      • Baseline mean SBP, which controls for participants’ initial blood pressure values,
      • Last on-treatment visit mean SBP, which represents the most recent observed blood pressure before the treatment discontinuation or rescue medication was administered.

This imputation strategy helps to preserve the study’s statistical power by utilizing the available data from participants who either discontinued treatment or used rescue medication, while appropriately accounting for the missing data in a way that minimizes bias.
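The function below is a rough, single-draw sketch of the retrieved-dropout idea for one treatment arm, greatly simplified relative to Wang and Hu (2022). The column names (baseline_sbp, last_ontrt_sbp, week24_sbp, had_ice) and the plain linear draw with residual noise are assumptions for illustration; in practice the draw would be repeated (e.g., 1,000 times) and the results combined across imputations.

```python
# Sketch: impute missing Week 24 values from retrieved dropouts in the same arm.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def impute_from_retrieved_dropouts(arm_df: pd.DataFrame, rng: np.random.Generator) -> pd.Series:
    """Return one imputed copy of week24_sbp for subjects with an ICE and a missing Week 24 value."""
    # Retrieved dropouts: experienced the ICE but still have a Week 24 observation.
    retrieved = arm_df[arm_df["had_ice"] & arm_df["week24_sbp"].notna()]
    to_impute = arm_df[arm_df["had_ice"] & arm_df["week24_sbp"].isna()]

    # Linear imputation model: Week 24 SBP ~ baseline SBP + last on-treatment SBP.
    X = sm.add_constant(retrieved[["baseline_sbp", "last_ontrt_sbp"]])
    fit = sm.OLS(retrieved["week24_sbp"], X).fit()

    # Predict the missing values and add residual noise to reflect imputation uncertainty.
    X_new = sm.add_constant(to_impute[["baseline_sbp", "last_ontrt_sbp"]], has_constant="add")
    draws = fit.predict(X_new) + rng.normal(0, np.sqrt(fit.scale), size=len(to_impute))

    imputed = arm_df["week24_sbp"].copy()
    imputed.loc[to_impute.index] = draws
    return imputed
```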

2.4.4 Missing Data Not Due to Intercurrent Events (ICE)

Missing data that is not due to an Intercurrent Event (ICE) is assumed to be Missing at Random (MAR). This means that the probability of the data being missing is related to observed data but not to the unobserved values themselves.

To handle this type of missing data, the study will use Markov Chain Monte Carlo (MCMC) methods to impute both:
  • Non-monotone missing data: Data where missingness occurs at random points, without a clear pattern.
  • Monotone missing data: Data where missingness occurs in a sequential manner (e.g., if a participant misses multiple follow-up visits after initially completing a few visits).

The MCMC approach is well-suited for multiple imputation in these scenarios, as it models the joint distribution of the data and generates imputations based on this model, filling in the missing values while maintaining the relationships in the observed data.
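Joint-model MCMC imputation of this kind is usually run in dedicated software (for example, SAS PROC MI); purely as a stand-in, the sketch below draws several imputed datasets for a wide-format SBP series under MAR using chained equations in scikit-learn, with the column names, correlation structure, and number of imputations all chosen for illustration only.

```python
# Sketch: multiple imputation under MAR for repeated SBP measurements (chained equations).
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(seed=5)
n = 400
wide = pd.DataFrame(
    rng.multivariate_normal(mean=[150, 148, 146, 144],
                            cov=80 * np.eye(4) + 40, size=n),
    columns=["sbp_base", "sbp_w8", "sbp_w16", "sbp_w24"],
)
wide.iloc[rng.random(n) < 0.2, 3] = np.nan   # some Week 24 values are missing

# Draw several imputed datasets (the protocol specifies 1,000; five are drawn here for brevity).
imputed_sets = []
for m in range(5):
    imputer = IterativeImputer(sample_posterior=True, random_state=m)
    imputed_sets.append(pd.DataFrame(imputer.fit_transform(wide), columns=wide.columns))
```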

2.4.5 Sensitivity Analyses

To assess the robustness of the analysis to different assumptions about the missing data, several sensitivity analyses will be conducted. These analyses provide insights into how deviations from the missing at random assumption might affect the study results.

  1. Imputation Based on the MAR Assumption (Using MCMC)

As part of the sensitivity analyses, the study will use multiple imputation methods based on the Missing at Random (MAR) assumption, with MCMC to handle the imputation of missing data. This analysis will evaluate the treatment effect under the assumption that the missing data is related to the observed data, but not to unobserved outcomes. This approach helps assess how much the results depend on the MAR assumption.

  2. Tipping Point Analysis Using the Delta Adjustment Approach

The tipping point analysis will be used to evaluate how sensitive the results are to deviations from the MAR assumption. The delta adjustment approach will be employed for this purpose:
  • The delta adjustment introduces a systematic shift (or “delta”) to the imputed values to simulate Missing Not at Random (MNAR) scenarios. This allows for the exploration of how much deviation from the MAR assumption would be needed to change the overall conclusions of the analysis.
  • By adjusting the imputed data in this way, the tipping point analysis provides a range of potential outcomes under different missing data scenarios. It identifies whether there is a “tipping point” at which the study’s conclusions change as the assumptions about the missing data deviate from MAR.

This combination of multiple imputation methods and sensitivity analyses ensures that the treatment effect estimates are robust and account for potential biases introduced by missing data. If the results remain consistent across different imputation strategies, confidence in the study’s findings is reinforced.
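A minimal sketch of the delta-adjustment scan is shown below on a toy dataset: imputed Week 24 changes in the Drug X arm are shifted by increasing deltas and the treatment difference is recomputed. In the actual analysis each shifted dataset would be re-analysed with the primary ANCOVA model and pooled across imputations; all names and numbers here are illustrative assumptions.

```python
# Sketch: tipping point scan via delta adjustment of imputed values in the treated arm.
import numpy as np

rng = np.random.default_rng(seed=6)
n = 200
treated = np.arange(n) < n // 2
change = np.where(treated, -12.0, -4.0) + rng.normal(0, 8, size=n)   # change in SBP
was_imputed = rng.random(n) < 0.15                                   # flags MAR-imputed values

for delta in np.arange(0.0, 12.1, 2.0):
    shifted = change.copy()
    shifted[was_imputed & treated] += delta   # penalise only the imputed Drug X outcomes
    diff = shifted[treated].mean() - shifted[~treated].mean()
    print(f"delta = {delta:4.1f} mmHg -> treatment difference = {diff:6.2f}")
```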

2.4.6 Supplementary Analyses

Supplementary analyses are typically not detailed in the Clinical Study Protocol (CSP); instead, they are defined in the Statistical Analysis Plan (SAP), which offers more in-depth guidance on how these analyses will be conducted. The supplementary estimands differ from the primary estimand in their handling of intercurrent events (ICEs), reflecting different assumptions about how ICEs, such as rescue medication use and treatment discontinuation, influence the analysis.

Each supplementary estimand adjusts the analysis based on different assumptions regarding the treatment effect and how intercurrent events (particularly rescue medication use and not receiving a dose of study treatment) should be handled. These adjustments allow for a more nuanced understanding of how Drug X performs under various conditions:

  1. Supplementary Estimand 1 uses a hypothetical strategy to estimate the treatment effect assuming no rescue medication is available, and imputes missing data based on different approaches for the intervention and control groups.

  2. Supplementary Estimand 2 applies a while on study treatment strategy, where participants are excluded from the analysis after using rescue medication or not receiving a dose of study treatment.

  3. Supplementary Estimand 3 focuses specifically on excluding participants from the analysis after they fail to receive a dose of the study treatment, using a similar while on study treatment approach.

These supplementary analyses provide complementary perspectives to the primary analysis, exploring the robustness of the treatment effect under different real-world scenarios.

Supplementary Estimand 1

This estimand differs from the primary estimand in the ICE strategy for rescue medication. The key difference lies in the adoption of a hypothetical approach:

  • Hypothetical Strategy for Rescue Medication: This approach assumes that rescue medication is not available for the intervention group. As a result, the treatment effect is estimated under the assumption that participants do not have access to rescue medication.
  • Handling of Missing Data: All data following the use of rescue medication will be considered missing.
    • For the intervention group, a reference-based multiple imputation method will be used, which assumes that after the ICE, the trajectory of the participant’s data will reflect that of a reference group (e.g., the control group).
    • In the control group, missing data will be imputed under the MAR (Missing at Random) assumption. This reflects the assumption that the missing data is related to the observed data but not to the unobserved outcomes themselves.

Supplementary Estimand 2

This estimand differs from the primary estimand by the use of a “while on study treatment” strategy for two ICEs: rescue medication use and the participant not receiving a dose of the study treatment.

  • While on Study Treatment for Rescue Medication and No Dose Received: Participants who either received rescue medication or never received a dose of the study treatment will be excluded from the analysis after these events occur.
  • Specifically, data from timepoints following the ICE will not be included in the analysis for those participants, as their experiences are considered irrelevant to the effect of the treatment under study in this particular estimand.

Supplementary Estimand 3

This estimand differs from the primary estimand by using the “while on study treatment” approach for the ICE of not receiving a dose of the study treatment.

  • While on Study Treatment for No Dose of Study Treatment: Participants who did not receive a dose of the study treatment will be excluded from the analysis for any timepoints following the ICE.
  • The rationale is to assess the treatment effect only during the period in which participants are actually receiving the treatment, excluding any data collected after the treatment regimen was interrupted or never initiated.

2.5 ANCOVA VS MMRM

The shift in regulatory preference from Mixed Model for Repeated Measures (MMRM) to Analysis of Covariance (ANCOVA) for handling missing data, particularly in studies with intercurrent events (ICEs), represents an important change in how missing data is managed in clinical trials.

While MMRM was previously the preferred model for longitudinal data analysis, regulators now favor ANCOVA for handling missing data and ICEs, particularly in studies where participants may be excluded following an ICE. ANCOVA allows for more flexibility in missing data handling, especially when using whilst on treatment or whilst alive strategies, and ensures consistency across sensitivity analyses. Nonetheless, MMRM remains a viable option when there are no ICEs and the study aims to analyze the trend over time with proper covariance structures in place.

2.5.1 Mixed Model for Repeated Measures (MMRM)

Previously, the MMRM approach was commonly used for the primary analysis of longitudinal studies like this one, as it handles repeated measures and accounts for correlations between time points within individuals. However, regulators have raised concerns about its use, particularly in the context of intercurrent events (ICEs) and missing data assumptions. Specifically:

  • MMRM under MAR (Missing at Random) assumes that, given the observed data, missingness is unrelated to the unobserved outcomes unless missing values are explicitly handled through imputation prior to analysis. In the presence of ICEs, this assumption can be problematic because it implies that participants who discontinue treatment or experience an ICE would have continued as expected, which is not always realistic.
  • If a whilst on treatment or whilst alive strategy is used to handle ICEs, MMRM implicitly imputes missing values after the ICE. This means that participants who should be excluded following an ICE are instead included in the analysis beyond that point. This can lead to biased estimates of the treatment effect because the model assumes that all participants, including those experiencing an ICE, would continue similarly to those who did not experience an ICE.
  • MMRM with MAR estimates the mean treatment effect as if participants would have continued treatment just like others in the same arm, effectively creating a hypothetical scenario. This approach may not align with real-world treatment discontinuation patterns, especially when other ICE-handling strategies (e.g., whilst on treatment or whilst alive) are being used.

2.5.2 ANCOVA Model for Each Timepoint

Given these limitations, the ANCOVA model has become the preferred approach by both the European Medicines Agency (EMA) and increasingly the FDA. This model offers several advantages:

  1. Separate Timepoint Analysis: The ANCOVA model can analyze data separately at each timepoint, allowing more flexibility in handling missing data and excluding participants after an ICE without implicitly imputing values, which can happen with MMRM (see the sketch following this list).

  2. Handling Missing Data After ICEs: ANCOVA allows for participants to be excluded from the analysis after an ICE (such as treatment discontinuation or death), aligning with strategies like whilst on treatment or whilst alive. This ensures that post-ICE data is not included for individuals who should no longer be contributing to the treatment effect estimation.

  3. Consistency with Sensitivity Analyses: Using ANCOVA for the primary analysis also allows for sensitivity analyses to be conducted with the same statistical model. This consistency is beneficial when evaluating different assumptions about missing data (e.g., Missing at Random (MAR) versus Missing Not at Random (MNAR)).
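As a minimal sketch of the separate-timepoint approach in point 1 above, assuming hypothetical variable names (chg, base, group, visit) in a long-format data frame dat, a separate ANCOVA can be fitted at each post-baseline visit:

```r
# Separate ANCOVA at each post-baseline visit: change ~ treatment group + baseline
# dat is assumed to contain one row per patient-visit (hypothetical names)
visits <- sort(unique(dat$visit))

fits <- lapply(visits, function(v) {
  lm(chg ~ group + base, data = subset(dat, visit == v))
})
names(fits) <- paste0("visit_", visits)

# Treatment effect estimate at the final (primary) visit
summary(fits[[length(fits)]])
```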

2.5.3 When MMRM Can Still Be Used

Limitation: Loss of Correlation Between Timepoints

  • One limitation of the ANCOVA approach is that it does not account for the correlation between time points within individuals. This is important because the repeated measures design of the study means that an individual’s measurements at different time points are likely to be correlated.
  • In contrast, MMRM uses an unstructured covariance matrix to account for within-subject correlations, as long as the model converges. However, if the goal of the study is not to estimate a trend over time, but rather the treatment effect at specific timepoints (such as at the end of the study), ANCOVA remains a suitable and simpler alternative.

MMRM is still considered a valid approach under certain conditions:

  • No ICEs: If the study does not include any ICEs that are being handled with a whilst on treatment or whilst alive strategy, MMRM may still be appropriate. For example, when there are no treatment discontinuations or other ICEs of concern, MMRM can be used effectively, especially when imputing missing data under the MNAR assumption.
  • Convergence: If the model can converge with the appropriate unstructured covariance matrix, MMRM can still provide reliable estimates of within-subject correlations, particularly when the treatment effect over time is of interest.
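Where MMRM is used, a minimal fitting sketch is shown below, assuming the mmrm R package and the same hypothetical long-format data frame dat with variables chg, base, group, visit and subjid; this is an illustration rather than a prescribed model specification.

```r
library(mmrm)

# MMRM with treatment-by-visit interaction, adjustment for baseline, and an
# unstructured covariance matrix for the repeated measures within each subject.
# visit and subjid must be factors for the mmrm package (hypothetical names).
dat$visit  <- factor(dat$visit)
dat$subjid <- factor(dat$subjid)

fit <- mmrm(chg ~ base + group + visit + group:visit + us(visit | subjid),
            data = dat)
summary(fit)
```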

2.5.4 Primary Efficacy Analysis and Missing Data Handling

The primary efficacy analysis should explicitly detail the approach for handling missing data due to ICEs as well as other types of missing data. Key considerations include:

  1. ICE Handling: Clear strategies should be specified for how to handle data following key ICEs (such as treatment discontinuation or the use of rescue medication), particularly when using a whilst on treatment or whilst alive strategy.

  2. Imputation of Missing Data: Missing data due to reasons unrelated to ICEs should be handled using appropriate multiple imputation techniques, such as MCMC, under the assumption that data is MAR or MNAR, depending on the scenario.

  3. Consistency Across Sensitivity Analyses: Sensitivity analyses should be performed to assess the robustness of the primary analysis, testing different assumptions about the missing data mechanism (e.g., MAR vs. MNAR). Using the same ANCOVA model for both the primary and sensitivity analyses ensures that results are consistent and interpretable across different analyses.

3 Case Study: ACTG 175 HIV Trial

3.1 Study Details

The ACTG 175 study was a pivotal clinical trial conducted in the early 1990s to evaluate the efficacy of different antiretroviral therapies in HIV-infected adults. This randomized, double-blind trial aimed to compare the outcomes of nucleoside monotherapy versus combination therapy among patients who had CD4 T-lymphocyte cell counts ranging from 200 to 500 per cubic millimeter.

At the time, zidovudine (ZID), a type of antiretroviral medication, was known to improve survival rates and reduce the incidence of opportunistic infections and disease progression in patients with advanced HIV type 1. However, the effectiveness of ZID was observed to wane over prolonged use. Consequently, combination therapies incorporating ZID with other antiretroviral agents began to show promise, offering more durable benefits in patients with more advanced stages of HIV.

The ACTG 175 trial focused on patients with less advanced HIV disease, specifically those who had CD4 counts within the specified range and had not experienced any AIDS-defining illnesses other than minimal mucocutaneous Kaposi’s sarcoma. Eligible participants also required a Karnofsky performance score of at least 70 and had to meet certain laboratory criteria. The study targeted important clinical outcomes such as a decline of 50% or more in CD4 cell count, progression to AIDS, and mortality, using these as composite time-to-event endpoints.

| Aspect | Details |
| --- | --- |
| Study Design | Randomized, double-blind trial comparing nucleoside monotherapy versus combination therapy in HIV-infected adults. |
| Objective | To evaluate the efficacy of different antiretroviral therapies, particularly in patients with less advanced HIV disease. |
| Participants | 2,467 HIV-1 infected adults with CD4 T-lymphocyte cell counts from 200 to 500 per cubic millimeter. No history of AIDS-defining illness other than minimal mucocutaneous Kaposi’s sarcoma. Karnofsky performance score of at least 70. |
| Treatment Regimens | Four daily regimens: 600 mg of ZID; 600 mg of ZID + 400 mg of DID (ZID+DID); 600 mg of ZID + 2.25 mg of ZAL (ZID+ZAL); and 400 mg of DID alone. |
| Randomization | Stratified by the length of any prior antiretroviral therapy. |
| Rescue Therapy | Offered to patients who experienced a significant decline in CD4 count or an AIDS-related event, maintaining treatment blindness. |
| Follow-Up Duration | Minimum of 24 months for the last patient enrolled, with CD4 counts monitored at baseline, week 8, and every 12 weeks thereafter. |
| Primary Endpoint | Time to the first occurrence of a ≥50% decline in CD4 cell count, an AIDS event, or death. |
| Secondary Endpoints | Time to the first occurrence of an AIDS event or death. Also, descriptive summaries of mean changes from baseline in CD4 counts over time. |
| Methodological Significance | Illustrates various clinical trial concepts such as the use of alternative estimands and analysis strategies. |
| Comparative Analysis Focus | Comparison of the efficacy of ZID monotherapy against a combined group (ZID+DID and ZID+ZAL) in subsequent analyses. |

Below is a summarized table of the data on premature treatment discontinuation and rescue treatment from the ACTG 175 HIV trial:

| Category | Monotherapy Group (ZID) | Combination Therapy Group | Notes |
| --- | --- | --- | --- |
| Total Rate of Premature Discontinuation | 62% | 55% | - |
| Discontinuation Due to Toxicity | 11% | 15% | Higher rates of discontinuation due to toxicity were observed in the combination therapy group. |
| Discontinuation Due to Lack of Efficacy | 3% | 2% | - |
| Other Reasons for Discontinuation | - | - | Includes low-grade toxic reactions, declining CD4 cell counts, desire to seek other therapies, and study burden. |
| Received Rescue Treatment | 30% | 17% | Rescue treatment was more commonly initiated in the monotherapy group. |
| Rate of Rescue Before 24 Months | 21% | 10% | - |
| Met Rescue Criteria but Did Not Initiate | 8% | 7% | These patients opted to discontinue the study treatment instead of initiating rescue. |
| Loss to Follow-Up Rate | 22% | 22% | Loss to follow-up was identical in both treatment groups. |
| Mortality Rates (Divergence After 2 Years) | Similar | Similar | Mortality rates were comparable during the first 2 years post-randomization, diverging thereafter. |

3.2 Estimands for the Continuous Endpoint of Change from Baseline in CD4

The primary objective of this estimand is to guide clinical decision-making by assessing the relative benefits of starting HIV treatment with either a monotherapy or a combination therapy, and allowing for the possibility of switching from monotherapy to combination therapy if the initial response is unsatisfactory.

ICEs refer to events that could potentially affect the treatment during the study period, such as:

  • Adherence to the randomized treatment, where variations such as partial non-compliance are still considered adherence provided the initially randomized treatment is continued for 24 months.
  • Treatment changes like switching to rescue therapy, stopping treatment, or switching to a non-study therapy.
  • Patient death during the study.

Estimand 1

Treatment Policy and Composite Strategy Estimand for Median Change from Baseline to Month 24 in CD4 Count

| Section | Description |
| --- | --- |
| 1b. Define Objective | The main objective is to inform clinical decision-making on selecting a treatment strategy in adult patients with less advanced HIV disease, based on changes from baseline in CD4 counts at 24 months post-treatment initiation. The intent is to evaluate the benefits of starting with a combination therapy versus a monotherapy, with the option of switching to a combination therapy if predefined criteria for unsatisfactory clinical response are met. |
| 2a. Identify Possible ICEs | Treatment changes that may occur in this trial, whether planned or not, are summarized. Scenarios include: adherence to randomized treatment (Scenario 1), switching to protocol-designed rescue therapy (Scenario 2), switching to no treatment (Scenario 3), or an alternative therapy (Scenario 4). Other antiretroviral agents are prohibited, and patients may die during the study. |
| 2b. Define Treatment Regimen | The intended treatment regimen includes the initially randomized treatment (ZID monotherapy or combination therapy with DID or ZAL) and any subsequent changes such as early discontinuation, switch to another non-study treatment, or rescue from monotherapy to combination therapy over 24 months post-randomization. |
| 2c. Define Estimand | The estimand includes four elements: (a) treatment effect estimated for all randomized patients, (b) efficacy measured by change from baseline to Month 24 in CD4 counts, (c) handling of all ICEs including premature discontinuation, switches, or initiation of rescue therapy, with death treated as a treatment failure, and (d) comparison of changes from baseline to Month 24 in CD4 counts using the median change difference between groups. |
| 3a. Data Needed for Estimand | CD4 count data observed at Month 24 post-randomization, regardless of treatment changes (except death), including survival status and date of death. |
| 4a. Main Estimator | A nonparametric rank-based approach handling continuous values of change from baseline to Month 24 in CD4 counts. This includes treating deaths with the worst rank and using rank-based ANCOVA adjusted for baseline CD4 counts and prior antiretroviral therapy. |
| 4b. Missing Data Assumption | Missing data due to patient withdrawal are expected. Missing outcomes are imputed assuming a distribution similar to patients who discontinued treatment but remained in the study, conditioned on baseline characteristics and post-baseline CD4 counts. |
| 4c. Sensitivity Estimators | Sensitivity analysis to assess the robustness of conclusions to missing data assumptions by assigning worse ranks to patients with missing CD4 values at Month 24 compared to those remaining. The “win ratio” is calculated under the main and sensitivity estimator rankings to summarize treatment effects. |
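As a loose illustration of the main estimator in 4a, a worst-rank ANCOVA could be sketched as below. All variable names are hypothetical, and missing Month 24 values are assumed to have already been imputed as described in 4b.

```r
# Worst-rank ANCOVA sketch (hypothetical variable names):
# cd4_chg24: change from baseline to Month 24 in CD4 count
# died24:    1 if the patient died before Month 24, 0 otherwise
# arm:       randomized treatment group
# cd4_base:  baseline CD4 count; prior_art: duration of prior antiretroviral therapy

# Give deaths a value worse than any observed change so they receive the worst ranks
worst <- min(dat$cd4_chg24, na.rm = TRUE) - 1
dat$cd4_chg24_wr <- ifelse(dat$died24 == 1, worst, dat$cd4_chg24)

# Rank-based ANCOVA: ranked outcome regressed on treatment and ranked baseline covariates
fit <- lm(rank(cd4_chg24_wr) ~ arm + rank(cd4_base) + rank(prior_art), data = dat)
summary(fit)
```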

Estimand 2

Treatment Policy and Principal Stratification Strategy Estimand for Median Change from Baseline to Month 24 in CD4 Count

| Section | Description |
| --- | --- |
| 1b. Define Objective | The main objective is to inform clinical decision-making for treatment strategy selection in adult patients with less advanced HIV disease. The focus is on evaluating the benefits of following a combination therapy for 24 months versus a monotherapy for 24 months in patients who would survive without requiring rescue. The aim is to determine if combination therapy offers any advantage in improving CD4 counts among patients who respond well to both therapies, based on criteria used for rescue and survival. |
| 2a. Identify Possible ICEs | The intercurrent events (ICEs) expected are the same as those described in Estimand 1. These include adherence to the randomized treatment, switching to rescue therapy, switching to no treatment, switching to an alternative therapy, or death during the study. |
| 2b. Define Treatment Regimen | The treatment regimen intended for evaluation is the initially randomized treatment (ZID monotherapy or combination therapy with either DID or zalcitabine, ZID + [DID or ZAL]) over 24 months without any rescue intervention. |
| 2c. Define Estimand | The estimand includes four elements: (a) treatment effect estimated for the population of HIV-infected adult patients who would survive 24 months without requiring rescue, per the protocol-defined criteria, (b) efficacy measured by the change from baseline to Month 24 in CD4 counts, (c) handling of ICEs of death and initiation of the protocol-defined rescue with a principal stratification strategy, while other ICEs are managed using a treatment policy strategy, and (d) comparison of median changes from baseline to Month 24 in CD4 counts between the treatment groups. |
| 3a. Data Useful for Estimand | Data includes date of death, date when rescue criteria are met, censoring dates for these events for all randomized patients, baseline patient and disease characteristics predictive of outcomes, and CD4 counts at Month 24 for patients who survived to Month 24 without rescue. |
| 4a. Main Estimator | An estimator models the membership of patients in the principal stratum of interest, defined by whether an ICE of death or meeting rescue criteria occurs before Month 24 post-randomization. This involves defining strata based on ICE occurrences and assessing treatment effects within these strata. The model uses binary indicators S(Z), where Z = 0 for monotherapy and Z = 1 for combination therapy, to denote whether such ICEs occur. |

Estimand 2 for the ACTG 175 HIV trial utilizes a sophisticated methodological approach to evaluate the impact of treatment regimens on CD4 counts at 24 months in patients who do not require rescue treatment. This analysis is framed around the principal stratification strategy, focusing on patients who survive without rescue through Month 24 on any treatment, abbreviated as “SWoR–any.”

The primary objective of this estimand is to inform clinical decision-making by comparing the effects of combination therapy versus monotherapy over a 24-month period in patients who are expected to survive without needing rescue intervention. This approach aims to discern whether combination therapy offers any significant advantage over monotherapy in terms of improving CD4 counts, thereby helping to minimize unnecessary treatments.

Identification and Handling of ICEs

The intercurrent events (ICEs), such as adherence to treatment, changes to rescue therapy, switching to alternative treatments, or death, are the same as those in Estimand 1. The principal focus here is on patients who can maintain their health without the need for rescue, judged by the same criteria used to determine rescue requirements. The ICEs of death and the initiation of protocol-defined rescue before Month 24 are managed through a principal stratification strategy, while other ICEs are handled using a treatment policy strategy.

Estimation of Treatment Effect

To accurately model and impute the principal stratum membership, a Bayesian imputation model is employed. This model utilizes observed data to predict the status of each patient regarding their ability to survive without rescue under each treatment modality. This involves multiple imputation steps to handle the missing data points effectively, ensuring each patient’s missing status is accurately imputed.

Membership in the “SWoR–any” stratum is determined based on whether a patient, regardless of treatment, could have survived without rescue. For instance, for a patient under monotherapy (Z = 0), their observed status S(0) is used and their missing status S(1) under combination therapy is imputed; the converse applies to patients initially under combination therapy.

The treatment effect within the “SWoR–any” stratum is estimated by comparing the median changes in CD4 counts from baseline to Month 24 between the treatment groups. This is done using a rank-based ANCOVA, adjusted for baseline CD4 counts and the duration of prior antiretroviral therapy. The model incorporates quantile regression to adjust for these baseline covariates when estimating the differences in medians.

Missing data, especially the ICE status and CD4 counts at Month 24 for patients who withdraw from the study without meeting the rescue criteria, are imputed using the same multiple imputation model. This model is based on time-to-event data, considering the time to death or rescue, whichever occurs first.

3.3 Estimands Based on the Time-to-Event Outcomes

Estimand 1

Treatment Policy Estimand for Time to AIDS or Death in All Randomized Patients

| Section | Description |
| --- | --- |
| 1b. Define Objective | The objective is to assess the risk of AIDS or death for patients starting treatment with monotherapy with an option of switching to combination therapy later, compared to those who start and continue on combination therapy. |
| 2a. Identify Possible ICEs | Similar to Estimand 1 above, except that death is now part of the outcome, not just an intercurrent event. |
| 2b. Define Treatment Regimen | The treatment regimen for evaluation includes the initially randomized treatment (ZID monotherapy or combination therapy with ZID + [DID or ZAL]), along with any subsequent changes such as early treatment discontinuation, switch to another nonstudy treatment, or rescue from monotherapy to combination therapy as per the protocol-defined rescue procedure. |
| 2c. Define Estimand | The estimand is defined by several components: (a) The population is the same as for Estimand 1 above. (b) Efficacy is evaluated based on the time to the first AIDS event or death. (c) Specific ICEs leading to changes in treatment (e.g., ≥50% reduction in CD4 counts and premature discontinuation) are managed by a treatment policy strategy. Both AIDS and death, which may result in treatment changes, are included in the outcome. (d) The treatment effect is summarized with the hazard ratio for the defined outcome event. |
| 3a. Data Needed for Estimand | Necessary data includes time to event or censoring, event or censoring status, along with covariates used in the analysis model, collected regardless of the occurrence of any ICEs. |
| 4a. Main Estimator | The main estimator is a covariate-adjusted estimate of the treatment coefficient in the proportional hazards Cox regression model for time to event (Cox PH). |
| 4b. Missing Data Assumption | For the primary analysis, patients who withdraw from the study without experiencing the event (AIDS or death) are considered censored at random (CAR) in the Cox PH estimation procedure. |
| 4c. Sensitivity Estimators | Sensitivity analysis tests the CAR assumption using a multiple imputation-based procedure. It involves fitting a Bayesian piecewise exponential proportional hazards model (PEPHM) assuming a higher event hazard for patients who discontinued from the study within 24 months after randomization. This process generates multiple imputed datasets, analyzes each using the Cox regression model, and combines the results using Rubin’s combination rules to produce an overall point estimate, confidence interval, and p-value for the treatment effect. |
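A minimal sketch of the main estimator in 4a (a covariate-adjusted Cox proportional hazards model), using the survival package and hypothetical variable names, is shown below.

```r
library(survival)

# Covariate-adjusted Cox proportional hazards model (hypothetical variable names):
# time_ad:  time to first AIDS event or death, or censoring time
# event_ad: 1 = AIDS event or death observed, 0 = censored
# arm:      randomized treatment; cd4_base, prior_art: baseline covariates
fit <- coxph(Surv(time_ad, event_ad) ~ arm + cd4_base + prior_art, data = dat)
summary(fit)  # exp(coef) for arm gives the estimated hazard ratio
```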

Tipping Point Analysis

In the case study under discussion, a tipping point analysis is utilized to examine the impact of varying sensitivity parameters, specifically δ0 and δ1, which are associated with the mono- and combination therapy groups respectively. This analysis helps determine the thresholds at which the null hypothesis—that there is no difference in the risk of AIDS or death between the two treatment groups—might fail to be rejected.

In this analysis, δ0 is consistently set at 1, reflecting an assumption of Censoring at Random (CAR) for patients in the monotherapy group, which serves as the control group in this context. On the other hand, δ1, which pertains to the combination therapy group considered experimental here, is varied to explore different scenarios of increased hazard following patient withdrawal from the study.

The main analysis presents a hazard ratio of 0.67 (95% CI: 0.52, 0.87), indicating a superior outcome for the combination therapy compared to monotherapy. This result is consistent in both the observed data and the sensitivity analysis. The sensitivity analysis involves multiple imputation of times to event for patients who were censored due to study withdrawal, employing sensitivity parameters δ0 = 1 for monotherapy and δ1 = 1 for combination therapy; this scenario maintains the assumption of CAR.

Further sensitivity analyses increase δ1 to simulate a higher hazard for patients in the combination therapy group who withdraw from the study. With δ1 = 3, the hazard ratio for the combination therapy moves closer to 1, reaching about 0.8 with a p-value around 0.1. This indicates that the significance of the main analysis would be challenged if the hazard for withdrawn patients were more than doubled, as evidenced by the threshold of δ1 = 2.4 needed to reach a p-value of 0.05 in a two-sided test.

Estimand 2

Principal Stratification Estimand for Time to AIDS or Death (Not Needing Rescue on Any Study Treatment)

| Section | Description |
| --- | --- |
| 1b. Define Objective | The objective is to evaluate the risk of AIDS or death in patients treated with monotherapy compared to those receiving combination therapy in a subgroup of patients who do not experience a ≥50% decrease from baseline in CD4 counts on either treatment. This subpopulation differs from those typically considered as not needing rescue, as it specifically includes those who may progress to AIDS or die without a significant early decrease in CD4 counts. |
| 2a. Identify Possible ICEs | The intercurrent events are the same as in previous estimands, but with death included as part of the outcome. |
| 2b. Define Treatment Regimen | The regimen under evaluation is the initially randomized treatment (ZID monotherapy or combination therapy with DID or zalcitabine, ZID+[DID or ZAL]) conducted without any rescue intervention. |
| 2c. Define Estimand | (a) The population is the principal stratum of patients who would not have a ≥50% reduction in CD4 counts throughout the treatment. (b) Efficacy is assessed by the time to the first AIDS event or death. (c) The premature discontinuation of the study treatment is handled by the treatment policy strategy, while events leading to a potential rescue (≥50% worsening in CD4 counts) are managed by the principal stratification strategy. AIDS events are included in the outcome. (d) The effect of combination therapy vs. monotherapy is summarized with the hazard ratio for the outcome event. |
| 3a. Data Needed for Estimand | Required data includes time to AIDS event or death, censoring status, event indicator for ≥50% reduction in CD4 counts, along with covariates similar to those in Estimand 2 above. Data after a premature discontinuation of study treatment are also necessary. |
| 4a. Main Estimator | The main estimator is similar to that in Estimand 1 of this section but is specifically defined for the subpopulation that does not experience a significant reduction in CD4 counts, as per the principal stratification framework. A Bayesian logistic regression model is used to impute missing ICE statuses based on observed data, where only counterfactual values are imputed. |
| 4b. Missing Data Assumption | Patients who are censored for the endpoint event (AIDS or death) are handled by the Cox PH estimation procedure assuming censoring at random (CAR). Missing covariate values for a small number of patients are imputed using a multiple imputation procedure based on the MCMC method. |
| 4c. Sensitivity Estimators | The CAR assumption can be stress tested in a manner similar to Estimand 1 of this section, though specific results are omitted here for brevity. |

Estimand 3

Principal Stratification Estimand for Time to AIDS or Death (Needing Rescue on Monotherapy Only)

| Section | Description |
| --- | --- |
| 1a. Identify Decision Maker | The decision maker is the same as in Estimand 1, typically the clinical trial’s principal investigator or a clinical decision-making body. |
| 1b. Define Objective | The objective is to evaluate the risk of an AIDS event or death for patients who start on a monotherapy and later switch to a combination therapy due to a ≥50% decline in CD4 counts, compared to those who start and continue on the combination therapy. This analysis focuses on a specific subpopulation: patients who experience a ≥50% worsening in CD4 counts on monotherapy, regardless of their response on combination therapy. |
| 2a. Identify Possible ICEs | The intercurrent events are the same as previously detailed for Estimand 1, with death included as part of the endpoint. |
| 2c. Define Estimand | (a) The population of interest is the principal stratum of patients who would experience a ≥50% reduction in CD4 count at any point after starting monotherapy. (b) Efficacy is evaluated based on time to the first AIDS event or death. (c) Events resulting in a switch to combination therapy (due to a ≥50% worsening in CD4 counts) are handled by the principal stratification strategy, with AIDS events included in the endpoint. |
| 3a. Data Needed for Estimand | Data requirements are similar to those for Estimand 2, including time to AIDS event, death, or censoring, along with the event indicator for a ≥50% reduction in CD4 counts and other relevant covariates. |
| 4a. Main Estimator | The main estimator is aligned with Estimand 2 but specifically tailored for the subpopulation defined as S(0) = 1, which indicates patients in the monotherapy group who necessitate a switch to combination therapy due to a significant decline in CD4 counts. |
| 4b. Missing Data Assumption | The approach to missing data is consistent with Estimand 2, assuming censoring at random (CAR) for patients who withdraw from the study without experiencing the event (AIDS or death). |
| 4c. Sensitivity Estimators | Sensitivity analysis follows the methods used in Estimand 2, testing the robustness of conclusions against the assumptions regarding missing data and the handling of censored cases. |

4 Case Study: Type 2 Diabetes (Treatment Policy Estimands)

4.1 Study Details

Study Design

  • Type: Parallel, randomized, placebo-controlled, blinded trial
  • Size: 400 patients, randomized in a 1:1 ratio between the test treatment and placebo

Population

  • Participants: Patients with type 2 diabetes managed solely by diet and exercise

Treatments

  • Comparison: Test treatment versus placebo

Key Variable

  • Primary Endpoint: Change in Hemoglobin A1c (HbA1c) levels from baseline to week 26. HbA1c is a marker of average blood glucose levels over the previous two to three months, with a decrease indicating improvement in diabetes management.

Summary Measure

  • Assessment: Expected change from baseline to week 26 in HbA1c, with a between-group comparison focusing on the difference in changes between the test treatment and placebo groups.

Intercurrent Event

  • Event Description: Discontinuation of the randomized treatment and switch to unblinded use of the test treatment. This event is considered under a single category termed ‘treatment non-adherence,’ which includes:
    • For patients initially receiving the test treatment, this would involve continuing the test treatment but in an unblinded manner.
    • For patients initially receiving placebo, this would involve switching to the test treatment, also unblinded.

Visit Schedule

  • Visits: One baseline visit (V0) and five post-baseline visits (V1-V5), with V5 at week 26 marking the end of the study period.

Implications of the Design and Intercurrent Event

  1. Unblinding Risks: The possibility of patients switching from placebo to the test treatment and from blinded to unblinded test treatment use could potentially introduce biases or affect the trial’s integrity by revealing treatment assignments. This needs careful handling to maintain the validity of the study outcomes.

  2. Handling of Non-adherence: The study’s approach to treatment discontinuation (categorized as non-adherence) could impact the interpretation of the efficacy data. How these data are analyzed is crucial, as non-adherence might affect the comparability between groups if not properly accounted for in the analysis.

  3. Efficacy Measurement: The primary focus on the change in HbA1c allows for a direct assessment of the treatment’s impact on glucose regulation over a substantial period, aligning well with clinical objectives in diabetes care. The measure is objective and quantifiable, providing a clear metric for evaluating the effectiveness of the treatment.

  4. Ethical Considerations: Allowing patients on placebo to switch to the test treatment (unblinded) after discontinuation could be seen as enhancing the ethical conduct of the trial by potentially providing a beneficial treatment to those not initially receiving it. However, this must be balanced against the risk of bias introduced by such switches.

4.2 Treatment Policy Estimand of Interest

Here’s a breakdown of the key components:

Population:

  • Patients with type 2 diabetes who are managing their condition solely through diet and exercise.

Treatments:

  • The trial compares a test treatment with a placebo. The crucial aspect of this estimand is that it considers the effect of the treatments regardless of the patients’ adherence to the assigned treatment regimen.

Variable:

  • The primary endpoint is the change in Hemoglobin A1c (HbA1c) from baseline to week 26. HbA1c is a key indicator that reflects the average blood glucose concentration over the previous three months.

Summary Measure:

  • The expected change from baseline in HbA1c at week 26, with the analysis focusing on the difference between the two groups. This measure will help determine if the test treatment is more effective than the placebo in lowering blood glucose levels over the trial period.

Data Collection Approach:

  • Data will continue to be collected until the primary endpoint for all patients, including those who do not adhere to the treatment to which they were initially randomized. This approach supports the treatment policy estimand by capturing the full scope of treatment effects, inclusive of all deviations from the protocol that might occur during the trial.

Significance of This Estimand:

  • This estimand is significant because it aims to capture the ‘real-world’ effectiveness of the test treatment. By evaluating the impact of the treatment irrespective of adherence, the estimand provides a more comprehensive understanding of how effective the treatment could be in typical clinical practice, where patients may not always follow prescribed treatments strictly.

The treatment policy estimand approach allows the study results to be more generalizable and reflective of practical clinical outcomes, acknowledging that non-adherence is a common occurrence in real-world settings. This makes the findings relevant for healthcare providers and policymakers when considering the potential benefits and limitations of new treatments for type 2 diabetes.

4.3 Missing Data under the Treatment Policy Strategy

Missing data imputation is a critical process in clinical trials, particularly when ensuring the integrity and robustness of the study’s results in the face of missing data due to non-adherence, dropouts, or other reasons. Aligning missing data imputation strategies with the targeted estimand and considering clinically and statistically sound assumptions are essential for maintaining the validity of the trial’s conclusions.

Principles for Missing Data Imputation:

  • Alignment with Estimand: The method of imputation should reflect the nature of the estimand. For a treatment policy estimand, imputation should reflect what would have been observed had the data been collected, regardless of adherence to the randomized treatment, in line with the intention-to-treat principle.
  • Clinically Plausible Assumptions: These depend on the therapeutic context, disease characteristics, and the treatment mechanism. Assumptions must consider factors like whether the drug is disease-modifying or merely symptomatic and its pharmacokinetics such as half-life.
  • Adequate Modelling Assumptions: The statistical model used for imputation should be robust, minimizing bias and providing a reliable approximation of missing values based on available data.

Common Imputation Methods:

These methods are often used in scenarios where treatment discontinuation leads to missing data, and the aim is to estimate the trajectory of patients’ outcomes as if they had continued on the assigned treatment or shifted to a control or placebo condition.

When choosing an imputation method, it is critical to consider the nature of the disease and treatment. For instance, in chronic conditions where effects are prolonged and discontinuations common, more nuanced approaches like CIR might be more appropriate than J2R, which could be more suitable for acute settings or where the drug effect is expected to cease immediately upon discontinuation.

Comparison of Methods

  • Reference-Based Methods (J2R, CIR, CR):
    • Jump to Reference (J2R / JR): Assumes all drug effects cease immediately upon discontinuation, with future outcomes following the placebo trajectory.
    • Copy Increment in Reference (CIR): Assumes the rate of change (increment) in the patient’s outcomes will start to mimic those observed in the placebo group post-discontinuation.
    • Copy Reference (CR): Patients are assumed to follow the entire trajectory of the placebo group from the point of their discontinuation.
  • Missing at Random (MAR):
    • Suitable when the reasons for missing data are related to observed factors rather than the missing data itself, assuming a similarity in behavior between those with complete and incomplete data.
  • Retrieved Dropout (RDO) Imputation:
    • Useful in scenarios where it’s possible to track outcomes of patients post-discontinuation, providing a more direct observation of potential outcomes for dropouts. This method is particularly valuable when analyzing long-term effects and adherence issues in clinical trials.

1. Jump to Reference (J2R / JR) Imputation

  • Overview: This method assumes that any drug effect disappears immediately upon discontinuation, and the patient’s condition reverts to what it would have been under the placebo.
    • Description: This method assumes that any effect from the active drug ceases immediately upon discontinuation. Patients are then assumed to “jump” to the trajectory typical of the reference group, usually the placebo arm.
    • Use Case: Appropriate when the drug effect dissipates quickly after discontinuation.
  • Visualization Details:
    • Similar to CR, the blue line represents the drug arm and the black line represents the control (placebo) arm.
    • Patients who discontinue the drug are assumed to revert to a condition similar to the placebo group immediately.
    • Future values are worse than if they had continued on the drug but are aligned with the mean of the placebo group for simplicity in analysis.

2. Copy Reference (CR) Imputation

  • Overview: In the CR method, it is assumed that once a patient discontinues the drug, their future values will mimic the trajectory of the placebo arm, regardless of any benefit they might have initially experienced from the drug.
    • Description: Patients are assumed to follow the distribution pattern of the reference group from the point of randomization. This method effectively resets the patient’s expected outcome to mirror the reference group entirely from the start of the study.
    • Use Case: Best suited for scenarios where treatment effects are unclear or highly variable, or when the treatment is suspected to have no lasting independent effects beyond discontinuation.
  • Visualization Details:
    • The blue line represents the drug arm, and the black line represents the placebo arm.
    • Patients who drop out are assumed to follow the trajectory of the placebo arm exactly from the point of dropout.
    • The imputed values for future observations are aligned with the mean trajectory of the placebo group.

3. Copy Increments in Reference (CIR) Imputation

  • Overview: This method assumes that patients who drop out from the drug arm do not revert entirely to the placebo condition but instead begin to follow the incremental changes observed in the placebo arm. This acknowledges some residual effect of the drug that was taken before dropout.
    • Description: After discontinuation, the changes (increments) in the patient’s measurements are assumed to mimic those observed in the reference group from that point forward.
    • Use Case: Useful when the drug’s effect diminishes gradually rather than instantly, allowing for a more gradual transition in the effect observed in patients.
  • Visualization Details:
    • The drug’s impact is considered to taper off, not abruptly stop, as patients begin to mimic the incremental progress (or lack thereof) of the placebo arm from their last observed value.
    • This creates a more gradual transition in the dataset, potentially reflecting a more realistic scenario of drug discontinuation effects.

4. Missing at Random (MAR)

  • Implementation: This approach assumes that missing data can be modeled based on similar subjects within the same treatment arm, considering that missingness is not related to unobserved variables.
  • Visualization and Usage: The graph illustrates how variability in outcomes increases over time, which is a typical scenario in long-term studies. The imputed values (squares) are based on the conditional mean given the observed values, which helps maintain the internal consistency of the dataset.

5. Retrieved Dropout (RDO)

  • Concept: The RDO method focuses on utilizing data from patients who discontinued treatment but whose outcomes continue to be tracked. This approach helps model what could happen to patients who drop out, by using data from those who have similar profiles but remain under observation.
  • Implementation and Challenges: For patients with missing data at a specific visit, information is borrowed from similar patients in the same treatment arm who have available dropout data. This method requires a sufficient amount of RDO data for reliable imputation and can lead to variance inflation if the data is not sufficient, impacting the bias-variance trade-off.
  • Visualization: The diagram shows various points where patients either continue, drop out, or are followed after discontinuation, with imputed values being informed by the retrieved dropout data.

4.4 Multiple Imputation

Step 1: Parameter Estimation (Imputation Model)

  • Objective: Fit a multivariate normal distribution for each treatment arm using data observed prior to any intercurrent event (ICE), such as dropout or switching treatments.
  • Components:
    • \(\mu_a, \Sigma_a\): Mean and covariance matrix for the active treatment arm.
    • \(\mu_r, \Sigma_r\): Mean and covariance matrix for the reference (placebo) arm.
    • Uninformative priors are used for both the mean and the covariance matrices, with the covariance matrix typically employing an Inverse Wishart distribution. This choice helps in avoiding bias from overly prescriptive assumptions about the data structure.

Step 2: Imputation

  • Objective: Generate multiple complete datasets by imputing missing values based on the distributions estimated in Step 1.
  • Process:
    • Draw from the posterior distribution of parameters \(\mu_r, \Sigma_r, \mu_a, \Sigma_a\) established in Step 1.

    • Construct a joint distribution of observed and missing data to facilitate imputation.

    • Impute missing data from the conditional distribution of \(Y_{miss} | Y_{obs}\), where \(Y_{miss}\) is the missing data and \(Y_{obs}\) is the observed data, based on the relationships established in the model.

    • The imputation is repeated multiple times (commonly denoted as \(M\) times), creating multiple complete datasets.

    • Different imputation means are calculated depending on the method: J2R, CIR, CR. Each method adjusts the imputation based on the reference trajectory, whether it’s a direct copy, an increment adjustment, or a complete jump to the reference values at the time of dropout.

      • J2R Mean (\(\tilde{\mu}\)):
        • For Jump to Reference, the imputed values for post-ICE data are a direct continuation of the reference group mean from the latest observed time point (\(t_i\)), effectively assuming that the treatment effect disappears and the patient follows the placebo trajectory.
      • CIR Mean (\(\tilde{\mu}\)):
        • For Copy Increment in Reference, the imputed values are calculated as a blend of the active arm’s trajectory up to the last observed time point and then shifting towards the change observed in the reference group. This reflects a gradual decline or alteration in the treatment effect rather than an abrupt stop.
      • CR Mean (\(\tilde{\mu}\)):
        • For Copy Reference, the imputed values are straightforwardly set to follow the reference arm’s mean (\(\mu_r\)), assuming that post-dropout, the patient’s outcomes align exactly with those typically seen in the placebo group.
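To make these three constructions concrete, consider a hypothetical patient randomized to the active arm whose ICE occurs after visit \(t\), out of \(T\) scheduled visits, and write \(\mu_{a,j}\) and \(\mu_{r,j}\) for the active-arm and reference-arm means at visit \(j\). Under the standard reference-based formulation, the marginal imputation means can be sketched as:

\[
\text{J2R:}\quad \tilde{\mu}_j =
\begin{cases}
\mu_{a,j}, & j \le t \\
\mu_{r,j}, & j > t
\end{cases}
\qquad
\text{CIR:}\quad \tilde{\mu}_j =
\begin{cases}
\mu_{a,j}, & j \le t \\
\mu_{a,t} + \left(\mu_{r,j} - \mu_{r,t}\right), & j > t
\end{cases}
\qquad
\text{CR:}\quad \tilde{\mu}_j = \mu_{r,j} \ \text{for all } j.
\]

Imputation then draws the missing components from their conditional distribution given the patient’s observed values, as described in the steps above.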

Step 3: Analysis

  • Objective: Analyze each dataset independently to compute the summary measures of interest, which might include means, variances, or other statistical tests.
  • Importance: This step allows for the assessment of variability and the robustness of the study results across different imputed datasets.

Step 4: Pooling

  • Objective: Combine results from multiple imputed datasets.
  • Methodology: Use Rubin’s rules to pool the results. Rubin’s rules provide a way to combine estimates from multiple imputed datasets to obtain overall estimates and their variances, accounting for both within-imputation and between-imputation variability. This approach helps in deriving more accurate confidence intervals and p-values.
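In symbols, writing \(\hat{Q}_m\) and \(\widehat{U}_m\) for the point estimate and its variance from the \(m\)-th of \(M\) imputed datasets, Rubin’s rules combine them as:

\[
\bar{Q} = \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M}\widehat{U}_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat{Q}_m - \bar{Q}\right)^2,
\]
\[
T = \bar{U} + \left(1 + \frac{1}{M}\right)B,
\]

where \(\bar{U}\) is the within-imputation variance, \(B\) the between-imputation variance, and \(T\) the total variance used for confidence intervals and tests, with degrees of freedom given by Rubin’s formula.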

4.5 Analysis of Treatment Policy Estimands

This section presents an analysis of the example randomized controlled trial in patients with type 2 diabetes. The analyses in this worksheet target a treatment policy estimand, i.e., we are interested in the comparison between the treatment and placebo groups irrespective of whether or not patients experienced the intercurrent event of treatment discontinuation.

Review Data

Although we are interested in the treatment effect irrespective of whether or not a patient discontinued randomized treatment, it is still generally of interest to understand the proportion of patients who adhered or discontinued treatment.

Let us create a table to summarize the number and proportion of patients who discontinued treatment by group and visit.
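A minimal sketch of how such a summary could be produced is shown below; the dataset name data_complete is a hypothetical placeholder for the long-format analysis dataset with columns group, visitn and ontrt (1 = on randomized treatment, 0 = discontinued).

```r
library(dplyr)

# Number and proportion of patients on/off randomized treatment by group and visit
data_complete %>%
  group_by(group, visitn, ontrt) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(pct = 100 * n / sum(n)) %>%   # column percentages within group and visit
  arrange(group, visitn, desc(ontrt))
```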

## $ctl
##  ontrt            1            2            3            4            5
##      0   8   (4.0%)  18   (9.0%)  31  (15.5%)  43  (21.5%)  50  (25.0%)
##      1 192  (96.0%) 182  (91.0%) 169  (84.5%) 157  (78.5%) 150  (75.0%)
##  Total 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%)
## 
## $trt
##  ontrt            1            2            3            4            5
##      0   5   (2.5%)  12   (6.0%)  20  (10.0%)  27  (13.5%)  29  (14.5%)
##      1 195  (97.5%) 188  (94.0%) 180  (90.0%) 173  (86.5%) 171  (85.5%)
##  Total 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%)

Now let us create a figure to show the mean change in HbA1c from baseline by visit, treatment group, and whether patients have experienced the intercurrent event of treatment discontinuation.
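A minimal sketch of such a figure, again assuming the hypothetical dataset name data_complete, could look as follows.

```r
library(dplyr)
library(ggplot2)

# Mean change in HbA1c from baseline by visit, treatment group and on-treatment status
data_complete %>%
  group_by(group, visitn, ontrt) %>%
  summarise(mean_chg = mean(hba1cChg, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = visitn, y = mean_chg,
             colour = group, linetype = factor(ontrt))) +
  geom_line() +
  geom_point() +
  labs(x = "Visit", y = "Mean change in HbA1c from baseline",
       colour = "Group", linetype = "On treatment")
```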

Analysis of Data (ANCOVA)

As we have no missing data, we can perform our analysis just using ANCOVA. Our primary estimand is interested in the change in HbA1c from baseline to week 26 (visit 5), so we can restrict our analysis to visitn==5.
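A minimal sketch of this analysis, again using the hypothetical dataset name data_complete, mirrors the model call shown in the output below.

```r
library(dplyr)

# Restrict to the week 26 visit (visitn == 5) and fit the ANCOVA model:
# change in HbA1c regressed on treatment group and baseline HbA1c
data_complete %>%
  filter(visitn == 5) %>%
  lm(formula = hba1cChg ~ group + hba1cBl, data = .) %>%
  summary()
```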

## 
## Call:
## lm(formula = hba1cChg ~ group + hba1cBl, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1306 -0.7002 -0.0591  0.7915  3.2440 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.94031    0.63404   9.369  < 2e-16 ***
## grouptrt    -0.68208    0.10915  -6.249 1.07e-09 ***
## hba1cBl     -0.76454    0.07976  -9.585  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.092 on 397 degrees of freedom
## Multiple R-squared:  0.2482, Adjusted R-squared:  0.2444 
## F-statistic: 65.52 on 2 and 397 DF,  p-value: < 2.2e-16

Trial with Missing Data

## $ctl
##                    dispo            1            2            3            4            5
##                Off-study   2   (1.0%)   7   (3.5%)  14   (7.0%)  24  (12.0%)  32  (16.0%)
##  Off-treatment, on-study   6   (3.0%)  11   (5.5%)  17   (8.5%)  19   (9.5%)  18   (9.0%)
##             On-treatment 192  (96.0%) 182  (91.0%) 169  (84.5%) 157  (78.5%) 150  (75.0%)
##                    Total 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%)
## 
## $trt
##                    dispo            1            2            3            4            5
##                Off-study   1   (0.5%)   4   (2.0%)   9   (4.5%)  14   (7.0%)  16   (8.0%)
##  Off-treatment, on-study   4   (2.0%)   8   (4.0%)  11   (5.5%)  13   (6.5%)  13   (6.5%)
##             On-treatment 195  (97.5%) 188  (94.0%) 180  (90.0%) 173  (86.5%) 171  (85.5%)
##                    Total 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%) 200 (100.0%)

Complete-Case Analysis

## 
## Call:
## lm(formula = hba1cChg ~ group + hba1cBl, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1634 -0.6974 -0.0336  0.7934  3.1896 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.86939    0.67544   8.690  < 2e-16 ***
## grouptrt    -0.74453    0.11422  -6.518 2.49e-10 ***
## hba1cBl     -0.74898    0.08605  -8.704  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.07 on 349 degrees of freedom
##   (48 observations deleted due to missingness)
## Multiple R-squared:  0.2602, Adjusted R-squared:  0.256 
## F-statistic: 61.38 on 2 and 349 DF,  p-value: < 2.2e-16

Multiple imputation analysis (JR - Jump to Reference)

Before running the imputation, we first need to create:

  • A dataset showing the visit when the intercurrent event occurred for each patient.
  • A list of the key variables to be used in the imputation step.

Note that in reality we never observe the missing data. Therefore, we would not be able to decide on the imputation strategy based on the data. Instead this should be based on clinically plausible assumptions depending on the therapeutic setting, disease characteristics and treatment mechanism.

Try running some different analyses on data_missing using different assumptions for the missing data. Strategies available in rbmi are:

  • MAR - Missing At Random
  • JR - Jump to Reference
  • CR - Copy Reference
  • CIR - Copy Increments in Reference
  • LMCF - Last Mean Carried Forward

Step 1: Fit the imputation model to the observed data

{r,echo = F,message = FALSE, error = FALSE, warning = FALSE}
# Define which imputation method 
# Here we use Bayesian multiple imputation with 250 imputed datasets 
set.seed(123456)
method <- rbmi::method_bayes(burn_in = 200,
                             burn_between = 6,
                             n_samples = 250)

# Create samples for the imputation parameters by running the draws() function

drawObj <- draws(data = data_missing,
                 data_ice = dat_ice,
                 vars = vars,
                 method = method)

Step 2: Impute the missing data using the imputation model multiple times

{r,echo = F,message = FALSE, error = FALSE, warning = FALSE}
# Impute the data using the imputation model above
# In the dataset `dat_ice` we specified a jump-to-reference strategy
# Now we need to specify for each group what this 'reference' is
# Set the reference for the control group to the control group itself
# Set reference arm in treatment group to control group
imputeObj <- rbmi::impute(draws = drawObj,
                    references = c("trt"="ctl", 
                                   "ctl"="ctl"))

Step 3: Analyse each complete dataset

{r,echo = F,message = FALSE, error = FALSE, warning = FALSE}
# Fit the analysis model (ANCOVA) on each imputed dataset
# Analysis model includes group and baseline HbA1c as covariates
anaObj <- analyse(imputations = imputeObj, 
                  fun = ancova,
                  vars = set_vars(outcome = "hba1cChg",
                                  visit = "visitn",
                                  subjid = "subjid",
                                  group = "group",
                                  covariates = "hba1cBl"))

Step 4: Combine the results to obtain point estimate and variance estimation

{r,echo = F,message = FALSE, error = FALSE, warning = FALSE}
# Pool the results to get treatment effects
# In the output:
# trt_x refers to the difference between control and treatment group at visit x
# lsm_ref_x is the change from baseline in the reference group (control) at visit x
# lsm_alt_x is the change from baseline in the alternative group (treatment) at visit x
poolObj <- pool(results = anaObj, conf.level = 0.95, alternative = "two.sided")
poolObj

Retrieved-dropout models

The practical above has only covered reference-based imputation methods. An alternative approach is to use retrieved-dropout models. The link below contains a vignette showing how the rbmi package can also be used to implement this approach.

https://insightsengineering.github.io/rbmi/main/articles/retrieved_dropout.html

5 Analysis of Hypothetical Estimands

5.1 Estimation for Hypothetical Estimands

5.2 Prediction of Hypothetical Trajectories

  • Explicit or implicit predictions of hypothetical trajectories: This refers to making predictions about what might happen under various hypothetical scenarios, using either explicitly stated models or assumptions.
  • Assumptions for the predictions: Assumptions must align with the hypothetical scenarios, influencing the model’s design and expected outcomes.

Notation and Study Design

  • Randomized Treatment (Z): Indicates whether a participant received the treatment (1) or was in the control group (0).
  • Intercurrent Event Indicator (Eᵢ): Shows whether an intercurrent event (ICE) has occurred by each visit (1 if occurred, 0 if not); once an intercurrent event occurs, it is assumed to persist for the remaining visits.
  • Outcome Variable (Y): The observed change in HbA1c from baseline at week 26.
  • Potential Outcome (Yᵢ): Hypothetical outcome without any intercurrent event or under different treatment conditions.
  • Estimand of interest: The difference in the outcome between treatment and control groups, assuming no intercurrent events.
  • Covariates (X₀, Xᵢ): Baseline and subsequent covariates that might affect the outcome, measured throughout the study.

The assumed causal structure linking these variables (depicted in the DAGs discussed below) can be summarized as follows:

  • Z influences E₁: Treatment can affect the likelihood of an intercurrent event.
  • Eᵢ influences Eᵢ₊₁: Indicates a cascade effect where an intercurrent event at one point increases the likelihood of another in the future.
  • X₀ → X₁ → X₂ → … → X₄: Represents changes or measurements of covariates (like HbA1c) over time.
  • Arrows into Y: Shows that all these factors, including treatment, intercurrent events, and covariates, influence the final outcome of HbA1c levels.

Treatment Policy

  • Green Pathways (Estimated Treatment Effect): These represent the direct and indirect effects of the treatment (Z) on the outcome (Y). The treatment affects each point in time where HbA1c is measured (X1 through X4) and can influence intercurrent events (E1 through E5), which in turn can affect subsequent measurements and the final outcome.
  • This diagram shows all possible impacts of the treatment throughout the course of the study, including its potential to affect the occurrence of intercurrent events, which are particularly critical in clinical trials.

Hypothetical Estimand

  • Green Pathways (Estimated Treatment Effect): These lines show the direct influence of treatment on the outcome (Y) and intermediate HbA1c measurements (X1 to X4), assuming no intercurrent events (E1 to E5) affected the outcomes. This hypothetical estimand aims to estimate what the treatment effect would be in an ideal scenario where no intercurrent events alter the course of treatment.
  • Red Pathways (Biasing path): These paths highlight potential sources of bias if the intercurrent events were ignored in the analysis. They show how each intercurrent event (E1 to E5) could influence the HbA1c measurements (X2 to X4), potentially confounding the true treatment effect.

Key Points

  • Treatment Policy Estimand: This considers all real-world effects, including intercurrent events, providing a comprehensive view of treatment effectiveness.
  • Hypothetical Estimand: By ignoring intercurrent events, this focuses on the direct effect of the treatment under idealized conditions, useful for understanding the intrinsic efficacy of the treatment.

Concept of Time-dependent Confounding:

  • Time-dependent Confounders (X₁-X₄): These are variables that:
    1. Are affected by previous treatment.
    2. Influence the probability of future treatment.
    3. Impact the outcome of interest (Y).

In this study, the levels of HbA1c measured at different times (X₁ through X₄) are time-dependent confounders because each measurement can influence and be influenced by treatment decisions and outcomes.

The challenge lies in adjusting for these confounders without inadvertently blocking the pathway through which the treatment effect is transmitted. This is a key concern in causal inference:

  • Blocking the Effect Pathway: Adjusting for X₁-X₄ directly could block part of the treatment effect, since these confounders are also intermediaries of the treatment effect.

Directed Acyclic Graphs (DAGs) Analysis - DAG (a):

  • Green Paths: Show the estimated treatment effect paths which demonstrate how treatment (Z) potentially affects the final HbA1c measurement (Y) through intermediate measurements (X₁-X₄) and intercurrent events (E₁-E₅).
  • Red Paths: Illustrate the biasing paths where confounders (X₁-X₄) and intercurrent events (E₁-E₅) may misrepresent the true treatment effect if not properly accounted for.

Directed Acyclic Graphs (DAGs) Analysis - DAG (b):

  • Simplified Representation: Focuses on paths that are purely related to treatment effects, ignoring paths through time-dependent confounders to avoid the bias introduced by adjusting for these confounders.

Desired Comparison:

  • Y(Z = 1, E₁-₄ = 0) vs Y(Z = 0, E₁-₄ = 0): This comparison aims to measure the treatment effect under the assumption that no intercurrent events occurred, isolating the effect of the treatment itself.

5.3 Methods for Estimating Hypothetical Estimands

5.3.1 Multiple Imputation (MI)

Multiple Imputation for Hypothetical Estimands:

  • Purpose: In clinical trials, especially those with longitudinal measurements, data after an ICE are often not considered relevant for the hypothetical estimand (the outcome that would have been observed had the ICE not occurred). This can be thought of as a missing data problem.
  • Method: MI treats all post-ICE data as missing. For example, if an intercurrent event \(E_1\) occurs, then subsequent measurements \(X_1, X_2, X_3, X_4\), and the final outcome \(Y\) are treated as missing. The imputation model is used to estimate these missing values based on other available data, under the assumption that the missingness is related to observed data but not to the missing data itself.
  • Assumptions: A key assumption in this context might be Missing At Random (MAR), where the likelihood of missing data depends only on observed data.

Use of MAR in Predicting Hypothetical Trajectories:

  • Definition: Under MAR, the missingness of data is related only to the observed data, not to any unobserved data.
  • Application: In hypothetical estimands, assuming MAR suggests that the data missing due to ICEs can be imputed based on the observed characteristics and responses of similar patients who did not experience the ICEs.
  • Feasibility: Whether MAR is a sensible assumption depends on the specific characteristics of the intercurrent event and the study design. If the ICEs are believed to be random with respect to future outcomes after controlling for observed data, MAR can be a reasonable assumption. However, if ICEs are related to unobserved future outcomes or unmeasured confounders, then MAR would not be appropriate, and methods that allow for data that are Missing Not At Random (MNAR) might be needed.

5.3.2 Inverse Probability Weighting (IPW)

Inverse Probability Weighting for Hypothetical Estimands:

  • Purpose: IPW is used to adjust for the non-random occurrence of intercurrent events by modeling the process leading to these events.
  • Method: Each participant's data is weighted by the inverse of the probability of their observed treatment path, given their covariates and previous treatment history. This approach aims to create a pseudo-population in which the occurrence of intercurrent events is independent of treatment, mimicking the condition of no ICEs.

Problem Setup:

  • Confounder (\(X_i\)): This is a variable that influences both the outcome (\(Y\)) and the likelihood of an intercurrent event (\(E_{i+1}\)).
  • Intercurrent Event (\(E_i\)): Events that can affect the continuation or outcome of the treatment.

IPW Idea:

  • Upweighting: In the presence of intercurrent events that might skew the observed outcome, IPW adjusts the influence of each individual’s data based on their probability of not experiencing the intercurrent event, given their confounders. This adjustment helps in maintaining a balanced representation of all groups within the study.
  • Creating a Pseudo-Population: IPW adjusts the dataset to create a “pseudo-population” in which the distribution of individuals who did and did not experience intercurrent events is balanced as if these events were independent of the measured confounders. This adjustment helps to mitigate the effect of confounders that are linked to the likelihood of experiencing an intercurrent event.

Imagine a study with two baseline groups differentiated by color (blue/green), which represent different levels or types of a confounder \(X\). Suppose that the probability of not having the intercurrent event given the confounder blue is \(P(E=0|\text{blue}) = 0.5\) (i.e., 50%). Each blue individual who did not experience the event then receives a weight of \(1/0.5 = 2\): they represent not only themselves but also one similar blue individual who did have the event, thus simulating a scenario where the intercurrent event is independent of being blue or green.

  • Weights (\(w_k\)): Each subject \(k\) receives a weight calculated as the inverse of the probability of being free from the intercurrent event, conditioned on their treatment status \(Z\) and confounders \(X\). Mathematically, this is expressed as: \[ w_k = \frac{1}{P(E_k = 0|Z_k, X_k)} \] This weight is used to adjust their contribution to the analysis, effectively increasing the influence of underrepresented scenarios within the observed data.
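
To make the weight formula concrete, here is a minimal sketch of how \(P(E_k = 0 | Z_k, X_k)\) might be estimated with a logistic regression and converted into weights. The data frame and column names (dat, event, group, hba1cBl) are illustrative assumptions, not the code used in this practical.

{r, eval = FALSE}
# Sketch only: estimate P(E = 0 | Z, X) by logistic regression and form IPW weights
# Assumed columns: event (1 = ICE occurred), group (randomized arm), hba1cBl (confounder X)
ps_fit <- glm(event ~ group + hba1cBl, family = binomial, data = dat)

dat$p_free <- 1 - predict(ps_fit, type = "response")     # probability of remaining event-free
dat$w      <- ifelse(dat$event == 0, 1 / dat$p_free, NA) # weights apply to event-free patients only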

IPW for Hypothetical Estimand:

In estimating a hypothetical estimand (where we hypothesize the outcome had the ICEs not occurred), IPW helps to simulate a dataset where:

  • ICEs are absent: It weights the data so that the analysis can proceed as if the ICEs did not occur.
  • Independent of Confounders \(X\): It also ensures that this simulated dataset is independent of the distribution of the confounder \(X\), making the analysis robust against confounding bias due to \(X\).

This method is crucial for ensuring that estimates of treatment effects or exposures are unbiased by confounders or selection mechanisms related to intercurrent events, providing a clearer picture of the causal effects of interest.

5.3.3 G-Computation

  • Purpose: G-computation is a statistical technique used to estimate the effect of a treatment or exposure in the presence of confounders.
  • Method: It involves modeling the outcome as a function of treatment and confounders. In the context of hypothetical estimands, it can sometimes be equivalent to multiple imputation, depending on how the outcome model is specified and used.
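
For intuition, a minimal sketch of g-computation (standardization) for the final-visit outcome is given below. It averages predictions from an outcome model fitted to the event-free patients; the data frame and column names are assumptions for illustration and are not the practical's code.

{r, eval = FALSE}
# Sketch of g-computation (standardization) for the hypothetical estimand
# Assumed columns in `dat`: event_any (1 = ICE at any visit), group (factor "ctl"/"trt"),
#   hba1cBl (baseline HbA1c), hba1cChg_5 (change from baseline at the final visit)

# 1. Outcome model fitted among patients who remained event-free
fit <- lm(hba1cChg_5 ~ group + hba1cBl, data = subset(dat, event_any == 0))

# 2. Predict the event-free outcome for *every* randomized patient under each arm
dat_trt <- dat; dat_trt$group <- factor("trt", levels = levels(dat$group))
dat_ctl <- dat; dat_ctl$group <- factor("ctl", levels = levels(dat$group))

# 3. Average the predictions over the full covariate distribution and compare arms
mean(predict(fit, newdata = dat_trt)) - mean(predict(fit, newdata = dat_ctl))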

5.3.4 Advanced Methods

  • Augmented IPW and Targeted Maximum Likelihood Estimation (TMLE): These methods combine the strengths of IPW and outcome modeling to produce more efficient and less biased estimates.
  • G-Estimation: This method is specifically designed for estimating the effects of time-varying treatments in the presence of time-varying confounders that are also affected by past treatment.

5.4 Time-dependent Intercurrent Event Occurrence

Objective: We aim to estimate the probability that a patient does not experience the intercurrent event at any time during the study. This is crucial for properly weighting each subject in the analysis to account for these events.

Calculation of Probabilities

Formula for Probability: The probability that subject \(k\) remains free of the intercurrent event is the product of the conditional probabilities of not experiencing the event at each visit \(i\):

\[ \prod_{i=1}^{5} P(E_{i,k} = 0 \mid Z = z_k, E_{1:(i-1),k} = 0, X_{i-1,k} = x_{i-1,k}) \]

Here:

  • \(E_{i,k}\) = intercurrent event indicator at visit \(i\) for individual \(k\).
  • \(Z = z_k\) = treatment assignment for individual \(k\).
  • \(E_{1:(i-1),k} = 0\) = no intercurrent events have occurred up to the previous visit.
  • \(X_{i-1,k} = x_{i-1,k}\) = covariates observed up to the previous visit for individual \(k\).

Intuition: Each factor in the product adjusts for the history of treatment and intercurrent events, along with the observed covariates, making the probability specific to the pathway that individual \(k\) has actually followed.

Subject Weights Calculation

Weight Formula: Each subject \(k\) receives a weight calculated as:

\[ w_k = \frac{1}{\prod_{i=1}^{5} P(E_{i,k} = 0 \mid Z = z_k, E_{1:(i-1),k} = 0, X_{i-1,k} = x_{i-1,k})} \]

  • Purpose of Weights: These weights are used to create a weighted sample (pseudo-population) in which the occurrence of intercurrent events is statistically independent of the observed covariates and treatment assignment. This adjustment is necessary to estimate the effect of the treatment under the hypothetical scenario where no intercurrent events occur.

This approach is particularly important in longitudinal studies where events occurring after baseline can affect the treatment and subsequent outcomes. By adjusting the contribution of each participant’s data based on the likelihood of remaining event-free, IPW helps to reduce bias in estimating treatment effects, providing a clearer picture of the treatment’s potential impact under ideal conditions.
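
As an illustration of how these cumulative probabilities and weights might be computed, the sketch below pools the visit-specific models into a single logistic regression. The long-format data frame and its column names are assumptions for illustration, not the code used later in this worksheet.

{r, eval = FALSE}
# Sketch: time-dependent IPW weights as the inverse of the cumulative product of
# visit-specific probabilities of remaining event-free.
# Assumed long-format data `dat_long`: one row per patient-visit with columns
#   subjid, visitn (1..5), group, hba1cPrev (HbA1c at the previous visit),
#   event (1 = ICE at this visit), priorEvent (1 = ICE at an earlier visit)
library(dplyr)

risk_set <- dat_long %>% filter(priorEvent == 0)   # rows still 'at risk' of a first ICE

# One pooled model across visits (separate models per visit are an alternative)
ps_fit <- glm(event ~ factor(visitn) + group + hba1cPrev,
              family = binomial, data = risk_set)

risk_set$p_free <- 1 - predict(ps_fit, type = "response")

weights <- risk_set %>%
  group_by(subjid) %>%
  summarise(w = 1 / prod(p_free),        # w_k = 1 / prod_i P(E_i = 0 | Z, history)
            had_ice = any(event == 1))
# Only the weights of patients who remained event-free through visit 5 enter the analysis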

5.5 Estimation of weight and treatment effect

5.5.1 Estimation of Weight

Methods:

  1. Nonparametric Methods:
    • Example: Calculate the sample proportion within each stratum of \(X\). This approach does not assume any specific form of the relationship between the covariates and the probability of the intercurrent event \(E\). It’s straightforward but may not be practical with continuous or high-dimensional covariates due to the “curse of dimensionality.”
  2. Parametric Methods:
    • Example: Use logistic regression to model the probability of \(E\) given covariates \(X\) and treatment \(Z\). This approach allows for more efficient estimation in the presence of multiple or continuous covariates and can provide better insights into how specific variables influence the probability of experiencing an intercurrent event.

5.5.2 Estimation of Treatment Effect

Weighted Sample Mean: Uses the weights derived (potentially from one of the above methods) to calculate a mean that reflects a population where the treatment assignment \(Z\) is independent of the potential outcomes under no intercurrent events.

Hájek Estimator: A specific type of estimator for the mean that adjusts the weighted sample mean by the sum of the weights, helping to stabilize estimates, especially in smaller samples or in unbalanced designs:

\[ \text{Hájek Estimator: } \frac{\sum_{i=1}^n (1-E_i)Z_iY_iw_i}{\sum_{i=1}^n (1-E_i)Z_iw_i} - \frac{\sum_{i=1}^n (1-E_i)(1-Z_i)Y_iw_i}{\sum_{i=1}^n (1-E_i)(1-Z_i)w_i} \]

This formula calculates the weighted averages of the outcome \(Y\) separately for the treated and control groups, adjusting for the distribution of the weights.
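
A compact way to see how this estimator works is the function sketch below; the variable names (y, z, e, w) are placeholders rather than objects from this worksheet.

{r, eval = FALSE}
# Sketch of the Hájek estimator: normalized weighted mean difference among event-free patients
# y = outcome, z = treatment (1/0), e = ICE indicator (1/0), w = IPW weight
hajek <- function(y, z, e, w) {
  keep   <- e == 0                # only patients who remained event-free contribute
  mu_trt <- sum(w[keep & z == 1] * y[keep & z == 1]) / sum(w[keep & z == 1])
  mu_ctl <- sum(w[keep & z == 0] * y[keep & z == 0]) / sum(w[keep & z == 0])
  mu_trt - mu_ctl
}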

Weighted Least Squares Regression: Uses the weights in a regression model of \(Y\) on \(Z\) for those cases where \(E = 0\), treating it as a Marginal Structural Model. This model provides a way to estimate causal effects by appropriately accounting for time-dependent confounders adjusted by weights.

Inference and Confidence Intervals (CIs):

  • Robust Sandwich Variance Estimator: This method is used to calculate confidence intervals that are robust to misspecification of the model. While valid, these CIs might be conservative (i.e., wider than necessary), potentially overestimating the uncertainty.
  • Non-parametric Bootstrap: Often used to validate CIs by resampling the data with replacement and recalculating the estimator multiple times. This method can provide a more accurate representation of the uncertainty if the original CI assumptions are violated or if the sample size is small.

5.6 Issue of Large Weights in IPW

  1. Positivity Assumption: This is crucial for IPW. If the propensity score (probability of receiving treatment given covariates) for any individual is exactly 0 or 1, it leads to infinite weights, which are problematic.
  2. Consequence of Extreme Propensity Scores: Propensity scores close to 0 or 1 result in very large weights for those individuals, making the estimates less precise and potentially unstable. This usually occurs when an individual’s characteristics (covariates) make them highly unlikely or likely to receive treatment compared to the rest of the sample.

How to Handle Large Weights

  1. Investigate the Cause:
    • Identify Outliers: Determine who these individuals with large weights are and investigate whether they represent a combination of covariate values that are rare within the dataset.
    • Understand Their Impact: Analyze whether these individuals have extreme values for some confounders which might be influencing the propensity score significantly.
  2. Use Stabilized Weights:
    • Method: Adjust the original weights by the ratio of the marginal probability of receiving treatment to the conditional probability given the covariates. This often reduces the variance of the weights without introducing bias.
    • Reference: Cole & Hernán (2008) provide a detailed discussion on this technique.
  3. Trim Weights:
    • Approach: Remove or cap individuals with weights beyond a certain threshold (e.g., greater than \(w_0\)).
    • Effect: This method changes the target population of the inference because it effectively excludes or down-weights the most extreme cases.
  4. Truncate Weights:
    • Method: Set weights that are smaller than the \(p\%\) quantile to the \(p\%\) quantile and similarly for weights larger than the \((1-p)\%\) quantile.
    • Trade-off: Truncating weights introduces some bias (because it alters the actual contribution of each observation based on its probability of treatment) but reduces variance, which can lead to more stable estimates. This is a classic bias-variance trade-off scenario.
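
To make the stabilization and truncation ideas above concrete, here is a rough sketch, assuming unstabilized weights w have already been computed as earlier; a common refinement also conditions the numerator on treatment arm (Cole & Hernán, 2008).

{r, eval = FALSE}
# Sketch: stabilizing and truncating IPW weights
# Assumes `dat` with columns event (ICE indicator) and w (unstabilized weight 1 / P(E = 0 | Z, X))

# Stabilized weights: marginal probability of remaining event-free in the numerator
p_marg     <- mean(dat$event == 0)
dat$w_stab <- p_marg * dat$w                    # = P(E = 0) / P(E = 0 | Z, X)

# Truncated weights: cap at the 1st and 99th percentiles of the stabilized weights
q           <- quantile(dat$w_stab, probs = c(0.01, 0.99), na.rm = TRUE)
dat$w_trunc <- pmin(pmax(dat$w_stab, q[1]), q[2])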

5.7 Analysis of Hypothetical Estimands

In the scenario below, a complete-case analysis results in a smaller treatment effect estimate, highlighting that bias due to confounding can go in either direction. Although Multiple Imputation (MI) and Inverse Probability Weighting (IPW) show similar standard errors (SE) in this specific instance with only one post-baseline covariate, MI typically produces smaller SEs than IPW. However, the reduced SE with MI comes at the expense of more extensive parametric assumptions, indicating a trade-off between statistical precision and reliance on model-based assumptions.

5.7.1 Review Data

Load the dataset DiabetesExampleData_wide_noICE.rds, which contains the ideal trial in which no intercurrent events were observed. The dataset is contextually the same as those used previously in the Treatment Policy Estimands Practical, except that for the analyses in this worksheet we will primarily work with data in wide, rather than long, format.


5.7.2 Analysis of Data (ANCOVA)

## 
## Call:
## lm(formula = hba1cChg_5 ~ group + hba1cBl, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2518 -0.6455 -0.0299  0.7744  3.1562 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.28553    0.63645   8.305 1.59e-15 ***
## grouptrt    -0.80323    0.10957  -7.331 1.29e-12 ***
## hba1cBl     -0.66254    0.08007  -8.275 1.97e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.096 on 397 degrees of freedom
## Multiple R-squared:  0.2356, Adjusted R-squared:  0.2317 
## F-statistic: 61.17 on 2 and 397 DF,  p-value: < 2.2e-16

5.7.3 Trial where some participants did not adhere to randomized treatment

Now we will move onto the more realistic example, where some patients did not adhere to their randomized treatment. We are interested in the hypothetical estimand as if all patients adhered to their randomized treatment.

                      ctl (N=200)       trt (N=200)       Overall (N=400)
ontrt_1
  0                   8 (4.0%)          5 (2.5%)          13 (3.3%)
  1                   192 (96.0%)       195 (97.5%)       387 (96.8%)
ontrt_2
  0                   18 (9.0%)         12 (6.0%)         30 (7.5%)
  1                   182 (91.0%)       188 (94.0%)       370 (92.5%)
ontrt_3
  0                   31 (15.5%)        20 (10.0%)        51 (12.8%)
  1                   169 (84.5%)       180 (90.0%)       349 (87.3%)
ontrt_4
  0                   43 (21.5%)        27 (13.5%)        70 (17.5%)
  1                   157 (78.5%)       173 (86.5%)       330 (82.5%)
ontrt_5
  0                   50 (25.0%)        29 (14.5%)        79 (19.8%)
  1                   150 (75.0%)       171 (85.5%)       321 (80.3%)

5.7.4 Analysis of Patients without Intercurrent Event

Let us first perform the analysis dropping any patients who did not adhere to treatment, i.e., only using the observed outcome data in the patients who did adhere to their randomized treatment until the end of follow-up. This analysis will only provide an unbiased estimate of the treatment effect under the assumption that treatment non-adherence occurred completely at random in both treatment arms.

## 
## Call:
## lm(formula = hba1cChg_5 ~ group + hba1cBl, data = .)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1646 -0.7189 -0.0195  0.8072  3.1928 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.99485    0.72999   8.212 5.51e-15 ***
## grouptrt    -0.73757    0.12147  -6.072 3.60e-09 ***
## hba1cBl     -0.76496    0.09408  -8.131 9.60e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.081 on 318 degrees of freedom
##   (79 observations deleted due to missingness)
## Multiple R-squared:  0.2629, Adjusted R-squared:  0.2582 
## F-statistic:  56.7 on 2 and 318 DF,  p-value: < 2.2e-16

5.7.5 Analysis using Multiple imputation

Now, let us move onto more principled analyses, starting with a multiple imputation approach. Here, for the patients who experience the intercurrent event of treatment non-adherence, we aim to impute what their values of HbA1c would have been if they had not experienced the intercurrent event by modelling the hypothetical future trajectory based on their past trajectory and the trajectories of similar patients.

As soon as a patient experiences the intercurrent event of treatment non-adherence, all future values of HbA1c will be missing. Therefore, we have a monotone missingness pattern, i.e., if a patient discontinues treatment at visit 1, then HbA1c will be missing from visit 1 to visit 5, whereas if a patient discontinues treatment at visit 4, then HbA1c will only be missing for visits 4 and 5.

The code below will perform a sequential imputation starting by imputing HbA1c at visit 1, sequentially through the visits, finally imputing HbA1c at visit 5. Each imputation is performed using Bayesian linear regression method=“norm” including treatment and all previous values of HbA1c.

We then fit our ANCOVA model to each imputed dataset, and finally pool the results of each imputation using Rubin’s rules to obtain a point estimate and its standard error.
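
The code that produced these results is not shown, but a minimal sketch of how such a sequential imputation could be written with the mice package is given below. The column names (group, hba1cBl, hba1cChg_1 to hba1cChg_5) and settings such as m = 50 are assumptions for illustration and may differ from the practical's hidden code.

{r, eval = FALSE}
# Sketch: sequential imputation of post-ICE HbA1c values under MAR using mice
library(mice)

vars   <- c("group", "hba1cBl", paste0("hba1cChg_", 1:5))
dat_mi <- dat_ice[, vars]   # hypothetical wide dataset with post-ICE values set to NA

# Predictor matrix: each visit is imputed from treatment, baseline, and earlier visits only
pred <- matrix(0, length(vars), length(vars), dimnames = list(vars, vars))
for (j in 1:5) {
  pred[paste0("hba1cChg_", j),
       c("group", "hba1cBl", if (j > 1) paste0("hba1cChg_", 1:(j - 1)))] <- 1
}

imp <- mice(dat_mi, m = 50, method = "norm", predictorMatrix = pred,
            visitSequence = paste0("hba1cChg_", 1:5), seed = 123, printFlag = FALSE)

# ANCOVA on each imputed dataset, pooled with Rubin's rules
fits <- with(imp, lm(hba1cChg_5 ~ group + hba1cBl))
summary(pool(fits))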

5.7.6 Analysis using Inverse Probability Weighting (IPW)

Another way to perform this analysis is to use inverse probability weighting. In this approach, we only use the outcome data for the patients who did not experience the intercurrent event, but we up-weight these patients in such a way that they also represent what we estimate would have been observed in similar patients who did experience the intercurrent event.

The weights are given by the inverse of the propensity score, which is the propensity for patients to experience the intercurrent event at each visit. We start by calculating the weights to account for patients who experienced the intercurrent event at visit 1.

                      ctl (N=192)         trt (N=195)         Overall (N=387)
weights
  Mean (SD)           1.04 (0.0615)       1.03 (0.0284)       1.03 (0.0484)
  Median [Min, Max]   1.02 [1.00, 1.51]   1.02 [1.00, 1.16]   1.02 [1.00, 1.51]
  Sum                 200                 200                 400

Our original sample size was 400 patients (200 in each arm). At visit 1, 8 patients had experienced the intercurrent event in the control arm, and 5 patients in the treatment arm. This reduced our sample size to 192 and 195 respectively, but through up-weighting patients who were similar to those who experienced the intercurrent event, we have maintained an effective sample size of 200 per arm.

Next we continue this process sequentially from visit 2 up to the end of follow-up.

                      ctl (N=150)         trt (N=171)         Overall (N=321)
weights
  Mean (SD)           1.36 (1.64)         1.18 (0.243)        1.26 (1.14)
  Median [Min, Max]   1.11 [1.00, 20.9]   1.11 [1.01, 3.41]   1.11 [1.00, 20.9]
  Sum                 203                 202                 405

It is always a good idea to check the weights, because extreme weights can highlight potential violations of the positivity and/or modelling assumptions.

In the example above, the largest weight is almost 21. This means that the data from this one patient is being used to represent around twenty other similar patients who did not adhere to the randomized treatment. Equivalently, patients with these characteristics are estimated to have only a 1/21 ≈ 5% probability of adhering to their randomized treatment. This is perhaps not so extreme, and suggests that the positivity assumption holds in this case. However, it is not uncommon to see examples with weights >100, meaning such a patient had less than a 1% probability of not experiencing the intercurrent event. Although this would still strictly meet the positivity assumption that the probability is >0 and <1, the variance of the estimate will increase substantially as the weights become more extreme.

The sum of the weights should equal the original sample size in each treatment arm. Here we see the sum of the weights in each arm is not quite equal to 200, but it is close. There is no strict limit on what can be considered ‘close enough’, but if the sum of the weights differs significantly from the original sample size it can suggest issues with the modelling of the propensity score.

Finally, we can check the results by fitting a weighted linear regression. The variance is estimated using a “robust” (Huber-White) sandwich estimator to account for the weighting.

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  5.201939   0.625226  8.3201 2.619e-15 ***
## grouptrt    -0.787932   0.126312 -6.2380 1.412e-09 ***
## hba1cBl     -0.650241   0.078175 -8.3178 2.661e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
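
The code behind the output above is not shown in this worksheet. A minimal sketch of how such a weighted ANCOVA with a robust (Huber-White) sandwich variance might be fitted is given below, assuming a data frame `dat_free` of event-free patients carrying their final weights in a column `w`; the object names are assumptions for illustration.

{r, eval = FALSE}
# Sketch: weighted ANCOVA with robust (sandwich) standard errors
library(sandwich)
library(lmtest)

fit_w <- lm(hba1cChg_5 ~ group + hba1cBl, data = dat_free, weights = w)
coeftest(fit_w, vcov. = vcovHC(fit_w, type = "HC0"))   # robust standard errors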

5.7.7 IPW (bootstrap SE)

Note that the robust variance estimator used above treats the propensity score as known rather than estimated. This typically leads to a conservative, overestimated variance. More accurate variance estimates can instead be obtained via a bootstrap procedure.

## [1] 0.1227755
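
The value above is the bootstrap standard error. As a rough sketch, such a bootstrap could be coded along the following lines, where `dat_wide` and `ipw_estimate()` are hypothetical names: the helper is assumed to re-estimate the weight model and refit the weighted ANCOVA on each resampled dataset, so that the uncertainty from estimating the propensity score is propagated.

{r, eval = FALSE}
# Sketch: non-parametric bootstrap for the IPW treatment effect
set.seed(2024)
boot_est <- replicate(1000, {
  idx <- sample(nrow(dat_wide), replace = TRUE)
  ipw_estimate(dat_wide[idx, ])   # returns the treatment coefficient for this resample
})
sd(boot_est)                      # bootstrap standard error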