Analysis of Adverse Event
Efficacy and safety are two crucial aspects of assessing the risk-benefit ratio of a drug. When a drug’s efficacy meets the regulatory requirements, its safety profile must also be within a reasonable range. However, clinical trials sometimes reveal peculiar adverse reactions that attract regulatory attention, such as immune-related adverse events (irAEs) from PD-1/L1 inhibitors, bone damage from BTK inhibitors, and bleeding events from EGFR inhibitors. In such cases, sponsors are often required to conduct more in-depth analyses to clarify the drug’s safety. Based on past clinical trial experiences and literature, here have summarized some analytical methods for adverse events that sponsors and regulatory bodies might require for your reference.
Calculating the incidence rate is a fundamental requirement for AE analysis. This includes overall incidence rates, incidence by preferred terms (PT), standard of care (SOC), and by severity levels, including AEs that lead to drug discontinuation, interruption of treatment, or death.
The time from drug administration to the occurrence of an AE and the duration of the AE are two important dimensions of AE analysis. If an AE persists, it indicates that this type of AE tends to last long clinically and may require the sponsor to explain how to manage and ensure safety, especially for AEs of grade 3 or higher, which are of greater concern. The following table displays a common summary format:
Source: John Shaik et al., PharmaSUG 2016.
The figure below demonstrates the use of Kaplan-Meier (KM) curves to analyze the occurrence of AEs.
Source:
Kriss Harris et al., PhUSE 2017
Analyzing the patterns of AE occurrence over different time periods. For example, although an AE may occur rapidly, it might disappear quickly or be transient, thus having a minimal impact. If a certain type of AE continues throughout the medication period and is of a higher CTCAE grade, it is likely drug-related and clinical considerations for management are necessary.
The following figure shows the distribution over time of three types of AEs in melanoma patients treated with Nivo+Ipi. It can be observed that most skin-related AEs cease to appear after 3 months, and GI and liver-related AEs significantly decrease after 6 months.
The outcome of an AE represents whether interventions against the AE are effective. If there is no recovery, the duration of the AE will be prolonged, indicating that effective intervention for this type of AE is challenging.
If the AEs in the experimental group are particularly prominent (e.g., significantly more bleeding events or a higher mortality rate in the experimental group compared to controls), regulatory bodies often require an analysis of influencing factors. If clinical influencing factors can be identified, measures can be taken to control them, also indicating that the AEs are predictable.
Sometimes, to reduce the bias due to timing in between-group comparisons, a landmark analysis strategy might be used, which excludes patients who stopped treatment or died prematurely. For example, patients who terminated treatment or died within two months may be excluded.
This is also a method for analyzing differences in AE characteristics between two groups, using survival analysis to study the hazard ratio of AE occurrence. The figure below shows the risk of irAEs in the Keynote-054 study comparing Drug K with a placebo (HR = 5.0), indicating that the risk of irAEs is significantly higher in the experimental group.
dataADSL <- subset(dataAll$ADSL, SAFFL == "Y")
# Filter ADAE for safety and treatment-emergent flags
dataTEAE <- dataAll$ADAE %>%
filter(SAFFL == "Y", TRTEMFL == "Y") %>%
mutate(
TRTA = fct_reorder(TRTA, TRTAN) # reorder treatment
)
# Total AE population (only safety flag, not TRTEMFL)
dataTotalAE <- dataAll$ADSL %>%
filter(SAFFL == "Y") %>%
mutate(
TRTA = fct_reorder(TRT01A, TRT01AN)
)
# TEAE with worst-case scenario by AE severity
dataTEAE <- dataTEAE %>%
mutate(
AESEV = factor(AESEV, levels = c("MILD", "MODERATE", "SEVERE")),
AESEVN = as.numeric(AESEV)
) %>%
group_by(USUBJID, TRTA) %>%
mutate(
WORSTINT = if_else(AESEVN == max(AESEVN, na.rm = TRUE), as.character(AESEV), NA_character_)
) %>%
ungroup() %>%
mutate(
WORSTINT = factor(WORSTINT, levels = levels(AESEV))
)
## specify labels for each variable:
varsAE <- c("TRTEMFL", "AESER", "AESDTH", "AEREL")
dataTotalAE <- dataAll$ADSL %>%
filter(SAFFL == "Y") %>%
mutate(TRTA = fct_reorder(TRT01A, TRT01AN))
getSummaryStatisticsTable(
data = dataTEAE,
colVar = "TRTA",
var = c("TRTEMFL", "AESER", "WORSTINT", "AESDTH", "AEREL"),
varFlag = c("TRTEMFL", "AESER", "AESDTH"),
varLab = c(TRTEMFL = "Treatment-Emergent", WORSTINT = "Worst-case severity:"),
varGeneralLab = "Subjects with, n(%):",
varInclude0 = TRUE,
varTotalInclude = "WORSTINT",
stats = getStats('n (%)'),
emptyValue = "0",
labelVars = labelVars,
dataTotal = dataTotalAE, # <- Use ADSL-based population for N and percentages
title = tools::toTitleCase("Table: Summary Table of Treatment-emergent Adverse Events (Safety Analysis Set)")
# file = file.path("tables_CSR", "Table_TEAE_summary.docx") # optional export
)
Table: Summary Table of Treatment-Emergent Adverse Events (Safety Analysis Set) | |||
---|---|---|---|
Subjects with, n(%): | Placebo | Xanomeline Low Dose | Xanomeline High Dose |
Variable group | |||
Treatment-Emergent | 2 (100) | 2 (100) | 3 (100) |
Serious Event | 0 | 0 | 1 (33.3) |
Worst-case severity: | 2 (100) | 2 (100) | 3 (100) |
MILD | 0 | 0 | 0 |
MODERATE | 0 | 1 (50.0) | 1 (33.3) |
SEVERE | 2 (100) | 1 (50.0) | 2 (66.7) |
Results in Death | 2 (100) | 1 (50.0) | 0 |
Causality | |||
NONE | 1 (50.0) | 2 (100) | 3 (100) |
POSSIBLE | 1 (50.0) | 1 (50.0) | 2 (66.7) |
PROBABLE | 0 | 1 (50.0) | 3 (100) |
REMOTE | 0 | 0 | 1 (33.3) |
dataTEAE <- subset(dataAll$ADAE, SAFFL == "Y" & TRTEMFL == "Y")
# order treatment and severity categories
dataTEAE$TRTA <- with(dataTEAE, reorder(TRTA, TRTAN))
## data considered for the total
dataTotalAE <- subset(dataAll$ADSL, SAFFL == "Y")
dataTotalAE$TRTA <- with(dataTotalAE, reorder(TRT01A, TRT01AN))
getSummaryStatisticsTable(
data = dataTEAE,
rowVar = c("AESOC", "AEDECOD"),
colVar = "TRTA",
## total
# data
dataTotal = dataTotalAE,
# row total
rowVarTotalInclude = c("AESOC", "AEDECOD"), rowTotalLab = "Any TEAE",
stats = getStats("n (%)"),
labelVars = labelVars,
rowVarLab = c('AESOC' = "TEAE by SOC and Preferred Term,\nn (%)"),
# sort rows based on the total column:
rowOrder = "total",
rowOrderTotalFilterFct = function(x) subset(x, TRTA == "Total"),
title = paste("Table: Treatment-emergent Adverse Events by System Organ Class",
"and Preferred Term (Safety Analysis Set)"
)
# file = file.path("tables_CSR", "Table_TEAE_SOCPT_atLeast1Subject.docx")
)
Table: Treatment-emergent Adverse Events by System Organ Class and Preferred Term (Safety Analysis Set) | |||
---|---|---|---|
TEAE by SOC and Preferred Term, | Placebo | Xanomeline Low Dose | Xanomeline High Dose |
Dictionary-Derived Term | |||
Any TEAE | 2 (100) | 2 (100) | 3 (100) |
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | 0 | 2 (100) | 3 (100) |
APPLICATION SITE PRURITUS | 0 | 2 (100) | 2 (66.7) |
APPLICATION SITE ERYTHEMA | 0 | 2 (100) | 1 (33.3) |
APPLICATION SITE IRRITATION | 0 | 1 (50.0) | 1 (33.3) |
APPLICATION SITE DERMATITIS | 0 | 0 | 1 (33.3) |
FATIGUE | 0 | 0 | 1 (33.3) |
SECRETION DISCHARGE | 0 | 1 (50.0) | 0 |
SUDDEN DEATH | 0 | 1 (50.0) | 0 |
MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS | 0 | 2 (100) | 2 (66.7) |
BACK PAIN | 0 | 0 | 1 (33.3) |
FLANK PAIN | 0 | 0 | 1 (33.3) |
MUSCULAR WEAKNESS | 0 | 1 (50.0) | 0 |
SHOULDER PAIN | 0 | 1 (50.0) | 0 |
PSYCHIATRIC DISORDERS | 1 (50.0) | 1 (50.0) | 1 (33.3) |
COMPLETED SUICIDE | 1 (50.0) | 0 | 0 |
CONFUSIONAL STATE | 0 | 1 (50.0) | 0 |
HALLUCINATION, VISUAL | 0 | 0 | 1 (33.3) |
GASTROINTESTINAL DISORDERS | 0 | 0 | 2 (66.7) |
NAUSEA | 0 | 0 | 2 (66.7) |
INFECTIONS AND INFESTATIONS | 0 | 1 (50.0) | 1 (33.3) |
LOWER RESPIRATORY TRACT INFECTION | 0 | 0 | 1 (33.3) |
PNEUMONIA | 0 | 1 (50.0) | 0 |
NERVOUS SYSTEM DISORDERS | 0 | 0 | 2 (66.7) |
AMNESIA | 0 | 0 | 1 (33.3) |
LETHARGY | 0 | 0 | 1 (33.3) |
PARTIAL SEIZURES WITH SECONDARY GENERALISATION | 0 | 0 | 1 (33.3) |
RENAL AND URINARY DISORDERS | 0 | 1 (50.0) | 1 (33.3) |
CALCULUS URETHRAL | 0 | 0 | 1 (33.3) |
INCONTINENCE | 0 | 1 (50.0) | 0 |
RESPIRATORY, THORACIC AND MEDIASTINAL DISORDERS | 0 | 1 (50.0) | 1 (33.3) |
DYSPNOEA | 0 | 1 (50.0) | 0 |
EPISTAXIS | 0 | 0 | 1 (33.3) |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 0 | 1 (50.0) | 1 (33.3) |
ACTINIC KERATOSIS | 0 | 0 | 1 (33.3) |
ERYTHEMA | 0 | 1 (50.0) | 0 |
CARDIAC DISORDERS | 1 (50.0) | 0 | 0 |
MYOCARDIAL INFARCTION | 1 (50.0) | 0 | 0 |
INJURY, POISONING AND PROCEDURAL COMPLICATIONS | 0 | 1 (50.0) | 0 |
JOINT DISLOCATION | 0 | 1 (50.0) | 0 |
SKIN LACERATION | 0 | 1 (50.0) | 0 |
INVESTIGATIONS | 0 | 1 (50.0) | 0 |
NASAL MUCOSA BIOPSY | 0 | 1 (50.0) | 0 |
METABOLISM AND NUTRITION DISORDERS | 0 | 0 | 1 (33.3) |
DECREASED APPETITE | 0 | 0 | 1 (33.3) |
getSummaryStatisticsTable(
data = dataTEAE,
rowVar = c("AESOC", "AEDECOD"),
colVar = "TRTA",
## total
# data
dataTotal = dataTotalAE,
# row total
rowVarTotalInclude = c("AESOC", "AEDECOD"), rowTotalLab = "Any TEAE",
stats = getStats("n (%)"),
labelVars = labelVars,
rowVarLab = c('AESOC' = "SOC and Preferred Term,\nn (%)"),
# sort rows based on the total column:
rowOrder = "total",
rowOrderTotalFilterFct = function(x) subset(x, TRTA == "Total"),
title = paste("Table: Treatment-emergent Adverse Events by System Organ Class",
"and Preferred Term reported in at least 25% of the subjects",
"in any treatment group (Safety Analysis Set)"
),
# file = file.path("tables_CSR", "Table_TEAE_SOCPT_atLeast25PercentsSubject.docx"),
# include only events occuring in at least 25% for at least one preferred term:
filterFct = function(x)
ddply(x, "AESOC", function(x){ # per AESOC to include the total
ddply(x, "AEDECOD", function(y){
yTotal <- subset(y, grepl("Total", TRTA))
if(any(yTotal$statPercN >= 25)) y
})
})
)
Table: Treatment-emergent Adverse Events by System Organ Class and Preferred Term reported in at least 25% of the subjects in any treatment group (Safety Analysis Set) | |||
---|---|---|---|
SOC and Preferred Term, | Placebo | Xanomeline Low Dose | Xanomeline High Dose |
Dictionary-Derived Term | |||
Any TEAE | 2 (100) | 2 (100) | 3 (100) |
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | 0 | 2 (100) | 3 (100) |
APPLICATION SITE PRURITUS | 0 | 2 (100) | 2 (66.7) |
APPLICATION SITE ERYTHEMA | 0 | 2 (100) | 1 (33.3) |
APPLICATION SITE IRRITATION | 0 | 1 (50.0) | 1 (33.3) |
MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS | 0 | 2 (100) | 2 (66.7) |
PSYCHIATRIC DISORDERS | 1 (50.0) | 1 (50.0) | 1 (33.3) |
GASTROINTESTINAL DISORDERS | 0 | 0 | 2 (66.7) |
NAUSEA | 0 | 0 | 2 (66.7) |
INFECTIONS AND INFESTATIONS | 0 | 1 (50.0) | 1 (33.3) |
NERVOUS SYSTEM DISORDERS | 0 | 0 | 2 (66.7) |
RENAL AND URINARY DISORDERS | 0 | 1 (50.0) | 1 (33.3) |
RESPIRATORY, THORACIC AND MEDIASTINAL DISORDERS | 0 | 1 (50.0) | 1 (33.3) |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | 0 | 1 (50.0) | 1 (33.3) |
# Simple capitalization function
simpleCap <- function(x) {
str_to_title(tolower(x))
}
# Extract analysis datasets
dataTEAE <- subset(dataAll$ADAE, SAFFL == "Y" & TRTEMFL == "Y")
dataTEAE$TRTA <- with(dataTEAE, reorder(TRTA, TRTAN))
# Total population for percentages (from ADSL)
dataTotalAE <- subset(dataAll$ADSL, SAFFL == "Y")
dataTotalAE$TRTA <- with(dataTotalAE, reorder(TRT01A, TRT01AN))
# Set AE severity levels
dataTEAE$AESEV <- factor(dataTEAE$AESEV, levels = c("MILD", "MODERATE", "SEVERE"))
dataTEAE$AESEVN <- as.numeric(dataTEAE$AESEV)
# Extract worst-case event per subject per AE term
dataAEWC <- ddply(dataTEAE, c("AESOC", "AEDECOD", "USUBJID", "TRTA"), function(x) {
x[which.max(x$AESEVN), ]
})
# Set WORSTINT with all severity levels included
dataAEWC$WORSTINT <- factor(
simpleCap(tolower(dataAEWC$AESEV)),
levels = c("Mild", "Moderate", "Severe")
)
labelVars["WORSTINT"] <- "Worst-case scenario"
# Generate datasets for totals: by AEDECOD and by SOC
dataTotalRow <- list(
AEDECOD = ddply(dataAEWC, c("AESOC", "USUBJID", "TRTA"), function(x) {
x[which.max(x$AESEVN), ]
}),
AESOC = ddply(dataAEWC, c("USUBJID", "TRTA"), function(x) {
x[which.max(x$AESEVN), ]
})
)
# Ensure WORSTINT factor levels are preserved in total rows
dataTotalRow <- lapply(dataTotalRow, function(df) {
df$WORSTINT <- factor(
simpleCap(tolower(df$AESEV)),
levels = c("Mild", "Moderate", "Severe")
)
df
})
# Build the table
getSummaryStatisticsTable(
data = dataAEWC,
rowVar = c("AESOC", "AEDECOD", "WORSTINT"),
rowVarInSepCol = "WORSTINT",
rowVarTotalInclude = c("AESOC", "AEDECOD"),
dataTotalRow = dataTotalRow,
rowVarTotalByVar = "WORSTINT",
rowTotalLab = "Any TEAE",
rowVarLab = c(
AESOC = "Subjects with, n(%):",
WORSTINT = "Worst-case scenario"
),
rowOrder = "total",
colVar = "TRTA",
stats = getStats("n (%)"),
emptyValue = "0",
labelVars = labelVars,
dataTotal = dataTotalAE,
title = tools::toTitleCase(
paste(
"Table: Treatment-emergent Adverse",
"Events by system organ",
"and preferred term by worst-case (safety Analysis Set)"
)
)
# file = file.path("tables_CSR", "Table_TEAE_Severity.docx")
)
Table: Treatment-Emergent Adverse Events by System Organ and Preferred Term by Worst-Case (Safety Analysis Set) | ||||
---|---|---|---|---|
Subjects with, n(%): | Worst-case scenario | Placebo | Xanomeline Low Dose | Xanomeline High Dose |
Dictionary-Derived Term | ||||
Any TEAE | Severe | 2 (100) | 1 (50.0) | 2 (66.7) |
Moderate | 0 | 1 (50.0) | 1 (33.3) | |
GENERAL DISORDERS AND ADMINISTRATION SITE CONDITIONS | Severe | 0 | 1 (50.0) | 0 |
Moderate | 0 | 0 | 2 (66.7) | |
Mild | 0 | 1 (50.0) | 1 (33.3) | |
APPLICATION SITE PRURITUS | Moderate | 0 | 0 | 1 (33.3) |
Mild | 0 | 2 (100) | 1 (33.3) | |
APPLICATION SITE ERYTHEMA | Mild | 0 | 2 (100) | 1 (33.3) |
APPLICATION SITE IRRITATION | Moderate | 0 | 0 | 1 (33.3) |
Mild | 0 | 1 (50.0) | 0 | |
APPLICATION SITE DERMATITIS | Moderate | 0 | 0 | 1 (33.3) |
FATIGUE | Mild | 0 | 0 | 1 (33.3) |
SECRETION DISCHARGE | Mild | 0 | 1 (50.0) | 0 |
SUDDEN DEATH | Severe | 0 | 1 (50.0) | 0 |
MUSCULOSKELETAL AND CONNECTIVE TISSUE DISORDERS | Moderate | 0 | 1 (50.0) | 1 (33.3) |
Mild | 0 | 1 (50.0) | 1 (33.3) | |
BACK PAIN | Mild | 0 | 0 | 1 (33.3) |
FLANK PAIN | Moderate | 0 | 0 | 1 (33.3) |
MUSCULAR WEAKNESS | Moderate | 0 | 1 (50.0) | 0 |
SHOULDER PAIN | Mild | 0 | 1 (50.0) | 0 |
PSYCHIATRIC DISORDERS | Severe | 1 (50.0) | 0 | 0 |
Moderate | 0 | 1 (50.0) | 1 (33.3) | |
COMPLETED SUICIDE | Severe | 1 (50.0) | 0 | 0 |
CONFUSIONAL STATE | Moderate | 0 | 1 (50.0) | 0 |
HALLUCINATION, VISUAL | Moderate | 0 | 0 | 1 (33.3) |
NERVOUS SYSTEM DISORDERS | Severe | 0 | 0 | 1 (33.3) |
Moderate | 0 | 0 | 1 (33.3) | |
AMNESIA | Mild | 0 | 0 | 1 (33.3) |
LETHARGY | Moderate | 0 | 0 | 1 (33.3) |
PARTIAL SEIZURES WITH SECONDARY GENERALISATION | Severe | 0 | 0 | 1 (33.3) |
GASTROINTESTINAL DISORDERS | Severe | 0 | 0 | 1 (33.3) |
Mild | 0 | 0 | 1 (33.3) | |
NAUSEA | Severe | 0 | 0 | 1 (33.3) |
Mild | 0 | 0 | 1 (33.3) | |
INFECTIONS AND INFESTATIONS | Moderate | 0 | 1 (50.0) | 1 (33.3) |
LOWER RESPIRATORY TRACT INFECTION | Moderate | 0 | 0 | 1 (33.3) |
PNEUMONIA | Moderate | 0 | 1 (50.0) | 0 |
RENAL AND URINARY DISORDERS | Moderate | 0 | 0 | 1 (33.3) |
Mild | 0 | 1 (50.0) | 0 | |
CALCULUS URETHRAL | Moderate | 0 | 0 | 1 (33.3) |
INCONTINENCE | Mild | 0 | 1 (50.0) | 0 |
RESPIRATORY, THORACIC AND MEDIASTINAL DISORDERS | Moderate | 0 | 1 (50.0) | 0 |
Mild | 0 | 0 | 1 (33.3) | |
DYSPNOEA | Moderate | 0 | 1 (50.0) | 0 |
EPISTAXIS | Mild | 0 | 0 | 1 (33.3) |
SKIN AND SUBCUTANEOUS TISSUE DISORDERS | Mild | 0 | 1 (50.0) | 1 (33.3) |
ACTINIC KERATOSIS | Mild | 0 | 0 | 1 (33.3) |
ERYTHEMA | Mild | 0 | 1 (50.0) | 0 |
INJURY, POISONING AND PROCEDURAL COMPLICATIONS | Moderate | 0 | 1 (50.0) | 0 |
JOINT DISLOCATION | Moderate | 0 | 1 (50.0) | 0 |
SKIN LACERATION | Mild | 0 | 1 (50.0) | 0 |
CARDIAC DISORDERS | Severe | 1 (50.0) | 0 | 0 |
MYOCARDIAL INFARCTION | Severe | 1 (50.0) | 0 | 0 |
INVESTIGATIONS | Mild | 0 | 1 (50.0) | 0 |
NASAL MUCOSA BIOPSY | Mild | 0 | 1 (50.0) | 0 |
METABOLISM AND NUTRITION DISORDERS | Moderate | 0 | 0 | 1 (33.3) |
DECREASED APPETITE | Moderate | 0 | 0 | 1 (33.3) |
load(file = "./01_Datasets/adae.rda")
load(file = "./01_Datasets/adsl.rda")
### Subset the population dataset ADSL on SAFFL safety flag
### consider only two treatments for comparison.
### convert all the upcase variables in ADSL and ADAE to lower case
adsl2 <- adsl %>%
rename_with(tolower) %>%
filter(saffl=='Y' & trt01a==c('Xan_Hi','Pbo')) %>%
select(usubjid,saffl,trt01a)
adae2 <- adae %>% rename_with(tolower)
adsl_cnt2 <- adsl2 %>%
group_by(trt01a) %>%
dplyr::summarise(bign=n()) %>%
pivot_wider(names_from = trt01a, values_from = bign)
### separate variables/macro variable to store those values two treatment counts
adsl_cnt2
pbo <- adsl_cnt2$Pbo
xan <- adsl_cnt2$Xan_Hi
### Merge the ADSL and ADAE on usubjid variable, keep only the required variables and remove the duplicate records, group by treatment and aedecod. Get the count of each aedecod per treatment and then derive the percentage. While deriving the population we are using the macro variables pbo and xan which has the population bign count.
adsl_adae = inner_join(adsl2,adae2,by=c("usubjid")) %>%
select(usubjid,aedecod,trt01a.x) %>%
distinct(usubjid,aedecod,trt01a.x) %>%
group_by(trt01a.x,aedecod) %>%
dplyr::summarise(cnt=n(),.groups = 'drop') %>%
ungroup() %>%
mutate(pct=ifelse(trt01a.x=='Pbo',cnt/pbo,cnt/xan)) %>%
ungroup() %>%
arrange(aedecod,trt01a.x)
### Derive the mean relative risk, lcl and ucl
adsl_adae2 <- adsl_adae %>%
select(-pct) %>%
pivot_wider(names_from = c(trt01a.x), values_from = cnt) %>%
mutate(nb=Pbo, na=Xan_Hi, snb=pbo, sna=xan, a=na/sna,
b=nb/snb,factor=1.96*sqrt(a*(1-a)/sna + b*(1-b)/snb),
lcl=a-b-factor,ucl=a-b+factor,mean=0.5*(lcl+ucl)) %>%
filter(!is.na(mean))
### Plot 1: AE Proportion dot plot
ggplot(adsl_adae %>% filter(aedecod %in% adsl_adae2$aedecod) %>%
arrange(desc(aedecod)),aes(x=pct,y=reorder(aedecod,desc(aedecod)))) +
geom_point(shape = 17,size=2,aes(colour = factor(trt01a.x))) +
ggtitle("Proportion") +
xlab('Proportion') + ylab('') +
scale_colour_manual(values = c("Blue", "Red")) +
theme(legend.position="bottom") + labs(col="Treatment:")
### Plot 2: Relative Risk
ggplot(data=adsl_adae2, aes(x=reorder(aedecod,desc(aedecod)), y=mean, ymin=lcl, ymax=ucl)) +
geom_pointrange() +
geom_hline(yintercept=0, lty=2) + # add a dotted line at x=1 after flip
coord_flip() + # flip coordinates (puts labels on y axis)
xlab("") + ylab("Mean (95% CI)") +
ggtitle("Risk Difference with 0.95CI") +
theme(axis.ticks = element_blank(),legend.position="none")
### align the above two images side by side, use the package cowplot.
p2 <- ggplot(data=adsl_adae2, aes(x=reorder(aedecod,desc(aedecod)), y=mean, ymin=lcl, ymax=ucl)) +
geom_pointrange() +
geom_hline(yintercept=0, lty=2) + # add a dotted line at x=1 after flip
coord_flip() + # flip coordinates (puts labels on y axis)
xlab("") + ylab("Mean (95% CI)") +
ggtitle("Risk Difference with 0.95CI") +
theme(axis.text.y = element_blank(),axis.ticks = element_blank(),legend.position="none")
p1 <- ggplot(adsl_adae %>% filter(aedecod %in% adsl_adae2$aedecod) %>%
arrange(desc(aedecod)),aes(x=pct,y=reorder(aedecod,desc(aedecod)))) +
geom_point(shape = 17,size=2,aes(colour = factor(trt01a.x))) +
ggtitle("Proportion") +
xlab('Proportion') + ylab('') +
scale_colour_manual(values = c("Blue", "Red")) +
theme(legend.position="bottom") + labs(col="Treatment:")
plot_grid(p1, p2, labels = "AUTO",nrow = 1,rel_widths = c(0.8, 0.5))
The data is an excerpt of the official VAERS data created by US Food and Drug Administration (FDA) and Centers for Disease Control and Prevention (CDC) that may be associated with vaccines.
Using a heatmap to display adverse event (AE) co-occurrence can be an effective visual tool in medical research to identify and analyze patterns of symptoms or side effects that frequently appear together.
Heat maps are charts with values being represented by colors. The intensity of the color is an indication of the number i.e. higher the value more intense will be the color.
HEAT MAP FOR NUMBER OF ADVERSE EVENTS PER TREATMENT PER SYSTEM ORGAN CLASS (SOC):
A heat map can be generated to visualize the number of adverse events recorded per SOC per treatment. We will plot the treatments on x-axis and the SOCs on y-axis. Each tile of the heat map will be colored as per the number of events recorded for that particular combination. The color of the tile will vary from yellow to red, with low counts shown in yellow and high counts in red. The tiles with white color denote the absence of events for a particular combination of SOC and treatment. The number of events will be printed in either black or white as per a predefined threshold; counts below or equal to the threshold will be printed in black, remaining will be in white.
In this plot, we are able to see the number of events reported for every SOC for all 5 arms. As seen in the legend, tiles with count less than 100 are yellow with varying intensity as per the actual count for that tile. For e.g. if we check the number of adverse events in the gastrointestinal system for treatments X and Y, the count is 11 and 26 respectively. The color of the tile for Y is deeper than the tile for X. The color starts changing from yellow to orange and then to red as the number of events increases. This is observed in the “Psychiatric” SOC counts, with 31 events Y has a yellow tile, 257 events lead to treatment A having an orange tile and the treatment Z has a red tile due to the 311 events reported. Here the threshold for number of events has been set to 100. We can quickly identify the SOCs with number of events greater than 100 as they are printed in white. We can observe white tiles for SOCs with no events for a particular treatment like Reproductive system and treatment X.
# tb1 <-aggregate(USUBJID~TRTA+SOC, data=aeord, FUN=function(x) c(ln=length(x)))
# head(tb1)
# names(tb1)[names(tb1)=="USUBJID"]<- "OBS"
tb1 <- data.frame(
TRTA = c("Y", "X", "Z", "A", "B", "Y", "X", "B", "A"),
SOC = c("NERVOUS", "NERVOUS", "PSYCHIATRIC", "PSYCHIATRIC", "PSYCHIATRIC", "PSYCHIATRIC", "PSYCHIATRIC", "RENAL", "RENAL"),
OBS = c(27, 12, 311, 257, 219, 31, 14, 17, 7)
)
p1 <- ggplot(tb1, (aes(x = TRTA, y = SOC, fill=OBS))) + geom_raster() +
geom_text(aes(label = paste0("N=",round(OBS, 1)), color = OBS < 100),show.legend = F)+
scale_fill_gradient(low="yellow",high="red",na.value="white")+
scale_color_manual(values = c("FALSE" = "white", "TRUE" = "black")) +
scale_x_discrete("Treatment") + scale_y_discrete("System Organ Class") +
theme_classic()+ggtitle("Heatmap for Number of Adverse Events per Treatment per SOC")+
theme(plot.title = element_text(hjust = 0.5)) + labs(fill="Number of Events")
p1
HEAT MAP FOR NUMBER OF SERIOUS AND NON SERIOUS ADVERSE EVENTS PER SOC PER TREATMENT
The adverse events are classified as either ‘Non Serious’ or ‘Serious’. We can plot the number of serious and non serious events occurring per SOC per treatment as shown in the heat map below. In this plot, both the counts will be displayed in the same tile and the color of the tile will be decided by the maximum of the number of events if both events are present.
HEAT MAP FOR NUMBER OF ADVERSE EVENTS PER TREATMENT PER PREFERRED TERM (PT) FOR A SINGLE SOC
As mentioned earlier, we can analyze the psychiatric disorders further by generating a heat map of the events reported under this SOC for all treatments. We will generate a heat map of all PTs with number of reported events greater than 10.
# tb2 <- aggregate(USUBJID~TRTA+SOC+AESER, data=aeord, FUN=function(x) c(ln=length(x)))
# names(tb2)[names(tb2)=="USUBJID"]<- "OBS"
tb2 <- data.frame(
TRTA = c("Y", "X", "Z", "A", "B", "Y", "X", "B", "A"),
SOC = c("NERVOUS", "NERVOUS", "PSYCHIATRIC", "PSYCHIATRIC", "PSYCHIATRIC", "PSYCHIATRIC", "PSYCHIATRIC", "RENAL", "RENAL"),
AESER = c("Y","Y","N","N","Y","N","Y","N","Y"),
OBS = c(27, 12, 311, 257, 219, 31, 14, 17, 7)
)
tb3 <-tb2[order(tb2$TRTA, tb2$SOC, tb2$OBS),]
p2 <- ggplot(tb3, aes(TRTA, SOC, fill = OBS)) + geom_raster() +
geom_text(data=tb3[tb3$AESER=="N",],aes(label = paste0("NSER=",round(OBS, 1))),
show.legend = F,size=2.9, vjust=0, hjust=1) +
geom_text(data=tb3[tb3$AESER=="Y",],aes(label = paste0("SER=",round(OBS, 1))),
show.legend = F,size=2.9, vjust=1, hjust=0)+
scale_fill_gradient(low="yellow",high="red",na.value="white")+
scale_x_discrete("Treatment") + scale_y_discrete(name = "System Organ Class") +
theme_classic()+
ggtitle("Heatmap for Number of Serious and Non Serious Adverse Events per SOC per
Treatment") + theme(plot.title = element_text(hjust = 0.5)) + labs(fill="Number of
Events")
p2
HEAT MAP FOR NUMBER OF ADVERSE EVENTS OF PTS OF A SINGLE SOC OF A SINGLE TREATMENT PER SEVERITY
We can check the severity of all the PTs for a single SOC for a single treatment. Let us analyze the Psychiatric Disorder PTs reported for treatment Z. We will plot the number of events reported for PTs against their severity which will have the value as Mild, Moderate or Severe
HEAT MAP FOR NUMBER OF RELATED AND NOT RELATED ADVERSE EVENTS PER PT PER SEVERITY FOR A SINGLE TREATMENT
We can check the causality of the events reported in the above plot along with their severity. Let us once again consider the psychiatry PTs reported for treatment Z. We will plot the number of events reported for PTs with their causality which will have values as ‘Related’ or ‘Not Related’ against severity
A volcano plot is a type of scatter-plot that is used to quickly identify changes in large data sets composed of replicate data. It plots significance versus fold-change on the y and x axes, respectively. It is constructed by plotting the negative log of the p value on the y axis (usually base 10). This results in data points with low p values (highly significant) appearing toward the top of the plot. The x axis is the log of the fold change between the two conditions. The log of the fold change is used so that changes in both directions appear equidistant from the center. In the clinical domain, a volcano plot is used to view Risk Difference (RD), Relative Risk or Odds Ratio of AE occurrence between the treatment groups by preferred term.
VOLCANO PLOT OF TREATMENT EMERGENT ADVERSE EVENT AT PT LEVEL
Volcano plots are also generated on the same data which is used for heat maps. Here, we have plotted negative log10-transformed p-value calculated using Fisher exact test on Y-axis and Log2-transformed relative risk on x-axis.
The dashed red line on y-axis shows where p = 0.05 with points above the line having p < 0.05 and points below the line having p > 0.05. Note that smaller the p-value larger is the number on y-axis. For this graph, we have labeled the AE terms with p-values < 0.05 as they are significant.
# Set the seed for reproducibility
set.seed(123)
# Create a data frame with synthetic data
total <- data.frame(
AETERM = c("NECK SHOULDER PAIN", "SINUSITIS", "DECREASED APPETITE", "DRY MOUTH", "UPPER RESPIRATORY INFECTION",
"COLD SYMPTOMS", "DRYNESS IN MOUTH", "WEIGHT GAIN", "HYPERHIDROSIS", "INCREASED IRRITABILITY",
"INCREASED THIRST", "LOWER BACK PAIN", "NUMBNESS (BOTH HANDS)"),
rr = runif(13, 0.1, 6), # Uniform distribution between 0.01 and 6
log2rr = log(runif(13, 0.1, 6), 2), # Normal distribution with mean -3 and sd 1
p = runif(13, 0.0001, 0.1), # Uniform distribution for p-values
Total = sample(2:50, 13, replace = TRUE), # Random sampling for total count
term = c(NA, NA, "DECREASED APPETITE", "DRY MOUTH", NA, NA, NA, NA, NA, NA, NA, NA, NA)
)
total %>%
kable(caption = "DATA SNIPPET", format = "html") %>%
kable_styling(latex_options = "striped")
AETERM | rr | log2rr | p | Total | term |
---|---|---|---|---|---|
NECK SHOULDER PAIN | 1.7967074 | 1.7984807 | 0.0544522 | 13 | NA |
SINUSITIS | 4.7510003 | -0.4996963 | 0.0594548 | 16 | NA |
DECREASED APPETITE | 2.5129638 | 2.4353532 | 0.0289871 | 33 | DECREASED APPETITE |
DRY MOUTH | 5.3098027 | 0.6340520 | 0.0147967 | 43 | DRY MOUTH |
UPPER RESPIRATORY INFECTION | 5.6487570 | -1.5222139 | 0.0963061 | 46 | NA |
COLD SYMPTOMS | 0.3687833 | 1.0248390 | 0.0902397 | 8 | NA |
DRYNESS IN MOUTH | 3.2158224 | 2.5189308 | 0.0691015 | 10 | NA |
WEIGHT GAIN | 5.3652724 | 2.4190755 | 0.0795672 | 42 | NA |
HYPERHIDROSIS | 3.3534666 | 2.0661030 | 0.0025589 | 11 | NA |
INCREASED IRRITABILITY | 2.7940269 | 1.9556811 | 0.0478318 | 24 | NA |
INCREASED THIRST | 5.7453167 | 2.5768103 | 0.0758701 | 28 | NA |
LOWER BACK PAIN | 2.7746715 | 1.9886535 | 0.0217192 | 8 | NA |
NUMBNESS (BOTH HANDS) | 4.0976667 | 2.0977219 | 0.0318863 | 28 | NA |
#Make a basic volcano plot
with(total, plot(log2rr, -log10(p), pch=20, main="Volcano Plots of Treatment Emergent Adverse
Events at PT Level", xlim=c(-8.5,8), abline(h = 1.3, col="red", lty=2), xlab="Log2 (Relative
Risk)", ylab="-Log10 (p-value)"))
#Add colored points: red if padj<0.05, orange of log2rr>1, green if both)
with(subset(total, p<.05 ), points(log2rr, -log10(p), pch=20, col="red"))
with(subset(total, abs(log2rr)>1), points(log2rr, -log10(p), pch=20, col="orange"))
with(subset(total, p<.05 & abs(log2rr)>1), points(log2rr, -log10(p), pch=20, col="green"))
#Label points with the textxy function from the calibrate plot
with(subset(total, p<.05 & abs(log2rr)>1), textxy(log2rr, -log10(p), labs=AETERM, cex=.7))
BUBBLE PLOT OF -LOG10 (P-VALUE) BY RELATIVE RISK SIZED BY COUNTS
The x-axis represents the log2 relative risk and y-axis represents the –log10 p-values computed using the fisher exact test. Each bubble represents an adverse event, with bubble size indicative of the total number of adverse events that occur for both treatments combined. Specifically, the bubble area is proportional to the total number of events.
Log2 (Relative Risk) of 0 represents no difference in risk between the 2 treatment groups, while bubbles to the right indicate a higher risk for subjects in treatment group Y (i.e. – If Log2 (RR) = 1, that is the same as an RR=2), and bubbles to the left indicate a higher risk for subjects in treatment group Z. Color, green and blue helps to emphasize adverse events that are more common for treatment Y or Z, respectively. The size of the bubble represents the total number of occurrences of the AE of interest. Names of events with total number of occurrences greater than 10 are displayed.
p <-plot_ly(total, x = ~ log2rr, y = ~ -log10(p), text = ~ AETERM, type = 'scatter', mode =
'markers', color=~ log2rr, colors=c("blue","green") , marker = list(size = ~ Total, opacity =
0.9)) %>%
layout(title = 'Bubble Plot Treatment Emergent Adverse Event of Significance versus Relative
Risk', xaxis = list(showgrid = FALSE, title = 'Log2 (Relative Risk)'), yaxis = list(showgrid
= FALSE, title = '-Log10 (p-value)')) %>%
add_annotations(x = total$log2rr, y = total$p1, text = total$term, xref = "x", yref = "y",
showarrow = FALSE, font = list(size = 10))
p
The Tendril plot concept is illustrated using the preferred term of Back Pain. Events on placebo tilt the tendril to the right and events on active tilt the tendril to the left. In this illustrative plot, events are colored by treatment arm; yellow and brown circles for placebo and active arm, respectively. The distance between points are proportional to time between events.
This tendril plot provides a visual comparison of System Organ Class Adverse Events (AEs) between a placebo group and an intervention group in a clinical study. Each tendril represents a different category of AEs, such as gastrointestinal or neurological, differentiated by color. The plot’s horizontal axis divides the two study groups, while the vertical axis indicates the progression of the study, though specific time units are not provided.
In the plot, the length and direction of each tendril illustrate the incidence and progression of AEs in each category relative to the treatment group. For example, lines extending significantly toward the intervention side indicate a higher occurrence of AEs in that group compared to placebo. This visual tool is effective for quickly assessing the safety profile and identifying the most prevalent adverse events in each treatment group during the study.
# packages
pacman::p_load(tidyverse, rio)
pacman::p_load(lubridate)
pacman::p_load(labelled)
pacman::p_load(Tendril)
pacman::p_load(ggtext)
pacman::p_load(colorspace)
# import
ae <- read_csv("./01_Datasets/2020-08-12_ae_example_dataset.csv") %>%
mutate(rando_date = ymd(rando_date),
aestdat = ymd(aestdat),
aeeddat = ymd(aeeddat)) %>%
mutate(day = as.numeric(aestdat - rando_date)) %>%
as.data.frame()
# Fixed mistake in subject 2011
ae[ae$usubjid == "2011", "arm"] <- "Intervention"
# add labels
var_label(ae) <- list(
usubjid = "unique subject identifier",
arm = "treatment assignment name",
armn = "treatment assignment numeric (0: placebo; 1:intervention)",
rando_date = "date of randomisation (yyyymmdd)",
repeatnum = "unique event identifier within usubjid",
aept = "adverse event code at preferred term/lower level",
aebodsys = "adverse event code at body system/higher level",
aesev = "adverse event severity grade (mild, moderate, severe)",
aesevn = "adverse event severity grade number (1: mild, 2: moderate, 3: severe)",
aeser = "serious adverse event (no, yes)",
aesern = "serious adverse event (0: no, 1: yes)",
aestdat = "adverse event start date (yyyymmdd)",
aeeddat = "adverse event end date (yyyymmdd)",
dur = "adverse event duration (days)")
# Tendril
subj <- ae %>%
dplyr::count(usubjid, arm) %>%
#add_row(usubjid = 3000, arm = "Intervention", n = 0) %>%
#add_row(usubjid = 3001, arm = "Placebo", n = 0) %>%
select(-n) %>%
as.data.frame()
pt <- Tendril(mydata = ae,
rotations = rep(3, nrow(ae)),
AEfreqThreshold = 5,
Tag = "Comment",
Treatments = c("Intervention", "Placebo"),
Unique.Subject.Identifier = "usubjid",
Terms = "aebodsys",
#Terms = "aept",
Treat = "arm",
StartDay = "day",
# SubjList = subj,
# SubjList.subject = "usubjid",
# SubjList.treatment = "arm",
# filter_double_events = TRUE
)
# working with terms
Terms <- pt$data %>%
dplyr::group_by(Terms) %>%
dplyr::summarise(n = n(), x = x[n]) %>%
dplyr::ungroup() %>%
arrange(x) %>%
mutate(text = str_glue("{Terms} (n={n})"))
# Reorder guides
pt$data$Terms <- fct_relevel(pt$data$Terms,
Terms %>%
pull(Terms) %>%
as.vector()
)
levels(pt$data$Terms) <- Terms %>% pull(text) %>% as.vector()
# plot Results
plot(pt) +
geom_point(alpha = 0.25) +
geom_path(alpha = 0.25) +
geom_vline(xintercept = 0, color = 'gray50', linetype = "dashed") +
geom_hline(yintercept = 0, color = 'gray50', linetype = "dashed") +
#scale_color_brewer(type = "qual", palette = "Set1") +
scale_color_discrete_qualitative(palette = "Dark3")+
scale_x_continuous(limits = c(-100, 200)) +
labs(title = "**Tendril Plot** of System Organ Class Adverse Events (AE) having at least 5 incidences",
# caption ="The Tendril Plot: a novel visual summary of the incidence, significance and temporal aspects of AE in clinical trials (JAMIA 2018; 25(8): 1069-1073)"
caption =
"Each MedDRA adverse event code at body system/higher level is
represented by a line (tendril) and each point is an event. Since time runs
along each tendril, it is the shape that carries the important information,
rather than the x and y coordinates. An event on the Intervention treatment
arm will tilt tendril direction to the *right*, and an event on the placebo arm will
tilt tendril direction to the *left*.
<br>
*The Tendril Plot: a novel visual summary of the incidence, significance and temporal aspects of AE in clinical trials (JAMIA 2018; 25(8): 1069-1073)*"
) +
guides(color = guide_legend(title = "AE at Body System")) +
theme(aspect.ratio = 0.70,
plot.title = element_markdown(),
plot.caption = element_textbox_simple(
size = 8,
lineheight = 1,
hjust = 0, vjust = 1,
padding = margin(1, 1, 1, 1),
margin = margin(1, 1, 1, 1)
),
plot.caption.position = "plot",
legend.position = c(0.99,0.92),
legend.justification = c(1,1),
legend.background = element_rect(fill = 'gray90'),
legend.key = element_rect(fill = 'gray90'))
# ggsave("./tendril_plot.png",
# width = 7.5, height = 6, units = "in")
# Simulating data
set.seed(123)
data <- data.frame(
Category = c("Social circumstances", "Immune system disorders", "Endocrine disorders",
"Neoplasms benign, malignant and unspecified (incl cysts and polyps)",
"Ear and labyrinth disorders", "General disorders and administration site conditions",
"Injury, poisoning and procedural complications", "Investigations", "Nervous system disorders",
"Metabolism and nutrition disorders", "Vascular disorders", "Renal and urinary disorders",
"Infections and infestations", "Musculoskeletal and connective tissue disorders",
"Skin and subcutaneous tissue disorders", "Pregnancy, puerperium and perinatal conditions",
"Congenital, familial and genetic disorders", "Gastrointestinal disorders", "Eye disorders",
"Respiratory, thoracic and mediastinal disorders", "Reproductive system and breast disorders",
"Cardiac disorders", "Hepatobiliary disorders", "Psychiatric disorders", "Surgical and medical procedures"),
Proportion = runif(25, 0.01, 0.04),
Proportion2 = runif(25, 0.01, 0.04),
RelativeRisk = runif(25, 0.5, 2),
LowerCI = runif(25, 0.1, 1.5),
UpperCI = runif(25, 1.5, 3),
Pvalue = runif(25, 0.001, 1)
)
data$Significant <- ifelse(data$Pvalue < 0.05, "red", "black")
double_dot_plot <- ggplot(data, aes(x = Category)) +
geom_point(aes(y = Proportion), color = "blue", size = 3) +
geom_point(aes(y = Proportion2), color = "red", size = 3) +
coord_flip() +
scale_y_continuous(labels = scales::percent_format(), breaks = seq(0, 0.04, by = 0.01)) +
labs(y = "Proportion", x = NULL, title = "Double Dot Plot") +
theme_minimal()
risk_plot <- ggplot(data, aes(x = Category, y = RelativeRisk)) +
geom_pointrange(aes(ymin = LowerCI, ymax = UpperCI), size = 0.5) +
geom_text(aes(label = sprintf("%.3f", Pvalue), color = Significant), hjust = 1.5, size = 3) +
coord_flip() +
scale_y_log10(breaks = c(0.1, 0.5, 1, 2, 10), labels = scales::trans_format("log10", scales::math_format(10^.x))) +
labs(y = "Relative Risk with 95% CI", x = NULL, title = "Relative Risk Plot") +
theme_minimal() +
theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank()) +
scale_color_identity()
grid.arrange(double_dot_plot, risk_plot, ncol = 2)
This visualisation has been submitted as an html file and can be found here.
The above visualization is based on clinical trial data for an active treatment for eczema compared to placebo in adolescents that are unresponsive to standard care.
An analysis revealed that more adverse events were recorded for subjects that were assigned to an intervention treatment arm, compared to those assigned to a placebo.
Subjects assigned to placebo treatment arm that reported an adverse event had their event treated in a duration of 6 days, on average. Participants assigned to interevention treatment, on average, saw the end of their adverse event in 8 days.
This is definitely more than just a visualisation - it’s a tool. You can upload your data and all AEs are represented in the middle pane. Each patient is represented by a circle / pie chart. You can kick it off by pushing the go button and the app will show you the development of AEs over time. Each AE is represented by a “slice of the pie” and the color as well as the location of it will tell you which AE is showing. If an AE disappears, it will still be shown as an “empty slice”. The size of the slice represents the severity. You have several different options you can work with, like changing which AEs will be shown (color coded), stop at a specific point in time, etc. You can even switch to a barplot representation instead of pie charts.
This is a fully developed app and thus, it is very useful and innovative. Its interactivity brings additional benefits to the user. Furthermore, this is available as an R package called “AdEPro”.
# library(adepro)
#Run this code to put the variables in the dataset provided into the correct format
AE <- read.csv("./01_Datasets/2020-08-12_ae_example_dataset.csv")%>%
mutate(TRTSDT=ymd(rando_date)) %>%
mutate(SUBJIDN = usubjid) %>%
mutate(AEDECOD = aept) %>%
mutate(TRT01A = arm) %>%
mutate(TRT01A = if_else(TRT01A=="Intervention","Experimental","Control")) %>%
mutate(LVDT = 365) %>%
mutate(DTHDT = "NA") %>%
mutate(SAFFN = 1) %>%
mutate(AETRTEMN = 1) %>%
mutate(AESERN = aesern) %>%
mutate(AERELN = 1) %>%
mutate(AE_Start = ymd(aestdat)) %>%
mutate(AESTDY = AE_Start - TRTSDT + 1)%>%
mutate(AEENDY = AESTDY + dur + 1) %>%
mutate(AESEVN = aesevn) %>%
mutate(TRTSDT=0) %>%
mutate(AGE="NA") %>%
mutate(SEX="NA") %>%
mutate(REGION="NA") %>%
select(c(SUBJIDN,AEDECOD,AESTDY, AEENDY,AESEVN,AETRTEMN,AESERN,AERELN, TRT01A, TRTSDT, LVDT, DTHDT,AGE, SEX, REGION, SAFFN))
#write this back to CSV
write.csv(AE,"./01_Datasets/adepro_app_AE.csv",row.names=FALSE)
#Run the following line to run the app.
#CSV must be loaded into the app
# launch_adepro()
The purpose of creating this visualization is to explore and illustrate patterns in disease incidence across the United States from 1928 to 2000, focusing on diseases like measles, polio, and rubella. The visualization aims to answer key questions: How has the incidence rate of each disease changed over time? How does incidence vary across different states? And, importantly, what impact did vaccination have on disease incidence rates?
df <- read.csv("./01_Datasets/US_state_epi_vaccines.csv")
# Records of the year for which each vaccine was introduced
vac_dat <- data.frame(disease = c("MEASLES", "POLIO", "RUBELLA"),
Z = c(1963, 1955, 1969))
df %>%
na.omit() %>%
ggplot() +
geom_tile(aes(x = year, y = state, fill = incidence)) +
scale_fill_gradient2(trans = "log10", high = "red", low = "blue",
mid = "white", midpoint = 0, na.value = "white") +
scale_x_continuous(expand = c(0.02, 0.02)) +
geom_vline(data = vac_dat, aes(xintercept = Z), linewidth = 1.5, color = "darkred") +
facet_wrap(~disease, scales = "free") +
dark_theme_minimal() +
theme(axis.text.y = element_text(size = 5),
plot.title = element_text(hjust = 0.5, size = 16)) +
labs(y = "State", fill = "Log \nIncidence", x = "Year",
title = "Visulaizing Effect of Vaccines",
caption = "Dark Red Line Indicates Introduction of Vaccine")
inc <- read.csv("./01_Datasets/US_state_epi_vaccines.csv")
measles <- inc %>%
filter(disease == "MEASLES") %>%
mutate(inc_cat = cut(incidence, c(0, 200, 400, 600, 800, 1000, 1200, Inf))) %>%
mutate(inc_fac = factor(inc_cat))
ggplot(data=measles) +
geom_tile(aes(x=year, y=state, fill=inc_fac), color="gray") +
scale_x_continuous("Year", breaks=seq(1930, 2010, by=10), limits = c(1928, 2000)) +
scale_y_discrete("State", limits=rev) +
geom_vline(xintercept = 1963, size = 1, color = "#d7191c") +
scale_fill_brewer("Incidence\nPer\n100,0000\nPopulation",
labels=c("0-<200", "200-<400", "400-<600", "600-<800", "800-<1000", "1000-<1200", ">=1200"),
na.value = 'white',
na.translate = F) +
labs (title ="Incidence of measles in US states declines after <b style='color:#d7191c;'>vaccine approval</b>")+
theme_minimal() +
theme(legend.title.align = 0.5,
axis.text.x = element_text(size = 12, color = "#525252"),
axis.text.y = element_text(size = 8, color = "#525252"),
axis.line.y = element_blank(),
axis.ticks.y = element_blank(),
panel.grid.major = element_blank(),
plot.title = element_markdown())