---
title: "SSR simulation study"
output: rmarkdown::html_vignette
bibliography: gsDesignNB.bib
vignette: >
  %\VignetteIndexEntry{SSR simulation study}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 10,
  fig.height = 7
)
design_note <- "Design summary computed below"
```

```{r libraries, message=FALSE, warning=FALSE}
library(gsDesignNB)
library(gsDesign)
library(data.table)
library(MASS)
library(ggplot2)
library(dplyr)
library(gt)
library(DT)
library(future)
library(future.apply)
```

## Introduction

This vignette presents a simulation study evaluating **sample size
re-estimation (SSR)** for group sequential trials with negative binomial
recurrent event endpoints. We compare three strategies:

1. **No adaptation** -- the trial proceeds with the planned sample size.
2. **Blinded SSR** [@friede2010blinded] -- nuisance parameters are
   re-estimated from pooled (treatment-blinded) interim data.
3. **Unblinded SSR** -- nuisance parameters are re-estimated from
   treatment-specific interim data.

For hypothesis testing, the main power simulations use the `mutze_test()`
score statistic. This is particularly important in SSR: adaptation can
increase information when nuisance parameters are worse than planned, and the
final test must still preserve one-sided Type I error after that adaptation.
The dedicated non-binding Type I tables therefore compare Wald and score
testing under the same one-sided $\alpha = 0.025$ group sequential design.
The sample-size update itself remains a nuisance-parameter recalculation that
must be calibrated by simulation; the score test controls rejection after the
adaptation, rather than making adaptation unnecessary.

*Note:* Much of the simulation code in this vignette was generated with the
assistance of a large language model (LLM) and then reviewed and refined
by the package authors.

Interim analyses are triggered when the **blinded statistical information**
reaches the planned fraction of the target, rather than at fixed calendar
times. This ensures analyses occur at comparable information levels
regardless of the true nuisance parameters.

## Planned trial design

```{r planned_parameters}
lambda1_plan      <- 0.5
rr_plan           <- 0.7
lambda2_plan      <- lambda1_plan * rr_plan
k_plan            <- 0.5
power_plan        <- 0.9
alpha_plan        <- 0.025
analysis_test_type <- "score"
analysis_test_label <- tools::toTitleCase(analysis_test_type)
accrual_rate_plan <- 30
accrual_scenario_plan <- 18
accrual_dur_plan  <- 12
max_followup      <- 12
trial_dur_plan    <- accrual_dur_plan + max_followup
event_gap_val     <- 20 / 30.4375  # 20-day gap between events
analysis_times_plan <- c(9, 14, 24)  # Calendar times under plan
```

### Fixed design

```{r fixed_design}
design_plan <- sample_size_nbinom(
  lambda1 = lambda1_plan, lambda2 = lambda2_plan,
  dispersion = k_plan, power = power_plan, alpha = alpha_plan,
  accrual_rate = accrual_rate_plan,
  accrual_duration = accrual_dur_plan,
  trial_duration = trial_dur_plan,
  max_followup = max_followup,
  event_gap = event_gap_val,
  test_type = analysis_test_type
)
design_plan_wald_check <- sample_size_nbinom(
  lambda1 = lambda1_plan, lambda2 = lambda2_plan,
  dispersion = k_plan, power = power_plan, alpha = alpha_plan,
  accrual_rate = accrual_rate_plan,
  accrual_duration = accrual_dur_plan,
  trial_duration = trial_dur_plan,
  max_followup = max_followup,
  event_gap = event_gap_val,
  test_type = "wald"
)
cat("Fixed-design sample size:", design_plan$n_total, "\n")
cat("Fixed-design sample size using Wald sizing:", design_plan_wald_check$n_total, "\n")
cat("Primary analysis test:", analysis_test_type, "\n")
```

This production cache uses the score-test sizing option for the fixed design.
The separate score-vs-Wald simulation vignette shows that, for superiority
designs, traditional Wald/Zhu--Lakkis sizing paired with the score test may be
a useful practical margin for power. A protocol-level SSR design should
therefore compare the score-sized and Wald-sized starting designs under the
planned score final test before selecting the final sample-size target.
For the primary SSR design in this article, the two formulas round to the same
fixed-design sample size, so the long production cache is not duplicated under
an alternative starting size. A focused starting-size sensitivity appears below
for a lower-event stress setting where the rounded group sequential sample
sizes differ.

### Group sequential design

We use a 3-analysis group sequential design with HSD efficacy spending and
Cauchy futility spending calibrated so the futility bounds correspond to
approximately RR = 0.9 at each analysis.

Specifically, `sfl = sfCauchy` with `sflpar = c(0.4, 0.75, 0.56, 0.63)` gives
planned lower bounds that are close to declaring futility when the observed RR
is greater than about 0.9 at either interim. Because the design uses
non-binding futility (`test.type = 4`), ignoring a lower-bound crossing does
not invalidate the final efficacy decision, which is still based on upper
boundary control.

```{r gs_design}
gs_plan <- design_plan |>
  gsNBCalendar(
    k = 3, test.type = 4, alpha = alpha_plan,
    sfu = sfHSD, sfupar = -2,
    sfl = sfCauchy, sflpar = c(0.4, 0.75, 0.56, 0.63),
    analysis_times = analysis_times_plan
  ) |>
  gsDesignNB::toInteger()

n_planned <- gs_plan$n_total[gs_plan$k]
target_info <- gs_plan$n.I[gs_plan$k]
planned_timing <- gs_plan$timing
gs_inflation <- n_planned / design_plan$n_total
accrual_rate_plan_eff <- n_planned / accrual_dur_plan
design_note <- paste0(
  "Design: lambda1=", lambda1_plan,
  ", RR=", rr_plan,
  ", k=", k_plan,
  ", planned accrual=", round(accrual_rate_plan_eff, 1), "/mo",
  ", planned N=", n_planned,
  ", max follow-up=", max_followup, " mo"
)

```

The dedicated non-binding Type I section below uses this same $\alpha = 0.025$
group sequential design twice: once with the Wald statistic and once with the
score statistic. The power simulations use the score statistic throughout, so
the power results should be read as score-test power after SSR rather than as
Wald-test power.

The group sequential inflation factor is
`r round(gs_inflation, 3)`, giving a planned sample size of
`r n_planned` (up from the fixed-design
`r round(design_plan$n_total)`) with an effective accrual rate of
`r round(accrual_rate_plan_eff, 1)` patients per month.

```{r gs_bound_summary}
gsBoundSummary(gs_plan,
  deltaname = "RR", logdelta = TRUE,
  Nname = "Information", timename = "Month",
  digits = 4, ddigits = 2
) |>
  as.data.frame() |>
  gt() |>
  tab_header(
    title = "Planned Group Sequential Design",
    subtitle = design_note
  ) |>
  tab_footnote(
    "Cauchy futility spending gives planned futility near observed RR > 0.9 at IA1/IA2; lower bounds are non-binding."
  ) |>
  tab_footnote(
    footnote = sprintf(
      "Planned cumulative sample size: IA1 = %.0f, IA2 = %.0f, Final = %.0f.",
      gs_plan$n_total[1], gs_plan$n_total[2], gs_plan$n_total[3]
    )
  )
```

### Group sequential sample size under each nuisance scenario

The figure below shows how the required group sequential sample size and
monthly enrollment rate vary with the control rate $\lambda_1$ and
dispersion $k$.
Since the sample size does not depend on the enrollment pace (accrual rate
only scales enrollment duration), we show two lines -- one per $k$ value --
with $\lambda_1$ on the horizontal axis.
The planned design assumptions ($\lambda_1 = `r lambda1_plan`$,
$k = `r k_plan`$) are marked with a dashed line.

```{r gs_n_by_scenario, fig.width=9, fig.height=4.5}
lambda1_seq <- seq(0.2, 1.0, by = 0.1)
gs_n_grid <- expand.grid(
  lambda1_true = lambda1_seq,
  k_true = c(0.5, 1.0),
  stringsAsFactors = FALSE
)

gs_n_grid$GS_N <- sapply(seq_len(nrow(gs_n_grid)), function(i) {
  fixed_i <- tryCatch(
    sample_size_nbinom(
      lambda1 = gs_n_grid$lambda1_true[i],
      lambda2 = gs_n_grid$lambda1_true[i] * rr_plan,
      dispersion = gs_n_grid$k_true[i],
      power = power_plan, alpha = alpha_plan,
      accrual_rate = accrual_rate_plan,
      accrual_duration = accrual_dur_plan,
      trial_duration = trial_dur_plan,
      max_followup = max_followup,
      event_gap = event_gap_val,
      test_type = analysis_test_type
    ),
    error = function(e) NULL
  )
  if (is.null(fixed_i)) return(NA_real_)
  tryCatch({
    g <- gsNBCalendar(fixed_i, k = 3, test.type = 4, alpha = alpha_plan,
      sfu = sfHSD, sfupar = -2,
      sfl = sfCauchy, sflpar = c(0.4, 0.75, 0.56, 0.63),
      analysis_times = analysis_times_plan) |>
      gsDesignNB::toInteger()
    g$n_total[g$k]
  }, error = function(e) NA_real_)
})

gs_n_grid$Accrual_rate <- gs_n_grid$GS_N / accrual_dur_plan
gs_n_grid$k_label <- paste0("k = ", gs_n_grid$k_true)
gs_n_grid$N_label <- round(gs_n_grid$GS_N)
gs_n_grid$Rate_label <- round(gs_n_grid$Accrual_rate)

plot_n <- ggplot(gs_n_grid, aes(x = lambda1_true, y = GS_N,
                                color = k_label, linetype = k_label)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  geom_text(aes(label = N_label), vjust = -0.8, size = 3.2, show.legend = FALSE) +
  geom_vline(xintercept = lambda1_plan, linetype = "dashed", alpha = 0.4) +
  labs(x = expression(lambda[1]), y = "GS sample size (N)",
       color = NULL, linetype = NULL) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")

plot_rate <- ggplot(gs_n_grid, aes(x = lambda1_true, y = Accrual_rate,
                                   color = k_label, linetype = k_label)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  geom_text(aes(label = Rate_label), vjust = -0.8, size = 3.2, show.legend = FALSE) +
  geom_vline(xintercept = lambda1_plan, linetype = "dashed", alpha = 0.4) +
  labs(x = expression(lambda[1]), y = "Enrollment rate (pts/mo)",
       color = NULL, linetype = NULL) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")

gridExtra::grid.arrange(plot_n, plot_rate, ncol = 2,
  top = grid::textGrob(
    "GS Sample Size and Enrollment Rate by Control Rate and Dispersion",
    gp = grid::gpar(fontsize = 14, fontface = "bold")
  )
)
```

Across the range of $\lambda_1$ and $k$ values considered, the required
group sequential sample size ranges from
`r min(gs_n_grid$GS_N, na.rm = TRUE)` to
`r max(gs_n_grid$GS_N, na.rm = TRUE)`.
Higher dispersion ($k = 1.0$) consistently requires more subjects than
$k = 0.5$, while higher control rates reduce the required N because each
subject contributes more information.
This wide range highlights why sample size re-estimation can be valuable
when nuisance parameters are uncertain.

### Expected information fraction at planned time of each interim

This table shows the expected information fraction at each planned interim
calendar time under each nuisance scenario. Under the approximate design assumptions
(bold green row), the information fractions match the GS design reasonably well
(`r paste(round(100 * planned_timing[1:2], 1), collapse = "%, ")`%).

```{r info_fractions}
nuisance_grid <- expand.grid(
  lambda1_true = c(0.3, 0.5, 0.8),
  k_true = c(0.5, 1.0),
  accrual_true = c(12, 18, 24),
  stringsAsFactors = FALSE
)

# Accrual scenarios use effective monthly enrollment rates directly.

for (a in 1:2) {
  col_name <- paste0("IF_analysis_", a)
  nuisance_grid[[col_name]] <- sapply(seq_len(nrow(nuisance_grid)), function(i) {
    accrual_eff <- nuisance_grid$accrual_true[i]
    info_at_t <- compute_info_at_time(
      analysis_time = analysis_times_plan[a],
      accrual_rate = accrual_eff,
      accrual_duration = accrual_dur_plan,
      lambda1 = nuisance_grid$lambda1_true[i],
      lambda2 = nuisance_grid$lambda1_true[i] * rr_plan,
      dispersion = nuisance_grid$k_true[i],
      max_followup = max_followup,
      event_gap = event_gap_val
    )
    round(100 * info_at_t / target_info, 1)
  })
}

nuisance_grid |>
  gt() |>
  tab_header(
    title = "Expected Information Fraction (%) at Planned Time of Each Interim",
    subtitle = design_note
  ) |>
  cols_label(
    lambda1_true = "lambda1",
    k_true = "k",
    accrual_true = "Accrual (pts/mo)",
    IF_analysis_1 = paste0("IA 1 (mo ", analysis_times_plan[1], ")"),
    IF_analysis_2 = paste0("IA 2 (mo ", analysis_times_plan[2], ")")
  ) |>
  tab_footnote(
    footnote = paste(
      "Computed via compute_info_at_time() divided by planned final information.",
      "Accrual values (12/18/24) are effective enrollment rates used directly.",
      "Bold green = design assumptions.",
      "With information-based timing, interims occur when blinded info reaches",
      "the target fraction, so the calendar time varies by scenario."
    )
  )
```

## Scenario grid

```{r scenario_grid}
scenarios <- expand.grid(
  lambda1_true = c(0.3, 0.5, 0.8),
  k_true = c(0.5, 1.0),
  accrual_true = c(12, 18, 24),
  rr_true = c(0.6, 0.7, 0.85, 1.0, 1.1),
  stringsAsFactors = FALSE
)

n_sims_initial <- 50
n_sims_production_power <- 3600L
n_sims_production_type1 <- 20000L
n_sims_production_rr_gt1 <- 1000L
use_production <- identical(Sys.getenv("GSDESIGNNB_PRODUCTION_SSR"), "true")
# Production rep counts for Type I (non-binding) tables only, without re-running the full power grid:
use_production_type1 <- use_production ||
  identical(Sys.getenv("GSDESIGNNB_PRODUCTION_TYPE1"), "true")

scenarios$n_sims <- if (use_production) {
  as.integer(ifelse(
    scenarios$rr_true == 1,
    n_sims_production_type1,
    ifelse(scenarios$rr_true > 1, n_sims_production_rr_gt1, n_sims_production_power)
  ))
} else {
  rep(as.integer(n_sims_initial), length.out = nrow(scenarios))
}
scenarios$accrual_eff <- scenarios$accrual_true

n_max <- 2 * n_planned
min_if_futility <- 0.3
target_if <- planned_timing  # Target IF for each analysis
# IA2 adaptation gate (less strict than prior 80% / 2 months setting)
max_enrollment_frac_for_ia2 <- 1.00
min_months_to_close_for_adapt <- 2
analysis_lag_months <- 2

# Optional precomputed outputs for fast vignette builds
precomputed_basename <- paste0("ssr_sim_vignette_outputs_", analysis_test_type, ".rds")
project_root <- if (file.exists("DESCRIPTION")) "." else
  if (file.exists("../DESCRIPTION")) ".." else "."
precomputed_source_path <- file.path(
  project_root, "inst", "extdata", precomputed_basename
)
precomputed_file <- system.file("extdata", precomputed_basename, package = "gsDesignNB")
if (precomputed_file == "" && file.exists(precomputed_source_path)) {
  precomputed_file <- precomputed_source_path
}
force_rerun <- identical(Sys.getenv("GSDESIGNNB_FORCE_SSR_SIM"), "true")
use_precomputed <- (!use_production) && !force_rerun && nzchar(precomputed_file)
save_precomputed <- identical(Sys.getenv("GSDESIGNNB_SAVE_SSR_OUTPUTS"), "true")
save_precomputed_path <- precomputed_source_path
# Re-run Type I (non-binding) sims even if RDS cache exists (see inst/extdata/ssr_type1_null_*.rds)
force_type1_sim <- identical(Sys.getenv("GSDESIGNNB_FORCE_TYPE1_SIM"), "true")
type1_cache_dir <- dirname(save_precomputed_path)

cat("Scenarios:", nrow(scenarios), "| Fresh-run replicates requested:", sum(scenarios$n_sims), "\n")
if (use_precomputed) {
  cat("Rendered results use the replicate counts stored in the precomputed cache.\n")
}
cat("Accrual rates used in simulation:", paste(round(sort(unique(scenarios$accrual_true)), 1), collapse = ", "),
    "/month\n")
cat(
  "IA2 SSR gate: adaptation uses cutoff at min(IA2 time, predicted close - ",
  min_months_to_close_for_adapt,
  " months); enrollment cap <= ",
  100 * max_enrollment_frac_for_ia2,
  "%.\n",
  sep = ""
)
cat("Futility-stop sample size counts enrollment through +",
    analysis_lag_months, " months after stop.\n", sep = "")
cat("Production plan:", n_sims_production_power,
    "reps per scenario for RR < 1 (power);",
    n_sims_production_type1, "reps per scenario for RR = 1.0 (Type I);",
    n_sims_production_rr_gt1, "reps per scenario for RR > 1.\n")
cat("separate RR = 1 non-binding futility check uses", n_sims_production_type1,
    "reps per k and test statistic.\n")
if (use_precomputed) cat("Using precomputed outputs:", precomputed_file, "\n")
cat(
  "Type I (RR=1, non-binding) per-test cache: inst/extdata/ssr_type1_null_alpha025_wald.rds,",
  "ssr_type1_null_alpha025_score.rds; set GSDESIGNNB_FORCE_TYPE1_SIM=true to rebuild.\n"
)
cat(
  "Production Type I reps only (keep precomputed power grid):",
  "GSDESIGNNB_PRODUCTION_TYPE1=true\n"
)
```

## Simulation engine

The reusable adaptive-trial logic now lives in `sim_ssr_nbinom()`. This
vignette keeps the **scenario grid**, **caching**, and **visualization**
layers, but delegates the actual SSR simulation engine to the package.

Interim analyses are information-based and use dynamic spending:

1. **IA1 (~40% IF)**: efficacy/futility only; no SSR adaptation.
2. **IA2 (target ~76% IF)**: efficacy/futility evaluated using observed IF
   and spending.
3. **SSR at IA2 only**: nuisance estimates for adaptation use an IA2
   adaptation cutoff time at
   $\min(\text{IA2 time},\; \text{predicted enrollment close} -
   `r min_months_to_close_for_adapt` \text{ months})$, enforcing at least
   `r min_months_to_close_for_adapt` months of operational lead-time.
   The enrollment-fraction cap is set in the scenario setup.

At each analysis, bounds are recalculated from observed information and
spending time. The blinded NB information estimate from
`calculate_blinded_info()` is used to determine interim timing.
If this estimate is unavailable or non-positive, timing falls back to
unblinded information from `mutze_test()` (tracked as fallback counts
in the results table).

```{r simulation_engine}
make_enroll_rate <- function(accrual_eff) {
  data.frame(rate = accrual_eff, duration = n_max / accrual_eff)
}

make_fail_rate <- function(lambda1_true, rr_true, k_true) {
  data.frame(
    treatment = c("Control", "Experimental"),
    rate = c(lambda1_true, lambda1_true * rr_true),
    dispersion = k_true
  )
}

dropout_rate_sim <- data.frame(
  treatment = c("Control", "Experimental"),
  rate = c(0, 0),
  duration = c(100, 100)
)

run_scenario <- function(sc_idx) {
  sc <- scenarios[sc_idx, ]
  message(sprintf(
    "Starting scenario %d / %d: RR=%.2f, lambda1=%.2f, k=%.2f, accrual=%.1f",
    sc_idx, nrow(scenarios), sc$rr_true, sc$lambda1_true, sc$k_true, sc$accrual_true
  ))

  sim_res <- sim_ssr_nbinom(
    n_sims = sc$n_sims,
    enroll_rate = make_enroll_rate(sc$accrual_eff),
    fail_rate = make_fail_rate(sc$lambda1_true, sc$rr_true, sc$k_true),
    dropout_rate = dropout_rate_sim,
    max_followup = max_followup,
    design = gs_plan,
    n_max = n_max,
    min_if_futility = min_if_futility,
    max_enrollment_frac_for_adapt = max_enrollment_frac_for_ia2,
    min_months_to_close_for_adapt = min_months_to_close_for_adapt,
    analysis_lag_months = analysis_lag_months,
    event_gap = event_gap_val,
    ignore_futility = FALSE,
    test_type = analysis_test_type,
    metadata = list(
      lambda1_true = sc$lambda1_true,
      k_true = sc$k_true,
      accrual_true = sc$accrual_true,
      accrual_eff = sc$accrual_eff,
      analysis_test = analysis_test_type,
      rr_true = sc$rr_true
    ),
    seed = 1000 + sc_idx
  )

  sim_res$trial_results
}

run_null_type1_sims <- function(gs_plan_x, alpha_plan_x, null_n, test_type_x) {
  null_all <- vector("list", length(null_k_scenarios))

  for (i in seq_along(null_k_scenarios)) {
    k_null <- null_k_scenarios[i]
    sim_res <- sim_ssr_nbinom(
      n_sims = null_n,
      enroll_rate = make_enroll_rate(accrual_scenario_plan),
      fail_rate = make_fail_rate(lambda1_plan, 1.0, k_null),
      dropout_rate = dropout_rate_sim,
      max_followup = max_followup,
      design = gs_plan_x,
      n_max = n_max,
      min_if_futility = min_if_futility,
      max_enrollment_frac_for_adapt = max_enrollment_frac_for_ia2,
      min_months_to_close_for_adapt = min_months_to_close_for_adapt,
      analysis_lag_months = analysis_lag_months,
      event_gap = event_gap_val,
      ignore_futility = TRUE,
      test_type = test_type_x,
      metadata = list(k_true = k_null, analysis_test = test_type_x),
      seed = 5000 + i
    )
    null_all[[i]] <- sim_res$trial_results
  }

  null_all <- Filter(Negate(is.null), null_all)
  if (length(null_all) == 0) {
    return(data.table(
      k_true = numeric(0), analysis_test = character(0), strategy = character(0),
      n_sims = integer(0), type1_error = numeric(0),
      cross_ia1 = numeric(0), cross_ia2 = numeric(0), cross_final = numeric(0),
      mean_n = numeric(0), adapted_rate = numeric(0)
    ))
  }

  null_dt <- rbindlist(null_all)
  sm <- summarize_ssr_sim(null_dt, by = c("k_true", "strategy"))$trial_summary
  sm <- as.data.frame(sm)
  sm$type1_error <- sm$rejection_rate
  sm$adapted_rate <- sm$pct_adapted / 100
  sm[, c(
    "k_true", "strategy", "n_sims", "type1_error",
    "cross_ia1", "cross_ia2", "cross_final", "mean_n", "adapted_rate"
  )]
  sm$analysis_test <- test_type_x
  sm[, c(
    "k_true", "analysis_test", "strategy", "n_sims", "type1_error",
    "cross_ia1", "cross_ia2", "cross_final", "mean_n", "adapted_rate"
  )]
}
```

## Running simulations

```{r run_simulations, warning=FALSE}
precomputed_outputs <- NULL
if (use_precomputed) {
  precomputed_outputs <- readRDS(precomputed_file)
  all_results <- as.data.frame(precomputed_outputs$plot_data)
  sim_runtime_seconds <- precomputed_outputs$sim_runtime_seconds
  workers <- if (!is.null(precomputed_outputs$workers)) {
    as.integer(precomputed_outputs$workers)
  } else {
    max(1L, future::availableCores() - 1L)
  }
  sim_mode <- "Loaded precomputed outputs"
} else {
  sim_start <- Sys.time()
  workers <- max(1L, future::availableCores() - 1L)
  old_plan <- future::plan()
  future::plan(future::multisession, workers = workers)

  all_results <- lapply(seq_len(nrow(scenarios)), run_scenario)

  future::plan(old_plan)
  all_results <- Filter(Negate(is.null), all_results)
  all_results <- do.call(rbind, all_results)
  sim_runtime_seconds <- as.numeric(difftime(Sys.time(), sim_start, units = "secs"))
  sim_mode <- "Fresh simulation"
}

# Backward-compatible defaults for precomputed files from older vignette versions
required_cols <- c(
  "ia2_adapt_cut_time",
  "ia2_enroll_frac", "ia2_months_to_close_pred",
  "ia2_adapt_allowed", "ia2_adapt_applied"
)
missing_cols <- setdiff(required_cols, names(all_results))
if (length(missing_cols) > 0) {
  for (nm in missing_cols) all_results[[nm]] <- NA
}

cat("Simulation mode:", sim_mode, "\n")
cat("Workers:", workers, "\n")
cat("Rows:", nrow(all_results), "\n")
if (!is.null(sim_runtime_seconds) && is.finite(sim_runtime_seconds)) {
  cat(sprintf("Simulation wall time: %.2f minutes (%.1f seconds)\n",
              sim_runtime_seconds / 60, sim_runtime_seconds))
}

# RR = 1.0 non-binding futility check (Type I), comparing Wald and score tests
# under the same nominal alpha = 0.025 design. Power simulations use the score
# test selected by analysis_test_type.
# Results cache: inst/extdata/ssr_type1_null_alpha025_wald.rds and
# ssr_type1_null_alpha025_score.rds
# Re-run after changing sim logic or rep count: Sys.setenv(GSDESIGNNB_FORCE_TYPE1_SIM = "true")
null_nonbinding_n <- if (use_production_type1) n_sims_production_type1 else n_sims_initial
null_k_scenarios <- c(k_plan, 1.0)

type1_tests <- list(
  list(
    key = "alpha025_wald",
    label = "Wald",
    test_type = "wald",
    gs_plan = gs_plan,
    alpha_plan = alpha_plan
  ),
  list(
    key = "alpha025_score",
    label = "Score",
    test_type = "score",
    gs_plan = gs_plan,
    alpha_plan = alpha_plan
  )
)

null_nonbinding_by_test <- list()
null_nonbinding_runtime_by_test <- list()
null_nonbinding_mode_by_test <- list()

bundle_type1_ok <- isTRUE(use_precomputed) &&
  is.list(precomputed_outputs$null_nonbinding_by_test) &&
  all(c("Wald", "Score") %in% names(precomputed_outputs$null_nonbinding_by_test))

if (bundle_type1_ok) {
  null_nonbinding_by_test[["Wald"]] <-
    as.data.table(precomputed_outputs$null_nonbinding_by_test[["Wald"]])
  null_nonbinding_by_test[["Score"]] <-
    as.data.table(precomputed_outputs$null_nonbinding_by_test[["Score"]])
  br <- precomputed_outputs$null_nonbinding_runtime_by_test
  if (!is.null(br)) {
    null_nonbinding_runtime_by_test <- as.list(br)
  } else {
    null_nonbinding_runtime_by_test <- list(
      "Wald" = precomputed_outputs$null_nonbinding_runtime_seconds,
      "Score" = NA_real_
    )
  }
  null_nonbinding_mode_by_test <- list("Wald" = "Precomputed bundle", "Score" = "Precomputed bundle")
} else {
  dir.create(type1_cache_dir, recursive = TRUE, showWarnings = FALSE)
  for (td in type1_tests) {
    cache_path <- file.path(type1_cache_dir, paste0("ssr_type1_null_", td$key, ".rds"))
    loaded <- FALSE
    if (!force_type1_sim && file.exists(cache_path)) {
      cr <- tryCatch(readRDS(cache_path), error = function(e) NULL)
      same_n <- is.list(cr) && isTRUE(
        as.integer(cr$null_nonbinding_n) == as.integer(null_nonbinding_n)
      )
      if (same_n) {
        null_nonbinding_by_test[[td$label]] <- as.data.table(cr$summary)
        null_nonbinding_runtime_by_test[[td$label]] <- cr$runtime_seconds
        null_nonbinding_mode_by_test[[td$label]] <- "Cached RDS"
        loaded <- TRUE
      }
    }
    if (!loaded) {
      cat("Running Type I (non-binding) sims:", td$label, "test, alpha =", td$alpha_plan, "\n")
      t0 <- Sys.time()
      sm <- run_null_type1_sims(td$gs_plan, td$alpha_plan, null_nonbinding_n, td$test_type)
      rt <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
      null_nonbinding_by_test[[td$label]] <- sm
      null_nonbinding_runtime_by_test[[td$label]] <- rt
      null_nonbinding_mode_by_test[[td$label]] <- "Fresh simulation"
      saveRDS(
        list(
          summary = as.data.frame(sm),
          runtime_seconds = rt,
          null_nonbinding_n = null_nonbinding_n,
          alpha_design = td$alpha_plan,
          test_type = td$test_type,
          generated_at = as.character(Sys.time())
        ),
        cache_path
      )
      cat("  Saved:", cache_path, "\n")
    }
  }
}

null_nonbinding_summary <- null_nonbinding_by_test[[analysis_test_label]]
null_nonbinding_runtime_seconds <- null_nonbinding_runtime_by_test[[analysis_test_label]]
null_nonbinding_mode <- paste0(
  "Wald: ", null_nonbinding_mode_by_test[["Wald"]],
  " | Score: ", null_nonbinding_mode_by_test[["Score"]]
)

cat("RR=1 non-binding Type I modes:", null_nonbinding_mode, "\n")
cat("RR=1 non-binding replications per k:", null_nonbinding_n, "\n")
cat("k scenarios:", paste(null_k_scenarios, collapse = ", "), "\n")
```

## Results

```{r summarize}
dt <- as.data.table(all_results)
summary_dt <- summarize_ssr_sim(
  dt,
  by = c("lambda1_true", "k_true", "accrual_true", "rr_true", "strategy")
)$trial_summary |>
  as.data.table()
```

```{r save_precomputed_outputs, eval=save_precomputed}
plot_data <- dt[, .(
  lambda1_true, k_true, accrual_true, rr_true, analysis_test, strategy,
  reject, futility, reject_stage, futility_stage,
  n_adapted, adapted,
  participants_with_events_stop, events_observed_stop,
  if_ia1, if_ia2, if_final,
  ia1_time, ia2_time, final_time,
  ia1_fallback, ia2_fallback,
  participants_with_events_ia1, participants_with_events_ia2,
  participants_with_events_final,
  events_observed_ia1, events_observed_ia2, events_observed_final,
  adapt_cut_time, adapt_enroll_frac, adapt_months_to_close_pred,
  adapt_allowed, adapt_applied,
  ia2_adapt_cut_time,
  ia2_enroll_frac, ia2_months_to_close_pred,
  ia2_adapt_allowed, ia2_adapt_applied
)]

dir.create(dirname(save_precomputed_path), recursive = TRUE, showWarnings = FALSE)
saveRDS(
  list(
    plot_data = as.data.frame(plot_data),
    sim_runtime_seconds = sim_runtime_seconds,
    workers = workers,
    null_nonbinding_summary = as.data.frame(null_nonbinding_summary),
    null_nonbinding_by_test = lapply(null_nonbinding_by_test, as.data.frame),
    null_nonbinding_n = null_nonbinding_n,
    null_nonbinding_runtime_seconds = null_nonbinding_runtime_seconds,
    null_nonbinding_runtime_by_test = null_nonbinding_runtime_by_test,
    generated_at = as.character(Sys.time()),
    settings = list(
      analysis_test_type = analysis_test_type,
      n_sims_initial = n_sims_initial,
      n_sims_production_power = n_sims_production_power,
      n_sims_production_type1 = n_sims_production_type1,
      n_sims_production_rr_gt1 = n_sims_production_rr_gt1,
      use_production = use_production,
      use_production_type1 = use_production_type1,
      design_note = design_note
    )
  ),
  save_precomputed_path
)
cat("Saved precomputed vignette outputs to:", save_precomputed_path, "\n")
```

```{r runtime_summary}
scenario_rep_counts <- unique(summary_dt[, .(
  lambda1_true, k_true, accrual_true, rr_true, n_sims
)])
runtime_df <- data.frame(
  Metric = c("Simulation mode", "Workers", "Scenarios", "Replicates", "Rows", "Wall time (minutes)"),
  Value = c(
    sim_mode,
    as.character(workers),
    nrow(scenario_rep_counts),
    sum(scenario_rep_counts$n_sims),
    nrow(all_results),
    if (!is.null(sim_runtime_seconds) && is.finite(sim_runtime_seconds))
      sprintf("%.2f", sim_runtime_seconds / 60) else "NA"
  )
)

runtime_df |>
  gt() |>
  tab_header(
    title = "Simulation Runtime",
    subtitle = "Use precomputed outputs to avoid rerunning on pkgdown/CI/CRAN builds"
  )
```

```{r precision_note, results='asis', echo=FALSE}
power_reps <- unique(summary_dt[rr_true < 1, n_sims])
min_power_reps <- min(power_reps, na.rm = TRUE)
mcse_at_90 <- sqrt(0.9 * 0.1 / min_power_reps)

if (is.finite(min_power_reps) && min_power_reps < 1000) {
  cat(
    "**Precision note.** The bundled non-null power grid uses ",
    min_power_reps,
    " replicates per cell (Monte Carlo SE about ",
    scales::percent(mcse_at_90, accuracy = 0.1),
    " near 90% power), so the power plots below are directional diagnostics. ",
    "The dedicated RR = 1 Type I tables use the production-scale cache shown in their subtitles. ",
    "Set `GSDESIGNNB_PRODUCTION_SSR=true` and `GSDESIGNNB_SAVE_SSR_OUTPUTS=true` to regenerate ",
    "and store the larger non-null power grid.\n\n",
    sep = ""
  )
}
```

```{r null_nonbinding_summary, results='asis'}
# Same nominal one-sided alpha = 0.025 design; compare Wald and score tests.
for (test_label in c("Wald", "Score")) {
  cat("\n\n### Type I error table: ", test_label, " test, alpha = 0.025\n\n", sep = "")
  null_df <- as.data.frame(null_nonbinding_by_test[[test_label]])
  if (!"k_true" %in% names(null_df)) null_df$k_true <- NA_real_
  rt_min <- null_nonbinding_runtime_by_test[[test_label]]
  rt_str <- if (is.finite(rt_min)) paste0(round(rt_min / 60, 2), " min") else "NA"
  null_display <- null_df |>
    dplyr::transmute(
      k = k_true,
      Strategy = strategy,
      `Type I error` = round(type1_error, 4),
      `IA1` = round(cross_ia1, 4),
      `IA2` = round(cross_ia2, 4),
      `Final` = round(cross_final, 4),
      `Mean N` = round(mean_n, 1),
      `SSR applied (%)` = round(100 * adapted_rate, 1)
    )

  tab <- null_display |>
    gt() |>
    tab_header(
      title = paste0("Type I Error Under RR = 1.0: ", test_label, " Test"),
      subtitle = paste0(
        "Nominal one-sided alpha: 0.025 | ",
        "Replications per k: ", null_nonbinding_n,
        " | ", null_nonbinding_mode_by_test[[test_label]],
        " | Runtime: ", rt_str
      )
    ) |>
    tab_spanner(label = "Efficacy crossing at", columns = c("IA1", "IA2", "Final")) |>
    tab_footnote(
      paste(
        "Futility stopping is ignored (non-binding) so all trials continue to",
        "the final analysis unless stopped for efficacy.",
        "'SSR applied' is the percentage of trials where the adapted N exceeded",
        "the planned N (planning k =", k_plan, ").",
        "Under the null, SSR may still increase N because",
        "nuisance parameter estimates can differ from planning values.",
        "Both tables use the same group-sequential design built at nominal alpha = 0.025;",
        "the only intended difference is the final/interim test statistic.",
        "Power results elsewhere use the score test."
      )
    )
  print(tab)
}
```

### Starting sample-size sensitivity

The main SSR production cache is based on the score-sized fixed design, but in
the primary planning example the Wald and score formulas round to the same
starting sample size. To check whether the fixed-design power margin from Wald
sizing carries through an adaptive SSR workflow, we add a targeted sensitivity
in a low-event stress setting
($\lambda_1 = 0.15$, $k = 0.5$, RR = 0.7, no event gap). In that setting the
Wald-sized group sequential design enrolls 472 participants at the final
analysis, compared with 464 for the score-sized design. Both starting-size
rules below still use the score test for interim and final analyses; only the
starting design is changed.

```{r ssr_sizing_sensitivity}
sizing_sens_source_path <- file.path(
  project_root, "inst", "extdata", "ssr_sizing_sensitivity.rds"
)
sizing_sens_file <- system.file(
  "extdata", "ssr_sizing_sensitivity.rds", package = "gsDesignNB"
)
if (sizing_sens_file == "" && file.exists(sizing_sens_source_path)) {
  sizing_sens_file <- sizing_sens_source_path
}

if (sizing_sens_file == "") {
  cat(
    "The supplemental SSR sizing-sensitivity cache is not available. ",
    "Run `Rscript data-raw/generate_ssr_sizing_sensitivity.R` to regenerate it.\n"
  )
} else {
  sizing_sens <- readRDS(sizing_sens_file)
  sizing_sens_dt <- as.data.table(sizing_sens$summary)
  sizing_sens_dt[, Metric := fifelse(
    rr_true == 1,
    "Type I error (RR = 1.0; non-binding futility)",
    paste0("Power (RR = ", rr_true, ")")
  )]
  sizing_sens_dt[, `Starting design` := tools::toTitleCase(starting_sizing)]
  sizing_sens_dt[, `Estimate` := rejection_rate]
  sizing_sens_dt[, `SSR applied (%)` := pct_adapted]

  sizing_sens_display <- sizing_sens_dt[
    strategy %in% c("No adaptation", "Blinded SSR", "Unblinded SSR"),
    .(
      Metric,
      `Starting design`,
      Strategy = strategy,
      `Fixed N` = fixed_n,
      `GS N` = gs_n,
      Estimate = round(Estimate, 4),
      MCSE = round(mcse, 4),
      `Mean N` = round(mean_n, 1),
      `SSR applied (%)` = round(`SSR applied (%)`, 1)
    )
  ]
  setorder(sizing_sens_display, Metric, `Starting design`, Strategy)

  sizing_sens_display |>
    gt(groupname_col = "Metric") |>
    tab_header(
      title = "Supplemental SSR Starting-Size Sensitivity",
      subtitle = paste(
        "Score final test; Wald-sized GS N =",
        max(sizing_sens_display$`GS N`),
        "vs score-sized GS N =",
        min(sizing_sens_display$`GS N`)
      )
    ) |>
    tab_footnote(
      paste(
        "This targeted sensitivity uses a lower-event stress setting and is",
        "intended to check the direction of the starting-size recommendation,",
        "not to replace the full SSR production grid."
      )
    )
}
```

In this stress setting, the score-test Type I estimates remain close to nominal
for both starting-size rules: 0.0252--0.0258 with the Wald-sized start and
0.0258--0.0266 with the score-sized start, with MCSE about 0.0022. At RR = 0.7,
the power estimates are also similar across starting-size rules and within
Monte Carlo uncertainty. The Wald-sized start increases the group sequential
sample size from 464 to 472, but it does not produce a clear additional power
advantage in this SSR setting. This supports the practical recommendation to
simulate the specific SSR rule rather than assuming that a fixed-design sizing
margin will automatically translate into adaptive-design power.

### Power by rate ratio and SSR strategy

```{r power_curves, fig.width=10, fig.height=6}
power_avg <- dt[rr_true < 1.0, .(
  power = mean(reject, na.rm = TRUE),
  mean_final_if = mean(if_final, na.rm = TRUE),
  mean_final_month = mean(final_time, na.rm = TRUE)
), by = .(rr_true, strategy)]

ggplot(power_avg, aes(x = rr_true, y = power,
                       color = strategy, linetype = strategy)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2.5) +
  geom_hline(yintercept = power_plan, linetype = "dashed", alpha = 0.4) +
  scale_y_continuous(
    limits = c(0, 1),
    breaks = seq(0, 1, 0.2),
    labels = scales::percent
  ) +
  scale_x_continuous(breaks = seq(0.5, 0.9, 0.1)) +
  labs(
    title = "Power by Rate Ratio and SSR Strategy",
    subtitle = paste("Averaged across nuisance scenarios |", design_note),
    x = "True Rate Ratio", y = "Power",
    color = "Strategy", linetype = "Strategy"
  ) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")
```

```{r no_adapt_power_loss, fig.width=10, fig.height=5}
power_rr_plan <- summary_dt[
  rr_true == rr_plan &
    strategy %in% c("No adaptation", "Blinded SSR", "Unblinded SSR")
]
power_rr_plan[, strategy := factor(strategy,
  levels = c("No adaptation", "Blinded SSR", "Unblinded SSR"))]
power_rr_plan[, k_label := paste0("k = ", k_true)]
power_rr_plan[, accrual_label := paste0(accrual_true, " pts/mo")]

ggplot(power_rr_plan,
       aes(x = lambda1_true, y = rejection_rate,
           color = strategy, shape = strategy)) +
  geom_line(linewidth = 0.9) +
  geom_point(size = 2.5) +
  geom_hline(yintercept = power_plan, linetype = "dashed", alpha = 0.4) +
  facet_grid(k_label ~ accrual_label) +
  scale_y_continuous(limits = c(0, 1), breaks = seq(0, 1, 0.2),
                     labels = scales::percent) +
  labs(
    title = sprintf("Power at Planned RR (%.1f) by Nuisance Scenario", rr_plan),
    subtitle = paste("Dashed line = target power", scales::percent(power_plan),
                     "|", design_note),
    x = expression(lambda[1]~(true)), y = "Power",
    color = "Strategy", shape = "Strategy"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")
```

### Power by nuisance scenario {.tabset}

```{r power_by_nuisance_tabs, results='asis', fig.width=13, fig.height=10, echo=FALSE}
rr_values <- sort(unique(dt$rr_true[dt$rr_true <= 1]))

for (rr_val in rr_values) {
  cat(sprintf("\n\n#### RR = %s\n\n", rr_val))

  stage_dt <- dt[rr_true == rr_val, .(
    `Cross IA1` = mean(reject_stage == "IA1", na.rm = TRUE),
    `Cross IA2` = mean(reject_stage == "IA2", na.rm = TRUE),
    `Cross Final` = mean(reject_stage == "Final", na.rm = TRUE),
    `No cross Final` = mean(reject_stage == "No reject" & !futility, na.rm = TRUE),
    `Futility IA2` = mean(futility_stage == "IA2", na.rm = TRUE),
    `Futility IA1` = mean(futility_stage == "IA1", na.rm = TRUE)
  ), by = .(lambda1_true, k_true, accrual_true, strategy)]

  stage_long <- data.table::melt(
    stage_dt,
    id.vars = c("lambda1_true", "k_true", "accrual_true", "strategy"),
    variable.name = "outcome",
    value.name = "prob"
  )
  stage_long[, outcome := factor(outcome, levels = c(
    "Cross IA1", "Cross IA2", "Cross Final",
    "No cross Final", "Futility IA2", "Futility IA1"
  ))]
  stage_long[, strategy := factor(strategy,
    levels = c("No adaptation", "Blinded SSR", "Unblinded SSR"))]
  stage_long[, lambda1_label := paste0("lambda1 = ", lambda1_true)]
  stage_long[, k_label := paste0("k = ", k_true)]
  stage_long[, accrual_label := paste0("accrual = ", accrual_true, "/mo")]

  if (nrow(stage_long) == 0) next

  p <- ggplot(stage_long, aes(x = strategy, y = prob, fill = outcome)) +
    geom_col(width = 0.75, color = "white", linewidth = 0.2,
             position = position_stack(reverse = TRUE)) +
    facet_grid(k_label + accrual_label ~ lambda1_label) +
    scale_fill_manual(
      values = c(
        "Cross IA1" = "#9ecae1", "Cross IA2" = "#4292c6", "Cross Final" = "#084594",
        "No cross Final" = "#fc9272", "Futility IA2" = "#de2d26", "Futility IA1" = "#a50f15"
      )
    ) +
    scale_y_continuous(breaks = seq(0, 1, 0.2), labels = scales::percent_format()) +
    labs(
      title = sprintf("Trial Outcomes by Analysis Stage (RR = %s)", rr_val),
      subtitle = design_note,
      x = "Strategy", y = "Probability", fill = "Outcome"
    ) +
    theme_minimal(base_size = 12) +
    theme(legend.position = "bottom",
          axis.text.x = element_text(angle = 20, hjust = 1))
  print(p)
  cat("\n\n")
}
```

### Adapted sample size distribution {.tabset}

```{r adapted_n_tabs, results='asis', fig.width=13, fig.height=10, echo=FALSE}
GeomHalfViolin <- ggplot2::ggproto(
  "GeomHalfViolin",
  ggplot2::GeomViolin,
  draw_group = function(data, ..., side = "l", draw_quantiles = NULL) {
    data <- transform(
      data,
      xminv = x - violinwidth * (x - xmin),
      xmaxv = x + violinwidth * (xmax - x)
    )
    newdata <- if (side == "l") {
      transform(data, x = xminv)[order(data$y), ]
    } else {
      transform(data, x = xmaxv)[order(data$y, decreasing = TRUE), ]
    }
    newdata <- rbind(newdata[1, ], newdata, newdata[nrow(newdata), ], newdata[1, ])
    ggplot2:::ggname(
      "geom_half_violin",
      ggplot2::GeomPolygon$draw_panel(newdata, ...)
    )
  }
)

geom_half_violin <- function(mapping = NULL, data = NULL, stat = "ydensity",
                             position = "identity", ..., side = c("l", "r"),
                             trim = TRUE, scale = "area", na.rm = FALSE,
                             show.legend = NA, inherit.aes = TRUE) {
  side <- match.arg(side)
  ggplot2::layer(
    data = data,
    mapping = mapping,
    stat = stat,
    geom = GeomHalfViolin,
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(side = side, trim = trim, scale = scale, na.rm = na.rm, ...)
  )
}

for (rr_val in rr_values) {
  cat(sprintf("\n\n#### RR = %s\n\n", rr_val))

  violin_dt <- dt[rr_true == rr_val & strategy %in% c("Blinded SSR", "Unblinded SSR")]
  violin_dt[, strategy := factor(strategy, levels = c("Unblinded SSR", "Blinded SSR"))]
  violin_dt[, lambda1_label := paste0("lambda1 = ", lambda1_true)]
  violin_dt[, k_label := paste0("k = ", k_true)]
  violin_dt[, accrual_label := factor(accrual_true)]

  p <- ggplot(violin_dt, aes(x = accrual_label, y = n_adapted)) +
    geom_half_violin(
      data = violin_dt[strategy == "Unblinded SSR"],
      aes(fill = strategy, group = interaction(accrual_label, strategy)),
      side = "l", alpha = 0.7, scale = "width", trim = FALSE, color = NA
    ) +
    geom_half_violin(
      data = violin_dt[strategy == "Blinded SSR"],
      aes(fill = strategy, group = interaction(accrual_label, strategy)),
      side = "r", alpha = 0.7, scale = "width", trim = FALSE, color = NA
    ) +
    geom_hline(yintercept = n_planned, linetype = "dashed", alpha = 0.5) +
    geom_hline(yintercept = n_max, linetype = "dotted", color = "red",
               alpha = 0.5) +
    scale_fill_manual(
      values = c("Unblinded SSR" = "#084594", "Blinded SSR" = "#9ecae1"),
      breaks = c("Unblinded SSR", "Blinded SSR")
    ) +
    facet_grid(k_label ~ lambda1_label) +
    scale_y_continuous(breaks = seq(0, n_max + 50, 50)) +
    labs(
      title = sprintf("Adapted Sample Size at RR = %s", rr_val),
      subtitle = sprintf("Split violins: Unblinded (left) vs Blinded (right) SSR | Dashed=planned N (%d), Red dotted=cap (%d) | %s",
                          n_planned, n_max, design_note),
      x = "Accrual Rate (pts/month)", y = "Adapted N", fill = "Strategy"
    ) +
    theme_minimal(base_size = 12) +
    theme(legend.position = "bottom")
  print(p)
  cat("\n\n")
}
```


### Expected sample size at study stop {.tabset}

```{r expected_n_tabs, results='asis', fig.width=10, fig.height=7, echo=FALSE}
for (rr_val in rr_values) {
  cat(sprintf("\n\n#### RR = %s\n\n", rr_val))

  en_dt <- dt[rr_true == rr_val]
  en_dt[, strategy := factor(strategy,
    levels = c("No adaptation", "Blinded SSR", "Unblinded SSR"))]
  en_dt[, lambda1_label := paste0("lambda1 = ", lambda1_true)]
  en_dt[, k_label := paste0("k = ", k_true)]

  en_summary <- en_dt[, .(
    mean_n = mean(n_adapted, na.rm = TRUE),
    sd_n = sd(n_adapted, na.rm = TRUE)
  ), by = .(lambda1_true, k_true, accrual_true, strategy,
            lambda1_label, k_label)]

  p <- ggplot(en_summary,
    aes(x = factor(accrual_true), y = mean_n,
        fill = strategy)) +
    geom_col(position = position_dodge(width = 0.7), width = 0.6, alpha = 0.85) +
    geom_hline(yintercept = n_planned, linetype = "dashed", alpha = 0.5) +
    facet_grid(k_label ~ lambda1_label) +
    labs(
      title = sprintf("Expected Sample Size at Study Stop (RR = %s)", rr_val),
      subtitle = paste("Dashed = planned N;",
        "interim stops count enrollment through cutoff + 2 months |",
        design_note),
      x = "Accrual rate (pts/mo)", y = "Mean sample size", fill = "Strategy"
    ) +
    theme_minimal(base_size = 12) +
    theme(legend.position = "bottom")
  print(p)
  cat("\n\n")
}
```

### Calendar and information at each analysis

The plots below use split (side-by-side) violin distributions for
$k = 0.5$ and $k = 1.0$, with vertical panels for IA1, IA2, and
Final analysis. Panels use free y-scales.

```{r analysis_time_violin, fig.width=12, fig.height=10}
time_long <- dt[strategy == "No adaptation" & rr_true == rr_plan, .(
  lambda1_true, k_true, accrual_true,
  IA1 = ia1_time, IA2 = ia2_time, Final = final_time
)]
time_long <- data.table::melt(
  time_long,
  id.vars = c("lambda1_true", "k_true", "accrual_true"),
  variable.name = "analysis",
  value.name = "calendar_time"
)
time_long[, analysis := factor(analysis, levels = c("IA1", "IA2", "Final"))]
time_long[, lambda1_label := paste0("lambda1 = ", lambda1_true)]
time_long[, k_label := paste0("k = ", k_true)]

planned_time_df <- data.frame(
  analysis = factor(c("IA1", "IA2", "Final"), levels = c("IA1", "IA2", "Final")),
  planned_time = c(analysis_times_plan[1], analysis_times_plan[2], analysis_times_plan[3])
)

ggplot(time_long,
       aes(x = factor(accrual_true), y = calendar_time, fill = factor(k_true))) +
  geom_violin(position = position_dodge(width = 0.85),
              alpha = 0.7, scale = "width", trim = FALSE) +
  geom_hline(
    data = planned_time_df,
    aes(yintercept = planned_time),
    linetype = "dashed", color = "darkgreen", inherit.aes = FALSE
  ) +
  facet_grid(analysis ~ lambda1_label, scales = "free_y") +
  scale_fill_manual(values = c("0.5" = "#6BAED6", "1" = "#2171B5")) +
  labs(
    title = "Calendar Time Distribution by Analysis (RR = 0.7, No adaptation)",
    subtitle = paste("Dashed green = planned analysis time |", design_note),
    x = "Accrual rate (pts/month)",
    y = "Calendar month",
    fill = "k"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")
```

```{r analysis_info_violin, fig.width=12, fig.height=10}
if_long <- dt[strategy == "No adaptation" & rr_true == rr_plan, .(
  lambda1_true, k_true, accrual_true,
  IA1 = 100 * if_ia1, IA2 = 100 * if_ia2, Final = 100 * if_final
)]
if_long <- data.table::melt(
  if_long,
  id.vars = c("lambda1_true", "k_true", "accrual_true"),
  variable.name = "analysis",
  value.name = "info_fraction"
)
if_long[, analysis := factor(analysis, levels = c("IA1", "IA2", "Final"))]
if_long[, lambda1_label := paste0("lambda1 = ", lambda1_true)]

planned_if_df <- data.frame(
  analysis = factor(c("IA1", "IA2", "Final"), levels = c("IA1", "IA2", "Final")),
  planned_if = 100 * c(planned_timing[1], planned_timing[2], 1)
)

ggplot(if_long,
       aes(x = factor(accrual_true), y = info_fraction, fill = factor(k_true))) +
  geom_violin(position = position_dodge(width = 0.85),
              alpha = 0.7, scale = "width", trim = FALSE) +
  geom_hline(
    data = planned_if_df,
    aes(yintercept = planned_if),
    linetype = "dashed", color = "darkgreen", inherit.aes = FALSE
  ) +
  facet_grid(analysis ~ lambda1_label, scales = "free_y") +
  scale_fill_manual(values = c("0.5" = "#6BAED6", "1" = "#2171B5")) +
  labs(
    title = "Information Fraction Distribution by Analysis (RR = 0.7, No adaptation)",
    subtitle = paste("Dashed green = planned information fraction |", design_note),
    x = "Accrual rate (pts/month)",
    y = "Information fraction (%)",
    fill = "k"
  ) +
  theme_minimal(base_size = 12) +
  theme(legend.position = "bottom")
```

## Discussion

### Key findings

1. **Type I error.** Under the null (RR $\geq$ 1.0), nominal one-sided control
   is 2.5%. The dedicated non-binding Type I tables use 20 000 replicates per
   dispersion and test statistic. With the Wald statistic, empirical Type I
   error ranges from about 2.8% to 3.1%, indicating mild finite-sample
   inflation. With the score statistic under the same $\alpha = 0.025$ group
   sequential design, empirical Type I error ranges from about 2.1% to 2.3%.
   The score-test result is conservative rather than inflated, so the
   simulation-based recommendation is to use the score test directly and then
   check power, rather than lowering alpha for a Wald analysis.
   This distinction is more important in SSR than in a fixed design because
   adaptation can add information; the score test controls how that additional
   information is converted into rejection probability.

2. **Largest no-adaptation power deficit.** At planned RR = 0.7, the score-test
   production grid shows no-adaptation power ranging from about 70% to 90%.
   The largest deficit from the 90% target is concentrated in scenarios with
   lower event rates and larger dispersion (see the "Power Shortfall Without
   Adaptation" table).

3. **IA2-only adaptation recovers power where information is low.** In lower
   event-rate / higher-dispersion scenarios, IA2 adaptation raises final
   information and materially improves power versus no adaptation. Across the
   RR = 0.7 scenarios, blinded and unblinded SSR both average about 89% power
   with the score test, versus about 81% for no adaptation. The score test is
   intentionally conservative, so final designs should still verify whether a
   small information margin is needed to reach the desired power.
   Large adapted sample sizes in high-dispersion scenarios reflect the
   nuisance-parameter update and adaptation cap, not a failure of blinding.
   Compared with Wald testing, the score-test analysis makes those adaptations
   look better calibrated because the null rejection rate remains conservative
   rather than inflated.

4. **Starting-size margins need SSR-specific confirmation.** The supplemental
   low-event sensitivity compares Wald- and score-sized starting designs under
   the same score final test. Although the Wald-sized group sequential design
   starts with eight additional participants, its power is not materially higher
   than the score-sized design in this adaptive setting, and both starts have
   score-test Type I estimates close to nominal. Thus the fixed-design
   recommendation to consider Wald sizing as a power margin should be checked
   within the planned SSR rule.

### Information-based interim timing

By triggering interims when blinded information reaches the target fraction
(rather than at fixed calendar times), the information fractions are
stabilized across scenarios. This prevents the anomaly where high event rates
cause excessive information at a fixed calendar time, leading to overspending
or premature decisions.

In this update, IA2 remains information-based while adaptation uses a
lead-time cutoff of at least
`r min_months_to_close_for_adapt` months before predicted enrollment close.

### Futility at low information

Futility assessment is deferred when the observed information fraction is below
`r 100 * min_if_futility`%. The spending function approach uses
$\text{usTime} = \text{lsTime} = \min(t_{\text{planned}}, t_{\text{actual}})$
to cap spending.

IA1 includes efficacy/futility monitoring but does not permit SSR.
SSR is attempted only at IA2. For operational feasibility, SSR uses an
adaptation cutoff at
$\min(\text{IA2 time},\; \text{predicted enrollment close} -
`r min_months_to_close_for_adapt` \text{ months})$, with enrollment fraction at or
below `r 100 * max_enrollment_frac_for_ia2`%.

When a trial stops early for futility, the reported final sample size includes
subjects enrolled by the stop analysis date plus
`r analysis_lag_months` months to reflect enrollment that continues during
analysis/decision implementation.

Futility and efficacy crossing percentages are reported separately for IA1 and
IA2 in the main simulation table.

### Blinded information fallback

At each candidate cut time the simulation first attempts to estimate
blinded statistical information via `calculate_blinded_info()`.
If the blinded estimate is unavailable or non-positive (e.g., too few events
for the NB model to converge), the simulation falls back to unblinded
information from `mutze_test()` ($\mathcal{I} = 1/\text{SE}^2$).
If both methods fail, a hard-coded constant of 100 is used as a last-resort
placeholder so that the search does not stall.
The simulation results table reports the number of trials that required
this fallback at each interim analysis.

### Computational considerations

Production-scale runs use 3 600 replicates per scenario for RR $< 1$
(power), 20 000 for each scenario with RR $= 1.0$ (Type I), and 1 000 for
RR $> 1$ (interior null, lower priority for precision), plus 20 000 replicates
per $k$ and per test statistic in the separate RR = 1 non-binding futility check; these are
computationally intensive.
Parallel execution via the `future` / `future.apply` framework is
essential.
The simulation runtime table above reports the wall-clock time and
number of workers used for the current build.
For reference, the local 11-worker production cache was generated with
checkpointed chunks and took an overnight run; resumed runs report only the
wall-clock time since restart, not the cumulative time spent on previously
cached chunks.

## References