---
title: "Using the gsDesignNB AI skill"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Using the gsDesignNB AI skill}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r, message=FALSE, warning=FALSE}
library(gsDesignNB)
```

## Purpose

`gsDesignNB` includes two AI-facing documentation files:

- [`SKILL.md`](../SKILL.md), a concise workflow guide for assistants or humans
  using the package for negative binomial recurrent-event trial design.
- [`llms.txt`](../llms.txt), a generated pkgdown index that helps language
  models find the package documentation, articles, and reference pages.

This vignette demonstrates how the skill is intended to be used. The skill does
not replace the package documentation, the manuscript, or statistical review.
Instead, it keeps recurring workflows on track: use package-native functions,
align time units, carry event-gap assumptions consistently, and match
sample-size calculations to the planned final test statistic.

## Example task

Suppose the task is:

> Plan a recurrent-event superiority trial with monthly rates, a 28-day
> inter-event gap, staggered enrollment, dropout, and a final score test.

The skill points to the following package-native workflow:

1. Compute fixed-design sample size with `sample_size_nbinom()`.
2. Use the score test when Type I error calibration is the priority, especially
   for adaptive or group sequential designs.
3. Carry `event_gap` through planning, simulation, and data cutting.
4. Use `gsNBCalendar()` for calendar-time group sequential monitoring.
5. Use `mutze_test(test_type = "score")` for the planned final test.

## Time-scale setup

The most common preventable error in this package is mixing time units. Here all
rates and durations use months, and the 28-day event gap is converted to months
using an average month length of 30.4375 days (365.25 / 12).

```{r}
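# 28-day inter-event gap expressed in months
event_gap_months <- 28 / 30.4375

design_args <- list(
  lambda1 = 0.08,            # control event rate per month
  lambda2 = 0.056,           # experimental event rate per month (rate ratio 0.7)
  dispersion = 0.6,
  power = 0.80,
  alpha = 0.025,
  sided = 1,
  accrual_rate = 10,
  accrual_duration = 18,     # months
  trial_duration = 30,       # months
  dropout_rate = 0.01,       # dropout rate per month
  max_followup = 12,         # months of follow-up per patient
  event_gap = event_gap_months
)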
```
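Because every rate and duration must share one unit, it can help to keep conversions explicit. The chunk below is a minimal base R sketch; the helpers `days_to_months()` and `annual_to_monthly_rate()` are hypothetical and not part of gsDesignNB, and the dropout calculation assumes dropout acts as a constant (exponential) hazard.

```{r}
# Hypothetical helpers (not gsDesignNB functions) for keeping units in months.
days_per_month <- 365.25 / 12  # average month length: 30.4375 days

days_to_months <- function(days) days / days_per_month

# Event rates scale with the time unit: a rate per year / 12 is a rate per month.
annual_to_monthly_rate <- function(rate_per_year) rate_per_year / 12

gap_months <- days_to_months(28)
monthly_rate <- annual_to_monthly_rate(0.96)

# Assuming an exponential dropout hazard of 0.01 per month, the probability of
# dropping out within 12 months of follow-up is 1 - exp(-0.01 * 12).
dropout_12m <- 1 - exp(-0.01 * 12)

round(c(gap_months = gap_months, monthly_rate = monthly_rate,
        dropout_12m = dropout_12m), 4)
```

A rate of 0.96 events per year corresponds to the 0.08 events per month used for `lambda1`; entering the annual value by mistake would inflate the assumed rate twelvefold.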

## Wald versus score sizing

The skill's current recommendation is to compare Wald and score sizing, then
choose the final sample size using simulation evidence. The two calculations
use different variance references:

- Wald sizing uses the alternative variance for both the Type I and power
  components.
- Score sizing uses a null variance for the Type I component and an alternative
  variance for the power component.

```{r}
wald_design <- do.call(
  sample_size_nbinom,
  c(design_args, list(test_type = "wald"))
)

score_design <- do.call(
  sample_size_nbinom,
  c(design_args, list(test_type = "score"))
)

design_comparison <- data.frame(
  test_type = c(wald_design$test_type, score_design$test_type),
  n_total = c(wald_design$n_total, score_design$n_total),
  n1 = c(wald_design$n1, score_design$n1),
  n2 = c(wald_design$n2, score_design$n2),
  total_events = round(c(wald_design$total_events, score_design$total_events), 1),
  variance_alt = round(c(wald_design$variance, score_design$variance), 4),
  variance_null = round(c(wald_design$variance_null, score_design$variance_null), 4)
)

design_comparison
```

In this scenario, score sizing is slightly smaller than Wald sizing. That is
not a general rule, but it illustrates why the sizing rule and the analysis
test should not be conflated. In the package simulation grid, the traditional
Wald/Zhu--Lakkis sample size paired with the score test preserved Type I error
and provided a small practical power margin. The skill therefore reminds the
analyst to compare sizing rules, choose the final test deliberately, and verify
operating characteristics by simulation for the actual design setting.
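To see where the two variance references enter, here is a simplified textbook-style sketch in base R. It is not the gsDesignNB implementation: it assumes equal allocation, the same fixed exposure `t_exp` for every subject, no dropout or event gap, and, for the score version, a null variance evaluated at the average of the two rates, which is only one of several possible null references. The helper names are hypothetical.

```{r}
# Per-arm variance factor for log(l2 / l1) with fixed exposure t_exp and a
# common negative binomial dispersion k (equal allocation assumed).
var_factor <- function(l1, l2, t_exp, k) {
  1 / (t_exp * l1) + 1 / (t_exp * l2) + 2 * k
}

# Hypothetical per-arm sample size:
#   n = (z_alpha * sqrt(V_typeI) + z_beta * sqrt(V_alt))^2 / log(l2 / l1)^2
# Wald: V_typeI = V_alt. Score: V_typeI is a null variance (assumed here to use
# the average rate in both arms).
n_per_arm <- function(l1, l2, t_exp, k, alpha = 0.025, power = 0.8,
                      type = c("wald", "score")) {
  type <- match.arg(type)
  v_alt <- var_factor(l1, l2, t_exp, k)
  l0 <- (l1 + l2) / 2
  v_type1 <- if (type == "wald") v_alt else var_factor(l0, l0, t_exp, k)
  z_a <- qnorm(1 - alpha)
  z_b <- qnorm(power)
  ceiling((z_a * sqrt(v_type1) + z_b * sqrt(v_alt))^2 / log(l2 / l1)^2)
}

sizes <- sapply(c("wald", "score"), function(tp) {
  n_per_arm(l1 = 0.08, l2 = 0.056, t_exp = 12, k = 0.6, type = tp)
})
sizes
```

Because this sketch ignores staggered accrual, dropout, and the event gap, its numbers should not be expected to match `sample_size_nbinom()`; it only shows that the Wald and score versions differ in the variance used for the Type I error term.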

## Calendar-time group sequential design

The fixed-design object returned by `sample_size_nbinom()` can be passed to
`gsNBCalendar()` to construct a calendar-time group sequential design. Here the
Wald-sized fixed design serves as a practical baseline sample size, while the
planned analysis and simulation use the score test for Type I error control.

```{r}
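analysis_months <- c(18, 24, 30)  # calendar months of the three analyses

# test.type = 4 follows the gsDesign convention: asymmetric bounds with a
# non-binding futility bound.
gs_design <- gsNBCalendar(
  wald_design,
  k = length(analysis_months),
  test.type = 4,
  analysis_times = analysis_months
)

data.frame(
  analysis = seq_along(gs_design$n.I),
  calendar_month = analysis_months,
  planned_information = round(gs_design$n.I, 2),
  information_fraction = round(gs_design$timing, 3)
)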
```
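The information fractions determine how much Type I error can be spent at each look. As an illustration only, the sketch below evaluates a Hwang-Shih-DeCani spending function with `gamma = -4`, which is the default upper-bound spending function in `gsDesign::gsDesign()`; whether `gsNBCalendar()` uses the same default should be checked on its reference page. The information fractions here are hypothetical placeholders, not values from the design above.

```{r}
# Hwang-Shih-DeCani cumulative spending, alpha * (1 - exp(-gamma * t)) /
# (1 - exp(-gamma)), valid for gamma != 0 (same form as gsDesign::sfHSD).
hsd_spend <- function(alpha, t, gamma = -4) {
  alpha * (1 - exp(-gamma * t)) / (1 - exp(-gamma))
}

timing_example <- c(0.5, 0.75, 1)  # hypothetical information fractions
cum_alpha <- hsd_spend(alpha = 0.025, t = timing_example)
incr_alpha <- diff(c(0, cum_alpha))  # alpha newly spent at each analysis

data.frame(
  timing = timing_example,
  cumulative_alpha = signif(cum_alpha, 3),
  incremental_alpha = signif(incr_alpha, 3)
)
```

The increments are spending amounts, not nominal p-value thresholds; the group sequential machinery converts them into efficacy bounds using the correlation between analyses.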

## Simulate, cut, and test a small data set

For a quick executable demonstration, simulate a small trial, cut it at
12 months, and run the score test. This is intentionally tiny; production
operating-characteristic work should use `sim_gs_nbinom()` or
`sim_ssr_nbinom()` with many replicates and saved seeds.

```{r}
set.seed(2026)

enroll_rate <- data.frame(rate = 60 / 6, duration = 6)  # about 60 patients over 6 months, matching n = 60
fail_rate <- data.frame(
  treatment = c("Control", "Experimental"),
  rate = c(design_args$lambda1, design_args$lambda2),
  dispersion = c(design_args$dispersion, design_args$dispersion)
)
dropout_rate <- data.frame(
  treatment = c("Control", "Experimental"),
  rate = c(design_args$dropout_rate, design_args$dropout_rate),
  duration = c(100, 100)
)

sim_data <- nb_sim(
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  dropout_rate = dropout_rate,
  max_followup = design_args$max_followup,
  n = 60,
  event_gap = design_args$event_gap
)

cut_data <- cut_data_by_date(
  sim_data,
  cut_date = 12,
  event_gap = design_args$event_gap
)

head(cut_data)
```
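A minimal sketch of why the gap matters at data cutting: assuming time inside a post-event gap is not counted as at-risk exposure, the exposure available at a cut date depends on both the event times and the gap length. The helper `at_risk_time()` is hypothetical and is not how `cut_data_by_date()` is implemented; it handles a single subject, with follow-up measured in months from that subject's enrollment.

```{r}
# Hypothetical helper: at-risk exposure (months) for one subject when every
# event starts a gap of length `gap` during which no new events are counted.
at_risk_time <- function(event_times, followup, gap) {
  event_times <- sort(event_times[event_times <= followup])
  # Each gap is truncated at the end of observed follow-up.
  gap_time <- sum(pmin(event_times + gap, followup) - event_times)
  followup - gap_time
}

gap <- 28 / 30.4375
c(
  no_events  = at_risk_time(numeric(0), followup = 12, gap = gap),
  two_events = at_risk_time(c(2, 5.5), followup = 12, gap = gap),
  late_event = at_risk_time(11.5, followup = 12, gap = gap)
)
```

Cutting at a different calendar date changes each subject's observed follow-up and therefore how much of a late gap is truncated, which is one reason to pass the same `event_gap` value to both `nb_sim()` and `cut_data_by_date()`.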

```{r}
score_test <- mutze_test(cut_data, test_type = "score", sided = 1)
score_test
```

## Production workflow reminder

The small example above is useful for checking assumptions and object shapes.
For design claims, the skill directs the user to the simulation routines and
summary helpers:

```{r, eval=FALSE}
sim_results <- sim_gs_nbinom(
  n_sims = 10000,
  enroll_rate = enroll_rate,
  fail_rate = fail_rate,
  dropout_rate = dropout_rate,
  max_followup = design_args$max_followup,
  event_gap = design_args$event_gap,
  design = gs_design,
  analysis_times = c(18, 24, 30),
  test_type = "score",
  seed = TRUE
)

bounded <- check_gs_bound(sim_results, gs_design)
summarize_gs_sim(bounded)
```
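One way to judge how many replicates are enough is the Monte Carlo standard error of a simulated rejection rate. Under a binomial approximation, a rate `p` estimated from `n_sims` independent simulated trials has standard error `sqrt(p * (1 - p) / n_sims)`; the helper name below is hypothetical.

```{r}
# Binomial-approximation Monte Carlo standard error of a simulated
# rejection rate (Type I error or power) based on n_sims replicates.
mc_se <- function(p, n_sims) sqrt(p * (1 - p) / n_sims)

n_grid <- c(1000, 10000, 50000)
data.frame(
  n_sims = n_grid,
  se_type1 = signif(mc_se(0.025, n_grid), 2),  # at a 2.5% one-sided alpha
  se_power = signif(mc_se(0.80, n_grid), 2)    # at 80% power
)
```

With 10,000 replicates the standard error at a true Type I error of 0.025 is about 0.0016, so an approximate 95% interval spans roughly plus or minus 0.003; smaller departures from the nominal level cannot be resolved reliably.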

For sample size re-estimation (SSR) studies, use `sim_ssr_nbinom()` and
`summarize_ssr_sim()`. The score test is especially important as the final SSR
analysis: when nuisance parameters such as the dispersion are misspecified,
adaptation can increase the sample size and therefore the information, and in
that setting the score test helps preserve Type I error whereas a Wald analysis
may be mildly anti-conservative. The adapted sample size itself should still be
checked by simulation rather than assumed from the formula alone.

## What this skill is and is not

The skill is a workflow aid. It is useful for:

- choosing package-native functions rather than reimplementing logic;
- preserving time-scale, event-gap, and test-statistic consistency;
- finding the right vignette or reference page quickly;
- reminding users when simulations are needed to support recommendations.

It is not a substitute for protocol-level statistical judgment. Clinical trial
designs still require review of assumptions, estimands, missing-data handling,
operating characteristics, and regulatory context.