--- title: "Two-sample normal sample size" output: rmarkdown::html_vignette bibliography: gsDesign.bib vignette: > %\VignetteIndexEntry{Two-sample normal sample size} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", dev = "ragg_png", dpi = 96, fig.retina = 1, fig.width = 7.2916667, fig.asp = 0.618, fig.align = "center", out.width = "80%" ) ``` ## Introduction ```{r, echo=FALSE, message=FALSE, warning=FALSE} library(gsDesign) library(tidyr) library(knitr) ``` Limited support is provided for 2-sample design with a normally distributed random variable as the outcome. Users are encouraged to look at guidance such as in @JTBook. We provide a tool where for a large sample case where a reasonable estimate of standard deviation is available, a reasonable sample size can be computed based straightforward distribution theory outlined below. ## The problem considered The overall sample size notation used for **gsDesign** is to consider a standardized effect size parameter which is referred to as $\theta$ in @JTBook. We begin with the 2-sample normal problem where we assume a possibly different standard deviation in each treatment group. For $j = 1, 2$, we let $X_{j, i}$, $i = 1, 2, \ldots n_j$ represent independent and identically distributed observations following a normal distribution with mean $\mu_j$ and standard deviation $\sigma_j$. The natural parameter for comparing the two distributions is $$\delta = \mu_2 - \mu_1$$ and we wish to test if $\delta > 0$ in a one-sided testing scenario to test for superiority of treatment 2 over treatment 1. We could also consider testing, say, $\delta > \delta_0$ for a non-inferiority scenario with $\delta_0<0$ or **super superiority** if $\delta_0>0$. While normally a t-test would be used for this, for large sample sizes this would be nearly equivalent to a Z-test defined by: $$Z=\frac{\bar X_2 - \bar X_1-\delta_0}{\sqrt{\sigma^2_2/n_2 + \sigma_1^2/n_1}}\approx \frac{\bar X_2 - \bar X_1}{\sqrt{s^2_2/n_2 + s_1^2/n_1}}=t$$ where $\bar X_j$ is the sample mean and $s_j^2$ is the sample variance for group $j=1,2$. The far right hand side of this is Welch's t-test. For our examples we use this $t$-test and show that the sample size computation based on the $Z$-test above works well for the chosen problems. ## Sample size Thus, $n_2=rn/(1+r)$, $n_1=n/(1+r)$ and when $r=1$ we have $n_1=n_2=n/2$. Now that we have completed needed notation, those not interested in the theory behind the sample size and power calculation used may skip the rest of this section. We let $$\sigma^2=(1+r)(\sigma_1^2+\sigma_2^2/r)$$ and define $$ \theta= (\delta -\delta_0)/\sigma.$$ Under the given assumptions, $$Z \sim \text{Normal}\left(\sqrt n\theta,1\right).$$ Under the null hypothesis that $\delta=\delta_0$, we have $Z\sim \text{Normal}(0,1)$. Thus, regardless of $n$ we have $$P_0[Z\ge \Phi^{-1}(1-\alpha)]=\alpha.$$ Under the alternate hypothesis that $\delta=\delta_1$ and we denote a corresponding $\theta_1$. We define the type II error $\beta$ and power $1-\beta$ by $$ \begin{align} 1-\beta =& P_1[Z\ge \Phi^{-1}(1-\alpha)]\\ =& P[Z-\sqrt n\theta_1\ge \Phi^{-1}(1-\alpha)-\sqrt n\theta_1]\\ =&\Phi(\Phi^{-1}(1-\alpha)-\sqrt n\theta_1)). \end{align}$$ If the power $1-\beta$ is fixed, we can invert this formula to compute sample size with: $$n= \left(\frac{\Phi^{-1}(1-\beta)+\Phi^{-1}(1-\alpha)}{\theta_1}\right)^2.$$ For 2-sided testing, we simply substitute $\alpha/2$ for $\alpha$ in the above two formulas. ## Examples We consider two examples to check the above formulas vs. `nNormal()`. We then confirm that the approximation is working well by simulating and confirming that the power and Type I error approximations are useful. Finally, we provide a simple group sequential design example. ### Sample size We consider an example with $\sigma_2=1.25$, $\sigma_1=1.6$, $\delta=0.8$ and $\delta_0=0$. We let the sample size ratio be 2 experimental group observations per control observation. We compute sample size with `nNormal()` assuming one-sided Type I error $\alpha=0.025$ and 90% power ($1-\beta=0.9$). Checking using the sample size formula above, we have: ```{r} r <- 2 sigma <- sqrt((1 + r) * (1.6^2 + 1.25^2 / r)) theta <- 0.8 / sigma ((qnorm(.9) + qnorm(.975)) / theta)^2 ``` ### Power Now, assume we let the sample size be 200 and compute power under the same scenario. ```{r} nNormal(delta1 = 0.8, sd = 1.6, sd2 = 1.25, alpha = 0.025, n = 200, ratio = 2) ``` From the power formula above, we duplicate this with: ```{r} pwr <- pnorm(qnorm(.975) - sqrt(200) * theta, lower.tail = FALSE) pwr ``` If we want to plot power for a variety of sample sizes, we can input `n` as a vector: ```{r} n <- 100:200 pwrn <- nNormal(delta1 = 0.8, sd = 1.6, sd2 = 1.25, alpha = 0.025, n = n, ratio = 2) plot(n, pwrn, type = "l") ``` Alternatively, you could fix sample size at 200 and plot power under different treatment effect assumptions: ```{r} delta1 <- seq(.5, 1, .025) pwrdelta1 <- nNormal(delta1 = delta1, sd = 1.6, sd2 = 1.25, alpha = 0.025, n = 200, ratio = 2) plot(delta1, pwrdelta1, type = "l") ``` ### Verification with simulation Rather than simulate individual observations, we will take advantage of the fact that for $j=1,2$ $$\bar X_j\sim \text{Normal}(\mu_j,\sigma_j^2/n_j)$$ and $$(n_j-1)s_j^2/\sigma_j^2=\sum_{i=1}^{n_j} (X_{ij}-\bar X_j)/\sigma^2 \sim \chi ^2_{n_j-1}$$ are independent. Thus, we can simulate trial power with $n=200$ 1 million times with a t-statistic with unequal variances quickly as follows under the alternate hypothesis: ```{r} nsim <- 1000000 delta <- 0.8 sd1 <- 1.6 sd2 <- 1.25 n1 <- 67 n2 <- 133 deltahat <- rnorm(n = nsim, mean = delta, sd = sd1 / sqrt(n1)) - rnorm(n = nsim, mean = 0, sd = sd2 / sqrt(n2)) s <- sqrt( sd1^2 * rchisq(n = nsim, df = n1 - 1) / (n1 - 1) / n1 + sd2^2 * rchisq(n = nsim, df = n2 - 1) / (n2 - 1) / n2 ) z <- deltahat / s mean(z >= qnorm(.975)) ``` The standard error for this simulation power calculation is approximately ```{r} sqrt(pwr * (1 - pwr) / nsim) ``` suggesting we should be within less than about 0.001 if the actual power, which suggests the normal power approximation is reasonable for this scenario. ### Group sequential design Now we derive a group sequential design under the above scenario. We will largely use default parameters and show two methods. For the first, we plug in the fixed sample size above as follows: ```{r} d <- gsDesign( k = 2, n.fix = nNormal(delta1 = 0.8, sd = 1.6, sd2 = 1.25, alpha = 0.025, beta = .1, ratio = 2), delta1 = 0.8 ) d %>% gsBoundSummary(deltaname = "Mean difference") %>% kable(row.names = FALSE) ``` A textual summary of the design is given by: ```{r,results='asis'} cat(summary(d)) ``` We can get the same answer by plugging in the standardized effect size we computed above: ```{r} gsDesign( k = 2, delta = theta, delta1 = 0.8 ) %>% gsBoundSummary(deltaname = "Mean difference") %>% kable(row.names = FALSE) ``` We leave it to the reader to verify the properties of the above design using simulation as in the fixed design example. ## References