You can use this function that has been adapted from the conf.int
function available in the animation package (inspirational example here)
Description
This function gives a demonstration of the concept of confidence intervals in mathematical statistics.
Usage
ci.animate(level = 0.95, ns = 50, nci=100, cl = c("red", "gray"), show.plot=TRUE, nmu = 0, nsd = 1, ...)
Arguments
level
the confidence level \((1 - \alpha)\); default value is 0.95ns
the sample size for drawing samples from N(0, 1)nci
the number of confidence intervals/samples to generatecl
two different colors to annotate whether the confidence intervals cover the true mean (cl[1]
= no; cl[2]
: yes)show.plot
whether or not to plot the confidence intervals (default is yes)nmu
the mean of the normal population from which we are sampling (default is standard normal)nsd
the standard deviation of the normal population from which we are sampling (default is standard normal)...
other arguments passed to plot.defaultDetails
Keep on drawing samples from the Normal distribution N(0, 1), computing the intervals based on a given confidence level and plotting them as segments in a graph. In the end, we may check the coverage rate against the given confidence level.
Intervals that cover the true parameter are denoted in color cl[2], otherwise in color cl[1]. Each time we draw a sample, we can compute the corresponding confidence interval. As the process of drawing samples goes on, there will be a legend indicating the numbers of the two kinds of intervals respectively and the coverage rate is also denoted in the top-left of the plot.
ci.animate <- function(level = 0.95, ns = 50, nci=100, cl = c("red", "gray"), show.plot=TRUE, nmu = 0, nsd = 1, ...){
d = replicate(nci, rnorm(ns, mean=nmu, sd = nsd))
m = colMeans(d)
z = qnorm(1 - (1 - level)/2) # z*
y0 = m - z*nsd/sqrt(ns)
y1 = m + z*nsd/sqrt(ns)
rg = range(c(y0, y1))
cvr = y0 < nmu & y1 > nmu
xax = pretty(1:nci)
if (show.plot){
for (i in 1:nci) {
plot(1:nci, ylim = rg, type = "n", xlab = "Samples", ylab = expression("CI: [" ~
bar(x) - z[alpha/2] * sigma/sqrt(n) ~ ", " ~ bar(x) +
z[alpha/2] * sigma/sqrt(n) ~ "]"), xaxt = "n" , ...)
abline(h = nmu, lty = 2)
axis(1, xax[xax <= i])
arrows(1:i, y0[1:i], 1:i, y1[1:i], length = par("din")[1]/nci *
0.5, angle = 90, code = 3, col = cl[cvr[1:i] + 1])
points(1:i, m[1:i], col = cl[cvr[1:i] + 1])
legend("topright", legend = format(c(i - sum(cvr[1:i]),
sum(cvr[1:i])), width = nchar(nci)), fill = cl, bty = "n",
ncol = 2)
legend("topleft", legend = paste("coverage rate:", format(round(mean(cvr[1:i]),
3), nsmall = 3)), bty = "n")
}
}
coverage.rate = sum(cvr)/nci
out <- list(level = level, ns = ns, nci=nci, y0=y0, y1=y1, cvr=cvr, coverage.rate = coverage.rate)
}
library(gifski) # used to create animation below
set.seed(230)
ciout <- ci.animate( ns= 30, nci=100)
The above demo draws 100 samples of size 30 from a normal population with mean \(\mu\) = 0 and \(\sigma\) = 1; that is \(X\sim N(0,1)\) (the standard normal distribution). For each sample it constructs a 0.95% confidence interval using the equations provided in lecture: \[\begin{align}
\overline x &\pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\\
(\overline x - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} &,
\overline x + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} )
\end{align}\] where \[\begin{align}
\overline x &= \frac{\sum_{1}^{n} x_i}{n} & \text{ and } && z_{\alpha/2} &= z_{0.025} \\
&= \frac{\sum_{1}^{30} x_i}{30} &&& & = 1.959964
\end{align}\] As you can see in the visualization and the table below, with a seed of 230
we actually got really lucky and 99% of the confidence intervals we constructed contained the true value of \(\mu\). That is to say, the coverage rate is 0.99 meaning 99 of the 100 confidence intervals contained 0 within their lower and upper bounds. Running this code again will produce different samples, and therefore difference confidence intervals and potential different coverage rates. While a random run maybe produce a coverage rate a little higher/lower than the set confidence level of 0.95, on average we would expect that 99% of the confidence intervals we draw will in fact contain \(\mu\).
library(knitr)
library(kableExtra)
CI = cbind(round(ciout$y0,3), round(ciout$y1,3))
rownames(CI) = 1:ciout$nci
CI <- cbind(CI, ifelse(ciout$cvr, "yes", "no"))
colnames(CI) = c("lower bound", "upper bound", "Includes mu?")
CI %>%
kable(format = "html", digits = 3 ) %>%
kable_styling(bootstrap_options = "bordered",
full_width = FALSE) %>%
add_header_above(c("Confidence Interval" = 2, "")) %>%
row_spec(which(ciout$cvr==FALSE), italic = TRUE, color = "black", background = "#ffcccb")
lower bound | upper bound | Includes mu? |
---|---|---|
-0.559 | 0.156 | yes |
-0.287 | 0.428 | yes |
-0.336 | 0.379 | yes |
-0.474 | 0.242 | yes |
-0.155 | 0.561 | yes |
-0.313 | 0.402 | yes |
-0.346 | 0.369 | yes |
-0.386 | 0.33 | yes |
-0.588 | 0.128 | yes |
-0.281 | 0.435 | yes |
-0.262 | 0.454 | yes |
-0.23 | 0.485 | yes |
-0.347 | 0.368 | yes |
-0.567 | 0.149 | yes |
-0.582 | 0.134 | yes |
-0.312 | 0.403 | yes |
0.009 | 0.725 | no |
-0.202 | 0.514 | yes |
-0.307 | 0.408 | yes |
-0.193 | 0.523 | yes |
-0.177 | 0.539 | yes |
-0.363 | 0.353 | yes |
-0.55 | 0.166 | yes |
-0.364 | 0.352 | yes |
-0.202 | 0.514 | yes |
-0.225 | 0.491 | yes |
-0.444 | 0.272 | yes |
-0.305 | 0.411 | yes |
-0.146 | 0.57 | yes |
-0.212 | 0.504 | yes |
-0.365 | 0.351 | yes |
-0.579 | 0.137 | yes |
-0.487 | 0.229 | yes |
-0.286 | 0.43 | yes |
-0.284 | 0.432 | yes |
-0.33 | 0.385 | yes |
-0.363 | 0.353 | yes |
-0.674 | 0.042 | yes |
-0.288 | 0.428 | yes |
-0.559 | 0.157 | yes |
-0.345 | 0.371 | yes |
-0.478 | 0.237 | yes |
-0.123 | 0.593 | yes |
-0.4 | 0.316 | yes |
-0.306 | 0.41 | yes |
-0.28 | 0.436 | yes |
-0.244 | 0.472 | yes |
-0.346 | 0.37 | yes |
-0.4 | 0.315 | yes |
-0.167 | 0.548 | yes |
-0.689 | 0.027 | yes |
-0.25 | 0.466 | yes |
-0.637 | 0.079 | yes |
-0.474 | 0.242 | yes |
-0.668 | 0.048 | yes |
-0.431 | 0.284 | yes |
-0.696 | 0.02 | yes |
-0.211 | 0.505 | yes |
-0.096 | 0.62 | yes |
-0.452 | 0.264 | yes |
-0.519 | 0.197 | yes |
-0.462 | 0.254 | yes |
-0.293 | 0.422 | yes |
-0.493 | 0.222 | yes |
-0.318 | 0.398 | yes |
-0.215 | 0.501 | yes |
-0.568 | 0.148 | yes |
-0.305 | 0.411 | yes |
-0.395 | 0.321 | yes |
-0.191 | 0.524 | yes |
-0.415 | 0.3 | yes |
-0.438 | 0.277 | yes |
-0.483 | 0.233 | yes |
-0.112 | 0.603 | yes |
-0.432 | 0.283 | yes |
-0.216 | 0.5 | yes |
-0.385 | 0.331 | yes |
-0.36 | 0.356 | yes |
-0.688 | 0.027 | yes |
-0.352 | 0.364 | yes |
-0.346 | 0.369 | yes |
-0.276 | 0.44 | yes |
-0.329 | 0.387 | yes |
-0.225 | 0.49 | yes |
-0.197 | 0.519 | yes |
-0.694 | 0.022 | yes |
-0.58 | 0.135 | yes |
-0.361 | 0.355 | yes |
-0.237 | 0.478 | yes |
-0.301 | 0.415 | yes |
-0.495 | 0.22 | yes |
-0.6 | 0.116 | yes |
-0.213 | 0.503 | yes |
-0.45 | 0.266 | yes |
-0.225 | 0.491 | yes |
-0.389 | 0.326 | yes |
-0.458 | 0.257 | yes |
-0.4 | 0.316 | yes |
-0.586 | 0.13 | yes |
-0.324 | 0.391 | yes |
You can pull the code from this post or play around with things on this shiny app: https://shiny.rit.albany.edu/stat/confidence/
Let’s run the demo again, this time with a different seed (set.seed(2021)
).
set.seed(2021)
ciout <- ci.animate( ns= 30, nci=100)
This time, the coverage was 0.97. For fun, let’s run this code 100 more times and record the coverage rate each time (plots not shown):
coverage <- NULL
for (i in 1:100){
ciout <- ci.animate( ns= 30, nci=100, show.plot = FALSE)
coverage[i] <- ciout$coverage.rate
}
hist(coverage)
On average, the coverage was 0.9507. If we bumped up the sample size to 80:
coverage <- NULL
for (i in 1:100){
ciout <- ci.animate( ns= 80, nci=100, show.plot = FALSE)
coverage[i] <- ciout$coverage.rate
}
hist(coverage)
the coverage on average is 0.95.
demo2.mu <- 70
demo2.sd <- 5
Let’s sample from a normal distribution having a mean of 70 and standard deviation of 5. We will repeat the simulation and count how many confidence intervals contain the true mean value of \(\mu = 70\). Let’s use the default values of a confidence level of 0.95, sample size of 50 and number of samples as 100. Note that when you use the default values, you do not need to specify those arguments in the function. That is:
ci.animate(level = 0.95, ns = 50, nci=100)
is equivalent to running:
ci.animate(level = 0.95, ns = 50, nci=100)
We will, however, need to change the default mean and standard deviation default parameter values for the population distribution from which we are sampling. The arguments of our function that controls these are nmu
and nsd
respectively. To see the code, press the CODE button to the right.
set.seed(95)
demo2 <- ci.animate(nmu=demo2.mu, nsd=demo2.sd)
For this simulation 93% of these confidence intervals contained the true value of \(\mu = 70\). That is to say, the coverage rate is 0.93 meaning 93 of the 100 confidence intervals contained 70 within their lower and upper bounds. In that sense, we got a bit unlucky in that more than 5% of our confidence intervals did not contain the true value of \(\mu\)
CI = cbind(round(demo2$y0,3), round(demo2$y1,3))
rownames(CI) = 1:demo2$nci
CI <- cbind(CI, ifelse(demo2$cvr, "yes", "no"))
colnames(CI) = c("lower bound", "upper bound", "Includes mu?")
CI %>%
kable(format = "html", digits = c(3,3,NULL) ) %>%
kable_styling(bootstrap_options = "bordered",
full_width = FALSE) %>%
add_header_above(c("Confidence Interval" = 2, "")) %>%
row_spec(which(demo2$cvr==FALSE), italic = TRUE, color = "black", background = "#ffcccb")
lower bound | upper bound | Includes mu? |
---|---|---|
68.268 | 71.04 | yes |
70.848 | 73.62 | no |
68.569 | 71.341 | yes |
67.151 | 69.923 | no |
68.557 | 71.329 | yes |
68.402 | 71.174 | yes |
67.658 | 70.429 | yes |
68.978 | 71.75 | yes |
68.006 | 70.778 | yes |
67.994 | 70.766 | yes |
67.695 | 70.467 | yes |
68.881 | 71.653 | yes |
68.034 | 70.806 | yes |
69.001 | 71.773 | yes |
68.654 | 71.426 | yes |
68.314 | 71.085 | yes |
67.577 | 70.349 | yes |
68.839 | 71.61 | yes |
67.712 | 70.484 | yes |
67.715 | 70.487 | yes |
68.963 | 71.735 | yes |
69.191 | 71.963 | yes |
71.065 | 73.837 | no |
70.187 | 72.958 | no |
68.528 | 71.3 | yes |
67.583 | 70.354 | yes |
67.856 | 70.628 | yes |
67.423 | 70.195 | yes |
69.173 | 71.944 | yes |
70.102 | 72.874 | no |
69.519 | 72.291 | yes |
69.427 | 72.199 | yes |
68.481 | 71.252 | yes |
69.394 | 72.166 | yes |
68.118 | 70.89 | yes |
68.122 | 70.894 | yes |
69.022 | 71.794 | yes |
68.245 | 71.017 | yes |
67.907 | 70.679 | yes |
68.348 | 71.12 | yes |
69.124 | 71.896 | yes |
69.046 | 71.818 | yes |
69.086 | 71.858 | yes |
68.664 | 71.435 | yes |
68.713 | 71.485 | yes |
69.296 | 72.068 | yes |
67.887 | 70.659 | yes |
68.711 | 71.483 | yes |
67.357 | 70.129 | yes |
67.823 | 70.595 | yes |
68.218 | 70.99 | yes |
69.476 | 72.248 | yes |
69.794 | 72.566 | yes |
68.911 | 71.683 | yes |
69.322 | 72.093 | yes |
68.39 | 71.162 | yes |
68.97 | 71.742 | yes |
67.601 | 70.372 | yes |
69.356 | 72.128 | yes |
69.511 | 72.283 | yes |
68.425 | 71.197 | yes |
68.413 | 71.185 | yes |
69.065 | 71.837 | yes |
69.13 | 71.902 | yes |
68.98 | 71.752 | yes |
67.475 | 70.247 | yes |
68.379 | 71.151 | yes |
69.706 | 72.477 | yes |
68.011 | 70.783 | yes |
67.7 | 70.472 | yes |
68.67 | 71.442 | yes |
69.245 | 72.016 | yes |
68.657 | 71.429 | yes |
68.643 | 71.415 | yes |
68.2 | 70.972 | yes |
69.48 | 72.252 | yes |
68.039 | 70.811 | yes |
68.65 | 71.422 | yes |
68.201 | 70.973 | yes |
68.874 | 71.646 | yes |
69.606 | 72.378 | yes |
66.347 | 69.118 | no |
69.172 | 71.943 | yes |
68.156 | 70.928 | yes |
67.792 | 70.564 | yes |
68.576 | 71.348 | yes |
66.473 | 69.245 | no |
69.415 | 72.187 | yes |
67.973 | 70.745 | yes |
67.761 | 70.533 | yes |
69.197 | 71.969 | yes |
68.918 | 71.69 | yes |
68.244 | 71.015 | yes |
69.106 | 71.878 | yes |
68.692 | 71.463 | yes |
67.76 | 70.532 | yes |
68.347 | 71.119 | yes |
67.365 | 70.137 | yes |
68.645 | 71.417 | yes |
67.986 | 70.758 | yes |