Confidence Intervals

You can use this function that has been adapted from the conf.int function available in the animation package (inspirational example here)

Description
This function gives a demonstration of the concept of confidence intervals in mathematical statistics.

Usage

ci.animate(level = 0.95, ns = 50, nci=100, cl = c("red", "gray"), show.plot=TRUE, nmu = 0, nsd = 1, ...)

Arguments

Details
Keep on drawing samples from the Normal distribution N(0, 1), computing the intervals based on a given confidence level and plotting them as segments in a graph. In the end, we may check the coverage rate against the given confidence level.

Intervals that cover the true parameter are denoted in color cl[2], otherwise in color cl[1]. Each time we draw a sample, we can compute the corresponding confidence interval. As the process of drawing samples goes on, there will be a legend indicating the numbers of the two kinds of intervals respectively and the coverage rate is also denoted in the top-left of the plot.

ci.animate <- function(level = 0.95, ns = 50, nci=100, cl = c("red", "gray"), show.plot=TRUE, nmu = 0, nsd = 1, ...){
    d = replicate(nci, rnorm(ns, mean=nmu, sd = nsd))
    m = colMeans(d)
    z = qnorm(1 - (1 - level)/2) # z*
    y0 = m - z*nsd/sqrt(ns)
    y1 = m + z*nsd/sqrt(ns)
    rg = range(c(y0, y1))
    cvr = y0 < nmu & y1 > nmu
    xax = pretty(1:nci)
    if (show.plot){
      for (i in 1:nci) {
        plot(1:nci, ylim = rg, type = "n", xlab = "Samples", ylab = expression("CI: [" ~ 
            bar(x) - z[alpha/2] * sigma/sqrt(n) ~ ", " ~ bar(x) + 
            z[alpha/2] * sigma/sqrt(n) ~ "]"), xaxt = "n" , ...)
        abline(h = nmu, lty = 2)
        axis(1, xax[xax <= i])
        arrows(1:i, y0[1:i], 1:i, y1[1:i], length = par("din")[1]/nci * 
            0.5, angle = 90, code = 3, col = cl[cvr[1:i] + 1])
        points(1:i, m[1:i], col = cl[cvr[1:i] + 1])
        legend("topright", legend = format(c(i - sum(cvr[1:i]), 
            sum(cvr[1:i])), width = nchar(nci)), fill = cl, bty = "n", 
            ncol = 2)
        legend("topleft", legend = paste("coverage rate:", format(round(mean(cvr[1:i]), 
            3), nsmall = 3)), bty = "n")
      }
    }
    coverage.rate = sum(cvr)/nci
    out <- list(level = level, ns = ns, nci=nci, y0=y0, y1=y1, cvr=cvr, coverage.rate = coverage.rate)
}
library(gifski) # used to create animation below

Demo 1: Population \(X\sim N(0,1)\)

set.seed(230)
ciout <- ci.animate( ns= 30, nci=100)

The above demo draws 100 samples of size 30 from a normal population with mean \(\mu\) = 0 and \(\sigma\) = 1; that is \(X\sim N(0,1)\) (the standard normal distribution). For each sample it constructs a 0.95% confidence interval using the equations provided in lecture: \[\begin{align} \overline x &\pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\\ (\overline x - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} &, \overline x + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} ) \end{align}\] where \[\begin{align} \overline x &= \frac{\sum_{1}^{n} x_i}{n} & \text{ and } && z_{\alpha/2} &= z_{0.025} \\ &= \frac{\sum_{1}^{30} x_i}{30} &&& & = 1.959964 \end{align}\] As you can see in the visualization and the table below, with a seed of 230 we actually got really lucky and 99% of the confidence intervals we constructed contained the true value of \(\mu\). That is to say, the coverage rate is 0.99 meaning 99 of the 100 confidence intervals contained 0 within their lower and upper bounds. Running this code again will produce different samples, and therefore difference confidence intervals and potential different coverage rates. While a random run maybe produce a coverage rate a little higher/lower than the set confidence level of 0.95, on average we would expect that 99% of the confidence intervals we draw will in fact contain \(\mu\).

library(knitr)
library(kableExtra)
CI = cbind(round(ciout$y0,3), round(ciout$y1,3))
rownames(CI) = 1:ciout$nci
CI <- cbind(CI, ifelse(ciout$cvr, "yes", "no"))
colnames(CI) = c("lower bound", "upper bound", "Includes mu?")

CI %>%
  kable(format = "html", digits = 3 ) %>%
  kable_styling(bootstrap_options = "bordered",
                full_width = FALSE) %>%
  add_header_above(c("Confidence Interval" = 2, "")) %>%
  row_spec(which(ciout$cvr==FALSE),  italic = TRUE, color = "black", background = "#ffcccb")
Confidence Interval
lower bound upper bound Includes mu?
-0.559 0.156 yes
-0.287 0.428 yes
-0.336 0.379 yes
-0.474 0.242 yes
-0.155 0.561 yes
-0.313 0.402 yes
-0.346 0.369 yes
-0.386 0.33 yes
-0.588 0.128 yes
-0.281 0.435 yes
-0.262 0.454 yes
-0.23 0.485 yes
-0.347 0.368 yes
-0.567 0.149 yes
-0.582 0.134 yes
-0.312 0.403 yes
0.009 0.725 no
-0.202 0.514 yes
-0.307 0.408 yes
-0.193 0.523 yes
-0.177 0.539 yes
-0.363 0.353 yes
-0.55 0.166 yes
-0.364 0.352 yes
-0.202 0.514 yes
-0.225 0.491 yes
-0.444 0.272 yes
-0.305 0.411 yes
-0.146 0.57 yes
-0.212 0.504 yes
-0.365 0.351 yes
-0.579 0.137 yes
-0.487 0.229 yes
-0.286 0.43 yes
-0.284 0.432 yes
-0.33 0.385 yes
-0.363 0.353 yes
-0.674 0.042 yes
-0.288 0.428 yes
-0.559 0.157 yes
-0.345 0.371 yes
-0.478 0.237 yes
-0.123 0.593 yes
-0.4 0.316 yes
-0.306 0.41 yes
-0.28 0.436 yes
-0.244 0.472 yes
-0.346 0.37 yes
-0.4 0.315 yes
-0.167 0.548 yes
-0.689 0.027 yes
-0.25 0.466 yes
-0.637 0.079 yes
-0.474 0.242 yes
-0.668 0.048 yes
-0.431 0.284 yes
-0.696 0.02 yes
-0.211 0.505 yes
-0.096 0.62 yes
-0.452 0.264 yes
-0.519 0.197 yes
-0.462 0.254 yes
-0.293 0.422 yes
-0.493 0.222 yes
-0.318 0.398 yes
-0.215 0.501 yes
-0.568 0.148 yes
-0.305 0.411 yes
-0.395 0.321 yes
-0.191 0.524 yes
-0.415 0.3 yes
-0.438 0.277 yes
-0.483 0.233 yes
-0.112 0.603 yes
-0.432 0.283 yes
-0.216 0.5 yes
-0.385 0.331 yes
-0.36 0.356 yes
-0.688 0.027 yes
-0.352 0.364 yes
-0.346 0.369 yes
-0.276 0.44 yes
-0.329 0.387 yes
-0.225 0.49 yes
-0.197 0.519 yes
-0.694 0.022 yes
-0.58 0.135 yes
-0.361 0.355 yes
-0.237 0.478 yes
-0.301 0.415 yes
-0.495 0.22 yes
-0.6 0.116 yes
-0.213 0.503 yes
-0.45 0.266 yes
-0.225 0.491 yes
-0.389 0.326 yes
-0.458 0.257 yes
-0.4 0.316 yes
-0.586 0.13 yes
-0.324 0.391 yes

You can pull the code from this post or play around with things on this shiny app: https://shiny.rit.albany.edu/stat/confidence/

Demo 1 (again)

Let’s run the demo again, this time with a different seed (set.seed(2021)).

set.seed(2021)
ciout <- ci.animate( ns= 30, nci=100)

This time, the coverage was 0.97. For fun, let’s run this code 100 more times and record the coverage rate each time (plots not shown):

coverage <- NULL
for (i in 1:100){
  ciout <- ci.animate( ns= 30, nci=100, show.plot = FALSE)
  coverage[i] <- ciout$coverage.rate
}
hist(coverage)

On average, the coverage was 0.9507. If we bumped up the sample size to 80:

coverage <- NULL
for (i in 1:100){
  ciout <- ci.animate( ns= 80, nci=100, show.plot = FALSE)
  coverage[i] <- ciout$coverage.rate
}
hist(coverage)

the coverage on average is 0.95.

demo2.mu <- 70
demo2.sd <- 5

Demo 2 Population \(X\sim N(70,5)\)

Let’s sample from a normal distribution having a mean of 70 and standard deviation of 5. We will repeat the simulation and count how many confidence intervals contain the true mean value of \(\mu = 70\). Let’s use the default values of a confidence level of 0.95, sample size of 50 and number of samples as 100. Note that when you use the default values, you do not need to specify those arguments in the function. That is:
ci.animate(level = 0.95, ns = 50, nci=100)

is equivalent to running:
ci.animate(level = 0.95, ns = 50, nci=100)

We will, however, need to change the default mean and standard deviation default parameter values for the population distribution from which we are sampling. The arguments of our function that controls these are nmu and nsd respectively. To see the code, press the CODE button to the right.

set.seed(95)
demo2 <- ci.animate(nmu=demo2.mu, nsd=demo2.sd)

For this simulation 93% of these confidence intervals contained the true value of \(\mu = 70\). That is to say, the coverage rate is 0.93 meaning 93 of the 100 confidence intervals contained 70 within their lower and upper bounds. In that sense, we got a bit unlucky in that more than 5% of our confidence intervals did not contain the true value of \(\mu\)

CI = cbind(round(demo2$y0,3), round(demo2$y1,3))
rownames(CI) = 1:demo2$nci
CI <- cbind(CI, ifelse(demo2$cvr, "yes", "no"))
colnames(CI) = c("lower bound", "upper bound", "Includes mu?")

CI %>%
  kable(format = "html", digits = c(3,3,NULL) ) %>%
  kable_styling(bootstrap_options = "bordered",
                full_width = FALSE) %>%
  add_header_above(c("Confidence Interval" = 2, "")) %>%
  row_spec(which(demo2$cvr==FALSE),  italic = TRUE, color = "black", background = "#ffcccb")
Confidence Interval
lower bound upper bound Includes mu?
68.268 71.04 yes
70.848 73.62 no
68.569 71.341 yes
67.151 69.923 no
68.557 71.329 yes
68.402 71.174 yes
67.658 70.429 yes
68.978 71.75 yes
68.006 70.778 yes
67.994 70.766 yes
67.695 70.467 yes
68.881 71.653 yes
68.034 70.806 yes
69.001 71.773 yes
68.654 71.426 yes
68.314 71.085 yes
67.577 70.349 yes
68.839 71.61 yes
67.712 70.484 yes
67.715 70.487 yes
68.963 71.735 yes
69.191 71.963 yes
71.065 73.837 no
70.187 72.958 no
68.528 71.3 yes
67.583 70.354 yes
67.856 70.628 yes
67.423 70.195 yes
69.173 71.944 yes
70.102 72.874 no
69.519 72.291 yes
69.427 72.199 yes
68.481 71.252 yes
69.394 72.166 yes
68.118 70.89 yes
68.122 70.894 yes
69.022 71.794 yes
68.245 71.017 yes
67.907 70.679 yes
68.348 71.12 yes
69.124 71.896 yes
69.046 71.818 yes
69.086 71.858 yes
68.664 71.435 yes
68.713 71.485 yes
69.296 72.068 yes
67.887 70.659 yes
68.711 71.483 yes
67.357 70.129 yes
67.823 70.595 yes
68.218 70.99 yes
69.476 72.248 yes
69.794 72.566 yes
68.911 71.683 yes
69.322 72.093 yes
68.39 71.162 yes
68.97 71.742 yes
67.601 70.372 yes
69.356 72.128 yes
69.511 72.283 yes
68.425 71.197 yes
68.413 71.185 yes
69.065 71.837 yes
69.13 71.902 yes
68.98 71.752 yes
67.475 70.247 yes
68.379 71.151 yes
69.706 72.477 yes
68.011 70.783 yes
67.7 70.472 yes
68.67 71.442 yes
69.245 72.016 yes
68.657 71.429 yes
68.643 71.415 yes
68.2 70.972 yes
69.48 72.252 yes
68.039 70.811 yes
68.65 71.422 yes
68.201 70.973 yes
68.874 71.646 yes
69.606 72.378 yes
66.347 69.118 no
69.172 71.943 yes
68.156 70.928 yes
67.792 70.564 yes
68.576 71.348 yes
66.473 69.245 no
69.415 72.187 yes
67.973 70.745 yes
67.761 70.533 yes
69.197 71.969 yes
68.918 71.69 yes
68.244 71.015 yes
69.106 71.878 yes
68.692 71.463 yes
67.76 70.532 yes
68.347 71.119 yes
67.365 70.137 yes
68.645 71.417 yes
67.986 70.758 yes