Confidence Intervals

You can use this function that has been adapted from the conf.int function available in the animation package (inspirational example here)

Description
This function gives a demonstration of the concept of confidence intervals in mathematical statistics.

Usage

ci.animate(level = 0.95, ns = 50, nci=100, cl = c("red", "gray"), show.plot=TRUE, nmu = 0, nsd = 1, ...)

Arguments

level the confidence level \((1 - \alpha)\); default value is 0.95
ns the sample size for drawing samples from N(0, 1)
nci the number of confidence intervals/samples to generate
cl two different colors to annotate whether the confidence intervals cover the true mean (cl[1] = no; cl[2]: yes)
show.plot whether or not to plot the confidence intervals (default is yes)
nmu the mean of the normal population from which we are sampling (default is standard normal)
nsd the standard deviation of the normal population from which we are sampling (default is standard normal)
... other arguments passed to plot.default

Details
Keep on drawing samples from the Normal distribution N(0, 1), computing the intervals based on a given confidence level and plotting them as segments in a graph. In the end, we may check the coverage rate against the given confidence level.

Intervals that cover the true parameter are denoted in color cl[2], otherwise in color cl[1]. Each time we draw a sample, we can compute the corresponding confidence interval. As the process of drawing samples goes on, there will be a legend indicating the numbers of the two kinds of intervals respectively and the coverage rate is also denoted in the top-left of the plot.

ci.animate <- function(level = 0.95, ns = 50, nci=100, cl = c("red", "gray"), show.plot=TRUE, nmu = 0, nsd = 1, ...){
    d = replicate(nci, rnorm(ns, mean=nmu, sd = nsd))
    m = colMeans(d)
    z = qnorm(1 - (1 - level)/2) # z*
    y0 = m - z*nsd/sqrt(ns)
    y1 = m + z*nsd/sqrt(ns)
    rg = range(c(y0, y1))
    cvr = y0 < nmu & y1 > nmu
    xax = pretty(1:nci)
    if (show.plot){
      for (i in 1:nci) {
        plot(1:nci, ylim = rg, type = "n", xlab = "Samples", ylab = expression("CI: [" ~ 
            bar(x) - z[alpha/2] * sigma/sqrt(n) ~ ", " ~ bar(x) + 
            z[alpha/2] * sigma/sqrt(n) ~ "]"), xaxt = "n" , ...)
        abline(h = nmu, lty = 2)
        axis(1, xax[xax <= i])
        arrows(1:i, y0[1:i], 1:i, y1[1:i], length = par("din")[1]/nci * 
            0.5, angle = 90, code = 3, col = cl[cvr[1:i] + 1])
        points(1:i, m[1:i], col = cl[cvr[1:i] + 1])
        legend("topright", legend = format(c(i - sum(cvr[1:i]), 
            sum(cvr[1:i])), width = nchar(nci)), fill = cl, bty = "n", 
            ncol = 2)
        legend("topleft", legend = paste("coverage rate:", format(round(mean(cvr[1:i]), 
            3), nsmall = 3)), bty = "n")
      }
    }
    coverage.rate = sum(cvr)/nci
    out <- list(level = level, ns = ns, nci=nci, y0=y0, y1=y1, cvr=cvr, coverage.rate = coverage.rate)
}
library(gifski) # used to create animation below

Demo 1: Population \(X\sim N(0,1)\)

set.seed(230)
ciout <- ci.animate( ns= 30, nci=100)

The above demo draws 100 samples of size 30 from a normal population with mean \(\mu\) = 0 and \(\sigma\) = 1; that is \(X\sim N(0,1)\) (the standard normal distribution). For each sample it constructs a 0.95% confidence interval using the equations provided in lecture: \[\begin{align} \overline x &\pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\\ (\overline x - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} &, \overline x + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} ) \end{align}\] where \[\begin{align} \overline x &= \frac{\sum_{1}^{n} x_i}{n} & \text{ and } && z_{\alpha/2} &= z_{0.025} \\ &= \frac{\sum_{1}^{30} x_i}{30} &&& & = 1.959964 \end{align}\] As you can see in the visualization and the table below, with a seed of 230 we actually got really lucky and 99% of the confidence intervals we constructed contained the true value of \(\mu\). That is to say, the coverage rate is 0.99 meaning 99 of the 100 confidence intervals contained 0 within their lower and upper bounds. Running this code again will produce different samples, and therefore difference confidence intervals and potential different coverage rates. While a random run maybe produce a coverage rate a little higher/lower than the set confidence level of 0.95, on average we would expect that 99% of the confidence intervals we draw will in fact contain \(\mu\).

library(knitr)
library(kableExtra)
CI = cbind(round(ciout$y0,3), round(ciout$y1,3))
rownames(CI) = 1:ciout$nci
CI <- cbind(CI, ifelse(ciout$cvr, "yes", "no"))
colnames(CI) = c("lower bound", "upper bound", "Includes mu?")

CI %>%
  kable(format = "html", digits = 3 ) %>%
  kable_styling(bootstrap_options = "bordered",
                full_width = FALSE) %>%
  add_header_above(c("Confidence Interval" = 2, "")) %>%
  row_spec(which(ciout$cvr==FALSE),  italic = TRUE, color = "black", background = "#ffcccb")

Confidence Interval
lower bound	upper bound	Includes mu?
-0.559	0.156	yes
-0.287	0.428	yes
-0.336	0.379	yes
-0.474	0.242	yes
-0.155	0.561	yes
-0.313	0.402	yes
-0.346	0.369	yes
-0.386	0.33	yes
-0.588	0.128	yes
-0.281	0.435	yes
-0.262	0.454	yes
-0.23	0.485	yes
-0.347	0.368	yes
-0.567	0.149	yes
-0.582	0.134	yes
-0.312	0.403	yes
0.009	0.725	no
-0.202	0.514	yes
-0.307	0.408	yes
-0.193	0.523	yes
-0.177	0.539	yes
-0.363	0.353	yes
-0.55	0.166	yes
-0.364	0.352	yes
-0.202	0.514	yes
-0.225	0.491	yes
-0.444	0.272	yes
-0.305	0.411	yes
-0.146	0.57	yes
-0.212	0.504	yes
-0.365	0.351	yes
-0.579	0.137	yes
-0.487	0.229	yes
-0.286	0.43	yes
-0.284	0.432	yes
-0.33	0.385	yes
-0.363	0.353	yes
-0.674	0.042	yes
-0.288	0.428	yes
-0.559	0.157	yes
-0.345	0.371	yes
-0.478	0.237	yes
-0.123	0.593	yes
-0.4	0.316	yes
-0.306	0.41	yes
-0.28	0.436	yes
-0.244	0.472	yes
-0.346	0.37	yes
-0.4	0.315	yes
-0.167	0.548	yes
-0.689	0.027	yes
-0.25	0.466	yes
-0.637	0.079	yes
-0.474	0.242	yes
-0.668	0.048	yes
-0.431	0.284	yes
-0.696	0.02	yes
-0.211	0.505	yes
-0.096	0.62	yes
-0.452	0.264	yes
-0.519	0.197	yes
-0.462	0.254	yes
-0.293	0.422	yes
-0.493	0.222	yes
-0.318	0.398	yes
-0.215	0.501	yes
-0.568	0.148	yes
-0.305	0.411	yes
-0.395	0.321	yes
-0.191	0.524	yes
-0.415	0.3	yes
-0.438	0.277	yes
-0.483	0.233	yes
-0.112	0.603	yes
-0.432	0.283	yes
-0.216	0.5	yes
-0.385	0.331	yes
-0.36	0.356	yes
-0.688	0.027	yes
-0.352	0.364	yes
-0.346	0.369	yes
-0.276	0.44	yes
-0.329	0.387	yes
-0.225	0.49	yes
-0.197	0.519	yes
-0.694	0.022	yes
-0.58	0.135	yes
-0.361	0.355	yes
-0.237	0.478	yes
-0.301	0.415	yes
-0.495	0.22	yes
-0.6	0.116	yes
-0.213	0.503	yes
-0.45	0.266	yes
-0.225	0.491	yes
-0.389	0.326	yes
-0.458	0.257	yes
-0.4	0.316	yes
-0.586	0.13	yes
-0.324	0.391	yes

You can pull the code from this post or play around with things on this shiny app: https://shiny.rit.albany.edu/stat/confidence/

Demo 1 (again)

Let’s run the demo again, this time with a different seed (set.seed(2021)).

set.seed(2021)
ciout <- ci.animate( ns= 30, nci=100)

This time, the coverage was 0.97. For fun, let’s run this code 100 more times and record the coverage rate each time (plots not shown):

coverage <- NULL
for (i in 1:100){
  ciout <- ci.animate( ns= 30, nci=100, show.plot = FALSE)
  coverage[i] <- ciout$coverage.rate
}
hist(coverage)

On average, the coverage was 0.9507. If we bumped up the sample size to 80:

coverage <- NULL
for (i in 1:100){
  ciout <- ci.animate( ns= 80, nci=100, show.plot = FALSE)
  coverage[i] <- ciout$coverage.rate
}
hist(coverage)

the coverage on average is 0.95.

demo2.mu <- 70
demo2.sd <- 5

Demo 2 Population \(X\sim N(70,5)\)

Let’s sample from a normal distribution having a mean of 70 and standard deviation of 5. We will repeat the simulation and count how many confidence intervals contain the true mean value of \(\mu = 70\). Let’s use the default values of a confidence level of 0.95, sample size of 50 and number of samples as 100. Note that when you use the default values, you do not need to specify those arguments in the function. That is:
ci.animate(level = 0.95, ns = 50, nci=100)

is equivalent to running:
ci.animate(level = 0.95, ns = 50, nci=100)

We will, however, need to change the default mean and standard deviation default parameter values for the population distribution from which we are sampling. The arguments of our function that controls these are nmu and nsd respectively. To see the code, press the CODE button to the right.

set.seed(95)
demo2 <- ci.animate(nmu=demo2.mu, nsd=demo2.sd)

For this simulation 93% of these confidence intervals contained the true value of \(\mu = 70\). That is to say, the coverage rate is 0.93 meaning 93 of the 100 confidence intervals contained 70 within their lower and upper bounds. In that sense, we got a bit unlucky in that more than 5% of our confidence intervals did not contain the true value of \(\mu\)

CI = cbind(round(demo2$y0,3), round(demo2$y1,3))
rownames(CI) = 1:demo2$nci
CI <- cbind(CI, ifelse(demo2$cvr, "yes", "no"))
colnames(CI) = c("lower bound", "upper bound", "Includes mu?")

CI %>%
  kable(format = "html", digits = c(3,3,NULL) ) %>%
  kable_styling(bootstrap_options = "bordered",
                full_width = FALSE) %>%
  add_header_above(c("Confidence Interval" = 2, "")) %>%
  row_spec(which(demo2$cvr==FALSE),  italic = TRUE, color = "black", background = "#ffcccb")

Confidence Interval
lower bound	upper bound	Includes mu?
68.268	71.04	yes
70.848	73.62	no
68.569	71.341	yes
67.151	69.923	no
68.557	71.329	yes
68.402	71.174	yes
67.658	70.429	yes
68.978	71.75	yes
68.006	70.778	yes
67.994	70.766	yes
67.695	70.467	yes
68.881	71.653	yes
68.034	70.806	yes
69.001	71.773	yes
68.654	71.426	yes
68.314	71.085	yes
67.577	70.349	yes
68.839	71.61	yes
67.712	70.484	yes
67.715	70.487	yes
68.963	71.735	yes
69.191	71.963	yes
71.065	73.837	no
70.187	72.958	no
68.528	71.3	yes
67.583	70.354	yes
67.856	70.628	yes
67.423	70.195	yes
69.173	71.944	yes
70.102	72.874	no
69.519	72.291	yes
69.427	72.199	yes
68.481	71.252	yes
69.394	72.166	yes
68.118	70.89	yes
68.122	70.894	yes
69.022	71.794	yes
68.245	71.017	yes
67.907	70.679	yes
68.348	71.12	yes
69.124	71.896	yes
69.046	71.818	yes
69.086	71.858	yes
68.664	71.435	yes
68.713	71.485	yes
69.296	72.068	yes
67.887	70.659	yes
68.711	71.483	yes
67.357	70.129	yes
67.823	70.595	yes
68.218	70.99	yes
69.476	72.248	yes
69.794	72.566	yes
68.911	71.683	yes
69.322	72.093	yes
68.39	71.162	yes
68.97	71.742	yes
67.601	70.372	yes
69.356	72.128	yes
69.511	72.283	yes
68.425	71.197	yes
68.413	71.185	yes
69.065	71.837	yes
69.13	71.902	yes
68.98	71.752	yes
67.475	70.247	yes
68.379	71.151	yes
69.706	72.477	yes
68.011	70.783	yes
67.7	70.472	yes
68.67	71.442	yes
69.245	72.016	yes
68.657	71.429	yes
68.643	71.415	yes
68.2	70.972	yes
69.48	72.252	yes
68.039	70.811	yes
68.65	71.422	yes
68.201	70.973	yes
68.874	71.646	yes
69.606	72.378	yes
66.347	69.118	no
69.172	71.943	yes
68.156	70.928	yes
67.792	70.564	yes
68.576	71.348	yes
66.473	69.245	no
69.415	72.187	yes
67.973	70.745	yes
67.761	70.533	yes
69.197	71.969	yes
68.918	71.69	yes
68.244	71.015	yes
69.106	71.878	yes
68.692	71.463	yes
67.76	70.532	yes
68.347	71.119	yes
67.365	70.137	yes
68.645	71.417	yes
67.986	70.758	yes

Confidence Interval

A demonstration

Dr. Irene Vrbik

Last updated: November, 2021

Confidence Intervals

Demo 1: Population \(X\sim N(0,1)\)

Demo 1 (again)

Demo 2 Population \(X\sim N(70,5)\)