Tutorial 04: One Sample Confidence Intervals

Q1 — Z Critical Value

Print the two-sided $z^*$ critical value for 95% confidence ($\alpha = 0.05$).

Info

Remember: Critical values are points on a reference distribution that play a central role in both confidence intervals and hypothesis testing. In a confidence interval, the critical value multiplies the standard error, so it sets the interval's width.
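A minimal base-R sketch for Q1, assuming the two-sided critical value is the $1 - \alpha/2$ quantile of the standard normal. The names alpha and z_star are illustrative. The second line checks that the cutoffs $-z^*$ and $z^*$ enclose 95% of the area.

```r
alpha <- 0.05
z_star <- qnorm(1 - alpha / 2)   # 0.975 quantile of N(0, 1), about 1.96
z_star
# Central area between -z_star and z_star should equal 1 - alpha
pnorm(z_star) - pnorm(-z_star)
```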

Q2 — One Sample Z Confidence Interval

The penguins dataset contains measurements for 3 penguin species from the Palmer Archipelago. Use body_mass_g (grams). Assume the population SD is known: σ = 450 g. Construct a 90% Z-interval for the true mean body mass. Print c(lower, upper).

Info

When σ is known, the margin of error uses the standard normal critical value.

Preview

Run this code chunk to get a glimpse of the dataset. Change the number of rows to see more or fewer.

Info

When σ is known, \[ \mathrm{CI}_{1-\alpha}:\ \bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}. \]

if (!requireNamespace("palmerpenguins", quietly = TRUE)) webr::install("palmerpenguins")
library(palmerpenguins)
x <- na.omit(palmerpenguins::penguins$body_mass_g)  # drop missing body masses
xbar <- mean(x)
n <- length(x)
z <- qnorm(0.95)  # 90% two-sided: alpha = 0.10, so use the 1 - alpha/2 = 0.95 quantile
sigma <- 450      # known population SD (g)
xbar + c(-1, 1) * z * sigma / sqrt(n)
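The same known-σ computation can be wrapped in a small function. This is a sketch, not part of the tutorial: z_interval is a hypothetical helper name, and the example runs on the built-in rivers data with σ = 300 so that it needs no extra packages.

```r
# Hypothetical helper: two-sided z-interval for a mean when sigma is known
z_interval <- function(x, sigma, conf.level = 0.95) {
  x <- x[!is.na(x)]                            # ignore missing values
  alpha <- 1 - conf.level
  z_star <- qnorm(1 - alpha / 2)               # standard normal critical value
  margin <- z_star * sigma / sqrt(length(x))   # margin of error
  mean(x) + c(-1, 1) * margin
}

# Base-R example: rivers with an assumed known sigma of 300 miles
z_interval(datasets::rivers, sigma = 300, conf.level = 0.95)
```

With palmerpenguins installed, z_interval(palmerpenguins::penguins$body_mass_g, sigma = 450, conf.level = 0.90) should reproduce the Q2 interval, since the function drops NA values the same way na.omit does.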

Q3 — T Critical Value

PlantGrowth is a dataset that contains results from an experiment comparing yields (measured as the dried weight of plants) under a control and two different treatment conditions. Print the two-sided t* critical value for 95% confidence ($\alpha = 0.05$), using df = n − 1, where n is the number of observations in PlantGrowth’s weight column.

Preview

Run this code chunk to get a glimpse of the dataset. Change the number of rows to see more or fewer.
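A minimal sketch for Q3, assuming the critical value is the $1 - \alpha/2$ quantile of the t distribution with n − 1 degrees of freedom. PlantGrowth has 30 rows, so df = 29 and t* comes out at roughly 2.045, a bit larger than the normal value of about 1.96.

```r
# Two-sided 95% t critical value for PlantGrowth$weight
x <- datasets::PlantGrowth$weight
n <- length(x)                            # number of observations
alpha <- 0.05
t_star <- qt(1 - alpha / 2, df = n - 1)   # 0.975 quantile of t with n - 1 df
t_star
```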

Q4 — One-Sample Confidence Interval for μ (σ Unknown)

Using PlantGrowth$weight (σ unknown), construct a 95% CI for the true mean plant weight. Give your output as a vector of the form c(lower, upper).

Preview

Run this code chunk to get a glimpse of the dataset. Change the number of rows to see more or fewer.

Info

When σ is unknown, replace σ with the sample SD s and use the t distribution:

\[ \mathrm{CI}_{1-\alpha}:\ \bar{x} \pm t_{1-\alpha/2,\;n-1}\,\frac{s}{\sqrt{n}},\ \text{df}=n-1. \]

data("PlantGrowth")
x <- PlantGrowth$weight
xbar <- mean(x)
s <- sd(x)                               # sample SD (avoids overwriting the sd() function)
n <- length(x)
critical_value <- qt(0.975, df = n - 1)  # 95% two-sided, df = n - 1
xbar + c(-1, 1) * critical_value * s / sqrt(n)
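One way to check the hand-built interval is to compare it with the conf.int component returned by base R's t.test(), which computes the same one-sample t-interval. The names manual and from_ttest are illustrative.

```r
x <- datasets::PlantGrowth$weight
n <- length(x)
# Interval built from the formula above
manual <- mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sd(x) / sqrt(n)
# Same interval reported by t.test(); as.numeric() drops the conf.level attribute
from_ttest <- as.numeric(t.test(x, conf.level = 0.95)$conf.int)
rbind(manual = manual, t.test = from_ttest)
```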

Q5 — One sample CI for μ

The dataset used here, rivers, gives the lengths (in miles) of 141 “major” rivers in North America, as compiled by the US Geological Survey. Assume river lengths have a known population SD of σ = 300 miles. Build a 95% CI for the true mean river length ($\alpha = 0.05$).

Preview

Run this code chunk to get a glimpse of the dataset. Change the number of rows to see more or fewer.

Info

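When σ is known, use the standard normal critical value $z_{1-\alpha/2}$. The t distribution with n − 1 degrees of freedom is needed only when σ is unknown and estimated by the sample SD s.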
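For a sample as large as rivers (n = 141), the z and t critical values are close, so the choice matters less than for small samples. A short comparison, assuming α = 0.05:

```r
n <- length(datasets::rivers)             # number of rivers
alpha <- 0.05
z_star <- qnorm(1 - alpha / 2)            # used when sigma is known
t_star <- qt(1 - alpha / 2, df = n - 1)   # used when sigma is estimated by s
c(z = z_star, t = t_star)
```

The t value is always slightly larger, reflecting the extra uncertainty from estimating σ; the gap shrinks as n grows.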

rivers_data <- data.frame(length_miles = as.numeric(datasets::rivers))
x <- rivers_data$length_miles
xbar <- mean(x); n <- length(x); z <- qnorm(0.975); sigma <- 300
xbar + c(-1,1) * z * sigma / sqrt(n)

Q6 — One-Sample CI for μ

Using airquality$Ozone (ignoring missing values), construct and print a 90% CI for the true mean ozone level (ppb). Assume normality is reasonable.

Preview

w <- na.omit(datasets::airquality$Ozone)
xbar <- mean(w); s <- sd(w); n <- length(w)
tstar <- qt(0.95, df = n - 1) # two-sided 90%
xbar + c(-1,1) * tstar * s / sqrt(n)
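It helps to see how many readings na.omit() actually drops, since n is the number of non-missing values. This sketch counts the NAs and then confirms that t.test(), which removes NA values from x before computing, gives the same 90% interval as the manual formula. The names oz, manual and from_ttest are illustrative.

```r
oz <- datasets::airquality$Ozone
sum(is.na(oz))            # missing ozone readings
w <- na.omit(oz)
length(w)                 # observations used in the interval
manual <- mean(w) + c(-1, 1) * qt(0.95, df = length(w) - 1) * sd(w) / sqrt(length(w))
# t.test() drops NA values itself, so the raw column can be passed directly
from_ttest <- as.numeric(t.test(oz, conf.level = 0.90)$conf.int)
rbind(manual = manual, t.test = from_ttest)
```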

Q7 — One-Sample CI for a Proportion

In the UCBAdmissions dataset, each applicant’s Admit status is “Admitted” or “Rejected”. Build and print c(lower, upper) for the 95% CI on the overall admission proportion.

Info

For a single proportion:

\[ \mathrm{CI}_{1-\alpha}:\ \hat{p} \pm z_{1-\alpha/2}\,\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}. \]

Rule of thumb: the normal approximation is reasonable when both $n\hat{p}$ and $n(1 - \hat{p})$ are at least 10 (some texts use 5).
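A quick way to apply this rule of thumb is to count successes ($n\hat{p}$) and failures ($n(1-\hat{p})$) directly. This sketch assumes the threshold of 10; ucb, successes and failures are illustrative names.

```r
ucb <- as.data.frame(datasets::UCBAdmissions)
n <- sum(ucb$Freq)                                      # total applicants
successes <- sum(ucb$Freq[ucb$Admit == "Admitted"])     # n * phat
failures <- n - successes                               # n * (1 - phat)
c(successes = successes, failures = failures)
all(c(successes, failures) >= 10)                       # assumed threshold of 10
```

Both counts are in the thousands here, so the normal approximation behind the Wald interval is comfortably justified.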

Preview

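ucb <- as.data.frame(datasets::UCBAdmissions)       # one row per Admit/Gender/Dept cell
n <- sum(ucb$Freq)                                   # total applicants
phat <- sum(ucb$Freq[ucb$Admit == "Admitted"]) / n   # overall admission proportion
z <- qnorm(0.975)                                    # 95% two-sided
se <- sqrt(phat * (1 - phat) / n)                    # estimated standard error
phat + c(-1, 1) * z * se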

Q8 - One-Sample CI for a Variance (σ²)

Using PlantGrowth$weight, print c(lower, upper) for the 95% CI on $\sigma^2$. (Assume normality.)

Info

For normal data, the variance CI uses the chi-square distribution with df=n−1:

\[ \left(\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\;n-1}},\; \frac{(n-1)s^2}{\chi^2_{\alpha/2,\;n-1}}\right),\qquad s^2=\text{sample variance}. \]

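For 95% confidence, take the chi-square quantiles at probabilities 0.025 and 0.975. Because the chi-square distribution is skewed, the resulting interval is not symmetric about $s^2$.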

Preview

x <- datasets::PlantGrowth$weight
n <- length(x); nu <- n - 1; s2 <- var(x)   # nu = degrees of freedom, s2 = sample variance
chi_upper <- qchisq(0.975, df = nu)         # 97.5% quantile, gives the lower bound
chi_lower <- qchisq(0.025, df = nu)         # 2.5% quantile, gives the upper bound
c(nu * s2 / chi_upper, nu * s2 / chi_lower)
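To see what "95% confidence" means for this interval, a small simulation can draw many normal samples with a known variance and count how often the interval covers it. This is a sketch under stated assumptions: var_ci is a hypothetical helper, and the seed, true variance (0.5), mean (5), sample size (30) and 5000 replications are arbitrary choices, not values from the tutorial.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Hypothetical helper: chi-square CI for a normal variance
var_ci <- function(x, conf.level = 0.95) {
  nu <- length(x) - 1
  alpha <- 1 - conf.level
  s2 <- var(x)
  c(nu * s2 / qchisq(1 - alpha / 2, df = nu),
    nu * s2 / qchisq(alpha / 2, df = nu))
}

# The helper reproduces the Q8 interval for PlantGrowth
var_ci(datasets::PlantGrowth$weight)

sigma2 <- 0.5   # assumed true variance for the simulation
covered <- replicate(5000, {
  ci <- var_ci(rnorm(30, mean = 5, sd = sqrt(sigma2)))
  ci[1] <= sigma2 && sigma2 <= ci[2]
})
mean(covered)   # proportion of intervals that contain sigma2
```

The coverage should land close to 0.95. The method relies on normality; with skewed data the actual coverage can drift well away from the nominal level.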

Q9 - One-Sample CI for a Proportion (99% Practice)

HairEyeColor is a built-in contingency table of Hair × Eye × Sex counts for 592 statistics students. Treat each person as a trial and estimate the true proportion with brown eyes. Construct a 99% Z-interval for the population proportion p. Print c(lower, upper).

Info

Two-sided 1−α CI for a proportion:

\[ \mathrm{CI}_{1-\alpha}:\ \hat{p} \pm z_{1-\alpha/2}\,\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}. \]

For 99%, use $z^* = \Phi^{-1}(0.995)$, where $\Phi$ is the standard normal CDF.

Preview

Here is a glimpse of the HairEyeColor dataset.

hec <- as.data.frame(datasets::HairEyeColor)    # one row per Hair/Eye/Sex cell
n <- sum(hec$Freq)                               # total number of students
phat <- sum(hec$Freq[hec$Eye == "Brown"]) / n    # proportion with brown eyes
z <- qnorm(0.995)                                # 99% two-sided
phat + c(-1,1) * z * sqrt(phat*(1-phat)/n)
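Raising the confidence level widens the interval, because only the critical value changes while $\hat{p}$ and the standard error stay fixed. This sketch computes both the 95% and 99% Wald intervals for the brown-eye proportion; wald is a hypothetical helper that reads phat and se from the enclosing environment.

```r
hec <- as.data.frame(datasets::HairEyeColor)
n <- sum(hec$Freq)
phat <- sum(hec$Freq[hec$Eye == "Brown"]) / n
se <- sqrt(phat * (1 - phat) / n)             # standard error, same for both levels

# Hypothetical helper: Wald interval at a given confidence level
wald <- function(conf.level) {
  z <- qnorm(1 - (1 - conf.level) / 2)        # two-sided critical value
  phat + c(-1, 1) * z * se
}

ci95 <- wald(0.95)
ci99 <- wald(0.99)
rbind(`95%` = ci95, `99%` = ci99)
```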