Tutorial 04: One Sample Confidence Intervals

Q1 — Z Critical Value

Print the 95% two-sided \(Z^*\) critical value, that is, the one for \(\alpha = 0.05\).

NoteInfo

Remember: Critical values are cutoff points on a reference distribution. They play an important role in both confidence intervals and hypothesis testing; in a confidence interval, the critical value multiplies the standard error and so sets the interval's width.

Use qnorm.

qnorm(0.975)
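
To illustrate how a confidence level maps to the quantile passed to qnorm, the sketch below computes two-sided critical values for three common levels. The names conf_levels, alpha, and z_star are illustrative only and are not part of the exercise.

```r
# Two-sided critical values: alpha/2 goes in each tail,
# so the quantile to look up is 1 - alpha/2.
conf_levels <- c(0.90, 0.95, 0.99)
alpha <- 1 - conf_levels
z_star <- qnorm(1 - alpha / 2)
round(z_star, 3)
# 1.645 1.960 2.576
```

The 95% entry matches qnorm(0.975) above; a higher confidence level gives a larger critical value and therefore a wider interval.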

Q2 — One Sample Z Confidence Interval

The penguins dataset contains measurements for three penguin species from the Palmer Archipelago. Use body_mass_g (grams) and assume the population SD is known: σ = 450 g. Construct a 90% Z-interval for the true mean body mass and print c(lower, upper).

NoteInfo

When σ is known, the margin of error uses the standard normal critical value.

NotePreview

Run this code chunk to get a glimpse of the dataset. Feel free to change the values to show more or fewer rows.

NoteInfo

When σ is known, \[ \mathrm{CI}_{1-\alpha}:\ \bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}. \]

Compute the sample mean, pair with the known σ and your Z critical value. Output c(lower, upper).

if (!requireNamespace("palmerpenguins", quietly = TRUE)) webr::install("palmerpenguins")
library(palmerpenguins)
x <- na.omit(palmerpenguins::penguins$body_mass_g)
xbar <- mean(x)
n <- length(x)
z <- qnorm(0.95) # 90% two-sided
sigma <- 450
xbar + c(-1,1) * z * sigma / sqrt(n)

Q3 — T Critical Value

PlantGrowth is a dataset that contains results from an experiment to compare yields (as measured by dried weight of plants) obtained under a control and two different treatment conditions. Print the 95% two-sided t* using PlantGrowth’s weight column when \(\alpha = 0.05\).

NotePreview

Run this code chunk to get a glimpse of the dataset. Feel free to change the values to show more or fewer rows.

Find n from the weights vector and print the two-sided 95% t critical value using df = n − 1. Refer to the previous tutorial for a hint about the R command used to print t quantiles.

data("PlantGrowth")
n <- length(PlantGrowth$weight)
qt(0.975, df = n - 1)
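
To show why the t critical value matters for small samples, the sketch below compares 95% two-sided t critical values at a few degrees of freedom with the standard normal value. The df values other than 29 are arbitrary choices for illustration, and the names dfs, t_star, and z_star are not part of the exercise.

```r
# Compare t critical values with the standard normal value at the 95% level.
z_star <- qnorm(0.975)
dfs <- c(5, 29, 100, 1000)        # 29 corresponds to PlantGrowth (n = 30)
t_star <- qt(0.975, df = dfs)

# t* is always larger than z* and shrinks toward it as df grows
data.frame(df = dfs, t_star = round(t_star, 3), z_star = round(z_star, 3))
```

With df = 29 the t critical value is about 2.045, noticeably larger than 1.960, so a t-interval is wider than a Z-interval built from the same standard error.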

Q4 — One-Sample Confidence Interval for μ (σ Unknown)

Using PlantGrowth$weight (σ unknown), construct a 95% CI for the true mean plant weight. Give your output as a vector of the form c(lower, upper).

NotePreview

Run this code chunk to get a glimpse of the dataset. Feel free to change the values to show more or fewer rows.

NoteInfo

When σ is unknown, replace it with the sample SD s and use the t distribution:

\[ \mathrm{CI}_{1-\alpha}:\ \bar{x} \pm t_{1-\alpha/2,\;n-1}\,\frac{s}{\sqrt{n}},\ \text{df}=n-1. \]

Form “center ± margin of error”: center is the sample mean; margin uses the t critical value with df = n−1, the sample SD, and √n. Output c(lower, upper).

data("PlantGrowth")
xbar <- mean(PlantGrowth$weight)
s <- sd(PlantGrowth$weight)  # named s to avoid reusing the name of the sd() function
n <- length(PlantGrowth$weight)
critical_value <- qt(0.975, df = n - 1)
xbar + c(-1, 1) * critical_value * s / sqrt(n)
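
As a sanity check on the formula, the manual interval can be compared with the one reported by the built-in t.test function. This is a sketch for verification only; the names w, manual_ci, and builtin_ci are illustrative and not part of the exercise.

```r
# Manual t-interval, recomputed here so the block runs on its own
w <- datasets::PlantGrowth$weight
n <- length(w)
manual_ci <- mean(w) + c(-1, 1) * qt(0.975, df = n - 1) * sd(w) / sqrt(n)

# One-sample t interval from stats::t.test(); as.numeric() drops the
# conf.level attribute so only the two endpoints are compared
builtin_ci <- as.numeric(t.test(w, conf.level = 0.95)$conf.int)

all.equal(manual_ci, builtin_ci)
```

If the two results agree, the hand-built "center ± margin of error" calculation is using the same critical value and standard error as t.test.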

Q5 — One-Sample CI for μ (σ Known)

The rivers dataset gives the lengths (in miles) of 141 “major” rivers in North America, as compiled by the US Geological Survey. Assume river lengths have a known population SD of σ = 300 miles. Build a 95% CI for the true mean river length using rivers when \(\alpha = 0.05\).

NotePreview

Run this code chunk to get a glimpse of the dataset. Feel free to change the values to show more or fewer rows.

NoteInfo

When σ is known, use the standard normal (Z) critical value rather than the t distribution.

Find n from the length_miles vector and compute the two-sided 95% z critical value. Then combine it using the formula you have already used in previous questions.

rivers_data <- data.frame(length_miles = as.numeric(datasets::rivers))
x <- rivers_data$length_miles
xbar <- mean(x); n <- length(x); z <- qnorm(0.975); sigma <- 300
xbar + c(-1,1) * z * sigma / sqrt(n)
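
Since Q2 and Q5 use the same known-σ formula with different data and confidence levels, the calculation can be wrapped in a small function. The helper below is a hypothetical sketch: the name z_interval and its arguments are assumptions for illustration, not something the tutorial defines.

```r
# Hypothetical helper: Z-interval for a mean when sigma is treated as known
z_interval <- function(x, sigma, conf = 0.95) {
  x <- x[!is.na(x)]                  # drop missing values
  z <- qnorm(1 - (1 - conf) / 2)     # two-sided critical value
  mean(x) + c(-1, 1) * z * sigma / sqrt(length(x))
}

# Same setting as Q5: rivers data, sigma = 300, 95% confidence
ci <- z_interval(as.numeric(datasets::rivers), sigma = 300, conf = 0.95)
ci
```

Passing conf = 0.90 makes the function use qnorm(0.95), which is the critical value used in Q2.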

Q6 — One-Sample CI for μ

Using airquality$Ozone (ignore missing values), construct and print a 90% CI for the true mean ozone level (ppb). Assume normality is reasonable.

NotePreview

Two-sided 90% → \(t^*\) at 0.95 with df = n−1; use s for the SD.

w <- na.omit(datasets::airquality$Ozone)
xbar <- mean(w); s <- sd(w); n <- length(w)
tstar <- qt(0.95, df = n - 1) # two-sided 90%
xbar + c(-1,1) * tstar * s / sqrt(n)

Q7 — One-Sample CI for a Proportion

In the UCBAdmissions dataset, Admit status for a student is “Admitted” or “Rejected”. Build and print c(lower, upper) for the 95% CI on the overall admission proportion.

NoteInfo

For a single proportion:

\[ \mathrm{CI}_{1-\alpha}:\ \hat{p} \pm z_{1-\alpha/2}\,\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}. \]

Rule: Ensure \(n\hat{p}\) and \(n(1 - \hat{p})\) are not too small; a common rule of thumb is at least 10 each.
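
The rule can be checked directly before building the interval. The sketch below does so for UCBAdmissions; the threshold of 10 is one common rule of thumb and is an assumption here, and the names adm, n_total, n_admit, p_hat, and counts are illustrative.

```r
adm <- as.data.frame(datasets::UCBAdmissions)
n_total <- sum(adm$Freq)
n_admit <- sum(adm$Freq[adm$Admit == "Admitted"])
p_hat <- n_admit / n_total

# Expected successes and failures under p_hat.
# The cutoff of 10 is a rule-of-thumb assumption, not a fixed standard.
counts <- c(successes = n_total * p_hat, failures = n_total * (1 - p_hat))
counts
all(counts >= 10)
```

Both counts are in the thousands for this dataset, so the normal approximation behind the Z-interval is reasonable.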

NotePreview

Use phat <- sum(df$Freq[df$Admit == "Admitted"]) / n and z <- qnorm(0.975) for 95%.

df <- as.data.frame(datasets::UCBAdmissions)
n <- sum(df$Freq)
phat <- sum(df$Freq[df$Admit == "Admitted"]) / n
z <- qnorm(0.975)
se <- sqrt(phat*(1 - phat)/n)
phat + c(-1,1) * z * se

Q8 - One-Sample CI for a Variance (σ²)

Using PlantGrowth$weight, print c(lower, upper) for a 95% CI on σ². (Assume normality.)

NoteInfo

For normal data, the variance CI uses the chi-square distribution with df=n−1:

\[ \left(\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\;n-1}},\; \frac{(n-1)s^2}{\chi^2_{\alpha/2,\;n-1}}\right),\qquad s^2=\text{sample variance}. \]

For 95%, the cutoffs are at 0.025 and 0.975.
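
The order of the quantiles follows from inverting a probability statement. A short derivation, assuming the data are normal so that \((n-1)s^2/\sigma^2\) follows a chi-square distribution with n−1 degrees of freedom:

\[ P\!\left(\chi^2_{\alpha/2,\;n-1} \le \frac{(n-1)s^2}{\sigma^2} \le \chi^2_{1-\alpha/2,\;n-1}\right) = 1-\alpha \]

\[ \Longrightarrow\quad P\!\left(\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\;n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{\alpha/2,\;n-1}}\right) = 1-\alpha. \]

Solving for \(\sigma^2\) means taking reciprocals, which reverses the inequalities, so the larger quantile ends up in the denominator of the lower bound.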

NotePreview

Use \(s^2\), df = n−1, and chi-square quantiles at 0.975 and 0.025 (note the order).

x <- datasets::PlantGrowth$weight
n <- length(x); df <- n - 1; s2 <- var(x)
chi_lo <- qchisq(0.975, df = df) # upper quantile; dividing by it gives the lower bound
chi_hi <- qchisq(0.025, df = df) # lower quantile; dividing by it gives the upper bound
c(df * s2/chi_lo, df*s2/chi_hi)

Q9 - One-Sample CI for a Proportion (95% Practice)

HairEyeColor is a built-in contingency table (Hair × Eye × Sex) of counts for statistics students. Treat each person as a trial and estimate the true proportion with Brown eyes. Construct a 95% Z-interval for the population proportion p. Print c(lower, upper).

NoteInfo

Two-sided 1−α CI for a proportion:

\[ \mathrm{CI}_{1-\alpha}:\ \hat{p} \pm z_{1-\alpha/2}\,\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}. \]

For 95%, use \(z^* = \Phi^{-1}(0.975)\); for 99%, use \(z^* = \Phi^{-1}(0.995)\).

NotePreview

Here is a glimpse of the HairEyeColor dataset.

Use z <- qnorm(0.975) for 95%.

df <- as.data.frame(datasets::HairEyeColor)
n <- sum(df$Freq)
phat <- sum(df$Freq[df$Eye == "Brown"]) / n
z <- qnorm(0.975)
phat + c(-1,1) * z * sqrt(phat*(1-phat)/n)
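
To connect the 95% interval with the 99% critical value mentioned in the note, the sketch below builds both intervals from the same data and compares their widths. The names eyes, n_people, p_brown, se_brown, ci_95, and ci_99 are illustrative and not part of the exercise.

```r
eyes <- as.data.frame(datasets::HairEyeColor)
n_people <- sum(eyes$Freq)
p_brown <- sum(eyes$Freq[eyes$Eye == "Brown"]) / n_people
se_brown <- sqrt(p_brown * (1 - p_brown) / n_people)

ci_95 <- p_brown + c(-1, 1) * qnorm(0.975) * se_brown
ci_99 <- p_brown + c(-1, 1) * qnorm(0.995) * se_brown

# Higher confidence uses a larger critical value, so the interval is wider
rbind(ci_95, ci_99)
diff(ci_99) > diff(ci_95)
```

Only the critical value changes between the two intervals; the estimate and its standard error stay the same.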