Tutorial 07: Two Sample Hypothesis Tests

Q1 — Pooled two-sample t-test (Audience score: Avengers vs Spider-Man)

For this question, we use marvel.csv. Compare the audience % score between Avengers and Spider-Man movies.

We test \[ H_0 : \mu_{\text{Avengers}} - \mu_{\text{Spider-Man}} = 0\] vs \[H_1 : \mu_{\text{Avengers}} - \mu_{\text{Spider-Man}} \neq 0\]

assuming equal population variances and using a pooled two-sample t-test. Compute the p-value manually from the sample statistics and the t distribution. Your final output should be a single numeric p-value.

Info

In a pooled two-sample t-test for two groups with sample sizes \(n_1, n_2\), sample means \(\bar x_1, \bar x_2\), and sample standard deviations \(s_1, s_2\), we first compute the pooled variance

\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}. \]

The test statistic for testing \(H_0: \mu_1 - \mu_2 = 0\) is

\[ t = \frac{\bar x_1 - \bar x_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}. \]

Under the null hypothesis, this statistic follows a t distribution with

\[ \text{df} = n_1 + n_2 - 2 \]

degrees of freedom. For a two-sided test, the p-value is

\[ p\text{-value} = 2\,P\bigl(T_{\text{df}} \ge |t|\bigr). \]

Preview

Run this code chunk to load the dataset and compute the p-value from the sample statistics.


df <- read.csv("marvel.csv", check.names = FALSE)

# Keep the two franchises, strip any "%" sign and convert the scores to numeric
df <- subset(df, category %in% c("Avengers","Spider-Man"))
df$aud <- suppressWarnings(as.numeric(sub("%", "", df[["audience % score"]], fixed = TRUE)))
df <- subset(df, is.finite(aud))

g1 <- df$aud[df$category == "Avengers"]
g2 <- df$aud[df$category == "Spider-Man"]

# Sample sizes, means and standard deviations
n1 <- length(g1); n2 <- length(g2)
m1 <- mean(g1);   m2 <- mean(g2)
s1 <- sd(g1);     s2 <- sd(g2)

# Pooled variance, test statistic, degrees of freedom, two-sided p-value
sp2  <- ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2)
tval <- (m1-m2)/sqrt(sp2*(1/n1 + 1/n2))
df_t <- n1+n2-2
pval <- 2*pt(-abs(tval), df_t)
pval
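As a sanity check on the pooled formulas, the sketch below wraps them in a small helper and compares the result with R's built-in t.test on a few invented scores. The vectors a and b and the helper name pooled_p are hypothetical, chosen only for illustration; they are not taken from marvel.csv.

```r
# Hypothetical audience scores, invented for illustration (not from marvel.csv)
a <- c(91, 85, 90, 87)
b <- c(88, 79, 95, 82, 90)

# Two-sided p-value of the pooled two-sample t-test (hypothetical helper)
pooled_p <- function(g1, g2) {
  n1 <- length(g1); n2 <- length(g2)
  sp2  <- ((n1 - 1) * var(g1) + (n2 - 1) * var(g2)) / (n1 + n2 - 2)
  tval <- (mean(g1) - mean(g2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
  2 * pt(-abs(tval), df = n1 + n2 - 2)
}

p_manual  <- pooled_p(a, b)
p_builtin <- t.test(a, b, var.equal = TRUE)$p.value

# The two values should agree up to floating-point tolerance
all.equal(p_manual, p_builtin)
```

If the two numbers disagree on your own data, the usual culprits are a missing var.equal = TRUE or rows that were filtered differently in the two computations.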

Q2 — Welch two-sample t-test (Opening weekend: Avengers vs Spider-Man)

Now use marvel.csv to compare the opening weekend gross (in $m) between Avengers and Spider-Man movies.

We test \[ H_0 : \mu_{\text{Avengers}} - \mu_{\text{Spider-Man}} = 0\] vs \[H_1 : \mu_{\text{Avengers}} - \mu_{\text{Spider-Man}} \neq 0\]

without assuming equal variances. Use the Welch two-sample t-test formula and compute the p-value manually from the t distribution. Do not call the built-in t.test function in this question. Return a single numeric p-value.

Info

In the Welch two-sample t-test, we keep the sample variances separate. For two groups with sample sizes \(n_1, n_2\), sample means \(\bar x_1, \bar x_2\), and sample standard deviations \(s_1, s_2\), the test statistic is

\[ t = \frac{\bar x_1 - \bar x_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}. \]

The approximate Welch degrees of freedom are

\[ \text{df}_W = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{ \dfrac{\left(\dfrac{s_1^2}{n_1}\right)^2}{n_1 - 1} + \dfrac{\left(\dfrac{s_2^2}{n_2}\right)^2}{n_2 - 1} }. \]

For a two-sided test of \(H_0: \mu_1 - \mu_2 = 0\), the p-value is

\[ p\text{-value} = 2\,P\bigl(T_{\text{df}_W} \ge |t|\bigr). \]

Preview


df <- read.csv("marvel.csv", check.names = FALSE)
df <- subset(df, category %in% c("Avengers","Spider-Man"))
df$open <- suppressWarnings(as.numeric(df[["opening weekend ($m)"]]))  # coerce in case the column was read as text
df <- df[is.finite(df$open), ]

g1 <- df$open[df$category == "Avengers"]
g2 <- df$open[df$category == "Spider-Man"]

n1 <- length(g1); n2 <- length(g2)
m1 <- mean(g1);   m2 <- mean(g2)
s1 <- sd(g1);     s2 <- sd(g2)

se_diff <- sqrt(s1^2 / n1 + s2^2 / n2)
tval <- (m1 - m2) / se_diff

df_w <- (s1^2 / n1 + s2^2 / n2)^2 /
  ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))

pval <- 2 * pt(-abs(tval), df_w)
pval
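The Welch degrees of freedom always fall between \(\min(n_1, n_2) - 1\) and \(n_1 + n_2 - 2\), which gives a quick plausibility check on a manual calculation. The sketch below illustrates this on invented figures and, purely as a check outside the submitted answer, compares the manual result with t.test. The vectors a and b and the helper welch are hypothetical.

```r
# Hypothetical opening-weekend figures in $m, invented for illustration
a <- c(207, 191, 258, 357)
b <- c(115, 88, 151, 62, 117, 93, 260)

# Welch statistic, degrees of freedom and two-sided p-value (hypothetical helper)
welch <- function(g1, g2) {
  v1 <- var(g1) / length(g1)   # s1^2 / n1
  v2 <- var(g2) / length(g2)   # s2^2 / n2
  tval <- (mean(g1) - mean(g2)) / sqrt(v1 + v2)
  df_w <- (v1 + v2)^2 / (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))
  list(t = tval, df = df_w, p = 2 * pt(-abs(tval), df_w))
}

res <- welch(a, b)
ref <- t.test(a, b, var.equal = FALSE)   # check only, not part of the answer

# Bounds on the Welch degrees of freedom
lower <- min(length(a), length(b)) - 1
upper <- length(a) + length(b) - 2
c(lower = lower, df_w = res$df, upper = upper)

# Manual and built-in p-values should agree up to floating-point tolerance
all.equal(res$p, ref$p.value)
```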

Q3 — Pooled two-sample t-test (Critics score: Captain America vs Iron Man)

Now compare critics % score between Captain America and Iron Man movies in marvel.csv.

We test \[ H_0 : \mu_{\text{Cap}} - \mu_{\text{Iron Man}} = 0\] vs \[H_1 : \mu_{\text{Cap}} - \mu_{\text{Iron Man}} \neq 0\]

and we treat the population variances as equal, using a pooled two-sample t-test. Your answer should be a single numeric p-value.

Info

This is another two-sample mean comparison where we assume the two sets of critics scores arise from populations with the same variance. Under that assumption, we combine the information from both groups into one pooled estimate of the variance to build the test statistic.

Preview

df <- read.csv("marvel.csv", check.names = FALSE)
df <- subset(df, category %in% c("Captain America","Iron Man"))

df$crit <- suppressWarnings(as.numeric(sub("%","", df[["critics % score"]], fixed = TRUE)))
df <- subset(df, is.finite(crit))

df$grp <- factor(df$category, levels = c("Captain America","Iron Man"))
t.test(crit ~ grp, data = df, var.equal = TRUE)$p.value

Q4 — Welch two-sample t-test (Domestic gross: Black Panther vs Thor)

Finally, compare the domestic gross (in $m) between Black Panther movies and Thor movies in marvel.csv.

We test \[ H_0 : \mu_{\text{Black Panther}} - \mu_{\text{Thor}} = 0\] vs \[H_1 : \mu_{\text{Black Panther}} - \mu_{\text{Thor}} \neq 0\]

Here we do not assume the variances are equal and instead use a Welch two-sample t-test. Your answer should be a single numeric p-value.

Info

This is another two-sample test where the two groups may have quite different variability. The Welch approach adjusts both the test statistic and degrees of freedom to account for unequal variances.
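To see how the two approaches can differ, the sketch below runs the pooled and Welch tests on the same invented data, where the smaller group is also the more variable one. The vectors x and y are hypothetical and are not taken from marvel.csv.

```r
# Hypothetical domestic grosses in $m, invented for illustration only
x <- c(310, 450, 700, 520)
y <- c(181, 206, 315, 343, 190, 260, 298)

pooled <- t.test(x, y, var.equal = TRUE)
welch  <- t.test(x, y, var.equal = FALSE)

# Side-by-side comparison of the two tests on the same data
comparison <- data.frame(
  test      = c("pooled", "Welch"),
  statistic = unname(c(pooled$statistic, welch$statistic)),
  df        = unname(c(pooled$parameter, welch$parameter)),
  p_value   = c(pooled$p.value, welch$p.value)
)
comparison
```

With these numbers the pooled test uses 9 degrees of freedom, while the Welch test uses fewer and a larger standard error, so its statistic is smaller in absolute value. With equal sample sizes the two statistics would coincide and only the degrees of freedom would differ.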

Preview

df <- read.csv("marvel.csv", check.names = FALSE)
df <- subset(df, category %in% c("Black Panther","Thor"))

df$dom <- suppressWarnings(as.numeric(df[["domestic gross ($m)"]]))
df <- df[is.finite(df$dom), ]

df$grp <- factor(df$category, levels = c("Black Panther","Thor"))

pval <- t.test(dom ~ grp, data = df, var.equal = FALSE)$p.value
unname(as.numeric(pval))

Q5 — Two-sample test for difference of proportions, computed manually (Legendary: Generation 1 vs Generation 2)

Use Pokemon.csv to test whether the proportion of Legendary Pokémon is the same in Generation 1 and Generation 2.

Let “success” be Legendary = TRUE. Define:

Group 1: Generation 1
Group 2: Generation 2

We test

\[ H_0:\ p_1 - p_2 = 0 \]

versus

\[ H_1:\ p_1 - p_2 \neq 0, \]

where \(p_1\) and \(p_2\) are the true proportions of Legendary Pokémon in Generation 1 and Generation 2.

Compute the p-value manually using the normal approximation formulas. Your final output must be a single numeric p-value.

Info

Let \(x_1, x_2\) be the number of successes in groups 1 and 2, and \(n_1, n_2\) the corresponding sample sizes. The sample proportions are

\[ \hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}. \]

Under \(H_0: p_1 = p_2\), the pooled proportion is

\[ \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}. \]

The standard error for the difference in proportions under the null is

\[ \text{SE}(\hat{p}_1 - \hat{p}_2) = \sqrt{ \hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right) }. \]

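The test statistic is

\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\text{SE}(\hat{p}_1 - \hat{p}_2)}, \]

and, using the standard normal approximation, the two-sided p-value is

\[ p\text{-value} = 2\,P\bigl(Z \ge |z|\bigr). \]

Preview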

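df <- read.csv("Pokemon.csv")
df$Legendary <- as.logical(df$Legendary)  # works for a logical or a "True"/"False" text column
x1 <- sum(df$Legendary[df$Generation == 1]); n1 <- sum(df$Generation == 1)
x2 <- sum(df$Legendary[df$Generation == 2]); n2 <- sum(df$Generation == 2)
p_hat <- (x1 + x2) / (n1 + n2)                       # pooled proportion under H0
se <- sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))  # standard error under H0
z <- (x1 / n1 - x2 / n2) / se
2 * pnorm(-abs(z))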

Q6 — Two-sample test for difference of proportions using built-in method

Again compare the proportion of Legendary Pokémon in Generation 1 and Generation 2, but now using a built-in two-sample proportion test.

Let “success” be Legendary = TRUE.

We test

\[ H_0:\ p_1 - p_2 = 0 \]

versus

\[ H_1:\ p_1 - p_2 \neq 0, \]

where \(p_1\) and \(p_2\) are the true proportions of Legendary Pokémon in Generation 1 and Generation 2.
Use a built-in two-sample test for proportions and return a single numeric p-value.

Info

A built-in two-sample test for proportions needs:

- the vector of successes in each group, \((x_1, x_2)\)
- the vector of sample sizes, \((n_1, n_2)\)

It then constructs the appropriate test statistic and p-value under the null hypothesis that the two population proportions are equal.

Preview

df <- read.csv("Pokemon.csv")
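df$Legendary <- as.logical(df$Legendary)  # read.csv may already parse "True"/"False" as logical; comparing to "True" would then give all FALSE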
df <- subset(df, Generation %in% c(1, 2))

x1 <- sum(df$Legendary[df$Generation == 1])
x2 <- sum(df$Legendary[df$Generation == 2])

n1 <- sum(df$Generation == 1)
n2 <- sum(df$Generation == 2)

res <- suppressWarnings(prop.test(c(x1, x2), c(n1, n2), correct = TRUE))
res$p.value
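Note that correct = TRUE applies a continuity correction, so the built-in p-value will generally not match a p-value computed by hand from the pooled normal approximation. The sketch below illustrates this on invented counts, which are hypothetical and not taken from Pokemon.csv: the manual z-test agrees with prop.test when correct = FALSE, and the corrected p-value is larger.

```r
# Hypothetical counts, invented for illustration (not from Pokemon.csv)
x <- c(12, 20)     # successes in group 1 and group 2
n <- c(150, 120)   # sample sizes in group 1 and group 2

# Manual two-sided z-test using the pooled proportion
p_pool <- sum(x) / sum(n)
se <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))
z  <- (x[1] / n[1] - x[2] / n[2]) / se
p_manual <- 2 * pnorm(-abs(z))

# Built-in test without and with the continuity correction
p_plain <- prop.test(x, n, correct = FALSE)$p.value
p_yates <- prop.test(x, n, correct = TRUE)$p.value

c(manual = p_manual, uncorrected = p_plain, corrected = p_yates)
```

The agreement without correction follows because the chi-squared statistic reported by prop.test for two groups equals the square of z.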

Q7 — F-test for ratio of variances using built-in method (Attack: Fire vs Water)

Use Pokemon.csv to compare the variance of Attack between Fire-type and Water-type Pokémon (using the primary type Type 1).

Let

Group 1: Type 1 = Fire
Group 2: Type 1 = Water

We test

\[ H_0:\ \sigma_{\text{Fire}}^2 = \sigma_{\text{Water}}^2 \]

versus

\[ H_1:\ \sigma_{\text{Fire}}^2 \neq \sigma_{\text{Water}}^2. \]

Use an appropriate built-in two-sample test for equality of variances and return a single numeric p-value.

Info

A built-in two-sample variance test compares the sample variances of two groups using an F statistic and the F distribution. It uses the group sample sizes to set the numerator and denominator degrees of freedom, and then computes a p-value for testing whether the two population variances are equal.

Preview

df <- read.csv("Pokemon.csv")
df <- subset(df, Type.1 %in% c("Fire", "Water"))
df$grp <- factor(df$Type.1, levels = c("Fire", "Water"))

res <- var.test(Attack ~ grp, data = df)
res$p.value
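For reference, the computation behind the built-in variance test can be sketched by hand: the F statistic is the ratio of the two sample variances, with \(n_1 - 1\) and \(n_2 - 1\) degrees of freedom, and the two-sided p-value doubles the smaller tail probability. The Attack values below are hypothetical and are not taken from Pokemon.csv.

```r
# Hypothetical Attack values, invented for illustration
fire  <- c(52, 64, 84, 41, 76, 100, 130)
water <- c(48, 63, 83, 65, 82, 95, 105, 40, 70)

# F statistic and its numerator / denominator degrees of freedom
f_stat <- var(fire) / var(water)
df1 <- length(fire) - 1
df2 <- length(water) - 1

# Two-sided p-value: twice the smaller of the two tail probabilities
p_lower  <- pf(f_stat, df1, df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)

p_builtin <- var.test(fire, water)$p.value

# The two values should agree up to floating-point tolerance
all.equal(p_manual, p_builtin)
```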