Approximately 80% of data analysis uses these functions. Did you learn them in the 303 sequence? We will cover/review them in this class. In these packages:
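The package list and the code chunk behind the output below are not shown here; as a rough sketch (an assumed recreation, not the original code), output like this Welch test could be produced along these lines, with a transmission label built from mtcars$am (0 = automatic, 1 = manual):

library(dplyr)

# Assumed recreation: label the 0/1 am column as a transmission variable
mtcars_tbl <- mtcars |>
  mutate(transmission = if_else(am == 1, "manual", "automatic"))

# Welch two-sample t-test of mpg by transmission (Welch is t.test()'s default)
t.test(mpg ~ transmission, data = mtcars_tbl)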
Welch Two Sample t-test
data: mpg by transmission
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means between group automatic and group manual is not equal to 0
95 percent confidence interval:
-11.280194 -3.209684
sample estimates:
mean in group automatic    mean in group manual
               17.14737                24.39231
How do you interpret this result?
Highly statistically significant!
Is this a causal effect?
Yes and no!
Manual transmissions are (historically) more efficient…
but not by 7 mpg…
What might be some confounding variables?
Expanded Independent Samples t-test
Scenario: Comparing means of two independent groups, say, Treatment (T) and Control (C).
Statistical Model:
Let \(Y_{Ti}\) and \(Y_{Ci}\) be the response variables (mpg) for the treatment (manual) and control (automatic) groups, respectively.
What is \(i\)?
We assume \(Y_{Ti} \sim N(\mu_T, \sigma^2)\) and \(Y_{Ci} \sim N(\mu_C, \sigma^2)\), where \(\mu_T\) and \(\mu_C\) are the group means, and \(\sigma^2\) is the common variance.
Null Hypothesis:
\(H_0: \mu_T = \mu_C\)
This means there is no effect of the treatment.
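For reference (standard theory, not spelled out in these notes): under this equal-variance model, the usual test statistic is

\[
t = \frac{\bar{Y}_T - \bar{Y}_C}{s_p \sqrt{\frac{1}{n_T} + \frac{1}{n_C}}},
\qquad
s_p^2 = \frac{(n_T - 1)\, s_T^2 + (n_C - 1)\, s_C^2}{n_T + n_C - 2},
\]

which has a \(t\) distribution with \(n_T + n_C - 2\) degrees of freedom under \(H_0\). (The R output earlier uses the Welch version, which drops the common-variance assumption and adjusts the degrees of freedom, hence df = 18.332.)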
Linear Model Interpretation:
t-test notation:
Let \(Y_{Ti}\) and \(Y_{Ci}\) be the response variables (mpg) for the treatment (manual) and control (automatic) groups, respectively.
We assume \(Y_{Ti} \sim N(\mu_T, \sigma^2)\) and \(Y_{Ci} \sim N(\mu_C, \sigma^2)\), where \(\mu_T\) and \(\mu_C\) are the group means, and \(\sigma^2\) is the common variance.
Turn the t-test into a linear model:
We want to “shoehorn” the above model into a linear model, i.e., into this notation: \(Y = \beta_0 + \beta_1 X + \epsilon\)
Hint: make \(X\) a binary indicator, or “dummy variable” (0/1).
Express \(\beta_0\) and \(\beta_1\) in terms of \(\mu_T\) and \(\mu_C\).
What is the Null Hypothesis in Linear Model coefficients \(\beta_0\) and \(\beta_1\)?
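A sketch of the correspondence using mtcars (here am is already a 0/1 dummy, and var.equal = TRUE is used so the t-test and the linear model match exactly):

# Pooled (equal-variance) two-sample t-test of mpg by am (0 = automatic, 1 = manual)
t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# The same comparison as a linear model: the am coefficient has the same
# t statistic (up to sign) and the same p-value
summary(lm(mpg ~ am, data = mtcars))

With this coding, \(\beta_0\) is the automatic (control) mean and \(\beta_1\) is the difference in means \(\mu_T - \mu_C\), so \(H_0: \mu_T = \mu_C\) is the same as \(H_0: \beta_1 = 0\).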
Paired Sample t-test
Scenario: Comparing the same group under two different conditions.
Statistical Model:
Let \(Y_{1i}\) and \(Y_{2i}\) be the response variables for condition 1 and condition 2, respectively, for the \(i\)-th subject.
Assume \(Y_{1i} \sim N(\mu_1, \sigma^2)\) and \(Y_{2i} \sim N(\mu_2, \sigma^2)\).
Shoehorned into a linear model:
Define \(\Delta Y_i = Y_{2i} - Y_{1i}\). The assumptions (taking each pair to be jointly normal) imply that \(\Delta Y_i\) is normally distributed.
\(H_0: \mu_1 = \mu_2\) or equivalently \(H_0: \mu_{\Delta} = 0\) where \(\mu_{\Delta}\) is the mean of \(\Delta Y_i\).
Model: \(\Delta Y = \beta_0 + \epsilon\)
Here, \(\beta_0 = \mu_{\Delta}\).
Null Hypothesis in Linear Model Terms: \(H_0: \beta_0 = 0\)
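A sketch with simulated paired data (invented for illustration, not from these notes):

set.seed(1)
n  <- 20
y1 <- rnorm(n, mean = 10, sd = 2)        # condition 1
y2 <- y1 + rnorm(n, mean = 1.5, sd = 1)  # condition 2, true mean shift of 1.5
d  <- y2 - y1                            # within-subject differences

t.test(y2, y1, paired = TRUE)  # paired t-test
summary(lm(d ~ 1))             # intercept-only model: beta_0 estimates mu_Delta
# Both report the same t statistic and p-value for H0: mu_Delta = beta_0 = 0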
t-test as a Linear Model
Key Concept: Comparing means can be reframed as a linear model.
Independent Samples t-test
Scenario: Comparing two independent groups (e.g., Treatment vs. Control).
Linear Model Framework:
Group membership as a categorical variable.
Model: \(Y = \beta_0 + \beta_1 X + \epsilon\)
Interpretation:
\(\beta_1\) significantly different from zero indicates a group difference.
Paired Sample t-test
Scenario: Comparing the same group under two conditions.
Linear Model Framework:
Difference in scores as the outcome.
Model: \(\Delta Y = \beta_0 + \epsilon\)
Interpretation:
Testing if \(\beta_0\) is significantly different from zero.
Linear models can look cold and pointless when they are just notation.
Hopefully, in the simple examples above, you:
became aware of mutate to make new columns/variables and select to pick columns/variables,
got refreshed on t-tests,
saw how the paired samples t-test is a linear model with no \(X\) and the independent samples t-test is a linear model with \(X\) as a “dummy variable”,
and tackled the hardest part: “doing algebra with models.”
Further Discussion 1
In the t-test above, there are two groups, “T” and “C”. What if there are more than two? mtcars records the number of cylinders, cyl.
mtcars |> count(cyl)
# A tibble: 3 × 2
    cyl     n
  <dbl> <int>
1     4    11
2     6     7
3     8    14
How might we extend the t-test to more than two groups? What is the model/null hypothesis? What is the linear model?
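One way to explore this question (a sketch, offered as a hint rather than as the notes' own answer) is to treat cyl as a categorical predictor:

# cyl as a factor with three levels: 4, 6, 8
fit <- lm(mpg ~ factor(cyl), data = mtcars)

summary(fit)  # two dummy coefficients compare the 6- and 8-cylinder means to the 4-cylinder baseline
anova(fit)    # overall F-test of H0: mu_4 = mu_6 = mu_8 (one-way ANOVA)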
Further Discussion 2
If we are interested in the causal effect of transmission, then we need to realize that there are confounders in the analysis above. For example, automatic cars tend to be heavier (that’s bad for mpg):
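The supporting code/plot is not reproduced here; a sketch of how one might check the claim and adjust for it:

library(dplyr)

# Average weight (1000 lbs) and mpg by transmission: automatics are heavier
mtcars |>
  mutate(transmission = if_else(am == 1, "manual", "automatic")) |>
  group_by(transmission) |>
  summarize(mean_wt = mean(wt), mean_mpg = mean(mpg))

# Adding weight to the model: the transmission (am) coefficient shrinks toward
# zero, suggesting much of the raw mpg gap is explained by weight
summary(lm(mpg ~ am + wt, data = mtcars))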