We don't need any new R to do section 6.1 calculations. e.g. Here are the numbers from the dissolved-oxygen example in the 6.1 notes.
> n = 45
> x.bar = 4.62
> s = 0.92
> mu_0 = 5
> (z = (x.bar - mu_0)/(s/sqrt(n)))
[1] -2.771
> (p.value = pnorm(z))
[1] 0.002796
We standardize x.bar so that we can get a probability from a table, and I find it easiest, even with R, to continue to standardize to get a z-score, and then find a P-value with pnorm(z). But we don't need to standardize when using R, since pnorm() doesn't require mean 0 and standard deviation 1. As we saw in section 4.3, it can also use the form pnorm(x, mu, sigma) to give the function F(x) = P(X ≤ x) for X ~ N(\( \mu \), \( \sigma^2 \)):
> (p.value.without.z.score = pnorm(x.bar, mu_0, s/sqrt(n)))
[1] 0.002796
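Related: pnorm()'s inverse, qnorm(), converts a left-tail area back to a z cutoff, which gives the critical-value version of the same decision. (The 5% level below is just for illustration; no significance level is specified in the example.)

```r
# qnorm() is the inverse of pnorm(): it returns the z value with the
# given area to its left. For a left-tailed test at level alpha = 0.05
# (an illustrative choice), reject H0 when z falls below this cutoff.
alpha = 0.05
(z.crit = qnorm(alpha))  # about -1.645
z = -2.771               # the z-score computed above
z < z.crit               # TRUE, so reject H0 at the 5% level
```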
We don't need any new R to do section 6.3 calculations. e.g. Here are the numbers from the John Wayne example in the 6.3 notes.
> p_0 = 0.14
> n = 220
> X = 91
> (p.hat = X/n)
[1] 0.4136
> (z = (p.hat - p_0)/sqrt(p_0 * (1 - p_0)/n))
[1] 11.7
> (p.value = pnorm(-z))
[1] 6.607e-32
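As an aside (not something these notes rely on), base R's prop.test() runs this one-proportion test in chi-squared form; with the continuity correction turned off, its X-squared statistic is exactly \( z^2 \) and its one-sided P-value matches the hand calculation above:

```r
# prop.test() states the test as chi-squared; with correct = FALSE its
# X-squared statistic equals z^2 from the by-hand calculation above.
out = prop.test(x = 91, n = 220, p = 0.14,
                alternative = "greater", correct = FALSE)
out$statistic  # X-squared: about 11.7^2
out$p.value    # same as pnorm(-z)
```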
(There's nothing to see here, folks. Move along.)
We don't need any new R to do section 6.4 calculations. e.g. Here's the test for nitrogen in ancient air.
> nitrogen = c(63.4, 65, 64.4, 63.3, 54.8, 64.5, 60.8, 49.1, 51)
> mu_0 = 78.1
> (n = length(nitrogen))
[1] 9
> (x.bar = mean(nitrogen))
[1] 59.59
> (s = sd(nitrogen))
[1] 6.255
> (t = (x.bar - mu_0)/(s/sqrt(n)))
[1] -8.878
> # Next line: while t could be left or right of zero, I'm sure that -abs(t)
> # is left of zero, so I'll use it to get a left-tail area, and then double
> # that to get the two-sided P-value.
> (p.value = 2 * pt(-abs(t), n - 1))
[1] 2.049e-05
Another way to do it is via t.test(), which we used in section 5.4. t.test(x, alternative="two.sided", mu=0) tests \( H_0: \mu_X = \mu_0 \), where \( \mu_0 \) is given by the argument mu, for a sample x from a normal population. (The “=0” part of “mu=0” says that mu defaults to 0 if it's omitted from the function call.) e.g.
> t.test(nitrogen, alternative = "two.sided", mu = mu_0)
One Sample t-test
data: nitrogen
t = -8.878, df = 8, p-value = 2.049e-05
alternative hypothesis: true mean is not equal to 78.1
95 percent confidence interval:
54.78 64.40
sample estimates:
mean of x
59.59
Using alternative="two.sided" gives a two-tailed P-value suitable for \( H_1: \mu_X \ne \mu_0 \). alternative="less" is for \( H_1: \mu_X < \mu_0 \) and alternative="greater" is for \( H_1: \mu_X > \mu_0 \).
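For example, if we had framed the nitrogen hypothesis one-sided, \( H_1: \mu_X < 78.1 \), the call would be the following; since t is negative, the one-sided P-value is half the two-sided one:

```r
nitrogen = c(63.4, 65, 64.4, 63.3, 54.8, 64.5, 60.8, 49.1, 51)
# One-sided alternative H1: mu < 78.1; the P-value is the left-tail
# area pt(t, n - 1), i.e. half the two-sided P-value when t < 0.
t.test(nitrogen, alternative = "less", mu = 78.1)
```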
counts = c(...); probs = c(...); chisq.test(x = counts, p = probs) tests \( H_0: \) the sample with category counts specified in vector counts came from the population with category probabilities specified in vector probs vs. \( H_1 \): it did not.
e.g. Here's the test for whether kids' M&Ms came from the Nice family:
> kids.sample = c(12, 15, 17, 6)
> Nice.population = c(0.2, 0.25, 0.4, 0.15)
> chisq.test(x = kids.sample, p = Nice.population)
Chi-squared test for given probabilities
data: kids.sample
X-squared = 1.65, df = 3, p-value = 0.6481
If you want more details, like the expected counts, save the result of chisq.test() in a variable and look at its structure via str():
out = chisq.test(x = kids.sample, p = Nice.population)
str(out)
## List of 9
## $ statistic: Named num 1.65
## ..- attr(*, "names")= chr "X-squared"
## $ parameter: Named num 3
## ..- attr(*, "names")= chr "df"
## $ p.value : num 0.648
## $ method : chr "Chi-squared test for given probabilities"
## $ data.name: chr "kids.sample"
## $ observed : num [1:4] 12 15 17 6
## $ expected : num [1:4] 10 12.5 20 7.5
## $ residuals: num [1:4] 0.632 0.707 -0.671 -0.548
## $ stdres : num [1:4] 0.707 0.816 -0.866 -0.594
## - attr(*, "class")= chr "htest"
out is a “list” of 9 components, including $expected. The dollar sign, $, gives access to list components. e.g.
out$expected
## [1] 10.0 12.5 20.0 7.5
out$statistic
## X-squared
## 1.65
out$p.value
## [1] 0.6481
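If all you want is the list of component names, rather than the full structure, base R's names() is a quicker look than str():

```r
kids.sample = c(12, 15, 17, 6)
Nice.population = c(0.2, 0.25, 0.4, 0.15)
out = chisq.test(x = kids.sample, p = Nice.population)
names(out)  # the component names: "statistic", "parameter", "p.value", ...
```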
To get a contingency table into chisq.test(), use matrix(data, nrow, ncol, byrow=FALSE), which fills an nrow by ncol matrix, by column, from the vector data. (Change the default byrow=FALSE to byrow=TRUE to fill by row instead.) e.g. For our education/smoking example,
(French.men = matrix(data = c(56, 37, 53, 54, 43, 28, 41, 27, 36, 36, 32, 16),
nrow = 3, ncol = 4, byrow = FALSE))
## [,1] [,2] [,3] [,4]
## [1,] 56 54 41 36
## [2,] 37 43 27 32
## [3,] 53 28 36 16
Note that the element of matrix m in row r and column c is given by m[r, c], row r is given by m[r, ] (the column index is omitted), and column c is given by m[ , c] (the row index is omitted). So we could find row sums, etc., if we wanted them:
French.men[1, 1]
## [1] 56
French.men[1, ]
## [1] 56 54 41 36
sum(French.men[1, ])
## [1] 187
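Rather than summing one slice at a time, base R's rowSums() and colSums() give all the margins at once, and sum() gives the grand total:

```r
French.men = matrix(c(56, 37, 53, 54, 43, 28, 41, 27, 36, 36, 32, 16),
                    nrow = 3, ncol = 4)
rowSums(French.men)  # 187 139 133 -- one total per row
colSums(French.men)  # 146 125 104 84 -- one total per column
sum(French.men)      # 459 -- the grand total
```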
But R will do the whole chi-squared test for us. For a matrix x, the call chisq.test(x) tests \( H_0 \): row and column variables are independent vs. \( H_1 \): they are not. e.g.
chisq.test(French.men)
##
## Pearson's Chi-squared test
##
## data: French.men
## X-squared = 13.3, df = 6, p-value = 0.03844
Once again, more details than are printed by default are available by saving the result of chisq.test() and looking at its structure:
out = chisq.test(French.men)
str(out)
## List of 9
## $ statistic: Named num 13.3
## ..- attr(*, "names")= chr "X-squared"
## $ parameter: Named int 6
## ..- attr(*, "names")= chr "df"
## $ p.value : num 0.0384
## $ method : chr "Pearson's Chi-squared test"
## $ data.name: chr "French.men"
## $ observed : num [1:3, 1:4] 56 37 53 54 43 28 41 27 36 36 ...
## $ expected : num [1:3, 1:4] 59.5 44.2 42.3 50.9 37.9 ...
## $ residuals: num [1:3, 1:4] -0.451 -1.085 1.644 0.431 0.836 ...
## $ stdres : num [1:3, 1:4] -0.71 -1.573 2.363 0.656 1.174 ...
## - attr(*, "class")= chr "htest"
out$expected
## [,1] [,2] [,3] [,4]
## [1,] 59.48 50.93 42.37 34.22
## [2,] 44.21 37.85 31.49 25.44
## [3,] 42.31 36.22 30.14 24.34
Here's the beryllium example:
(beryllium = matrix(data = c(10, 9, 70, 8, 19, 136, 23, 11, 206), nrow = 3,
ncol = 3, byrow = FALSE))
## [,1] [,2] [,3]
## [1,] 10 8 23
## [2,] 9 19 11
## [3,] 70 136 206
chisq.test(beryllium)
##
## Pearson's Chi-squared test
##
## data: beryllium
## X-squared = 10.83, df = 4, p-value = 0.02856
For \( X \sim \chi_{\nu}^2 \), the cumulative distribution function F(x) = P(X ≤ x) is given by pchisq(x, nu). e.g. to get the P-value for the education/smoking example, we could have used
(p.value = 1 - pchisq(13.3, 6))
## [1] 0.03851
(This is slightly different from the P-value given by chisq.test(), above, because its displayed statistic, 13.3, is a rounded version of 13.305.)
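To reproduce chisq.test()'s P-value exactly, use the unrounded statistic stored in the saved result:

```r
French.men = matrix(c(56, 37, 53, 54, 43, 28, 41, 27, 36, 36, 32, 16),
                    nrow = 3, ncol = 4)
out = chisq.test(French.men)
# Feeding pchisq() the stored, unrounded statistic recovers out$p.value.
1 - pchisq(out$statistic, out$parameter)  # 0.03844, matching chisq.test()
out$p.value
```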