We’ll grade your homework by
You should write R code anywhere you see an empty R code chunk. You should write English text anywhere you see “…”; please surround it with doubled asterisks (**...**) so that it will show up as boldface and be easy for us to find.
Include reasonable labels (titles, axis labels, legends, etc.) with each of your graphs.
Name: …
Email: …
Please use an R version no less than 4.0.0 so that students and teachers will agree on the definition. You can see your version in the variable R.version. The following line of code checks that you’re using major version 4:
stopifnot(R.version$major == "4")
If this line of code fails, please get the current version from https://cloud.r-project.org.
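As an aside, here is a minimal sketch of a different way to write the version check. It is not part of the assignment. It uses getRversion(), which returns the full version number, so it checks "4.0.0 or later" rather than "major version exactly 4". Unlike the line above, it would also accept a future major version.

```r
# getRversion() returns a "numeric_version" object, so the comparison
# below is made component by component (major, minor, patch),
# not as a comparison of text strings.
stopifnot(getRversion() >= "4.0.0")
```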
We’ll use data on housing values in suburbs of Boston. They are in an R package called “MASS.” (An R package is a collection of code, data, and documentation. “MASS” refers to the book “Modern Applied Statistics with S.” R developed from the earlier language, S.) The MASS package comes with the default R installation, so it’s already on your computer. However, it’s not loaded into your R session by default. So we’ll load it via the require() command (there’s nothing for you to do here):
require("MASS")
## Loading required package: MASS
Run ?Boston (outside this R Markdown document) to read the help page for the Boston data frame.
Convert the chas variable to a factor with labels “off” and “on” (referring to the Charles River).
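If you haven’t attached labels to a factor before, here is a small sketch of the technique on a different data set. It uses the built-in mtcars data frame, which is only a stand-in chosen for illustration. Its am column is coded 0 (automatic) and 1 (manual), much like an indicator variable.

```r
# Illustration only: mtcars is a stand-in, not the homework data.
# levels gives the codes found in the data, in order;
# labels gives the names that replace those codes.
am <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))

levels(am)  # the new labels
table(am)   # counts per label
```

Listing levels explicitly ties each label to a specific code, so the labels can’t silently end up in the wrong order.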
How many rows are in the Boston data frame? How many columns?
What does a row represent? …
What does a column represent? …
Make a density plot (with rug) of tax rates.
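For reference, here is a minimal sketch of a density plot with a rug, drawn from a different data set. It uses the built-in faithful data frame, chosen only for illustration. The title and axis label are our own wording.

```r
# Illustration only: faithful is a stand-in, not the homework data.
x <- faithful$eruptions

# density() computes a kernel density estimate; plot() draws the curve.
plot(density(x),
     main = "Old Faithful eruption durations",
     xlab = "Duration (minutes)")

# rug() adds one tick per observation along the x-axis.
rug(x)
```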
Describe the shape of the distribution of tax rates. …
Note that the distribution shape doesn’t make sense in light of the rug representation of the data. Make a histogram of the tax rates.
Why is the second peak of the density plot so large? In what way is the rug representation of the data inadequate? Write a line or two of code to figure it out, and then explain it.
…
Make a barplot of “chas”.
How many neighborhoods are on the Charles River?
Make a single graph consisting of three sub-graphs as follows:
Hint: use layout() with a 4x4 matrix, using the top-right 3x3 corner for the scatterplot, leaving the bottom-left 1x1 corner blank, and using the other parts for the boxplots.
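The hint can be sketched as follows, again on the built-in mtcars data rather than the homework data. The variables, the margin settings, and the choice of which boxplot goes where are assumptions made for illustration. In the matrix, each cell holds the number of the plot that occupies it, and 0 marks a cell left blank.

```r
# Illustration only: mtcars is a stand-in, not the homework data.
m <- matrix(0, nrow = 4, ncol = 4)
m[1:3, 2:4] <- 1  # top-right 3x3 block: first plot drawn (scatterplot)
m[1:3, 1]   <- 2  # left column, top three rows: second plot
m[4, 2:4]   <- 3  # bottom row, right three columns: third plot
                  # m[4, 1] stays 0, so the bottom-left cell is blank

layout(m)
old_par <- par(mar = c(4, 4, 1, 1))  # smaller margins so each region fits

plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
boxplot(mtcars$mpg, ylab = "Miles per gallon")                   # vertical
boxplot(mtcars$wt, horizontal = TRUE, xlab = "Weight (1000 lbs)")

par(old_par)  # restore the previous margins
layout(1)     # return to a single plotting region
```

Plots fill the numbered regions in order, so the order of the plot() and boxplot() calls must match the numbers in the matrix.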
(An optional challenge, worth 0 extra credit points: remove the axis and plot border from each boxplot.)
Look into the highest-crime neighborhood by making a single graph consisting of three sub-graphs arranged in one column of three rows:
What do you notice about the ptratio and medv for the highest-crime neighborhood?
…