We’ll grade your homework by

You should write R code anywhere you see an empty R code chunk. You should write English text anywhere you see “…”; please surround it with doubled asterisks (**...**) so that it will show up as boldface and be easy for us to find.

Include reasonable labels (titles, axis labels, legends, etc.) with each of your graphs.

Name: …

Email: …

Please use R version 4.0.0 or later so that students and teachers will agree on the definition. Your version is stored in the variable R.version. The following line of code checks that you’re using major version 4:

stopifnot(R.version$major == "4")

If this line of code fails, please get the current version from https://cloud.r-project.org.
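Note that the check above passes only when the major version is exactly 4. As a possible alternative (a sketch, not part of the assignment's required code), base R's getRversion() returns the running version in a form that can be compared directly with a version string, which matches "no less than 4.0.0" more literally:

```r
# Sketch: passes for any R version at or above 4.0.0,
# including a later major version.
stopifnot(getRversion() >= "4.0.0")
```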

We’ll use data on housing values in suburbs of Boston. They are in an R package called “MASS.” (An R package is a collection of code, data, and documentation. “MASS” refers to the book “Modern Applied Statistics with S.” R developed from the earlier language, S.) The MASS package comes with the default R installation, so it’s already on your computer. However, it’s not loaded into your R session by default. So we’ll load it via the require() command (there’s nothing for you to do here):

require("MASS")
## Loading required package: MASS

Run ?Boston (outside this R Markdown document) to read the help page for the Boston data frame.

Convert the chas variable to a factor with labels “off” and “on” (referring to the Charles river).
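As a reminder of how factor() maps codes to labels, here is a minimal sketch on a made-up vector. The name flag and its values are hypothetical, and the 0/1 coding is an assumption; check ?Boston for how chas is actually coded before adapting this.

```r
# Hypothetical 0/1 indicator vector standing in for a column like chas.
flag <- c(0, 1, 0, 0, 1)

# levels lists the codes found in the data; labels lists the names
# they are mapped to, in the same order.
flag_f <- factor(flag, levels = c(0, 1), labels = c("off", "on"))

print(flag_f)
print(table(flag_f))  # counts per label
```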

How many rows are in the Boston data frame? How many columns?

What does a row represent? …

What does a column represent? …

Make a density plot (with rug) of tax rates.
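The general pattern for a density plot with a rug can be sketched on a different data set. The example below uses the built-in faithful data as a stand-in, so the variable and labels are illustrative only and will need to be changed for the tax rates.

```r
# faithful ships with base R (datasets package); used here as stand-in data.
x <- faithful$eruptions

d <- density(x)  # kernel density estimate with the default bandwidth
plot(d,
     main = "Density of Old Faithful eruption durations",
     xlab = "Eruption duration (minutes)",
     ylab = "Density")
rug(x)  # adds one tick per observation along the x-axis
```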

Describe the shape of the distribution of tax rates. …

Note that the distribution shape doesn’t make sense in light of the rug representation of the data. Make a histogram of the tax rates.

Why is the second peak of the density plot so large? In what way is the rug representation of the data inadequate? Write a line or two of code to figure it out, and then explain it.

Make a barplot of “chas”.
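A barplot of a categorical variable is usually drawn from a table of counts rather than from the raw values. Here is a sketch using the built-in mtcars data as a stand-in; cyl_f and counts are names chosen for illustration.

```r
# mtcars ships with base R; cyl (number of cylinders) is the stand-in variable.
cyl_f <- factor(mtcars$cyl)
counts <- table(cyl_f)  # number of cars at each level

barplot(counts,
        main = "Cars by number of cylinders",
        xlab = "Cylinders",
        ylab = "Number of cars")
```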

How many neighborhoods are on the Charles river?

Make a single graph consisting of three sub-graphs as follows:

Hint: use layout() with a 4x4 matrix, using the top-right 3x3 corner for the scatterplot, leaving the bottom-left 1x1 corner blank, and using the other parts for the boxplots.
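One way the layout matrix in the hint might look is sketched below on simulated data. The variables x and y are hypothetical stand-ins, and the choice of which boxplot goes on which side is an assumption. In the matrix, each number says which plot (in drawing order) occupies that cell, and 0 leaves a cell blank.

```r
# 4x4 layout: plot 1 fills the top-right 3x3 block, plot 2 the left
# column (rows 1-3), plot 3 the bottom row (columns 2-4), and the
# bottom-left cell is left blank (0).
m <- matrix(c(2, 1, 1, 1,
              2, 1, 1, 1,
              2, 1, 1, 1,
              0, 3, 3, 3), nrow = 4, byrow = TRUE)
layout(m)
op <- par(mar = c(4, 4, 2, 1))  # smaller margins so narrow panels fit

# Hypothetical simulated data, for illustration only.
set.seed(1)
x <- rnorm(100)
y <- x + rnorm(100)

plot(x, y, main = "Scatterplot of y against x", xlab = "x", ylab = "y")
boxplot(y, ylab = "y")                     # vertical, beside the y-axis
boxplot(x, horizontal = TRUE, xlab = "x")  # horizontal, below the x-axis

par(op)
layout(1)  # restore a single-panel layout
```

If R reports that the figure margins are too large, enlarging the plotting window or reducing mar further usually helps.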

(An optional challenge, worth 0 extra credit points: remove the axis and plot border from each boxplot.)

Look into the highest-crime neighborhood by making a single graph consisting of three sub-graphs arranged in one column of three rows:

What do you notice about the ptratio and medv for the highest-crime neighborhood?
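For the one-column, three-row arrangement asked for above, par(mfrow = c(3, 1)) is one option (layout() would also work). The sketch below uses simulated data, and the three plot types are arbitrary placeholders, not the sub-graphs the assignment asks for.

```r
# Hypothetical simulated data, for illustration only.
set.seed(1)
vals <- rnorm(50)

# Three rows, one column; panels are filled top to bottom.
op <- par(mfrow = c(3, 1), mar = c(4, 4, 2, 1))

hist(vals, main = "Histogram", xlab = "Value")
boxplot(vals, horizontal = TRUE, main = "Boxplot", xlab = "Value")
plot(density(vals), main = "Density", xlab = "Value")

par(op)  # restore the previous graphical settings
```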