First, you must download your “key” here. The key is a 40-character string. In the background of this document, I have set my key to the variable acsKey.
The acs package lets us download ACS data from within R. The choroplethr package draws choropleth maps of that data.
library(acs)
##
## Attaching package: 'acs'
##
## The following object is masked from 'package:base':
##
## apply
library(choroplethr)
api.key.install(key = acsKey)
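One way to keep the key out of the document source is to read it from an environment variable before calling api.key.install. The sketch below assumes the key is stored in a variable named CENSUS_API_KEY; both that name and the helper get_acs_key are hypothetical and not part of the acs package.

```r
# Hypothetical helper (not part of acs): read the key from an environment
# variable and check that it looks like a 40-character string.
get_acs_key <- function(var = "CENSUS_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (nchar(key) != 40) {
    stop("Expected a 40-character key in environment variable ", var)
  }
  key
}

# Usage, assuming the variable is set (for example in ~/.Renviron):
# acsKey <- get_acs_key()
# api.key.install(key = acsKey)
```

Failing early on a key of the wrong length gives a clearer error than a failed download later on.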
ACS has lots of different variables. To look up the variables you want, use a web search! I like this site (you might need to be on campus or use a proxy server to access it). The table ID is the thing you need to find. For example, table ID B00001 is the unweighted sample count of the population (the total population estimate is table B01003). choroplethr makes nice maps:
choroplethr_acs(tableId="B00001", map = "zip")
## Warning: NAs introduced by coercion
## Warning: joining factor and character vector, coercing into character
## vector
To access the data, you first need to define the geographic region of interest with geo.make. Here it is ZIP code, but you can also use counties, states, census tracts, etc. Then you need to fetch the data with acs.fetch. Make sure you are connected to the internet.
us.zip=geo.make(zip.code = "*")
str(us.zip)
## Formal class 'geo.set' [package "acs"] with 3 slots
## ..@ geo.list :List of 1
## .. ..$ :Formal class 'geo' [package "acs"] with 4 slots
## .. .. .. ..@ api.for:List of 1
## .. .. .. .. ..$ zip+code+tabulation+area: chr "*"
## .. .. .. ..@ api.in : list()
## .. .. .. ..@ name : chr "Zip Code Tabulation Area *"
## .. .. .. ..@ sumlev : num 860
## ..@ combine : logi FALSE
## ..@ combine.term: chr "aggregate"
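Other geographies follow the same pattern. Here is a minimal sketch, assuming the acs package is installed. Wisconsin and FIPS county code 25 (Dane County) are arbitrary illustrative choices, and the object names are my own. Nothing is fetched here, so no API key is involved.

```r
# Skip quietly if the acs package is not available.
if (requireNamespace("acs", quietly = TRUE)) {
  library(acs)

  # One state.
  wi.state <- geo.make(state = "WI")

  # Every county in that state ("*" is the wildcard, as with zip.code).
  wi.counties <- geo.make(state = "WI", county = "*")

  # Every census tract in one county (county given by its FIPS code).
  dane.tracts <- geo.make(state = "WI", county = 25, tract = "*")

  str(wi.counties)
}
```

Any of these geo.set objects can be passed as the geography argument of acs.fetch in place of us.zip.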
us.transport=acs.fetch(geography=us.zip,
table.number="B08301", col.names="pretty")
str(us.transport)
## Formal class 'acs' [package "acs"] with 9 slots
## ..@ endyear : int 2011
## ..@ span : int 5
## ..@ geography :'data.frame': 33120 obs. of 2 variables:
## .. ..$ NAME : chr [1:33120] "ZCTA5 01001" "ZCTA5 01002" "ZCTA5 01003" "ZCTA5 01005" ...
## .. ..$ zipcodetabulationarea: chr [1:33120] "01001" "01002" "01003" "01005" ...
## ..@ acs.colnames : chr [1:21] "Means of Transportation to Work: Total: " "Means of Transportation to Work: Car, truck, or van: " "Means of Transportation to Work: Car, truck, or van: Drove alone " "Means of Transportation to Work: Car, truck, or van: Carpooled: " ...
## ..@ modified : logi TRUE
## ..@ acs.units : Factor w/ 5 levels "count","dollars",..: NA NA NA NA NA NA NA NA NA NA ...
## ..@ currency.year : int 2011
## ..@ estimate : num [1:33120, 1:21] 8496 13774 4188 2718 7869 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:33120] "ZCTA5 01001" "ZCTA5 01002" "ZCTA5 01003" "ZCTA5 01005" ...
## .. .. ..$ : chr [1:21] "Means of Transportation to Work: Total: " "Means of Transportation to Work: Car, truck, or van: " "Means of Transportation to Work: Car, truck, or van: Drove alone " "Means of Transportation to Work: Car, truck, or van: Carpooled: " ...
## ..@ standard.error: num [1:33120, 1:21] 321 614 457 134 236 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:33120] "ZCTA5 01001" "ZCTA5 01002" "ZCTA5 01003" "ZCTA5 01005" ...
## .. .. ..$ : chr [1:21] "Means of Transportation to Work: Total: " "Means of Transportation to Work: Car, truck, or van: " "Means of Transportation to Work: Car, truck, or van: Drove alone " "Means of Transportation to Work: Car, truck, or van: Carpooled: " ...
Notice that neither of the previous objects (us.zip and us.transport) is a standard R type; both are formal (S4) classes defined by acs. To get point estimates that are structured like a matrix (this is what we will typically want), use the function estimate.
trans = estimate(us.transport) # get the data into a matrix.
is.matrix(trans)
## [1] TRUE
str(trans)
## num [1:33120, 1:21] 8496 13774 4188 2718 7869 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:33120] "ZCTA5 01001" "ZCTA5 01002" "ZCTA5 01003" "ZCTA5 01005" ...
## ..$ : chr [1:21] "Means of Transportation to Work: Total: " "Means of Transportation to Work: Car, truck, or van: " "Means of Transportation to Work: Car, truck, or van: Drove alone " "Means of Transportation to Work: Car, truck, or van: Carpooled: " ...
dim(trans)
## [1] 33120 21
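The point estimates come with sampling error. As a rough sketch of how one might turn standard errors into margins of error, the block below uses the first three ZCTAs' totals and standard errors printed in the str output above. With the real object, the matrix of standard errors should be available via standard.error(us.transport). The multiplier assumes a 90% confidence level, which is the level used for published ACS margins of error.

```r
# Values copied from the str() output for the "Total" column (first three ZCTAs).
est <- c("ZCTA5 01001" = 8496, "ZCTA5 01002" = 13774, "ZCTA5 01003" = 4188)
se  <- c("ZCTA5 01001" = 321,  "ZCTA5 01002" = 614,   "ZCTA5 01003" = 457)

z90 <- qnorm(0.95)   # about 1.645, for a two-sided 90% interval
moe <- z90 * se      # margin of error for each ZCTA

# Estimate with lower and upper interval bounds, one row per ZCTA.
ci <- cbind(estimate = est, lower = est - moe, upper = est + moe)
round(ci)
```

Small ZCTAs tend to have wide intervals relative to their estimates, which is worth keeping in mind when comparing them.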
head(trans)
## Means of Transportation to Work: Total:
## ZCTA5 01001 8496
## ZCTA5 01002 13774
## ZCTA5 01003 4188
## ZCTA5 01005 2718
## ZCTA5 01007 7869
## ZCTA5 01008 690
## Means of Transportation to Work: Car, truck, or van:
## ZCTA5 01001 8112
## ZCTA5 01002 8943
## ZCTA5 01003 1034
## ZCTA5 01005 2531
## ZCTA5 01007 7119
## ZCTA5 01008 660
## Means of Transportation to Work: Car, truck, or van: Drove alone
## ZCTA5 01001 7585
## ZCTA5 01002 7774
## ZCTA5 01003 736
## ZCTA5 01005 2106
## ZCTA5 01007 6496
## ZCTA5 01008 595
## Means of Transportation to Work: Car, truck, or van: Carpooled:
## ZCTA5 01001 527
## ZCTA5 01002 1169
## ZCTA5 01003 298
## ZCTA5 01005 425
## ZCTA5 01007 623
## ZCTA5 01008 65
## Means of Transportation to Work: Car, truck, or van: Carpooled: In 2-person carpool
## ZCTA5 01001 344
## ZCTA5 01002 1094
## ZCTA5 01003 294
## ZCTA5 01005 355
## ZCTA5 01007 607
## ZCTA5 01008 65
## Means of Transportation to Work: Car, truck, or van: Carpooled: In 3-person carpool
## ZCTA5 01001 155
## ZCTA5 01002 25
## ZCTA5 01003 4
## ZCTA5 01005 70
## ZCTA5 01007 3
## ZCTA5 01008 0
## Means of Transportation to Work: Car, truck, or van: Carpooled: In 4-person carpool
## ZCTA5 01001 28
## ZCTA5 01002 26
## ZCTA5 01003 0
## ZCTA5 01005 0
## ZCTA5 01007 0
## ZCTA5 01008 0
## Means of Transportation to Work: Car, truck, or van: Carpooled: In 5- or 6-person carpool
## ZCTA5 01001 0
## ZCTA5 01002 0
## ZCTA5 01003 0
## ZCTA5 01005 0
## ZCTA5 01007 0
## ZCTA5 01008 0
## Means of Transportation to Work: Car, truck, or van: Carpooled: In 7-or-more-person carpool
## ZCTA5 01001 0
## ZCTA5 01002 24
## ZCTA5 01003 0
## ZCTA5 01005 0
## ZCTA5 01007 13
## ZCTA5 01008 0
## Means of Transportation to Work: Public transportation (excluding taxicab):
## ZCTA5 01001 0
## ZCTA5 01002 1286
## ZCTA5 01003 69
## ZCTA5 01005 0
## ZCTA5 01007 0
## ZCTA5 01008 0
## Means of Transportation to Work: Public transportation (excluding taxicab): Bus or trolley bus
## ZCTA5 01001 0
## ZCTA5 01002 1257
## ZCTA5 01003 69
## ZCTA5 01005 0
## ZCTA5 01007 0
## ZCTA5 01008 0
## Means of Transportation to Work: Public transportation (excluding taxicab): Streetcar or trolley car (carro publico in Puerto Rico)
## ZCTA5 01001 0
## ZCTA5 01002 0
## ZCTA5 01003 0
## ZCTA5 01005 0
## ZCTA5 01007 0
## ZCTA5 01008 0
## Means of Transportation to Work: Public transportation (excluding taxicab): Subway or elevated
## ZCTA5 01001 0
## ZCTA5 01002 0
## ZCTA5 01003 0
## ZCTA5 01005 0
## ZCTA5 01007 0
## ZCTA5 01008 0
## Means of Transportation to Work: Public transportation (excluding taxicab): Railroad
## ZCTA5 01001 0
## ZCTA5 01002 29
## ZCTA5 01003 0
## ZCTA5 01005 0
## ZCTA5 01007 0
## ZCTA5 01008 0
## Means of Transportation to Work: Public transportation (excluding taxicab): Ferryboat
## ZCTA5 01001 0
## ZCTA5 01002 0
## ZCTA5 01003 0
## ZCTA5 01005 0
## ZCTA5 01007 0
## ZCTA5 01008 0
## Means of Transportation to Work: Taxicab
## ZCTA5 01001 0
## ZCTA5 01002 0
## ZCTA5 01003 0
## ZCTA5 01005 0
## ZCTA5 01007 0
## ZCTA5 01008 0
## Means of Transportation to Work: Motorcycle
## ZCTA5 01001 0
## ZCTA5 01002 4
## ZCTA5 01003 0
## ZCTA5 01005 6
## ZCTA5 01007 19
## ZCTA5 01008 0
## Means of Transportation to Work: Bicycle
## ZCTA5 01001 0
## ZCTA5 01002 209
## ZCTA5 01003 42
## ZCTA5 01005 0
## ZCTA5 01007 13
## ZCTA5 01008 0
## Means of Transportation to Work: Walked
## ZCTA5 01001 214
## ZCTA5 01002 1722
## ZCTA5 01003 1783
## ZCTA5 01005 53
## ZCTA5 01007 94
## ZCTA5 01008 3
## Means of Transportation to Work: Other means
## ZCTA5 01001 27
## ZCTA5 01002 23
## ZCTA5 01003 0
## ZCTA5 01005 0
## ZCTA5 01007 101
## ZCTA5 01008 0
## Means of Transportation to Work: Worked at home
## ZCTA5 01001 143
## ZCTA5 01002 1587
## ZCTA5 01003 1260
## ZCTA5 01005 128
## ZCTA5 01007 523
## ZCTA5 01008 27
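The "pretty" column names are long, which makes the printed output hard to read. Here is a small sketch of one way to shorten them with base R string functions, using a few of the names shown above. The shortening rule and the helper name shorten are my own choices, not something acs provides.

```r
# A few of the pretty column names from the output above.
long.names <- c(
  "Means of Transportation to Work: Total: ",
  "Means of Transportation to Work: Car, truck, or van: Drove alone ",
  "Means of Transportation to Work: Car, truck, or van: Carpooled: ",
  "Means of Transportation to Work: Walked "
)

shorten <- function(x) {
  x <- sub("^Means of Transportation to Work: ", "", x)  # drop the table prefix
  x <- sub(":? *$", "", x)                                # drop trailing colon and spaces
  sub("^.*: ", "", x)                                     # keep only the last level
}

short.names <- shorten(long.names)
short.names
# With the real matrix: colnames(trans) <- shorten(colnames(trans))
```

Keeping only the last level works for this table because those labels are distinct; in other tables it could produce duplicate names.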
See the data description here.
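You can plot the data and see that the counts have a very long right tail.
# keep: drove alone, carpooled, public transportation, motorcycle, bicycle, walked
trans = trans[,c(3,4,10,17,18,19)]
hist(trans[,1])
# fat tails!
plot(as.data.frame(trans)) # tails make it hard to see patterns.
# transform (note: zero counts become -Inf under log):
plot(as.data.frame(log(trans))) # why is everything correlated?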
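One plausible reason the log counts all move together is that every column is a count, and counts scale with the number of workers in the ZCTA. The sketch below illustrates this with simulated data rather than the ACS download: all variable names and distribution parameters are made up for the illustration. It also uses log1p so that zero counts stay finite.

```r
set.seed(1)
n <- 2000

# Simulated stand-in for ZCTAs: heavy-tailed worker counts, and two commuting
# shares drawn independently of each other and of the number of workers.
workers <- pmax(1, round(exp(rnorm(n, mean = 8, sd = 1.5))))
p.drive <- runif(n, 0.60, 0.90)
p.walk  <- runif(n, 0.01, 0.10)
drove   <- rbinom(n, size = workers, prob = p.drive)
walked  <- rbinom(n, size = workers, prob = p.walk)

# Log counts are correlated because both scale with the number of workers.
# log1p(x) = log(1 + x) keeps zero counts finite, unlike log(x).
cor.log <- cor(log1p(drove), log1p(walked))

# Dividing by the number of workers removes the size effect.
cor.share <- cor(drove / workers, walked / workers)

c(log.counts = cor.log, shares = cor.share)
```

With the real data, the analogous step would be to divide each column by the "Total" column (column 1 of the full matrix returned by estimate, before the subsetting above) and plot the resulting shares.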