The R computing environment uses packages to organize objects into discrete sets. A package may have a combination of functions and datasets. The base package has the basic R tools.

Packages are collections of R functions, data, and compiled code in a well-defined format. A package is installed onto your computer with install.packages(), which needs to be done only once. Installed packages are updated with update.packages(). Both operations can also be done within RStudio from the Packages tab in the bottom-right pane.
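As a minimal sketch of the install-once idea, the helper below installs a package only when it is not already in the library. The name install_if_missing is hypothetical (it is not part of R), and the sketch assumes a CRAN mirror is configured if an installation is actually needed.

```r
# Hypothetical helper: install a package only if it is not already installed.
install_if_missing <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    return(invisible(FALSE))   # already installed; nothing to do
  }
  install.packages(pkg)        # downloads from the configured repository
  invisible(TRUE)
}

# stats ships with R, so this call does not trigger a download.
installed_now <- install_if_missing("stats")

# Updating is a separate step; uncomment to update all installed packages.
# update.packages(ask = FALSE)
```

The helper returns TRUE only when it had to call install.packages(), which makes it safe to leave at the top of a script that is run repeatedly.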

The directory where packages are stored on your computer is called the library. Packages are attached from the library to your current workspace using the command library(). For more information see packages vs. libraries and links therein.
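The difference between the library (a directory) and attaching (an action in the current session) can be seen directly. The sketch below uses splines only because it ships with every R installation; any installed package would do.

```r
# Directories that make up the library (where installed packages live).
.libPaths()

# Attach splines, a package that ships with R, to the current session.
library(splines)

# Attached packages appear on the search path with a "package:" prefix.
splines_attached <- "package:splines" %in% search()
splines_attached
```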

To see what packages are attached, use

sessionInfo()
## R version 3.4.1 (2017-06-30)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.4.1  backports_1.1.0 magrittr_1.5    rprojroot_1.2  
##  [5] tools_3.4.1     htmltools_0.3.6 yaml_2.1.14     Rcpp_0.12.12   
##  [9] stringi_1.1.5   rmarkdown_1.6   knitr_1.17      stringr_1.2.0  
## [13] digest_0.6.12   evaluate_0.10.1

In addition to base, the packages attached automatically are usually stats and graphics, along with some more arcane helpers: grDevices, utils, datasets, and methods. Note that a number of packages may be loaded via a namespace (and not attached), which means they are used indirectly by some other package. Each of these is listed with its version number appended after an underscore (_).
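The same attached-versus-loaded distinction can be checked without reading through the full sessionInfo() output. This is a small sketch using base R functions only; the variable names are our own.

```r
attached <- .packages()                    # names of attached packages
loaded <- loadedNamespaces()               # every loaded namespace
loaded_only <- setdiff(loaded, attached)   # loaded but not attached

attached
loaded_only

# Version of a single installed package.
packageVersion("stats")
```

In a fresh session loaded_only is typically short; it grows as attached packages pull in their own dependencies.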

The datasets package

All R distributions provide the datasets package, which contains only sample datasets. In an interactive session, the help call below brings up the index of help pages for the datasets package.

This is a collection of datasets, many of them organized in the basic tabular data structure (rows correspond to observations, columns to variables) called a data.frame in R. Others are stored as time series, matrices, or lists.

help(package="datasets")
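For a non-interactive index, one option (sketched here) is data() with a package argument, which returns the dataset names and one-line titles as a character matrix instead of opening the help browser.

```r
# data() with a package argument returns an index rather than loading data.
idx <- data(package = "datasets")$results

# Keep the name and the one-line title of each dataset.
head(idx[, c("Item", "Title")])
```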

An alternative is to list the names of objects in a package. Here we use the pattern argument to show only datasets whose names begin with a.

ls("package:datasets", pattern = "^a")
## [1] "ability.cov" "airmiles"    "airquality"  "anscombe"    "attenu"     
## [6] "attitude"    "austres"
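The pattern argument is a regular expression, so it is not limited to matching the start of a name. Two further illustrative patterns, chosen here only as examples:

```r
# Names ending in "quality".
ends_quality <- ls("package:datasets", pattern = "quality$")
ends_quality

# Names containing a dot anywhere (the dot must be escaped).
with_dot <- ls("package:datasets", pattern = "\\.")
with_dot
```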

Often it is of more interest to list the names together with a brief description of each object's structure:

ls.str("package:datasets", pattern = "^a")
## ability.cov : List of 3
##  $ cov   : num [1:6, 1:6] 24.64 5.99 33.52 6.02 20.75 ...
##  $ center: num [1:6] 0 0 0 0 0 0
##  $ n.obs : num 112
## airmiles :  Time-Series [1:24] from 1937 to 1960: 412 480 683 1052 1385 ...
## airquality : 'data.frame':   153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
## anscombe : 'data.frame': 11 obs. of  8 variables:
##  $ x1: num  10 8 13 9 11 14 6 4 12 7 ...
##  $ x2: num  10 8 13 9 11 14 6 4 12 7 ...
##  $ x3: num  10 8 13 9 11 14 6 4 12 7 ...
##  $ x4: num  8 8 8 8 8 8 8 19 8 8 ...
##  $ y1: num  8.04 6.95 7.58 8.81 8.33 ...
##  $ y2: num  9.14 8.14 8.74 8.77 9.26 8.1 6.13 3.1 9.13 7.26 ...
##  $ y3: num  7.46 6.77 12.74 7.11 7.81 ...
##  $ y4: num  6.58 5.76 7.71 8.84 8.47 7.04 5.25 12.5 5.56 7.91 ...
## attenu : 'data.frame':   182 obs. of  5 variables:
##  $ event  : num  1 2 2 2 2 2 2 2 2 2 ...
##  $ mag    : num  7 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 7.4 ...
##  $ station: Factor w/ 117 levels "1008","1011",..: 24 13 15 68 39 74 22 1 8 55 ...
##  $ dist   : num  12 148 42 85 107 109 156 224 293 359 ...
##  $ accel  : num  0.359 0.014 0.196 0.135 0.062 0.054 0.014 0.018 0.01 0.004 ...
## attitude : 'data.frame': 30 obs. of  7 variables:
##  $ rating    : num  43 63 71 61 81 43 58 71 72 67 ...
##  $ complaints: num  51 64 70 63 78 55 67 75 82 61 ...
##  $ privileges: num  30 51 68 45 56 49 42 50 72 45 ...
##  $ learning  : num  39 54 69 47 66 44 56 55 67 47 ...
##  $ raises    : num  61 63 76 54 71 54 66 70 71 62 ...
##  $ critical  : num  92 73 86 84 83 49 68 66 83 80 ...
##  $ advance   : num  45 47 48 35 47 34 35 41 31 41 ...
## austres :  Time-Series [1:89] from 1971 to 1993: 13067 13130 13198 13254 13304 ...
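As the output above shows, not every object is a data.frame. A compact way to summarize this for the whole package, sketched with base R only, is to tabulate the first class of each object.

```r
objs <- ls("package:datasets")

# First class of each object, looked up in the attached package.
first_class <- vapply(
  objs,
  function(nm) class(get(nm, pos = "package:datasets"))[1],
  character(1)
)

# How many objects of each kind the package contains.
table(first_class)
```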

When examining a new R package, ls.str is a useful way to learn what objects it contains. It lists both datasets and functions. However, the output can be rather verbose; it is often better to use the Packages tab in the bottom-right pane of RStudio. There you will find a list of objects with one-line descriptions, and clicking on an object's name opens its help page. Often, packages have overview documentation toward the top.

Note that in the calls to ls and ls.str the package name is given as a character string "package:datasets". This convention is also used in describing which packages are attached in a session.
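The "package:" naming can be seen on the search path, and the same string can be used to refer to the package's environment. A brief sketch:

```r
# Attached packages are listed with the "package:" prefix.
search()

# The same string identifies the environment holding the package's objects.
env <- as.environment("package:datasets")
has_airquality <- exists("airquality", envir = env, inherits = FALSE)
has_airquality
```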

Namespaces and Indirect Access to Packages

Every package has a namespace, which identifies which objects are exported, that is, visible to users. This is a rather arcane topic, but it is important to understand for those going on to develop their own packages.
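To make the idea concrete, the sketch below compares what a namespace makes visible with everything it contains. The tools package is used only because it ships with R; the variable names are our own.

```r
# Objects the tools namespace makes visible to users.
exports <- getNamespaceExports("tools")
length(exports)
"file_ext" %in% exports

# Objects that exist in the namespace but are kept internal.
internal <- setdiff(ls(asNamespace("tools"), all.names = TRUE), exports)
length(internal)
```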

Normally, one attaches a package using the library command, which gives direct access to all objects exported from the namespace of that package. It is also possible to access objects in a package without attaching it by using the convention packagename::objectname. For instance, the following makes explicit reference to the package datasets to examine the structure of ToothGrowth.

str(datasets::ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
##  $ len : num  4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
##  $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dose: num  0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...

This is not necessary for already attached packages (such as datasets), but it can be helpful for documenting the source of objects. It is commonly used inside packages that need only a few functions or datasets from another package. Relatedly, attaching the dplyr package makes the magrittr pipe (%>%) available without explicitly loading magrittr, because dplyr re-exports it.
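The :: convention works the same way for functions. In the sketch below, "report.Rmd" is a made-up file name used only for illustration.

```r
# Call a function from tools without attaching the package.
ext <- tools::file_ext("report.Rmd")
ext

# The namespace is now loaded; whether it is also attached depends on
# what else has been done in the session.
"tools" %in% loadedNamespaces()
"package:tools" %in% search()
```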

Miscellaneous Notes

motivation for creating packages

R source package contents

R installed package contents

R package references

            > fortune("installing")