R Software Introduction for Stat 571

Bret Larget has more current information in An R Primer for Introductory Statistics. See also Deyuan Jiang's discussion for Stat 849.

R is a powerful software application for interacting with data. It is freely available worldwide. With R, you can create sophisticated graphs, carry out statistical analyses, and create and run simulations. [R is also a programming language with an extensive set of built-in functions so you can write your own code to build your own statistical tools. Advanced users can even incorporate functions written in other languages, such as C, C++, and Fortran.] Further information about R specific to Stat 571 course material can be found in the rewritten R Appendices for the Stat/For/Hort 571 Course Notes.

The Comprehensive R Archive Network has collected several very good Contributed Documents that can help the new user learn more about how to use R. Here are a couple of suggested documents:

Accessing R

R is available at the CALS Computer Lab and on the Statistics computers. However, most Stat 571 students choose to use R on their own Windows machine at home or in the lab. R is also available for MacOS X and Linux machines. Instructions (a) and (b) show how to get R for your own computer; (c) describes how to access R at the CALS Computer Lab. [Note: if you already have R, make sure your version is at least 1.7.]

(a) Download for Windows (advisable only if you have a fast internet connection--a direct campus connection, cable modem, or DSL, as the file is about 20 megabytes): Go to the CRAN homepage at http://cran.us.r-project.org. Click on the link Windows (95 and later), then click on the link base, and finally click on Setup program, which is named something like rw1071.exe. This begins the download. After the download is complete, double-click on the downloaded file and follow the installation instructions. [MacOS X and Linux versions are also available at CRAN. Just follow the appropriate link and read the ReadMe.txt files.]

(b) From prepared CD (available through the Instructors, TAs or CALS Lab): Insert the CD into the drive, open the CD, and change to the folder for your system (windows, macosx or linux). Double-click on the rw1071.exe icon to begin installation (or read the instructions for non-Windows systems). Follow the installation instructions.

(c) The CALS Computer Lab: The CALS Lab is in the basement of the Animal Sciences Building and has its own entrance on the south side of the building. There is always a student consultant on duty. Managers, Peter Crump and Tom Tabone, can be found in Room 148 and Room 152, respectively, between 8am and 4:30pm. Do not hesitate to ask any of these individuals for assistance. Although some of the consultants will not know much about the details of R, all will be able to help you use the machines, print results, etc. The CALS Computer Lab is usually open M-R 8am-10pm, F 8am-5pm, Sat 10am-5pm, Sun 12pm-9pm. [Check holiday schedules at http://www.cals.wisc.edu/calslab.]

A First Session with R

Start R by clicking the Start button and selecting R from the Program menu. Once R is launched, it opens a command window with a prompt (`>') that awaits your first command. R is a command line program: you interact with the software by typing in commands.

We demonstrate a few key R commands using some milk yield data. In this case, you need to enter the data yourself. First, give the data a working name; for purposes here, let us choose myield. Then type

     > myield = c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44) 
The symbol `=' creates an object named `myield' whose value is the result of evaluating the command on the right-hand side. In this case, the command c concatenates a comma-separated set of numbers into a vector. [Warning: if you type a '(' and then do not complete the command by typing a ')', R will continue to wait for the command to be completed and show a string of `+' prompts even if you continue to press the `Enter' key. If you get in trouble, press the `Esc' key to get back to the prompt.]

Now you can perform some manipulations. To see the vector of data, just type

     >  myield
See a stem-leaf display of the data by typing
     > stem(myield)
For the mean and standard deviation, respectively, type
     > mean(myield)
     > sd(myield)
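
As a rough check on what mean and sd compute, the same numbers can be rebuilt from first principles. This is only a sketch using the myield vector entered above; the names n, ybar and s are introduced here for illustration and are not part of the course material.

```r
# Milk yield data from the session above
myield = c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44)

n = length(myield)                           # number of observations
ybar = sum(myield) / n                       # sample mean computed by hand
s = sqrt(sum((myield - ybar)^2) / (n - 1))   # sample SD, using divisor n - 1

# Each comparison should agree with the built-in function
all.equal(ybar, mean(myield))
all.equal(s, sd(myield))
```
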
Remember to quit your session when you are done. You can quit by typing
     > q()
in the command window, or by using the `File' pulldown menu.

Reading in Data Files to R

It can get tedious typing in your data each time. Further, retyping your data is an easy way to make mistakes. Instead, you can have data stored in column(s) in a file and read it into R (see comments below on Saving Data Files for Use in R). On Windows, you can type
     > myield = read.table(file.choose())
This allows you to choose the file using a menu. On Windows or MacOS, select your file and click "Open" (on Unix, you may only have TAB completions of folder and file names). Another way is to use the R "File" menu and select "Change dir ..." to change to the directory where your data are stored. Then type [including the quotes around the file name]:
     > myield = read.table("myield.dat")
Data we provide can be found in http://www.stat.wisc.edu/~st571-1/data. You can save the data locally on your computer (see comments below on Saving Data Files for Use in R) and use one of the above commands to read as a table. Alternatively, while connected to the Internet you can read the table directly:
     > myield = read.table("http://www.stat.wisc.edu/~st571-1/data/myield.dat")
[Again, quotes are important!] Of course, you may want a local copy if you are working at home.

Note: The read.table command reads in a table of values and labels the columns as V1, V2.... For the milk yield data, we have only one column, and we can replace the table with a vector:

     > myield = myield$V1
Data sets later in the course have multiple columns, which will be used during data analysis.
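
The steps above (read a table, then pull out column V1) can be tried without any course files. The following is a minimal sketch: it writes a tiny one-column file to a temporary location, so tmp, tab and yield are hypothetical names standing in for your own file and objects.

```r
# Write a small one-column data file to a temporary location
# (a stand-in for your own "myield.dat")
tmp = tempfile(fileext = ".dat")
writeLines(c("44", "55", "37", "32"), tmp)

tab = read.table(tmp)   # a table (data frame) with one column
names(tab)              # the column label assigned by read.table
yield = tab$V1          # replace the table with a plain vector
mean(yield)
```

With your own data you would replace tmp by a quoted file name or by file.choose().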

Saving Data Files for Use in R

R reads data that are stored as simple text files. However, sometimes your data come from other sources, such as Excel or another spreadsheet, which provides a safe and convenient way to enter your data. From such an application, you should "Save as" a "Text File (*.txt)" for use in R. This saves your data as simple text in a tab-delimited format. You should be able to view your data using a simple text editor, such as WordPad or NotePad (Windows) or SimpleText (MacOS) or Emacs (all platforms). [Note however that NotePad may display a table saved as text as a long string of numbers with little boxes between the lines. R can still read the file; try it. If this bothers you, "Save as" "Text only with line breaks" or "Text MS DOS format".] If you try to examine a Word or Excel or other spreadsheet file with a simple text editor, you will probably only see junk. Not surprisingly, R cannot read it either.
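
A tab-delimited text file with column names in its first line can be read by telling read.table about both features. This is a sketch under assumptions: the file contents and the column names cow and yield are made up for illustration, and a file saved from your own spreadsheet may need different settings.

```r
# Create a small tab-delimited file with a header line
# (hypothetical contents; "\t" is the tab character)
tmp = tempfile(fileext = ".txt")
writeLines(c("cow\tyield", "1\t44", "2\t55", "3\t37"), tmp)

# header = TRUE uses the first line as column names;
# sep = "\t" splits the columns on tabs
dat = read.table(tmp, header = TRUE, sep = "\t")
names(dat)
dat$yield
```
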

If you capture data using a Web browser and you simply "Save" the data, the file might actually get changed. For instance, Internet Explorer assumes you want to save as an HTML file. If you just click OK, you will save a file called "myield_dat.htm", which adds lots of junk at the top of the data file (see below). If you instead scroll the "Save as type:" box down to "Text File (*.txt)", you get only what you see. Another way to save data from the Web that usually works is to right-click on the name. If you go ahead and try to read the saved "myield_dat.htm" into R, you get the following message:

     > myield=read.table(file.choose())
     Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
         line 3 did not have 5 elements
Not very informative, eh? Sorry about that. Perhaps there is a lesson here: always check your data using a simple text editor. Here is the start of "myield_dat.htm", which you can see using a simple text editor such as WordPad, NotePad or SimpleText (or Emacs):
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0054)http://www.stat.wisc.edu/~yandell/st571/data/myield.dat -->
<META http-equiv=Content-Type content="text/html; charset=windows-1252">
<META content="MSHTML 6.00.2800.1226" name=GENERATOR></HEAD>
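
You can also peek at the top of a file from within R before trying to read it as a table. The sketch below is one possible check, not a course-provided tool: it builds a small fake saved web page in a temporary file, and the name looks_like_html is made up for illustration.

```r
# Build a small file that imitates a data file saved as HTML
tmp = tempfile(fileext = ".htm")
writeLines(c('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">',
             "<HTML><HEAD>", "44", "55"), tmp)

# Look at the first few lines, as you would in a text editor
first = readLines(tmp, n = 5)
first

# Crude test for HTML markup at the top of the file
looks_like_html = any(grepl("<!DOCTYPE|<html", first, ignore.case = TRUE))
looks_like_html
```

If the result is TRUE, save the data again as plain text before using read.table.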

Getting Help with R

You can get help on a specific R function by typing ?function-name at the prompt. For example,
     > ?hist
provides many more details on how to use hist using all of its available options.

However, you may not know the name of the R function. What can you do? Try typing

     > help.search("histogram")
to find out about commands pertinent to the word "histogram" (quotes are important!). After a short pause, you will get something like this (... for lines removed to save space here):
     Help files with alias or title matching 'histogram' using fuzzy matching:

     hist.POSIXt(base)       Histogram of a Date-Time Object
     hist(base)              Histograms
     nclass.Sturges(base)    Compute the Number of Classes for a Histogram
     plot.histogram(base)    Plot Histograms
     n.bins(car)             Number of Bins for Histogram
     Type 'help(FOO, package = PKG)' to inspect entry 'FOO(PKG) TITLE'.

There is an extensive set of help pages that you can browse. You can access these help pages by typing the command

     > help.start()
or by pulling down the Help menu and selecting the HTML version.
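
Two other built-in functions may help when you only half remember a name. This is a small sketch; the object name found is introduced here for illustration, and the exact list returned depends on which packages you have loaded.

```r
# List the names of available objects that contain "hist"
found = apropos("hist")
found

# Show the arguments of a function without opening its help page
args(hist)
```
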

Preparing Computer Output for Printing

The easiest way to get hard copy is to copy-and-paste the R commands and output of interest into an editor of some kind. A Word document will work just fine for this. It is strongly advised that you include only those parts that you need, which is simplest if you copy-and-paste as you go along. The CALS Computer Lab uses a debit card system for printing.

Hint: Use the Courier font for equal-width characters so things line up properly!
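
As an alternative to pasting by hand, R can write printed output straight to a text file with the built-in sink function. The following is a sketch only: out_file is a hypothetical temporary file, and in practice you would use a quoted file name of your own choosing.

```r
# Milk yield data from the first session
myield = c(44, 55, 37, 32, 37, 26, 23, 41, 34, 19, 30, 39, 46, 44)

out_file = tempfile(fileext = ".txt")   # hypothetical output file

sink(out_file)            # divert printed output to the file
print(summary(myield))    # this output now goes to the file
stem(myield)
sink()                    # send output back to the command window

readLines(out_file)       # inspect what was saved
```

The saved file is plain text, so it can be opened in a word processor and set in Courier before printing.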

Maintained by Brian Yandell (byandell@wisc.edu), Tue 28 Oct 2003.