STAT605: Data Science Computing Project, Fall 2020

This course introduces some of the tools needed to collect, manage, and analyze large data sets. Topics include the UNIX/Linux command line, text editors, programming in R, version control with git, and high-performance computing. In the second half of the course, students work in teams on a data analysis project, applying these skills to analyze data and present their findings.

Instructors: Keith Levin, kdlevin | at | wisc | dot | edu; John Gillett, jgillett | at | wisc | dot | edu
TA: Bi Cheng Wu, bwu62 | at | wisc | dot | edu
Lectures: Pre-recorded video lectures will be made available through Canvas.
Question-and-answer sessions: Tuesdays and Thursdays, 8:30-9:15am, 1:00-1:45pm, and 8:00-8:45pm (all times are Madison local time), in this BB Collaborate room
Office Hours: Tuesdays 9:00-10:30pm and Thursdays 10:20am-12:20pm on BB Collaborate, or by appointment
Textbook: There is no required textbook. See below for weekly readings.
Syllabus: Available here
Prerequisites: Enrollment in Statistics MS or Statistics Visiting International Program; STAT303, STAT304, and STAT305 or equivalent.

Course Schedule

Date | Topics | Readings | Lecture Videos and Notes
Thursday, Sep 3 | Course introduction; Administrivia; VirtualBox; Introduction to the Linux Command Line | Introduction to UNIX Commands (recommended); Survival guide for UNIX newbies (recommended); GNU/Linux Command-Line Tools Summary (recommended) | Video: Course Introduction; Video: Setting up your VM; Video: Introduction to the Command Line; Slides: Command Line
Tuesday, Sep 8 | Introduction to text editing in emacs | emacs reference sheet (recommended) | Video: Introduction to emacs
Thursday, Sep 10 | Introduction to text editing in vim | vim help documentation (recommended); vim cheat sheet (recommended); Practical Vim by Drew Neil (recommended) | Video: Introduction to vim; jabberwocky.txt
Tuesday, Sep 15 | emacs and vim, cont'd | | HW1 (Due Sep 18)
Thursday, Sep 17 | Discussing the Lyman-break galaxy data | Wikipedia page on Lyman-break galaxies | Video: Introduction to the Lyman-break galaxy data
Tuesday, Sep 22 | Lyman-break data, cont'd | |
Thursday, Sep 24 | Lyman-break data, cont'd | | HW2 (Due Oct 2)
Tuesday, Sep 29 | Version control with git; Lyman-break data, cont'd | Pro Git by Chacon and Straub (recommended) | Video: Introduction to git; Slides: git
Thursday, Oct 1 | git, cont'd; Lyman-break data, cont'd | |
Tuesday, Oct 6 | Shell scripting with bash | Introduction to Bash Scripting from the Linux Documentation Project (recommended); Data Science at the Command Line by J. Janssens (recommended); S. Das (2005, 2012), Your UNIX: The Ultimate Guide, McGraw-Hill (recommended) | HW3 (Due Oct 16); Video: Introduction to shell scripting; Slides: shell scripting
Thursday, Oct 8 | Shell scripting, cont'd | |
Tuesday, Oct 13 | Regular expressions and grep | | Video: Introduction to Regular Expressions in grep; Slides: Regexes in grep
Thursday, Oct 15 | Regexes, cont'd | |
Tuesday, Oct 20 | Distributed computing with Slurm | | HW4 (Due Oct 30); Video: Slurm on the UW Statistics HPC Cluster; Notes and links
Thursday, Oct 22 | Distributed computing with Slurm, cont'd | | Video: Short introduction to sed and awk; Slides
Tuesday, Oct 27 | Distributed computing on CHTC | CHTC Guide; CHTC/HTCondor manual | Video: Introduction to CHTC (guest lecture by Christina Koch and Jess Vera)
Thursday, Oct 29 | Distributed computing on CHTC, cont'd | | CHTC Small Examples; Links for Small Examples
Tuesday, Nov 3 | Distributed computing on CHTC, cont'd | | HW5 (Due Nov 13); CHTC callingR Demo; Links for callingR demo; Don't forget to vote if you are eligible to do so
Thursday, Nov 5 | Distributed computing on CHTC, cont'd | |
Tuesday, Nov 10 | Group Projects | | Group Project Overview and Deadlines
Thursday, Nov 12 | Group Projects, cont'd | | Project proposals due Friday, Nov 13
Tuesday, Nov 17 | Group Projects, cont'd | |
Thursday, Nov 19 | Group Projects, cont'd | | First round of project feedback due Friday, Nov 20
Tuesday, Nov 24 | Group Projects, cont'd | |
Thursday, Nov 26 | Group Projects, cont'd | | First draft of project report due Monday, Nov 30
Tuesday, Dec 1 | Group Projects, cont'd | |
Thursday, Dec 3 | Group Projects, cont'd | | Second round of project feedback due Friday, Dec 4
Tuesday, Dec 8 | Group Projects, cont'd | |
Thursday, Dec 10 | Group Projects, cont'd | | Final draft of project report due Friday, Dec 11