STAT606: Computing for Data Science and Statistics, Spring 2026

This course surveys tools and frameworks that are currently popular among data scientists and statisticians working in both academia and industry. Our focus will be on complementing the tools that students already know from their previous courses on R. The course will begin with an accelerated introduction to the Python programming language and brief introductions to object-oriented and functional programming. We will then cover some of the scientific computing platforms available in Python, including numpy, scipy and scikit-learn, as well as visualization using matplotlib. Next, we will discuss collecting data from the web, both by scraping and through APIs. The course will conclude with a brief survey of distributed computing, focusing on Hadoop and Google Cloud Platform.

Instructor: Keith Levin, kdlevin | at | wisc | dot | edu
Lectures: TuTh 11:00AM-12:15PM in B2550 Morgridge Hall
Office Hours: Wednesdays 4-5pm in 6651 Morgridge Hall, or by appointment
Textbook: There is no required textbook. See below for weekly readings.
Syllabus: Available here.
Prerequisites: There are no formal prerequisites for this course, but previous experience with programming in R, the UNIX/Linux command line, text editing in vim/emacs, regular expressions and distributed computing (equivalent to STAT605) is assumed.

Date Topics Readings Notes and Resources
Week 0: Jan 20-23
  • Course introduction and Administrivia
  • Installing and running Python and Jupyter
  • Basic Python: types, variables and functions
  • Jupyter notebook documentation (required)
  • Either A. B. Downey, Chapters 1 through 3 or Severance, Chapters 1, 2 and 4 (required)
Week 1: Jan 26-30
  • Basic Python: conditionals and iteration
  • Sequence data: strings, lists and tuples
  • List comprehensions
  • Either A. B. Downey, Chapter 5 or Severance, Chapters 3 and 5
  • Either A. B. Downey, Chapters 8 and 10 or Severance, Chapters 6 and 8 (required); A. B. Downey, Chapter 9 (recommended)
  • Python documentation on lists (recommended); Python documentation on sequences (recommended)
Week 2: Feb 2-6
  • Python dictionaries and hashing
  • Files and I/O
  • Either A. B. Downey, Chapters 11 and 12 or Severance, Chapters 9 and 10 (required)
  • Python documentation on dictionaries (recommended)
  • Python documentation on tuples (recommended)
  • Python documentation on sets (recommended)
  • A. B. Downey, Section B.4 (recommended); A. B. Downey, Chapter 13 (recommended)
  • A. B. Downey, Chapter 14 or Severance, Chapter 7 (required)
  • Python File I/O Documentation (required)
  • Handling Errors and Exceptions (required)
  • Python pickle module (recommended)
  • Overview of the Python interpreter (recommended)
Week 3: Feb 9-13
  • Python on the Command Line
  • Discussion of QR Code miniproject
  • Calling Python from the command line (recommended)
  • Python sys module (recommended)
Week 4: Feb 16-20
  • Basics of object-oriented programming
  • Classes and instances
  • Methods and attributes
  • Either A. B. Downey, Chapters 15 and 16 or Severance, Chapter 14 (required)
  • Python documentation on classes (only through section 9.3) (required)
  • D. Phillips (2015). Python 3 Object-oriented Programming, Second Edition. Packt Publishing. (recommended)
  • M. Weisfeld (2009). The Object-Oriented Thought Process, Third Edition. Addison-Wesley. (recommended)
Week 5: Feb 23-27
  • Basic concepts in functional programming
  • Map, reduce and filter
  • Python itertools documentation (required)
  • Python functools documentation (required)
  • A. M. Kuchling. Functional Programming HOWTO (required)
  • M. R. Cook. A Practical Introduction to Functional Programming (recommended)
  • D. Mertz. Functional Programming in Python (recommended)
Week 6: Mar 2-6
  • numpy, scipy and matplotlib
  • Numpy quickstart tutorial (required)
  • SciPy tutorial (recommended)
  • Pyplot tutorial (required)
  • Pyplot API (recommended)
  • E. Tufte (2001). The Visual Display of Quantitative Information. Graphics Press. (recommended)
  • E. Tufte (1997). Visual and Statistical Thinking: Displays of Evidence for Making Decisions. Graphics Press. (recommended)
Week 7: Mar 9-13
  • Python pandas
  • pandas quickstart guide (required)
  • Basic data structures (required)
  • Basic functionality of pandas Series and DataFrames (required)
  • pandas group-by operations (required)
  • Reshaping and pivoting (required)
  • pandas cookbook (recommended)
  • Merge, join and concatenation (recommended)
  • Time series functionality (recommended)
Week 8: Mar 16-20
  • Markup languages: HTML, XML and JSON
  • Severance, Chapter 12 (HTTP, HTML) and Chapter 13 (XML, JSON) (required)
  • BeautifulSoup documentation (Quick Start up to "CSS selectors...") (required)
  • BeautifulSoup4 tutorial (recommended)
Week 9: Mar 23-27
  • Databases and SQL
  • Retrieving data with APIs
Mar 30-Apr 3: Spring Break. No lecture.
Week 10: Apr 6-10
  • Introduction to Hadoop and MapReduce
  • MapReduce using mrjob
Week 11: Apr 13-17
  • MapReduce using PySpark
Week 12: Apr 20-24
  • Google TensorFlow and Keras
Week 13: Apr 27-May 1
  • Google TensorFlow and Keras, cont'd