Date |
Topics |
Readings |
Notes |
Monday, Oct 23 |
Course introduction; Administrivia; Regular Expressions |
Severance Chapter 11: Regular expressions (required); Python regex documentation (recommended); Jupyter documentation (recommended) |
HW1 out; Request a Flux/Fladoop username, if necessary; Slides |
Wednesday, Oct 25 |
Markup languages; HTML, XML, JSON |
Severance Chapter 12 (HTTP, HTML) and Chapter 13 (XML, JSON) (required); BeautifulSoup documentation (just Quick Start) (required); BeautifulSoup documentation (everything up to the sections about CSS) (recommended); BeautifulSoup4 tutorial (recommended) |
Slides |
Monday, Oct 30 |
Markup languages (continued) |
Same as previous lecture. |
Slides |
Wednesday, Nov 1 |
Databases; SQL |
Oracle relational databases overview (only the overview!) (required); First section of Python sqlite3 documentation or Python 3 version (required); w3schools SQL tutorial (recommended) |
Slides |
Monday, Nov 6 |
Data visualization with matplotlib |
Numpy quickstart tutorial (required); Pyplot tutorial (required); Pyplot API (recommended); The Visual Display of Quantitative Information by Edward Tufte (recommended); Visual and Statistical Thinking: Displays of Evidence for Making Decisions by Edward Tufte (recommended) |
HW2 out; Slides |
Wednesday, Nov 8 |
Introduction to the Command Line |
Introduction to UNIX Commands (required); Survival guide for UNIX newbies (recommended); GNU/Linux Command-Line Tools Summary (recommended) |
Slides |
Monday, Nov 13 |
Introduction to Hadoop and MapReduce |
J. Dean and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, in Proceedings of the Sixth Symposium on Operating Systems Design and Implementation, 2004 (required); Introduction to HDFS by J. Hanson (recommended) |
HW3 out; Slides |
Wednesday, Nov 15 |
MapReduce in Python: mrjob |
mrjob Fundamentals and Concepts (required); Hadoop wiki: How MapReduce operations are actually carried out (required); Allen Downey’s Think Python, Chapter 15 on Objects (pages 143-149) (recommended); Classes and objects in Python (recommended) |
HW1 due; Slides; Demo code |
Monday, Nov 20 |
mrjob cont'd; MapReduce in Python: Spark |
Spark programming guide (required); PySpark programming guide (required); Spark MLlib, a Spark machine learning library (recommended); Spark GraphX, a Spark library for processing graph data (recommended) |
Slides |
Wednesday, Nov 22 |
PySpark cont'd |
Same as previous lecture. |
HW2 due; Slides |
Monday, Nov 27 |
Google TensorFlow |
TensorFlow tutorial: Getting Started with TensorFlow (required); Abadi et al., TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (required); Assorted tutorials on statistical and neural models in TensorFlow (recommended) |
Slides |
Wednesday, Nov 29 |
TensorFlow cont'd |
TensorFlow tutorial on recognizing MNIST digits using softmax regression (required); Advanced tutorial on training a feedforward neural network for MNIST (recommended) |
|
Monday, Dec 4 |
TensorFlow cont'd |
Chapter 6 of Deep Learning by Goodfellow, Bengio and Courville (recommended) |
HW4 out; Slides; Softmax regression demo; Multilayer CNN demo |
Wednesday, Dec 6 |
TensorFlow cont'd |
Chapter 6 of Deep Learning by Goodfellow, Bengio and Courville (recommended) |
HW3 due |
Monday, Dec 11 |
Advanced UNIX |
J. Janssens, Data Science at the Command Line (recommended); S. Das (2005, 2012). Your UNIX: The Ultimate Guide. McGraw-Hill. (recommended); Sed manual (recommended); GNU awk user’s guide (recommended) |
Slides |