Sebastian Raschka
Assistant Professor of Statistics @ UW-Madison

STAT 479 -- Machine Learning (Fall 2019)


Course Resources

For the course material, we are going to use a mix of different technologies, each best suited to its task.

Course Logistics

When

Where

Instructors

Office Hours

Course Description

Credits: 3

Course Description:

Introduction to machine learning for pattern classification, regression analysis, clustering, and dimensionality reduction. For each category, fundamental algorithms, as well as a selection of current state-of-the-art algorithms, are discussed. The evaluation of machine learning models using statistical methods is a particular focus of this course. While the fundamental mathematical concepts underlying machine learning and pattern classification algorithms are taught, the practical use of machine learning algorithms via open source libraries from the Python programming ecosystem will be of equal focus in this course.

Learning Outcomes:

Course Prerequisites:

MATH 340, 341, Graduate Student Standing, or member of the Statistics Visiting International Scholars program.

Along with introducing the concepts of machine learning and pattern classification, the in-class lectures will provide a refresher on relevant concepts from calculus and linear algebra; however, a calculus background (e.g., Math 221) and a linear algebra background (e.g., Math 340) are recommended. While this course will also provide an introduction to the basics of the Python programming language for machine learning, it is highly recommended that students are familiar with basic programming and have completed an introductory programming class.

Course Audience:

Students majoring in math or statistics or those wishing to take additional statistics courses.

Resources

Machine Learning Books

Python Machine Learning, 2nd Edition (highly recommended)

Elements of Statistical Learning (recommended)

Python Resources

Illustrated Guide to Python (recommended)

This book will not be covered in class. However, some readers asked me for good Python resources as preparation for this class, and this is one of the resources I would recommend. There are also many other Python learning resources available online.

For instance, another great book is Allen Downey’s Think Python 2e (free PDF available at https://greenteapress.com/wp/think-python-2e/).

Interactive Python course on Codecademy (highly recommended)

Depending on your preferred learning style, also consider learning Python interactively instead of, or in addition to, reading a Python book. A great interactive resource for learning Python is Codecademy: https://www.codecademy.com. In particular, there is a free, < 10 hr interactive course: https://www.codecademy.com/learn/learn-python.

Python Like You Mean It

A short, free intro for getting started with Python and its main scientific computing libraries: https://www.pythonlikeyoumeanit.com.

Python for Beginners (Video Lectures)

A great video series by educators at Microsoft, which was recently made available for free on YouTube: https://www.youtube.com/playlist?list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6.

Grading

The final grade will be computed using the following weighted grading scheme:

Exams

Both the midterm and final exam will be conceptual, which means that you will not be asked to write code in the exam. You should bring a pocket calculator to the class, but otherwise, no further material will be permitted (except pens).

The final will be cumulative in the sense that some of the earlier topics may be relevant to the final exam; however, the final exam will largely focus on the parts covered after the midterm. In other words, you still should be familiar with all concepts covered in the course, but questions will be centered around the topics after the midterm.

While there will be different types of questions, one question could be as follows:

Q: Does the (computational) time complexity of a k-Nearest Neighbor classifier grow linearly, quadratically, or exponentially with the number of samples in the training dataset? Explain your answer in 1-2 sentences.

A: Linearly. For each new training point there is an additional distance computation.
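
The linear growth in the sample answer can be illustrated with a minimal brute-force sketch. It assumes Euclidean distance, a small fixed k, and no index structure (such as a k-d tree) that would avoid some distance computations. The function name `knn_predict` and the toy data are made up for illustration and are not part of the course material.

```python
import heapq
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Brute-force k-nearest-neighbor prediction for a single query point.

    Returns the majority label among the k nearest training points and
    the number of distance computations that were needed.
    """
    distances = []
    n_distance_computations = 0
    # One distance computation per training point: this loop is the
    # reason the prediction cost grows linearly with the training set size.
    for x, label in zip(train_X, train_y):
        distances.append((math.dist(x, query), label))
        n_distance_computations += 1

    # Select the k smallest distances without sorting the whole list.
    nearest = heapq.nsmallest(k, distances, key=lambda pair: pair[0])
    majority_label = Counter(label for _, label in nearest).most_common(1)[0][0]
    return majority_label, n_distance_computations


# Hypothetical toy dataset: two well-separated classes in 2D.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]

label, n = knn_predict(X, y, (0.2, 0.2), k=3)
print(label, n)  # a 6

# Doubling the training set doubles the number of distance computations.
_, n_doubled = knn_predict(X * 2, y * 2, (0.2, 0.2), k=3)
print(n_doubled)  # 12
```

The counter is only there to make the cost visible; the point is that each query touches every training example exactly once.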

Class Project

Overview

The goal of working on a class project is three-fold. First, it will provide you with the opportunity to apply the concepts learned in this class creatively, which helps you understand the material more deeply. Second, designing and working on a unique project in a team is something that you will encounter, if you haven’t already, sooner rather than later in life, and this course project helps you prepare for that. Third, along with the opportunity to practice and the satisfaction of working creatively, students can use this project to enhance their portfolio or resume.

Note about grading

There is no “perfect project.” While you are encouraged to be ambitious, the most important aspect of this project is your learning experience. Hence, you don’t want to pick something that is too easy for you, but similarly, you don’t want to choose a project if you are not certain that it is within the scope of this class. (However, note that the more comprehensive and interesting the project is, the easier you’ll find it to write the 6-8-page project report.) The project proposal is not graded by how exciting your project is but based on whether you follow the objectives of the project proposal, project presentation, and project report. For instance, if your project ends up being unsuccessful – for example, if you choose to design a classifier and it doesn’t achieve the desired accuracy – it will not negatively affect your grade as long as you are honest, describe the potential issues well, and suggest improvements or further experiments. Again, the objective of this project is to provide you with hands-on practice and an opportunity to learn.

The project consists of 3 parts:

  1. a project proposal,
  2. a short project presentation,
  3. and a project report.

The expectations for each part will be discussed in the following sections.

1) Project Proposal

Please note that you should use the proposal-latex file(s) for writing and submitting your proposal!

The main purpose of the project proposal is to receive feedback from the TAs/the instructor regarding whether your project is feasible and whether it is within the scope of this class. Also, the project proposal offers a chance to receive useful feedback and suggestions on your project.

For this project, you will be working in a team consisting of three students. You are encouraged to form groups by yourself, as discussed in class. If you cannot find group members, the TA and I will randomly assign you to a group. If you have any concerns about working with someone in your group, please talk to a TA or the instructor for accommodations.

Proposal Format:

Introduction:

Motivation:

Evaluation:

Resources:

Contributions:

You are expected to share the workload evenly, and every group member is expected to participate in both the experiments and writing. (As a group, you only need to submit one proposal and one report, though. So you need to work together and coordinate your efforts.)

It is crucial that you talk to each other regularly! Schedule regular meetings and/or use online communication tools (e.g., Gitter, Slack, or email) to stay in touch with your group members throughout the semester regarding the progress of your project.

Modifications to the proposal

After you have received feedback from the TAs/the instructor and your project proposal has been graded, you are advised to stick to the project outline in the proposal as closely as possible. However, if there is a concept introduced in a later lecture (for instance, a machine learning algorithm that you think is more appropriate than the one you proposed), you have the option to modify your proposal, but you are not penalized if you don’t. If you wish to update your project outline, talk to a TA first.

Project Proposal Assessment

The proposal will be graded based on the completeness of each of the 5 sections (Introduction, Motivation, Evaluation, Resources, and Contributions) and will not be based on language, style, or how “exciting” or “interesting” the project is. For each section, you can receive a maximum of 10 points, totaling 50 pts for the proposal overall.

Also, it is important to make sure that you acknowledge previous work and use citations properly when referring to other people’s work. Even minor forms of plagiarism (e.g., copying sentences from other texts) will result in a subtraction of at least 10 pts per incident. University guidelines also dictate that severe incidents need to be reported. If you are unsure about what constitutes plagiarism and how to avoid it, please see the helpful guides at https://conduct.students.wisc.edu/plagiarism/.

2) Project Presentation

During the last three lectures, you will be presenting your project to the class. The presentation is “free form” but should cover the following:

The presentation should be 8-10 minutes long, with an additional 2 minutes reserved for questions. All members of the group should participate in the presentation.

The voting card should be filled out as follows:

  1. Title of the Presentation, a/10, b/10, c/10
  2. Title of the Presentation, a/10, b/10, c/10 …

where

The awards will be computed based on the highest number of points for each category. However, one project can only receive one of the prizes. The points for the grade are considered independently from the 3 prize categories. The rubric for the grades is provided in the subsection Project Presentation Assessment below.
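
One possible way to tally the voting cards is sketched below. This is an illustration under assumptions, not the procedure actually used in class: the function name `assign_awards`, the order in which the categories are processed (a, then b, then c), and the tie-breaking (first project encountered wins) are all hypothetical. The category labels `a`, `b`, and `c` are the placeholders from the voting card format above.

```python
from collections import defaultdict


def assign_awards(voting_cards, categories=("a", "b", "c")):
    """Sum the points per category and give each prize to a different project.

    voting_cards: list of dicts mapping a presentation title to a tuple
    of points (one value out of 10 per category).
    """
    totals = {category: defaultdict(int) for category in categories}
    for card in voting_cards:
        for title, scores in card.items():
            for category, points in zip(categories, scores):
                totals[category][title] += points

    awards = {}
    already_awarded = set()
    # Assumption: categories are processed in the given order, and a
    # project that has already won a prize is skipped in later categories.
    for category in categories:
        candidates = {
            title: points
            for title, points in totals[category].items()
            if title not in already_awarded
        }
        if not candidates:
            break
        winner = max(candidates, key=candidates.get)
        awards[category] = winner
        already_awarded.add(winner)
    return awards


# Two hypothetical voting cards for three presentations.
cards = [
    {"Project 1": (9, 9, 5), "Project 2": (8, 7, 6), "Project 3": (5, 6, 9)},
    {"Project 1": (10, 9, 6), "Project 2": (7, 8, 7), "Project 3": (6, 5, 8)},
]
print(assign_awards(cards))
# {'a': 'Project 1', 'b': 'Project 2', 'c': 'Project 3'}
```

In this toy example, Project 1 has the most points in both category a and category b, but because one project can only receive one prize, category b goes to the runner-up.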

Project Presentation Assessment

The rubric for assigning the points (out of 100) for the presentation is provided below:

3) Project Report

The project report is expected to be 6-8 pages long (excluding references) and should contain the following sections:

  1. Introduction
  2. Related Work
  3. Proposed Method
  4. Experiments
  5. Results and Discussion
  6. Conclusions
  7. Contributions

More details are provided in the LaTeX report template at https://github.com/rasbt/stat479-machine-learning-fs19/tree/master/report-template.

Please note that you should use the report-latex file for writing and submitting your report!

Also, you are required to submit all the code, computations, and experiments you developed and conducted for this project. Note that the quality of the code will not have any influence on your grade and will merely serve as a basis to establish that the report contains original and “real” results.

Project Report Assessment

The rubric for grading the project reports is provided below.

Abstract: 15 pts

Introduction: 15 pts

Related Work: 15 pts

Proposed Method: 25 pts

Experiments: 25 pts

Results and Discussion: 30 pts

Conclusions: 15 pts

Contributions: 10 pts

Optional: Sharing your Project

You are encouraged to share your project/final project report online after you have completed the course – for example, via GitHub or on a personal website.

If there are enough students willing to share their report online, I’d be happy to write a short article summarizing your projects as I’ve done for the deep learning course last year.

Other Important Course Information

Rules, Rights & Responsibilities

See the Guide’s Rules, Rights and Responsibilities

Academic Integrity

By enrolling in this course, each student assumes the responsibilities of an active participant in UW-Madison’s community of scholars in which everyone’s academic work and behavior are held to the highest academic integrity standards. Academic misconduct compromises the integrity of the university. Cheating, fabrication, plagiarism, unauthorized collaboration, and helping others commit these acts are examples of academic misconduct, which can result in disciplinary action. This includes but is not limited to failure on the assignment/course, disciplinary probation, or suspension. Substantial or repeated cases of misconduct will be forwarded to the Office of Student Conduct & Community Standards for additional review. For more information, refer to studentconduct.wiscweb.wisc.edu/academic-integrity/.

Accommodations for Students with Disabilities

McBurney Disability Resource Center syllabus statement: “The University of Wisconsin-Madison supports the right of all enrolled students to a full and equal educational opportunity. The Americans with Disabilities Act (ADA), Wisconsin State Statute (36.12), and UW-Madison policy (Faculty Document 1071) require that students with disabilities be reasonably accommodated in instruction and campus life. Reasonable accommodations for students with disabilities is a shared faculty and student responsibility. Students are expected to inform faculty [me] of their need for instructional accommodations by the end of the third week of the semester, or as soon as possible after a disability has been incurred or recognized. Faculty [I], will work either directly with the student [you] or in coordination with the McBurney Center to identify and provide reasonable instructional accommodations. Disability information, including instructional accommodations as part of a student’s educational record, is confidential and protected under FERPA.” http://mcburney.wisc.edu/facstaffother/faculty/syllabus.php

Diversity and Inclusion

Institutional statement on diversity: “Diversity is a source of strength, creativity, and innovation for UW-Madison. We value the contributions of each person and respect the profound ways their identity, culture, background, experience, status, abilities, and opinion enrich the university community. We commit ourselves to the pursuit of excellence in teaching, research, outreach, and diversity as inextricably linked goals.

The University of Wisconsin-Madison fulfills its public mission by creating a welcoming and inclusive community for people from every background – people who as students, faculty, and staff serve Wisconsin and the world.” https://diversity.wisc.edu/

Schedule

Note that this is a tentative schedule subject to changes.

Below is a list of topics we aim to cover. However, we will take our time, and it is more important to build a good understanding of the core concepts and the field in general rather than covering one more algorithm. Keep in mind that a good foundation will enable you to study and understand additional algorithms if the need arises.



Topics Summary (Planned)

Below is a list of the topics I am planning to cover. Note that while these topics are enumerated by lecture, some lectures are longer or shorter than others. Also, we may skip over certain topics in favor of others if time is a concern. While this section provides an overview of potential topics to be covered, the actual topics will be listed in the course calendar at the bottom of this page.

Part I: Introduction

Part II: Computational Foundations

Part III: Tree-Based Methods

Part IV: Evaluation

Part V: Dimensionality Reduction

Part VI: Bayesian Learning

Part VII: Regression

Part VIII: Unsupervised Learning


Calendar

| Date | Event | Description | Lecture Material | Announcements |
|---|---|---|---|---|
| Thu, Sep 5 | Day 1 | L01: What is Machine Learning? An Overview. | | |
| Tue, Sep 10 | Day 2 | L01 cont'd | | |
| Thu, Sep 12 | Day 3 | L02: Nearest Neighbor Methods | | |
| Tue, Sep 17 | Day 4 | L02 cont'd; L03: A Brief Intro to Python | | |
| Thu, Sep 19 | Day 5 | L03 cont'd; L04: Scientific Computing in Python | | |
| Tue, Sep 24 | Day 6 | L04 cont'd; L05: Preprocessing and Intro to Scikit-learn | | |
| Thu, Sep 26 | Day 7 | L05 cont'd | | |
| Tue, Oct 01 | Day 8 | L05 cont'd | | Deadline for submitting your project group member preferences (6:00 pm). |
| Thu, Oct 03 | Day 9 | L06: Introduction to Decision Trees | | HW1 is due tomorrow, Oct 4 (11:59 pm). The HW1 files are available here. |
| Tue, Oct 08 | Day 10 | L06 cont'd; HW1 discussion | | |
| Thu, Oct 10 | Day 11 | L07: Ensemble Methods | | |
| Tue, Oct 15 | Day 12 | L07 cont'd | | |
| Thu, Oct 17 | Day 13 | Midterm Exam | | Takes place in the regular classroom (VAN HISE 114), 4:00-5:15 pm. Please bring a scientific calculator. |
| Tue, Oct 22 | Day 14 | L08: Model Evaluation Part 1 | | Project Proposal due 6:00 pm. PDF submission via Canvas. Use the LaTeX report template available here. Assessment criteria are explained here and here. |
| Thu, Oct 24 | Day 15 | L09: Model Evaluation Part 2 | | |
| Tue, Oct 29 | Day 16 | L09 cont'd | | |
| Thu, Oct 31 | Day 17 | L10: Model Evaluation Part 3 | | |
| Tue, Nov 05 | Day 18 | L10 cont'd; L11: Model Evaluation Part 4 | | |
| Thu, Nov 07 | Day 19 | L11 cont'd | | HW2 is due tomorrow night, Nov 8 (11:59 pm). The HW2 files are available here. |
| Tue, Nov 12 | Day 20 | L11 cont'd; L12: Model Evaluation Part 5 | | |
| Thu, Nov 14 | Day 21 | L12 cont'd | | |
| Tue, Nov 19 | Day 22 | L13: Feature Selection | | |
| Thu, Nov 21 | Day 23 | L14: Feature Extraction | | |
| Tue, Nov 26 | Day 24 | Project Presentations I | | |
| Thu, Nov 28 | -- | Thanksgiving (no class) | | |
| Tue, Dec 03 | Day 25 | Project Presentations II | | |
| Thu, Dec 05 | Day 26 | Project Presentations III | | HW3 is due Dec 7 (11:59 pm). The HW3 files are available here. |
| Tue, Dec 10 | Day 27 | Final Exam | | In the regular classroom during the regular time, 4:00-5:15 pm. |
| Thu, Dec 12 | -- | Study Day (no class) | | |
| Dec 18 | -- | Submit Final Project Report | | |