Stat 333: Applied Linear Regression

Spring 2025

UW-Madison

Karl Rohe

email: first name and last name at stat dott wisc dott edu
office: 6110 Medical Sciences Center

Syllabus, R labs, Previous version of the project description, and o1 scribed notes!

Week 10

  1. Quiz on Thursday.
  2. Set up groups.

Homework

  1. April 8 10:45am: Each person should turn in 100-ish words about your group’s data sources and your curiosities. This document should be in your own words, i.e. you can discuss as a group, of course, but each person must write their own 100-ish words. Be sure to include a hyperlink to your data sources and your data’s description. Turn in on canvas as a compiled markdown or Rmarkdown html file.
  2. April 15 10:45am: Each person should turn in an Rmarkdown file where they load the data into Rstudio and print out 20 sampled rows from each data set (slice_sample(n=20)). Note: this likely includes some data tidying before it is nice to print! Then, print out a nice figure, ideally something that could be “Figure 1” in your paper. Be sure to include a hyperlink to your data sources and your data’s description. Turn in on canvas as a compiled Rmarkdown html file.
  3. April 22 10:45am: Each person should turn in the group’s thesis statement.

Week 9

  1. Now the class really begins!
  2. Form groups of three. Within your group, you should have similar interests (i.e. project topics). If you have not found a group, that is totally fine! We will have a “free agent” session in class on Thursday.
  3. Attendance is super important now. The class website still serves as a reminder of what we did in class. Your group members are going to be frustrated with you if you don’t show up for discussions. This is work time.
  4. What is data science?

Homework

  • Make a group name. Should be mildly descriptive and totally flamboyant. This will help organize group discussions. Agree on a name by class time on Thursday.

On the Thursday after spring break, we will have a reading quiz. We will discuss the following two papers, “Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination” and “Extraneous factors in judicial decisions”. Here are the pdfs.

To prepare for this quiz, think about the following issues when you are reading the documents:

  1. What is the surrounding context of the research? Why is it interesting? [This should be found in the introduction.]
  2. What is the statistical model and the source of its data, i.e. what are the variables in the model and how were they collected? [This should be found in a section on the methodology.]

Then, after you read the paper, consider the following issues:

  1. The papers make conclusions about the “real world” based upon their statistical evidence. Which parts of the data analysis support the authors’ claims? Why? [This should be found in a section about the results/discussion.]
  2. After reading both papers, contrast the statistical evidence between the papers. How are the arguments different? This is not about the writing, but rather about the difference in their data collection.

Week 7

  1. Thursday, March 13: Midterm exam.
  2. What is a data dictionary? What is data documentation? Why are they important?
  3. A big picture of statistics and data science.

Week 6

  1. Recap: multiple regression!
  2. practice: “Are older planes more likely to have departure delays?”
  3. practice: “Are longer distance flights more likely to have departure delays?”
  4. How do we estimate \(\beta\)? Where does it come from?
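For the last question, here is a sketch of the usual least-squares answer. The notation is an assumption for illustration and is not taken from the lecture: \(y\) is the vector of \(n\) responses, \(X\) is the \(n \times p\) matrix of predictors (with a column of ones for the intercept), and \(\hat{\beta}\) is the estimate. We pick the \(\beta\) that makes the sum of squared residuals as small as possible:

\[
\hat{\beta} = \arg\min_{\beta} \, \| y - X\beta \|_2^2 .
\]

Setting the gradient with respect to \(\beta\) to zero gives the normal equations, and, assuming \(X^\top X\) is invertible, a closed form:

\[
-2 X^\top (y - X\hat{\beta}) = 0
\quad\Longrightarrow\quad
X^\top X \hat{\beta} = X^\top y
\quad\Longrightarrow\quad
\hat{\beta} = (X^\top X)^{-1} X^\top y .
\]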

Week 5

Midterm exam on Thursday, March 13.

  1. multiple regression!
  2. Tidy data (left_join)

Homework due Thursday Feb 27 at 10:30am:

Do all of the Practice Problems. This is a huge assignment. You should be a little scared and you should start tonight.

Week 4

Axiom for life (something even more important than advice): We are all people.

When I was in second grade, my dad said “Your teacher is a person. She goes home and has dinner and lives life at home and comes back the next day. School is her work.” It’s obvious, but also a helpful perspective. It is easy to think of our teachers as “the truth” or “super human” or something like that. But, the reality is far more mundane. We make mistakes. You can offend us. You can flatter us (we don’t like to think you are flattering us though!). This is one part of my job and my job is one part of my life. I’m really excited to teach you some things and not so excited to teach you other things. Etc.

  1. Tidy data (filter, mutate, select, pipe)!
  2. Wild example fetching TidyTuesday youtube playlist
  3. Tidy data (group_by, summarize, slice_)

Homework:

  1. Do the problems at the end of these slides: Tidy data (filter, mutate, select, pipe)! Turn in on canvas by 10:30am Tuesday February 18.
  2. Write a 100ish-word reflection on one of the tidy tuesday videos below. Turn in on canvas by 10:30am Thursday Feb 20.
  • Go to the table of tidyTuesday videos. Scroll to a “random” spot in that playlist. Look for one that looks interesting and watch the guy analyze the data. About 1 hour videos (I think). Reflect in 100ish words. Turn in on canvas by Jan 31 at midnight: Reflect on your own feelings while watching the video. Did you find certain parts overwhelming, intriguing, or confusing? Pick one moment that elicited a strong reaction and explore why it had that effect on you. For me (Karl), when someone is using code that I do not (yet?) understand, I often feel scared, like I do not belong, like I will never be able to learn it, like they are smart and I’m dumb. Whatever you feel is ok and my hope is that you can become aware of that. Perhaps you feel something like I do? Perhaps you feel something different? Whatever you feel, I hope you can be gentle with yourself and the things you are feeling as we learn together.

Have a look at the data dictionary. Then, track down the data documentation. This could be easy or hard and will likely require some internet sleuthing. If there are multiple data sources, then perhaps there is one key piece of data. Identify that key piece of data, then pursue the data documentation for that data.

  1. What does one line of data correspond to?
  2. How long is the data documentation that you found? Link to it.
  3. What pieces of the data documentation are most interesting to you with respect to the data that you are using? Have a look at these pieces (skim if too long). Find something interesting or surprising or confusing (i.e. anything worth discussing in class). Say what that is and why you had the reaction you did.

Week 2

  1. What does it mean: “linear models generalize t-tests”?
  2. Tidy Tuesday!
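One way to unpack the first question, sketched under assumed notation (none of these symbols come from the lecture): let \(x_i \in \{0, 1\}\) mark which of two groups observation \(i\) belongs to, with \(n_0\) and \(n_1\) observations in each group, and fit the simple linear model

\[
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i .
\]

Least squares then gives the group means back:

\[
\hat{\beta}_0 = \bar{y}_0, \qquad \hat{\beta}_1 = \bar{y}_1 - \bar{y}_0 ,
\]

and the usual t-statistic for testing \(\beta_1 = 0\) is the equal-variance two-sample t-statistic:

\[
t = \frac{\hat{\beta}_1}{\widehat{\mathrm{SE}}(\hat{\beta}_1)}
  = \frac{\bar{y}_1 - \bar{y}_0}{s_p \sqrt{\tfrac{1}{n_0} + \tfrac{1}{n_1}}},
\qquad
s_p^2 = \frac{\sum_{i:\, x_i = 0} (y_i - \bar{y}_0)^2 + \sum_{i:\, x_i = 1} (y_i - \bar{y}_1)^2}{n_0 + n_1 - 2} .
\]

So the two-sample t-test is the special case of a linear model with a single binary predictor. Adding more predictors, or continuous ones, is the sense in which the linear model "generalizes" it.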

Week 1

Life advice (and also for class participation):

[Image: Brave]

Topics

  1. This is a fun and important course because…
     • This is not an intro course. This course serves as a gateway to the advanced courses.
     • This is not a math course. This is a statistics course. More specifically, it is an applied statistics course.
     • Why should we study statistics?
  2. Go over syllabus.
  3. Introductions to course aims and texts:
  4. Where are people learning R?
  5. Can we use Large Language Models (LLMs) for class?
     • yes. I expect you to. How?
     • Search vs validation
     • You may not use computers or calculators or LLMs on exams
     • “I expect you can write good code now - the bar has been raised because gravity is lower now” -paul gp
  6. Linear models generalize t-tests.

Lecture discussion questions

  • What are your goals after graduation?
  • Where do you envision yourself in 10 years?
  • What are your experiences with R?
  • What type of data are you interested in studying? Topic of data source? Structure of data?
  1. What is your major?
  2. What statistics classes have you taken?
  3. Why are you in this class?
  4. What type of questions are you interested in studying? Topics? Data sources? Data types?
  5. See page 52 in ISLR.

Homework

  • Go to the table of tidyTuesday videos. Scroll to a “random” spot in that playlist. Look for one that looks interesting and watch the guy analyze the data. About 1 hour videos (I think). Reflect in 100ish words. Turn in on canvas by Jan 31 at midnight: Reflect on your own feelings while watching the video. Did you find certain parts overwhelming, intriguing, or confusing? Pick one moment that elicited a strong reaction and explore why it had that effect on you. For me (Karl), when someone is using code that I do not (yet?) understand, I often feel scared, like I do not belong, like I will never be able to learn it, like they are smart and I’m dumb. Whatever you feel is ok and my hope is that you can become aware of that. Perhaps you feel something like I do? Perhaps you feel something different? Whatever you feel, I hope you can be gentle with yourself and the things you are feeling as we learn together.

Texts:

An Introduction to Statistical Learning with Applications in R
by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

For reference: R for Data Science by Garrett Grolemund and Hadley Wickham

Optional, more “classical” text: Applied Linear Regression, 3rd edition by Sanford Weisberg.