HW1: Practice with Python, hard-margin SVM, and linear regression

... your name and NetID here ...

Hello Students:

  • Start by downloading HW1.ipynb from this folder. Then develop it into your solution.

  • Write code where you see "... your code here ..." below. (You are welcome to use more than one cell.)

  • I've included the output from my solution in HW1.html so you can check your work. Your output should match mine or be close to it. Use 3 significant figures for floats. For example, we can print 3 figures for 𝜋/1000 with print(f'{np.pi/1000:.3}'). The pattern is print(f'{x:.precision}'), where x is the value to print and precision is the number of significant figures.

  • If you have questions, please ask them in class or office hours. Our TA and I are very happy to help with the programming (provided you start early enough, and provided we are not helping so much that we undermine your learning).

  • Please clean up your code:

    • Comment out unnecessary code that is useful for orienting you, like printing the data set or installing libraries.
    • Label your output, like writing 'weight=20.1' or 'The weight is 20.1' rather than just '20.1'.
    • Simplify your code if you can.
  • When you are done, run these Notebook commands:

    • Shift-L (once, so that line numbers are visible)
    • Kernel > Restart and Run All (run all cells from scratch)
    • Esc S (save)
    • File > Download as > HTML
  • Turn in:

    • HW01.ipynb to Canvas's HW01 assignment
    • HW01.html to the same assignment by clicking 'Add Another File'

    As a check, download your HW01.ipynb from Canvas to a new 'junk' folder. Try 'Kernel > Restart and Run All' to make sure it works. Glance through the new '.html' file.

  • Turn in partial solutions to Canvas before the deadline. For example, turn in part 1, then parts 1 and 2a, then your whole solution. That way we can award partial credit even if you miss the deadline. We will grade your last submission before the deadline.
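
The float-formatting pattern from the instructions above can be tried on a few numbers. This is a minimal sketch: the 'weight' and 'volume' values are made up for illustration, and math.pi stands in for np.pi (the two hold the same value) so the snippet runs without NumPy.

```python
import math

# Format spec '.3' (no type letter) keeps 3 significant figures for a float.
values = {
    'pi/1000': math.pi / 1000,   # same value as np.pi / 1000
    'weight': 20.1234,           # hypothetical value, for illustration only
    'volume': 10_110_000.0,      # hypothetical value, for illustration only
}

formatted = {name: f'{x:.3}' for name, x in values.items()}

# Label each number, as the clean-up instructions ask.
for name, text in formatted.items():
    print(f'{name}={text}')
```

Large and small values switch to scientific notation on their own, which is why some of the sample output in this assignment looks like 1.01e+07.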

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn import svm, linear_model

1. Use a hard-margin SVM

to classify cars as having automatic or manual transmissions.

  • Read http://www.stat.wisc.edu/~jgillett/451/01/mtcars30.csv into a DataFrame. (This is the mtcars data frame from R with two of its rows removed to get linearly separable data.)
  • Make an X from the wt (weight in 1000s of pounds) and mpg (miles per gallon) columns. Make y from the am column (where 0=automatic or 1=manual transmission).
  • Train an SVM using kernel='linear' and C=1000. Print its coefficients and intercept.
  • Report the training accuracy. (It's given by clf.score(X, y).)
  • Predict the transmission for a car weighing 4000 pounds (wt=4) that gets 20 mpg.
  • Use five plt.plot() calls to make a figure with wt on its x-axis and mpg on its y-axis including:
    • the automatic transmission cars in red
    • the manual transmission cars in blue
    • the decision boundary (the center line of the road)
    • the lower margin boundary (the left side of the road)
    • the upper margin boundary (the right side of the road)
    • a reasonable title, axis labels, and legend
In [2]:
# ... your code here ...
The decision boundary is -8.24 * weight + -0.309 * mileage + 32.0 = 0.
The training accuracy is 1.0.
We predict that a car weighing 4 thousand pounds that gets 20 mpg has transmission type 0 (where 0=automatic, 1=manual).
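
The printed boundary equation can be rearranged to get the mileage at any weight, which is one way to find the endpoints of the three lines in the figure. The sketch below uses the rounded coefficients from the sample output rather than a fitted model, and it assumes the usual SVM scaling in which the margin boundaries are where the left-hand side equals -1 and +1. The helper names decision_value and mpg_on_line are hypothetical.

```python
# Rounded coefficients copied from the sample output above; your fitted
# values (clf.coef_[0] and clf.intercept_[0]) should be close to these.
w_wt, w_mpg, b = -8.24, -0.309, 32.0

def decision_value(wt, mpg):
    """Left-hand side of the boundary equation: w_wt*wt + w_mpg*mpg + b."""
    return w_wt * wt + w_mpg * mpg + b

def mpg_on_line(wt, level=0.0):
    """Solve decision_value(wt, mpg) == level for mpg.

    level=0 gives the decision boundary; level=-1 and level=+1 give the
    two margin boundaries (assuming the standard SVM scaling).
    """
    return (level - b - w_wt * wt) / w_mpg

# Car weighing 4000 pounds (wt=4) that gets 20 mpg.
value = decision_value(4, 20)
predicted = 1 if value > 0 else 0   # 0=automatic, 1=manual
print(f'decision value={value:.3}, predicted transmission={predicted}')

# Two points are enough to draw each straight line with plt.plot().
for level in (-1.0, 0.0, 1.0):
    ends = [mpg_on_line(wt, level) for wt in (2.0, 3.5)]
    print(f'level={level:+.0f}: mpg={ends[0]:.3} at wt=2.0, mpg={ends[1]:.3} at wt=3.5')
```

Because the coefficients are rounded to 3 figures, lines drawn from them will differ slightly from lines drawn from the fitted model.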

2. Make three linear regression models.

2a: Make a simple regression model by hand.

Use the matrix formula $w = (X^T X)^{-1} X^T y$ we developed in class to fit these three points: (0, 5), (2, 1), (4, 3). (Use linear_model.LinearRegression(), if you wish, to check your work.)

... your answer here (just give the model, $y = w x + b$; you do not need to show your work) ...

intercept=4.0, slope=-0.5
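
A hand calculation can also be checked numerically with the same matrix formula. This is a sketch under one assumption: the design matrix $X$ has a column of ones first (for the intercept $b$) and the $x$ values second (for the slope $w$). If your class notes order the columns differently, the two entries of the result swap.

```python
import numpy as np

# The three points from 2a.
x = np.array([0.0, 2.0, 4.0])
y = np.array([5.0, 1.0, 3.0])

# Design matrix: a column of ones (intercept) next to x (slope).
X = np.column_stack([np.ones_like(x), x])

# w = (X^T X)^{-1} X^T y
coef = np.linalg.inv(X.T @ X) @ X.T @ y
intercept, slope = coef

print(f'intercept={intercept:.3}, slope={slope:.3}')
```

The explicit inverse mirrors the formula; for larger problems, np.linalg.solve(X.T @ X, X.T @ y) gives the same answer without forming the inverse.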

2b: Make a simple linear regression model from real data.

Estimate the average daily trading volume of a Dow Jones Industrial Average stock from its market capitalization. That is, use $y =$ AvgVol vs. $x =$ MarketCap.

  • Read http://www.stat.wisc.edu/~jgillett/451/data/DJIA.csv into a DataFrame.
  • Find the model. Print its equation.
  • Print its $R^2$ value (the proportion of variability in $y$ accounted for by $x$ via the linear model, given by model.score(X, y)).
  • Make a plot of the data and regression line.
  • Use the model to predict the volume for a company with market capitalization of 0.25e12 (a quarter-trillion dollars); add this as a red point on your plot.
  • Say what happens to Volume as Market Capitalization increases. (Use a Markdown cell.)
In [5]:
# ... your code here ...
The model is Volume = 2.68e-05 * (Market Capitalization) + 3.41e+06.
R^2 is 0.705.
We predict a Volume of 1.01e+07 for a company with Market Capitalization 2.5e+11 (see red dot).
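
The prediction in the sample output can be reproduced by plugging the market capitalization into the printed equation. The sketch below uses the rounded coefficients from that output rather than a fitted model; predict_volume is a hypothetical helper name. With a fitted scikit-learn model, model.predict() on a one-row, one-column X plays the same role.

```python
# Rounded coefficients copied from the sample output above.
slope, intercept = 2.68e-05, 3.41e+06

def predict_volume(market_cap):
    """Apply the fitted line: Volume = slope * MarketCap + intercept."""
    return slope * market_cap + intercept

cap = 0.25e12   # a quarter-trillion dollars
volume = predict_volume(cap)
print(f'We predict a Volume of {volume:.3} for Market Capitalization {cap:.3}.')
```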

2c: Make a multiple regression model.

Estimate the same volume from both market capitalization and price. That is, use $y =$ AvgVol vs. $x_1 =$ MarketCap and $x_2 =$ Price.

  • Find the model.
  • Print its equation.
  • Print its $R^2$ value.
  • Say what happens to Volume as Market Capitalization increases (while holding Price fixed) and what happens to Volume as Price increases (while holding Capitalization fixed). (Use a Markdown cell.)
In [7]:
# ... your code here ...
The model is Volume = 2.89e-05 * (Market Capitalization) + -6.69e+04 * Price + 1.44e+07.
R^2 is 0.823.
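
"Holding one variable fixed" can be made concrete by changing one input of the printed equation while leaving the other alone. This is a sketch using the rounded coefficients from the sample output; the baseline company (cap and price below) is hypothetical and chosen only for illustration, and predict_volume is a hypothetical helper name.

```python
# Rounded coefficients copied from the sample output above.
w_cap, w_price, b = 2.89e-05, -6.69e+04, 1.44e+07

def predict_volume(market_cap, price):
    """Volume = w_cap * MarketCap + w_price * Price + b."""
    return w_cap * market_cap + w_price * price + b

# Hypothetical baseline company, for illustration only.
cap, price = 2.5e11, 100.0

# Raise Price by 1 unit while Market Capitalization stays fixed.
d_price = predict_volume(cap, price + 1.0) - predict_volume(cap, price)
# Raise Market Capitalization by 1e9 while Price stays fixed.
d_cap = predict_volume(cap + 1e9, price) - predict_volume(cap, price)

print(f'change in Volume per unit of Price: {d_price:.3}')
print(f'change in Volume per 1e9 of Market Capitalization: {d_cap:.3}')
```

In a linear model each change equals the matching coefficient times the step size, whatever baseline is chosen.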