[Please put your name and NetID here.]
Start by downloading HW03.ipynb from this folder. Then develop it into your solution.
Write code where you see "... your code here ..." below. (You are welcome to use more than one cell.)
If you have questions, please ask them in class or office hours. Our TA and I are very happy to help with the programming (provided you start early enough, and provided we are not helping so much that we undermine your learning).
When you are done, run these Notebook commands: 'Kernel > Restart and Run All' (to confirm the notebook runs cleanly from top to bottom), then export to HTML (e.g. 'File > Download as > HTML').
Turn in HW03.ipynb and HW03.html to Canvas's HW03 assignment.
As a check, download your files from Canvas to a new 'junk' folder. Try 'Kernel > Restart and Run All' on the '.ipynb' file to make sure it works. Glance through the '.html' file.
Turn in partial solutions to Canvas before the deadline: e.g., turn in part 1, then parts 1 and 2, then your whole solution. That way we can award partial credit even if you miss the deadline. We will grade your last submission before the deadline.
import pandas as pd
from io import StringIO
from sklearn import svm
import matplotlib.pyplot as plt
# ... your code here ... (import statements)
Write a function, plot_decision_boundary(), that plots a classifier's decision boundary. Or, rather, it plots the classifier's decisions over an area, revealing the boundary.
Hint: My solution used 10 lines. When building the DataFrame of grid points to classify, I set its columns parameter to clf.feature_names_in_, to get a DataFrame whose column names match the features clf was fit on.
def plot_decision_boundary(clf, xlim, ylim, grid_resolution):
"""Display how clf classifies each point in the space specified by xlim and ylim.
- clf is a classifier (already fit to data).
- xlim and ylim are each 2-tuples of the form (low, high).
- grid_resolution specifies the number of points into which the xlim interval is divided
and the number into which the ylim interval is divided. The function plots
grid_resolution * grid_resolution points."""
# ... your code here ...
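For reference, here is one possible sketch of such a function. It is an illustration, not necessarily the 10-line solution the hint refers to: the np.meshgrid / plt.pcolormesh approach is an assumption, and the pink and lightskyblue colors are chosen to match the expected plot described below.

import numpy as np
from matplotlib.colors import ListedColormap

def plot_decision_boundary(clf, xlim, ylim, grid_resolution):
    """Display how clf classifies each point in the space specified by xlim and ylim."""
    xs = np.linspace(xlim[0], xlim[1], grid_resolution)
    ys = np.linspace(ylim[0], ylim[1], grid_resolution)
    xx, yy = np.meshgrid(xs, ys)
    # Name the columns as in the training data, so clf.predict() does not
    # warn about missing feature names.
    grid = pd.DataFrame(np.column_stack([xx.ravel(), yy.ravel()]),
                        columns=clf.feature_names_in_)
    zz = clf.predict(grid).reshape(xx.shape)
    # Two-color map: the lower label (-1) gets pink, the higher (+1) lightskyblue.
    plt.pcolormesh(xx, yy, zz, cmap=ListedColormap(['pink', 'lightskyblue']))
    plt.xlim(xlim)
    plt.ylim(ylim)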
Here I have provided test code that uses your function to visualize the decision boundary of the SVM under the header "Now try 2D toy data" in https://pages.stat.wisc.edu/~jgillett/451/burkov/01/01separatingHyperplane.html.
Recall: That SVM's decision boundary was $y = -x + \frac{1}{2}$, so your function should make a plot with lightskyblue above that line and pink below that line. Then my code adds the data points in blue and red.
There is nothing for you to do in this step, provided you implemented the required function above.
data_string = """
x0, x1, y
0, 0, -1
-1, 1, -1
1, -1, -1
0, 1, 1
1, 1, 1
1, 0, 1
"""
df = pd.read_csv(StringIO(data_string), sep=r'\s*,\s*', engine='python')
clf = svm.SVC(kernel="linear", C=1000)
clf.fit(df[['x0', 'x1']], df['y'])
# Call student's function.
plot_decision_boundary(clf=clf, xlim=(-4, 4), ylim=(-4, 4), grid_resolution=100)
# Add training examples to plot.
colors = {-1:'red', 1:'blue'}
for y in (-1, 1):
    plt.plot(df.x0[df.y == y], df.x1[df.y == y], '.', color=colors[y])
Now use plot_decision_boundary() to display the decision boundary of a kNN classifier with $k = 3$.
(Experiment with $k = 1$ and $k = 2$ to see how the decision boundary varies with $k$ before setting $k = 3$.)
# ... your code here ...
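A possible sketch, assuming the intended classifier is sklearn's KNeighborsClassifier fit to the same toy DataFrame df used in the test code above:

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)  # try n_neighbors=1 and 2 first
knn.fit(df[['x0', 'x1']], df['y'])
plot_decision_boundary(clf=knn, xlim=(-4, 4), ylim=(-4, 4), grid_resolution=100)
for y in (-1, 1):
    plt.plot(df.x0[df.y == y], df.x1[df.y == y], '.', color=colors[y])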
Use the example under the header "Nonlinear boundary: use kernel trick" in https://pages.stat.wisc.edu/~jgillett/451/burkov/03/03SVM.html.
(Experiment with $\gamma = 2$, $\gamma = 10$, and $\gamma = 30$ to see how the decision boundary varies with $\gamma$ before setting $\gamma = \frac{1}{2}$.)
# ... your code here ...
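A possible sketch with a stand-in dataset (the real data should come from the linked example; the labels below are a hypothetical XOR-style pattern, which no linear boundary separates):

import numpy as np

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.uniform(-4, 4, size=(100, 2)), columns=['x0', 'x1'])
y = np.where(X.x0 * X.x1 > 0, 1, -1)  # label +1 in quadrants I and III

rbf = svm.SVC(kernel='rbf', gamma=1/2, C=1000)  # try gamma = 2, 10, 30 first
rbf.fit(X, y)
plot_decision_boundary(clf=rbf, xlim=(-4, 4), ylim=(-4, 4), grid_resolution=100)
for label, color in [(-1, 'red'), (1, 'blue')]:
    plt.plot(X.x0[y == label], X.x1[y == label], '.', color=color)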
Run gradient descent with $\alpha = 0.1$ to minimize $z = f(x, y) = (x + 1)^2 + (y + 2)^2$. Start at (0, 0) and find the next two points on the descent path.
Hint: The minimum is at (-1, -2), so your answer should be approaching this point.
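As an arithmetic check, here is a minimal sketch of the update rule $(x, y) \leftarrow (x, y) - \alpha \nabla f(x, y)$, where $\nabla f(x, y) = (2(x + 1),\ 2(y + 2))$:

import numpy as np

def grad_f(p):
    # Gradient of f(x, y) = (x + 1)^2 + (y + 2)^2.
    return np.array([2 * (p[0] + 1), 2 * (p[1] + 2)])

alpha = 0.1
p = np.array([0.0, 0.0])
for step in (1, 2):
    p = p - alpha * grad_f(p)
    print(f'step {step}: {p}')  # step 1: [-0.2 -0.4]; step 2: [-0.36 -0.72]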
Explore the fact that rescaling may be necessary for kNN but not for a decision tree.
# ... your code here ...
Your output should have the form "Training accuracy is 0.500" (where 0.500 may not be the correct value).
# ... your code here ...
# ... your code here ...
# ... your code here ...
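One possible sketch of the comparison, using a hypothetical dataset (the data, column names, and $k = 3$ are assumptions; StandardScaler, KNeighborsClassifier, and DecisionTreeClassifier are the standard sklearn tools):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import StandardScaler

# Hypothetical data: x0's huge scale dominates kNN's distance calculations,
# even though the label depends only on the small-scale feature x1.
rng = np.random.default_rng(0)
X = pd.DataFrame({'x0': rng.normal(0, 1000, 200), 'x1': rng.normal(0, 1, 200)})
y = np.where(X.x1 > 0, 1, -1)

X_rescaled = pd.DataFrame(StandardScaler().fit_transform(X), columns=X.columns)

for name, features in [('raw', X), ('rescaled', X_rescaled)]:
    knn = KNeighborsClassifier(n_neighbors=3).fit(features, y)
    tree = DecisionTreeClassifier().fit(features, y)
    print(f'{name}: kNN training accuracy is {knn.score(features, y):.3f}; '
          f'tree training accuracy is {tree.score(features, y):.3f}')

A decision tree splits on one feature at a time, so monotone rescaling does not change its splits; kNN's distances mix the features, so it is sensitive to their relative scales.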