HW03: Practice with SVM, kNN, gradient descent, feature engineering

[Please put your name and NetID here.]

Hello Students:

1. Visualize classifier decision boundaries.

1a. Complete the function in the next cell that plots a classifier's decision boundary.

Or, rather, it plots a classifier's decisions over an area, revealing the boundary.

Hint: My solution used 10 lines.
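For reference only, here is a minimal sketch of one possible approach (the function name `plot_decision_boundary` and its signature are assumptions, not the required skeleton): predict on a dense grid of points and color each grid point by its predicted class.

```python
# A possible sketch, not the required solution: color a dense grid of points
# by the classifier's predictions, which reveals the decision boundary.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap

def plot_decision_boundary(clf, x_min, x_max, y_min, y_max, n=200):
    """Shade the rectangle [x_min, x_max] x [y_min, y_max] by clf.predict()."""
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, n),
                         np.linspace(y_min, y_max, n))
    grid = np.column_stack((xx.ravel(), yy.ravel()))
    zz = clf.predict(grid).reshape(xx.shape)
    # With labels -1/+1 (or 0/1), the smaller label is drawn pink and the
    # larger lightskyblue; swap the colors if your labels are coded the other way.
    plt.pcolormesh(xx, yy, zz, cmap=ListedColormap(['pink', 'lightskyblue']),
                   shading='auto')
```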

Visualize the decision boundary for an SVM.

Here I have provided test code that calls your function to visualize the decision boundary for the SVM under the header "Now try 2D toy data" in https://pages.stat.wisc.edu/~jgillett/451/burkov/01/01separatingHyperplane.html.

Recall: That SVM's decision boundary was $y = -x + \frac{1}{2}$, so your function should make a plot with lightskyblue above that line and pink below that line. Then my code adds the data points in blue and red.
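For illustration only, a rough sketch of what such a test might look like. The three points below are placeholders chosen so that a (nearly) hard-margin linear SVM produces the same boundary $y = -x + \frac{1}{2}$; they are not necessarily the course's toy data, and `plot_decision_boundary` is the assumed name from 1a.

```python
# Illustrative only: placeholder points chosen so a (nearly) hard-margin
# linear SVM has boundary x + y = 1/2, i.e. y = -x + 1/2.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # placeholder toy data
y = np.array([-1, 1, 1])

svm = SVC(kernel='linear', C=1e6)   # very large C approximates a hard margin
svm.fit(X, y)

plot_decision_boundary(svm, x_min=-1, x_max=2, y_min=-1, y_max=2)
plt.scatter(X[:, 0], X[:, 1], c=np.where(y == 1, 'blue', 'red'))
plt.show()
```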

There is nothing for you to do in this step, provided you implemented the required function above.

1b. Visualize the decision boundary for a decision tree.
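A possible sketch, assuming the same toy `X`, `y`, and the `plot_decision_boundary` function from above are available:

```python
# Sketch: fit a decision tree on the toy data and reuse the plotting function.
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import numpy as np

tree_clf = DecisionTreeClassifier()          # default settings
tree_clf.fit(X, y)                           # X, y from the toy data above
plot_decision_boundary(tree_clf, x_min=-1, x_max=2, y_min=-1, y_max=2)
plt.scatter(X[:, 0], X[:, 1], c=np.where(y == 1, 'blue', 'red'))
plt.show()
```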

1c. Visualize the decision boundary for kNN with $k=3$.

(Experiment with $k=1$ and $k=2$ to see how the decision boundary varies with $k$ before setting $k=3$.)
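A possible sketch along the same lines (again assuming `X`, `y`, and `plot_decision_boundary` from earlier cells):

```python
# Sketch: k-nearest-neighbors with k=3; rerun with n_neighbors=1 and 2 first
# to see how the boundary changes with k.
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt
import numpy as np

knn_clf = KNeighborsClassifier(n_neighbors=3)
knn_clf.fit(X, y)                            # X, y from the toy data above
plot_decision_boundary(knn_clf, x_min=-1, x_max=2, y_min=-1, y_max=2)
plt.scatter(X[:, 0], X[:, 1], c=np.where(y == 1, 'blue', 'red'))
plt.show()
```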

1d. Visualize the decision boundary for an SVM with a nonlinear boundary.

Use the example under the header "Nonlinear boundary: use kernel trick" in https://pages.stat.wisc.edu/~jgillett/451/burkov/03/03SVM.html.

(Experiment with $\gamma = 2$, $\gamma = 10$, and $\gamma = 30$ to see how the decision boundary varies with $\gamma$ before setting $\gamma = \frac{1}{2}$.)
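A possible sketch, where `X_ring` and `y_ring` stand in for the two-class data from the linked kernel-trick example (`plot_decision_boundary` assumed from 1a):

```python
# Sketch: SVM with an RBF kernel. X_ring, y_ring are placeholders for the
# linked example's data. Refit with gamma=2, 10, and 30 to compare boundaries.
from sklearn.svm import SVC
import matplotlib.pyplot as plt
import numpy as np

rbf_svm = SVC(kernel='rbf', gamma=0.5, C=1000)
rbf_svm.fit(X_ring, y_ring)
plot_decision_boundary(rbf_svm, x_min=-3, x_max=3, y_min=-3, y_max=3)
plt.scatter(X_ring[:, 0], X_ring[:, 1], c=np.where(y_ring == 1, 'blue', 'red'))
plt.show()
```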

2. Run gradient descent by hand.

Run gradient descent with $\alpha = 0.1$ to minimize $z = f(x, y) = (x + 1)^2 + (y + 2)^2$. Start at (0, 0) and find the next two points on the descent path.

Hint: The minimum is at (-1, -2), so your answer should be approaching this point.
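In case it helps, the gradient and the gradient-descent update rule for this function are

$$
\nabla f(x, y) = \bigl(2(x+1),\; 2(y+2)\bigr),
\qquad
(x, y) \;\leftarrow\; (x, y) - \alpha\, \nabla f(x, y),
$$

so the first step starts from $\nabla f(0, 0) = (2, 4)$; the resulting points are left for your answer.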

... your answer in a Markdown cell here ...

3. Practice feature engineering.

Explore the fact that rescaling may be necessary for kNN but not for a decision tree.

3a. Read and plot a toy concentric ellipses data set.
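A possible sketch, with a hypothetical file name and column names (use the actual ones from the assignment's data file):

```python
# Sketch for 3a, assuming the data set is a CSV with feature columns x0, x1
# and a label column y. The filename and column names are placeholders.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.read_csv('ellipses.csv')                 # hypothetical filename
X = df[['x0', 'x1']].to_numpy()
y = df['y'].to_numpy()

plt.scatter(X[:, 0], X[:, 1], c=np.where(y == 1, 'blue', 'red'))
plt.xlabel('x0'); plt.ylabel('x1')
plt.title('Concentric ellipses toy data')
plt.show()
```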

3b. Train a kNN classifier and report its accuracy.
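A possible sketch, assuming `X` and `y` from 3a; the value of $k$ below is a placeholder if the assignment specifies one:

```python
# Sketch for 3b: fit kNN on the unscaled features and report training accuracy.
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5)        # k=5 is a placeholder choice
knn.fit(X, y)
print('kNN accuracy (unscaled):', knn.score(X, y))
```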

3c. Now rescale the features using standardization; plot, train, and report accuracy again.
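A possible sketch using scikit-learn's `StandardScaler` (assuming `X`, `y`, and the kNN settings from above):

```python
# Sketch for 3c: standardize each feature to mean 0 and standard deviation 1,
# then replot, refit, and report accuracy again.
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
import matplotlib.pyplot as plt
import numpy as np

X_scaled = StandardScaler().fit_transform(X)

plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=np.where(y == 1, 'blue', 'red'))
plt.title('Standardized features')
plt.show()

knn_scaled = KNeighborsClassifier(n_neighbors=5)
knn_scaled.fit(X_scaled, y)
print('kNN accuracy (standardized):', knn_scaled.score(X_scaled, y))
```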

3d. Train a decision tree classifier on the original (unscaled) data and report its accuracy.
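A possible sketch (assuming the original unscaled `X`, `y` from 3a):

```python
# Sketch for 3d: a decision tree on the original, unscaled features. Its
# axis-aligned threshold splits are unaffected by monotone rescaling.
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier()
tree.fit(X, y)
print('Decision tree accuracy (unscaled):', tree.score(X, y))
```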

3e. Why is feature scaling unnecessary for an ID3 decision tree? Answer in a Markdown cell.

... your answer here in a Markdown cell ...