Three data sets, model fit, and regularization

Three data sets: practice with train_test_split():
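One way to get three data sets is to call train_test_split() twice. The sketch below assumes the three sets are training, validation, and test. The toy arrays, the 60/20/20 proportions, and the random_state values are illustrative choices, not taken from the lecture.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 examples with 2 features each (illustration only).
X = np.arange(200).reshape(100, 2)
y = np.arange(100)

# First split: hold out 20% of the examples as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Second split: 25% of the remaining 80% becomes the validation set,
# which is 20% of the original data. The rest is the training set.
X_train, X_valid, y_train, y_valid = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

print(X_train.shape, X_valid.shape, X_test.shape)
```

Because each example lands in exactly one of the three sets, the validation set can be used to choose a model and the test set to estimate its performance afterward.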

Model fit (underfit, fit, overfit) and regularization

We will make a series of polynomial regression models of degree $D$: $y_i = \sum_{p=0}^D w_p x_i^p + \epsilon_i$.

We will investigate what happens to the fit (underfit, fit, or overfit?) as $D$ increases.

Practice using PolynomialFeatures() to make an array whose columns are powers of $x$:
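A minimal sketch of this practice step, using a made-up three-point $x$ array and degree 3 (both chosen only for illustration):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Illustrative x values; PolynomialFeatures expects a 2-D array (one column here).
x = np.array([1.0, 2.0, 3.0])
X = x.reshape(-1, 1)

# degree=3 gives columns x^0, x^1, x^2, x^3 (the x^0 column is the bias column).
poly = PolynomialFeatures(degree=3)
X_poly = poly.fit_transform(X)

print(X_poly)
# [[ 1.  1.  1.  1.]
#  [ 1.  2.  4.  8.]
#  [ 1.  3.  9. 27.]]
```

Each row corresponds to one $x_i$ and each column to one power $p$, so a linear model fit to these columns is a polynomial model in $x$.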

Make $2N$ $x$-coordinates over the interval [low, high] ($N$ for training and $N$ for testing).

Make corresponding $y$-coordinates from $y_i = x_i^2 + \epsilon_i$, where $\epsilon_i$ is a little random noise. This is the true relationship we wish to model. Pretend we do not know it.
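The data-generation steps above might look like the following sketch. The values of N, low, high, and the noise standard deviation are assumptions, as are uniform sampling of $x$ and Gaussian noise. The text does not specify any of them.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumed settings (not given in the text).
rng = np.random.default_rng(0)
N = 20
low, high = -2.0, 2.0
noise_sd = 0.5

# 2N x-coordinates drawn uniformly from [low, high].
x = rng.uniform(low, high, size=2 * N)

# True relationship y = x^2 plus a little Gaussian noise.
y = x**2 + rng.normal(0.0, noise_sd, size=2 * N)

# scikit-learn expects a 2-D feature array; split into N training and N test points.
X = x.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

print(X_train.shape, X_test.shape)
```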

Loop through polynomial degrees $D$ in $0, \ldots, 9$, making models for each $D$:

For each $D$, make an OLS polynomial regression model and also Lasso- and Ridge-regularized models.
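The loop could be sketched as below. The data settings, the regularization strengths (alpha=0.1 for Lasso, alpha=1.0 for Ridge), the use of make_pipeline(), and mean squared error as the comparison metric are all assumptions made for illustration. The lecture's own choices may differ.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Assumed data settings (not from the text): N, the interval, and the noise level.
rng = np.random.default_rng(0)
N = 20
x_train = rng.uniform(-2.0, 2.0, size=N).reshape(-1, 1)
x_test = rng.uniform(-2.0, 2.0, size=N).reshape(-1, 1)
y_train = x_train.ravel() ** 2 + rng.normal(0.0, 0.5, size=N)
y_test = x_test.ravel() ** 2 + rng.normal(0.0, 0.5, size=N)

# results[(model name, D)] = (training MSE, test MSE)
results = {}
for D in range(10):
    # Assumed regularization strengths; larger alpha means a stronger penalty.
    models = {
        "OLS": LinearRegression(),
        "Lasso": Lasso(alpha=0.1, max_iter=100_000),
        "Ridge": Ridge(alpha=1.0),
    }
    for name, regressor in models.items():
        # Expand x into powers up to degree D, then fit the linear model.
        model = make_pipeline(PolynomialFeatures(degree=D), regressor)
        model.fit(x_train, y_train)
        results[(name, D)] = (
            mean_squared_error(y_train, model.predict(x_train)),
            mean_squared_error(y_test, model.predict(x_test)),
        )

for (name, D), (mse_train, mse_test) in results.items():
    print(f"{name:5s} D={D}: train MSE={mse_train:.3f}, test MSE={mse_test:.3f}")
```

Comparing the two errors across $D$ is one way to read the fit: high error on both sets suggests underfitting, while a training error that keeps falling as the test error rises suggests overfitting.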

Guidance (for teacher) on exploring this example:

These explorations are harder in real life: