Ensemble learning

Classification by optical recognition of handwritten digits

The digits dataset has 1797 labeled images of handwritten digits, each an 8x8 grayscale image.

First we explore the data.

We can see from digits.target that the first three rows are repeats of 0 through 9, but the fourth row seems to be random digits:
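A minimal sketch of this exploration step, assuming the data come from scikit-learn's `load_digits()` and that the "rows" above are rows of ten labels (the reshape to 4 x 10 is my assumption, not necessarily how the original cell displayed them):

```python
from sklearn.datasets import load_digits

digits = load_digits()

# Each image is stored both flattened (64 features) and as an 8x8 array.
print(digits.data.shape)
print(digits.images.shape)

# Show the first 40 labels as four rows of ten (assumed layout).
print(digits.target[:40].reshape(4, 10))
```

Plotting a few entries of `digits.images` with `matplotlib.pyplot.imshow` is a natural next step for checking that labels and images agree.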

Here we split the digits data into training, validation, and test sets:
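One way to get the three sets is to call `train_test_split()` twice. This is only a sketch: the 60/20/20 proportions, `random_state=0`, and stratification are assumptions, not necessarily the values used in this notebook.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)

# First hold out 20% of the data as the test set (assumed proportion).
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Then split the remainder 75/25 into training and validation sets,
# which gives roughly 60/20/20 overall.
X_train, X_valid, y_train, y_valid = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=0, stratify=y_tmp)

print(len(X_train), len(X_valid), len(X_test))
```

The validation set is for comparing models and hyperparameters; the test set should be touched only once, at the end.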

Here we try ensemble learning.

We will try each of bagging, random forest, and gradient boosting to see whether they improve performance over a basic decision tree.
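The comparison can be sketched as below. The estimator settings are assumptions (library defaults apart from `learning_rate=1.0` for gradient boosting, which matches the value discussed later), and the single train/validation split here is simplified, so the scores will not exactly match the ones reported in this notebook.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# One baseline tree and three ensembles built on decision trees.
models = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),  # bags decision trees by default
    "RandomForest": RandomForestClassifier(random_state=0),
    "GradientBoosting": GradientBoostingClassifier(learning_rate=1.0,
                                                   random_state=0),
}

# Fit each model on the training set and record validation accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_valid, y_valid)
    print(f"{name:>16}: {scores[name]:.3f}")
```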

With very little effort, Bagging and RandomForest improved upon the basic DecisionTree. Hyperparameter tuning (via GridSearchCV() or RandomizedSearchCV()) might help further.
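As a rough illustration of what such tuning might look like, here is a small `GridSearchCV()` over a random forest. The grid values and `cv=3` are arbitrary choices of mine, kept tiny so the search runs quickly; a real search would cover more values.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Hypothetical grid: two forest sizes and two feature-sampling settings.
param_grid = {"n_estimators": [50, 200], "max_features": ["sqrt", 0.5]}

# 3-fold cross-validation on the training data only; the test set stays unseen.
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X_train, y_train)

print(search.best_params_, round(search.best_score_, 3))
```

`RandomizedSearchCV()` takes the same estimator but samples a fixed number of settings from `param_distributions`, which scales better when the grid is large.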

This run of GradientBoosting is a disaster! The problem, I think (from code experiments in class), is that the learning rate $\alpha = 1.0$ is too large. I may add code later to tune this hyperparameter, find a good value, and show that GradientBoosting can help too.

(In fact $\alpha = 0.25$ yields a score of 0.956, but I will leave the learning_rate=1.0 code here to show the failure with $\alpha=1.0$.)
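A simple way to check the effect of the learning rate is to sweep a few values and score each on the validation set. This is a sketch under assumptions: the candidate rates, the split, and the otherwise default `GradientBoostingClassifier` settings are mine, so the scores need not match the numbers quoted above.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Candidate learning rates (assumed values for illustration).
rates = [1.0, 0.25, 0.1]

# Fit one model per rate and record its validation accuracy.
valid_scores = {}
for rate in rates:
    gb = GradientBoostingClassifier(learning_rate=rate, random_state=0)
    gb.fit(X_train, y_train)
    valid_scores[rate] = gb.score(X_valid, y_valid)
    print(f"learning_rate={rate}: {valid_scores[rate]:.3f}")

best_rate = max(valid_scores, key=valid_scores.get)
print("best learning rate:", best_rate)
```

A smaller learning rate usually needs more boosting stages, so in a fuller search `learning_rate` and `n_estimators` are best tuned together.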