Model Performance Assessment, Hyperparameter Tuning, Cross-Validation

Model Performance Assessment

Confusion Matrix

Precision, Recall, Accuracy
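These metrics can be sketched with scikit-learn's `confusion_matrix` and scoring functions; the labels below are hypothetical, chosen just to make the arithmetic easy to check:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical labels for illustration
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Rows = true class, columns = predicted class: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
print(precision, recall, accuracy)  # 0.75 0.75 0.75
```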

Area under ROC Curve (AUC)

Preface: True Positive Rate and False Positive Rate (computed on the data above)
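Given the counts from a confusion matrix, TPR and FPR are simple ratios; the counts below are hypothetical:

```python
# Hypothetical confusion-matrix counts
tp, fn, fp, tn = 3, 1, 1, 3

tpr = tp / (tp + fn)  # True Positive Rate, a.k.a. recall or sensitivity
fpr = fp / (fp + tn)  # False Positive Rate, i.e. 1 - specificity
print(tpr, fpr)  # 0.75 0.25
```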

(Upgrade scikit-learn?)

I had to upgrade scikit-learn to make RocCurveDisplay.from_estimator() work, below.
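A quick way to check whether an upgrade is needed (`RocCurveDisplay.from_estimator()` was added in scikit-learn 1.0):

```python
import sklearn

# RocCurveDisplay.from_estimator() requires scikit-learn >= 1.0;
# if this prints an older version, upgrade with: pip install -U scikit-learn
print(sklearn.__version__)
```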

The wine data are described here: wine-dataset.

I am not excited about the ROC curve above, as I cannot find a satisfactory TPR/FPR combination.

Let's try another classifier.

The decision tree isn't so bad, as we can have TPR$\approx$0.9 with FPR$\approx$0.1.

Now use RocCurveDisplay.from_predictions() to confirm that we get the same plot that RocCurveDisplay.from_estimator() made.

Now, to build understanding, repeat the last plot, but show that the points on the curve come from roc_curve():

And let's confirm that we can reproduce the calculations done by roc_curve():
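A sketch of redoing the calculation by hand: at each threshold, predict positive when the score is at least the threshold, then count TP and FP. The tiny labels and scores here are hypothetical, chosen so the arithmetic is easy to follow:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Tiny hypothetical example
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# By hand: at each threshold t, predict positive when score >= t
P = (y_true == 1).sum()  # number of actual positives
N = (y_true == 0).sum()  # number of actual negatives
tpr_manual = np.array([((y_score >= t) & (y_true == 1)).sum() / P for t in thresholds])
fpr_manual = np.array([((y_score >= t) & (y_true == 0)).sum() / N for t in thresholds])

print(np.allclose(tpr, tpr_manual), np.allclose(fpr, fpr_manual))  # True True
```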

Cross-Validation

Notice that the training accuracy overestimated the test accuracy, while the cross-validation accuracy is a better estimate of it.
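A sketch of that comparison, assuming a decision tree on the (multi-class) wine data; the actual model and split used aren't shown:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = clf.score(X_train, y_train)  # optimistic: scored on the data it fit
cv_acc = cross_val_score(clf, X_train, y_train, cv=5).mean()  # held-out folds
test_acc = clf.score(X_test, y_test)
print(train_acc, round(cv_acc, 3), round(test_acc, 3))
```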

Hyperparameter Tuning

From the mean_test_score column, it looks like either C value worked perfectly for kernel='rbf'.
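A sketch of the kind of search that produces a mean_test_score column, with a hypothetical grid over kernel and C on the wine data (the actual grid isn't shown):

```python
import pandas as pd
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_wine(return_X_y=True)

# Hypothetical grid; scaling comes first because SVMs are scale-sensitive
param_grid = {"svc__kernel": ["linear", "rbf"], "svc__C": [1, 10]}
pipe = make_pipeline(StandardScaler(), SVC())
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)

# cv_results_ holds one row per parameter combination
results = pd.DataFrame(search.cv_results_)
print(results[["param_svc__kernel", "param_svc__C", "mean_test_score"]])
```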