Plot the entropy of a coin flip (Bernoulli trial) with probability p = P(heads) = P(success).
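A minimal sketch of that plot (assuming numpy and matplotlib), using the binary entropy H(p) = -p log2(p) - (1-p) log2(1-p):

```python
import numpy as np
import matplotlib.pyplot as plt

# Binary entropy H(p) = -p*log2(p) - (1-p)*log2(1-p); stay off 0 and 1 to avoid log2(0)
p = np.linspace(0.001, 0.999, 500)
H = -p * np.log2(p) - (1 - p) * np.log2(1 - p)

plt.plot(p, H)
plt.xlabel("p = P(heads)")
plt.ylabel("entropy (bits)")
plt.title("Entropy of a coin flip")
plt.show()
```

The curve peaks at 1 bit when p = 1/2 (the most uncertain coin) and drops to 0 at p = 0 and p = 1.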

Small example to start by hand:
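For instance (an illustrative set, not necessarily the one worked in class): a set S with 3 positive and 1 negative example has entropy H(S) = -(3/4) log2(3/4) - (1/4) log2(1/4) ≈ 0.31 + 0.50 = 0.81 bits.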

Students: you do not have to learn the following code, which implements the ID3 algorithm, but I think it may be helpful in understanding how ID3 works.

Define functions H(S) and H_weighted(S_minus, S_plus) that compute the entropy of an example set and the size-weighted entropy of a two-way split.
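One possible sketch, assuming an example set is represented as a pair S = (X, y) as it is later in this section (the version used in class may differ):

```python
import numpy as np

def H(S):
    """Entropy (in bits) of the class labels in the example set S = (X, y)."""
    _, y = S
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def H_weighted(S_minus, S_plus):
    """Entropy of a two-way split: each side's entropy weighted by its share of the examples."""
    n_minus, n_plus = len(S_minus[1]), len(S_plus[1])
    n = n_minus + n_plus
    return (n_minus / n) * H(S_minus) + (n_plus / n) * H(S_plus)
```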

Define function split(X, y, feature_names) that gives the best feature to split on, the best threshold, and the minimum entropy of that split.
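A sketch of split(), built on the H_weighted() above; it assumes numeric features and tries midpoint thresholds between consecutive distinct values (an assumption about how candidate thresholds are chosen):

```python
def split(X, y, feature_names):
    """Return (best_feature, best_threshold, min_entropy) over all features and thresholds."""
    best_feature, best_threshold, best_h = None, None, np.inf
    for j, name in enumerate(feature_names):
        values = np.unique(X[:, j])
        # Candidate thresholds: midpoints between consecutive distinct values of feature j
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            h = H_weighted((X[left], y[left]), (X[~left], y[~left]))
            if h < best_h:
                best_feature, best_threshold, best_h = name, t, h
    return best_feature, best_threshold, best_h
```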

Define recursive function decision_tree(X, y, feature_names, depth=0, debug=False) that just returns on a zero-entropy set S = (X, y) of examples, but otherwise calls split() to get the best split and then recursively calls decision_tree() on each of the left and right splits.
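A sketch of that recursion, returning the tree as a nested dict (a representation I am assuming; the class version may print the tree instead):

```python
def decision_tree(X, y, feature_names, depth=0, debug=False):
    """Stop on a zero-entropy (pure) set; otherwise split and recurse on each half."""
    if H((X, y)) == 0:
        if debug:
            print("  " * depth + f"leaf: {y[0]}")
        return {"label": y[0]}
    feature, threshold, h = split(X, y, feature_names)
    if debug:
        print("  " * depth + f"split on {feature} <= {threshold} (entropy {h:.3f})")
    j = list(feature_names).index(feature)
    left = X[:, j] <= threshold
    return {
        "feature": feature,
        "threshold": threshold,
        "left": decision_tree(X[left], y[left], feature_names, depth + 1, debug),
        "right": decision_tree(X[~left], y[~left], feature_names, depth + 1, debug),
    }
```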

Make a tree from all 32 rows of mtcars (with scikit-learn again):
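A sketch of the scikit-learn version. I am assuming mtcars is loaded through statsmodels' R-dataset helper (which downloads it from the Rdatasets repository) and that the class label is am (automatic vs. manual transmission); the target used in class may differ.

```python
import statsmodels.api as sm
import matplotlib.pyplot as plt
from sklearn import tree

mtcars = sm.datasets.get_rdataset("mtcars", "datasets").data  # all 32 rows

# Assumed target: am (0 = automatic, 1 = manual); all other columns are features
X = mtcars.drop(columns="am")
y = mtcars["am"]

clf = tree.DecisionTreeClassifier(criterion="entropy")  # entropy splits, like ID3
clf.fit(X, y)

tree.plot_tree(clf, feature_names=list(X.columns),
               class_names=["automatic", "manual"], filled=True)
plt.show()
```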