Some comments: In the index (go back) is a link to a figure with a collection of cost functions for large margin classifiers (the "margin" is yf). Some are convex, some are not. The cost functions for large margin classifiers need only satisfy a rather weak condition to target the Bayes rule; see Yi Lin, TR1044r. The penalized likelihood cost function $\log(1+e^{-yf})$, which is the negative log likelihood when the data are coded $\pm 1$, and even quadratic loss (ordinary least squares regression), have been recognized by other authors as giving large margin classifiers and have been given new names by them of the form xxx-vector-machines. Other large margin classifiers have been proposed, some of them only later actually recognized as large margin classifiers. In some sense, the hinge function associated with the SVM is the nearest convex upper bound to the misclassification counter.
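As a rough illustration of the cost functions mentioned above, the following sketch tabulates four of them as functions of the margin yf. The function names are hypothetical, and a few conventions are assumptions rather than anything fixed by the text: the misclassification counter is taken to be 1 when the margin is zero or negative, the hinge is written as $(1-yf)_+$, and quadratic loss is written as $(1-yf)^2$, which equals $(y-f)^2$ when y is coded $\pm 1$.

```python
import math


def misclassification(margin):
    """Misclassification counter: 1 if the margin yf is <= 0, else 0.

    Counting a margin of exactly 0 as an error is an assumption made here.
    """
    return 1.0 if margin <= 0 else 0.0


def hinge(margin):
    """Hinge function (1 - yf)_+ associated with the SVM."""
    return max(0.0, 1.0 - margin)


def logistic(margin):
    """log(1 + e^{-yf}), written to avoid overflow for very negative margins."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def quadratic(margin):
    """Quadratic loss (1 - yf)^2, equal to (y - f)^2 when y is +1 or -1."""
    return (1.0 - margin) ** 2


# Margins from -2 to 2 in steps of 0.25.
margins = [k / 4.0 for k in range(-8, 9)]

print(f"{'margin':>7} {'0-1':>6} {'hinge':>7} {'logistic':>9} {'quadratic':>10}")
for m in margins:
    print(
        f"{m:7.2f} {misclassification(m):6.1f} {hinge(m):7.3f} "
        f"{logistic(m):9.3f} {quadratic(m):10.3f}"
    )

# On this grid the hinge never falls below the misclassification counter.
hinge_dominates = all(hinge(m) >= misclassification(m) for m in margins)
print("hinge >= misclassification counter on the grid:", hinge_dominates)
```

The final check only confirms numerically, on a finite grid, that the hinge lies on or above the misclassification counter; it does not by itself establish the "nearest convex upper bound" statement. Note also that $\log(1+e^{-yf})$ with the natural logarithm equals $\log 2$ at zero margin, so it is not an upper bound on the counter without rescaling.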