Some comments:
The index (go back) links to a figure showing
a collection of cost functions for large margin classifiers
(the "margin" is $yf$).
Some are convex, some are not. The cost functions
for large margin classifiers need only
satisfy a rather weak condition to target the
Bayes rule; see Yi Lin, TR1044r.
The penalized
likelihood cost function $\log(1+e^{-yf})$,
which is the negative log-likelihood when the
data are coded $\pm 1$,
and even quadratic loss (ordinary least
squares regression) have been recognized
by other authors as defining large margin
classifiers, and have been given new names
of the form xxx-vector-machines.
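
To see why both of these costs depend on the data only through the margin $yf$, here is a short worked derivation. It assumes the usual logistic setup, which is not spelled out above: $p = P(y = 1 \mid x)$ and $f$ is the log odds, $f = \log(p/(1-p))$.

```latex
% Assumption: f is the log odds, f = \log\frac{p}{1-p}, with p = P(y = 1 \mid x).
\[
  P(y = 1 \mid x) = \frac{1}{1 + e^{-f}}, \qquad
  P(y = -1 \mid x) = \frac{1}{1 + e^{f}}
  \quad\Longrightarrow\quad
  P(y \mid x) = \frac{1}{1 + e^{-yf}}, \quad y \in \{-1, +1\}.
\]
% Taking the negative logarithm gives the cost as a function of the margin yf.
\[
  -\log P(y \mid x) = \log\bigl(1 + e^{-yf}\bigr).
\]
% Quadratic loss is also a function of the margin, because y^2 = 1.
\[
  (y - f)^2 = y^2\,(1 - yf)^2 = (1 - yf)^2 .
\]
```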
Other large margin classifiers have been
proposed, some of them recognized as
such only later.
In some sense, the hinge function $(1 - yf)_+$ associated
with the SVM is the nearest convex upper
bound to the misclassification counter.
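
As a rough illustration, the sketch below evaluates several cost functions at a few margin values and checks numerically that the hinge function lies on or above the misclassification counter. The function names, the grid of margins, and the convention that a margin of exactly zero counts as a misclassification are assumptions made for this example, not part of the notes.

```python
import math


def misclassification(m):
    """Misclassification counter: 1 if the margin m = y*f is not positive.

    Counting m == 0 as an error is a convention assumed for this sketch.
    """
    return 1.0 if m <= 0 else 0.0


def hinge(m):
    """SVM hinge function (1 - m)_+."""
    return max(0.0, 1.0 - m)


def logistic(m):
    """Negative log-likelihood log(1 + exp(-m)) for data coded +/-1."""
    return math.log1p(math.exp(-m))


def quadratic(m):
    """Quadratic loss (y - f)^2, which equals (1 - m)^2 when y = +/-1."""
    return (1.0 - m) ** 2


# Hypothetical grid of margin values from -2 to 2 in steps of 0.5.
margins = [k / 2.0 for k in range(-4, 5)]

print(f"{'margin':>7} {'0-1':>6} {'hinge':>6} {'logistic':>9} {'quadratic':>10}")
for m in margins:
    print(f"{m:7.1f} {misclassification(m):6.1f} {hinge(m):6.2f} "
          f"{logistic(m):9.3f} {quadratic(m):10.2f}")

# On this grid the hinge never falls below the misclassification counter.
hinge_is_upper_bound = all(hinge(m) >= misclassification(m) for m in margins)
print("hinge >= 0-1 on the grid:", hinge_is_upper_bound)
```

All four costs are functions of the margin alone, and the hinge, logistic, and quadratic costs are convex in it. A check on a finite grid only illustrates the upper-bound property; it does not prove it.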