Some comments: 

The index (go back) links to a figure showing 
a collection of cost functions for large margin classifiers
(the "margin" is $yf$).
Some are convex, some are not. The cost functions 
for large margin classifiers need only 
satisfy a rather weak condition to target the 
Bayes rule; see Yi Lin, TR1044r.  
The penalized 
likelihood cost function $\log(1+e^{-yf})$,
which is the negative log likelihood when the 
data are coded $\pm 1$, 
and even quadratic loss (ordinary least 
squares regression), have been recognized 
by other authors as large margin 
classifiers and given new names by them 
of the form xxx-vector-machines. 
Other large margin classifiers have been 
proposed, some of them recognized as 
large margin classifiers only later. 
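
As a short check of why quadratic loss counts as a 
function of the margin: assuming the labels are coded 
$y \in \{-1, +1\}$, so that $y^2 = 1$, one can write

$$ (y - f)^2 = y^2 (y - f)^2 = \bigl(y(y - f)\bigr)^2 = (1 - yf)^2 . $$

So the squared error depends on $y$ and $f$ only 
through $yf$, just as $\log(1+e^{-yf})$ does.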

In some sense, the hinge function associated
with the SVM is the nearest convex upper 
bound to the misclassification counter.
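
The cost functions mentioned above can be compared 
numerically as functions of the margin $yf$. The following 
is a minimal sketch, not taken from the figure: the function 
names are illustrative, the hinge is written in its usual form 
$\max(0, 1-yf)$, and the misclassification counter is assumed 
to count a margin of exactly zero as an error. It checks on a 
grid that the hinge lies on or above the counter and satisfies 
midpoint convexity there.

```python
import math

# Each cost is written as a function of the margin m = y*f.

def misclassification(m):
    # 0-1 counter; counting m == 0 as an error is an assumption.
    return 1.0 if m <= 0 else 0.0

def hinge(m):
    # Usual SVM hinge function.
    return max(0.0, 1.0 - m)

def logistic(m):
    # log(1 + exp(-m)), the penalized likelihood cost.
    return math.log1p(math.exp(-m))

def quadratic(m):
    # (y - f)^2 rewritten as (1 - yf)^2 for y coded +/-1.
    return (1.0 - m) ** 2

# Illustrative grid of margins from -3 to 3 in steps of 0.5.
margins = [k / 2.0 for k in range(-6, 7)]

for m in margins:
    print(f"{m:5.1f}  {misclassification(m):4.1f}  {hinge(m):5.2f}  "
          f"{logistic(m):7.4f}  {quadratic(m):6.2f}")

# The hinge is an upper bound to the counter at every grid point.
hinge_bounds_counter = all(hinge(m) >= misclassification(m) for m in margins)

# Midpoint convexity of the hinge on all pairs of grid points.
hinge_midpoint_convex = all(
    hinge((a + b) / 2.0) <= (hinge(a) + hinge(b)) / 2.0
    for a in margins for b in margins
)

print(hinge_bounds_counter, hinge_midpoint_convex)
```

A grid check of this kind only illustrates the upper-bound 
and convexity properties; it does not establish in what 
sense the hinge is the nearest such bound.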