Grace Wahba's Buehler-Martin Talks at UMinnesota
We begin by describing the popular cubic
smoothing spline as a tool for fitting
smooth curves to noisy data, and go on to
describe generalizations of the variational
problem it solves. These generalizations
can be used to build flexible models for
discrete, noisy observational data in such
diverse fields as analysis of demographic
studies of risk factors for heart attacks
and diabetic retinopathy, extracting patterns
from global historical surface temperature
records that may be relevant to global
warming, tuning of imperfect dynamical
systems models of the atmosphere and
ocean, and other applications.
Public software will be noted.
A bibliography is
here.
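As a reminder of the variational problem referred to above, the cubic smoothing spline is usually written as the minimizer of a penalized least squares criterion. The notation here (data $(x_i, y_i)$, $i = 1, \dots, n$, on the interval $[0,1]$, and smoothing parameter $\lambda$) is an assumption chosen for illustration and is not taken from the talk itself:

```latex
\min_{f} \; \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
  \;+\; \lambda \int_0^1 \bigl( f''(x) \bigr)^2 \, dx
```

Larger $\lambda$ pushes the fit toward a straight line; smaller $\lambda$ lets it follow the data more closely.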
We first review elementary definitions
and applications of degrees of
freedom for signal (df sig)
in linear regression and smoothing problems
with observations contaminated by Gaussian
noise. The crucial role of df sig in
model tuning will be noted, as will its
relation to cross validation. Randomized
trace methods for computing
df sig, particularly in the context of
very large data sets with iterative
computational solutions, will be discussed.
Examples of the use of the RanTrace df sig
in climate analysis and numerical weather prediction
will be given.
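The randomized trace idea can be sketched as follows, assuming a linear smoother with fitted values A(lam) y and df sig taken as the trace of A(lam). This is a generic Monte Carlo trace estimate written for illustration, not the RanTrace software itself; the ridge-type smoother, the function names, and the parameters `lam` and `n_probes` are all hypothetical choices. The point is that the estimate needs only the ability to apply the smoother to a vector, which is what an iterative solver provides.

```python
import numpy as np

def smoother_apply(X, lam, v):
    """Apply A(lam) = X (X'X + lam I)^{-1} X' to a vector v without forming A."""
    p = X.shape[1]
    coef = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ v)
    return X @ coef

def randomized_df(X, lam, n_probes=400, seed=0):
    """Estimate trace(A(lam)) by averaging eps' A eps over Gaussian probe vectors."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    total = 0.0
    for _ in range(n_probes):
        eps = rng.standard_normal(n)      # E[eps eps'] = I, so E[eps' A eps] = trace(A)
        total += eps @ smoother_apply(X, lam, eps)
    return total / n_probes

def exact_df(X, lam):
    """Exact trace of A(lam), feasible only when A can be formed explicitly."""
    p = X.shape[1]
    A = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    return float(np.trace(A))

# Small synthetic design matrix (illustrative data only).
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 10))
df_exact = exact_df(X, lam=5.0)
df_rand = randomized_df(X, lam=5.0)
print(df_exact, df_rand)
```

On a problem this small the exact trace is available for comparison; for very large data sets only the randomized version is practical, since it never forms the n by n matrix.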
A different randomized method for computing
df sig proposed by J. Ye will be described
and its use in model selection (CART and PP)
will be noted.
Turning to non-Gaussian
data, it is necessary to reconsider
the definition of df sig. A more general
definition, given by Ye,
is shown to be appropriate for model
selection and fitting purposes
if the loss function is Kullback-Leibler (KL) distance.
However, this definition generally
involves the quantities one is trying to
estimate. Two approximate methods for
computing good estimates of df sig in the
Bernoulli case (Gu's U and the GACV)
will be described.
More info
here.
The smoothing spline ANOVA model paradigm will be described and the role of reproducing kernel spaces in fitting these models will be discussed. Further details of the applications in risk factor estimation for demographic medical data sets, and climate data analysis, mentioned in earlier lectures, will be described. Problems and some solutions for computing with very large data sets will be given. More info here.
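For orientation, the functional ANOVA decomposition that underlies a smoothing spline ANOVA model is commonly written as below. The symbols ($d$ predictors $x_1, \dots, x_d$, constant $\mu$, main effects $f_\alpha$, two-factor interactions $f_{\alpha\beta}$) are illustrative notation, not taken from the talk:

```latex
f(x_1, \dots, x_d) = \mu
  + \sum_{\alpha=1}^{d} f_\alpha(x_\alpha)
  + \sum_{\alpha < \beta} f_{\alpha\beta}(x_\alpha, x_\beta)
  + \cdots
```

In practice the series is truncated after the low-order terms, and each retained component is estimated in its own subspace of a reproducing kernel space.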