Grace Wahba's Buehler-Martin Talks at the University of Minnesota

Talk 1. How to smooth curves and surfaces: an introduction to smoothing splines and related methods for statistical model building.

We begin by describing the popular cubic smoothing spline as a tool for fitting smooth curves to noisy data, and go on to describe generalizations of the variational problem it solves. These generalizations can be used to build flexible models for discrete, noisy observational data in fields as diverse as demographic studies of risk factors for heart attacks and diabetic retinopathy, the extraction of patterns from global historical surface temperature records that may be relevant to global warming, and the tuning of imperfect dynamical-system models of the atmosphere and ocean, among other applications. Publicly available software will be noted. A bibliography is here.
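As a minimal sketch of the cubic smoothing spline (for illustration only, not the public software mentioned above), assume the x_i have been rescaled to [0, 1] and take the variational problem to be: find f minimizing (1/n) sum_i (y_i - f(x_i))^2 + lambda * int_0^1 f''(x)^2 dx. Its minimizer can be written as f(x) = d0 + d1*x + sum_i c_i R(x, x_i), where R is the reproducing kernel R(s, t) = int_0^1 (s - u)_+ (t - u)_+ du, so the whole fit reduces to one linear system. The function names are hypothetical, and the dense O(n^3) solve is only suitable for small n.

```python
import numpy as np


def cubic_spline_kernel(s, t):
    # Reproducing kernel R(s, t) = int_0^1 (s - u)_+ (t - u)_+ du of the space of
    # functions on [0, 1] with f(0) = f'(0) = 0 and squared norm int_0^1 f''(u)^2 du.
    lo, hi = np.minimum(s, t), np.maximum(s, t)
    return lo**2 * hi / 2.0 - lo**3 / 6.0


def _system(x, lam):
    # Assemble the (n+2) x (n+2) linear system
    #   [Sigma + n*lam*I   T] [c]   [y]
    #   [T'                0] [d] = [0]
    # whose solution gives f(x) = d0 + d1*x + sum_i c_i R(x, x_i).
    n = len(x)
    Sigma = cubic_spline_kernel(x[:, None], x[None, :])
    T = np.column_stack([np.ones(n), x])  # basis of the unpenalized null space {1, x}
    M = np.zeros((n + 2, n + 2))
    M[:n, :n] = Sigma + n * lam * np.eye(n)
    M[:n, n:] = T
    M[n:, :n] = T.T
    return M, Sigma, T


def fit_smoothing_spline(x, y, lam):
    """Return the fitted values f(x_i) and the coefficients (c, d)."""
    n = len(x)
    M, Sigma, T = _system(x, lam)
    sol = np.linalg.solve(M, np.concatenate([y, np.zeros(2)]))
    c, d = sol[:n], sol[n:]
    return Sigma @ c + T @ d, c, d


def influence_matrix(x, lam):
    """Matrix A(lam) with fitted values = A(lam) @ y (identity columns used as data)."""
    n = len(x)
    M, Sigma, T = _system(x, lam)
    sol = np.linalg.solve(M, np.vstack([np.eye(n), np.zeros((2, n))]))
    return Sigma @ sol[:n] + T @ sol[n:]


# Usage on synthetic data (x already scaled to [0, 1]).
rng = np.random.default_rng(0)
n = 100
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(n)
fhat, c, d = fit_smoothing_spline(x, y, lam=1e-5)
A = influence_matrix(x, lam=1e-5)
print("trace of A(lam):", np.trace(A))
```

The trace of A(lambda) is the degrees of freedom for signal taken up in Talk 2: it moves from n (interpolation) toward 2 (a straight-line fit in the null space {1, x}) as lambda grows.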

Talk 2. What is the degrees of freedom for signal, why do you want to know it, and how do you compute it?

We first review elementary definitions and applications of degrees of freedom for signal (df sig) in linear regression and smoothing problems with observations contaminated by Gaussian noise. The crucial role of df sig in model tuning will be noted, as will its relation to cross validation. Randomized trace methods for computing df sig will be discussed, particularly for very large data sets whose solutions are computed iteratively. Examples of the use of the RanTrace df sig in climate analysis and numerical weather prediction will be given. A different randomized method for computing df sig, proposed by J. Ye, will be described, and its use in model selection (CART and PP) will be noted. Turning to non-Gaussian data, it is necessary to reconsider the definition of df sig. A more general definition, given by Ye, is shown to be appropriate for model selection and fitting purposes if the loss function is the Kullback-Leibler (KL) distance. However, this definition generally involves the quantities one is trying to estimate. Two approximate methods for computing good estimates of df sig in the Bernoulli case, Gu's U and the GACV, will be described. More info here.
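For a linear smoother yhat = A(lambda) y with Gaussian noise, df sig is tr A(lambda), and it enters generalized cross validation (GCV) as GCV(lambda) = (1/n)||(I - A(lambda)) y||^2 / [1 - tr A(lambda)/n]^2. The sketch below assumes ridge regression as a stand-in smoother so that the exact trace is available for comparison. It shows a randomized trace estimator of the Hutchinson/Girard type, which needs only the ability to refit with a random vector as data (the RanTrace method of the abstract is assumed to belong to this family, not reproduced from it), and a perturbation estimator written in the spirit of Ye's method, which also applies to nonlinear fitting procedures. All function names are hypothetical, and the Bernoulli-case methods (Gu's U, GACV) are not covered.

```python
import numpy as np


def ridge_fit(X, y, lam):
    # Stand-in linear smoother: ridge fitted values A(lam) y = X (X'X + n*lam*I)^{-1} X' y.
    n, p = X.shape
    return X @ np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)


def exact_df(X, lam):
    # tr A(lam) = sum_j s_j^2 / (s_j^2 + n*lam), with s_j the singular values of X.
    n = X.shape[0]
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s**2 / (s**2 + n * lam)))


def randomized_trace(fit, n, n_probes, rng):
    # For z with i.i.d. +/-1 entries, E[z' A z] = tr(A). Only "refit with z as the
    # data" is needed, so A itself is never formed.
    total = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        total += z @ fit(z)
    return total / n_probes


def gcv_score(y, yhat, df):
    # GCV = (1/n)||y - yhat||^2 / (1 - df/n)^2, where df may be a randomized estimate.
    n = len(y)
    return float(np.mean((y - yhat) ** 2) / (1.0 - df / n) ** 2)


def perturbation_df(fit, y, tau, n_reps, rng):
    # Estimate sum_i d yhat_i / d y_i: perturb y with N(0, tau^2 I) noise, refit,
    # regress each yhat_i on its own perturbation delta_i, and sum the slopes.
    deltas = tau * rng.standard_normal((n_reps, len(y)))
    fits = np.array([fit(y + delta) for delta in deltas])
    dc = deltas - deltas.mean(axis=0)
    fc = fits - fits.mean(axis=0)
    return float(np.sum(np.sum(dc * fc, axis=0) / np.sum(dc * dc, axis=0)))


rng = np.random.default_rng(1)
n, p, lam = 200, 10, 1e-2
X = rng.standard_normal((n, p))
y = X @ rng.standard_normal(p) + rng.standard_normal(n)


def fit(v):
    return ridge_fit(X, v, lam)


df_exact = exact_df(X, lam)
df_rand = randomized_trace(fit, n, n_probes=2000, rng=rng)
df_pert = perturbation_df(fit, y, tau=0.5, n_reps=2000, rng=rng)
print(df_exact, df_rand, df_pert, gcv_score(y, fit(y), df_rand))
```

Both estimators cost one refit per probe, which is what makes them usable when the fit is computed by an iterative solver and A(lambda) is never available explicitly.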

Talk 3. The nitty gritty of smoothing spline ANOVA models and the role of reproducing kernel spaces.

The smoothing spline ANOVA model paradigm will be described, and the role of reproducing kernel spaces in fitting these models will be discussed. Further details of the applications mentioned in earlier lectures, risk factor estimation for demographic medical data sets and climate data analysis, will be described. Problems with, and some solutions for, computing with very large data sets will be presented. More info here.
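To make the role of reproducing kernels concrete, here is a minimal sketch for two covariates on [0, 1]^2, assuming the common scaled-Bernoulli-polynomial construction of the cubic spline kernel and keeping only a subset of the tensor-product subspaces: f = mu + f1(x1) + f2(x2) + f12(x1, x2). Each smooth component gets its own kernel, the interaction kernel is the product of the two main-effect kernels, and the weights theta (set by hand here; in practice they would be tuned) combine them into one kernel, after which the fit is the same penalized linear system as in the one-dimensional case. Function names and all numeric settings are hypothetical, and the dense solve is not meant for very large data sets.

```python
import numpy as np


def k1(x):
    return x - 0.5


def k2(x):
    return (k1(x) ** 2 - 1.0 / 12.0) / 2.0


def k4(x):
    k = k1(x)
    return (k**4 - k**2 / 2.0 + 7.0 / 240.0) / 24.0


def r_smooth(s, t):
    # RK of the "smooth" part of the cubic spline space on [0, 1] (functions with
    # int f = int f' = 0), squared norm int f''^2; k2, k4 are scaled Bernoulli polynomials.
    return k2(s) * k2(t) - k4(np.abs(s - t))


def ssanova_kernel(Xa, Xb, theta):
    # Weighted sum of component RKs: f1(x1), f2(x2), and the tensor-product
    # interaction f12, whose RK is the product of the two smooth-part RKs.
    R1 = r_smooth(Xa[:, None, 0], Xb[None, :, 0])
    R2 = r_smooth(Xa[:, None, 1], Xb[None, :, 1])
    return theta[0] * R1 + theta[1] * R2 + theta[2] * R1 * R2


def null_basis(X):
    # Unpenalized terms: constant and the linear main effects k1(x1), k1(x2).
    return np.column_stack([np.ones(len(X)), k1(X[:, 0]), k1(X[:, 1])])


def fit_ssanova(X, y, lam, theta):
    # Solve [Sigma + n*lam*I, T; T', 0] [c; d] = [y; 0].
    n = len(y)
    Sigma, T = ssanova_kernel(X, X, theta), null_basis(X)
    m = T.shape[1]
    M = np.block([[Sigma + n * lam * np.eye(n), T], [T.T, np.zeros((m, m))]])
    sol = np.linalg.solve(M, np.concatenate([y, np.zeros(m)]))
    return sol[:n], sol[n:]


def predict(Xnew, X, c, d, theta):
    return ssanova_kernel(Xnew, X, theta) @ c + null_basis(Xnew) @ d


def main_effect(t, j, X, c, d, theta):
    # f_j(t) = d_{j+1} k1(t) + theta_j * sum_i c_i R(t, x_ij); integrates to 0 on [0, 1].
    return d[j + 1] * k1(t) + theta[j] * (r_smooth(t[:, None], X[None, :, j]) @ c)


rng = np.random.default_rng(2)
n = 150
X = rng.uniform(0.0, 1.0, (n, 2))
f_true = np.sin(2 * np.pi * X[:, 0]) + (X[:, 1] - 0.5) ** 2
y = f_true + 0.1 * rng.standard_normal(n)
theta = np.array([1.0, 1.0, 1.0])
c, d = fit_ssanova(X, y, lam=1e-5, theta=theta)
yhat = predict(X, X, c, d, theta)
```

Because every component kernel satisfies the side conditions (zero integral over [0, 1]), the fitted main effects can be read off and interpreted separately, which is the point of the ANOVA decomposition.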