Statistics 860 - Estimation of Functions from Data
a.k.a. statistical machine learning.

T Th 4:00-5:15 Fall 2016,
Room 133 SMI (MED SC CTR, 1300 University Ave)

Stat 709-10 NOT required.

Grace Wahba, Instructor

Short description:

1. Reproducing Kernel Hilbert Spaces from the point of view of supervised machine learning and statistical model building. Penalized likelihood, support vector machines and related regularization methods. Bayesian connections. The representer theorem.

2. Smoothing splines; thin plate splines; radial basis functions; ANOVA splines.

3. Degrees of freedom for signal and the bias-variance tradeoff, Bayesian confidence intervals; variable and model selection methods. Tuning methods: GCV, GACV, BGACV, unbiased risk, AIC, BIC and their properties, cross validation. Randomized trace estimates for df signal.

4. The LASSO PatternSearch algorithm. The partitioned LPS. Issues regarding prediction vs. sparse variable selection.

5. Regularized Kernel Estimation, Robust Manifold Unfolding, and Distance Correlation - Assimilation of pairwise distance/dissimilarity information into regression, classification, clustering and variable selection algorithms with attribute and other information. The Distance Covariance Variable Selection Theorem - open questions in pairwise distance methods and variable selection.

6. Multiple, complex input structures; multivariate correlated Bernoulli outcomes; soft classification; multicategory support vector machines.

7. Applications in risk factor analysis in medical data analysis with genetic, pedigree, covariate and other sources of information. Applications in data mining and machine learning. *Selected recent additions to the supervised machine learning literature.*
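As a toy illustration of two of the topics above (the representer theorem from topic 1 and GCV tuning from topic 3), here is a minimal sketch of kernel ridge regression with the smoothing parameter chosen by generalized cross validation. This is not course material: the Gaussian kernel, its width, the simulated data, and the lambda grid are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

def gram(x, s=0.1):
    # Gaussian (RBF) kernel Gram matrix; width s is an arbitrary choice.
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / s) ** 2)

K = gram(x)

def gcv(lam):
    # Influence ("hat") matrix A(lambda) = K (K + n*lam*I)^{-1};
    # GCV score V(lambda) = (1/n)||(I - A)y||^2 / [(1/n) tr(I - A)]^2.
    A = K @ np.linalg.solve(K + n * lam * np.eye(n), np.eye(n))
    resid = y - A @ y
    return np.mean(resid**2) / np.mean(np.diag(np.eye(n) - A)) ** 2

# Pick lambda on a crude grid (a real implementation would minimize V
# more carefully).
lams = 10.0 ** np.arange(-5, 1)
best = min(lams, key=gcv)

# Representer theorem: the minimizer is f(.) = sum_i alpha_i K(x_i, .),
# with alpha solving (K + n*lambda*I) alpha = y.
alpha = np.linalg.solve(K + n * best * np.eye(n), y)
fhat = K @ alpha
```

The representer theorem is what makes this finite-dimensional: although the penalized problem is posed over an infinite-dimensional RKHS, the solution lies in the span of the n kernel sections, so only the n coefficients alpha need to be computed.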

Prerequisites - Statistics Majors: multivariate
analysis, or some exposure to Hilbert spaces, or consent of instructor.
Those unfamiliar with Hilbert spaces will be asked to read the
first 33 pages of Akhiezer and Glazman, Theory of Linear
Operators in Hilbert Spaces, vol. I
`here`
at the beginning of the
course. Graduate students in Biostatistics, CS, AOS and other
physical sciences, engineering,
economics, animal science, political science, social science
and business
may find some of the techniques
studied here useful and are welcome to sit in, or, take
the course for credit if they have exposure to linear algebra,
sufficient math background to read Akhiezer and Glazman,
and are familiar with the basic properties of the multivariate
normal distribution, as found, e.g., in Anderson, Multivariate
Analysis, or Wilks, Mathematical Statistics. Otherwise, the
development will be self-contained. If in doubt, please
contact the instructor by e-mail (wahba@stat.wisc.edu)
or come to the first class. This will be a seminar-type course.
There will be no sit-down exams. Students taking the course
for credit will be expected to do several small computer projects
studying the behavior of some of the methods
discussed on simulated or experimental data, and
one or two projects in an area of application of their
choice with a possible project being the presentation of
a lecture in class on a recent paper or recent research.
Text: Wahba, Spline Models for
Observational Data, SIAM (1990), as well as selected papers,
including some from recent conferences (e.g., NIPS, ICML, JSM).
NOTE: Online version of "Spline Models" is available through the
SIAM e-books at the university library. Search "Spline" in title through:
`here`

April 18, 2015