The bootstrap is a general technique for assessing uncertainty in estimation procedures in which computer simulation, through resampling of the data, replaces mathematical analysis. We will focus on using the bootstrap to attach a standard error to an estimated parameter, although the bootstrap can be applied to many other problems.
Suppose that you are interested in estimating the mean of an unknown population on the basis of randomly sampled data. The sample mean is the natural estimate, but we also wish to assess the amount of uncertainty in this estimate. A measure of this uncertainty is the standard error, the standard deviation of the sampling distribution of the sample mean. Theory tells us that the standard error of the sample mean equals the population standard deviation divided by the square root of the sample size. When the population standard deviation is unknown, as is usually the case, we may use the sample standard deviation in its place. For normally distributed populations, or for large samples from nonnormal populations, we may also conclude that the sampling distribution is approximately normal, which allows the computation of confidence intervals.
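As a small illustration of the formula just described, here is a sketch in R syntax (which should carry over to S-PLUS with little change). The data in x are simulated purely for illustration, and the names se.mean and ci are our own, not part of any library.

```r
# Hypothetical data, simulated only so that the example runs on its own
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)

n <- length(x)
xbar <- mean(x)

# Estimated standard error of the sample mean:
# sample standard deviation divided by the square root of the sample size
se.mean <- sqrt(var(x)) / sqrt(n)

# Approximate 95% confidence interval, relying on the normal approximation
ci <- xbar + c(-1, 1) * qnorm(0.975) * se.mean
```

With only the sample standard deviation available and a small sample, a t quantile (qt) in place of qnorm would be the more cautious choice.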
Consider now the problem of estimating the population median. Again, the sample median is a natural estimate. Now, however, you may not be aware of a nice formula for finding the standard error. Simulation will allow us to estimate this number without great mathematical overhead.
In principle, the ideal way to estimate the standard error of the sample median would be to take a very large number of samples of the original size from the population, compute the sample median of each, and use the standard deviation of this large collection of simulated sample medians as an estimate of the true standard error. Unfortunately, we do not have the ability to sample repeatedly from the population. We can, however, sample repeatedly from our original sample, which is itself an estimate of the population. This is how the bootstrap works.
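To make the "ideal" procedure concrete, the following sketch pretends that we can sample from the population at will. The exponential population, the sample size of 25, and the 1000 repetitions are all assumptions chosen only for illustration; in a real problem the population is unknown and this simulation is not available to us.

```r
set.seed(2)
n <- 25        # size of each sample (assumed)
nsim <- 1000   # number of samples drawn from the population (assumed)

# Draw repeated samples from a hypothetical exponential population
# and record the sample median of each one
sim.medians <- numeric(nsim)
for (i in 1:nsim) {
  sim.medians[i] <- median(rexp(n, rate = 1))
}

# The standard deviation of the simulated medians estimates
# the standard error of the sample median
se.ideal <- sqrt(var(sim.medians))
```

The bootstrap keeps this recipe but replaces the draws from the population with draws, with replacement, from the observed sample.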
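A single bootstrap sample is a sample of the original size drawn with replacement from the data. The following S-PLUS command computes the median of one bootstrap sample from the data in x:
> median.boot <- median(sample(x,replace=T))
Putting this into a loop allows you to implement the bootstrap.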
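One way such a loop might look is sketched below in R syntax. The data in x are simulated only so that the example is self-contained, and the choice of 200 bootstrap samples is an assumption rather than a rule.

```r
set.seed(3)
x <- rexp(25, rate = 1)   # hypothetical data, for illustration only
nboot <- 200              # number of bootstrap samples (assumed)

# Each pass draws one bootstrap sample (same size as x, with replacement)
# and stores its median
median.boot <- numeric(nboot)
for (i in 1:nboot) {
  median.boot[i] <- median(sample(x, replace = TRUE))
}

# Bootstrap estimate of the standard error of the sample median
se.boot <- sqrt(var(median.boot))
```

Because the sample size here is odd, each bootstrap median is one of the original data values, so the collection of bootstrap medians takes only a limited number of distinct values.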
Here is a more efficient piece of code, from page 399 of An Introduction to the Bootstrap by Efron and Tibshirani, that implements the bootstrap for a general single-sample problem.
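> bootstrap <- function(x,nboot,theta,...){
+ # each row of data holds one bootstrap sample of the same size as x
+ data <- matrix(sample(x,size=length(x)*nboot,replace=T),nrow=nboot)
+ # apply the statistic theta to each row; ... passes extra arguments to theta
+ return(apply(data,1,theta,...))
+ }
To use this code to obtain the sample medians of 200 bootstrap samples from array x: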
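> out <- bootstrap(x,200,median)
The standard deviation of the values in out is then the bootstrap estimate of the standard error of the sample median. If no existing function computes the statistic you are interested in, you need to write an S-PLUS function that computes the estimate you want from the sampled data.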
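As an example of supplying your own statistic, the sketch below bootstraps the coefficient of variation. It is written in R syntax and repeats the bootstrap function so that it runs on its own; the function name cv and the simulated data are hypothetical, introduced here only for illustration.

```r
# The bootstrap function from above, repeated so the example is self-contained
bootstrap <- function(x, nboot, theta, ...) {
  data <- matrix(sample(x, size = length(x) * nboot, replace = TRUE),
                 nrow = nboot)
  apply(data, 1, theta, ...)
}

# User-written statistic: the coefficient of variation (hypothetical name cv)
cv <- function(y) sqrt(var(y)) / mean(y)

set.seed(4)
x <- rexp(25, rate = 1)   # hypothetical data, for illustration only

# Bootstrap the user-written statistic and estimate its standard error
out.cv <- bootstrap(x, 200, cv)
se.cv <- sqrt(var(out.cv))

# Extra arguments after theta are passed on to the statistic,
# here a 10% trimmed mean
out.trim <- bootstrap(x, 200, mean, trim = 0.1)
```

Any function that takes a vector of data as its first argument and returns a single number can be used in place of cv.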
Bret Larget, larget@mathcs.duq.edu