Bootstrapping is a statistical technique that falls under the broader heading of resampling. Bootstrapping can be used in the estimation of nearly any statistic. It involves a relatively simple procedure, but repeated so many times that bootstrap techniques are heavily dependent upon computer calculations.
An Explanation of Bootstrapping
One goal of inferential statistics is to determine the value of a parameter of a population. It is typically too expensive or even impossible to measure this directly. So we use statistical sampling. We sample a population, measure a statistic of this sample, and then use this statistic to say something about the corresponding parameter of the population.
For example, in a chocolate factory, we might want to guarantee that candy bars have a particular mean weight. It’s not feasible to weigh every candy bar that is produced, so we use sampling techniques to randomly choose 100 candy bars. We calculate the mean of these 100 candy bars, and say that the population mean falls within a margin of error from what the mean of our sample is.
Suppose that a few months later we want to know with greater accuracy - or less of a margin of error - what the mean candy bar weight was on the day that we sampled the production line. We cannot use today’s candy bars, as too many variables have entered the picture (different batches of milk, sugar and cocoa beans, different atmospheric conditions, different employees on the line, etc.). All that we have from the day that we are curious about are the 100 weights. Without a time machine back to that day, it would seem that the initial margin of error is the best that we can hope for.
Fortunately we can use the technique of bootstrapping. In this situation we randomly sample with replacement from the 100 known weights. We then call this a bootstrap sample. Since we allow for replacement, this bootstrap sample most likely not identical to our initial sample. Some data points may be duplicated, and others data points from the initial 100 may be omitted in a bootstrap sample. With the help of a computer, thousands of bootstrap samples can be constructed in a relatively short time.
As mentioned, to truly use bootstrap techniques we need to use a computer. The following numerical example will help to demonstrate how the process works. If we begin with the sample 2, 4, 5, 6, 6, then all of the following are possible bootstrap samples:
- 2 ,5, 5, 6, 6
- 4, 5, 6, 6, 6
- 2, 2, 4, 5, 5
- 2, 2, 2, 4, 6
- 2, 2, 2, 2, 2
- 4,6, 6, 6, 6
Bootstrap techniques are relatively new to the field of statistics. The first use was published in a 1979 paper by Bradley Efron. As computing power has increased and become less expensive, bootstrap techniques have become more widespread.
Why the Name Bootstrapping?
The name “bootstrapping” comes from the phrase, “To lift himself up by his bootstraps.” This refers to something that is preposterous and impossible. Try as hard as you can, you cannot lift yourself into the air by tugging at pieces of leather on your boots.
There is some mathematical theory that justifies bootstrapping techniques. However, the use of bootstrapping does feel like you are doing the impossible. Although it does not seem like you would be able to improve upon the estimate of a population statistic by reusing the same sample over and over again, bootstrapping can in fact do this.