Mean = x-bar = sum x_i / n Variance = s^2 = sum (x_i - x-bar)^2 / (n-1)As written, computation of the variance requires two passes through the data, one to sum the data and compute the mean, followed by a second pass to find the sum of the squared deviations from the mean and the variance.
s^2 = ( sum (x_i)^2 - (sum x_i)^2 / n ) / (n-1)However, this algorithm gives incorrect answers for ill-conditioned data. For example, try using the statistics functions on your calculator to find the standard deviation of the numbers 10,000,001, 10,000,003, and 10,000,005. Exact computation gives the answer as 2 (the same as the standard deviation of 1, 3, and 5), but your calculator probably returns the number 0. The desk calculator algorithm will have problems whenever the coefficient of variation (s / x-bar) is quite small.
Computers do not store the values of real numbers exactly. The value called the machine EPSILON is the smallest number for which 1 + EPSILON is greater than 1 on the computer. If the square of the coefficient of variation is smaller than the machine EPSILON, the desk calculator algorithm will fail.
m_k = mean of first k numbers s_k = sum (x_i - m_k)^2, where sum is over first k numbersBy simplifying the expression for m_k - m_{k-1} and s_k - s_{k-1}, we arrive at these difference equations.
m_k = m_{k-1} + (x_k - m_{k-1}) / k s_k = s_{k-1} + (x_k - m_{k-1}) * (x_k - m_k)(See Homework #5, Problem 1 for a related question.) An algorithm built upon these difference equations can handle many data sets for which the desk calculator algorithm fails at a price of a small increase in storage and computation.
s^2 = ( sum (x_i - x_1)^2 - ( sum (x_i - x_1) )^2 / n ) / (n-1)This method also avoids many numerical difficulties because the first number is often close in size to most of the numbers in the data set.
Bret Larget, larget@mathcs.duq.edu