In lecture, we discussed ordinary least squares (OLS) regression in the setting of simple linear regression, whereby we find \(\beta_0\) and \(\beta_1\) minimizing the sum of squared errors,

\[ \ell(\beta_0,\beta_1) = \sum_{i=1}^n \left(y_i - (\beta_0 + \beta_1 x_i) \right)^2, \]

where \((x_1,y_1), (x_2,y_2),\dots,(x_n,y_n)\) are our observations.

We said in lecture that this loss is minimized by taking \[ \begin{aligned} \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \\ \hat{\beta}_1 &= \frac{ \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) } { \sum_{i=1}^n (x_i - \bar{x})^2 }, \end{aligned} \]

where \(\bar{x}\) and \(\bar{y}\) are the means of the predictors and responses, respectively: \[ \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i ~~~\text{ and }~~~ \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i. \]
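
If you'd like to see these formulas in action before working through the algebra, here is a minimal sketch in Python. The data values are invented purely for illustration; the point is that the closed-form estimates really do minimize the loss, in the sense that nudging either parameter only makes the sum of squared errors larger.

```python
import numpy as np

# A small made-up dataset, purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

def loss(b0, b1):
    """Sum of squared errors for a candidate (beta_0, beta_1)."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

# Closed-form OLS estimates from the formulas above.
x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar
print(beta0_hat, beta1_hat)

# Nudging either estimate in any direction only increases the loss.
base = loss(beta0_hat, beta1_hat)
for db0, db1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    print(loss(beta0_hat + db0, beta1_hat + db1) >= base)  # always True
```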

This short note works through the math behind these formulas.

We want to minimize our loss \(\ell(\beta_0,\beta_1)\) with respect to \(\beta_0\) and \(\beta_1\). Thinking back to calculus, the easiest way to do this is to take derivatives with respect to our parameters \(\beta_0\) and \(\beta_1\), set those derivatives equal to zero, and solve for \(\beta_0\) and \(\beta_1\). Because the sum of squared errors is a convex quadratic function of \(\beta_0\) and \(\beta_1\), the critical point we find this way is guaranteed to be the global minimum.
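
Before doing it by hand, it may help to see that the strategy works numerically. Here is a minimal sketch (again with invented data) that minimizes \(\ell\) by plain gradient descent, using the two partial derivatives we are about to compute, and checks that it lands on the closed-form estimates. The learning rate and step count are arbitrary choices that happen to work for this toy dataset.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

def loss(b0, b1):
    """Sum of squared errors for a candidate (beta_0, beta_1)."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

# Plain gradient descent on the loss, using the partial derivatives
# derived below.
b0, b1 = 0.0, 0.0
lr = 0.01
for _ in range(20_000):
    resid = y - (b0 + b1 * x)
    b0 -= lr * np.sum(-2 * resid)
    b1 -= lr * np.sum(-2 * resid * x)

# Closed-form solution for comparison.
x_bar, y_bar = x.mean(), y.mean()
b1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0_hat = y_bar - b1_hat * x_bar

print(b0, b1, loss(b0, b1))              # gradient-descent solution
print(b0_hat, b1_hat, loss(b0_hat, b1_hat))  # closed form (should match)
```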

Okay, let’s try that.

First, let’s take the derivative of \(\ell(\beta_0,\beta_1)\) with respect to \(\beta_0\): \[ \frac{\partial \ell(\beta_0,\beta_1)}{\partial \beta_0} = \sum_{i=1}^n -2 \left(y_i - \beta_0 - \beta_1 x_i \right) . \] Now, let’s set that equal to zero and solve for \(\beta_0\).

\[ \begin{aligned} 0 &= \sum_{i=1}^n -2 \left(y_i - \beta_0 - \beta_1 x_i \right) .\\ \sum_{i=1}^n \left(y_i - \beta_0 - \beta_1 x_i \right) &= 0.\\ \sum_{i=1}^n y_i - \sum_{i=1}^n \beta_0 - \beta_1 \sum_{i=1}^n x_i &= 0 \\ n \beta_0 &= \sum_{i=1}^n y_i - \beta_1 \sum_{i=1}^n x_i \\ \beta_0 &= \bar{y} - \beta_1 \bar{x}. \end{aligned} \]

The first step consists of dividing both sides by \(-2\). The second step follows by breaking the sum up into three separate sums over \(y_i\), \(\beta_0\), and \(\beta_1 x_i\). The third step uses the fact that \(\sum_{i=1}^n \beta_0 = n\beta_0\) and moves the other two sums to the right-hand side of the equation. The final step comes from dividing through by \(n\) and applying our definitions of \(\bar{x}\) and \(\bar{y}\).
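
As a quick sanity check on this relationship (using the same invented numbers as before): for any fixed \(\beta_1\), setting \(\beta_0 = \bar{y} - \beta_1 \bar{x}\) should make the partial derivative with respect to \(\beta_0\) vanish.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

def dloss_db0(b0, b1):
    """Partial derivative of the loss with respect to beta_0."""
    return np.sum(-2 * (y - b0 - b1 * x))

# For several arbitrary values of beta_1, plugging in
# beta_0 = y_bar - beta_1 * x_bar should zero out this derivative.
for b1 in (-1.0, 0.0, 0.7, 3.0):
    b0 = y.mean() - b1 * x.mean()
    print(b1, dloss_db0(b0, b1))  # second column should be ~0
```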

Now, let’s take the derivative with respect to \(\beta_1\). \[ \frac{\partial \ell(\beta_0,\beta_1)}{\partial \beta_1} = \sum_{i=1}^n 2 \left(y_i - \beta_0 - \beta_1 x_i \right)(-x_i ) . \]

Let’s set this derivative equal to zero and solve for \(\beta_1\). \[ \begin{aligned} 0 &= \sum_{i=1}^n 2 \left(y_i - \beta_0 - \beta_1 x_i \right)(-x_i ) \\ \sum_{i=1}^n \left(y_i - \beta_0 - \beta_1 x_i \right) x_i &= 0 \\ \sum_{i=1}^n y_i x_i - \beta_0 \sum_{i=1}^n x_i - \beta_1 \sum_{i=1}^n x_i^2 &= 0 , \end{aligned} \]

where the first step was to divide through by \(-2\) and the second step involved bringing \(x_i\) inside the parentheses and breaking up the sum.
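
If you want to double-check this derivative before using it, a central finite difference on the loss (again with made-up data and an arbitrary evaluation point) should agree with the analytic formula to several decimal places.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

def loss(b0, b1):
    return np.sum((y - (b0 + b1 * x)) ** 2)

def dloss_db1(b0, b1):
    """Analytic partial derivative of the loss with respect to beta_1."""
    return np.sum(2 * (y - b0 - b1 * x) * (-x))

# Compare against a central finite-difference approximation at an
# arbitrary point (beta_0, beta_1) = (0.5, 1.2).
b0, b1, h = 0.5, 1.2, 1e-6
numeric = (loss(b0, b1 + h) - loss(b0, b1 - h)) / (2 * h)
print(dloss_db1(b0, b1), numeric)  # should agree closely
```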

Now, let’s plug in our solution for \(\beta_0\) from above: \[ \begin{aligned} \sum_{i=1}^n y_i x_i - \beta_0 \sum_{i=1}^n x_i - \beta_1 \sum_{i=1}^n x_i^2 &= 0 \\ \sum_{i=1}^n y_i x_i - (\bar{y} - \beta_1 \bar{x}) \sum_{i=1}^n x_i - \beta_1 \sum_{i=1}^n x_i^2 &= 0 \\ \sum_{i=1}^n y_i x_i - \bar{y} \sum_{i=1}^n x_i + \beta_1 \left( \bar{x} \sum_{i=1}^n x_i - \sum_{i=1}^n x_i^2\right) &=0\\ \beta_1 \left( \bar{x} \sum_{i=1}^n x_i - \sum_{i=1}^n x_i^2\right) &= -\left(\sum_{i=1}^n y_i x_i - \bar{y} \sum_{i=1}^n x_i\right) \\ \beta_1 &= \frac{ -\sum_{i=1}^n \left( y_i x_i - \bar{y} x_i \right) } { \bar{x} \sum_{i=1}^n x_i - \sum_{i=1}^n x_i^2 } \\ \beta_1 &= \frac{ \sum_{i=1}^n \left( y_i x_i - \bar{y} x_i \right) } { \sum_{i=1}^n x_i^2 - \bar{x} \sum_{i=1}^n x_i }. \end{aligned} \]
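
The last expression is already a usable formula for \(\beta_1\); a short numerical check (same invented data) confirms it matches the centered form we are about to derive.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
x_bar, y_bar = x.mean(), y.mean()

# beta_1 from the "raw" expression we just derived...
raw = (np.sum(y * x) - y_bar * np.sum(x)) / (np.sum(x ** 2) - x_bar * np.sum(x))

# ...and from the centered formula we are working toward.
centered = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

print(raw, centered)  # identical up to floating-point error
```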

Okay, we’re almost there. Let’s look at the numerator term. Unrolling the sum, and using \(\sum_{i=1}^n x_i = n\bar{x}\) and \(\sum_{i=1}^n y_i = n\bar{y}\) (the key trick, in the third line, is to add and subtract \(n\bar{x}\bar{y}\)), \[ \begin{aligned} \sum_{i=1}^n \left( y_i x_i - \bar{y} x_i \right) &= \sum_{i=1}^n y_i x_i - \bar{y} \sum_{i=1}^n x_i \\ &= \sum_{i=1}^n y_i x_i - n \bar{x} \bar{y} \\ &= \sum_{i=1}^n y_i x_i - n \bar{x} \bar{y} + n \bar{x} \bar{y} - n \bar{x} \bar{y} \\ &= \sum_{i=1}^n y_i x_i - \sum_{i=1}^n x_i \bar{y} - \sum_{i=1}^n y_i \bar{x} + n \bar{y} \bar{x} \\ &= \sum_{i=1}^n \left( y_i x_i - x_i \bar{y} - y_i \bar{x} + \bar{x}\bar{y} \right) \\ &= \sum_{i=1}^n (y_i - \bar{y})(x_i - \bar{x}). \end{aligned} \]
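
If the add-and-subtract step feels suspicious, the identity is easy to spot-check numerically on made-up numbers.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
x_bar, y_bar = x.mean(), y.mean()

lhs = np.sum(y * x - y_bar * x)          # sum of (y_i x_i - y_bar x_i)
rhs = np.sum((y - y_bar) * (x - x_bar))  # sum of (y_i - y_bar)(x_i - x_bar)
print(lhs, rhs)  # equal up to floating-point error
```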

That is, \[ \begin{aligned} \beta_1 &= \frac{ \sum_{i=1}^n \left( y_i x_i - \bar{y} x_i \right) } { \sum_{i=1}^n x_i^2 - \bar{x} \sum_{i=1}^n x_i } \\ &= \frac{ \sum_{i=1}^n (x_i - \bar{x})(y_i-\bar{y}) } { \sum_{i=1}^n x_i^2 - \bar{x} \sum_{i=1}^n x_i } . \end{aligned} \] Now, let’s look at the denominator. The same trick works here: since \(\bar{x} \sum_{i=1}^n x_i = n \bar{x}^2\), we can subtract and add that quantity. \[ \begin{aligned} \sum_{i=1}^n x_i^2 - \bar{x} \sum_{i=1}^n x_i &= \sum_{i=1}^n x_i^2 - 2 \bar{x} \sum_{i=1}^n x_i + n \bar{x}^2 \\ &= \sum_{i=1}^n \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right) \\ &= \sum_{i=1}^n (x_i - \bar{x})^2. \end{aligned} \] Putting everything together, \[ \hat{\beta}_1 = \frac{ \sum_{i=1}^n (y_i-\bar{y})(x_i-\bar{x}) }{\sum_{i=1}^n (x_i - \bar{x})^2 }, \] and our OLS estimates are

\[ \begin{aligned} \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \\ \hat{\beta}_1 &= \frac{\sum_{i=1}^n (y_i - \bar{y})(x_i - \bar{x})} {\sum_{i=1}^n (x_i - \bar{x})^2 }. \end{aligned} \]
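
As a final sanity check, here is a minimal sketch (with invented data) showing that these formulas agree with an off-the-shelf least-squares fit; np.polyfit is used here only as a convenient point of comparison.

```python
import numpy as np

# The same kind of made-up data as before.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

x_bar, y_bar = x.mean(), y.mean()
beta1_hat = np.sum((y - y_bar) * (x - x_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# np.polyfit(x, y, 1) fits a degree-1 polynomial by least squares and
# returns its coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

print(beta0_hat, beta1_hat)
print(intercept, slope)  # should match the line above
```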