Simple (univariate) Linear Regression - Examples
This example covers three cases to demonstrate the manual computations of simple linear regression using simple numbers and a small data set of seven observations. The objective is to demonstrate
Note: Please keep in mind that all statements made here about simple linear regression also apply to the multivariate and non-linear regression cases covered later.
Let's look at the three cases.
The figure and animation below show the data in the first two columns. These are followed by three columns of intermediate computations: the values of x squared, the values of x multiplied by the values of y, and the values of y squared. The column totals are used to calculate the sums of squares: SSxx, SSyy and SSxy. These, in turn, together with the averages of x and y, are used to obtain the parameter estimates b0 and b1. Finally, the column labeled yhat shows the estimated values of y obtained by applying the model to the original data values of x from column B.
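The column-total computations described above can be sketched in a few lines of Python. This is a minimal illustration, not the worksheet itself: the function name fit_simple_regression and the seven (x, y) pairs are hypothetical, and the sketch assumes the usual computational formulas SSxx = Σx² - (Σx)²/n, SSyy = Σy² - (Σy)²/n, SSxy = Σxy - (Σx)(Σy)/n, b1 = SSxy/SSxx and b0 = ybar - b1 xbar.

```python
def fit_simple_regression(x, y):
    """Estimate b0 and b1 from column totals, mirroring the worksheet layout."""
    n = len(x)
    # Column totals: x, y, x squared, x times y, y squared
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_y2 = sum(yi * yi for yi in y)

    # Sums of squares from the totals
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n

    # Parameter estimates and fitted values (the yhat column)
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    y_hat = [b0 + b1 * xi for xi in x]
    return {"ss_xx": ss_xx, "ss_yy": ss_yy, "ss_xy": ss_xy,
            "b0": b0, "b1": b1, "y_hat": y_hat}


# Hypothetical data (NOT the data from the figure): seven points on y = 1 + 2x
x = [1, 2, 3, 4, 5, 6, 7]
y = [3, 5, 7, 9, 11, 13, 15]

fit = fit_simple_regression(x, y)
print(fit["ss_xx"], fit["ss_yy"], fit["ss_xy"])  # 28.0 112.0 56.0
print(fit["b0"], fit["b1"])                      # 1.0 2.0
```

Replacing x and y with the values from columns A and B of your own table should reproduce the corresponding worksheet totals.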
Case 1: Perfect Regression
In this case, as the values of x increase, the values of y increase
linearly and proportionally. A single straight line can be drawn through
all points in the data. Please note that the model
ŷ = b0 + b1 x
becomes
ŷ = 10 x
because b0 = 0 and b1 = 10. At the same time, the Sum of Squares of
Errors is zero (SSE = 0), meaning that there are no deviations between
the data points and the model line. Further, we can also see that
SST = SSR, and hence r² = SSR/SST = 1. The latter means that all
variability in y is explained by the linear relationship between the variables.
See the animation and the changing numbers in the table above.
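The Case 1 result can be checked with a short sketch. The x values 1 through 7 are an assumption (the actual values are in the figure); y is built as 10 x to match the model stated above. The sums of squares are written in deviation form, which is algebraically equivalent to computing them from column totals.

```python
# Assumed x values (hypothetical); y follows the stated model exactly
x = [1, 2, 3, 4, 5, 6, 7]
y = [10 * xi for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares in deviation form
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
r2 = ssr / sst

print(b0, b1)         # 0.0 10.0
print(sse, ssr, sst)  # 0.0 2800.0 2800.0
print(r2)             # 1.0
```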
Case 2: No Regression
In this case, as the values of x increase, the values of y
remain the same. A single straight (horizontal) line can again be drawn through
all points in the data. Please note that the model
ŷ = b0 + b1 x
becomes
ŷ = 40 = ȳ
because b0 = 40 and b1 = 0. At the same time, the Sum of Squares of
Errors is zero (SSE = 0), meaning that there are no deviations between
the data points and the model line.
Further, we can also see that SST = SSR = SSE = 0, and hence
r² = SSR/SST is undefined (we cannot divide by zero). Because there is
no slope, i.e. b1 = 0, changing the values of x has no impact on the
values of y. Study the animation and
the changing numbers in the table above.
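A small sketch of Case 2 shows why r² has to be treated specially here. As before, the x values 1 through 7 are an assumption; y is held at 40 as stated above. The guard that returns None when SST is zero is one possible way to represent "undefined", not the only one.

```python
# Assumed x values (hypothetical); y is constant at 40 as in Case 2
x = [1, 2, 3, 4, 5, 6, 7]
y = [40] * len(x)

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ss_xy / ss_xx       # zero slope: x has no effect on y
b0 = y_bar - b1 * x_bar  # intercept equals the mean of y
y_hat = [b0 + b1 * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
sst = sum((yi - y_bar) ** 2 for yi in y)

# SSR/SST would be 0/0, so report r2 as undefined instead of dividing
r2 = ssr / sst if sst != 0 else None

print(b0, b1)         # 40.0 0.0
print(sse, ssr, sst)  # 0.0 0.0 0.0
print(r2)             # None
```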
Case 3: Comparison of Manual and MS Excel Computations
In this case, as the values of x increase, the values of y
also increase. The increases are, however, not proportional. No single straight
line can be drawn that passes through all points in the data. Please
note that the model
ŷ = b0 + b1 x
is estimated as
ŷ = 7.57 + 5.52 x.
At the same time, SSE = 26.85, indicating that the data points deviate
somewhat from the model line. Further, we can also see
that SST = 879.36
and SSR = 852.51. Hence, we obtain r² = SSR/SST = 0.97.
It follows that most of the variability in y, in fact 97%, is explained by
the model. Study the animation and
the changing numbers in the table above.
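The arithmetic linking the three reported sums of squares can be verified directly. This sketch uses only the values quoted above (SSE, SSR, SST); the variable names are illustrative, and the underlying data set is not reproduced here.

```python
import math

# Values reported for Case 3
sse = 26.85   # Sum of Squares of Errors
ssr = 852.51  # Sum of Squares due to Regression
sst = 879.36  # Total Sum of Squares

# The total variation splits into explained and unexplained parts
decomposition_holds = math.isclose(ssr + sse, sst)

# Coefficient of determination
r2 = ssr / sst

print(decomposition_holds)  # True
print(round(r2, 2))         # 0.97
```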
Please compare the above computations with the MS Excel output below. You should be able to identify how most of the numbers in the Excel output are obtained.
Note: I strongly recommend that you repeat the manual computations and create the corresponding Excel table (with formulae). This will help you eliminate any doubt and 'magic' from regression.