Simple (univariate) Linear Regression - Examples

This example covers three cases to demonstrate the manual computations of simple linear regression using simple numbers and a small data set of seven observations. The objective is to demonstrate

Note: Please keep in mind that all statements made here about simple linear regression also hold for the multivariate and non-linear regression cases covered later.

The figure and animation below show the data in the first two columns. These are followed by three columns of intermediate computations: x squared, x multiplied by y, and y squared. The column totals are used to calculate the sums of squares SSxx, SSyy and SSxy. These, together with the averages of x and y, are used to obtain the parameter estimates b0 and b1. Finally, the column labeled yhat shows the estimated values of y, obtained by applying the model to the original data values x from column B.
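The column-total computation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name fit_from_column_totals and the seven (x, y) values are assumptions made for the example and are not the data from the figure. It uses the usual shortcut forms SSxx = sum(x^2) - (sum(x))^2 / n, SSyy = sum(y^2) - (sum(y))^2 / n and SSxy = sum(xy) - sum(x) sum(y) / n, followed by b1 = SSxy / SSxx and b0 = ybar - b1 xbar.

```python
def fit_from_column_totals(x, y):
    """Estimate b0 and b1 from column totals (hypothetical helper)."""
    n = len(x)

    # Column totals, as in the intermediate columns of the table
    sum_x = sum(x)
    sum_y = sum(y)
    sum_x2 = sum(xi * xi for xi in x)               # total of x squared
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # total of x times y
    sum_y2 = sum(yi * yi for yi in y)               # total of y squared

    # Sums of squares from the totals
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n

    # Parameter estimates
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n

    # Estimated values of y for the original x values (the yhat column)
    y_hat = [b0 + b1 * xi for xi in x]

    return {"SSxx": ss_xx, "SSyy": ss_yy, "SSxy": ss_xy,
            "b0": b0, "b1": b1, "yhat": y_hat}


# Assumed illustrative data: seven observations, NOT the values in the figure
x = [1, 2, 3, 4, 5, 6, 7]
y = [2, 4, 5, 4, 5, 7, 8]

result = fit_from_column_totals(x, y)
print(result["b0"], result["b1"])
```

Each line of the function corresponds to one column or one cell of a spreadsheet, so it can be rebuilt with formulae in the same order.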

Case 1: Perfect Regression

In this case, as the values of x increase, the values of y increase linearly and proportionally. A single straight line can be drawn through all points in the data. Please note that the model yhat = b0 + b1 x becomes yhat = 10 x because b0 = 0 and b1 = 10. At the same time, the Sum of Squares of Errors is zero (SSE = 0), meaning that no data point deviates from the model line. Further, we can also see that SST = SSR, and hence r2 = SSR/SST = 1. This means that all variability in y is explained by the linear relationship between the variables. See the animation and the changing numbers in the table above.
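The perfect-regression case can be checked with a short Python sketch. The x values 1 to 7 and the helper name regression_summary are assumptions for illustration; only the relationship y = 10 x is taken from the text. The sketch uses the deviation-from-mean definitions of the sums of squares.

```python
def regression_summary(x, y):
    """Return b0, b1, SST, SSR and SSE for a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variability
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by the model
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # left unexplained
    return {"b0": b0, "b1": b1, "SST": sst, "SSR": ssr, "SSE": sse}


# Assumed x values; y follows the perfect relationship y = 10 x
x = [1, 2, 3, 4, 5, 6, 7]
y = [10 * xi for xi in x]

case1 = regression_summary(x, y)
r2_case1 = case1["SSR"] / case1["SST"]
print(case1, r2_case1)
```

With these assumed values the sketch gives b0 = 0, b1 = 10, SSE = 0 and r2 = 1, in line with the case described above.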

Case 2: No Regression

In this case, as the values of x increase, the values of y remain the same. A single straight (horizontal) line can be drawn through all points in the data. Please note that the model yhat = b0 + b1 x becomes yhat = 40, which is also the mean of y, because b0 = 40 and b1 = 0. At the same time, the Sum of Squares of Errors is zero (SSE = 0), meaning that no data point deviates from the model line. Further, we can also see that SST = SSR = SSE = 0, and hence r2 = SSR/SST is undefined (we cannot divide by zero). Because there is no slope, i.e. b1 = 0, changing the values of x has no impact on the values of y. Study the animation and the changing numbers in the table above.
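The no-regression case needs one extra precaution in code, because SST = 0 makes r2 = SSR/SST a division by zero. The following Python sketch guards against that by returning None for r2. The x values 1 to 7 and the function name fit_with_r2 are assumptions for illustration; only y = 40 for every observation is taken from the text.

```python
def fit_with_r2(x, y):
    """Fit yhat = b0 + b1 x and return r2, or None when SST is zero."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]

    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

    # r2 = SSR / SST is undefined when y does not vary at all
    r2 = ssr / sst if sst != 0 else None
    return {"b0": b0, "b1": b1, "SST": sst, "SSR": ssr, "SSE": sse, "r2": r2}


# Assumed x values; y is constant at 40 as in the case above
x = [1, 2, 3, 4, 5, 6, 7]
y = [40] * 7

case2 = fit_with_r2(x, y)
print(case2)
```

Returning None rather than 0 or 1 is a design choice of this sketch: it keeps "undefined" visibly different from "no fit" and "perfect fit".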

Case 3: Comparison of Manual and MS Excel Computations

In this case, as the values of x increase, the values of y also increase. The increases are, however, not proportional. A single straight line cannot be drawn such that it would touch all points in the data. Please note that the model yhat = b0 + b1 x is estimated as yhat = 7.57 + 5.52 x. At the same time, SSE = 26.85, which shows that the data points deviate somewhat from the model line. Further, we can also see that SST = 879.36 and SSR = 852.51. Hence, we obtain r2 = SSR/SST = 0.97. It follows that most of the variability in y, in fact 97%, is explained by the model. Study the animation and the changing numbers in the table above.
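The reported sums of squares can be cross-checked without the raw data, since SST = SSR + SSE must hold for a least-squares line with an intercept. The short Python sketch below uses only the three numbers quoted above; the variable names are chosen for illustration.

```python
# Values as reported in the text for Case 3 (rounded to two decimals)
sse = 26.85
sst = 879.36
ssr = 852.51

# Decomposition check: total = explained + unexplained
decomposition_gap = sst - (ssr + sse)

# Coefficient of determination, computed two equivalent ways
r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst

print(round(decomposition_gap, 2), round(r2_from_ssr, 2), round(r2_from_sse, 2))
```

Both forms of r2 round to 0.97, and the decomposition gap is zero up to rounding.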

Please compare the computations above with the MS Excel output below. You should be able to identify how most of the numbers in the Excel output are obtained.

Note: I strongly recommend that you repeat the manual computations and create the corresponding Excel table (with formulae). This will help you eliminate any doubt and 'magic' from regression.