Multiple Linear Regression Computations
This discussion covers the computational background for multiple linear regression. You will see that this is only an extension of the simple linear regression modeling covered in Module 2.
The objective of this section is to give you the necessary formulae and tools to develop multiple linear regression models. This includes the steps for testing the significance of the individual model parameters, the overall significance of the model, and the model assumptions.
Note: Please keep in mind that the statements made here about multiple linear regression are also valid for other regression models.
Let's look at some of the computational steps.
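The general form of a multiple linear regression model with k independent variables may be expressed as
$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon $$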
The estimated model may be written as
$$ \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k $$
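The model parameter estimators are found by minimizing the sum of squares of errors, SSE, taken over the n observations, where xij is the value of the j-th independent variable for observation i:
$$ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_{i1} - b_2 x_{i2} - \cdots - b_k x_{ik} \right)^2 $$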
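To see what is being minimized, the SSE can be evaluated directly for any candidate set of coefficients. The sketch below is only an illustration: the function names `predict` and `sse` and the five data points are made up for this example (the y values were generated from y = 1 + 2·x1 + 3·x2 with no noise, so the true coefficients give an SSE of zero).

```python
def predict(b, x_row):
    """Return b0 + b1*x1 + ... + bk*xk for one observation."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x_row))


def sse(b, X, y):
    """Sum of squared errors of the coefficients b on the data (X, y)."""
    return sum((yi - predict(b, row)) ** 2 for row, yi in zip(X, y))


# Hypothetical data: two independent variables, five observations.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [9, 8, 19, 18, 29]  # generated as 1 + 2*x1 + 3*x2

print(sse([1, 2, 3], X, y))  # 0: these coefficients reproduce y exactly
print(sse([0, 2, 3], X, y))  # 5: every residual is 1, so SSE = 5 * 1**2
```

Any change to the coefficients away from (1, 2, 3) increases the SSE for this data set, which is what the least-squares criterion exploits.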
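In order to minimize this quadratic function SSE, we take the partial derivative of the function with respect to each of the unknowns bj, for j = 0, 1, 2, ..., k, set each partial derivative equal to zero, and solve the resulting system of linear equations simultaneously:
$$ \frac{\partial SSE}{\partial b_0} = -2 \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_{i1} - \cdots - b_k x_{ik} \right) = 0 $$
$$ \frac{\partial SSE}{\partial b_j} = -2 \sum_{i=1}^{n} x_{ij} \left( y_i - b_0 - b_1 x_{i1} - \cdots - b_k x_{ik} \right) = 0, \qquad j = 1, 2, \ldots, k $$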
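The system of linear equations in (k+1) unknowns (b0,...,bk) becomes (all sums run over the n observations, i = 1, ..., n)
$$
\begin{aligned}
n\,b_0 + b_1 \sum x_{i1} + b_2 \sum x_{i2} + \cdots + b_k \sum x_{ik} &= \sum y_i \\
b_0 \sum x_{i1} + b_1 \sum x_{i1}^2 + b_2 \sum x_{i1} x_{i2} + \cdots + b_k \sum x_{i1} x_{ik} &= \sum x_{i1} y_i \\
b_0 \sum x_{i2} + b_1 \sum x_{i1} x_{i2} + b_2 \sum x_{i2}^2 + \cdots + b_k \sum x_{i2} x_{ik} &= \sum x_{i2} y_i \\
&\;\;\vdots \\
b_0 \sum x_{ik} + b_1 \sum x_{i1} x_{ik} + b_2 \sum x_{i2} x_{ik} + \cdots + b_k \sum x_{ik}^2 &= \sum x_{ik} y_i
\end{aligned}
$$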
Note: Don't give up just yet! Please analyze the above system of linear equations. You should notice a pattern in the known elements (the sums involving x): the first element of the second column is the same as the first element of the second row, the second element of the third column is the same as the second element of the third row, and so on. The coefficients on the left side of the system form a symmetric square matrix whose elements can all be determined from the data. You may also notice that the simple linear regression case is embedded in the system: it is the square matrix formed by the first two rows and columns, together with the first two elements of the right-hand side. Once you know that this pattern exists, you don't need to work through the partial derivatives for any multiple or polynomial regression model; just recognize the pattern and use it. We will discuss polynomial regression in the next module.
Note: Please recall that here the bj's are the unknowns, and the x and y values come from the data.
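To make the embedded simple linear regression case concrete, here is the system written out for k = 1. This is a sketch in assumed notation: x denotes the single independent variable, and the sums run over the n observations.

$$
\begin{aligned}
n\,b_0 + b_1 \sum x_i &= \sum y_i \\
b_0 \sum x_i + b_1 \sum x_i^2 &= \sum x_i y_i
\end{aligned}
$$

These are the two normal equations of simple linear regression. Adding a second independent variable adds one row and one column to the left side and one element to the right side, following the same pattern.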
The system can be represented using matrix notation with three matrices A, b and c as follows:
$$ A\,b = c $$
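which in this case results in the (k+1) by (k+1) coefficient matrix A
$$
A = \begin{bmatrix}
n & \sum x_{i1} & \sum x_{i2} & \cdots & \sum x_{ik} \\
\sum x_{i1} & \sum x_{i1}^2 & \sum x_{i1} x_{i2} & \cdots & \sum x_{i1} x_{ik} \\
\sum x_{i2} & \sum x_{i1} x_{i2} & \sum x_{i2}^2 & \cdots & \sum x_{i2} x_{ik} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sum x_{ik} & \sum x_{i1} x_{ik} & \sum x_{i2} x_{ik} & \cdots & \sum x_{ik}^2
\end{bmatrix}
$$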
Note: Please notice the symmetry and the pattern.
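The matrix (or vector) of unknowns b
$$ b = \begin{bmatrix} b_0 & b_1 & b_2 & \cdots & b_k \end{bmatrix}^{T} $$
The right-hand-side (RHS) matrix (or vector) c
$$ c = \begin{bmatrix} \sum y_i & \sum x_{i1} y_i & \sum x_{i2} y_i & \cdots & \sum x_{ik} y_i \end{bmatrix}^{T} $$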
Note: Please notice the pattern.
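The pattern in A and c can be turned into a few lines of code. The sketch below is an illustration under stated assumptions: the function name `normal_equations` and the data are hypothetical, and a constant column x0 = 1 is prepended to each observation so that the intercept row and column follow the same sum-of-products rule as every other element.

```python
def normal_equations(X, y):
    """Build the matrix A and the right-hand-side vector c from the data.

    X is a list of observations, each a list [x1, ..., xk]; y is the list
    of responses. Element (r, s) of A is the sum of x_r * x_s and element
    r of c is the sum of x_r * y, with x_0 = 1 for every observation.
    """
    Z = [[1] + list(row) for row in X]  # prepend the constant column
    m = len(Z[0])                       # m = k + 1 unknowns
    A = [[sum(z[r] * z[s] for z in Z) for s in range(m)] for r in range(m)]
    c = [sum(z[r] * yi for z, yi in zip(Z, y)) for r in range(m)]
    return A, c


# Hypothetical data: two independent variables, five observations.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [9, 8, 19, 18, 29]

A, c = normal_equations(X, y)
for row in A:
    print(row)
# [5, 15, 16]
# [15, 55, 58]
# [16, 58, 66]
print(c)  # [83, 299, 330]
```

The top-left element is n = 5, the matrix is symmetric, and the upper-left 2 by 2 block together with the first two elements of c is exactly what simple linear regression of y on x1 alone would use.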
In the above matrices, all elements of A and c are determined from the data. Provided A is invertible, the system of linear equations in matrix form is solved for the vector b by the following matrix operations:
$$ A\,b = c \;\Rightarrow\; A^{-1} A\,b = A^{-1} c \;\Rightarrow\; b = A^{-1} c $$
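As a numerical illustration of solving the system for b, the sketch below uses Gaussian elimination with partial pivoting instead of forming the inverse of A explicitly. Both routes give the same b when A is invertible, and elimination is the usual choice in code. The function name `solve` and the values of A and c are hypothetical: they correspond to a made-up data set with two independent variables whose responses were generated from y = 1 + 2·x1 + 3·x2.

```python
def solve(A, c):
    """Solve A b = c by Gaussian elimination with partial pivoting."""
    n = len(c)
    # Augmented matrix [A | c], copied as floats so the inputs are untouched.
    M = [[float(v) for v in row] + [float(ci)] for row, ci in zip(A, c)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A is singular (or nearly so); no unique solution")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate this column from every row below the pivot row.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for s in range(col, n + 1):
                M[r][s] -= factor * M[col][s]
    # Back substitution, from the last unknown up to the first.
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][s] * b[s] for s in range(r + 1, n))
        b[r] = (M[r][n] - tail) / M[r][r]
    return b


# Hypothetical normal-equation matrices for k = 2 (n = 5 observations).
A = [[5, 15, 16], [15, 55, 58], [16, 58, 66]]
c = [83, 299, 330]

b = solve(A, c)
print([round(bj, 6) for bj in b])  # [1.0, 2.0, 3.0]
```

The `ValueError` branch corresponds to the case where A has no inverse, for example when one independent variable is an exact linear combination of the others.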
Note: I strongly recommend that you repeat the manual computations for a multiple linear regression case with two independent variables, and create the corresponding MS Excel table (with formulae). This will help you eliminate any and all 'magic' from regression.