Polynomial Regression
Overview of Polynomial Regression
The simple and multiple linear regression models of Module 2 and Module 3 attempt to model the relationship between one dependent variable and one or more independent variables (Recall: Dependent vs. independent variables). We assumed that the relationship between the dependent variable and each independent variable is linear, i.e. a straight line. As we saw several times in the previous modules, that assumption is quite strong and rarely holds in practice: the scatter plots of the data points formed distinct curvilinear patterns. Recall, for example, the income vs. education scatter plot in the Multiple Linear Regression chapter.
Learning Objectives
When you have completed this Module you should be able to
Assumptions for Polynomial Regression Models
For polynomial regression models we assume that:
As you can see, the assumptions change very little. You already know that these assumptions may or may not hold. In practice, every model assumption needs to be tested, and no model will be perfect. As in simple and multiple linear regression, we attempt to fit a line, plane, or hyperplane to a set of data. (Note: "plane" is used here synonymously with "surface".) With one dependent variable and two independent variables we are working in 3D space, so the model represents a plane, which we can draw if we want. With more than two independent variables the model represents what is commonly called a hyperplane, which we cannot draw.
Note: What is a polynomial? In case you have forgotten: if you throw a ball up in the air, it follows a path up and then back down to the ground. This flight path is approximately a parabola (recall: $y = ax^2$). The formula of a parabola is an example of a polynomial. Note that the exponent of the variable is a constant (2 in this case). In general, a polynomial can be written as $y = b_n x^n + b_{n-1} x^{n-1} + b_{n-2} x^{n-2} + \dots + b_3 x^3 + b_2 x^2 + b_1 x + b_0$. In this formula all variable exponents are constants (non-negative integers). The accompanying animation shows a parabola approximating the flight path of that ball.
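To make the general formula concrete, here is a minimal sketch that evaluates a second-degree polynomial at several points. The coefficients describe a made-up ball height over time and are assumptions chosen for illustration only; they do not come from this module. NumPy's `polyval` expects the coefficients ordered from the highest power down to the constant term, which matches the order in the formula above.

```python
import numpy as np

# Hypothetical coefficients for a ball's height (metres) over time (seconds):
# h(t) = -4.9*t^2 + 10*t + 1. Illustrative values, not taken from the module.
coeffs = [-4.9, 10.0, 1.0]  # ordered b_2, b_1, b_0 (highest power first)

t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])

# Evaluate b_2*t^2 + b_1*t + b_0 at every value in t
height = np.polyval(coeffs, t)

for ti, hi in zip(t, height):
    print(f"t = {ti:.1f} s -> h = {hi:.3f} m")
```

The heights rise and then fall again, which is the parabola shape described in the note.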
Note: People sometimes describe polynomial models as either linear or nonlinear models. Which is it? Actually, both terms are acceptable; it depends on how you look at it. Polynomial models are linear with respect to the model parameters $\beta_i$, estimated by $b_i$, because the parameter exponents are equal to one. However, polynomial models are nonlinear with respect to the variables, e.g. $x^2$, $x^3$, .... All polynomial terms are obtained from the data by raising the data column values to the desired power.
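The following sketch illustrates why a polynomial model can be fitted with the same least-squares machinery as multiple linear regression. The squared term is simply an extra data column obtained by raising x to the power 2. The data are hypothetical and noise-free, generated from assumed coefficients 1, 2 and 3, so the fit should recover those values up to rounding.

```python
import numpy as np

# Hypothetical, noise-free data generated from y = 1 + 2x + 3x^2
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

# Design matrix with columns 1, x, x^2.
# The model is nonlinear in x but linear in the parameters b0, b1, b2.
X = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares, exactly as for a multiple linear regression
b, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print("estimated b0, b1, b2:", np.round(b, 6))
```

With real, noisy data the estimates would only approximate the underlying parameters, but the procedure stays the same.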
Note: Polynomial terms can be a source of serious multicollinearity. For example, in the polynomial model $y = b_0 + b_1 x + b_2 x^2$ the variable $x$ is used twice as an independent variable: first in $b_1 x$ and then, squared, in $b_2 x^2$. Clearly, $x$ and $x^2$ are related, and therefore multicollinearity can be present.
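As a quick check of how strongly x and its square can be related, the sketch below computes their correlation coefficient. The values 1 to 10 are an assumed predictor column, used for illustration only.

```python
import numpy as np

# Hypothetical predictor values 1, 2, ..., 10
x = np.arange(1.0, 11.0)
x_squared = x**2

# Pearson correlation between the two model columns x and x^2
r = np.corrcoef(x, x_squared)[0, 1]

print(f"correlation between x and x^2: {r:.4f}")
```

For these values the correlation is about 0.97, so the two columns carry largely overlapping information.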