Simple (univariate) Linear Regression

Let's look at a few simple cases, and go from there. Please study and compare all five cases.

Case 1: Perfect Fit - We will consider only four data points (pairs of x and y values). If you like, think of x as the number of units sold and y as the sales revenue in dollars (or any other currency). In other words, if you sell one unit your sales revenue is two dollars, if you sell two units it is three dollars, and so on, because you apply a quantity discount. These data points are shown below. You can see from the data table that x and y have an exact straight-line relationship: for each one-unit increase in the value of x, the value of y also increases by one unit. The four data points can therefore be connected by a straight line without any modeling or calculation.

The model line in the graph was found using the Trendline feature of Microsoft Excel. As you can see, this line fits the data perfectly. Please notice that R² = 1: R² measures the 'strength' of the model and, in this case, reaches its maximum value of one for a perfect fit. We will discuss this measure later. Please look at the animation below.

In essence, the above demonstrates, in simplified form, what regression modeling is all about: trying to fit a line to a set of data points as 'perfectly' as possible. The equation of this line is then called the regression model. We say that the regression model represents the 'average behavior' of the relationship between the dependent and independent variables.
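To make the idea of fitting a line concrete, here is a minimal Python sketch of an ordinary least-squares fit. It assumes the four Case 1 points are (1, 2), (2, 3), (3, 4) and (4, 5), which is inferred from the description rather than taken from the data table, and the function names fit_line and r_squared are made up for this illustration. It is not a reproduction of how Excel's Trendline works internally.

```python
# Least-squares fit of a straight line y = intercept + slope * x.
# Assumption: the Case 1 data are the four points below (inferred from the text).
xs = [1.0, 2.0, 3.0, 4.0]   # units sold
ys = [2.0, 3.0, 4.0, 5.0]   # sales revenue


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Return R^2 = 1 - (residual sum of squares / total sum of squares)."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


intercept, slope = fit_line(xs, ys)
r2 = r_squared(xs, ys, intercept, slope)
print(f"y = {intercept:.2f} + {slope:.2f} * x, R^2 = {r2:.2f}")
# prints: y = 1.00 + 1.00 * x, R^2 = 1.00
```

Under the assumed data the fitted line passes through every point, so the residuals are all zero and R² equals one.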

In general, if there is no relationship between the dependent and independent variables, we say that there is no regression: changes in the value of the independent variable (x) have no impact on the value of the dependent variable (y). In a regression model this shows up as a line with no slope, i.e. a horizontal line whose slope coefficient equals zero. There is another situation of no regression we need to mention, namely that the line becomes vertical, i.e. even an infinitely small change in the value of x corresponds to an infinitely large change in the value of y.
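The two no-regression situations can be illustrated with a small Python sketch. Both data sets below are hypothetical and chosen only to show the extremes, and slope_or_none is a helper name invented for this example. In the vertical case every x value is the same, so the least-squares slope formula would divide by zero; the sketch reports that as None rather than as an infinite number.

```python
def slope_or_none(xs, ys):
    """Return the least-squares slope, or None when x never varies."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        # All x values are identical: the points line up vertically
        # and the slope is undefined (the 'infinite slope' case).
        return None
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical data: y stays the same whatever x is (horizontal line).
flat_slope = slope_or_none([1, 2, 3, 4], [3, 3, 3, 3])

# Hypothetical data: x stays the same whatever y is (vertical line).
vertical_slope = slope_or_none([2, 2, 2, 2], [1, 2, 3, 4])

print("horizontal case slope:", flat_slope)
print("vertical case slope:", vertical_slope)
```

In the first call the slope comes out as zero, and in the second the function returns None because the slope cannot be computed.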

You may have guessed from the above that, as part of the modeling, we will need to test whether we have a situation of no regression, i.e. whether the line has a (significant) slope. In simple linear regression we are looking for a strong straight-line relationship between the variables with a significant slope, i.e. a slope that is neither zero (horizontal line) nor infinite (vertical line).
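The text does not yet say which test is used, so the following Python sketch is only one plausible illustration: it assumes the common t-test for the slope, where the estimated slope is divided by its standard error and compared with a critical value. The data are hypothetical (the Case 1 line with small made-up errors added), the function name slope_t_statistic is invented here, and the critical value 4.303 is the standard two-sided 5% value of the t-distribution with n - 2 = 2 degrees of freedom.

```python
import math

# Hypothetical, slightly noisy data (NOT from the text).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 2.9, 4.2, 4.8]


def slope_t_statistic(xs, ys):
    """Return (slope, t) where t = slope / standard error of the slope."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual sum of squares around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Standard error of the slope, using n - 2 degrees of freedom.
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err


slope, t_stat = slope_t_statistic(xs, ys)

# Assumed critical value: two-sided 5% level, 2 degrees of freedom.
T_CRITICAL = 4.303
significant = abs(t_stat) > T_CRITICAL
print(f"slope = {slope:.2f}, t = {t_stat:.2f}, significant: {significant}")
```

Note that this sketch would fail with a division by zero on perfectly fitting data such as Case 1, because the standard error of the slope is then zero; in that situation no test is needed to see that the slope is not zero.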

It seems obvious that we may have a problem if the data points do not fall on a line. We will look at such a case next.