Simple (univariate) Linear Regression
Case 5: Strong vs. Weak Linear Regressions and Relationships - Here are three common situations that you may encounter in practice, identified as Case 5A, Case 5B, and Case 5C. In all three cases the regression models were found using Microsoft Excel's Trendline feature. The cases are briefly discussed below:
Case 5A - There does not appear to be a strong relationship between the variables x1 and y1. As the values of x1 increase, the values of y1 increase and decrease almost randomly, so the variables appear independent. Based on a visual analysis of the scatter plot, this is a very weak linear regression case. The regression model confirms it: with R² = 0.0233, only about 2.3% of the variability in y1 is explained by the model, which is very weak. We can conclude that, based on these data, there does not appear to be a relationship between the dependent and independent variables, and therefore the linear regression model has essentially no explanatory value. Please see the animation above. (Note: We will discuss the R² concept a little later.)
Case 5B - In this case the relationship between the variables x1 and y1 appears to be quite strong. As the values of x1 increase, the values of y1 decrease almost linearly. There is some randomness present, since the data points do not fall exactly on one line. The variables appear clearly dependent on each other. Based on a visual analysis of the scatter plot, there should be a strong linear regression with a negative slope coefficient (the parameter associated with x1). The regression model confirms it: with R² = 0.9475, about 94.75% of the variability in y1 is explained by the model, which is quite strong. Based on what we see and know, we conclude that there is significant regression. Please see the animation above. (Note: We will discuss the R² concept a little later.)
Case 5C - There appears to be a strong relationship between the variables x1 and y1. However, this relationship is not linear. As the values of x1 increase, the values of y1 first steadily increase and then steadily decrease. The y1 values peak at x1 = 5. The variables appear to be quite strongly dependent. Based on a visual analysis of the scatter plot, the linear regression should nevertheless be quite weak, because a straight line cannot follow a rise and a fall at the same time. The regression model confirms it: with R² = 0.0229, only about 2.3% of the variability in y1 is explained by the model, which is very weak. Based on what we see and know, we conclude that the simple (univariate) linear regression model does not work in this case. However, another type of regression model may be able to capture the pattern.
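The three situations can be reproduced in a small Python sketch. The data below are synthetic stand-ins generated for illustration, not the course data sets behind the animation, and the helper name fit_line is hypothetical. The sketch assumes NumPy is available and computes R² as 1 minus the ratio of the residual sum of squares to the total sum of squares, which is the usual definition for a least-squares line.

```python
import numpy as np


def fit_line(x, y):
    """Fit a least-squares line and return (slope, intercept, r_squared)."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # variability left unexplained
    ss_tot = np.sum((y - np.mean(y)) ** 2)   # total variability in y
    return slope, intercept, 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)   # fixed seed so repeated runs agree
x = np.arange(0.0, 11.0)         # 0, 1, ..., 10

# Synthetic stand-ins (NOT the course data) for the three situations.
cases = {
    "5A-like (pure noise)": rng.normal(10.0, 2.0, size=x.size),
    "5B-like (noisy falling line)": 20.0 - 1.5 * x + rng.normal(0.0, 1.0, size=x.size),
    "5C-like (parabola peaking at x = 5)": 25.0 - (x - 5.0) ** 2,
}

results = {name: fit_line(x, y) for name, y in cases.items()}
for name, (slope, intercept, r2) in results.items():
    print(f"{name}: slope = {slope:.3f}, R^2 = {r2:.4f}")
```

In this sketch the parabola is perfectly symmetric around x = 5, so the fitted line is flat and R² is zero up to rounding, even though y is completely determined by x.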
When you compare Cases 5A and 5C, you notice that the resulting linear models and R² values are very close to each other. However, as you see, these cases are very different. In Case 5A the variables appear to have little or no relationship, but in Case 5C the variables appear quite strongly (even though not linearly) related. Please see the animation above.
This should serve as a caution for practice: we should not conclude too quickly that variables are independent (not associated), nor that no model exists for a given situation. Rather, we should generate several models and assume that relationships between the variables may exist, even if those relationships are not linear. Note: In Module 3 we will explore modeling of Case 5C-type situations.
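As one way to act on this advice, the sketch below fits both a straight line and a second-degree polynomial to the same data and compares their R² values. The data are synthetic (a noisy parabola peaking at x = 5, generated for illustration only), the helper name poly_r2 is hypothetical, and the polynomial is just one candidate for "another type of model"; the approach taken in Module 3 may differ.

```python
import numpy as np


def poly_r2(x, y, degree):
    """Fit a least-squares polynomial of the given degree and return its R^2."""
    coeffs = np.polyfit(x, y, degree)        # highest-degree coefficient first
    y_hat = np.polyval(coeffs, x)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(1)   # fixed seed so repeated runs agree
x = np.linspace(0.0, 10.0, 21)

# Synthetic Case 5C-like data (NOT the course data): rises, peaks near 5, falls.
y = 25.0 - (x - 5.0) ** 2 + rng.normal(0.0, 1.0, size=x.size)

r2_linear = poly_r2(x, y, 1)
r2_quadratic = poly_r2(x, y, 2)

print(f"straight line:     R^2 = {r2_linear:.4f}")
print(f"degree-2 polynomial: R^2 = {r2_quadratic:.4f}")
```

On data of this shape the straight line explains almost none of the variability, while the degree-2 polynomial explains most of it. In Excel, a comparable check would be to switch the Trendline type from Linear to Polynomial with order 2.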