Analysis of Variance (ANOVA)
Overview of ANOVA
Hypothesis testing in Module 1 covered cases with at most two populations. For example, we conducted tests on the equality of population means (t-test) and on the equality of population variances (F-test).
In this Module we will extend hypothesis testing to more than two populations. Suppose, for example, that you would like to test the hypothesis that three population means are equal against the alternative that at least two of those population means differ. You could conduct a separate t-test for every possible pair of populations. With k populations, however, there are k(k-1)/2 such pairs, so the number of tests grows quickly as k increases. That is still manageable, but once the populations contain sub-groups that interact with one another, the pairwise approach eventually becomes unworkable. Analysis of Variance is of great help in extending hypothesis testing to multiple populations.
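As a quick illustration of how the pairwise approach scales, the short Python sketch below counts the t-tests needed for k populations using k(k-1)/2 and lists the pairs for a small case. The function name pairwise_test_count and the population labels A, B, C are hypothetical, chosen only for this example.

```python
from itertools import combinations
from math import comb


def pairwise_test_count(k):
    """Number of distinct pairs among k populations, k(k-1)/2 (hypothetical helper)."""
    return comb(k, 2)


# How the number of pairwise t-tests grows with the number of populations
for k in (3, 5, 10, 20):
    print(f"k={k}: {pairwise_test_count(k)} pairwise t-tests")

# The pairs themselves for three hypothetical populations A, B, C
print(list(combinations(["A", "B", "C"], 2)))
```

For k = 3 only three tests are needed, but for k = 20 there are already 190, before any sub-groups or interactions are considered.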
The term Analysis of Variance (ANOVA) refers to a technique in which the total variability in the data is partitioned into components, each of which can be attributed to a specific, distinct source of variation. In a one-way ANOVA model, there is assumed to be a single factor, or source of variation, in addition to randomness, which may cause the population means to be unequal. The test then determines whether the contribution of this one factor is large enough, relative to random variation, to conclude that the population means differ significantly.
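To make the idea of partitioning variability concrete, here is a minimal Python sketch of a one-way ANOVA, assuming the data are given as one list of observations per treatment. It splits the total sum of squares (SST) into a between-treatment component (SSB) and a within-treatment, or error, component (SSW), then forms the F statistic as the ratio of the corresponding mean squares. The function name one_way_anova, the input format, and the crop-growth numbers are hypothetical illustrations, not data from any study.

```python
from statistics import fmean


def one_way_anova(groups):
    """Partition total variability for a one-way layout (hypothetical helper).

    groups: a list of lists, one list of observations per treatment.
    Returns the sums of squares, the F statistic, and its degrees of freedom.
    """
    all_obs = [x for g in groups for x in g]
    n, k = len(all_obs), len(groups)
    grand_mean = fmean(all_obs)
    group_means = [fmean(g) for g in groups]

    # Total variability: every observation around the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_obs)
    # Between treatments: group means around the grand mean, weighted by group size
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within treatments (random error): observations around their own group mean
    ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n - k
    # F = mean square between / mean square within
    f_stat = (ssb / df_between) / (ssw / df_within)
    return {"SST": sst, "SSB": ssb, "SSW": ssw, "F": f_stat,
            "df": (df_between, df_within)}


# Hypothetical crop-growth measurements for three fertilizer treatments
growth = [[20, 22, 24], [25, 27, 29], [19, 21, 23]]
result = one_way_anova(growth)
print(f"SST={result['SST']:.2f} SSB={result['SSB']:.2f} "
      f"SSW={result['SSW']:.2f} F={result['F']:.2f} df={result['df']}")
```

For these made-up numbers, SSB = 62 and SSW = 24 add up to SST = 86, which is exactly the partition described above, and F = 7.75 with (2, 6) degrees of freedom. If SciPy is available, scipy.stats.f_oneway should return the same F statistic together with a p-value.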
One of the first, if not the first, applications of Analysis of Variance was in agriculture. The goal there was to determine whether one soil fertilizer treatment resulted in better crop growth than other fertilizer treatments when applied to the same type of soil under the same environmental conditions. If the tested fertilizers were equal in terms of their impact on growth, then the treatment effect of any one fertilizer was not significant. In other words, no specific fertilizer was significantly better or worse than the overall average of all fertilizers. Possibly because of this early application of ANOVA in farming, the term treatment is commonly accepted and used to identify the population of concern regardless of the area of application.
Learning Objectives
When you have completed this Module you should be able to