One-Way Analysis of Variance - Examples

The purpose of this example is to demonstrate ANOVA computations using several small data sets, and to help you gain insight into the basics of Analysis of Variance. You can easily recreate all examples using the statistical software of your choice.

The example uses seven data sets (labeled as DATA 1 through DATA 7) and the computational formulas presented in the section Computational Steps of One-way ANOVA.

The example is developed using Microsoft Excel's formula, graphing and modeling capabilities. The Microsoft Excel Data Analysis tool called ANOVA: Single Factor is used on the last two data sets (DATA 6 and DATA 7). The objective is to demonstrate the ANOVA computations both numerically and graphically.

Comparison of Machine Performance

This example compares different machines producing an identical product or part. The machines are compared with respect to a single variable, production time. The goal is to determine whether there is a statistically significant performance difference between the machines when only the production time of a product or part is considered. To keep the example manageable, only three machines are compared, with each machine making only four products or parts. The data are shown in three columns, labeled m1, m2 and m3 for machines 1, 2 and 3, and four rows, labeled part 1, part 2, part 3 and part 4. The animation below shows the seven cases. Please watch and analyze it.

DATA 1: No variability

This first data set (DATA 1) shows a situation in which there is no variability. Each machine produces each part in exactly 10 seconds. The in-between treatment variability, measured by SSA (sum of squares among treatments), is zero, suggesting that there is no treatment effect. The within treatment variability, measured by SSE (sum of squares of errors), is also zero, suggesting that production time does not change from part to part for a machine, i.e. there is no error or randomness. The number of machines or treatments is represented by k, and n stands for the sample size, or the number of units produced by a machine. Below the table a graphical representation of the data is given.

As you can see, the case of DATA 1 is practically meaningless for statistical analysis. If all measurements are exactly the same, and hence performance is exactly identical, then there is no need for analysis. In practice, this kind of situation would be highly unlikely.
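The quantities named above can be computed directly. The following is a minimal Python sketch, not the Excel workbook used in this example; the function name sums_of_squares and the list-of-lists data layout are illustrative assumptions. It applies the standard definitions of SSA, SSE and SST to DATA 1.

```python
def sums_of_squares(groups):
    """Return (SSA, SSE, SST) for a list of treatment groups (illustrative helper)."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # SSA: squared distance of each treatment mean from the grand mean,
    # weighted by the number of observations in that treatment.
    ssa = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # SSE: squared distance of each observation from its own treatment mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # SST: squared distance of each observation from the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    return ssa, sse, sst


# DATA 1: every machine produces every part in exactly 10 seconds.
data1 = [
    [10, 10, 10, 10],  # m1
    [10, 10, 10, 10],  # m2
    [10, 10, 10, 10],  # m3
]
k = len(data1)     # number of machines (treatments)
n = len(data1[0])  # parts produced per machine (sample size)

ssa, sse, sst = sums_of_squares(data1)
print(k, n, ssa, sse, sst)  # 3 4 0.0 0.0 0.0
```

With no variability at all, the three sums of squares are all zero, which is why no test can be carried out on DATA 1.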

DATA 2: Introducing Within Treatment Variability

DATA 2 shows variability in production time within each production run. Machine 1 produced the first product in 10 seconds, the second product in 7 seconds, and the third and fourth products again in 10 and 7 seconds respectively. The average production time (or sample mean) is 8.50 seconds. As machines 2 and 3 show exactly the same results as machine 1, there is no variability from machine to machine, i.e. there is no in-between treatment variability, or no treatment effect. Therefore the total variability, as measured by SST, is equal to the within treatment variability or randomness, as measured by SSE. In other words, SST = SSE because SSA = 0.

This is another extreme case. No analysis is needed. The machines are identical.
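As a quick check of SST = SSE for DATA 2, the sketch below recomputes the three sums of squares from the production times stated above. It is an illustration in Python rather than the Excel formulas of the example, and the variable names are assumptions.

```python
# DATA 2: all three machines show the same times, 10, 7, 10, 7 seconds.
data2 = [
    [10, 7, 10, 7],  # m1
    [10, 7, 10, 7],  # m2
    [10, 7, 10, 7],  # m3
]

all_times = [t for machine in data2 for t in machine]
grand_mean = sum(all_times) / len(all_times)
machine_means = [sum(machine) / len(machine) for machine in data2]

# Treatment effect: machine means versus the grand mean.
ssa = sum(len(m) * (mean - grand_mean) ** 2 for m, mean in zip(data2, machine_means))
# Randomness: each time versus the mean of its own machine.
sse = sum((t - mean) ** 2 for m, mean in zip(data2, machine_means) for t in m)
# Total variability: each time versus the grand mean.
sst = sum((t - grand_mean) ** 2 for t in all_times)

print(machine_means)  # [8.5, 8.5, 8.5]
print(ssa, sse, sst)  # 0.0 27.0 27.0
```

Every machine mean coincides with the grand mean of 8.5 seconds, so SSA vanishes and all of the total variability is within treatment variability.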

DATA 3: Introducing In-between Treatment Variability

DATA 3 shows no variability in production time within each production run. However, there is a difference between Machine 1 (m1) and Machine 2 (m2), as well as between Machine 1 (m1) and Machine 3 (m3). The production times of Machine 2 (m2) and Machine 3 (m3) are identical. Machine 1 produced each part in exactly 7 seconds, whereas Machines 2 and 3 each needed exactly 10 seconds per part.

The average production time (or sample mean) is exactly 7.00 seconds for m1, with each sample measurement equal to the sample mean. The average production time is 10 seconds for both m2 and m3, again with each sample measurement equal to the respective sample mean. As machines 2 and 3 show exactly the same results, there is no variability between m2 and m3. However, there is in-between treatment variability, or a treatment effect, between m1 and the other two machines. Therefore the total variability, as measured by SST, is equal to the in-between treatment variability, or treatment effect, as measured by SSA. In other words, SST = SSA because SSE = 0.

This is another extreme case. No further analysis is needed. Machine 1 is faster than machines 2 and 3. Machines 2 and 3 are identical with respect to the variable of interest.
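The mirror-image result SST = SSA for DATA 3 can be checked the same way. This is again only a Python sketch with assumed variable names, using the production times given above.

```python
# DATA 3: m1 needs 7 seconds per part, m2 and m3 need 10 seconds per part.
data3 = [
    [7, 7, 7, 7],      # m1
    [10, 10, 10, 10],  # m2
    [10, 10, 10, 10],  # m3
]

all_times = [t for machine in data3 for t in machine]
grand_mean = sum(all_times) / len(all_times)
machine_means = [sum(machine) / len(machine) for machine in data3]

# Treatment effect: machine means versus the grand mean.
ssa = sum(len(m) * (mean - grand_mean) ** 2 for m, mean in zip(data3, machine_means))
# Randomness: each time versus the mean of its own machine.
sse = sum((t - mean) ** 2 for m, mean in zip(data3, machine_means) for t in m)
# Total variability: each time versus the grand mean.
sst = sum((t - grand_mean) ** 2 for t in all_times)

print(grand_mean)     # 9.0
print(ssa, sse, sst)  # 24.0 0.0 24.0
```

Here every measurement equals its own machine mean, so SSE vanishes and the whole of SST comes from the difference between m1 and the other two machines.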

DATA 4 and DATA 5: Both Within and In-between Treatment Variability Present

DATA 4 and DATA 5 are modifications of DATA 1. These data sets represent situations in which the production times of m1 vary, whereas the production times of m2 and m3 do not.

In the case of DATA 4 the measures of variability are as follows: SSA = 6.00, SSE = 9.00 and SST = 15.00. Most of the variability in DATA 4 appears to come from within treatment m1, because SSE > SSA. As the sample mean production time is 8.5 seconds for m1 and 10.0 seconds for both m2 and m3, the question arises whether the difference is significant. Is m1 significantly faster than m2 and m3?
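The raw values of DATA 4 are not listed in the text. The Python sketch below therefore assumes that m1 has the times 10, 7, 10, 7 and that m2 and m3 stay at 10 seconds throughout. This assumption reproduces the reported means and sums of squares, but the actual data set may differ. The helper name partition is likewise hypothetical.

```python
def partition(groups):
    """Return (SSA, SSE, SST) for a list of treatment groups (hypothetical helper)."""
    values = [x for g in groups for x in g]
    grand = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ssa = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sst = sum((x - grand) ** 2 for x in values)
    return ssa, sse, sst


# ASSUMED values for DATA 4, chosen to match the reported summary figures.
data4 = [
    [10, 7, 10, 7],    # m1 (assumed; sample mean 8.5)
    [10, 10, 10, 10],  # m2 (sample mean 10.0)
    [10, 10, 10, 10],  # m3 (sample mean 10.0)
]

ssa, sse, sst = partition(data4)
print(ssa, sse, sst)  # 6.0 9.0 15.0
```

The output also illustrates the partition SST = SSA + SSE, which holds for every data set in this example.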

How about the case presented with DATA 5? Here SSA > SSE. Is m1 now significantly faster than m2 and m3?

We will address this crucial question next using DATA 6 and DATA 7.

DATA 6 and DATA 7: Both Within and In-between Treatment Variability Present

These two data sets represent more realistic situations. There is variability both within each machine's (treatment's) production times and in-between the machines.

In DATA 6, m2 has the lowest mean production time of 5.5 seconds, followed by m1 with 6.4 seconds and m3 with 6.68 seconds. Please note also that m2 has the highest variability from run to run, resulting in a within treatment variability of 0.69. Is m2 in this case a significantly faster machine? Please note the sizes of the measures of variability: SSA, SSE, SST.

In DATA 7, m1 has the lowest mean production time of 5.56 seconds, compared with 8.80 seconds for m2 and 8.58 seconds for m3. Please note also that m1 has the highest variability from run to run, resulting in a within treatment variability of 1.18. Is m1 in this case a significantly faster machine? Please note the sizes of the measures of variability: SSA, SSE, SST.

Let's take another look at the situations presented by DATA 6 and DATA 7. This time we will complete the hypothesis tests using the ANOVA: Single Factor feature of Microsoft Excel. As you can easily verify, most of our earlier intermediate computations are shown in the ANOVA summary table. You could certainly also complete the tests from those intermediate results without using the ANOVA: Single Factor tool. Please recall that the degrees of freedom (df) are k-1 for SSA, k(n-1) for SSE and kn-1 for SST. Please study the animation below.
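To complete the test by hand from the intermediate results, the sums of squares are divided by their degrees of freedom and the resulting mean squares are compared. The Python sketch below assumes a 5% significance level, which is consistent with the critical value of 4.26 used in this example. It uses a closed-form expression for the critical value that is valid only when the numerator has 2 degrees of freedom (k = 3); for other designs a statistics library would be needed. The function names are illustrative, and the F ratio is demonstrated on the sums of squares reported for DATA 4.

```python
k, n = 3, 4   # machines (treatments) and parts per machine
alpha = 0.05  # ASSUMED significance level

df_among = k - 1        # degrees of freedom for SSA
df_error = k * (n - 1)  # degrees of freedom for SSE
df_total = k * n - 1    # degrees of freedom for SST


def f_ratio(ssa, sse):
    """F = MSA / MSE, the ratio of the two mean squares."""
    msa = ssa / df_among
    mse = sse / df_error
    return msa / mse


def f_crit_two_numerator_df(alpha, df_error):
    """Upper-alpha critical value of F(2, df_error).

    Valid ONLY for 2 numerator degrees of freedom, where
    P(F > x) = (1 + 2x/df_error) ** (-df_error/2) can be solved for x.
    """
    return (df_error / 2) * (alpha ** (-2 / df_error) - 1)


f_crit = f_crit_two_numerator_df(alpha, df_error)
f_data4 = f_ratio(6.00, 9.00)  # SSA and SSE reported for DATA 4

print(df_among, df_error, df_total)  # 2 9 11
print(round(f_crit, 2))              # 4.26
print(f_data4)                       # 3.0
```

As an arithmetic illustration, the F ratio of 3.0 for DATA 4 lies below the critical value, so under these assumptions the difference in DATA 4 would not be judged significant.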

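The hypothesis tests can be formulated as follows. The null hypothesis H0 states that the mean production times of the three machines are equal (μ1 = μ2 = μ3). The alternative hypothesis H1 states that at least two of the means differ.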

You should be able to conclude easily from the F-test for DATA 6 that, based on the sample information, m2 is not significantly faster. Because Fcalc = 3.94 < Fcrit = 4.26, the mean production times are not significantly different from each other.

You should also be able to conclude from the F-test for DATA 7 that, based on the sample information, m1 is significantly faster. Because Fcalc = 17.84 > Fcrit = 4.26, the mean production times of at least two machines are significantly different from each other. Please note that the test does not directly identify which machine is the fastest; it only suggests that there is a significant difference. If you rerun the ANOVA without m1, you will see that the Fcalc value drops dramatically. This change in the F value points toward m1 as the source of the effect.
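The two decisions can also be expressed as p-values. The Python sketch below takes the Fcalc values reported above and assumes a 5% significance level and 9 error degrees of freedom (k = 3, n = 4). The closed-form tail probability it uses holds only for 2 numerator degrees of freedom, and the function name is an assumption.

```python
alpha = 0.05  # ASSUMED significance level
df_error = 9  # k(n-1) with k = 3 machines and n = 4 parts


def p_value_two_numerator_df(f_calc, df_error):
    """P(F > f_calc) for an F(2, df_error) distribution.

    Closed form valid ONLY when the numerator has 2 degrees of freedom.
    """
    return (1 + 2 * f_calc / df_error) ** (-df_error / 2)


# Fcalc values as reported in the ANOVA summary tables of the example.
f_calc = {"DATA 6": 3.94, "DATA 7": 17.84}

p_values = {name: p_value_two_numerator_df(f, df_error) for name, f in f_calc.items()}
decisions = {name: p < alpha for name, p in p_values.items()}

for name in f_calc:
    verdict = "reject H0" if decisions[name] else "do not reject H0"
    print(f"{name}: F = {f_calc[name]}, p = {p_values[name]:.4f}, {verdict}")
```

The p-value for DATA 6 comes out slightly above 0.05 and the one for DATA 7 far below it, matching the comparison of Fcalc with Fcrit.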

Note: One-way ANOVA and its extensions have found applications in most areas of business, engineering, health care and law. Please think about potential applications in your area of interest. In sports, consider player selection: who is the best receiver or running back, or is there any statistically significant difference at all, rather than just a difference in absolute numbers or averages? In manufacturing, consider the selection of machines, equipment, material, parts, or even suppliers. In health care, consider the selection of laboratory equipment (measurement accuracy vs. cost vs. potential liability), the comparison of treatments, the effects of drugs, etc.