Review of Hypothesis Testing, Confidence Intervals and Correlation

Example - Hypothesis Testing

Note: The data in this example are purely fictitious. The intent is only to demonstrate the concept of hypothesis testing. The Ford/Firestone tire recall itself was real, and it caused significant legal problems and business losses for both companies. Please search the web for information about the Ford/Firestone tire recall case.

The tire manufacturer Firestone claims that the mean length of life of its Firestone Wilderness brand tires, measured in miles driven before the tires are considered to fail the minimum safety standards, is 35,000 miles. (This is your null hypothesis, see column one above.) Suppose that you collect tire failure data from 100 tires of that specific brand and want to determine whether your sample data support the tire manufacturer's claim. (This will give you your alternative hypothesis, see column two above.)

Let's look at a few different scenarios.

First, suppose that the sample mean (or average) length of life (in miles driven, as above) of the 100 sample tires also turns out to be 35,000 miles. Using common sense, and without calculating anything, you can see that the sample mean (35,000 miles) and the manufacturer's claim (35,000 miles) are identical. Therefore, you might conclude that your sample, at least, does not suggest anything different, and that the manufacturer's claim is possibly true. (If you look at the formula in the third column, titled Test Statistic, the numerator is the difference between the sample mean and the claimed mean. In this case we have 35,000 miles - 35,000 miles = 0 miles, or no difference between the sample mean and the claimed mean.)

Secondly, suppose that the sample mean (or average) length of life (in miles driven, as above) of the 100 sample tires turns out to be 34,990 miles. Notice that the sample mean is now lower than the manufacturer's claim by a whole 10 miles (34,990 miles - 35,000 miles = -10 miles). Most people would conclude that a difference of 10 miles is really not that great, and that the manufacturer's claim is possibly accurate.

Thirdly, what if the difference between the sample mean and the claimed mean is, say, 1,000 miles, or 2,000 miles, or 5,000 miles or more? What would you conclude then?

Some people might say, for example, "If the difference between the sample mean and the claim is 5,000 miles or more, then the difference is too large, and we cannot believe the manufacturer's claim". The main weakness of this approach is that the number 5,000 is chosen arbitrarily.

Others might suggest collecting more samples of 100 tires each, and seeing how the sample means from all the samples compare to the claim and to each other. Suppose that a total of 50 samples were collected and 50 sample means were calculated. Suppose first that all those sample means fell between 33,000 and 36,000 miles. Using common sense again, many would conclude that the range of sample means includes the manufacturer's claimed mean of 35,000 miles, and therefore there seems to be support for the claim. On the other hand, suppose that the sample means range from 29,000 miles to 31,000 miles. In this case, many would conclude that the claim may be false, as the range does not include the manufacturer's claimed mean.
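The repeated-sampling idea above can be tried out with a small simulation. The sketch below is only an illustration under stated assumptions: tire lifetimes are drawn from a normal distribution, the population standard deviation of 5,000 miles and the random seed are invented for this example (the original gives neither), and the function names are hypothetical.

```python
import random
import statistics

CLAIMED_MEAN = 35_000  # manufacturer's claimed mean life, in miles


def simulate_sample_means(true_mean, sigma, n_tires=100, n_samples=50, seed=1):
    """Draw n_samples samples of n_tires lifetimes and return the sample means.

    Lifetimes are assumed to be normal with the given true_mean and sigma;
    both values are assumptions made for this illustration only.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n_tires)]
        means.append(statistics.mean(sample))
    return means


def range_includes_claim(sample_means, claimed_mean=CLAIMED_MEAN):
    """Common-sense check: does the range of sample means cover the claim?"""
    return min(sample_means) <= claimed_mean <= max(sample_means)


if __name__ == "__main__":
    # Scenario A: the claim is true (true mean = 35,000 miles).
    means_a = simulate_sample_means(true_mean=35_000, sigma=5_000)
    # Scenario B: the true mean is really only 30,000 miles.
    means_b = simulate_sample_means(true_mean=30_000, sigma=5_000)
    for label, means in (("A", means_a), ("B", means_b)):
        print(label, round(min(means)), round(max(means)),
              range_includes_claim(means))
```

In scenario A the range of the 50 sample means should straddle 35,000 miles, while in scenario B it should sit well below the claim, which mirrors the two cases described in the text.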

This latter approach, as a matter of fact, is already hypothesis testing without fancy formulae. From now on, please keep in mind that this is what all hypothesis testing is: make a claim and, using sample information, try either to support or to reject the claim. Given enough samples, this approach will give you satisfactory answers in most practical cases. The bad news is that in most practical situations we do not have the resources, time and/or data available to collect that many samples.

In hypothesis testing we study, generally speaking, such questions as 'what size of difference can be considered significant enough to reject a claim?' or 'how much room for variability can we allow around a claim?'. The answer is that it depends on how much the data vary and on how large the sample is.

The theoretical approach, which generalizes and simplifies this hypothesis test, is based on the probability distribution (or sampling distribution) of a random variable called the test statistic. The test statistic 'relates' four quantities: the sample mean (which is itself a random variable, because its value changes from sample to sample), the claimed mean (which is a constant, namely the value of the population parameter stated in the claim), the population standard deviation (which is a constant, a parameter of the population) and the sample size (which is selected by the analyst). In this case it can be shown that, if the claim is true, the sampling distribution of the test statistic is the standard normal distribution. The test statistic formula standardizes the sample mean to one single scale and one single new variable (in this case called the standard normal random variable, or Z for simplicity and convenience).

We already studied the numerator of the test statistic formula, and noticed that it is simply the difference between the sample mean and the claimed mean. Now look at the denominator of the formula. This denominator is the standard deviation of the sampling distribution of the sample mean, also called the standard error. Notice that the standard error is itself a ratio: the population standard deviation divided by the square root of the sample size. You can see that if the standard error increases, then the magnitude of the test statistic decreases, because we are dividing the same difference by a larger and larger number. Clearly, the opposite is also true: if the standard error decreases, then the magnitude of the test statistic increases, because we are dividing the same difference by a smaller and smaller number. You can also easily see that the magnitude of the standard error is related to the sample size. The standard error decreases when the sample size increases, and vice versa.
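The relationship between the sample size, the standard error and the test statistic can be checked with a few lines of code. This is a minimal sketch, assuming the usual Z statistic for a mean with known population standard deviation; the value of 5,000 miles for that standard deviation is hypothetical, and the function names are chosen for this illustration only.

```python
import math


def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def z_statistic(sample_mean, claimed_mean, sigma, n):
    """Difference between sample mean and claimed mean, in standard errors."""
    return (sample_mean - claimed_mean) / standard_error(sigma, n)


if __name__ == "__main__":
    SIGMA = 5_000     # assumed population standard deviation (hypothetical)
    CLAIMED = 35_000  # claimed mean from the example

    # A larger sample gives a smaller standard error.
    for n in (25, 100, 400):
        print(n, standard_error(SIGMA, n))

    # The same 1,000-mile shortfall yields a larger |Z| as n grows.
    for n in (25, 100, 400):
        print(n, z_statistic(34_000, CLAIMED, SIGMA, n))
```

With these assumed values the standard error is 1,000, 500 and 250 miles for sample sizes of 25, 100 and 400, so the same 1,000-mile shortfall corresponds to Z = -1, -2 and -4 respectively.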

When conducting a hypothesis test, in order to simplify decision making and reduce guessing, we form a probability interval (or confidence interval) using the properties of the sampling distribution. Alternatively, we simply evaluate the value of the test statistic and compare it to a standardized confidence interval from the theoretical sampling distribution. The table above (see column 4, titled Critical Region) establishes the interval to which the value of the test statistic is compared. The critical region refers to the interval or region outside of the confidence interval. If the test statistic value falls inside the boundary values of the confidence interval (and not in the critical region), then the sample data do not contradict the claim and we do not reject it. If the value falls in the critical region, we reject the claim. Note that the test may also be a so-called one-sided test, in which case only one boundary value is used (upper-tail or lower-tail). The boundary values are obtained from the theoretical distribution by assigning a large probability to the inside of the interval, because, for example, 'we want to be quite certain, say 95%, in our decision to support or to reject a claim'. Please note that there is no theoretical difference between these various versions of the test.
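The decision rule described above can be sketched in code. The example below assumes a Z test, uses the standard normal distribution from Python's statistics module to obtain the boundary values, and takes 95% as the default probability inside the interval. The function names and the tail labels are hypothetical, introduced only for this illustration.

```python
from statistics import NormalDist


def critical_values(confidence=0.95, tail="two-sided"):
    """Return (lower, upper) boundary values for Z; None means no boundary."""
    alpha = 1.0 - confidence
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two-sided":
        upper = std_normal.inv_cdf(1.0 - alpha / 2.0)
        return (-upper, upper)
    if tail == "lower":
        return (std_normal.inv_cdf(alpha), None)
    if tail == "upper":
        return (None, std_normal.inv_cdf(1.0 - alpha))
    raise ValueError("tail must be 'two-sided', 'lower' or 'upper'")


def decide(z, confidence=0.95, tail="two-sided"):
    """Return 'reject' if z lies in the critical region, else 'do not reject'."""
    lower, upper = critical_values(confidence, tail)
    if lower is not None and z < lower:
        return "reject"
    if upper is not None and z > upper:
        return "reject"
    return "do not reject"


if __name__ == "__main__":
    print(critical_values())               # two-sided boundaries, about -1.96 and 1.96
    print(critical_values(tail="lower"))   # lower-tail boundary, about -1.645
    print(decide(-0.02))                   # inside the interval
    print(decide(-2.0))                    # outside the interval
```

A test statistic of -0.02 falls well inside the two-sided 95% boundaries of about -1.96 and 1.96, so the claim is not rejected, whereas a value of -2.0 falls just outside them and the claim is rejected.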

Let's use some numbers to illustrate the example: