Review of Hypothesis Testing, Confidence Intervals and Correlation

Hypothesis Test for µ with σ Known

The hypothesis test of the true mean (µ) of a population uses the Central Limit Theorem and the normal distribution when either the population standard deviation sigma is known or the sample is sufficiently large (n>30).
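As a minimal sketch of this test, the snippet below computes the usual z statistic, a two-sided p-value, and the matching confidence interval using only the Python standard library. The function name, its parameters, and the numbers in the example are illustrative assumptions and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 when sigma is known (or n is large)."""
    std_error = sigma / sqrt(n)                    # standard error of the mean
    z = (sample_mean - mu0) / std_error            # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for the interval
    ci = (sample_mean - z_crit * std_error, sample_mean + z_crit * std_error)
    return z, p_value, ci


# Illustrative numbers only (not from the text).
z, p_value, ci = z_test_mean(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)
print(f"z = {z:.2f}, p = {p_value:.4f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

H0 is rejected at level alpha when the p-value is below alpha, which is equivalent to mu0 falling outside the confidence interval.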

Note: Don't be intimidated by formulae and notation like those above. Here is a common sense interpretation of the hypothesis test: Hypothesis Test Example - Ford vs. Firestone

Hypothesis Test for µ with σ Unknown

The hypothesis test of the true mean of a population µ uses the t-distribution when the population standard deviation sigma is unknown, and the sample is small (n<30) with the assumption that the underlying population distribution is approximately normal.

The corresponding confidence interval can be derived from the Test Statistic.
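The following sketch shows one way the t statistic and the confidence interval derived from it might be computed. Because the Python standard library has no t-distribution, the critical value is passed in as a number read from a t table; the function name, the sample, and the hypothesized mean are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def t_test_mean(sample, mu0, t_crit):
    """Two-sided one-sample t-test; t_crit is a tabulated critical value."""
    n = len(sample)
    x_bar = mean(sample)
    std_error = stdev(sample) / sqrt(n)   # stdev() uses the n - 1 divisor
    t = (x_bar - mu0) / std_error         # test statistic with n - 1 d.o.f.
    ci = (x_bar - t_crit * std_error, x_bar + t_crit * std_error)
    reject_h0 = abs(t) > t_crit
    return t, n - 1, ci, reject_h0


# Illustrative data; 2.776 is the tabulated two-sided 5% value for 4 d.o.f.
t, df, ci, reject_h0 = t_test_mean([10, 12, 14, 16, 18], mu0=12.0, t_crit=2.776)
print(f"t = {t:.3f}, df = {df}, CI = ({ci[0]:.2f}, {ci[1]:.2f}), reject H0: {reject_h0}")
```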

Hypothesis Test for µ1-µ2 with σ1, σ2 Known

The hypothesis test of equality of true population means, as measured by the difference between the means, uses the Central Limit Theorem and the normal distribution. It is also assumed that the two populations are independent.

The corresponding confidence interval can be derived from the Test Statistic.
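A short sketch of the two-sample z-test for H0: µ1-µ2 = 0, together with the confidence interval for the difference, could look as follows. Names and numbers are illustrative assumptions, not values from the text.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_means(mean1, mean2, sigma1, sigma2, n1, n2, alpha=0.05):
    """Two-sided z-test of H0: mu1 - mu2 = 0 for independent samples."""
    diff = mean1 - mean2
    std_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = diff / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * std_error, diff + z_crit * std_error)
    return z, p_value, ci


# Illustrative numbers only (not from the text).
z, p_value, ci = z_test_two_means(105.0, 100.0, 10.0, 12.0, 50, 72)
print(f"z = {z:.2f}, p = {p_value:.4f}, CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```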

Hypothesis Test for µ1-µ2 with σ1 ≠ σ2 and Unknown

The hypothesis test for the equality of true population means, as measured by the difference between the means, uses the t-distribution with the assumption that the underlying population distributions are approximately normal, and that the two populations are independent. It is also assumed, even though the population variances and standard deviations are unknown, that they are not equal. The assumption of equality of the variances should be tested first using the F-test.

The corresponding confidence interval can be derived from the Test Statistic. Please read also the section t-test for independent samples in the StatSoft electronic textbook.
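The formula used in the source for this case is not shown here, so the sketch below assumes the commonly used Welch form of the statistic with the Welch-Satterthwaite approximation for the degrees of freedom. The function name and the example numbers are illustrative.

```python
from math import sqrt


def welch_t(mean1, mean2, var1, var2, n1, n2):
    """t statistic and approximate d.o.f. for unequal, unknown variances."""
    v1, v2 = var1 / n1, var2 / n2          # squared standard errors per sample
    t = (mean1 - mean2) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (an assumption, see the lead-in).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Illustrative sample statistics: variances 4 and 9, sizes 10 and 15.
t, df = welch_t(22.5, 20.0, 4.0, 9.0, 10, 15)
print(f"t = {t:.2f}, df = {df:.1f}")
```

The resulting t is compared with the tabulated t value for the computed degrees of freedom, usually rounded down to the nearest integer.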

Hypothesis Test for µ1-µ2 with σ1 = σ2 and Unknown

The hypothesis test of equality of true population means, as measured by the difference between the means, uses the t-distribution with the assumption that the underlying population distributions are approximately normal, and that the two populations are independent. It is also assumed, even though the population variances and standard deviations are unknown, that they are equal. The assumption of equality of the variances should be tested first using the F-test.

The corresponding confidence interval can be derived from the Test Statistic. Please read also the section t-test for independent samples in the StatSoft electronic textbook.
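When the variances are assumed equal, the two sample variances are usually combined into a pooled estimate. The sketch below illustrates that calculation; the function name and the numbers are assumptions made for the example.

```python
from math import sqrt


def pooled_t(mean1, mean2, var1, var2, n1, n2):
    """t statistic and d.o.f. for equal but unknown population variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df   # pooled variance
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / std_error
    return t, df


# Illustrative sample statistics: variances 4 and 6, ten observations each.
t, df = pooled_t(22.5, 20.0, 4.0, 6.0, 10, 10)
print(f"t = {t:.2f}, df = {df}")
```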

Hypothesis Test for µA-µB with σA, σB Unknown

The hypothesis test is used in before-after studies, i.e. paired observations. The test uses the t-distribution with the assumption that the underlying population distribution is approximately normal.
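For paired observations the test reduces to a one-sample t-test on the differences within each pair. A minimal sketch, with made-up before and after measurements and an assumed function name:

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """t statistic and d.o.f. for H0: mean difference = 0 (paired data)."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Illustrative before-after data (not from the text).
before = [12, 15, 11, 14, 13]
after = [10, 14, 10, 11, 12]
t, df = paired_t(before, after)
print(f"t = {t:.2f}, df = {df}")
```

With these numbers the differences are 2, 1, 1, 3, 1, which gives t = 4.0 on 4 degrees of freedom.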

Hypothesis Test for p

This hypothesis test for the true population proportion p uses the Central Limit Theorem and approximates a binomial experiment using the normal distribution.

The corresponding confidence interval can be derived from the Test Statistic.
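The normal approximation for a single proportion might be sketched as below. Following common practice, the test statistic uses the hypothesized p0 in the standard error, while the confidence interval uses the observed proportion; the function name and the counts are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(successes, n, p0, alpha=0.05):
    """Two-sided z-test of H0: p = p0 using the normal approximation."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)      # standard error under H0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return z, p_value, (p_hat - half_width, p_hat + half_width)


# Illustrative counts: 60 successes in 100 trials, H0: p = 0.5.
z, p_value, ci = z_test_proportion(60, 100, 0.5)
print(f"z = {z:.2f}, p = {p_value:.4f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```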

Hypothesis Test for p1-p2

This hypothesis test for the true difference between population proportions p1-p2 uses the Central Limit Theorem, and approximates a binomial experiment using the normal distribution.
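A sketch of the two-proportion test for H0: p1-p2 = 0 follows. It assumes the usual pooled estimate of the common proportion under H0; the function name and the counts are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_proportions(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 - p2 = 0 with a pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # common proportion under H0
    std_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Illustrative counts: 60 of 100 versus 40 of 100.
z, p_value = z_test_two_proportions(60, 100, 40, 100)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```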

Hypothesis Test for σ²

This hypothesis test of a single population variance, or standard deviation, assumes that the underlying population distribution is approximately normal.

The corresponding confidence interval can be derived from the Test Statistic.
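The test statistic for a single variance is commonly written as (n - 1)s²/σ0², which follows a Chi-Square distribution with n - 1 degrees of freedom under H0. The sketch below assumes an upper-tailed alternative (σ² > σ0²) and takes the critical value from a table; names and numbers are illustrative.

```python
def chi_square_variance_test(sample_var, sigma0_sq, n, chi2_crit):
    """Upper-tailed test of H0: sigma^2 = sigma0^2 against sigma^2 > sigma0^2."""
    chi2 = (n - 1) * sample_var / sigma0_sq   # test statistic, n - 1 d.o.f.
    return chi2, n - 1, chi2 > chi2_crit


# Illustrative values; 36.415 is the tabulated upper 5% point for 24 d.o.f.
chi2, df, reject_h0 = chi_square_variance_test(7.0, 4.0, 25, chi2_crit=36.415)
print(f"chi2 = {chi2:.1f}, df = {df}, reject H0: {reject_h0}")
```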

Hypothesis Test for σ1²/σ2²

This hypothesis test of equality of two population variances uses the F-distribution. It is assumed that the two populations are independent and approximately normally distributed.

The corresponding confidence interval can be derived from the Test Statistic.
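A minimal sketch of the F-test is given below. It assumes the common convention of placing the larger sample variance in the numerator, so that only the upper critical value from an F table is needed; the function name and the numbers are illustrative.

```python
def f_test_variances(var1, var2, n1, n2, f_crit):
    """F statistic for H0: sigma1^2 = sigma2^2, larger variance on top."""
    if var1 >= var2:
        f, df = var1 / var2, (n1 - 1, n2 - 1)
    else:
        f, df = var2 / var1, (n2 - 1, n1 - 1)
    return f, df, f > f_crit


# Illustrative values; 4.03 is the tabulated upper 2.5% point of F(9, 9),
# i.e. the critical value for a two-sided test at the 5% level.
f, df, reject_h0 = f_test_variances(12.0, 4.0, 10, 10, f_crit=4.03)
print(f"F = {f:.2f}, df = {df}, reject H0: {reject_h0}")
```

This is the check that the two t-test sections above recommend running before choosing between the equal-variance and unequal-variance forms.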

Test of Goodness-of-Fit

The test of goodness-of-fit is used to determine if a population or sample can be described by a theoretical distribution, or in other words, if a theoretical distribution fits a data set at a chosen level of significance. The goodness-of-fit test is a hypothesis test with a null hypothesis, H0, and an alternative hypothesis, H1, formulated as follows:

H0: the data follow the specified theoretical distribution
H1: the data do not follow the specified theoretical distribution

The test utilizes the Chi-Square (χ²) distribution, because the sampling distribution of a random variable χ², calculated from the expected and observed frequencies, ei and oi respectively, is well approximated by that distribution. The random variable has the following form:

χ²calc = Σ (oi - ei)² / ei, where the sum runs over all classes i = 1, ..., k

In general, if the calculated χ²calc-value is small the fit is good; otherwise the fit is poor. Whether to accept or reject the null hypothesis for a specific χ²calc-value is decided by comparing it with the theoretical χ²-value at a chosen level of significance α. Such theoretical χ² values (called here χ²critical,α-values) are tabulated in most introductory statistics books, and are also available in statistical software programs (see e.g. StatSoft electronic textbook Statistical Tables).

If the fit is good, the null hypothesis, H0, is accepted, indicating that the theoretical distribution describes the data at the chosen level of significance. For acceptance of the null hypothesis, H0, the χ²calc-value has to fall into the acceptance region of the Chi-Square distribution. In this case χ²calc < χ²critical,α. If χ²calc > χ²critical,α, then the null hypothesis H0 is not accepted. If χ²calc is approximately equal to χ²critical,α, then further sampling is recommended.

For the χ²critical,α, the degrees of freedom have to be determined first. The following table summarizes the degrees of freedom to be used for the χ²critical,α-values when fitting some common theoretical distributions.
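The decision rule described above can be sketched in a few lines. The example assumes a fair six-sided die as the theoretical distribution, so no parameters are estimated from the data and the degrees of freedom are taken as the number of classes minus one; the function name and the counts are illustrative assumptions.

```python
def chi_square_gof(observed, expected, chi2_crit):
    """Goodness-of-fit statistic and whether H0 is accepted at the given critical value."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, chi2 < chi2_crit


# Illustrative data: 60 rolls of a die, expected frequency 10 per face.
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
# 11.070 is the tabulated upper 5% point for 5 degrees of freedom.
chi2, accept_h0 = chi_square_gof(observed, expected, chi2_crit=11.070)
print(f"chi2 = {chi2:.2f}, accept H0: {accept_h0}")
```

Here the calculated value is 4.2, well inside the acceptance region, so the uniform distribution is not rejected for these counts.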

Test of Independency of Variables

The test of independency of variables is a hypothesis test utilizing the Chi-Square test procedure in a similar manner as discussed above. The test uses so-called contingency tables for record keeping. Contingency tables are used, for example, in before-after studies. The null hypothesis, H0, and alternative hypothesis, H1, are stated as follows:

H0: the variables are independent
H1: the variables are dependent

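The degrees of freedom, ν, for the χ²-value are determined using ν = (r - 1)(c - 1), where r is the number of rows and c the number of columns of the contingency table.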

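The χ²calc is calculated as before using χ²calc = Σ (oij - eij)² / eij, summed over all cells of the contingency table, where oij is the observed frequency in row i and column j, and eij = (total of row i × total of column j) / n is the corresponding expected frequency under H0.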

In general, if the calculated χ²calc is small there is support for independency of the variables; otherwise the variables may be dependent. Whether to accept or reject the null hypothesis for a specific χ²calc-value is decided, as before, by comparing χ²calc with the theoretical χ²critical,α-value at a chosen level of significance α.
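As a sketch of the whole procedure, the function below derives the expected frequencies from the row and column totals of a contingency table, then returns the calculated Chi-Square value and the degrees of freedom (rows minus one times columns minus one). The function name and the 2 x 2 table are illustrative assumptions.

```python
def chi_square_independence(table):
    """Chi-square statistic and d.o.f. for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # frequency under H0
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Illustrative 2 x 2 table (not from the text).
chi2, df = chi_square_independence([[20, 30], [30, 20]])
# 3.841 is the tabulated upper 5% point for 1 degree of freedom.
print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 > 3.841}")
```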