Review of Hypothesis Testing, Confidence Intervals and Correlation
Hypothesis Test for µ with σ Known
The hypothesis test of the true mean (µ) of a population uses the Central Limit Theorem and the normal distribution when either the population standard deviation σ is known or the sample is sufficiently large (n>30).
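As a rough illustration of this test, the sketch below computes the usual textbook statistic z = (x̄ - µ0) / (σ / √n) and compares it with the two-sided critical value of the standard normal distribution. The function name `z_test_mean`, the sample numbers and the significance level of 0.05 are assumptions made for this example only.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 when sigma is known (hypothetical helper)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))       # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # two-sided critical value
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # two-sided p-value
    return z, z_crit, p_value, abs(z) > z_crit

# Invented data: sample mean 52.0 from n = 36 observations, sigma = 6.0, H0: mu = 50.0
z, z_crit, p_value, reject = z_test_mean(52.0, 50.0, 6.0, 36)
print(f"z = {z:.3f}, critical = {z_crit:.3f}, p = {p_value:.4f}, reject H0: {reject}")
```

With these invented numbers the statistic lies just beyond the critical value, so H0 would be rejected at the 5% level.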
Note: Don't be intimidated by formulae and notation like those above. For a common-sense interpretation of the hypothesis test, see: Hypothesis Test Example - Ford vs. Firestone
Hypothesis Test for µ with σ Unknown
The hypothesis test of the true mean µ of a population uses the t-distribution when the population standard deviation σ is unknown and the sample is small (n<30), with the assumption that the underlying population distribution is approximately normal.
The corresponding confidence interval can be derived from the Test Statistic.
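A minimal sketch of this test and its confidence interval follows, assuming the usual statistic t = (x̄ - µ0) / (s / √n) with n - 1 degrees of freedom. The helper `t_test_mean` and the data are invented for illustration. The critical value is passed in by hand because it is normally read from a t-table; 2.262 is the two-sided 5% value for 9 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def t_test_mean(sample, mu0, t_crit):
    """One-sample t-test of H0: mu = mu0 (hypothetical helper; t_crit comes from a table)."""
    n = len(sample)
    x_bar = mean(sample)
    se = stdev(sample) / sqrt(n)                      # estimated standard error
    t = (x_bar - mu0) / se                            # test statistic
    ci = (x_bar - t_crit * se, x_bar + t_crit * se)   # interval derived from the statistic
    return t, n - 1, ci, abs(t) > t_crit

# Invented sample of n = 10 measurements, H0: mu = 10.0
sample = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0, 10.5, 10.1]
t, df, ci, reject = t_test_mean(sample, mu0=10.0, t_crit=2.262)
print(f"t = {t:.3f}, df = {df}, CI = ({ci[0]:.3f}, {ci[1]:.3f}), reject H0: {reject}")
```

Here the hypothesized mean lies inside the interval, which is the same statement as |t| being smaller than the critical value.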
Hypothesis Test for µ1-µ2 with σ1, σ2 Known
The hypothesis test of equality of true population means, as measured by the difference between the means, uses the Central Limit Theorem and the normal distribution. It is also assumed that the two populations are independent.
The corresponding confidence interval can be derived from the Test Statistic.
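The two-sample case with known standard deviations can be sketched in the same way, assuming the standard statistic z = (x̄1 - x̄2 - δ0) / √(σ1²/n1 + σ2²/n2). The function name `z_test_two_means`, the parameter `delta0` (the hypothesized difference) and all numbers are assumptions for this example.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_means(mean1, mean2, sigma1, sigma2, n1, n2, delta0=0.0, alpha=0.05):
    """Two-sided z-test of H0: mu1 - mu2 = delta0 with known sigmas (hypothetical helper)."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)        # standard error of the difference
    diff = mean1 - mean2
    z = (diff - delta0) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)     # confidence interval for mu1 - mu2
    return z, ci, abs(z) > z_crit

# Invented summary data for two independent samples
z, ci, reject = z_test_two_means(105.0, 100.0, sigma1=8.0, sigma2=6.0, n1=64, n2=36)
print(f"z = {z:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f}), reject H0: {reject}")
```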
Hypothesis Test for µ1-µ2 with σ1 ≠ σ2 and Unknown
The hypothesis test for the equality of true population means, as measured by the difference between the means, uses the t-distribution with the assumption that the underlying population distributions are approximately normal, and that the two populations are independent. It is also assumed, even though the population variances and standard deviations are unknown, that they are not equal. Whether the variances can be treated as equal should be tested first using the F-test.
The corresponding confidence interval can be derived from the Test Statistic. Please read also the section t-test for independent samples in the StatSoft electronic textbook.
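One common way to carry out the unequal-variance test is Welch's approximation, which is assumed here; the source does not say which degrees-of-freedom formula it uses. The sketch computes t = (x̄1 - x̄2) / √(s1²/n1 + s2²/n2) and the Welch-Satterthwaite degrees of freedom. The helper name `welch_t` and the data are invented.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample1, sample2):
    """t statistic and approximate degrees of freedom, unequal variances (hypothetical helper)."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1                       # s1^2 / n1
    v2 = variance(sample2) / n2                       # s2^2 / n2
    t = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (an assumption, see the text above)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Invented samples with clearly different spread
sample1 = [20, 22, 19, 24, 25]
sample2 = [15, 25, 10, 30, 20]
t, df = welch_t(sample1, sample2)
print(f"t = {t:.3f}, approximate df = {df:.2f}")
```

The approximate degrees of freedom are generally not an integer; a conservative choice is to round down before looking up the critical t-value.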
Hypothesis Test for µ1-µ2 with σ1 = σ2 and Unknown
The hypothesis test of equality of true population means, as measured by the difference between the means, uses the t-distribution with the assumption that the underlying population distributions are approximately normal, and that the two populations are independent. It is also assumed, even though the population variances and standard deviations are unknown, that they are equal. The assumption of equality of the variances should be tested first using the F-test.
The corresponding confidence interval can be derived from the Test Statistic. Please read also the section t-test for independent samples in the StatSoft electronic textbook.
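When the variances are treated as equal, the two sample variances are usually combined into a pooled estimate. The sketch below assumes the standard pooled form sp² = ((n1-1)s1² + (n2-1)s2²) / (n1+n2-2) with n1+n2-2 degrees of freedom; the helper `pooled_t` and the data are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """t statistic and degrees of freedom with a pooled variance (hypothetical helper)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    t = (mean(sample1) - mean(sample2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df

# Invented samples
sample1 = [20, 22, 19, 24, 25]
sample2 = [18, 17, 21, 19, 20]
t, df = pooled_t(sample1, sample2)
t_crit = 2.306  # two-sided 5% critical value for 8 degrees of freedom, from a t-table
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > t_crit}")
```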
Hypothesis Test for µA-µB with σA, σB Unknown
This hypothesis test is used in before-after studies, i.e. with paired observations. The test uses the t-distribution with the assumption that the underlying population distribution of the paired differences is approximately normal.
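For paired observations the test reduces to a one-sample t-test on the differences. A minimal sketch, assuming H0 states that the mean difference is zero; the helper `paired_t` and the before/after readings are invented.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t-test statistic for H0: mean difference = 0 (hypothetical helper)."""
    d = [a - b for a, b in zip(after, before)]        # one difference per pair
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))
    return t, n - 1

# Invented before/after measurements on the same five subjects
before = [200, 190, 210, 220, 205]
after = [195, 188, 204, 212, 204]
t, df = paired_t(before, after)
t_crit = 2.776  # two-sided 5% critical value for 4 degrees of freedom, from a t-table
print(f"t = {t:.3f}, df = {df}, reject H0: {abs(t) > t_crit}")
```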
Hypothesis Test for p
This hypothesis test for the true population proportion p uses the Central Limit Theorem and approximates a binomial experiment using the normal distribution.
The corresponding confidence interval can be derived from the Test Statistic.
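A sketch of the proportion test under the normal approximation follows. It assumes the usual statistic z = (p̂ - p0) / √(p0(1-p0)/n), and for the confidence interval it uses the sample proportion p̂ in the standard error. The helper `z_test_proportion` and the counts are invented for this example.

```python
from math import sqrt
from statistics import NormalDist

def z_test_proportion(successes, n, p0, alpha=0.05):
    """Two-sided z-test of H0: p = p0 (hypothetical helper)."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)        # standard error under H0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_hat = sqrt(p_hat * (1 - p_hat) / n)            # standard error for the interval
    ci = (p_hat - z_crit * se_hat, p_hat + z_crit * se_hat)
    return z, ci, abs(z) > z_crit

# Invented data: 60 successes in 100 trials, H0: p = 0.5
z, ci, reject = z_test_proportion(60, 100, 0.5)
print(f"z = {z:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f}), reject H0: {reject}")
```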
Hypothesis Test for p1-p2
This hypothesis test for the true difference between population proportions p1-p2 uses the Central Limit Theorem, and approximates a binomial experiment using the normal distribution.
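For H0: p1 = p2 a common choice, assumed here, is to estimate the standard error from the pooled proportion of both samples. The helper `z_test_two_proportions` and the counts are invented for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_proportions(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 with a pooled proportion (hypothetical helper)."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)                    # pooled estimate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Invented data: 60 of 100 successes in group 1, 45 of 100 in group 2
z, p_value = z_test_two_proportions(60, 100, 45, 100)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```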
Hypothesis Test for σ²
This hypothesis test of a single population variance, or standard deviation, assumes that the underlying population distribution is approximately normal.
The corresponding confidence interval can be derived from the Test Statistic.
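The variance test is usually based on the statistic χ² = (n-1)s² / σ0², which follows a Chi-Square distribution with n-1 degrees of freedom when the population is normal. The sketch assumes that form. The helper `chi2_variance_test` and the data are invented, and the two critical values (2.700 and 19.023 for 9 degrees of freedom at a two-sided 5% level) are taken from a Chi-Square table.

```python
from statistics import variance

def chi2_variance_test(sample, sigma0_sq, chi2_lower, chi2_upper):
    """Two-sided test of H0: sigma^2 = sigma0_sq (hypothetical helper; critical values from a table)."""
    n = len(sample)
    ss = (n - 1) * variance(sample)                   # (n - 1) * s^2
    chi2 = ss / sigma0_sq                             # test statistic
    ci = (ss / chi2_upper, ss / chi2_lower)           # interval for sigma^2
    reject = chi2 < chi2_lower or chi2 > chi2_upper
    return chi2, n - 1, ci, reject

# Invented sample of n = 10 measurements, H0: sigma^2 = 0.04
sample = [9.8, 10.2, 10.4, 9.9, 10.1, 10.3, 9.7, 10.0, 10.5, 10.1]
chi2, df, ci, reject = chi2_variance_test(sample, 0.04, chi2_lower=2.700, chi2_upper=19.023)
print(f"chi2 = {chi2:.2f}, df = {df}, CI = ({ci[0]:.4f}, {ci[1]:.4f}), reject H0: {reject}")
```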
Hypothesis Test for σ1²/σ2²
This hypothesis test of equality of two population variances uses the F-distribution. It is assumed that the two populations are independent and approximately normally distributed.
The corresponding confidence interval can be derived from the Test Statistic.
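The F statistic is the ratio of the two sample variances. The sketch below follows the common convention, assumed here, of putting the larger variance in the numerator so that only the upper tail of the F-table is needed. The helper `f_test_statistic` and the data are invented.

```python
from statistics import variance

def f_test_statistic(sample1, sample2):
    """F = larger sample variance / smaller sample variance, with matching degrees of freedom."""
    v1, v2 = variance(sample1), variance(sample2)
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    return v2 / v1, len(sample2) - 1, len(sample1) - 1

# Invented samples
sample1 = [15, 25, 10, 30, 20]
sample2 = [18, 17, 21, 19, 20]
f, df_num, df_den = f_test_statistic(sample1, sample2)
print(f"F = {f:.2f} with ({df_num}, {df_den}) degrees of freedom")
```

The resulting F-value is then compared with the tabulated critical value for the same pair of degrees of freedom.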
Test of Goodness-of-Fit
The test of goodness-of-fit is used to determine if a population or sample can be described by a theoretical distribution, or in other words, if a theoretical distribution fits a data set at a chosen level of significance. The goodness-of-fit test is a hypothesis test with a null hypothesis, H0, and an alternative hypothesis, H1, formulated as follows:
H0: The data follow the specified theoretical distribution.
H1: The data do not follow the specified theoretical distribution.
The test utilizes the Chi-Square (χ²) distribution, because the sampling distribution of a random variable χ², calculated from the expected and observed frequencies, e_i and o_i respectively, is well approximated by that distribution. The random variable has the following form:
$$\chi^2_{calc} = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$$
where the sum runs over all classes (cells) of the frequency table.
In general, it can be stated that, if the calculated χ²calc-value is small then the fit is good, otherwise the fit is poor. The decision about a specific χ²calc-value, whether to accept or reject the null hypothesis, is made by comparing the χ²calc-value with the theoretical χ²-value at a chosen level of significance α.
Such theoretical χ² values (called here the χ²critical,α-value) are tabulated in most introductory statistics books, and are also available in statistical software programs (see e.g. StatSoft electronic textbook Statistical Tables).
If the fit is good, the null hypothesis, H0, is accepted, indicating that
the theoretical distribution describes the data at the chosen level of significance.
For acceptance of the null hypothesis, H0, the χ²calc-value has to fall into the acceptance region of the Chi-Square distribution. In this case χ²calc < χ²critical,α.
If χ²calc > χ²critical,α, then the null hypothesis H0 is not accepted.
If χ²calc is approximately equal to χ²critical,α, then further sampling is recommended.
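The decision rule above can be sketched as follows, assuming the usual statistic Σ (o_i - e_i)² / e_i. The example tests whether a die is fair: the expected frequencies are fully specified, so the degrees of freedom are taken as the number of classes minus one (an assumption for this case). The helper `chi_square_gof` and the counts are invented, and the critical value 11.070 (5 degrees of freedom, α = 0.05) comes from a Chi-Square table.

```python
def chi_square_gof(observed, expected):
    """Chi-Square goodness-of-fit statistic from observed and expected frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Invented data: 60 rolls of a die; a fair die gives 10 expected counts per face
observed = [8, 12, 9, 11, 14, 6]
expected = [10] * 6
chi2_calc = chi_square_gof(observed, expected)
chi2_critical = 11.070  # alpha = 0.05, 5 degrees of freedom, from a table
accept_h0 = chi2_calc < chi2_critical
print(f"chi2_calc = {chi2_calc:.2f}, critical = {chi2_critical}, accept H0: {accept_h0}")
```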
For the χ²critical,α-value, the degrees of freedom ν have to be determined first.
The following table summarizes the degrees of freedom ν to be used for the χ²critical,α-values when fitting some common theoretical distributions.
Test of Independency of Variables
The test of independency of variables is a hypothesis test that uses the Chi-Square test procedure in a similar manner as discussed above. The test uses so-called contingency tables for record keeping. Contingency tables are used, for example, in before-after studies. The null hypothesis, H0, and alternative hypothesis, H1, are stated as follows:
H0: The variables are independent.
H1: The variables are dependent.
The degrees of freedom, ν, for the χ²-value are determined using
ν = (m-1)(n-1)
where
m = number of rows in the contingency table
n = number of columns in the contingency table
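The χ²calc is calculated as before using
$$\chi^2_{calc} = \sum_{i} \frac{(o_i - e_i)^2}{e_i}$$
where the sum runs over all cells of the contingency table.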
In general, it can be stated that, if the calculated χ²calc is small then there is support for independency of the variables, otherwise the variables may be dependent. The decision about a specific χ²calc-value, whether to accept or reject the null hypothesis, is made by comparing χ²calc with the theoretical χ²critical,α-value at a chosen level of significance α, as before.
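A minimal sketch of the independence test on a contingency table follows. It assumes the standard estimate of each expected frequency, row total × column total / grand total, which the text above does not spell out. The helper `chi_square_independence` and the 2×2 table are invented, and the critical value 3.841 (1 degree of freedom, α = 0.05) comes from a Chi-Square table.

```python
def chi_square_independence(table):
    """Chi-Square statistic and degrees of freedom for an m x n contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total  # expected frequency under independence
            chi2 += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)        # (m - 1)(n - 1)
    return chi2, df

# Invented 2 x 2 table of observed frequencies
table = [[30, 20],
         [20, 30]]
chi2_calc, df = chi_square_independence(table)
chi2_critical = 3.841  # alpha = 0.05, 1 degree of freedom, from a table
print(f"chi2_calc = {chi2_calc:.2f}, df = {df}, reject H0: {chi2_calc > chi2_critical}")
```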