An Introduction to ANOVA
Analysis of variance, or ANOVA, is a statistical technique for testing whether the means of two or more comparison groups differ significantly.
There are several types of ANOVA; the common ones are One-Way ANOVA, Two-Way ANOVA and N-Way ANOVA. One-Way ANOVA tests for differences between groups defined by one independent variable. Two-Way ANOVA is used when there are two independent variables. N-Way ANOVA, as the name suggests, is used when there are more than two independent variables.
ANOVA tests the following hypotheses:
- H0 (null hypothesis): The population means of all groups are equal.
- H1 (alternative hypothesis): At least one group mean differs from the others.
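In symbols, the two hypotheses can be sketched as follows. The notation is an assumption made here for illustration: $\mu_j$ stands for the population mean of the $j$th group and $k$ for the number of comparison groups.

$$
H_0:\ \mu_1 = \mu_2 = \cdots = \mu_k
\qquad\text{versus}\qquad
H_1:\ \mu_i \neq \mu_j \ \text{for at least one pair } (i, j)
$$

Note that H1 does not say which groups differ, only that not all of the means are equal.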
The formulas for One-Way ANOVA are summarized in the ANOVA table below:
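| Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Squares (MS) | F |
|---|---|---|---|---|
| Between | $SSB = \sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X})^2$ | $k - 1$ | $MSB = SSB / (k - 1)$ | $F = MSB / MSE$ |
| Error | $SSE = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$ | $N - k$ | $MSE = SSE / (N - k)$ | |
| Total | $SST = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (X_{ij} - \bar{X})^2$ | $N - 1$ | | |

where:
- $X_{ij}$ is an individual observation (the $i$th observation in the $j$th group).
- $\bar{X}_j$ is the sample mean of the $j$th group.
- $\bar{X}$ is the overall sample mean.
- $k$ is the number of independent comparison groups.
- $n_j$ is the number of observations, or sample size, in the $j$th group.
- $N$ is the total number of observations, or total sample size.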
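To make the table concrete, here is a minimal Python sketch that computes each entry (sums of squares, degrees of freedom, mean squares and F) for a small data set. The function name `one_way_anova_table`, the dictionary keys and the three groups of numbers are all made up for illustration; they are not from any library or real study.

```python
from statistics import mean

def one_way_anova_table(groups):
    """Compute the one-way ANOVA table entries for a list of groups."""
    k = len(groups)                                   # number of comparison groups
    sizes = [len(g) for g in groups]                  # observations per group
    n_total = sum(sizes)                              # total number of observations
    group_means = [mean(g) for g in groups]           # sample mean of each group
    grand_mean = mean([x for g in groups for x in g]) # overall sample mean

    # Between-group sum of squares: group size times squared distance
    # of each group mean from the overall mean.
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, group_means))
    # Error (within-group) sum of squares: squared distance of each
    # observation from its own group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Total sum of squares: squared distance of each observation from
    # the overall mean. It should equal ssb + sse.
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)

    df_between, df_error, df_total = k - 1, n_total - k, n_total - 1
    msb = ssb / df_between
    mse = sse / df_error
    return {
        "SSB": ssb, "SSE": sse, "SST": sst,
        "df_between": df_between, "df_error": df_error, "df_total": df_total,
        "MSB": msb, "MSE": mse, "F": msb / mse,
    }

# Hypothetical data: three groups of three observations each.
groups = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]
table = one_way_anova_table(groups)
for name, value in table.items():
    print(f"{name}: {value}")
```

For this toy data the between and error sums of squares are 24 and 6, which add up to the total of 30, and the F ratio is 12.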
F is the ratio of the mean square between groups (MSB) to the mean square error (MSE). Another way to look at this is:
- F = Variation of sample means between groups / Variation within samples
The F ratio is used to decide whether to reject the null hypothesis. Under the null hypothesis, the two variations are expected to be roughly equal, which produces an F-statistic close to 1. A larger F ratio indicates that the variation between sample means is greater than the variation within the samples, which is evidence of a difference between the group means.
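The effect of group separation on F can be seen in a small Python sketch. Both data sets below are invented for illustration and have exactly the same spread within each group; only the distance between the group means changes. The helper name `f_ratio` is likewise hypothetical.

```python
def f_ratio(groups):
    """F = (variation between group means) / (variation within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Mean square between groups.
    msb = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, means)) / (k - 1)
    # Mean square error (within groups).
    mse = sum((x - m) ** 2
              for g, m in zip(groups, means) for x in g) / (n_total - k)
    return msb / mse

# Hypothetical data with identical within-group spread.
similar_means = [[4, 5, 6], [5, 6, 7], [4, 5, 6]]          # means 5, 6, 5
separated_means = [[4, 5, 6], [8, 9, 10], [12, 13, 14]]    # means 5, 9, 13

f_similar = f_ratio(similar_means)
f_separated = f_ratio(separated_means)
print(f"F with similar means:   {f_similar:.2f}")
print(f"F with separated means: {f_separated:.2f}")
```

With similar group means the ratio comes out at about 1, while pulling the means apart, without changing the within-group variation, raises it to 48.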
There are three important ANOVA assumptions:
- Independence of observations: There are no hidden relationships among observations.
- Normality: Within each group, the values of the dependent variable follow a normal distribution.
- Homogeneity of variance: The variances of the comparison groups are equal.
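As a rough first look at the homogeneity-of-variance assumption, the sample variance of each group can be compared directly. The sketch below is only a descriptive check on made-up data, not a formal test (formal procedures such as Levene's test exist for this purpose); the group labels and values are hypothetical.

```python
from statistics import variance

# Hypothetical data: group C is more spread out than A and B.
groups = {
    "A": [2, 3, 4],
    "B": [4, 5, 6],
    "C": [5, 7, 9],
}

# Sample variance (n - 1 denominator) of each group.
variances = {name: variance(values) for name, values in groups.items()}
# Ratio of the largest to the smallest group variance.
ratio = max(variances.values()) / min(variances.values())

for name, var in variances.items():
    print(f"variance of group {name}: {var}")
print(f"largest / smallest variance: {ratio}")
```

A ratio far from 1, as here, suggests the equal-variance assumption deserves a closer look before trusting the ANOVA result.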
Under these assumptions, and when the null hypothesis is true, the F ratio follows an F distribution whose shape is set by the two degrees of freedom (between and error). With this distribution, we can calculate the probability of observing an F-statistic at least as high as the value we obtained. This probability is known as the p-value. It is computed assuming the null hypothesis is true; it is not the probability that the null hypothesis itself is true.
Usually we need a small p-value to safely reject the null hypothesis. A typical significance level is 0.05: we reject the null hypothesis when the p-value falls below it. This means that when the null hypothesis is in fact true, we wrongly reject it about 1 time in 20 on average.
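The definition of the p-value can be illustrated with a small simulation. The sketch below does not evaluate the F distribution analytically. Instead it estimates the p-value by Monte Carlo: it repeatedly draws data for which the null hypothesis and the assumptions hold (every group sampled from the same normal distribution) and counts how often the simulated F is at least as large as the observed one. The data, the function names `f_ratio` and `simulated_p_value`, the seed and the number of simulations are all illustrative assumptions.

```python
import random

def f_ratio(groups):
    """F = mean square between groups / mean square error."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    msb = sum(len(g) * (m - grand_mean) ** 2
              for g, m in zip(groups, means)) / (k - 1)
    mse = sum((x - m) ** 2
              for g, m in zip(groups, means) for x in g) / (n_total - k)
    return msb / mse

def simulated_p_value(groups, n_sims=5000, seed=0):
    """Estimate P(F >= observed F) when the null hypothesis is true."""
    rng = random.Random(seed)
    observed = f_ratio(groups)
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_sims):
        # Under H0 all groups share one normal distribution. F is unchanged
        # by shifting or rescaling all the data, so standard normal draws
        # are enough.
        fake = [[rng.gauss(0.0, 1.0) for _ in range(n)] for n in sizes]
        if f_ratio(fake) >= observed:
            hits += 1
    return observed, hits / n_sims

# Hypothetical data: three groups of three observations each.
groups = [[2, 3, 4], [4, 5, 6], [6, 7, 8]]
observed, p_value = simulated_p_value(groups)

alpha = 0.05  # significance level
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"F = {observed:.2f}, simulated p-value = {p_value:.4f}, decision: {decision}")
```

For this toy data the observed F is 12 with 2 and 6 degrees of freedom, for which the exact tail probability is 0.008, so the simulated estimate should land close to that value and well below 0.05. In practice one would typically use a statistics library routine, such as `scipy.stats.f_oneway`, rather than a simulation.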
