What is ANOVA?
ANOVA stands for Analysis of Variance. It is a statistical method for seeing how different several groups are on some quantitative measure. For example, we might want to know if marital status (single, married, cohabitating, divorced, widowed or separated, other) is related to weight. We could take a sample of people, ask their marital status, weigh them and then do an ANOVA to see how different the groups are.
Here weight is called the dependent variable and marital status the independent variable.
If there are more independent variables and they are all categorical, we get factorial ANOVA. For example, we might also want to look at sex in the above, since men generally weigh more than women.
If there are independent variables that are quantitative, then we get ANCOVA or Analysis of Covariance. We might, for example, want to include height in the above, since taller people tend to weigh more than shorter people.
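When there are only two groups, the t-test is usually used, but the usual (equal-variance) two-sample t-test is mathematically equivalent to an ANOVA with only two groups: the F statistic from the ANOVA is the square of the t statistic, and the two-sided p-value is identical.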
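To make that equivalence concrete, here is a minimal sketch in Python using only the standard library. The weights are invented for illustration, and the helper names pooled_t and one_way_f are my own, not from any package. It assumes the equal-variance (pooled) form of the t-test, which is the form that matches ANOVA.

```python
from statistics import mean, variance

# Invented weights in kg for two groups (illustration only, not real data).
single = [62.0, 70.5, 68.0, 75.0, 66.5]
married = [71.0, 78.5, 74.0, 80.0, 69.5, 77.0]


def pooled_t(a, b):
    """Two-sample t statistic, assuming both groups share one variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def one_way_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


t = pooled_t(single, married)
f = one_way_f([single, married])
print(f"t = {t:.4f}, t squared = {t * t:.4f}, F = {f:.4f}")
```

With two groups the squared t and the F agree up to rounding error. With three or more groups there is no single t to square, which is where ANOVA takes over.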
If ANOVA is about comparing means, why is it called analysis of variance?
That is a very good question! Fortunately, it has a good answer.
Suppose we gather the data for the simplest case above. Let's say we find the weights of (say) 20 single people, 30 married people, 10 cohabitating people, and so on. We can then take the mean of each group. The means will not be the same in the different groups. Nor will all people in any group weigh the same. That is, for example, not all single people weigh the same amount.
We have variation within each group, and we have variation between groups. One way of measuring variation is with a statistic called the variance. If most of the variance is within groups, then we cannot conclude that the groups are different with regard to weight. On the other hand, if most of the variance is between groups, that is evidence that the groups differ. How do we decide which variance is bigger, and what it means? We analyze it. That is, we perform an ANOVA, an analysis of variance.
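Here is a small sketch of that analysis in Python, again with invented weights. It splits the total sum of squared deviations into a between-group part and a within-group part, then forms the usual F ratio from the two. The variable names are my own choices for illustration.

```python
from statistics import mean

# Invented weights in kg (illustration only, not real data).
groups = {
    "single": [62.0, 70.5, 68.0, 75.0],
    "married": [71.0, 78.5, 74.0, 80.0, 69.5],
    "cohabitating": [66.0, 72.5, 64.0],
}

values = [x for g in groups.values() for x in g]
grand_mean = mean(values)

# Total variation: every person against the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in values)
# Between-group variation: each group mean against the grand mean,
# weighted by how many people are in the group.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Within-group variation: every person against their own group mean.
ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)

# Divide each part by its degrees of freedom, then compare the two.
df_between = len(groups) - 1
df_within = len(values) - len(groups)
f_ratio = (ss_between / df_between) / (ss_within / df_within)

print(f"total = {ss_total:.2f}")
print(f"between + within = {ss_between + ss_within:.2f}")
print(f"F = {f_ratio:.3f} on {df_between} and {df_within} degrees of freedom")
```

The two parts always add up to the total. A large F means the between-group part is big relative to the within-group part, which is the evidence that the groups differ.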
ANOVA is the same as regression. Really!
Another common method of relating a dependent variable to independent ones is regression. ANOVA, ANCOVA and regression are all the same, mathematically: each is a special case of what is usually called the general linear model. Here is the one equation, in matrix algebra terms. It is the equation for ANOVA, ANCOVA and regression.
(skip if you want to ….)
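If there are n subjects (e.g. people) and p independent variables (e.g. marital status, sex, height; a categorical variable is coded as several columns, so strictly p counts the columns of X) then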
Y = Xb + e
Where Y is an n x 1 vector of values on the dependent variable (weight)
X is an n x p matrix of values on the independent variables
b is a p x 1 vector of parameters to estimate
e is an n x 1 vector of errors
(end of math)
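If the X matrix feels abstract, here is a sketch of how one might build it in Python for a handful of invented people. It assumes dummy (reference) coding with an intercept column, which is one common choice and not the only one. The function name design_row and the records are hypothetical. The point is that a categorical variable such as marital status and a quantitative one such as height both end up as plain columns of numbers.

```python
# Invented records: (marital status, height in cm, weight in kg).
records = [
    ("single", 170.0, 62.0),
    ("married", 182.0, 78.5),
    ("cohabitating", 165.0, 64.0),
    ("married", 175.0, 71.0),
    ("single", 178.0, 75.0),
]

# Assumption: the first level is the reference category and gets no column.
levels = ["single", "married", "cohabitating"]


def design_row(status, height):
    """One row of X: intercept, a 0/1 dummy per non-reference level, height."""
    dummies = [1.0 if status == level else 0.0 for level in levels[1:]]
    return [1.0] + dummies + [height]


X = [design_row(status, height) for status, height, _ in records]
Y = [weight for _, _, weight in records]

for row, weight in zip(X, Y):
    print(row, "->", weight)
```

With only the dummy columns this is the X for an ANOVA, with only the height column it is a simple regression, and with both it is an ANCOVA.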
If ANOVA is the same as regression, why do we have both?
ANOVA and regression arose in different substantive areas, and were developed by different people. ANOVA mostly arose in the field of agriculture, where people were concerned with questions such as the effect of different fertilizers on crop yields. A key person in its development was Ronald Fisher. Another field was the food and beverage industry; the inventor of the t-test, William Sealy Gosset (who published as "Student"), worked for the Guinness brewery. Regression started in geodesy, where people were trying to figure out exactly how big the Earth was, and one key figure was Gauss.
If ANOVA is the same as regression, why do the outputs look so different?
If you’ve done ANOVA and regression, or seen the output from programs that do them, or even read results in a paper, you’ll know that the output looks very different. That’s because they phrase the same question in different ways.
ANOVA is about partitioning the variance into within and between groups.
Regression is about finding an equation to relate the dependent variable to the independent variables.
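To see the two phrasings side by side, here is a sketch in Python that fits a regression to dummy-coded groups and compares it with the ANOVA quantities. The data are invented, and the small solve function is a plain Gaussian elimination written just for this example. In practice one would use a statistics package.

```python
from statistics import mean

# Invented weights in kg (illustration only, not real data).
groups = {
    "single": [62.0, 70.5, 68.0, 75.0],
    "married": [71.0, 78.5, 74.0, 80.0, 69.5],
    "cohabitating": [66.0, 72.5, 64.0],
}
levels = list(groups)  # assumption: the first level, "single", is the reference

# Regression view: an intercept plus one 0/1 column per non-reference group.
X, y = [], []
for level, weights in groups.items():
    for w in weights:
        X.append([1.0] + [1.0 if level == other else 0.0 for other in levels[1:]])
        y.append(w)


def solve(a, rhs):
    """Solve the linear system a x = rhs by Gaussian elimination with pivoting."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


# Least squares: solve the normal equations (X'X) b = X'y.
p = len(X[0])
xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
b = solve(xtx, xty)

fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# ANOVA view: variation of each person around their own group mean.
ss_within = sum((w - mean(g)) ** 2 for g in groups.values() for w in g)

print("regression coefficients:", [round(bi, 3) for bi in b])
print("group means:", {k: round(mean(g), 3) for k, g in groups.items()})
print(f"residual SS = {ss_residual:.3f}, within-group SS = {ss_within:.3f}")
```

The intercept is the mean of the reference group, each other coefficient is the difference between a group's mean and the reference mean, and the regression's residual sum of squares is the ANOVA's within-group sum of squares. Same numbers, different packaging.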
Analysis of variance (ANOVA) is a method for determining whether groups are different on a quantitative variable. It is equivalent to regression, although the equivalence isn’t obvious.