One-Way ANOVA
The one-way ANOVA tests whether the mean of some numeric variable differs across the levels of one categorical variable. It essentially answers the question: do any of the group means differ from one another? We won’t get into the details of carrying out an ANOVA by hand, as it involves more calculations than the t-test, but the process is similar: you go through several calculations to arrive at a test statistic and then compare that statistic to a critical value based on a probability distribution. In the case of the ANOVA, you use the F-distribution.
The scipy library has a function for carrying out one-way ANOVA tests called scipy.stats.f_oneway(). Let’s generate some fake voter age and demographic data and use an ANOVA to compare average ages across the groups:
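The original code isn’t reproduced here, so the following is a minimal sketch of how such data might be generated: race labels drawn with fixed probabilities and every group’s ages drawn from the same Poisson distribution, so that no group truly differs. The seed, sample size, and distribution parameters are illustrative assumptions rather than the original values, so the exact output will vary.

```python
import numpy as np
import pandas as pd
import scipy.stats as stats

np.random.seed(12)

races = ["asian", "black", "hispanic", "other", "white"]

# Fake demographic labels drawn with fixed probabilities; ages drawn
# from one Poisson distribution, so no group truly differs
voter_race = np.random.choice(a=races,
                              p=[0.05, 0.15, 0.25, 0.05, 0.5],
                              size=1000)
voter_age = stats.poisson.rvs(loc=18, mu=30, size=1000)

# Group the ages by race
voter_frame = pd.DataFrame({"race": voter_race, "age": voter_age})
groups = voter_frame.groupby("race").groups

asian = voter_age[groups["asian"]]
black = voter_age[groups["black"]]
hispanic = voter_age[groups["hispanic"]]
other = voter_age[groups["other"]]
white = voter_age[groups["white"]]

# One-way ANOVA across the five groups
stats.f_oneway(asian, black, hispanic, other, white)
```

Because every group is drawn from the same distribution, the test will typically come back non-significant, though the exact figures depend on the seed and parameters used.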
The test output yields an F-statistic of 1.774 and a p-value of 0.1317, indicating that there is no significant difference among the group means.
Now let’s make new age data where the group means do differ and run a second ANOVA:
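Here is a sketch of one way to do that, reusing the setup above but drawing white voters’ ages from a distribution with a slightly higher mean (the shift of two years is an illustrative choice):

```python
np.random.seed(12)

# Same setup, but white voters are now drawn from a distribution
# with a higher mean age than everyone else
voter_race = np.random.choice(a=races,
                              p=[0.05, 0.15, 0.25, 0.05, 0.5],
                              size=1000)

white_ages = stats.poisson.rvs(loc=18, mu=32, size=1000)
voter_age = stats.poisson.rvs(loc=18, mu=30, size=1000)
voter_age = np.where(voter_race == "white", white_ages, voter_age)

voter_frame = pd.DataFrame({"race": voter_race, "age": voter_age})
groups = voter_frame.groupby("race").groups

asian = voter_age[groups["asian"]]
black = voter_age[groups["black"]]
hispanic = voter_age[groups["hispanic"]]
other = voter_age[groups["other"]]
white = voter_age[groups["white"]]

# Run the ANOVA again on the shifted data
stats.f_oneway(asian, black, hispanic, other, white)
```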
The test result suggests the group means are not all equal in this case, since the p-value is significant at the 99% confidence level. We know it is the white voters who differ because we set the data up that way in the code, but when testing real data you may not know which group(s) caused the test to return a positive result. To check which groups differ after getting a positive ANOVA result, you can perform a follow-up test, or “post-hoc test”.
One post-hoc test is to perform a separate t-test for each pair of groups. You can do this by running each pair through the stats.ttest_ind() function we covered in the lesson on t-tests:
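A sketch of the pairwise comparisons, assuming the voter_age, groups, and races objects from the sketches above:

```python
# Build every unique pair of race groups, then t-test each pair
race_pairs = []

for race1 in range(4):
    for race2 in range(race1 + 1, 5):
        race_pairs.append((races[race1], races[race2]))

# Run a separate t-test for each of the 10 pairs
for race1, race2 in race_pairs:
    print(race1, race2)
    print(stats.ttest_ind(voter_age[groups[race1]],
                          voter_age[groups[race2]]))
```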
The p-values for the pairwise t-tests suggest that the mean age of white voters likely differs from that of the other groups, since the p-value for each t-test involving the white group is below 0.05. Using unadjusted pairwise t-tests can overestimate significance, however, because the more comparisons you make, the more likely you are to come across an unlikely result due to chance alone. We can adjust for this multiple comparisons problem by dividing the statistical significance level by the number of comparisons made. In this case, with 10 pairwise comparisons among 5 groups, if we were looking for a significance level of 5% we’d be looking for p-values of 0.05/10 = 0.005 or less. This simple adjustment for multiple comparisons is known as the Bonferroni correction.
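The correction can be applied by hand as described above; as a convenience (not mentioned in the original lesson), statsmodels also provides a multipletests() helper that does the same adjustment. A sketch, assuming the race_pairs list from the previous block:

```python
from statsmodels.stats.multitest import multipletests

# Collect the raw p-value from each pairwise t-test
p_values = [stats.ttest_ind(voter_age[groups[r1]],
                            voter_age[groups[r2]]).pvalue
            for r1, r2 in race_pairs]

# Bonferroni correction: equivalent to comparing each raw p-value
# against 0.05 / 10 = 0.005
reject, corrected_p, _, _ = multipletests(p_values, alpha=0.05,
                                          method="bonferroni")

for pair, p, rej in zip(race_pairs, corrected_p, reject):
    print(pair, round(p, 4), "reject" if rej else "fail to reject")
```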
The Bonferroni correction is a conservative approach to the multiple comparisons problem that may end up rejecting results that are actually significant. Another common post-hoc test is Tukey’s test. You can carry out Tukey’s test using the pairwise_tukeyhsd() function in the statsmodels.stats.multicomp library:
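A minimal sketch, again assuming the voter_age and voter_race arrays generated earlier:

```python
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Tukey's HSD test comparing every pair of groups at once
tukey = pairwise_tukeyhsd(endog=voter_age,    # numeric response
                          groups=voter_race,  # group labels
                          alpha=0.05)         # significance level

tukey.plot_simultaneous()   # confidence-interval plot for group means
print(tukey.summary())      # pairwise comparison table
```

The summary table lists, for each pair of groups, the estimated mean difference, a confidence interval, and whether the null hypothesis of equal means is rejected at the chosen significance level, while the simultaneous plot makes it easy to spot which group’s interval is shifted away from the rest.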
Wrap Up
The ANOVA test lets us check whether a numeric response variable varies according to the levels of a categorical variable. Python’s scipy library makes it easy to perform an ANOVA without diving too deep into the details of the procedure.
Next time, we’ll move on from statistical inference to the final topic of this guide: predictive modelling.