10 MOST POPULAR STATISTICAL HYPOTHESIS TESTING METHODS USING PYTHON
Hypothesis testing is a critical tool in inferential statistics: it lets us judge, from an analysis of sample data, what the value of a population parameter could be.
Every hypothesis test is built on two competing statements:
Null Hypothesis: H0
Alternative Hypothesis: H1
Types of Hypothesis Testing Methods
a. Chi-Squared Test (Correlation Tests)
b. Analysis of Variance Test (ANOVA) (Parametric Statistical Hypothesis Tests)
c. Student's t-test (Parametric Statistical Hypothesis Tests)
d. Shapiro-Wilk Test (Normality Tests)
e. D'Agostino's K² Test (Normality Tests)
f. Pearson's Correlation Coefficient (Correlation Tests)
g. Spearman's Rank Correlation (Correlation Tests)
h. Kruskal-Wallis H Test (Nonparametric Statistical Hypothesis Tests)
i. Friedman Test (Nonparametric Statistical Hypothesis Tests)
j. Mann-Whitney U Test (Nonparametric Statistical Hypothesis Tests)
Chi-Squared Test
The chi-squared test is well known even to those just starting out in statistical machine learning. It checks whether two categorical variables are related or independent, under the assumption that the observations used to build the contingency table are themselves independent.
Tests whether two categorical variables are related or independent.
Assumptions
- Observations used in the calculation of the contingency table are independent.
- Expected counts of 5 or more in each cell of the contingency table (a common rule of thumb).
Interpretation
- H0: the two variables are independent.
- H1: there is a dependency between the variables.
Python Code
from scipy.stats import chi2_contingency
table = …
stat, p, dof, expected = chi2_contingency(table)
Example
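A minimal runnable sketch, using a made-up 2×3 contingency table (the counts are illustrative, not real data):

```python
from scipy.stats import chi2_contingency

# Made-up contingency table: two groups observed across three categories.
table = [[10, 20, 30],
         [6, 9, 17]]
stat, p, dof, expected = chi2_contingency(table)
print(f"stat={stat:.3f}, dof={dof}, p={p:.3f}")
if p > 0.05:
    print("Probably independent (fail to reject H0)")
else:
    print("Probably dependent (reject H0)")
```

With counts this close to their expected values, the test fails to reject H0.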
Analysis of Variance Test (ANOVA)
ANOVA is another widely used test; it checks whether the means of two or more independent samples differ significantly. The observations are assumed to follow a normal distribution with the same variance in each sample.
Tests whether the means of two or more independent samples are significantly different.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample are normally distributed.
- Observations in each sample have the same variance.
Interpretation
- H0: the means of the samples are equal.
- H1: one or more of the means of the samples are unequal.
Python Code
from scipy.stats import f_oneway
data1, data2, … = …
stat, p = f_oneway(data1, data2, …)
Example
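A hedged sketch with three made-up samples; the third sample's mean is deliberately shifted, so the test should reject H0:

```python
from scipy.stats import f_oneway

# Three made-up independent samples; data3 has a clearly higher mean.
data1 = [3.1, 2.9, 3.4, 3.0, 3.2]
data2 = [3.0, 3.3, 2.8, 3.1, 3.0]
data3 = [5.1, 5.4, 4.9, 5.2, 5.0]
stat, p = f_oneway(data1, data2, data3)
print(f"stat={stat:.3f}, p={p:.4f}")
if p > 0.05:
    print("Probably the same means (fail to reject H0)")
else:
    print("At least one mean differs (reject H0)")
```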
Student’s t-test
Tests whether the means of two independent samples are significantly different. (A paired variant of the test, available in SciPy as ttest_rel, instead tests whether the means of two paired samples differ; it additionally assumes the observations across the samples are paired.)
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample are normally distributed.
- Observations in each sample have the same variance.
Interpretation
- H0: the means of the samples are equal.
- H1: the means of the samples are unequal.
Python Code
from scipy.stats import ttest_ind
data1, data2 = …
stat, p = ttest_ind(data1, data2)
Example
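A minimal sketch with two made-up samples whose means are almost identical, so the test should fail to reject H0:

```python
from scipy.stats import ttest_ind

# Two made-up independent samples with very similar means.
data1 = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
data2 = [5.0, 5.2, 4.9, 5.1, 5.3, 4.7]
stat, p = ttest_ind(data1, data2)
print(f"stat={stat:.3f}, p={p:.3f}")
if p > 0.05:
    print("Probably the same mean (fail to reject H0)")
else:
    print("Probably different means (reject H0)")
```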
Shapiro-Wilk Test
This test is used to check whether the sample data has a Gaussian distribution.
Tests whether a data sample has a Gaussian distribution.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
Interpretation
- H0: the sample has a Gaussian distribution.
- H1: the sample does not have a Gaussian distribution.
Python Code
from scipy.stats import shapiro
data = …
stat, p = shapiro(data)
Example
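A small sketch with made-up values that are roughly symmetric around zero; a sample like this should not be rejected as non-Gaussian:

```python
from scipy.stats import shapiro

# Made-up sample, roughly symmetric around zero.
data = [0.87, -0.12, 0.45, -0.68, 1.32, -0.45, 0.23, 0.91, -1.10, 0.05]
stat, p = shapiro(data)
print(f"stat={stat:.3f}, p={p:.3f}")
if p > 0.05:
    print("Probably Gaussian (fail to reject H0)")
else:
    print("Probably not Gaussian (reject H0)")
```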
D’Agostino’s K² Test
Similar to the Shapiro-Wilk test, this one also checks whether a data sample has a Gaussian distribution.
Tests whether a data sample has a Gaussian distribution.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
Interpretation
- H0: the sample has a Gaussian distribution.
- H1: the sample does not have a Gaussian distribution.
Python Code
from scipy.stats import normaltest
data = …
stat, p = normaltest(data)
Example
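A sketch using a seeded NumPy generator, since D'Agostino's test combines skewness and kurtosis statistics and SciPy warns for samples smaller than 20:

```python
import numpy as np
from scipy.stats import normaltest

# Draw 100 points from a seeded standard normal generator
# (synthetic data; normaltest needs at least 20 observations).
rng = np.random.default_rng(0)
data = rng.normal(size=100)
stat, p = normaltest(data)
print(f"stat={stat:.3f}, p={p:.3f}")
if p > 0.05:
    print("Probably Gaussian (fail to reject H0)")
else:
    print("Probably not Gaussian (reject H0)")
```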
Pearson’s Correlation Coefficient
A statistical test that checks for correlation between two samples, i.e. whether they have a linear relationship.
Tests whether two samples have a linear relationship.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample are normally distributed.
- Observations in each sample have the same variance.
Interpretation
- H0: the two samples are uncorrelated.
- H1: there is a correlation between the samples.
Python Code
from scipy.stats import pearsonr
data1, data2 = …
corr, p = pearsonr(data1, data2)
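Example
A minimal sketch with made-up samples in a nearly linear relationship (data2 is roughly twice data1), so the correlation should be close to 1:

```python
from scipy.stats import pearsonr

# Made-up samples: data2 ≈ 2 * data1, i.e. almost perfectly linear.
data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2]
corr, p = pearsonr(data1, data2)
print(f"corr={corr:.3f}, p={p:.4f}")
if p > 0.05:
    print("Probably uncorrelated (fail to reject H0)")
else:
    print("Probably correlated (reject H0)")
```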
Spearman’s Rank Correlation
The observations in each sample are assumed to be rankable; the test uses those ranks to check whether the relationship between the two samples is monotonic.
Tests whether two samples have a monotonic relationship.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.
Interpretation
- H0: the two samples are uncorrelated.
- H1: there is a correlation between the samples.
Python Code
from scipy.stats import spearmanr
data1, data2 = …
corr, p = spearmanr(data1, data2)
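Example
A minimal sketch with made-up samples where data2 grows monotonically but not linearly with data1; Spearman's rank correlation should still be essentially perfect:

```python
from scipy.stats import spearmanr

# Made-up samples: data2 = data1 squared, a monotonic but nonlinear relationship.
data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data2 = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
corr, p = spearmanr(data1, data2)
print(f"corr={corr:.3f}, p={p:.4f}")
if p > 0.05:
    print("Probably uncorrelated (fail to reject H0)")
else:
    print("Probably correlated (reject H0)")
```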
Mann-Whitney U Test
A nonparametric statistical hypothesis test applied to two independent samples, to determine whether their distributions are equal.
Tests whether the distributions of two independent samples are equal or not.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.
Interpretation
- H0: the distributions of both samples are equal.
- H1: the distributions of both samples are not equal.
Python Code
from scipy.stats import mannwhitneyu
data1, data2 = ...
stat, p = mannwhitneyu(data1, data2)
Example
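A sketch with two made-up, clearly separated samples, so the test should reject H0 (the two-sided alternative is passed explicitly, since older SciPy versions defaulted to a one-sided test):

```python
from scipy.stats import mannwhitneyu

# Made-up samples with completely separated values.
data1 = [1, 2, 3, 4, 5, 6, 7, 8]
data2 = [11, 12, 13, 14, 15, 16, 17, 18]
stat, p = mannwhitneyu(data1, data2, alternative="two-sided")
print(f"stat={stat:.1f}, p={p:.5f}")
if p > 0.05:
    print("Probably the same distribution (fail to reject H0)")
else:
    print("Probably different distributions (reject H0)")
```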
Kruskal-Wallis H Test
Like the previous test, the Kruskal-Wallis test assumes that the observations in each sample are independent, identically distributed, and can be ranked. It generalizes the Mann-Whitney U test to two or more independent samples.
Tests whether the distributions of two or more independent samples are equal or not.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.
Interpretation
- H0: the distributions of all samples are equal.
- H1: the distributions of one or more samples are not equal.
Python Code
from scipy.stats import kruskal
data1, data2, ... = ...
stat, p = kruskal(data1, data2, ...)
Example
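A sketch with three made-up samples, one of which is shifted far from the other two, so the test should reject H0:

```python
from scipy.stats import kruskal

# Made-up samples; data3 is clearly shifted away from data1 and data2.
data1 = [1, 2, 3, 4, 5]
data2 = [2, 3, 4, 5, 6]
data3 = [20, 21, 22, 23, 24]
stat, p = kruskal(data1, data2, data3)
print(f"stat={stat:.3f}, p={p:.4f}")
if p > 0.05:
    print("Probably the same distribution (fail to reject H0)")
else:
    print("Probably different distributions (reject H0)")
```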
Friedman Test
The Friedman test is the paired-samples counterpart of the Kruskal-Wallis test: it compares two or more paired (repeated-measures) samples.
Tests whether the distributions of two or more paired samples are equal or not.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.
- Observations across each sample are paired.
Interpretation
- H0: the distributions of all samples are equal.
- H1: the distributions of one or more samples are not equal.
Python Code
from scipy.stats import friedmanchisquare
data1, data2, ... = ...
stat, p = friedmanchisquare(data1, data2, ...)
Example
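A sketch with three made-up paired samples (think of the same eight subjects measured under three conditions); the third condition is consistently highest, so the test should reject H0:

```python
from scipy.stats import friedmanchisquare

# Made-up paired samples: index i is the same "subject" in all three lists.
data1 = [1, 2, 3, 4, 5, 6, 7, 8]
data2 = [2, 3, 4, 5, 6, 7, 8, 9]
data3 = [10, 11, 12, 13, 14, 15, 16, 17]
stat, p = friedmanchisquare(data1, data2, data3)
print(f"stat={stat:.3f}, p={p:.5f}")
if p > 0.05:
    print("Probably the same distribution (fail to reject H0)")
else:
    print("Probably different distributions (reject H0)")
```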