10 MOST POPULAR STATISTICAL HYPOTHESIS TESTING METHODS USING PYTHON
Hypothesis testing is a critical tool in inferential statistics: it lets us judge, from an analysis of sample data, what the value of a population parameter could be.
Every hypothesis test is built on two competing statements:
Null Hypothesis: H0
Alternative Hypothesis: H1
Types of Hypothesis Testing Methods
a. Chi-Squared Test (Correlation Tests)
b. Analysis of Variance Test (ANOVA) (Parametric Statistical Hypothesis Tests)
c. Student's t-test (Parametric Statistical Hypothesis Tests)
d. Shapiro-Wilk Test (Normality Tests)
e. D'Agostino's K² Test (Normality Tests)
f. Pearson's Correlation Coefficient (Correlation Tests)
g. Spearman's Rank Correlation (Correlation Tests)
h. Kruskal-Wallis H Test (Nonparametric Statistical Hypothesis Tests)
i. Friedman Test (Nonparametric Statistical Hypothesis Tests)
j. Mann-Whitney U Test (Nonparametric Statistical Hypothesis Tests)
Chi-Squared Test
The chi-squared test is well known even to those just starting out in statistical machine learning. It checks whether two categorical variables are related or independent, under the assumption that the observations used to build the contingency table are themselves independent.
Tests whether two categorical variables are related or independent.
Assumptions
- Observations used in the calculation of the contingency table are independent.
- Expected counts of 5 or more in each cell of the contingency table (a common rule of thumb).
Interpretation
- H0: the two variables are independent.
- H1: there is a dependency between the variables.
Python Code
from scipy.stats import chi2_contingency
table = …
stat, p, dof, expected = chi2_contingency(table)
Example
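A minimal runnable sketch, using a made-up 2×3 contingency table (the counts are illustrative, not real data):

```python
from scipy.stats import chi2_contingency

# Made-up contingency table: two groups observed across three categories.
table = [[10, 20, 30],
         [6, 9, 17]]
stat, p, dof, expected = chi2_contingency(table)
print(f"stat={stat:.3f}, dof={dof}, p={p:.3f}")
if p > 0.05:
    print("Probably independent (fail to reject H0)")
else:
    print("Probably dependent (reject H0)")
```

With counts this close to their expected values, the test fails to reject H0.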
Analysis of Variance Test (ANOVA)
ANOVA is another widely used test; it checks whether the means of two or more independent samples differ significantly. The observations are assumed to follow a normal distribution with the same variance in each sample.
Tests whether the means of two or more independent samples are significantly different.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample are normally distributed.
- Observations in each sample have the same variance.
Interpretation
- H0: the means of the samples are equal.
- H1: one or more of the means of the samples are unequal.
Python Code
from scipy.stats import f_oneway
data1, data2, … = …
stat, p = f_oneway(data1, data2, …)
Example
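A hedged sketch with three made-up samples; the third sample's mean is deliberately shifted, so the test should reject H0:

```python
from scipy.stats import f_oneway

# Three made-up independent samples; data3 has a clearly higher mean.
data1 = [3.1, 2.9, 3.4, 3.0, 3.2]
data2 = [3.0, 3.3, 2.8, 3.1, 3.0]
data3 = [5.1, 5.4, 4.9, 5.2, 5.0]
stat, p = f_oneway(data1, data2, data3)
print(f"stat={stat:.3f}, p={p:.4f}")
if p > 0.05:
    print("Probably the same means (fail to reject H0)")
else:
    print("At least one mean differs (reject H0)")
```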
Student’s t-test
Tests whether the means of two independent samples are significantly different. (A paired variant of the test, available in SciPy as ttest_rel, instead tests whether the means of two paired samples differ; it additionally assumes the observations across the samples are paired.)
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample are normally distributed.
- Observations in each sample have the same variance.
Interpretation
- H0: the means of the samples are equal.
- H1: the means of the samples are unequal.
Python Code
from scipy.stats import ttest_ind
data1, data2 = …
stat, p = ttest_ind(data1, data2)
Example
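A minimal sketch with two made-up samples whose means are almost identical, so the test should fail to reject H0:

```python
from scipy.stats import ttest_ind

# Two made-up independent samples with very similar means.
data1 = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
data2 = [5.0, 5.2, 4.9, 5.1, 5.3, 4.7]
stat, p = ttest_ind(data1, data2)
print(f"stat={stat:.3f}, p={p:.3f}")
if p > 0.05:
    print("Probably the same mean (fail to reject H0)")
else:
    print("Probably different means (reject H0)")
```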
Shapiro-Wilk Test
This test is used to check whether the sample data has a Gaussian distribution.
Tests whether a data sample has a Gaussian distribution.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
Interpretation
- H0: the sample has a Gaussian distribution.
- H1: the sample does not have a Gaussian distribution.
Python Code
from scipy.stats import shapiro
data = …
stat, p = shapiro(data)
Example
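A small sketch with made-up values that are roughly symmetric around zero; a sample like this should not be rejected as non-Gaussian:

```python
from scipy.stats import shapiro

# Made-up sample, roughly symmetric around zero.
data = [0.87, -0.12, 0.45, -0.68, 1.32, -0.45, 0.23, 0.91, -1.10, 0.05]
stat, p = shapiro(data)
print(f"stat={stat:.3f}, p={p:.3f}")
if p > 0.05:
    print("Probably Gaussian (fail to reject H0)")
else:
    print("Probably not Gaussian (reject H0)")
```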
D’Agostino’s K² Test
Similar to the Shapiro-Wilk test, this one also checks whether a data sample has a Gaussian distribution.
Tests whether a data sample has a Gaussian distribution.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
Interpretation
- H0: the sample has a Gaussian distribution.
- H1: the sample does not have a Gaussian distribution.
Python Code
from scipy.stats import normaltest
data = …
stat, p = normaltest(data)
Example
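A sketch using a seeded NumPy generator, since D'Agostino's test combines skewness and kurtosis statistics and SciPy warns for samples smaller than 20:

```python
import numpy as np
from scipy.stats import normaltest

# Draw 100 points from a seeded standard normal generator
# (synthetic data; normaltest needs at least 20 observations).
rng = np.random.default_rng(0)
data = rng.normal(size=100)
stat, p = normaltest(data)
print(f"stat={stat:.3f}, p={p:.3f}")
if p > 0.05:
    print("Probably Gaussian (fail to reject H0)")
else:
    print("Probably not Gaussian (reject H0)")
```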
Pearson’s Correlation Coefficient
A statistical test that checks for correlation between two samples, i.e. whether they have a linear relationship.
Tests whether two samples have a linear relationship.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample are normally distributed.
- Observations in each sample have the same variance.
Interpretation
- H0: the two samples are uncorrelated.
- H1: there is a correlation between the samples.
Python Code
from scipy.stats import pearsonr
data1, data2 = …
corr, p = pearsonr(data1, data2)
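Example
A minimal sketch with made-up samples in a nearly linear relationship (data2 is roughly twice data1), so the correlation should be close to 1:

```python
from scipy.stats import pearsonr

# Made-up samples: data2 ≈ 2 * data1, i.e. almost perfectly linear.
data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1, 18.0, 20.2]
corr, p = pearsonr(data1, data2)
print(f"corr={corr:.3f}, p={p:.4f}")
if p > 0.05:
    print("Probably uncorrelated (fail to reject H0)")
else:
    print("Probably correlated (reject H0)")
```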
Spearman’s Rank Correlation
The observations in each sample are assumed to be rankable; the test uses those ranks to check whether the relationship between the two samples is monotonic.
Tests whether two samples have a monotonic relationship.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.
Interpretation
- H0: the two samples are uncorrelated.
- H1: there is a correlation between the samples.
Python Code
from scipy.stats import spearmanr
data1, data2 = …
corr, p = spearmanr(data1, data2)
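Example
A minimal sketch with made-up samples where data2 grows monotonically but not linearly with data1; Spearman's rank correlation should still be essentially perfect:

```python
from scipy.stats import spearmanr

# Made-up samples: data2 = data1 squared, a monotonic but nonlinear relationship.
data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data2 = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
corr, p = spearmanr(data1, data2)
print(f"corr={corr:.3f}, p={p:.4f}")
if p > 0.05:
    print("Probably uncorrelated (fail to reject H0)")
else:
    print("Probably correlated (reject H0)")
```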
Mann-Whitney U Test
A nonparametric statistical hypothesis test applied to two independent samples, to determine whether their distributions are equal.
Tests whether the distributions of two independent samples are equal or not.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.
Interpretation
- H0: the distributions of both samples are equal.
- H1: the distributions of both samples are not equal.
Python Code
from scipy.stats import mannwhitneyu
data1, data2 = ...
stat, p = mannwhitneyu(data1, data2)
Example
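A sketch with two made-up, clearly separated samples, so the test should reject H0 (the two-sided alternative is passed explicitly, since older SciPy versions defaulted to a one-sided test):

```python
from scipy.stats import mannwhitneyu

# Made-up samples with completely separated values.
data1 = [1, 2, 3, 4, 5, 6, 7, 8]
data2 = [11, 12, 13, 14, 15, 16, 17, 18]
stat, p = mannwhitneyu(data1, data2, alternative="two-sided")
print(f"stat={stat:.1f}, p={p:.5f}")
if p > 0.05:
    print("Probably the same distribution (fail to reject H0)")
else:
    print("Probably different distributions (reject H0)")
```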
Kruskal-Wallis H Test
Like the previous test, the Kruskal-Wallis test assumes that the observations in each sample are independent, identically distributed, and can be ranked. It generalizes the Mann-Whitney U test to two or more independent samples.
Tests whether the distributions of two or more independent samples are equal or not.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.
Interpretation
- H0: the distributions of all samples are equal.
- H1: the distributions of one or more samples are not equal.
Python Code
from scipy.stats import kruskal
data1, data2, ... = ...
stat, p = kruskal(data1, data2, ...)
Example
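A sketch with three made-up samples, one of which is shifted far from the other two, so the test should reject H0:

```python
from scipy.stats import kruskal

# Made-up samples; data3 is clearly shifted away from data1 and data2.
data1 = [1, 2, 3, 4, 5]
data2 = [2, 3, 4, 5, 6]
data3 = [20, 21, 22, 23, 24]
stat, p = kruskal(data1, data2, data3)
print(f"stat={stat:.3f}, p={p:.4f}")
if p > 0.05:
    print("Probably the same distribution (fail to reject H0)")
else:
    print("Probably different distributions (reject H0)")
```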
Friedman Test
The Friedman test is the paired-samples counterpart of the Kruskal-Wallis test: it compares two or more paired (repeated-measures) samples.
Tests whether the distributions of two or more paired samples are equal or not.
Assumptions
- Observations in each sample are independent and identically distributed (iid).
- Observations in each sample can be ranked.
- Observations across each sample are paired.
Interpretation
- H0: the distributions of all samples are equal.
- H1: the distributions of one or more samples are not equal.
Python Code
from scipy.stats import friedmanchisquare
data1, data2, ... = ...
stat, p = friedmanchisquare(data1, data2, ...)
Example
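A sketch with three made-up paired samples (think of the same eight subjects measured under three conditions); the third condition is consistently highest, so the test should reject H0:

```python
from scipy.stats import friedmanchisquare

# Made-up paired samples: index i is the same "subject" in all three lists.
data1 = [1, 2, 3, 4, 5, 6, 7, 8]
data2 = [2, 3, 4, 5, 6, 7, 8, 9]
data3 = [10, 11, 12, 13, 14, 15, 16, 17]
stat, p = friedmanchisquare(data1, data2, data3)
print(f"stat={stat:.3f}, p={p:.5f}")
if p > 0.05:
    print("Probably the same distribution (fail to reject H0)")
else:
    print("Probably different distributions (reject H0)")
```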