10 MOST POPULAR STATISTICAL HYPOTHESIS TESTING METHODS USING PYTHON

Ajay Mane
6 min read · Dec 12, 2019

Hypothesis testing is a critical tool in inferential statistics for determining what the value of a population parameter could be. We draw such conclusions from the analysis of sample data.

Hypothesis testing is built on two competing statements:

Null Hypothesis: H0

Alternative Hypothesis: H1

Types of Hypothesis Testing Methods

a. Chi-Squared Test (Correlation Tests)

b. Analysis of Variance Test (ANOVA) (Parametric Statistical Hypothesis Tests)

c. Student’s t-test (Parametric Statistical Hypothesis Tests)

d. Shapiro-Wilk Test (Normality Tests)

e. D’Agostino’s K² Test (Normality Tests)

f. Pearson’s Correlation Coefficient (Correlation Tests)

g. Spearman’s Rank Correlation (Correlation Tests)

h. Kruskal-Wallis H Test (Nonparametric Statistical Hypothesis Tests)

i. Friedman Test (Nonparametric Statistical Hypothesis Tests)

j. Mann-Whitney U Test (Nonparametric Statistical Hypothesis Tests)

Chi-Squared Test

The chi-squared test is well known even to those just starting out with statistical machine learning. It checks whether two categorical variables are related or independent, based on a contingency table of observed counts, and it assumes those observations are independent of one another.

Tests whether two categorical variables are related or independent.

Assumptions

  • Observations used in the calculation of the contingency table are independent.
  • 25 or more examples in each cell of the contingency table.

Interpretation

  • H0: the two variables are independent.
  • H1: the two variables are dependent.

Python Code

from scipy.stats import chi2_contingency

table = …
stat, p, dof, expected = chi2_contingency(table)

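As a worked example, here is the test run on a small contingency table; the counts are made up purely for illustration:

```python
from scipy.stats import chi2_contingency

# Made-up counts: rows = two groups, columns = three response categories
table = [[10, 20, 30],
         [6,  9,  17]]
stat, p, dof, expected = chi2_contingency(table)
print('stat=%.3f, p=%.3f, dof=%d' % (stat, p, dof))
if p > 0.05:
    print('Probably independent (fail to reject H0)')
else:
    print('Probably dependent (reject H0)')
```

Here the row proportions are similar, so the p-value is large and we fail to reject H0.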

Analysis of Variance Test (ANOVA)

ANOVA is another widely used test. It checks whether the means of two or more independent samples differ significantly, assuming the observations in each sample follow a normal distribution with equal variance.

Tests whether the means of two or more independent samples are significantly different.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample are normally distributed.
  • Observations in each sample have the same variance.

Interpretation

  • H0: the means of the samples are equal.
  • H1: one or more of the means of the samples are unequal.

Python Code

from scipy.stats import f_oneway
data1, data2, … = …
stat, p = f_oneway(data1, data2, …)

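A runnable sketch with three made-up samples whose means are clearly different:

```python
from scipy.stats import f_oneway

# Made-up samples with obviously different means
data1 = [1, 2, 3, 4, 5]
data2 = [11, 12, 13, 14, 15]
data3 = [21, 22, 23, 24, 25]
stat, p = f_oneway(data1, data2, data3)
if p > 0.05:
    print('Probably the same means (fail to reject H0)')
else:
    print('Probably different means (reject H0)')
```

With group means of 3, 13, and 23, the test rejects H0 decisively.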

Student’s t-test

Tests whether the means of two independent samples are significantly different. (To compare the means of two paired samples instead, use the paired Student’s t-test, ttest_rel in SciPy, which additionally assumes the observations across the samples are paired.)

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample are normally distributed.
  • Observations in each sample have the same variance.

Interpretation

  • H0: the means of the samples are equal.
  • H1: the means of the samples are unequal.

Python Code

from scipy.stats import ttest_ind
data1, data2 = …
stat, p = ttest_ind(data1, data2)

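A quick sketch with two made-up samples whose means are nearly equal, so the test should fail to reject H0:

```python
from scipy.stats import ttest_ind

# Made-up samples with very similar means (illustrative only)
data1 = [1, 2, 3, 4, 5]
data2 = [1.1, 2.1, 2.9, 4.2, 5.1]
stat, p = ttest_ind(data1, data2)
if p > 0.05:
    print('Probably the same means (fail to reject H0)')
else:
    print('Probably different means (reject H0)')
```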

Shapiro-Wilk Test

This test is used to check whether the sample data has a Gaussian distribution.

Tests whether a data sample has a Gaussian distribution.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).

Interpretation

  • H0: the sample has a Gaussian distribution.
  • H1: the sample does not have a Gaussian distribution.

Python Code

from scipy.stats import shapiro
data = …
stat, p = shapiro(data)

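A sketch with a small, roughly symmetric sample of made-up measurements, which a normality test should not flag:

```python
from scipy.stats import shapiro

# Made-up, roughly symmetric measurements (illustrative only)
data = [1.2, 2.3, 1.8, 2.9, 2.1, 1.5, 2.6, 1.9, 2.4, 2.0]
stat, p = shapiro(data)
if p > 0.05:
    print('Probably Gaussian (fail to reject H0)')
else:
    print('Probably not Gaussian (reject H0)')
```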

D’Agostino’s K² Test

Similar to the Shapiro-Wilk test, D’Agostino’s K² test checks whether a data sample deviates from a Gaussian distribution, using the sample’s skewness and kurtosis.

Tests whether a data sample has a Gaussian distribution.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).

Interpretation

  • H0: the sample has a Gaussian distribution.
  • H1: the sample does not have a Gaussian distribution.

Python Code

from scipy.stats import normaltest
data = …
stat, p = normaltest(data)

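A sketch using clearly skewed synthetic data; note that normaltest needs a reasonably large sample (SciPy warns below 20 observations). An exponential sample is strongly right-skewed, so the test should reject normality:

```python
import numpy as np
from scipy.stats import normaltest

# Synthetic, deliberately skewed data (illustrative only)
rng = np.random.default_rng(1)
data = rng.exponential(scale=1.0, size=200)
stat, p = normaltest(data)
if p > 0.05:
    print('Probably Gaussian (fail to reject H0)')
else:
    print('Probably not Gaussian (reject H0)')
```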

Pearson’s Correlation Coefficient

A statistical test that quantifies the correlation between two samples and checks whether they have a linear relationship.

Tests whether two samples have a linear relationship.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample are normally distributed.
  • Observations in each sample have the same variance.

Interpretation

  • H0: the two samples are uncorrelated (no linear relationship).
  • H1: the two samples are correlated.

Python Code

from scipy.stats import pearsonr
data1, data2 = …
corr, p = pearsonr(data1, data2)
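A sketch with made-up, nearly linear data, where the coefficient should come out close to 1:

```python
from scipy.stats import pearsonr

# Made-up data: data2 is roughly 2 * data1 with small deviations
data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data2 = [2.1, 4.2, 5.9, 8.1, 10.0, 12.2, 13.8, 16.1, 18.0, 20.2]
corr, p = pearsonr(data1, data2)
print('corr=%.3f, p=%.5f' % (corr, p))
```

The strong linear relationship yields a correlation near 1 and a very small p-value.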

Spearman’s Rank Correlation

Spearman’s test assumes only that the observations in each sample can be ranked; it checks whether the relationship between the two samples is monotonic.

Tests whether two samples have a monotonic relationship.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample can be ranked.

Interpretation

  • H0: the two samples are uncorrelated (no monotonic relationship).
  • H1: the two samples are correlated.

Python Code

from scipy.stats import spearmanr
data1, data2 = …
corr, p = spearmanr(data1, data2)
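A sketch with a monotonic but nonlinear relationship (y = x²), which rank correlation captures perfectly:

```python
from scipy.stats import spearmanr

# Monotonic but nonlinear made-up data: ranks agree exactly
data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data2 = [x * x for x in data1]
corr, p = spearmanr(data1, data2)
print('corr=%.3f, p=%.5f' % (corr, p))
```

Because the ranks match exactly, Spearman’s coefficient is 1 even though the relationship is not linear.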

Mann-Whitney U Test

A nonparametric hypothesis test for two independent samples that checks whether their distributions are equal.

Tests whether the distributions of two independent samples are equal or not.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample can be ranked.

Interpretation

  • H0: the distributions of both samples are equal.
  • H1: the distributions of both samples are not equal.

Python Code

from scipy.stats import mannwhitneyu
data1, data2 = ...
stat, p = mannwhitneyu(data1, data2)

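A sketch with two made-up samples drawn from clearly shifted ranges, so the test should reject H0:

```python
from scipy.stats import mannwhitneyu

# Made-up samples with no overlap between the two ranges
data1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
data2 = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
stat, p = mannwhitneyu(data1, data2)
if p > 0.05:
    print('Probably the same distribution (fail to reject H0)')
else:
    print('Probably different distributions (reject H0)')
```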

Kruskal-Wallis H Test

Like the other rank-based tests, the Kruskal-Wallis H test assumes that the observations in each sample are iid and can be ranked. It is a nonparametric alternative to one-way ANOVA, used when two or more samples are independent of each other.

Tests whether the distributions of two or more independent samples are equal or not.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample can be ranked.

Interpretation

  • H0: the distributions of all samples are equal.
  • H1: the distributions of one or more samples are not equal.

Python Code

from scipy.stats import kruskal
data1, data2, ... = ...
stat, p = kruskal(data1, data2, ...)

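A sketch with three made-up samples at clearly shifted locations, so the test should reject H0:

```python
from scipy.stats import kruskal

# Made-up independent samples with clearly shifted distributions
data1 = [1, 2, 3, 4, 5]
data2 = [11, 12, 13, 14, 15]
data3 = [21, 22, 23, 24, 25]
stat, p = kruskal(data1, data2, data3)
if p > 0.05:
    print('Probably the same distribution (fail to reject H0)')
else:
    print('Probably different distributions (reject H0)')
```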

Friedman Test

The Friedman test checks whether the distributions of two or more paired samples are equal; it is a nonparametric alternative to the repeated-measures ANOVA.

Tests whether the distributions of two or more paired samples are equal or not.

Assumptions

  • Observations in each sample are independent and identically distributed (iid).
  • Observations in each sample can be ranked.
  • Observations across each sample are paired.

Interpretation

  • H0: the distributions of all samples are equal.
  • H1: the distributions of one or more samples are not equal.

Python Code

from scipy.stats import friedmanchisquare
data1, data2, ... = ...
stat, p = friedmanchisquare(data1, data2, ...)

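A sketch with made-up paired data: the same 10 subjects measured under three treatments, where each treatment consistently shifts every subject’s score upward, so the test should reject H0:

```python
from scipy.stats import friedmanchisquare

# Made-up paired measurements: 10 subjects under three treatments,
# each treatment adds a constant shift to every subject's score
data1 = [4, 6, 3, 4, 3, 2, 2, 7, 6, 5]
data2 = [v + 2 for v in data1]
data3 = [v + 4 for v in data1]
stat, p = friedmanchisquare(data1, data2, data3)
if p > 0.05:
    print('Probably the same distribution (fail to reject H0)')
else:
    print('Probably different distributions (reject H0)')
```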
