Test that two or more groups have the same population median
Compute the Kruskal-Wallis H-test for independent samples.
The Kruskal-Wallis H-test tests the null hypothesis that the population medians of all groups are equal. It is a non-parametric version of ANOVA. The test works on two or more independent samples, which may have different sizes. Note that rejecting the null hypothesis does not indicate which of the groups differ. Post hoc comparisons between groups are required to determine which groups are different.
Note: because H is assumed to follow a chi-square distribution, the number of measurements in each group must not be too small. A typical rule is that each sample must have at least 5 measurements.
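As an illustration of the test itself, here is a minimal sketch that calls `scipy.stats.kruskal` (the SciPy function linked further down this page) on three made-up samples. The sample values and variable names are assumptions for illustration only, and this page does not state whether the task calls SciPy in exactly this way.

```python
from scipy import stats

# Three hypothetical independent samples, 5 measurements each
# (the rule-of-thumb minimum mentioned above).
low = [1, 2, 3, 4, 5]
mid = [6, 7, 8, 9, 10]
high = [11, 12, 13, 14, 15]

# kruskal returns the H statistic and a p-value based on the
# chi-square approximation with (number of groups - 1) degrees of freedom.
h_stat, p_value = stats.kruskal(low, mid, high)

print(f"H = {h_stat:.3f}, p = {p_value:.5f}")
if p_value < 0.05:
    # A small p-value only says that at least one group differs,
    # not which one; post hoc comparisons are still needed.
    print("Reject the null hypothesis of equal medians.")
```

The samples may have different sizes; `stats.kruskal` accepts any number of groups (two or more) as separate arguments.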
- Input: a table containing the sample measurements, with the name of the samples.
- Output: a table listing the test statistic (H) and its associated p-value for each pairwise comparison.
- Config Parameters:
  - `preselected_column_names`: List of columns to pre-select for pairwise comparisons. By default, a pre-defined maximum number of columns is selected (see configuration).
  - `row_tag_key`: If given, this parameter is used for group-wise comparisons along row tags (see example below).
Example 1: Direct column comparisons
Let's say you have the following table.
A | B | C |
---|---|---|
1 | 5 | 3 |
2 | 6 | 8 |
3 | 7 | 5 |
4 | 8 | 4 |
This task performs population comparisons on almost all the columns of the table (the first 500 columns are pre-selected by default).
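The direct column comparison can be sketched as follows, assuming each pair of columns is tested separately with `scipy.stats.kruskal`. The `table` dictionary and the layout of `results` are hypothetical; the task's actual output table may be organized differently.

```python
from itertools import combinations
from scipy import stats

# The table from Example 1, stored column-wise.
table = {
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
    "C": [3, 8, 5, 4],
}

# Run one Kruskal-Wallis test per pair of columns.
results = []
for left, right in combinations(table, 2):
    h_stat, p_value = stats.kruskal(table[left], table[right])
    results.append((left, right, h_stat, p_value))

for left, right, h_stat, p_value in results:
    print(f"{left} vs {right}: H = {h_stat:.3f}, p = {p_value:.4f}")
```

With only 4 measurements per column, this table is below the rule of thumb of 5 measurements per sample, so the p-values here are illustrative only.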
Example 2: Advanced comparisons along row tags using the `row_tag_key` parameter
In general, the table rows represent real-world observations (e.g. measured samples) and the columns correspond to descriptors (a.k.a. features or variables). These rows (samples) may therefore be related to metadata given by row tags, as follows:
row_tags | A | B | C |
---|---|---|---|
Gender : M Age : 10 | 1 | 5 | 3 |
Gender : F Age : 10 | 2 | 6 | 8 |
Gender : F Age : 10 | 8 | 7 | 5 |
Gender : X Age : 20 | 4 | 8 | 4 |
Gender : X Age : 10 | 2 | 7 | 5 |
Gender : M Age : 20 | 4 | 1 | 4 |
Note that the `row_tags` column does not actually exist in the table; it is shown here only to display the tags of each row.
Here, the first row corresponds to a 10-year-old male individual.
In this case, we may be interested in comparing columns only along row metadata tags.
For instance, to compare the gender populations `M`, `F`, and `X` for each column separately, you can use the advanced parameter `row_tag_key=Gender`.
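The group-wise comparison along a row tag can be sketched as follows. This is a sketch under assumptions: the `rows` structure and the helper `compare_along_tag` are hypothetical names, not part of the task's API, and the grouping logic is only one plausible reading of what `row_tag_key` does.

```python
from collections import defaultdict
from scipy import stats

# Rows of the Example 2 table as (row tags, column values) pairs.
rows = [
    ({"Gender": "M", "Age": "10"}, {"A": 1, "B": 5, "C": 3}),
    ({"Gender": "F", "Age": "10"}, {"A": 2, "B": 6, "C": 8}),
    ({"Gender": "F", "Age": "10"}, {"A": 8, "B": 7, "C": 5}),
    ({"Gender": "X", "Age": "20"}, {"A": 4, "B": 8, "C": 4}),
    ({"Gender": "X", "Age": "10"}, {"A": 2, "B": 7, "C": 5}),
    ({"Gender": "M", "Age": "20"}, {"A": 4, "B": 1, "C": 4}),
]


def compare_along_tag(rows, tag_key):
    """Run one Kruskal-Wallis test per column, grouping rows by tag value."""
    # groups[column][tag value] -> list of measurements
    groups = defaultdict(lambda: defaultdict(list))
    for tags, values in rows:
        for column, value in values.items():
            groups[column][tags[tag_key]].append(value)

    results = {}
    for column, by_tag in groups.items():
        # Each tag value (M, F, X) contributes one sample to the test.
        h_stat, p_value = stats.kruskal(*by_tag.values())
        results[column] = (h_stat, p_value)
    return results


results = compare_along_tag(rows, "Gender")
for column, (h_stat, p_value) in results.items():
    print(f"{column}: H = {h_stat:.3f}, p = {p_value:.4f}")
```

Each gender group in this small table has only 2 measurements, well below the rule of thumb of 5, so the resulting p-values should not be interpreted.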
For more details, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html
Configuration
- `preselected_column_names` (List; -1): The names of the columns to pre-select for comparison. By default, the first 500 columns are used.
  - `name` (string): The name of the column(s) to pre-select.
  - `is_regex` (bool): Set to True if the name is a text pattern (regular expression), False otherwise.
- `row_tag_key` (string): The key of the row tag (representing the group axis) along which to compare each column.
- `adjust_pvalue` (List; 1): Adjust p-values for multiple tests. It is only used when `row_tag_key` is set.
  - `method` (string; default `bonferroni`): The method used to adjust (correct) p-values.
  - `alpha` (float; default 0.05): The family-wise error rate (FWER).
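To illustrate what the `bonferroni` method with `alpha` = 0.05 means, here is a minimal pure-Python sketch of the Bonferroni correction. The function name and the raw p-values are hypothetical, and which set of tests the task treats as one family is an assumption not stated on this page.

```python
def bonferroni(p_values, alpha=0.05):
    """Return Bonferroni-adjusted p-values and reject decisions."""
    n_tests = len(p_values)
    # Multiply each p-value by the number of tests, capped at 1.
    adjusted = [min(p * n_tests, 1.0) for p in p_values]
    # Reject when the adjusted p-value does not exceed alpha (the FWER).
    rejected = [p <= alpha for p in adjusted]
    return adjusted, rejected


# Hypothetical raw p-values, e.g. one per column compared along a row tag.
raw = {"A": 0.01, "B": 0.04, "C": 0.5}
adjusted, rejected = bonferroni(list(raw.values()), alpha=0.05)

for column, p_adj, reject in zip(raw, adjusted, rejected):
    print(f"{column}: adjusted p = {p_adj:.3f}, reject = {reject}")
```

Bonferroni is conservative: here only column A stays significant after correction, although column B had a raw p-value below 0.05.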