Pearson correlation
Compute Pearson correlation coefficients between two groups with p-value
Compute the Pearson correlation coefficient for pairwise samples, with its p-value.
The Pearson correlation coefficient measures the linear relationship between two datasets. The calculation of the p-value relies on the assumption that each dataset is normally distributed. The p-value returned is a two-sided p-value. Like other correlation coefficients, this one varies between -1 and +1, with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.
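As a minimal sketch of what the coefficient measures, the snippet below computes Pearson's r directly from its definition (the covariance divided by the product of the standard deviations) in plain Python. The helper name `pearson_r` and the sample data are illustrative assumptions, not part of the task. The p-value is left out here; the SciPy function `scipy.stats.pearsonr` referenced at the end of this page returns both the coefficient and the p-value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each value from its mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Numerator: sum of products of deviations
    cov = sum(a * b for a, b in zip(dx, dy))
    # Denominator: square root of the product of the sums of squares
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / norm


x = [1, 2, 3, 4]
r_pos = pearson_r(x, [2, 4, 6, 8])  # y increases linearly with x
r_neg = pearson_r(x, [8, 6, 4, 2])  # y decreases linearly with x
print(r_pos, r_neg)
```

An exact increasing linear relationship gives a coefficient at the upper bound, and an exact decreasing one gives a coefficient at the lower bound.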
- Input: a table containing the sample measurements, with the names of the samples.
- Output: a table listing the correlation coefficient and its associated p-value for each pairwise comparison.
- Config Parameters:
  - `preselected_column_names`: List of columns to pre-select for pairwise comparisons. By default, a maximum pre-defined number of columns are selected (see configuration).
  - `reference_column`: If given, this reference column is compared against all the other columns.
  - `row_tag_key`: If given, this parameter is used for group-wise comparisons along row tags (see example below). This parameter is ignored if a `reference_column` is given.
  - `adjust_pvalue`:
    - `method`: The correction method for p-value adjustment in multiple testing.
    - `alpha`: The FWER, family-wise error rate. Default is 0.05.
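To illustrate what p-value adjustment for multiple testing does, here is a minimal plain-Python sketch of two of the methods listed under Configuration, `bonferroni` and `holm`. The function names, the sample p-values, and the significance check are assumptions made for illustration; this page does not say how the task implements the correction.

```python
def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def holm(pvalues):
    """Holm step-down adjustment, returned in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1)
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity of the adjusted p-values
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


pvalues = [0.01, 0.04, 0.03]  # illustrative p-values from three comparisons
alpha = 0.05

bonf = bonferroni(pvalues)
holm_adj = holm(pvalues)
# A comparison is kept as significant when its adjusted p-value is <= alpha
significant = [p <= alpha for p in bonf]
print(bonf, holm_adj, significant)
```

Holm's adjustment is never more conservative than Bonferroni's, which is why its adjusted p-values are smaller or equal for every comparison in this example.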
Example 1: Direct column comparisons
Let's say you have the following table.
A | B | C |
---|---|---|
1 | 5 | 3 |
2 | 6 | 8 |
3 | 7 | 5 |
4 | 8 | 4 |
This task performs pairwise comparisons of almost all the columns of the table (the first `500` columns are pre-selected by default):
- `A` will be compared with `B` and with `C`, respectively
- `B` will be compared with `C`
To compare only a given column with all the others, set the name of the `reference_column` (a.k.a. Reference column).
Suppose that `B` is used as the reference column; only the following comparisons will be done:
- `B` versus `A`
- `B` versus `C`
It is also possible to perform comparisons on a well-defined subset of the table by pre-selecting the columns of interest.
Parameter `preselected_column_names` (a.k.a. Selected columns names) allows pre-selecting a subset of columns for analysis.
Example 2: Advanced comparisons along row tags using `row_tag_key` parameter
In general, the table rows represent real-world observations (e.g. measured samples) and the columns correspond to descriptors (a.k.a. features or variables). These rows (samples) may therefore be associated with metadata given by row tags, as follows:
row_tags | A | B | C |
---|---|---|---|
Gender : M, Age : 10 | 1 | 5 | 3 |
Gender : F, Age : 10 | 2 | 6 | 8 |
Gender : F, Age : 10 | 3 | 7 | 5 |
Gender : M, Age : 20 | 4 | 8 | 4 |
Actually, the column `row_tags` does not really exist in the table. It is shown here only to display the tags of the rows.
Here, the first row corresponds to a 10-year-old male individual.
In this case, we may be interested in comparing each column only along the row metadata tags.
For instance, to compare Males (`M`) versus Females (`F`) for each column separately, you can use the advanced parameter `row_tag_key`=`Gender`.
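The grouping step implied by `row_tag_key` can be sketched as below. The data layout (a list of tag/value pairs) and the helper `split_by_tag` are assumptions made for illustration. This page does not describe how the task then pairs the values of the two groups, so the sketch stops at the grouping.

```python
from collections import defaultdict

# Rows of the Example 2 table: (row tags, column values)
rows = [
    ({"Gender": "M", "Age": "10"}, {"A": 1, "B": 5, "C": 3}),
    ({"Gender": "F", "Age": "10"}, {"A": 2, "B": 6, "C": 8}),
    ({"Gender": "F", "Age": "10"}, {"A": 3, "B": 7, "C": 5}),
    ({"Gender": "M", "Age": "20"}, {"A": 4, "B": 8, "C": 4}),
]


def split_by_tag(rows, row_tag_key):
    """Group the values of each column by the value of one row tag."""
    groups = defaultdict(lambda: defaultdict(list))
    for tags, values in rows:
        group = tags[row_tag_key]
        for column, value in values.items():
            groups[column][group].append(value)
    # Convert the nested defaultdicts to plain dictionaries
    return {column: dict(by_group) for column, by_group in groups.items()}


groups = split_by_tag(rows, "Gender")
for column, by_group in groups.items():
    print(column, by_group)
```

Each column ends up with one list of values per tag value (`M` and `F` here), which are the groups compared for that column.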
For more details on the Pearson correlation coefficient, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html.
Input
Output
Configuration
- `preselected_column_names` (List, -1): The names of the columns to pre-select for comparison. By default, the first 500 columns are used.
  - `name` (string): The name of the column(s) to pre-select.
  - `is_regex` (bool): Set True if it is a text pattern (regular expression), False otherwise.
- `reference_column` (string): The column used as reference for pairwise comparison. Only this column is compared with the others.
- `row_tag_key` (string): The key of the row tag (representing the group axis) along which one would like to compare each column. This parameter is not used if a `reference column` is given.
- `adjust_pvalue` (List, 1): Adjust p-values for multiple tests.
  - `method` (string): The method used to adjust (correct) p-values. Allowed values: `bonferroni`, `fdr_bh`, `fdr_by`, `fdr_tsbh`, `fdr_tsbky`, `sidak`, `holm-sidak`, `holm`, `simes-hochberg`, `hommel`. Default: `bonferroni`.
  - `alpha` (float): FWER, family-wise error rate. Default is 0.05.