# Pearson correlation

Compute Pearson correlation coefficients between two groups, with p-values.

Compute the Pearson correlation coefficient for pairwise samples, with its p-value.

The Pearson correlation coefficient measures the linear relationship between two datasets. The calculation of the p-value relies on the assumption that each dataset is normally distributed. The p-value returned is a two-sided p-value. Like other correlation coefficients, this one varies between -1 and +1, with 0 implying no correlation. Correlations of -1 or +1 imply an exact linear relationship. Positive correlations imply that as x increases, so does y. Negative correlations imply that as x increases, y decreases.
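To make the range and sign of the coefficient concrete, here is a minimal pure-Python sketch of the textbook formula. The helper name `pearson_r` is hypothetical and is not the task's implementation; the sketch computes only the coefficient and leaves out the p-value.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each observation from its mean
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    # Sum of cross-products, normalised by the product of the spreads
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / norm

x = [1, 2, 3, 4]
r_up = pearson_r(x, [5, 6, 7, 8])    # y increases exactly linearly with x: r = +1
r_down = pearson_r(x, [8, 6, 4, 2])  # y decreases exactly linearly with x: r = -1
```

Both inputs lie exactly on a straight line, so the coefficient reaches the two ends of its range.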

- Input: a table containing the sample measurements, with the names of the samples.
- Output: a table listing the correlation coefficient and its associated p-value for each pairwise comparison.
- Config Parameters:
  - `preselected_column_names`: List of columns to pre-select for pairwise comparisons. By default, a maximum pre-defined number of columns is selected (see configuration).
  - `reference_column`: If given, this reference column is compared against all the other columns.
  - `row_tag_key`: If given, this parameter is used for group-wise comparisons along row tags (see example below). This parameter is ignored if a `reference_column` is given.
  - `adjust_pvalue`: Adjusts p-values for multiple tests.
    - `method`: The correction method for p-value adjustment in multiple testing.
    - `alpha`: The FWER, family-wise error rate. Default is 0.05.
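As an illustration of what p-value adjustment does, the sketch below applies the Bonferroni correction by hand. The function name `bonferroni_adjust` and the raw p-values are made up for illustration; how the task itself performs the correction is not shown here.

```python
def bonferroni_adjust(pvalues, alpha=0.05):
    """Bonferroni correction: multiply each p-value by the number of tests."""
    m = len(pvalues)
    # Adjusted p-values are capped at 1.0
    adjusted = [min(1.0, p * m) for p in pvalues]
    # A test stays significant if its adjusted p-value does not exceed alpha
    rejected = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, rejected

# Made-up raw p-values from three pairwise comparisons (illustration only)
raw_pvalues = [0.01, 0.04, 0.30]
adjusted, rejected = bonferroni_adjust(raw_pvalues, alpha=0.05)
```

With three tests, a raw p-value of 0.04 no longer passes the 0.05 threshold after adjustment, while 0.01 still does.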

# Example 1: Direct column comparisons

Let's say you have the following table.

A | B | C |
---|---|---|
1 | 5 | 3 |
2 | 6 | 8 |
3 | 7 | 5 |
4 | 8 | 4 |

This task performs pairwise comparisons of almost all the columns of the table
(the first `500` columns are pre-selected by default).

- `A` will be compared with `B` and with `C`, respectively
- `B` will be compared with `C`
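The pairwise scheme can be sketched with `itertools.combinations` on the example table. This is an illustrative sketch, not the task's code: the `table` dictionary and the `pearson_r` helper are hypothetical names, and only the coefficient is computed (no p-value).

```python
import math
from itertools import combinations

# The example table, stored column by column
table = {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [3, 8, 5, 4]}

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Every unordered pair of columns is compared exactly once:
# (A, B), (A, C), (B, C)
pairwise = {
    (left, right): pearson_r(table[left], table[right])
    for left, right in combinations(table, 2)
}
```

On this table, `A` and `B` are perfectly correlated (`B` is `A` shifted by 4), while `C` turns out to be uncorrelated with both.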

To only compare a given column with all the others, set the name of the `reference_column` (a.k.a. Reference column).
Suppose that `B` is used as the reference column; only the following comparisons will be done:

- `B` versus `A`
- `B` versus `C`
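With a reference column, the set of comparisons shrinks from all pairs to one pair per remaining column. The sketch below shows this on the example table; `table`, `pearson_r` and the variable names are hypothetical and only illustrate the idea.

```python
import math

table = {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [3, 8, 5, 4]}

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

reference_column = "B"

# Only the reference column is compared with each of the other columns
reference_results = {
    (reference_column, other): pearson_r(table[reference_column], table[other])
    for other in table
    if other != reference_column
}
```

For a table with n columns this performs n - 1 comparisons instead of n(n - 1)/2.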

It is also possible to perform comparisons on a well-defined subset of the table by pre-selecting the columns of interest.
The parameter `preselected_column_names` (a.k.a. Selected columns names) allows pre-selecting a subset of columns for analysis.
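Column pre-selection can be pictured as filtering the column names with a list of selectors, each made of a `name` and an `is_regex` flag (the two fields listed in the configuration). The sketch below is an assumption about how such a filter could work: the function `select_columns` is hypothetical, and the use of `re.fullmatch` (rather than a partial match) is a choice made here, not documented behaviour.

```python
import re

def select_columns(column_names, selectors):
    """Keep the columns matched by at least one selector, in table order."""
    selected = []
    for column in column_names:
        for selector in selectors:
            if selector["is_regex"]:
                # Assumption: the pattern must match the whole column name
                matched = re.fullmatch(selector["name"], column) is not None
            else:
                matched = column == selector["name"]
            if matched:
                selected.append(column)
                break  # avoid adding the same column twice
    return selected

columns = ["A", "B", "C"]

# Exact names
exact = select_columns(
    columns,
    [{"name": "A", "is_regex": False}, {"name": "C", "is_regex": False}],
)

# A single regular expression covering several columns
by_pattern = select_columns(columns, [{"name": "[AB]", "is_regex": True}])
```

Here `exact` keeps `A` and `C`, and `by_pattern` keeps `A` and `B`; the pairwise comparisons would then run on the selected columns only.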

# Example 2: Advanced comparisons along row tags using the `row_tag_key` parameter

In general, the table rows represent real-world observations (e.g. measured samples) and columns correspond to descriptors (a.k.a. features or variables). These rows (samples) may therefore be related to metadata information given by row tags as follows:

row_tags | A | B | C |
---|---|---|---|
Gender : M Age : 10 | 1 | 5 | 3 |
Gender : F Age : 10 | 2 | 6 | 8 |
Gender : F Age : 10 | 3 | 7 | 5 |
Gender : M Age : 20 | 4 | 8 | 4 |

Actually, the column `row_tags` does not really exist in the table. It is shown here only to display the tags of the rows.
Here, the first row corresponds to a 10-year-old male individual.
In this case, we may be interested in only comparing each column along row metadata tags.
For instance, to compare `Males (M)` versus `Females (F)` of each column separately, you can use the advanced parameter `row_tag_key`=`Gender`.
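The grouping step behind `row_tag_key` can be sketched as follows: each column is split into one list of values per tag value. The data structures and the helper `split_by_tag` are hypothetical. How the task then pairs observations across the groups is not described here, so the sketch deliberately stops at the grouping.

```python
# Row tags of the example table, one dictionary per row
row_tags = [
    {"Gender": "M", "Age": "10"},
    {"Gender": "F", "Age": "10"},
    {"Gender": "F", "Age": "10"},
    {"Gender": "M", "Age": "20"},
]

table = {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [3, 8, 5, 4]}

def split_by_tag(values, tags, key):
    """Group the values of one column by the value of a given row tag."""
    groups = {}
    for value, row in zip(values, tags):
        groups.setdefault(row[key], []).append(value)
    return groups

row_tag_key = "Gender"

# One {tag value: observations} mapping per column
groups_per_column = {
    name: split_by_tag(values, row_tags, row_tag_key)
    for name, values in table.items()
}
```

For column `A`, this yields the male observations `[1, 4]` and the female observations `[2, 3]`, which are the two groups compared for that column.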

For more details on the Pearson correlation coefficient, see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.pearsonr.html.

### Configuration

- `preselected_column_names` (`List`, `-1`): The names of the columns to pre-select for comparison. By default, the first 500 columns are used.
  - `name` (`string`): The name of the column(s) to pre-select.
  - `is_regex` (`bool`): Set to True if the name is a text pattern (regular expression), False otherwise.
- `reference_column` (`string`): The column used as reference for pairwise comparison. Only this column is compared with the others.
- `row_tag_key` (`string`): The key of the row tag (representing the group axis) along which one would like to compare each column. This parameter is not used if a `reference_column` is given.
- `adjust_pvalue` (`List`, `1`): Adjust p-values for multiple tests.
  - `method` (`string`): The method used to adjust (correct) p-values. Allowed values: `bonferroni`, `fdr_bh`, `fdr_by`, `fdr_tsbh`, `fdr_tsbky`, `sidak`, `holm-sidak`, `holm`, `simes-hochberg`, `hommel`. Default: `bonferroni`.
  - `alpha` (`float`): FWER, family-wise error rate. Default is `0.05`.
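For orientation, the parameters above could be gathered into a configuration such as the one sketched below. The dictionary layout (list-valued parameters holding one dictionary per entry) and the helper `check_config` are assumptions made for illustration; the actual configuration format expected by the task may differ.

```python
# Method names as listed in the configuration above
ALLOWED_METHODS = {
    "bonferroni", "fdr_bh", "fdr_by", "fdr_tsbh", "fdr_tsbky",
    "sidak", "holm-sidak", "holm", "simes-hochberg", "hommel",
}

# Hypothetical configuration: compare columns A and C along the Gender tag,
# with Bonferroni-adjusted p-values at the default alpha
config = {
    "preselected_column_names": [
        {"name": "A", "is_regex": False},
        {"name": "C", "is_regex": False},
    ],
    "row_tag_key": "Gender",
    "adjust_pvalue": [{"method": "bonferroni", "alpha": 0.05}],
}

def check_config(cfg):
    """Basic sanity checks on the p-value adjustment settings."""
    for adjustment in cfg.get("adjust_pvalue", []):
        if adjustment["method"] not in ALLOWED_METHODS:
            raise ValueError(f"unknown method: {adjustment['method']}")
        if not 0.0 < adjustment["alpha"] < 1.0:
            raise ValueError("alpha must lie strictly between 0 and 1")
    return True

is_valid = check_config(config)
```

No `reference_column` is set in this sketch, since `row_tag_key` is not used when a reference column is given.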