Typing name : TASK.gws_stats.BasePopulationStatsTask

Brick : gws_stats

BasePopulationStatsTask

Performs comparison of multiple columns of a table

Input: a table containing the sample measurements, with the names of the samples.
Output: a table listing the correlation coefficient and its associated p-value for each pairwise comparison.
Config Parameters:
- preselected_column_names: List of columns to pre-select for pairwise comparisons. By default, a pre-defined maximum number of columns is selected (see configuration).
- row_tag_key: If given, this parameter is used for group-wise comparisons along row tags (see example below). This parameter is ignored if a reference_column is given.

Example 1: Direct column comparisons

Let's say you have the following table.

A	B	C
1	5	3
2	6	8
3	7	5
4	8	4

This task performs a population comparison of the columns of the table. By default, only the first 500 columns are pre-selected, so every column is compared unless the table is wider than that.
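
As an illustration of what "pairwise comparison" means for the table above, the sketch below lists the column pairs that would be compared. It is not the task's implementation: the helper name `pairwise_column_pairs` and the dict-based table are hypothetical, and only the 500-column default comes from this page.

```python
from itertools import combinations

# Default pre-selection limit stated in this documentation.
MAX_PRESELECTED = 500


def pairwise_column_pairs(table, max_columns=MAX_PRESELECTED):
    """Return every unordered pair among the first `max_columns` column names.

    `table` is a plain dict mapping column name -> list of values
    (a stand-in for the real table object, used here for illustration only).
    """
    names = list(table)[:max_columns]
    return list(combinations(names, 2))


table = {
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
    "C": [3, 8, 5, 4],
}

pairs = pairwise_column_pairs(table)
print(pairs)  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```

With three columns there are three pairs; with n pre-selected columns there are n(n-1)/2, which is one reason a pre-selection limit is useful on wide tables.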

Example 2: Advanced comparisons along row tags using `row_tag_key` parameter

In general, the table rows represent real-world observations (e.g. measured samples) and the columns correspond to descriptors (a.k.a. features or variables). These rows (samples) may therefore be associated with metadata given by row tags, as follows:

row_tags	A	B	C
Gender : M Age : 10	1	5	3
Gender : F Age : 10	2	6	8
Gender : F Age : 10	8	7	5
Gender : X Age : 20	4	8	4
Gender : X Age : 10	2	7	5
Gender : M Age : 20	4	1	4

Note that the column row_tags does not actually exist in the table. It is shown here only to display the tags of the rows. Here, the first row corresponds to a 10-year-old male individual. In this case, we may be interested in comparing the values of each column across groups defined by the row tags. For instance, to compare the gender populations M, F and X for each column separately, you can use the advanced parameter row_tag_key=Gender.
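
To make the grouping concrete, the sketch below splits the values of one column into populations according to a row tag key, using the example table above. It only illustrates the idea: the row representation (a pair of dicts) and the helper `group_column_by_tag` are hypothetical and are not part of the gws_stats API.

```python
from collections import defaultdict

# Each row is (row tags, column values), mirroring the example table.
rows = [
    ({"Gender": "M", "Age": "10"}, {"A": 1, "B": 5, "C": 3}),
    ({"Gender": "F", "Age": "10"}, {"A": 2, "B": 6, "C": 8}),
    ({"Gender": "F", "Age": "10"}, {"A": 8, "B": 7, "C": 5}),
    ({"Gender": "X", "Age": "20"}, {"A": 4, "B": 8, "C": 4}),
    ({"Gender": "X", "Age": "10"}, {"A": 2, "B": 7, "C": 5}),
    ({"Gender": "M", "Age": "20"}, {"A": 4, "B": 1, "C": 4}),
]


def group_column_by_tag(rows, column, tag_key):
    """Collect the values of `column` into one list per value of `tag_key`."""
    groups = defaultdict(list)
    for tags, values in rows:
        groups[tags[tag_key]].append(values[column])
    return dict(groups)


groups_a = group_column_by_tag(rows, "A", "Gender")
print(groups_a)  # {'M': [1, 4], 'F': [2, 8], 'X': [4, 2]}
```

With row_tag_key=Gender, the populations M, F and X obtained this way would then be compared with each other, separately for columns A, B and C.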

Input

Table

The input table

Table

Output

Result

The output result

Base population stats result

Configuration

preselected_column_names

Optional

The names of the columns to pre-select for comparison. By default, the first 500 columns are used.

Type : List

Maximum occurrences number : -1

name

Optional

The name of the column(s) to pre-select

Type : string

is_regex

Optional

Set to True if the name is a text pattern (regular expression), False otherwise

Type : bool
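
The following sketch shows one way a list of name / is_regex entries could be applied to the column names of a table. It is an assumption-laden illustration, not the task's code: the helper `preselect_columns` is hypothetical, and matching regular expressions against the whole column name (`re.fullmatch`) is an assumption, since this page does not say whether patterns are matched fully or partially.

```python
import re


def preselect_columns(column_names, selectors):
    """Return the column names matched by at least one selector.

    Each selector is a dict with a "name" and an optional "is_regex" flag,
    mirroring the configuration entries described above.
    """
    selected = []
    for column in column_names:
        for selector in selectors:
            if selector.get("is_regex", False):
                # Assumption: the pattern must match the entire column name.
                matched = re.fullmatch(selector["name"], column) is not None
            else:
                matched = column == selector["name"]
            if matched:
                selected.append(column)
                break  # avoid adding the same column twice
    return selected


columns = ["A", "B", "C", "gene_1", "gene_2"]
selectors = [
    {"name": "A", "is_regex": False},
    {"name": r"gene_\d+", "is_regex": True},
]

selected = preselect_columns(columns, selectors)
print(selected)  # ['A', 'gene_1', 'gene_2']
```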

row_tag_key

Optional

Advanced parameter

The key of the row tag (representing the group axis) along which one would like to compare each column

Type : string

adjust_pvalue

Optional

Advanced parameter

Adjust p-values for multiple tests. It is only used when the `row_tag_key` is set.

Type : List

Maximum occurrences number : 1

method

Optional

Advanced parameter

The method used to adjust (correct) p-values

Type : string

Default value : bonferroni

alpha

Optional

Advanced parameter

FWER (family-wise error rate). Default is 0.05.

Type : float

Default value : 0.05
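
For readers unfamiliar with the default method, the sketch below shows the textbook Bonferroni correction: each p-value is multiplied by the number of tests (capped at 1) and compared with alpha. This is a generic illustration under that definition; the helper `bonferroni` is hypothetical and the page does not state which library the task uses to perform the adjustment.

```python
def bonferroni(pvalues, alpha=0.05):
    """Bonferroni adjustment for a family of tests.

    Returns the adjusted p-values and, for each test, whether the null
    hypothesis is rejected at family-wise error rate `alpha`.
    """
    n_tests = len(pvalues)
    # Multiply each p-value by the number of tests, capping at 1.
    adjusted = [min(1.0, p * n_tests) for p in pvalues]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject


adjusted, reject = bonferroni([0.01, 0.04, 0.5], alpha=0.05)
print(adjusted)
print(reject)  # [True, False, False]
```

In this example only the first test remains significant after adjustment: 0.04 is below 0.05 on its own, but not once it is multiplied by the three tests in the family.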
