
Base population stats task

TASK
Typing name : TASK.gws_stats.BasePopulationStatsTask
Brick : gws_stats

BasePopulationStatsTask

Performs pairwise comparisons of multiple columns of a table.

  • Input: a table containing the sample measurements, with the names of the samples.
  • Output: a table listing the correlation coefficient and its associated p-value for each pairwise comparison.
  • Config Parameters:
    • preselected_column_names: List of columns to pre-select for pairwise comparisons. By default, the first 500 columns are selected (see Configuration).
    • row_tag_key: If given, this parameter is used for group-wise comparisons along row tags (see Example 2 below). This parameter is ignored if a reference_column is given.

Example 1: Direct column comparisons

Let's say you have the following table.

A B C
1 5 3
2 6 8
3 7 5
4 8 4

This task compares the columns of the table against one another. Not every column is necessarily included: by default, only the first 500 columns are pre-selected.
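To make the pairing concrete, here is a minimal sketch of how the column pairs for the table above could be enumerated. It is not the task's actual implementation: the helper name pairwise_column_pairs and the column-wise dictionary layout are assumptions made for illustration, and the statistical test applied to each pair is left out because this page does not specify it.

```python
from itertools import combinations

# The example table from above, stored column-wise (illustrative layout).
table = {
    "A": [1, 2, 3, 4],
    "B": [5, 6, 7, 8],
    "C": [3, 8, 5, 4],
}

MAX_PRESELECTED = 500  # default number of pre-selected columns stated on this page


def pairwise_column_pairs(columns, limit=MAX_PRESELECTED):
    """Return every unordered pair among the first `limit` column names.

    Hypothetical helper: it only shows which comparisons would be made.
    """
    preselected = list(columns)[:limit]
    return list(combinations(preselected, 2))


pairs = pairwise_column_pairs(table)
for left, right in pairs:
    # A real run would apply a statistical test to these two value lists.
    print(left, "vs", right, table[left], table[right])
```

With three columns this yields three comparisons (A vs B, A vs C, B vs C). The number of pairs grows quadratically with the number of columns, which is one plausible reason for capping the pre-selection.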

Example 2: Advanced comparisons along row tags using row_tag_key parameter

In general, the table rows represent real-world observations (e.g. measured samples) and the columns correspond to descriptors (a.k.a. features or variables). These rows (samples) may therefore carry metadata given by row tags, as follows:

row_tags                A  B  C
Gender : M, Age : 10    1  5  3
Gender : F, Age : 10    2  6  8
Gender : F, Age : 10    8  7  5
Gender : X, Age : 20    4  8  4
Gender : X, Age : 10    2  7  5
Gender : M, Age : 20    4  1  4

The row_tags column does not actually exist in the table; it is shown here only to display the tags attached to each row. For example, the first row corresponds to a 10-year-old male individual. In this case, we may want to compare each column along the row metadata tags only. For instance, to compare the gender populations M, F and X for each column separately, use the advanced parameter row_tag_key=Gender.
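The grouping that row_tag_key=Gender implies can be sketched as follows. This is an illustration only, under the assumption that each row carries a dictionary of tags; the function group_column_by_tag and the data layout are hypothetical and do not come from the gws_stats brick.

```python
from collections import defaultdict

# Rows from Example 2: (row tags, values for columns A, B, C).
rows = [
    ({"Gender": "M", "Age": "10"}, {"A": 1, "B": 5, "C": 3}),
    ({"Gender": "F", "Age": "10"}, {"A": 2, "B": 6, "C": 8}),
    ({"Gender": "F", "Age": "10"}, {"A": 8, "B": 7, "C": 5}),
    ({"Gender": "X", "Age": "20"}, {"A": 4, "B": 8, "C": 4}),
    ({"Gender": "X", "Age": "10"}, {"A": 2, "B": 7, "C": 5}),
    ({"Gender": "M", "Age": "20"}, {"A": 4, "B": 1, "C": 4}),
]


def group_column_by_tag(rows, column, tag_key):
    """Split the values of one column into groups keyed by a row tag value.

    Hypothetical helper; rows that lack `tag_key` are skipped.
    """
    groups = defaultdict(list)
    for tags, values in rows:
        if tag_key in tags:
            groups[tags[tag_key]].append(values[column])
    return dict(groups)


groups_a = group_column_by_tag(rows, "A", "Gender")
print(groups_a)
```

For column A, the populations to compare are then M = [1, 4], F = [2, 8] and X = [4, 2]; the same split would be repeated for columns B and C.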

Input

Table
The input table

Output

Result
The output result

Configuration

preselected_column_names

Optional

The names of the columns to pre-select for comparison. By default, the first 500 columns are used.

Type : List
Maximum occurrences number : -1

name

Optional

The name of the column(s) to pre-select

Type : string

is_regex

Optional

Set to True if the name is a text pattern (regular expression), False otherwise

Type : bool
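As an illustration of how name and is_regex entries could combine, the sketch below selects columns from a list of such entries. It is an assumption-laden example, not the brick's code: the function preselect_columns is hypothetical, and this page does not say whether a pattern must match the whole column name (used here, via re.fullmatch) or only part of it.

```python
import re


def preselect_columns(all_columns, selectors=None, limit=500):
    """Pick columns by exact name or by regular expression.

    Hypothetical helper. `selectors` mimics the configuration entries:
    a list of {"name": str, "is_regex": bool} dictionaries.
    Without selectors, the first `limit` columns are returned.
    """
    if not selectors:
        return list(all_columns)[:limit]
    selected = []
    for column in all_columns:
        for selector in selectors:
            name = selector["name"]
            if selector.get("is_regex", False):
                # Assumption: the pattern must match the whole column name.
                matched = re.fullmatch(name, column) is not None
            else:
                matched = column == name
            if matched:
                selected.append(column)
                break  # avoid adding the same column twice
    return selected


columns = ["A", "B", "C", "gene_1", "gene_2"]
selectors = [
    {"name": "A", "is_regex": False},
    {"name": "gene_.*", "is_regex": True},
]
chosen = preselect_columns(columns, selectors)
print(chosen)
```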

row_tag_key

Optional, advanced parameter

The key of the row tag (representing the group axis) along which one would like to compare each column

Type : string

adjust_pvalue

Optional, advanced parameter

Adjust p-values for multiple tests. It is only used when `row_tag_key` is set.

Type : List
Maximum occurrences number : 1

method

Optional, advanced parameter

The method used to adjust (correct) p-values

Type : string
Allowed values : bonferroni, fdr_bh, fdr_by, fdr_tsbh, fdr_tsbky, sidak, holm-sidak, holm, simes-hochberg, hommel
Default value : bonferroni

alpha

Optional, advanced parameter

Family-wise error rate (FWER). Default is 0.05.

Type : float
Default value : 0.05
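To show what the default bonferroni method does with alpha, here is a minimal pure-Python sketch of the Bonferroni correction. The function name and the raw p-values are made up for illustration and are not output of this task. The allowed method names listed above coincide with those accepted by the multipletests function of statsmodels, but this page does not state which library the task relies on.

```python
def bonferroni(pvalues, alpha=0.05):
    """Bonferroni correction: multiply each p-value by the number of tests.

    Returns the adjusted p-values (capped at 1.0) and, for each test,
    whether it is still significant at the family-wise error rate `alpha`.
    """
    n_tests = len(pvalues)
    adjusted = [min(1.0, p * n_tests) for p in pvalues]
    rejected = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, rejected


# Illustrative raw p-values for three group-wise comparisons (not real results).
raw = [0.01, 0.02, 0.04]
adjusted, rejected = bonferroni(raw, alpha=0.05)
for p, p_adj, keep in zip(raw, adjusted, rejected):
    print(f"raw={p:.3f} adjusted={p_adj:.3f} significant={keep}")
```

With three tests, a raw p-value must be at most 0.05 / 3 to remain significant, so only the first comparison in this made-up example survives the correction.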