Introduction
STATS is a collection of ready-to-use, customizable tools for the statistical analysis of your data. It offers the most widely used statistical methods for biological data analysis, from descriptive statistics to parametric and non-parametric inferential statistics, so that you can quantitatively assess whether your biological data support your hypothesis.
Why do we use statistics
Statistical knowledge helps you use the proper methods to collect data, employ the correct analyses, and effectively present the results. Statistics is central to how we make discoveries in science, make decisions based on data, and make predictions. Problems with the application and interpretation of statistics have been identified as one of the leading contributors to current research irreproducibility [1], and many suggestions for improving statistical practice have been put forward [2]. Since not all biologists are familiar with statistics, we aim to bridge the gap between data generation and data analysis so that insightful conclusions can be extracted.
Descriptive versus inferential statistics
Descriptive statistics provide simple summaries of the observations that have been made. Such summaries may be either quantitative or visual. Data can be presented in tables or in graphical form, such as line charts, bar charts, histograms, and scatter plots. Measures of central tendency (e.g. mean, median) and variability (e.g. standard deviation) are also very useful for giving an overview of the data. These summaries may form the basis of the initial description of the data as part of a more extensive statistical analysis, or they may be sufficient in and of themselves for a particular investigation.
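As an illustration of the measures mentioned above, here is a minimal sketch using Pandas. The measurements, the column names (`group`, `value`) and the group labels are invented for this example and are not taken from STATS; the sketch only shows how central tendency and variability can be summarised per group.

```python
import pandas as pd

# Hypothetical measurements (arbitrary units), invented for illustration only.
data = pd.DataFrame({
    "group": ["control"] * 5 + ["treated"] * 5,
    "value": [4.1, 3.9, 4.3, 4.0, 4.2, 5.0, 5.4, 4.8, 5.1, 5.2],
})

# Central tendency (mean, median) and variability (sample standard deviation)
# computed separately for each group.
summary = data.groupby("group")["value"].agg(["count", "mean", "median", "std"])
print(summary)
```

The same table can also be used as the starting point for a visual summary, for example a histogram or a bar chart of the group means.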
Inferential statistics are used to infer properties of a population from sample data observed from that population. Hypothesis testing lets you assess whether the results of an experiment are meaningful: it estimates how likely results at least as extreme as yours would be if they had arisen by chance alone. If your results could plausibly have arisen by chance, the experiment is unlikely to be repeatable and so is of little use. Two main types of statistical tests can be distinguished:
- the parametric tests, where an assumption is made about the distribution of the data.
- the non-parametric tests, where few or no assumptions are made about the distribution of the data.
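To make the distinction between the two families concrete, the sketch below compares two small groups with one test of each kind, using SciPy. The data are invented for illustration, and the choice of Welch's t-test (parametric) and the Mann-Whitney U test (non-parametric) is an assumption made for this example; it is not a statement about which tests STATS runs by default. The 0.05 significance threshold is likewise only a common convention.

```python
from scipy import stats

# Hypothetical measurements (arbitrary units), invented for illustration only.
control = [4.1, 3.9, 4.3, 4.0, 4.2]
treated = [5.0, 5.4, 4.8, 5.1, 5.2]

# Parametric: Welch's t-test assumes roughly normally distributed data
# in each group, but does not assume equal variances.
t_stat, t_p = stats.ttest_ind(control, treated, equal_var=False)

# Non-parametric: the Mann-Whitney U test compares the two groups through
# the ranks of the observations, without assuming a normal distribution.
u_stat, u_p = stats.mannwhitneyu(control, treated, alternative="two-sided")

alpha = 0.05  # conventional significance threshold (an assumption here)
print(f"Welch t-test:   p = {t_p:.4g}, significant: {t_p < alpha}")
print(f"Mann-Whitney U: p = {u_p:.4g}, significant: {u_p < alpha}")
```

When the assumptions of a parametric test are reasonable, it is usually the more powerful option; a non-parametric test is a safer choice when the distribution of the data is unknown or clearly not normal.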
Acknowledgements
We believe in open innovation and designed our platforms to accelerate the standardisation and integration of open digital resources in biology. STATS relies on the following open libraries:
- Pandas, the reference Python library for data analysis and manipulation [3]
- SciPy, the reference Python library for fundamental algorithms dedicated to scientific computing [4]
References
[1] Baker M. 1,500 scientists lift the lid on reproducibility. Nature. 2016;533:452-4
[2] Kass R, Caffo B, Davidian M, Meng X, Yu B, Reid N. Ten Simple Rules for Effective Statistical Practice. PLoS Comput Biol. 2016;12:e1004961
[3] Pandas, the reference Python library for data frame manipulation
[4] SciPy, the reference Python library for fundamental algorithms dedicated to scientific computing
Notice
Gencovery Numerical Resources (GNR) refer to the software, libraries and data provided by us through our web services. GNR may be covered by third-party licenses. Gencovery guarantees that GNR are accessible for your commercial and non-commercial use through Gencovery web services. For ad-hoc use of GNR outside Gencovery web services, please check third-party licenses to ensure you are legally authorised. Gencovery does not warrant or assume any legal liability or responsibility for the accuracy or completeness of any information disclosed through Gencovery web services. This is not a legal notice. Please refer to our terms of use for any legal notice about our web services.