gws_ubiome


FAQ

Q: Is the Constellab pipeline adapted to ligation sequencing?


The Constellab pipeline is not currently optimized for ligation-based sequencing workflows. 


It targets the V3–V4 regions and is adapted for reads generated by NovaSeq, iSeq, or MiSeq sequencers.


Q: What should I do if the number of non-chimeric reads is low after the denoising step?


If you notice that the number of non-chimeric reads is low after the denoising step, you may need to adjust the --p-min-fold-parent-over-abundance parameter in the Q2FeatureInferencePE and Q2FeatureInferenceSE tasks.


This parameter specifies the minimum abundance required for potential parent sequences of a sequence being tested as chimeric. It is expressed as a fold-change relative to the abundance of the sequence under test. Values should be greater than or equal to 1 (i.e., potential parent sequences should be at least as abundant as the sequence being evaluated).


By default, the parameter is set to 1. It is recommended not to exceed a value of 16.


Q: What is the consequence of setting the --p-min-fold-parent-over-abundance parameter to a higher value?


Increasing this parameter generally leads to a higher number of non-chimeric reads being retained. As a result, the Shannon index tends to increase, since allowing more reads to pass through mechanically reveals a greater observed diversity. 


Q: What artifacts could influence the Shannon Index?


Several factors can influence the Shannon index. This index can be calculated using different logarithmic bases, and each tool often uses a specific base by default. This variation explains why discrepancies may arise when comparing results across different tools or with published literature.
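
To illustrate the effect of the logarithmic base, here is a minimal Python sketch. The function name `shannon_index` and the example counts are hypothetical and are not taken from the Constellab pipeline; the sketch only shows that the same sample yields different values depending on the base.

```python
import math


def shannon_index(counts, base=math.e):
    """Shannon index H = -sum(p_i * log_base(p_i)) over non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


# Hypothetical feature counts for a single sample (illustration only).
counts = [50, 30, 15, 5]

h_ln = shannon_index(counts)             # natural logarithm
h_log2 = shannon_index(counts, base=2)   # logarithm base 2

print(f"Shannon (ln):   {h_ln:.3f}")
print(f"Shannon (log2): {h_log2:.3f}")
```

The two values differ only by a constant factor (H_log2 = H_ln / ln 2), so before comparing results across tools or publications, check which base each one uses and convert if needed.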


Other factors may also contribute to these variations, such as the sequencing technologies, the selected molecular targets, the animal species studied, the geographical location, as well as the transition from OTUs to ASVs.


For reference, ASVs are now preferred in most cases due to their higher resolution and reproducibility.


It is important to remember that the Shannon index is a relative measure. Its primary value lies in the differences observed between sample groups. This index is particularly sensitive to experiment-specific parameters.


Q: How can I minimize batch effects and ensure meaningful comparisons across studies after changing the --p-min-fold-parent-over-abundance parameter?


To minimize batch effects and allow meaningful comparisons between studies, we recommend reprocessing all relevant datasets with the same parameter value. This value should be as parsimonious as possible; that is, the lowest value from your benchmark that still retains a sufficient number of non-chimeric reads.
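
The selection rule above can be sketched in Python as follows. The benchmark values, the 0.70 threshold, and the function name `pick_parsimonious_value` are all hypothetical; what counts as a "sufficient" fraction of non-chimeric reads depends on your own data.

```python
def pick_parsimonious_value(benchmark, min_retained_fraction):
    """Return the lowest tested parameter value whose retained fraction
    meets the threshold, or None if no tested value does."""
    for value in sorted(benchmark):
        if benchmark[value] >= min_retained_fraction:
            return value
    return None


# Hypothetical benchmark: parameter value -> fraction of input reads
# that remain non-chimeric after denoising (illustration only).
benchmark = {1: 0.42, 2: 0.58, 4: 0.71, 8: 0.74, 16: 0.75}

# Assumed threshold; choose one that suits your study.
chosen = pick_parsimonious_value(benchmark, min_retained_fraction=0.70)
print(chosen)
```

Once a value is chosen, apply it to every dataset included in the comparison.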


Q: Why do I see weird quality check plots, and how should I assess read quality properly?



If your data comes from a NovaSeq or iSeq platform, it is normal to observe unusual quality check plots when using the Constellab pipeline.


To better assess the quality of your reads, we recommend using the FastQC and MultiQC tasks available in the gws_omix brick.


However, in all cases, you must also run the Q2QualityCheck task.



Q: What is the difference between absolute abundance and relative abundance in the microbiome?


The main differences between absolute and relative abundance are as follows:


  • Absolute abundance provides the actual count of microorganisms, which reflects the true number of microbes in the sample.
  • Relative abundance describes the proportional relationship between different microorganisms within a sample, allowing for comparison of their relative distributions (source).

Q: When should I use absolute abundance, and when relative abundance?


Absolute abundance: If the goal is to determine the actual number of microorganisms (such as in disease monitoring or precise quantification of microbial load), absolute abundance is more reliable.


Relative abundance: If the focus is on understanding the community structure and comparing the proportions of different microorganisms within a sample (such as in ecological studies of microbial populations), relative abundance is often preferred. This approach highlights the proportional relationships among microbes within the community (source).
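
As a small illustration of the distinction, the Python sketch below converts absolute counts to relative abundances. The taxa, the counts, and the function name `relative_abundance` are hypothetical and only serve the example.

```python
def relative_abundance(counts):
    """Convert absolute counts per taxon into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counts")
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical absolute counts: sample B carries ten times the load of sample A.
sample_a = {"Bacteroides": 600, "Lactobacillus": 300, "Escherichia": 100}
sample_b = {"Bacteroides": 6000, "Lactobacillus": 3000, "Escherichia": 1000}

rel_a = relative_abundance(sample_a)
rel_b = relative_abundance(sample_b)

print(rel_a)
print(rel_b)
```

Both samples have the same relative profile even though their total microbial load differs tenfold, which is why relative abundance alone cannot answer questions about microbial load.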



Q: Why do LinDA and DESeq2 detect different KOs in 16S functional analysis prediction?


It is expected and normal for different statistical methods to detect different features. Here is why:


STATISTICAL DIFFERENCES:


- LinDA: Uses compositional data analysis, a more conservative approach


- DESeq2: Uses a negative binomial distribution and often detects more features



PRACTICAL IMPLICATIONS:


- Features detected by BOTH methods = HIGH CONFIDENCE results (focus on these)


- Features detected by ONE method = MODERATE CONFIDENCE (still valid)


- Each method has a different sensitivity to low-abundance features


RECOMMENDATION:


Focus on the KOs that appear in both LinDA and DESeq2 results; these are your most reliable findings.
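
The confidence tiers described above can be derived with simple set operations. The sketch below is a hypothetical illustration: the function name `confidence_tiers` and the identifiers `feature_x`, `feature_y`, and `feature_z` are placeholders, and the input lists are assumed to hold the significant features reported by each method.

```python
def confidence_tiers(linda_hits, deseq2_hits):
    """Split significant features into tiers by how many methods detect them."""
    linda_hits, deseq2_hits = set(linda_hits), set(deseq2_hits)
    return {
        "high": linda_hits & deseq2_hits,      # detected by both methods
        "moderate": linda_hits ^ deseq2_hits,  # detected by exactly one method
    }


# Placeholder inputs: feature_x, feature_y and feature_z are hypothetical IDs.
linda_hits = ["ko00052", "ko00120", "feature_x"]
deseq2_hits = ["ko00052", "ko00120", "feature_y", "feature_z"]

tiers = confidence_tiers(linda_hits, deseq2_hits)
print(sorted(tiers["high"]))
print(sorted(tiers["moderate"]))
```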



The following worked example analyzes one dataset comparing two groups, R1 and R2.

Overall statistics


  • LinDA: 58 significant pathways
  • DESeq2: 105 significant pathways (as expected, it is more sensitive)
  • Consensus: 55 pathways in common → 94.8% of LinDA results overlap with DESeq2
  • This high overlap (55 shared pathways) suggests the findings are very robust.
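
The overlap figures can be checked with a few lines of Python. Only the three counts come from the statistics above; the variable names are illustrative.

```python
# Counts reported in the overall statistics.
linda_total = 58
deseq2_total = 105
shared = 55

linda_overlap = shared / linda_total    # share of LinDA hits confirmed by DESeq2
deseq2_overlap = shared / deseq2_total  # share of DESeq2 hits confirmed by LinDA

linda_only = linda_total - shared       # pathways found by LinDA alone
deseq2_only = deseq2_total - shared     # pathways found by DESeq2 alone

print(f"LinDA overlap:  {linda_overlap:.1%}")
print(f"DESeq2 overlap: {deseq2_overlap:.1%}")
print(f"LinDA only: {linda_only}, DESeq2 only: {deseq2_only}")
```

Note that the overlap is asymmetric: nearly all LinDA pathways are also found by DESeq2, while roughly half of the DESeq2 pathways are confirmed by LinDA.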

Focus: ko00052 (Galactose metabolism)


Status: Detected by both methods → high confidence


Your numbers:


  • LinDA p-adjust: 0.030 (significant)
  • DESeq2 p-adjust: 0.000000004 (highly significant)
  • Log2 fold change: +0.152
  • Pathway class: Metabolism → Carbohydrate metabolism

Interpretation:


  • Galactose metabolism is enriched in R2 vs R1.
  • The positive log2 fold change indicates higher activity in R2.
  • Agreement between LinDA and DESeq2 strengthens confidence in this result.
  • DESeq2 shows stronger significance, which fits its higher sensitivity.

Other high-confidence pathways (both methods)


  1. Primary bile acid biosynthesis (ko00120), Log2FC: +0.493
  2. Secondary bile acid biosynthesis (ko00121), Log2FC: +0.493
  3. Ubiquinone biosynthesis (ko00130), Log2FC: +0.603
  4. Glutathione metabolism (ko00480), Log2FC: +0.355
  5. Arginine and proline metabolism (ko00330), Log2FC: −0.141

These point to core metabolic differences between R1 and R2.
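
To read these log2 fold changes as ratios, the Python sketch below converts each value with 2 ** Log2FC. It assumes the Log2FC is defined as log2(R2 / R1), which matches the "enriched in R2 vs R1" wording in this example; the function name `fold_change` is illustrative.

```python
def fold_change(log2fc):
    """Convert a log2 fold change into a linear ratio (assumed R2 / R1)."""
    return 2 ** log2fc


# Log2FC values reported in this example.
log2fc_by_pathway = {
    "ko00052": 0.152,   # Galactose metabolism
    "ko00120": 0.493,   # Primary bile acid biosynthesis
    "ko00121": 0.493,   # Secondary bile acid biosynthesis
    "ko00130": 0.603,   # Ubiquinone biosynthesis
    "ko00480": 0.355,   # Glutathione metabolism
    "ko00330": -0.141,  # Arginine and proline metabolism
}

fold_changes = {ko: fold_change(v) for ko, v in log2fc_by_pathway.items()}

for ko, ratio in fold_changes.items():
    direction = "higher in R2" if ratio > 1 else "lower in R2"
    print(f"{ko}: ratio {ratio:.2f} ({direction})")
```

Under that assumption, a Log2FC of +0.152 corresponds to a ratio of about 1.11, and +0.493 to about 1.41, so these are statistically significant but moderate effect sizes.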


What does this mean biologically?


  1. Bile acid metabolism
     • Both primary and secondary pathways are enriched in R2.
     • This is consistent with shifts in gut microbiome composition.
     • It may reflect differences in dietary fat processing.
  2. Carbohydrate metabolism
     • Multiple pathways (including galactose) are altered.
     • This suggests distinct energy-use patterns between groups.
  3. Antioxidant systems
     • Glutathione metabolism is enriched in R2.
     • This may indicate different oxidative stress responses.
  4. Amino acid metabolism
     • Directionality is mixed (some pathways enriched, some depleted).
     • This points to broader metabolic reprogramming between groups.

Recommendations


  • The 94.8% overlap between LinDA and DESeq2 is excellent; treat the 55 consensus pathways as high-confidence results.
  • For publication focus, highlight the bile acid pathways (ko00120, ko00121; Log2FC ~ 0.5) as strongest effects.
  • Consider experimental validation of bile acid and glutathione pathways to confirm these computational findings.