Search | VHL Regional Portal

1.

Use of Elasticsearch-based business intelligence tools for integration and visualization of biological data.

Scott-Boyer, Marie-Pier; Dufour, Pascal; Belleau, François; Ongaro-Carcy, Regis; Plessis, Clément; Périn, Olivier; Droit, Arnaud.

Brief Bioinform ; 24(6)2023 09 22.

Article in English | MEDLINE | ID: mdl-37798252

ABSTRACT

The emergence of massive datasets exploring the multiple levels of molecular biology has made their analysis and knowledge transfer more complex. Flexible tools to manage big biological datasets could be of great help for standardizing the usage of developed data visualizations and integration methods. Business intelligence (BI) tools have been used in many fields as exploratory tools. They have numerous connectors to link numerous data repositories with a unified graphic interface, offering an overview of data and facilitating interpretation for decision makers. BI tools could be a flexible and user-friendly way of handling molecular biological data with interactive visualizations. However, it is rather uncommon to see such tools used for the exploration of massive and complex datasets in biological fields. We believe that two main obstacles could be the reason. Firstly, we posit that the ways of importing data into BI tools are not compatible with biological databases. Secondly, BI tools may not be adapted to certain particularities of complex biological data, namely, the size, the variability of datasets and the availability of specialized visualizations. This paper highlights the use of five BI tools (Elastic Kibana, Siren Investigate, Microsoft Power BI, Salesforce Tableau and Apache Superset) with which the massive data management repository engine called Elasticsearch is compatible. Four case studies will be discussed in which these BI tools were applied to biological datasets with different characteristics. We conclude that the performance of the tools depends on the complexity of the biological questions and the size of the datasets.

Subjects

Datasets as Topic, Software, Data Visualization

2.

Overview of methods for characterization and visualization of a protein-protein interaction network in a multi-omics integration context.

Robin, Vivian; Bodein, Antoine; Scott-Boyer, Marie-Pier; Leclercq, Mickaël; Périn, Olivier; Droit, Arnaud.

Front Mol Biosci ; 9: 962799, 2022.

Article in English | MEDLINE | ID: mdl-36158572

ABSTRACT

At the heart of the cellular machinery through the regulation of cellular functions, protein-protein interactions (PPIs) have a significant role. PPIs can be analyzed with network approaches. Construction of a PPI network requires prediction of the interactions. All PPIs form a network. Different biases such as lack of data, recurrence of information, and false interactions make the network unstable. Integrated strategies allow solving these different challenges. These approaches have shown encouraging results for the understanding of molecular mechanisms, drug action mechanisms, and identification of target genes. In order to give more importance to an interaction, it is evaluated by different confidence scores. These scores allow the filtration of the network and thus facilitate the representation of the network, essential steps to the identification and understanding of molecular mechanisms. In this review, we will discuss the main computational methods for predicting PPI, including ones confirming an interaction as well as the integration of PPIs into a network, and we will discuss visualization of these complex data.

3.

Machine Learning and Deep Learning Applications in Metagenomic Taxonomy and Functional Annotation.

Mathieu, Alban; Leclercq, Mickael; Sanabria, Melissa; Perin, Olivier; Droit, Arnaud.

Front Microbiol ; 13: 811495, 2022.

Article in English | MEDLINE | ID: mdl-35359727

ABSTRACT

Shotgun sequencing of environmental DNA (i.e., metagenomics) has revolutionized the field of environmental microbiology, allowing the characterization of all microorganisms in a sequencing experiment. To identify the microbes in terms of taxonomy and biological activity, the sequenced reads must necessarily be aligned on known microbial genomes/genes. However, current alignment methods are limited in terms of speed and can produce a significant number of false positives when detecting bacterial species or false negatives in specific cases (virus, plasmids, and gene detection). Moreover, recent advances in metagenomics have enabled the reconstruction of new genomes using de novo binning strategies, but these genomes, not yet fully characterized, are not used in classic approaches, whereas machine and deep learning methods can use them as models. In this article, we attempted to review the different methods and their efficiency to improve the annotation of metagenomic sequences. Deep learning models have reached the performance of the widely used k-mer alignment-based tools, with better accuracy in certain cases; however, they still must demonstrate their robustness across the variety of environmental samples and across the rapid expansion of accessible genomes in databases.

4.

Advances in Microbiome-Derived Solutions and Methodologies Are Founding a New Era in Skin Health and Care.

Gueniche, Audrey; Perin, Olivier; Bouslimani, Amina; Landemaine, Leslie; Misra, Namita; Cupferman, Sylvie; Aguilar, Luc; Clavaud, Cécile; Chopra, Tarun; Khodr, Ahmad.

Pathogens ; 11(2)2022 Jan 20.

Article in English | MEDLINE | ID: mdl-35215065

ABSTRACT

The microbiome, as a community of microorganisms and their structural elements, genomes, metabolites/signal molecules, has been shown to play an important role in human health, with significant beneficial applications for gut health. Skin microbiome has emerged as a new field with high potential to develop disruptive solutions to manage skin health and disease. Despite an incomplete toolbox for skin microbiome analyses, much progress has been made towards functional dissection of microbiomes and host-microbiome interactions. A standardized and robust investigation of the skin microbiome is necessary to provide accurate microbial information and set the base for a successful translation of innovations in the dermo-cosmetic field. This review provides an overview of how the landscape of skin microbiome research has evolved from method development (multi-omics/data-based analytical approaches) to the discovery and development of novel microbiome-derived ingredients. Moreover, it provides a summary of the latest findings on interactions between the microbiomes (gut and skin) and skin health/disease. Solutions derived from these two paths, used to develop novel microbiome-based ingredients or solutions acting on skin homeostasis, are proposed. The most promising skin and gut-derived microbiome interventional strategies are presented, along with regulatory, safety, industrial, and technical challenges related to a successful translation of these microbiome-based concepts/technologies in the dermo-cosmetic industry.

5.

timeOmics: an R package for longitudinal multi-omics data integration.

Bodein, Antoine; Scott-Boyer, Marie-Pier; Perin, Olivier; Lê Cao, Kim-Anh; Droit, Arnaud.

Bioinformatics ; 38(2): 577-579, 2022 01 03.

Article in English | MEDLINE | ID: mdl-34554215

ABSTRACT

MOTIVATION: Multi-omics data integration enables the global analysis of biological systems and discovery of new biological insights. Multi-omics experimental designs have been further extended with a longitudinal dimension to study dynamic relationships between molecules. However, methods that integrate longitudinal multi-omics data are still in their infancy. RESULTS: We introduce the R package timeOmics, a generic analytical framework for the integration of longitudinal multi-omics data. The framework includes pre-processing, modeling and clustering to identify molecular features strongly associated with time. We illustrate this framework in a case study to detect seasonal patterns of mRNA, metabolites, gut taxa and clinical variables in patients with diabetes mellitus from the integrative Human Microbiome Project. AVAILABILITY AND IMPLEMENTATION: timeOmics is available on Bioconductor and github.com/abodein/timeOmics. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genomics, Multiomics, Humans, Genomics/methods, Cluster Analysis

6.

Interpretation of network-based integration from multi-omics longitudinal data.

Bodein, Antoine; Scott-Boyer, Marie-Pier; Perin, Olivier; Lê Cao, Kim-Anh; Droit, Arnaud.

Nucleic Acids Res ; 50(5): e27, 2022 03 21.

Article in English | MEDLINE | ID: mdl-34883510

ABSTRACT

Multi-omics integration is key to fully understand complex biological processes in a holistic manner. Furthermore, multi-omics combined with new longitudinal experimental design can reveal dynamic relationships between omics layers and identify key players or interactions in system development or complex phenotypes. However, integration methods have to address various experimental designs and do not guarantee interpretable biological results. The new challenge of multi-omics integration is to solve interpretation and unlock the hidden knowledge within the multi-omics data. In this paper, we go beyond integration and propose a generic approach to face the interpretation problem. From multi-omics longitudinal data, this approach builds and explores hybrid multi-omics networks composed of both inferred and known relationships within and between omics layers. With smart node labelling and propagation analysis, this approach predicts regulation mechanisms and multi-omics functional modules. We applied the method to 3 case studies with various multi-omics designs and identified new multi-layer interactions involved in key biological functions that could not be revealed with single omics analysis. Moreover, we highlighted interplay in the kinetics that could help identify novel biological mechanisms. This method is available as an R package netOmics to readily suit any application.

Subjects

Genomics, Systems Biology/methods, Genomics/methods, Phenotype

7.

Integration strategies of multi-omics data for machine learning analysis.

Picard, Milan; Scott-Boyer, Marie-Pier; Bodein, Antoine; Périn, Olivier; Droit, Arnaud.

Comput Struct Biotechnol J ; 19: 3735-3746, 2021.

Article in English | MEDLINE | ID: mdl-34285775

ABSTRACT

Increased availability of high-throughput technologies has generated an ever-growing number of omics data that seek to portray many different but complementary biological layers including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, only include one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods/frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies by paying special attention to machine learning applications.

8.

HCK and ABAA: A Newly Designed Pipeline to Improve Fungi Metabarcoding Analysis.

Mlaga, Kodjovi D; Mathieu, Alban; Beauparlant, Charles Joly; Ott, Alban; Khodr, Ahmad; Perin, Olivier; Droit, Arnaud.

Front Microbiol ; 12: 640693, 2021.

Article in English | MEDLINE | ID: mdl-34025601

ABSTRACT

INTRODUCTION: The fungi ITS sequence length dissimilarity, non-specific amplicons, including chimaera formed during Polymerase Chain Reaction (PCR), added to sequencing errors, create bias during similarity clustering and abundance estimation in the downstream analysis. To overcome these challenges, we present a novel approach, Hierarchical Clustering with Kraken (HCK), to classify ITS1 amplicons and Abundance-Base Alternative Approach (ABAA) pipeline to detect and filter non-specific amplicons in fungi metabarcoding sequencing datasets. MATERIALS AND METHODS: We compared the performances of both pipelines against QIIME, KRAKEN, and DADA2 using publicly available fungi ITS mock community datasets and using BLASTn as a reference. We calculated the Precision, Recall, and F-score using the True-Positive, False-Positive, and False-Negative estimation. Alpha diversity (Chao1 and Shannon metrics) was also used to evaluate the diversity estimation of our method. RESULTS: The analysis shows that ABAA reduced the number of false positives with all metabarcoding methods tested, and HCK increases precision and recall. HCK, coupled with ABAA, improves the F-score and brings alpha diversity metric values close to the BLASTn alpha diversity values when compared to QIIME, KRAKEN, and DADA2. CONCLUSION: The developed HCK-ABAA approach allows better identification of the fungi community structures while avoiding use of a reference database for non-specific amplicons filtration. It results in a more robust and stable methodology over time. The software can be downloaded at the following link: https://bitbucket.org/GottySG36/hck/src/master/.
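The evaluation metrics named in this abstract can be illustrated with a minimal sketch. It is not taken from the HCK/ABAA software: the function names `precision_recall_f` and `shannon_index` are hypothetical, and the code assumes the standard definitions (the F-score as the harmonic mean of precision and recall, and the Shannon index computed with the natural logarithm over taxon counts).

```python
import math


def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-score) from true-positive,
    false-positive and false-negative counts.

    Hypothetical helper; assumes the standard F1 definition.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    print(precision_recall_f(tp=8, fp=2, fn=2))
    print(shannon_index([10, 10, 10, 10]))
```

Under these definitions, a filter that lowers the false-positive count raises precision while leaving recall unchanged, which is one way to read the reported effect of a false-positive filtering step on the F-score.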

9.

GWENA: gene co-expression networks analysis and extended modules characterization in a single Bioconductor package.

Lemoine, Gwenaëlle G; Scott-Boyer, Marie-Pier; Ambroise, Bathilde; Périn, Olivier; Droit, Arnaud.

BMC Bioinformatics ; 22(1): 267, 2021 May 25.

Article in English | MEDLINE | ID: mdl-34034647

ABSTRACT

BACKGROUND: Network-based analysis of gene expression through co-expression networks can be used to investigate modular relationships occurring between genes performing different biological functions. An extended description of each of the network modules is therefore a critical step to understand the underlying processes contributing to a disease or a phenotype. Biological integration, topology study and conditions comparison (e.g. wild vs mutant) are the main methods to do so, but to date no tool combines them all into a single pipeline. RESULTS: Here we present GWENA, a new R package that integrates gene co-expression network construction and whole characterization of the detected modules through gene set enrichment, phenotypic association, hub genes detection, topological metric computation, and differential co-expression. To demonstrate its performance, we applied GWENA to two skeletal muscle datasets from young and old patients of the GTEx study. Remarkably, we prioritized a gene whose involvement in muscle development and growth was unknown. Moreover, new insights on the variations in patterns of co-expression were identified. The known phenomenon of connectivity loss associated with aging was found coupled to a global reorganization of the relationships leading to expression of known aging-related functions. CONCLUSION: GWENA is an R package available through Bioconductor (https://bioconductor.org/packages/release/bioc/html/GWENA.html) that has been developed to perform extended analysis of gene co-expression networks. Thanks to biological and topological information as well as differential co-expression, the package helps to dissect the role of gene relationships in disease conditions or targeted phenotypes. GWENA goes beyond existing packages that perform co-expression analysis by including new tools to fully characterize modules, such as differential co-expression, additional enrichment databases, and network visualization.

Subjects

Gene Regulatory Networks, Software, Gene Expression, Gene Expression Profiling, Humans

10.

KibioR & Kibio: a new architecture for next-generation data querying and sharing in big biology.

Ongaro-Carcy, Régis; Scott-Boyer, Marie-Pier; Dessemond, Adrien; Belleau, François; Leclercq, Mickael; Périn, Olivier; Droit, Arnaud.

Bioinformatics ; 37(17): 2706-2713, 2021 Sep 09.

Article in English | MEDLINE | ID: mdl-33751043

ABSTRACT

MOTIVATION: The growing production of massive heterogeneous biological data offers opportunities for new discoveries. However, performing multi-omics data analysis is challenging, and researchers are forced to handle the ever-increasing complexity of both data management and evolution of our biological understanding. Substantial efforts have been made to unify biological datasets into integrated systems. Unfortunately, they are not easily scalable, deployable and searchable, locally or globally. RESULTS: This publication presents two tools with a simple structure that can help any data provider, organization or researcher requiring a reliable data search and analysis base. The first tool is Kibio, a scalable and adaptable data storage based on the Elasticsearch search engine. The second tool is KibioR, an R package to pull, push and search Kibio datasets or any accessible Elasticsearch-based databases. These tools apply a uniform data exchange model and minimize the burden of data management by organizing data into a decentralized, versatile, searchable and shareable structure. Several case studies are presented using multiple databases, from drug characterization to miRNAs and pathways identification, emphasizing the ease of use and versatility of the Kibio/KibioR framework. AVAILABILITY AND IMPLEMENTATION: Both KibioR and Elasticsearch are open source. KibioR package source is available at https://github.com/regisoc/kibior and the library on CRAN at https://cran.r-project.org/package=kibior. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

11.

Large-Scale Automatic Feature Selection for Biomarker Discovery in High-Dimensional OMICs Data.

Leclercq, Mickael; Vittrant, Benjamin; Martin-Magniette, Marie Laure; Scott Boyer, Marie Pier; Perin, Olivier; Bergeron, Alain; Fradet, Yves; Droit, Arnaud.

Front Genet ; 10: 452, 2019.

Article in English | MEDLINE | ID: mdl-31156708

ABSTRACT

The identification of biomarker signatures in omics molecular profiling is usually performed to predict outcomes in a precision medicine context, such as patient disease susceptibility, diagnosis, prognosis, and treatment response. To identify these signatures, we have developed a biomarker discovery tool, called BioDiscML. From a collection of samples and their associated characteristics, i.e., the biomarkers (e.g., gene expression, protein levels, clinico-pathological data), BioDiscML exploits various feature selection procedures to produce signatures associated with machine learning models that efficiently predict a specified outcome. To this end, BioDiscML uses a large variety of machine learning algorithms to select the best combination of biomarkers for predicting categorical or continuous outcomes from highly unbalanced datasets. The software has been implemented to automate all machine learning steps, including data pre-processing, feature selection, model selection, and performance evaluation. BioDiscML is delivered as a stand-alone program and is available for download at https://github.com/mickaelleclercq/BioDiscML.
