Search | BVS (Virtual Health Library) Search Portal

Cophylogeny Reconstruction Allowing for Multiple Associations Through Approximate Bayesian Computation.

Sinaimeri, Blerina; Urbini, Laura; Sagot, Marie-France; Matias, Catherine.

Syst Biol ; 72(6): 1370-1386, 2023 Dec 30.

Article in English | MEDLINE | ID: mdl-37703307

ABSTRACT

Phylogenetic tree reconciliation is extensively employed for the examination of coevolution between host and symbiont species. An important concern is the requirement for dependable cost values when selecting event-based parsimonious reconciliation. Although certain approaches deduce event probabilities unique to each pair of host and symbiont trees, which can subsequently be converted into cost values, a significant limitation lies in their inability to model the invasion of diverse host species by the same symbiont species (termed as a spread event), which is believed to occur in symbiotic relationships. Invasions lead to the observation of multiple associations between symbionts and their hosts (indicating that a symbiont is no longer exclusive to a single host), which are incompatible with the existing methods of coevolution. Here, we present a method called AmoCoala (an enhanced version of the tool Coala) that provides a more realistic estimation of cophylogeny event probabilities for a given pair of host and symbiont trees, even in the presence of spread events. We expand the classical 4-event coevolutionary model to include 2 additional outcomes, vertical and horizontal spreads, that lead to multiple associations. In the initial step, we estimate the probabilities of spread events using heuristic frequencies. Subsequently, in the second step, we employ an approximate Bayesian computation approach to infer the probabilities of the remaining 4 classical events (cospeciation, duplication, host switch, and loss) based on these values. By incorporating spread events, our reconciliation model enables a more accurate consideration of multiple associations. This improvement enhances the precision of estimated cost sets, paving the way to a more reliable reconciliation of host and symbiont trees. To validate our method, we conducted experiments on synthetic datasets and demonstrated its efficacy using real-world examples. Our results showcase that AmoCoala produces biologically plausible reconciliation scenarios, further emphasizing its effectiveness.

Subjects

Host Specificity , Symbiosis , Phylogeny , Bayes Theorem
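
The approximate Bayesian computation step mentioned in the abstract above can be illustrated with a generic rejection sampler. This is only a toy sketch and not the AmoCoala implementation: the simulator here draws event labels from a multinomial, whereas the actual method simulates symbiont trees along the host tree and compares trees. The function names, the observed counts, the uniform Dirichlet prior, and the L1 distance are all assumptions made for illustration.

```python
import random

def simulate_counts(theta, n_events, rng):
    """Toy simulator (assumption): draw n_events event labels with probabilities theta."""
    labels = rng.choices(range(len(theta)), weights=theta, k=n_events)
    return [labels.count(k) for k in range(len(theta))]

def abc_rejection(observed, n_draws=2000, eps=0.2, seed=0):
    """Keep prior draws whose simulated summary lies within eps of the observed one."""
    rng = random.Random(seed)
    n_events = sum(observed)
    obs_freq = [c / n_events for c in observed]
    accepted = []
    for _ in range(n_draws):
        # Uniform Dirichlet prior on the simplex via normalized exponential draws.
        g = [rng.expovariate(1.0) for _ in observed]
        total = sum(g)
        theta = [x / total for x in g]
        sim = simulate_counts(theta, n_events, rng)
        # L1 distance between simulated and observed event frequencies.
        dist = sum(abs(s / n_events - o) for s, o in zip(sim, obs_freq))
        if dist <= eps:
            accepted.append(theta)
    return accepted

# Hypothetical counts for (cospeciation, duplication, host switch, loss).
posterior_sample = abc_rejection([20, 5, 15, 10])
```

The accepted vectors form an approximate posterior sample; shrinking `eps` tightens the approximation at the cost of fewer accepted draws.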

Correction: PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph.

Gautreau, Guillaume; Bazin, Adelme; Gachet, Mathieu; Planel, Rémi; Burlot, Laura; Dubois, Mathieu; Perrin, Amandine; Médigue, Claudine; Calteau, Alexandra; Cruveiller, Stéphane; Matias, Catherine; Ambroise, Christophe; Rocha, Eduardo P C; Vallenet, David.

PLoS Comput Biol ; 17(12): e1009687, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34890406

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pcbi.1007732.].

PPanGGOLiN: Depicting microbial diversity via a partitioned pangenome graph.

PLoS Comput Biol ; 16(3): e1007732, 2020 03.

Article in English | MEDLINE | ID: mdl-32191703

ABSTRACT

The use of comparative genomics for functional, evolutionary, and epidemiological studies requires methods to classify gene families in terms of occurrence in a given species. These methods usually lack multivariate statistical models to infer the partitions and the optimal number of classes and don't account for genome organization. We introduce a graph structure to model pangenomes in which nodes represent gene families and edges represent genomic neighborhood. Our method, named PPanGGOLiN, partitions nodes using an Expectation-Maximization algorithm based on multivariate Bernoulli Mixture Model coupled with a Markov Random Field. This approach takes into account the topology of the graph and the presence/absence of genes in pangenomes to classify gene families into persistent, cloud, and one or several shell partitions. By analyzing the partitioned pangenome graphs of isolate genomes from 439 species and metagenome-assembled genomes from 78 species, we demonstrate that our method is effective in estimating the persistent genome. Interestingly, it shows that the shell genome is a key element to understand genome dynamics, presumably because it reflects how genes present at intermediate frequencies drive adaptation of species, and its proportion in genomes is independent of genome size. The graph-based approach proposed by PPanGGOLiN is useful to depict the overall genomic diversity of thousands of strains in a compact structure and provides an effective basis for very large scale comparative genomics. The software is freely available at https://github.com/labgem/PPanGGOLiN.

Subjects

Bacterial Genome/genetics , Genomics/methods , Software , Algorithms , Bacteria/classification , Bacteria/genetics , Multivariate Analysis

A time warping approach to multiple sequence alignment.

Arribas-Gil, Ana; Matias, Catherine.

Stat Appl Genet Mol Biol ; 16(2): 133-144, 2017 04 25.

Article in English | MEDLINE | ID: mdl-28593899

ABSTRACT

We propose an approach for multiple sequence alignment (MSA) derived from the dynamic time warping viewpoint and recent techniques of curve synchronization developed in the context of functional data analysis. Starting from pairwise alignments of all the sequences (viewed as paths in a certain space), we construct a median path that represents the MSA we are looking for. We establish a proof of concept that our method could be an interesting ingredient to include into refined MSA techniques. We present a simple synthetic experiment as well as the study of a benchmark dataset, together with comparisons with 2 widely used MSA software packages.

Subjects

Sequence Alignment/methods , Software , Algorithms , Base Sequence/genetics , Computer Simulation
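
For readers unfamiliar with the dynamic time warping viewpoint named in the abstract above, the textbook pairwise recursion can be sketched as follows. It is not the median-path construction of the paper; the function name and the default 0/1 mismatch cost are assumptions chosen for illustration.

```python
def dtw_distance(a, b, cost=lambda x, y: 0 if x == y else 1):
    """Return the minimal cumulative cost of warping sequence a onto sequence b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # d[i][j] holds the best cost of aligning the prefixes a[:i] and b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            d[i][j] = step + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# A repeated symbol is absorbed by the warping at no cost.
score = dtw_distance("ACGT", "ACCGT")
```

Backtracking through `d` from the last cell would recover the warping path itself, which is the pairwise object such a method starts from.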

Nine quick tips for analyzing network data.

Miele, Vincent; Matias, Catherine; Robin, Stéphane; Dray, Stéphane.

PLoS Comput Biol ; 15(12): e1007434, 2019 12.

Article in English | MEDLINE | ID: mdl-31856181

Subjects

Systems Biology , Animals , Cluster Analysis , Computational Biology , Computer Graphics , Computer Simulation , Statistical Data Interpretation , Gene Regulatory Networks , Humans , Biological Models , Protein Interaction Maps , Social Networking , Software

A context dependent pair hidden Markov model for statistical alignment.

Arribas-Gil, Ana; Matias, Catherine.

Stat Appl Genet Mol Biol ; 11(1): Article 5, 2012 Jan 06.

Article in English | MEDLINE | ID: mdl-22499681

ABSTRACT

This article proposes a novel approach to statistical alignment of nucleotide sequences by introducing a context dependent structure on the substitution process in the underlying evolutionary model. We propose to estimate alignments and context dependent mutation rates relying on the observation of two homologous sequences. The procedure is based on a generalized pair-hidden Markov structure, where conditional on the alignment path, the nucleotide sequences follow a Markov distribution. We use a stochastic approximation expectation maximization (SAEM) algorithm to give accurate estimators of parameters and alignments. We provide results both on simulated data and vertebrate genomes, which are known to have a high mutation rate from CG dinucleotide. In particular, we establish that the method improves the accuracy of the alignment of a human pseudogene and its functional gene.

Subjects

Base Sequence , Markov Chains , Statistical Models , Sequence Alignment/methods

SIMoNe: Statistical Inference for MOdular NEtworks.

Chiquet, Julien; Smith, Alexander; Grasseau, Gilles; Matias, Catherine; Ambroise, Christophe.

Bioinformatics ; 25(3): 417-8, 2009 Feb 01.

Article in English | MEDLINE | ID: mdl-19073589

ABSTRACT

SUMMARY: The R package SIMoNe (Statistical Inference for MOdular NEtworks) enables inference of gene-regulatory networks based on partial correlation coefficients from microarray experiments. Modelling gene expression data with a Gaussian graphical model (hereafter GGM), the algorithm estimates non-zero entries of the concentration matrix, in a sparse and possibly high-dimensional setting. Its originality lies in the fact that it searches for a latent modular structure to drive the inference procedure through adaptive penalization of the concentration matrix. AVAILABILITY: Under the GNU General Public Licence at http://cran.r-project.org/web/packages/simone/

Subjects

Algorithms , Gene Regulatory Networks , Software , Computer Simulation , Genetic Databases , Gene Expression Profiling
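
The link between the concentration matrix and partial correlations that the abstract above relies on can be made concrete with a short sketch. In a Gaussian graphical model, the partial correlation between variables i and j is -K[i][j] / sqrt(K[i][i] * K[j][j]), where K is the concentration (inverse covariance) matrix, so a zero entry of K means no edge. The code below is not the SIMoNe API (SIMoNe is an R package); the function name and the example matrix are assumptions for illustration, and no penalized estimation is performed.

```python
import math

def partial_correlations(K):
    """Convert a concentration matrix K (list of lists) into partial correlations."""
    p = len(K)
    return [
        [
            1.0 if i == j else -K[i][j] / math.sqrt(K[i][i] * K[j][j])
            for j in range(p)
        ]
        for i in range(p)
    ]

# Hypothetical 3-gene concentration matrix: genes 0 and 2 are not directly linked.
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
R = partial_correlations(K)
# Edges of the inferred network are the off-diagonal non-zero entries.
edges = [(i, j) for i in range(3) for j in range(i + 1, 3) if R[i][j] != 0]
```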

Exploring the Robustness of the Parsimonious Reconciliation Method in Host-Symbiont Cophylogeny.

Urbini, Laura; Sinaimeri, Blerina; Matias, Catherine; Sagot, Marie-France.

IEEE/ACM Trans Comput Biol Bioinform ; 2018 May 21.

Article in English | MEDLINE | ID: mdl-29993554

ABSTRACT

The aim of this paper is to explore the robustness of the parsimonious host-symbiont tree reconciliation method under editing or small perturbations of the input. The editing involves making different choices of unique symbiont mapping to a host in the case where multiple associations exist. This is made necessary by the fact that the tree reconciliation model is currently unable to handle such associations. The analysis performed could however also address the problem of errors. The perturbations are re-rootings of the symbiont tree to deal with a possibly wrong placement of the root, especially in the case of fast-evolving species. In order to do this robustness analysis, we introduce a simulation scheme specifically designed for the host-symbiont cophylogeny context, as well as a measure to compare sets of tree reconciliations, both of which are of interest by themselves.

Revealing the hidden structure of dynamic ecological networks.

Miele, Vincent; Matias, Catherine.

R Soc Open Sci ; 4(6): 170251, 2017 Jun.

Article in English | MEDLINE | ID: mdl-28680678

ABSTRACT

In ecology, recent technological advances and long-term data studies now provide longitudinal interaction data (e.g. between individuals or species). Most often, time is the parameter along which interactions evolve but any other one-dimensional gradient (temperature, altitude, depth, humidity, etc.) can be considered. These data can be modelled through a sequence of different snapshots of an evolving ecological network, i.e. a dynamic network. Here, we present how the dynamic stochastic block model approach developed by Matias & Miele (Matias & Miele In press J. R. Stat. Soc. B (doi:10.1111/rssb.12200)) can capture the complexity and dynamics of these networks. First, we analyse a dynamic contact network of ants and we observe a clear high-level assembly with some variations in time at the individual level. Second, we explore the structure of a food web evolving during a year and we detect a stable predator-prey organization but also seasonal differences in the prey assemblage. Our approach, based on a rigorous statistical method implemented in the R package dynsbm, can pave the way for exploration of evolving ecological networks.
