Search | Search Portal of BVS Enfermagem

1.

Cophylogeny Reconstruction Allowing for Multiple Associations Through Approximate Bayesian Computation.

Sinaimeri, Blerina; Urbini, Laura; Sagot, Marie-France; Matias, Catherine.

Syst Biol ; 72(6): 1370-1386, 2023 Dec 30.

Article in English | MEDLINE | ID: mdl-37703307

ABSTRACT

Phylogenetic tree reconciliation is extensively employed for the examination of coevolution between host and symbiont species. An important concern is the requirement for dependable cost values when selecting event-based parsimonious reconciliation. Although certain approaches deduce event probabilities unique to each pair of host and symbiont trees, which can subsequently be converted into cost values, a significant limitation lies in their inability to model the invasion of diverse host species by the same symbiont species (termed as a spread event), which is believed to occur in symbiotic relationships. Invasions lead to the observation of multiple associations between symbionts and their hosts (indicating that a symbiont is no longer exclusive to a single host), which are incompatible with the existing methods of coevolution. Here, we present a method called AmoCoala (an enhanced version of the tool Coala) that provides a more realistic estimation of cophylogeny event probabilities for a given pair of host and symbiont trees, even in the presence of spread events. We expand the classical 4-event coevolutionary model to include 2 additional outcomes, vertical and horizontal spreads, that lead to multiple associations. In the initial step, we estimate the probabilities of spread events using heuristic frequencies. Subsequently, in the second step, we employ an approximate Bayesian computation approach to infer the probabilities of the remaining 4 classical events (cospeciation, duplication, host switch, and loss) based on these values. By incorporating spread events, our reconciliation model enables a more accurate consideration of multiple associations. This improvement enhances the precision of estimated cost sets, paving the way to a more reliable reconciliation of host and symbiont trees. To validate our method, we conducted experiments on synthetic datasets and demonstrated its efficacy using real-world examples. 
Our results showcase that AmoCoala produces biologically plausible reconciliation scenarios, further emphasizing its effectiveness.

Subjects

Host Specificity, Symbiosis, Phylogeny, Bayes Theorem
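The second step described in the abstract above relies on approximate Bayesian computation (ABC). The following is a minimal sketch of plain ABC rejection sampling for the four classical event probabilities. It is not the AmoCoala implementation: the flat Dirichlet prior, the toy simulator that draws independent events, the normalized L1 distance on event counts, and the observed counts are all assumptions made for illustration, since the abstract does not describe the tool's simulator or summary statistics.

```python
import random

EVENTS = ("cospeciation", "duplication", "host_switch", "loss")


def sample_prior(rng):
    """Draw one probability vector from a flat Dirichlet prior (assumed prior)."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in EVENTS]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_counts(probs, n_events, rng):
    """Toy simulator (assumption): draw n_events independent events."""
    counts = [0] * len(EVENTS)
    for index in rng.choices(range(len(EVENTS)), weights=probs, k=n_events):
        counts[index] += 1
    return counts


def distance(simulated, observed):
    """Normalized L1 distance between two count vectors of equal total."""
    return sum(abs(s - o) for s, o in zip(simulated, observed)) / sum(observed)


def abc_rejection(observed, n_draws, tolerance, seed=0):
    """Keep the prior draws whose simulated counts fall within the tolerance."""
    rng = random.Random(seed)
    n_events = sum(observed)
    accepted = []
    for _ in range(n_draws):
        probs = sample_prior(rng)
        if distance(simulate_counts(probs, n_events, rng), observed) <= tolerance:
            accepted.append(probs)
    return accepted


# Hypothetical observed event counts, in the order given by EVENTS.
observed = [30, 5, 10, 15]
accepted = abc_rejection(observed, n_draws=20000, tolerance=0.2)
posterior_mean = [sum(p[i] for p in accepted) / len(accepted) for i in range(len(EVENTS))]
```

The accepted vectors approximate the posterior over event probabilities; a smaller tolerance gives a closer approximation at the cost of fewer accepted draws.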

2.

The transposable element-rich genome of the cereal pest Sitophilus oryzae.

Parisot, Nicolas; Vargas-Chávez, Carlos; Goubert, Clément; Baa-Puyoulet, Patrice; Balmand, Séverine; Beranger, Louis; Blanc, Caroline; Bonnamour, Aymeric; Boulesteix, Matthieu; Burlet, Nelly; Calevro, Federica; Callaerts, Patrick; Chancy, Théo; Charles, Hubert; Colella, Stefano; Da Silva Barbosa, André; Dell'Aglio, Elisa; Di Genova, Alex; Febvay, Gérard; Gabaldón, Toni; Galvão Ferrarini, Mariana; Gerber, Alexandra; Gillet, Benjamin; Hubley, Robert; Hughes, Sandrine; Jacquin-Joly, Emmanuelle; Maire, Justin; Marcet-Houben, Marina; Masson, Florent; Meslin, Camille; Montagné, Nicolas; Moya, Andrés; Ribeiro de Vasconcelos, Ana Tereza; Richard, Gautier; Rosen, Jeb; Sagot, Marie-France; Smit, Arian F A; Storer, Jessica M; Vincent-Monegat, Carole; Vallier, Agnès; Vigneron, Aurélien; Zaidman-Rémy, Anna; Zamoum, Waël; Vieira, Cristina; Rebollo, Rita; Latorre, Amparo; Heddi, Abdelaziz.

BMC Biol ; 19(1): 241, 2021 Nov 09.

Article in English | MEDLINE | ID: mdl-34749730

ABSTRACT

BACKGROUND: The rice weevil Sitophilus oryzae is one of the most important agricultural pests, causing extensive damage to cereal in fields and to stored grains. S. oryzae has an intracellular symbiotic relationship (endosymbiosis) with the Gram-negative bacterium Sodalis pierantonius and is a valuable model to decipher host-symbiont molecular interactions. RESULTS: We sequenced the Sitophilus oryzae genome using a combination of short and long reads to produce the best assembly for a Curculionidae species to date. We show that S. oryzae has undergone successive bursts of transposable element (TE) amplification, representing 72% of the genome. In addition, we show that many TE families are transcriptionally active, and changes in their expression are associated with insect endosymbiotic state. S. oryzae has undergone a high gene expansion rate, when compared to other beetles. Reconstruction of host-symbiont metabolic networks revealed that, despite its recent association with cereal weevils (30 kyear), S. pierantonius relies on the host for several amino acids and nucleotides to survive and to produce vitamins and essential amino acids required for insect development and cuticle biosynthesis. CONCLUSIONS: Here we present the genome of an agricultural pest beetle, which may act as a foundation for pest control. In addition, S. oryzae may be a useful model for endosymbiosis, and studying TE evolution and regulation, along with the impact of TEs on eukaryotic genomes.

Subjects

Beetles, Weevils, Animals, Cell Communication, DNA Transposable Elements/genetics, Edible Grain, Humans, Weevils/genetics

3.

Capybara: equivalence ClAss enumeration of coPhylogenY event-BAsed ReconciliAtions.

Wang, Yishu; Mary, Arnaud; Sagot, Marie-France; Sinaimeri, Blerina.

Bioinformatics ; 36(14): 4197-4199, 2020 Aug 15.

Article in English | MEDLINE | ID: mdl-32556075

ABSTRACT

MOTIVATION: Phylogenetic tree reconciliation is the method of choice in analyzing host-symbiont systems. Despite the many reconciliation tools that have been proposed in the literature, two main issues remain unresolved: (i) listing suboptimal solutions (i.e. whose score is 'close' to the optimal ones) and (ii) listing only solutions that are biologically different 'enough'. The first issue arises because the optimal solutions are not always the ones biologically most significant; providing many suboptimal solutions as alternatives for the optimal ones is thus very useful. The second one is related to the difficulty of analyzing an often huge number of optimal solutions. In this article, we propose Capybara, which addresses both of these problems in an efficient way. Furthermore, it includes a tool for visualizing the solutions that significantly helps the user in the process of analyzing the results. AVAILABILITY AND IMPLEMENTATION: The source code, documentation and binaries for all platforms are freely available at https://capybara-doc.readthedocs.io/. CONTACT: yishu.wang@univ-lyon1.fr or blerina.sinaimeri@inria.fr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms, Rodents, Animals, Phylogeny, Software

4.

MOOMIN - Mathematical explOration of 'Omics data on a MetabolIc Network.

Pusa, Taneli; Ferrarini, Mariana Galvão; Andrade, Ricardo; Mary, Arnaud; Marchetti-Spaccamela, Alberto; Stougie, Leen; Sagot, Marie-France.

Bioinformatics ; 36(2): 514-523, 2020 Jan 15.

Article in English | MEDLINE | ID: mdl-31504164

ABSTRACT

MOTIVATION: Analysis of differential expression of genes is often performed to understand how the metabolic activity of an organism is impacted by a perturbation. However, because the system of metabolic regulation is complex and all changes are not directly reflected in the expression levels, interpreting these data can be difficult. RESULTS: In this work, we present a new algorithm and computational tool that uses a genome-scale metabolic reconstruction to infer metabolic changes from differential expression data. Using the framework of constraint-based analysis, our method produces a qualitative hypothesis of a change in metabolic activity. In other words, each reaction of the network is inferred to have increased, decreased, or remained unchanged in flux. In contrast to similar previous approaches, our method does not require a biological objective function and does not assign on/off activity states to genes. An implementation is provided and it is available online. We apply the method to three published datasets to show that it successfully accomplishes its two main goals: confirming or rejecting metabolic changes suggested by differentially expressed genes based on how well they fit in as parts of a coordinated metabolic change, as well as inferring changes in reactions whose genes did not undergo differential expression. AVAILABILITY AND IMPLEMENTATION: github.com/htpusa/moomin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Metabolic Networks and Pathways, Algorithms, Computational Biology, Genome, Biological Models

5.

MOMO - multi-objective metabolic mixed integer optimization: application to yeast strain engineering.

Andrade, Ricardo; Doostmohammadi, Mahdi; Santos, João L; Sagot, Marie-France; Mira, Nuno P; Vinga, Susana.

BMC Bioinformatics ; 21(1): 69, 2020 Feb 24.

Article in English | MEDLINE | ID: mdl-32093622

ABSTRACT

BACKGROUND: In this paper, we explore the concept of multi-objective optimization in the field of metabolic engineering when both continuous and integer decision variables are involved in the model. In particular, we propose a multi-objective model that may be used to suggest reaction deletions that maximize and/or minimize several functions simultaneously. The applications may include, among others, the concurrent maximization of a bioproduct and of biomass, or maximization of a bioproduct while minimizing the formation of a given by-product, two common requirements in microbial metabolic engineering. RESULTS: Production of ethanol by the widely used cell factory Saccharomyces cerevisiae was adopted as a case study to demonstrate the usefulness of the proposed approach in identifying genetic manipulations that improve productivity and yield of this economically highly relevant bioproduct. We did an in vivo validation and we could show that some of the predicted deletions exhibit increased ethanol levels in comparison with the wild-type strain. CONCLUSIONS: The multi-objective programming framework we developed, called MOMO, is open-source and uses POLYSCIP (available at http://polyscip.zib.de/) as the underlying multi-objective solver. MOMO is available at http://momo-sysbio.gforge.inria.fr.

Subjects

Metabolic Engineering/methods, Software, Biomass, Ethanol/metabolism, Biological Models, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism
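The multi-objective idea in the abstract above (for example, maximizing a bioproduct and biomass at the same time) rests on Pareto dominance: a candidate is kept only if no other candidate is at least as good on every objective and strictly better on one. The sketch below illustrates that filter on made-up numbers. It is not MOMO, which per the abstract solves a mixed integer model through a multi-objective solver; the strain names and flux values are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better once.

    Both tuples hold objectives to be maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    }


# Hypothetical (ethanol flux, biomass flux) pairs for illustration only.
candidates = {
    "wild_type": (1.0, 1.0),
    "deletion_A": (1.4, 0.8),
    "deletion_B": (1.2, 0.7),
    "deletion_C": (0.9, 0.9),
}
front = pareto_front(candidates)
```

Here `deletion_B` is dominated by `deletion_A` and `deletion_C` by `wild_type`, so only `wild_type` and `deletion_A` remain as trade-off candidates.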

6.

Exploring and Visualizing Spaces of Tree Reconciliations.

Huber, Katharina T; Moulton, Vincent; Sagot, Marie-France; Sinaimeri, Blerina.

Syst Biol ; 68(4): 607-618, 2019 Jul 01.

Article in English | MEDLINE | ID: mdl-30418649

ABSTRACT

Tree reconciliation is the mathematical tool that is used to investigate the coevolution of organisms, such as hosts and parasites. A common approach to tree reconciliation involves specifying a model that assigns costs to certain events, such as cospeciation, and then tries to find a mapping between two specified phylogenetic trees which minimizes the total cost of the implied events. For such models, it has been shown that there may be a huge number of optimal solutions, or at least solutions that are close to optimal. It is therefore of interest to be able to systematically compare and visualize whole collections of reconciliations between a specified pair of trees. In this article, we consider various metrics on the set of all possible reconciliations between a pair of trees, some that have been defined before but also new metrics that we shall propose. We show that the diameter for the resulting spaces of reconciliations can in some cases be determined theoretically, information that we use to normalize and compare properties of the metrics. We also implement the metrics and compare their behavior on several host parasite data sets, including the shapes of their distributions. In addition, we show that in combination with multidimensional scaling, the metrics can be useful for visualizing large collections of reconciliations, much in the same way as phylogenetic tree metrics can be used to explore collections of phylogenetic trees. Implementations of the metrics can be downloaded from: https://team.inria.fr/erable/en/team-members/blerina-sinaimeri/reconciliation-distances/.

Subjects

Classification/methods, Host-Parasite Interactions/physiology, Phylogeny, Biological Models

7.

Hydrogen peroxide production and myo-inositol metabolism as important traits for virulence of Mycoplasma hyopneumoniae.

Galvao Ferrarini, Mariana; Mucha, Scheila Gabriele; Parrot, Delphine; Meiffrein, Guillaume; Ruggiero Bachega, Jose Fernando; Comte, Gilles; Zaha, Arnaldo; Sagot, Marie-France.

Mol Microbiol ; 108(6): 683-696, 2018 Jun.

Article in English | MEDLINE | ID: mdl-29624763

ABSTRACT

Mycoplasma hyopneumoniae is the causative agent of enzootic pneumonia. In our previous work, we reconstructed the metabolic models of this species along with two other mycoplasmas from the respiratory tract of swine: Mycoplasma hyorhinis, considered less pathogenic but which nonetheless causes disease and Mycoplasma flocculare, a commensal bacterium. We identified metabolic differences that partially explained their different levels of pathogenicity. One important trait was the production of hydrogen peroxide from the glycerol metabolism only in the pathogenic species. Another important feature was a pathway for the metabolism of myo-inositol in M. hyopneumoniae. Here, we tested these traits to understand their relation to the different levels of pathogenicity, comparing not only the species but also pathogenic and attenuated strains of M. hyopneumoniae. Regarding the myo-inositol metabolism, we show that only M. hyopneumoniae assimilated this carbohydrate and remained viable when myo-inositol was the primary energy source. Strikingly, only the two pathogenic strains of M. hyopneumoniae produced hydrogen peroxide in complex medium. We also show that this production was dependent on the presence of glycerol. Although further functional tests are needed, we present in this work two interesting metabolic traits of M. hyopneumoniae that might be directly related to its enhanced virulence.

Subjects

Hydrogen Peroxide/metabolism, Inositol/metabolism, Mycoplasma hyopneumoniae/metabolism, Mycoplasma hyopneumoniae/pathogenicity, Mycoplasmal Pneumonia of Swine/microbiology, Animals, Bacterial Proteins/genetics, Bacterial Proteins/metabolism, Mycoplasma hyopneumoniae/genetics, Species Specificity, Swine, Virulence

8.

How Long Does Wolbachia Remain on Board?

Bailly-Bechet, Marc; Martins-Simões, Patricia; Szöllosi, Gergely J; Mialdea, Gladys; Sagot, Marie-France; Charlat, Sylvain.

Mol Biol Evol ; 34(5): 1183-1193, 2017 May 01.

Article in English | MEDLINE | ID: mdl-28201740

ABSTRACT

Wolbachia bacteria infect about half of all arthropods, with diverse and extreme consequences ranging from sex-ratio distortion and mating incompatibilities to protection against viruses. These phenotypic effects, combined with efficient vertical transmission from mothers to offspring, satisfactorily explain the invasion dynamics of Wolbachia within species. However, beyond the species level, the lack of congruence between the host and symbiont phylogenetic trees indicates that Wolbachia horizontal transfers and extinctions do happen and underlie its global distribution. But how often do they occur? And has the Wolbachia pandemic reached its equilibrium? Here, we address these questions by inferring recent acquisition/loss events from the distribution of Wolbachia lineages across the mitochondrial DNA tree of 3,600 arthropod specimens, spanning 1,100 species from Tahiti and surrounding islands. We show that most events occurred within the last million years, but are likely attributable to individual level variation (e.g., imperfect maternal transmission) rather than population level variation (e.g., Wolbachia extinction). At the population level, we estimate that mitochondria typically accumulate 4.7% substitutions per site during an infected episode, and 7.1% substitutions per site during the uninfected phase. Using a Bayesian time calibration of the mitochondrial tree, these numbers translate into infected and uninfected phases of approximately 7 and 9 million years. Infected species thus lose Wolbachia slightly more often than uninfected species acquire it, supporting the view that its present incidence, estimated here slightly below 0.5, represents an epidemiological equilibrium.

Subjects

Wolbachia/genetics, Animals, Arthropods/genetics, Mitochondrial DNA/genetics, Molecular Evolution, Genetic Variation, Population Genetics, Haplotypes, Phylogeny, Symbiosis/genetics
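The link the abstract above draws between phase durations and equilibrium incidence can be checked with a short calculation. The sketch assumes a simple two-state model in which a species alternates between infected and uninfected phases, with loss and gain rates taken as the reciprocals of the mean phase durations; this modelling choice is an assumption made here, not a description of the paper's inference. Under it, the equilibrium fraction of infected species is gain / (gain + loss).

```python
def equilibrium_incidence(infected_duration, uninfected_duration):
    """Equilibrium fraction of infected species in a two-state model.

    Durations are mean phase lengths in the same unit (here, million years).
    """
    loss_rate = 1.0 / infected_duration    # infected -> uninfected
    gain_rate = 1.0 / uninfected_duration  # uninfected -> infected
    return gain_rate / (gain_rate + loss_rate)


# Approximate phase durations quoted in the abstract: 7 and 9 million years.
incidence = equilibrium_incidence(7.0, 9.0)
print(round(incidence, 4))  # 0.4375
```

The result, 7/16, is slightly below 0.5, in line with the incidence reported in the abstract.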

9.

SNP calling from RNA-seq data without a reference genome: identification, quantification, differential analysis and impact on the protein sequence.

Lopez-Maestre, Hélène; Brinza, Lilia; Marchet, Camille; Kielbassa, Janice; Bastien, Sylvère; Boutigny, Mathilde; Monnin, David; Filali, Adil El; Carareto, Claudia Marcia; Vieira, Cristina; Picard, Franck; Kremer, Natacha; Vavre, Fabrice; Sagot, Marie-France; Lacroix, Vincent.

Nucleic Acids Res ; 44(19): e148, 2016 Nov 02.

Article in English | MEDLINE | ID: mdl-27458203

ABSTRACT

SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but rely on the availability of a (good) reference genome, and therefore cannot be applied to non-model species. They are also mostly tailored for whole genome (re-)sequencing experiments, whereas in many cases, transcriptome sequencing can be used as a cheaper alternative which already makes it possible to identify SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing, if not enough material is available from one individual. Using pooled human RNA-seq data, we clarify the precision and recall of our method and discuss them with respect to other methods which use a reference genome or an assembled transcriptome. We then validate experimentally the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. We further make it possible to test for the association of the identified SNPs with a phenotype of interest.

Subjects

Base Sequence, Genome, Single Nucleotide Polymorphism, RNA Sequence Analysis, Algorithms, Amino Acid Sequence, Animals, Computational Biology/methods, Genetic Markers, Genomics/methods, Genotype, Humans, Phenotype, Reproducibility of Results, DNA Sequence Analysis/methods, RNA Sequence Analysis/methods, Transcriptome

10.

DegreeCox - a network-based regularization method for survival analysis.

Veríssimo, André; Oliveira, Arlindo Limede; Sagot, Marie-France; Vinga, Susana.

BMC Bioinformatics ; 17(Suppl 16): 449, 2016 Dec 13.

Article in English | MEDLINE | ID: mdl-28105908

ABSTRACT

BACKGROUND: Modeling survival oncological data has become a major challenge as the increase in the amount of molecular information nowadays available means that the number of features greatly exceeds the number of observations. One possible solution to cope with this dimensionality problem is the use of additional constraints in the cost function optimization. LASSO and other sparsity methods have thus already been successfully applied with this idea. Although this leads to more interpretable models, these methods still do not fully profit from the relations between the features, especially when these can be represented through graphs. We propose DEGREECOX, a method that applies network-based regularizers to infer Cox proportional hazard models, when the features are genes and the outcome is patient survival. In particular, we propose to use network centrality measures to constrain the model in terms of significant genes. RESULTS: We applied DEGREECOX to three datasets of ovarian cancer carcinoma and tested several centrality measures such as weighted degree, betweenness and closeness centrality. The a priori network information was retrieved from Gene Co-Expression Networks and Gene Functional Maps. When compared with RIDGE and LASSO, DEGREECOX shows an improvement in the classification of high and low risk patients on a par with NET-COX. The use of network information is especially relevant with datasets that are not easily separated. In terms of RMSE and C-index, DEGREECOX gives results that are similar to those of the best performing methods, in a few cases slightly better. CONCLUSIONS: Network-based regularization seems a promising framework to deal with the dimensionality problem. The centrality metrics proposed can be easily expanded to accommodate other topological properties of different biological networks.

Subjects

Algorithms, Neoplastic Gene Expression Regulation, Gene Regulatory Networks, Ovarian Neoplasms/genetics, Proportional Hazards Models, Female, Humans, Genetic Models
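The abstract above describes constraining a Cox model with network centrality but does not state the penalty itself. One plausible form, assumed here purely for illustration, is a ridge-like term in which each squared coefficient is divided by the weighted degree of its gene, so that well-connected genes are penalized less. The function names, the adjacency matrix, and the coefficients below are hypothetical, and the Cox partial likelihood that the penalty would be added to is omitted.

```python
def weighted_degree(adjacency):
    """Weighted degree of each node: the sum of its row in the adjacency matrix."""
    return [sum(row) for row in adjacency]


def degree_penalty(beta, adjacency, lam):
    """Assumed penalty: lam * sum(beta_i^2 / degree_i).

    Requires every gene to have a positive weighted degree.
    """
    degrees = weighted_degree(adjacency)
    if any(d <= 0 for d in degrees):
        raise ValueError("every gene needs a positive weighted degree")
    return lam * sum(b * b / d for b, d in zip(beta, degrees))


# Hypothetical 3-gene network in which gene 0 is a hub linked to genes 1 and 2.
adjacency = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
beta = [1.0, 1.0, 1.0]

penalty = degree_penalty(beta, adjacency, lam=1.0)
ridge = sum(b * b for b in beta)  # plain ridge penalty, for comparison
```

With equal coefficients, the hub contributes 0.5 to the penalty and each peripheral gene contributes 1.0, for a total of 2.5 against 3.0 for plain ridge.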

11.

Insights on the virulence of swine respiratory tract mycoplasmas through genome-scale metabolic modeling.

Ferrarini, Mariana G; Siqueira, Franciele M; Mucha, Scheila G; Palama, Tony L; Jobard, Élodie; Elena-Herrmann, Bénédicte; R Vasconcelos, Ana T; Tardy, Florence; Schrank, Irene S; Zaha, Arnaldo; Sagot, Marie-France.

BMC Genomics ; 17: 353, 2016 May 13.

Article in English | MEDLINE | ID: mdl-27178561

ABSTRACT

BACKGROUND: The respiratory tract of swine is colonized by several bacteria among which are three Mycoplasma species: Mycoplasma flocculare, Mycoplasma hyopneumoniae and Mycoplasma hyorhinis. While colonization by M. flocculare is virtually asymptomatic, M. hyopneumoniae is the causative agent of enzootic pneumonia and M. hyorhinis is present in cases of pneumonia, polyserositis and arthritis. The genomic resemblance among these three Mycoplasma species combined with their different levels of pathogenicity is an indication that they have unknown mechanisms of virulence and differential expression, as for most mycoplasmas. METHODS: In this work, we performed whole-genome metabolic network reconstructions for these three mycoplasmas. Cultivation tests and metabolomic experiments through nuclear magnetic resonance spectroscopy (NMR) were also performed to acquire experimental data and further refine the models reconstructed in silico. RESULTS: Even though the refined models have similar metabolic capabilities, interesting differences include a wider range of carbohydrate uptake in M. hyorhinis, which in turn may also explain why this species is a widespread contaminant in cell cultures. In addition, the myo-inositol catabolism is exclusive to M. hyopneumoniae and may be an important trait for virulence. However, the most important difference seems to be related to glycerol conversion to dihydroxyacetone-phosphate, which produces toxic hydrogen peroxide. This activity, missing only in M. flocculare, may be directly involved in cytotoxicity, as already described for two lung pathogenic mycoplasmas, namely Mycoplasma pneumoniae in human and Mycoplasma mycoides subsp. mycoides in ruminants. Metabolomic data suggest that even though these mycoplasmas are extremely similar in terms of genome and metabolism, distinct products and reaction rates may be the result of differential expression throughout the species.
CONCLUSIONS: We were able to infer from the reconstructed networks that the lack of pathogenicity of M. flocculare if compared to the highly pathogenic M. hyopneumoniae may be related to its incapacity to produce cytotoxic hydrogen peroxide. Moreover, the ability of M. hyorhinis to grow in diverse sites and even in different hosts may be a reflection of its enhanced and wider carbohydrate uptake. Altogether, the metabolic differences highlighted in silico and in vitro provide important insights to the different levels of pathogenicity observed in each of the studied species.

Subjects

Energy Metabolism, Bacterial Genome, Genomics, Biological Models, Mycoplasma hyopneumoniae/physiology, Mycoplasmal Pneumonia of Swine/microbiology, Virulence/genetics, Animals, Bacterial Load, Biomass, Computational Biology/methods, Gene Ontology, Genomics/methods, Magnetic Resonance Spectroscopy, Metabolic Networks and Pathways, Metabolomics/methods, Microbial Viability, Mycoplasma hyopneumoniae/pathogenicity, Swine

12.

Mycoplasma non-coding RNA: identification of small RNAs and targets.

Siqueira, Franciele Maboni; de Morais, Guilherme Loss; Higashi, Susan; Beier, Laura Scherer; Breyer, Gabriela Merker; de Sá Godinho, Caio Padoan; Sagot, Marie-France; Schrank, Irene Silveira; Zaha, Arnaldo; de Vasconcelos, Ana Tereza Ribeiro.

BMC Genomics ; 17(Suppl 8): 743, 2016 Oct 25.

Article in English | MEDLINE | ID: mdl-27801290

ABSTRACT

BACKGROUND: Bacterial non-coding RNAs act by base-pairing as regulatory elements in crucial biological processes. We performed the identification of trans-encoded small RNAs (sRNA) from the genomes of Mycoplasma hyopneumoniae, Mycoplasma flocculare and Mycoplasma hyorhinis, which are Mycoplasma species that have been identified in the porcine respiratory system. RESULTS: A total of 47, 15 and 11 putative sRNAs were predicted in M. hyopneumoniae, M. flocculare and M. hyorhinis, respectively. A comparative genomic analysis revealed the presence of species or lineage specific sRNA candidates. Furthermore, the expression profile of some M. hyopneumoniae sRNAs was determined by a reverse transcription amplification approach, in three different culture conditions. All tested sRNAs were transcribed in at least one condition. A detailed investigation revealed a differential expression profile for two M. hyopneumoniae sRNAs in response to oxidative and heat shock stress conditions, suggesting that their expression is influenced by environmental signals. Moreover, we analyzed sRNA-mRNA hybrids and assessed putative target genes for the novel sRNA candidates. The majority of the sRNAs showed interaction with multiple target genes, some of which could be linked to pathogenesis and cell homeostasis activity. CONCLUSION: This study contributes to our knowledge of Mycoplasma sRNAs and their response to environmental changes. Furthermore, the mRNA target prediction provides a perspective for the characterization and comprehension of the function of the sRNA regulatory mechanisms.

Subjects

Bacterial Gene Expression Regulation, Mycoplasma/genetics, RNA Interference, Untranslated RNA/genetics, Animals, Computational Biology/methods, Gene Expression Profiling, Untranslated RNA/chemistry, Swine

13.

MeDuSa: a multi-draft based scaffolder.

Bosi, Emanuele; Donati, Beatrice; Galardini, Marco; Brunetti, Sara; Sagot, Marie-France; Lió, Pietro; Crescenzi, Pierluigi; Fani, Renato; Fondi, Marco.

Bioinformatics ; 31(15): 2443-51, 2015 Aug 01.

Article in English | MEDLINE | ID: mdl-25810435

ABSTRACT

MOTIVATION: Completing the genome sequence of an organism is an important task in comparative, functional and structural genomics. However, this remains a challenging issue from both a computational and an experimental viewpoint. Genome scaffolding (i.e. the process of ordering and orientating contigs) of de novo assemblies usually represents the first step in most genome finishing pipelines. RESULTS: In this article we present MeDuSa (Multi-Draft based Scaffolder), an algorithm for genome scaffolding. MeDuSa exploits information obtained from a set of (draft or closed) genomes from related organisms to determine the correct order and orientation of the contigs. MeDuSa formalizes the scaffolding problem by means of a combinatorial optimization formulation on graphs and implements an efficient constant factor approximation algorithm to solve it. In contrast to currently used scaffolders, it does not require either prior knowledge on the microorganisms dataset under analysis (e.g. their phylogenetic relationships) or the availability of paired end read libraries. This makes usability and running time two additional important features of our method. Moreover, benchmarks and tests on real bacterial datasets showed that MeDuSa is highly accurate and, in most cases, outperforms traditional scaffolders. The possibility to use MeDuSa on eukaryotic datasets has also been evaluated, leading to interesting results.

Subjects

Algorithms, Contig Mapping/methods, Genomics/methods, Software

14.

Mirinho: An efficient and general plant and animal pre-miRNA predictor for genomic and deep sequencing data.

Higashi, Susan; Fournier, Cyril; Gautier, Christian; Gaspin, Christine; Sagot, Marie-France.

BMC Bioinformatics ; 16: 179, 2015 May 29.

Article in English | MEDLINE | ID: mdl-26022464

ABSTRACT

BACKGROUND: Several methods exist for the prediction of precursor miRNAs (pre-miRNAs) in genomic or sRNA-seq (small RNA sequences) data produced by NGS (Next Generation Sequencing). One key piece of information used for this task is the characteristic hairpin structure adopted by pre-miRNAs, which is generally identified using RNA folders whose complexity is cubic in the size of the input. The vast majority of pre-miRNA predictors then rely on further information learned from previously validated miRNAs from the same or a closely related genome for the final prediction of new miRNAs. With this paper, we wished to address three main issues. The first was methodological and aimed at obtaining a more time-efficient predictor, without, however, losing accuracy, which represented a second issue. We indeed aimed at better predicting miRNAs at a genome scale, but also from sRNA-seq data where in some cases, notably of plants, the current folding methods often infer the wrong structure. The third issue is related to the fact that it is important to rely as little as possible on previously recorded examples of miRNAs. We therefore also sought a method that is less dependent on previous miRNA records. RESULTS: As concerns the first and second issues, we present a novel alternative to a classical folder based on a thermodynamic Nearest-Neighbour (NN) model for computing the free energy and predicting the classical hairpin structure of a pre-miRNA. We show that the free energies thus computed correlate well with those of RNAFOLD. This novel method, called MIRINHO, has quadratic instead of cubic complexity and is much more efficient also in practice. When applied to sRNA-seq data of plants, it gives in general better results than classical folders. On the third issue, we show that MIRINHO, which uses as only knowledge the length of the loops and stem-arms and the free energy of the pre-miRNA hairpin, compares well with algorithms that require more information. The results, obtained with different datasets, are indeed similar to those of other approaches with which such a comparison was possible. These needed to be publicly available software tools that could be used on a large input. In some cases, MIRINHO is even better in terms of sensitivity or precision. CONCLUSION: We provide a simpler and much faster method with very reasonable sensitivity and precision, which can be applied without special adaptation to the prediction of both animal and plant pre-miRNAs, using as input either genomic sequences or sRNA-seq data.

Subjects

Arabidopsis/genetics, Genome, High-Throughput Nucleotide Sequencing/methods, Insects/genetics, MicroRNAs/genetics, RNA Sequence Analysis/methods, Software, Algorithms, Animals, Base Pairing, Base Sequence, Genomics/methods, Molecular Sequence Data, Nucleic Acid Sequence Homology

15.

Genome reduction and potential metabolic complementation of the dual endosymbionts in the whitefly Bemisia tabaci.

Rao, Qiong; Rollat-Farnier, Pierre-Antoine; Zhu, Dan-Tong; Santos-Garcia, Diego; Silva, Francisco J; Moya, Andrés; Latorre, Amparo; Klein, Cecilia C; Vavre, Fabrice; Sagot, Marie-France; Liu, Shu-Sheng; Mouton, Laurence; Wang, Xiao-Wei.

BMC Genomics ; 16: 226, 2015 Mar 21.

Article in English | MEDLINE | ID: mdl-25887812

ABSTRACT

BACKGROUND: The whitefly Bemisia tabaci is an important agricultural pest with global distribution. This phloem-sap feeder harbors a primary symbiont, "Candidatus Portiera aleyrodidarum", which compensates for the deficient nutritional composition of its food sources, and a variety of secondary symbionts. Interestingly, all of these secondary symbionts are found in co-localization with the primary symbiont within the same bacteriocytes, which should favor the evolution of strong interactions between symbionts. RESULTS: In this paper, we analyzed the genome sequences of the primary symbiont Portiera and of the secondary symbiont Hamiltonella in the B. tabaci Mediterranean (MED) species in order to gain insight into the metabolic role of each symbiont in the biology of their host. The genome sequences of the uncultured symbionts Portiera and Hamiltonella were obtained from one single bacteriocyte of MED B. tabaci. As already reported, the genome of Portiera is highly reduced (357 kb), but has kept a number of genes encoding most essential amino-acids and carotenoids. On the other hand, Portiera lacks almost all the genes involved in the synthesis of vitamins and cofactors. Moreover, some pathways are incomplete, notably those involved in the synthesis of some essential amino-acids. Interestingly, the genome of Hamiltonella revealed that this secondary symbiont can not only provide vitamins and cofactors, but also complete the missing steps of some of the pathways of Portiera. In addition, some critical amino-acid biosynthetic genes are missing in the two symbiotic genomes, but analysis of whitefly transcriptome suggests that the missing steps may be performed by the whitefly itself or its microbiota. CONCLUSIONS: These data suggest that Portiera and Hamiltonella are not only complementary but could also be mutually dependent to provide a full complement of nutrients to their host. 
Altogether, these results illustrate how functional redundancies can lead to gene losses in the genomes of the different symbiotic partners, reinforcing their inter-dependency.

Subjects

Enterobacteriaceae/genetics , Bacterial Genome , Halomonadaceae/genetics , Hemiptera/genetics , Hemiptera/microbiology , Symbiosis/genetics , Amino Acids/biosynthesis , Animals , DNA/analysis , DNA/isolation & purification , DNA/metabolism , Hemiptera/metabolism , High-Throughput Nucleotide Sequencing , Fluorescence In Situ Hybridization , Metabolic Networks and Pathways/genetics , Molecular Sequence Data , DNA Sequence Analysis , Vitamins/biosynthesis

16.

Telling metabolic stories to explore metabolomics data: a case study on the yeast response to cadmium exposure.

Milreu, Paulo Vieira; Klein, Cecilia Coimbra; Cottret, Ludovic; Acuña, Vicente; Birmelé, Etienne; Borassi, Michele; Junot, Christophe; Marchetti-Spaccamela, Alberto; Marino, Andrea; Stougie, Leen; Jourdan, Fabien; Crescenzi, Pierluigi; Lacroix, Vincent; Sagot, Marie-France.

Bioinformatics ; 30(1): 61-70, 2014 Jan 01.

Article in English | MEDLINE | ID: mdl-24167155

ABSTRACT

MOTIVATION: The increasing availability of metabolomics data makes it possible to better understand the metabolic processes involved in the immediate response of an organism to environmental changes and stress. The data usually come in the form of a list of metabolites whose concentrations significantly changed under some conditions, and are thus not easy to interpret without being able to precisely visualize how such metabolites are interconnected. RESULTS: We present a method that organizes the data from any metabolomics experiment into metabolic stories. Each story corresponds to a possible scenario explaining the flow of matter between the metabolites of interest. These scenarios may then be ranked in different ways depending on which interpretation one wishes to emphasize for the causal link between two affected metabolites: enzyme activation, enzyme inhibition or domino effect on the concentration changes of substrates and products. Equally probable stories under any selected ranking scheme can be further grouped into a single anthology that summarizes, in a unique subnetwork, all equivalently plausible alternative stories. An anthology is simply a union of such stories. We detail an application of the method to the response of yeast to cadmium exposure. We use this system as a proof of concept for our method, and we show that we are able to find a story that reproduces very well the current knowledge about the yeast response to cadmium. We further show that this response is mostly based on enzyme activation. We also provide a framework for exploring the alternative pathways or side effects this local response is expected to have in the rest of the network. We discuss several interpretations for the changes we see, and we suggest hypotheses that could in principle be experimentally tested. Notably, our method requires simple input data and could be used in a wide variety of applications.
AVAILABILITY AND IMPLEMENTATION: The code for the method presented in this article is available at http://gobbolino.gforge.inria.fr.
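The grouping of equally ranked stories into anthologies described in the abstract can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each story is given as a numeric score plus a collection of arcs between metabolites; the function name `build_anthologies` and the data layout are hypothetical and are not taken from the authors' code.

```python
from collections import defaultdict


def build_anthologies(stories):
    """Group stories by score and take the union of the arcs in each group.

    `stories` is a list of (score, arcs) pairs, where `arcs` is an iterable
    of (source_metabolite, target_metabolite) tuples. This layout is an
    assumption made for the sketch.
    Returns a dict mapping each score to a frozenset of arcs (the anthology).
    """
    grouped = defaultdict(set)
    for score, arcs in stories:
        # Stories with the same score contribute their arcs to the same set.
        grouped[score].update(arcs)
    return {score: frozenset(arcs) for score, arcs in grouped.items()}


# Hypothetical toy input: two stories tied at score 3, one story at score 1.
stories = [
    (3, [("A", "B"), ("B", "C")]),
    (3, [("A", "C"), ("B", "C")]),
    (1, [("A", "B")]),
]
anthologies = build_anthologies(stories)
```

In this toy input the anthology for score 3 contains the three distinct arcs of the two tied stories, which matches the abstract's statement that an anthology is a union of stories.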

Subjects

Cadmium/pharmacology , Metabolomics/methods , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae/metabolism , Enzyme Activation , Glutathione/biosynthesis

17.

Endosymbiosis in trypanosomatids: the genomic cooperation between bacterium and host in the synthesis of essential amino acids is heavily influenced by multiple horizontal gene transfers.

Alves, João M P; Klein, Cecilia C; da Silva, Flávia Maia; Costa-Martins, André G; Serrano, Myrna G; Buck, Gregory A; Vasconcelos, Ana Tereza R; Sagot, Marie-France; Teixeira, Marta M G; Motta, Maria Cristina M; Camargo, Erney P.

BMC Evol Biol ; 13: 190, 2013 Sep 09.

Article in English | MEDLINE | ID: mdl-24015778

ABSTRACT

BACKGROUND: Trypanosomatids of the genera Angomonas and Strigomonas live in a mutualistic association characterized by extensive metabolic cooperation with obligate endosymbiotic Betaproteobacteria. However, the role played by the symbiont has been inferred by indirect means more than it has been demonstrated. Symbiont-harboring trypanosomatids, in contrast to their counterparts lacking symbionts, exhibit lower nutritional requirements and are autotrophic for essential amino acids. To demonstrate the symbiont's contributions to this autotrophy, entire genomes of symbionts and trypanosomatids with and without symbionts were sequenced here. RESULTS: Analyses of the essential amino acid pathways revealed that most biosynthetic routes are in the symbiont genome. By contrast, the host trypanosomatid genome contains fewer genes, about half of which originated from different bacterial groups, perhaps only one of which (ornithine cyclodeaminase, EC:4.3.1.12) derived from the symbiont. Nutritional, enzymatic, and genomic data were jointly analyzed to construct an integrated view of essential amino acid metabolism in symbiont-harboring trypanosomatids. This comprehensive analysis showed perfect concordance among all these data, and revealed that the symbiont contains genes for enzymes that complete essential biosynthetic routes for the host amino acid production, thus explaining the low requirement for these elements in symbiont-harboring trypanosomatids. Phylogenetic analyses show that the cooperation between symbionts and their hosts is complemented by multiple horizontal gene transfers, from bacterial lineages to trypanosomatids, that occurred several times in the course of their evolution. Transfers occur preferentially in parts of the pathways that are missing from other eukaryotes.
CONCLUSION: We have herein uncovered the genetic and evolutionary bases of essential amino acid biosynthesis in several trypanosomatids with and without endosymbionts, explaining and complementing decades of experimental results. We uncovered the remarkable plasticity in essential amino acid biosynthesis pathway evolution in these protozoans, demonstrating heavy influence of horizontal gene transfer events, from Bacteria to trypanosomatid nuclei, in the evolution of these pathways.

Subjects

Essential Amino Acids/biosynthesis , Betaproteobacteria/genetics , Horizontal Gene Transfer , Symbiosis , Trypanosomatina/genetics , Trypanosomatina/microbiology , Betaproteobacteria/physiology , Biological Evolution , Bacterial Genome , Phylogeny , Trypanosomatina/classification , Trypanosomatina/metabolism

18.

Short and long-term genome stability analysis of prokaryotic genomes.

Brilli, Matteo; Liò, Pietro; Lacroix, Vincent; Sagot, Marie-France.

BMC Genomics ; 14: 309, 2013 May 08.

Article in English | MEDLINE | ID: mdl-23651581

ABSTRACT

BACKGROUND: Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often makes it possible to characterize pathogens. There is therefore a strong interest in understanding the variability of this trait and the possible correlations with life-style. Two kinds of events affect genome organization: on the one hand, translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other hand, insertions and deletions leave the backbone gene order unchanged but alter the gene neighborhoods by breaking the syntenic regions. A complete picture of genome organization evolution therefore requires accounting for both kinds of events. RESULTS: We developed an approach where we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In the first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence in time of backbone stability. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of the same species made it possible to identify genomes whose gene order diverges from that of their conspecifics. The inter-species analysis indicates that pathogens are more often unstable than non-pathogens. In the second part of the text, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. CONCLUSION: In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. By studying species with multiple sequenced genomes available, we were able to explore genome organization stability at different time-scales and to find significant differences between pathogen and non-pathogen species. The output of our framework also makes it possible to identify the conserved gene clusters and/or partial occurrences thereof, making it possible to explore how gene clusters assembled during evolution.
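The notion of backbone gene order can be made concrete with a small Python sketch. It is not the stability estimator of the paper, which the abstract does not specify; it is a simple stand-in that restricts two genomes to their shared genes and reports the fraction of neighbouring gene pairs of the first backbone that are still neighbours in the second. The function names are hypothetical, chromosomes are treated as linear, gene orientation is ignored, and each gene is assumed to occur once per genome.

```python
def backbone(genome, shared):
    """Keep only the genes shared by both genomes, preserving their order."""
    return [gene for gene in genome if gene in shared]


def adjacencies(order):
    """Unordered pairs of neighbouring genes in a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def backbone_adjacency_conservation(genome_a, genome_b):
    """Fraction of backbone adjacencies of genome_a also present in genome_b.

    Illustrative measure only: insertions and deletions are ignored because
    both genomes are first reduced to their shared genes.
    """
    shared = set(genome_a) & set(genome_b)
    adj_a = adjacencies(backbone(genome_a, shared))
    adj_b = adjacencies(backbone(genome_b, shared))
    if not adj_a:
        return 1.0  # fewer than two shared genes: nothing can be broken
    return len(adj_a & adj_b) / len(adj_a)


# Hypothetical toy genomes: "x" is an insertion, g3 and g4 are swapped.
genome_a = ["g1", "g2", "g3", "g4", "g5"]
genome_b = ["g1", "x", "g2", "g4", "g3", "g5"]
conservation = backbone_adjacency_conservation(genome_a, genome_b)
```

In the toy example the inserted gene `x` does not change the result, since it is not part of the backbone, whereas the swap of `g3` and `g4` breaks two of the four backbone adjacencies.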

Subjects

Archaeal Genome/genetics , Bacterial Genome/genetics , Genomic Instability , Genetic Models , Species Specificity

19.

On the genetic architecture of cytoplasmic incompatibility: inference from phenotypic data.

Nor, Igor; Engelstädter, Jan; Duron, Olivier; Reuter, Max; Sagot, Marie-France; Charlat, Sylvain.

Am Nat ; 182(1): E15-24, 2013 Jul.

Article in English | MEDLINE | ID: mdl-23778233

ABSTRACT

Numerous insects carry intracellular bacteria that manipulate the insects' reproduction and thus facilitate their own spread. Cytoplasmic incompatibility (CI) is a common form of such manipulation, where a (currently uncharacterized) bacterial modification of male sperm induces the early death of embryos unless the fertilized eggs carry the same bacteria, inherited from the mother. The death of uninfected embryos provides an indirect selective advantage to infected ones, thus enabling the spread of the bacteria. Here we use and expand recently developed algorithms to infer the genetic architecture underlying the complex incompatibility data from the mosquito Culex pipiens. We show that CI requires more genetic determinants than previously believed and that quantitative variation in gene products potentially contributes to the observed CI patterns. In line with population genetic theory of CI, our analysis suggests that toxin factors (those inducing embryo death) are present in fewer copies in the bacterial genomes than antitoxin factors (those ensuring that infected embryos survive). In combination with comparative genomics, our approach will provide helpful guidance to identify the genetic basis of CI and more generally of other toxin/antitoxin systems that can be conceptualized under the same framework.
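The toxin/antitoxin framing in the abstract can be sketched in a few lines of Python. The sketch assumes a purely binary matching rule, in which a cross is compatible only if every toxin factor carried by the male's bacteria is matched by an antitoxin factor in the female's bacteria. This rule, the function names and the toy strains are assumptions made for illustration; the abstract itself reports that quantitative variation in gene products may also contribute, which this sketch ignores.

```python
def is_compatible(male_toxins, female_antitoxins):
    """A cross is compatible if every male toxin has a matching antitoxin."""
    return set(male_toxins) <= set(female_antitoxins)


def compatibility_matrix(strains):
    """Predict the outcome of every male x female cross.

    `strains` maps a strain name to a (toxins, antitoxins) pair of sets.
    Returns a dict keyed by (male_strain, female_strain) with boolean values.
    """
    return {
        (male, female): is_compatible(strains[male][0], strains[female][1])
        for male in strains
        for female in strains
    }


# Hypothetical strains: s2 carries one toxin/antitoxin pair more than s1.
strains = {
    "s1": ({"t1"}, {"t1"}),
    "s2": ({"t1", "t2"}, {"t1", "t2"}),
}
matrix = compatibility_matrix(strains)
```

With these toy strains the model predicts a unidirectional incompatibility: males of `s2` are incompatible with females of `s1`, while the reciprocal cross is compatible.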

Subjects

Culex/genetics , Culex/microbiology , Molecular Evolution , Bacterial Genome , Symbiosis , Wolbachia/genetics , Wolbachia/physiology , Algorithms , Alleles , Animals , Culex/physiology , Cytoplasm/microbiology , Drosophila , Female , Genetic Hybridization , Male , Genetic Models , Mutation , Reproduction

20.

Navigating the unexplored seascape of pre-miRNA candidates in single-genome approaches.

Mendes, Nuno D; Heyne, Steffen; Freitas, Ana T; Sagot, Marie-France; Backofen, Rolf.

Bioinformatics ; 28(23): 3034-41, 2012 Dec 01.

Article in English | MEDLINE | ID: mdl-23052038

ABSTRACT

MOTIVATION: The computational search for novel microRNA (miRNA) precursors often involves some sort of structural analysis with the aim of identifying which types of structures are prone to being recognized and processed by the cellular miRNA-maturation machinery. A natural way to tackle this problem is to perform clustering over the candidate structures along with known miRNA precursor structures. Mixed clusters then allow the identification of candidates that are similar to known precursors. Given the large number of pre-miRNA candidates that can be identified in single-genome approaches, even after applying several filters for precursor robustness and stability, a conventional structural clustering approach is infeasible. RESULTS: We propose a method to represent candidate structures in a feature space, which summarizes key sequence/structure characteristics of each candidate. We demonstrate that proximity in this feature space is related to sequence/structure similarity, and we select candidates that have a high similarity to known precursors. Additional filtering steps are then applied to further reduce the number of candidates to those with greater transcriptional potential. Our method is compared with another single-genome method (TripletSVM) on two datasets, showing better performance on one and comparable performance on the other, for larger training sets. Additionally, we show that our approach allows for a better interpretation of the results. AVAILABILITY AND IMPLEMENTATION: The MinDist method is implemented using Perl scripts and is freely available at http://www.cravela.org/?mindist=1. CONTACT: backofen@informatik.uni-freiburg.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
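The idea of selecting candidates by their proximity to known precursors in a feature space can be sketched as follows. This is not the MinDist Perl implementation: the two-dimensional feature vectors, the Euclidean distance, the threshold and the function names are all assumptions chosen to keep the example small.

```python
import math


def min_distance(candidate, known):
    """Euclidean distance from a candidate to its nearest known precursor."""
    return min(math.dist(candidate, precursor) for precursor in known)


def select_candidates(candidates, known, threshold):
    """Names of candidates lying within `threshold` of a known precursor.

    `candidates` maps a candidate name to its feature vector; `known` is a
    list of feature vectors of validated precursors.
    """
    return [
        name
        for name, features in candidates.items()
        if min_distance(features, known) <= threshold
    ]


# Hypothetical feature vectors, for example two summary features per hairpin.
known = [(0.0, 0.0), (1.0, 1.0)]
candidates = {"c1": (0.1, 0.0), "c2": (3.0, 3.0), "c3": (1.0, 1.5)}
selected = select_candidates(candidates, known, threshold=0.6)
```

Comparing each candidate only with its nearest known precursor avoids clustering all candidates against each other, which the abstract describes as infeasible at this scale.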

Subjects

Algorithms , MicroRNAs/chemistry , Software , Animals , Anopheles/genetics , Base Sequence , Cluster Analysis , Computational Biology/methods , Drosophila melanogaster/genetics , Genome , MicroRNAs/genetics , Nucleic Acid Conformation , Principal Component Analysis , ROC Curve
