Search | BVS Educação Profissional em Saúde

proGenomes3: approaching one million accurately and consistently annotated high-quality prokaryotic genomes.

Fullam, Anthony; Letunic, Ivica; Schmidt, Thomas S B; Ducarmon, Quinten R; Karcher, Nicolai; Khedkar, Supriya; Kuhn, Michael; Larralde, Martin; Maistrenko, Oleksandr M; Malfertheiner, Lukas; Milanese, Alessio; Rodrigues, Joao Frederico Matias; Sanchis-López, Claudia; Schudoma, Christian; Szklarczyk, Damian; Sunagawa, Shinichi; Zeller, Georg; Huerta-Cepas, Jaime; von Mering, Christian; Bork, Peer; Mende, Daniel R.

Nucleic Acids Res ; 51(D1): D760-D766, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36408900

ABSTRACT

The interpretation of genomic, transcriptomic and other microbial 'omics data is highly dependent on the availability of well-annotated genomes. As the number of publicly available microbial genomes continues to increase exponentially, the need for quality control and consistent annotation is becoming critical. We present proGenomes3, a database of 907 388 high-quality genomes containing 4 billion genes that passed stringent criteria and have been consistently annotated using multiple functional and taxonomic databases including mobile genetic elements and biosynthetic gene clusters. proGenomes3 encompasses 41 171 species-level clusters, defined based on universal single copy marker genes, for which pan-genomes and contextual habitat annotations are provided. The database is available at http://progenomes.embl.de/.

Subjects

Genome, Prokaryotic Cells, Genetic Databases, Genomics, Molecular Sequence Annotation, Bacteria/classification, Bacteria/genetics

Superessential reactions in metabolic networks.

Barve, Aditya; Rodrigues, João Frederico Matias; Wagner, Andreas.

Proc Natl Acad Sci U S A ; 109(18): E1121-30, 2012 May 01.

Article in English | MEDLINE | ID: mdl-22509034

ABSTRACT

The metabolic genotype of an organism can change through loss and acquisition of enzyme-coding genes, while preserving its ability to survive and synthesize biomass in specific environments. This evolutionary plasticity allows pathogens to evolve resistance to antimetabolic drugs by acquiring new metabolic pathways that bypass an enzyme blocked by a drug. We here study quantitatively the extent to which individual metabolic reactions and enzymes can be bypassed. To this end, we use a recently developed computational approach to create large metabolic network ensembles that can synthesize all biomass components in a given environment but contain an otherwise random set of known biochemical reactions. Using this approach, we identify a small connected core of 124 reactions that are absolutely superessential (that is, required in all metabolic networks). Many of these reactions have been experimentally confirmed as essential in different organisms. We also report a superessentiality index for thousands of reactions. This index indicates how easily a reaction can be bypassed. We find that it correlates with the number of sequenced genomes that encode an enzyme for the reaction. Superessentiality can help choose an enzyme as a potential drug target, especially because the index is not highly sensitive to the chemical environment that a pathogen requires. Our work also shows how analyses of large network ensembles can help understand the evolution of complex and robust metabolic networks.

Subjects

Metabolic Networks and Pathways/genetics, Biomass, Carbon/metabolism, Computer Simulation, Drug Resistance/genetics, Escherichia coli/genetics, Escherichia coli/growth & development, Escherichia coli/metabolism, Molecular Evolution, Genotype, Markov Chains, Biological Models, Genetic Models, Monte Carlo Method, Systems Biology

Ecologically informed microbial biomarkers and accurate classification of mixed and unmixed samples in an extensive cross-study of human body sites.

Tackmann, Janko; Arora, Natasha; Schmidt, Thomas Sebastian Benedikt; Rodrigues, João Frederico Matias; von Mering, Christian.

Microbiome ; 6(1): 192, 2018 10 24.

Article in English | MEDLINE | ID: mdl-30355348

ABSTRACT

BACKGROUND: The identification of body site-specific microbial biomarkers and their use for classification tasks have promising applications in medicine, microbial ecology, and forensics. Previous studies have characterized site-specific microbiota and shown that sample origin can be accurately predicted by microbial content. However, these studies were usually restricted to single datasets with consistent experimental methods and conditions, as well as comparatively small sample numbers. The effects of study-specific biases and statistical power on classification performance and biomarker identification thus remain poorly understood. Furthermore, reliable detection in mixtures of different body sites or with noise from environmental contamination has rarely been investigated thus far. Finally, the impact of ecological associations between microbes on biomarker discovery was usually not considered in previous work. RESULTS: Here we present the analysis of one of the largest cross-study sequencing datasets of microbial communities from human body sites (15,082 samples from 57 publicly available studies). We show that training a Random Forest Classifier on this aggregated dataset increases prediction performance for body sites by 35% compared to a single-study classifier. Using simulated datasets, we further demonstrate that the source of different microbial contributions in mixtures of different body sites or with soil can be detected starting at 1% of the total microbial community. We apply a biomarker selection method that excludes indirect environmental associations driven by microbe-microbe associations, yielding a parsimonious set of highly predictive taxa including novel biomarkers and excluding many previously reported taxa. We find a considerable fraction of unclassified biomarkers ("microbial dark matter") and observe that negatively associated taxa have a surprisingly high impact on classification performance. We further detect a significant enrichment of rod-shaped, motile, and sporulating taxa for feces biomarkers, consistent with a highly competitive environment. CONCLUSIONS: Our machine learning model shows strong body site classification performance, both in single-source samples and mixtures, making it promising for tasks requiring high accuracy, such as forensic applications. We report a core set of ecologically informed biomarkers, inferred across a wide range of experimental protocols and conditions, providing the most concise, general, and least biased overview of body site-associated microbes to date.

Subjects

Bacteria/classification, Bacteria/genetics, Bacterial DNA/genetics, Bacterial Genome/genetics, Microbiota/genetics, Biomarkers/analysis, Human Body, Humans, Machine Learning
