Search | VHL (BVS) Search Portal

1.

The PATRIC Bioinformatics Resource Center: expanding data and analysis capabilities.

Davis, James J; Wattam, Alice R; Aziz, Ramy K; Brettin, Thomas; Butler, Ralph; Butler, Rory M; Chlenski, Philippe; Conrad, Neal; Dickerman, Allan; Dietrich, Emily M; Gabbard, Joseph L; Gerdes, Svetlana; Guard, Andrew; Kenyon, Ronald W; Machi, Dustin; Mao, Chunhong; Murphy-Olson, Dan; Nguyen, Marcus; Nordberg, Eric K; Olsen, Gary J; Olson, Robert D; Overbeek, Jamie C; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Shukla, Maulik; Thomas, Chris; VanOeffelen, Margo; Vonstein, Veronika; Warren, Andrew S; Xia, Fangfang; Xie, Dawen; Yoo, Hyunseung; Stevens, Rick.

Nucleic Acids Res ; 48(D1): D606-D612, 2020 01 08.

Article in English | MEDLINE | ID: mdl-31667520

ABSTRACT

The PathoSystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center funded by the National Institute of Allergy and Infectious Diseases (https://www.patricbrc.org). PATRIC supports bioinformatic analyses of all bacteria with a special emphasis on pathogens, offering a rich comparative analysis environment that provides users with access to over 250 000 uniformly annotated and publicly available genomes with curated metadata. PATRIC offers web-based visualization and comparative analysis tools, a private workspace in which users can analyze their own data in the context of the public collections, services that streamline complex bioinformatic workflows and command-line tools for bulk data analysis. Over the past several years, as genomic and other omics-related experiments have become more cost-effective and widespread, we have observed considerable growth in the usage of and demand for easy-to-use, publicly available bioinformatic tools and services. Here we report the recent updates to the PATRIC resource, including new web-based comparative analysis tools, eight new services and the release of a command-line interface to access, query and analyze data.

Subjects

Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Algorithms , Animals , Caenorhabditis elegans/genetics , Chickens/genetics , Drosophila melanogaster/genetics , Host-Pathogen Interactions/genetics , Humans , Internet , Macaca mulatta/genetics , Metagenomics , Mice , National Institute of Allergy and Infectious Diseases (U.S.) , Phenotype , Phylogeny , Rats , Swine/genetics , United States , Zebrafish/genetics

2.

PATRIC as a unique resource for studying antimicrobial resistance.

Antonopoulos, Dionysios A; Assaf, Rida; Aziz, Ramy Karam; Brettin, Thomas; Bun, Christopher; Conrad, Neal; Davis, James J; Dietrich, Emily M; Disz, Terry; Gerdes, Svetlana; Kenyon, Ronald W; Machi, Dustin; Mao, Chunhong; Murphy-Olson, Daniel E; Nordberg, Eric K; Olsen, Gary J; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Santerre, John; Shukla, Maulik; Stevens, Rick L; VanOeffelen, Margo; Vonstein, Veronika; Warren, Andrew S; Wattam, Alice R; Xia, Fangfang; Yoo, Hyunseung.

Brief Bioinform ; 20(4): 1094-1102, 2019 07 19.

Article in English | MEDLINE | ID: mdl-28968762

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) is designed to provide researchers with the tools and services that they need to perform genomic and other 'omic' data analyses. In response to mounting concern over antimicrobial resistance (AMR), the PATRIC team has been developing new tools that help researchers understand AMR and its genetic determinants. To support comparative analyses, we have added AMR phenotype data to over 15 000 genomes in the PATRIC database, often assembling genomes from reads in public archives and collecting their associated AMR panel data from the literature to augment the collection. We have also been using this collection of AMR metadata to build machine learning-based classifiers that can predict the AMR phenotypes and the genomic regions associated with resistance for genomes being submitted to the annotation service. Likewise, we have undertaken a large AMR protein annotation effort by manually curating data from the literature and public repositories. This collection of 7370 AMR reference proteins, which contains many protein annotations (functional roles) that are unique to PATRIC and RAST, has been manually curated so that it projects stably across genomes. The collection currently projects to 1 610 744 proteins in the PATRIC database. Finally, the PATRIC Web site has been expanded to enable AMR-based custom page views so that researchers can easily explore AMR data and design experiments based on whole genomes or individual genes.

Subjects

Computational Biology/methods , Databases, Genetic , Drug Resistance, Microbial/genetics , Systems Integration , Computational Biology/trends , Databases, Genetic/statistics & numerical data , Genome, Microbial , Humans , Internet , Molecular Sequence Annotation

3.

A machine learning-based service for estimating quality of genomes using PATRIC.

Parrello, Bruce; Butler, Rory; Chlenski, Philippe; Olson, Robert; Overbeek, Jamie; Pusch, Gordon D; Vonstein, Veronika; Overbeek, Ross.

BMC Bioinformatics ; 20(1): 486, 2019 Oct 03.

Article in English | MEDLINE | ID: mdl-31581946

ABSTRACT

BACKGROUND: Recent advances in high-volume sequencing technology and mining of genomes from metagenomic samples call for rapid and reliable genome quality evaluation. The current release of the PATRIC database contains over 220,000 genomes, and current metagenomic technology supports assemblies of many draft-quality genomes from a single sample, most of which will be novel. DESCRIPTION: We have added two quality assessment tools to the PATRIC annotation pipeline. EvalCon uses supervised machine learning to calculate an annotation consistency score. EvalG implements a variant of the CheckM algorithm to estimate contamination and completeness of an annotated genome. We report on the performance of these tools and the potential utility of the consistency score. Additionally, we provide contamination, completeness, and consistency measures for all genomes in PATRIC and in a recent set of metagenomic assemblies. CONCLUSION: EvalG and EvalCon facilitate the rapid quality control and exploration of PATRIC-annotated draft genomes.
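The completeness and contamination estimates mentioned in this abstract can be illustrated with a deliberately simplified sketch. It assumes a flat set of single-copy marker roles, whereas CheckM itself works with lineage-specific, collocated marker sets. The function name `estimate_quality` and the role labels are hypothetical and are not taken from EvalG or CheckM.

```python
from collections import Counter


def estimate_quality(marker_roles, observed_roles):
    """Return (completeness %, contamination %) for one annotated genome.

    marker_roles:   set of roles expected exactly once (an assumed input).
    observed_roles: iterable of roles found in the genome, with repeats.
    """
    if not marker_roles:
        raise ValueError("marker_roles must not be empty")
    # Count only roles that belong to the marker set.
    counts = Counter(r for r in observed_roles if r in marker_roles)
    n = len(marker_roles)
    # Completeness: share of expected markers seen at least once.
    present = sum(1 for r in marker_roles if counts[r] >= 1)
    # Contamination: surplus copies of markers, relative to the marker count.
    extra = sum(counts[r] - 1 for r in marker_roles if counts[r] > 1)
    return 100.0 * present / n, 100.0 * extra / n


if __name__ == "__main__":
    markers = {"roleA", "roleB", "roleC", "roleD"}
    observed = ["roleA", "roleA", "roleB", "roleC", "unrelated"]
    print(estimate_quality(markers, observed))
```

In this toy case three of four markers are present and one marker appears twice, so a duplicated marker raises the contamination estimate while a missing one lowers completeness.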

Subjects

Databases, Genetic , Genome, Archaeal , Genome, Bacterial , Machine Learning , Metagenomics/methods , Metagenomics/standards , Software

4.

Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center.

Wattam, Alice R; Davis, James J; Assaf, Rida; Boisvert, Sébastien; Brettin, Thomas; Bun, Christopher; Conrad, Neal; Dietrich, Emily M; Disz, Terry; Gabbard, Joseph L; Gerdes, Svetlana; Henry, Christopher S; Kenyon, Ronald W; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K; Olsen, Gary J; Murphy-Olson, Daniel E; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Shukla, Maulik; Vonstein, Veronika; Warren, Andrew; Xia, Fangfang; Yoo, Hyunseung; Stevens, Rick L.

Nucleic Acids Res ; 45(D1): D535-D542, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899627

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by 'virtual integration' to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.

Subjects

Bacteria/genetics , Computational Biology/methods , Databases, Genetic , Genome, Bacterial , Genomics/methods , Anti-Bacterial Agents/pharmacology , Bacteria/drug effects , Bacteria/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Drug Resistance, Bacterial , Molecular Sequence Annotation , Proteome , Proteomics/methods , Software , Web Browser

5.

High-throughput comparison, functional annotation, and metabolic modeling of plant genomes using the PlantSEED resource.

Seaver, Samuel M D; Gerdes, Svetlana; Frelin, Océane; Lerma-Ortiz, Claudia; Bradbury, Louis M T; Zallot, Rémi; Hasnain, Ghulam; Niehaus, Thomas D; El Yacoubi, Basma; Pasternak, Shiran; Olson, Robert; Pusch, Gordon; Overbeek, Ross; Stevens, Rick; de Crécy-Lagard, Valérie; Ware, Doreen; Hanson, Andrew D; Henry, Christopher S.

Proc Natl Acad Sci U S A ; 111(26): 9645-50, 2014 Jul 01.

Article in English | MEDLINE | ID: mdl-24927599

ABSTRACT

The increasing number of sequenced plant genomes is placing new demands on the methods applied to analyze, annotate, and model these genomes. Today's annotation pipelines result in inconsistent gene assignments that complicate comparative analyses and prevent efficient construction of metabolic models. To overcome these problems, we have developed the PlantSEED, an integrated, metabolism-centric database to support subsystems-based annotation and metabolic model reconstruction for plant genomes. PlantSEED combines SEED subsystems technology, first developed for microbial genomes, with refined protein families and biochemical data to assign fully consistent functional annotations to orthologous genes, particularly those encoding primary metabolic pathways. Seamless integration with its parent, the prokaryotic SEED database, makes PlantSEED a unique environment for cross-kingdom comparative analysis of plant and bacterial genomes. The consistent annotations imposed by PlantSEED permit rapid reconstruction and modeling of primary metabolism for all plant genomes in the database. This feature opens the unique possibility of model-based assessment of the completeness and accuracy of gene annotation and thus allows computational identification of genes and pathways that are restricted to certain genomes or need better curation. We demonstrate the PlantSEED system by producing consistent annotations for 10 reference genomes. We also produce a functioning metabolic model for each genome, gapfilling to identify missing annotations and proposing gene candidates for missing annotations. Models are built around an extended biomass composition representing the most comprehensive published to date. To our knowledge, our models are the first to be published for seven of the genomes analyzed.

Subjects

Computational Biology/methods , Databases, Genetic , Genome, Plant/genetics , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation/methods , Plants/genetics , Software , Metabolic Networks and Pathways/genetics , Models, Biological , Plants/metabolism , Systems Biology/methods

6.

Modeling central metabolism and energy biosynthesis across microbial life.

Edirisinghe, Janaka N; Weisenhorn, Pamela; Conrad, Neal; Xia, Fangfang; Overbeek, Ross; Stevens, Rick L; Henry, Christopher S.

BMC Genomics ; 17: 568, 2016 Aug 08.

Article in English | MEDLINE | ID: mdl-27502787

ABSTRACT

BACKGROUND: Automatically generated bacterial metabolic models, and even some curated models, lack accuracy in predicting energy yields due to poor representation of key pathways in energy biosynthesis and the electron transport chain (ETC). Further compounding the problem, complex interlinking pathways in genome-scale metabolic models, and the need for extensive gapfilling to support complex biomass reactions, often result in predicting unrealistic yields or unrealistic physiological flux profiles. RESULTS: To overcome this challenge, we developed methods and tools (http://coremodels.mcs.anl.gov) to build high quality core metabolic models (CMM) representing accurate energy biosynthesis based on a well studied, phylogenetically diverse set of model organisms. We compare these models to explore the variability of core pathways across all microbial life, and by analyzing the ability of our core models to synthesize ATP and essential biomass precursors, we evaluate the extent to which the core metabolic pathways and functional ETCs are known for all microbes. 6,600 (80 %) of our models were found to have some type of aerobic ETC, whereas 5,100 (62 %) have an anaerobic ETC, and 1,279 (15 %) do not have any ETC. Using our manually curated ETC and energy biosynthesis pathways with no gapfilling at all, we predict accurate ATP yields for nearly 5586 (70 %) of the models under aerobic and anaerobic growth conditions. This study revealed gaps in our knowledge of the central pathways that result in 2,495 (30 %) CMMs being unable to produce ATP under any of the tested conditions. We then established a methodology for the systematic identification and correction of inconsistent annotations using core metabolic models coupled with phylogenetic analysis. CONCLUSIONS: We predict accurate energy yields based on our improved annotations in energy biosynthesis pathways and the implementation of diverse ETC reactions across the microbial tree of life. We highlighted missing annotations that were essential to energy biosynthesis in our models. We examine the diversity of these pathways across all microbial life and enable the scientific community to explore the analyses generated from this large-scale analysis of over 8000 microbial genomes.

Subjects

Energy Metabolism , Metabolic Networks and Pathways , Models, Biological , Adenosine Triphosphate/biosynthesis , Bacteria/classification , Bacteria/genetics , Bacteria/metabolism , Biomass , Computational Biology/methods , Electron Transport Chain Complex Proteins/metabolism , Genomics/methods , Molecular Sequence Annotation , Phylogeny

7.

Genome-scale bacterial transcriptional regulatory networks: reconstruction and integrated analysis with metabolic models.

Faria, José P; Overbeek, Ross; Xia, Fangfang; Rocha, Miguel; Rocha, Isabel; Henry, Christopher S.

Brief Bioinform ; 15(4): 592-611, 2014 Jul.

Article in English | MEDLINE | ID: mdl-23422247

ABSTRACT

Advances in sequencing technology are resulting in the rapid emergence of large numbers of complete genome sequences. High-throughput annotation and metabolic modeling of these genomes is now a reality. The high-throughput reconstruction and analysis of genome-scale transcriptional regulatory networks represent the next frontier in microbial bioinformatics. The fruition of this next frontier will depend on the integration of numerous data sources relating to mechanisms, components and behavior of the transcriptional regulatory machinery, as well as the integration of the regulatory machinery into genome-scale cellular models. Here, we review existing repositories for different types of transcriptional regulatory data, including expression data, transcription factor data and binding site locations and we explore how these data are being used for the reconstruction of new regulatory networks. From template network-based methods to de novo reverse engineering from expression data, we discuss how regulatory networks can be reconstructed and integrated with metabolic models to improve model predictions and performance. We also explore the impact these integrated models can have in simulating phenotypes, optimizing the production of compounds of interest or paving the way to a whole-cell model.

Subjects

Gene Regulatory Networks , Genome, Bacterial , Metabolism , Models, Biological , Transcription, Genetic , Bacteria/classification , Bacteria/genetics , Databases, Genetic , Phylogeny

8.

The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST).

Overbeek, Ross; Olson, Robert; Pusch, Gordon D; Olsen, Gary J; Davis, James J; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Parrello, Bruce; Shukla, Maulik; Vonstein, Veronika; Wattam, Alice R; Xia, Fangfang; Stevens, Rick.

Nucleic Acids Res ; 42(Database issue): D206-14, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24293654

ABSTRACT

In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.

Subjects

Databases, Genetic , Genome, Archaeal , Genome, Bacterial , Molecular Sequence Annotation , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Bacterial Proteins/physiology , Genomics , Internet , Software

9.

PATRIC, the bacterial bioinformatics database and analysis resource.

Wattam, Alice R; Abraham, David; Dalay, Oral; Disz, Terry L; Driscoll, Timothy; Gabbard, Joseph L; Gillespie, Joseph J; Gough, Roger; Hix, Deborah; Kenyon, Ronald; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K; Olson, Robert; Overbeek, Ross; Pusch, Gordon D; Shukla, Maulik; Schulman, Julie; Stevens, Rick L; Sullivan, Daniel E; Vonstein, Veronika; Warren, Andrew; Will, Rebecca; Wilson, Meredith J C; Yoo, Hyun Seung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno W.

Nucleic Acids Res ; 42(Database issue): D581-91, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24225323

ABSTRACT

The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10,000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue.

Subjects

Databases, Genetic , Genome, Bacterial , Bacteria/classification , Bacteria/genetics , Bacterial Infections/microbiology , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Bacterial Typing Techniques , Gene Expression Profiling , Genomics , Humans , Internet , Protein Conformation , Protein Interaction Mapping

10.

Comparative Genomic Analysis of Bacterial Data in BV-BRC: An Example Exploring Antimicrobial Resistance.

Wattam, Alice R; Bowers, Nicole; Brettin, Thomas; Conrad, Neal; Cucinell, Clark; Davis, James J; Dickerman, Allan W; Dietrich, Emily M; Kenyon, Ronald W; Machi, Dustin; Mao, Chunhong; Nguyen, Marcus; Olson, Robert D; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D; Shukla, Maulik; Stevens, Rick L; Vonstein, Veronika; Warren, Andrew S.

Methods Mol Biol ; 2802: 547-571, 2024.

Article in English | MEDLINE | ID: mdl-38819571

ABSTRACT

As genomic and related data continue to expand, research biologists are often hampered by the computational hurdles required to analyze their data. The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Centers (BRC) to assist researchers with their analysis of genome sequence and other omics-related data. Recently, the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD), and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs merged to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) at https://www.bv-brc.org/ . The combined BV-BRC leverages the functionality of the original resources for bacterial and viral research communities with a unified data model, enhanced web-based visualization and analysis tools, and bioinformatics services. Here we demonstrate how antimicrobial resistance data can be analyzed in the new resource.

Subjects

Bacteria , Computational Biology , Databases, Genetic , Drug Resistance, Bacterial , Genomics , Genomics/methods , Computational Biology/methods , Drug Resistance, Bacterial/genetics , Bacteria/genetics , Bacteria/drug effects , Humans , Software , Genome, Bacterial , Anti-Bacterial Agents/pharmacology , Web Browser , United States , National Institute of Allergy and Infectious Diseases (U.S.)

11.

Real time metagenomics: using k-mers to annotate metagenomes.

Edwards, Robert A; Olson, Robert; Disz, Terry; Pusch, Gordon D; Vonstein, Veronika; Stevens, Rick; Overbeek, Ross.

Bioinformatics ; 28(24): 3316-7, 2012 Dec 15.

Article in English | MEDLINE | ID: mdl-23047562

ABSTRACT

Annotation of metagenomes involves comparing the individual sequence reads with a database of known sequences and assigning a unique function to each read. This is a time-consuming task that is computationally intensive (though not computationally complex). Here we present a novel approach to annotate metagenomes using unique k-mer oligopeptide sequences from 7 to 12 amino acids long. We demonstrate that k-mer-based annotations are faster and approach the sensitivity and precision of blastx-based annotations without losing accuracy. A last-common ancestor approach was also developed to describe the members of the community.
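The idea of annotating with unique oligopeptide k-mers can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the reference set, the function labels, the default `k=8`, and the `min_hits` threshold are all hypothetical, and the input is assumed to be an amino-acid sequence already translated from a read.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of an amino-acid sequence."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_signature_index(reference, k=8):
    """Map each k-mer that occurs under exactly one function to that function.

    reference: dict of function label -> list of protein sequences (assumed).
    """
    owner = {}
    ambiguous = set()
    for function, sequences in reference.items():
        for seq in sequences:
            for km in kmers(seq, k):
                if km in ambiguous:
                    continue
                previous = owner.get(km)
                if previous is None:
                    owner[km] = function
                elif previous != function:
                    # Shared between functions, so it is not a signature.
                    del owner[km]
                    ambiguous.add(km)
    return owner


def annotate(peptide, index, k=8, min_hits=2):
    """Return the function with the most signature k-mer hits, or None."""
    votes = Counter(index[km] for km in kmers(peptide, k) if km in index)
    if not votes:
        return None
    function, hits = votes.most_common(1)[0]
    return function if hits >= min_hits else None


if __name__ == "__main__":
    reference = {
        "kinase": ["MKTAYIAKQRQISFVK"],
        "protease": ["MKTAYIAKGGSLLPWE"],
    }
    index = build_signature_index(reference)
    print(annotate("TAYIAKQRQISF", index))
```

Lookups against a hash index take time proportional to the read length, which is one plausible reason an exact k-mer approach can be faster than alignment-based search. K-mers shared between functions are discarded so that each hit votes for a single function.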

Subjects

Metagenomics/methods , Molecular Sequence Annotation , Algorithms , Metagenome , Sequence Analysis, DNA

12.

Genomes of the class Erysipelotrichia clarify the firmicute origin of the class Mollicutes.

Davis, James J; Xia, Fangfang; Overbeek, Ross A; Olsen, Gary J.

Int J Syst Evol Microbiol ; 63(Pt 7): 2727-2741, 2013 Jul.

Article in English | MEDLINE | ID: mdl-23606477

ABSTRACT

The tree of life is paramount for achieving an integrated understanding of microbial evolution and the relationships between physiology, genealogy and genomics. It provides the framework for interpreting environmental sequence data, whether applied to microbial ecology or to human health. However, there remain many instances where there is ambiguity in our understanding of the phylogeny of major lineages, and/or confounding nomenclature. Here we apply recent genomic sequence data to examine the evolutionary history of members of the classes Mollicutes (phylum Tenericutes) and Erysipelotrichia (phylum Firmicutes). Consistent with previous analyses, we find evidence of a specific relationship between them in molecular phylogenies and signatures of the 16S rRNA, 23S rRNA, ribosomal proteins and aminoacyl-tRNA synthetase proteins. Furthermore, by mapping functions over the phylogenetic tree we find that the erysipelotrichia lineages are involved in various stages of genomic reduction, having lost (often repeatedly) a variety of metabolic functions and the ability to form endospores. Although molecular phylogeny has driven numerous taxonomic revisions, we find it puzzling that the most recent taxonomic revision of the phyla Firmicutes and Tenericutes has further separated them into distinct phyla, rather than reflecting their common roots.

Subjects

Genome, Bacterial , Phylogeny , Tenericutes/classification , Amino Acyl-tRNA Synthetases/genetics , Bacterial Proteins/genetics , DNA, Bacterial/genetics , Nucleic Acid Conformation , RNA, Ribosomal, 16S/genetics , RNA, Ribosomal, 23S/genetics , Ribosomal Proteins/genetics , Sequence Alignment , Tenericutes/genetics

13.

Connecting genotype to phenotype in the era of high-throughput sequencing.

Henry, Christopher S; Overbeek, Ross; Xia, Fangfang; Best, Aaron A; Glass, Elizabeth; Gilbert, Jack; Larsen, Peter; Edwards, Rob; Disz, Terry; Meyer, Folker; Vonstein, Veronika; Dejongh, Matthew; Bartels, Daniela; Desai, Narayan; D'Souza, Mark; Devoid, Scott; Keegan, Kevin P; Olson, Robert; Wilke, Andreas; Wilkening, Jared; Stevens, Rick L.

Biochim Biophys Acta ; 1810(10): 967-77, 2011 Oct.

Article in English | MEDLINE | ID: mdl-21421023

ABSTRACT

BACKGROUND: The development of next generation sequencing technology is rapidly changing the face of the genome annotation and analysis field. One of the primary uses for genome sequence data is to improve our understanding and prediction of phenotypes for microbes and microbial communities, but the technologies for predicting phenotypes must keep pace with the new sequences emerging. SCOPE OF REVIEW: This review presents an integrated view of the methods and technologies used in the inference of phenotypes for microbes and microbial communities based on genomic and metagenomic data. Given the breadth of this topic, we place special focus on the resources available within the SEED Project. We discuss the two steps involved in connecting genotype to phenotype: sequence annotation, and phenotype inference, and we highlight the challenges in each of these steps when dealing with both single genome and metagenome data. MAJOR CONCLUSIONS: This integrated view of the genotype-to-phenotype problem highlights the importance of a controlled ontology in the annotation of genomic data, as this benefits subsequent phenotype inference and metagenome annotation. We also note the importance of expanding the set of reference genomes to improve the annotation of all sequence data, and we highlight metagenome assembly as a potential new source for complete genomes. Finally, we find that phenotype inference, particularly from metabolic models, generates predictions that can be validated and reconciled to improve annotations. GENERAL SIGNIFICANCE: This review presents the first look at the challenges and opportunities associated with the inference of phenotype from genotype during the next generation sequencing revolution. This article is part of a Special Issue entitled: Systems Biology of Microorganisms.

Subjects

Genotype , Phenotype , Sequence Analysis, DNA/methods , Animals , Humans , Metagenomics/methods

14.

Synergistic use of plant-prokaryote comparative genomics for functional annotations.

Gerdes, Svetlana; El Yacoubi, Basma; Bailly, Marc; Blaby, Ian K; Blaby-Haas, Crysten E; Jeanguenin, Linda; Lara-Núñez, Aurora; Pribat, Anne; Waller, Jeffrey C; Wilke, Andreas; Overbeek, Ross; Hanson, Andrew D; de Crécy-Lagard, Valérie.

BMC Genomics ; 12 Suppl 1: S2, 2011 Jun 15.

Article in English | MEDLINE | ID: mdl-21810204

ABSTRACT

BACKGROUND: Identifying functions for all gene products in all sequenced organisms is a central challenge of the post-genomic era. However, at least 30-50% of the proteins encoded by any given genome are of unknown or vaguely known function, and a large number are wrongly annotated. Many of these 'unknown' proteins are common to prokaryotes and plants. We set out to predict and experimentally test the functions of such proteins. Our approach to functional prediction integrates comparative genomics based mainly on microbial genomes with functional genomic data from model microorganisms and post-genomic data from plants. This approach bridges the gap between automated homology-based annotations and the classical gene discovery efforts of experimentalists, and is more powerful than purely computational approaches to identifying gene-function associations. RESULTS: Among Arabidopsis genes, we focused on those (2,325 in total) that (i) are unique or belong to families with no more than three members, (ii) occur in prokaryotes, and (iii) have unknown or poorly known functions. Computer-assisted selection of promising targets for deeper analysis was based on homology-independent characteristics associated in the SEED database with the prokaryotic members of each family. In-depth comparative genomic analysis was performed for 360 top candidate families. From this pool, 78 families were connected to general areas of metabolism and, of these families, specific functional predictions were made for 41. Twenty-one predicted functions have been experimentally tested or are currently under investigation by our group in at least one prokaryotic organism (nine of them have been validated, four invalidated, and eight are in progress). Ten additional predictions have been independently validated by other groups. Discovering the function of very widespread but hitherto enigmatic proteins such as the YrdC or YgfZ families illustrates the power of our approach. 
CONCLUSIONS: Our approach correctly predicted functions for 19 uncharacterized protein families from plants and prokaryotes; none of these functions had previously been correctly predicted by computational methods. The resulting annotations could be propagated with confidence to over six thousand homologous proteins encoded in over 900 bacterial, archaeal, and eukaryotic genomes currently available in public databases.

Subjects

Arabidopsis/genetics , Arabidopsis/metabolism , Genomics/methods , Plant Proteins/genetics , Plant Proteins/metabolism , Arabidopsis Proteins/genetics , Arabidopsis Proteins/metabolism , Base Sequence , Conserved Sequence , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Databases, Genetic , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Genes, Bacterial , Genetics, Microbial , Genome, Plant , Multigene Family , Prokaryotic Cells , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism

15.

Comparative genomic reconstruction of transcriptional networks controlling central metabolism in the Shewanella genus.

Rodionov, Dmitry A; Novichkov, Pavel S; Stavrovskaya, Elena D; Rodionova, Irina A; Li, Xiaoqing; Kazanov, Marat D; Ravcheev, Dmitry A; Gerasimova, Anna V; Kazakov, Alexey E; Kovaleva, Galina Yu; Permina, Elizabeth A; Laikova, Olga N; Overbeek, Ross; Romine, Margaret F; Fredrickson, James K; Arkin, Adam P; Dubchak, Inna; Osterman, Andrei L; Gelfand, Mikhail S.

BMC Genomics ; 12 Suppl 1: S3, 2011 Jun 15.

Article in English | MEDLINE | ID: mdl-21810205

ABSTRACT

BACKGROUND: Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in bacteria is one of the critical tasks of modern genomics. The Shewanella genus comprises metabolically versatile gamma-proteobacteria, whose lifestyles and natural environments are substantially different from those of Escherichia coli and other model bacterial species. Comparative genomics approaches and computational identification of regulatory sites are useful for the in silico reconstruction of transcriptional regulatory networks in bacteria. RESULTS: To explore conservation and variations in the Shewanella transcriptional networks we analyzed the repertoire of transcription factors and performed genomics-based reconstruction and comparative analysis of regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors and their DNA binding sites, 8 riboswitches and 6 translational attenuators. Forty-five regulons were newly inferred from the genome context analysis, whereas others were propagated from previously characterized regulons in the Enterobacteria and Pseudomonas spp. Multiple variations in regulatory strategies between the Shewanella spp. and E. coli include regulon contraction and expansion (as in the case of PdhR, HexR, FadR), numerous cases of recruiting non-orthologous regulators to control equivalent pathways (e.g. PsrA for fatty acid degradation) and, conversely, orthologous regulators to control distinct pathways (e.g. TyrR, ArgR, Crp). CONCLUSIONS: We tentatively defined the first reference collection of ~100 transcriptional regulons in 16 Shewanella genomes. The resulting regulatory network contains ~600 regulated genes per genome that are mostly involved in metabolism of carbohydrates, amino acids, fatty acids, vitamins, metals, and stress responses. Several reconstructed regulons including NagR for N-acetylglucosamine catabolism were experimentally validated in S. oneidensis MR-1. Analysis of correlations in gene expression patterns helps to interpret the reconstructed regulatory network. The inferred regulatory interactions will provide additional regulatory constraints for an integrated model of metabolism and regulation in S. oneidensis MR-1.

Subjects

Gene Regulatory Networks , Regulon , Shewanella/genetics , Shewanella/metabolism , Acetylglucosamine/metabolism , Amino Acids/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Binding Sites , Carbohydrate Metabolism , DNA-Binding Proteins/genetics , Escherichia coli/genetics , Escherichia coli Proteins/genetics , Fatty Acids/metabolism , Gene Expression Regulation, Bacterial , Genome, Bacterial , Genomics/methods , Multigene Family , Repressor Proteins/genetics , Riboswitch , Transcription Factors/genetics , Transcription Factors/metabolism

16.

FIGfams: yet another set of protein families.

Meyer, Folker; Overbeek, Ross; Rodriguez, Alex.

Nucleic Acids Res ; 37(20): 6643-54, 2009 Nov.

Article in English | MEDLINE | ID: mdl-19762480

ABSTRACT

We present FIGfams, a new collection of over 100,000 protein families that are the product of manual curation and close strain comparison. The manual curation is carried out using the Subsystem approach, ensuring a previously unattained degree of throughput and consistency. FIGfams are based on over 950,000 manually annotated proteins from many hundreds of Bacteria and Archaea. Associated with each FIGfam is a two-tiered, rapid, accurate decision procedure to determine family membership for new proteins. FIGfams are freely available under an open source license. They can be downloaded at ftp://ftp.theseed.org/FIGfams/. The web site for FIGfams is http://www.theseed.org/wiki/FIGfams/

Subjects

Proteins/classification , Software , Archaeal Proteins/classification , Bacterial Proteins/classification , Proteins/genetics , Ribosomal Proteins/classification

17.

Supervised extraction of near-complete genomes from metagenomic samples: A new service in PATRIC.

Parrello, Bruce; Butler, Rory; Chlenski, Philippe; Pusch, Gordon D; Overbeek, Ross.

PLoS One ; 16(4): e0250092, 2021.

Article in English | MEDLINE | ID: mdl-33857229

ABSTRACT

Large amounts of metagenomically-derived data are submitted to PATRIC for analysis. In the future, we expect even more jobs submitted to PATRIC will use metagenomic data. One in-demand use case is the extraction of near-complete draft genomes from assembled contigs of metagenomic origin. The PATRIC metagenome binning service utilizes the PATRIC database to furnish a large, diverse set of reference genomes. We provide a new service for supervised extraction and annotation of high-quality, near-complete genomes from metagenomically-derived contigs. Reference genomes are assigned to putative draft genome bins based on the presence of single-copy universal marker roles in the sample, and contigs are sorted into these bins by their similarity to reference genomes in PATRIC. Each set of binned contigs represents a draft genome that will be annotated by RASTtk in PATRIC. A structured-language binning report is provided containing quality measurements and taxonomic information about the contig bins. The PATRIC metagenome binning service emphasizes extraction of high-quality genomes for downstream analysis using other PATRIC tools and services. Due to its supervised nature, the binning service is not appropriate for mining novel or extremely low-coverage genomes from metagenomic samples.

Subjects

Metagenome , Metagenomics/methods , Cluster Analysis , Humans , Sequence Analysis, DNA/methods

18.

Accessing the SEED genome databases via Web services API: tools for programmers.

Disz, Terry; Akhter, Sajia; Cuevas, Daniel; Olson, Robert; Overbeek, Ross; Vonstein, Veronika; Stevens, Rick; Edwards, Robert A.

BMC Bioinformatics ; 11: 319, 2010 Jun 14.

Article in English | MEDLINE | ID: mdl-20546611

ABSTRACT

BACKGROUND: The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides a Web services-based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. RESULTS: The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform-independent, service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrates that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. CONCLUSIONS: We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including new genomes as soon as they come online.

Subjects

Databases, Genetic , Genome , Metagenomics/methods , Software

19.

Genomic encyclopedia of sugar utilization pathways in the Shewanella genus.

Rodionov, Dmitry A; Yang, Chen; Li, Xiaoqing; Rodionova, Irina A; Wang, Yanbing; Obraztsova, Anna Y; Zagnitko, Olga P; Overbeek, Ross; Romine, Margaret F; Reed, Samantha; Fredrickson, James K; Nealson, Kenneth H; Osterman, Andrei L.

BMC Genomics ; 11: 494, 2010 Sep 13.

Article in English | MEDLINE | ID: mdl-20836887

ABSTRACT

BACKGROUND: Carbohydrates are a primary source of carbon and energy for many bacteria. Accurate projection of known carbohydrate catabolic pathways across diverse bacteria with complete genomes constitutes a substantial challenge due to frequent variations in components of these pathways. To address a practically and fundamentally important challenge of reconstruction of carbohydrate utilization machinery in any microorganism directly from its genomic sequence, we combined a subsystems-based comparative genomic approach with experimental validation of selected bioinformatic predictions by a combination of biochemical, genetic and physiological experiments. RESULTS: We applied this integrated approach to systematically map carbohydrate utilization pathways in 19 genomes from the Shewanella genus. The obtained genomic encyclopedia of sugar utilization includes ~170 protein families (mostly metabolic enzymes, transporters and transcriptional regulators) spanning 17 distinct pathways with a mosaic distribution across Shewanella species providing insights into their ecophysiology and adaptive evolution. Phenotypic assays revealed a remarkable consistency between predicted and observed phenotype (the ability to utilize an individual sugar as a sole source of carbon and energy) over the entire matrix of tested strains and sugars. Comparison of the reconstructed catabolic pathways with E. coli identified multiple differences that are manifested at various levels, from the presence or absence of certain sugar catabolic pathways, nonorthologous gene replacements and alternative biochemical routes to a different organization of transcription regulatory networks. CONCLUSIONS: The reconstructed sugar catabolome in Shewanella spp. includes 62 novel isofunctional families of enzymes, transporters, and regulators. In addition to improving our knowledge of genomics and functional organization of carbohydrate utilization in Shewanella, this study led to a substantial expansion of our current version of the Genomic Encyclopedia of Carbohydrate Utilization. A systematic and iterative application of this approach to multiple taxonomic groups of bacteria will further enhance it, creating a knowledge base adequate for the efficient analysis of any newly sequenced genome as well as of the emerging metagenomic data.

Subjects

Carbohydrate Metabolism/genetics , Genome, Bacterial/genetics , Metabolic Networks and Pathways/genetics , Shewanella/genetics , Shewanella/metabolism , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Base Sequence , Carbon/metabolism , Enterobacteriaceae/metabolism , Gene Expression Regulation, Bacterial , Genes, Bacterial/genetics , Molecular Sequence Data , Phenotype , Regulon/genetics , Reproducibility of Results , Shewanella/enzymology , Shewanella/isolation & purification , Transcription, Genetic

20.

The National Microbial Pathogen Database Resource (NMPDR): a genomics platform based on subsystem annotation.

McNeil, Leslie Klis; Reich, Claudia; Aziz, Ramy K; Bartels, Daniela; Cohoon, Matthew; Disz, Terry; Edwards, Robert A; Gerdes, Svetlana; Hwang, Kaitlyn; Kubal, Michael; Margaryan, Gohar Rem; Meyer, Folker; Mihalo, William; Olsen, Gary J; Olson, Robert; Osterman, Andrei; Paarmann, Daniel; Paczian, Tobias; Parrello, Bruce; Pusch, Gordon D; Rodionov, Dmitry A; Shi, Xinghua; Vassieva, Olga; Vonstein, Veronika; Zagnitko, Olga; Xia, Fangfang; Zinner, Jenifer; Overbeek, Ross; Stevens, Rick.

Nucleic Acids Res ; 35(Database issue): D347-53, 2007 Jan.

Article in English | MEDLINE | ID: mdl-17145713

ABSTRACT

The National Microbial Pathogen Data Resource (NMPDR) (http://www.nmpdr.org) is a National Institute of Allergy and Infectious Diseases (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of approximately 50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic Domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the set of genes that two groups of organisms have in common or that differentiates them. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development.

Subjects

Databases, Nucleic Acid , Genome, Bacterial , Bacteria/drug effects , Bacteria/metabolism , Bacteria/pathogenicity , Bacterial Proteins/genetics , Bacterial Proteins/physiology , DNA, Bacterial/chemistry , Drug Delivery Systems , Genes, Bacterial , Genes, Essential , Genomics , Internet , Sequence Homology, Nucleic Acid , Software , User-Computer Interface
