Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 28
Filtrar
1.
Nucleic Acids Res ; 51(D1): D678-D689, 2023 01 06.
Artigo em Inglês | MEDLINE | ID: mdl-36350631

RESUMO

The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Center (BRC) program to assist researchers with analyzing the growing body of genome sequence and other omics-related data. In this report, we describe the merger of the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD) and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) https://www.bv-brc.org/. The combined BV-BRC leverages the functionality of the bacterial and viral resources to provide a unified data model, enhanced web-based visualization and analysis tools, bioinformatics services, and a powerful suite of command line tools that benefit the bacterial and viral research communities.


Assuntos
Genômica , Software , Vírus , Humanos , Bactérias/genética , Biologia Computacional , Bases de Dados Genéticas , Influenza Humana , Vírus/genética
2.
Brief Bioinform ; 22(6)2021 11 05.
Artigo em Inglês | MEDLINE | ID: mdl-34379107

RESUMO

Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and data resources and building a set of assembled bacterial genome sequences that are paired with laboratory-derived AST data. This collection currently contains AST data for over 67 000 genomes encompassing approximately 40 genera and over 100 species. In this paper, we describe the characteristics of this collection, highlighting areas where sampling is comparatively deep or shallow, and showing areas where attention is needed from the research community to improve sampling and tracking efforts. In addition to using the data to track the evolution and spread of AMR, it also serves as a useful starting point for building machine learning models for predicting AMR phenotypes. We demonstrate this by describing two machine learning models that are built from the entire dataset to show where the predictive power is comparatively high or low. This AMR metadata collection is freely available and maintained on the Bacterial and Viral Bioinformatics Center (BV-BRC) FTP site ftp://ftp.bvbrc.org/RELEASE_NOTES/PATRIC_genomes_AMR.txt.


Assuntos
Biologia Computacional/métodos , Bases de Dados Genéticas , Resistência Microbiana a Medicamentos , Genômica/métodos , Testes de Sensibilidade Microbiana , Inteligência Artificial , Bactérias/efeitos dos fármacos , Bactérias/genética , Genoma Bacteriano , Humanos , Laboratórios , Aprendizado de Máquina , Fenótipo
3.
Nucleic Acids Res ; 48(D1): D606-D612, 2020 01 08.
Artigo em Inglês | MEDLINE | ID: mdl-31667520

RESUMO

The PathoSystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center funded by the National Institute of Allergy and Infectious Diseases (https://www.patricbrc.org). PATRIC supports bioinformatic analyses of all bacteria with a special emphasis on pathogens, offering a rich comparative analysis environment that provides users with access to over 250 000 uniformly annotated and publicly available genomes with curated metadata. PATRIC offers web-based visualization and comparative analysis tools, a private workspace in which users can analyze their own data in the context of the public collections, services that streamline complex bioinformatic workflows and command-line tools for bulk data analysis. Over the past several years, as genomic and other omics-related experiments have become more cost-effective and widespread, we have observed considerable growth in the usage of and demand for easy-to-use, publicly available bioinformatic tools and services. Here we report the recent updates to the PATRIC resource, including new web-based comparative analysis tools, eight new services and the release of a command-line interface to access, query and analyze data.


Assuntos
Bactérias/genética , Biologia Computacional/métodos , Bases de Dados Genéticas , Algoritmos , Animais , Caenorhabditis elegans/genética , Galinhas/genética , Drosophila melanogaster/genética , Interações Hospedeiro-Patógeno/genética , Humanos , Internet , Macaca mulatta/genética , Metagenômica , Camundongos , National Institute of Allergy and Infectious Diseases (U.S.) , Fenótipo , Filogenia , Ratos , Suínos/genética , Estados Unidos , Peixe-Zebra/genética
4.
Brief Bioinform ; 20(4): 1094-1102, 2019 07 19.
Artigo em Inglês | MEDLINE | ID: mdl-28968762

RESUMO

The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org) is designed to provide researchers with the tools and services that they need to perform genomic and other 'omic' data analyses. In response to mounting concern over antimicrobial resistance (AMR), the PATRIC team has been developing new tools that help researchers understand AMR and its genetic determinants. To support comparative analyses, we have added AMR phenotype data to over 15 000 genomes in the PATRIC database, often assembling genomes from reads in public archives and collecting their associated AMR panel data from the literature to augment the collection. We have also been using this collection of AMR metadata to build machine learning-based classifiers that can predict the AMR phenotypes and the genomic regions associated with resistance for genomes being submitted to the annotation service. Likewise, we have undertaken a large AMR protein annotation effort by manually curating data from the literature and public repositories. This collection of 7370 AMR reference proteins, which contains many protein annotations (functional roles) that are unique to PATRIC and RAST, has been manually curated so that it projects stably across genomes. The collection currently projects to 1 610 744 proteins in the PATRIC database. Finally, the PATRIC Web site has been expanded to enable AMR-based custom page views so that researchers can easily explore AMR data and design experiments based on whole genomes or individual genes.


Assuntos
Biologia Computacional/métodos , Bases de Dados Genéticas , Resistência Microbiana a Medicamentos/genética , Integração de Sistemas , Biologia Computacional/tendências , Bases de Dados Genéticas/estatística & dados numéricos , Genoma Microbiano , Humanos , Internet , Anotação de Sequência Molecular
5.
BMC Bioinformatics ; 20(1): 486, 2019 Oct 03.
Artigo em Inglês | MEDLINE | ID: mdl-31581946

RESUMO

BACKGROUND: Recent advances in high-volume sequencing technology and mining of genomes from metagenomic samples call for rapid and reliable genome quality evaluation. The current release of the PATRIC database contains over 220,000 genomes, and current metagenomic technology supports assemblies of many draft-quality genomes from a single sample, most of which will be novel. DESCRIPTION: We have added two quality assessment tools to the PATRIC annotation pipeline. EvalCon uses supervised machine learning to calculate an annotation consistency score. EvalG implements a variant of the CheckM algorithm to estimate contamination and completeness of an annotated genome.We report on the performance of these tools and the potential utility of the consistency score. Additionally, we provide contamination, completeness, and consistency measures for all genomes in PATRIC and in a recent set of metagenomic assemblies. CONCLUSION: EvalG and EvalCon facilitate the rapid quality control and exploration of PATRIC-annotated draft genomes.


Assuntos
Bases de Dados Genéticas , Genoma Arqueal , Genoma Bacteriano , Aprendizado de Máquina , Metagenômica/métodos , Metagenômica/normas , Software
6.
Nucleic Acids Res ; 45(D1): D535-D542, 2017 01 04.
Artigo em Inglês | MEDLINE | ID: mdl-27899627

RESUMO

The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome-groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by 'virtual integration' to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.


Assuntos
Bactérias/genética , Biologia Computacional/métodos , Bases de Dados Genéticas , Genoma Bacteriano , Genômica/métodos , Antibacterianos/farmacologia , Bactérias/efeitos dos fármacos , Bactérias/metabolismo , Proteínas de Bactérias/genética , Proteínas de Bactérias/metabolismo , Farmacorresistência Bacteriana , Anotação de Sequência Molecular , Proteoma , Proteômica/métodos , Software , Navegador
7.
Nucleic Acids Res ; 42(Database issue): D206-14, 2014 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-24293654

RESUMO

In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.


Assuntos
Bases de Dados Genéticas , Genoma Arqueal , Genoma Bacteriano , Anotação de Sequência Molecular , Proteínas de Bactérias/química , Proteínas de Bactérias/genética , Proteínas de Bactérias/fisiologia , Genômica , Internet , Software
8.
Nucleic Acids Res ; 42(Database issue): D581-91, 2014 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-24225323

RESUMO

The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10,000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to the PATRIC since its initial report in the 2007 NAR Database Issue.


Assuntos
Bases de Dados Genéticas , Genoma Bacteriano , Bactérias/classificação , Bactérias/genética , Infecções Bacterianas/microbiologia , Proteínas de Bactérias/química , Proteínas de Bactérias/metabolismo , Técnicas de Tipagem Bacteriana , Perfilação da Expressão Gênica , Genômica , Humanos , Internet , Conformação Proteica , Mapeamento de Interação de Proteínas
9.
Methods Mol Biol ; 2802: 547-571, 2024.
Artigo em Inglês | MEDLINE | ID: mdl-38819571

RESUMO

As genomic and related data continue to expand, research biologists are often hampered by the computational hurdles required to analyze their data. The National Institute of Allergy and Infectious Diseases (NIAID) established the Bioinformatics Resource Centers (BRC) to assist researchers with their analysis of genome sequence and other omics-related data. Recently, the PAThosystems Resource Integration Center (PATRIC), the Influenza Research Database (IRD), and the Virus Pathogen Database and Analysis Resource (ViPR) BRCs merged to form the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) at https://www.bv-brc.org/ . The combined BV-BRC leverages the functionality of the original resources for bacterial and viral research communities with a unified data model, enhanced web-based visualization and analysis tools, and bioinformatics services. Here we demonstrate how antimicrobial resistance data can be analyzed in the new resource.


Assuntos
Bactérias , Biologia Computacional , Bases de Dados Genéticas , Farmacorresistência Bacteriana , Genômica , Genômica/métodos , Biologia Computacional/métodos , Farmacorresistência Bacteriana/genética , Bactérias/genética , Bactérias/efeitos dos fármacos , Humanos , Software , Genoma Bacteriano , Antibacterianos/farmacologia , Navegador , Estados Unidos , National Institute of Allergy and Infectious Diseases (U.S.)
10.
Bioinformatics ; 28(24): 3316-7, 2012 Dec 15.
Artigo em Inglês | MEDLINE | ID: mdl-23047562

RESUMO

Annotation of metagenomes involves comparing the individual sequence reads with a database of known sequences and assigning a unique function to each read. This is a time-consuming task that is computationally intensive (though not computationally complex). Here we present a novel approach to annotate metagenomes using unique k-mer oligopeptide sequences from 7 to 12 amino acids long. We demonstrate that k-mer-based annotations are faster and approach the sensitivity and precision of blastx-based annotations without loosing accuracy. A last-common ancestor approach was also developed to describe the members of the community.


Assuntos
Metagenômica/métodos , Anotação de Sequência Molecular , Algoritmos , Metagenoma , Análise de Sequência de DNA
11.
Biochim Biophys Acta ; 1810(10): 967-77, 2011 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-21421023

RESUMO

BACKGROUND: The development of next generation sequencing technology is rapidly changing the face of the genome annotation and analysis field. One of the primary uses for genome sequence data is to improve our understanding and prediction of phenotypes for microbes and microbial communities, but the technologies for predicting phenotypes must keep pace with the new sequences emerging. SCOPE OF REVIEW: This review presents an integrated view of the methods and technologies used in the inference of phenotypes for microbes and microbial communities based on genomic and metagenomic data. Given the breadth of this topic, we place special focus on the resources available within the SEED Project. We discuss the two steps involved in connecting genotype to phenotype: sequence annotation, and phenotype inference, and we highlight the challenges in each of these steps when dealing with both single genome and metagenome data. MAJOR CONCLUSIONS: This integrated view of the genotype-to-phenotype problem highlights the importance of a controlled ontology in the annotation of genomic data, as this benefits subsequent phenotype inference and metagenome annotation. We also note the importance of expanding the set of reference genomes to improve the annotation of all sequence data, and we highlight metagenome assembly as a potential new source for complete genomes. Finally, we find that phenotype inference, particularly from metabolic models, generates predictions that can be validated and reconciled to improve annotations. GENERAL SIGNIFICANCE: This review presents the first look at the challenges and opportunities associated with the inference of phenotype from genotype during the next generation sequencing revolution. This article is part of a Special Issue entitled: Systems Biology of Microorganisms.


Assuntos
Genótipo , Fenótipo , Análise de Sequência de DNA/métodos , Animais , Humanos , Metagenômica/métodos
12.
BMC Bioinformatics ; 11: 319, 2010 Jun 14.
Artigo em Inglês | MEDLINE | ID: mdl-20546611

RESUMO

BACKGROUND: The SEED integrates many publicly available genome sequences into a single resource. The database contains accurate and up-to-date annotations based on the subsystems concept that leverages clustering between genomes and other clues to accurately and efficiently annotate microbial genomes. The backend is used as the foundation for many genome annotation tools, such as the Rapid Annotation using Subsystems Technology (RAST) server for whole genome annotation, the metagenomics RAST server for random community genome annotations, and the annotation clearinghouse for exchanging annotations from different resources. In addition to a web user interface, the SEED also provides Web services based API for programmatic access to the data in the SEED, allowing the development of third-party tools and mash-ups. RESULTS: The currently exposed Web services encompass over forty different methods for accessing data related to microbial genome annotations. The Web services provide comprehensive access to the database back end, allowing any programmer access to the most consistent and accurate genome annotations available. The Web services are deployed using a platform independent service-oriented approach that allows the user to choose the most suitable programming platform for their application. Example code demonstrate that Web services can be used to access the SEED using common bioinformatics programming languages such as Perl, Python, and Java. CONCLUSIONS: We present a novel approach to access the SEED database. Using Web services, a robust API for access to genomics data is provided, without requiring large volume downloads all at once. The API ensures timely access to the most current datasets available, including the new genomes as soon as they come online.


Assuntos
Bases de Dados Genéticas , Genoma , Metagenômica/métodos , Software
13.
Nucleic Acids Res ; 35(Database issue): D347-53, 2007 Jan.
Artigo em Inglês | MEDLINE | ID: mdl-17145713

RESUMO

The National Microbial Pathogen Data Resource (NMPDR) (http://www.nmpdr.org) is a National Institute of Allergy and Infections Disease (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of approximately 50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic Domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the set of genes in common or that differentiates two groups of organisms. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development.


Assuntos
Bases de Dados de Ácidos Nucleicos , Genoma Bacteriano , Bactérias/efeitos dos fármacos , Bactérias/metabolismo , Bactérias/patogenicidade , Proteínas de Bactérias/genética , Proteínas de Bactérias/fisiologia , DNA Bacteriano/química , Sistemas de Liberação de Medicamentos , Genes Bacterianos , Genes Essenciais , Genômica , Internet , Homologia de Sequência do Ácido Nucleico , Software , Interface Usuário-Computador
14.
BMC Genomics ; 9: 75, 2008 Feb 08.
Artigo em Inglês | MEDLINE | ID: mdl-18261238

RESUMO

BACKGROUND: The number of prokaryotic genome sequences becoming available is growing steadily and is growing faster than our ability to accurately annotate them. DESCRIPTION: We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12-24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. CONCLUSION: By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.


Assuntos
Biologia Computacional/métodos , Bases de Dados de Ácidos Nucleicos , Genes de RNAr/genética , Genoma Arqueal , Genoma Bacteriano , Fases de Leitura Aberta/genética , Filogenia , Proteínas/genética , RNA de Transferência/genética , Reprodutibilidade dos Testes , Sensibilidade e Especificidade , Fatores de Tempo , Interface Usuário-Computador
15.
Methods Mol Biol ; 1704: 79-101, 2018.
Artigo em Inglês | MEDLINE | ID: mdl-29277864

RESUMO

In the "big data" era, research biologists are faced with analyzing new types that usually require some level of computational expertise. A number of programs and pipelines exist, but acquiring the expertise to run them, and then understanding the output can be a challenge.The Pathosystems Resource Integration Center (PATRIC, www.patricbrc.org ) has created an end-to-end analysis platform that allows researchers to take their raw reads, assemble a genome, annotate it, and then use a suite of user-friendly tools to compare it to any public data that is available in the repository. With close to 113,000 bacterial and more than 1000 archaeal genomes, PATRIC creates a unique research experience with "virtual integration" of private and public data. PATRIC contains many diverse tools and functionalities to explore both genome-scale and gene expression data, but the main focus of this chapter is on assembly, annotation, and the downstream comparative analysis functionality that is freely available in the resource.


Assuntos
Bactérias/genética , Bases de Dados Genéticas , Genoma Bacteriano , Genômica/métodos , Anotação de Sequência Molecular , Software , Biologia Computacional , Internet , Estatística como Assunto
16.
Nucleic Acids Res ; 33(17): 5691-702, 2005.
Artigo em Inglês | MEDLINE | ID: mdl-16214803

RESUMO

The release of the 1000th complete microbial genome will occur in the next two to three years. In anticipation of this milestone, the Fellowship for Interpretation of Genomes (FIG) launched the Project to Annotate 1000 Genomes. The project is built around the principle that the key to improved accuracy in high-throughput annotation technology is to have experts annotate single subsystems over the complete collection of genomes, rather than having an annotation expert attempt to annotate all of the genes in a single genome. Using the subsystems approach, all of the genes implementing the subsystem are analyzed by an expert in that subsystem. An annotation environment was created where populated subsystems are curated and projected to new genomes. A portable notion of a populated subsystem was defined, and tools developed for exchanging and curating these objects. Tools were also developed to resolve conflicts between populated subsystems. The SEED is the first annotation environment that supports this model of annotation. Here, we describe the subsystem approach, and offer the first release of our growing library of populated subsystems. The initial release of data includes 180 177 distinct proteins with 2133 distinct functional roles. This data comes from 173 subsystems and 383 different organisms.


Assuntos
Genoma Arqueal , Genoma Bacteriano , Genômica/métodos , Software , Acil Coenzima A/metabolismo , Coenzima A/biossíntese , Biologia Computacional , Internet , Leucina/metabolismo , Proteínas Ribossômicas/classificação , Terminologia como Assunto , Vocabulário Controlado
17.
Nucleic Acids Res ; 31(1): 164-71, 2003 Jan 01.
Artigo em Inglês | MEDLINE | ID: mdl-12519973

RESUMO

The ERGO (http://ergo.integratedgenomics.com/ERGO/) genome analysis and discovery suite is an integration of biological data from genomics, biochemistry, high-throughput expression profiling, genetics and peer-reviewed journals to achieve a comprehensive analysis of genes and genomes. Far beyond any conventional systems that facilitate functional assignments, ERGO combines pattern-based analysis with comparative genomics by visualizing genes within the context of regulation, expression profiling, phylogenetic clusters, fusion events, networked cellular pathways and chromosomal neighborhoods of other functionally related genes. The result of this multifaceted approach is to provide an extensively curated database of the largest available integration of genomes, with a vast collection of reconstructed cellular pathways spanning all domains of life. Although access to ERGO is provided only under subscription, it is already widely used by the academic community. The current version of the system integrates 500 genomes from all domains of life in various levels of completion, 403 of which are available for subscription.


Assuntos
Bases de Dados Genéticas , Genoma , Genômica , Animais , Biologia Computacional , Perfilação da Expressão Gênica , Metabolismo , Proteínas/fisiologia
18.
Front Microbiol ; 7: 275, 2016.
Artigo em Inglês | MEDLINE | ID: mdl-27047450

RESUMO

We introduce a manually constructed and curated regulatory network model that describes the current state of knowledge of transcriptional regulation of Bacillus subtilis. The model corresponds to an updated and enlarged version of the regulatory model of central metabolism originally proposed in 2008. We extended the original network to the whole genome by integration of information from DBTBS, a compendium of regulatory data that includes promoters, transcription factors (TFs), binding sites, motifs, and regulated operons. Additionally, we consolidated our network with all the information on regulation included in the SporeWeb and Subtiwiki community-curated resources on B. subtilis. Finally, we reconciled our network with data from RegPrecise, which recently released their own less comprehensive reconstruction of the regulatory network for B. subtilis. Our model describes 275 regulators and their target genes, representing 30 different mechanisms of regulation such as TFs, RNA switches, Riboswitches, and small regulatory RNAs. Overall, regulatory information is included in the model for ∼2500 of the ∼4200 genes in B. subtilis 168. In an effort to further expand our knowledge of B. subtilis regulation, we reconciled our model with expression data. For this process, we reconstructed the Atomic Regulons (ARs) for B. subtilis, which are the sets of genes that share the same "ON" and "OFF" gene expression profiles across multiple samples of experimental data. We show how ARs for B. subtilis are able to capture many sets of genes corresponding to regulated operons in our manually curated network. Additionally, we demonstrate how ARs can be used to help expand or validate the knowledge of the regulatory networks by looking at highly correlated genes in the ARs for which regulatory information is lacking. During this process, we were also able to infer novel stimuli for hypothetical genes by exploring the genome expression metadata relating to experimental conditions, gaining insights into novel biology.

19.
Front Microbiol ; 7: 118, 2016.
Artigo em Inglês | MEDLINE | ID: mdl-26903996

RESUMO

The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.

20.
3 Biotech ; 5(1): 101-105, 2015 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-28324362

RESUMO

For many scientific applications, it is highly desirable to be able to compare metabolic models of closely related genomes. In this short report, we attempt to raise awareness to the fact that taking annotated genomes from public repositories and using them for metabolic model reconstructions is far from being trivial due to annotation inconsistencies. We are proposing a protocol for comparative analysis of metabolic models on closely related genomes, using fifteen strains of genus Brucella, which contains pathogens of both humans and livestock. This study lead to the identification and subsequent correction of inconsistent annotations in the SEED database, as well as the identification of 31 biochemical reactions that are common to Brucella, which are not originally identified by automated metabolic reconstructions. We are currently implementing this protocol for improving automated annotations within the SEED database and these improvements have been propagated into PATRIC, Model-SEED, KBase and RAST. This method is an enabling step for the future creation of consistent annotation systems and high-quality model reconstructions that will support in predicting accurate phenotypes such as pathogenicity, media requirements or type of respiration.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA