1.
Brief Bioinform ; 20(4): 1151-1159, 2019 Jul 19.
Article in English | MEDLINE | ID: mdl-29028869

ABSTRACT

As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service constantly running large-volume scientific workflows, MG-RAST is the right place to perform benchmarking and implement algorithmic or platform improvements, which in many cases involve trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1-3] is an example: we use existing, well-studied data sets representing different environments and different technologies as gold standards to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. Using artificial data sets for pipeline performance optimization has not added value, because these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improving the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners; they will enable double barcoding and stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard for specifying bioinformatics workflows, both to facilitate development and to enable efficient high-performance implementation of the community's data analysis tasks.


Subjects
High-Throughput Nucleotide Sequencing/methods; Metagenome; Metagenomics/methods; Software; Algorithms; Budgets; Computational Biology/methods; High-Throughput Nucleotide Sequencing/economics; High-Throughput Nucleotide Sequencing/statistics & numerical data; Internet; Metagenomics/economics; Metagenomics/statistics & numerical data; Sequence Analysis, DNA/economics; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/statistics & numerical data; User-Computer Interface; Workflow
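The benchmarking approach in the first abstract scores pipeline changes against gold-standard data sets in terms of sensitivity and specificity. A minimal sketch of that scoring, assuming each pipeline run can be reduced to a set of positive calls (for example, annotated features) drawn from a known universe of candidates; `benchmark` and `BenchmarkResult` are hypothetical names, not part of MG-RAST:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    sensitivity: float  # TP / (TP + FN): share of gold-standard positives recovered
    specificity: float  # TN / (TN + FP): share of gold-standard negatives correctly not called


def benchmark(predicted: set[str], gold: set[str], universe: set[str]) -> BenchmarkResult:
    """Compare a pipeline's positive calls against a gold-standard set.

    `universe` holds every item that could have been called (hypothetical
    framing, e.g. all candidate taxa); anything outside both `predicted`
    and `gold` counts as a true negative.
    """
    if not (predicted <= universe and gold <= universe):
        raise ValueError("predicted and gold must be subsets of universe")
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    tn = len(universe - predicted - gold)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return BenchmarkResult(sensitivity, specificity)


if __name__ == "__main__":
    result = benchmark(
        predicted={"a", "b", "d"},
        gold={"a", "b", "c"},
        universe={"a", "b", "c", "d", "e"},
    )
    print(result)
```

With gold = {a, b, c}, predicted = {a, b, d} and universe = {a, b, c, d, e}, there are 2 true positives, 1 false positive, 1 false negative and 1 true negative, giving sensitivity 2/3 and specificity 1/2. Scoring two pipeline versions against the same gold standard makes the sensitivity/specificity trade-off between them explicit.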
2.
BMC Bioinformatics ; 20(1): 561, 2019 Nov 08.
Article in English | MEDLINE | ID: mdl-31703549

ABSTRACT

BACKGROUND: The MG-RAST API provides search capabilities and delivers organism and function data as well as raw or annotated sequence data via the web interface and its RESTful API. For casual users, however, RESTful APIs are hard to learn and work with. RESULTS: We created the graphical MG-RAST API explorer to help researchers more easily build and export API queries; understand the data abstractions and indices available in MG-RAST; and use the results presented in-browser for exploration, development, and debugging. CONCLUSIONS: The API explorer lowers the barrier to entry for occasional or first-time MG-RAST API users.


Subjects
Search Engine; Software; User-Computer Interface; Archaea/genetics; Base Sequence; Databases, Genetic; Internet
3.
Nucleic Acids Res ; 44(D1): D590-4, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26656948

ABSTRACT

MG-RAST (http://metagenomics.anl.gov) is an open-submission data portal for processing, analyzing, sharing and disseminating metagenomic datasets. The system currently hosts over 200,000 datasets and is continuously updated. The volume of submissions has increased 4-fold over the past 24 months, now averaging 4 terabase pairs per month. In addition to several new features, we report changes to the analysis workflow and the technologies used to scale the pipeline up to the required throughput levels. To show possible uses for the data from MG-RAST, we present several examples integrating data and analyses from MG-RAST into popular third-party analysis tools or sequence alignment tools.


Subjects
Databases, Nucleic Acid; Metagenomics; Internet; Sequence Alignment
4.
PLoS Comput Biol ; 11(1): e1004008, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25569221

ABSTRACT

Metagenomic sequencing has produced significant amounts of data in recent years. For example, as of summer 2013, MG-RAST has been used to annotate over 110,000 data sets totaling over 43 Terabases. With metagenomic sequencing finding even wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide limited capability for data retrieval and analysis, such as comparative analysis between multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programming interface) we have greatly expanded access to MG-RAST data, as well as provided a mechanism for the use of third-party analysis tools with MG-RAST data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. As part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us) we have implemented a web services API for MG-RAST. This API complements the existing MG-RAST web interface and constitutes the basis of KBase's microbial community capabilities. In addition, the API exposes a comprehensive collection of data to programmers. This API, which uses a RESTful (Representational State Transfer) implementation, is compatible with most programming environments and should be easy to use for end users and third parties. It provides comprehensive access to sequence data, quality control results, annotations, and many other data types. Where feasible, we have used standards to expose data and metadata. Code examples are provided in a number of languages both to show the versatility of the API and to provide a starting point for users. We present an API that exposes the data in MG-RAST for consumption by our users, greatly enhancing the utility of the MG-RAST service.


Subjects
Database Management Systems; Databases, Genetic; Genome, Bacterial/genetics; Metagenomics/methods; User-Computer Interface; Internet; Molecular Sequence Annotation/methods; Software
5.
Environ Microbiol ; 16(11): 3443-62, 2014 Nov.
Article in English | MEDLINE | ID: mdl-24628880

ABSTRACT

We reconstructed the complete 2.4 Mb-long genome of a previously uncultivated epsilonproteobacterium, Candidatus Sulfuricurvum sp. RIFRC-1, via assembly of short-read shotgun metagenomic data using a complexity reduction approach. Genome-based comparisons indicate the bacterium is a novel species within the Sulfuricurvum genus, which contains one cultivated representative, S. kujiense. Divergence between the species appears due in part to extensive genomic rearrangements, gene loss and chromosomal versus plasmid encoding of certain (respiratory) genes by RIFRC-1. DNA for the genome was obtained from terrestrial aquifer sediment, in which RIFRC-1 comprised ~47% of the bacterial community. Genomic evidence suggests RIFRC-1 is a chemolithoautotrophic diazotroph capable of deriving energy for growth by microaerobic or nitrate-/nitric oxide-dependent oxidation of S°, sulfide or sulfite, or by H2 oxidation. Carbon may be fixed via the reductive tricarboxylic acid cycle. Consistent with these physiological attributes, the local aquifer was microoxic, with small concentrations of available nitrate, small but elevated concentrations of reduced sulfur, and limited NH4+/NH3. Additionally, various mechanisms for heavy metal and metalloid tolerance and virulence point to a lifestyle well adapted to metal(loid)-rich environments and a shared evolutionary past with pathogenic Epsilonproteobacteria. The results expand on recent findings highlighting the potential importance of sulfur and hydrogen metabolism in the terrestrial subsurface.


Subjects
Epsilonproteobacteria/genetics; Genome, Bacterial; Groundwater/microbiology; Base Sequence; Carbon/metabolism; Geologic Sediments/chemistry; Groundwater/chemistry; Hydrogen/metabolism; Metagenome; Metagenomics; Oxidation-Reduction; Plasmids/genetics; Sulfur/metabolism
6.
Nucleic Acids Res ; 35(Database issue): D347-53, 2007 Jan.
Article in English | MEDLINE | ID: mdl-17145713

ABSTRACT

The National Microbial Pathogen Data Resource (NMPDR) (http://www.nmpdr.org) is a National Institute of Allergy and Infectious Diseases (NIAID)-funded Bioinformatics Resource Center that supports research in selected Category B pathogens. NMPDR contains the complete genomes of approximately 50 strains of pathogenic bacteria that are the focus of our curators, as well as >400 other genomes that provide a broad context for comparative analysis across the three phylogenetic Domains. NMPDR integrates complete, public genomes with expertly curated biological subsystems to provide the most consistent genome annotations. Subsystems are sets of functional roles related by a biologically meaningful organizing principle, which are built over large collections of genomes; they provide researchers with consistent functional assignments in a biologically structured context. Investigators can browse subsystems and reactions to develop accurate reconstructions of the metabolic networks of any sequenced organism. NMPDR provides a comprehensive bioinformatics platform, with tools and viewers for genome analysis. Results of precomputed gene clustering analyses can be retrieved in tabular or graphic format with one-click tools. NMPDR tools include Signature Genes, which finds the genes that two groups of organisms share or that distinguish one group from the other. Essentiality data collated from genome-wide studies have been curated. Drug target identification and high-throughput, in silico, compound screening are in development.


Subjects
Databases, Nucleic Acid; Genome, Bacterial; Bacteria/drug effects; Bacteria/metabolism; Bacteria/pathogenicity; Bacterial Proteins/genetics; Bacterial Proteins/physiology; DNA, Bacterial/chemistry; Drug Delivery Systems; Genes, Bacterial; Genes, Essential; Genomics; Internet; Sequence Homology, Nucleic Acid; Software; User-Computer Interface
7.
BMC Genomics ; 9: 75, 2008 Feb 08.
Article in English | MEDLINE | ID: mdl-18261238

ABSTRACT

BACKGROUND: The number of prokaryotic genome sequences becoming available is growing steadily, and faster than our ability to accurately annotate them. DESCRIPTION: We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment. The service normally makes the annotated genome available within 12-24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service. CONCLUSION: By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been used by over 120 external users annotating over 350 distinct genomes.


Subjects
Computational Biology/methods; Databases, Nucleic Acid; Genes, rRNA/genetics; Genome, Archaeal; Genome, Bacterial; Open Reading Frames/genetics; Phylogeny; Proteins/genetics; RNA, Transfer/genetics; Reproducibility of Results; Sensitivity and Specificity; Time Factors; User-Computer Interface
8.
Stand Genomic Sci ; 9: 18, 2014.
Article in English | MEDLINE | ID: mdl-25780508

ABSTRACT

BACKGROUND: As the impact and prevalence of large-scale metagenomic surveys grow, so does the acute need for more complete and standards-compliant metadata. Metadata (data describing data) provides an essential complement to experimental data, helping to answer questions about its source, mode of collection, and reliability. Metadata collection and interpretation have become vital to the genomics and metagenomics communities, but considerable challenges remain, including exchange, curation, and distribution. Currently, tools are available for capturing basic field metadata during sampling, and for storing, updating and viewing it. Unfortunately, these tools are not specifically designed for metagenomic surveys; in particular, they lack the appropriate metadata collection templates, a centralized storage repository, and a unique ID linking system that can be used to easily port complete and compatible metagenomic metadata into widely used assembly and sequence analysis tools. RESULTS: Metazen was developed as a comprehensive framework designed to enable metadata capture for metagenomic sequencing projects. Specifically, Metazen provides a rapid, easy-to-use portal to encourage early deposition of project and sample metadata. CONCLUSIONS: Metazen is an interactive tool that aids users in recording their metadata in a complete and valid format. A defined set of mandatory fields captures vital information, while the option to add fields provides flexibility.
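Metazen's split between a defined set of mandatory fields and optional, user-added fields can be illustrated with a small validation sketch. The field names, the coordinate range checks and `validate_sample` are hypothetical assumptions for illustration, not Metazen's actual templates or rules:

```python
from collections.abc import Mapping

# Hypothetical mandatory fields for illustration; Metazen's real templates differ.
MANDATORY_FIELDS = frozenset(
    {"project_name", "sample_name", "collection_date", "latitude", "longitude"}
)

# Hypothetical range checks, applied only to coordinate fields that are filled in.
COORDINATE_LIMITS = {"latitude": 90.0, "longitude": 180.0}


def validate_sample(record: Mapping[str, str],
                    mandatory: frozenset[str] = MANDATORY_FIELDS) -> list[str]:
    """Return a list of problems; an empty list means the record is acceptable.

    Mandatory fields must be present and non-blank. Any extra keys are
    accepted as user-defined fields and are not checked.
    """
    problems: list[str] = []
    for field in sorted(mandatory):
        if not record.get(field, "").strip():
            problems.append(f"missing mandatory field: {field}")
    for field, limit in COORDINATE_LIMITS.items():
        raw = record.get(field, "").strip()
        if not raw:
            continue  # a missing mandatory coordinate was already reported above
        try:
            value = float(raw)
        except ValueError:
            problems.append(f"{field} is not a number: {raw!r}")
            continue
        if not -limit <= value <= limit:
            problems.append(f"{field} out of range: {value}")
    return problems
```

A record carrying every mandatory field plus an extra key such as `soil_ph` validates cleanly, which mirrors the flexibility described above: required information is enforced, while additional fields pass through unchecked.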

9.
Methods Enzymol ; 531: 487-523, 2013.
Article in English | MEDLINE | ID: mdl-24060134

ABSTRACT

The democratized world of sequencing is leading to numerous data analysis challenges; MG-RAST addresses many of these challenges for diverse datasets, including amplicon datasets, shotgun metagenomes, and metatranscriptomes. The changes from version 2 to version 3 include the addition of a dedicated gene calling stage using FragGeneScan, clustering of predicted proteins at 90% identity, and the use of BLAT for the computation of similarities. Together with changes in the underlying software infrastructure, this has enabled the dramatic scaling up of pipeline throughput while remaining on a limited hardware budget. The Web-based service allows upload, fully automated analysis, and visualization of results. As a result of the plummeting cost of sequencing and the readily available analytical power of MG-RAST, over 78,000 metagenomic datasets have been analyzed, with over 12,000 of them publicly available in MG-RAST.


Subjects
Computational Biology/methods; Metagenomics; Software; Bacteria/classification; Bacteria/genetics; Genome, Bacterial; High-Throughput Nucleotide Sequencing; Internet
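The version 3 changes in the last abstract include clustering predicted proteins at 90% identity, so that similarity searches run once per cluster rather than once per protein. A minimal sketch of greedy, centroid-based clustering follows; it illustrates the general strategy used by tools such as CD-HIT and is not MG-RAST's implementation (the abstract does not name the clustering tool or its identity definition). `ungapped_identity` is a hypothetical, deliberately crude stand-in for alignment-based identity: it slides the shorter sequence along the longer one without gaps and divides the best match count by the shorter length.

```python
def ungapped_identity(a: str, b: str) -> float:
    """Best ungapped identity of the shorter sequence placed along the longer one."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        return 0.0
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        window = long_[offset:offset + len(short)]
        best = max(best, sum(x == y for x, y in zip(short, window)))
    return best / len(short)


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.90) -> dict[str, list[str]]:
    """Map each representative ID to its member IDs (representative included)."""
    clusters: dict[str, list[str]] = {}
    # Longest first, so every representative is at least as long as its members.
    for seq_id in sorted(seqs, key=lambda s: len(seqs[s]), reverse=True):
        for rep_id in clusters:
            if ungapped_identity(seqs[seq_id], seqs[rep_id]) >= threshold:
                clusters[rep_id].append(seq_id)  # join the first matching cluster
                break
        else:
            clusters[seq_id] = [seq_id]  # no match: start a new cluster
    return clusters


if __name__ == "__main__":
    proteins = {
        "p1": "MKTAYIAKQR",
        "p2": "MKTAYIAKQK",  # 9 of 10 residues identical to p1
        "p3": "GGGGGGGGGG",  # unrelated
        "p4": "TAYIAKQ",     # fragment contained in p1
    }
    print(greedy_cluster(proteins))
```

Sorting longest-first means short fragments are compared against full-length representatives. A production tool would typically replace the ungapped comparison with a proper alignment and add fast prefilters, since the pairwise loop here grows quadratically with the number of sequences.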