Results 1 - 11 of 11
1.
BMC Bioinformatics ; 20(1): 561, 2019 Nov 08.
Article in English | MEDLINE | ID: mdl-31703549

ABSTRACT

BACKGROUND: The MG-RAST API provides search capabilities and delivers organism and function data as well as raw or annotated sequence data via the web interface and its RESTful API. For casual users, however, RESTful APIs are hard to learn and work with. RESULTS: We created the graphical MG-RAST API explorer to help researchers more easily build and export API queries; understand the data abstractions and indices available in MG-RAST; and use the results presented in-browser for exploration, development, and debugging. CONCLUSIONS: The API explorer lowers the barrier to entry for occasional or first-time MG-RAST API users.
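The kind of query the explorer helps assemble is, at bottom, a URL: a resource path plus key-value parameters. The following is a minimal Python sketch of such a builder. It assumes the base URL https://api.mg-rast.org; the function name build_query, the identifier mgm0000000.3 and the "verbosity" parameter are hypothetical and should be checked against the MG-RAST API documentation or the explorer itself.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumption: base URL of the public MG-RAST API; verify against current documentation.
BASE_URL = "https://api.mg-rast.org"


def build_query(resource: str, identifier: Optional[str] = None, **params: str) -> str:
    """Assemble a RESTful GET URL: BASE_URL/resource[/identifier][?key=value&...]."""
    url = f"{BASE_URL}/{quote(resource)}"
    if identifier is not None:
        # Percent-encode the identifier so it is safe as a single path segment.
        url += "/" + quote(identifier, safe="")
    if params:
        # Sort parameters so that an identical query always yields an identical URL.
        url += "?" + urlencode(sorted(params.items()))
    return url


# Hypothetical example: resource, identifier and parameter are illustrative only.
print(build_query("metagenome", "mgm0000000.3", verbosity="minimal"))
```

Sorting the parameters is a design choice: it makes exported queries reproducible and easy to compare or cache.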


Subject(s)
Search Engine; Software; User-Computer Interface; Archaea/genetics; Base Sequence; Databases, Genetic; Internet
2.
Brief Bioinform ; 20(4): 1151-1159, 2019 07 19.
Article in English | MEDLINE | ID: mdl-29028869

ABSTRACT

As technologies change, MG-RAST is adapting. Newly available software is being included to improve accuracy and performance. As a computational service that constantly runs large-volume scientific workflows, MG-RAST is the right place to perform benchmarking and to implement algorithmic or platform improvements, which in many cases involve trade-offs between specificity, sensitivity and run-time cost. The work in [Glass EM, Dribinsky Y, Yilmaz P, et al. ISME J 2014;8:1-3] is an example; we use existing, well-studied data sets as gold standards representing different environments and different technologies to evaluate any changes to the pipeline. Currently, we use well-understood data sets in MG-RAST as a platform for benchmarking. The use of artificial data sets for pipeline performance optimization has not added value, as these data sets do not present the same challenges as real-world data sets. In addition, the MG-RAST team welcomes suggestions for improving the workflow. We are currently working on versions 4.02 and 4.1, both of which contain significant input from the community and our partners; they will enable double barcoding and stronger inferences supported by longer-read technologies, and will increase throughput while maintaining sensitivity by using Diamond and SortMeRNA. On the technical platform side, the MG-RAST team intends to support the Common Workflow Language as a standard for specifying bioinformatics workflows, both to facilitate development and to enable efficient, high-performance implementation of the community's data analysis tasks.
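The trade-off between specificity and sensitivity can be made concrete by scoring a pipeline's calls against a gold-standard data set. This is a minimal Python sketch, assuming that both the calls and the gold standard can be reduced to sets of label identifiers (for example, taxa or functions) drawn from a fixed universe; it is a simplification and not a description of MG-RAST's own benchmarking procedure. The names score, sensitivity and specificity are illustrative.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class Confusion:
    tp: int  # predicted and present in the gold standard
    fp: int  # predicted but absent from the gold standard
    fn: int  # present in the gold standard but missed
    tn: int  # absent from both (correctly not predicted)


def score(predicted: Set[str], gold: Set[str], universe: Set[str]) -> Confusion:
    """Compare calls with a gold standard; assumes predicted and gold are subsets of universe."""
    return Confusion(
        tp=len(predicted & gold),
        fp=len(predicted - gold),
        fn=len(gold - predicted),
        tn=len(universe - predicted - gold),
    )


def sensitivity(c: Confusion) -> float:
    return c.tp / (c.tp + c.fn)  # fraction of true labels recovered


def specificity(c: Confusion) -> float:
    return c.tn / (c.tn + c.fp)  # fraction of absent labels correctly left out


c = score(predicted={"a", "c"}, gold={"a", "b"}, universe={"a", "b", "c", "d"})
print(sensitivity(c), specificity(c))
```

A stricter pipeline setting typically lowers fp (raising specificity) at the cost of more fn (lowering sensitivity), which is the trade-off the abstract refers to.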


Subject(s)
High-Throughput Nucleotide Sequencing/methods; Metagenome; Metagenomics/methods; Software; Algorithms; Budgets; Computational Biology/methods; High-Throughput Nucleotide Sequencing/economics; High-Throughput Nucleotide Sequencing/statistics & numerical data; Internet; Metagenomics/economics; Metagenomics/statistics & numerical data; Sequence Analysis, DNA/economics; Sequence Analysis, DNA/methods; Sequence Analysis, DNA/statistics & numerical data; User-Computer Interface; Workflow
4.
Nucleic Acids Res ; 44(D1): D590-4, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26656948

ABSTRACT

MG-RAST (http://metagenomics.anl.gov) is an open-submission data portal for processing, analyzing, sharing and disseminating metagenomic datasets. The system currently hosts over 200,000 datasets and is continuously updated. The volume of submissions has increased 4-fold over the past 24 months, now averaging 4 terabasepairs per month. In addition to several new features, we report changes to the analysis workflow and the technologies used to scale the pipeline up to the required throughput levels. To show possible uses for the data from MG-RAST, we present several examples integrating data and analyses from MG-RAST into popular third-party analysis tools or sequence alignment tools.
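The reported growth (a 4-fold increase over 24 months) can be converted into a monthly rate. Assuming steady exponential growth, which the abstract does not state, the monthly factor is $4^{1/24} = 2^{1/12} \approx 1.059$, i.e. roughly 5.9% more submitted volume each month, and the volume doubles every 12 months. A minimal Python sketch of that arithmetic (the function name is illustrative):

```python
import math
from typing import Tuple


def growth_summary(fold_increase: float, months: float) -> Tuple[float, float]:
    """Monthly growth factor and doubling time (in months), assuming steady exponential growth."""
    monthly_factor = fold_increase ** (1.0 / months)
    doubling_time = months * math.log(2) / math.log(fold_increase)
    return monthly_factor, doubling_time


factor, doubling = growth_summary(4, 24)
print(f"{factor:.4f} per month, doubling every {doubling:.1f} months")
```

Real submission volumes fluctuate, so this is only a smoothed average rate over the stated period.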


Subject(s)
Databases, Nucleic Acid; Metagenomics; Internet; Sequence Alignment
5.
PLoS Comput Biol ; 11(1): e1004008, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25569221

ABSTRACT

Metagenomic sequencing has produced significant amounts of data in recent years. For example, as of summer 2013, MG-RAST had been used to annotate over 110,000 data sets totaling over 43 terabases. With metagenomic sequencing finding ever wider adoption in the scientific community, the existing web-based analysis tools and infrastructure in MG-RAST provide limited capability for data retrieval and analysis, such as comparative analysis between multiple data sets. Moreover, although the system provides many analysis tools, it is not comprehensive. By opening MG-RAST up via a web services API (application programming interface), we have greatly expanded access to MG-RAST data and provided a mechanism for using third-party analysis tools with MG-RAST data. This RESTful API makes all data and data objects created by the MG-RAST pipeline accessible as JSON objects. As part of the DOE Systems Biology Knowledgebase project (KBase, http://kbase.us), we have implemented a web services API for MG-RAST. This API complements the existing MG-RAST web interface and constitutes the basis of KBase's microbial community capabilities. In addition, the API exposes a comprehensive collection of data to programmers. Because it uses a RESTful (Representational State Transfer) implementation, the API is compatible with most programming environments and should be easy to use for end users and third parties. It provides comprehensive access to sequence data, quality control results, annotations and many other data types. Where feasible, we have used standards to expose data and metadata. Code examples are provided in a number of languages, both to show the versatility of the API and to provide a starting point for users. We present an API that exposes the data in MG-RAST for consumption by our users, greatly enhancing the utility of the MG-RAST service.
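Because every object is returned as JSON, a client in most languages needs little more than an HTTP GET and a JSON parser. This is a minimal Python sketch using only the standard library; the function names are illustrative, and the structure of the decoded objects depends on the resource requested, so no particular fields are assumed here.

```python
import json
from typing import Any
from urllib.request import urlopen


def decode_json(body: bytes, encoding: str = "utf-8") -> Any:
    """Turn a raw HTTP response body into Python objects (dicts, lists, strings, numbers)."""
    return json.loads(body.decode(encoding))


def fetch_json(url: str, timeout: float = 30.0) -> Any:
    """GET a URL and decode its JSON body; urllib raises HTTPError on 4xx/5xx responses."""
    with urlopen(url, timeout=timeout) as response:
        return decode_json(response.read())
```

Keeping decoding separate from fetching lets the parsing step be tested on saved responses without network access.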


Subject(s)
Database Management Systems; Databases, Genetic; Genome, Bacterial/genetics; Metagenomics/methods; User-Computer Interface; Internet; Molecular Sequence Annotation/methods; Software
6.
PLoS One ; 9(3): e92297, 2014.
Article in English | MEDLINE | ID: mdl-24642836

ABSTRACT

Mycoplasma salivarium belongs to the class of the smallest self-replicating Tenericutes and is predominantly found in the oral cavity of humans. It is generally considered a non-pathogenic commensal. However, some reports point to an association with human diseases: M. salivarium has been found, for example, as the causative agent of a submasseteric abscess, in necrotic dental pulp, in a brain abscess and in a clogged biliary stent. Here we describe the detection of M. salivarium on the surface of a squamous cell carcinoma of the tongue of a patient with Fanconi anaemia (FA). FA is an inherited bone marrow failure syndrome based on defective DNA repair that increases the risk of carcinomas, especially oral squamous cell carcinoma. Employing high-coverage, massively parallel Roche/454 next-generation sequencing of 16S rRNA gene amplicons, we analysed the oral microbiome of this FA patient in comparison to that of an FA patient with a benign leukoplakia and five healthy individuals. The microbiota of the FA patient with leukoplakia correlated well with that of the healthy controls, typically showing a dominance of Streptococcus, Veillonella and Neisseria species. In contrast, the microbiome of the cancer-bearing FA patient was dominated by Pseudomonas aeruginosa at the healthy sites, shifting to a 98% predominance of M. salivarium on the tumour surface. Quantification of the mycoplasma load in five healthy individuals, two FA patients with tumours and two FA patients with leukoplakia by TaqMan PCR confirmed the prevalence of M. salivarium at the tumour sites. These new findings suggest that this mycoplasma species, with its reduced coding capacity, found ideal breeding grounds at the tumour sites. Interestingly, samples from the oral cavity of all FA patients, and especially from the tumour sites, were also positive for Candida albicans. Further studies are needed to determine whether M. salivarium can be used as a predictive biomarker for tumour development in these patients.


Subject(s)
Carcinoma, Squamous Cell/microbiology; Fanconi Anemia/complications; Mycoplasma Infections/microbiology; Mycoplasma salivarium/genetics; Tongue Neoplasms/microbiology; Adult; Case-Control Studies; Genes, Bacterial; Humans; Male; Microbiota/genetics; Molecular Typing; Mouth/microbiology; RNA, Ribosomal, 16S/genetics; Retrospective Studies
7.
Methods Enzymol ; 531: 487-523, 2013.
Article in English | MEDLINE | ID: mdl-24060134

ABSTRACT

The democratized world of sequencing is leading to numerous data analysis challenges; MG-RAST addresses many of these challenges for diverse datasets, including amplicon datasets, shotgun metagenomes and metatranscriptomes. The changes from version 2 to version 3 include the addition of a dedicated gene-calling stage using FragGeneScan, clustering of predicted proteins at 90% identity, and the use of BLAT for the computation of similarities. Together with changes in the underlying software infrastructure, this has enabled a dramatic scale-up of pipeline throughput on a limited hardware budget. The Web-based service allows upload, fully automated analysis and visualization of results. As a result of the plummeting cost of sequencing and the readily available analytical power of MG-RAST, over 78,000 metagenomic datasets have been analyzed, over 12,000 of which are publicly available in MG-RAST.
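Clustering predicted proteins at 90% identity means that each cluster is represented by one sequence and every member is at least 90% identical to that representative, so similarity searches only need to run on representatives. The following is a minimal Python sketch of greedy identity clustering in the spirit of tools such as CD-HIT or UCLUST; it is not MG-RAST's implementation. As simplifying assumptions, identity is approximated as 1 - edit distance / longer length, and the function names are illustrative. Its quadratic cost makes it suitable only for illustration.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Approximate identity: fraction of positions not needing an edit."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_cluster(seqs: List[str], threshold: float = 0.9) -> Dict[str, List[str]]:
    """Longest sequences first; each joins the first representative within threshold."""
    clusters: Dict[str, List[str]] = {}
    for s in sorted(seqs, key=len, reverse=True):
        for rep in clusters:
            if identity(rep, s) >= threshold:
                clusters[rep].append(s)
                break
        else:
            clusters[s] = [s]  # no representative is close enough: start a new cluster
    return clusters
```

Processing longer sequences first, as greedy clustering tools commonly do, makes representatives the longest members of their clusters.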


Subject(s)
Computational Biology/methods; Metagenomics; Software; Bacteria/classification; Bacteria/genetics; Genome, Bacterial; High-Throughput Nucleotide Sequencing; Internet
8.
Nucleic Acids Res ; 39(14): e91, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21586583

ABSTRACT

The vast majority of microbes are unculturable and thus cannot be sequenced by traditional methods. High-throughput sequencing techniques such as 454 or Solexa-Illumina make it possible to explore these microbes by studying whole natural microbial communities and analysing their biological diversity as well as the underlying metabolic pathways. Over the past few years, different methods have been developed for the taxonomic and functional characterization of metagenomic shotgun sequences. However, the taxonomic classification of metagenomic sequences from novel species without close homologues in the biological sequence databases poses a challenge, because of the high number of wrong taxonomic predictions at lower taxonomic ranks. Here we present CARMA3, a new method for the taxonomic classification of assembled and unassembled metagenomic sequences that has been adapted to work with both BLAST and HMMER3 homology searches. We show that our method makes fewer wrong taxonomic predictions (at the same sensitivity) than other BLAST-based methods. CARMA3 is freely accessible via the web application WebCARMA from http://webcarma.cebitec.uni-bielefeld.de.
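Why lower ranks are risky can be illustrated with a generic lowest-common-ancestor (LCA) assignment, a common strategy in homology-based taxonomic binning (used, for example, by MEGAN). This is not CARMA3's algorithm. It is a minimal Python sketch, assuming each homology hit comes with a bit score and a root-to-leaf lineage, and assuming a hypothetical "top percent" filter that keeps only hits close to the best score. When the retained hits disagree, the assignment retreats to a higher rank instead of guessing.

```python
from typing import List, Sequence, Tuple

Lineage = Sequence[str]  # ranks from root to leaf, e.g. domain ... genus


def lowest_common_ancestor(lineages: Sequence[Lineage]) -> List[str]:
    """Longest prefix shared by all lineages."""
    lca: List[str] = []
    for ranks in zip(*lineages):  # zip stops at the shortest lineage
        if all(r == ranks[0] for r in ranks):
            lca.append(ranks[0])
        else:
            break
    return lca


def assign_taxon(hits: Sequence[Tuple[float, Lineage]], top_percent: float = 10.0) -> List[str]:
    """Keep hits scoring within top_percent of the best hit, then return their LCA."""
    if not hits:
        return []
    best = max(score for score, _ in hits)
    cutoff = best * (1.0 - top_percent / 100.0)
    return lowest_common_ancestor([lin for score, lin in hits if score >= cutoff])
```

For a read whose two best hits fall in different orders of the same class, this sketch reports only the class; a method that always picked the single best hit would report a genus that may well be wrong.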


Subject(s)
Algorithms; Metagenomics/methods; Classification/methods; Databases, Protein; Phylogeny; Sequence Alignment/methods
9.
BMC Bioinformatics ; 10: 430, 2009 Dec 18.
Article in English | MEDLINE | ID: mdl-20021646

ABSTRACT

BACKGROUND: Metagenomics is a new field of research on natural microbial communities. High-throughput sequencing techniques such as 454 or Solexa-Illumina promise new possibilities, as they can produce huge amounts of data in much less time and with less effort and cost than the traditional Sanger technique. However, the data come in much shorter reads (35-100 base pairs with Illumina, 100-500 base pairs with 454 sequencing). CARMA is a new software pipeline for characterising the species composition and genetic potential of microbial samples using short, unassembled reads. RESULTS: In this paper, we introduce WebCARMA, a refined version of CARMA available as a web application for the taxonomic and functional classification of unassembled (ultra-)short reads from metagenomic communities. In addition, we have analysed the applicability of ultra-short reads in metagenomics. CONCLUSIONS: We show that unassembled reads as short as 35 bp can be used for the taxonomic classification of a metagenome. The web application is freely available at http://webcarma.cebitec.uni-bielefeld.de.


Subject(s)
Computational Biology/methods; Genome; Genomics/methods; Internet; Metagenomics/methods; Software
10.
Bioinformatics ; 23(5): 629-30, 2007 Mar 01.
Article in English | MEDLINE | ID: mdl-17237063

ABSTRACT

UNLABELLED: The suffix tree is one of the most fundamental data structures in string algorithms and biological sequence analysis. Unfortunately, when it comes to implementing those algorithms and applying them to real genomic sequences, main memory size often becomes the bottleneck. This is easily explained by the fact that while a DNA sequence of length $n$ over the alphabet $\Sigma = \{A, C, G, T\}$ can be stored in $n \log_2 |\Sigma| = 2n$ bits, its suffix tree occupies $O(n \log n)$ bits. In practice, the size difference easily reaches a factor of 50. We provide an implementation of the compressed suffix tree very recently proposed by Sadakane (Theory of Computing Systems, in press). The compressed suffix tree occupies space proportional to the text size, i.e. $O(n \log |\Sigma|)$ bits, and supports all typical suffix tree operations with at most a $\log n$ factor slowdown. Our experiments show that, e.g. on a 10 MB DNA sequence, the compressed suffix tree takes 10% of the space of a normal suffix tree. Typical operations are slowed down by a factor of 60. AVAILABILITY: The C++ implementation under GNU license is available at http://www.cs.helsinki.fi/group/suds/cst/. An example program implementing a typical pattern discovery task is included. Experimental results in this note correspond to version 0.95.
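The $2n$-bit baseline comes from packing each base into $\log_2 4 = 2$ bits, four bases per byte. This is a minimal Python sketch of such packing, assuming an uppercase sequence over exactly {A, C, G, T} (no N or other IUPAC codes); it illustrates only the text-size baseline, not the compressed suffix tree itself, and the function names are illustrative.

```python
# 2-bit code per base: log2(|Sigma|) = log2(4) = 2 bits.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytes:
    """Pack an A/C/G/T string into ceil(n/4) bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))  # raises KeyError for N or lowercase
    return bytes(out)


def unpack(data: bytes, n: int) -> str:
    """Recover the first n bases from packed data."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n))
```

For a sequence of 10,000,000 bases this gives 2,500,000 bytes, which is the yardstick against which the $O(n \log n)$-bit pointer-based suffix tree and the compressed suffix tree are compared.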


Subject(s)
Algorithms; Genomics/methods; Computational Biology; DNA/chemistry; Programming Languages; Software
11.
Bioinformatics ; 22(6): 762-4, 2006 Mar 15.
Article in English | MEDLINE | ID: mdl-16403789

ABSTRACT

MOTIVATION: RNA secondary structure analysis often requires searching for potential helices in large sequence data. RESULTS: We present a utility program GUUGle that efficiently locates potential helical regions under RNA base pairing rules, which include Watson-Crick as well as G-U pairs. It accepts a positive and a negative set of sequences, and determines all exact matches under RNA rules between positive and negative sequences that exceed a specified length. The GUUGle algorithm can also be adapted to use a precomputed suffix array of the positive sequence set. We show how this program can be effectively used as a filter preceding a more computationally expensive task such as miRNA target prediction. AVAILABILITY: GUUGle is available via the Bielefeld Bioinformatics Server at http://bibiserv.techfak.uni-bielefeld.de/guugle
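The matching rule can be illustrated with a brute-force scan. A helix forms when a stretch of a positive sequence pairs, base by base, with a stretch of a negative sequence read in reverse (the strands are antiparallel), where G-U counts as a pair alongside A-U and G-C. This is a minimal O(nm) Python sketch under those assumptions for a single pair of sequences; it is not the GUUGle algorithm, which is designed to be far more efficient. The function name and the (pos_start, neg_start, length) output format are illustrative.

```python
from typing import List, Tuple

# Watson-Crick pairs plus the G-U wobble pair, in both orientations.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_helices(pos: str, neg: str, min_len: int) -> List[Tuple[int, int, int]]:
    """Return (pos_start, neg_start, length) for every maximal stretch in which
    pos[pos_start + t] pairs with neg[neg_start + length - 1 - t] for all t."""
    rev = neg[::-1]  # read the negative sequence backwards (antiparallel pairing)
    n, m = len(pos), len(rev)
    found: List[Tuple[int, int, int]] = []
    for d in range(-(m - 1), n):  # each diagonal d = i - j of the pos x rev grid
        i0 = max(d, 0)
        j0 = i0 - d
        length = min(n - i0, m - j0)
        run = 0
        # Iterate one step past the end so a run reaching the edge is also reported.
        for t in range(length + 1):
            if t < length and (pos[i0 + t], rev[j0 + t]) in PAIRS:
                run += 1
                continue
            if run >= min_len:
                rev_start = j0 + t - run
                # Map the start in the reversed string back to neg's coordinates.
                found.append((i0 + t - run, m - rev_start - run, run))
            run = 0
    return found
```

Because only runs of at least min_len are reported, the scan can act as the kind of cheap pre-filter the abstract describes, passing candidate regions to a more expensive step such as miRNA target prediction.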


Subject(s)
Algorithms; Base Pairing/genetics; Dinucleoside Phosphates/genetics; RNA/genetics; Sequence Alignment/methods; Sequence Analysis, RNA/methods; Software; Base Sequence; Molecular Sequence Data