Results 1 - 6 of 6
1.
PLoS One ; 8(11): e79871, 2013.
Article in English | MEDLINE | ID: mdl-24260313

ABSTRACT

Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state-of-the-art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% of the size of a BAM file with perfect data fidelity. Compared to the previous compression state of the art, these methods reduce dataset size by more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated in a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays.


Subject(s)
Computational Biology/methods , Data Compression/methods , High-Throughput Nucleotide Sequencing/methods , Software
2.
PLoS One ; 8(7): e69666, 2013.
Article in English | MEDLINE | ID: mdl-23936070

ABSTRACT

We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced bisulfite sequencing and whole genome methyl-seq), or the detection of pathogens in sequenced data. In contrast to previous analysis pipelines developed for analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found it to either outperform or have similar performance to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state-of-the-art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open source LGPL3 license to facilitate code inspection, reuse and independent extensions (http://github.com/CampagneLaboratory/gobyweb2-plugins).


Subject(s)
DNA Methylation/genetics , Database Management Systems , Gene Expression Regulation , High-Throughput Nucleotide Sequencing , Internet , Software , Base Sequence , Genomics , Humans , RNA Splicing/genetics , User-Computer Interface
3.
Bioinformatics ; 26(19): 2472-3, 2010 Oct 01.
Article in English | MEDLINE | ID: mdl-20702395

ABSTRACT

High-throughput data can be used in conjunction with clinical information to develop predictive models. Automating the process of developing, evaluating and testing such predictive models on different datasets would minimize operator errors and facilitate the comparison of different modeling approaches on the same dataset. Complete automation would also yield unambiguous documentation of the process followed to develop each model. We present the BDVal suite of programs that fully automate the construction of predictive classification models from high-throughput data and generate detailed reports about the model construction process. We have used BDVal to construct models from microarray and proteomics data, as well as from DNA-methylation datasets. The programs are designed for scalability and support the construction of thousands of alternative models from a given dataset and prediction task. AVAILABILITY AND IMPLEMENTATION: The BDVal programs are implemented in Java, provided under the GNU General Public License and freely available at http://bdval.campagnelab.org.


Subject(s)
Computational Biology/methods , Models, Biological , Software , Algorithms , DNA Methylation , Databases, Genetic
4.
Nat Biotechnol ; 28(8): 827-38, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20676074

ABSTRACT

Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.


Subject(s)
Liver Diseases/genetics , Lung Diseases/genetics , Neoplasms/genetics , Neoplasms/mortality , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/standards , Animals , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Disease Models, Animal , Female , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Guidelines as Topic , Humans , Liver Diseases/etiology , Liver Diseases/pathology , Lung Diseases/etiology , Lung Diseases/pathology , Multiple Myeloma/diagnosis , Multiple Myeloma/genetics , Neoplasms/diagnosis , Neuroblastoma/diagnosis , Neuroblastoma/genetics , Predictive Value of Tests , Quality Control , Rats , Survival Analysis
5.
Bioinformatics ; 26(14): 1804-5, 2010 Jul 15.
Article in English | MEDLINE | ID: mdl-20501551

ABSTRACT

SUMMARY: Rapid expansion of available data about G Protein Coupled Receptor (GPCR) dimers/oligomers over the past few years requires an effective system to organize this information electronically. Based on an ontology derived from a community dialog involving colleagues using experimental and computational methodologies, we developed the GPCR-Oligomerization Knowledge Base (GPCR-OKB). GPCR-OKB is a system that supports browsing and searching for GPCR oligomer data. Such data were manually derived from the literature. While focused on GPCR oligomers, GPCR-OKB is seamlessly connected to GPCRDB, facilitating the correlation of information about GPCR protomers and oligomers. AVAILABILITY AND IMPLEMENTATION: The GPCR-OKB web application is freely available at http://www.gpcr-okb.org


Subject(s)
Receptors, G-Protein-Coupled/chemistry , Software , Databases, Factual , Internet , Knowledge Bases
6.
Proteomics Clin Appl ; 3(9): 1052-1061, 2009 Sep 01.
Article in English | MEDLINE | ID: mdl-21127740

ABSTRACT

Knowledge of the biologically relevant components of human tissues has enabled the invention of numerous clinically useful diagnostic tests, as well as non-invasive ways of monitoring disease and its response to treatment. Recent use of advanced MS-based proteomics revealed that the composition of human urine is more complex than anticipated. Here, we extend the current characterization of the human urinary proteome by extensively fractionating urine using ultra-centrifugation, gel electrophoresis, ion exchange and reverse-phase chromatography, effectively reducing mixture complexity while minimizing loss of material. By using high-accuracy mass measurements of the linear ion trap-Orbitrap mass spectrometer and LC-MS/MS of peptides generated from such extensively fractionated specimens, we identified 2362 proteins in routinely collected individual urine specimens, including more than 1000 proteins not described in previous studies. Many of these are biomedically significant molecules, including glomerularly filtered cytokines and shed cell surface molecules, as well as renally and urogenitally produced transporters and structural proteins. Annotation of the identified proteome reveals distinct patterns of enrichment, consistent with previously described specific physiologic mechanisms, including 336 proteins that appear to be expressed by a variety of distal organs and glomerularly filtered from serum. Comparison of the proteomes identified from 12 individual specimens revealed a subset of generally invariant proteins, as well as individually variable ones, suggesting that our approach may be used to study individual differences in age, physiologic state and clinical condition. Consistent with this, annotation of the identified proteome by using machine learning and text mining exposed possible associations with 27 common and more than 500 rare human diseases, establishing a widely useful resource for the study of human pathophysiology and biomarker discovery.
