Search | BVS Regional Portal

Compression of structured high-throughput sequencing data.

Campagne, Fabien; Dorff, Kevin C; Chambwe, Nyasha; Robinson, James T; Mesirov, Jill P.

PLoS One ; 8(11): e79871, 2013.

Article in English | MEDLINE | ID: mdl-24260313

ABSTRACT

Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to adapt quickly to the requirements of new sequencing or analysis methods (because they do not support schema evolution) or fail to provide state-of-the-art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% of the size of a BAM file with perfect data fidelity. Compared to the previous state of the art in compression, these methods reduce dataset size by more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated into a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays.

Subject(s)

Computational Biology/methods, Data Compression/methods, High-Throughput Nucleotide Sequencing/methods, Software

GobyWeb: simplified management and analysis of gene expression and DNA methylation sequencing data.

Dorff, Kevin C; Chambwe, Nyasha; Zeno, Zachary; Simi, Manuele; Shaknovich, Rita; Campagne, Fabien.

PLoS One ; 8(7): e69666, 2013.

Article in English | MEDLINE | ID: mdl-23936070

ABSTRACT

We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced representation bisulfite sequencing and whole-genome methyl-seq), and the detection of pathogens in sequenced data. In contrast to previous analysis pipelines developed for HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, and scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found that it either outperforms or performs similarly to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state-of-the-art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed as source code and licensed under the open-source LGPL3 license to facilitate code inspection, reuse and independent extensions (http://github.com/CampagneLaboratory/gobyweb2-plugins).

Subject(s)

DNA Methylation/genetics, Database Management Systems, Gene Expression Regulation, High-Throughput Nucleotide Sequencing, Internet, Software, Base Sequence, Genomics, Humans, RNA Splicing/genetics, User-Computer Interface

The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models.

Shi, Leming; Campbell, Gregory; Jones, Wendell D; Campagne, Fabien; Wen, Zhining; Walker, Stephen J; Su, Zhenqiang; Chu, Tzu-Ming; Goodsaid, Federico M; Pusztai, Lajos; Shaughnessy, John D; Oberthuer, André; Thomas, Russell S; Paules, Richard S; Fielden, Mark; Barlogie, Bart; Chen, Weijie; Du, Pan; Fischer, Matthias; Furlanello, Cesare; Gallas, Brandon D; Ge, Xijin; Megherbi, Dalila B; Symmans, W Fraser; Wang, May D; Zhang, John; Bitter, Hans; Brors, Benedikt; Bushel, Pierre R; Bylesjo, Max; Chen, Minjun; Cheng, Jie; Cheng, Jing; Chou, Jeff; Davison, Timothy S; Delorenzi, Mauro; Deng, Youping; Devanarayan, Viswanath; Dix, David J; Dopazo, Joaquin; Dorff, Kevin C; Elloumi, Fathi; Fan, Jianqing; Fan, Shicai; Fan, Xiaohui; Fang, Hong; Gonzaludo, Nina; Hess, Kenneth R; Hong, Huixiao; Huan, Jun.

Nat Biotechnol ; 28(8): 827-38, 2010 Aug.

Article in English | MEDLINE | ID: mdl-20676074

ABSTRACT

Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, more than 30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators who evaluate methods for global gene expression analysis.

Subject(s)

Liver Diseases/genetics, Lung Diseases/genetics, Neoplasms/genetics, Neoplasms/mortality, Oligonucleotide Array Sequence Analysis/methods, Oligonucleotide Array Sequence Analysis/standards, Animals, Breast Neoplasms/diagnosis, Breast Neoplasms/genetics, Disease Models, Animal, Female, Gene Expression Profiling/methods, Gene Expression Profiling/standards, Guidelines as Topic, Humans, Liver Diseases/etiology, Liver Diseases/pathology, Lung Diseases/etiology, Lung Diseases/pathology, Multiple Myeloma/diagnosis, Multiple Myeloma/genetics, Neoplasms/diagnosis, Neuroblastoma/diagnosis, Neuroblastoma/genetics, Predictive Value of Tests, Quality Control, Rats, Survival Analysis

BDVal: reproducible large-scale predictive model development and validation in high-throughput datasets.

Dorff, Kevin C; Chambwe, Nyasha; Srdanovic, Marko; Campagne, Fabien.

Bioinformatics ; 26(19): 2472-3, 2010 Oct 01.

Article in English | MEDLINE | ID: mdl-20702395

ABSTRACT

High-throughput data can be used in conjunction with clinical information to develop predictive models. Automating the process of developing, evaluating and testing such predictive models on different datasets would minimize operator errors and facilitate the comparison of different modeling approaches on the same dataset. Complete automation would also yield unambiguous documentation of the process followed to develop each model. We present the BDVal suite of programs that fully automate the construction of predictive classification models from high-throughput data and generate detailed reports about the model construction process. We have used BDVal to construct models from microarray and proteomics data, as well as from DNA methylation datasets. The programs are designed for scalability and support the construction of thousands of alternative models from a given dataset and prediction task. AVAILABILITY AND IMPLEMENTATION: The BDVal programs are implemented in Java, provided under the GNU General Public License and freely available at http://bdval.campagnelab.org.

Subject(s)

Computational Biology/methods, Models, Biological, Software, Algorithms, DNA Methylation, Databases, Genetic
