Search | Portal Regional da BVS

Compression of structured high-throughput sequencing data.

Campagne, Fabien; Dorff, Kevin C; Chambwe, Nyasha; Robinson, James T; Mesirov, Jill P.

PLoS One ; 8(11): e79871, 2013.

Article in English | MEDLINE | ID: mdl-24260313

ABSTRACT

Large biological datasets are being produced at a rapid pace and create substantial storage challenges, particularly in the domain of high-throughput sequencing (HTS). Most approaches currently used to store HTS data are either unable to quickly adapt to the requirements of new sequencing or analysis methods (because they do not support schema evolution), or fail to provide state of the art compression of the datasets. We have devised new approaches to store HTS data that support seamless data schema evolution and compress datasets substantially better than existing approaches. Building on these new approaches, we discuss and demonstrate how a multi-tier data organization can dramatically reduce the storage, computational and network burden of collecting, analyzing, and archiving large sequencing datasets. For instance, we show that spliced RNA-Seq alignments can be stored in less than 4% the size of a BAM file with perfect data fidelity. Compared to the previous compression state of the art, these methods reduce dataset size more than 40% when storing exome, gene expression or DNA methylation datasets. The approaches have been integrated in a comprehensive suite of software tools (http://goby.campagnelab.org) that support common analyses for a range of high-throughput sequencing assays.

Subjects

Computational Biology/methods, Data Compression/methods, High-Throughput Nucleotide Sequencing/methods, Software

GobyWeb: simplified management and analysis of gene expression and DNA methylation sequencing data.

Dorff, Kevin C; Chambwe, Nyasha; Zeno, Zachary; Simi, Manuele; Shaknovich, Rita; Campagne, Fabien.

PLoS One ; 8(7): e69666, 2013.

Article in English | MEDLINE | ID: mdl-23936070

ABSTRACT

We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced bisulfite sequencing and whole genome methyl-seq), or the detection of pathogens in sequenced data. In contrast to previous analysis pipelines developed for analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found it to either outperform or have similar performance to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state of the art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open source LGPL3 license to facilitate code inspection, reuse and independent extensions http://github.com/CampagneLaboratory/gobyweb2-plugins.

Subjects

DNA Methylation/genetics, Database Management Systems, Gene Expression Regulation, High-Throughput Nucleotide Sequencing, Internet, Software, Base Sequence, Genomics, Humans, RNA Splicing/genetics, User-Computer Interface

BDVal: reproducible large-scale predictive model development and validation in high-throughput datasets.

Dorff, Kevin C; Chambwe, Nyasha; Srdanovic, Marko; Campagne, Fabien.

Bioinformatics ; 26(19): 2472-3, 2010 Oct 01.

Article in English | MEDLINE | ID: mdl-20702395

ABSTRACT

High-throughput data can be used in conjunction with clinical information to develop predictive models. Automating the process of developing, evaluating and testing such predictive models on different datasets would minimize operator errors and facilitate the comparison of different modeling approaches on the same dataset. Complete automation would also yield unambiguous documentation of the process followed to develop each model. We present the BDVal suite of programs that fully automate the construction of predictive classification models from high-throughput data and generate detailed reports about the model construction process. We have used BDVal to construct models from microarray and proteomics data, as well as from DNA-methylation datasets. The programs are designed for scalability and support the construction of thousands of alternative models from a given dataset and prediction task. AVAILABILITY AND IMPLEMENTATION: The BDVal programs are implemented in Java, provided under the GNU General Public License and freely available at http://bdval.campagnelab.org.

Subjects

Computational Biology/methods, Biological Models, Software, Algorithms, DNA Methylation, Genetic Databases

The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models.

Shi, Leming; Campbell, Gregory; Jones, Wendell D; Campagne, Fabien; Wen, Zhining; Walker, Stephen J; Su, Zhenqiang; Chu, Tzu-Ming; Goodsaid, Federico M; Pusztai, Lajos; Shaughnessy, John D; Oberthuer, André; Thomas, Russell S; Paules, Richard S; Fielden, Mark; Barlogie, Bart; Chen, Weijie; Du, Pan; Fischer, Matthias; Furlanello, Cesare; Gallas, Brandon D; Ge, Xijin; Megherbi, Dalila B; Symmans, W Fraser; Wang, May D; Zhang, John; Bitter, Hans; Brors, Benedikt; Bushel, Pierre R; Bylesjo, Max; Chen, Minjun; Cheng, Jie; Cheng, Jing; Chou, Jeff; Davison, Timothy S; Delorenzi, Mauro; Deng, Youping; Devanarayan, Viswanath; Dix, David J; Dopazo, Joaquin; Dorff, Kevin C; Elloumi, Fathi; Fan, Jianqing; Fan, Shicai; Fan, Xiaohui; Fang, Hong; Gonzaludo, Nina; Hess, Kenneth R; Hong, Huixiao; Huan, Jun.

Nat Biotechnol ; 28(8): 827-38, 2010 Aug.

Article in English | MEDLINE | ID: mdl-20676074

ABSTRACT

Gene expression data from microarrays are being applied to predict preclinical and clinical endpoints, but the reliability of these predictions has not been established. In the MAQC-II project, 36 independent teams analyzed six microarray data sets to generate predictive models for classifying a sample with respect to one of 13 endpoints indicative of lung or liver toxicity in rodents, or of breast cancer, multiple myeloma or neuroblastoma in humans. In total, >30,000 models were built using many combinations of analytical methods. The teams generated predictive models without knowing the biological meaning of some of the endpoints and, to mimic clinical reality, tested the models on data that had not been used for training. We found that model performance depended largely on the endpoint and team proficiency and that different approaches generated models of similar performance. The conclusions and recommendations from MAQC-II should be useful for regulatory agencies, study committees and independent investigators that evaluate methods for global gene expression analysis.

Subjects

Liver Diseases/genetics, Lung Diseases/genetics, Neoplasms/genetics, Neoplasms/mortality, Oligonucleotide Array Sequence Analysis/methods, Oligonucleotide Array Sequence Analysis/standards, Animals, Breast Neoplasms/diagnosis, Breast Neoplasms/genetics, Animal Disease Models, Female, Gene Expression Profiling/methods, Gene Expression Profiling/standards, Guidelines as Topic, Humans, Liver Diseases/etiology, Liver Diseases/pathology, Lung Diseases/etiology, Lung Diseases/pathology, Multiple Myeloma/diagnosis, Multiple Myeloma/genetics, Neoplasms/diagnosis, Neuroblastoma/diagnosis, Neuroblastoma/genetics, Predictive Value of Tests, Quality Control, Rats, Survival Analysis
