EnsembleCNV: an ensemble machine learning algorithm to identify and genotype copy number variation using SNP array data.
Zhang, Zhongyang; Cheng, Haoxiang; Hong, Xiumei; Di Narzo, Antonio F; Franzen, Oscar; Peng, Shouneng; Ruusalepp, Arno; Kovacic, Jason C; Bjorkegren, Johan L M; Wang, Xiaobin; Hao, Ke.
Affiliation
  • Zhang Z; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
  • Cheng H; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
  • Hong X; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
  • Di Narzo AF; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
  • Franzen O; Center on the Early Life Origins of Disease, Department of Population, Family and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD 21205, USA.
  • Peng S; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
  • Ruusalepp A; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
  • Kovacic JC; Integrated Cardio Metabolic Centre, Department of Medicine, Karolinska Institutet, Karolinska Universitetssjukhuset, Huddinge, Sweden.
  • Bjorkegren JLM; Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
  • Wang X; Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
  • Hao K; Department of Cardiac Surgery, Tartu University Hospital, Tartu, Estonia.
Nucleic Acids Res; 47(7): e39, 2019 Apr 23.
Article in En | MEDLINE | ID: mdl-30722045
ABSTRACT
The associations between diseases/traits and copy number variants (CNVs) have not been systematically investigated in genome-wide association studies (GWASs), primarily due to a lack of robust and accurate tools for CNV genotyping. Herein, we propose a novel ensemble learning framework, ensembleCNV, to detect and genotype CNVs using single nucleotide polymorphism (SNP) array data. EnsembleCNV (a) identifies and eliminates batch effects at the raw-data level; (b) uses a heuristic algorithm to assemble individual CNV calls from multiple existing callers with complementary strengths into CNV regions (CNVRs); (c) re-genotypes each CNVR with a local likelihood model adjusted by global information across multiple CNVRs; (d) refines CNVR boundaries using the local correlation structure in copy number intensities; and (e) provides direct CNV genotyping accompanied by a confidence score, directly accessible for downstream quality control and association analysis. Benchmarked on two large datasets, ensembleCNV outperformed competing methods and achieved a high call rate (93.3%) and reproducibility (98.6%), while concurrently achieving high sensitivity by capturing 85% of common CNVs documented in the 1000 Genomes Project. Given this CNV call rate and accuracy, which are comparable to those of SNP genotyping, we suggest ensembleCNV holds significant promise for performing genome-wide CNV association studies and investigating how CNVs predispose to human diseases.
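Step (b) above, assembling per-sample calls from several callers into CNVRs, can be illustrated with a minimal sketch. This is not the heuristic used by ensembleCNV, which the abstract does not specify; it swaps in a simple greedy clustering by reciprocal overlap. The `Call` record, the `min_overlap` threshold of 0.5, and the caller labels are all hypothetical, and coordinates are assumed to satisfy `end > start`.

```python
from typing import Dict, List, NamedTuple


class Call(NamedTuple):
    """One CNV call from one caller in one sample (hypothetical record layout)."""
    caller: str
    sample: str
    chrom: str
    start: int
    end: int


def reciprocal_overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Return the smaller of the two overlap fractions, or 0.0 if disjoint."""
    inter = min(a_end, b_end) - max(a_start, b_start)
    if inter <= 0:
        return 0.0
    return min(inter / (a_end - a_start), inter / (b_end - b_start))


def merge_calls_into_cnvrs(calls: List[Call], min_overlap: float = 0.5) -> List[Dict]:
    """Greedily group calls into regions.

    Calls are visited in (chrom, start, end) order. A call joins the first
    existing region on the same chromosome whose current span has a
    reciprocal overlap of at least min_overlap with it; the region span is
    then widened to the union. Otherwise the call starts a new region.
    """
    regions: List[Dict] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        for region in regions:
            if region["chrom"] != call.chrom:
                continue
            overlap = reciprocal_overlap(
                region["start"], region["end"], call.start, call.end
            )
            if overlap >= min_overlap:
                region["start"] = min(region["start"], call.start)
                region["end"] = max(region["end"], call.end)
                region["calls"].append(call)
                break
        else:
            # No existing region matched, so this call seeds a new one.
            regions.append(
                {"chrom": call.chrom, "start": call.start, "end": call.end, "calls": [call]}
            )
    return regions


# Toy input: two callers agree on one region in sample s1; the rest are isolated.
example_calls = [
    Call("callerA", "s1", "chr1", 100, 200),
    Call("callerB", "s1", "chr1", 110, 210),
    Call("callerA", "s2", "chr1", 1000, 1100),
    Call("callerB", "s2", "chr2", 100, 200),
]
regions = merge_calls_into_cnvrs(example_calls)
for r in regions:
    print(r["chrom"], r["start"], r["end"], len(r["calls"]))
```

A greedy pass like this is order-dependent and can chain loosely related calls together as a region widens, which is one reason a real pipeline would refine region boundaries afterwards, as step (d) describes.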
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Polymorphism, Single Nucleotide / DNA Copy Number Variations / Genotyping Techniques / Machine Learning Type of study: Prognostic_studies Limits: Humans Language: En Year of publication: 2019 Document type: Article
