DeepCNV: a deep learning approach for authenticating copy number variations.

Glessner, Joseph T; Hou, Xiurui; Zhong, Cheng; Zhang, Jie; Khan, Munir; Brand, Fabian; Krawitz, Peter; Sleiman, Patrick M A; Hakonarson, Hakon; Wei, Zhi

Affiliation

Glessner JT; Center for Applied Genomics, Department of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
Hou X; Perelman School of Medicine, Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19102, USA.
Zhong C; Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA.
Zhang J; Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA.
Khan M; Adobe Inc., San Jose, CA 95110, USA.
Brand F; Center for Applied Genomics, Department of Human Genetics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
Krawitz P; Perelman School of Medicine, Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19102, USA.
Sleiman PMA; University of Bonn, 53113 Bonn, Germany.
Hakonarson H; University of Bonn, 53113 Bonn, Germany.
Wei Z; Perelman School of Medicine, Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19102, USA.

Brief Bioinform; 22(5), 2021 Sep 02.

Article in English | MEDLINE | ID: mdl-33429424

ABSTRACT

Copy number variations (CNVs) are an important class of variations contributing to the pathogenesis of many disease phenotypes. Detecting CNVs from genomic data remains difficult, and most currently used methods suffer from an unacceptably high false positive rate. A common practice is to have human experts manually review the original CNV calls and filter out false positives before downstream analysis or experimental validation. Here, we propose DeepCNV, a deep learning-based tool intended to replace human experts in validating CNV calls, focusing on calls made by one of the most accurate CNV callers, PennCNV. The deep neural network is trained and evaluated on more than 10 000 expert-scored samples, which are split into training and testing sets. Variant confidence, especially for CNVs, is a main roadblock impeding progress in linking CNVs with disease. We show that DeepCNV adds to the confidence of CNV calls, with an optimal area under the receiver operating characteristic curve of 0.909, exceeding other machine learning methods. The superiority of DeepCNV was also benchmarked and confirmed using an experimental wet-lab validation dataset. We conclude that the improvement obtained with DeepCNV yields significantly fewer false positives and fewer failures to replicate CNV association results.
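For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from expert labels and classifier scores. It is not the authors' code: the function name `roc_auc`, the toy labels, and the toy scores are all hypothetical and chosen only for illustration. The sketch uses the pairwise (Mann-Whitney) formulation, in which the AUC is the fraction of (true call, false call) pairs where the true call receives the higher score.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen true call (label 1)
    scores higher than a randomly chosen false call (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = expert marked the CNV call as true,
# 0 = expert marked it as a false positive. These are NOT DeepCNV outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The quadratic pairwise loop is chosen for clarity rather than speed; on datasets of the size mentioned in the abstract, a rank-based implementation from an established library would be the more practical choice.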

Subjects

DNA Copy Number Variations; Deep Learning; Disease/genetics; Genome, Human; Area Under Curve; Benchmarking; Datasets as Topic; Disease/classification; False Positive Reactions; Humans; ROC Curve

Keywords

copy number variation; deep learning

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Genome, Human / Disease / DNA Copy Number Variations / Deep Learning Type of study: Prognostic_studies Limits: Humans Language: En Journal: Brief Bioinform Journal subject: BIOLOGY / MEDICAL INFORMATICS Publication year: 2021 Document type: Article Affiliation country: United States
