Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings.

Sasse, Alexander; Ng, Bernard; Spiro, Anna E; Tasaki, Shinya; Bennett, David A; Gaiteri, Christopher; De Jager, Philip L; Chikina, Maria; Mostafavi, Sara

Affiliation

Sasse A; Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
Ng B; Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA.
Spiro AE; Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
Tasaki S; Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA.
Bennett DA; Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA.
Gaiteri C; Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA.
De Jager PL; Department of Psychiatry, SUNY Upstate Medical University, Syracuse, NY, USA.
Chikina M; Center for Translational & Computational Neuroimmunology, Department of Neurology, and the Taub Institute for the Study of Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY, USA.
Mostafavi S; Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, PA, USA. mchikina@gmail.com.

Nat Genet; 55(12): 2060-2064, 2023 Dec.

Article in En | MEDLINE | ID: mdl-38036778

ABSTRACT

Deep learning methods have recently become the state of the art in a variety of regulatory genomic tasks [1-6], including the prediction of gene expression from genomic DNA. As such, these methods promise to serve as important tools in interpreting the full spectrum of genetic variation observed in personal genomes. Previous evaluation strategies have assessed their predictions of gene expression across genomic regions; however, systematic benchmarking is lacking to assess their predictions across individuals, which would directly evaluate their utility as personal DNA interpreters. We used paired whole genome sequencing and gene expression from 839 individuals in the ROSMAP study [7] to evaluate the ability of current methods to predict gene expression variation across individuals at varied loci. Our approach identifies a limitation of current methods to correctly predict the direction of variant effects. We show that this limitation stems from insufficiently learned sequence motif grammar and suggest new model training strategies to improve performance.
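
As an illustration of what evaluating predictions "across individuals" can mean in practice, the sketch below scores each gene by the Pearson correlation between observed and predicted expression over a set of individuals. A negative correlation indicates that the predicted direction of effect is reversed. The abstract does not state which metric the authors used, so the choice of Pearson correlation is an assumption, and the function names, gene labels, and toy values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences; None if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def cross_individual_scores(observed, predicted):
    """Map each gene to the correlation of observed vs. predicted expression.

    Both arguments are dicts: gene name -> list of per-individual values,
    with individuals in the same order in both dicts.
    """
    return {gene: pearson_r(observed[gene], predicted[gene]) for gene in observed}


# Toy data (invented for illustration): four individuals, two genes.
observed = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [1.0, 2.0, 3.0, 4.0],
}
predicted = {
    "GENE_A": [0.1, 0.2, 0.3, 0.4],  # tracks the observed ordering
    "GENE_B": [0.4, 0.3, 0.2, 0.1],  # reversed ordering: wrong direction
}

scores = cross_individual_scores(observed, predicted)
for gene, r in scores.items():
    label = "undefined" if r is None else ("correct sign" if r > 0 else "wrong sign")
    print(f"{gene}: r = {r}, {label}")
```

In this toy case GENE_A is predicted with the correct sign and GENE_B with the wrong sign, even though both predictions vary across individuals by the same amount; a metric computed across genomic regions would not distinguish the two.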

Subjects

Benchmarking; Neural Networks, Computer; Humans; Base Sequence; DNA; Gene Expression

Full text: 1 | Database: MEDLINE | Main subject: Neural Networks, Computer / Benchmarking | Language: En | Publication year: 2023 | Document type: Article
