Assessing the reliability of point mutation as data augmentation for deep learning with genomic data.

Lee, Hyunjung; Ozbulak, Utku; Park, Homin; Depuydt, Stephen; De Neve, Wesley; Vankerschaver, Joris

Affiliation

Lee H; Korea University, Seoul, South Korea.
Ozbulak U; Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, South Korea.
Park H; Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, South Korea.
Depuydt S; IDLab, Department of Electronics and Information Systems, Ghent University, Ghent, Belgium.
De Neve W; Erasmus Brussels University of Applied Sciences and Arts, Brussels, Belgium.
Vankerschaver J; Center for Biosystems and Biotech Data Science, Ghent University Global Campus, Incheon, South Korea.

BMC Bioinformatics ; 25(1): 170, 2024 Apr 30.

Article in English | MEDLINE | ID: mdl-38689247

ABSTRACT

BACKGROUND:

Deep neural networks (DNNs) have the potential to revolutionize our understanding and treatment of genetic diseases. An inherent limitation of deep neural networks, however, is their high demand for data during training. To overcome this challenge, other fields, such as computer vision, use various data augmentation techniques to artificially increase the available training data for DNNs. Unfortunately, most data augmentation techniques used in other domains do not transfer well to genomic data.

RESULTS:

Most genomic data has peculiar properties, and data augmentations may significantly alter these intrinsic properties. In this work, we propose a novel data augmentation technique for genomic data inspired by biological point mutations. By applying point mutations as codon substitutions, we demonstrate that our newly proposed data augmentation technique enhances the performance of DNNs across various genomic tasks that involve coding regions, such as translation initiation and splice site detection.
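To make the idea of codon-level point-mutation augmentation concrete, the sketch below applies single-nucleotide substitutions inside the codons of a known, in-frame coding region while leaving the flanking non-coding bases untouched. It is a minimal illustration under stated assumptions, not the authors' implementation: the function name `point_mutate_codons`, its parameters, and the choice of at most one substitution per codon are all hypothetical, and it does not yet distinguish between mutation types.

```python
import random
from typing import Optional

NUCLEOTIDES = "ACGT"


def point_mutate_codons(
    seq: str,
    cds_start: int,
    cds_end: int,
    n_mutations: int = 1,
    rng: Optional[random.Random] = None,
) -> str:
    """Hypothetical helper: return a copy of `seq` with up to `n_mutations`
    single-nucleotide substitutions, each placed in a different codon of the
    in-frame coding region seq[cds_start:cds_end]. Bases outside that region
    (assumed non-coding) are never changed."""
    rng = rng or random.Random()
    n_codons = (cds_end - cds_start) // 3
    if n_codons <= 0:
        raise ValueError("coding region must contain at least one full codon")
    bases = list(seq)
    # Sample distinct codons so that no codon receives more than one substitution.
    for codon in rng.sample(range(n_codons), k=min(n_mutations, n_codons)):
        pos = cds_start + 3 * codon + rng.randrange(3)  # position within that codon
        # Replace the base with one of the three other nucleotides.
        alternatives = [b for b in NUCLEOTIDES if b != bases[pos].upper()]
        bases[pos] = rng.choice(alternatives)
    return "".join(bases)
```

For example, `point_mutate_codons("GGC" + "ATGGCTGAATAA" + "CCA", 3, 15, n_mutations=2, rng=random.Random(42))` returns a sequence of the same length in which exactly two bases, in two different codons of the coding region, have been substituted. Passing a seeded `random.Random` keeps the augmentation reproducible across training runs.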

CONCLUSION:

Silent and missense mutations were found to improve the effectiveness of the augmentation, whereas nonsense mutations and random mutations in non-coding regions generally degraded performance. Overall, point mutation-based augmentations in genomic datasets present valuable opportunities for improving the accuracy and reliability of predictive models for DNA sequences.
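The distinction between silent, missense, and nonsense mutations depends only on how a single-base change alters the amino acid that a codon encodes. The sketch below classifies such a change using the standard genetic code and then lists the single-base variants of a codon that fall into chosen classes, for example keeping silent and missense variants while excluding nonsense ones. It is an illustrative assumption, not the authors' code: `classify_point_mutation` and `allowed_substitutions` are hypothetical names, and the extra "stop-loss" label (a stop codon turned into a sense codon) is not one of the categories discussed in the abstract.

```python
# Standard genetic code, codons enumerated with bases in TCAG order:
# index = 16 * first + 4 * second + third; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # TNN codons
    "LLLLPPPPHHQQRRRR"  # CNN codons
    "IIIMTTTTNNKKSSRR"  # ANN codons
    "VVVVAAAADDEEGGGG"  # GNN codons
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_point_mutation(codon: str, position: int, new_base: str) -> str:
    """Hypothetical helper: classify a single-base substitution in a codon as
    'silent', 'missense', 'nonsense', or 'stop-loss'."""
    codon, new_base = codon.upper(), new_base.upper()
    if codon not in CODON_TABLE or position not in (0, 1, 2):
        raise ValueError("expected a DNA codon and a position in 0..2")
    if new_base not in BASES or new_base == codon[position]:
        raise ValueError("new_base must be a different nucleotide")
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "silent"     # same amino acid (or stop codon stays a stop codon)
    if after == "*":
        return "nonsense"   # sense codon becomes a premature stop codon
    if before == "*":
        return "stop-loss"  # stop codon becomes a sense codon
    return "missense"       # a different amino acid is encoded


def allowed_substitutions(codon: str, keep=("silent", "missense")) -> list:
    """Hypothetical helper: every single-base variant of `codon` whose
    mutation class is listed in `keep`."""
    codon = codon.upper()
    return [
        codon[:pos] + b + codon[pos + 1:]
        for pos in range(3)
        for b in BASES
        if b != codon[pos] and classify_point_mutation(codon, pos, b) in keep
    ]
```

For instance, `allowed_substitutions("CTG", keep=("silent",))` yields only the leucine codons reachable by one substitution (TTG, CTT, CTC, CTA), while the default `keep` for the tryptophan codon TGG drops the two nonsense variants TAG and TGA. An augmentation pipeline could draw replacement codons from such a list, though how the authors sample mutations is not specified here.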

Subjects

Deep Learning; Genomics; Point Mutation; Genomics/methods; Humans; Reproducibility of Results; Neural Networks, Computer

Keywords

Data augmentation; Deep learning; Point mutations; Splicing; Translation initiation

Collections: 01-internacional | Database: MEDLINE | Main subject: Point Mutation / Genomics / Deep Learning | Limit: Humans | Language: English | Publication year: 2024 | Document type: Article