A self-supervised deep learning method for data-efficient training in genomics.
Gündüz, Hüseyin Anil; Binder, Martin; To, Xiao-Yin; Mreches, René; Bischl, Bernd; McHardy, Alice C; Münch, Philipp C; Rezaei, Mina.
Affiliation
  • Gündüz HA; Department of Statistics, LMU Munich, Munich, Germany.
  • Binder M; Munich Center for Machine Learning, Munich, Germany.
  • To XY; Department of Statistics, LMU Munich, Munich, Germany.
  • Mreches R; Munich Center for Machine Learning, Munich, Germany.
  • Bischl B; Department of Statistics, LMU Munich, Munich, Germany.
  • McHardy AC; Munich Center for Machine Learning, Munich, Germany.
  • Münch PC; Department for Computational Biology of Infection Research, Helmholtz Center for Infection Research, 38124, Braunschweig, Germany.
  • Rezaei M; Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany.
Commun Biol; 6(1): 928, 2023 Sep 11.
Article in English | MEDLINE | ID: mdl-37696966
ABSTRACT
Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.
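The abstract names two ingredients of Self-GenomeNet: reverse-complement sequences and targets of different lengths. The Python sketch below illustrates only those two ideas under stated assumptions. It computes the reverse complement of a DNA string and pairs a prefix (the context) with the reverse complement of the remaining suffix at several split points. The function names `reverse_complement` and `make_pairs` and the split points are hypothetical choices for illustration. This is not the authors' implementation: the encoder network, the embeddings, and the self-supervised loss are all omitted.

```python
# Hypothetical illustration only: NOT the Self-GenomeNet implementation.
# It shows (1) reverse-complementing a DNA sequence and (2) building
# context/target pairs whose targets have different lengths.

# Translation table for complementary bases; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G, N kept)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def make_pairs(seq: str, split_points: list[int]) -> list[tuple[str, str]]:
    """For each (assumed) split point k, pair the prefix seq[:k] with the
    reverse complement of the suffix seq[k:]. Different k values give
    targets of different lengths."""
    pairs = []
    for k in split_points:
        if not 0 < k < len(seq):
            raise ValueError(f"split point {k} is out of range for length {len(seq)}")
        pairs.append((seq[:k], reverse_complement(seq[k:])))
    return pairs


if __name__ == "__main__":
    # Example sequence and split points are arbitrary, chosen for illustration.
    for context, target in make_pairs("ACGTTGCA", [2, 4, 6]):
        print(context, target)
```

Running the script prints each context next to its reverse-complement target; shorter contexts are paired with longer targets, which is one simple way to vary the target length.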
Subject(s)

Full text: 1 | Databases: MEDLINE | Main subject: Deep Learning | Study type: Prognostic studies | Language: English | Journal: Commun Biol | Year: 2023 | Document type: Article | Country of affiliation: Germany