Search | VHL Regional Portal

1.

Optimized model architectures for deep learning on genomic data.

Gündüz, Hüseyin Anil; Mreches, René; Moosbauer, Julia; Robertson, Gary; To, Xiao-Yin; Franzosa, Eric A; Huttenhower, Curtis; Rezaei, Mina; McHardy, Alice C; Bischl, Bernd; Münch, Philipp C; Binder, Martin.

Commun Biol ; 7(1): 516, 2024 Apr 30.

Article in English | MEDLINE | ID: mdl-38693292

ABSTRACT

The success of deep learning in various applications depends on task-specific architecture design choices, including the types, hyperparameters, and number of layers. In computational biology, there is no consensus on the optimal architecture design, and decisions are often made using insights from more well-established fields such as computer vision. These may not consider the domain-specific characteristics of genome sequences, potentially limiting performance. Here, we present GenomeNet-Architect, a neural architecture design framework that automatically optimizes deep learning models for genome sequence data. It optimizes the overall layout of the architecture, with a search space specifically designed for genomics. Additionally, it optimizes hyperparameters of individual layers and the model training procedure. On a viral classification task, GenomeNet-Architect reduced the read-level misclassification rate by 19%, with 67% faster inference and 83% fewer parameters, and achieved similar contig-level accuracy with ~100 times fewer parameters compared to the best-performing deep learning baselines.

Subject(s)

Deep Learning, Genomics, Genomics/methods, Computational Biology/methods, Humans, Neural Networks, Computer

2.

Author Correction: Optimized model architectures for deep learning on genomic data.

Gündüz, Hüseyin Anil; Mreches, René; Moosbauer, Julia; Robertson, Gary; To, Xiao-Yin; Franzosa, Eric A; Huttenhower, Curtis; Rezaei, Mina; McHardy, Alice C; Bischl, Bernd; Münch, Philipp C; Binder, Martin.

Commun Biol ; 7(1): 625, 2024 May 23.

Article in English | MEDLINE | ID: mdl-38783006

3.

A self-supervised deep learning method for data-efficient training in genomics.

Gündüz, Hüseyin Anil; Binder, Martin; To, Xiao-Yin; Mreches, René; Bischl, Bernd; McHardy, Alice C; Münch, Philipp C; Rezaei, Mina.

Commun Biol ; 6(1): 928, 2023 Sep 11.

Article in English | MEDLINE | ID: mdl-37696966

ABSTRACT

Deep learning in bioinformatics is often limited to problems where extensive amounts of labeled data are available for supervised classification. By exploiting unlabeled data, self-supervised learning techniques can improve the performance of machine learning models in the presence of limited labeled data. Although many self-supervised learning methods have been suggested before, they have failed to exploit the unique characteristics of genomic data. Therefore, we introduce Self-GenomeNet, a self-supervised learning technique that is custom-tailored for genomic data. Self-GenomeNet leverages reverse-complement sequences and effectively learns short- and long-term dependencies by predicting targets of different lengths. Self-GenomeNet performs better than other self-supervised methods in data-scarce genomic tasks and outperforms standard supervised training with ~10 times fewer labeled training data. Furthermore, the learned representations generalize well to new datasets and tasks. These findings suggest that Self-GenomeNet is well suited for large-scale, unlabeled genomic datasets and could substantially improve the performance of genomic models.

Subject(s)

Deep Learning, Genomics, Computational Biology, Machine Learning

4.

Correction: A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses.

Seibold, Heidi; Czerny, Severin; Decke, Siona; Dieterle, Roman; Eder, Thomas; Fohr, Steffen; Hahn, Nico; Hartmann, Rabea; Heindl, Christoph; Kopper, Philipp; Lepke, Dario; Loidl, Verena; Mandl, Maximilian; Musiol, Sarah; Peter, Jessica; Piehler, Alexander; Rojas, Elio; Schmid, Stefanie; Schmidt, Hannah; Schmoll, Melissa; Schneider, Lennart; To, Xiao-Yin; Tran, Viet; Völker, Antje; Wagner, Moritz; Wagner, Joshua; Waize, Maria; Wecker, Hannah; Yang, Rui; Zellner, Simone; Nalenz, Malte.

PLoS One ; 17(5): e0269047, 2022.

Article in English | MEDLINE | ID: mdl-35604918

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0251194.].

5.

A computational reproducibility study of PLOS ONE articles featuring longitudinal data analyses.

Seibold, Heidi; Czerny, Severin; Decke, Siona; Dieterle, Roman; Eder, Thomas; Fohr, Steffen; Hahn, Nico; Hartmann, Rabea; Heindl, Christoph; Kopper, Philipp; Lepke, Dario; Loidl, Verena; Mandl, Maximilian; Musiol, Sarah; Peter, Jessica; Piehler, Alexander; Rojas, Elio; Schmid, Stefanie; Schmidt, Hannah; Schmoll, Melissa; Schneider, Lennart; To, Xiao-Yin; Tran, Viet; Völker, Antje; Wagner, Moritz; Wagner, Joshua; Waize, Maria; Wecker, Hannah; Yang, Rui; Zellner, Simone; Nalenz, Malte.

PLoS One ; 16(6): e0251194, 2021.

Article in English | MEDLINE | ID: mdl-34153038

ABSTRACT

Computational reproducibility is a cornerstone of sound and credible research. In complex statistical analyses, such as the analysis of longitudinal data, reproducing results is far from simple, especially if no source code is available. In this work we aimed to reproduce analyses of longitudinal data from 11 articles published in PLOS ONE. Inclusion criteria were the availability of data and author consent. We investigated the types of methods and software used and whether we were able to reproduce the data analysis using open source software. Most articles provided overview tables and simple visualisations. Generalised Estimating Equations (GEEs) were the most popular statistical models among the selected articles. Only one article used open source software and only one published part of the analysis code. Replication was difficult in most cases and required reverse engineering of results or contacting the authors. For three articles we were not able to reproduce the results, and for another two we could reproduce only parts of them. For all but two articles we had to contact the authors to be able to reproduce the results. Our main lesson is that reproducing papers is difficult if no code is supplied, which places a high burden on those conducting the reproductions. Open data policies in journals are good, but to truly boost reproducibility we suggest adding open code policies.

Subject(s)

Computational Biology/methods, Data Analysis, Humans, Longitudinal Studies, Publications, Reproducibility of Results, Research Design, Software
