1.
Sci Rep ; 13(1): 17216, 2023 10 11.
Article in English | MEDLINE | ID: mdl-37821530

ABSTRACT

Artificial neural networks show promising performance in detecting correlations within data that are associated with specific outcomes. However, the black-box nature of such models can hinder the advancement of knowledge in research fields by obscuring the decision process and preventing scientists from fully conceptualizing predicted outcomes. Furthermore, domain experts such as healthcare providers need explainable predictions to assess whether a predicted outcome can be trusted in high-stakes scenarios and to help them integrate a model into their own routine. Therefore, interpretable models play a crucial role in the incorporation of machine learning into high-stakes scenarios like healthcare. In this paper, we introduce Convolutional Motif Kernel Networks, a neural network architecture that learns a feature representation within a subspace of the reproducing kernel Hilbert space of the position-aware motif kernel function. The resulting model allows prediction outcomes to be interpreted and evaluated directly by providing a biologically and medically meaningful explanation without the need for additional post-hoc analysis. We show that our model learns robustly on small datasets and reaches state-of-the-art performance on relevant healthcare prediction tasks. Our proposed method can be applied to DNA and protein sequences. Furthermore, we show that the proposed method learns biologically meaningful concepts directly from data using an end-to-end learning scheme.


Subject(s)
Algorithms , Neural Networks, Computer , Machine Learning
2.
Bioinformatics ; 39(39 Suppl 1): i86-i93, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387133

ABSTRACT

MOTIVATION: Machine learning methods can be used to support scientific discovery in healthcare-related research fields. However, these methods can only be reliably used if they can be trained on high-quality and curated datasets. Currently, no such dataset for the exploration of Plasmodium falciparum protein antigen candidates exists. The parasite P. falciparum causes the infectious disease malaria. Thus, identifying potential antigens is of utmost importance for the development of antimalarial drugs and vaccines. Since exploring antigen candidates experimentally is an expensive and time-consuming process, applying machine learning methods to support this process has the potential to accelerate the development of drugs and vaccines, which are needed for fighting and controlling malaria. RESULTS: We developed PlasmoFAB, a curated benchmark that can be used to train machine learning methods for the exploration of P. falciparum protein antigen candidates. We combined an extensive literature search with domain expertise to create high-quality labels for P. falciparum-specific proteins that distinguish between antigen candidates and intracellular proteins. Additionally, we used our benchmark to compare different well-known prediction models and available protein localization prediction services on the task of identifying protein antigen candidates. We show that available general-purpose services are unable to provide sufficient performance on identifying protein antigen candidates and are outperformed by our models that were trained on this tailored data. AVAILABILITY AND IMPLEMENTATION: PlasmoFAB is publicly available on Zenodo with DOI 10.5281/zenodo.7433087. Furthermore, all scripts that were used in the creation of PlasmoFAB and the training and evaluation of machine learning models are open source and publicly available on GitHub at https://github.com/msmdev/PlasmoFAB.


Subject(s)
Benchmarking , Malaria, Falciparum , Humans , Plasmodium falciparum , Machine Learning , Malaria, Falciparum/diagnosis , Protein Transport
3.
Bioinformatics ; 39(39 Suppl 1): i76-i85, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387152

ABSTRACT

MOTIVATION: The size of available omics datasets has been steadily increasing with technological advancement in recent years. While this increase in sample size can be used to improve the performance of relevant prediction tasks in healthcare, models that are optimized for large datasets usually operate as black boxes. In high-stakes scenarios, like healthcare, using a black-box model poses safety and security issues. Without an explanation about molecular factors and phenotypes that affected the prediction, healthcare providers are left with no choice but to blindly trust the models. We propose a new type of artificial neural network, named Convolutional Omics Kernel Network (COmic). By combining convolutional kernel networks with pathway-induced kernels, our method enables robust and interpretable end-to-end learning on omics datasets ranging in size from a few hundred to several hundred thousand samples. Furthermore, COmic can be easily adapted to utilize multiomics data. RESULTS: We evaluated the performance capabilities of COmic on six different breast cancer cohorts. Additionally, we trained COmic models on multiomics data using the METABRIC cohort. Our models performed better than or similarly to competitors on both tasks. We show how the use of pathway-induced Laplacian kernels opens the black-box nature of neural networks and results in intrinsically interpretable models that eliminate the need for post hoc explanation models. AVAILABILITY AND IMPLEMENTATION: Datasets, labels, and pathway-induced graph Laplacians used for the single-omics tasks can be downloaded at https://ibm.ent.box.com/s/ac2ilhyn7xjj27r0xiwtom4crccuobst/folder/48027287036. While datasets and graph Laplacians for the METABRIC cohort can be downloaded from the above-mentioned repository, the labels have to be downloaded from cBioPortal at https://www.cbioportal.org/study/clinicalData?id=brca_metabric. COmic source code as well as all scripts necessary to reproduce the experiments and analysis are publicly available at https://github.com/jditz/comics.
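
As an illustration of what a kernel induced by a pathway graph Laplacian can look like, the following minimal sketch builds the unnormalized Laplacian L = D - A of a small undirected graph and evaluates the bilinear form k(x, y) = x^T L y. The three-gene pathway, the sample vectors, and the function names are hypothetical, and the choice of the unnormalized Laplacian is an assumption made for brevity; this is not taken from the COmic source code.

```python
def graph_laplacian(n_nodes, edges):
    """Return the unnormalized Laplacian L = D - A of an undirected graph."""
    lap = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j in edges:
        lap[i][j] -= 1.0  # off-diagonal entries hold -A
        lap[j][i] -= 1.0
        lap[i][i] += 1.0  # diagonal entries hold node degrees
        lap[j][j] += 1.0
    return lap


def laplacian_kernel(x, y, lap):
    """Evaluate the bilinear form k(x, y) = x^T L y."""
    n = len(lap)
    return sum(x[i] * lap[i][j] * y[j] for i in range(n) for j in range(n))


# Hypothetical pathway with three genes connected as a path: 0 - 1 - 2.
pathway_edges = [(0, 1), (1, 2)]
lap = graph_laplacian(3, pathway_edges)

# Hypothetical expression profiles over the three genes.
flat_profile = [1.0, 1.0, 1.0]    # identical values on all connected genes
varied_profile = [1.0, 0.0, 1.0]  # neighbouring genes differ

print(laplacian_kernel(flat_profile, flat_profile, lap))
print(laplacian_kernel(varied_profile, varied_profile, lap))
```

For x = y the form equals the sum of squared differences across pathway edges, so a profile that is constant along the pathway scores zero while one that varies between neighbouring genes scores higher. Because each term refers to specific edges of a known pathway, the value can be read back in biological terms.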


Subject(s)
Algorithms , Neural Networks, Computer , Software , Multiomics , Phenotype
4.
Nat Methods ; 19(2): 159-170, 2022 02.
Article in English | MEDLINE | ID: mdl-35027767

ABSTRACT

Computational trajectory inference enables the reconstruction of cell state dynamics from single-cell RNA sequencing experiments. However, trajectory inference requires that the direction of a biological process is known, largely limiting its application to differentiating systems in normal development. Here, we present CellRank (https://cellrank.org) for single-cell fate mapping in diverse scenarios, including regeneration, reprogramming and disease, for which direction is unknown. Our approach combines the robustness of trajectory inference with directional information from RNA velocity, taking into account the gradual and stochastic nature of cellular fate decisions, as well as uncertainty in velocity vectors. On pancreas development data, CellRank automatically detects initial, intermediate and terminal populations, predicts fate potentials and visualizes continuous gene expression trends along individual lineages. Applied to lineage-traced cellular reprogramming data, predicted fate probabilities correctly recover reprogramming outcomes. CellRank also predicts a new dedifferentiation trajectory during postinjury lung regeneration, including previously unknown intermediate cell states, which we confirm experimentally.
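
One way to make the notion of a fate probability concrete is to treat cell-state transitions as a Markov chain and ask how likely each state is to end up in each terminal state. The sketch below does this by fixed-point iteration on a made-up four-state transition matrix. The matrix, the state labels, and the function name are assumptions for illustration only; this is not the CellRank API, and the abstract itself does not spell out this formulation.

```python
def fate_probabilities(P, terminal, n_iter=200):
    """Absorption probabilities of a Markov chain with absorbing terminal states.

    P is a row-stochastic transition matrix (list of lists) and `terminal`
    lists the indices of the absorbing states. Returns one row per state
    with the probability of ending in each terminal state.
    """
    n = len(P)
    # Terminal states start (and stay) one-hot; all other states start at zero.
    fate = [[1.0 if s == t else 0.0 for t in terminal] for s in range(n)]
    for _ in range(n_iter):
        updated = []
        for s in range(n):
            if s in terminal:
                updated.append(fate[s])
            else:
                # Average the fates of the states reachable in one step.
                updated.append([
                    sum(P[s][j] * fate[j][k] for j in range(n))
                    for k in range(len(terminal))
                ])
        fate = updated
    return fate


# Hypothetical chain: states 0 and 1 are intermediate, 2 and 3 are terminal.
P = [
    [0.5, 0.3, 0.2, 0.0],
    [0.2, 0.4, 0.0, 0.4],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
fates = fate_probabilities(P, terminal=[2, 3])
for state, row in enumerate(fates):
    print(state, [round(p, 3) for p in row])
```

In this toy chain, state 0 is equally likely to reach either terminal state, while state 1 is strongly biased towards terminal state 3. Each row sums to one, so it can be read as a soft assignment of a state to lineages.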


Subject(s)
Algorithms , Computational Biology/methods , Pancreas, Exocrine/cytology , Single-Cell Analysis/methods , Software , Animals , Cell Differentiation/genetics , Cell Lineage , Cellular Reprogramming , Humans , Lung/cytology , RNA , Regeneration
5.
J Chem Phys ; 150(17): 174103, 2019 May 07.
Article in English | MEDLINE | ID: mdl-31067901

ABSTRACT

Markov state models are to date the gold standard for modeling molecular kinetics since they enable the identification and analysis of metastable states and related kinetics in a very instructive manner. The state-of-the-art Markov state modeling methods and tools are very well developed for the modeling of reversible processes in closed equilibrium systems. In contrast, they are largely not well suited to deal with nonreversible or even nonautonomous processes of nonequilibrium systems. Thus, we generalized the common Robust Perron Cluster Cluster Analysis (PCCA+) method to enable straightforward modeling of nonequilibrium systems as well. The resulting Generalized PCCA (G-PCCA) method readily handles equilibrium as well as nonequilibrium data by utilizing real Schur vectors instead of eigenvectors. This is implemented in the G-PCCA algorithm that enables the semiautomatic coarse graining of molecular kinetics. G-PCCA is not limited to the detection of metastable states but also enables the identification and modeling of cyclic processes. This is demonstrated by three typical examples of nonreversible systems.
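
A small example may help show why eigenvectors become awkward for nonreversible dynamics. The sketch below uses a hypothetical three-state chain with a preferred direction 0 -> 1 -> 2 -> 0 and verifies that it has a complex eigenvalue with a complex eigenvector, which is the situation that motivates working with real Schur vectors. The matrix is made up, and the code only exposes the problem; it does not compute a Schur decomposition or implement G-PCCA.

```python
import cmath


def matvec(P, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


# Hypothetical row-stochastic matrix with a strong cyclic drift.
P = [
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]

# For this circulant matrix, (1, w, w^2) with w = exp(2*pi*i/3) is an
# eigenvector, and the eigenvalue is 0.1 + 0.8*w + 0.1*w^2.
w = cmath.exp(2j * cmath.pi / 3)
v = [1.0, w, w ** 2]
lam = 0.1 + 0.8 * w + 0.1 * w ** 2

# Largest deviation between P v and lam * v over all components.
residual = max(abs(pv - lam * x) for pv, x in zip(matvec(P, v), v))

print("eigenvalue:", lam)
print("residual of P v = lam v:", residual)
```

The eigenvalue has a clearly nonzero imaginary part, so neither it nor its eigenvector can be used directly in a method that expects real-valued inputs. A real Schur decomposition avoids this by providing a real orthonormal basis even when eigenvalues come in complex-conjugate pairs.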

6.
J Chem Theory Comput ; 14(7): 3579-3594, 2018 Jul 10.
Article in English | MEDLINE | ID: mdl-29812922

ABSTRACT

Markov state models (MSMs) have received an unabated increase in popularity in recent years, as they are very well suited for the identification and analysis of metastable states and related kinetics. However, the state-of-the-art Markov state modeling methods and tools enforce the fulfillment of a detailed balance condition, restricting their applicability to equilibrium MSMs. To date, they are unsuitable to deal with general dominant data structures including cyclic processes, which are essentially associated with nonequilibrium systems. To overcome this limitation, we developed a generalization of the common robust Perron Cluster Cluster Analysis (PCCA+) method, termed generalized PCCA (G-PCCA). This method handles equilibrium and nonequilibrium simulation data, utilizing Schur vectors instead of eigenvectors. G-PCCA is not limited to the detection of metastable states but enables the identification of dominant structures in a general sense, unraveling cyclic processes. This is exemplified by applying G-PCCA to nonequilibrium molecular dynamics data of the Amyloid β (1-40) peptide, periodically driven by an oscillating electric field.
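
The detailed balance condition mentioned above requires pi_i * P_ij = pi_j * P_ji for every pair of states, where pi is the stationary distribution. The following minimal sketch checks this condition for two hypothetical three-state transition matrices, one reversible and one with a cyclic drift. Both matrices and all function names are illustrative assumptions, and the stationary distribution is estimated by plain power iteration for simplicity.

```python
def stationary_distribution(P, n_iter=2000):
    """Estimate the stationary distribution of P by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        # One step of pi <- pi P (left multiplication).
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def satisfies_detailed_balance(P, tol=1e-9):
    """Check pi_i * P_ij == pi_j * P_ji for all pairs of states."""
    pi = stationary_distribution(P)
    n = len(P)
    return all(
        abs(pi[i] * P[i][j] - pi[j] * P[j][i]) <= tol
        for i in range(n)
        for j in range(n)
    )


# Hypothetical reversible chain: only nearest-neighbour hops, no net flux.
reversible = [
    [0.8, 0.2, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.8],
]

# Hypothetical nonreversible chain: net probability flux 0 -> 1 -> 2 -> 0.
cyclic = [
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.8, 0.1, 0.1],
]

print(satisfies_detailed_balance(reversible))
print(satisfies_detailed_balance(cyclic))
```

The first chain passes the check and the second does not, even though both are valid transition matrices. Methods that assume detailed balance would either reject or distort the second kind of model, which is the restriction the abstract describes.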


Subject(s)
Amyloid beta-Peptides/chemistry , Peptide Fragments/chemistry , Algorithms , Cluster Analysis , Electricity , Kinetics , Markov Chains , Molecular Dynamics Simulation