Search | VHL Regional Portal

1.

Alignment-Free Sequence Comparison: A Systematic Survey From a Machine Learning Perspective.

Bohnsack, Katrin Sophie; Kaden, Marika; Abel, Julia; Villmann, Thomas.

IEEE/ACM Trans Comput Biol Bioinform ; 20(1): 119-135, 2023.

Article in English | MEDLINE | ID: mdl-34990369

ABSTRACT

The encounter of large amounts of biological sequence data generated during the last decades and the algorithmic and hardware improvements have offered the possibility to apply machine learning techniques in bioinformatics. While the machine learning community is aware of the necessity to rigorously distinguish data transformation from data comparison and adopt reasonable combinations thereof, this awareness is often lacking in the field of comparative sequence analysis. With realization of the disadvantages of alignments for sequence comparison, some typical applications use more and more so-called alignment-free approaches. In light of this development, we present a conceptual framework for alignment-free sequence comparison, which highlights the delineation of: 1) the sequence data transformation, comprising adequate mathematical sequence coding and feature generation, from 2) the subsequent (dis-)similarity evaluation of the transformed data by means of problem-specific but mathematically consistent proximity measures. We consider coding to be an information-loss-free data transformation in order to get an appropriate representation, whereas feature generation is inevitably information-lossy with the intention to extract just the task-relevant information. This distinction sheds light on the plethora of methods available and assists in identifying suitable methods in machine learning and data analysis to compare the sequences under these premises.

Subject(s)

Algorithms , Machine Learning , Sequence Alignment , Sequence Analysis , Mathematics
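
The separation described in this abstract (lossless coding, then lossy feature generation, then a proximity measure) can be illustrated with a minimal sketch. The choices below are assumptions made for illustration and are not taken from the survey: integer index coding over the alphabet ACGT, relative 2-mer frequencies as features, and the Euclidean distance as the proximity measure. All function names are hypothetical.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"  # assumed alphabet; other symbols (e.g. N) raise ValueError


def encode(seq):
    """Lossless coding: map each nucleotide to its integer index."""
    return [ALPHABET.index(ch) for ch in seq.upper()]


def decode(codes):
    """Inverse of encode, showing that the coding step loses no information."""
    return "".join(ALPHABET[c] for c in codes)


def kmer_profile(codes, k=2):
    """Lossy feature generation: relative frequencies of all k-mers."""
    kmers = list(product(range(len(ALPHABET)), repeat=k))
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(codes) - k + 1):
        counts[tuple(codes[i:i + k])] += 1
    total = max(1, len(codes) - k + 1)
    return [counts[km] / total for km in kmers]


def euclidean(u, v):
    """Proximity measure applied to the generated feature vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


a = encode("ACGTACGTAC")
b = encode("ACGTTCGTAC")  # one substitution relative to the first sequence
c = encode("GGGGGGCCCC")  # very different composition

d_ab = euclidean(kmer_profile(a), kmer_profile(b))
d_ac = euclidean(kmer_profile(a), kmer_profile(c))
print(d_ab < d_ac)
```

Swapping `kmer_profile` or `euclidean` for another feature generator or proximity measure leaves the other steps untouched, which is the point of keeping the three stages separate.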

2.

Learning vector quantization as an interpretable classifier for the detection of SARS-CoV-2 types based on their RNA sequences.

Kaden, Marika; Bohnsack, Katrin Sophie; Weber, Mirko; Kudla, Mateusz; Gutowska, Kaja; Blazewicz, Jacek; Villmann, Thomas.

Neural Comput Appl ; 34(1): 67-78, 2022.

Article in English | MEDLINE | ID: mdl-33935376

ABSTRACT

We present an approach to discriminate SARS-CoV-2 virus types based on their RNA sequence descriptions avoiding a sequence alignment. For that purpose, sequences are preprocessed by feature extraction and the resulting feature vectors are analyzed by prototype-based classification to remain interpretable. In particular, we propose to use variants of learning vector quantization (LVQ) based on dissimilarity measures for RNA sequence data. The respective matrix LVQ provides additional knowledge about the classification decisions like discriminant feature correlations and, additionally, can be equipped with easy-to-realize reject options for uncertain data. Those options provide self-controlled evidence, i.e., the model refuses to make a classification decision if the model evidence for the presented data is not sufficient. This model is first trained using a GISAID dataset with given virus types detected according to the molecular differences in coronavirus populations by phylogenetic tree clustering. In a second step, we apply the trained model to another, unlabeled SARS-CoV-2 virus dataset. For these data, we can either assign a virus type to the sequences or reject atypical samples. Those rejected sequences allow speculation about new virus types with respect to nucleotide base mutations in the viral sequences. Moreover, this rejection analysis improves model robustness. Last but not least, the presented approach has lower computational complexity compared to methods based on (multiple) sequence alignment. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00521-021-06018-2.
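
The idea of a prototype-based classifier that refuses to decide when the evidence is weak can be sketched as follows. This is not the matrix LVQ variant or the dissimilarity measure of the paper: it is plain LVQ1 with the squared Euclidean distance, and the reject rule (a relative-distance certainty compared against a threshold) is an assumed, commonly used choice. The toy data, prototype initialisation, and all names are hypothetical.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(protos, x):
    """Index of the prototype closest to x."""
    return min(range(len(protos)), key=lambda i: sq_dist(protos[i][0], x))


def train_lvq1(data, prototypes, lr=0.1, epochs=20):
    """LVQ1: attract the winning prototype if its label matches, else repel it."""
    protos = [(list(w), label) for w, label in prototypes]
    for _ in range(epochs):
        for x, y in data:
            i = nearest(protos, x)
            w, label = protos[i]
            sign = 1.0 if label == y else -1.0
            protos[i] = ([wj + sign * lr * (xj - wj) for wj, xj in zip(w, x)], label)
    return protos


def classify_with_reject(protos, x, threshold=0.2):
    """Return the winning label, or None if the decision is too uncertain."""
    w_best, label = protos[nearest(protos, x)]
    d_best = sq_dist(w_best, x)
    # Distance to the closest prototype of any other class.
    d_other = min(sq_dist(w, x) for w, other in protos if other != label)
    denom = d_best + d_other
    certainty = (d_other - d_best) / denom if denom > 0 else 0.0
    return label if certainty >= threshold else None


# Hypothetical two-class toy data standing in for sequence feature vectors.
data = [([0.0, 0.2], "A"), ([0.3, -0.1], "A"), ([-0.2, 0.1], "A"),
        ([4.0, 4.1], "B"), ([3.8, 4.2], "B"), ([4.2, 3.9], "B")]
protos = train_lvq1(data, [([1.0, 1.0], "A"), ([3.0, 3.0], "B")])

print(classify_with_reject(protos, [0.1, 0.0]))  # near class A
print(classify_with_reject(protos, [2.0, 2.0]))  # between the classes
```

A sample roughly halfway between the two prototypes has a certainty close to zero and is rejected, which mirrors how atypical sequences would be set aside for further inspection.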

3.

The Resolved Mutual Information Function as a Structural Fingerprint of Biomolecular Sequences for Interpretable Machine Learning Classifiers.

Bohnsack, Katrin Sophie; Kaden, Marika; Abel, Julia; Saralajew, Sascha; Villmann, Thomas.

Entropy (Basel) ; 23(10), 2021 Oct 17.

Article in English | MEDLINE | ID: mdl-34682081

ABSTRACT

In the present article we propose the application of variants of the mutual information function as characteristic fingerprints of biomolecular sequences for classification analysis. In particular, we consider the resolved mutual information functions based on Shannon-, Rényi-, and Tsallis-entropy. In combination with interpretable machine learning classifier models based on generalized learning vector quantization, a powerful methodology for sequence classification is achieved which allows substantial knowledge extraction in addition to the high classification ability due to the model-inherent robustness. Any potential (slightly) inferior performance of the used classifier is compensated by the additional knowledge provided by interpretable models. This knowledge may assist the user in the analysis and understanding of the used data and considered task. After theoretical justification of the concepts, we demonstrate the approach for various example data sets covering different areas in biomolecular sequence analysis.
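
As a rough illustration of a mutual information function used as a sequence fingerprint, the sketch below computes the standard Shannon mutual information between symbols separated by a lag k, for k = 1, ..., max_lag. It does not reproduce the resolved variants or the Rényi and Tsallis versions proposed in the article; the function name and the choice to estimate marginals from the same symbol pairs are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information_function(seq, max_lag):
    """Shannon mutual information (in bits) between symbols at distance k."""
    if max_lag >= len(seq):
        raise ValueError("max_lag must be smaller than the sequence length")
    values = []
    for k in range(1, max_lag + 1):
        pairs = Counter(zip(seq, seq[k:]))  # symbol pairs (s_i, s_{i+k})
        n = sum(pairs.values())
        left, right = Counter(), Counter()
        for (a, b), c in pairs.items():
            left[a] += c   # marginal of the first position
            right[b] += c  # marginal of the second position
        mi = sum(
            (c / n) * log2((c / n) / ((left[a] / n) * (right[b] / n)))
            for (a, b), c in pairs.items()
        )
        values.append(mi)
    return values


periodic = mutual_information_function("ACGT" * 10, 4)
constant = mutual_information_function("A" * 20, 4)
print(periodic)
print(constant)
```

The resulting vector of length max_lag can serve as a fixed-size feature vector for a classifier, regardless of the sequence length.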

4.

AI-Based Multi Sensor Fusion for Smart Decision Making: A Bi-Functional System for Single Sensor Evaluation in a Classification Task.

Zoghlami, Feryel; Kaden, Marika; Villmann, Thomas; Schneider, Germar; Heinrich, Harald.

Sensors (Basel) ; 21(13), 2021 Jun 27.

Article in English | MEDLINE | ID: mdl-34199090

ABSTRACT

Sensor fusion has gained a great deal of attention in recent years. It is used as an application tool in many different fields, especially the semiconductor, automotive, and medical industries. However, this field of research, regardless of the field of application, still presents different challenges concerning the choice of the sensors to be combined and the fusion architecture to be developed. To decrease application costs and engineering efforts, it is very important to analyze the sensors' data beforehand once the application target is defined. This pre-analysis is a basic step to establish a working environment with fewer misclassification cases and high safety. One promising approach to do so is to analyze the system using deep neural networks. The disadvantages of this approach are mainly the required huge storage capacity, the big training effort, and that these networks are difficult to interpret. In this paper, we focus on developing a smart and interpretable bi-functional artificial intelligence (AI) system, which has to discriminate the combined data regarding predefined classes. Furthermore, the system can evaluate the single source signals used in the classification task. The evaluation here covers each sensor contribution and robustness. More precisely, we train a smart and interpretable prototype-based neural network, which learns automatically to weight the influence of the sensors for the classification decision. Moreover, the prototype-based classifier is equipped with a reject option to measure classification certainty. To validate our approach's efficiency, we refer to different industrial sensor fusion applications.

Subject(s)

Artificial Intelligence , Neural Networks, Computer , Decision Making

5.

Application of an interpretable classification model on Early Folding Residues during protein folding.

Bittrich, Sebastian; Kaden, Marika; Leberecht, Christoph; Kaiser, Florian; Villmann, Thomas; Labudde, Dirk.

BioData Min ; 12: 1, 2019.

Article in English | MEDLINE | ID: mdl-30627219

ABSTRACT

BACKGROUND: Machine learning strategies are prominent tools for data analysis. Especially in life sciences, they have become increasingly important to handle the growing datasets collected by the scientific community. Meanwhile, algorithms improve in performance, but also gain complexity, and tend to neglect interpretability and comprehensiveness of the resulting models. RESULTS: Generalized Matrix Learning Vector Quantization (GMLVQ) is a supervised, prototype-based machine learning method and provides comprehensive visualization capabilities not present in other classifiers which allow for a fine-grained interpretation of the data. In contrast to commonly used machine learning strategies, GMLVQ is well-suited for imbalanced classification problems which are frequent in life sciences. We present a Weka plug-in implementing GMLVQ. The feasibility of GMLVQ is demonstrated on a dataset of Early Folding Residues (EFR) that have been shown to initiate and guide the protein folding process. Using 27 features, an area under the receiver operating characteristic of 76.6% was achieved which is comparable to other state-of-the-art classifiers. The obtained model is accessible at https://biosciences.hs-mittweida.de/efpred/. CONCLUSIONS: The application on EFR prediction demonstrates how an easy interpretation of classification models can promote the comprehension of biological mechanisms. The results shed light on the special features of EFR which were reported as most influential for the classification: EFR are embedded in ordered secondary structure elements and they participate in networks of hydrophobic residues. Visualization capabilities of GMLVQ are presented as we demonstrate how to interpret the results.
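
The interpretability of GMLVQ rests on its adaptive distance, d(x, w) = (x - w)^T Λ (x - w) with Λ = Ω^T Ω, where the diagonal of Λ is commonly read as per-feature relevance. The sketch below only evaluates this distance for a fixed, hand-picked Ω; it does not train a model and is unrelated to the Weka plug-in mentioned in the abstract. The matrix values and function names are hypothetical.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def gmlvq_distance(omega, x, w):
    """d(x, w) = (x - w)^T Omega^T Omega (x - w), computed as |Omega (x - w)|^2."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    return sum(p * p for p in matvec(omega, diff))


def relevance_matrix(omega):
    """Lambda = Omega^T Omega; its diagonal indicates feature relevance."""
    n = len(omega[0])
    return [[sum(row[i] * row[j] for row in omega) for j in range(n)]
            for i in range(n)]


# Hypothetical Omega: the first feature matters much more than the second.
omega = [[1.0, 0.0],
         [0.0, 0.1]]
lam = relevance_matrix(omega)
prototype = [0.0, 0.0]

d_first = gmlvq_distance(omega, [1.0, 0.0], prototype)   # shift in feature 1
d_second = gmlvq_distance(omega, [0.0, 1.0], prototype)  # shift in feature 2
print([lam[i][i] for i in range(2)], d_first, d_second)
```

A unit shift along a low-relevance feature changes the distance far less than the same shift along a high-relevance one, which is how the learned matrix exposes which features drive the classification.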

6.

Clustering by fuzzy neural gas and evaluation of fuzzy clusters.

Geweniger, Tina; Fischer, Lydia; Kaden, Marika; Lange, Mandy; Villmann, Thomas.

Comput Intell Neurosci ; 2013: 165248, 2013.

Article in English | MEDLINE | ID: mdl-24396342

ABSTRACT

We consider some modifications of the neural gas algorithm. First, fuzzy assignments as known from fuzzy c-means and neighborhood cooperativeness as known from self-organizing maps and neural gas are combined to obtain a basic Fuzzy Neural Gas. Further, a kernel variant and a simulated annealing approach are derived. Finally, we introduce a fuzzy extension of the ConnIndex to obtain an evaluation measure for clusterings based on fuzzy vector quantization.

Subject(s)

Algorithms , Fuzzy Logic , Neural Networks, Computer , Cluster Analysis
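
The two ingredients named in this abstract can be sketched separately: fuzzy assignments as in fuzzy c-means and the rank-based neighborhood of neural gas. How the paper combines them into Fuzzy Neural Gas is not reproduced here. The fuzzifier m, the neighborhood range lam, and the prototype positions are assumed values, and the function names are hypothetical.

```python
from math import exp


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def fcm_memberships(x, prototypes, m=2.0):
    """Fuzzy c-means memberships of x; they sum to one over all prototypes."""
    d = [sq_dist(x, w) for w in prototypes]
    if any(di == 0.0 for di in d):
        # x coincides with one or more prototypes: share membership among them.
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    power = 1.0 / (m - 1.0)  # exponent for squared distances
    return [1.0 / sum((di / dk) ** power for dk in d) for di in d]


def ng_neighborhood(x, prototypes, lam=1.0):
    """Neural gas neighborhood exp(-rank / lam); the winner has rank 0."""
    d = [sq_dist(x, w) for w in prototypes]
    order = sorted(range(len(d)), key=lambda i: d[i])
    rank = {i: r for r, i in enumerate(order)}
    return [exp(-rank[i] / lam) for i in range(len(d))]


prototypes = [[0.0, 0.0], [2.0, 0.0], [5.0, 0.0]]
x = [0.5, 0.0]
u = fcm_memberships(x, prototypes)
h = ng_neighborhood(x, prototypes)
print(u, h)
```

Memberships depend on the actual distances, whereas the neural gas neighborhood depends only on the distance ranking of the prototypes.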
