Search | VHL Regional Portal

Results 1 - 10 of 10

Predicting hosts and cross-species transmission of Streptococcus agalactiae by interpretable machine learning.

Ren, Yunxiao; Li, Carmen; Nanayakkara Sapugahawatte, Dulmini; Zhu, Chendi; Spänig, Sebastian; Jamrozy, Dorota; Rothen, Julian; Daubenberger, Claudia A; Bentley, Stephen D; Ip, Margaret; Heider, Dominik.

Comput Biol Med ; 171: 108185, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38401454

ABSTRACT

BACKGROUND: Streptococcus agalactiae, commonly known as Group B Streptococcus (GBS), exhibits a broad host range, manifesting as both a beneficial commensal and an opportunistic pathogen across various species. In humans, it poses significant risks, causing neonatal sepsis and meningitis, along with severe infections in adults. Additionally, it impacts livestock by inducing mastitis in bovines and contributing to epidemic mortality in fish populations. Despite its wide host spectrum, the mechanisms enabling GBS to adapt to specific hosts remain inadequately elucidated. Therefore, the development of a rapid and accurate method to differentiate GBS strains associated with particular animal hosts based on genome-wide information holds immense potential. Such a tool would not only bolster the identification and containment efforts during GBS outbreaks but also deepen our comprehension of the bacteria's host adaptations spanning humans, livestock, and other natural animal reservoirs. METHODS AND RESULTS: Here, we developed three machine learning models, random forest (RF), logistic regression (LR), and support vector machine (SVM), based on genome-wide mutation data. These models enabled precise prediction of the host origin of GBS, accurately distinguishing between human, bovine, fish, and pig hosts. Moreover, we conducted an interpretable machine learning analysis using SHapley Additive exPlanations (SHAP) and variant annotation to uncover the most influential genomic features and associated genes for each host. Additionally, by meticulously examining misclassified samples, we gained valuable insights into the dynamics of host transmission and the potential for zoonotic infections. CONCLUSIONS: Our study underscores the effectiveness of random forest (RF) and logistic regression (LR) models based on mutation data for accurately predicting GBS host origins. Additionally, we identify the key features associated with each GBS host, thereby enhancing our understanding of the bacteria's host-specific adaptations.

Subject(s)

Streptococcal Infections , Streptococcus agalactiae , Female , Adult , Animals , Humans , Cattle , Swine , Streptococcus agalactiae/genetics , Streptococcal Infections/veterinary , Genomics , Fishes , Machine Learning

Unsupervised encoding selection through ensemble pruning for biomedical classification.

Spänig, Sebastian; Michel, Alexander; Heider, Dominik.

BioData Min ; 16(1): 10, 2023 Mar 16.

Article in English | MEDLINE | ID: mdl-36927546

ABSTRACT

BACKGROUND: Owing to the rising levels of multi-resistant pathogens, antimicrobial peptides, an alternative strategy to classic antibiotics, have gained more attention. A crucial part is the costly identification and validation. With the ever-growing amount of annotated peptides, researchers leverage artificial intelligence to circumvent the cumbersome, wet-lab-based identification and automate the detection of promising candidates. However, the prediction of a peptide's function is not limited to antimicrobial efficiency. To date, multiple studies have successfully classified additional properties, e.g., antiviral or cell-penetrating effects. In this light, ensemble classifiers are employed to further improve the prediction. Although we recently presented a workflow to significantly narrow the initial encoding choice, an entirely unsupervised encoding selection, considering various machine learning models, is still lacking. RESULTS: We developed a workflow that automatically selects encodings and generates classifier ensembles by employing sophisticated pruning methods. We observed that Pareto frontier pruning is a good method to create encoding ensembles for the datasets at hand. In addition, encodings combined with the Decision Tree classifier as the base model are often superior. However, our results also demonstrate that none of the ensemble building techniques is outstanding for all datasets. CONCLUSION: The workflow conducts multiple pruning methods to evaluate ensemble classifiers composed from a wide range of peptide encodings and base models. Consequently, researchers can use the workflow for unsupervised encoding selection and ensemble creation. Ultimately, the extensible workflow can be used as a plugin for the PEPTIDE REACToR, further establishing it as a versatile tool in the domain.

A parametric approach for molecular encodings using multilevel atomic neighborhoods applied to peptide classification.

Hattab, Georges; Anzel, Aleksandar; Spänig, Sebastian; Neumann, Nils; Heider, Dominik.

NAR Genom Bioinform ; 5(1): lqac103, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36632611

ABSTRACT

Exploring new ways to represent and discover organic molecules is critical to the development of new therapies. Fingerprinting algorithms are used to encode or machine-read organic molecules. Molecular encodings facilitate the computation of distance and similarity measurements to support tasks such as similarity search or virtual screening. Motivated by the ubiquity of carbon and the emerging structured patterns, we propose a parametric approach for molecular encodings using carbon-based multilevel atomic neighborhoods. It implements a walk along the carbon chain of a molecule to compute different representations of the neighborhoods in the form of a binary or numerical array that can later be exported into an image. Applied to the task of binary peptide classification, the evaluation was performed by using forty-nine encodings of twenty-nine data sets from various biomedical fields, resulting in well over 1421 machine learning models. By design, the parametric approach is domain- and task-agnostic and scopes all organic molecules including unnatural and exotic amino acids as well as cyclic peptides. Applied to peptide classification, our results point to a number of promising applications and extensions. The parametric approach was developed as a Python package (cmangoes), the source code and documentation of which can be found at https://github.com/ghattab/cmangoes and https://doi.org/10.5281/zenodo.7483771.

Multivalent binding kinetics resolved by fluorescence proximity sensing.

Schulte, Clemens; Soldà, Alice; Spänig, Sebastian; Adams, Nathan; Bekic, Ivana; Streicher, Werner; Heider, Dominik; Strasser, Ralf; Maric, Hans Michael.

Commun Biol ; 5(1): 1070, 2022 Oct 07.

Article in English | MEDLINE | ID: mdl-36207490

ABSTRACT

Multivalent protein interactors are an attractive modality for probing protein function and exploring novel pharmaceutical strategies. The throughput and precision of state-of-the-art methodologies and workflows for the effective development of multivalent binders are currently limited by surface immobilization, fluorescent labelling and sample consumption. Using the gephyrin protein, the master regulator of the inhibitory synapse, as a benchmark, we exemplify the application of fluorescence proximity sensing (FPS) for the systematic kinetic and thermodynamic optimization of multivalent peptide architectures. High-throughput synthesis of more than 100 peptides with varying combinatorial dimeric, tetrameric, and octameric architectures, combined with direct FPS measurements, resolved on-rates, off-rates, and dissociation constants with high accuracy and low sample consumption compared to three complementary technologies. The dataset and its machine learning-based analysis deciphered the relationship between specific architectural features and binding kinetics and thereby identified binders with unprecedented protein inhibition capacity, thus highlighting the value of FPS for the rational engineering of multivalent inhibitors.

Subject(s)

Peptides , Fluorescence , Kinetics , Pharmaceutical Preparations , Thermodynamics

A multi-omics study on quantifying antimicrobial resistance in European freshwater lakes.

Spänig, Sebastian; Eick, Lisa; Nuy, Julia K; Beisser, Daniela; Ip, Margaret; Heider, Dominik; Boenigk, Jens.

Environ Int ; 157: 106821, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34403881

ABSTRACT

The surveillance of wastewater for the Covid-19 virus during this unprecedented pandemic, mapped in near real-time to the distribution and magnitude of the infected in the population, exemplifies the importance of tracking rapidly changing trends of pathogens or public health problems at a large scale. The rising trends of antimicrobial resistance (AMR) with multidrug-resistant pathogens from environmental water have similarly gained much attention in recent years. Wastewater-based epidemiology from water samples has shown that a wide range of AMR-related genes is frequently detected. Although sewage is treated before release, and the abundance of pathogens should thus be significantly reduced or the water even be pathogen-free, several studies indicated the contrary. Pathogens are still measurable in the released water, ultimately entering freshwaters, such as rivers and lakes. Furthermore, socio-economic and environmental factors, such as chemical industries and animal farming nearby, impact the presence of AMR. Many bacterial species from the environment are intrinsically resistant and also contribute to the resistome of freshwater lakes. This study collected the most extensive standardized freshwater data set from hundreds of European lakes and conducted a comprehensive multi-omics analysis on antimicrobial resistance from these freshwater lakes. Our research shows that genes encoding AMR against tetracyclines, cephalosporins, and quinolones were commonly identified, while for some, such as sulfonamides, resistance was less frequently present. We provide an estimation of the characteristic resistance of AMR in European lakes, which can be used as a comprehensive resistome dataset to facilitate and monitor temporal changes in the development of AMR in European freshwater lakes.

Subject(s)

Anti-Bacterial Agents , COVID-19 , Animals , Anti-Bacterial Agents/pharmacology , Drug Resistance, Bacterial , Humans , Lakes , SARS-CoV-2

A large-scale comparative study on peptide encodings for biomedical classification.

Spänig, Sebastian; Mohsen, Siba; Hattab, Georges; Hauschild, Anne-Christin; Heider, Dominik.

NAR Genom Bioinform ; 3(2): lqab039, 2021 Jun.

Article in English | MEDLINE | ID: mdl-34046590

ABSTRACT

Owing to the great variety of distinct peptide encodings, working on a biomedical classification task at hand is challenging. Researchers have to determine encodings capable of representing underlying patterns as numerical input for the subsequent machine learning. A general guideline is lacking in the literature; thus, we present here the first large-scale comprehensive study to investigate the performance of a wide range of encodings on multiple datasets from different biomedical domains. For the sake of completeness, we added additional sequence- and structure-based encodings. In particular, we collected 50 biomedical datasets and defined a fixed parameter space for 48 encoding groups, leading to a total of 397 700 encoded datasets. Our results demonstrate that none of the encodings are superior for all biomedical domains. Nevertheless, some encodings often outperform others, thus reducing the initial encoding selection substantially. Our work enables researchers to objectively compare novel encodings to the state of the art. Our findings pave the way for a more sophisticated encoding optimization, for example, as part of automated machine learning pipelines. The work presented here is implemented as a large-scale, end-to-end workflow designed for easy reproducibility and extensibility. All standardized datasets and results are available for download to comply with FAIR standards.

The virtual doctor: An interactive clinical-decision-support system based on deep learning for non-invasive prediction of diabetes.

Spänig, Sebastian; Emberger-Klein, Agnes; Sowa, Jan-Peter; Canbay, Ali; Menrad, Klaus; Heider, Dominik.

Artif Intell Med ; 100: 101706, 2019 Sep.

Article in English | MEDLINE | ID: mdl-31607340

ABSTRACT

Artificial intelligence (AI) will pave the way to a new era in medicine. However, currently available AI systems do not interact with a patient, e.g., for anamnesis, and thus are only used by physicians for predictions in diagnosis or prognosis. Nevertheless, these systems are widely used, e.g., in diabetes or cancer prediction. In the current study, we developed an AI (virtual doctor) that uses a speech recognition and speech synthesis system and thus can autonomously interact with the patient, which is particularly important for, e.g., rural areas, where the availability of primary medical care is strongly limited by low population densities. As a proof-of-concept, the system is able to predict type 2 diabetes mellitus (T2DM) based on non-invasive sensors and deep neural networks. Moreover, the system provides an easy-to-interpret probability estimation for T2DM for a given patient. Besides the development of the AI, we further analyzed the acceptance of AI in healthcare among young people to estimate the impact of such a system in the future.

Subject(s)

Decision Support Systems, Clinical , Deep Learning , Diabetes Mellitus, Type 2/diagnosis , User-Computer Interface , Artificial Intelligence , Body Height , Body Mass Index , Body Weight , Female , Humans , Male , Middle Aged , Neural Networks, Computer , Probability , Speech Recognition Software , Surveys and Questionnaires , Waist Circumference

Males, the Wrongly Neglected Partners of the Biologically Unprecedented Male-Female Interaction of Schistosomes.

Lu, Zhigang; Spänig, Sebastian; Weth, Oliver; Grevelding, Christoph G.

Front Genet ; 10: 796, 2019.

Article in English | MEDLINE | ID: mdl-31552097

ABSTRACT

Schistosomes are the only platyhelminths that have evolved separate sexes, and they exhibit a unique reproductive biology because the female's sexual maturation depends on a constant pairing contact with the male. In the female, pairing leads to gonad differentiation, which is associated with substantial morphological changes, and controls, among other things, the expression of gonad-associated genes. In the male, no morphological changes have been observed after pairing, although first data indicated an effect of pairing on gene transcription. Comprehensive transcriptomic approaches have revealed an unexpectedly high number of genes that are differentially transcribed in the male after pairing. Their identities suggest roles for the male that are not restricted to feeding and enhanced muscular power to transport the paired female and, as assumed before, to induce its sexual maturation by one "magic" factor. Instead, a more complex picture emerges in which both partners live in a reciprocal sender-recipient relationship that not only affects the gonads of both genders but may also involve tactile stimuli, transforming growth factor β signaling, nutritional parts, and neuronal processes, including neuropeptides and G protein-coupled receptor signaling. This review provides a summary of transcriptomics including an overview of genes expressed in a pairing-dependent manner in schistosome males. This may stimulate further research in understanding the role of the male as the recipient of the female's signals upon pairing, the male's "capacitation," and its subsequent competence as a sender of information. The latter process finally transforms a sexually immature, autonomous female without completely developed gonads into a sexually mature, partially non-autonomous female with fully differentiated gonads and enormous egg production capacity.

Encodings and models for antimicrobial peptide classification for multi-resistant pathogens.

Spänig, Sebastian; Heider, Dominik.

BioData Min ; 12: 7, 2019.

Article in English | MEDLINE | ID: mdl-30867681

ABSTRACT

Antimicrobial peptides (AMPs) are part of the inherent immune system. In fact, they occur in almost all organisms including, e.g., plants, animals, and humans. Remarkably, they are also effective against multi-resistant pathogens with a high selectivity. This is especially crucial at a time when society is faced with the major threat of an ever-increasing number of antibiotic-resistant microbes. In addition, AMPs can also exhibit antitumor and antiviral effects; thus, a variety of scientific studies have dealt with the prediction of active peptides in recent years. Due to their potential, even the pharmaceutical industry is keen on discovering and developing novel AMPs. However, AMPs are difficult to verify in vitro; hence, researchers conduct sequence similarity experiments against known, active peptides. Unfortunately, this approach is very time-consuming and limits potential candidates to sequences with a high similarity to known AMPs. Machine learning methods offer the opportunity to explore the huge space of sequence variations in a timely manner. These algorithms have, in principle, paved the way for an automated discovery of AMPs. However, machine learning models require a numerical input; thus, an informative encoding is very important. Unfortunately, developing an appropriate encoding is a major challenge, which has not been entirely solved so far. For this reason, the development of novel amino acid encodings is established as a stand-alone research branch. The present review introduces state-of-the-art encodings of amino acids as well as their properties in sequence- and structure-based aggregation. Moreover, although a well-chosen encoding is essential, performant classifiers are required, which is reflected by a tendency towards specifically designed models in the literature. Furthermore, we introduce these models with a particular focus on encodings derived from support vector machines and deep learning approaches. Although a strong focus has been set on AMP predictions, not all of the mentioned encodings have been elaborated as part of antimicrobial research studies, but rather as general protein or peptide representations.

EDGAR 2.0: an enhanced software platform for comparative gene content analyses.

Blom, Jochen; Kreis, Julian; Spänig, Sebastian; Juhre, Tobias; Bertelli, Claire; Ernst, Corinna; Goesmann, Alexander.

Nucleic Acids Res ; 44(W1): W22-8, 2016 Jul 08.

Article in English | MEDLINE | ID: mdl-27098043

ABSTRACT

The rapidly increasing availability of microbial genome sequences has led to a growing demand for bioinformatics software tools that support functional analysis based on the comparison of closely related genomes. By utilizing comparative approaches at the gene level, it is possible to gain insights into the core genes, which represent the set of shared features for a set of organisms under study. Conversely, singleton genes can be identified to elucidate the specific properties of an individual genome. Since its initial publication, the EDGAR platform has become one of the most established software tools in the field of comparative genomics. Over the last years, the software has been continuously improved and a large number of new analysis features have been added. For the new version, EDGAR 2.0, the gene orthology estimation approach was newly designed and completely re-implemented. Among other new features, EDGAR 2.0 provides extended phylogenetic analysis features like AAI (Average Amino Acid Identity) and ANI (Average Nucleotide Identity) matrices, genome set size statistics and modernized visualizations like interactive synteny plots or Venn diagrams. Thereby, the software supports a quick and user-friendly survey of evolutionary relationships between microbial genomes and simplifies the process of obtaining new biological insights into their differential gene content. All features are offered to the scientific community via a web-based and therefore platform-independent user interface, which allows easy browsing of precomputed datasets. The web server is accessible at http://edgar.computational.bio.

Subject(s)

Computational Biology/statistics & numerical data , Microbial Consortia/genetics , Software , Computational Biology/methods , Conserved Sequence , Databases, Genetic , Datasets as Topic , Internet , Phylogeny , Species Specificity , Synteny
