1.
Bioinformatics ; 40(Supplement_1): i501-i510, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940158

ABSTRACT

MOTIVATION: In many biomedical applications, we are confronted with paired groups of samples, such as treated versus control. The aim is to detect discriminating features, i.e. biomarkers, based on high-dimensional (omics-) data. This problem can be phrased more generally as a two-sample problem requiring statistical significance testing to establish differences, and interpretations to identify distinguishing features. The multivariate maximum mean discrepancy (MMD) test quantifies group-level differences, whereas statistically significantly associated features are usually found by univariate feature selection. Currently, few general-purpose methods simultaneously perform multivariate feature selection and two-sample testing. RESULTS: We introduce a sparse, interpretable, and optimized MMD test (SpInOpt-MMD) that enables two-sample testing and feature selection in the same experiment. SpInOpt-MMD is a versatile method and we demonstrate its application to a variety of synthetic and real-world data types including images, gene expression measurements, and text data. SpInOpt-MMD is effective in identifying relevant features in small sample sizes and outperforms other feature selection methods such as SHapley Additive exPlanations and univariate association analysis in several experiments. AVAILABILITY AND IMPLEMENTATION: The code and links to our public data are available at https://github.com/BorgwardtLab/spinoptmmd.
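
The kernel two-sample test mentioned in this abstract can be illustrated with a minimal sketch. This is not the SpInOpt-MMD implementation (see the repository above for that); it is a plain, non-sparse MMD permutation test. It assumes a Gaussian kernel with a fixed bandwidth parameter `gamma`, and all function names are hypothetical.

```python
import math
import random


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mmd2_biased(xs, ys, gamma=1.0):
    """Biased estimate of the squared MMD between samples xs and ys."""
    k_xx = sum(rbf_kernel(a, b, gamma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, gamma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, gamma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def mmd_permutation_test(xs, ys, n_permutations=200, gamma=1.0, seed=0):
    """Return (observed statistic, permutation p-value)."""
    rng = random.Random(seed)
    observed = mmd2_biased(xs, ys, gamma)
    pooled = list(xs) + list(ys)
    exceed = 0
    for _ in range(n_permutations):
        # Shuffling the pooled samples simulates the null hypothesis
        # that group labels carry no information.
        rng.shuffle(pooled)
        stat = mmd2_biased(pooled[:len(xs)], pooled[len(xs):], gamma)
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_permutations + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    control = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(30)]
    treated = [[rng.gauss(3.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(30)]
    stat, p_value = mmd_permutation_test(control, treated)
    print(f"MMD^2 = {stat:.3f}, p = {p_value:.3f}")
```

In the toy data only the first feature differs between groups. A plain MMD test reports a group-level difference but does not say which feature drives it, which is the gap the abstract describes.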


Subject(s)
Biomarkers , Humans , Algorithms , Computational Biology/methods
2.
Bioinformatics ; 40(Supplement_1): i247-i256, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940165

ABSTRACT

MOTIVATION: Acute kidney injury (AKI) is a syndrome that affects a large fraction of all critically ill patients, and early diagnosis, which is needed for patients to receive adequate treatment, is as imperative as it is challenging. Consequently, machine learning approaches have been developed to predict AKI ahead of time. However, the prevalence of AKI is often underestimated in state-of-the-art approaches, as they rely on an AKI event annotation solely based on creatinine, ignoring urine output. We construct and evaluate early warning systems for AKI in a multi-disciplinary ICU setting, using the complete KDIGO definition of AKI. We propose several variants of gradient-boosted decision tree (GBDT)-based models, including a novel time-stacking based approach. A state-of-the-art LSTM-based model previously proposed for AKI prediction, which has not yet been specifically evaluated in ICU settings, is used as a comparison. RESULTS: We find that optimal performance is achieved by using GBDT with the time-based stacking technique (AUPRC = 65.7%, compared with the LSTM-based model's AUPRC = 62.6%), which is motivated by the high relevance of time since ICU admission for this task. Both models show mildly reduced performance in the limited training data setting, perform fairly across different subcohorts, and exhibit no issues in gender transfer. Following the official KDIGO definition substantially increases the number of annotated AKI events. In our study GBDTs outperform LSTM models for AKI prediction. Generally, we find that both model types are robust in a variety of challenging settings arising for ICU data. AVAILABILITY AND IMPLEMENTATION: The code to reproduce the findings of our manuscript can be found at: https://github.com/ratschlab/AKI-EWS.


Subject(s)
Acute Kidney Injury , Intensive Care Units , Humans , Machine Learning , Male , Female , Decision Trees , Aged , Middle Aged
3.
Nat Commun ; 15(1): 5034, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38866791

ABSTRACT

Functionally relevant coronary artery disease (fCAD) can result in premature death or nonfatal acute myocardial infarction. Its early detection is a fundamentally important task in medicine. Classical detection approaches suffer from limited diagnostic accuracy or expose patients to possibly harmful radiation. Here we show how machine learning (ML) can outperform cardiologists in predicting the presence of stress-induced fCAD in terms of area under the receiver operating characteristic (AUROC: 0.71 vs. 0.64, p = 4.0E-13). We present two ML approaches, the first using eight static clinical variables, whereas the second leverages electrocardiogram signals from exercise stress testing. At a target post-test probability for fCAD of <15%, ML facilitates a potential reduction of imaging procedures by 15-17% compared to the cardiologist's judgement. Predictive performance is validated on an internal temporal data split as well as externally. We also show that combining clinical judgement with conventional ML and deep learning using logistic regression results in a mean AUROC of 0.74.


Subject(s)
Coronary Artery Disease , Electrocardiography , Exercise Test , Machine Learning , ROC Curve , Humans , Coronary Artery Disease/diagnosis , Coronary Artery Disease/diagnostic imaging , Male , Female , Middle Aged , Exercise Test/methods , Aged , Area Under Curve , Logistic Models
4.
Science ; 383(6680): eadg7942, 2024 01 19.
Article in English | MEDLINE | ID: mdl-38236961

ABSTRACT

Long Covid is a debilitating condition of unknown etiology. We performed multimodal proteomics analyses of blood serum from COVID-19 patients followed up to 12 months after confirmed severe acute respiratory syndrome coronavirus 2 infection. Analysis of >6500 proteins in 268 longitudinal samples revealed dysregulated activation of the complement system, an innate immune protection and homeostasis mechanism, in individuals experiencing Long Covid. Thus, active Long Covid was characterized by terminal complement system dysregulation and ongoing activation of the alternative and classical complement pathways, the latter associated with increased antibody titers against several herpesviruses possibly stimulating this pathway. Moreover, markers of hemolysis, tissue injury, platelet activation, and monocyte-platelet aggregates were increased in Long Covid. Machine learning confirmed complement and thromboinflammatory proteins as top biomarkers, warranting diagnostic and therapeutic interrogation of these systems.


Subject(s)
Complement Activation , Complement System Proteins , Post-Acute COVID-19 Syndrome , Proteome , Thromboinflammation , Humans , Complement System Proteins/analysis , Complement System Proteins/metabolism , Post-Acute COVID-19 Syndrome/blood , Post-Acute COVID-19 Syndrome/complications , Post-Acute COVID-19 Syndrome/immunology , Thromboinflammation/blood , Thromboinflammation/immunology , Biomarkers/blood , Proteomics , Male , Female , Young Adult , Adult , Middle Aged , Aged
5.
Stem Cell Reports ; 19(2): 285-298, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38278155

ABSTRACT

Reproducible functional assays to study in vitro neuronal networks represent an important cornerstone in the quest to develop physiologically relevant cellular models of human diseases. Here, we introduce DeePhys, a MATLAB-based analysis tool for data-driven functional phenotyping of in vitro neuronal cultures recorded by high-density microelectrode arrays. DeePhys is a modular workflow that offers a range of techniques to extract features from spike-sorted data, allowing for the examination of functional phenotypes both at the individual cell and network levels, as well as across development. In addition, DeePhys incorporates the capability to integrate novel features and to use machine-learning-assisted approaches, which facilitates a comprehensive evaluation of pharmacological interventions. To illustrate its practical application, we apply DeePhys to human induced pluripotent stem cell-derived dopaminergic neurons obtained from both patients and healthy individuals and showcase how DeePhys enables phenotypic screenings.


Subject(s)
Induced Pluripotent Stem Cells , Humans , Microelectrodes , Dopaminergic Neurons , Electrophysiological Phenomena , Action Potentials/physiology
6.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-38001023

ABSTRACT

MOTIVATION: Large-scale clinical proteomics datasets of infectious pathogens, combined with antimicrobial resistance outcomes, have recently opened the door for machine learning models which aim to improve clinical treatment by predicting resistance early. However, existing prediction frameworks typically train a separate model for each antimicrobial and species in order to predict a pathogen's resistance outcome, resulting in missed opportunities for chemical knowledge transfer and generalizability. RESULTS: We demonstrate the effectiveness of multimodal learning over proteomic and chemical features by exploring two clinically relevant tasks for our proposed deep learning models: drug recommendation and generalized resistance prediction. By adopting this multi-view representation of the pathogenic samples and leveraging the scale of the available datasets, our models outperformed the previous single-drug and single-species predictive models by statistically significant margins. We extensively validated the multi-drug setting, highlighting the challenges in generalizing beyond the training data distribution, and quantitatively demonstrate how suitable representations of antimicrobial drugs constitute a crucial tool in the development of clinically relevant predictive models. AVAILABILITY AND IMPLEMENTATION: The code used to produce the results presented in this article is available at https://github.com/BorgwardtLab/MultimodalAMR.


Subject(s)
Anti-Bacterial Agents , Proteomics , Drug Resistance, Bacterial , Machine Learning
7.
EClinicalMedicine ; 62: 102124, 2023 Aug.
Article in English | MEDLINE | ID: mdl-37588623

ABSTRACT

Background: When sepsis is detected, organ damage may have progressed to irreversible stages, leading to poor prognosis. The use of machine learning for predicting sepsis early has shown promise, however international validations are missing. Methods: This was a retrospective, observational, multi-centre cohort study. We developed and externally validated a deep learning system for the prediction of sepsis in the intensive care unit (ICU). Our analysis represents the first international, multi-centre in-ICU cohort study for sepsis prediction using deep learning to our knowledge. Our dataset contains 136,478 unique ICU admissions, representing a refined and harmonised subset of four large ICU databases comprising data collected from ICUs in the US, the Netherlands, and Switzerland between 2001 and 2016. Using the international consensus definition Sepsis-3, we derived hourly-resolved sepsis annotations, amounting to 25,694 (18.8%) patient stays with sepsis. We compared our approach to clinical baselines as well as machine learning baselines and performed an extensive internal and external statistical validation within and across databases, reporting area under the receiver-operating-characteristic curve (AUC). Findings: Averaged over sites, our model was able to predict sepsis with an AUC of 0.846 (95% confidence interval [CI], 0.841-0.852) on a held-out validation cohort internal to each site, and an AUC of 0.761 (95% CI, 0.746-0.770) when validating externally across sites. Given access to a small fine-tuning set (10% per site), the transfer to target sites was improved to an AUC of 0.807 (95% CI, 0.801-0.813). Our model raised 1.4 false alerts per true alert and detected 80% of the septic patients 3.7 h (95% CI, 3.0-4.3) prior to the onset of sepsis, opening a vital window for intervention. 
Interpretation: By monitoring clinical and laboratory measurements in a retrospective simulation of a real-time prediction scenario, a deep learning system for the detection of sepsis generalised to previously unseen ICU cohorts, internationally. Funding: This study was funded by the Personalized Health and Related Technologies (PHRT) strategic focus area of the ETH domain.
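
Two of the figures reported in this abstract can be related by simple arithmetic, sketched below. The sketch assumes that "1.4 false alerts per true alert" is an alert-level ratio, so that it converts directly into precision (positive predictive value); the function names are hypothetical.

```python
def precision_from_false_alert_ratio(false_alerts_per_true_alert):
    """Convert 'false alerts per true alert' into precision (PPV).

    With r false alerts for every true alert, the fraction of alerts
    that are true is 1 / (1 + r).
    """
    return 1.0 / (1.0 + false_alerts_per_true_alert)


def prevalence(n_positive, n_total):
    """Fraction of stays that carry the positive label."""
    return n_positive / n_total


if __name__ == "__main__":
    # 1.4 false alerts per true alert, as reported in the abstract.
    print(f"precision  = {precision_from_false_alert_ratio(1.4):.3f}")
    # 25,694 sepsis stays out of 136,478 ICU admissions.
    print(f"prevalence = {prevalence(25694, 136478):.3f}")
```

Under that assumption, roughly 42% of raised alerts are true alerts, against a sepsis prevalence of about 18.8% of stays.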

8.
Bioinformatics ; 39(39 Suppl 1): i523-i533, 2023 06 30.
Article in English | MEDLINE | ID: mdl-37387173

ABSTRACT

MOTIVATION: Complex phenotypes, such as many common diseases and morphological traits, are controlled by multiple genetic factors, namely genetic mutations and genes, and are influenced by environmental conditions. Deciphering the genetics underlying such traits requires a systemic approach, where many different genetic factors and their interactions are considered simultaneously. Many association mapping techniques available nowadays follow this reasoning, but have some severe limitations. In particular, they require binary encodings for the genetic markers, forcing the user to decide beforehand whether to use, e.g. a recessive or a dominant encoding. Moreover, most methods cannot include any biological prior or are limited to testing only lower-order interactions among genes for association with the phenotype, potentially missing a large number of marker combinations. RESULTS: We propose HOGImine, a novel algorithm that expands the class of discoverable genetic meta-markers by considering higher-order interactions of genes and by allowing multiple encodings for the genetic variants. Our experimental evaluation shows that the algorithm has a substantially higher statistical power compared to previous methods, allowing it to discover genetic mutations statistically associated with the phenotype at hand that could not be found before. Our method can exploit prior biological knowledge on gene interactions, such as protein-protein interaction networks, genetic pathways, and protein complexes, to restrict its search space. Since computing higher-order gene interactions poses a high computational burden, we also develop a more efficient search strategy and support computation to make our approach applicable in practice, leading to substantial runtime improvements compared to state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION: Code and data are available at https://github.com/BorgwardtLab/HOGImine.
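
The choice between a recessive and a dominant encoding that this abstract refers to can be made concrete with a short sketch. It is not taken from HOGImine; it assumes genotypes are given as minor-allele counts (0, 1 or 2), and the function names are hypothetical.

```python
def encode_dominant(genotypes):
    """Binary encoding: 1 if at least one copy of the minor allele is present."""
    return [1 if g >= 1 else 0 for g in genotypes]


def encode_recessive(genotypes):
    """Binary encoding: 1 only if both copies are the minor allele."""
    return [1 if g == 2 else 0 for g in genotypes]


if __name__ == "__main__":
    # Minor-allele counts for five individuals at one marker.
    marker = [0, 1, 2, 1, 0]
    print("dominant :", encode_dominant(marker))
    print("recessive:", encode_recessive(marker))
```

The two encodings disagree exactly on heterozygous individuals (count 1), so a method restricted to one binary encoding must fix in advance how those individuals are treated.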


Subject(s)
Algorithms , Mutation , Phenotype , Protein Interaction Maps , Protein Interaction Mapping , Genome-Wide Association Study
9.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37285313

ABSTRACT

MOTIVATION: While the search for associations between genetic markers and complex traits has led to the discovery of tens of thousands of trait-related genetic variants, the vast majority of these only explain a small fraction of the observed phenotypic variation. One possible strategy to overcome this while leveraging biological prior is to aggregate the effects of several genetic markers and to test entire genes, pathways or (sub)networks of genes for association to a phenotype. The latter, network-based genome-wide association studies, in particular suffer from a vast search space and an inherent multiple testing problem. As a consequence, current approaches are either based on greedy feature selection, thereby risking that they miss relevant associations, or neglect doing a multiple testing correction, which can lead to an abundance of false positive findings. RESULTS: To address the shortcomings of current approaches of network-based genome-wide association studies, we propose networkGWAS, a computationally efficient and statistically sound approach to network-based genome-wide association studies using mixed models and neighborhood aggregation. It allows for population structure correction and for well-calibrated P-values, which are obtained through circular and degree-preserving network permutations. networkGWAS successfully detects known associations on diverse synthetic phenotypes, as well as known and novel genes in phenotypes from Saccharomyces cerevisiae and Homo sapiens. It thereby enables the systematic combination of gene-based genome-wide association studies with biological network information. AVAILABILITY AND IMPLEMENTATION: https://github.com/BorgwardtLab/networkGWAS.git.


Subject(s)
Genome-Wide Association Study , Population Groups , Humans , Genetic Markers , Phenotype , Polymorphism, Single Nucleotide
10.
Bioinformatics ; 39(6)2023 06 01.
Article in English | MEDLINE | ID: mdl-37220903

ABSTRACT

MOTIVATION: Developing new crop varieties with superior performance is highly important to ensure robust and sustainable global food security. The speed of variety development is limited by long field cycles and advanced generation selections in plant breeding programs. While methods to predict yield from genotype or phenotype data have been proposed, improved performance and integrated models are needed. RESULTS: We propose a machine learning model that leverages both genotype and phenotype measurements by fusing genetic variants with multiple data sources collected by unmanned aerial systems. We use a deep multiple instance learning framework with an attention mechanism that sheds light on the importance given to each input during prediction, enhancing interpretability. Our model reaches 0.754 ± 0.024 Pearson correlation coefficient when predicting yield in similar environmental conditions; a 34.8% improvement over the genotype-only linear baseline (0.559 ± 0.050). We further predict yield on new lines in an unseen environment using only genotypes, obtaining a prediction accuracy of 0.386 ± 0.010, a 13.5% improvement over the linear baseline. Our multi-modal deep learning architecture efficiently accounts for plant health and environment, distilling the genetic contribution and providing excellent predictions. Yield prediction algorithms leveraging phenotypic observations during training therefore promise to improve breeding programs, ultimately speeding up delivery of improved varieties. AVAILABILITY AND IMPLEMENTATION: Available at https://github.com/BorgwardtLab/PheGeMIL (code) and https://doi.org/doi:10.5061/dryad.kprr4xh5p (data).


Subject(s)
Deep Learning , Phenomics , Triticum/genetics , Plant Breeding/methods , Selection, Genetic , Phenotype , Genotype , Genomics/methods , Edible Grain/genetics
11.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36610707

ABSTRACT

SUMMARY: In many modern bioinformatics applications, such as statistical genetics, or single-cell analysis, one frequently encounters datasets which are orders of magnitude too large for conventional in-memory analysis. To tackle this challenge, we introduce SIMBSIG (SIMmilarity Batched Search Integrated GPU), a highly scalable Python package which provides a scikit-learn-like interface for out-of-core, GPU-enabled similarity searches, principal component analysis and clustering. Due to the PyTorch backend, it is highly modular and particularly tailored to many data types with a particular focus on biobank data analysis. AVAILABILITY AND IMPLEMENTATION: SIMBSIG is freely available from PyPI and its source code and documentation can be found on GitHub (https://github.com/BorgwardtLab/simbsig) under a BSD-3 license.
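
The out-of-core idea described in this abstract, searching data that does not fit in memory by processing it in batches, can be sketched as follows. This is not SIMBSIG's API and uses neither PyTorch nor a GPU; it is a standard-library illustration of a batched nearest-neighbour search, and the names `batched` and `knn_batched` are hypothetical.

```python
import heapq
import math


def batched(rows, batch_size):
    """Yield consecutive batches from any iterable of rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def knn_batched(query, rows, k, batch_size=2):
    """Return the k nearest (distance, index) pairs to `query`.

    Only one batch of `rows` is held in memory at a time, so `rows`
    can be a generator that streams records from disk.
    """
    best = []  # max-heap via negated distances, holds at most k entries
    index = 0
    for batch in batched(rows, batch_size):
        for row in batch:
            dist = math.dist(query, row)
            if len(best) < k:
                heapq.heappush(best, (-dist, index))
            elif -dist > best[0][0]:
                # Closer than the current worst neighbour: replace it.
                heapq.heapreplace(best, (-dist, index))
            index += 1
    return sorted((-neg, i) for neg, i in best)


if __name__ == "__main__":
    data = [[0.0, 0.0], [5.0, 5.0], [1.0, 0.0], [0.0, 2.0], [9.0, 9.0]]
    # iter(data) stands in for a stream that cannot be indexed or reloaded.
    print(knn_batched([0.0, 0.0], iter(data), k=2))
```

Memory use is bounded by the batch size plus `k`, independent of the number of rows, which is the property that makes this kind of search feasible on data sets that are too large for in-memory analysis.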


Subject(s)
Biological Specimen Banks , Software , Computational Biology , Documentation , Cluster Analysis
12.
Bioinformatics ; 38(Suppl 1): i101-i108, 2022 06 24.
Article in English | MEDLINE | ID: mdl-35758775

ABSTRACT

MOTIVATION: Sepsis is a leading cause of death and disability in children globally, accounting for ∼3 million childhood deaths per year. In pediatric sepsis patients, the multiple organ dysfunction syndrome (MODS) is considered a significant risk factor for adverse clinical outcomes characterized by high mortality and morbidity in the pediatric intensive care unit. The rapidly growing availability of electronic health records (EHRs) has allowed researchers to develop data-driven approaches such as machine learning in healthcare, with great success. However, effective machine learning models that can accurately predict, at an early stage, the recovery of pediatric sepsis patients from MODS to a mild state, and thus assist clinicians in the decision-making process, are still lacking. RESULTS: This study develops a machine learning-based approach to predict the recovery from MODS to zero or single organ dysfunction 1 week in advance in the Swiss Pediatric Sepsis Study (SPSS) cohort of children with blood-culture confirmed bacteremia. Our model achieves internal validation performance on the SPSS cohort with an area under the receiver operating characteristic (AUROC) of 79.1% and area under the precision-recall curve (AUPRC) of 73.6%, and it was also externally validated on another cohort of pediatric sepsis patients collected in the USA, yielding an AUROC of 76.4% and AUPRC of 72.4%. These results indicate that our model has the potential to be integrated into EHR systems and to contribute to patient assessment and triage in pediatric sepsis patient care. AVAILABILITY AND IMPLEMENTATION: Code available at https://github.com/BorgwardtLab/MODS-recovery. The data underlying this article are not publicly available, to protect the privacy of individuals who participated in the study. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Multiple Organ Failure , Sepsis , Child , Cohort Studies , Humans , Intensive Care Units, Pediatric , Multiple Organ Failure/diagnosis , Multiple Organ Failure/etiology , ROC Curve , Sepsis/complications , Sepsis/diagnosis
13.
Virus Evol ; 8(1): veac002, 2022.
Article in English | MEDLINE | ID: mdl-35310621

ABSTRACT

Transmission chains within small urban areas (accommodating ∼30 per cent of the European population) greatly contribute to case burden and economic impact during the ongoing coronavirus pandemic and should be a focus for preventive measures to achieve containment. Here, at very high spatio-temporal resolution, we analysed determinants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) transmission in a European urban area, Basel-City (Switzerland). We combined detailed epidemiological, intra-city mobility and socio-economic data sets with whole-genome sequencing during the first SARS-CoV-2 wave. For this, we succeeded in sequencing 44 per cent of all reported cases from Basel-City and performed phylogenetic clustering and compartmental modelling based on the dominating viral variant (B.1-C15324T; 60 per cent of cases) to identify drivers and patterns of transmission. Based on these results we simulated vaccination scenarios and corresponding healthcare system burden (intensive care unit (ICU) occupancy). Transmissions were driven by socio-economically weaker and highly mobile population groups with mostly cryptic transmissions which lacked genetic and identifiable epidemiological links. Among the more senior population, transmission was clustered. Simulated vaccination scenarios assuming 60-90 per cent transmission reduction and 70-90 per cent reduction of severe cases showed that prioritising mobile, socio-economically weaker populations for vaccination would effectively reduce case numbers. However, long-term ICU occupation would also be effectively reduced if senior population groups were prioritised, provided there were no changes in testing and prevention strategies. Reducing SARS-CoV-2 transmission through vaccination strongly depends on the efficacy of the deployed vaccine. A combined strategy of protecting risk groups by extensive testing coupled with vaccination of the drivers of transmission (i.e. highly mobile groups) would be most effective at reducing the spread of SARS-CoV-2 within an urban area.

14.
Nat Med ; 28(1): 164-174, 2022 01.
Article in English | MEDLINE | ID: mdl-35013613

ABSTRACT

Early use of effective antimicrobial treatments is critical for the outcome of infections and the prevention of treatment resistance. Antimicrobial resistance testing enables the selection of optimal antibiotic treatments, but current culture-based techniques can take up to 72 hours to generate results. We have developed a novel machine learning approach to predict antimicrobial resistance directly from matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) mass spectra profiles of clinical isolates. We trained calibrated classifiers on a newly created publicly available database of mass spectra profiles from the clinically most relevant isolates with linked antimicrobial susceptibility phenotypes. This dataset combines more than 300,000 mass spectra with more than 750,000 antimicrobial resistance phenotypes from four medical institutions. Validation on a panel of clinically important pathogens, including Staphylococcus aureus, Escherichia coli and Klebsiella pneumoniae, resulting in areas under the receiver operating characteristic curve of 0.80, 0.74 and 0.74, respectively, demonstrated the potential of using machine learning to substantially accelerate antimicrobial resistance determination and change of clinical management. Furthermore, a retrospective clinical case study of 63 patients found that implementing this approach would have changed the clinical treatment in nine cases, which would have been beneficial in eight cases (89%). MALDI-TOF mass spectra-based machine learning may thus be an important new tool for treatment optimization and antibiotic stewardship.


Subject(s)
Anti-Bacterial Agents/pharmacology , Drug Resistance, Microbial , Machine Learning , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods , Escherichia coli/drug effects , Humans , Klebsiella pneumoniae/drug effects , Microbial Sensitivity Tests , Retrospective Studies , Staphylococcus aureus/drug effects
15.
Bioinform Adv ; 2(1): vbac071, 2022.
Article in English | MEDLINE | ID: mdl-36699372

ABSTRACT

Motivation: With the steadily increasing abundance of omics data produced all over the world under vastly different experimental conditions residing in public databases, a crucial step in many data-driven bioinformatics applications is that of data integration. The challenge of batch-effect removal for entire databases lies in the large number of batches and biological variation, which can result in design matrix singularity. This problem cannot currently be solved satisfactorily by any common batch-correction algorithm. Results: We present reComBat, a regularized version of the empirical Bayes method to overcome this limitation and benchmark it against popular approaches for the harmonization of public gene-expression data (both microarray and bulk RNA-seq) of the human opportunistic pathogen Pseudomonas aeruginosa. Batch-effects are successfully mitigated while biologically meaningful gene-expression variation is retained. reComBat fills the gap in batch-correction approaches applicable to large-scale, public omics databases and opens up new avenues for data-driven analysis of complex biological processes beyond the scope of a single study. Availability and implementation: The code is available at https://github.com/BorgwardtLab/reComBat, all data and evaluation code can be found at https://github.com/BorgwardtLab/batchCorrectionPublicData. Supplementary information: Supplementary data are available at Bioinformatics Advances online.
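
The design matrix singularity mentioned in this abstract, and why regularization removes it, can be shown on a toy example. This is not the reComBat code. A ridge penalty is used purely as a stand-in, since the abstract does not say which regularizer reComBat applies, and the design matrix, data values and function names are hypothetical.

```python
def gram_and_moment(X, y):
    """Return X^T X and X^T y for a list-of-rows design matrix X."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return xtx, xty


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y; lam = 0 is ordinary least squares."""
    xtx, xty = gram_and_moment(X, y)
    for i in range(len(xtx)):
        xtx[i][i] += lam
    return solve(xtx, xty)


if __name__ == "__main__":
    # Columns: intercept, batch A indicator, batch B indicator.
    # The two indicators sum to the intercept, so X^T X is singular.
    X = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
    y = [1.0, 1.2, 2.0, 2.2]
    try:
        ridge_fit(X, y, lam=0.0)
    except ValueError as err:
        print("unregularized fit failed:", err)
    print("ridge coefficients:", ridge_fit(X, y, lam=0.1))
```

Adding `lam` to the diagonal makes the system solvable even though the indicator columns are collinear with the intercept. With many batches and covariates in a real database, such collinearities are much harder to avoid by hand.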

16.
Front Med (Lausanne) ; 8: 607952, 2021.
Article in English | MEDLINE | ID: mdl-34124082

ABSTRACT

Background: Sepsis is among the leading causes of death in intensive care units (ICUs) worldwide and its recognition, particularly in the early stages of the disease, remains a medical challenge. The advent of an affluence of available digital health data has created a setting in which machine learning can be used for digital biomarker discovery, with the ultimate goal to advance the early recognition of sepsis. Objective: To systematically review and evaluate studies employing machine learning for the prediction of sepsis in the ICU. Data Sources: Using Embase, Google Scholar, PubMed/Medline, Scopus, and Web of Science, we systematically searched the existing literature for machine learning-driven sepsis onset prediction for patients in the ICU. Study Eligibility Criteria: All peer-reviewed articles using machine learning for the prediction of sepsis onset in adult ICU patients were included. Studies focusing on patient populations outside the ICU were excluded. Study Appraisal and Synthesis Methods: A systematic review was performed according to the PRISMA guidelines. Moreover, a quality assessment of all eligible studies was performed. Results: Out of 974 identified articles, 22 and 21 met the criteria to be included in the systematic review and quality assessment, respectively. A multitude of machine learning algorithms were applied to refine the early prediction of sepsis. The quality of the studies ranged from "poor" (satisfying ≤ 40% of the quality criteria) to "very good" (satisfying ≥ 90% of the quality criteria). The majority of the studies (n = 19, 86.4%) employed an offline training scenario combined with a horizon evaluation, while two studies implemented an online scenario (n = 2, 9.1%). The massive inter-study heterogeneity in terms of model development, sepsis definition, prediction time windows, and outcomes precluded a meta-analysis. Last, only two studies provided publicly accessible source code and data sources fostering reproducibility. 
Limitations: Articles were only eligible for inclusion when employing machine learning algorithms for the prediction of sepsis onset in the ICU. This restriction led to the exclusion of studies focusing on the prediction of septic shock, sepsis-related mortality, and patient populations outside the ICU. Conclusions and Key Findings: A growing number of studies employ machine learning to optimize the early prediction of sepsis through digital biomarker discovery. This review, however, highlights several shortcomings of the current approaches, including low comparability and reproducibility. Finally, we gather recommendations on how these challenges can be addressed before deploying these models in prospective analyses. Systematic Review Registration Number: CRD42020200133.

17.
Nat Commun ; 12(1): 3282, 2021 06 02.
Article in English | MEDLINE | ID: mdl-34078900

ABSTRACT

Bacterial processes necessary for adaption to stressful host environments are potential targets for new antimicrobials. Here, we report large-scale transcriptomic analyses of 32 human bacterial pathogens grown under 11 stress conditions mimicking human host environments. The potential relevance of the in vitro stress conditions and responses is supported by comparisons with available in vivo transcriptomes of clinically important pathogens. Calculation of a probability score enables comparative cross-microbial analyses of the stress responses, revealing common and unique regulatory responses to different stresses, as well as overlapping processes participating in different stress responses. We identify conserved and species-specific 'universal stress responders', that is, genes showing altered expression in multiple stress conditions. Non-coding RNAs are involved in a substantial proportion of the responses. The data are collected in a freely available, interactive online resource (PATHOgenex).


Subject(s)
Gene Expression Regulation, Bacterial , Gram-Negative Bacteria/genetics , Gram-Positive Bacteria/genetics , RNA, Bacterial/genetics , Stress, Physiological/genetics , Transcriptome , Adaptation, Physiological/genetics , Atlases as Topic , Databases, Genetic , Gene Expression Profiling , Genes, Bacterial , Gram-Negative Bacteria/classification , Gram-Negative Bacteria/metabolism , Gram-Negative Bacteria/pathogenicity , Gram-Positive Bacteria/classification , Gram-Positive Bacteria/metabolism , Gram-Positive Bacteria/pathogenicity , Host Microbial Interactions/genetics , Humans , Internet , Microbiota/genetics , Phylogeny , RNA, Bacterial/metabolism
19.
Bioinformatics ; 37(1): 57-65, 2021 04 09.
Article in English | MEDLINE | ID: mdl-32573681

ABSTRACT

MOTIVATION: Correlating genetic loci with a disease phenotype is a common approach to improve our understanding of the genetics underlying complex diseases. Standard analyses mostly ignore two aspects, namely genetic heterogeneity and interactions between loci. Genetic heterogeneity, the phenomenon that genetic variants at different loci lead to the same phenotype, promises to increase statistical power by aggregating low-signal variants. Incorporating interactions between loci results in a computational and statistical bottleneck due to the vast amount of candidate interactions. RESULTS: We propose a novel method SiNIMin that addresses these two aspects by finding pairs of interacting genes that are, upon combination, associated with a phenotype of interest under a model of genetic heterogeneity. We guide the interaction search using biological prior knowledge in the form of protein-protein interaction networks. Our method controls type I error and outperforms state-of-the-art methods with respect to statistical power. Additionally, we find novel associations for multiple Arabidopsis thaliana phenotypes, and, with an adapted variant of SiNIMin, for a study of rare variants in migraine patients. AVAILABILITY AND IMPLEMENTATION: Code available at https://github.com/BorgwardtLab/SiNIMin. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genetic Heterogeneity , Protein Interaction Maps , Genetic Loci , Humans , Phenotype , Software
20.
Methods Mol Biol ; 2190: 33-71, 2021.
Article in English | MEDLINE | ID: mdl-32804360

ABSTRACT

With the biomedical field generating large quantities of time series data, there has been a growing interest in developing and refining machine learning methods that allow its mining and exploitation. Classification is one of the most important and challenging machine learning tasks related to time series. Many biomedical phenomena, such as the brain's activity or blood pressure, change over time. The objective of this chapter is to provide a gentle introduction to time series classification. In the first part we describe the characteristics of time series data and challenges in its analysis. The second part provides an overview of common machine learning methods used for time series classification. A real-world use case, the early recognition of sepsis, demonstrates the applicability of the methods discussed.


Subject(s)
Biomedical Research/methods , Deep Learning , Machine Learning , Data Mining/methods , Humans