Search | VHL Regional Portal

1.

Machine learning-driven development of a disease risk score for COVID-19 hospitalization and mortality: a Swedish and Norwegian register-based study.

Shakibfar, Saeed; Zhao, Jing; Li, Huiqi; Nordeng, Hedvig; Lupattelli, Angela; Pavlovic, Milena; Sandve, Geir Kjetil; Nyberg, Fredrik; Wettermark, Björn; Hajiebrahimi, Mohammadhossein; Andersen, Morten; Sessa, Maurizio.

Front Public Health ; 11: 1258840, 2023.

Article in English | MEDLINE | ID: mdl-38146473

ABSTRACT

Aims: To develop a disease risk score for COVID-19-related hospitalization and mortality in Sweden and externally validate it in Norway. Method: We employed linked data from the national health registries of Sweden and Norway to conduct our study. We focused on individuals in Sweden with confirmed SARS-CoV-2 infection through RT-PCR testing up to August 2022 as our study cohort. Within this group, we identified hospitalized cases as those who were admitted to the hospital within 14 days of testing positive for SARS-CoV-2 and matched them with five controls from the same cohort who were not hospitalized due to SARS-CoV-2. Additionally, we identified individuals who died within 30 days after being hospitalized for COVID-19. To develop our disease risk scores, we considered various factors, including demographics, infectious, somatic, and mental health conditions, recorded diagnoses, and pharmacological treatments. We also conducted age-specific analyses and assessed model performance through 5-fold cross-validation. Finally, we performed external validation using data from the Norwegian population with COVID-19 up to December 2021. Results: During the study period, a total of 124,560 individuals in Sweden were hospitalized, and 15,877 individuals died within 30 days following COVID-19 hospitalization. Disease risk scores for both hospitalization and mortality demonstrated predictive capabilities with ROC-AUC values of 0.70 and 0.72, respectively, across the entire study period. Notably, these scores exhibited a positive correlation with the likelihood of hospitalization or death. In the external validation using data from the Norwegian COVID-19 population (consisting of 53,744 individuals), the disease risk score predicted hospitalization with an AUC of 0.47 and death with an AUC of 0.74. Conclusion: The disease risk score showed moderately good performance to predict COVID-19-related mortality but performed poorly in predicting hospitalization when externally validated.

Subject(s)

COVID-19 , Humans , COVID-19/epidemiology , SARS-CoV-2 , Sweden/epidemiology , Risk Factors , Hospitalization , Machine Learning

2.

Artificial intelligence-driven prediction of COVID-19-related hospitalization and death: a systematic review.

Shakibfar, Saeed; Nyberg, Fredrik; Li, Huiqi; Zhao, Jing; Nordeng, Hedvig Marie Egeland; Sandve, Geir Kjetil Ferkingstad; Pavlovic, Milena; Hajiebrahimi, Mohammadhossein; Andersen, Morten; Sessa, Maurizio.

Front Public Health ; 11: 1183725, 2023.

Article in English | MEDLINE | ID: mdl-37408750

ABSTRACT

Aim: To perform a systematic review on the use of Artificial Intelligence (AI) techniques for predicting COVID-19 hospitalization and mortality using primary and secondary data sources. Study eligibility criteria: Cohort, clinical trials, meta-analyses, and observational studies investigating COVID-19 hospitalization or mortality using artificial intelligence techniques were eligible. Articles without a full text available in the English language were excluded. Data sources: Articles recorded in Ovid MEDLINE from 01/01/2019 to 22/08/2022 were screened. Data extraction: We extracted information on data sources, AI models, and epidemiological aspects of retrieved studies. Bias assessment: A bias assessment of AI models was done using PROBAST. Participants: Patients tested positive for COVID-19. Results: We included 39 studies related to AI-based prediction of hospitalization and death related to COVID-19. The articles were published in the period 2019-2022, and mostly used Random Forest as the model with the best performance. AI models were trained using cohorts of individuals sampled from populations of European and non-European countries, mostly with cohort sample size <5,000. Data collection generally included information on demographics, clinical records, laboratory results, and pharmacological treatments (i.e., high-dimensional datasets). In most studies, the models were internally validated with cross-validation, but the majority of studies lacked external validation and calibration. Covariates were not prioritized using ensemble approaches in most of the studies; however, models still showed moderately good performances with Area under the Receiver operating characteristic Curve (AUC) values >0.7. According to the assessment with PROBAST, all models had a high risk of bias and/or concern regarding applicability. Conclusions: A broad range of AI techniques have been used to predict COVID-19 hospitalization and mortality. The studies reported good prediction performance of AI models; however, high risk of bias and/or concern regarding applicability were detected.

Subject(s)

Artificial Intelligence , COVID-19 , Humans , COVID-19/epidemiology , Hospitalization , Language , ROC Curve

3.

Profiling the baseline performance and limits of machine learning models for adaptive immune receptor repertoire classification.

Kanduri, Chakravarthi; Pavlovic, Milena; Scheffer, Lonneke; Motwani, Keshav; Chernigovskaya, Maria; Greiff, Victor; Sandve, Geir K.

Gigascience ; 11, 2022 05 25.

Article in English | MEDLINE | ID: mdl-35639633

ABSTRACT

BACKGROUND: Machine learning (ML) methodology development for the classification of immune states in adaptive immune receptor repertoires (AIRRs) has seen a recent surge of interest. However, so far, there does not exist a systematic evaluation of scenarios where classical ML methods (such as penalized logistic regression) already perform adequately for AIRR classification. This hinders investigative reorientation to those scenarios where method development of more sophisticated ML approaches may be required. RESULTS: To identify those scenarios where a baseline ML method is able to perform well for AIRR classification, we generated a collection of synthetic AIRR benchmark data sets encompassing a wide range of data set architecture-associated and immune state-associated sequence patterns (signal) complexity. We trained ≈1,700 ML models with varying assumptions regarding immune signal on ≈1,000 data sets with a total of ≈250,000 AIRRs containing ≈46 billion TCRβ CDR3 amino acid sequences, thereby surpassing the sample sizes of current state-of-the-art AIRR-ML setups by two orders of magnitude. We found that L1-penalized logistic regression achieved high prediction accuracy even when the immune signal occurs only in 1 out of 50,000 AIR sequences. CONCLUSIONS: We provide a reference benchmark to guide new AIRR-ML classification methodology by (i) identifying those scenarios characterized by immune signal and data set complexity, where baseline methods already achieve high prediction accuracy, and (ii) facilitating realistic expectations of the performance of AIRR-ML models given training data set properties and assumptions. Our study serves as a template for defining specialized AIRR benchmark data sets for comprehensive benchmarking of AIRR-ML methods.

Subject(s)

Machine Learning , Receptors, Immunologic

4.

In silico proof of principle of machine learning-based antibody design at unconstrained scale.

Akbar, Rahmad; Robert, Philippe A; Weber, Cédric R; Widrich, Michael; Frank, Robert; Pavlovic, Milena; Scheffer, Lonneke; Chernigovskaya, Maria; Snapkov, Igor; Slabodkin, Andrei; Mehta, Brij Bhushan; Miho, Enkelejda; Lund-Johansen, Fridtjof; Andersen, Jan Terje; Hochreiter, Sepp; Hobæk Haff, Ingrid; Klambauer, Günter; Sandve, Geir Kjetil; Greiff, Victor.

MAbs ; 14(1): 2031482, 2022.

Article in English | MEDLINE | ID: mdl-35377271

ABSTRACT

Generative machine learning (ML) has been postulated to become a major driver in the computational design of antigen-specific monoclonal antibodies (mAb). However, efforts to confirm this hypothesis have been hindered by the infeasibility of testing arbitrarily large numbers of antibody sequences for their most critical design parameters: paratope, epitope, affinity, and developability. To address this challenge, we leveraged a lattice-based antibody-antigen binding simulation framework, which incorporates a wide range of physiological antibody-binding parameters. The simulation framework enables the computation of synthetic antibody-antigen 3D-structures, and it functions as an oracle for unrestricted prospective evaluation and benchmarking of antibody design parameters of ML-generated antibody sequences. We found that a deep generative model, trained exclusively on antibody sequence (one dimensional: 1D) data can be used to design conformational (three dimensional: 3D) epitope-specific antibodies, matching, or exceeding the training dataset in affinity and developability parameter value variety. Furthermore, we established a lower threshold of sequence diversity necessary for high-accuracy generative antibody ML and demonstrated that this lower threshold also holds on experimental real-world data. Finally, we show that transfer learning enables the generation of high-affinity antibody sequences from low-N training data. Our work establishes a priori feasibility and the theoretical foundation of high-throughput ML-based mAb design.

Subject(s)

Antigen-Antibody Reactions , Machine Learning , Antibodies, Monoclonal/chemistry , Binding Sites, Antibody , Epitopes

5.

simAIRR: simulation of adaptive immune repertoires with realistic receptor sequence sharing for benchmarking of immune state prediction methods.

Kanduri, Chakravarthi; Scheffer, Lonneke; Pavlovic, Milena; Rand, Knut Dagestad; Chernigovskaya, Maria; Pirvandy, Oz; Yaari, Gur; Greiff, Victor; Sandve, Geir K.

Gigascience ; 12, 2022 12 28.

Article in English | MEDLINE | ID: mdl-37848619

ABSTRACT

BACKGROUND: Machine learning (ML) has gained significant attention for classifying immune states in adaptive immune receptor repertoires (AIRRs) to support the advancement of immunodiagnostics and therapeutics. Simulated data are crucial for the rigorous benchmarking of AIRR-ML methods. Existing approaches to generating synthetic benchmarking datasets result in the generation of naive repertoires missing the key feature of many shared receptor sequences (selected for common antigens) found in antigen-experienced repertoires. RESULTS: We demonstrate that a common approach to generating simulated AIRR benchmark datasets can introduce biases, which may be exploited for undesired shortcut learning by certain ML methods. To mitigate undesirable access to true signals in simulated AIRR datasets, we devised a simulation strategy (simAIRR) that constructs antigen-experienced-like repertoires with a realistic overlap of receptor sequences. simAIRR can be used for constructing AIRR-level benchmarks based on a range of assumptions (or experimental data sources) for what constitutes receptor-level immune signals. This includes the possibility of making or not making any prior assumptions regarding the similarity or commonality of immune state-associated sequences that will be used as true signals. We demonstrate the real-world realism of our proposed simulation approach by showing that basic ML strategies perform similarly on simAIRR-generated and real-world experimental AIRR datasets. CONCLUSIONS: This study sheds light on the potential shortcut learning opportunities for ML methods that can arise with the state-of-the-art way of simulating AIRR datasets. simAIRR is available as a Python package: https://github.com/KanduriC/simAIRR.

Subject(s)

Benchmarking , Computer Simulation

6.

Unconstrained generation of synthetic antibody-antigen structures to guide machine learning methodology for antibody specificity prediction.

Robert, Philippe A; Akbar, Rahmad; Frank, Robert; Pavlovic, Milena; Widrich, Michael; Snapkov, Igor; Slabodkin, Andrei; Chernigovskaya, Maria; Scheffer, Lonneke; Smorodina, Eva; Rawat, Puneet; Mehta, Brij Bhushan; Vu, Mai Ha; Mathisen, Ingvild Frøberg; Prósz, Aurél; Abram, Krzysztof; Olar, Alex; Miho, Enkelejda; Haug, Dag Trygve Tryslew; Lund-Johansen, Fridtjof; Hochreiter, Sepp; Haff, Ingrid Hobæk; Klambauer, Günter; Sandve, Geir Kjetil; Greiff, Victor.

Nat Comput Sci ; 2(12): 845-865, 2022 Dec.

Article in English | MEDLINE | ID: mdl-38177393

ABSTRACT

Machine learning (ML) is a key technology for accurate prediction of antibody-antigen binding. Two orthogonal problems hinder the application of ML to antibody-specificity prediction and the benchmarking thereof: the lack of a unified ML formalization of immunological antibody-specificity prediction problems and the unavailability of large-scale synthetic datasets to benchmark real-world relevant ML methods and dataset design. Here we developed the Absolut! software suite that enables parameter-based unconstrained generation of synthetic lattice-based three-dimensional antibody-antigen-binding structures with ground-truth access to conformational paratope, epitope and affinity. We formalized common immunological antibody-specificity prediction problems as ML tasks and confirmed that for both sequence- and structure-based tasks, accuracy-based rankings of ML methods trained on experimental data hold for ML methods trained on Absolut!-generated data. The Absolut! framework has the potential to enable real-world relevant development and benchmarking of ML strategies for biotherapeutics design.

Subject(s)

Antibodies , Antigen-Antibody Reactions , Antibody Specificity , Epitopes/chemistry , Machine Learning

7.

Individualized VDJ recombination predisposes the available Ig sequence space.

Slabodkin, Andrei; Chernigovskaya, Maria; Mikocziova, Ivana; Akbar, Rahmad; Scheffer, Lonneke; Pavlovic, Milena; Bashour, Habib; Snapkov, Igor; Mehta, Brij Bhushan; Weber, Cédric R; Gutierrez-Marcos, Jose; Sollid, Ludvig M; Haff, Ingrid Hobæk; Sandve, Geir Kjetil; Robert, Philippe A; Greiff, Victor.

Genome Res ; 31(12): 2209-2224, 2021 Dec.

Article in English | MEDLINE | ID: mdl-34815307

ABSTRACT

The process of recombination between variable (V), diversity (D), and joining (J) immunoglobulin (Ig) gene segments determines an individual's naive Ig repertoire and, consequently, (auto)antigen recognition. VDJ recombination follows probabilistic rules that can be modeled statistically. So far, it remains unknown whether VDJ recombination rules differ between individuals. If these rules differed, identical (auto)antigen-specific Ig sequences would be generated with individual-specific probabilities, signifying that the available Ig sequence space is individual specific. We devised a sensitivity-tested distance measure that enables inter-individual comparison of VDJ recombination models. We discovered, accounting for several sources of noise as well as allelic variation in Ig sequencing data, that not only unrelated individuals but also human monozygotic twins and even inbred mice possess statistically distinguishable immunoglobulin recombination models. This suggests that, in addition to genetic, there is also nongenetic modulation of VDJ recombination. We demonstrate that population-wide individualized VDJ recombination can result in orders of magnitude of difference in the probability to generate (auto)antigen-specific Ig sequences. Our findings have implications for immune receptor-based individualized medicine approaches relevant to vaccination, infection, and autoimmunity.

8.

A compact vocabulary of paratope-epitope interactions enables predictability of antibody-antigen binding.

Akbar, Rahmad; Robert, Philippe A; Pavlovic, Milena; Jeliazkov, Jeliazko R; Snapkov, Igor; Slabodkin, Andrei; Weber, Cédric R; Scheffer, Lonneke; Miho, Enkelejda; Haff, Ingrid Hobæk; Haug, Dag Trygve Tryslew; Lund-Johansen, Fridtjof; Safonova, Yana; Sandve, Geir K; Greiff, Victor.

Cell Rep ; 34(11): 108856, 2021 03 16.

Article in English | MEDLINE | ID: mdl-33730590

ABSTRACT

Antibody-antigen binding relies on the specific interaction of amino acids at the paratope-epitope interface. The predictability of antibody-antigen binding is a prerequisite for de novo antibody and (neo-)epitope design. A fundamental premise for the predictability of antibody-antigen binding is the existence of paratope-epitope interaction motifs that are universally shared among antibody-antigen structures. In a dataset of non-redundant antibody-antigen structures, we identify structural interaction motifs, which together compose a commonly shared structure-based vocabulary of paratope-epitope interactions. We show that this vocabulary enables the machine learnability of antibody-antigen binding on the paratope-epitope level using generative machine learning. The vocabulary (1) is compact, less than 10^4 motifs; (2) distinct from non-immune protein-protein interactions; and (3) mediates specific oligo- and polyreactive interactions between paratope-epitope pairs. Our work leverages combined structure- and sequence-based learning to demonstrate that machine-learning-driven predictive paratope and epitope engineering is feasible.

Subject(s)

Antigen-Antibody Reactions/immunology , Binding Sites, Antibody/immunology , Epitopes/immunology , Amino Acid Motifs , Amino Acid Sequence , Antibodies/chemistry , Antibodies/immunology , Complementarity Determining Regions/chemistry , Epitopes/chemistry , Machine Learning , Protein Binding

9.

The immuneML ecosystem for machine learning analysis of adaptive immune receptor repertoires.

Pavlovic, Milena; Scheffer, Lonneke; Motwani, Keshav; Kanduri, Chakravarthi; Kompova, Radmila; Vazov, Nikolay; Waagan, Knut; Bernal, Fabian L M; Costa, Alexandre Almeida; Corrie, Brian; Akbar, Rahmad; Al Hajj, Ghadi S; Balaban, Gabriel; Brusko, Todd M; Chernigovskaya, Maria; Christley, Scott; Cowell, Lindsay G; Frank, Robert; Grytten, Ivar; Gundersen, Sveinung; Haff, Ingrid Hobæk; Hovig, Eivind; Hsieh, Ping-Han; Klambauer, Günter; Kuijjer, Marieke L; Lund-Andersen, Christin; Martini, Antonio; Minotto, Thomas; Pensar, Johan; Rand, Knut; Riccardi, Enrico; Robert, Philippe A; Rocha, Artur; Slabodkin, Andrei; Snapkov, Igor; Sollid, Ludvig M; Titov, Dmytro; Weber, Cédric R; Widrich, Michael; Yaari, Gur; Greiff, Victor; Sandve, Geir Kjetil.

Nat Mach Intell ; 3(11): 936-944, 2021 Nov.

Article in English | MEDLINE | ID: mdl-37396030

ABSTRACT

Adaptive immune receptor repertoires (AIRR) are key targets for biomedical research as they record past and ongoing adaptive immune responses. The capacity of machine learning (ML) to identify complex discriminative sequence patterns renders it an ideal approach for AIRR-based diagnostic and therapeutic discovery. To date, widespread adoption of AIRR ML has been inhibited by a lack of reproducibility, transparency, and interoperability. immuneML (immuneml.uio.no) addresses these concerns by implementing each step of the AIRR ML process in an extensible, open-source software ecosystem that is based on fully specified and shareable workflows. To facilitate widespread user adoption, immuneML is available as a command-line tool and through an intuitive Galaxy web interface, and extensive documentation of workflows is provided. We demonstrate the broad applicability of immuneML by (i) reproducing a large-scale study on immune state prediction, (ii) developing, integrating, and applying a novel deep learning method for antigen specificity prediction, and (iii) showcasing streamlined interpretability-focused benchmarking of AIRR ML.

10.

T cell receptor repertoire as a potential diagnostic marker for celiac disease.

Yao, Ying; Zia, Asima; Neumann, Ralf Stefan; Pavlovic, Milena; Balaban, Gabriel; Lundin, Knut E A; Sandve, Geir Kjetil; Qiao, Shuo-Wang.

Clin Immunol ; 222: 108621, 2021 01.

Article in English | MEDLINE | ID: mdl-33197618

ABSTRACT

An individual's T cell repertoire is skewed towards some specificities as a result of past antigen exposure and subsequent clonal expansion. Identifying T cell receptor signatures associated with a disease is challenging due to the overall complexity of antigens and polymorphic HLA allotypes. In celiac disease, the antigen epitopes are well characterised and the specific HLA-DQ2-restricted T-cell repertoire associated with the disease has been explored in depth. By investigating T cell receptor repertoires of unsorted lamina propria T cells from 15 individuals, we provide the first proof-of-concept study showing that it could be possible to infer disease state by matching against a priori known disease-associated T cell receptor sequences.

Subject(s)

Celiac Disease/diagnosis , Celiac Disease/immunology , Epitopes, T-Lymphocyte/immunology , Receptors, Antigen, T-Cell/immunology , Adolescent , Adult , Aged , Biomarkers , HLA-DQ Antigens/genetics , HLA-DQ Antigens/immunology , Humans , Lymphocyte Activation/immunology , Middle Aged , Mucous Membrane/cytology , Mucous Membrane/immunology , Young Adult

11.

immuneSIM: tunable multi-feature simulation of B- and T-cell receptor repertoires for immunoinformatics benchmarking.

Weber, Cédric R; Akbar, Rahmad; Yermanos, Alexander; Pavlovic, Milena; Snapkov, Igor; Sandve, Geir K; Reddy, Sai T; Greiff, Victor.

Bioinformatics ; 36(11): 3594-3596, 2020 06 01.

Article in English | MEDLINE | ID: mdl-32154832

ABSTRACT

SUMMARY: B- and T-cell receptor repertoires of the adaptive immune system have become a key target for diagnostics and therapeutics research. Consequently, there is a rapidly growing number of bioinformatics tools for immune repertoire analysis. Benchmarking of such tools is crucial for ensuring reproducible and generalizable computational analyses. Currently, however, it remains challenging to create standardized ground truth immune receptor repertoires for immunoinformatics tool benchmarking. Therefore, we developed immuneSIM, an R package that allows the simulation of native-like and aberrant synthetic full-length variable region immune receptor sequences by tuning the following immune receptor features: (i) species and chain type (BCR, TCR, single and paired), (ii) germline gene usage, (iii) occurrence of insertions and deletions, (iv) clonal abundance, (v) somatic hypermutation and (vi) sequence motifs. Each simulated sequence is annotated by the complete set of simulation events that contributed to its in silico generation. immuneSIM permits the benchmarking of key computational tools for immune receptor analysis, such as germline gene annotation, diversity and overlap estimation, sequence similarity, network architecture, clustering analysis and machine learning methods for motif detection. AVAILABILITY AND IMPLEMENTATION: The package is available via https://github.com/GreiffLab/immuneSIM and on CRAN at https://cran.r-project.org/web/packages/immuneSIM. The documentation is hosted at https://immuneSIM.readthedocs.io. CONTACT: sai.reddy@ethz.ch or victor.greiff@medisin.uio.no. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Benchmarking , Software , Computer Simulation , Receptors, Antigen, T-Cell/genetics

12.

Ultra-performance liquid chromatography tandem mass spectrometry for the rapid, simultaneous analysis of ziprasidone and its impurities.

Carapic, Marija; Nikolic, Katarina; Markovic, Bojan; Petkovic, Milos; Pavlovic, Milena; Agbaba, Danica.

Biomed Chromatogr ; 33(2): e4384, 2019 Feb.

Article in English | MEDLINE | ID: mdl-30215855

ABSTRACT

The separation and characterization of the unknown degradation product of second-generation antipsychotic drug ziprasidone are essential for defining the genotoxic potential of the compound. The aim of this study was to develop a simple UHPLC method coupled with tandem mass spectrometry (MS/MS) for chemical characterization of an unknown degradant, and the separation and quantification of ziprasidone and its five main impurities (I-V) in the raw material and pharmaceuticals. Chromatographic conditions were optimized by experimental design. The MS/MS fragmentation conditions were optimized individually for each compound in order to obtain both specific fragments and high signal intensity. A rapid and sensitive UHPLC-MS/MS method was developed. All seven analytes were eluted within the 7 min run time. The best separation was obtained on the Acquity UPLC BEH C18 (50 × 2.1 mm, 1.7 µm) column in gradient mode with ammonium-formate buffer (10 mM; pH 4.7) and acetonitrile as mobile phase, with the flow rate of 0.3 mL min⁻¹ and at the column temperature of 30°C. The new UHPLC-MS/MS method was fully validated and all validation parameters were confirmed. The fragmentation pathways and chemical characterization of an unknown degradant were proposed and it was confirmed that there are no structural alerts concerning genotoxicity.

Subject(s)

Chromatography, High Pressure Liquid/methods , Piperazines/analysis , Piperazines/chemistry , Tandem Mass Spectrometry/methods , Thiazoles/analysis , Thiazoles/chemistry , Drug Contamination , Least-Squares Analysis , Limit of Detection , Reproducibility of Results
