Search | Biblioteca Virtual em Saúde (Virtual Health Library)

DISSECT: deep semi-supervised consistency regularization for accurate cell type fraction and gene expression estimation.

Khatri, Robin; Machart, Pierre; Bonn, Stefan.

Genome Biol ; 25(1): 112, 2024 04 30.

Article in English | MEDLINE | ID: mdl-38689377

ABSTRACT

Cell deconvolution is the estimation of cell type fractions and cell type-specific gene expression from mixed data. An unmet challenge in cell deconvolution is the scarcity of realistic training data and the domain shift often observed in synthetic training data. Here, we show that two novel deep neural networks with simultaneous consistency regularization of the target and training domains significantly improve deconvolution performance. Our algorithm, DISSECT, outperforms competing algorithms in cell fraction and gene expression estimation by up to 14 percentage points. DISSECT can be easily adapted to other biomedical data types, as exemplified by our proteomic deconvolution experiments.

Subjects

Algorithms , Humans , Proteomics/methods , Gene Expression Profiling/methods , Deep Learning , Neural Networks (Computer)
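
The DISSECT abstract above refers to synthetic training data for deconvolution. As a rough illustration of what such data can look like, the sketch below mixes per-cell-type expression profiles with random fractions that sum to 1. It assumes a simple linear mixing model; the profile values, the cell type names, and the function `simulate_mixture` are hypothetical, and this is not DISSECT's simulation procedure or network.

```python
import random


def simulate_mixture(profiles, rng):
    """Mix cell-type expression profiles with random fractions that sum to 1.

    Assumption: bulk expression is a fraction-weighted sum of the profiles.
    """
    weights = [rng.random() for _ in profiles]
    total = sum(weights)
    fractions = {name: w / total for name, w in zip(profiles, weights)}
    n_genes = len(next(iter(profiles.values())))
    mixture = [
        sum(fractions[name] * profiles[name][g] for name in profiles)
        for g in range(n_genes)
    ]
    return fractions, mixture


# Hypothetical expression profiles (three genes per cell type).
profiles = {
    "T cell": [10.0, 0.0, 2.0],
    "B cell": [1.0, 8.0, 2.0],
    "Monocyte": [0.0, 1.0, 9.0],
}

fractions, mixture = simulate_mixture(profiles, random.Random(0))
print(fractions)
print(mixture)
```

A deconvolution model would be trained to recover `fractions` from `mixture`; real mixed samples can differ from such simulations, which is the domain shift the abstract mentions.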

NeoAgDT: optimization of personal neoantigen vaccine composition by digital twin simulation of a cancer cell population.

Mösch, Anja; Grazioli, Filippo; Machart, Pierre; Malone, Brandon.

Bioinformatics ; 40(5), 2024 05 02.

Article in English | MEDLINE | ID: mdl-38614133

ABSTRACT

MOTIVATION: Neoantigen vaccines make use of tumor-specific mutations to enable the patient's immune system to recognize and eliminate cancer. Selecting vaccine elements, however, is a complex task that needs to take into account not only the underlying antigen presentation pathway but also tumor heterogeneity. RESULTS: Here, we present NeoAgDT, a two-step approach consisting of: (i) simulating individual cancer cells to create a digital twin of the patient's tumor cell population and (ii) optimizing the vaccine composition by integer linear programming based on this digital twin. NeoAgDT shows improved selection of experimentally validated neoantigens over ranking-based approaches in a study of seven patients. AVAILABILITY AND IMPLEMENTATION: The NeoAgDT code is published on GitHub: https://github.com/nec-research/neoagdt.

Subjects

Neoplasm Antigens , Cancer Vaccines , Neoplasms , Software , Humans , Cancer Vaccines/immunology , Neoplasms/immunology , Neoplasm Antigens/immunology , Mutation , Computer Simulation , Computational Biology/methods , Algorithms
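
The two-step idea in the NeoAgDT abstract above (simulate a cell population, then optimize the vaccine over it) can be illustrated with a toy example. The sketch below is not the NeoAgDT implementation: the simulated cells, the neoantigen names, the objective (number of cells presenting at least one selected neoantigen), and the function `best_vaccine` are all assumptions, and exhaustive search stands in for the integer linear program.

```python
from itertools import combinations

# Hypothetical digital twin: each simulated cell lists the candidate
# neoantigens it presents (names are made up for illustration).
cells = [
    {"neoA", "neoB"},
    {"neoB"},
    {"neoC"},
    {"neoA", "neoC"},
    {"neoD"},
]


def best_vaccine(cells, budget):
    """Pick up to `budget` neoantigens covering the most simulated cells.

    Assumption: a cell counts as covered if it presents at least one
    selected neoantigen. Brute force replaces the ILP solver here.
    """
    candidates = sorted(set().union(*cells))
    size = min(budget, len(candidates))
    best, best_covered = (), -1
    for combo in combinations(candidates, size):
        chosen = set(combo)
        covered = sum(1 for cell in cells if cell & chosen)
        if covered > best_covered:
            best, best_covered = combo, covered
    return set(best), best_covered


selection, covered = best_vaccine(cells, budget=2)
print(selection, covered)
```

Note that a ranking by individual frequency alone would not distinguish neoA, neoB, and neoC here (each is presented by two cells), whereas optimizing over the whole population favors the pair that covers different cells.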

DISCERN: deep single-cell expression reconstruction for improved cell clustering and cell subtype and state detection.

Hausmann, Fabian; Ergen, Can; Khatri, Robin; Marouf, Mohamed; Hänzelmann, Sonja; Gagliani, Nicola; Huber, Samuel; Machart, Pierre; Bonn, Stefan.

Genome Biol ; 24(1): 212, 2023 09 20.

Article in English | MEDLINE | ID: mdl-37730638

ABSTRACT

BACKGROUND: Single-cell sequencing provides detailed insights into biological processes including cell differentiation and identity. While providing deep cell-specific information, the method suffers from technical constraints, most notably a limited number of expressed genes per cell, which leads to suboptimal clustering and cell type identification. RESULTS: Here, we present DISCERN, a novel deep generative network that precisely reconstructs missing single-cell gene expression using a reference dataset. DISCERN outperforms competing algorithms in expression inference resulting in greatly improved cell clustering, cell type and activity detection, and insights into the cellular regulation of disease. We show that DISCERN is robust against differences between batches and is able to keep biological differences between batches, which is a common problem for imputation and batch correction algorithms. We use DISCERN to detect two unseen COVID-19-associated T cell types, cytotoxic CD4+ and CD8+ Tc2 T helper cells, with a potential role in adverse disease outcome. We utilize T cell fraction information of patient blood to classify mild or severe COVID-19 with an AUROC of 80% that can serve as a biomarker of disease stage. DISCERN can be easily integrated into existing single-cell sequencing workflows. CONCLUSIONS: Thus, DISCERN is a flexible tool for reconstructing missing single-cell gene expression using a reference dataset and can easily be applied to a variety of data sets yielding novel insights, e.g., into disease mechanisms.

Subjects

COVID-19 , Humans , COVID-19/genetics , Algorithms , Cell Cycle , Cell Differentiation , Cluster Analysis

Attentive Variational Information Bottleneck for TCR-peptide interaction prediction.

Grazioli, Filippo; Machart, Pierre; Mösch, Anja; Li, Kai; Castorina, Leonardo V; Pfeifer, Nico; Min, Martin Renqiang.

Bioinformatics ; 39(1), 2023 01 01.

Article in English | MEDLINE | ID: mdl-36571499

ABSTRACT

MOTIVATION: We present a multi-sequence generalization of Variational Information Bottleneck and call the resulting model Attentive Variational Information Bottleneck (AVIB). Our AVIB model leverages multi-head self-attention to implicitly approximate a posterior distribution over latent encodings conditioned on multiple input sequences. We apply AVIB to a fundamental immuno-oncology problem: predicting the interactions between T-cell receptors (TCRs) and peptides. RESULTS: Experimental results on various datasets show that AVIB significantly outperforms state-of-the-art methods for TCR-peptide interaction prediction. Additionally, we show that the latent posterior distribution learned by AVIB is particularly effective for the unsupervised detection of out-of-distribution amino acid sequences. AVAILABILITY AND IMPLEMENTATION: The code and the data used for this study are publicly available at: https://github.com/nec-research/vibtcr. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Peptides , Software , Amino Acid Sequence , T-Cell Antigen Receptors/genetics

On TCR binding predictors failing to generalize to unseen peptides.

Grazioli, Filippo; Mösch, Anja; Machart, Pierre; Li, Kai; Alqassem, Israa; O'Donnell, Timothy J; Min, Martin Renqiang.

Front Immunol ; 13: 1014256, 2022.

Article in English | MEDLINE | ID: mdl-36341448

ABSTRACT

Several recent studies investigate TCR-peptide/-pMHC binding prediction using machine learning or deep learning approaches. Many of these methods achieve impressive results on test sets, which include peptide sequences that are also included in the training set. In this work, we investigate how state-of-the-art deep learning models for TCR-peptide/-pMHC binding prediction generalize to unseen peptides. We create a dataset including positive samples from IEDB, VDJdb, McPAS-TCR, and the MIRA set, as well as negative samples from both randomization and 10X Genomics assays. We name this collection of samples TChard. We propose the hard split, a simple heuristic for training/test split, which ensures that test samples exclusively present peptides that do not belong to the training set. We investigate the effect of different training/test splitting techniques on the models' test performance, as well as the effect of training and testing the models using mismatched negative samples generated randomly, in addition to the negative samples derived from assays. Our results show that modern deep learning methods fail to generalize to unseen peptides. We provide an explanation of why this happens and verify our hypothesis on the TChard dataset. We then conclude that robust prediction of TCR recognition is still far from being solved.

Subjects

Peptides , T-Cell Antigen Receptors , T-Cell Antigen Receptors/metabolism , Protein Binding , Peptides/metabolism
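
The property that the hard split in the abstract above is meant to guarantee (no test peptide appears in the training set) can be sketched as a peptide-level split. This is a minimal illustration, not the paper's heuristic: the abstract does not say how test peptides are chosen, so picking them at random is an assumption, and the function `hard_split`, the sample tuples, and the sequences are hypothetical.

```python
import random


def hard_split(samples, test_fraction=0.2, seed=0):
    """Split (tcr, peptide, label) samples by peptide.

    Every sample of a held-out peptide goes to the test set, so the
    training set never contains a test peptide.
    Assumption: held-out peptides are chosen at random.
    """
    peptides = sorted({peptide for _, peptide, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(peptides)
    n_test = max(1, round(len(peptides) * test_fraction))
    test_peptides = set(peptides[:n_test])
    train = [s for s in samples if s[1] not in test_peptides]
    test = [s for s in samples if s[1] in test_peptides]
    return train, test


# Hypothetical (TCR CDR3, peptide, binds?) samples for illustration only.
samples = [
    ("CASSAAAF", "PEPTIDEA", 1),
    ("CASSBBBF", "PEPTIDEA", 0),
    ("CASSCCCF", "PEPTIDEB", 1),
    ("CASSDDDF", "PEPTIDEC", 1),
    ("CASSEEEF", "PEPTIDEC", 0),
    ("CASSFFFF", "PEPTIDED", 1),
]

train, test = hard_split(samples, test_fraction=0.25)
print(len(train), len(test))
```

A plain random split over samples would instead let the same peptide appear on both sides, which is the setting the abstract describes as yielding overly optimistic test results.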

Realistic in silico generation and augmentation of single-cell RNA-seq data using generative adversarial networks.

Marouf, Mohamed; Machart, Pierre; Bansal, Vikas; Kilian, Christoph; Magruder, Daniel S; Krebs, Christian F; Bonn, Stefan.

Nat Commun ; 11(1): 166, 2020 01 09.

Article in English | MEDLINE | ID: mdl-31919373

ABSTRACT

A fundamental problem in biomedical research is the low number of observations available, mostly due to a lack of available biosamples, prohibitive costs, or ethical reasons. Augmenting few real observations with generated in silico samples could lead to more robust analysis results and a higher reproducibility rate. Here, we propose the use of conditional single-cell generative adversarial neural networks (cscGAN) for the realistic generation of single-cell RNA-seq data. cscGAN learns non-linear gene-gene dependencies from complex, multiple cell type samples and uses this information to generate realistic cells of defined types. Augmenting sparse cell populations with cscGAN-generated cells improves downstream analyses such as the detection of marker genes, the robustness and reliability of classifiers, the assessment of novel analysis algorithms, and might reduce the number of animal experiments and costs in consequence. cscGAN outperforms existing methods for single-cell RNA-seq data generation in quality and holds great promise for the realistic generation and augmentation of other biomedical data types.

Subjects

Biomedical Research/methods , RNA-Seq/methods , RNA/genetics , Algorithms , Animals , Computer Simulation , Humans , Mice , Theoretical Models , Neural Networks (Computer)
