Results 1 - 4 of 4
1.
Genome Biol; 24(1): 263, 2023 Nov 16.
Article in English | MEDLINE | ID: mdl-37974217

ABSTRACT

Differential analysis of bulk RNA-seq data often suffers from a lack of good controls. Here, we present a generative model that replaces controls, trained solely on healthy tissues. The unsupervised model learns a low-dimensional representation and can identify the closest normal representation for a given disease sample. This enables control-free, single-sample differential expression analysis. In breast cancer, we demonstrate how our approach selects marker genes and outperforms a state-of-the-art method. Furthermore, significant genes identified by the model are enriched in driver genes across cancers. Our results show that the in silico closest normal provides a more favorable comparison than control samples.
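The closest-normal idea lends itself to a compact illustration. Below is a minimal sketch, not the paper's released code: given a decoder trained only on healthy tissue, a latent vector is optimized so the decoded profile best explains a single disease sample, and per-gene deviations from that in-silico normal can then be ranked. The `closest_normal` helper, the sum-of-squares objective standing in for the model's likelihood, and all dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

latent_dim, n_genes = 20, 2000
decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                        nn.Linear(128, n_genes))   # stands in for the trained model

def closest_normal(decoder, x, latent_dim=20, steps=500, lr=1e-2):
    """Optimize a latent vector so the decoder best explains one sample."""
    z = torch.zeros(1, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((decoder(z) - x) ** 2).sum()   # stand-in for the model likelihood
        loss.backward()
        opt.step()
    return decoder(z).detach()

x_disease = torch.randn(1, n_genes)            # placeholder disease profile
x_normal = closest_normal(decoder, x_disease)  # in-silico closest normal
diff = (x_disease - x_normal).squeeze()        # per-gene deviations to rank
```

Since the comparison profile comes from the model rather than from matched tissue, a single patient sample suffices for the differential call.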


Subjects
Learning; Machine Learning; RNA-Seq/methods; Gene Expression
2.
Bioinformatics; 39(9), 2023 Sep 02.
Article in English | MEDLINE | ID: mdl-37572301

ABSTRACT

MOTIVATION: Learning low-dimensional representations of single-cell transcriptomics has become instrumental to its downstream analysis. The state of the art is currently represented by neural network models, such as variational autoencoders, which use a variational approximation of the likelihood for inference.

RESULTS: We here present the Deep Generative Decoder (DGD), a simple generative model that computes model parameters and representations directly via maximum a posteriori estimation. Unlike variational autoencoders, which typically fall back on a fixed Gaussian latent distribution because of the complexity of adding other types, the DGD naturally handles complex parameterized latent distributions. We first show its general functionality on a commonly used benchmark set, Fashion-MNIST. Secondly, we apply the model to multiple single-cell datasets. Here, the DGD learns low-dimensional, meaningful, and well-structured latent representations with sub-clustering beyond the provided labels. The advantages of this approach are its simplicity and its capability to provide representations of much smaller dimensionality than a comparable variational autoencoder.

AVAILABILITY AND IMPLEMENTATION: scDGD is available as a Python package at https://github.com/Center-for-Health-Data-Science/scDGD. The remaining code is made available at https://github.com/Center-for-Health-Data-Science/dgd.
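The contrast with a variational autoencoder can be made concrete. The sketch below is a toy stand-in rather than the scDGD implementation: per-sample representations are free parameters optimized jointly with the decoder by maximum a posteriori estimation, under a learnable Gaussian-mixture prior. The dimensions, the Gaussian reconstruction term, and the prior configuration are all assumptions for illustration.

```python
import torch
import torch.nn as nn

n_samples, n_genes, latent_dim, n_comp = 1000, 2000, 20, 10

decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                        nn.Linear(128, n_genes))
reps = nn.Parameter(0.1 * torch.randn(n_samples, latent_dim))  # one z per cell
means = nn.Parameter(torch.randn(n_comp, latent_dim))          # mixture means
log_scales = nn.Parameter(torch.zeros(n_comp, latent_dim))     # mixture scales
logits = nn.Parameter(torch.zeros(n_comp))                     # mixture weights

opt = torch.optim.Adam(list(decoder.parameters())
                       + [reps, means, log_scales, logits], lr=1e-3)
x = torch.randn(n_samples, n_genes)  # placeholder for an expression matrix

for step in range(1000):
    # rebuild the prior each step so gradients reach its parameters
    prior = torch.distributions.MixtureSameFamily(
        torch.distributions.Categorical(logits=logits),
        torch.distributions.Independent(
            torch.distributions.Normal(means, log_scales.exp()), 1))
    recon = ((decoder(reps) - x) ** 2).sum()       # stand-in reconstruction NLL
    map_loss = recon - prior.log_prob(reps).sum()  # likelihood plus log-prior
    opt.zero_grad()
    map_loss.backward()
    opt.step()
```

Because there is no encoder, new samples are handled the same way: freeze the decoder and prior and run the same gradient descent over a fresh representation.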


Subjects
Neural Networks, Computer; RNA; Gene Expression Profiling; Probability; Normal Distribution; Single-Cell Analysis
3.
Entropy (Basel); 23(11), 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34828101

ABSTRACT

Autoencoders are commonly used in representation learning. They consist of an encoder and a decoder, which provide a straightforward method to map n-dimensional data in input space to a lower m-dimensional representation space and back. The decoder itself defines an m-dimensional manifold in input space. Inspired by manifold learning, we show that the decoder can be trained on its own by learning the representations of the training samples along with the decoder weights using gradient descent. A sum-of-squares loss then corresponds to optimizing the manifold to have the smallest Euclidean distance to the training samples, and analogously for other loss functions. We derive expressions for the number of samples needed to specify the encoder and decoder and show that the decoder generally requires far fewer training samples than the encoder to be well specified. We discuss the training of autoencoders from this perspective and relate it to previous work in the field that uses noisy training examples and other types of regularization. On the natural image data sets MNIST and CIFAR10, we demonstrate that the decoder is much better suited to learn a low-dimensional representation, especially when trained on small data sets. Using simulated gene regulatory data, we further show that the decoder alone leads to better generalization and meaningful representations. Our approach of training the decoder alone facilitates representation learning even on small data sets and can lead to improved training of autoencoders. We hope that the simple analyses presented here will also contribute to an improved conceptual understanding of representation learning.
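A minimal sketch of this decoder-only training, assuming a linear decoder and the sum-of-squares loss from the abstract: both the representation matrix Z and the decoder weights W are free parameters, and the updates below are the exact gradients of ||ZW - X||_F^2. Dimensions and step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_rep = 200, 50, 5               # samples, input dim, latent dim
X = rng.normal(size=(n, d_in))            # placeholder training data
Z = 0.1 * rng.normal(size=(n, d_rep))     # one learnable representation per sample
W = 0.1 * rng.normal(size=(d_rep, d_in))  # linear decoder weights

lr = 1e-3
for step in range(2000):
    R = Z @ W - X          # residuals between the decoded manifold and the data
    grad_Z = 2 * R @ W.T   # d/dZ of the sum-of-squares loss
    grad_W = 2 * Z.T @ R   # d/dW of the sum-of-squares loss
    Z -= lr * grad_Z
    W -= lr * grad_W
```

With a linear decoder this converges to a PCA-like subspace; a nonlinear decoder instead traces out a curved m-dimensional manifold whose Euclidean distance to the training samples is being minimized.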

4.
Commun Biol; 4(1): 1060, 2021 Sep 10.
Article in English | MEDLINE | ID: mdl-34508155

ABSTRACT

Prediction of T-cell receptor (TCR) interactions with MHC-peptide complexes remains highly challenging. This challenge is primarily due to three dominant factors: data accuracy, data scarcity, and problem complexity. Here, we showcase that "shallow" convolutional neural network (CNN) architectures are adequate to deal with the problem complexity imposed by the length variations of TCRs. We demonstrate that current public bulk CDR3β-pMHC binding data are overall of low quality and that the development of accurate prediction models is contingent on paired α/β TCR sequence data corresponding to at least 150 distinct pairs for each investigated pMHC. In comparison, models trained on CDR3α or CDR3β data alone showed a variable, pMHC-specific relative performance drop. Together, these findings support that T-cell specificity is predictable given the availability of accurate and sufficient paired TCR sequence data. NetTCR-2.0 is publicly available at https://services.healthtech.dtu.dk/service.php?NetTCR-2.0.
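To make the "shallow CNN" claim concrete, here is a minimal sketch of such an architecture, not NetTCR-2.0 itself: one convolutional layer per input sequence followed by global max pooling, which is what absorbs the CDR3 length variation. The amino-acid encoding, filter counts, kernel size, and prediction head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ShallowTCRCNN(nn.Module):
    def __init__(self, n_aa=20, n_filters=16, kernel=5):
        super().__init__()
        # separate shallow branches for CDR3a, CDR3b, and the peptide
        self.convs = nn.ModuleList(
            [nn.Conv1d(n_aa, n_filters, kernel, padding=kernel // 2)
             for _ in range(3)])
        self.head = nn.Sequential(nn.Linear(3 * n_filters, 32), nn.ReLU(),
                                  nn.Linear(32, 1))

    def forward(self, cdr3a, cdr3b, pep):
        # each input: (batch, n_aa, length); lengths may differ per branch
        feats = [conv(x).amax(dim=2)   # global max pool over positions
                 for conv, x in zip(self.convs, (cdr3a, cdr3b, pep))]
        return torch.sigmoid(self.head(torch.cat(feats, dim=1)))

model = ShallowTCRCNN()
# encoded sequences of different lengths still work:
p = model(torch.randn(4, 20, 15), torch.randn(4, 20, 18), torch.randn(4, 20, 9))
```

Global max pooling returns one value per filter regardless of sequence length, so variable-length CDR3α/β and peptide inputs need no alignment beyond per-batch padding.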


Subjects
Neural Networks, Computer; Receptors, Antigen, T-Cell/chemistry; Protein Binding