Results 1 - 8 of 8
1.
PLoS One ; 19(4): e0298906, 2024.
Article in English | MEDLINE | ID: mdl-38625909

ABSTRACT

Detecting epistatic drivers of human phenotypes is a considerable challenge. Traditional approaches use regression to sequentially test multiplicative interaction terms involving pairs of genetic variants. For higher-order interactions and genome-wide large-scale data, this strategy is computationally intractable. Moreover, multiplicative terms used in regression modeling may not capture the form of biological interactions. Building on the Predictability, Computability, Stability (PCS) framework, we introduce the epiTree pipeline to extract higher-order interactions from genomic data using tree-based models. The epiTree pipeline first selects a set of variants derived from tissue-specific estimates of gene expression. Next, it uses iterative random forests (iRF) to search training data for candidate Boolean interactions (pairwise and higher-order). We derive significance tests for interactions, based on a stabilized likelihood ratio test, by simulating Boolean tree-structured null (no epistasis) and alternative (epistasis) distributions on hold-out test data. Finally, our pipeline computes PCS epistasis p-values that probabilistically quantify improvement in prediction accuracy via bootstrap sampling on the test set. We validate the epiTree pipeline in two case studies using data from the UK Biobank: predicting red hair and multiple sclerosis (MS). In the case of predicting red hair, epiTree recovers known epistatic interactions surrounding MC1R and novel interactions, representing non-linearities not captured by logistic regression models. In the case of predicting MS, a more complex phenotype than red hair, epiTree rankings prioritize novel interactions surrounding HLA-DRB1, a variant previously associated with MS in several populations. Taken together, these results highlight the potential for epiTree rankings to help reduce the design space for follow-up experiments.


Subject(s)
Epistasis, Genetic , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Phenotype , Multifactorial Inheritance/genetics , Logistic Models , Polymorphism, Single Nucleotide
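The final step described in the abstract above, quantifying an improvement in prediction accuracy by bootstrap sampling on the test set, can be illustrated with a generic paired bootstrap. This is only a minimal sketch under stated assumptions: it is not the PCS epistasis p-value defined by the authors, and the function name, its arguments, and the use of plain classification accuracy are all hypothetical choices made for illustration.

```python
import random


def bootstrap_no_improvement_fraction(y_true, pred_base, pred_inter,
                                      n_boot=2000, seed=0):
    """Fraction of bootstrap resamples of the test set in which the
    interaction model is NOT more accurate than the baseline model.

    Hypothetical helper: a generic paired bootstrap, not the epiTree
    procedure. Small values suggest a stable accuracy improvement.
    """
    if not (len(y_true) == len(pred_base) == len(pred_inter)):
        raise ValueError("all inputs must have the same length")
    rng = random.Random(seed)
    n = len(y_true)
    not_better = 0
    for _ in range(n_boot):
        # Resample test-set indices with replacement; both models are
        # scored on the same resample so the comparison stays paired.
        idx = [rng.randrange(n) for _ in range(n)]
        acc_base = sum(y_true[i] == pred_base[i] for i in idx) / n
        acc_inter = sum(y_true[i] == pred_inter[i] for i in idx) / n
        if acc_inter <= acc_base:
            not_better += 1
    return not_better / n_boot
```

Scoring both models on the same resampled indices keeps the comparison paired, so shared sampling noise cancels rather than inflating the variance of the difference.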
2.
bioRxiv ; 2024 Apr 06.
Article in English | MEDLINE | ID: mdl-37873118

ABSTRACT

Whereas protein language models have demonstrated remarkable efficacy in predicting the effects of missense variants, DNA counterparts have not yet achieved a similar competitive edge for genome-wide variant effect predictions, especially in complex genomes such as that of humans. To address this challenge, we here introduce GPN-MSA, a novel framework for DNA language models that leverages whole-genome sequence alignments across multiple species and takes only a few hours to train. Across several benchmarks on clinical databases (ClinVar, COSMIC, OMIM), experimental functional assays (DMS, DepMap), and population genomic data (gnomAD), our model for the human genome achieves outstanding performance on deleteriousness prediction for both coding and non-coding variants.

3.
Genome Biol ; 24(1): 182, 2023 08 07.
Article in English | MEDLINE | ID: mdl-37550700

ABSTRACT

BACKGROUND: Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity. RESULTS: We train cross-protein transfer (CPT) models using deep mutational scanning (DMS) data from only five proteins and achieve state-of-the-art performance on clinical variant interpretation for unseen proteins across the human proteome. We also improve predictive accuracy on DMS data from held-out proteins. High sensitivity is crucial for clinical applications and our model CPT-1 particularly excels in this regime. For instance, at 95% sensitivity of detecting human disease variants annotated in ClinVar, CPT-1 improves specificity to 68%, from 27% for ESM-1v and 55% for EVE. Furthermore, for genes not used to train REVEL, a supervised method widely used by clinicians, we show that CPT-1 compares favorably with REVEL. Our framework combines predictive features derived from general protein sequence models, vertebrate sequence alignments, and AlphaFold structures, and it is adaptable to the future inclusion of other sources of information. We find that vertebrate alignments, albeit rather shallow with only 100 genomes, provide a strong signal for variant pathogenicity prediction that is complementary to recent deep learning-based models trained on massive amounts of protein sequence data. We release predictions for all possible missense variants in 90% of human genes. CONCLUSIONS: Our results demonstrate the utility of mutational scanning data for learning properties of variants that transfer to unseen proteins.


Subject(s)
Machine Learning , Proteome , Humans , Proteome/genetics , Amino Acid Sequence , Mutation , Mutation, Missense , Computational Biology/methods
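The abstract above reports specificity at a fixed 95% sensitivity. As a rough illustration of how such a figure is typically computed from predictor scores, the sketch below picks the highest score threshold that still detects the required share of pathogenic variants and then measures the share of benign variants falling below it. The function name and the convention that higher scores mean "more likely pathogenic" are assumptions for illustration, not details taken from the paper.

```python
import math


def specificity_at_sensitivity(scores, labels, target_sensitivity=0.95):
    """Specificity at the strictest threshold that still reaches the
    target sensitivity.

    Hypothetical helper: labels are 1 (pathogenic) or 0 (benign), and a
    variant is called pathogenic when its score is >= the threshold.
    """
    pos = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Number of positives that must be detected to reach the target.
    k = math.ceil(target_sensitivity * len(pos))
    # The k-th highest positive score is the highest usable threshold.
    threshold = pos[k - 1]
    # Negatives scoring below the threshold are correctly called benign.
    true_negatives = sum(s < threshold for s in neg)
    return true_negatives / len(neg)
```

Fixing sensitivity first and then comparing specificity puts different predictors on the same operating point, which is what makes a comparison such as the one quoted in the abstract meaningful.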
4.
Nat Commun ; 11(1): 651, 2020 01 31.
Article in English | MEDLINE | ID: mdl-32005835

ABSTRACT

While single-cell RNA sequencing (scRNA-seq) is invaluable for studying cell populations, cell-surface proteins are often integral markers of cellular function and serve as primary targets for therapeutic intervention. Here we propose a transfer learning framework, single cell Transcriptome to Protein prediction with deep neural network (cTP-net), to impute surface protein abundances from scRNA-seq data by learning from existing single-cell multi-omic resources.


Subject(s)
Cells/metabolism , Gene Expression Profiling/methods , Membrane Proteins/genetics , Single-Cell Analysis/methods , Transcriptome , Cells/cytology , Humans , Membrane Proteins/metabolism , Neural Networks, Computer , Sequence Analysis, RNA
5.
Nat Methods ; 16(9): 875-878, 2019 09.
Article in English | MEDLINE | ID: mdl-31471617

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) data are noisy and sparse. Here, we show that transfer learning across datasets remarkably improves data quality. By coupling a deep autoencoder with a Bayesian model, SAVER-X extracts transferable gene-gene relationships across data from different labs, varying conditions and divergent species, to denoise new target datasets.


Subject(s)
Breast Neoplasms/metabolism , Computational Biology/methods , Leukocytes, Mononuclear/metabolism , Sequence Analysis, RNA/standards , Single-Cell Analysis/methods , T-Lymphocytes/metabolism , Transcriptome , Animals , Bayes Theorem , Female , Gene Expression Profiling , Gene Expression Regulation , Humans , Mice , Sequence Analysis, RNA/methods
6.
Bioinformatics ; 35(24): 5155-5162, 2019 12 15.
Article in English | MEDLINE | ID: mdl-31197307

ABSTRACT

MOTIVATION: Dropout is a common phenomenon in single-cell RNA-seq (scRNA-seq) data, and when left unaddressed it affects the validity of the statistical analyses. Despite this, few current methods for differential expression (DE) analysis of scRNA-seq data explicitly model the process that gives rise to the dropout events. We develop DECENT, a method for DE analysis of scRNA-seq data that explicitly and accurately models the molecule capture process in scRNA-seq experiments. RESULTS: We show that DECENT demonstrates improved DE performance over existing DE methods that do not explicitly model dropout. This improvement is consistently observed across several public scRNA-seq datasets generated using different technological platforms. The gain is especially large when the capture process is overdispersed. DECENT maintains type I error well while achieving better sensitivity. Its performance without spike-ins is almost as good as when spike-ins are used to calibrate the capture model. AVAILABILITY AND IMPLEMENTATION: The method is implemented as a publicly available R package at https://github.com/cz-ye/DECENT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Single-Cell Analysis , Software , Gene Expression Profiling , RNA-Seq , Sequence Analysis, RNA
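To make the idea of a molecule capture process in the abstract above concrete, the toy simulation below thins each true molecule count with a fixed capture probability and reports how many expressed genes end up with an observed count of zero. It is a deliberately simplified sketch: the independent per-molecule capture and the names `simulate_capture` and `dropout_rate` are assumptions for illustration and do not reproduce the statistical model implemented in DECENT.

```python
import random


def simulate_capture(true_counts, capture_prob, seed=0):
    """Binomially thin each true count: every molecule is captured
    independently with probability capture_prob (a simplifying
    assumption, not DECENT's capture model)."""
    rng = random.Random(seed)
    return [sum(rng.random() < capture_prob for _ in range(count))
            for count in true_counts]


def dropout_rate(true_counts, observed_counts):
    """Share of truly expressed genes (true count > 0) that were
    observed as zero after capture."""
    expressed = [(t, o) for t, o in zip(true_counts, observed_counts) if t > 0]
    if not expressed:
        return 0.0
    return sum(o == 0 for _, o in expressed) / len(expressed)
```

Running `simulate_capture` with a low capture probability on modest counts shows why zeros in observed data need not mean absence of expression, which is the motivation for modeling capture explicitly.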
7.
Nat Med ; 24(12): 1941, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30135555

ABSTRACT

In the version of this article originally published, the institution in affiliation 10 was missing. Affiliation 10 was originally listed as Department of Surgery, Royal Melbourne Hospital and Royal Womens' Hospital, Melbourne, Victoria, Australia. It should have been Department of Surgery, Royal Melbourne Hospital and Royal Womens' Hospital, University of Melbourne, Melbourne, Victoria, Australia. The error has been corrected in the HTML and PDF versions of this article.

8.
Nat Med ; 24(7): 986-993, 2018 07.
Article in English | MEDLINE | ID: mdl-29942092

ABSTRACT

The quantity of tumor-infiltrating lymphocytes (TILs) in breast cancer (BC) is a robust prognostic factor for improved patient survival, particularly in triple-negative and HER2-overexpressing BC subtypes [1]. Although T cells are the predominant TIL population [2], the relationship between quantitative and qualitative differences in T cell subpopulations and patient prognosis remains unknown. We performed single-cell RNA sequencing (scRNA-seq) of 6,311 T cells isolated from human BCs and show that significant heterogeneity exists in the infiltrating T cell population. We demonstrate that BCs with a high number of TILs contained CD8+ T cells with features of tissue-resident memory T (TRM) cell differentiation and that these CD8+ TRM cells expressed high levels of immune checkpoint molecules and effector proteins. A CD8+ TRM gene signature developed from the scRNA-seq data was significantly associated with improved patient survival in early-stage triple-negative breast cancer (TNBC) and provided better prognostication than CD8 expression alone. Our data suggest that CD8+ TRM cells contribute to BC immunosurveillance and are the key targets of modulation by immune checkpoint inhibition. Further understanding of the development, maintenance and regulation of TRM cells will be crucial for successful immunotherapeutic development in BC.


Subject(s)
Breast Neoplasms/immunology , Immunologic Memory , Single-Cell Analysis/methods , Breast Neoplasms/pathology , CD3 Complex/metabolism , CD8 Antigens/metabolism , Disease-Free Survival , Female , Humans , Kaplan-Meier Estimate , Lymphocytes, Tumor-Infiltrating/immunology , Prognosis , Sequence Analysis, RNA , Triple Negative Breast Neoplasms/immunology , Triple Negative Breast Neoplasms/pathology