1.
Cell ; 174(3): 716-729.e27, 2018 Jul 26.
Article in English | MEDLINE | ID: mdl-29961576

ABSTRACT

Single-cell RNA sequencing technologies suffer from many sources of technical noise, including under-sampling of mRNA molecules, often termed "dropout," which can severely obscure important gene-gene relationships. To address this, we developed MAGIC (Markov affinity-based graph imputation of cells), a method that shares information across similar cells, via data diffusion, to denoise the cell count matrix and fill in missing transcripts. We validate MAGIC on several biological systems and find it effective at recovering gene-gene relationships and additional structures. Applied to the epithelial-to-mesenchymal transition, MAGIC reveals a phenotypic continuum, with the majority of cells residing in intermediate states that display stem-like signatures, and infers known and previously uncharacterized regulatory interactions, demonstrating that our approach can successfully uncover regulatory relations without perturbations.


Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Algorithms , Cell Line , Epistasis, Genetic/genetics , Gene Regulatory Networks/genetics , Humans , Markov Chains , MicroRNAs/genetics , RNA, Messenger/genetics , Software
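The data-diffusion idea summarized in the MAGIC abstract above can be illustrated with a minimal sketch, assuming a small, already-normalized cells-by-genes matrix. It builds Gaussian affinities between cells, row-normalizes them into a Markov transition matrix M, and imputes by computing M^t X, so each cell's values become a weighted average over similar cells. The function name `magic_impute`, the fixed bandwidth `sigma`, and the diffusion time `t` are illustrative assumptions; the published method includes steps this sketch omits (for example dimensionality reduction and an adaptive kernel bandwidth), so this is not the authors' implementation.

```python
import numpy as np


def magic_impute(X, sigma=1.0, t=3):
    """Simplified MAGIC-style imputation by data diffusion (illustrative sketch).

    X: cells x genes array, assumed already normalized.
    sigma: fixed Gaussian kernel bandwidth (assumption; not the published adaptive kernel).
    t: diffusion time, i.e. the number of Markov steps.
    """
    # Pairwise squared Euclidean distances between cells
    sq_norms = np.sum(X**2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    d2 = np.maximum(d2, 0.0)  # clip tiny negative values caused by rounding
    # Gaussian affinities: similar cells get large weights
    K = np.exp(-d2 / sigma**2)
    # Row-normalize into a Markov transition matrix (each row sums to 1)
    M = K / K.sum(axis=1, keepdims=True)
    # Diffuse for t steps, then average expression over the diffused neighborhood
    Mt = np.linalg.matrix_power(M, t)
    return Mt @ X
```

Increasing `t` smooths more aggressively; with `t=0` the matrix power is the identity and the input is returned unchanged. Because every row of M^t is a probability vector, each imputed value stays within the range of the observed values for that gene.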
2.
Annu Rev Genomics Hum Genet ; 25(1): 27-49, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38382493

ABSTRACT

Population-scale single-cell genomics is a transformative approach for unraveling the intricate links between genetic and cellular variation. This approach is facilitated by cutting-edge experimental methodologies, including the development of high-throughput single-cell multiomics and advances in multiplexed environmental and genetic perturbations. Examining the effects of natural or synthetic genetic variants across cellular contexts provides insights into the mutual influence of genetics and the environment in shaping cellular heterogeneity. The development of computational methodologies further enables detailed quantitative analysis of molecular variation, offering an opportunity to examine the respective roles of stochastic, intercellular, and interindividual variation. Future opportunities lie in leveraging long-read sequencing, refining disease-relevant cellular models, and embracing predictive and generative machine learning models. These advancements hold the potential for a deeper understanding of the genetic architecture of human molecular traits, which in turn has important implications for understanding the genetic causes of human disease.


Subject(s)
Genetic Variation , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Genomics/methods , Machine Learning , Genetics, Population
3.
bioRxiv ; 2024 Jul 10.
Article in English | MEDLINE | ID: mdl-39026761

ABSTRACT

Background: A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis-regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type-specific CREs contain a large proportion of complex disease heritability. Results: We evaluate genomic deep learning models in chromatin accessibility regions with varying degrees of cell type specificity. We assess two modeling directions in the field: general-purpose models trained across thousands of outputs (cell types and epigenetic marks), and models tailored to specific tissues and tasks. We find that the accuracy of genomic deep learning models, including two state-of-the-art general-purpose models (Enformer and Sei), varies across the genome and is reduced in cell type-specific accessible regions. Using accessibility models trained on cell types from specific tissues, we find that increasing model capacity to learn cell type-specific regulatory syntax, through single-task learning or high-capacity multi-task models, can improve performance in cell type-specific accessible regions. We also observe that improving reference sequence predictions does not consistently improve variant effect predictions, indicating that novel strategies are needed to improve performance on variants. Conclusions: Our results provide a new perspective on the performance of genomic deep learning models, showing that performance varies across the genome and is particularly reduced in cell type-specific accessible regions. We also identify strategies to maximize performance in cell type-specific accessible regions.

4.
Genome Biol ; 25(1): 202, 2024 Aug 01.
Article in English | MEDLINE | ID: mdl-39090688

ABSTRACT

BACKGROUND: A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis-regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type-specific CREs contain a large proportion of complex disease heritability. RESULTS: We evaluate genomic deep learning models in chromatin accessibility regions with varying degrees of cell type specificity. We assess two modeling directions in the field: general-purpose models trained across thousands of outputs (cell types and epigenetic marks) and models tailored to specific tissues and tasks. We find that the accuracy of genomic deep learning models, including two state-of-the-art general-purpose models (Enformer and Sei), varies across the genome and is reduced in cell type-specific accessible regions. Using accessibility models trained on cell types from specific tissues, we find that increasing model capacity to learn cell type-specific regulatory syntax, through single-task learning or high-capacity multi-task models, can improve performance in cell type-specific accessible regions. We also observe that improving reference sequence predictions does not consistently improve variant effect predictions, indicating that novel strategies are needed to improve performance on variants. CONCLUSIONS: Our results provide a new perspective on the performance of genomic deep learning models, showing that performance varies across the genome and is particularly reduced in cell type-specific accessible regions. We also identify strategies to maximize performance in cell type-specific accessible regions.


Subject(s)
Chromatin , Deep Learning , Genomics , Humans , Chromatin/genetics , Genomics/methods , Regulatory Sequences, Nucleic Acid , Organ Specificity/genetics , Epigenesis, Genetic , Models, Genetic
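The evaluation described in the two records above (the bioRxiv preprint and its Genome Biology version) stratifies model performance by how cell type-specific a region is, instead of reporting a single genome-wide number. A minimal sketch, assuming per-region predicted and observed accessibility values plus a hypothetical per-region specificity score in [0, 1], computes the Pearson correlation separately within each specificity bin. The function name, the score, and the bin edges are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def stratified_pearson(pred, obs, specificity, bins):
    """Pearson r between predicted and observed accessibility within each specificity bin.

    pred, obs: 1-D arrays of per-region values.
    specificity: hypothetical per-region score in [0, 1] (higher = more cell type-specific).
    bins: list of (low, high) half-open intervals [low, high).
    """
    out = {}
    for low, high in bins:
        mask = (specificity >= low) & (specificity < high)
        if mask.sum() < 3:
            out[(low, high)] = float("nan")  # too few regions for a meaningful r
        else:
            out[(low, high)] = float(np.corrcoef(pred[mask], obs[mask])[0, 1])
    return out
```

Comparing the per-bin values shows whether accuracy drops in the most cell type-specific regions even when the genome-wide correlation looks good.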
5.
bioRxiv ; 2024 Jun 22.
Article in English | MEDLINE | ID: mdl-38948875

ABSTRACT

Kidney disease is highly heritable; however, the causal genetic variants, the cell types in which these variants function, and the molecular mechanisms underlying kidney disease remain largely unknown. To identify genetic loci affecting kidney function, we performed a GWAS using multiple kidney function biomarkers and identified 462 loci. To begin to investigate how these loci affect kidney function, we generated single-cell chromatin accessibility (scATAC-seq) maps of the human kidney and identified candidate cis-regulatory elements (cCREs) for kidney podocytes, tubule epithelial cells, and kidney endothelial, stromal, and immune cells. Kidney tubule epithelial cCREs explained 58% of kidney function SNP-heritability, and kidney podocyte cCREs explained an additional 6.5%. In contrast, kidney endothelial, stromal, and immune cell-specific cCREs explained little kidney function heritability. Through functionally informed fine-mapping, we identified putative causal kidney function variants and their corresponding cCREs. Using kidney scATAC-seq data, we created a deep learning model (which we named ChromKid) to predict kidney cell type-specific chromatin accessibility from sequence. ChromKid and allele-specific kidney scATAC-seq revealed that many fine-mapped kidney function variants locally change chromatin accessibility in tubule epithelial cells. Enhancer assays confirmed that fine-mapped kidney function variants alter tubule epithelial regulatory element function. To map the genes that these regulatory elements control, we used CRISPR interference (CRISPRi) to target these regulatory elements in tubule epithelial cells and assessed changes in gene expression. CRISPRi of enhancers harboring kidney function variants altered NDRG1 and RBPMS expression. Thus, inherited differences in tubule epithelial NDRG1 and RBPMS expression may predispose to kidney disease in humans.
We conclude that genetic variants affecting tubule epithelial regulatory element function account for most SNP-heritability of human kidney function. This work provides an experimental approach to identify the variants, regulatory elements, and genes involved in polygenic disease.

6.
Nat Genet ; 56(10): 2078-2092, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39256582

ABSTRACT

Kidney failure, the decrease of kidney function below a threshold necessary to support life, is a major cause of morbidity and mortality. We performed a genome-wide association study (GWAS) of 406,504 individuals in the UK Biobank, identifying 430 loci affecting kidney function in middle-aged adults. To investigate the cell types affected by these loci, we integrated the GWAS with human kidney candidate cis-regulatory elements (cCREs) identified using single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq). Overall, 56% of kidney function heritability localized to kidney tubule epithelial cCREs and an additional 7% to kidney podocyte cCREs. Thus, most heritable differences in adult kidney function are a result of altered gene expression in these two cell types. Using enhancer assays, allele-specific scATAC-seq and machine learning, we found that many kidney function variants alter tubule epithelial cCRE chromatin accessibility and function. Using CRISPRi, we determined which genes some of these cCREs regulate, implicating NDRG1, CCNB1 and STC1 in human kidney function.


Subject(s)
Genome-Wide Association Study , Kidney Tubules , Regulatory Sequences, Nucleic Acid , Humans , Kidney Tubules/metabolism , Regulatory Sequences, Nucleic Acid/genetics , Polymorphism, Single Nucleotide , Podocytes/metabolism , Middle Aged , Epithelial Cells/metabolism , Chromatin/genetics , Kidney/metabolism , Male , Gene Expression Regulation , Adult , Female , Quantitative Trait Loci
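Both kidney records above use allele-specific scATAC-seq to ask whether a variant changes local chromatin accessibility. A common way to test allelic imbalance at a heterozygous site, shown here as a hedged sketch rather than the authors' method, is an exact two-sided binomial test of reference versus alternate read counts against a 50:50 null. The helper names are hypothetical, and the sketch ignores reference mapping bias and overdispersion, which real analyses usually account for.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes no more likely than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so outcomes tied with the observed one are counted
    total = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def allelic_imbalance(ref_reads, alt_reads):
    """Reference-allele fraction and binomial p-value against a 50:50 null (hypothetical helper)."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / n, binomial_two_sided_p(ref_reads, n)
```

For example, 18 reference and 2 alternate reads give a reference fraction of 0.9 and a p-value well below 0.001, whereas a balanced 10:10 site gives p = 1.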
7.
bioRxiv ; 2023 Dec 23.
Article in English | MEDLINE | ID: mdl-38187742

ABSTRACT

Genomic sequence-to-activity models are increasingly utilized to understand gene regulatory syntax and probe the functional consequences of regulatory variation. Current models make accurate predictions of relative activity levels across the human reference genome, but their performance is more limited for predicting the effects of genetic variants, such as explaining gene expression variation across individuals. To better understand the causes of these shortcomings, we examine the uncertainty in predictions of genomic sequence-to-activity models using an ensemble of Basenji2 model replicates. We characterize prediction consistency on four types of sequences: reference genome sequences, reference genome sequences perturbed with TF motifs, eQTLs, and personal genome sequences. We observe that models tend to make high-confidence predictions on reference sequences, even when incorrect, and low-confidence predictions on sequences with variants. For eQTLs and personal genome sequences, we find that model replicates make inconsistent predictions in >50% of cases. Our findings suggest strategies to improve performance of these models.
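The replicate-ensemble analysis above asks whether independently trained model replicates agree with one another. A minimal sketch, assuming a replicates-by-variants array of predicted variant effects (alternate minus reference), flags a variant as consistent only when every replicate predicts the same nonzero sign of effect. This sign-agreement criterion and the function names are assumptions for illustration; the study's exact definition of prediction consistency may differ.

```python
import numpy as np


def sign_consistent(effects):
    """effects: replicates x variants array of predicted variant effects (alt minus ref).

    Returns a boolean array, True where every replicate predicts the same nonzero sign.
    """
    signs = np.sign(effects)
    return np.all(signs > 0, axis=0) | np.all(signs < 0, axis=0)


def fraction_inconsistent(effects):
    """Share of variants on which the replicates disagree about the direction of effect."""
    return 1.0 - float(np.mean(sign_consistent(effects)))
```

A stricter variant of this idea could also require the spread of predictions across replicates to be small relative to their mean, not only that the signs agree.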

8.
Nat Genet ; 55(12): 2056-2059, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38036790

ABSTRACT

Genomic deep learning models can predict genome-wide epigenetic features and gene expression levels directly from DNA sequence. While current models perform well at predicting gene expression levels across genes in different cell types from the reference genome, their ability to explain expression variation between individuals due to cis-regulatory genetic variants remains largely unexplored. Here, we evaluate four state-of-the-art models on paired personal genome and transcriptome data and find limited performance when explaining variation in expression across individuals. In addition, models often fail to predict the correct direction of effect of cis-regulatory genetic variation on expression.


Subject(s)
Deep Learning , Transcriptome , Humans , Transcriptome/genetics , Genetic Variation/genetics , Genome , Genomics
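Evaluating models on paired personal genome and transcriptome data, as described above, is often done gene by gene: for each gene, correlate predicted and observed expression across individuals, and read a negative correlation as the model getting the direction of effect wrong. This sketch assumes individuals-by-genes arrays of predicted and measured expression; the function names and this particular summary are illustrative assumptions rather than the paper's exact protocol.

```python
import numpy as np


def per_gene_correlation(pred, obs):
    """Pearson r across individuals for each gene.

    pred, obs: individuals x genes arrays of predicted and measured expression.
    """
    pc = pred - pred.mean(axis=0)
    oc = obs - obs.mean(axis=0)
    num = (pc * oc).sum(axis=0)
    den = np.sqrt((pc**2).sum(axis=0) * (oc**2).sum(axis=0))
    return num / den  # NaN for genes with no variation in pred or obs


def fraction_wrong_direction(r):
    """Share of genes (ignoring NaNs) whose cross-individual correlation is negative."""
    r = r[~np.isnan(r)]
    return float(np.mean(r < 0))
```

Note that this cross-individual correlation is a different quantity from the across-gene correlation on the reference genome, which is why a model can score well on the latter while doing poorly here.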
9.
Nat Biotechnol ; 34(6): 637-45, 2016 Jun.
Article in English | MEDLINE | ID: mdl-27136076

ABSTRACT

Recent single-cell analysis technologies offer an unprecedented opportunity to elucidate developmental pathways. Here we present Wishbone, an algorithm for positioning single cells along bifurcating developmental trajectories with high resolution. Wishbone uses multi-dimensional single-cell data, such as mass cytometry or RNA-Seq data, as input and orders cells according to their developmental progression, and it pinpoints bifurcation points by labeling each cell as pre-bifurcation or as one of two post-bifurcation cell fates. Using 30-channel mass cytometry data, we show that Wishbone accurately recovers the known stages of T-cell development in the mouse thymus, including the bifurcation point. We also apply the algorithm to mouse myeloid differentiation and demonstrate its generalization to additional lineages. A comparison of Wishbone to diffusion maps, SCUBA and Monocle shows that it outperforms these methods both in the accuracy of ordering cells and in the correct identification of branch points.


Subject(s)
Algorithms , Cell Differentiation/physiology , Models, Biological , Morphogenesis/physiology , T-Lymphocytes/cytology , T-Lymphocytes/physiology , Animals , Computer Simulation , Mice , Software
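Graph-based trajectory methods such as Wishbone order cells by distances measured along a neighbor graph rather than by straight-line distances. A minimal sketch of that starting idea, assuming a small cells-by-features matrix and a user-chosen root cell, builds a symmetric k-nearest-neighbor graph and assigns each cell a pseudotime equal to its shortest-path (Dijkstra) distance from the root. The function names and `k` are hypothetical, and Wishbone itself adds steps this sketch omits, including waypoint-based refinement of the ordering and detection of the bifurcation point.

```python
import heapq

import numpy as np


def knn_graph(X, k):
    """Symmetric k-nearest-neighbor graph as an adjacency list of (neighbor, distance)."""
    n = X.shape[0]
    # Pairwise Euclidean distances between cells
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        # Nearest k cells other than cell i itself
        neighbors = [int(j) for j in np.argsort(d[i]) if j != i][:k]
        for j in neighbors:
            w = float(d[i, j])
            adj[i].append((j, w))
            adj[j].append((i, w))  # symmetrize so the graph is undirected
    return adj


def shortest_path_pseudotime(X, root, k=5):
    """Pseudotime = Dijkstra distance from the root cell along the kNN graph."""
    adj = knn_graph(X, k)
    dist = np.full(X.shape[0], np.inf)  # unreachable cells stay at infinity
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        du, u = heapq.heappop(heap)
        if du > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            if du + w < dist[v]:
                dist[v] = du + w
                heapq.heappush(heap, (dist[v], v))
    return dist
```

Sorting cells by the returned values gives a developmental ordering; cells left at infinity indicate that `k` is too small for the graph to be connected.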