Results 1 - 12 of 12
1.
Cell ; 186(7): 1493-1511.e40, 2023 03 30.
Article in English | MEDLINE | ID: mdl-37001506

ABSTRACT

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.


Subject(s)
Epigenome , Quantitative Trait Loci , Genome-Wide Association Study , Genomics , Phenotype , Single Nucleotide Polymorphism
2.
Nature ; 632(8023): 122-130, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39020179

ABSTRACT

Genetic variation that influences gene expression and splicing is a key source of phenotypic diversity [1-5]. Although invaluable, studies investigating these links in humans have been strongly biased towards participants of European ancestries, which constrains generalizability and hinders evolutionary research. Here, to address these limitations, we developed MAGE, an open-access RNA sequencing dataset of lymphoblastoid cell lines from 731 individuals from the 1000 Genomes Project [6], spread across 5 continental groups and 26 populations. Most variation in gene expression (92%) and splicing (95%) was distributed within versus between populations, which mirrored the variation in DNA sequence. We mapped associations between genetic variants and expression and splicing of nearby genes (cis-expression quantitative trait loci (eQTLs) and cis-splicing QTLs (sQTLs), respectively). We identified more than 15,000 putatively causal eQTLs and more than 16,000 putatively causal sQTLs that are enriched for relevant epigenomic signatures. These include 1,310 eQTLs and 1,657 sQTLs that are largely private to underrepresented populations. Our data further indicate that the magnitude and direction of causal eQTL effects are highly consistent across populations. Moreover, the apparent 'population-specific' effects observed in previous studies were largely driven by low resolution or additional independent eQTLs of the same genes that were not detected. Together, our study expands our understanding of human gene expression diversity and provides an inclusive resource for studying the evolution and function of human genomes.


Subject(s)
Gene Expression Regulation , Genetic Variation , Human Genome , Internationality , Quantitative Trait Loci , RNA Splicing , Racial Groups , Female , Humans , Male , Artifacts , Bias , Cell Line , Cohort Studies , Datasets as Topic , Epigenomics , Molecular Evolution , Gene Expression Regulation/genetics , Population Genetics , Human Genome/genetics , Lymphocytes/cytology , Lymphocytes/metabolism , Quantitative Trait Loci/genetics , Racial Groups/genetics , RNA Splicing/genetics , RNA Sequence Analysis
3.
Nature ; 583(7818): 720-728, 2020 07.
Article in English | MEDLINE | ID: mdl-32728244

ABSTRACT

Transcription factors are DNA-binding proteins that have key roles in gene regulation [1,2]. Genome-wide occupancy maps of transcriptional regulators are important for understanding gene regulation and its effects on diverse biological processes [3-6]. However, only a minority of the more than 1,600 transcription factors encoded in the human genome has been assayed. Here we present, as part of the ENCODE (Encyclopedia of DNA Elements) project, data and analyses from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiments using the human HepG2 cell line for 208 chromatin-associated proteins (CAPs). These comprise 171 transcription factors and 37 transcriptional cofactors and chromatin regulator proteins, and represent nearly one-quarter of CAPs expressed in HepG2 cells. The binding profiles of these CAPs form major groups associated predominantly with promoters or enhancers, or with both. We confirm and expand the current catalogue of DNA sequence motifs for transcription factors, and describe motifs that correspond to other transcription factors that are co-enriched with the primary ChIP target. For example, FOX family motifs are enriched in ChIP-seq peaks of 37 other CAPs. We show that motif content and occupancy patterns can distinguish between promoters and enhancers. This catalogue reveals high-occupancy target regions at which many CAPs associate, although each contains motifs for only a minority of the numerous associated transcription factors. These analyses provide a more complete overview of the gene regulatory networks that define this cell type, and demonstrate the usefulness of the large-scale production efforts of the ENCODE Consortium.


Subject(s)
Chromatin Immunoprecipitation Sequencing , Chromatin/genetics , Chromatin/metabolism , DNA-Binding Proteins/metabolism , Molecular Sequence Annotation , Nucleic Acid Regulatory Sequences/genetics , Datasets as Topic , Genetic Enhancer Elements/genetics , Hep G2 Cells , Humans , Nucleotide Motifs/genetics , Genetic Promoter Regions/genetics , Protein Binding , Transcription Factors/metabolism
4.
Nature ; 583(7818): 699-710, 2020 07.
Article in English | MEDLINE | ID: mdl-32728249

ABSTRACT

The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE [1] and Roadmap Epigenomics [2] data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9% and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.


Subject(s)
DNA/genetics , Genetic Databases , Genome/genetics , Genomics , Molecular Sequence Annotation , Registries , Nucleic Acid Regulatory Sequences/genetics , Animals , Chromatin/genetics , Chromatin/metabolism , DNA/chemistry , DNA Footprinting , DNA Methylation/genetics , DNA Replication Timing , Deoxyribonuclease I/metabolism , Human Genome , Histones/metabolism , Humans , Mice , Transgenic Mice , RNA-Binding Proteins/genetics , Genetic Transcription/genetics , Transposases/metabolism
6.
J Am Soc Nephrol ; 29(5): 1525-1535, 2018 05.
Article in English | MEDLINE | ID: mdl-29476007

ABSTRACT

BACKGROUND: Interpreting genetic variants is one of the greatest challenges impeding analysis of rapidly increasing volumes of genomic data from patients. For example, SHROOM3 is an associated risk gene for CKD, yet causative mechanism(s) of SHROOM3 allele(s) are unknown. METHODS: We used our analytic pipeline that integrates genetic, computational, biochemical, CRISPR/Cas9 editing, molecular, and physiologic data to characterize coding and noncoding variants to study the human SHROOM3 risk locus for CKD. RESULTS: We identified a novel SHROOM3 transcriptional start site, which results in a shorter isoform lacking the PDZ domain and is regulated by a common noncoding sequence variant associated with CKD (rs17319721, allele frequency: 0.35). This variant disrupted allele binding to the transcription factor TCF7L2 in podocyte cell nuclear extracts and altered transcription levels of SHROOM3 in cultured cells, potentially through the loss of repressive looping between rs17319721 and the novel start site. Although common variant mechanisms are of high utility, sequencing is beginning to identify rare variants involved in disease; therefore, we used our biophysical tools to analyze an average of 112,849 individual human genome sequences for rare SHROOM3 missense variants, revealing 35 high-effect variants. The high-effect alleles include a coding variant (P1244L) previously associated with CKD (P=0.01, odds ratio=7.95; 95% CI, 1.53 to 41.46) that we find to be present in East Asian individuals at an allele frequency of 0.0027. We determined that P1244L attenuates the interaction of SHROOM3 with 14-3-3, suggesting alterations to the Hippo pathway, a known mediator of CKD. CONCLUSIONS: These data demonstrate multiple new SHROOM3-dependent genetic/molecular mechanisms that likely affect CKD.


Subject(s)
Microfilament Proteins/genetics , Chronic Renal Insufficiency/genetics , Alleles , Animals , Cell Nucleus , Gene Frequency , Genetic Loci , HEK293 Cells , Humans , Mice , Missense Mutation , Podocytes , Protein Isoforms/genetics , Transcription Factor 7-Like 2 Protein/genetics , Genetic Transcription , Zebrafish
7.
Nat Commun ; 15(1): 6985, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39143063

ABSTRACT

Genome-wide association studies (GWAS) have found widespread evidence of pleiotropy, but characterization of global patterns of pleiotropy remains highly incomplete due to insufficient power of current approaches. We develop fastASSET, a method that allows efficient detection of variant-level pleiotropic association across many traits. We analyze GWAS summary statistics of 116 complex traits of diverse types collected from the GRASP repository and large GWAS Consortia. We identify 2293 independent loci and find that the lead variants in nearly all these loci (~99%) are associated with ≥ 2 traits (median = 6). We observe that the degree of pleiotropy estimated from our study predicts that observed in the UK Biobank for a much larger number of traits (K = 4114) (correlation = 0.43, p-value < 2.2 × 10^-16). Follow-up analyses of 21 trait-specific variants indicate their link to the expression in trait-related tissues for a small number of genes involved in relevant biological processes. Our findings provide deeper insight into the nature of pleiotropy and lead to the identification of highly trait-specific susceptibility variants.


Subject(s)
Genetic Pleiotropy , Genome-Wide Association Study , Single Nucleotide Polymorphism , Quantitative Trait Loci , Genome-Wide Association Study/methods , Humans , Phenotype , Multifactorial Inheritance/genetics , Genetic Variation
8.
Nat Commun ; 15(1): 4417, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38789417

ABSTRACT

Genome-wide association studies (GWAS) have become well-powered to detect loci associated with telomere length. However, no prior work has validated genes nominated by GWAS to examine their role in telomere length regulation. We conducted a multi-ancestry meta-analysis of 211,369 individuals and identified five novel association signals. Enrichment analyses of chromatin state and cell-type heritability suggested that blood/immune cells are the most relevant cell type to examine telomere length association signals. We validated specific GWAS associations by overexpressing KBTBD6 or POP5 and demonstrated that both lengthened telomeres. CRISPR/Cas9 deletion of the predicted causal regions in K562 blood cells reduced expression of these genes, demonstrating that these loci are related to transcriptional regulation of KBTBD6 and POP5. Our results demonstrate the utility of telomere length GWAS in the identification of telomere length regulation mechanisms and validate KBTBD6 and POP5 as genes affecting telomere length regulation.


Subject(s)
Genome-Wide Association Study , Telomere Homeostasis , Telomere , Humans , Telomere/genetics , Telomere/metabolism , K562 Cells , Telomere Homeostasis/genetics , Single Nucleotide Polymorphism , Gene Expression Regulation , CRISPR-Cas Systems
9.
bioRxiv ; 2023 Nov 08.
Article in English | MEDLINE | ID: mdl-37965206

ABSTRACT

Genetic variation influencing gene expression and splicing is a key source of phenotypic diversity. Though invaluable, studies investigating these links in humans have been strongly biased toward participants of European ancestries, diminishing generalizability and hindering evolutionary research. To address these limitations, we developed MAGE, an open-access RNA-seq data set of lymphoblastoid cell lines from 731 individuals from the 1000 Genomes Project spread across 5 continental groups and 26 populations. Most variation in gene expression (92%) and splicing (95%) was distributed within versus between populations, mirroring variation in DNA sequence. We mapped associations between genetic variants and expression and splicing of nearby genes (cis-eQTLs and cis-sQTLs, respectively), identifying >15,000 putatively causal eQTLs and >16,000 putatively causal sQTLs that are enriched for relevant epigenomic signatures. These include 1310 eQTLs and 1657 sQTLs that are largely private to previously underrepresented populations. Our data further indicate that the magnitude and direction of causal eQTL effects are highly consistent across populations and that apparent "population-specific" effects observed in previous studies were largely driven by low resolution or additional independent eQTLs of the same genes that were not detected. Together, our study expands understanding of gene expression diversity across human populations and provides an inclusive resource for studying the evolution and function of human genomes.

10.
Front Cell Dev Biol ; 10: 1033695, 2022.
Article in English | MEDLINE | ID: mdl-36467401

ABSTRACT

The small GTPase family is well-studied in cancer and cellular physiology. With 162 annotated human genes, the family has a broad expression throughout cells of the body. Members of the family have multiple exons that require splicing. Yet, the role of splicing within the family has been underexplored. We have studied the splicing dynamics of small GTPases throughout 41,671 samples by integrating Nanopore and Illumina sequencing techniques. Within this work, we have made several discoveries. (1) Using the GTEx long read data of 92 samples, each small GTPase gene averages two transcripts, with 83 genes (51%) expressing two or more isoforms. (2) Cross-tissue analysis of GTEx from 17,382 samples shows 41 genes (25%) expressing two or more protein-coding isoforms. These include protein-changing transcripts in genes such as RHOA, RAB37, RAB40C, RAB4B, RAB5C, RHOC, RAB1A, RAN, RHEB, RAC1, and KRAS. (3) The isolation and library technique of the RNAseq influences the abundance of nonsense-mediated decay and retained intron transcripts of small GTPases, which are observed more often in genes than appreciated. (4) Analysis of 16,243 samples of "Blood PAXgene" identified seven genes (3.7%; RHOA, RAB40C, RAB4B, RAB37, RAB5B, RAB5C, RHOC) with two or more transcripts expressed as the major isoform (75% of the total gene), suggesting a role of genetics in altering splicing. (5) Rare (ARL6, RAB23, ARL13B, HRAS, NRAS) and common variants (GEM, RHOC, MRAS, RAB5B, RERG, ARL16) can influence splicing and have an impact on phenotypes and diseases. (6) Multiple genes (RAB9A, RAP2C, ARL4A, RAB3A, RAB26, RAB3C, RASL10A, RAB40B, and HRAS) have sex differences in transcript expression. (7) Several exons are included or excluded for small GTPase genes (RASEF, KRAS, RAC1, RHEB, ARL4A, RHOA, RAB30, RHOBTB1, ARL16, RAP1A) in one or more forms of cancer. (8) Ten transcripts are altered in hypoxia (SAR1B, IFT27, ARL14, RAB11A, RAB10, RAB38, RAN, RIT1, RAB9A) with RHOA identified to have a transient 3'UTR RNA base editing at a conserved site found in all of its transcripts. Overall, we show a remarkable and dynamic role of splicing within the small GTPase family that requires future explorations.

11.
Genome Biol ; 21(1): 235, 2020 09 11.
Article in English | MEDLINE | ID: mdl-32912314

ABSTRACT

Genetic regulation of gene expression, revealed by expression quantitative trait loci (eQTLs), exhibits complex patterns of tissue-specific effects. Characterization of these patterns may allow us to better understand mechanisms of gene regulation and disease etiology. We develop a constrained matrix factorization model, sn-spMF, to learn patterns of tissue-sharing and apply it to 49 human tissues from the Genotype-Tissue Expression (GTEx) project. The learned factors reflect tissues with known biological similarity and identify transcription factors that may mediate tissue-specific effects. sn-spMF, available at https://github.com/heyuan7676/ts_eQTLs, can be applied to learn biologically interpretable patterns of eQTL tissue-specificity and generate testable mechanistic hypotheses.


Subject(s)
Gene Expression Regulation , Genetic Models , Quantitative Trait Loci , Transcription Factors/metabolism , Humans
12.
Biol Sex Differ ; 11(1): 28, 2020 05 12.
Article in English | MEDLINE | ID: mdl-32398044

ABSTRACT

BACKGROUND: The commonly used laboratory rat, Rattus norvegicus, is unique in having multiple Sry gene copies found on the Y chromosome, with different copies encoding amino acid variations that influence the resulting protein function. It is not clear which Sry genes are expressed at the onset of testis differentiation or how their expression correlates with that of other genes in testis-determination pathways. METHODS: Here, two independent E11-E14 developmental RNAseq datasets show that multiple Sry genes are expressed at E12-E13. RESULTS: The identified copies expressed during testis initiation include Sry4A, Sry1, and Sry3C, which are conserved in every strain of Rattus norvegicus with genomes sequenced to date. CONCLUSIONS: This work represents a first step in defining the complex environment of rat testis differentiation that can open the door for generating sex reversal model systems using embryo manipulation techniques that have been available in the mouse but not the rat.


Subject(s)
sry Genes , Testis/growth & development , Animals , Developmental Gene Expression Regulation , Male , Sprague-Dawley Rats , Genetic Transcription