Results 1 - 12 of 12
1.
Cell ; 186(7): 1493-1511.e40, 2023 03 30.
Article in English | MEDLINE | ID: mdl-37001506

ABSTRACT

Understanding how genetic variants impact molecular phenotypes is a key goal of functional genomics, currently hindered by reliance on a single haploid reference genome. Here, we present the EN-TEx resource of 1,635 open-access datasets from four donors (∼30 tissues × ∼15 assays). The datasets are mapped to matched, diploid genomes with long-read phasing and structural variants, instantiating a catalog of >1 million allele-specific loci. These loci exhibit coordinated activity along haplotypes and are less conserved than corresponding, non-allele-specific ones. Surprisingly, a deep-learning transformer model can predict the allele-specific activity based only on local nucleotide-sequence context, highlighting the importance of transcription-factor-binding motifs particularly sensitive to variants. Furthermore, combining EN-TEx with existing genome annotations reveals strong associations between allele-specific and GWAS loci. It also enables models for transferring known eQTLs to difficult-to-profile tissues (e.g., from skin to heart). Overall, EN-TEx provides rich data and generalizable models for more accurate personal functional genomics.


Subjects
Epigenome, Quantitative Trait Loci, Genome-Wide Association Study, Genomics, Phenotype, Polymorphism, Single Nucleotide
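The EN-TEx abstract reports >1 million allele-specific loci but does not state how they are called. A common approach at a heterozygous site is an exact binomial test of the reads assigned to each haplotype against a 50:50 expectation. The sketch below assumes that approach; the function names, the 0.05 cutoff, and the absence of multiple-testing correction or overdispersion modeling are illustrative assumptions, not the EN-TEx pipeline.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: total probability of every outcome
    that is no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A small relative tolerance keeps outcomes tied with the observed one
    # (e.g. the mirror-image count under p = 0.5) despite rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def is_allele_specific(hap1_reads: int, hap2_reads: int, alpha: float = 0.05) -> bool:
    """Flag a heterozygous site whose reads are imbalanced between the two
    haplotypes beyond a 50:50 expectation (assumed test; no FDR correction)."""
    n = hap1_reads + hap2_reads
    if n == 0:
        return False
    return binomial_two_sided_p(hap1_reads, n) < alpha


print(is_allele_specific(30, 10))  # True: strong imbalance
print(is_allele_specific(22, 18))  # False: consistent with 50:50
```

A genome-wide scan over many sites would additionally need a false-discovery-rate correction before calling loci allele-specific.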
2.
Nature ; 632(8023): 122-130, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39020179

ABSTRACT

Genetic variation that influences gene expression and splicing is a key source of phenotypic diversity1-5. Although invaluable, studies investigating these links in humans have been strongly biased towards participants of European ancestries, which constrains generalizability and hinders evolutionary research. Here to address these limitations, we developed MAGE, an open-access RNA sequencing dataset of lymphoblastoid cell lines from 731 individuals from the 1000 Genomes Project6, spread across 5 continental groups and 26 populations. Most variation in gene expression (92%) and splicing (95%) was distributed within versus between populations, which mirrored the variation in DNA sequence. We mapped associations between genetic variants and expression and splicing of nearby genes (cis-expression quantitative trait loci (eQTLs) and cis-splicing QTLs (sQTLs), respectively). We identified more than 15,000 putatively causal eQTLs and more than 16,000 putatively causal sQTLs that are enriched for relevant epigenomic signatures. These include 1,310 eQTLs and 1,657 sQTLs that are largely private to underrepresented populations. Our data further indicate that the magnitude and direction of causal eQTL effects are highly consistent across populations. Moreover, the apparent 'population-specific' effects observed in previous studies were largely driven by low resolution or additional independent eQTLs of the same genes that were not detected. Together, our study expands our understanding of human gene expression diversity and provides an inclusive resource for studying the evolution and function of human genomes.


Subjects
Gene Expression Regulation, Genetic Variation, Genome, Human, Internationality, Quantitative Trait Loci, RNA Splicing, Racial Groups, Female, Humans, Male, Artifacts, Bias, Cell Line, Cohort Studies, Datasets as Topic, Epigenomics, Evolution, Molecular, Gene Expression Regulation/genetics, Genetics, Population, Genome, Human/genetics, Lymphocytes/cytology, Lymphocytes/metabolism, Quantitative Trait Loci/genetics, Racial Groups/genetics, RNA Splicing/genetics, Sequence Analysis, RNA
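MAGE maps cis-eQTLs. At its core, testing one variant-gene pair amounts to regressing the gene's expression on genotype dosage (0, 1 or 2 copies of the alternative allele). The sketch below assumes a plain least-squares fit with no covariates; real pipelines usually adjust for ancestry and hidden confounders, and calling "putatively causal" eQTLs requires further steps (for example, statistical fine-mapping) that it omits. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def eqtl_regression(dosage: list[float], expression: list[float]) -> tuple[float, float]:
    """Least-squares fit of expression ~ dosage for one variant-gene pair.

    Returns (beta, t), where beta is the change in expression per copy of the
    alternative allele. No covariates are modeled (simplifying assumption).
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = mean(dosage), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("variant is monomorphic in these samples")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    beta = sxy / sxx
    intercept = my - beta * mx
    # Residual sum of squares, with n - 2 degrees of freedom for slope + intercept.
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosage, expression))
    se = sqrt(rss / (n - 2) / sxx)
    if se == 0:
        raise ValueError("perfect fit: t-statistic is undefined")
    return beta, beta / se


# Hypothetical genotypes (0/1/2 alt-allele copies) and normalized expression.
beta, t = eqtl_regression([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.1, 1.9, 3.0, 3.2])
print(round(beta, 3))  # 1.0
```

In this toy example each extra copy of the alternative allele raises expression by about one unit, with a large t-statistic.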
3.
Nature ; 583(7818): 720-728, 2020 07.
Article in English | MEDLINE | ID: mdl-32728244

ABSTRACT

Transcription factors are DNA-binding proteins that have key roles in gene regulation1,2. Genome-wide occupancy maps of transcriptional regulators are important for understanding gene regulation and its effects on diverse biological processes3-6. However, only a minority of the more than 1,600 transcription factors encoded in the human genome has been assayed. Here we present, as part of the ENCODE (Encyclopedia of DNA Elements) project, data and analyses from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) experiments using the human HepG2 cell line for 208 chromatin-associated proteins (CAPs). These comprise 171 transcription factors and 37 transcriptional cofactors and chromatin regulator proteins, and represent nearly one-quarter of CAPs expressed in HepG2 cells. The binding profiles of these CAPs form major groups associated predominantly with promoters or enhancers, or with both. We confirm and expand the current catalogue of DNA sequence motifs for transcription factors, and describe motifs that correspond to other transcription factors that are co-enriched with the primary ChIP target. For example, FOX family motifs are enriched in ChIP-seq peaks of 37 other CAPs. We show that motif content and occupancy patterns can distinguish between promoters and enhancers. This catalogue reveals high-occupancy target regions at which many CAPs associate, although each contains motifs for only a minority of the numerous associated transcription factors. These analyses provide a more complete overview of the gene regulatory networks that define this cell type, and demonstrate the usefulness of the large-scale production efforts of the ENCODE Consortium.


Subjects
Chromatin Immunoprecipitation Sequencing, Chromatin/genetics, Chromatin/metabolism, DNA-Binding Proteins/metabolism, Molecular Sequence Annotation, Regulatory Sequences, Nucleic Acid/genetics, Datasets as Topic, Enhancer Elements, Genetic/genetics, Hep G2 Cells, Humans, Nucleotide Motifs/genetics, Promoter Regions, Genetic/genetics, Protein Binding, Transcription Factors/metabolism
4.
Nature ; 583(7818): 699-710, 2020 07.
Article in English | MEDLINE | ID: mdl-32728249

ABSTRACT

The human and mouse genomes contain instructions that specify RNAs and proteins and govern the timing, magnitude, and cellular context of their production. To better delineate these elements, phase III of the Encyclopedia of DNA Elements (ENCODE) Project has expanded analysis of the cell and tissue repertoires of RNA transcription, chromatin structure and modification, DNA methylation, chromatin looping, and occupancy by transcription factors and RNA-binding proteins. Here we summarize these efforts, which have produced 5,992 new experimental datasets, including systematic determinations across mouse fetal development. All data are available through the ENCODE data portal (https://www.encodeproject.org), including phase II ENCODE1 and Roadmap Epigenomics2 data. We have developed a registry of 926,535 human and 339,815 mouse candidate cis-regulatory elements, covering 7.9 and 3.4% of their respective genomes, by integrating selected datatypes associated with gene regulation, and constructed a web-based server (SCREEN; http://screen.encodeproject.org) to provide flexible, user-defined access to this resource. Collectively, the ENCODE data and registry provide an expansive resource for the scientific community to build a better understanding of the organization and function of the human and mouse genomes.


Subjects
DNA/genetics, Databases, Genetic, Genome/genetics, Genomics, Molecular Sequence Annotation, Registries, Regulatory Sequences, Nucleic Acid/genetics, Animals, Chromatin/genetics, Chromatin/metabolism, DNA/chemistry, DNA Footprinting, DNA Methylation/genetics, DNA Replication Timing, Deoxyribonuclease I/metabolism, Genome, Human, Histones/metabolism, Humans, Mice, Mice, Transgenic, RNA-Binding Proteins/genetics, Transcription, Genetic/genetics, Transposases/metabolism
6.
J Am Soc Nephrol ; 29(5): 1525-1535, 2018 05.
Article in English | MEDLINE | ID: mdl-29476007

ABSTRACT

Background: Interpreting genetic variants is one of the greatest challenges impeding analysis of rapidly increasing volumes of genomic data from patients. For example, SHROOM3 is an associated risk gene for CKD, yet causative mechanism(s) of SHROOM3 allele(s) are unknown. Methods: We used our analytic pipeline that integrates genetic, computational, biochemical, CRISPR/Cas9 editing, molecular, and physiologic data to characterize coding and noncoding variants to study the human SHROOM3 risk locus for CKD. Results: We identified a novel SHROOM3 transcriptional start site, which results in a shorter isoform lacking the PDZ domain and is regulated by a common noncoding sequence variant associated with CKD (rs17319721, allele frequency: 0.35). This variant disrupted allele binding to the transcription factor TCF7L2 in podocyte cell nuclear extracts and altered transcription levels of SHROOM3 in cultured cells, potentially through the loss of repressive looping between rs17319721 and the novel start site. Although common variant mechanisms are of high utility, sequencing is beginning to identify rare variants involved in disease; therefore, we used our biophysical tools to analyze an average of 112,849 individual human genome sequences for rare SHROOM3 missense variants, revealing 35 high-effect variants. The high-effect alleles include a coding variant (P1244L) previously associated with CKD (P=0.01, odds ratio=7.95; 95% CI, 1.53 to 41.46) that we find to be present in East Asian individuals at an allele frequency of 0.0027. We determined that P1244L attenuates the interaction of SHROOM3 with 14-3-3, suggesting alterations to the Hippo pathway, a known mediator of CKD. Conclusions: These data demonstrate multiple new SHROOM3-dependent genetic/molecular mechanisms that likely affect CKD.


Subjects
Microfilament Proteins/genetics, Renal Insufficiency, Chronic/genetics, Alleles, Animals, Cell Nucleus, Gene Frequency, Genetic Loci, HEK293 Cells, Humans, Mice, Mutation, Missense, Podocytes, Protein Isoforms/genetics, Transcription Factor 7-Like 2 Protein/genetics, Transcription, Genetic, Zebrafish
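The SHROOM3 abstract reports the P1244L association as an odds ratio with a 95% confidence interval. For a 2×2 carrier-by-disease table, the standard Woolf interval exponentiates log(OR) ± 1.96 × SE, where SE = √(1/a + 1/b + 1/c + 1/d). The sketch below implements that formula; the counts in the example are hypothetical, since the abstract does not give the underlying table, and the paper may have used a different method.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.959964) -> tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale) confidence interval for a 2x2 table:

        a = carriers with disease      b = carriers without disease
        c = noncarriers with disease   d = noncarriers without disease

    z = 1.959964 gives a 95% interval.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: a continuity correction is needed first")
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT the P1244L data.
or_, lo, hi = odds_ratio_ci(4, 10, 20, 400)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

A Woolf interval is symmetric on the log scale, so its bounds multiply to OR². The reported bounds are consistent with that shape (√(1.53 × 41.46) ≈ 7.96, close to 7.95), although the abstract does not name the method. The very wide interval reflects the small number of carriers of a variant with allele frequency 0.0027.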
7.
Nat Commun ; 15(1): 6985, 2024 Aug 14.
Article in English | MEDLINE | ID: mdl-39143063

ABSTRACT

Genome-wide association studies (GWAS) have found widespread evidence of pleiotropy, but characterization of global patterns of pleiotropy remains highly incomplete due to insufficient power of current approaches. We develop fastASSET, a method that allows efficient detection of variant-level pleiotropic association across many traits. We analyze GWAS summary statistics of 116 complex traits of diverse types collected from the GRASP repository and large GWAS Consortia. We identify 2293 independent loci and find that the lead variants in nearly all these loci (~99%) are associated with ≥ 2 traits (median = 6). We observe that the degree of pleiotropy estimated from our study predicts that observed in the UK Biobank for a much larger number of traits (K = 4114) (correlation = 0.43, p-value < 2.2 × 10⁻¹⁶). Follow-up analyses of 21 trait-specific variants indicate their link to the expression in trait-related tissues for a small number of genes involved in relevant biological processes. Our findings provide deeper insight into the nature of pleiotropy and lead to the identification of highly trait-specific susceptibility variants.


Subjects
Genetic Pleiotropy, Genome-Wide Association Study, Polymorphism, Single Nucleotide, Quantitative Trait Loci, Genome-Wide Association Study/methods, Humans, Phenotype, Multifactorial Inheritance/genetics, Genetic Variation
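The fastASSET summary ("associated with ≥ 2 traits (median = 6)") is a per-variant count of associated traits. The sketch below computes that kind of summary by naive per-trait thresholding. It is not fastASSET itself, which the abstract presents as a more powerful method for detecting variant-level pleiotropy; the 5 × 10⁻⁸ cutoff, variant IDs, trait names, and p-values are hypothetical.

```python
from statistics import median

GENOME_WIDE = 5e-8  # conventional genome-wide significance cutoff (assumed)


def pleiotropy_degree(pvalues: dict[str, dict[str, float]],
                      threshold: float = GENOME_WIDE) -> dict[str, int]:
    """Count, for each variant, the traits whose p-value passes the threshold."""
    return {
        variant: sum(1 for p in by_trait.values() if p < threshold)
        for variant, by_trait in pvalues.items()
    }


def summarize(degrees: dict[str, int]) -> tuple[float, float]:
    """Fraction of variants associated with >= 2 traits, and the median degree."""
    values = list(degrees.values())
    pleiotropic = sum(1 for k in values if k >= 2)
    return pleiotropic / len(values), median(values)


# Hypothetical lead variants and per-trait p-values.
gwas = {
    "rs_a": {"height": 1e-9, "BMI": 3e-8, "LDL": 0.2},
    "rs_b": {"height": 1e-10, "BMI": 0.5, "LDL": 1e-12},
    "rs_c": {"height": 2e-9, "BMI": 0.4, "LDL": 0.3},
}
degrees = pleiotropy_degree(gwas)
print(degrees)  # {'rs_a': 2, 'rs_b': 2, 'rs_c': 1}
```

Per-trait thresholding ignores correlation between traits and loses power when effects are modest across many traits, which is the gap the abstract says current approaches leave.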
8.
Nat Commun ; 15(1): 4417, 2024 May 24.
Article in English | MEDLINE | ID: mdl-38789417

ABSTRACT

Genome-wide association studies (GWAS) have become well-powered to detect loci associated with telomere length. However, no prior work has validated genes nominated by GWAS to examine their role in telomere length regulation. We conducted a multi-ancestry meta-analysis of 211,369 individuals and identified five novel association signals. Enrichment analyses of chromatin state and cell-type heritability suggested that blood/immune cells are the most relevant cell type to examine telomere length association signals. We validated specific GWAS associations by overexpressing KBTBD6 or POP5 and demonstrated that both lengthened telomeres. CRISPR/Cas9 deletion of the predicted causal regions in K562 blood cells reduced expression of these genes, demonstrating that these loci are related to transcriptional regulation of KBTBD6 and POP5. Our results demonstrate the utility of telomere length GWAS in the identification of telomere length regulation mechanisms and validate KBTBD6 and POP5 as genes affecting telomere length regulation.


Subjects
Genome-Wide Association Study, Telomere Homeostasis, Telomere, Humans, Telomere/genetics, Telomere/metabolism, K562 Cells, Telomere Homeostasis/genetics, Polymorphism, Single Nucleotide, Gene Expression Regulation, CRISPR-Cas Systems
9.
bioRxiv ; 2023 Nov 08.
Article in English | MEDLINE | ID: mdl-37965206

ABSTRACT

Genetic variation influencing gene expression and splicing is a key source of phenotypic diversity. Though invaluable, studies investigating these links in humans have been strongly biased toward participants of European ancestries, diminishing generalizability and hindering evolutionary research. To address these limitations, we developed MAGE, an open-access RNA-seq data set of lymphoblastoid cell lines from 731 individuals from the 1000 Genomes Project spread across 5 continental groups and 26 populations. Most variation in gene expression (92%) and splicing (95%) was distributed within versus between populations, mirroring variation in DNA sequence. We mapped associations between genetic variants and expression and splicing of nearby genes (cis-eQTLs and cis-sQTLs, respectively), identifying >15,000 putatively causal eQTLs and >16,000 putatively causal sQTLs that are enriched for relevant epigenomic signatures. These include 1310 eQTLs and 1657 sQTLs that are largely private to previously underrepresented populations. Our data further indicate that the magnitude and direction of causal eQTL effects are highly consistent across populations and that apparent "population-specific" effects observed in previous studies were largely driven by low resolution or additional independent eQTLs of the same genes that were not detected. Together, our study expands understanding of gene expression diversity across human populations and provides an inclusive resource for studying the evolution and function of human genomes.

10.
Front Cell Dev Biol ; 10: 1033695, 2022.
Article in English | MEDLINE | ID: mdl-36467401

ABSTRACT

The small GTPase family is well-studied in cancer and cellular physiology. With 162 annotated human genes, the family has a broad expression throughout cells of the body. Members of the family have multiple exons that require splicing. Yet, the role of splicing within the family has been underexplored. We have studied the splicing dynamics of small GTPases throughout 41,671 samples by integrating Nanopore and Illumina sequencing techniques. Within this work, we have made several discoveries. (1) Using the GTEx long-read data of 92 samples, each small GTPase gene averages two transcripts, with 83 genes (51%) expressing two or more isoforms. (2) Cross-tissue analysis of GTEx from 17,382 samples shows 41 genes (25%) expressing two or more protein-coding isoforms. These include protein-changing transcripts in genes such as RHOA, RAB37, RAB40C, RAB4B, RAB5C, RHOC, RAB1A, RAN, RHEB, RAC1, and KRAS. (3) The isolation and library technique of the RNAseq influences the abundance of nonsense-mediated decay and retained intron transcripts of small GTPases, which are observed in these genes more often than previously appreciated. (4) Analysis of 16,243 samples of "Blood PAXgene" identified seven genes (3.7%; RHOA, RAB40C, RAB4B, RAB37, RAB5B, RAB5C, RHOC) with two or more transcripts expressed as the major isoform (75% of the total gene), suggesting a role of genetics in altering splicing. (5) Rare (ARL6, RAB23, ARL13B, HRAS, NRAS) and common variants (GEM, RHOC, MRAS, RAB5B, RERG, ARL16) can influence splicing and have an impact on phenotypes and diseases. (6) Multiple genes (RAB9A, RAP2C, ARL4A, RAB3A, RAB26, RAB3C, RASL10A, RAB40B, and HRAS) have sex differences in transcript expression. (7) Several exons are included or excluded for small GTPase genes (RASEF, KRAS, RAC1, RHEB, ARL4A, RHOA, RAB30, RHOBTB1, ARL16, RAP1A) in one or more forms of cancer. (8) Ten transcripts are altered in hypoxia (SAR1B, IFT27, ARL14, RAB11A, RAB10, RAB38, RAN, RIT1, RAB9A) with RHOA identified to have a transient 3'UTR RNA base editing at a conserved site found in all of its transcripts. Overall, we show a remarkable and dynamic role of splicing within the small GTPase family that requires future explorations.
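Finding 4 in the small GTPase abstract counts genes in which two or more transcripts serve as the major isoform, defined by a 75% share of the gene's total expression. One way to read that criterion, assumed here, is per sample: find the transcript (if any) carrying at least 75% of the gene's TPM in each sample, then count how many distinct transcripts take that role across samples. The function names, transcript IDs, and TPM values are hypothetical.

```python
from typing import Optional


def major_isoform(tpm_by_transcript: dict[str, float], share: float = 0.75) -> Optional[str]:
    """Return the transcript carrying at least `share` of the gene's total
    expression in one sample, or None if no transcript dominates."""
    total = sum(tpm_by_transcript.values())
    if total <= 0:
        return None
    for transcript, tpm in tpm_by_transcript.items():
        if tpm / total >= share:
            return transcript
    return None


def distinct_major_isoforms(samples: list[dict[str, float]], share: float = 0.75) -> set[str]:
    """Collect every transcript that is the major isoform in at least one sample."""
    majors = set()
    for sample in samples:
        transcript = major_isoform(sample, share)
        if transcript is not None:
            majors.add(transcript)
    return majors


# Hypothetical per-sample TPMs for one gene with two transcripts.
samples = [
    {"tx1": 90.0, "tx2": 10.0},  # tx1 carries 90%
    {"tx1": 5.0, "tx2": 45.0},   # tx2 carries 90%
    {"tx1": 50.0, "tx2": 50.0},  # neither reaches 75%
]
majors = distinct_major_isoforms(samples)
print(len(majors) >= 2)  # True: this gene would meet the two-major-isoform criterion
```

Because the share is above 50%, at most one transcript can be the major isoform in any single sample, so switching can only be detected across samples.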

11.
Genome Biol ; 21(1): 235, 2020 09 11.
Article in English | MEDLINE | ID: mdl-32912314

ABSTRACT

Genetic regulation of gene expression, revealed by expression quantitative trait loci (eQTLs), exhibits complex patterns of tissue-specific effects. Characterization of these patterns may allow us to better understand mechanisms of gene regulation and disease etiology. We develop a constrained matrix factorization model, sn-spMF, to learn patterns of tissue-sharing and apply it to 49 human tissues from the Genotype-Tissue Expression (GTEx) project. The learned factors reflect tissues with known biological similarity and identify transcription factors that may mediate tissue-specific effects. sn-spMF, available at https://github.com/heyuan7676/ts_eQTLs , can be applied to learn biologically interpretable patterns of eQTL tissue-specificity and generate testable mechanistic hypotheses.


Subjects
Gene Expression Regulation, Models, Genetic, Quantitative Trait Loci, Transcription Factors/metabolism, Humans
12.
Biol Sex Differ ; 11(1): 28, 2020 05 12.
Article in English | MEDLINE | ID: mdl-32398044

ABSTRACT

BACKGROUND: The commonly used laboratory rat, Rattus norvegicus, is unique in having multiple Sry gene copies found on the Y chromosome, with different copies encoding amino acid variations that influence the resulting protein function. It is not clear which Sry genes are expressed at the onset of testis differentiation or how their expression correlates with that of other genes in testis-determination pathways. METHODS: Here, two independent E11-E14 developmental RNAseq datasets show that multiple Sry genes are expressed at E12-E13. RESULTS: The identified copies expressed during testis initiation include Sry4A, Sry1, and Sry3C, which are conserved in every strain of Rattus norvegicus with genomes sequenced to date. CONCLUSIONS: This work represents a first step in defining the complex environment of rat testis differentiation that can open the door for generating sex reversal model systems using embryo manipulation techniques that have been available in the mouse but not the rat.


Subjects
Genes, sry, Testis/growth & development, Animals, Gene Expression Regulation, Developmental, Male, Rats, Sprague-Dawley, Transcription, Genetic