ABSTRACT
Studying the genetic regulation of protein expression (through protein quantitative trait loci (pQTLs)) offers a deeper understanding of regulatory variants uncharacterized by mRNA expression regulation (expression QTLs (eQTLs)) studies. Here we report cis-eQTL and cis-pQTL statistical fine-mapping from 1,405 genotyped samples with blood mRNA and 2,932 plasma samples of protein expression, as part of the Japan COVID-19 Task Force (JCTF). Fine-mapped eQTLs (n = 3,464) were enriched for 932 variants validated with a massively parallel reporter assay. Fine-mapped pQTLs (n = 582) were enriched for missense variations on structured and extracellular domains, although the possibility of epitope-binding artifacts remains. Trans-eQTL and trans-pQTL analysis highlighted associations of class I HLA allele variation with KIR genes. We contrast the multi-tissue origin of plasma protein with blood mRNA, contributing to the limited colocalization level, distinct regulatory mechanisms and trait relevance of eQTLs and pQTLs. We report a negative correlation between ABO mRNA and protein expression because of linkage disequilibrium between distinct nearby eQTLs and pQTLs.
ABSTRACT
BACKGROUND: Increasing evidence suggests that a substantial proportion of disease-associated mutations occur in enhancers, regions of non-coding DNA essential to gene regulation. Understanding the structures and mechanisms of the regulatory programs this variation affects can shed light on the apparatuses of human diseases. RESULTS: We collect epigenetic and gene expression datasets from seven early time points during neural differentiation. Focusing on this model system, we construct networks of enhancer-promoter interactions, each at an individual stage of neural induction. These networks serve as the base for a rich series of analyses, through which we demonstrate their temporal dynamics and enrichment for various disease-associated variants. We apply the Girvan-Newman clustering algorithm to these networks to reveal biologically relevant substructures of regulation. Additionally, we demonstrate methods to validate predicted enhancer-promoter interactions using transcription factor overexpression and massively parallel reporter assays. CONCLUSIONS: Our findings suggest a generalizable framework for exploring gene regulatory programs and their dynamics across developmental processes; this includes a comprehensive approach to studying the effects of disease-associated variation on transcriptional networks. The techniques applied to our networks have been published alongside our findings as a computational tool, E-P-INAnalyzer. Our procedure can be utilized across different cellular contexts and disorders.
Subject(s)
Enhancer Elements, Genetic; Gene Regulatory Networks; Promoter Regions, Genetic; Humans; Neurogenesis/genetics; Cell Differentiation; Transcription Factors/metabolism; Transcription Factors/genetics; Models, Genetic; Neurons/metabolism
ABSTRACT
Background: Increasing evidence suggests that a substantial proportion of disease-associated mutations occur in enhancers, regions of non-coding DNA essential to gene regulation. Understanding the structures and mechanisms of regulatory programs this variation affects can shed light on the apparatuses of human diseases. Results: We collected epigenetic and gene expression datasets from seven early time points during neural differentiation. Focusing on this model system, we constructed networks of enhancer-promoter interactions, each at an individual stage of neural induction. These networks served as the base for a rich series of analyses, through which we demonstrated their temporal dynamics and enrichment for various disease-associated variants. We applied the Girvan-Newman clustering algorithm to these networks to reveal biologically relevant substructures of regulation. Additionally, we demonstrated methods to validate predicted enhancer-promoter interactions using transcription factor overexpression and massively parallel reporter assays. Conclusions: Our findings suggest a generalizable framework for exploring gene regulatory programs and their dynamics across developmental processes. This includes a comprehensive approach to studying the effects of disease-associated variation on transcriptional networks. The techniques applied to our networks have been published alongside our findings as a computational tool, E-P-INAnalyzer. Our procedure can be utilized across different cellular contexts and disorders.
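The Girvan-Newman step mentioned in this abstract can be illustrated with a minimal, standard-library sketch. It is not the E-P-INAnalyzer implementation: the function names, the toy node labels (E* for enhancers, P* for promoters) and the choice to stop after a single split are all assumptions made for illustration. The idea is to repeatedly remove the edge with the highest shortest-path betweenness until the network breaks into one more component.

```python
from collections import deque


def connected_components(adj):
    """Return the connected components of an undirected graph as a list of sets."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(component)
    return components


def edge_betweenness(adj):
    """Shortest-path edge betweenness (Brandes-style) for an unweighted graph.

    Every undirected edge is scored once per direction, which does not
    change the ranking of edges.
    """
    score = {}
    for source in adj:
        dist, n_paths, preds = {source: 0}, {source: 1}, {source: []}
        order, queue = [], deque([source])
        while queue:  # breadth-first search that also counts shortest paths
            node = queue.popleft()
            order.append(node)
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    n_paths[neighbor] = 0
                    preds[neighbor] = []
                    queue.append(neighbor)
                if dist[neighbor] == dist[node] + 1:
                    n_paths[neighbor] += n_paths[node]
                    preds[neighbor].append(node)
        credit = {node: 0.0 for node in order}
        for node in reversed(order):  # push path credit back toward the source
            for pred in preds[node]:
                share = n_paths[pred] / n_paths[node] * (1.0 + credit[node])
                edge = frozenset((pred, node))
                score[edge] = score.get(edge, 0.0) + share
                credit[pred] += share
    return score


def girvan_newman_split(edges):
    """Remove highest-betweenness edges until the graph gains one component."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    n_start = len(connected_components(adj))
    while len(connected_components(adj)) == n_start:
        score = edge_betweenness(adj)
        if not score:  # no edges left to remove
            break
        a, b = max(score, key=score.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return connected_components(adj)


# Hypothetical toy network: two triangles joined by one bridging interaction.
toy_edges = [("E1", "P1"), ("E1", "P2"), ("P1", "P2"), ("P2", "E3"),
             ("E3", "P3"), ("E3", "P4"), ("P3", "P4")]
modules = girvan_newman_split(toy_edges)
```

Applying the same split again within each resulting module would give the usual hierarchical decomposition. How deep to go (for example, by maximizing modularity) is a design choice that this sketch leaves open.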
ABSTRACT
Nucleotide changes in gene regulatory elements are important determinants of neuronal development and diseases. Using massively parallel reporter assays in primary human cells from mid-gestation cortex and cerebral organoids, we interrogated the cis-regulatory activity of 102,767 open chromatin regions, including thousands of sequences with cell type-specific accessibility and variants associated with brain gene regulation. In primary cells, we identified 46,802 active enhancer sequences and 164 variants that alter enhancer activity. Activity was comparable in organoids and primary cells, suggesting that organoids provide an adequate model for the developing cortex. Using deep learning we decoded the sequence basis and upstream regulators of enhancer activity. This work establishes a comprehensive catalog of functional gene regulatory elements and variants in human neuronal development.
Subject(s)
Cerebral Cortex; Neurogenesis; Organoids; Humans; Cerebral Cortex/embryology; Cerebral Cortex/metabolism; Chromatin/metabolism; Chromatin/genetics; Deep Learning; Enhancer Elements, Genetic; Gene Expression Regulation, Developmental; Neurogenesis/genetics; Neurons/metabolism; Organoids/metabolism; Regulatory Sequences, Nucleic Acid; Promoter Regions, Genetic; Regulatory Elements, Transcriptional
ABSTRACT
Regulation of gene expression through enhancers is one of the major processes shaping the structure and function of the human brain during development. High-throughput assays have predicted thousands of enhancers involved in neurodevelopment, and confirming their activity through orthogonal functional assays is crucial. Here, we utilized Massively Parallel Reporter Assays (MPRAs) in stem cells and forebrain organoids to evaluate the activity of ~7,000 gene-linked enhancers previously identified in human fetal tissues and brain organoids. Using a Gaussian mixture model to account for the contribution of background noise to the measured activity signal, we confirmed the activity of ~35% of the tested enhancers, most of which showed temporal-specific activity, suggesting an evolving role in neurodevelopment. The temporal specificity was further supported by the correlation of activity with gene expression. Our findings provide a valuable gene regulatory resource to the scientific community.
Subject(s)
Gene Expression Regulation; Regulatory Sequences, Nucleic Acid; Humans; Organoids; Prosencephalon; Enhancer Elements, Genetic
ABSTRACT
Skin color is highly variable in Africans, yet little is known about the underlying molecular mechanism. Here we applied massively parallel reporter assays to screen 1,157 candidate variants influencing skin pigmentation in Africans and identified 165 single-nucleotide polymorphisms showing differential regulatory activities between alleles. We combine Hi-C, genome editing and melanin assays to identify regulatory elements for MFSD12, HMG20B, OCA2, MITF, LEF1, TRPS1, BLOC1S6 and CYB561A3 that impact melanin levels in vitro and modulate human skin color. We found that independent mutations in an OCA2 enhancer contribute to the evolution of human skin color diversity and detect signals of local adaptation at enhancers of MITF, LEF1 and TRPS1, which may contribute to the light skin color of Khoesan-speaking populations from Southern Africa. Additionally, we identified CYB561A3 as a novel pigmentation regulator that impacts genes involved in oxidative phosphorylation and melanogenesis. These results provide insights into the mechanisms underlying human skin color diversity and adaptive evolution.
Subject(s)
Albinism, Oculocutaneous; Melanins; Skin Pigmentation; Humans; Skin Pigmentation/genetics; Melanins/genetics; Alleles; Genomics; Pigmentation/genetics; Polymorphism, Single Nucleotide/genetics; Repressor Proteins/genetics
ABSTRACT
The advent of perturbation-based massively parallel reporter assay (MPRA) techniques has facilitated the delineation of the roles of non-coding regulatory elements in orchestrating gene expression. However, computational efforts to evaluate and establish guidelines for sequence design strategies for perturbation MPRAs remain scant. In this study, we propose a framework for evaluating and comparing various perturbation strategies for MPRA experiments. Within this framework, we benchmark three different perturbation approaches from the perspectives of alteration in motif-based profiles, consistency of MPRA outputs, and robustness of models that predict the activities of putative regulatory motifs. While our analyses show very similar results across multiple benchmarking metrics, predictive modeling for the approach involving random nucleotide shuffling is significantly more robust than for the other two approaches. Thus, we recommend designing sequences by randomly shuffling the nucleotides of the perturbed site in perturbation-MPRA, followed by a coherence check to prevent the introduction of other variations of the target motifs. In summary, our evaluation framework and the benchmarking findings create a resource of computational pipelines and highlight the potential of perturbation-MPRA in predicting non-coding regulatory activities.
Subject(s)
Genetic Techniques; Regulatory Sequences, Nucleic Acid; Nucleotides
ABSTRACT
The advent of perturbation-based massively parallel reporter assay (MPRA) techniques has made it possible to delineate the roles of non-coding regulatory elements in orchestrating gene expression. However, computational efforts to evaluate and establish guidelines for sequence design strategies for perturbation MPRAs remain scant. Here, we propose a framework for evaluating and comparing various perturbation strategies for MPRA experiments. Under this framework, we benchmark three different perturbation approaches from the perspectives of alteration in motif-based profiles, consistency of MPRA outputs, and robustness of models that predict the activities of putative regulatory motifs. Although our analyses show similar, and significant, results across multiple metrics, the method of randomly shuffling nucleotides outperforms the other two methods. We therefore recommend designing sequences by randomly shuffling the nucleotides of the perturbed site in perturbation-MPRA. The evaluation framework, together with the benchmarking findings in our work, creates a resource of computational pipelines and illustrates the promise of perturbation-MPRA for predicting non-coding regulatory activities.
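The recommended design strategy, shuffling the nucleotides of the perturbed site and then checking that no variant of the target motif has been reintroduced, might be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: the function name, the exact-substring form of the check, the retry limit, and the example sequence and motif variants are all assumptions.

```python
import random


def shuffle_perturbed_site(sequence, start, end, motif_variants, seed=0, max_tries=100):
    """Shuffle sequence[start:end] and reject shuffles that recreate a target motif.

    The check looks for exact occurrences of any string in motif_variants
    in a window covering the shuffled site plus enough flanking sequence
    for a motif to overlap the site boundary.
    """
    rng = random.Random(seed)  # seeded so that a design is reproducible
    site = list(sequence[start:end])
    flank = max(len(motif) for motif in motif_variants) - 1
    for _ in range(max_tries):
        rng.shuffle(site)
        candidate = sequence[:start] + "".join(site) + sequence[end:]
        window = candidate[max(0, start - flank):end + flank]
        if not any(motif in window for motif in motif_variants):
            return candidate
    raise ValueError("no motif-free shuffle found within max_tries")


# Hypothetical example: an 8-bp site flanked by unrelated sequence.
wild_type = "ACGTTGACGTCAGGCTA"
variants = ["TGACGTCA", "TGACGCAA"]
perturbed = shuffle_perturbed_site(wild_type, 4, 12, variants)
```

Shuffling preserves the nucleotide composition of the site, so the perturbed and wild-type sequences differ only in the order of bases within the site. A real design would probably replace the substring test with a position weight matrix scan and also check the reverse strand.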
ABSTRACT
Regulation of gene expression through enhancers is one of the major processes shaping the structure and function of the human brain during development. High-throughput assays have predicted thousands of enhancers involved in neurodevelopment, and confirming their activity through orthogonal functional assays is crucial. Here, we utilized Massively Parallel Reporter Assays (MPRAs) in stem cells and forebrain organoids to evaluate the activity of ~7,000 gene-linked enhancers previously identified in human fetal tissues and brain organoids. Using a Gaussian mixture model to account for the contribution of background noise to the measured activity signal, we confirmed the activity of ~35% of the tested enhancers, most of which showed temporal-specific activity, suggesting an evolving role in neurodevelopment. The temporal specificity was further supported by the correlation of activity with gene expression. Our findings provide a valuable gene regulatory resource to the scientific community.
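One way to read the Gaussian mixture step is as a two-component model in which one component captures background noise and the other captures genuinely active enhancers. The sketch below fits such a model by expectation-maximization on simulated activity scores. It is an assumption-laden illustration rather than the authors' procedure: the abstract does not state the number of components, the scale of the activity signal or the call threshold, and all function names and simulated values here are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(values, n_iter=200):
    """Fit a one-dimensional, two-component Gaussian mixture by EM."""
    ordered = sorted(values)
    half = len(ordered) // 2
    # Initialize the means from the lower and upper halves of the data.
    means = [sum(ordered[:half]) / half, sum(ordered[half:]) / (len(ordered) - half)]
    grand_mean = sum(values) / len(values)
    spread = math.sqrt(sum((x - grand_mean) ** 2 for x in values) / len(values))
    sds, weights = [spread, spread], [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5])
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            n_k = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = n_k / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / n_k
            variance = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / n_k
            sds[k] = max(math.sqrt(variance), 1e-6)
    return weights, means, sds


def active_posterior(x, weights, means, sds):
    """Posterior probability that x belongs to the higher-mean component."""
    high = 0 if means[0] > means[1] else 1
    p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
    total = p[0] + p[1]
    return p[high] / total if total > 0 else 0.5


# Simulated activity scores: a background component and an active component.
rng = random.Random(1)
scores = [rng.gauss(0.0, 0.3) for _ in range(300)] + [rng.gauss(2.0, 0.4) for _ in range(150)]
weights, means, sds = fit_two_component_gmm(scores)
calls = [active_posterior(x, weights, means, sds) > 0.95 for x in scores]
```

Calling an enhancer active when its posterior exceeds 0.95 is an arbitrary threshold chosen for the example. The appropriate cutoff depends on how much background contamination a study is willing to tolerate.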
ABSTRACT
Human accelerated regions (HARs) are conserved genomic loci that evolved at an accelerated rate in the human lineage and may underlie human-specific traits. We generated HARs and chimpanzee accelerated regions with an automated pipeline and an alignment of 241 mammalian genomes. Combining deep learning with chromatin capture experiments in human and chimpanzee neural progenitor cells, we discovered a significant enrichment of HARs in topologically associating domains containing human-specific genomic variants that change three-dimensional (3D) genome organization. Differential gene expression between humans and chimpanzees at these loci suggests rewiring of regulatory interactions between HARs and neurodevelopmental genes. Thus, comparative genomics together with models of 3D genome folding revealed enhancer hijacking as an explanation for the rapid evolution of HARs.
Subject(s)
Genetic Loci; Neurogenesis; Animals; Humans; Chromatin/genetics; Genome, Human; Genomics; Pan troglodytes/genetics; Neurogenesis/genetics; Deep Learning
ABSTRACT
The gene regulatory code and grammar remain largely unknown, precluding our ability to link phenotype to genotype in regulatory sequences. Here, using a massively parallel reporter assay (MPRA) of 209,440 sequences, we examine all possible pair and triplet combinations, permutations and orientations of eighteen liver-associated transcription factor binding sites (TFBS). We find that TFBS orientation and order have a major effect on gene regulatory activity. Corroborating these results with genomic analyses, we find clear human promoter TFBS orientation biases and similar TFBS orientation and order transcriptional effects in an MPRA that tested 164,307 liver candidate regulatory elements. Additionally, by adding TFBS orientation to a model that predicts expression from sequence we improve performance by 7.7%. Collectively, our results show that TFBS orientation and order have a significant effect on gene regulatory activity and need to be considered when analyzing the functional effect of variants on the activity of these sequences.
Subject(s)
Gene Expression Regulation; Transcription Factors; Humans; Transcription Factors/genetics; Transcription Factors/metabolism; Binding Sites/genetics; Promoter Regions, Genetic/genetics; Protein Binding
ABSTRACT
Human rhinovirus (HRV) infections are generally referred to as the common cold, and are the main cause of mild symptoms. HRV is less frequently implicated in the development of severe respiratory infections. This study reports a nosocomial outbreak of bronchitis and pneumonia caused by HRV in a hospital during the COVID-19 epidemic in September 2022 in Gunma Prefecture, Japan. The patient continued to be symptomatic for nine days. During this outbreak, all 15 residents displayed respiratory symptoms. HRV-A was detected in 12 of the 12 samples, and phylogenetic analysis classified the strain as HRV-A type 61. HRV, COVID-19, and other respiratory infections cannot be differentiated based solely on clinical symptoms. A surveillance system to monitor them is thus needed.
Subject(s)
COVID-19; Cross Infection; Picornaviridae Infections; Respiratory Tract Infections; Humans; COVID-19/epidemiology; Cross Infection/epidemiology; Disease Outbreaks; Hospitals; Japan/epidemiology; Phylogeny; Picornaviridae Infections/epidemiology; Rhinovirus/genetics
ABSTRACT
The human genome contains millions of candidate cis-regulatory elements (CREs) with cell-type-specific activities that shape both health and myriad disease states. However, we lack a functional understanding of the sequence features that control the activity and cell-type-specific features of these CREs. Here, we used lentivirus-based massively parallel reporter assays (lentiMPRAs) to test the regulatory activity of over 680,000 sequences, representing a nearly comprehensive set of all annotated CREs among three cell types (HepG2, K562, and WTC11), finding 41.7% to be functional. By testing sequences in both orientations, we find promoters to have significant strand orientation effects. We also observe that their 200 nucleotide cores function as non-cell-type-specific 'on switches' providing similar expression levels to their associated gene. In contrast, enhancers have weaker orientation effects, but increased tissue-specific characteristics. Utilizing our lentiMPRA data, we develop sequence-based models to predict CRE function with high accuracy and delineate regulatory motifs. Testing an additional lentiMPRA library encompassing 60,000 CREs in all three cell types, we further identified factors that determine cell-type specificity. Collectively, our work provides an exhaustive catalog of functional CREs in three widely used cell lines, and showcases how large-scale functional measurements can be used to dissect regulatory grammar.
ABSTRACT
Autism spectrum disorder (ASD) is a common, complex, and highly heritable condition with contributions from both common and rare genetic variations. While disruptive, rare variants in protein-coding regions clearly contribute to symptoms, the role of rare non-coding variants remains unclear. Variants in these regions, including promoters, can alter downstream RNA and protein quantity; however, the functional impacts of specific variants observed in ASD cohorts remain largely uncharacterized. Here, we analyzed 3,600 de novo mutations in promoter regions previously identified by whole-genome sequencing of autistic probands and neurotypical siblings to test the hypothesis that mutations in cases have a greater functional impact than those in controls. We leveraged massively parallel reporter assays (MPRAs) to detect transcriptional consequences of these variants in neural progenitor cells and identified 165 functionally high confidence de novo variants (HcDNVs). While these HcDNVs are enriched for markers of active transcription, disruption to transcription factor binding sites, and open chromatin, we did not identify differences in functional impact based on ASD diagnostic status.
Subject(s)
Autism Spectrum Disorder; Autistic Disorder; Humans; Autism Spectrum Disorder/genetics; Genetic Predisposition to Disease; Mutation; Autistic Disorder/genetics; Promoter Regions, Genetic
ABSTRACT
Nucleotide changes in gene regulatory elements are important determinants of neuronal development and disease. Using massively parallel reporter assays in primary human cells from mid-gestation cortex and cerebral organoids, we interrogated the cis-regulatory activity of 102,767 sequences, including differentially accessible cell-type specific regions in the developing cortex and single-nucleotide variants associated with psychiatric disorders. In primary cells, we identified 46,802 active enhancer sequences and 164 disorder-associated variants that significantly alter enhancer activity. Activity was comparable in organoids and primary cells, suggesting that organoids provide an adequate model for the developing cortex. Using deep learning, we decoded the sequence basis and upstream regulators of enhancer activity. This work establishes a comprehensive catalog of functional gene regulatory elements and variants in human neuronal development.
ABSTRACT
Using machine learning (ML), we interrogated the function of all human-chimpanzee variants in 2,645 human accelerated regions (HARs), finding 43% of HARs have variants with large opposing effects on chromatin state and 14% on neurodevelopmental enhancer activity. This pattern, consistent with compensatory evolution, was confirmed using massively parallel reporter assays in chimpanzee and human neural progenitor cells. The species-specific enhancer activity of HARs was accurately predicted from the presence and absence of transcription factor footprints in each species. Despite these striking cis effects, activity of a given HAR sequence was nearly identical in human and chimpanzee cells. This suggests that HARs did not evolve to compensate for changes in the trans environment but instead altered their ability to bind factors present in both species. Thus, ML prioritized variants with functional effects on human neurodevelopment and revealed an unexpected reason why HARs may have evolved so rapidly.
Subject(s)
Brain; Enhancer Elements, Genetic; Pan troglodytes; Animals; Humans; Chromatin; Machine Learning; Pan troglodytes/metabolism; Transcription Factors/genetics; Brain/growth & development
ABSTRACT
Alternative DNA conformations, termed non-B DNA structures, can affect transcription, but the underlying mechanisms and their functional impact have not been systematically characterized. Here, we used computational genomic analyses coupled with massively parallel reporter assays (MPRAs) to show that certain non-B DNA structures have a substantial effect on gene expression. Genomic analyses found that non-B DNA structures at promoters harbor an excess of germline variants. Analysis of multiple MPRAs, including a promoter library specifically designed to perturb non-B DNA structures, functionally validated that Z-DNA can significantly affect promoter activity. We also observed that biophysical properties of non-B DNA motifs, such as the length of Z-DNA motifs and the orientation of G-quadruplex structures relative to transcriptional direction, have a significant effect on promoter activity. Combined, their higher mutation rate and functional effect on transcription implicate a subset of non-B DNA motifs as major drivers of human gene-expression-associated phenotypes.
ABSTRACT
Gene regulatory elements play a key role in orchestrating gene expression during cellular differentiation, but what determines their function over time remains largely unknown. Here, we perform perturbation-based massively parallel reporter assays at seven early time points of neural differentiation to systematically characterize how regulatory elements and motifs within them guide cellular differentiation. By perturbing over 2,000 putative DNA binding motifs in active regulatory regions, we delineate four categories of functional elements, and observe that activity direction is mostly determined by the sequence itself, while the magnitude of effect depends on the cellular environment. We also find that fine-tuning transcription rates is often achieved by a combined activity of adjacent activating and repressing elements. Our work provides a blueprint for the sequence components needed to induce different transcriptional patterns in general and specifically during neural differentiation.