ABSTRACT
An outbreak of over 1,000 COVID-19 cases in Provincetown, Massachusetts (MA), in July 2021, the first large outbreak mostly in vaccinated individuals in the US, prompted a comprehensive public health response, motivating changes to national masking recommendations and raising questions about infection and transmission among vaccinated individuals. To address these questions, we combined viral genomic and epidemiological data from 467 individuals, including 40% of outbreak-associated cases. The Delta variant accounted for 99% of cases in this dataset; it was introduced from at least 40 sources, but 83% of cases derived from a single source, likely through transmission across multiple settings over a short time rather than a single event. Genomic and epidemiological data supported multiple transmissions of Delta from and between fully vaccinated individuals. However, despite its magnitude, the outbreak had limited onward impact in MA and the US overall, likely due to high vaccination rates and a robust public health response.
Subjects
COVID-19/epidemiology , COVID-19/immunology , COVID-19/transmission , SARS-CoV-2/genetics , SARS-CoV-2/immunology , Adolescent , Adult , Aged , Aged, 80 and over , COVID-19/virology , Child , Child, Preschool , Contact Tracing/methods , Disease Outbreaks , Female , Genome, Viral , Humans , Infant , Infant, Newborn , Male , Massachusetts/epidemiology , Middle Aged , Molecular Epidemiology , Phylogeny , SARS-CoV-2/classification , Vaccination , Whole Genome Sequencing , Young Adult
ABSTRACT
3' untranslated region (3'UTR) variants are strongly associated with human traits and diseases, yet few have been causally identified. We developed the massively parallel reporter assay for 3'UTRs (MPRAu) to sensitively assay 12,173 3'UTR variants. We applied MPRAu to six human cell lines, focusing on genetic variants associated with genome-wide association studies (GWAS) and human evolutionary adaptation. MPRAu expands our understanding of 3'UTR function, suggesting that simple sequences predominately explain 3'UTR regulatory activity. We adapt MPRAu to uncover diverse molecular mechanisms at base pair resolution, including an adenylate-uridylate (AU)-rich element of LEPR linked to potential metabolic evolutionary adaptations in East Asians. We nominate hundreds of 3'UTR causal variants with genetically fine-mapped phenotype associations. Using endogenous allelic replacements, we characterize one variant that disrupts a miRNA site regulating the viral defense gene TRIM14 and one that alters PILRB abundance, nominating a causal variant underlying transcriptional changes in age-related macular degeneration.
Subjects
3' Untranslated Regions/genetics , Biological Evolution , Disease/genetics , Genome-Wide Association Study , Algorithms , Alleles , Gene Expression Regulation , Genes, Reporter , Genetic Variation , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , Polyribosomes/metabolism , Quantitative Trait Loci/genetics , RNA/genetics
ABSTRACT
Although studies have identified hundreds of loci associated with human traits and diseases, pinpointing causal alleles remains difficult, particularly for non-coding variants. To address this challenge, we adapted the massively parallel reporter assay (MPRA) to identify variants that directly modulate gene expression. We applied it to 32,373 variants from 3,642 cis-expression quantitative trait loci and control regions. Detection by MPRA was strongly correlated with measures of regulatory function. We demonstrate MPRA's capabilities for pinpointing causal alleles, using it to identify 842 variants showing differential expression between alleles, including 53 well-annotated variants associated with diseases and traits. We investigated one in detail, a risk allele for ankylosing spondylitis, and provide direct evidence of a non-coding variant that alters expression of the prostaglandin EP4 receptor. These results create a resource of concrete leads and illustrate the promise of this approach for comprehensively interrogating how non-coding polymorphism shapes human biology.
Subjects
Gene Expression Regulation , Genes, Reporter , Genetic Diseases, Inborn/genetics , Genetic Techniques , Genetic Variation , Alleles , Gene Library , Hep G2 Cells , Humans , Quantitative Trait Loci , Sensitivity and Specificity , Spondylitis, Ankylosing/genetics
ABSTRACT
Cis-regulatory elements (CREs) control gene expression, orchestrating tissue identity, developmental timing and stimulus responses, which collectively define the thousands of unique cell types in the body [1-3]. While there is great potential for strategically incorporating CREs in therapeutic or biotechnology applications that require tissue specificity, there is no guarantee that an optimal CRE for these intended purposes has arisen naturally. Here we present a platform to engineer and validate synthetic CREs capable of driving gene expression with programmed cell-type specificity. We take advantage of innovations in deep neural network modelling of CRE activity across three cell types, efficient in silico optimization and massively parallel reporter assays to design and empirically test thousands of CREs [4-8]. Through large-scale in vitro validation, we show that synthetic sequences are more effective at driving cell-type-specific expression in three cell lines compared with natural sequences from the human genome and achieve specificity in analogous tissues when tested in vivo. Synthetic sequences exhibit distinct motif vocabulary associated with activity in the on-target cell type and a simultaneous reduction in the activity of off-target cells. Together, we provide a generalizable framework to prospectively engineer CREs from massively parallel reporter assay models and demonstrate the required literacy to write fit-for-purpose regulatory code.
ABSTRACT
The evolution of human anatomical features likely involved changes in gene regulation during development. However, the nature and extent of human-specific developmental regulatory functions remain unknown. We obtained a genome-wide view of cis-regulatory evolution in human embryonic tissues by comparing the histone modification H3K27ac, which provides a quantitative readout of promoter and enhancer activity, during human, rhesus, and mouse limb development. Based on increased H3K27ac, we find that 13% of promoters and 11% of enhancers have gained activity on the human lineage since the human-rhesus divergence. These gains largely arose by modification of ancestral regulatory activities in the limb or potential co-option from other tissues and are likely to have heterogeneous genetic causes. Most enhancers that exhibit gain of activity in humans originated in mammals. Gains at promoters and enhancers in the human limb are associated with increased gene expression, suggesting they include molecular drivers of human morphological evolution.
Subjects
Biological Evolution , Enhancer Elements, Genetic , Extremities/embryology , Gene Expression Regulation, Developmental , Promoter Regions, Genetic , Acetylation , Animals , Genetics, Medical , Genome-Wide Association Study , Histones/metabolism , Humans , Macaca mulatta/embryology , Mice/embryology , Organogenesis , Transcriptome
ABSTRACT
Autism spectrum disorder (ASD) is a complex developmental syndrome of unknown etiology. Recent studies employing exome- and genome-wide sequencing have identified nine high-confidence ASD (hcASD) genes. Working from the hypothesis that ASD-associated mutations in these biologically pleiotropic genes will disrupt intersecting developmental processes to contribute to a common phenotype, we have attempted to identify time periods, brain regions, and cell types in which these genes converge. We have constructed coexpression networks based on the hcASD "seed" genes, leveraging a rich expression data set encompassing multiple human brain regions across human development and into adulthood. By assessing enrichment of an independent set of probable ASD (pASD) genes, derived from the same sequencing studies, we demonstrate a key point of convergence in midfetal layer 5/6 cortical projection neurons. This approach informs when, where, and in what cell types mutations in these specific genes may be productively studied to clarify ASD pathophysiology.
Subjects
Brain/metabolism , Child Development Disorders, Pervasive/genetics , Child Development Disorders, Pervasive/physiopathology , Animals , Brain/embryology , Brain/growth & development , Brain/pathology , Child Development Disorders, Pervasive/pathology , Exome , Female , Fetus/metabolism , Fetus/pathology , Gene Expression Profiling , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Male , Mice , Mutation , Neurons/metabolism , Prefrontal Cortex/metabolism , Sequence Analysis, DNA
ABSTRACT
The ENCODE Consortium's efforts to annotate noncoding cis-regulatory elements (CREs) have advanced our understanding of gene regulatory landscapes. Pooled, noncoding CRISPR screens offer a systematic approach to investigate cis-regulatory mechanisms. The ENCODE4 Functional Characterization Centers conducted 108 screens in human cell lines, comprising >540,000 perturbations across 24.85 megabases of the genome. Using 332 functionally confirmed CRE-gene links in K562 cells, we established guidelines for screening endogenous noncoding elements with CRISPR interference (CRISPRi), including accurate detection of CREs that exhibit variable, often low, transcriptional effects. Benchmarking five screen analysis tools, we find that CASA produces the most conservative CRE calls and is robust to artifacts of low-specificity single guide RNAs. We uncover a subtle DNA strand bias for CRISPRi in transcribed regions with implications for screen design and analysis. Together, we provide an accessible data resource, predesigned single guide RNAs for targeting 3,275,697 ENCODE SCREEN candidate CREs with CRISPRi and screening guidelines to accelerate functional characterization of the noncoding genome.
Subjects
CRISPR-Cas Systems , Clustered Regularly Interspaced Short Palindromic Repeats , Humans , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , CRISPR-Cas Systems/genetics , Genome , K562 Cells , RNA, Guide, CRISPR-Cas Systems
ABSTRACT
Genetic changes that altered the function of gene regulatory elements have been implicated in the evolution of human traits such as the expansion of the cerebral cortex. However, identifying the particular changes that modified regulatory activity during human evolution remains challenging. Here we used massively parallel enhancer assays in neural stem cells to quantify the functional impact of >32,000 human-specific substitutions in >4,300 human accelerated regions (HARs) and human gain enhancers (HGEs), which include enhancers with novel activities in humans. We found that >30% of active HARs and HGEs exhibited differential activity between human and chimpanzee. We isolated the effects of human-specific substitutions from background genetic variation to identify the effects of genetic changes most relevant to human evolution. We found that substitutions interacted in both additive and nonadditive ways to modify enhancer function. Substitutions within HARs, which are highly constrained compared to HGEs, showed smaller effects on enhancer activity, suggesting that the impact of human-specific substitutions is buffered in enhancers with constrained ancestral functions. Our findings yield insight into how human-specific genetic changes altered enhancer function and provide a rich set of candidates for studies of regulatory evolution in humans.
Subjects
Biological Evolution , Enhancer Elements, Genetic , Genome, Human , Neural Stem Cells/metabolism , Transcription Factors/metabolism , Animals , Humans , Neocortex , Pan troglodytes/genetics
ABSTRACT
Although some variation introgressed from Neanderthals has undergone selective sweeps, little is known about its functional significance. We used a Massively Parallel Reporter Assay (MPRA) to assay 5,353 high-frequency introgressed variants for their ability to modulate gene expression within 170 bp of endogenous sequence. We identified 2,548 variants in active putative cis-regulatory elements (CREs) and 292 expression-modulating variants (emVars). These emVars are predicted to alter the binding motifs of important immune transcription factors, are enriched for associations with neutrophil and white blood cell count, and are associated with the expression of genes that function in innate immune pathways including inflammatory response and antiviral defense. We combined the MPRA data with other data sets to identify strong candidates to be driver variants of positive selection, including an emVar that may contribute to protection against severe COVID-19 response. We endogenously deleted two CREs containing expression-modulating variants linked to immune function, rs11624425 and rs80317430, identifying their primary genic targets as ELMSAN1, and PAN2 and STAT2, respectively, three genes differentially expressed during influenza infection. Overall, we present the first database of experimentally identified expression-modulating Neanderthal-introgressed alleles contributing to potential immune response in modern humans.
Subjects
Genetic Variation , Genome, Human , Immunity, Innate/genetics , Neanderthals , Animals , Gene Expression , Humans , Inflammation , Neanderthals/genetics
ABSTRACT
As a species, we possess unique biological features that distinguish us from other primates. Here, we review recent efforts to identify changes in gene regulation that drove the evolution of novel human phenotypes. We discuss genotype-directed comparisons of human and nonhuman primate genomes to identify human-specific genetic changes that may encode new regulatory functions. We also review phenotype-directed approaches, which use comparisons of gene expression or regulatory function in homologous human and nonhuman primate cells and tissues to identify changes in expression levels or regulatory activity that may be due to genetic changes in humans. Together, these studies are beginning to reveal the landscape of regulatory innovation in human evolution and point to specific regulatory changes for further study. Finally, we highlight two novel strategies to model human-specific regulatory functions in vivo: primate induced pluripotent stem cells and the generation of humanized mice by genome editing.
Subjects
Evolution, Molecular , Gene Expression Regulation/genetics , Induced Pluripotent Stem Cells , Animals , Genome , Humans , Mice , Mice, Transgenic/genetics , Primates/genetics
ABSTRACT
Morphological innovations such as the mammalian neocortex may involve the evolution of novel regulatory sequences. However, de novo birth of regulatory elements active during morphogenesis has not been extensively studied in mammals. Here, we use H3K27ac-defined regulatory elements active during human and mouse corticogenesis to identify enhancers that were likely active in the ancient mammalian forebrain. We infer the phylogenetic origins of these enhancers and find that ~20% arose in the mammalian stem lineage, coincident with the emergence of the neocortex. Implementing a permutation strategy that controls for the nonrandom variation in the ages of background genomic sequences, we find that mammal-specific enhancers are overrepresented near genes involved in cell migration, cell signaling, and axon guidance. Mammal-specific enhancers are also overrepresented in modules of coexpressed genes in the cortex that are associated with these pathways, notably ephrin and semaphorin signaling. Our results also provide insight into the mechanisms of regulatory innovation in mammals. We find that most neocortical enhancers did not originate by en bloc exaptation of transposons. Young neocortical enhancers exhibit smaller H3K27ac footprints and weaker evolutionary constraint in eutherian mammals than older neocortical enhancers. Based on these observations, we present a model of the enhancer life cycle in which neocortical enhancers initially emerge from genomic background as short, weakly constrained "proto-enhancers." Many proto-enhancers are likely lost, but some may serve as nucleation points for complex enhancers to evolve.
Subjects
Biological Evolution , Enhancer Elements, Genetic/genetics , Gene Expression Regulation, Developmental/genetics , Morphogenesis/genetics , Neocortex/growth & development , Transcription Factors/genetics , Animals , Base Sequence , Computer Simulation , Humans , Mice , Models, Genetic , Neocortex/embryology , Neocortex/metabolism , Species Specificity
ABSTRACT
Cohesin is implicated in establishing tissue-specific DNA loops that target enhancers to promoters, and also localizes to sites bound by the insulator protein CTCF, which blocks enhancer-promoter communication. However, cohesin-associated interactions have not been characterized on a genome-wide scale. Here we performed chromatin interaction analysis with paired-end tag sequencing (ChIA-PET) of the cohesin subunit SMC1A in developing mouse limb. We identified 2264 SMC1A interactions, of which 1491 (65%) involved sites co-occupied by CTCF. SMC1A participates in tissue-specific enhancer-promoter interactions and interactions that demarcate regions of correlated regulatory output. In contrast to previous studies, we also identified interactions between promoters and distal sites that are maintained in multiple tissues but are poised in embryonic stem cells and resolve to tissue-specific activated or repressed chromatin states in the mouse embryo. Our results reveal the diversity of cohesin-associated interactions in the genome and highlight their role in establishing the regulatory architecture of development.
Subjects
Cell Cycle Proteins/metabolism , Chromatin/metabolism , Chromosomal Proteins, Non-Histone/metabolism , Animals , Binding Sites , CCCTC-Binding Factor , Chromatin Immunoprecipitation , Enhancer Elements, Genetic , Gene Expression Regulation, Developmental , Genome , Histones/metabolism , Limb Buds/metabolism , Mice , Mice, Inbred C57BL , Organ Specificity , Promoter Regions, Genetic , Protein Subunits/metabolism , Repressor Proteins/metabolism , Cohesins
ABSTRACT
The regulatory elements that direct tissue-specific gene expression in the developing mammalian embryo remain largely unknown. Although chromatin profiling has proven to be a powerful method for mapping regulatory sequences in cultured cells, chromatin states characteristic of active developmental enhancers have not been directly identified in embryonic tissues. Here we use whole-transcriptome analysis coupled with genome-wide profiling of H3K27ac and H3K27me3 to map chromatin states and enhancers in mouse embryonic forelimb and hindlimb. We show that gene-expression differences between forelimb and hindlimb, and between limb and other embryonic cell types, are correlated with tissue-specific H3K27ac signatures at promoters and distal sites. Using H3K27ac profiles, we identified 28,377 putative enhancers, many of which are likely to be limb specific based on strong enrichment near genes highly expressed in the limb and comparisons with tissue-specific EP300 sites and known enhancers. We describe a chromatin state signature associated with active developmental enhancers, defined by high levels of H3K27ac marking, nucleosome displacement, hypersensitivity to sonication, and strong depletion of H3K27me3. We also find that some developmental enhancers exhibit components of this signature, including hypersensitivity, H3K27ac enrichment, and H3K27me3 depletion, at lower levels in tissues in which they are not active. Our results establish histone modification profiling as a tool for developmental enhancer discovery, and suggest that enhancers maintain an open chromatin state in multiple embryonic tissues independent of their activity level.
Subjects
Chromatin/genetics , Enhancer Elements, Genetic , Extremities/embryology , Gene Expression Regulation, Developmental , Animals , E1A-Associated p300 Protein/genetics , E1A-Associated p300 Protein/metabolism , Embryo, Mammalian , Extremities/physiology , Gene Expression Profiling , Histones/genetics , Histones/metabolism , Mice , Nucleosomes/metabolism , Organ Specificity/genetics
ABSTRACT
The genetic differences underlying unique phenotypes in humans compared to our closest primate relatives have long remained a mystery. Similarly, the genetic basis of adaptations between human groups during our expansion across the globe is poorly characterized. Uncovering the downstream phenotypic consequences of these genetic variants has been difficult, as a substantial portion lies in noncoding regions, such as cis-regulatory elements (CREs). Here, we review recent high-throughput approaches to measure the functions of CREs and the impact of variation within them. CRISPR screens can directly perturb CREs in the genome to understand downstream impacts on gene expression and phenotypes, while massively parallel reporter assays can decipher the regulatory impact of sequence variants. Machine learning has begun to be able to predict regulatory function from sequence alone, further scaling our ability to characterize genome function. Applying these tools across diverse phenotypes, model systems, and ancestries is beginning to revolutionize our understanding of noncoding variation underlying human evolution.
Subjects
Evolution, Molecular , Genome, Human , Humans , Genetic Variation , Animals , Regulatory Sequences, Nucleic Acid/genetics , Phenotype , Machine Learning
ABSTRACT
Identifying the causal variants and mechanisms that drive complex traits and diseases remains a core problem in human genetics. The majority of these variants have individually weak effects and lie in non-coding gene-regulatory elements where we lack a complete understanding of how single nucleotide alterations modulate transcriptional processes to affect human phenotypes. To address this, we used a Massively Parallel Reporter Assay (MPRA) to measure the activity of 221,412 statistically fine-mapped, trait-associated variants in five diverse cell types. We show that MPRA is able to discriminate between likely causal variants and controls, identifying 12,025 regulatory variants with high precision. Although the effects of these variants largely agree with orthogonal measures of function, only 69% can plausibly be explained by the disruption of a known transcription factor (TF) binding motif. We dissect the mechanisms of 136 variants using saturation mutagenesis and assign impacted TFs for 91% of variants without a clear canonical mechanism. Finally, we provide evidence that epistasis is prevalent for variants in close proximity and identify multiple functional variants on the same haplotype at a small, but important, subset of trait-associated loci. Overall, our study provides a systematic functional characterization of likely causal common variants underlying complex and molecular human traits, enabling new insights into the regulatory grammar underlying disease risk.
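As a rough illustration of how an MPRA readout can be turned into a per-variant effect, the sketch below uses a common convention that this abstract does not itself specify: activity is taken as the log2 ratio of RNA to plasmid DNA counts, and the allelic effect as the difference in activity between the alternate and reference alleles. The function names, the pseudocount, and the read counts are hypothetical; only the totals 12,025 and 221,412 come from the abstract.

```python
import math


def mpra_activity(rna_count: float, dna_count: float, pseudocount: float = 1.0) -> float:
    """Return log2(RNA/DNA) for one oligo; the pseudocount avoids division by zero."""
    return math.log2((rna_count + pseudocount) / (dna_count + pseudocount))


def allelic_skew(ref_rna: float, ref_dna: float,
                 alt_rna: float, alt_dna: float,
                 pseudocount: float = 1.0) -> float:
    """Return the alternate-minus-reference difference in log2 activity."""
    alt_activity = mpra_activity(alt_rna, alt_dna, pseudocount)
    ref_activity = mpra_activity(ref_rna, ref_dna, pseudocount)
    return alt_activity - ref_activity


# Hypothetical read counts for a single variant (not data from the study).
skew = allelic_skew(ref_rna=399, ref_dna=199, alt_rna=799, alt_dna=199)

# Share of tested variants reported as regulatory (counts taken from the abstract).
regulatory_fraction = 12_025 / 221_412

print(f"allelic skew (log2): {skew:.2f}")
print(f"regulatory fraction: {regulatory_fraction:.3f}")
```

A real analysis would additionally model replicate variability and correct for multiple testing before calling a variant regulatory; the sketch leaves both out.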
ABSTRACT
Conserved genomic sequences disrupted in humans may underlie uniquely human phenotypic traits. We identified and characterized 10,032 human-specific conserved deletions (hCONDELs). These short (average 2.56 base pairs) deletions are enriched for human brain functions across genetic, epigenomic, and transcriptomic datasets. Using massively parallel reporter assays in six cell types, we discovered 800 hCONDELs conferring significant differences in regulatory activity, half of which enhance rather than disrupt regulatory function. We highlight several hCONDELs with putative human-specific effects on brain development, including HDAC5, CPEB4, and PPP2CA. Reverting an hCONDEL to the ancestral sequence alters the expression of LOXL2 and developmental genes involved in myelination and synaptic function. Our data provide a rich resource to investigate the evolutionary mechanisms driving new traits in humans and other species.
Subjects
Brain , Evolution, Molecular , Gene Expression Regulation, Developmental , Sequence Deletion , Humans , Conserved Sequence/genetics , Genome , Genomics , RNA-Binding Proteins/genetics , Brain/growth & development
ABSTRACT
Although thousands of genomic regions have been associated with heritable human diseases, attempts to elucidate biological mechanisms are impeded by a general inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function that is agnostic to cell type or disease mechanism. Here, single base phyloP scores from the whole genome alignment of 240 placental mammals identified 3.5% of the human genome as significantly constrained, and likely functional. We compared these scores to large-scale genome annotation, genome-wide association studies (GWAS), copy number variation, clinical genetics findings, and cancer data sets. Evolutionarily constrained positions are enriched for variants explaining common disease heritability (more than any other functional annotation). Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.
ABSTRACT
Thousands of genomic regions have been associated with heritable human diseases, but attempts to elucidate biological mechanisms are impeded by an inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function, agnostic to cell type or disease mechanism. Single-base phyloP scores from 240 mammals identified 3.3% of the human genome as significantly constrained and likely functional. We compared phyloP scores to genome annotation, association studies, copy-number variation, clinical genetics findings, and cancer data. Constrained positions are enriched for variants that explain common disease heritability more than other functional annotations. Our results improve variant annotation but also highlight that the regulatory landscape of the human genome still needs to be further explored and linked to disease.
Subjects
Disease , Genetic Variation , Animals , Humans , Biological Evolution , Genome, Human , Genome-Wide Association Study , Genomics , Molecular Sequence Annotation , Polymorphism, Single Nucleotide , Disease/genetics
ABSTRACT
Zoonomia is the largest comparative genomics resource for mammals produced to date. By aligning genomes for 240 species, we identify bases that, when mutated, are likely to affect fitness and alter disease risk. At least 332 million bases (~10.7%) in the human genome are unusually conserved across species (evolutionarily constrained) relative to neutrally evolving repeats, and 4552 ultraconserved elements are nearly perfectly conserved. Of 101 million significantly constrained single bases, 80% are outside protein-coding exons and half have no functional annotations in the Encyclopedia of DNA Elements (ENCODE) resource. Changes in genes and regulatory elements are associated with exceptional mammalian traits, such as hibernation, that could inform therapeutic development. Earth's vast and imperiled biodiversity offers distinctive power for identifying genetic variants that affect genome function and organismal phenotypes.
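The figures quoted in this abstract can be cross-checked with simple arithmetic. The short sketch below is only a sanity check on the stated numbers: it derives the genome size implied by "332 million bases (~10.7%)" and the number of constrained single bases outside protein-coding exons implied by "80% of 101 million". The variable names are illustrative, and the implied genome size is an inference from the rounded percentages, not a value reported in the abstract.

```python
# Figures quoted in the abstract.
constrained_bases = 332e6          # bases unusually conserved across species
constrained_fraction = 0.107       # stated share of the human genome (~10.7%)
significant_single_bases = 101e6   # significantly constrained single bases
noncoding_share = 0.80             # share outside protein-coding exons

# Genome size implied by the two rounded figures above (an inference).
implied_genome_size = constrained_bases / constrained_fraction

# Constrained single bases that fall outside protein-coding exons.
noncoding_constrained = significant_single_bases * noncoding_share

print(f"implied genome size: {implied_genome_size / 1e9:.2f} Gb")
print(f"constrained bases outside exons: {noncoding_constrained / 1e6:.1f} million")
```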