ABSTRACT
Intermolecular RNA-RNA interactions are used by many noncoding RNAs (ncRNAs) to achieve their diverse functions. To identify these contacts, we developed a method based on RNA antisense purification to systematically map RNA-RNA interactions (RAP-RNA) and applied it to investigate two ncRNAs implicated in RNA processing: U1 small nuclear RNA, a component of the spliceosome, and Malat1, a large ncRNA that localizes to nuclear speckles. U1 and Malat1 interact with nascent transcripts through distinct targeting mechanisms. Using differential crosslinking, we confirmed that U1 directly hybridizes to 5' splice sites and 5' splice site motifs throughout introns and found that Malat1 interacts with pre-mRNAs indirectly through protein intermediates. Interactions with nascent pre-mRNAs cause U1 and Malat1 to localize proximally to chromatin at active genes, demonstrating that ncRNAs can use RNA-RNA interactions to target specific pre-mRNAs and genomic sites. RAP-RNA is sensitive to lower abundance RNAs as well, making it generally applicable for investigating ncRNAs.
Subjects
Genetic Techniques , Messenger RNA/metabolism , Animals , Base Sequence , Cross-Linking Reagents/metabolism , Mice , Molecular Sequence Data , Nucleotide Motifs , RNA Splice Sites , Long Noncoding RNA/chemistry , Long Noncoding RNA/metabolism , Messenger RNA/chemistry , Small Nuclear RNA/metabolism , Untranslated RNA/chemistry , Untranslated RNA/metabolism
ABSTRACT
Although several hundred regions of the human genome harbor signals of positive natural selection, few of the relevant adaptive traits and variants have been elucidated. Using full-genome sequence variation from the 1000 Genomes (1000G) Project and the composite of multiple signals (CMS) test, we investigated 412 candidate signals and leveraged functional annotation, protein structure modeling, epigenetics, and association studies to identify and extensively annotate candidate causal variants. The resulting catalog provides a tractable list for experimental follow-up; it includes 35 high-scoring nonsynonymous variants, 59 variants associated with expression levels of a nearby coding gene or lincRNA, and numerous variants associated with susceptibility to infectious disease and other phenotypes. We experimentally characterized one candidate nonsynonymous variant in Toll-like receptor 5 (TLR5) and show that it leads to altered NF-κB signaling in response to bacterial flagellin.
Subjects
Genetic Techniques , Human Genome , Genome-Wide Association Study , Mutation , Animals , Bacteria/metabolism , Flagellin/metabolism , HapMap Project , Humans , NF-kappa B/metabolism , Quantitative Trait Loci , Transcriptional Regulatory Elements , Signal Transduction , Toll-Like Receptor 5/genetics , Toll-Like Receptor 5/metabolism
ABSTRACT
Gene regulation in the human genome is controlled by distal enhancers that activate specific nearby promoters [1]. A proposed model for this specificity is that promoters have sequence-encoded preferences for certain enhancers, for example, mediated by interacting sets of transcription factors or cofactors [2]. This 'biochemical compatibility' model has been supported by observations at individual human promoters and by genome-wide measurements in Drosophila [3-9]. However, the degree to which human enhancers and promoters are intrinsically compatible has not yet been systematically measured, and how their activities combine to control RNA expression remains unclear. Here we designed a high-throughput reporter assay called enhancer × promoter self-transcribing active regulatory region sequencing (ExP STARR-seq) and applied it to examine the combinatorial compatibilities of 1,000 enhancer and 1,000 promoter sequences in human K562 cells. We identify simple rules for enhancer-promoter compatibility, whereby most enhancers activate all promoters by similar amounts, and intrinsic enhancer and promoter activities multiplicatively combine to determine RNA output (R² = 0.82). In addition, two classes of enhancers and promoters show subtle preferential effects. Promoters of housekeeping genes contain built-in activating motifs for factors such as GABPA and YY1, which decrease the responsiveness of promoters to distal enhancers. Promoters of variably expressed genes lack these motifs and show stronger responsiveness to enhancers. Together, this systematic assessment of enhancer-promoter compatibility suggests a multiplicative model tuned by enhancer and promoter class to control gene transcription in the human genome.
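The multiplicative rule described above can be illustrated with a small sketch. This is not the authors' analysis pipeline: it assumes a complete enhancer × promoter matrix of positive RNA readouts and fits the model "output = enhancer activity × promoter activity" as an additive model in log space. The function name `fit_multiplicative` and the toy values are hypothetical.

```python
import math

def fit_multiplicative(rna):
    """Fit rna[e][p] ~ enhancer[e] * promoter[p] via an additive fit in log space.

    `rna` is a complete matrix (list of rows, one row per enhancer) of
    positive values. Returns log-scale enhancer effects, log-scale promoter
    effects, and the R^2 of the fit computed on log-transformed values.
    """
    logs = [[math.log(v) for v in row] for row in rna]
    n_e, n_p = len(logs), len(logs[0])
    grand = sum(sum(row) for row in logs) / (n_e * n_p)
    # For a complete matrix, row and column means (minus the grand mean)
    # are the least-squares estimates of the additive effects.
    enh = [sum(row) / n_p - grand for row in logs]
    pro = [sum(logs[e][p] for e in range(n_e)) / n_e - grand for p in range(n_p)]
    ss_res = sum(
        (logs[e][p] - (grand + enh[e] + pro[p])) ** 2
        for e in range(n_e) for p in range(n_p)
    )
    ss_tot = sum((logs[e][p] - grand) ** 2 for e in range(n_e) for p in range(n_p))
    return enh, pro, 1.0 - ss_res / ss_tot

# Hypothetical toy data: three enhancers (intrinsic activity 1, 2, 4)
# crossed with two promoters (intrinsic activity 1, 3).
toy = [[a * b for b in (1.0, 3.0)] for a in (1.0, 2.0, 4.0)]
enh_effects, pro_effects, r_squared = fit_multiplicative(toy)
```

On data that are exactly multiplicative, as in the toy matrix, the fit explains all of the variance; deviations from the model, such as class-specific preferences, would show up as residuals.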
Subjects
Genetic Enhancer Elements , Genetic Promoter Regions , Genetic Enhancer Elements/genetics , Humans , Genetic Promoter Regions/genetics , RNA/biosynthesis , RNA/genetics , Transcription Factors/metabolism
ABSTRACT
The past fifty years have seen the development and application of numerous statistical methods to identify genomic regions that appear to be shaped by natural selection. These methods have been used to investigate the macro- and microevolution of a broad range of organisms, including humans. Here, we provide a comprehensive outline of these methods, explaining their conceptual motivations and statistical interpretations. We highlight areas of recent and future development in evolutionary genomics methods and discuss ongoing challenges for researchers employing such tests. In particular, we emphasize the importance of functional follow-up studies to characterize putative selected alleles and the use of selection scans as hypothesis-generating tools for investigating evolutionary histories.
Subjects
Genomics , Genetic Selection/genetics , Physiological Adaptation/genetics , Alleles , Amino Acid Substitution , Animals , Molecular Evolution , Forecasting , Gene Frequency , Population Genetics/methods , Genotyping Techniques , Humans , Linkage Disequilibrium , Genetic Models , Multifactorial Inheritance/genetics , Mutation , Mutation Rate , Phenotype , DNA Sequence Analysis
ABSTRACT
Gene expression is controlled by sequence-specific transcription factors (TFs), which bind to regulatory sequences in DNA. TF binding occurs in nucleosome-depleted regions of DNA (NDRs), which generally encompass regions with lengths similar to those protected by nucleosomes. However, less is known about where within these regions specific TFs tend to be found. Here, we characterize the positional bias of inferred binding sites for 103 TFs within ~500,000 NDRs across 47 cell types. We find that distinct classes of TFs display different binding preferences: Some tend to have binding sites toward the edges, some toward the center, and some at other positions within the NDR. These patterns are highly consistent across cell types, suggesting that they may reflect TF-specific intrinsic structural or functional characteristics. In particular, TF classes with binding sites at NDR edges are enriched for those known to interact with histones and chromatin remodelers, whereas TFs with central enrichment interact with other TFs and cofactors such as p300. Our results suggest distinct regiospecific binding patterns and functions of TF classes within enhancers.
Subjects
Gene Expression Regulation/physiology , Response Elements/physiology , Transcription Factors/metabolism , Humans , Jurkat Cells , Transcription Factors/genetics , U937 Cells
ABSTRACT
Enhancers regulate gene expression through the binding of sequence-specific transcription factors (TFs) to cognate motifs. Various features influence TF binding and enhancer function, including the chromatin state of the genomic locus, the affinities of the binding site, the activity of the bound TFs, and interactions among TFs. However, the precise nature and relative contributions of these features remain unclear. Here, we used massively parallel reporter assays (MPRAs) involving 32,115 natural and synthetic enhancers, together with high-throughput in vivo binding assays, to systematically dissect the contribution of each of these features to the binding and activity of genomic regulatory elements that contain motifs for PPARγ, a TF that serves as a key regulator of adipogenesis. We show that distinct sets of features govern PPARγ binding vs. enhancer activity. PPARγ binding is largely governed by the affinity of the specific motif site and higher-order features of the larger genomic locus, such as chromatin accessibility. In contrast, the enhancer activity of PPARγ binding sites depends on varying contributions from dozens of TFs in the immediate vicinity, including interactions between combinations of these TFs. Different pairs of motifs follow different interaction rules, including subadditive, additive, and superadditive interactions among specific classes of TFs, with both spatially constrained and flexible grammars. Our results provide a paradigm for the systematic characterization of the genomic features underlying regulatory elements, applicable to the design of synthetic regulatory elements or the interpretation of human genetic variation.
Subjects
Genetic Enhancer Elements/genetics , Gene Expression Regulation , Genomics/methods , Transcription Factors/metabolism , 3T3-L1 Cells , Animals , Binding Sites/genetics , Mice , Mutation , Nucleotide Motifs/genetics , PPAR gamma/metabolism , Protein Binding
ABSTRACT
The Plasmodium falciparum parasite's ability to adapt to environmental pressures, such as the human immune system and antimalarial drugs, makes malaria an enduring burden to public health. Understanding the genetic basis of these adaptations is critical to intervening successfully against malaria. To that end, we created a high-density genotyping array that assays over 17,000 single nucleotide polymorphisms (~1 SNP/kb), and applied it to 57 culture-adapted parasites from three continents. We characterized genome-wide genetic diversity within and between populations and identified numerous loci with signals of natural selection, suggesting their role in recent adaptation. In addition, we performed a genome-wide association study (GWAS), searching for loci correlated with resistance to thirteen antimalarials; we detected both known and novel resistance loci, including a new halofantrine resistance locus, PF10_0355. Through functional testing we demonstrated that PF10_0355 overexpression decreases sensitivity to halofantrine, mefloquine, and lumefantrine, but not to structurally unrelated antimalarials, and that increased gene copy number mediates resistance. Our GWAS and follow-on functional validation demonstrate the potential of genome-wide studies to elucidate functionally important loci in the malaria parasite genome.
Subjects
Antimalarials/pharmacology , Drug Resistance/genetics , Genetic Loci , Plasmodium falciparum/genetics , Ethanolamines/pharmacology , Fluorenes/pharmacology , Gene Dosage , Gene Expression , Genetic Association Studies , Genetic Variation , Genotype , Haplotypes , Linkage Disequilibrium , Lumefantrine , Falciparum Malaria/parasitology , Falciparum Malaria/prevention & control , Mefloquine/pharmacology , Phenanthrenes/pharmacology , Plasmodium falciparum/drug effects , Single Nucleotide Polymorphism , Genetic Selection
ABSTRACT
In human cells, DNA double-strand breaks are repaired primarily by the non-homologous end joining (NHEJ) pathway. Given their critical nature, we expected NHEJ proteins to be evolutionarily conserved, with relatively little sequence change over time. Here, we report that while critical domains of these proteins are conserved as expected, the sequence of NHEJ proteins has also been shaped by recurrent positive selection, leading to rapid sequence evolution in other protein domains. In order to characterize the molecular evolution of the human NHEJ pathway, we generated large simian primate sequence datasets for NHEJ genes. Codon-based models of gene evolution yielded statistical support for the recurrent positive selection of five NHEJ genes during primate evolution: XRCC4, NBS1, Artemis, POLλ, and CtIP. Analysis of human polymorphism data using the composite of multiple signals (CMS) test revealed that XRCC4 has also been subjected to positive selection in modern humans. Crystal structures are available for XRCC4, Nbs1, and Polλ; and residues under positive selection fall exclusively on the surfaces of these proteins. Despite the positive selection of such residues, biochemical experiments with variants of one positively selected site in Nbs1 confirm that functions necessary for DNA repair and checkpoint signaling have been conserved. However, many viruses interact with the proteins of the NHEJ pathway as part of their infectious lifecycle. We propose that an ongoing evolutionary arms race between viruses and NHEJ genes may be driving the surprisingly rapid evolution of these critical genes.
Subjects
DNA Repair/genetics , Molecular Evolution , Primates/genetics , Genetic Recombination/genetics , Physiological Adaptation/genetics , Amino Acid Sequence , Animals , Binding Sites/genetics , Carrier Proteins/chemistry , Carrier Proteins/genetics , Carrier Proteins/metabolism , Cell Cycle Proteins/chemistry , Cell Cycle Proteins/genetics , Cell Cycle Proteins/metabolism , Double-Stranded DNA Breaks , DNA Polymerase beta/chemistry , DNA Polymerase beta/genetics , DNA Polymerase beta/metabolism , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Endodeoxyribonucleases , Endonucleases , Humans , Molecular Models , Molecular Sequence Data , Nuclear Proteins/chemistry , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Phylogeny , Primates/classification , Protein Binding , Tertiary Protein Structure , Genetic Selection , Amino Acid Sequence Homology , Signal Transduction
ABSTRACT
Enhancer elements in the human genome control how genes are expressed in specific cell types and harbor thousands of genetic variants that influence risk for common diseases [1-4]. Yet, we still do not know how enhancers regulate specific genes, and we lack general rules to predict enhancer-gene connections across cell types [5,6]. We developed an experimental approach, CRISPRi-FlowFISH, to perturb enhancers in the genome, and we applied it to test >3,500 potential enhancer-gene connections for 30 genes. We found that a simple activity-by-contact model substantially outperformed previous methods at predicting the complex connections in our CRISPR dataset. This activity-by-contact model allows us to construct genome-wide maps of enhancer-gene connections in a given cell type, on the basis of chromatin state measurements. Together, CRISPRi-FlowFISH and the activity-by-contact model provide a systematic approach to map and predict which enhancers regulate which genes, and will help to interpret the functions of the thousands of disease risk variants in the noncoding genome.
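The abstract does not spell out the formula behind the activity-by-contact model. The sketch below assumes one common formulation of the idea: each candidate element is scored by its activity multiplied by its contact frequency with the gene's promoter, divided by the sum of that product over all candidate elements near the gene. The function name `abc_scores`, the element labels, and the numbers are hypothetical and are not taken from the study.

```python
def abc_scores(activity, contact):
    """Score candidate elements for one gene under an activity-by-contact rule.

    `activity` maps element name -> activity estimate (for example, derived
    from chromatin state measurements); `contact` maps element name ->
    contact frequency with the gene's promoter. Both are assumed inputs.
    Returns element name -> share of the total activity-by-contact signal.
    """
    products = {name: activity[name] * contact[name] for name in activity}
    total = sum(products.values())
    if total == 0:
        return {name: 0.0 for name in products}
    return {name: value / total for name, value in products.items()}

# Hypothetical toy inputs: a strong but weakly contacting element (e1),
# a moderate element with good contact (e2), and a weak, close element (e3).
toy_activity = {"e1": 10.0, "e2": 5.0, "e3": 1.0}
toy_contact = {"e1": 0.1, "e2": 0.4, "e3": 1.0}
toy_scores = abc_scores(toy_activity, toy_contact)
```

Because the scores are normalized per gene, they sum to one, so an element's score can be read as its predicted relative contribution to that gene rather than as an absolute effect size.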
Subjects
Clustered Regularly Interspaced Short Palindromic Repeats , Genetic Enhancer Elements , Genetic Promoter Regions , Animals , GATA1 Transcription Factor/genetics , Gene Expression Regulation , Histone Deacetylase 6/genetics , Humans , Fluorescence In Situ Hybridization , K562 Cells , Mice , Genetic Models , Kinetoplastida Guide RNA
ABSTRACT
Biological interpretation of genome-wide association study data frequently involves assessing whether SNPs linked to a biological process, for example, binding of a transcription factor, show unsigned enrichment for disease signal. However, signed annotations quantifying whether each SNP allele promotes or hinders the biological process can enable stronger statements about disease mechanism. We introduce a method, signed linkage disequilibrium profile regression, for detecting genome-wide directional effects of signed functional annotations on disease risk. We validate the method via simulations and application to molecular quantitative trait loci in blood, recovering known transcriptional regulators. We apply the method to expression quantitative trait loci in 48 Genotype-Tissue Expression tissues, identifying 651 transcription factor-tissue associations including 30 with robust evidence of tissue specificity. We apply the method to 46 diseases and complex traits (average n = 290 K), identifying 77 annotation-trait associations representing 12 independent transcription factor-trait associations, and characterize the underlying transcriptional programs using gene-set enrichment analyses. Our results implicate new causal disease genes and new disease mechanisms.
Subjects
Disease/genetics , Genome-Wide Association Study , Multifactorial Inheritance/genetics , Quantitative Trait Loci , Transcription Factors/metabolism , Binding Sites/genetics , Blood Cells/metabolism , Blood Cells/pathology , Blood Chemical Analysis , Gene Expression Regulation , Genetic Predisposition to Disease , Humans , Linkage Disequilibrium , Phenotype , Single Nucleotide Polymorphism , Protein Binding , Risk Factors
ABSTRACT
Gene expression in mammals is regulated by noncoding elements that can affect physiology and disease, yet the functions and target genes of most noncoding elements remain unknown. We present a high-throughput approach that uses clustered regularly interspaced short palindromic repeats (CRISPR) interference (CRISPRi) to discover regulatory elements and identify their target genes. We assess >1 megabase of sequence in the vicinity of two essential transcription factors, MYC and GATA1, and identify nine distal enhancers that control gene expression and cellular proliferation. Quantitative features of chromatin state and chromosome conformation distinguish the seven enhancers that regulate MYC from other elements that do not, suggesting a strategy for predicting enhancer-promoter connectivity. This CRISPRi-based approach can be applied to dissect transcriptional networks and interpret the contributions of noncoding genetic variation to human disease.
Subjects
Chromosome Mapping/methods , Clustered Regularly Interspaced Short Palindromic Repeats , Genetic Enhancer Elements/physiology , High-Throughput Nucleotide Sequencing/methods , Genetic Promoter Regions/physiology , CRISPR-Cas Systems , Cell Proliferation/genetics , Disease/genetics , Genetic Enhancer Elements/genetics , GATA1 Transcription Factor/genetics , Gene Expression Regulation , Humans , K562 Cells , Genetic Promoter Regions/genetics , Proto-Oncogene Proteins c-myc/genetics , Real-Time Polymerase Chain Reaction
ABSTRACT
Rapidly evolving viruses and other pathogens can have an immense impact on human evolution as natural selection acts to increase the prevalence of genetic variants providing resistance to disease. With the emergence of large datasets of human genetic variation, we can search for signatures of natural selection in the human genome driven by such disease-causing microorganisms. Based on this approach, we have previously hypothesized that Lassa virus (LASV) may have been a driver of natural selection in West African populations where Lassa haemorrhagic fever is endemic. In this study, we provide further evidence for this notion. By applying tests for selection to genome-wide data from the International Haplotype Map Consortium and the 1000 Genomes Consortium, we demonstrate evidence for positive selection in LARGE and interleukin 21 (IL21), two genes implicated in LASV infectivity and immunity. We further localized the signals of selection, using the recently developed composite of multiple signals method, to introns and putative regulatory regions of those genes. Our results suggest that natural selection may have targeted variants giving rise to alternative splicing or differential gene expression of LARGE and IL21. Overall, our study supports the hypothesis that selective pressures imposed by LASV may have led to the emergence of particular alleles conferring resistance to Lassa fever, and opens up new avenues of research pursuit.
Subjects
Disease Resistance/genetics , Molecular Evolution , Human Genome/genetics , Interleukins/genetics , Lassa Fever/genetics , Lassa virus/pathogenicity , N-Acetylglucosaminyltransferases/genetics , Genetic Selection , Western Africa , Black People/genetics , Humans , Phylogeography
ABSTRACT
Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
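To give a feel for the idea behind MIC, the sketch below computes a simplified, grid-based approximation. It is not the authors' algorithm: it assumes that MIC is the largest normalized mutual information over grids whose cell count stays below a bound that grows with sample size (taken here as n^0.6, an assumption), and it only tries equal-frequency grids instead of searching over all partitions, so it can only underestimate the true statistic. All function names are hypothetical.

```python
import math
from collections import Counter

def _rank_bins(values, k):
    """Assign each value to one of k equal-frequency bins by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins

def _mutual_information(bx, by):
    """Mutual information (in bits) between two discrete labelings."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )

def approx_mic(x, y, exponent=0.6):
    """Approximate MIC using equal-frequency grids only (a lower bound)."""
    max_cells = len(x) ** exponent  # assumed bound on grid size
    best = 0.0
    nx = 2
    while nx * 2 <= max_cells:
        ny = 2
        while nx * ny <= max_cells:
            mi = _mutual_information(_rank_bins(x, nx), _rank_bins(y, ny))
            # Normalize so that scores from different grid sizes are comparable.
            best = max(best, mi / math.log2(min(nx, ny)))
            ny += 1
        nx += 1
    return best

# Hypothetical toy data: a noiseless linear relationship.
xs = list(range(100))
ys = [2 * v + 1 for v in xs]
linear_score = approx_mic(xs, ys)
```

A full implementation would optimize the placement of grid lines for each grid size; the equal-frequency shortcut used here keeps the example short at the cost of missing some relationships.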
Subjects
Statistical Data Interpretation , Algorithms , Animals , Baseball/statistics & numerical data , Female , Gene Expression , Fungal Genes , Genomics/methods , Humans , Intestines/microbiology , Male , Metagenome , Mice , Obesity , Saccharomyces cerevisiae/genetics
ABSTRACT
The human genome contains hundreds of regions whose patterns of genetic variation indicate recent positive natural selection, yet for most the underlying gene and the advantageous mutation remain unknown. We developed a method, composite of multiple signals (CMS), that combines tests for multiple signals of selection and increases resolution by up to 100-fold. By applying CMS to candidate regions from the International Haplotype Map, we localized population-specific selective signals to 55 kilobases (median), identifying known and novel causal variants. CMS not only identifies individual loci but also implicates the precise variants selected by evolution.