ABSTRACT
Transposable elements (TEs) are active in neuronal cells, raising the question of whether TE insertions contribute to the risk of neuropsychiatric disease. Genome-wide association studies (GWAS) are a tool for discovering genetic loci associated with neuropsychiatric diseases, but they do not directly detect structural variants such as TEs. To examine the role of TEs in psychiatric and neurologic disease, we evaluated 17,000 polymorphic TEs and found that 76 are in linkage disequilibrium with disease haplotypes (P < 10⁻⁶) defined by GWAS. From these 76 polymorphic TEs, we identify potentially causal candidates based on their insertion in genomic regions of regulatory chromatin and on their association with altered gene expression in brain tissues. We show that lead candidate insertions have regulatory effects on gene expression in human neural stem cells, altering the activity of a minimal promoter. Taken together, we identify 10 polymorphic TE insertions that are potential candidates, on par with other variants, for having a causal role in neurologic and psychiatric disorders.
Subject(s)
Mental Disorders, Retroelements, Humans, Retroelements/genetics, Genome-Wide Association Study, Genome, Genetic Loci, Mental Disorders/genetics, DNA Transposable Elements/genetics, Molecular Evolution
ABSTRACT
Alu elements are high-copy-number interspersed repeats that have accumulated near genes during primate and human evolution. They are a pervasive source of structural variation in modern humans. The impacts that Alu insertions may have on gene expression are not well understood, although some have been associated with expression quantitative trait loci (eQTLs). Here, we directly test the regulatory effects of polymorphic Alu insertions in isolation from other variants on the same haplotype. To screen insertion variants for those with such effects, we used ectopic luciferase reporter assays and evaluated 110 Alu insertion variants, including more than 40 with a potential role in disease risk. We observed a continuum of effects with significant outliers that up- or down-regulate luciferase activity. Using a series of reporter constructs, which included genomic context surrounding the Alu, we can distinguish between instances in which the Alu disrupts another regulator and those in which the Alu introduces new regulatory sequence. We next focused on three polymorphic Alu loci associated with breast cancer that display significant effects in the reporter assay. We used CRISPR to modify the endogenous sequences, establishing cell lines varying in the Alu genotype. Our findings indicate that Alu genotype can alter expression of genes implicated in cancer risk, including PTHLH, RANBP9, and MYC. These data show that commonly occurring polymorphic Alu elements can alter transcript levels and potentially contribute to disease risk.
ABSTRACT
BACKGROUND: Mobile elements are a major source of structural variants in the human genome, and some mobile elements can regulate gene expression and transcript splicing. However, the impact of polymorphic mobile element insertions (pMEIs) on gene expression and splicing in diverse human tissues has not been thoroughly studied. The multi-tissue gene expression and whole genome sequencing data generated by the Genotype-Tissue Expression (GTEx) project provide a great opportunity to systematically evaluate the role of pMEIs in regulating gene expression in human tissues. RESULTS: Using the GTEx whole genome sequencing data, we identify 20,545 high-quality pMEIs from 639 individuals. Coupling pMEI genotypes with gene expression profiles, we identify pMEI-associated expression quantitative trait loci (eQTLs) and splicing quantitative trait loci (sQTLs) in 48 tissues. Using joint analyses of pMEIs and other genomic variants, pMEIs are predicted to be the potential causal variant for 3522 eQTLs and 3717 sQTLs. The pMEI-associated eQTLs and sQTLs show a high level of tissue specificity, and these pMEIs are enriched in the proximity of affected genes and in regulatory elements. Using reporter assays, we confirm that several pMEIs associated with eQTLs and sQTLs can alter gene expression levels and isoform proportions, respectively. CONCLUSION: Overall, our study shows that pMEIs are associated with thousands of gene expression and splicing variations, indicating that pMEIs could have a significant role in regulating tissue-specific gene expression and transcript splicing. Detailed mechanisms for the role of pMEIs in gene regulation in different tissues will be an important direction for future studies.
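The coupling of pMEI genotypes with expression profiles described above can be illustrated with a minimal sketch. It assumes a simple least-squares regression of expression on insertion dosage (0, 1 or 2 copies); the abstract does not state which statistical model the study used, and the function name `eqtl_slope` and the toy values are hypothetical.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope and intercept of expression on insertion dosage.

    dosages: insertion allele counts per individual (0, 1 or 2).
    expression: expression values for one gene in one tissue, same order.
    This is an illustrative model, not the method used in the study.
    """
    if len(dosages) != len(expression) or len(dosages) < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all individuals share one genotype; slope undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (hypothetical): expression rises with each copy of the insertion.
slope, intercept = eqtl_slope([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
```

A non-zero slope only indicates association; separating the pMEI from linked variants requires the kind of joint analysis the abstract mentions.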
Subject(s)
Alternative Splicing, Datasets as Topic, Gene Expression, Interspersed Repetitive Sequences, Quantitative Trait Loci, Humans
ABSTRACT
Long interspersed element-1 (LINE-1, L1) sequences, which comprise about 17% of the human genome, are the product of one of the most active types of mobile DNAs in modern humans. LINE-1 insertion alleles can cause inherited and de novo genetic diseases, and LINE-1-encoded proteins are highly expressed in some cancers. Genome-wide LINE-1 mapping in single cells could be useful for defining somatic and germline retrotransposition rates, and for enabling studies to characterize tumour heterogeneity, relate insertions to transcriptional and epigenetic effects at the cellular level, or describe cellular phylogenies in development. Our laboratories have reported a genome-wide LINE-1 insertion site mapping method for bulk DNA, named transposon insertion profiling by sequencing (TIPseq). There have been significant barriers to applying LINE-1 mapping to single cells, owing to chimeric artefacts and features of repetitive sequences. Here, we optimize a modified TIPseq protocol and show its utility for LINE-1 mapping in single lymphoblastoid cells. Results from single-cell TIPseq experiments compare well to known LINE-1 insertions found by whole-genome sequencing and TIPseq on bulk DNA. Among the several approaches we tested, whole-genome amplification by multiple displacement amplification followed by restriction enzyme digestion, vectorette ligation and LINE-1-targeted PCR had the best assay performance. This article is part of a discussion meeting issue 'Crossroads between transposons and gene regulation'.
Subject(s)
DNA Transposable Elements/genetics, Long Interspersed Nucleotide Elements/genetics, Insertional Mutagenesis, Cell Line, Humans, DNA Sequence Analysis, Single-Cell Analysis
ABSTRACT
LINE-1 retrotransposon overexpression is a hallmark of human cancers. We identified a colorectal cancer wherein a fast-growing tumor subclone downregulated LINE-1, prompting us to examine how LINE-1 expression affects cell growth. We find that nontransformed cells undergo a TP53-dependent growth arrest and activate interferon signaling in response to LINE-1. TP53 inhibition allows LINE-1+ cells to grow, and genome-wide-knockout screens show that these cells require replication-coupled DNA-repair pathways, replication-stress signaling and replication-fork restart factors. Our findings demonstrate that LINE-1 expression creates specific molecular vulnerabilities and reveal a retrotransposition-replication conflict that may be an important determinant of cancer growth.
Subject(s)
DNA/genetics, Long Interspersed Nucleotide Elements, Neoplasms/genetics, Tumor Cell Line, Cell Proliferation, DNA Replication, G1 Phase Cell Cycle Checkpoints, Neoplastic Gene Expression Regulation, HEK293 Cells, HeLa Cells, Humans, Signal Transduction, Tumor Suppressor Protein p53/genetics, Tumor Suppressor Protein p53/metabolism
ABSTRACT
Alu retrotransposons account for more than 10% of the human genome, and insertions of these elements create structural variants segregating in human populations. Such polymorphic Alus are powerful markers for understanding population structure, and they represent variants that can greatly impact genome function, including gene expression. Accurate genotyping of Alus and other mobile elements has been challenging. Indeed, we found that Alu genotypes previously called for the 1000 Genomes Project are sometimes erroneous, which poses significant problems for phasing these insertions with other variants that comprise the haplotype. To ameliorate this issue, we introduce a new pipeline, TypeTE, which genotypes Alu insertions from whole-genome sequencing data. Starting from a list of polymorphic Alus, TypeTE identifies the hallmarks (poly-A tail and target site duplication) and orientation of Alu insertions using local re-assembly to reconstruct presence and absence alleles. Genotype likelihoods are then computed after re-mapping sequencing reads to the reconstructed alleles. Using a high-quality set of PCR-based genotypes for >200 loci, we show that TypeTE improves genotype accuracy from 83% to 92% in the 1000 Genomes dataset. TypeTE can be readily adapted to other retrotransposon families and is a valuable addition to the population genomics toolbox.
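To make the genotype-likelihood step concrete, here is a minimal sketch under stated assumptions: reads are counted as supporting either the presence or the absence allele, and each genotype is scored with a binomial model and a fixed per-read error rate. This is a generic textbook model, not TypeTE's actual implementation; the function name `genotype_likelihoods` and the default `error_rate` are hypothetical.

```python
from math import comb


def genotype_likelihoods(presence_reads, absence_reads, error_rate=0.01):
    """Binomial likelihood of the read counts for 0, 1 and 2 insertion copies.

    presence_reads: reads re-mapped to the reconstructed presence allele.
    absence_reads: reads re-mapped to the reconstructed absence allele.
    error_rate: assumed chance that a read supports the wrong allele.
    Returns a dict {genotype: likelihood}; illustrative only.
    """
    if presence_reads < 0 or absence_reads < 0:
        raise ValueError("read counts must be non-negative")
    n = presence_reads + absence_reads
    likelihoods = {}
    for copies in (0, 1, 2):
        frac = copies / 2  # expected fraction of presence-supporting reads
        p = frac * (1 - error_rate) + (1 - frac) * error_rate
        likelihoods[copies] = (
            comb(n, presence_reads) * p ** presence_reads * (1 - p) ** absence_reads
        )
    return likelihoods


def call_genotype(presence_reads, absence_reads, error_rate=0.01):
    """Return the genotype (0, 1 or 2 copies) with the highest likelihood."""
    scores = genotype_likelihoods(presence_reads, absence_reads, error_rate)
    return max(scores, key=scores.get)
```

For example, `call_genotype(5, 5)` favors the heterozygous genotype, while `call_genotype(10, 0)` favors the homozygous insertion. A real pipeline would typically also weight reads by mapping quality.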
Subject(s)
Interspersed Repetitive Sequences/genetics, Insertional Mutagenesis/genetics, Software, Whole Genome Sequencing/methods, Genetic Databases, Gene Frequency/genetics, Genetic Loci, Population Genetics, Human Genome, Genotype, Humans
ABSTRACT
Transposable elements are abundant in the human genome, and great strides have been made in pinpointing variations in these repetitive sequences using whole-genome sequencing. Now, the focus is shifting to understanding their expression and regulation, and the functional consequences of their insertion and retention in the genome over time. Whereas transposable element insertions have been known to cause human genetic disease since the 1980s, the scope of their contributions to heritable phenotypes is now starting to be uncovered. Here, we review the many ways human retrotransposons contribute to genome function, their dysregulation in diseases including cancer and how they affect genetic disease.
Subject(s)
Molecular Evolution, Inborn Genetic Diseases/genetics, Human Genetics, Neoplasms/genetics, Retroelements, Humans
ABSTRACT
BACKGROUND: Transposable elements make up a significant portion of the human genome. Accurately locating these mobile DNAs is vital to understand their role as a source of structural variation and somatic mutation. To this end, laboratories have developed strategies to selectively amplify or otherwise enrich transposable element insertion sites in genomic DNA. RESULTS: Here we describe a technique, Transposon Insertion Profiling by sequencing (TIPseq), to map Long INterspersed Element 1 (LINE-1, L1) retrotransposon insertions in the human genome. This method uses vectorette PCR to amplify species-specific L1 (L1PA1) insertion sites followed by paired-end Illumina sequencing. In addition to providing a step-by-step molecular biology protocol, we offer users a guide to our pipeline for data analysis, TIPseqHunter. Our recent studies in pancreatic and ovarian cancer demonstrate the ability of TIPseq to identify invariant (fixed), polymorphic (inherited variants), as well as somatically-acquired L1 insertions that distinguish cancer genomes from a patient's constitutional make-up. CONCLUSIONS: TIPseq provides an approach for amplifying evolutionarily young, active transposable element insertion sites from genomic DNA. Our rationale and variations on this protocol may be useful to those mapping L1 and other mobile elements in complex genomes.
ABSTRACT
Transposable elements (TEs) are interspersed repeat sequences that make up much of the human genome. Their expression has been implicated in development and disease. However, TE-derived RNA-seq reads are difficult to quantify. Past approaches have excluded these reads or aggregated RNA expression to subfamilies shared by similar TE copies, sacrificing quantitative accuracy or the genomic context necessary to understand the basis of TE transcription. As a result, the effects of TEs on gene expression and associated phenotypes are not well understood. Here, we present Software for Quantifying Interspersed Repeat Expression (SQuIRE), the first RNA-seq analysis pipeline that provides a quantitative and locus-specific picture of TE expression (https://github.com/wyang17/SQuIRE). SQuIRE is an accurate and user-friendly tool that can be used for a variety of species. We applied SQuIRE to RNA-seq from normal mouse tissues and a Drosophila model of amyotrophic lateral sclerosis. In both model organisms, we recapitulated previously reported TE subfamily expression levels and revealed locus-specific TE expression. We also identified differences in TE transcription patterns relating to transcript type, gene expression and RNA splicing that would be lost with other approaches using subfamily-level analyses. Altogether, our findings illustrate the importance of studying TE transcription with locus-level resolution.
Subject(s)
DNA Transposable Elements/genetics, Genetic Loci/genetics, RNA Sequence Analysis/methods, Software, Genetic Transcription/genetics, Amyotrophic Lateral Sclerosis/genetics, Animals, Animal Disease Models, Drosophila melanogaster/genetics, Mice, RNA Splicing/genetics
ABSTRACT
RNA splicing is a highly regulated process dependent on sequences near splice sites. Insertions of Alu retrotransposons can disrupt splice sites or bind splicing regulators. We hypothesized that some common inherited polymorphic Alu insertions are responsible for splicing QTLs (sQTLs). We focused on intronic Alu variants mapping within 100 bp of an alternatively used exon and screened for those that alter splicing. We identify five loci, 21.7% of those assayed, where the polymorphic Alu alters splicing. While in most cases the Alu promotes exon skipping, at one locus the Alu increases exon inclusion. Of particular interest is an Alu polymorphism in the CD58 gene. Reduced CD58 expression is associated with risk for developing multiple sclerosis (MS). We show that the Alu insertion promotes skipping of CD58 exon 3 and results in a frameshifted transcript, indicating that the Alu may be the causative variant for increased MS risk at this locus. Using RT-PCR analysis at the endogenous locus, we confirm that the Alu variant is an sQTL for CD58. In summary, altered splicing efficiency is a common functional consequence of Alu polymorphisms, including at least one instance where the variant is implicated in disease risk. This work broadens our understanding of splicing regulatory sequences around exons.
Subject(s)
Alu Elements/genetics, CD58 Antigens/genetics, Quantitative Trait Loci/genetics, RNA Splicing/genetics, Alternative Splicing/genetics, Exons/genetics, Genetic Variation, Humans, Introns/genetics, RNA Splice Sites/genetics, Messenger RNA/genetics
ABSTRACT
Interspersed repeat sequences comprise much of our DNA, although their functional effects are poorly understood. The most commonly occurring repeat is the Alu short interspersed element. New Alu insertions occur in human populations, and have been responsible for several instances of genetic disease. In this study, we sought to determine if there are instances of polymorphic Alu insertion variants that function in a common variant, common disease paradigm. We cataloged 809 polymorphic Alu elements mapping to 1,159 loci implicated in disease risk by genome-wide association study (GWAS) (P < 10⁻⁸). We found that Alu insertion variants occur disproportionately at GWAS loci (P = 0.013). Moreover, we identified 44 of these Alu elements in linkage disequilibrium (r² > 0.7) with the trait-associated SNP. This figure represents a >20-fold increase in the number of polymorphic Alu elements associated with human phenotypes. This work provides a broader perspective on how structural variants in repetitive DNAs may contribute to human disease.
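The linkage disequilibrium threshold above uses the standard squared-correlation statistic between two biallelic variants. The sketch below computes it from phased haplotypes; the function name `ld_r2`, the 0/1 allele encoding and the toy haplotypes are illustrative assumptions, not the study's pipeline.

```python
def ld_r2(hap_a, hap_b):
    """Squared correlation (r^2) between two biallelic variants.

    hap_a, hap_b: alleles (0 or 1) on the same phased haplotypes, in the
    same order, e.g. 1 = Alu present and 1 = trait-associated SNP allele.
    """
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("need two non-empty haplotype lists of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at variant A
    p_b = sum(hap_b) / n  # frequency of allele 1 at variant B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a variant is monomorphic")
    return d * d / denom


# Toy haplotypes (hypothetical): the SNP allele tags the Alu on all but one.
alu = [1, 1, 0, 0, 1, 0, 0, 0]
snp = [1, 1, 0, 0, 1, 0, 0, 1]
r2 = ld_r2(alu, snp)
```

In this toy case r² is about 0.6, so the pair would fall just below the r² > 0.7 cutoff quoted in the abstract.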
Subject(s)
Alu Elements, Disease/genetics, Case-Control Studies, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Linkage Disequilibrium, Single Nucleotide Polymorphism
ABSTRACT
BACKGROUND: Polymorphic Alu elements account for 17% of structural variants in the human genome. The majority of these belong to the youngest AluY subfamilies, and most structural variant discovery efforts have focused on identifying Alu polymorphisms from these currently retrotranspositionally active subfamilies. In this report we analyze polymorphisms from the evolutionarily older AluS subfamily, whose peak activity was tens of millions of years ago. We annotate the AluS polymorphisms, assess their likely mechanism of origin, and evaluate their contribution to structural variation in the human genome. RESULTS: Of 52 previously reported polymorphic AluS elements ascertained for this study, 48 were confirmed to belong to the AluS subfamily using high stringency subfamily classification criteria. Of these, the majority (77%, 37/48) appear to be deletion polymorphisms. Two polymorphic AluS elements (4%) have features of non-classical Alu insertions and one polymorphic AluS element (2%) likely inserted by a mechanism involving internal priming. Seven AluS polymorphisms (15%) appear to have arisen by the classical target-primed reverse transcription (TPRT) retrotransposition mechanism. These seven TPRT products are 3' intact with 3' poly-A tails, and are flanked by target site duplications; L1 ORF2p endonuclease cleavage sites were also observed, providing additional evidence that these are L1 ORF2p endonuclease-mediated TPRT insertions. Further sequence analysis showed strong conservation of both the RNA polymerase III promoter and SRP9/14 binding sites, important for mediating transcription and interaction with retrotransposition machinery, respectively. This conservation of functional features implies that some of these are fairly recent insertions since they have not diverged significantly from their respective retrotranspositionally competent source elements. 
CONCLUSIONS: Of the polymorphic AluS elements evaluated in this report, 15% (7/48) have features consistent with TPRT-mediated insertion, thus suggesting that some AluS elements have been more active recently than previously thought, or that fixation of AluS insertion alleles remains incomplete. These data expand the potential significance of polymorphic AluS elements in contributing to structural variation in the human genome. Future discovery efforts focusing on polymorphic AluS elements are likely to identify more such polymorphisms, and approaches tailored to identify deletion alleles may be warranted.