ABSTRACT
Two abundant classes of mobile elements, namely Alu and L1 elements, continue to generate new retrotransposon insertions in human genomes. Estimates suggest that these elements have generated millions of new germline insertions in individual human genomes worldwide. Unfortunately, current technologies are not capable of detecting most of these young insertions, and the true extent of germline mutagenesis by endogenous human retrotransposons has been difficult to examine. Here, we describe technologies for detecting these young retrotransposon insertions and demonstrate that such insertions indeed are abundant in human populations. We also found that new somatic L1 insertions occur at high frequencies in human lung cancer genomes. Genome-wide analysis suggests that altered DNA methylation may be responsible for the high levels of L1 mobilization observed in these tumors. Our data indicate that transposon-mediated mutagenesis is extensive in human genomes and is likely to have a major impact on human biology and diseases.
Subject(s)
Alu Elements , Genome, Human , Long Interspersed Nucleotide Elements , Mutagenesis , Sequence Analysis, DNA/methods , Brain Neoplasms/genetics , Humans , Lung Neoplasms/genetics , Methylation
ABSTRACT
Virtually all genome sequencing efforts in national biobanks, complex and Mendelian disease programs, and medical genetic initiatives are reliant upon short-read whole-genome sequencing (srWGS), which presents challenges for the detection of structural variants (SVs) relative to emerging long-read WGS (lrWGS) technologies. Given this ubiquity of srWGS in large-scale genomics initiatives, we sought to establish expectations for routine SV detection from this data type by comparison with lrWGS assembly, as well as to quantify the genomic properties and added value of SVs uniquely accessible to each technology. Analyses from the Human Genome Structural Variation Consortium (HGSVC) of three families captured ~11,000 SVs per genome from srWGS and ~25,000 SVs per genome from lrWGS assembly. Detection power and precision for SV discovery varied dramatically by genomic context and variant class: 9.7% of the current GRCh38 reference is defined by segmental duplication (SD) and simple repeat (SR), yet 91.4% of deletions that were specifically discovered by lrWGS localized to these regions. Across the remaining 90.3% of reference sequence, we observed extremely high (93.8%) concordance between technologies for deletions in these datasets. In contrast, lrWGS was superior for detection of insertions across all genomic contexts. Given that non-SD/SR sequences encompass 95.9% of currently annotated disease-associated exons, improved sensitivity from lrWGS to discover novel pathogenic deletions in these currently interpretable genomic regions is likely to be incremental. However, these analyses highlight the considerable added value of assembly-based lrWGS to create new catalogs of insertions and transposable elements, as well as disease-associated repeat expansions in genomic sequences that were previously recalcitrant to routine assessment.
Subject(s)
Genome, Human/genetics , Genomic Structural Variation , Genomics/methods , Goals , Whole Genome Sequencing/methods , Whole Genome Sequencing/standards , DNA Copy Number Variations , Exons/genetics , Humans , Research Design , Segmental Duplications, Genomic , Sequence Alignment
ABSTRACT
Several large-scale Illumina whole-genome sequencing (WGS) and whole-exome sequencing (WES) projects have emerged recently that have provided exceptional opportunities to discover mobile element insertions (MEIs) and study the impact of these MEIs on human genomes. However, these projects also have presented major challenges with respect to the scalability and computational costs associated with performing MEI discovery on tens or even hundreds of thousands of samples. To meet these challenges, we have developed a more efficient and scalable version of our mobile element locator tool (MELT) called CloudMELT. We then used MELT and CloudMELT to perform MEI discovery in 57,919 human genomes and exomes, leading to the discovery of 104,350 nonredundant MEIs. We leveraged this collection (1) to examine potentially active L1 source elements that drive the mobilization of new Alu, L1, and SVA MEIs in humans; (2) to examine the population distributions and subfamilies of these MEIs; and (3) to examine the mutagenesis of GENCODE genes, ENCODE-annotated features, and disease genes by these MEIs. Our study provides new insights on the L1 source elements that drive MEI mutagenesis and brings forth a better understanding of how this mutagenesis impacts human genomes.
ABSTRACT
Somatic LINE-1 (L1) retrotransposition has been detected in early embryos, adult brains, the gastrointestinal (GI) tract, and many cancers, including epithelial GI tumors. We previously found numerous somatic L1 insertions in paired normal and GI cancerous tissues. Here, using a modified method of single-cell analysis for somatic L1 insertions, we studied adenocarcinomas of colon, pancreas, and stomach, and found a variable number of somatic L1 insertions in tumors of the same type from patient to patient. We detected no somatic L1 insertions in single cells of 5 of 10 tumors studied. In three tumors, aneuploid cells were detected by FACS. In one pancreatic tumor, there were many more L1 insertions in aneuploid than in euploid tumor cells. In one gastric cancer, both aneuploid and euploid cells contained large numbers of likely clonal insertions. However, in a second gastric cancer with aneuploid cells, no somatic L1 insertions were found. We suggest that when the cellular environment is favorable to retrotransposition, aneuploidy predisposes tumor cells to L1 insertions, and retrotransposition may occur at the transition from euploidy to aneuploidy. Seventeen percent of insertions were also present in normal cells, similar to findings in genomic DNA from normal tissues of GI tumor patients. We provide evidence that: 1) the number of L1 insertions in tumors of the same type is highly variable, 2) most somatic L1 insertions in GI cancer tissues are absent from normal tissues, and 3) under certain conditions, somatic L1 retrotransposition exhibits a propensity for occurring in aneuploid cells.
Subject(s)
Adenocarcinoma/genetics , Gastrointestinal Neoplasms/genetics , Long Interspersed Nucleotide Elements/genetics , Adenocarcinoma/pathology , Artifacts , Gastrointestinal Neoplasms/pathology , Humans , Single-Cell Analysis
ABSTRACT
Mobile element insertions (MEIs) represent ~25% of all structural variants in human genomes. Moreover, when they disrupt genes, MEIs can influence human traits and diseases. Therefore, MEIs should be fully discovered along with other forms of genetic variation in whole genome sequencing (WGS) projects involving population genetics, human diseases, and clinical genomics. Here, we describe the Mobile Element Locator Tool (MELT), which was developed as part of the 1000 Genomes Project to perform MEI discovery on a population scale. Using both Illumina WGS data and simulations, we demonstrate that MELT outperforms existing MEI discovery tools in terms of speed, scalability, specificity, and sensitivity, while also detecting a broader spectrum of MEI-associated features. Several run modes were developed to perform MEI discovery on local and cloud systems. In addition to using MELT to discover MEIs in modern humans as part of the 1000 Genomes Project, we also used it to discover MEIs in chimpanzees and ancient (Neanderthal and Denisovan) hominids. We detected diverse patterns of MEI stratification across these populations that likely were caused by (1) diverse rates of MEI production from source elements, (2) diverse patterns of MEI inheritance, and (3) the introgression of ancient MEIs into modern human genomes. Overall, our study provides the most comprehensive map of MEIs to date spanning chimpanzees, ancient hominids, and modern humans and reveals new aspects of MEI biology in these lineages. We also demonstrate that MELT is a robust platform for MEI discovery and analysis in a variety of experimental settings.
Subject(s)
Computational Biology/methods , DNA Transposable Elements , Neanderthals/genetics , Pan troglodytes/genetics , Animals , Databases, Genetic , Evolution, Molecular , High-Throughput Nucleotide Sequencing/methods , Humans , Polymorphism, Single Nucleotide , Software , Whole Genome Sequencing/methods
ABSTRACT
Although human LINE-1 (L1) elements are actively mobilized in many cancers, a role for somatic L1 retrotransposition in tumor initiation has not been conclusively demonstrated. Here, we identify a novel somatic L1 insertion in the APC tumor suppressor gene that provided us with a unique opportunity to determine whether such insertions can actually initiate colorectal cancer (CRC), and if so, how this might occur. Our data support a model whereby a hot L1 source element on Chromosome 17 of the patient's genome evaded somatic repression in normal colon tissues and thereby initiated CRC by mutating the APC gene. This insertion worked together with a point mutation in the second APC allele to initiate tumorigenesis through the classic two-hit CRC pathway. We also show that L1 source profiles vary considerably depending on the ancestry of an individual, and that population-specific hot L1 elements represent a novel form of cancer risk.
Subject(s)
Adenocarcinoma/genetics , Colorectal Neoplasms/genetics , Gene Expression Regulation, Neoplastic , Mutagenesis, Insertional , Retroelements/genetics , Adenomatous Polyposis Coli Protein/genetics , Carcinogenesis/genetics , DNA Mutational Analysis , Female , Gene Silencing , Humans , Microsatellite Instability , Middle Aged
ABSTRACT
RATIONALE: The clinical features of patients infected with pulmonary nontuberculous mycobacteria (PNTM) are well described, but the genetic components of infection susceptibility are not. OBJECTIVES: To examine genetic variants in patients with PNTM, their unaffected family members, and a control group. METHODS: Whole-exome sequencing was done on 69 white patients with PNTM and 18 of their white unaffected family members. We performed a candidate gene analysis using immune, cystic fibrosis transmembrane conductance regulator (CFTR), cilia, and connective tissue gene sets. The numbers of patients, family members, and control subjects with variants in each category were compared, as was the average number of variants per person. MEASUREMENTS AND MAIN RESULTS: A significantly higher number of patients with PNTM than the other subjects had low-frequency, protein-affecting variants in immune, CFTR, cilia, and connective tissue categories (35, 26, 90, and 90%, respectively). Patients with PNTM also had significantly more cilia and connective tissue variants per person than did control subjects (2.47 and 2.55 compared with 1.38 and 1.40, respectively; P = 1.4 × 10^-6 and P = 2.7 × 10^-8, respectively). Patients with PNTM had an average of 5.26 variants across all categories (1.98 in control subjects; P = 2.8 × 10^-17), and they were more likely than control subjects to have variants in multiple categories. We observed similar results for family members without PNTM infection, with the exception of the immune category. CONCLUSIONS: Patients with PNTM have more low-frequency, protein-affecting variants in immune, CFTR, cilia, and connective tissue genes than their unaffected family members and control subjects. We propose that PNTM infection is a multigenic disease in which combinations of variants across gene categories, plus environmental exposures, increase susceptibility to the infection.
Subject(s)
Cilia/genetics , Connective Tissue , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Immunity/genetics , Mycobacterium Infections, Nontuberculous/genetics , Tuberculosis, Pulmonary/genetics , Adult , Aged , Aged, 80 and over , Case-Control Studies , Causality , Cohort Studies , Exome , Family , Female , Genetic Predisposition to Disease , Genetic Variation , Humans , Male , Middle Aged , Principal Component Analysis , Sequence Analysis, DNA
ABSTRACT
Human genetic variation is expected to play a central role in personalized medicine. Yet only a fraction of the natural genetic variation that is harbored by humans has been discovered to date. Here we report almost 2 million small insertions and deletions (INDELs) that range from 1 bp to 10,000 bp in length in the genomes of 79 diverse humans. These variants include 819,363 small INDELs that map to human genes. Small INDELs frequently were found in the coding exons of these genes, and several lines of evidence indicate that such variation is a major determinant of human biological diversity. Microarray-based genotyping experiments revealed several interesting observations regarding the population genetics of small INDEL variation. For example, we found that many of our INDELs had high levels of linkage disequilibrium (LD) with both HapMap SNPs and with high-scoring SNPs from genome-wide association studies. Overall, our study indicates that small INDEL variation is likely to be a key factor underlying inherited traits and diseases in humans.
Subject(s)
Genetic Variation , Genome, Human/genetics , INDEL Mutation/genetics , Genomics/methods , Genotype , Humans , Microarray Analysis , Precision Medicine/methods
ABSTRACT
Three mobile element classes, namely Alu, LINE-1 (L1), and SVA elements, remain actively mobile in human genomes and continue to produce new mobile element insertions (MEIs). Historically, MEIs have been discovered and studied using several methods, including: (1) Southern blots, (2) PCR (including PCR display), and (3) the detection of MEI copies from young subfamilies. We are now entering a new phase of MEI discovery where these methods are being replaced by whole genome sequencing and bioinformatics analysis to discover novel MEIs. We expect that the universe of sequenced human genomes will continue to expand rapidly over the next several years, both with short-read and long-read technologies. These resources will provide unprecedented opportunities to discover MEIs and study their impact on human traits and diseases. They also will allow the MEI community to discover and study the source elements that produce these new MEIs, which will facilitate our ability to study source element regulation in various tissue contexts and disease states. This, in turn, will allow us to better understand MEI mutagenesis in humans and the impact of this mutagenesis on human biology.
Subject(s)
Genome, Human , Hominidae , Animals , Humans , Computational Biology/methods , Whole Genome Sequencing , Long Interspersed Nucleotide Elements
ABSTRACT
Nuclear localization signals (NLSs) are amino acid sequences that target cargo proteins into the nucleus. Rigorous characterization of NLS motifs is essential to understanding and predicting pathways for nuclear import. The best-characterized NLS is the classical NLS (cNLS), which is recognized by the cNLS receptor, importin-alpha. cNLSs are conventionally defined as having one (monopartite) or two clusters of basic amino acids separated by a 9-12 aa linker (bipartite). Motivated by the finding that Ty1 integrase, which contains an unconventional putative bipartite cNLS with a 29 aa linker, exploits the classical nuclear import machinery, we assessed the functional boundaries for linker length within a bipartite cNLS. We confirmed that the integrase cNLS is a bona fide bipartite cNLS, then carried out a systematic analysis of linker length in an obligate bipartite cNLS cargo, which revealed that some linkers longer than conventionally defined can function in nuclear import. Linker function is dependent on the sequence and likely the inherent flexibility of the linker. Subsequently, we interrogated the Saccharomyces cerevisiae proteome to identify cellular proteins containing putative long bipartite cNLSs. We experimentally confirmed that Rrp4 contains a bipartite cNLS with a 25 aa linker. Our studies show that the traditional definition of bipartite cNLSs is too restrictive and linker length can vary depending on amino acid composition.
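The linker-length definition described above can be illustrated with a small motif scan. This is only a rough sketch, not the screening method used in the study: the regular expression (two basic residues, a linker, then three basic residues), the function name `find_bipartite_cnls`, and the test sequences are all hypothetical simplifications chosen for illustration.

```python
import re


def find_bipartite_cnls(seq, min_linker=9, max_linker=12):
    """Return (start, linker_length) for each candidate bipartite motif.

    Assumed, simplified motif: two basic residues (K/R), a linker of
    min_linker..max_linker residues of any kind, then three basic residues.
    Real cNLS consensus patterns are more nuanced than this.
    """
    pattern = re.compile(
        r"[KR]{2}(.{%d,%d}?)[KR]{3}" % (min_linker, max_linker)
    )
    return [(m.start(), len(m.group(1))) for m in pattern.finditer(seq)]


# Hypothetical sequences: one with a conventional 10 aa linker,
# one with a 29 aa linker like the length reported for Ty1 integrase.
conventional = "MSA" + "KR" + "A" * 10 + "KKK" + "LE"
long_linker = "MSA" + "KR" + "A" * 29 + "KKK" + "LE"

if __name__ == "__main__":
    print(find_bipartite_cnls(conventional))
    print(find_bipartite_cnls(long_linker))
    print(find_bipartite_cnls(long_linker, max_linker=29))
```

Under the conventional 9-12 aa window, the long-linker sequence is missed entirely; widening `max_linker` recovers it. This mirrors the abstract's point that a strict linker-length definition excludes some functional signals.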
Subject(s)
Nuclear Localization Signals/metabolism , Proteome , Saccharomyces cerevisiae Proteins/metabolism , Saccharomyces cerevisiae/metabolism , Active Transport, Cell Nucleus , Amino Acid Sequence , Integrases/metabolism , Karyopherins/metabolism , Molecular Sequence Data
ABSTRACT
In this review, we focus on progress that has been made in detecting small insertions and deletions (INDELs) in human genomes. Over the past decade, several million small INDELs have been discovered in human populations and personal genomes. The amount of genetic variation that is caused by these small INDELs is substantial. The number of INDELs in human genomes is second only to the number of single nucleotide polymorphisms (SNPs), and, in terms of base pairs of variation, INDELs cause levels of variation similar to those caused by SNPs. Many of these INDELs map to functionally important sites within human genes, and thus are likely to influence human traits and diseases. Therefore, small INDEL variation will play a prominent role in personalized medicine.
Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Mutagenesis, Insertional/genetics , Sequence Deletion/genetics , Humans , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Complex structural variants (CSVs) are genomic alterations that have more than two breakpoints and are considered as the simultaneous occurrence of simple structural variants. However, detecting the compounded mutational signals of CSVs is challenging through a commonly used model-match strategy. As a result, there has been limited progress for CSV discovery compared with simple structural variants. Here, we systematically analyzed the multi-breakpoint connection feature of CSVs, and proposed Mako, utilizing a bottom-up guided model-free strategy, to detect CSVs from paired-end short-read sequencing. Specifically, we implemented a graph-based pattern growth approach, where the graph depicts potential breakpoint connections, and pattern growth enables CSV detection without pre-defined models. Comprehensive evaluations on both simulated and real datasets revealed that Mako outperformed other algorithms. Notably, validation rates of CSVs on real data based on experimental and computational validations as well as manual inspections are around 70%, where the medians of experimental and computational breakpoint shift are 13 bp and 26 bp, respectively. Moreover, the Mako CSV subgraph effectively characterized the breakpoint connections of a CSV event and uncovered a total of 15 CSV types, including two novel types of adjacent segment swap and tandem dispersed duplication. Further analysis of these CSVs also revealed the impact of sequence homology on the formation of CSVs. Mako is publicly available at https://github.com/xjtu-omics/Mako.
Subject(s)
Algorithms , Genomics , Genome , High-Throughput Nucleotide Sequencing , Mutation , Sequence Analysis, DNA
ABSTRACT
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
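For readers unfamiliar with the contiguity metric quoted above, contig N50 is commonly defined as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. A minimal sketch follows, assuming that standard definition; the function name `contig_n50` and the toy lengths are hypothetical and are not taken from the study.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; N50 is the length of
    the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy contig lengths (arbitrary units), purely illustrative.
    print(contig_n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because the metric depends only on the sorted length distribution, a few very long contigs can raise N50 substantially even when many short contigs remain.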
Subject(s)
Genome, Human , High-Throughput Nucleotide Sequencing/methods , Parents , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Algorithms , Haplotypes , Humans , Puerto Rico/ethnology
ABSTRACT
Although a large proportion (44%) of the human genome is occupied by transposons and transposon-like repetitive elements, only a small proportion (<0.05%) of these elements remain active today. Recent evidence indicates that approximately 35-40 subfamilies of Alu, L1 and SVA elements (and possibly HERV-K elements) remain actively mobile in the human genome. These active transposons are of great interest because they continue to produce genetic diversity in human populations and also cause human diseases by integrating into genes. In this review, we examine these active human transposons and explore mechanistic factors that influence their mobilization.
Subject(s)
DNA Transposable Elements , Genome, Human , Base Pairing , Base Sequence , Humans , Molecular Sequence Data , Nucleic Acid Conformation , Repetitive Sequences, Nucleic Acid
ABSTRACT
Like its retroviral relatives, the long terminal repeat retrotransposon Ty1 in the yeast Saccharomyces cerevisiae must traverse a permanently intact nuclear membrane for successful transposition and replication. For retrotransposition to occur, at least a subset of Ty1 proteins, including the Ty1 integrase, must enter the nucleus. Nuclear localization of integrase is dependent upon a C-terminal nuclear targeting sequence. However, the nuclear import machinery that recognizes this nuclear targeting signal has not been defined. We investigated the mechanism by which Ty1 integrase gains access to nuclear DNA as a model for how other retroelements, including retroviruses like HIV, may utilize cellular nuclear transport machinery to import their essential nuclear proteins. We show that Ty1 retrotransposition is significantly impaired in yeast mutants that alter the classical nuclear protein import pathway, including the Ran-GTPase, and the dimeric import receptor, importin-alpha/beta. Although Ty1 proteins are made and processed in these mutant cells, our studies reveal that an integrase reporter is not properly targeted to the nucleus in cells carrying mutations in the classical nuclear import machinery. Furthermore, we demonstrate that integrase coimmunoprecipitates with the importin-alpha transport receptor and directly binds to importin-alpha. Taken together, these data suggest Ty1 integrase can employ the classical nuclear protein transport machinery to enter the nucleus.
Subject(s)
Cell Nucleus/metabolism , Integrases/metabolism , Nuclear Proteins/metabolism , Retroelements , Saccharomyces cerevisiae Proteins/metabolism , Active Transport, Cell Nucleus , Cell Nucleus/chemistry , Cytoplasm/chemistry , Green Fluorescent Proteins/analysis , Integrases/chemistry , Mutation , Nuclear Localization Signals , Saccharomyces cerevisiae/enzymology , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , alpha Karyopherins/genetics , alpha Karyopherins/metabolism , ran GTP-Binding Protein/genetics , ran GTP-Binding Protein/metabolism
ABSTRACT
Retrotransposable elements (RTEs) have actively multiplied over the past 80 million years of primate evolution, and as a consequence, such elements collectively occupy ~40% of the human genome. As RTE activity can have detrimental effects on the human genome and transcriptome, silencing mechanisms have evolved to restrict retrotransposition. The brain is the only known somatic tissue where RTEs are de-repressed throughout the life of a healthy human and each neuron in specific brain regions accumulates up to ~13.7 new somatic L1 insertions (and perhaps more). However, even higher levels of somatic RTE expression and retrotransposition have been found in a number of human neurological disorders. This review is focused on how RTE expression and retrotransposition in neuronal tissues might contribute to the initiation and progression of these disorders. These disorders are discussed in three broad and sometimes overlapping categories: 1) disorders such as Rett syndrome, Aicardi-Goutières syndrome, and ataxia-telangiectasia, where expression/retrotransposition is increased due to mutations in genes that play a role in regulating RTEs in healthy cells, 2) disorders such as autism spectrum disorder, schizophrenia, and substance abuse disorders, which are thought to be caused by a combination of genetic and environmental stress factors, and 3) disorders associated with age, such as frontotemporal lobar degeneration (FTLD), amyotrophic lateral sclerosis (ALS), and normal aging, where there is a time-dependent accumulation of neurological degeneration, RTE copy number, and phenotypes. Research has revealed increased levels of RTE activity in many neurological disorders, but in most cases, a clear causal link between RTE activity and these disorders has not been well established. At the same time, even if increased RTE activity is a passenger and not a driver of disease, a detrimental effect is more likely than a beneficial one. Thus, a better understanding of the role of RTEs in neuronal tissues likely is an important part of understanding, preventing, and treating these disorders.
ABSTRACT
Glioma is a unique neoplastic disease that develops exclusively in the central nervous system (CNS) and rarely metastasizes to other tissues. This feature strongly implicates the tumor-host CNS microenvironment in gliomagenesis and tumor progression. We investigated the differences and similarities in glioma biology as conveyed by transcriptomic patterns across four mammalian hosts: rats, mice, dogs, and humans. Given the inherent intra-tumoral molecular heterogeneity of human glioma, we focused this study on tumors with upregulation of the platelet-derived growth factor signaling axis, a common and early alteration in human gliomagenesis. The results reveal core neoplastic alterations in mammalian glioma, as well as unique contributions of the tumor host to neoplastic processes. Notable differences were observed in gene expression patterns as well as related biological pathways and cell populations known to mediate key elements of glioma biology, including angiogenesis, immune evasion, and brain invasion. These data provide new insights regarding mammalian models of human glioma, and how these insights and models relate to our current understanding of the human disease.
Subject(s)
Brain Neoplasms/genetics , Cell Transformation, Neoplastic/genetics , Gene Expression Profiling , Glioma/genetics , Transcriptome , Animals , Brain Neoplasms/pathology , Computational Biology/methods , Dogs , Gene Expression Regulation, Neoplastic , Glioma/pathology , Mice , Rats , Reproducibility of Results , Species Specificity
ABSTRACT
The human LINE-1 (or L1) element is a non-LTR retrotransposon that is mobilized through an RNA intermediate by an L1-encoded reverse transcriptase and other L1-encoded proteins. L1 elements remain actively mobile today and continue to mutagenize human genomes. Importantly, when new insertions disrupt gene function, they can cause diseases. Historically, L1s were thought to be active in the germline but silenced in adult somatic tissues. However, recent studies now show that L1 is active in at least some somatic tissues, including epithelial cancers. In this review, we provide an overview of these recent developments, and examine evidence that somatic L1 retrotransposition can initiate and drive tumorigenesis in humans. Recent studies have: (i) cataloged somatic L1 activity in many epithelial tumor types; (ii) identified specific full-length L1 source elements that give rise to somatic L1 insertions; and (iii) determined that L1 promoter hypomethylation likely plays an early role in the derepression of L1s in somatic tissues. A central challenge moving forward is to determine the extent to which L1 driver mutations can promote tumor initiation, evolution, and metastasis in humans.
Subject(s)
Carcinogenesis , Long Interspersed Nucleotide Elements , Neoplasms/physiopathology , Retroelements , Humans , Recombination, Genetic
ABSTRACT
An international effort is underway to generate a comprehensive haplotype map (HapMap) of the human genome represented by an estimated 300,000 to 1 million 'tag' single nucleotide polymorphisms (SNPs). Our analysis indicates that the current human SNP map is not sufficiently dense to support the HapMap project. For example, 24.6% of the genome currently lacks SNPs at the minimal density and spacing that would be required to construct even a conservative tag SNP map containing 300,000 SNPs. In an effort to improve the human SNP map, we identified 140,696 additional SNP candidates using a new bioinformatics pipeline. Over 51,000 of these SNPs mapped to the largest gaps in the human SNP map, leading to significant improvements in these regions. Our SNPs will be immediately useful for the HapMap project, and will allow for the inclusion of many additional genomic intervals in the final HapMap. Nevertheless, our results also indicate that additional SNP discovery projects will be required both to define the haplotype architecture of the human genome and to construct comprehensive tag SNP maps that will be useful for genetic linkage studies in humans.
Subject(s)
Chromosome Mapping/methods , Genome, Human , Polymorphism, Single Nucleotide/genetics , Base Sequence , DNA/chemistry , DNA/genetics , Databases, Nucleic Acid , Haplotypes , Humans , Polymerase Chain Reaction , Sequence Analysis, DNA
ABSTRACT
Transposons and transposon-like repetitive elements collectively occupy 44% of the human genome sequence. In an effort to measure the levels of genetic variation that are caused by human transposons, we have developed a new method to broadly detect transposon insertion polymorphisms of all kinds in humans. We began by identifying 606,093 insertion and deletion (indel) polymorphisms in the genomes of diverse humans. We then screened these polymorphisms to detect indels that were caused by de novo transposon insertions. Our method was highly efficient and led to the identification of 605 nonredundant transposon insertion polymorphisms in 36 diverse humans. We estimate that this represents 25-35% of approximately 2075 common transposon polymorphisms in human populations. Because we identified all transposon insertion polymorphisms with a single method, we could evaluate the relative levels of variation that were caused by each transposon class. The average human in our study was estimated to harbor 1283 Alu insertion polymorphisms, 180 L1 polymorphisms, 56 SVA polymorphisms, and 17 polymorphisms related to other forms of mobilized DNA. Overall, our study provides significant steps toward (i) measuring the genetic variation that is caused by transposon insertions in humans and (ii) identifying the transposon copies that produce this variation.
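The figures quoted in this abstract can be cross-checked with simple arithmetic. The sketch below uses only numbers stated above; the variable and function names are hypothetical, and treating the four per-person estimates as parts of a single total is an assumption made for illustration rather than something the abstract states.

```python
# Per-person polymorphism estimates quoted in the abstract.
per_person = {"Alu": 1283, "L1": 180, "SVA": 56, "other": 17}

# Counts quoted in the abstract for the discovery set and the
# estimated pool of common transposon polymorphisms.
identified = 605
estimated_common = 2075


def class_shares(counts):
    """Return each class's fraction of the summed counts."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


fraction_identified = identified / estimated_common
shares = class_shares(per_person)

if __name__ == "__main__":
    print(f"identified fraction: {fraction_identified:.1%}")
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
```

The identified fraction comes out at about 29%, inside the stated 25-35% range, and Alu insertions account for roughly 84% of the summed per-person estimates.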