ABSTRACT
The combined analysis of haplotype panels with phenotyped clinical cohorts is a common approach to explore the genetic architecture of human diseases. However, genetic studies are mainly based on single nucleotide variants (SNVs) and small insertions and deletions (indels). Here, we help fill this gap by generating a dense haplotype map focused on the identification, characterization, and phasing of structural variants (SVs). By integrating multiple variant identification methods and Logistic Regression Models (LRMs), we present a catalogue of 35 431 441 variants, including 89 178 SVs (≥50 bp), 30 325 064 SNVs and 5 017 199 indels, across 785 Illumina high-coverage (30x) whole genomes from the Iberian GCAT Cohort, containing a median of 3.52M SNVs, 606 336 indels and 6393 SVs per individual. The haplotype panel is able to impute up to 14 360 728 SNVs/indels and 23 179 SVs, showing a 2.7-fold increase for SVs compared with available genetic variation panels. The value of this panel for SV analysis is shown through an imputed rare Alu element located in a new locus associated with mononeuritis of lower limb, a rare neuromuscular disease. This study represents the first deep characterization of genetic variation within the Iberian population and the first operational haplotype panel to systematically include SVs in genome-wide genetic studies.
Subjects
Human Genome, Haplotypes, INDEL Mutation, Acyltransferases, Europe, High-Throughput Nucleotide Sequencing, Humans, Lipase, Single Nucleotide Polymorphism, Whole Genome Sequencing/methods
ABSTRACT
Despite the interest in characterizing genomic variation, the presence of large repeats at the breakpoints hinders the analysis of many structural variants. This is especially problematic for inversions, since there is typically no gain or loss of DNA. Here, we tested novel linkage-based droplet digital PCR (ddPCR) assays to study 20 inversions ranging from 3.1 to 742 kb flanked by inverted repeats (IRs) up to 134 kb long. Of those, we validated 13 inversions predicted by different genome-wide techniques. In addition, we obtained new experimental human population information across 95 African, European, and East Asian individuals for 16 inversions, including four already validated variants without high-throughput genotyping methods. Through comparison with previous data, independent replicates and both inversion breakpoints, we demonstrate that the technique is highly accurate and reproducible. Most studied inversions are widespread across continents, and their frequency is negatively correlated with genetic length. Moreover, all except two show clear signs of being recurrent, and we could better define the factors affecting recurrence levels and estimate the inversion rate across the genome. Finally, the generated genotypes have allowed us to check inversion functional effects, validating gene expression differences reported before for two inversions and finding new candidate associations. Therefore, the developed methodology makes it possible to screen these and other complex genomic variants quickly in a large number of samples for the first time, highlighting the importance of direct genotyping to assess their potential consequences and clinical implications.
Subjects
Chromosome Inversion, Polymerase Chain Reaction/methods, Human Genome, Genotyping Techniques, Humans, Nucleotides/analysis
ABSTRACT
The rhesus macaque is an Old World monkey that shared a common ancestor with humans ~25 Myr ago and is an important animal model for human disease studies. A deep understanding of its genetics is therefore required for both biomedical and evolutionary studies. Among structural variants, inversions represent a driving force in speciation and play an important role in disease predisposition. Here we generated a genome-wide map of inversions between human and macaque, combining single-cell strand sequencing with cytogenetics. We identified 375 total inversions between 859 bp and 92 Mbp, increasing by eightfold the number of previously reported inversions. Among these, 19 inversions flanked by segmental duplications overlap with recurrent copy number variants associated with neurocognitive disorders. Evolutionary analyses show that in 17 out of 19 cases, the Hominidae orientation of these disease-associated regions is derived. This suggests that duplicated sequences likely played a fundamental role in generating inversions in humans and great apes, creating architectures that nowadays predispose these regions to disease-associated genetic instability. Finally, we identified 861 genes mapping at 156 inversion breakpoints, with some showing evidence of differential expression in human and macaque cell lines, thus highlighting candidates that might have contributed to the evolution of species-specific features. This study depicts the most accurate fine-scale map of inversions between human and macaque using a two-pronged integrative approach that combines single-cell strand sequencing and cytogenetics, and represents a valuable resource toward understanding the biology and evolution of primate species.
Subjects
Chromosome Breakpoints, Chromosome Inversion, Molecular Evolution, Macaca mulatta/genetics, Animals, Disease/genetics, Gene Expression Regulation, Genome, Genomics, Heterozygote, Humans, Fluorescence In Situ Hybridization, Genetic Recombination, DNA Sequence Analysis, Single-Cell Analysis
ABSTRACT
The growing catalogue of structural variants in humans often overlooks inversions as one of the most difficult types of variation to study, even though they affect phenotypic traits in diverse organisms. Here, we have analysed in detail 90 inversions predicted from the comparison of two independently assembled human genomes: the reference genome (NCBI36/HG18) and HuRef. Surprisingly, we found that two thirds of these predictions (62) represent errors either in assembly comparison or in one of the assemblies, including 27 misassembled regions in HG18. Next, we validated 22 of the remaining 28 potential polymorphic inversions using different PCR techniques and characterized their breakpoints and ancestral state. In addition, we determined experimentally the derived allele frequency in Europeans for 17 inversions (DAF = 0.01-0.80), as well as the distribution in 14 worldwide populations for 12 of them based on the 1000 Genomes Project data. Among the validated inversions, nine have inverted repeats (IRs) at their breakpoints, and two show nucleotide variation patterns consistent with a recurrent origin. Conversely, inversions without IRs have a unique origin and almost all of them show deletions or insertions at the breakpoints in the derived allele mediated by microhomology sequences, which highlights the importance of mechanisms like FoSTeS/MMBIR in the generation of complex rearrangements in the human genome. Finally, we found several inversions located within genes and at least one candidate to be positively selected in Africa. Thus, our study emphasizes the importance of careful analysis and validation of large-scale genomic predictions to extract reliable biological conclusions.
Subjects
Chromosome Inversion/genetics, Human Genome/genetics, Molecular Sequence Annotation, Sequence Inversion/genetics, Molecular Evolution, Humans, Genetic Polymorphism, Genetic Selection/genetics, DNA Sequence Analysis
ABSTRACT
Despite many years of study into inversions, very little is known about their functional consequences, especially in humans. A common hypothesis is that the selective value of inversions stems in part from their effects on nearby genes, although evidence of this in natural populations is almost nonexistent. Here we present a global analysis of a new 415-kb polymorphic inversion that is among the longest ones found in humans and is the first with clear position effects. This inversion is located in chromosome 19 and has been generated by non-homologous end joining between blocks of transposable elements with low identity. PCR genotyping in 541 individuals from eight different human populations allowed the detection of tag SNPs and inversion genotyping in multiple populations worldwide, showing that the inverted allele is mainly found in East Asia with an average frequency of 4.7%. Interestingly, one of the breakpoints disrupts the transcription factor gene ZNF257, causing a significant reduction in the total expression level of this gene in lymphoblastoid cell lines. RNA-Seq analysis of the effects of this expression change in standard homozygotes and inversion heterozygotes revealed distinct expression patterns that were validated by quantitative RT-PCR. Moreover, we have found a new fusion transcript that is generated exclusively from inverted chromosomes around one of the breakpoints. Finally, by the analysis of the associated nucleotide variation, we have estimated that the inversion was generated ~40,000-50,000 years ago and, while a neutral evolution cannot be ruled out, its current frequencies are more consistent with those expected for a deleterious variant, although no significant association with phenotypic traits has been found so far.
Subjects
Chromosome Inversion/genetics, Human Chromosome Pair 19/genetics, Molecular Evolution, Transcription Factors/genetics, Chromosome Breakpoints, DNA End-Joining Repair/genetics, DNA Transposable Elements/genetics, Gene Expression Regulation, Population Genetics, Genotype, Humans, Single Nucleotide Polymorphism, Transcription Factors/biosynthesis
ABSTRACT
The prevalence of asthma and obesity is increasing worldwide, and obesity is a well-documented risk factor for asthma. The mechanisms underlying this association and parallel time trends remain largely unknown but genetic factors may be involved. Here, we report on a common ~0.45 Mb genomic inversion at 16p11.2 that can be accurately genotyped via SNP array data. We show that the inversion allele protects against the joint occurrence of asthma and obesity in five large independent studies (combined sample size of 317 cases and 543 controls drawn from a total of 5,809 samples; combined OR = 0.48, p = 5.5 × 10⁻⁶). Allele frequencies show remarkable worldwide population stratification, ranging from 10% in East Africa to 49% in Northern Europe, consistent with discordant and extreme genetic drifts or adaptive selections after human migration out of Africa. Inversion alleles strongly correlate with expression levels of neighboring genes, especially TUFM (p = 3.0 × 10⁻⁴⁰) that encodes a mitochondrial protein regulator of energy balance and inhibitor of type 1 interferon, and other candidates for asthma (IL27) and obesity (APOB48R and SH2B1). Therefore, by affecting gene expression, the ~0.45 Mb 16p11.2 inversion provides a genetic basis for the joint susceptibility to asthma and obesity, with a population attributable risk of 39.7%. Differential mitochondrial function and basal energy balance of inversion alleles might also underlie the potential selection signature that led to their uneven distribution in world populations.
Subjects
Asthma/genetics, Genetic Predisposition to Disease/genetics, Obesity/genetics, Adult, Algorithms, Alleles, Chromosome Inversion, Chromosome Mapping, Human Chromosome Pair 16/genetics, Cohort Studies, Female, Gene Expression Regulation, Gene Frequency, Population Genetics, Human Genome, Genotype, Haplotypes, Humans, Male, Odds Ratio, Phenotype, Single Nucleotide Polymorphism
ABSTRACT
In recent years different types of structural variants (SVs) have been discovered in the human genome and their functional impact has become increasingly clear. Inversions, however, are poorly characterized and more difficult to study, especially those mediated by inverted repeats or segmental duplications. Here, we describe the results of a simple and fast inverse PCR (iPCR) protocol for high-throughput genotyping of a wide variety of inversions using a small amount of DNA. In particular, we analyzed 22 inversions predicted in humans ranging from 5.1 kb to 226 kb and mediated by inverted repeat sequences of 1.6-24 kb. First, we validated 17 of the 22 inversions in a panel of nine HapMap individuals from different populations, and we genotyped them in 68 additional individuals of European origin, with correct genetic transmission in ~12 mother-father-child trios. Global inversion minor allele frequency varied between 1% and 49% and inversion genotypes were consistent with Hardy-Weinberg equilibrium. By analyzing the nucleotide variation and the haplotypes in these regions, we found that only four inversions have linked tag-SNPs and that in many cases there are multiple shared SNPs between standard and inverted chromosomes, suggesting an unexpectedly high degree of inversion recurrence during human evolution. iPCR was also used to check 16 of these inversions in four chimpanzees and two gorillas, and 10 showed both orientations either within or between species, providing additional support for their multiple origins. Finally, we have identified several inversions that include genes in the inverted or breakpoint regions, and at least one disrupts a potential coding gene. Thus, these results represent a significant advance in our understanding of inversion polymorphism in human populations and challenge the common view of a single origin of inversions, with important implications for inversion analysis in SNP-based studies.
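The abstract reports that inversion genotypes were consistent with Hardy-Weinberg equilibrium but does not say which test was used. As an illustration only, the sketch below applies a standard chi-square goodness-of-fit test with one degree of freedom to genotype counts for a biallelic inversion. The function name, argument names and counts are hypothetical and are not taken from the study.

```python
from math import erfc, sqrt


def hwe_chi_square(n_std_std, n_std_inv, n_inv_inv):
    """Chi-square test (1 d.f.) for Hardy-Weinberg equilibrium.

    Arguments are counts of standard homozygotes, heterozygotes and
    inverted homozygotes (hypothetical names, chosen for this sketch).
    Returns (inverted allele frequency, chi-square statistic, p-value).
    """
    n = n_std_std + n_std_inv + n_inv_inv
    if n == 0:
        raise ValueError("no genotyped individuals")

    # Allele frequency of the inverted orientation from genotype counts.
    q = (2 * n_inv_inv + n_std_inv) / (2 * n)
    p = 1.0 - q

    # Expected genotype counts under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_std_std, n_std_inv, n_inv_inv)

    # Classes with zero expectation (monomorphic sample) are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of a chi-square variable with 1 d.f.
    p_value = erfc(sqrt(chi2 / 2.0))
    return q, chi2, p_value
```

For small samples or rare alleles, an exact test would usually be preferred over the chi-square approximation used in this sketch.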
Subjects
Chromosome Inversion/genetics, Molecular Evolution, Inverted Repeat Sequences/genetics, Genomic Segmental Duplications/genetics, Animals, Chromosome Mapping, Human Genome, HapMap Project, Humans, Pan troglodytes/genetics, Genetic Polymorphism
ABSTRACT
The newest genomic advances have uncovered an unprecedented degree of structural variation throughout genomes, with great amounts of data accumulating rapidly. Here we introduce InvFEST (http://invfestdb.uab.cat), a database combining multiple sources of information to generate a complete catalogue of non-redundant human polymorphic inversions. Due to the complexity of this type of change and the underlying high false-positive discovery rate, it is necessary to integrate all the available data to get a reliable estimate of the real number of inversions. InvFEST automatically merges predictions into different inversions, refines the breakpoint locations, and finds associations with genes and segmental duplications. In addition, it includes data on experimental validation, population frequency, functional effects and evolutionary history. All this information is readily accessible through a complete and user-friendly web report for each inversion. In its current version, InvFEST combines information from 34 different studies and contains 1092 candidate inversions, which are categorized based on internal scores and manual curation. Therefore, InvFEST aims to represent the most reliable set of human inversions and become a central repository to share information, guide future studies and contribute to the analysis of the functional and evolutionary impact of inversions on the human genome.
Subjects
Nucleic Acid Databases, Human Genome, Sequence Inversion, Chromosome Breakpoints, Chromosome Inversion, Humans, Internet, Genetic Polymorphism, Genomic Segmental Duplications, Systems Integration
ABSTRACT
BACKGROUND: Population genetics and association studies usually rely on a set of known variable sites that are then genotyped in subsequent samples, because it is easier to genotype than to discover the variation. This is also true for structural variation detected from sequence data. However, the genotypes at known variable sites can only be inferred with uncertainty from low coverage data. Thus, statistical approaches that infer genotype likelihoods, test hypotheses, and estimate population parameters without requiring accurate genotypes are becoming popular. Unfortunately, the current implementations of these methods are intended to analyse only single nucleotide and short indel variation, and they usually assume that the two alleles in a heterozygous individual are sampled with equal probability. This is generally false for structural variants detected with paired ends or split reads. Therefore, the population genetics of structural variants cannot be studied, unless a painstaking and potentially biased genotyping is performed first. RESULTS: We present svgem, an expectation-maximization implementation to estimate allele and genotype frequencies, calculate genotype posterior probabilities, and test for Hardy-Weinberg equilibrium and for population differences, from the numbers of times the alleles are observed in each individual. Although applicable to single nucleotide variation, it aims at bi-allelic structural variation of any type, observed by either split reads or paired ends, with arbitrarily high allele sampling bias. We test svgem with simulated and real data from the 1000 Genomes Project. CONCLUSIONS: svgem makes it possible to use low-coverage sequencing data to study the population distribution of structural variants without having to know their genotypes. 
Furthermore, this advance allows the combined analysis of structural and nucleotide variation within the same genotype-free statistical framework, thus preventing biases introduced by genotype imputation.
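The abstract describes an expectation-maximization estimate of genotype and allele frequencies from per-individual allele observation counts, allowing for unequal sampling of the two alleles in heterozygotes. The sketch below illustrates that general idea and is not the svgem implementation. The binomial read model, the parameters `het_alt_prob` (probability that an observation from a heterozygote supports the alternative allele) and `error`, and all function names are assumptions made for this example.

```python
from math import comb


def genotype_likelihoods(n_ref, n_alt, het_alt_prob=0.5, error=0.01):
    """Binomial likelihood of the counts under genotypes 0/0, 0/1 and 1/1.

    het_alt_prob models allele sampling bias in heterozygotes (assumed
    known here); error is an assumed per-observation miscall rate.
    """
    n = n_ref + n_alt
    alt_probs = (error, het_alt_prob, 1.0 - error)
    return [comb(n, n_alt) * p ** n_alt * (1.0 - p) ** n_ref for p in alt_probs]


def em_genotype_frequencies(counts, het_alt_prob=0.5, error=0.01,
                            n_iter=200, tol=1e-10):
    """Estimate genotype frequencies by EM from (n_ref, n_alt) pairs.

    Returns (genotype frequencies, alternative allele frequency,
    per-individual genotype posteriors from the final E-step).
    """
    if not counts:
        raise ValueError("no individuals")
    likes = [genotype_likelihoods(r, a, het_alt_prob, error) for r, a in counts]
    freqs = [1.0 / 3.0] * 3  # uninformative starting point
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior genotype probabilities given current frequencies.
        posteriors = []
        for lk in likes:
            weighted = [f * l for f, l in zip(freqs, lk)]
            total = sum(weighted)
            posteriors.append([w / total for w in weighted])
        # M-step: new frequencies are the mean posterior per genotype.
        new_freqs = [sum(p[g] for p in posteriors) / len(posteriors)
                     for g in range(3)]
        converged = max(abs(n - o) for n, o in zip(new_freqs, freqs)) < tol
        freqs = new_freqs
        if converged:
            break
    alt_allele_freq = freqs[1] / 2.0 + freqs[2]
    return freqs, alt_allele_freq, posteriors
```

Setting `het_alt_prob` below 0.5 mimics a variant whose alternative allele is under-sampled in heterozygotes, which is the situation the abstract describes for structural variants detected with paired ends or split reads.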
Subjects
Algorithms, Genomics/methods, High-Throughput Nucleotide Sequencing/methods, Alleles, Population Genetics, Genome, Genotype, Humans, Likelihood Functions, Genetic Polymorphism
ABSTRACT
Copy-number variants (CNVs) are genome-wide structural variations involving the duplication or deletion of large nucleotide sequences. While these types of variations can be commonly found in humans, large and rare CNVs are known to contribute to the development of various neurodevelopmental disorders (NDDs), including autism spectrum disorder (ASD). Nevertheless, given that these NDD-risk CNVs cover broad regions of the genome, it is particularly challenging to pinpoint the critical gene(s) responsible for the manifestation of the phenotype. In this study, we performed a meta-analysis of CNV data from 11,614 affected individuals with NDDs and 4,031 control individuals from the SFARI database to identify 41 NDD-risk CNV loci, including 24 novel regions. We also found evidence for dosage-sensitive genes within these regions being significantly enriched for known NDD-risk genes and pathways. In addition, a significant proportion of these genes was found to (1) converge in protein-protein interaction networks, (2) be among the most highly expressed genes in the brain across all developmental stages, and (3) be hit by deletions that are significantly over-transmitted to individuals with ASD within multiplex ASD families from the iHART cohort. Finally, we conducted a burden analysis using 4,281 NDD cases from the Decipher and iHART cohorts, and 2,504 neurotypical control individuals from 1000 Genomes and iHART, which resulted in the validation of the association of 162 dosage-sensitive genes driving risk for NDDs, including 22 novel NDD-risk genes. Importantly, most NDD-risk CNV loci entail multiple NDD-risk genes, in agreement with a polygenic model associated with the majority of NDD cases.
Subjects
Autism Spectrum Disorder, DNA Copy Number Variations, Genetic Predisposition to Disease, Neurodevelopmental Disorders, Humans, DNA Copy Number Variations/genetics, Neurodevelopmental Disorders/genetics, Autism Spectrum Disorder/genetics, Genome-Wide Association Study, Protein Interaction Maps/genetics
ABSTRACT
The THBS4 gene encodes a glycoprotein involved in inflammatory responses and synaptogenesis. THBS4 is expressed at higher levels in the brain of humans compared with nonhuman primates, and the protein accumulates in β-amyloid plaques. We analyzed THBS4 genetic variability in humans and show that two haplotypes (hap1 and hap2) are maintained by balancing selection and modulate THBS4 expression in lymphocytes. Indeed, the balancing selection region covers a predicted transcriptional enhancer. In humans, but not in macaques and chimpanzees, THBS4 brain expression increases with age, and variants in the balancing selection region interact with sex in influencing THBS4 expression (p_interaction = 0.038), with hap1 homozygous females showing lowest expression. In Alzheimer disease (AD) patients, significant interactions between sex and THBS4 genotype were detected for peripheral gray matter (p_interaction = 0.014) and total gray matter (p_interaction = 0.012) volumes. Similarly to the gene expression results, the interaction is mainly mediated by hap1 homozygous AD females, who show reduced volumes. Thus, the balancing selection target in THBS4 is likely represented by one or more variants that regulate tissue-specific and sex-specific gene expression. The selection signature associated with THBS4 might not be related to AD pathogenesis, but rather to inflammatory responses.
Subjects
Alzheimer Disease/genetics, Brain/metabolism, Sex Factors, Thrombospondins/genetics, Alzheimer Disease/pathology, Animals, Base Sequence, Brain/pathology, Female, Population Genetics, Haplotypes, Humans, Male, Nucleic Acid Sequence Homology
ABSTRACT
BACKGROUND: There is increasing evidence of the importance of copy number variants (CNVs) in genetic diversity among individuals and populations, as well as in some common genetic diseases. We previously characterized a common 32-kb insertion/deletion variant of the PSORS4 locus at chromosome 1q21 that harbours the LCE3C and LCE3B genes. This variant allele (LCE3C_LCE3B-del) is common in patients with psoriasis and other autoimmune disorders from certain ethnic groups. RESULTS: Using array-CGH (Agilent 244 K) in samples from the HapMap and Human Genome Diversity Panel (HGDP) collections, we identified 54 regions showing population differences in comparison to Africans. We provide here a comprehensive population-genetic analysis of one of these regions, which involves the 32-kb deletion of the PSORS4 locus. Using a PCR-based genotyping assay, we characterised the profiles of LCE3C_LCE3B-del and the linkage disequilibrium (LD) pattern between the variant allele and the tag SNP rs4112788. Our results show that most populations tend to have a higher frequency of the deleted allele than Sub-Saharan Africans. Furthermore, we found strong LD between rs4112788G and LCE3C_LCE3B-del in most non-African populations (r² > 0.8), in contrast to the low concordance between loci (r² < 0.3) in the African populations. CONCLUSIONS: These results are another example of population variability in biomedically interesting CNVs. The frequency distribution of the LCE3C_LCE3B-del allele and the LD pattern across populations suggest that the differences between ethnic groups might not be due to natural selection, but the consequence of genetic drift caused by the strong bottleneck that occurred during the "out of Africa" expansion.
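The r² values quoted above summarise linkage disequilibrium between the tag SNP allele and the deletion. The abstract does not describe how r² was computed, so the sketch below only shows the textbook definition for two biallelic loci from phased haplotype counts. The function name, argument names and any counts passed to it are hypothetical; with unphased genotypes, haplotype frequencies would first have to be estimated.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Squared correlation (r^2) between two biallelic loci.

    Arguments are counts of the four phased haplotypes, where A/a could
    stand for the two SNP alleles and B/b for deletion/non-deletion
    (labels are illustrative only).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes")

    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n  # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / n  # frequency of allele B at the second locus

    # D measures the departure of the AB haplotype frequency from independence.
    d = p_AB - p_A * p_B
    denom = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom
```

A value near 1 means one locus is an almost perfect proxy for the other, which is the property a tag SNP needs; values such as those reported for the African populations indicate that the SNP would be a poor surrogate for the deletion there.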
Subjects
DNA Copy Number Variations, Population Genetics, Psoriasis/genetics, Sequence Deletion, Alleles, Autoimmune Diseases/genetics, Human Chromosome Pair 1, Comparative Genomic Hybridization, Gene Frequency, Genotyping Techniques, HapMap Project, Human Genome Project, Humans, INDEL Mutation, Linkage Disequilibrium, Single Nucleotide Polymorphism, Racial Groups/genetics
ABSTRACT
BACKGROUND: The only known albino gorilla, named Snowflake, was a male wild-born individual from Equatorial Guinea who lived at the Barcelona Zoo for almost 40 years. He was diagnosed with non-syndromic oculocutaneous albinism, i.e. white hair, light eyes, pink skin, photophobia and reduced visual acuity. Despite previous efforts to explain it, the genetic cause has remained unknown. Here, we study the genetic cause of his albinism and, making use of whole genome sequencing data, we find a higher inbreeding coefficient compared to other gorillas. RESULTS: We successfully identified the causal genetic variant for Snowflake's albinism, a non-synonymous single nucleotide variant located in a transmembrane region of SLC45A2. This transporter is known to be involved in oculocutaneous albinism type 4 (OCA4) in humans. We provide experimental evidence that this amino acid replacement alters the membrane-spanning capability of this transmembrane region. Finally, we provide a comprehensive study of genome-wide patterns of autozygosity revealing that Snowflake's parents were related, making this the first report of inbreeding in a wild-born Western lowland gorilla. CONCLUSIONS: In this study we demonstrate how the use of whole genome sequencing can be extended to link genotype and phenotype in non-model organisms and how, with the expected decrease in sequencing cost, it can be a powerful tool in conservation genetics (e.g., inbreeding and genetic diversity).
Subjects
Genomics, Gorilla gorilla/genetics, High-Throughput Nucleotide Sequencing, Inbreeding, Amino Acid Sequence, Animals, Female, Heterozygote, Male, Membrane Transport Proteins/chemistry, Membrane Transport Proteins/genetics, Microsatellite Repeats/genetics, Molecular Sequence Data, Mutation, DNA Sequence Analysis
ABSTRACT
Descriptions of genes that are adaptively evolving in humans and that carry polymorphisms with an effect on cognitive performance have been virtually absent. SNAP25 encodes a presynaptic protein with a role in regulation of neurotransmitter release. We analysed the intra-specific diversity along SNAP25 and identified a region in intron 1 that shows signatures of balancing selection in humans. The estimated TMRCA (time to the most recent common ancestor) of the SNAP25 haplotype phylogeny amounted to 2.08 million years. The balancing selection signature is not secondary to demographic events or to biased gene conversion, and encompasses rs363039. This SNP has previously been associated with cognitive performance, with contrasting results in different populations. We analysed this variant in two Italian cohorts in different age ranges and observed a significant genotype effect for rs363039 on verbal performance in females alone. Post hoc analysis revealed that the effect is driven by differences between heterozygotes and both homozygous genotypes. Thus, females heterozygous for rs363039 display higher verbal performance compared to both homozygotes. This finding was replicated in a cohort of Italian subjects suffering from neuromuscular diseases that do not affect cognition. Heterozygote advantage is one of the possible reasons underlying the maintenance of genetic diversity in natural populations. The observation that heterozygotes for rs363039 display higher verbal abilities compared to homozygotes fits the underlying balancing selection model. Although caution should be used in inferring selective pressures from observed signatures, SNAP25 might represent the first description of an adaptively evolving gene with a role in cognition.
Subjects
Intelligence/genetics, Genetic Selection, Synaptosomal-Associated Protein 25/genetics, Women/psychology, Child, Preschool Child, Cognition, Cohort Studies, Molecular Evolution, Female, Population Genetics, Genotype, Humans, Italy, Male, Single Nucleotide Polymorphism, Vocabulary
ABSTRACT
BACKGROUND: Polymorphic inversions are a source of genetic variability with a direct impact on recombination frequencies. Given the difficulty of their experimental study, computational methods have been developed to infer their existence in a large number of individuals using genome-wide data of nucleotide variation. Methods based on haplotype tagging of known inversions attempt to classify individuals as having a normal or inverted allele. Other methods that measure differences in linkage disequilibrium attempt to identify regions with inversions but are unable to classify subjects accurately, an essential requirement for association studies. RESULTS: We present a novel method to both identify polymorphic inversions from genome-wide genotype data and classify individuals as containing a normal or inverted allele. Our method, a generalization of a published method for haplotype data [1], utilizes linkage between groups of SNPs to partition a set of individuals into normal and inverted subpopulations. We employ a sliding window scan to identify regions likely to have an inversion, and accumulation of evidence from neighboring SNPs is used to accurately determine the inversion status of each subject. Further, our approach detects inversions directly from genotype data, thus increasing its usability in current genome-wide association studies (GWAS). CONCLUSIONS: We demonstrate the accuracy of our method to detect inversions and classify individuals on principled-simulated genotypes, produced by the evolution of an inversion event within a coalescent model [2]. We applied our method to real genotype data from HapMap Phase III to characterize the inversion status of two known inversions within the regions 17q21 and 8p23 across 1184 individuals. Finally, we scan the full genomes of the European Origin (CEU) and Yoruba (YRI) HapMap samples.
We find population-based evidence for 9 out of 15 well-established autosomal inversions, and for 52 regions previously predicted by independent experimental methods in ten (9+1) individuals [3,4]. We provide efficient implementations of both genotype and haplotype methods as a unified R package, inveRsion.
Subjects
Chromosome Inversion, Human Chromosome Pair 17, Population Genetics, Genome-Wide Association Study, HapMap Project, Humans, Linkage Disequilibrium, Genetic Models, Oligonucleotide Array Sequence Analysis, Single Nucleotide Polymorphism
ABSTRACT
Supergenes are involved in adaptation in multiple organisms, but they are little known in humans. Genomic inversions are the most common mechanism of supergene generation and maintenance. Here, we review the information about two large inversions that are the best examples of potential human supergenes. In addition, we perform an integrative analysis of the newest data to better understand their functional effects and underlying genetic changes. We have found that the highly divergent haplotypes of the 17q21.31 inversion of approximately 1.5 Mb have multiple phenotypic associations, with consistent effects in brain-related traits, red and white blood cells, lung function, male and female characteristics and disease risk. By combining gene expression and nucleotide variation data, we also analysed the molecular differences between haplotypes, including gene duplications, amino acid substitutions and regulatory changes, and identified CRHR1, KANSL1 and MAPT as good candidates to be responsible for these phenotypes. The situation is more complex for the 8p23.1 inversion, where there is no clear genetic differentiation. However, the inversion is associated with several related phenotypes and gene expression differences that could be linked to haplotypes specific to one orientation. Our work, therefore, contributes to the characterization of both exceptional variants and illustrates the important role of inversions. This article is part of the theme issue 'Genomic architecture of supergenes: causes and evolutionary consequences'.
Subjects
Chromosome Inversion, Genetic Polymorphism, Female, Genomics, Haplotypes, Humans, Male, Phenotype
ABSTRACT
The role of miRNAs in regulating megakaryocyte differentiation was examined using bipotent K562 human leukemia cells. miR-34a is strongly up-regulated during phorbol ester-induced megakaryocyte differentiation, but not during hemin-induced erythrocyte differentiation. Enforced expression of miR-34a in K562 cells inhibits cell proliferation, induces cell-cycle arrest in G1 phase, and promotes megakaryocyte differentiation as measured by CD41 induction. miR-34a expression is also up-regulated during thrombopoietin-induced differentiation of CD34+ hematopoietic precursors, and its enforced expression in these cells significantly increases the number of megakaryocyte colonies. miR-34a directly regulates expression of MYB, facilitating megakaryocyte differentiation, and of CDK4 and CDK6, to inhibit the G1/S transition. However, these miR-34a target genes are down-regulated rapidly after inducing megakaryocyte differentiation before miR-34a is induced. This suggests that miR-34a is not responsible for the initial down-regulation but may contribute to maintaining their suppression later on. Previous studies have implicated miR-34a as a tumor suppressor gene whose transcription is activated by p53. However, in p53-null K562 cells, phorbol esters induce miR-34a expression independently of p53 by activating an alternative phorbol ester-responsive promoter to produce a longer pri-miR-34a transcript.
Subjects
Cell Differentiation/physiology , G1 Phase/physiology , Megakaryocytes/metabolism , MicroRNAs/biosynthesis , Tumor Suppressor Protein p53/metabolism , Up-Regulation/physiology , Antigens, CD34 , Carcinogens/pharmacology , Cell Differentiation/drug effects , G1 Phase/drug effects , Hematopoietic Stem Cells/cytology , Hematopoietic Stem Cells/metabolism , Humans , K562 Cells , Megakaryocytes/cytology , MicroRNAs/genetics , Phorbol Esters/pharmacology , Platelet Membrane Glycoprotein IIb/biosynthesis , Promoter Regions, Genetic/physiology , Proto-Oncogene Proteins c-myb/metabolism , Thrombopoietin/pharmacology , Tumor Suppressor Protein p53/genetics , Up-Regulation/drug effects
Abstract
Chromosomal inversions have an important role in evolution, and an increasing number of inversion polymorphisms are being identified in the human population. The evolutionary history of these inversions and the mechanisms by which they arise are therefore of significant interest. Previously, a polymorphic inversion on human chromosome Xq28 that includes the FLNA and EMD loci was discovered and hypothesized to have been the result of nonallelic homologous recombination (NAHR) between near-identical inverted duplications flanking this region. Here, we carried out an in-depth study of the orthologous region in 27 additional eutherians and report that this inversion is not specific to humans, but has occurred independently and repeatedly at least 10 times in multiple eutherian lineages. Moreover, inverted duplications flank the FLNA-EMD region in all 16 species for which high-quality sequence assemblies are available. Based on detailed sequence analyses, we propose a model in which the observed inverted duplications originated from a common duplication event that predates the eutherian radiation. Subsequent gene conversion homogenized the duplications, thereby providing a continuous substrate for NAHR that led to the recurrent inversion of this segment of the genome. These results provide an extreme example in support of the evolutionary breakpoint reusage hypothesis and point out that some near-identical human segmental duplications may, in fact, have originated >100 million years ago.
Subjects
Chromosome Inversion/genetics , Evolution, Molecular , X Chromosome/genetics , Animals , Gene Duplication , Genome/genetics , Humans , Models, Genetic , Molecular Sequence Data
Abstract
Variation in social behavior and plumage in the white-throated sparrow (Zonotrichia albicollis) is linked to an inversion polymorphism on chromosome 2. Here we report the results of our comparative cytogenetic mapping efforts and population genetics studies focused on the genomic characterization of this balanced chromosomal polymorphism. Comparative chromosome painting and cytogenetic mapping of 15 zebra finch BAC clones to the standard (ZAL2) and alternative (ZAL2(m)) arrangements revealed that this chromosome is orthologous to chicken chromosome 3, and that at a minimum, ZAL2 and ZAL2(m) differ by a pair of included pericentric inversions that we estimate span at least 98 Mb. Population-based sequencing and genotyping of multiple loci demonstrated that ZAL2(m) suppresses recombination in the heterokaryotype and is evolving as a rare nonrecombining autosomal segment of the genome. In addition, we estimate that the first inversion within the ZAL2(m) arrangement originated 2.2 ± 0.3 million years ago. Finally, while previously recognized as a genetic model for the evolution of social behavior, we found that the ZAL2/ZAL2(m) polymorphism also shares genetic and phenotypic features with the mouse t complex and we further suggest that the ZAL2/ZAL2(m) polymorphism is a heretofore unrecognized model for the early stages of sex chromosome evolution.
Subjects
Chromosomes/genetics , Gene Rearrangement , Polymorphism, Genetic , Recombination, Genetic/genetics , Social Behavior , Sparrows/genetics , Suppression, Genetic , Animals , Chromosome Mapping , Chromosome Painting , Chromosomes, Artificial, Bacterial , Clone Cells , Evolution, Molecular , Gene Flow , Models, Genetic , Phylogeny , Y Chromosome/genetics
Abstract
Inversions are one type of structural variant linked to phenotypic differences and adaptation in multiple organisms. However, there is still very little information about polymorphic inversions in the human genome due to the difficulty of their detection. Here, we develop a new high-throughput genotyping method based on probe hybridization and amplification, and we perform a complete study of 45 common human inversions of 0.1-415 kb. Most inversions promoted by homologous recombination occur recurrently in humans and great apes and they are not tagged by SNPs. Furthermore, there is an enrichment of inversions showing signatures of positive or balancing selection, diverse functional effects, such as gene disruption and gene-expression changes, or association with phenotypic traits. Therefore, our results indicate that the genome is more dynamic than previously thought and that human inversions have important functional and evolutionary consequences, making it possible to determine for the first time their contribution to complex traits.