ABSTRACT
Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.
Subject(s)
Chromosome Inversion , Segmental Duplications, Genomic , Chromosome Inversion/genetics , DNA Copy Number Variations/genetics , Genome, Human , Genomics , Humans
ABSTRACT
Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data [1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions [3,4]. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences [5,6].
Subject(s)
Gene Conversion , Mutation , Segmental Duplications, Genomic , Humans , Gene Conversion/genetics , Genome, Human/genetics , Polymorphism, Single Nucleotide/genetics , Haplotypes/genetics , Exons/genetics , Cytosine/chemistry , Guanine/chemistry , CpG Islands/genetics
ABSTRACT
The secreted mucins MUC5AC and MUC5B are large glycoproteins that play critical defensive roles in pathogen entrapment and mucociliary clearance. Their respective genes contain polymorphic and degenerate protein-coding variable number tandem repeats (VNTRs) that make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5,761-5,762 amino acids [aa]); however, seven haplotypes have expanded VNTRs (6,291-7,019 aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5,249-6,325 aa) with cysteine-rich domain and VNTR copy-number variation. We group MUC5AC alleles into three phylogenetic clades: H1 (46%, ~5,654 aa), H2 (33%, ~5,742 aa), and H3 (7%, ~6,325 aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium and Tajima's D analyses reveal that East Asians carry exceptionally large blocks with an excess of rare variation (p < 0.05) at MUC5AC. To validate this result, we use Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observe a signature of positive selection in H1 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium (p < 0.05), consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein-coding VNTRs for improved disease associations.
Subject(s)
Alleles , Genetic Variation , Haplotypes , Minisatellite Repeats , Mucin 5AC , Mucin-5B , Phylogeny , Humans , Mucin-5B/genetics , Animals , Mucin 5AC/genetics , Mucin 5AC/metabolism , Minisatellite Repeats/genetics , DNA Copy Number Variations , Primates/genetics
ABSTRACT
The complete assembly of each human chromosome is essential for understanding human biology and evolution [1,2]. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the β-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.
Subject(s)
Chromosomes, Human, Pair 8/chemistry , Chromosomes, Human, Pair 8/genetics , Evolution, Molecular , Animals , Cell Line , Centromere/chemistry , Centromere/genetics , Centromere/metabolism , Chromosomes, Human, Pair 8/physiology , DNA Methylation , DNA, Satellite/genetics , Epigenesis, Genetic , Female , Humans , Macaca mulatta/genetics , Male , Minisatellite Repeats/genetics , Pan troglodytes/genetics , Phylogeny , Pongo abelii/genetics , Telomere/chemistry , Telomere/genetics , Telomere/metabolism
ABSTRACT
The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation [1,2]. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes [1,3-5] and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome.
Subject(s)
Evolution, Molecular , Genome/genetics , Genomics , Pan paniscus/genetics , Phylogeny , Animals , Eukaryotic Initiation Factor-4A/genetics , Female , Genes , Gorilla gorilla/genetics , Molecular Sequence Annotation/standards , Pan troglodytes/genetics , Pongo/genetics , Segmental Duplications, Genomic , Sequence Analysis, DNA
ABSTRACT
This research proposes a magnetic field sensor with spatial orientation ability. Through the assistance of a magnetic flux concentrator, out-of-plane magnetic flux can be concentrated and guided into the planar magnetic cores of a fluxgate sensor. A printed circuit board is used to construct the basic planar structure, on which the proposed three-dimensional magnetic flux concentrator and magnetic cores are assembled. This reduces the alignment error of the coils and improves the reliability of the sensor. Three-axis sensing is achieved by using the second harmonic signals from selected sensing coil pairs. The magnetometer exhibits a linear range to 130 µT. At an excitation frequency of 50 kHz, the measured sensitivities are 257.1, 468.8, and 258.8 V/T for the X-, Y-, and Z-axis sensing modes, respectively. This sensor utilizes only one sensing mechanism for the vector field, making it suitable for IoT applications, especially for assessing mechanical posture or position.
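As a rough illustration of how the reported per-axis sensitivities relate a measured signal to a field vector, the sketch below divides each axis voltage by its sensitivity and rejects values beyond the stated 130 µT linear range. The function names are hypothetical, and the model assumes a purely linear response with zero offset and no cross-axis coupling, which the abstract does not state.

```python
import math

# Sensitivities reported in the abstract at 50 kHz excitation (V/T).
SENSITIVITY_V_PER_T = {"x": 257.1, "y": 468.8, "z": 258.8}
# Reported linear range of the magnetometer (130 µT), in tesla.
LINEAR_RANGE_T = 130e-6


def field_from_voltages(voltages_v):
    """Convert per-axis second-harmonic amplitudes (V) to field components (T).

    Assumption (not from the abstract): linear response, zero offset,
    no cross-axis coupling.
    """
    field_t = {}
    for axis, volts in voltages_v.items():
        b = volts / SENSITIVITY_V_PER_T[axis]
        if abs(b) > LINEAR_RANGE_T:
            raise ValueError(f"{axis}-axis field {b:.3e} T is outside the linear range")
        field_t[axis] = b
    return field_t


def magnitude(field_t):
    """Euclidean norm of the field vector (T)."""
    return math.sqrt(sum(b * b for b in field_t.values()))


if __name__ == "__main__":
    # Hypothetical readings chosen so that each axis corresponds to 100 µT.
    reading = {"x": 0.02571, "y": 0.04688, "z": 0.02588}
    field = field_from_voltages(reading)
    print({axis: round(b * 1e6, 3) for axis, b in field.items()}, "µT")
    print(round(magnitude(field) * 1e6, 3), "µT total")
```

Because the Y-axis sensitivity is nearly twice that of X and Z, the same field produces a larger voltage on Y, so raw voltages cannot be compared across axes without this normalization.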
ABSTRACT
The indigenous inhabitants of Siberia live in some of the harshest environments on earth, experiencing extended periods of severe cold temperatures, dramatic variation in photoperiod, and limited and highly variable food resources. While the successful long-term settlement of this area by humans required multiple behavioral and cultural innovations, the nature of the underlying genetic changes has generally remained elusive. In this study, we used a three-part approach to identify putative targets of positive natural selection in Siberians. We first performed selection scans on whole exome and genome-wide single nucleotide polymorphism array data from multiple Siberian populations. We then annotated candidates in the tails of the empirical distributions, focusing on candidates with evidence linking them to biological processes and phenotypes previously identified as relevant to adaptation in circumpolar groups. The top candidates were then genotyped in additional populations to determine their spatial allele frequency distributions and associations with climate variables. Our analysis reveals missense mutations in three genes involved in lipid metabolism (PLA2G2A, PLIN1, and ANGPTL8) that exhibit genomic and spatial patterns consistent with selection for cold climate and/or diet. These variants are unified by their connection to brown adipose tissue and may help to explain previously observed physiological differences in Siberians such as low serum lipid levels and increased basal metabolic rate. These results support the hypothesis that indigenous Siberians have genetically adapted to their local environment by selection on multiple genes.
Subject(s)
Adaptation, Biological , Evolution, Molecular , Genome, Human , Selection, Genetic , Angiopoietin-Like Protein 8 , Angiopoietin-like Proteins/genetics , Climate , Diet , Gene Frequency , Group II Phospholipases A2/genetics , Haplotypes , Humans , Linkage Disequilibrium , Mutation, Missense , Peptide Hormones/genetics , Perilipin-1/genetics , Polymorphism, Single Nucleotide , Siberia
ABSTRACT
African Pygmies practicing a mobile hunter-gatherer lifestyle are phenotypically and genetically diverged from other anatomically modern humans, and they likely experienced strong selective pressures due to their unique lifestyle in the Central African rainforest. To identify genomic targets of adaptation, we sequenced the genomes of four Biaka Pygmies from the Central African Republic and jointly analyzed these data with the genome sequences of three Baka Pygmies from Cameroon and nine Yoruba farmers. To account for the complex demographic history of these populations that includes both isolation and gene flow, we fit models using the joint allele frequency spectrum and validated them using independent approaches. Our two best-fit models both suggest ancient divergence between the ancestors of the farmers and Pygmies, 90,000 or 150,000 yr ago. We also find that bidirectional asymmetric gene flow is statistically better supported than a single pulse of unidirectional gene flow from farmers to Pygmies, as previously suggested. We then applied complementary statistics to scan the genome for evidence of selective sweeps and polygenic selection. We found that conventional statistical outlier approaches were biased toward identifying candidates in regions of high mutation or low recombination rate. To avoid this bias, we assigned P-values for candidates using whole-genome simulations incorporating demography and variation in both recombination and mutation rates. We found that genes and gene sets involved in muscle development, bone synthesis, immunity, reproduction, cell signaling and development, and energy metabolism are likely to be targets of positive natural selection in Western African Pygmies or their recent ancestors.
Subject(s)
Black People/genetics , Genetics, Population , Genome , Genomics , Pan paniscus/genetics , Selection, Genetic , Adaptation, Biological , Animals , Computational Biology , Computer Simulation , Gene Flow , Genetic Variation , High-Throughput Nucleotide Sequencing , Humans , Linkage Disequilibrium , Models, Genetic , Models, Statistical , Reproducibility of Results
ABSTRACT
Comparisons of whole-genome sequences from ancient and contemporary samples have pointed to several instances of archaic admixture through interbreeding between the ancestors of modern non-Africans and now extinct hominids such as Neanderthals and Denisovans. One implication of these findings is that some adaptive features in contemporary humans may have entered the population via gene flow with archaic forms in Eurasia. Within Africa, fossil evidence suggests that anatomically modern humans (AMH) and various archaic forms coexisted for much of the last 200,000 yr; however, the absence of ancient DNA in Africa has limited our ability to make a direct comparison between archaic and modern human genomes. Here, we use statistical inference based on high coverage whole-genome data (greater than 60×) from contemporary African Pygmy hunter-gatherers as an alternative means to study the evolutionary history of the genus Homo. Using whole-genome simulations that consider demographic histories that include both isolation and gene flow with neighboring farming populations, our inference method rejects the hypothesis that the ancestors of AMH were genetically isolated in Africa, thus providing the first whole genome-level evidence of African archaic admixture. Our inferences also suggest a complex human evolutionary history in Africa, which involves at least a single admixture event from an unknown archaic population into the ancestors of AMH, likely within the last 30,000 yr.
Subject(s)
Black People/genetics , Evolution, Molecular , Genetics, Population , Genome, Human , Genome , Genomics , Pan paniscus/genetics , Animals , Gene Flow , Gene Frequency , Genetic Loci , Haplotypes , Humans , Linkage Disequilibrium , Polymorphism, Single Nucleotide
ABSTRACT
Siberia is one of the coldest environments on Earth and has great seasonal temperature variation. Long-term settlement in northern Siberia undoubtedly required biological adaptation to severe cold stress, dramatic variation in photoperiod, and limited food resources. In addition, recent archeological studies show that humans first occupied Siberia at least 45,000 years ago; yet our understanding of the demographic history of modern indigenous Siberians remains incomplete. In this study, we use whole-exome sequencing data from the Nganasans and Yakuts to infer the evolutionary history of these two indigenous Siberian populations. Recognizing the complexity of the adaptive process, we designed a model-based test to systematically search for signatures of polygenic selection. Our approach accounts for stochasticity in the demographic process and the hitchhiking effect of classic selective sweeps, as well as potential biases resulting from recombination rate and mutation rate heterogeneity. Our demographic inference shows that the Nganasans and Yakuts diverged ~12,000-13,000 years ago from East-Asian ancestors in a process involving continuous gene flow. Our polygenic selection scan identifies seven candidate gene sets with Siberian-specific signals. Three of these gene sets are related to diet, especially to fat metabolism, consistent with the hypothesis of adaptation to a fat-rich animal diet. Additional testing rejects the effect of hitchhiking and favors a model in which selection yields small allele frequency changes at multiple unlinked genes.
Subject(s)
Acclimatization/genetics , Adaptation, Biological/genetics , Alleles , Asian People/genetics , Biological Evolution , DNA, Mitochondrial/genetics , Demography/methods , Diet , Dietary Fats , Ethnicity/genetics , Exome/genetics , Gene Flow/genetics , Gene Frequency/genetics , Genetic Variation/genetics , Genetics, Population/methods , Humans , Multifactorial Inheritance/genetics , Phylogeny , Siberia , Exome Sequencing/methods
ABSTRACT
BACKGROUND: Radiation proctitis (RP) is a significant complication of pelvic radiation. Effective treatments for chronic RP are currently lacking. We report a case in which chronic RP was successfully managed with metformin and butyrate (M-B) enema and suppository therapy. CASE PRESENTATION: A 70-year-old Asian male diagnosed with prostate cancer involving both lobes underwent definitive radiotherapy to the prostate (76 Gy in 38 fractions) and six months of androgen deprivation therapy. Despite a stable PSA nadir of 0.2 ng/mL for 10 months post-radiotherapy, he developed intermittent rectal bleeding and was diagnosed with chronic RP. Symptoms persisted despite two months of treatment with oral mesalamine, mesalamine enema and hydrocortisone enema. Transition to a daily 2% M-B enema for one week led to significant improvement, followed by maintenance therapy with a daily 2.0% M-B suppository for three weeks, resulting in continued reduction of rectal bleeding. Endoscopic examination and biopsy demonstrated a good therapeutic effect. CONCLUSIONS: M-B enema and suppository may be an effective treatment for chronic RP.
Subject(s)
Enema , Metformin , Proctitis , Prostatic Neoplasms , Radiation Injuries , Humans , Male , Proctitis/drug therapy , Proctitis/etiology , Aged , Metformin/therapeutic use , Metformin/administration & dosage , Prostatic Neoplasms/radiotherapy , Prostatic Neoplasms/drug therapy , Radiation Injuries/drug therapy , Chronic Disease , Treatment Outcome , Butyrates/therapeutic use , Gastrointestinal Hemorrhage/drug therapy , Gastrointestinal Hemorrhage/therapy , Gastrointestinal Hemorrhage/etiology , Suppositories
ABSTRACT
The secreted mucins MUC5AC and MUC5B play critical defensive roles in airway pathogen entrapment and mucociliary clearance by encoding large glycoproteins with variable number tandem repeats (VNTRs). These polymorphic and degenerate protein coding VNTRs make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5761-5762aa); however, seven haplotypes have expanded VNTRs (6291-7019aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5249-6325aa) with cysteine-rich domain and VNTR copy number variation. We grouped MUC5AC alleles into three phylogenetic clades: H1 (46%, ~5654aa), H2 (33%, ~5742aa), and H3 (7%, ~6325aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium (LD) and Tajima's D analyses reveal that East Asians carry exceptionally large MUC5AC LD blocks with an excess of rare variation (p<0.05). To validate this result, we used Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observed signatures of positive selection in H1 and H2 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Africans and Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium, consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein coding VNTRs for improved disease associations.
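The deviation from Hardy-Weinberg equilibrium mentioned above can be illustrated with a textbook chi-square goodness-of-fit test on genotype counts at a biallelic locus (for example, H3 versus all other haplogroups). This is only a minimal sketch: the abstract does not say which test the authors used, the function name is hypothetical, and the genotype counts in the example are invented for illustration.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 degree of freedom) for Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are observed genotype counts at a biallelic locus.
    Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n <= 0:
        raise ValueError("need at least one genotyped individual")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("locus is monomorphic; the test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Invented counts with a heterozygote excess: (AA, AB, BB).
    chi2, p_value = hwe_chi_square(10, 80, 10)
    print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}")
```

A significant result alone does not indicate the direction of the deviation; an excess of observed heterozygotes over the expected 2npq is the pattern consistent with heterozygote advantage.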
ABSTRACT
The genetic mechanisms underlying the expansion in size and complexity of the human brain remain poorly understood. Long interspersed nuclear element-1 (L1) retrotransposons are a source of divergent genetic information in hominoid genomes, but their importance in physiological functions and their contribution to human brain evolution are largely unknown. Using multiomics profiling, we here demonstrate that L1 promoters are dynamically active in the developing and the adult human brain. L1s generate hundreds of developmentally regulated and cell type-specific transcripts, many of which are co-opted as chimeric transcripts or regulatory RNAs. One L1-derived long noncoding RNA, LINC01876, is a human-specific transcript expressed exclusively during brain development. CRISPR interference silencing of LINC01876 results in reduced size of cerebral organoids and premature differentiation of neural progenitors, implicating L1s in human-specific developmental processes. In summary, our results demonstrate that L1-derived transcripts provide a previously undescribed layer of primate- and human-specific transcriptome complexity that contributes to the functional diversification of the human brain.
Subject(s)
Retroelements , Transcriptome , Animals , Humans , Retroelements/genetics , Long Interspersed Nucleotide Elements/genetics , Neurons , Primates/genetics
ABSTRACT
The human forebrain has expanded in size and complexity compared to that of chimpanzees despite limited changes in protein-coding genes, suggesting that gene expression regulation is an important driver of brain evolution. Here, we identify a KRAB-ZFP transcription factor, ZNF558, that is expressed in human but not chimpanzee forebrain neural progenitor cells. ZNF558 evolved as a suppressor of LINE-1 transposons but has been co-opted to regulate a single target, the mitophagy gene SPATA18. ZNF558 plays a role in mitochondrial homeostasis, and loss-of-function experiments in cerebral organoids suggest that ZNF558 influences developmental timing during early human brain development. Expression of ZNF558 is controlled by the size of a variable number tandem repeat that is longer in chimpanzees than in humans and variable in the human population. Thus, this work provides mechanistic insight into how a cis-acting structural variation establishes a regulatory network that affects human brain evolution.
Subject(s)
Gene Regulatory Networks , Organoids , Brain/metabolism , DNA-Binding Proteins , Gene Expression Regulation , Humans , Organoids/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
ABSTRACT
TRP channel-associated factor 1/2 (TCAF1/TCAF2) proteins antagonistically regulate the cold-sensor protein TRPM8 in multiple human tissues. Understanding their significance has been complicated because the locus spans a gap-ridden region with complex segmental duplications in GRCh38. Using long-read sequencing, we sequence-resolve the locus, annotate full-length TCAF models in primate genomes, and show substantial human-specific TCAF copy number variation. We identify two human super haplogroups, H4 and H5, and establish that TCAF duplications originated ~1.7 million years ago but diversified only in Homo sapiens by recurrent structural mutations. Conversely, in all archaic-hominin samples, the fixation of a specific H4 haplotype without duplication is likely due to positive selection. Our results on TCAF copy number expansion, selection signals in hominins, differential TCAF2 expression between haplogroups, and high TCAF2 and TRPM8 expression in liver and prostate in modern-day humans imply that TCAF diversified among hominins, potentially in response to cold or dietary adaptations.
Subject(s)
Gene Duplication , Hominidae/genetics , Membrane Proteins/genetics , Selection, Genetic , Animals , DNA Copy Number Variations , Evolution, Molecular , Genome, Human , Haplotypes , Humans , Neanderthals , Phylogeny
ABSTRACT
Autism is a highly heritable complex disorder in which de novo mutation (DNM) variation contributes significantly to risk. Using whole-genome sequencing data from 3,474 families, we investigate another source of large-effect risk variation, ultra-rare variants. We report and replicate a transmission disequilibrium of private, likely gene-disruptive (LGD) variants in probands but find that 95% of this burden resides outside of known DNM-enriched genes. This variant class more strongly affects multiplex family probands and supports a multi-hit model for autism. Candidate genes with private LGD variants preferentially transmitted to probands converge on the E3 ubiquitin-protein ligase complex, intracellular transport and Erb signaling protein networks. We estimate that these variants are approximately 2.5 generations old and significantly younger than other variants of similar type and frequency in siblings. Overall, private LGD variants are under strong purifying selection and appear to act on a distinct set of genes not yet associated with autism.
Subject(s)
Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease , Proteins/genetics , Autistic Disorder/genetics , Evolution, Molecular , Gene Dosage , Haplotypes , Humans , Linkage Disequilibrium , Models, Genetic , Mutation , Pedigree , Polymorphism, Single Nucleotide , Protein Interaction Maps/genetics , Siblings , Whole Genome Sequencing
ABSTRACT
Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1,526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
Subject(s)
Genetic Variation , Genome, Human , Haplotypes , Female , Genotype , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Interspersed Repetitive Sequences , Male , Population Groups/genetics , Quantitative Trait Loci , Retroelements , Sequence Analysis, DNA , Sequence Inversion , Whole Genome Sequencing
ABSTRACT
Inversions play an important role in disease and evolution but are difficult to characterize because their breakpoints map to large repeats. We increased by sixfold the number (n = 1,069) of previously reported great ape inversions by using single-cell DNA template strand and long-read sequencing. We find that the X chromosome is most enriched (2.5-fold) for inversions, on the basis of its size and duplication content. There is an excess of differentially expressed primate genes near the breakpoints of large (>100 kilobases (kb)) inversions but not smaller events. We show that when great ape lineage-specific duplications emerge, they preferentially (approximately 75%) occur in an inverted orientation compared to that at their ancestral locus. We construct megabase-pair scale haplotypes for individual chromosomes and identify 23 genomic regions that have recurrently toggled between a direct and an inverted state over 15 million years. The direct orientation is most frequently the derived state for human polymorphisms that predispose to recurrent copy number variants associated with neurodevelopmental disease.
Subject(s)
Chromosome Inversion/genetics , Genome/genetics , Hominidae/genetics , Animals , Chromosomes/genetics , DNA Copy Number Variations/genetics , Evolution, Molecular , Female , Haplotypes/genetics , Humans , Male
ABSTRACT
BACKGROUND: The complex interspersed pattern of segmental duplications in humans is responsible for rearrangements associated with neurodevelopmental disease, including the emergence of novel genes important in human brain evolution. We investigate the evolution of LCR16a, a putative driver of this phenomenon that encodes one of the most rapidly evolving human-ape gene families, nuclear pore interacting protein (NPIP). RESULTS: Comparative analysis shows that LCR16a has independently expanded in five primate lineages over the last 35 million years of primate evolution. The expansions are associated with independent lineage-specific segmental duplications flanking LCR16a leading to the emergence of large interspersed duplication blocks at non-orthologous chromosomal locations in each primate lineage. The intron-exon structure of the NPIP gene family has changed dramatically throughout primate evolution with different branches showing characteristic gene models yet maintaining an open reading frame. In the African ape lineage, we detect signatures of positive selection that occurred after a transition to more ubiquitous expression among great ape tissues when compared to Old World and New World monkeys. Mouse transgenic experiments from baboon and human genomic loci confirm these expression differences and suggest that the broader ape expression pattern arose due to mutational changes that emerged in cis. CONCLUSIONS: LCR16a promotes serial interspersed duplications and creates hotspots of genomic instability that appear to be an ancient property of primate genomes. Dramatic changes to NPIP gene structure and altered tissue expression preceded major bouts of positive selection in the African ape lineage, suggestive of a gene undergoing strong adaptive evolution.