Results 1 - 20 of 47
1.
Nature ; 630(8016): 401-411, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38811727

ABSTRACT

Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility¹. The X chromosome is vital for reproduction and cognition². Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.


Subject(s)
Hominidae , X Chromosome , Y Chromosome , Animals , Female , Male , Gorilla gorilla/genetics , Hominidae/genetics , Hominidae/classification , Hylobatidae/genetics , Pan paniscus/genetics , Pan troglodytes/genetics , Phylogeny , Pongo abelii/genetics , Pongo pygmaeus/genetics , Telomere/genetics , X Chromosome/genetics , Y Chromosome/genetics , Evolution, Molecular , DNA Copy Number Variations/genetics , Humans , Endangered Species , Reference Standards
2.
bioRxiv ; 2024 Apr 08.
Article in English | MEDLINE | ID: mdl-38645259

ABSTRACT

The crab-eating macaques (Macaca fascicularis) and rhesus macaques (M. mulatta) are widely studied nonhuman primates in biomedical and evolutionary research. Despite their significance, the current understanding of the complex genomic structure in macaques and the differences between species requires substantial improvement. Here, we present a complete genome assembly of a crab-eating macaque and 20 haplotype-resolved macaque assemblies to investigate the complex regions and major genomic differences between species. Segmental duplication in macaques is ∼42% lower, while centromeres are ∼3.7 times longer than those in humans. The characterization of ∼2 Mbp fixed genetic variants and ∼240 Mbp complex loci highlights potential associations with metabolic differences between the two macaque species (e.g., CYP2C76 and EHBP1L1). Additionally, hundreds of alternative splicing differences show post-transcriptional regulation divergence between these two species (e.g., PNPO). We also characterize 91 large-scale genomic differences between macaques and humans at a single-base-pair resolution and highlight their impact on gene regulation in primate evolution (e.g., FOLH1 and PIEZO2). Finally, population genetics recapitulates macaque speciation and selective sweeps, highlighting potential genetic basis of reproduction and tail phenotype differences (e.g., STAB1, SEMA3F, and HOXD13). In summary, the integrated analysis of genetic variation and population genetics in macaques greatly enhances our comprehension of lineage-specific phenotypes, adaptation, and primate evolution, thereby improving their biomedical applications in human diseases.

3.
bioRxiv ; 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-38077089

ABSTRACT

Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution. These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.

4.
Genes (Basel) ; 14(12)2023 12 10.
Article in English | MEDLINE | ID: mdl-38137016

ABSTRACT

Large-scale genomic structural variations can have significant clinical implications, depending on the specific altered genomic region. Briefly, 2q37 microdeletion syndrome is a prevalent subtelomeric deletion disorder characterized by variable-sized deletions. Affected patients exhibit a wide range of clinical manifestations, including short stature, facial dysmorphism, and features of autism spectrum disorder, among others. Conversely, isolated duplications of proximal chromosome 2q are rare and lack a distinct phenotype. In this report, we provide an extensive molecular analysis of a 15-day-old newborn referred for syndromic features. Our analysis reveals an 8.5 Mb microdeletion at 2q37.1, which extends to the telomere, in conjunction with an 8.6 Mb interstitial microduplication at 2q34q36.1. Our findings underscore the prominence of 2q37 terminal deletions as commonly reported genomic anomalies. We compare our patient's phenotype with previously reported cases in the literature to contribute to a more refined classification of 2q37 microdeletion syndrome and assess the potential impact of 2q34q36.1 microduplication. We also investigate multiple hypotheses to clarify the genetic mechanisms responsible for the observed genomic rearrangement.


Subject(s)
Autism Spectrum Disorder , Intellectual Disability , Infant, Newborn , Humans , Chromosome Deletion , Intellectual Disability/genetics , Autism Spectrum Disorder/genetics , Chromosome Structures , Telomere
5.
Int J Mol Sci ; 24(21)2023 Oct 31.
Article in English | MEDLINE | ID: mdl-37958807

ABSTRACT

The impact of segmental duplications on human evolution and disease is only just starting to unfold, thanks to advancements in sequencing technologies that allow for their discovery and precise genotyping. The 15q11-q13 locus is a hotspot of recurrent copy number variation associated with Prader-Willi/Angelman syndromes, developmental delay, autism, and epilepsy and is mediated by complex segmental duplications, many of which arose recently during evolution. To gain insight into the instability of this region, we characterized its architecture in human and nonhuman primates, reconstructing the evolutionary history of five different inversions that rearranged the region in different species primarily by accumulation of segmental duplications. Comparative analysis of human and nonhuman primate duplication structures suggests a human-specific gain of directly oriented duplications in the regions flanking the GOLGA cores and HERC segmental duplications, representing potential genomic drivers for the human-specific expansions. The increasing complexity of segmental duplication organization over the course of evolution underlies its association with human susceptibility to recurrent disease-associated rearrangements.


Subject(s)
Autistic Disorder , Prader-Willi Syndrome , Animals , Humans , DNA Copy Number Variations/genetics , Primates/genetics , Prader-Willi Syndrome/genetics , Segmental Duplications, Genomic/genetics , Autistic Disorder/genetics , Chromosomes, Human, Pair 15/genetics , Gene Duplication
6.
Genome Res ; 32(10): 1941-1951, 2022 10.
Article in English | MEDLINE | ID: mdl-36180231

ABSTRACT

Gibbons are the most speciose family of living apes, characterized by a diverse chromosome number and rapid rate of large-scale rearrangements. Here we performed single-cell template strand sequencing (Strand-seq), molecular cytogenetics, and deep in silico analysis of a southern white-cheeked gibbon genome, providing the first comprehensive map of 238 previously hidden small-scale inversions. We determined that more than half are gibbon specific, at least fivefold higher than shown for other primate lineage-specific inversions, with a significantly high number of small heterozygous inversions, suggesting that accelerated evolution of inversions may have played a role in the high sympatric diversity of gibbons. Although the precise mechanisms underlying these inversions are not yet understood, it is clear that segmental duplication-mediated NAHR only accounts for a small fraction of events. Several genomic features, including gene density and repeat (e.g., LINE-1) content, might render these regions more break-prone and susceptible to inversion formation. In the attempt to characterize interspecific variation between southern and northern white-cheeked gibbons, we identify several large assembly errors in the current GGSC Nleu3.0/nomLeu3 reference genome comprising more than 49 megabases of DNA. Finally, we provide a list of 182 candidate genes potentially involved in gibbon diversification and speciation.


Subject(s)
Hominidae , Hylobates , Animals , Hylobates/genetics , Genome , Primates/genetics , Chromosome Inversion/genetics , Chromosomes , Hominidae/genetics
7.
Cell ; 185(11): 1986-2005.e26, 2022 05 26.
Article in English | MEDLINE | ID: mdl-35525246

ABSTRACT

Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.


Subject(s)
Chromosome Inversion , Segmental Duplications, Genomic , Chromosome Inversion/genetics , DNA Copy Number Variations/genetics , Genome, Human , Genomics , Humans
8.
Genes (Basel) ; 12(6)2021 06 07.
Article in English | MEDLINE | ID: mdl-34200357

ABSTRACT

Chromosome deletions, including band 5q12, have rarely been reported and have been associated with a wide range of clinical manifestations, such as postnatal growth retardation, intellectual disability, hyperactivity, nonspecific ocular defects, facial dysmorphism, and epilepsy. In this study, we describe for the first time a child with growth retardation in which we identified a balanced t(3;10) translocation by conventional cytogenetic analysis in addition to an 8.6 Mb 5q12 deletion through array-CGH. Our results show that the phenotypic abnormalities of a case that had been interpreted as "balanced" by conventional cytogenetics are mainly due to a cryptic deletion, highlighting the need for molecular investigation in subjects with an abnormal phenotype before assuming the cause is an apparently simple cytogenetic rearrangement. Finally, we identify PDE4D and PIK3R1 genes as the two major candidates responsible for the clinical features expressed in our patient.


Subject(s)
Chromosome Deletion , Chromosome Disorders/genetics , Chromosomes, Human, Pair 5/genetics , Growth Disorders/genetics , Chromosome Disorders/pathology , Class Ia Phosphatidylinositol 3-Kinase/genetics , Comparative Genomic Hybridization , Cyclic Nucleotide Phosphodiesterases, Type 4/genetics , Female , Growth Disorders/pathology , Humans , Infant , Karyotyping , Phenotype , Translocation, Genetic
9.
Genes (Basel) ; 12(5)2021 05 19.
Article in English | MEDLINE | ID: mdl-34069634

ABSTRACT

Variability is the source on which selective pressure acts, allowing genome evolution and adaptation [...].


Subject(s)
Genes/genetics , Genome, Human/genetics , Adaptation, Physiological/genetics , Animals , Evolution, Molecular , Humans , Phenotype , Plants/genetics
10.
Nature ; 594(7861): 77-81, 2021 06.
Article in English | MEDLINE | ID: mdl-33953399

ABSTRACT

The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation¹,². Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes¹,³⁻⁵ and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome.


Subject(s)
Evolution, Molecular , Genome/genetics , Genomics , Pan paniscus/genetics , Phylogeny , Animals , Eukaryotic Initiation Factor-4A/genetics , Female , Genes , Gorilla gorilla/genetics , Molecular Sequence Annotation/standards , Pan troglodytes/genetics , Pongo/genetics , Segmental Duplications, Genomic , Sequence Analysis, DNA
11.
Science ; 370(6523)2020 12 18.
Article in English | MEDLINE | ID: mdl-33335035

ABSTRACT

The rhesus macaque (Macaca mulatta) is the most widely studied nonhuman primate (NHP) in biomedical research. We present an updated reference genome assembly (Mmul_10, contig N50 = 46 Mbp) that increases the sequence contiguity 120-fold and annotate it using 6.5 million full-length transcripts, thus improving our understanding of gene content, isoform diversity, and repeat organization. With the improved assembly of segmental duplications, we discovered new lineage-specific genes and expanded gene families that are potentially informative in studies of evolution and disease susceptibility. Whole-genome sequencing (WGS) data from 853 rhesus macaques identified 85.7 million single-nucleotide variants (SNVs) and 10.5 million indel variants, including potentially damaging variants in genes associated with human autism and developmental delay, providing a framework for developing noninvasive NHP models of human disease.


Subject(s)
Genetic Predisposition to Disease , Genome , Macaca mulatta/genetics , Polymorphism, Single Nucleotide , Animals , Genetic Variation , Humans , Molecular Sequence Annotation , Whole Genome Sequencing
12.
Genome Res ; 30(11): 1680-1693, 2020 11.
Article in English | MEDLINE | ID: mdl-33093070

ABSTRACT

Rhesus macaque is an Old World monkey that shared a common ancestor with human ∼25 Myr ago and is an important animal model for human disease studies. A deep understanding of its genetics is therefore required for both biomedical and evolutionary studies. Among structural variants, inversions represent a driving force in speciation and play an important role in disease predisposition. Here we generated a genome-wide map of inversions between human and macaque, combining single-cell strand sequencing with cytogenetics. We identified 375 total inversions between 859 bp and 92 Mbp, increasing by eightfold the number of previously reported inversions. Among these, 19 inversions flanked by segmental duplications overlap with recurrent copy number variants associated with neurocognitive disorders. Evolutionary analyses show that in 17 out of 19 cases, the Hominidae orientation of these disease-associated regions is always derived. This suggests that duplicated sequences likely played a fundamental role in generating inversions in humans and great apes, creating architectures that nowadays predispose these regions to disease-associated genetic instability. Finally, we identified 861 genes mapping at 156 inversion breakpoints, with some showing evidence of differential expression in human and macaque cell lines, thus highlighting candidates that might have contributed to the evolution of species-specific features. This study depicts the most accurate fine-scale map of inversions between human and macaque using a two-pronged integrative approach, such as single-cell strand sequencing and cytogenetics, and represents a valuable resource toward understanding of the biology and evolution of primate species.


Subject(s)
Chromosome Breakpoints , Chromosome Inversion , Evolution, Molecular , Macaca mulatta/genetics , Animals , Disease/genetics , Gene Expression Regulation , Genome , Genomics , Heterozygote , Humans , In Situ Hybridization, Fluorescence , Recombination, Genetic , Sequence Analysis, DNA , Single-Cell Analysis
13.
Nat Genet ; 52(8): 849-858, 2020 08.
Article in English | MEDLINE | ID: mdl-32541924

ABSTRACT

Inversions play an important role in disease and evolution but are difficult to characterize because their breakpoints map to large repeats. We increased by sixfold the number (n = 1,069) of previously reported great ape inversions by using single-cell DNA template strand and long-read sequencing. We find that the X chromosome is most enriched (2.5-fold) for inversions, on the basis of its size and duplication content. There is an excess of differentially expressed primate genes near the breakpoints of large (>100 kilobases (kb)) inversions but not smaller events. We show that when great ape lineage-specific duplications emerge, they preferentially (approximately 75%) occur in an inverted orientation compared to that at their ancestral locus. We construct megabase-pair scale haplotypes for individual chromosomes and identify 23 genomic regions that have recurrently toggled between a direct and an inverted state over 15 million years. The direct orientation is most frequently the derived state for human polymorphisms that predispose to recurrent copy number variants associated with neurodevelopmental disease.


Subject(s)
Chromosome Inversion/genetics , Genome/genetics , Hominidae/genetics , Animals , Chromosomes/genetics , DNA Copy Number Variations/genetics , Evolution, Molecular , Female , Haplotypes/genetics , Humans , Male
14.
Genes (Basel) ; 11(2)2020 02 18.
Article in English | MEDLINE | ID: mdl-32085667

ABSTRACT

POTE (prostate, ovary, testis, and placenta expressed) genes belong to a primate-specific gene family expressed in prostate, ovary, and testis as well as in several cancers including breast, prostate, and lung cancers. Due to their tumor-specific expression, POTEs are potential oncogenes, therapeutic targets, and biomarkers for these malignancies. This gene family maps within human and primate segmental duplications with a copy number ranging from two to 14 in different species. Due to the high sequence identity among the gene copies, specific efforts are needed to assemble these loci in order to correctly define the organization and evolution of the gene family. Using single-molecule, real-time (SMRT) sequencing, in silico analyses, and molecular cytogenetics, we characterized the structure, copy number, and chromosomal distribution of the POTE genes, as well as their expression in normal and disease tissues, and provided a comparative analysis of the POTE organization and gene structure in primate genomes. We were able, for the first time, to de novo sequence and assemble a POTE tandem duplication in marmoset that is misassembled and collapsed in the reference genome, thus revealing the presence of a second POTE copy. Taken together, our findings provide comprehensive insights into the evolutionary dynamics of the primate-specific POTE gene family, involving gene duplications, deletions, and long interspersed nuclear element (LINE) transpositions to explain the actual repertoire of these genes in human and primate genomes.


Subject(s)
Multigene Family , Ovary/chemistry , Placenta/chemistry , Primates/genetics , Prostate/chemistry , Testis/chemistry , Animals , Chromosome Mapping , Computer Simulation , Evolution, Molecular , Female , Gene Expression Profiling , Gene Expression Regulation , Humans , Male , Pregnancy , Single Molecule Imaging , Tissue Distribution
15.
Science ; 366(6463)2019 10 18.
Article in English | MEDLINE | ID: mdl-31624180

ABSTRACT

Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.


Subject(s)
Genetic Introgression , Animals , Chromosome Duplication , Chromosomes, Human, Pair 16/genetics , Chromosomes, Human, Pair 8/genetics , DNA Copy Number Variations , Evolution, Molecular , Genome, Human , Haplotypes , Hominidae/genetics , Humans , Melanesia , Models, Genetic , Neanderthals/genetics , Polymorphism, Genetic , Selection, Genetic , Whole Genome Sequencing
16.
PLoS Genet ; 15(3): e1008075, 2019 03.
Article in English | MEDLINE | ID: mdl-30917130

ABSTRACT

Human chromosome 15q25 is involved in several disease-associated structural rearrangements, including microdeletions and chromosomal markers with inverted duplications. Using comparative fluorescence in situ hybridization, strand-sequencing, single-molecule, real-time sequencing and Bionano optical mapping analyses, we investigated the organization of the 15q25 region in human and nonhuman primates. We found that two independent inversions occurred in this region after the fission event that gave rise to phylogenetic chromosomes XIV and XV in humans and great apes. One of these inversions is still polymorphic in the human population today and may confer differential susceptibility to 15q25 microdeletions and inverted duplications. The inversion breakpoints map within segmental duplications containing core duplicons of the GOLGA gene family and correspond to the site of an ancestral centromere, which became inactivated about 25 million years ago. The inactivation of this centromere likely released segmental duplications from recombination repression typical of centromeric regions. We hypothesize that this increased the frequency of ectopic recombination creating a hotspot of hominid inversions where dispersed GOLGA core elements now predispose this region to recurrent genomic rearrangements associated with disease.


Subject(s)
Chromosome Inversion , Chromosomes, Human, Pair 15/genetics , Segmental Duplications, Genomic , Animals , Autoantigens/genetics , Chromosomal Instability , Evolution, Molecular , Gene Dosage , Gene Rearrangement , Genetic Variation , Golgi Matrix Proteins/genetics , Hominidae/genetics , Humans , Multigene Family , Phylogeny , Primates/genetics , Recombination, Genetic , Species Specificity
17.
Genome Res ; 28(6): 910-920, 2018 06.
Article in English | MEDLINE | ID: mdl-29776991

ABSTRACT

For many years, inversions have been proposed to be a direct driving force in speciation since they suppress recombination when heterozygous. Inversions are the most common large-scale differences among humans and great apes. Nevertheless, they represent large events easily distinguishable by classical cytogenetics, whose resolution, however, is limited. Here, we performed a genome-wide comparison between human, great ape, and macaque genomes using the net alignments for the most recent releases of genome assemblies. We identified a total of 156 putative inversions, between 103 kb and 91 Mb, corresponding to 136 human loci. Combining literature, sequence, and experimental analyses, we analyzed 109 of these loci and found 67 regions inverted in one or multiple primates, including 28 newly identified inversions. These events overlap with 81 human genes at their breakpoints, and seven correspond to sites of recurrent rearrangements associated with human disease. This work doubles the number of validated primate inversions larger than 100 kb, beyond what was previously documented. We identified 74 sites of errors, where the sequence has been assembled in the wrong orientation, in the reference genomes analyzed. Our data serve two purposes: First, we generated a map of evolutionary inversions in these genomes representing a resource for interrogating differences among these species at a functional level; second, we provide a list of misassembled regions in these primate genomes, involving over 300 Mb of DNA and 1978 human genes. Accurately annotating these regions in the genome references has immediate applications for evolutionary and biomedical studies on primates.


Subject(s)
Chromosome Inversion/genetics , Genome, Human/genetics , Primates/genetics , Sequence Inversion/genetics , Animals , Evolution, Molecular , Humans , Molecular Sequence Annotation , Pan troglodytes/genetics
18.
Nat Ecol Evol ; 1(3): 69, 2017.
Article in English | MEDLINE | ID: mdl-28580430

ABSTRACT

Segmental duplications contribute to human evolution, adaptation and genomic instability but are often poorly characterized. We investigate the evolution, genetic variation and coding potential of human-specific segmental duplications (HSDs). We identify 218 HSDs based on analysis of 322 deeply sequenced archaic and contemporary hominid genomes. We sequence 550 human and nonhuman primate genomic clones to reconstruct the evolution of the largest, most complex regions with protein-coding potential (n=80 genes/33 gene families). We show that HSDs are non-randomly organized, associate preferentially with ancestral ape duplications termed "core duplicons", and evolved primarily in an interspersed inverted orientation. In addition to Homo sapiens-specific gene expansions (e.g., TCAF1/2), we highlight ten gene families (e.g., ARHGAP11B and SRGAP2C) where copy number never returns to the ancestral state, there is evidence of mRNA splicing, and no common gene-disruptive mutations are observed in the general population. Such duplicates are candidates for the evolution of human-specific adaptive traits.

19.
Genome Biol ; 18(1): 49, 2017 03 09.
Article in English | MEDLINE | ID: mdl-28279197

ABSTRACT

BACKGROUND: Gene innovation by duplication is a fundamental evolutionary process but is difficult to study in humans due to the large size, high sequence identity, and mosaic nature of segmental duplication blocks. The human-specific gene hydrocephalus-inducing 2, HYDIN2, was generated by a 364 kbp duplication of 79 internal exons of the large ciliary gene HYDIN from chromosome 16q22.2 to chromosome 1q21.1. Because the HYDIN2 locus lacks the ancestral promoter and seven terminal exons of the progenitor gene, we sought to characterize transcription at this locus by coupling reverse transcription polymerase chain reaction and long-read sequencing. RESULTS: 5' RACE indicates a transcription start site for HYDIN2 outside of the duplication and we observe fusion transcripts spanning both the 5' and 3' breakpoints. We observe extensive splicing diversity leading to the formation of altered open reading frames (ORFs) that appear to be under relaxed selection. We show that HYDIN2 adopted a new promoter that drives an altered pattern of expression, with highest levels in neural tissues. We estimate that the HYDIN duplication occurred ~3.2 million years ago and find that it is nearly fixed (99.9%) for diploid copy number in contemporary humans. Examination of 73 chromosome 1q21 rearrangement patients reveals that HYDIN2 is deleted or duplicated in most cases. CONCLUSIONS: Together, these data support a model of rapid gene innovation by fusion of incomplete segmental duplications, altered tissue expression, and potential subfunctionalization or neofunctionalization of HYDIN2 early in the evolution of the Homo lineage.


Subject(s)
Gene Duplication , Gene Fusion , Neurons/metabolism , Chromosome Aberrations , Chromosome Breakpoints , Chromosome Disorders/genetics , Chromosomes, Human, Pair 1 , DNA Copy Number Variations , Evolution, Molecular , Gene Conversion , Gene Expression Profiling , Genetic Variation , Genetics, Population , Genomics/methods , Humans , Open Reading Frames , Organ Specificity/genetics , Phenotype , Selection, Genetic , Transcription, Genetic
20.
BMC Genomics ; 18(1): 65, 2017 01 10.
Article in English | MEDLINE | ID: mdl-28073353

ABSTRACT

BACKGROUND: Although many algorithms are now available that aim to characterize different classes of structural variation, discovery of balanced rearrangements such as inversions remains an open problem. This is mainly due to the fact that breakpoints of such events typically lie within segmental duplications or common repeats, which reduces the mappability of short reads. The algorithms developed within the 1000 Genomes Project to identify inversions are limited to relatively short inversions, and there are currently no available algorithms to discover large inversions using high throughput sequencing technologies. RESULTS: Here we propose a novel algorithm, VALOR, to discover large inversions using new sequencing methods that provide long range information such as 10X Genomics linked-read sequencing, pooled clone sequencing, or other similar technologies that we commonly refer to as long range sequencing. We demonstrate the utility of VALOR using both pooled clone sequencing and 10X Genomics linked-read sequencing generated from the genome of an individual from the HapMap project (NA12878). We also provide a comprehensive comparison of VALOR against several state-of-the-art structural variation discovery algorithms that use whole genome shotgun sequencing data. CONCLUSIONS: In this paper, we show that VALOR is able to accurately discover all previously identified and experimentally validated large inversions in the same genome with a low false discovery rate. Using VALOR, we also predicted a novel inversion, which we validated using fluorescent in situ hybridization. VALOR is available at https://github.com/BilkentCompGen/VALOR.


Subject(s)
Genomics/methods , Sequence Inversion/genetics , Algorithms , Genome, Human/genetics , High-Throughput Nucleotide Sequencing , Humans , Whole Genome Sequencing