ABSTRACT
Using multiple long-read sequencing technologies, we sequenced and assembled the genomes of chimpanzee, bonobo, gorilla, orangutan, gibbon, macaque, owl monkey, and marmoset. We identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. We estimate that 819.47 Mbp or ~27% of the genome has been affected by SVs across primate evolution. We identify 1,607 structurally divergent regions wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (e.g., CARD, C4, and OLAH gene families) and additional lineage-specific genes are generated (e.g., CKAP2, VPS36, ACBD7, and NEK5 paralogs), becoming targets of rapid chromosomal diversification and positive selection (e.g., RGPD gene family). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species.
Subject(s)
Genome , Primates , Animals , Humans , Base Sequence , Primates/classification , Primates/genetics , Biological Evolution , Sequence Analysis, DNA , Genomic Structural Variation
ABSTRACT
Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility [1]. The X chromosome is vital for reproduction and cognition [2]. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.
Subject(s)
Hominidae , X Chromosome , Y Chromosome , Animals , Female , Male , Gorilla gorilla/genetics , Hominidae/genetics , Hominidae/classification , Hylobatidae/genetics , Pan paniscus/genetics , Pan troglodytes/genetics , Phylogeny , Pongo abelii/genetics , Pongo pygmaeus/genetics , Telomere/genetics , X Chromosome/genetics , Y Chromosome/genetics , Evolution, Molecular , DNA Copy Number Variations/genetics , Humans , Endangered Species , Reference Standards
ABSTRACT
The retrotransposon LINE-1 (L1) is central to the recent evolutionary history of the human genome and continues to drive genetic diversity and germline pathogenesis. However, the spatiotemporal extent and biological significance of somatic L1 activity are poorly defined and are virtually unexplored in other primates. From a single L1 lineage active at the divergence of apes and Old World monkeys, successive L1 subfamilies have emerged in each descendant primate germline. As revealed by case studies, the presently active human L1 subfamily can also mobilize during embryonic and brain development in vivo. It is unknown whether nonhuman primate L1s can similarly generate somatic insertions in the brain. Here we applied approximately 40× single-cell whole-genome sequencing (scWGS), as well as retrotransposon capture sequencing (RC-seq), to 20 hippocampal neurons from two rhesus macaques (Macaca mulatta). In one animal, we detected and PCR-validated a somatic L1 insertion that generated target site duplications, carried a short 5' transduction, and was present in ~7% of hippocampal neurons but absent from cerebellum and nonbrain tissues. The corresponding donor L1 allele was exceptionally mobile in vitro and was embedded in PRDM4, a gene expressed throughout development and in neural stem cells. Nanopore long-read methylome and RNA-seq transcriptome analyses indicated young retrotransposon subfamily activation in the early embryo, followed by repression in adult tissues. These data highlight endogenous macaque L1 retrotransposition potential, provide prototypical evidence of L1-mediated somatic mosaicism in a nonhuman primate, and allude to L1 mobility in the brain over the past 30 million years of human evolution.
Subject(s)
Brain , Long Interspersed Nucleotide Elements , Retroelements , Animals , DNA-Binding Proteins/genetics , Macaca mulatta/genetics , Neurons , Retroelements/genetics , Transcription Factors/genetics
ABSTRACT
Embryonic aneuploidy is highly complex, often leading to developmental arrest, implantation failure or spontaneous miscarriage in both natural and assisted reproduction. Despite our knowledge of mitotic mis-segregation in somatic cells, the molecular pathways regulating chromosome fidelity during the error-prone cleavage-stage of mammalian embryogenesis remain largely undefined. Using bovine embryos and live-cell fluorescent imaging, we observed frequent micro-/multi-nucleation of mis-segregated chromosomes in initial mitotic divisions that underwent unilateral inheritance, re-fused with the primary nucleus or formed a chromatin bridge with neighboring cells. A correlation between a lack of syngamy, multipolar divisions and asymmetric genome partitioning was also revealed, and single-cell DNA-seq showed propagation of primarily non-reciprocal mitotic errors. Depletion of the mitotic checkpoint protein BUB1B (also known as BUBR1) resulted in similarly abnormal nuclear structures and cell divisions, as well as chaotic aneuploidy and dysregulation of the kinase-substrate network that mediates mitotic progression, all before zygotic genome activation. This demonstrates that embryonic micronuclei sustain multiple fates, provides an explanation for blastomeres with uniparental origins, and substantiates defective checkpoints and likely other maternally derived factors as major contributors to the karyotypic complexity afflicting mammalian preimplantation development.
Subject(s)
Aneuploidy , Blastomeres , Animals , Cattle , Chromosomes , Embryonic Development/genetics , Karyotyping , Mammals/genetics , Mitosis/genetics
ABSTRACT
Preeclampsia is a hypertensive disorder of pregnancy that affects ~2%-5% of all pregnancies, contributes to 4 of the top 10 causes of pregnancy-related deaths, and remains a long-term risk factor for cardiometabolic diseases. Yet little is known about the molecular mechanisms that lead to this disease. There is evidence that some cases have a genetic cause. However, it is well appreciated that harmful factors in the environment, such as poor nutrition, stress, and toxins, may lead to epigenetic changes that can contribute to this disease. DNA methylation is one of the epigenetic modifications known to be fairly stable and to impact gene expression. Using DNA from buccal swabs, we analyzed global DNA methylation among three groups of individuals: mothers who experienced 1) early-stage preeclampsia (<32 wk), 2) late-stage preeclampsia (>37 wk), or 3) no complications during their pregnancies, as well as the children from these three groups. We found significant differentially methylated regions (DMRs) between mothers who experienced preeclampsia and those with no complications, adjacent to or within genes that are important for placentation, embryonic development, cell adhesion, and inflammation (e.g., the cadherin pathway). A significant portion of DMR genes showed expression in tissues relevant to preeclampsia (i.e., the brain, heart, kidney, uterus, ovaries, and placenta). As this study was performed on DNA extracted from cheek swabs, this opens the way to future studies in different tissues, aimed at identifying possible biomarkers of risk and early detection, developing targeted interventions, and reducing the progression of this life-threatening disease.
NEW & NOTEWORTHY: Preeclampsia is a life-threatening hypertensive disorder, affecting 2%-5% of pregnancies, that remains poorly understood. This study analyzed DNA methylation from buccal swabs from mothers who experienced early and late-stage preeclampsia and those with uncomplicated pregnancies, along with their children. Differentially methylated regions were found near and within genes crucial for placental function, embryonic development, and inflammation. Many of these genes are expressed in preeclampsia-related tissues, offering hope for future biomarker development for this condition.
Subject(s)
Hypertension , Pre-Eclampsia , Child , Pregnancy , Female , Humans , Placenta/metabolism , Pre-Eclampsia/diagnosis , Epigenome , DNA Methylation/genetics , Hypertension/genetics , Biomarkers/metabolism , Inflammation/genetics , DNA
ABSTRACT
Asthma susceptibility is influenced by environmental, genetic, and epigenetic factors. DNA methylation is one form of epigenetic modification that regulates gene expression and is both inherited and modified by environmental exposures throughout life. Prenatal development is a particularly vulnerable time period during which exposure to maternal asthma increases asthma risk in offspring. How maternal asthma affects DNA methylation in offspring and what the consequences of differential methylation are in subsequent generations are not fully known. In this study, we tested the effects of grandmaternal house dust mite (HDM) allergen sensitization during pregnancy on airway physiology and inflammation in HDM-sensitized and challenged second-generation mice. We also tested the effects of grandmaternal HDM sensitization on tissue-specific DNA methylation in allergen-naïve and -sensitized second-generation mice. Descendants of both allergen- and vehicle-exposed grandmaternal founders exhibited airway hyperreactivity after HDM sensitization. However, grandmaternal allergen sensitization significantly potentiated airway hyperreactivity and altered the epigenomic trajectory in second-generation offspring after HDM sensitization compared with HDM-sensitized offspring from vehicle-exposed founders. As a result, biological processes and signaling pathways associated with epigenetic modifications were distinct between lineages. A targeted analysis of pathway-associated gene expression found that Smad3 was significantly dysregulated as a result of grandmaternal allergen sensitization. These data show that grandmaternal allergen exposure during pregnancy establishes a unique epigenetic trajectory that reprograms allergen responses in second-generation offspring and may contribute to asthma risk.
NEW & NOTEWORTHY: Asthma susceptibility is influenced by environmental, genetic, and epigenetic factors. This study shows that maternal allergen exposure during pregnancy promotes unique epigenetic trajectories in second-generation offspring at baseline and in response to allergen sensitization, which is associated with the potentiation of airway hyperreactivity. These effects are one mechanism by which maternal asthma may influence the inheritance of asthma risk.
Subject(s)
Asthma , Prenatal Exposure Delayed Effects , Pregnancy , Humans , Female , Mice , Animals , Allergens , Epigenomics , Prenatal Exposure Delayed Effects/genetics , Asthma/genetics , Disease Susceptibility , Epigenesis, Genetic , Pyroglyphidae
ABSTRACT
The Javan gibbon, Hylobates moloch, is an endangered gibbon species restricted to the forest remnants of western and central Java, Indonesia, and one of the rarest of the Hylobatidae family. Hylobatids consist of 4 genera (Hoolock, Hylobates, Symphalangus, and Nomascus) that are characterized by different numbers of chromosomes, ranging from 38 to 52. The underlying cause of this karyotype plasticity is not entirely understood, at least in part because of the limited availability of genomic data. Here we present the first scaffold-level assembly for H. moloch using a combination of whole-genome Illumina short reads, 10X Chromium linked reads, PacBio and Oxford Nanopore long reads, and proximity-ligation data. This Hylobates genome represents a valuable new resource for comparative genomics studies in primates.
Subject(s)
Genome , Hylobates , Animals , Hylobates/genetics , Forests , Endangered Species , Indonesia
ABSTRACT
Co-option of transposable elements (TEs) to become part of existing or new enhancers is an important mechanism for evolution of gene regulation. However, contributions of lineage-specific TE insertions to recent regulatory adaptations remain poorly understood. Gibbons present a suitable model to study these contributions as they have evolved a lineage-specific TE called LAVA (LINE-AluSz-VNTR-AluLIKE), which is still active in the gibbon genome. The LAVA retrotransposon is thought to have played a role in the emergence of the highly rearranged structure of the gibbon genome by disrupting transcription of cell cycle genes. In this study, we investigated whether LAVA may have also contributed to the evolution of gene regulation by adopting enhancer function. We characterized fixed and polymorphic LAVA insertions across multiple gibbons and found 96 LAVA elements overlapping enhancer chromatin states. Moreover, LAVA was enriched in multiple transcription factor binding motifs, was bound by an important transcription factor (PU.1), and was associated with higher levels of gene expression in cis. We found gibbon-specific signatures of purifying/positive selection at 27 LAVA insertions. Two of these insertions were fixed in the gibbon lineage and overlapped with enhancer chromatin states, representing putative co-opted LAVA enhancers. These putative enhancers were located within genes encoding SETD2 and RAD9A, two proteins that facilitate accurate repair of DNA double-strand breaks and prevent chromosomal rearrangement mutations. Co-option of LAVA in these genes may have influenced regulation of processes that preserve genome integrity. Our findings highlight the importance of considering lineage-specific TEs in studying evolution of gene regulatory elements.
Subject(s)
Genome , Hylobates/genetics , Retroelements , Animals , Chromatin/genetics , Evolution, Molecular , Gene Expression Regulation , Hylobates/classification , Mutagenesis, Insertional , Regulatory Sequences, Nucleic Acid , Species Specificity
ABSTRACT
Centromeres are functionally conserved chromosomal loci essential for proper chromosome segregation during cell division, yet they show high sequence diversity across species. Despite their variation, a near universal feature of centromeres is the presence of repetitive sequences, such as DNA satellites and transposable elements (TEs). Because of their rapidly evolving karyotypes, gibbons represent a compelling model to investigate divergence of functional centromere sequences across short evolutionary timescales. In this study, we use ChIP-seq, RNA-seq, and fluorescence in situ hybridization to comprehensively investigate the centromeric repeat content of the four extant gibbon genera (Hoolock, Hylobates, Nomascus, and Siamang). In all gibbon genera, we find that CENP-A nucleosomes and the DNA-proteins that interface with the inner kinetochore preferentially bind retroelements of broad classes rather than satellite DNA. A previously identified gibbon-specific composite retrotransposon, LAVA, known to be expanded within the centromere regions of one gibbon genus (Hoolock), displays centromere- and species-specific sequence differences, potentially as a result of its co-option to a centromeric function. When dissecting centromere satellite composition, we discovered the presence of the retroelement-derived macrosatellite SST1 in multiple centromeres of Hoolock, whereas alpha-satellites represent the predominant satellite in the other genera, further suggesting an independent evolutionary trajectory for Hoolock centromeres. Finally, using de novo assembly of centromere sequences, we determined that transcripts originating from gibbon centromeres recapitulate the species-specific TE composition. Combined, our data reveal dynamic shifts in the repeat content that define gibbon centromeres and coincide with the extensive karyotypic diversity within this lineage.
Subject(s)
Centromere , Hylobates , Animals , Centromere/genetics , DNA, Satellite/genetics , Hylobates/genetics , In Situ Hybridization, Fluorescence , Retroelements/genetics
ABSTRACT
Aneuploidy that arises during meiosis and/or mitosis is a major contributor to early embryo loss. We previously showed that human preimplantation embryos encapsulate missegregated chromosomes into micronuclei while undergoing cellular fragmentation and that fragments can contain chromosomal material, but the source of this DNA was unknown. Here, we leveraged the use of a nonhuman primate model and single-cell DNA-sequencing (scDNA-seq) to examine the chromosomal content of 471 individual samples comprising 254 blastomeres, 42 polar bodies, and 175 cellular fragments from a large number (N = 50) of disassembled rhesus cleavage-stage embryos. Our analysis revealed that the aneuploidy and micronucleation frequency is conserved between humans and macaques, and that fragments encapsulate whole and/or partial chromosomes lost from blastomeres. Single-cell/fragment genotyping showed that these chromosome-containing cellular fragments (CCFs) can be maternally or paternally derived and display double-stranded DNA breaks. DNA breakage was further indicated by reciprocal subchromosomal losses/gains between blastomeres and large segmental errors primarily detected at the terminal ends of chromosomes. By combining time-lapse imaging with scDNA-seq, we determined that multipolar divisions at the zygote or two-cell stage were associated with CCFs and generated a random mixture of chromosomally normal and abnormal blastomeres with uniparental or biparental origins. Despite frequent chromosome missegregation at the cleavage-stage, we show that CCFs and nondividing aneuploid blastomeres showing extensive DNA damage are prevented from incorporation into blastocysts. These findings suggest that embryos respond to chromosomal errors by encapsulation into micronuclei, elimination via cellular fragmentation, and selection against highly aneuploid blastomeres to overcome chromosome instability during preimplantation development.
Subject(s)
Aneuploidy , Blastocyst/cytology , Blastomeres/cytology , Micronuclei, Chromosome-Defective/embryology , Animals , Chromosome Segregation , Chromosomes/genetics , DNA Breaks, Double-Stranded , Female , Macaca , Single-Cell Analysis
ABSTRACT
BACKGROUND: Proper placentation, including trophoblast differentiation and function, is essential for the health and well-being of both the mother and baby throughout pregnancy. Placental abnormalities that occur during the early stages of development are thought to contribute to preeclampsia and other placenta-related pregnancy complications. However, relatively little is known about these stages in humans due to obvious ethical and technical limitations. Rhesus macaques are considered an ideal surrogate for studying human placentation, but the unclear translatability of known human placental markers and lack of accessible rhesus trophoblast cell lines can impede the use of this animal model. RESULTS: Here, we performed a cross-species transcriptomic comparison of human and rhesus placenta and determined that while the majority of human placental marker genes (HPGs) were similarly expressed, 952 differentially expressed genes (DEGs) were identified between the two species. Functional enrichment analysis of the 447 human-upregulated DEGs, including ADAM12, ERVW-1, KISS1, LGALS13, PAPPA2, PGF, and SIGLEC6, revealed over-representation of genes implicated in preeclampsia and other pregnancy disorders. Additionally, to enable in vitro functional studies of early placentation, we generated and thoroughly characterized two highly pure first trimester telomerase (TERT) immortalized rhesus trophoblast cell lines (iRP-D26 and iRP-D28A) that retained crucial features of isolated primary trophoblasts. CONCLUSIONS: Overall, our findings help elucidate the molecular translatability between human and rhesus placenta and reveal notable expression differences in several HPGs and genes implicated in pregnancy complications that should be considered when using the rhesus animal model to study normal and pathological human placentation.
Subject(s)
Placenta , Animals , Female , Galectins , Humans , Macaca mulatta/genetics , Placentation/genetics , Pre-Eclampsia , Pregnancy , Pregnancy Proteins , Transcriptome , Trophoblasts
ABSTRACT
The relationship between evolutionary genome remodeling and the three-dimensional structure of the genome remains largely unexplored. Here, we use the heavily rearranged gibbon genome to examine how evolutionary chromosomal rearrangements impact genome-wide chromatin interactions, topologically associating domains (TADs), and their epigenetic landscape. We use high-resolution maps of gibbon-human breaks of synteny (BOS), apply Hi-C in gibbon, measure an array of epigenetic features, and perform cross-species comparisons. We find that gibbon rearrangements occur at TAD boundaries, independent of the parameters used to identify TADs. This overlap is supported by a remarkable genetic and epigenetic similarity between BOS and TAD boundaries, namely presence of CpG islands and SINE elements, and enrichment in CTCF and H3K4me3 binding. Cross-species comparisons reveal that regions orthologous to BOS also correspond with boundaries of large (400-600 kb) TADs in human and other mammalian species. The colocalization of rearrangement breakpoints and TAD boundaries may be due to higher chromatin fragility at these locations and/or increased selective pressure against rearrangements that disrupt TAD integrity. We also examine the small portion of BOS that did not overlap with TAD boundaries and gave rise to novel TADs in the gibbon genome. We postulate that these new TADs generally lack deleterious consequences. Last, we show that limited epigenetic homogenization occurs across breakpoints, irrespective of their time of occurrence in the gibbon lineage. Overall, our findings demonstrate remarkable conservation of chromatin interactions and epigenetic landscape in gibbons, in spite of extensive genomic shuffling.
Subject(s)
Epigenesis, Genetic/genetics , Genome/genetics , Animals , Chromatin/genetics , CpG Islands/genetics , Epigenomics/methods , Genomics/methods , Humans , Synteny/genetics
ABSTRACT
Mastomys are the most widespread African rodent and carriers of various diseases such as the plague or Lassa virus. In addition, mastomys have rapidly gained a large number of mammary glands. Here, we generated a genome, variome, and transcriptomes for Mastomys coucha. As mastomys diverged at similar times from mouse and rat, we demonstrate their utility as a comparative genomic tool for these commonly used animal models. Furthermore, we identified over 500 mastomys accelerated regions, often residing near important mammary developmental genes or within their exons leading to protein sequence changes. Functional characterization of a noncoding mastomys accelerated region, located in the HoxD locus, showed enhancer activity in mouse developing mammary glands. Combined, our results provide genomic resources for mastomys and highlight their potential both as a comparative genomic tool and for the identification of mammary gland number determining factors.
Subject(s)
Genome , Murinae/genetics , Animals , Male , Mice , Murinae/metabolism , Phylogeography , Rats , Transcriptome
ABSTRACT
Single-cell genome sequencing has proven valuable for the detection of somatic variation, particularly in the context of tumor evolution. Current technologies suffer from high library construction costs, which restrict the number of cells that can be assessed and thus impose limitations on the ability to measure heterogeneity within a tissue. Here, we present single-cell combinatorial indexed sequencing (SCI-seq) as a means of simultaneously generating thousands of low-pass single-cell libraries for detection of somatic copy-number variants. We constructed libraries for 16,698 single cells from a combination of cultured cell lines, primate frontal cortex tissue and two human adenocarcinomas, and obtained a detailed assessment of subclonal variation within a pancreatic tumor.
Subject(s)
Adenocarcinoma/genetics , Chromosome Mapping/methods , DNA Copy Number Variations/genetics , Frontal Lobe/cytology , High-Throughput Nucleotide Sequencing/methods , Pancreatic Neoplasms/genetics , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Animals , Cell Line, Tumor , Gene Library , Genome, Human/genetics , HeLa Cells , Humans , Macaca mulatta
ABSTRACT
Gibbons are small arboreal apes that display an accelerated rate of evolutionary chromosomal rearrangement and occupy a key node in the primate phylogeny between Old World monkeys and great apes. Here we present the assembly and analysis of a northern white-cheeked gibbon (Nomascus leucogenys) genome. We describe the propensity for a gibbon-specific retrotransposon (LAVA) to insert into chromosome segregation genes and alter transcription by providing a premature termination site, suggesting a possible molecular mechanism for the genome plasticity of the gibbon lineage. We further show that the gibbon genera (Nomascus, Hylobates, Hoolock and Symphalangus) experienced a near-instantaneous radiation ~5 million years ago, coincident with major geographical changes in southeast Asia that caused cycles of habitat compression and expansion. Finally, we identify signatures of positive selection in genes important for forelimb development (TBX5) and connective tissues (COL1A1) that may have been involved in the adaptation of gibbons to their arboreal habitat.
Subject(s)
Genome/genetics , Hylobates/classification , Hylobates/genetics , Karyotype , Phylogeny , Animals , Evolution, Molecular , Hominidae/classification , Hominidae/genetics , Humans , Molecular Sequence Data , Retroelements/genetics , Selection, Genetic , Transcription Termination, Genetic
ABSTRACT
The steady occurrence of DNA mutations is a key source for evolution, generating the genomic variation in the population upon which natural selection acts. Mutations driving evolution have to occur in the oocytes and sperm in order to be transmitted to the next generation. Through similar mechanisms, mutations also accumulate in somatic cells (e.g., skin cells, neurons, lymphocytes) during development and adult life. The concept that somatic cells can collect new mutations with time suggests that we are a mosaic of cells with different genomic compositions. Particular attention has been recently paid to somatic mutations in the brain, with a focus on the relationship between this phenomenon and the origin of human diseases. Given this progressive accumulation of mutations, it is likely that an increased load of somatic mutations is present later in life and that this could be associated with late-life diseases and aging. In this review, we focus on a particular type of mutation: the loss and/or gain of whole chromosomes (i.e., aneuploidy) caused by errors in chromosome segregation in neurons and glia. Currently, it is hard to grasp the functional impact of somatic mutation in the brain because we lack reliable estimates of the proportion of aneuploid cells in the normal brain across different ages. Here, we revisit the key studies that attempted to quantify the proportion of aneuploid cells in both normal and diseased brains and highlight the deep inconsistencies among the different studies done in the last 15 years. Finally, our review highlights several limitations of studies performed in human and rodent models and explores a possible translational role for non-human primates.
Subject(s)
Aneuploidy , Brain Diseases/genetics , Models, Genetic , Animals , Brain/metabolism , Brain Diseases/metabolism , Humans , Mutation
ABSTRACT
Most common methods for inferring transposable element (TE) evolutionary relationships are based on dividing TEs into subfamilies using shared diagnostic nucleotides. Although originally justified based on the "master gene" model of TE evolution, computational and experimental work indicates that many of the subfamilies generated by these methods contain multiple source elements. This implies that subfamily-based methods give an incomplete picture of TE relationships. Studies on selection, functional exaptation, and predictions of horizontal transfer may all be affected. Here, we develop a Bayesian method for inferring TE ancestry that gives the probability that each sequence was replicative, its frequency of replication, and the probability that each extant TE sequence came from each possible ancestral sequence. Applying our method to 986 members of the newly-discovered LAVA family of TEs, we show that there were far more source elements in the history of LAVA expansion than subfamilies identified using the CoSeg subfamily-classification program. We also identify multiple replicative elements in the AluSc subfamily in humans. Our results strongly indicate that a reassessment of subfamily structures is necessary to obtain accurate estimates of mutation processes, phylogenetic relationships and historical times of activity.
Subject(s)
DNA Transposable Elements/genetics , Evolution, Molecular , Phylogeny , Bayes Theorem , Gene Transfer, Horizontal/genetics , Humans , Mutation
ABSTRACT
BACKGROUND: Rhesus macaques are widely used in biomedical research, but the application of genomic information in this species to better understand human disease is still in its infancy. Whole-genome sequence (WGS) data in large pedigreed macaque colonies could provide substantial experimental power for genetic discovery, but the collection of WGS data in large cohorts remains a formidable expense. Here, we describe a cost-effective approach that selects the most informative macaques in a pedigree for 30X WGS, followed by low-cost genotyping-by-sequencing (GBS) at 30X on the remaining macaques in order to generate sparse genotype data at high accuracy. Dense variants from the selected macaques with WGS data are then imputed into macaques having only sparse GBS data, resulting in dense genome-wide genotypes throughout the pedigree.
RESULTS: We developed GBS for the macaque genome using digestion with PstI, followed by sequencing of size-selected fragments at 30X coverage. From GBS sequence data collected on all individuals in a 16-member pedigree, we characterized high-confidence genotypes at 22,455 single nucleotide variant (SNV) sites that were suitable for guiding imputation of dense sequence data from WGS. To characterize dense markers for imputation, we performed WGS at 30X coverage on nine of the 16 individuals, yielding 10,193,425 high-confidence SNVs. To validate the use of GBS data for facilitating imputation, we initially focused on chromosome 19 as a test case, using an optimized panel of 833 sparse, evenly spaced markers from GBS and 5,010 dense markers from WGS. Using the method of "Genotype Imputation Given Inheritance" (GIGI), we evaluated the effects on imputation accuracy of three different strategies for selecting individuals for WGS: 1) using "GIGI-Pick" to select the most informative individuals, 2) using the most recent generation, or 3) using founders only. We also evaluated the effects on imputation accuracy of using from 1 to 9 WGS individuals for imputation. We found that the GIGI-Pick algorithm for selection of WGS individuals outperformed common heuristic approaches, and that genotype numbers and accuracy improved very little when using >5 WGS individuals for imputation. Informed by our findings, we used 4 macaques with WGS data to impute variants at up to 7,655,491 sites spanning all 20 autosomes in the 12 remaining macaques, based on their GBS genotypes at only 17,158 loci. Using a strict confidence threshold, we imputed an average of 3,680,238 variants per individual at >99% accuracy, or an average of 4,458,883 variants per individual at a more relaxed threshold, yielding >97% accuracy.
CONCLUSIONS: We conclude that an optimal tradeoff between genotype accuracy, number of imputed genotypes, and overall cost exists at the ratio of one individual selected for WGS using the GIGI-Pick algorithm, per 3-5 relatives selected for GBS. This approach makes feasible the collection of accurate, dense genome-wide sequence data in large pedigreed macaque cohorts without the need for more expensive WGS data on all individuals.
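The ratio stated in the conclusions can be illustrated with a small arithmetic sketch. It is not the authors' method: the function name `sequencing_plan`, its parameters, and the per-sample costs (arbitrary units) are assumptions made only for illustration, since the abstract reports no prices. The sketch splits a pedigree into WGS and GBS individuals at a chosen ratio and compares the total cost with sequencing everyone by WGS.

```python
import math


def sequencing_plan(pedigree_size, gbs_per_wgs, wgs_cost, gbs_cost):
    """Split a pedigree into WGS and GBS individuals and cost the design.

    All names and cost values here are hypothetical; the abstract gives
    only the ratio (one WGS individual per 3-5 GBS relatives).
    """
    # One WGS individual per `gbs_per_wgs` GBS relatives, rounded up so
    # that no WGS individual has to cover more relatives than the ratio.
    n_wgs = math.ceil(pedigree_size / (gbs_per_wgs + 1))
    n_gbs = pedigree_size - n_wgs
    total_cost = n_wgs * wgs_cost + n_gbs * gbs_cost
    all_wgs_cost = pedigree_size * wgs_cost
    return {
        "n_wgs": n_wgs,
        "n_gbs": n_gbs,
        "total_cost": total_cost,
        "fraction_of_all_wgs": total_cost / all_wgs_cost,
    }


# The 16-member pedigree at a 1:3 ratio, with assumed unit costs
# (WGS = 10.0, GBS = 1.0 in arbitrary units).
plan = sequencing_plan(16, 3, wgs_cost=10.0, gbs_cost=1.0)
print(plan)
```

With a 16-member pedigree and a 1:3 ratio, the split is 4 WGS and 12 GBS individuals, which matches the design described in the results. The cost fraction depends entirely on the assumed unit costs and should be recomputed with real prices.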
Subject(s)
Genotyping Techniques/methods, Macaca mulatta/genetics, DNA Sequence Analysis/methods, Algorithms, Animals, Chromosomes/genetics, Computational Biology/economics, Computational Biology/methods, Genotyping Techniques/economics, Single Nucleotide Polymorphism, DNA Sequence Analysis/economics
ABSTRACT
Naturally occurring admixture has now been documented in every major primate lineage, suggesting its key role in primate evolutionary history. Active primate hybrid zones can provide valuable insight into this process. Here, we investigate the history of admixture in one of the best-studied natural primate hybrid zones, between yellow baboons (Papio cynocephalus) and anubis baboons (Papio anubis) in the Amboseli ecosystem of Kenya. We generated a new genome assembly for yellow baboon and low-coverage genome-wide resequencing data from yellow baboons, anubis baboons and known hybrids (n = 44). Using a novel composite likelihood method for estimating local ancestry from low-coverage data, we found high levels of genetic diversity and genetic differentiation between the parent taxa, and excellent agreement between genome-scale ancestry estimates and a priori pedigree, life history and morphology-based estimates (r² = 0.899). However, even putatively unadmixed Amboseli yellow individuals carried a substantial proportion of anubis ancestry, presumably due to historical admixture. Further, the distribution of shared vs. fixed differences between a putatively unadmixed Amboseli yellow baboon and an unadmixed anubis baboon, both sequenced at high coverage, is inconsistent with simple isolation-migration or equilibrium migration models. Our findings suggest a complex process of intermittent contact that has occurred multiple times in baboon evolutionary history, despite no obvious fitness costs to hybrids or major geographic or behavioural barriers. In combination with the extensive phenotypic data available for baboon hybrids, our results provide valuable context for understanding the history of admixture in primates, including in our own lineage.
Subject(s)
Biological Evolution, Genetic Variation, Genetic Hybridization, Papio anubis/genetics, Papio cynocephalus/genetics, Animals, Gene Flow, Population Genetics, Genotype, Kenya, Likelihood Functions, Genetic Models, Pedigree, Phenotype
ABSTRACT
The mechanism for generating double minute chromosomes (dmin) and homogeneously staining regions (hsr) in cancer is still poorly understood. Through an integrated approach combining next-generation sequencing, single nucleotide polymorphism array, fluorescence in situ hybridization and polymerase chain reaction-based techniques, we inferred the fine structure of MYC-containing dmin/hsr amplicons harboring sequences from several different chromosomes in seven tumor cell lines, and characterized an unprecedented number of hsr insertion sites. Local chromosome shattering involving a single-step catastrophic event (chromothripsis) was recently proposed to explain clustered chromosomal rearrangements and genomic amplifications in cancer. Our bioinformatics analyses, based on the criteria listed to define chromothripsis, led us to exclude it as the driving force underlying amplicon genesis in our samples. Instead, the finding of coexisting heterogeneous amplicons, differing in their complexity and chromosome content, in cell lines derived from the same tumor indicated the occurrence of a multi-step evolutionary process in the genesis of dmin/hsr. Our integrated approach allowed us to gather a complete view of the complex chromosome rearrangements occurring within MYC amplicons, suggesting that more than one model may be invoked to explain the origin of dmin/hsr in cancer. Finally, we identified PVT1 as a target of fusion events, confirming its role as a breakpoint hotspot in MYC amplification.