ABSTRACT
We analyze whole-genome sequencing data from 141,431 Chinese women generated for non-invasive prenatal testing (NIPT). We use these data to characterize the population genetic structure and to investigate genetic associations with maternal and infectious traits. We show that the present day distribution of alleles is a function of both ancient migration and very recent population movements. We reveal novel phenotype-genotype associations, including several replicated associations with height and BMI, an association between maternal age and EMB, and between twin pregnancy and NRG1. Finally, we identify a unique pattern of circulating viral DNA in plasma with high prevalence of hepatitis B and other clinically relevant maternal infections. A GWAS for viral infections identifies an exceptionally strong association between integrated herpesvirus 6 and MOV10L1, which affects piwi-interacting RNA (piRNA) processing and PIWI protein function. These findings demonstrate the great value and potential of accumulating NIPT data for worldwide medical and genetic analyses.
Subject(s)
Asian People/genetics , Prenatal Diagnosis/methods , Adult , Alleles , China , DNA/genetics , Ethnicity/genetics , Female , Gene Frequency/genetics , Genetic Testing , Genetic Variation/genetics , Genetics, Population/methods , Genome-Wide Association Study/methods , Genomics/methods , Human Migration , Humans , Pregnancy , Sequence Analysis, DNA
ABSTRACT
Principal component analysis (PCA) is widely used in statistics, machine learning, and genomics for dimensionality reduction and uncovering low-dimensional latent structure. To address the challenges posed by ever-growing data size, fast and memory-efficient PCA methods have gained prominence. In this paper, we propose a novel randomized singular value decomposition (RSVD) algorithm implemented in PCAone, featuring a window-based optimization scheme that enables accelerated convergence while improving the accuracy. Additionally, PCAone incorporates out-of-core and multithreaded implementations for the existing Implicitly Restarted Arnoldi Method (IRAM) and RSVD. Through comprehensive evaluations using multiple large-scale real-world data sets in different fields, we show the advantage of PCAone over existing methods. The new algorithm achieves significantly faster computation time while maintaining accuracy comparable to the slower IRAM method. Notably, our analyses of UK Biobank, comprising around 0.5 million individuals and 6.1 million common single nucleotide polymorphisms, show that PCAone accurately computes the top 40 principal components within 9 h. This analysis effectively captures population structure, signals of selection, structural variants, and low recombination regions, utilizing <20 GB of memory and 20 CPU threads. Furthermore, when applied to single-cell RNA sequencing data featuring 1.3 million cells, PCAone accurately captures the top 40 principal components in 49 min. This performance represents a 10-fold improvement over state-of-the-art tools.
Subject(s)
Biological Specimen Banks , Software , Humans , Principal Component Analysis , Algorithms , Genomics
ABSTRACT
The maritime expansion of Scandinavian populations during the Viking Age (about AD 750-1050) was a far-flung transformation in world history [1,2]. Here we sequenced the genomes of 442 humans from archaeological sites across Europe and Greenland (to a median depth of about 1×) to understand the global influence of this expansion. We find the Viking period involved gene flow into Scandinavia from the south and east. We observe genetic structure within Scandinavia, with diversity hotspots in the south and restricted gene flow within Scandinavia. We find evidence for a major influx of Danish ancestry into England; a Swedish influx into the Baltic; and Norwegian influx into Ireland, Iceland and Greenland. Additionally, we see substantial ancestry from elsewhere in Europe entering Scandinavia during the Viking Age. Our ancient DNA analysis also revealed that a Viking expedition included close family members. By comparing with modern populations, we find that pigmentation-associated loci have undergone strong population differentiation during the past millennium, and trace positively selected loci, including the lactase-persistence allele of LCT and alleles of ANKA that are associated with the immune response, in detail. We conclude that the Viking diaspora was characterized by substantial transregional engagement: distinct populations influenced the genomic makeup of different regions of Europe, and Scandinavia experienced increased contact with the rest of the continent.
Subject(s)
Gene Flow/genetics , Genetics, Population , Genome, Human/genetics , Genomics , Human Migration/history , Alleles , Datasets as Topic , England , Evolution, Molecular , Greenland , History, Medieval , Humans , Immunity/genetics , Ireland , Lactase/genetics , Lactase/metabolism , Male , Scandinavian and Nordic Countries , Selection, Genetic , Spatio-Temporal Analysis , Young Adult
ABSTRACT
Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or dense genotype data. By using Gaussian mixtures in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We show that we can use haplotype clusters in the latent space to infer global population structure using haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions based on a maximum likelihood framework. Using sequencing data from simulations and closely related human populations, we show that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.
ABSTRACT
African antelope diversity is a globally unique vestige of a much richer world-wide Pleistocene megafauna. Despite this, the evolutionary processes leading to the prolific radiation of African antelopes are not well understood. Here, we sequenced 145 whole genomes from both subspecies of the waterbuck (Kobus ellipsiprymnus), an African antelope believed to be in the process of speciation. We investigated genetic structure and population divergence and found evidence of a mid-Pleistocene separation on either side of the eastern Great Rift Valley, consistent with vicariance caused by a rain shadow along the so-called 'Kingdon's Line'. However, we also found pervasive evidence of both recent and widespread historical gene flow across the Rift Valley barrier. By inferring the genome-wide landscape of variation among subspecies, we found 14 genomic regions of elevated differentiation, including a locus that may be related to each subspecies' distinctive coat pigmentation pattern. We investigated these regions as candidate speciation islands. However, we observed no significant reduction in gene flow in these regions, nor any indications of selection against hybrids. Altogether, these results suggest a pattern whereby climatically driven vicariance is the most important process driving the African antelope radiation, and suggest that reproductive isolation may not set in until very late in the divergence process. This has a significant impact on taxonomic inference, as many taxa will be in a gray area of ambiguous systematic status, possibly explaining why it has been hard to achieve consensus regarding the species status of many African antelopes. Our analyses demonstrate how population genetics based on low-depth whole genome sequencing can provide new insights that can help resolve how far lineages have gone along the path to speciation.
ABSTRACT
Impalas are unusual among bovids because they have remained morphologically similar over millions of years, a phenomenon referred to as evolutionary stasis. Here, we sequenced 119 whole genomes from the two extant subspecies of impala, the common (Aepyceros melampus melampus) and black-faced (A. m. petersi) impala. We investigated the evolutionary forces working within the species to explore how they might be associated with its evolutionary stasis as a taxon. Despite being one of the most abundant bovid species, we found low genetic diversity overall, and a phylogeographic signal of spatial expansion from southern to eastern Africa. Contrary to expectations under a scenario of evolutionary stasis, we found pronounced genetic structure between and within the two subspecies with indications of ancient, but not recent, gene flow. Black-faced impala and eastern African common impala populations had more runs of homozygosity than common impala in southern Africa, and, using a proxy for genetic load, we found that natural selection is working less efficiently in these populations compared to the southern African populations. Together with the fossil record, our results are consistent with a fixed-optimum model of evolutionary stasis, in which impalas in the southern African core of the range are able to stay near their evolutionary fitness optimum as a generalist ecotone species, whereas eastern African impalas may struggle to do so due to the effects of genetic drift and reduced adaptation to the local habitat, leading to recurrent local extinction in eastern Africa and re-colonisation from the South.
Subject(s)
Gene Flow , Genetic Variation , Genetics, Population , Phylogeography , Selection, Genetic , Animals , Biological Evolution , Phylogeny
ABSTRACT
Genomic studies of species threatened by extinction are providing crucial information about evolutionary mechanisms and genetic consequences of population declines and bottlenecks. However, to understand how species avoid the extinction vortex, insights can be drawn by studying species that thrive despite past declines. Here, we studied the population genomics of the muskox (Ovibos moschatus), an Ice Age relict that was at the brink of extinction for thousands of years at the end of the Pleistocene yet appears to be thriving today. We analysed 108 whole genomes, including present-day individuals representing the current native range of both muskox subspecies, the white-faced and the barren-ground muskox (O. moschatus wardi and O. moschatus moschatus) and a ~21,000-year-old ancient individual from Siberia. We found that the muskox' demographic history was profoundly shaped by past climate changes and post-glacial re-colonizations. In particular, the white-faced muskox has the lowest genome-wide heterozygosity recorded in an ungulate. Yet, there is no evidence of inbreeding depression in native muskox populations. We hypothesize that this can be explained by the effect of long-term gradual population declines that allowed for purging of strongly deleterious mutations. This study provides insights into how species with a history of population bottlenecks, small population sizes and low genetic diversity survive against all odds.
Subject(s)
Metagenomics , Resilience, Psychological , Humans , Animals , Infant, Newborn , Biological Evolution , Genomics , Ruminants/genetics , Genetic Variation/genetics
ABSTRACT
Despite broad agreement that the Americas were initially populated via Beringia, the land bridge that connected far northeast Asia with northwestern North America during the Pleistocene epoch, when and how the peopling of the Americas occurred remains unresolved. Analyses of human remains from Late Pleistocene Alaska are important to resolving the timing and dispersal of these populations. The remains of two infants were recovered at Upward Sun River (USR), and have been dated to around 11.5 thousand years ago (ka). Here, by sequencing the USR1 genome to an average coverage of approximately 17 times, we show that USR1 is most closely related to Native Americans, but falls basal to all previously sequenced contemporary and ancient Native Americans. As such, USR1 represents a distinct Ancient Beringian population. Using demographic modelling, we infer that the Ancient Beringian population and ancestors of other Native Americans descended from a single founding population that initially split from East Asians around 36 ± 1.5 ka, with gene flow persisting until around 25 ± 1.1 ka. Gene flow from ancient north Eurasians into all Native Americans took place 25-20 ka, with Ancient Beringians branching off around 22-18.1 ka. Our findings support a long-term genetic structure in ancestral Native Americans, consistent with the Beringian 'standstill model'. We show that the basal northern and southern Native American branches, to which all other Native Americans belong, diverged around 17.5-14.6 ka, and that this probably occurred south of the North American ice sheets. We also show that after 11.5 ka, some of the northern Native American populations received gene flow from a Siberian population most closely related to Koryaks, but not Palaeo-Eskimos, Inuits or Kets, and that Native American gene flow into Inuits was through northern and not southern Native American groups. 
Our findings further suggest that the far-northern North American presence of northern Native Americans is from a back migration that replaced or absorbed the initial founding population of Ancient Beringians.
Subject(s)
Founder Effect , Genome, Human/genetics , Indians, North American/genetics , Models, Genetic , Phylogeny , Alaska , Asia, Eastern/ethnology , Gene Flow , Genetics, Population , History, Ancient , Human Migration , Humans , Infant , Rivers , Siberia/ethnology , Time Factors
ABSTRACT
African wild pigs have a contentious evolutionary and biogeographic history. Until recently, desert warthog (Phacochoerus aethiopicus) and common warthog (P. africanus) were considered a single species. Molecular evidence surprisingly suggested they diverged at least 4.4 million years ago, and possibly outside of Africa. We sequenced the first whole-genomes of four desert warthogs and 35 common warthogs from throughout their range. We show that these two species diverged much later than previously estimated, 400,000-1,700,000 years ago depending on assumptions of gene flow. This brings it into agreement with the paleontological record. We found that the common warthog originated in western Africa and subsequently colonized eastern and southern Africa. During this range expansion, the common warthog interbred with the desert warthog, presumably in eastern Africa, underlining this region's importance in African biogeography. We found that immune system-related genes may have adaptively introgressed into common warthogs, indicating that resistance to novel diseases was one of the most potent drivers of evolution as common warthogs expanded their range. Hence, we solve some of the key controversies surrounding warthog evolution and reveal a complex evolutionary history involving range expansion, introgression, and adaptation to new diseases.
Subject(s)
Disease Resistance , Swine Diseases , Africa , Africa, Eastern , Animals , Base Sequence , Disease Resistance/genetics , Swine
ABSTRACT
BACKGROUND & AIMS: The sucrase-isomaltase (SI) c.273_274delAG loss-of-function variant is common in Arctic populations and causes congenital sucrase-isomaltase deficiency, which is an inability to break down and absorb sucrose and isomaltose. Children with this condition experience gastrointestinal symptoms when dietary sucrose is introduced. We aimed to describe the health of adults with sucrase-isomaltase deficiency. METHODS: The association between c.273_274delAG and phenotypes related to metabolic health was assessed in 2 cohorts of Greenlandic adults (n = 4922 and n = 1629). A sucrase-isomaltase knockout (Sis-KO) mouse model was used to further elucidate the findings. RESULTS: Homozygous carriers of the variant had a markedly healthier metabolic profile than the remaining population, including lower body mass index (β [standard error], -2.0 [0.5] kg/m²; P = 3.1 × 10⁻⁵), body weight (-4.8 [1.4] kg; P = 5.1 × 10⁻⁴), fat percentage (-3.3% [1.0%]; P = 3.7 × 10⁻⁴), fasting triglyceride (-0.27 [0.07] mmol/L; P = 2.3 × 10⁻⁶), and remnant cholesterol (-0.11 [0.03] mmol/L; P = 4.2 × 10⁻⁵). Further analyses suggested that this was likely mediated partly by higher circulating levels of acetate observed in homozygous carriers (β [standard error], 0.056 [0.002] mmol/L; P = 2.1 × 10⁻²⁶), and partly by reduced sucrose uptake, but not lower caloric intake. These findings were verified in Sis-KO mice, which, compared with wild-type mice, were leaner on a sucrose-containing diet, despite similar caloric intake, had significantly higher plasma acetate levels in response to a sucrose gavage, and had lower plasma glucose level in response to a sucrose-tolerance test. CONCLUSIONS: These results suggest that sucrase-isomaltase constitutes a promising drug target for improvement of metabolic health, and that the health benefits are mediated by reduced dietary sucrose uptake and possibly also by higher levels of circulating acetate.
Subject(s)
Dietary Sucrose , Sucrase-Isomaltase Complex , Acetates , Animals , Carbohydrate Metabolism, Inborn Errors , Dietary Sucrose/adverse effects , Humans , Mice , Oligo-1,6-Glucosidase , Sucrase-Isomaltase Complex/deficiency , Sucrase-Isomaltase Complex/genetics , Sucrase-Isomaltase Complex/metabolism
ABSTRACT
The iconic Cape buffalo has experienced several documented population declines in recent history. These declines have been largely attributed to the late 19th century rinderpest pandemic. However, the effect of the rinderpest pandemic on their genetic diversity remains contentious, and other factors that have potentially affected this diversity include environmental changes during the Pleistocene, range expansions and recent human activity. Motivated by this, we present analyses of whole genome sequencing data from 59 individuals from across the Cape buffalo range to assess present-day levels of genome-wide genetic diversity and what factors have influenced these levels. We found that the Cape buffalo has high average heterozygosity overall (0.40%), with the two southernmost populations having significantly lower heterozygosity levels (0.33% and 0.29%) on par with that of the domesticated water buffalo (0.29%). Interestingly, we found that these lower levels are probably due to recent inbreeding (average fraction of runs of homozygosity 23.7% and 19.9%) rather than factors further back in time during the Pleistocene. Moreover, detailed investigations of recent demographic history show that events across the past three centuries were the main drivers of the exceptional loss of genetic diversity in the southernmost populations, coincident with the onset of colonialism in the southern extreme of the Cape buffalo range. Hence, our results add to the growing body of studies suggesting that multiple recent human-mediated impacts during the colonial period caused massive losses of large mammal abundance in southern Africa.
Subject(s)
Genetics, Population , Rinderpest , Animals , Humans , South Africa , Genetic Variation , Buffaloes/genetics , Colonialism
ABSTRACT
The genetic architecture of the small and isolated Greenlandic population is advantageous for identification of novel genetic variants associated with cardio-metabolic traits. We aimed to identify genetic loci associated with body mass index (BMI), to expand the knowledge of the genetic and biological mechanisms underlying obesity. Stage 1 BMI-association analyses were performed in 4,626 Greenlanders. Stage 2 replication and meta-analysis were performed in additional cohorts comprising 1,058 Yup'ik Alaska Native people, and 1,529 Greenlanders. Obesity-related traits were assessed in the stage 1 study population. We identified a common variant on chromosome 11, rs4936356, where the derived G-allele had a frequency of 24% in the stage 1 study population. The derived allele was genome-wide significantly associated with lower BMI (beta (SE), -0.14 SD (0.03), p = 3.2 × 10⁻⁸), corresponding to 0.64 kg/m² lower BMI per G allele in the stage 1 study population. We observed a similar effect in the Yup'ik cohort (-0.09 SD, p = 0.038), and a non-significant effect in the same direction in the independent Greenlandic stage 2 cohort (-0.03 SD, p = 0.514). The association remained genome-wide significant in meta-analysis of the Arctic cohorts (-0.10 SD (0.02), p = 4.7 × 10⁻⁸). Moreover, the variant was associated with a leaner body type (weight, -1.68 (0.37) kg; waist circumference, -1.52 (0.33) cm; hip circumference, -0.85 (0.24) cm; lean mass, -0.84 (0.19) kg; fat mass and percent, -1.66 (0.33) kg and -1.39 (0.27) %; visceral adipose tissue, -0.30 (0.07) cm; subcutaneous adipose tissue, -0.16 (0.05) cm, all p<0.0002), lower insulin resistance (HOMA-IR, -0.12 (0.04), p = 0.00021), and favorable lipid levels (triglyceride, -0.05 (0.02) mmol/l, p = 0.025; HDL-cholesterol, 0.04 (0.01) mmol/l, p = 0.0015). 
In conclusion, we identified a novel variant whose derived G-allele is possibly associated with lower BMI in Arctic populations and, as a consequence, with a leaner body type, lower insulin resistance, and a favorable lipid profile.
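The reported numbers can be cross-checked with a back-of-envelope calculation. The sketch below is not the authors' analysis: it assumes a plain fixed-effect, inverse-variance meta-analysis over all three cohorts and recovers each standard error from the rounded effect size and two-sided p-value under a normal approximation. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def se_from_p(beta, p):
    """Back out a standard error from an effect size and a two-sided p-value."""
    z = -STD_NORMAL.inv_cdf(p / 2.0)  # |z| that corresponds to the two-sided p
    return abs(beta) / z

def fixed_effect_meta(effects):
    """Inverse-variance weighted (fixed-effect) meta-analysis of (beta, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in effects]
    beta = sum(w * b for w, (b, _) in zip(weights, effects)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    p = 2.0 * STD_NORMAL.cdf(-abs(beta / se))
    return beta, se, p

# (beta in SD units, two-sided p) as reported in the abstract
reported = {
    "Greenland stage 1": (-0.14, 3.2e-8),
    "Yup'ik": (-0.09, 0.038),
    "Greenland stage 2": (-0.03, 0.514),
}
effects = [(b, se_from_p(b, p)) for b, p in reported.values()]
meta_beta, meta_se, meta_p = fixed_effect_meta(effects)

# BMI standard deviation implied by "-0.14 SD corresponds to 0.64 kg/m² lower BMI"
implied_sd = 0.64 / 0.14

print(f"meta beta = {meta_beta:.3f} SD, se = {meta_se:.3f}, p = {meta_p:.1e}")
print(f"implied BMI SD = {implied_sd:.2f} kg/m^2")
```

Under these assumptions the pooled estimate should land close to the reported meta-analysis values; small differences are expected because the inputs are rounded and the published analysis may have used a different model.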
Subject(s)
Body Mass Index , Chromosomes, Human, Pair 11/genetics , Inuit/genetics , Polymorphism, Single Nucleotide , Adiposity , Cholesterol/blood , DNA, Intergenic/genetics , Female , Greenland , Humans , Insulin Resistance , Male , Metabolome , Waist Circumference
ABSTRACT
MOTIVATION: Principal component analysis (PCA) is a commonly used tool in genetics to capture and visualize population structure. Due to technological advances in sequencing, such as the widely used non-invasive prenatal test, massive datasets of ultra-low coverage sequencing are being generated. These datasets are characterized by having a large amount of missing genotype information. RESULTS: We present EMU, a method for inferring population structure in the presence of rampant non-random missingness. We show through simulations that several commonly used PCA methods cannot handle missing data arising from various sources, which leads to biased results as individuals are projected into the PC space based on their amount of missingness. In terms of accuracy, EMU outperforms an existing method that also accommodates missingness while being competitively fast. We further tested EMU on around 100K individuals of the Phase 1 dataset of the Chinese Millionome Project, who were shallowly sequenced to around 0.08×. From these data we are able to capture the population structure of the Han Chinese and to reproduce previous analysis in a matter of CPU hours instead of CPU years. EMU's capability to accurately infer population structure in the presence of missingness will be of increasing importance with the rising number of large-scale genetic datasets. AVAILABILITY AND IMPLEMENTATION: EMU is written in Python and is freely available at https://github.com/rosemeis/emu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
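The bias described above, where individuals are placed in PC space according to their amount of missingness, can be illustrated with a toy calculation. This is not EMU's algorithm. It only shows what happens when missing genotypes are mean-imputed before projection, and all data values and function names are hypothetical.

```python
def projection(genotypes, loadings, means):
    """Project one individual onto a PC, mean-imputing missing genotypes (None)."""
    total = 0.0
    for g, v, m in zip(genotypes, loadings, means):
        centered = 0.0 if g is None else g - m  # mean imputation is 0 after centering
        total += centered * v
    return total

def mask(genotypes, missing_fraction):
    """Deterministically blank out the first `missing_fraction` of the sites."""
    n_missing = int(round(missing_fraction * len(genotypes)))
    return [None] * n_missing + genotypes[n_missing:]

n_sites = 1000
means = [1.0] * n_sites                 # hypothetical per-site mean genotype
loadings = [n_sites ** -0.5] * n_sites  # hypothetical unit-length PC loadings
individual = [2.0] * n_sites            # hypothetical genotypes, all above the mean

full = projection(individual, loadings, means)
half = projection(mask(individual, 0.5), loadings, means)
sparse = projection(mask(individual, 0.9), loadings, means)

for label, score in (("0%", full), ("50%", half), ("90%", sparse)):
    print(f"missing {label}: score = {score:.2f} ({score / full:.0%} of the full-data score)")
```

With equal loadings, the score shrinks toward the origin in direct proportion to the fraction of missing sites, so two individuals with identical ancestry but different coverage end up at different positions.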
ABSTRACT
The population history of Aboriginal Australians remains largely uncharacterized. Here we generate high-coverage genomes for 83 Aboriginal Australians (speakers of Pama-Nyungan languages) and 25 Papuans from the New Guinea Highlands. We find that Papuan and Aboriginal Australian ancestors diversified 25-40 thousand years ago (kya), suggesting pre-Holocene population structure in the ancient continent of Sahul (Australia, New Guinea and Tasmania). However, all of the studied Aboriginal Australians descend from a single founding population that differentiated ~10-32 kya. We infer a population expansion in northeast Australia during the Holocene epoch (past 10,000 years) associated with limited gene flow from this region to the rest of Australia, consistent with the spread of the Pama-Nyungan languages. We estimate that Aboriginal Australians and Papuans diverged from Eurasians 51-72 kya, following a single out-of-Africa dispersal, and subsequently admixed with archaic populations. Finally, we report evidence of selection in Aboriginal Australians potentially associated with living in the desert.
Subject(s)
Genome, Human/genetics , Genomics , Native Hawaiian or Other Pacific Islander/genetics , Phylogeny , Racial Groups/genetics , Africa/ethnology , Australia , Datasets as Topic , Desert Climate , Gene Flow , Genetics, Population , History, Ancient , Human Migration/history , Humans , Language , New Guinea , Population Dynamics , Tasmania
ABSTRACT
BACKGROUND: Identification of selection signatures between populations is often an important part of a population genetic study. With high-throughput DNA sequencing, larger sample sizes of populations with similar ancestries have become increasingly common. This has led to the need for methods capable of identifying signals of selection in populations with a continuous cline of genetic differentiation. Individuals from continuous populations are inherently challenging to group into meaningful units, which is why existing methods rely on principal component analysis for inference of the selection signals. These existing methods require called genotypes as input, which is problematic for studies based on low-coverage sequencing data. MATERIALS AND METHODS: We have extended two principal component analysis-based selection statistics to genotype likelihood data and applied them to low-coverage sequencing data from the 1000 Genomes Project for populations with European and East Asian ancestry to detect signals of selection in samples with continuous population structure. RESULTS: Here, we present two selection statistics which we have implemented in the PCAngsd framework. These methods account for genotype uncertainty, opening the opportunity to conduct selection scans in continuous populations from low and/or variable coverage sequencing data. To illustrate their use, we applied the methods to low-coverage sequencing data from human populations of East Asian and European ancestries and show that the implemented selection statistics can control the false positive rate and that they identify the same signatures of selection from low-coverage sequencing data as state-of-the-art software using high quality called genotypes. CONCLUSION: We show that selection scans of low-coverage sequencing data of populations with similar ancestry perform on par with those obtained from high quality genotype data. 
Moreover, we demonstrate that PCAngsd outperforms selection statistics obtained from called genotypes from low-coverage sequencing data without the need for ad-hoc filtering.
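To make the idea of a PCA-based selection statistic concrete, the following sketch shows one common form: the squared projection of a standardized variant onto a unit-length principal component, compared against a chi-square distribution with one degree of freedom. The abstract does not spell out the exact statistics implemented in PCAngsd, so this is an illustrative assumption. The function names and toy data are hypothetical, and the inputs may be expected genotype dosages rather than called genotypes.

```python
from math import erfc, sqrt

def pc_selection_stat(dosages, freq, pc):
    """Squared projection of one standardized variant onto a unit-length PC.

    dosages: per-individual genotype dosages in [0, 2]
    freq:    allele frequency of the variant
    pc:      per-individual PC coordinates, scaled to unit length
    """
    sd = sqrt(2.0 * freq * (1.0 - freq))
    standardized = [(d - 2.0 * freq) / sd for d in dosages]
    proj = sum(z * v for z, v in zip(standardized, pc))
    return proj * proj

def chi2_1df_pvalue(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(stat / 2.0))

# Toy example: the variant separates individuals exactly along the PC
dosages = [2.0, 2.0, 0.0, 0.0]
pc = [0.5, 0.5, -0.5, -0.5]
stat = pc_selection_stat(dosages, freq=0.5, pc=pc)
pval = chi2_1df_pvalue(stat)
print(f"statistic = {stat:.2f}, p = {pval:.4f}")
```

Variants whose frequencies track a PC much more strongly than expected under drift yield large statistics, which is what allows a scan without grouping individuals into discrete populations.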
Subject(s)
Genetics, Population , High-Throughput Nucleotide Sequencing , Genome , Genotype , Humans , Polymorphism, Single Nucleotide , Principal Component Analysis
ABSTRACT
AIMS/HYPOTHESIS: The common muscle-specific TBC1D4 p.Arg684Ter loss-of-function variant defines a subtype of non-autoimmune diabetes in Arctic populations. Homozygous carriers are characterised by elevated postprandial glucose and insulin levels. Because 3.8% of the Greenlandic population are homozygous carriers, it is important to explore possibilities for precision medicine. We aimed to investigate whether physical activity attenuates the effect of this variant on 2 h plasma glucose levels after an oral glucose load. METHODS: In a Greenlandic population cohort (n = 2655), 2 h plasma glucose levels were obtained after an OGTT, physical activity was estimated as physical activity energy expenditure and TBC1D4 genotype was determined. We performed TBC1D4-physical activity interaction analysis, applying a linear mixed model to correct for genetic admixture and relatedness. RESULTS: Physical activity was inversely associated with 2 h plasma glucose levels (β[main effect of physical activity] -0.0033 [mmol/l] / [kJ kg⁻¹ day⁻¹], p = 6.5 × 10⁻⁵), and significantly more so among homozygous carriers of the TBC1D4 risk variant compared with heterozygous carriers and non-carriers (β[interaction] -0.015 [mmol/l] / [kJ kg⁻¹ day⁻¹], p = 0.0085). The estimated effect size suggests that 1 h of vigorous physical activity per day (compared with resting) reduces 2 h plasma glucose levels by an additional ~0.7 mmol/l in homozygous carriers of the risk variant. CONCLUSIONS/INTERPRETATION: Physical activity improves glucose homeostasis particularly in homozygous TBC1D4 risk variant carriers via a skeletal muscle TBC1 domain family member 4-independent pathway. This provides a rationale to implement physical activity as lifestyle precision medicine in Arctic populations. 
DATA REPOSITORY: The Greenlandic Cardio-Metabochip data for the Inuit Health in Transition study has been deposited at the European Genome-phenome Archive ( https://www.ebi.ac.uk/ega/dacs/EGAC00001000736 ) under accession EGAD00010001428.
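The effect sizes above can be read as slopes of a simple linear model, which makes the "additional ~0.7 mmol/l" statement easy to reproduce. The sketch below only rearranges the reported coefficients. It is not the linear mixed model used in the study, and the function name is hypothetical.

```python
MAIN_EFFECT = -0.0033  # mmol/l per kJ/kg/day, reported main effect of physical activity
INTERACTION = -0.015   # additional mmol/l per kJ/kg/day in homozygous carriers

def glucose_change(paee_increase, homozygous):
    """Predicted change in 2 h plasma glucose for a given increase in
    physical activity energy expenditure (kJ/kg/day)."""
    slope = MAIN_EFFECT + (INTERACTION if homozygous else 0.0)
    return slope * paee_increase

# Energy expenditure increase implied by an additional 0.7 mmol/l reduction
implied_paee = -0.7 / INTERACTION

change_homozygous = glucose_change(implied_paee, homozygous=True)
change_others = glucose_change(implied_paee, homozygous=False)

print(f"implied increase: {implied_paee:.1f} kJ/kg/day")
print(f"homozygous carriers: {change_homozygous:.2f} mmol/l")
print(f"other genotypes:     {change_others:.2f} mmol/l")
```

Read this way, the reported ~0.7 mmol/l corresponds to roughly 47 kJ/kg/day of extra energy expenditure, and the main effect adds a smaller reduction that applies to all genotypes.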
Subject(s)
Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/physiopathology , Exercise/physiology , GTPase-Activating Proteins/genetics , Hyperglycemia/prevention & control , Loss of Function Mutation/genetics , Postprandial Period/physiology , Adult , Blood Glucose/metabolism , Diabetes Mellitus, Type 2/blood , Female , Gene-Environment Interaction , Genetic Predisposition to Disease , Genotyping Techniques , Glucose Tolerance Test , Greenland/epidemiology , Humans , Hyperglycemia/genetics , Insulin/blood , Inuit/genetics , Life Style , Male , Middle Aged
ABSTRACT
MOTIVATION: The presence of present-day human contaminating DNA fragments is one of the challenges defining ancient DNA (aDNA) research. This is especially relevant to the ancient human DNA field where it is difficult to distinguish endogenous molecules from human contaminants due to their genetic similarity. Recently, with the advent of high-throughput sequencing and new aDNA protocols, hundreds of ancient human genomes have become available. Contamination in those genomes has been measured with computational methods often developed specifically for these empirical studies. Consequently, some of these methods have not been implemented and tested for general use while few are aimed at low-depth nuclear data, a common feature in aDNA datasets. RESULTS: We develop a new X-chromosome-based maximum likelihood method for estimating present-day human contamination in low-depth sequencing data from male individuals. We implement our method for general use, assess its performance under conditions typical of ancient human DNA research, and compare it to previous nuclear data-based methods through extensive simulations. For low-depth data, we show that existing methods can produce unusable estimates or substantially underestimate contamination. In contrast, our method provides accurate estimates for a depth of coverage as low as 0.5× on the X-chromosome when contamination is below 25%. Moreover, our method still yields meaningful estimates in very challenging situations, i.e. when the contaminant and the target come from closely related populations or with increased error rates. With a running time below 5 min, our method is applicable to large scale aDNA genomic studies. AVAILABILITY AND IMPLEMENTATION: The method is implemented in C++ and R and is available in github.com/sapfo/contaminationX and popgen.dk/angsd.
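The principle behind an X-chromosome-based estimate is that males carry a single X, so reads showing a second allele at a polymorphic site must come from contamination or sequencing error. The sketch below is a deliberately simplified toy model and not the published method: it assumes a known, non-zero per-base error rate, independent sites with known population allele frequencies, and a grid search instead of a proper optimizer. All names and data are hypothetical.

```python
from math import log

def alt_read_prob(c, freq, err):
    """Probability that a read shows the non-endogenous allele at one site.

    c:    contamination fraction
    freq: population frequency of the non-endogenous allele among contaminants
    err:  per-base sequencing error rate (must be > 0 in this sketch)
    """
    true_alt = c * freq  # read comes from a contaminant carrying the other allele
    return true_alt * (1.0 - err) + (1.0 - true_alt) * err

def log_likelihood(c, sites, err):
    """Binomial log-likelihood (up to a constant) summed over sites."""
    ll = 0.0
    for freq, n_reads, n_alt in sites:
        p = alt_read_prob(c, freq, err)
        ll += n_alt * log(p) + (n_reads - n_alt) * log(1.0 - p)
    return ll

def estimate_contamination(sites, err, step=0.001, c_max=0.5):
    """Maximum likelihood estimate of c over a regular grid on [0, c_max]."""
    grid = [i * step for i in range(int(round(c_max / step)) + 1)]
    return max(grid, key=lambda c: log_likelihood(c, sites, err))

# Toy data: (allele frequency, total reads, reads with the non-endogenous allele),
# constructed to match a contamination fraction of 0.10 with err = 0.01
sites = [(0.5, 10000, 590), (0.2, 10000, 296), (0.4, 10000, 492)]
estimate = estimate_contamination(sites, err=0.01)
print(f"estimated contamination: {estimate:.3f}")
```

Real low-depth data have far fewer reads per site, which is why pooling information across many polymorphic sites in a likelihood framework matters.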
Subject(s)
DNA, Ancient , High-Throughput Nucleotide Sequencing , Chromosomes , Humans , Likelihood Functions , Male , Sequence Analysis, DNA
ABSTRACT
Grant's gazelles have recently been proposed to be a species complex comprising three highly divergent mtDNA lineages (Nanger granti, N. notata and N. petersii). The three lineages have nonoverlapping distributions in East Africa, but without any obvious geographical divisions, making them an interesting model for studying the early-stage evolutionary dynamics of allopatric speciation in detail. Here, we use genomic data obtained by restriction site-associated (RAD) sequencing of 106 gazelle individuals to shed light on the evolutionary processes underlying Grant's gazelle divergence, to characterize their genetic structure and to assess the presence of gene flow between the main lineages in the species complex. We date the species divergence to 134,000 years ago, which is recent in evolutionary terms. We find population subdivision within N. granti, which coincides with the previously suggested two subspecies, N. g. granti and N. g. robertsii. Moreover, these two lineages seem to have hybridized in Masai Mara. Perhaps more surprisingly given their extreme genetic differentiation, N. granti and N. petersii also show signs of prolonged admixture in Mkomazi, which we identified as a hybrid population most likely founded by allopatric lineages coming into secondary contact. Despite the admixed composition of this population, elevated X chromosomal differentiation suggests that selection may be shaping the outcome of hybridization in this population. Our results therefore provide detailed insights into the processes of allopatric speciation and secondary contact in a recently radiated species complex.
Subject(s)
Antelopes , Gene Flow , Africa, Eastern , Animals , Antelopes/genetics , DNA, Mitochondrial/genetics , Genetic Speciation , Hybridization, Genetic , Phylogeny
ABSTRACT
Kennewick Man, referred to as the Ancient One by Native Americans, is a male human skeleton discovered in Washington state (USA) in 1996 and initially radiocarbon dated to 8,340-9,200 calibrated years before present (BP). His population affinities have been the subject of scientific debate and legal controversy. Based on an initial study of cranial morphology it was asserted that Kennewick Man was neither Native American nor closely related to the claimant Plateau tribes of the Pacific Northwest, who claimed ancestral relationship and requested repatriation under the Native American Graves Protection and Repatriation Act (NAGPRA). The morphological analysis was important to judicial decisions that Kennewick Man was not Native American and that therefore NAGPRA did not apply. Instead of repatriation, additional studies of the remains were permitted. Subsequent craniometric analysis affirmed Kennewick Man to be more closely related to circumpacific groups such as the Ainu and Polynesians than he is to modern Native Americans. In order to resolve Kennewick Man's ancestry and affiliations, we have sequenced his genome to ~1× coverage and compared it to worldwide genomic data including for the Ainu and Polynesians. We find that Kennewick Man is closer to modern Native Americans than to any other population worldwide. Among the Native American groups for whom genome-wide data are available for comparison, several seem to be descended from a population closely related to that of Kennewick Man, including the Confederated Tribes of the Colville Reservation (Colville), one of the five tribes claiming Kennewick Man. We revisit the cranial analyses and find that, as opposed to genome-wide comparisons, it is not possible on that basis to affiliate Kennewick Man to specific contemporary groups. We therefore conclude based on genetic comparisons that Kennewick Man shows continuity with Native North Americans over at least the last eight millennia.