Results 1 - 20 of 127
1.
Cell ; 175(2): 347-359.e14, 2018 10 04.
Article in English | MEDLINE | ID: mdl-30290141

ABSTRACT

We analyze whole-genome sequencing data from 141,431 Chinese women generated for non-invasive prenatal testing (NIPT). We use these data to characterize the population genetic structure and to investigate genetic associations with maternal and infectious traits. We show that the present day distribution of alleles is a function of both ancient migration and very recent population movements. We reveal novel phenotype-genotype associations, including several replicated associations with height and BMI, an association between maternal age and EMB, and between twin pregnancy and NRG1. Finally, we identify a unique pattern of circulating viral DNA in plasma with high prevalence of hepatitis B and other clinically relevant maternal infections. A GWAS for viral infections identifies an exceptionally strong association between integrated herpesvirus 6 and MOV10L1, which affects piwi-interacting RNA (piRNA) processing and PIWI protein function. These findings demonstrate the great value and potential of accumulating NIPT data for worldwide medical and genetic analyses.


Subjects
Asian People/genetics , Prenatal Diagnosis/methods , Adult , Alleles , China , DNA/genetics , Ethnicity/genetics , Female , Gene Frequency/genetics , Genetic Testing , Genetic Variation/genetics , Population Genetics/methods , Genome-Wide Association Study/methods , Genomics/methods , Human Migration , Humans , Pregnancy , DNA Sequence Analysis
2.
Genome Res ; 33(9): 1599-1608, 2023 09.
Article in English | MEDLINE | ID: mdl-37620119

ABSTRACT

Principal component analysis (PCA) is widely used in statistics, machine learning, and genomics for dimensionality reduction and uncovering low-dimensional latent structure. To address the challenges posed by ever-growing data size, fast and memory-efficient PCA methods have gained prominence. In this paper, we propose a novel randomized singular value decomposition (RSVD) algorithm implemented in PCAone, featuring a window-based optimization scheme that enables accelerated convergence while improving the accuracy. Additionally, PCAone incorporates out-of-core and multithreaded implementations for the existing Implicitly Restarted Arnoldi Method (IRAM) and RSVD. Through comprehensive evaluations using multiple large-scale real-world data sets in different fields, we show the advantage of PCAone over existing methods. The new algorithm achieves significantly faster computation time while maintaining accuracy comparable to the slower IRAM method. Notably, our analyses of UK Biobank, comprising around 0.5 million individuals and 6.1 million common single nucleotide polymorphisms, show that PCAone accurately computes the top 40 principal components within 9 h. This analysis effectively captures population structure, signals of selection, structural variants, and low recombination regions, utilizing <20 GB of memory and 20 CPU threads. Furthermore, when applied to single-cell RNA sequencing data featuring 1.3 million cells, PCAone accurately captures the top 40 principal components in 49 min. This performance represents a 10-fold improvement over state-of-the-art tools.


Subjects
Biological Specimen Banks , Software , Humans , Principal Component Analysis , Algorithms , Genomics
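
Illustrative note on entry 2: the abstract refers to randomized singular value decomposition (RSVD) as the basis of the new algorithm. The sketch below shows only the generic, textbook form of RSVD (random projection, a few power iterations, then a small exact SVD) and is not PCAone's window-based scheme, which the abstract does not describe in detail. The function name `randomized_svd` and its parameters `oversample` and `power_iters` are assumptions made for this example. For PCA, the input matrix would normally be column-centered first.

```python
import numpy as np


def randomized_svd(A, k, oversample=10, power_iters=2, seed=0):
    """Approximate the top-k SVD of A with a generic randomized scheme.

    Hypothetical helper for illustration; not the PCAone implementation.
    """
    rng = np.random.default_rng(seed)
    n_cols = A.shape[1]
    width = min(n_cols, k + oversample)

    # Random test matrix: A @ omega samples the dominant column space of A.
    omega = rng.standard_normal((n_cols, width))
    Q, _ = np.linalg.qr(A @ omega)

    # Power iterations sharpen the subspace; QR keeps it numerically stable.
    for _ in range(power_iters):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)

    # Exact SVD of the small projected matrix, then map back to full size.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k]


def demo_low_rank(seed=0):
    """Build an exactly rank-5 matrix and decompose it."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 80))
    return A, randomized_svd(A, k=5)
```

The cost is dominated by a handful of passes over `A` (one per multiplication), which is the property that makes this family of methods attractive for matrices too large to decompose exactly.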
3.
Nature ; 585(7825): 390-396, 2020 09.
Article in English | MEDLINE | ID: mdl-32939067

ABSTRACT

The maritime expansion of Scandinavian populations during the Viking Age (about AD 750-1050) was a far-flung transformation in world history. Here we sequenced the genomes of 442 humans from archaeological sites across Europe and Greenland (to a median depth of about 1×) to understand the global influence of this expansion. We find the Viking period involved gene flow into Scandinavia from the south and east. We observe genetic structure within Scandinavia, with diversity hotspots in the south and restricted gene flow within Scandinavia. We find evidence for a major influx of Danish ancestry into England; a Swedish influx into the Baltic; and Norwegian influx into Ireland, Iceland and Greenland. Additionally, we see substantial ancestry from elsewhere in Europe entering Scandinavia during the Viking Age. Our ancient DNA analysis also revealed that a Viking expedition included close family members. By comparing with modern populations, we find that pigmentation-associated loci have undergone strong population differentiation during the past millennium, and trace positively selected loci (including the lactase-persistence allele of LCT and alleles of ANKA that are associated with the immune response) in detail. We conclude that the Viking diaspora was characterized by substantial transregional engagement: distinct populations influenced the genomic makeup of different regions of Europe, and Scandinavia experienced increased contact with the rest of the continent.


Subjects
Gene Flow/genetics , Population Genetics , Human Genome/genetics , Genomics , Human Migration/history , Alleles , Datasets as Topic , England , Molecular Evolution , Greenland , Medieval History , Humans , Immunity/genetics , Ireland , Lactase/genetics , Lactase/metabolism , Male , Scandinavian and Nordic Countries , Genetic Selection , Spatio-Temporal Analysis , Young Adult
4.
Genome Res ; 2022 Jul 06.
Article in English | MEDLINE | ID: mdl-35794006

ABSTRACT

Accurate inference of population structure is important in many studies of population genetics. Here we present HaploNet, a method for performing dimensionality reduction and clustering of genetic data. The method is based on local clustering of phased haplotypes using neural networks from whole-genome sequencing or dense genotype data. By using Gaussian mixtures in a variational autoencoder framework, we are able to learn a low-dimensional latent space in which we cluster haplotypes along the genome in a highly scalable manner. We show that we can use haplotype clusters in the latent space to infer global population structure using haplotype information by exploiting the generative properties of our framework. Based on fitted neural networks and their latent haplotype clusters, we can perform principal component analysis and estimate ancestry proportions based on a maximum likelihood framework. Using sequencing data from simulations and closely related human populations, we show that our approach is better at distinguishing closely related populations than standard admixture and principal component analysis software. We further show that HaploNet is fast and highly scalable by applying it to genotype array data of the UK Biobank.

5.
Mol Ecol ; 33(2): e17205, 2024 Jan.
Article in English | MEDLINE | ID: mdl-37971141

ABSTRACT

Genomic studies of species threatened by extinction are providing crucial information about evolutionary mechanisms and genetic consequences of population declines and bottlenecks. However, to understand how species avoid the extinction vortex, insights can be drawn by studying species that thrive despite past declines. Here, we studied the population genomics of the muskox (Ovibos moschatus), an Ice Age relict that was at the brink of extinction for thousands of years at the end of the Pleistocene yet appears to be thriving today. We analysed 108 whole genomes, including present-day individuals representing the current native range of both muskox subspecies, the white-faced and the barren-ground muskox (O. moschatus wardi and O. moschatus moschatus) and a ~21,000-year-old ancient individual from Siberia. We found that the muskox' demographic history was profoundly shaped by past climate changes and post-glacial re-colonizations. In particular, the white-faced muskox has the lowest genome-wide heterozygosity recorded in an ungulate. Yet, there is no evidence of inbreeding depression in native muskox populations. We hypothesize that this can be explained by the effect of long-term gradual population declines that allowed for purging of strongly deleterious mutations. This study provides insights into how species with a history of population bottlenecks, small population sizes and low genetic diversity survive against all odds.


Subjects
Metagenomics , Psychological Resilience , Humans , Animals , Newborn Infant , Biological Evolution , Genomics , Ruminants/genetics , Genetic Variation/genetics
6.
Nature ; 553(7687): 203-207, 2018 01 11.
Article in English | MEDLINE | ID: mdl-29323294

ABSTRACT

Despite broad agreement that the Americas were initially populated via Beringia, the land bridge that connected far northeast Asia with northwestern North America during the Pleistocene epoch, when and how the peopling of the Americas occurred remains unresolved. Analyses of human remains from Late Pleistocene Alaska are important to resolving the timing and dispersal of these populations. The remains of two infants were recovered at Upward Sun River (USR), and have been dated to around 11.5 thousand years ago (ka). Here, by sequencing the USR1 genome to an average coverage of approximately 17 times, we show that USR1 is most closely related to Native Americans, but falls basal to all previously sequenced contemporary and ancient Native Americans. As such, USR1 represents a distinct Ancient Beringian population. Using demographic modelling, we infer that the Ancient Beringian population and ancestors of other Native Americans descended from a single founding population that initially split from East Asians around 36 ± 1.5 ka, with gene flow persisting until around 25 ± 1.1 ka. Gene flow from ancient north Eurasians into all Native Americans took place 25-20 ka, with Ancient Beringians branching off around 22-18.1 ka. Our findings support a long-term genetic structure in ancestral Native Americans, consistent with the Beringian 'standstill model'. We show that the basal northern and southern Native American branches, to which all other Native Americans belong, diverged around 17.5-14.6 ka, and that this probably occurred south of the North American ice sheets. We also show that after 11.5 ka, some of the northern Native American populations received gene flow from a Siberian population most closely related to Koryaks, but not Palaeo-Eskimos, Inuits or Kets, and that Native American gene flow into Inuits was through northern and not southern Native American groups. Our findings further suggest that the far-northern North American presence of northern Native Americans is from a back migration that replaced or absorbed the initial founding population of Ancient Beringians.


Subjects
Founder Effect , Human Genome/genetics , North American Indians/genetics , Genetic Models , Phylogeny , Alaska , East Asia/ethnology , Gene Flow , Population Genetics , Ancient History , Human Migration , Humans , Infant , Rivers , Siberia/ethnology , Time Factors
7.
Mol Biol Evol ; 39(7)2022 07 02.
Article in English | MEDLINE | ID: mdl-35779009

ABSTRACT

African wild pigs have a contentious evolutionary and biogeographic history. Until recently, desert warthog (Phacochoerus aethiopicus) and common warthog (P. africanus) were considered a single species. Molecular evidence surprisingly suggested they diverged at least 4.4 million years ago, and possibly outside of Africa. We sequenced the first whole genomes of four desert warthogs and 35 common warthogs from throughout their range. We show that these two species diverged much later than previously estimated, 400,000-1,700,000 years ago depending on assumptions of gene flow. This brings the divergence time into agreement with the paleontological record. We found that the common warthog originated in western Africa and subsequently colonized eastern and southern Africa. During this range expansion, the common warthog interbred with the desert warthog, presumably in eastern Africa, underlining this region's importance in African biogeography. We found that immune system-related genes may have adaptively introgressed into common warthogs, indicating that resistance to novel diseases was one of the most potent drivers of evolution as common warthogs expanded their range. Hence, we solve some of the key controversies surrounding warthog evolution and reveal a complex evolutionary history involving range expansion, introgression, and adaptation to new diseases.


Subjects
Disease Resistance , Swine Diseases , Africa , Eastern Africa , Animals , Base Sequence , Disease Resistance/genetics , Swine
8.
Gastroenterology ; 162(4): 1171-1182.e3, 2022 04.
Article in English | MEDLINE | ID: mdl-34914943

ABSTRACT

BACKGROUND & AIMS: The sucrase-isomaltase (SI) c.273_274delAG loss-of-function variant is common in Arctic populations and causes congenital sucrase-isomaltase deficiency, which is an inability to break down and absorb sucrose and isomaltose. Children with this condition experience gastrointestinal symptoms when dietary sucrose is introduced. We aimed to describe the health of adults with sucrase-isomaltase deficiency. METHODS: The association between c.273_274delAG and phenotypes related to metabolic health was assessed in 2 cohorts of Greenlandic adults (n = 4922 and n = 1629). A sucrase-isomaltase knockout (Sis-KO) mouse model was used to further elucidate the findings. RESULTS: Homozygous carriers of the variant had a markedly healthier metabolic profile than the remaining population, including lower body mass index (β [standard error], -2.0 [0.5] kg/m²; P = 3.1 × 10⁻⁵), body weight (-4.8 [1.4] kg; P = 5.1 × 10⁻⁴), fat percentage (-3.3% [1.0%]; P = 3.7 × 10⁻⁴), fasting triglyceride (-0.27 [0.07] mmol/L; P = 2.3 × 10⁻⁶), and remnant cholesterol (-0.11 [0.03] mmol/L; P = 4.2 × 10⁻⁵). Further analyses suggested that this was likely mediated partly by higher circulating levels of acetate observed in homozygous carriers (β [standard error], 0.056 [0.002] mmol/L; P = 2.1 × 10⁻²⁶), and partly by reduced sucrose uptake, but not lower caloric intake. These findings were verified in Sis-KO mice, which, compared with wild-type mice, were leaner on a sucrose-containing diet, despite similar caloric intake, had significantly higher plasma acetate levels in response to a sucrose gavage, and had lower plasma glucose level in response to a sucrose-tolerance test. CONCLUSIONS: These results suggest that sucrase-isomaltase constitutes a promising drug target for improvement of metabolic health, and that the health benefits are mediated by reduced dietary sucrose uptake and possibly also by higher levels of circulating acetate.


Subjects
Dietary Sucrose , Sucrase-Isomaltase Complex , Acetates , Animals , Inborn Errors of Carbohydrate Metabolism , Dietary Sucrose/adverse effects , Humans , Mice , Oligo-1,6-Glucosidase , Sucrase-Isomaltase Complex/deficiency , Sucrase-Isomaltase Complex/genetics , Sucrase-Isomaltase Complex/metabolism
9.
Mol Ecol ; 32(8): 1860-1874, 2023 04.
Article in English | MEDLINE | ID: mdl-36651275

ABSTRACT

The iconic Cape buffalo has experienced several documented population declines in recent history. These declines have been largely attributed to the late 19th century rinderpest pandemic. However, the effect of the rinderpest pandemic on their genetic diversity remains contentious, and other factors that have potentially affected this diversity include environmental changes during the Pleistocene, range expansions and recent human activity. Motivated by this, we present analyses of whole genome sequencing data from 59 individuals from across the Cape buffalo range to assess present-day levels of genome-wide genetic diversity and what factors have influenced these levels. We found that the Cape buffalo has high average heterozygosity overall (0.40%), with the two southernmost populations having significantly lower heterozygosity levels (0.33% and 0.29%) on par with that of the domesticated water buffalo (0.29%). Interestingly, we found that these lower levels are probably due to recent inbreeding (average fraction of runs of homozygosity 23.7% and 19.9%) rather than factors further back in time during the Pleistocene. Moreover, detailed investigations of recent demographic history show that events across the past three centuries were the main drivers of the exceptional loss of genetic diversity in the southernmost populations, coincident with the onset of colonialism in the southern extreme of the Cape buffalo range. Hence, our results add to the growing body of studies suggesting that multiple recent human-mediated impacts during the colonial period caused massive losses of large mammal abundance in southern Africa.


Subjects
Population Genetics , Rinderpest , Animals , Humans , South Africa , Genetic Variation , Buffaloes/genetics , Colonialism
10.
PLoS Genet ; 16(1): e1008544, 2020 01.
Article in English | MEDLINE | ID: mdl-31978080

ABSTRACT

The genetic architecture of the small and isolated Greenlandic population is advantageous for identification of novel genetic variants associated with cardio-metabolic traits. We aimed to identify genetic loci associated with body mass index (BMI), to expand the knowledge of the genetic and biological mechanisms underlying obesity. Stage 1 BMI-association analyses were performed in 4,626 Greenlanders. Stage 2 replication and meta-analysis were performed in additional cohorts comprising 1,058 Yup'ik Alaska Native people, and 1,529 Greenlanders. Obesity-related traits were assessed in the stage 1 study population. We identified a common variant on chromosome 11, rs4936356, where the derived G-allele had a frequency of 24% in the stage 1 study population. The derived allele was genome-wide significantly associated with lower BMI (beta (SE), -0.14 SD (0.03), p = 3.2 × 10⁻⁸), corresponding to 0.64 kg/m² lower BMI per G allele in the stage 1 study population. We observed a similar effect in the Yup'ik cohort (-0.09 SD, p = 0.038), and a non-significant effect in the same direction in the independent Greenlandic stage 2 cohort (-0.03 SD, p = 0.514). The association remained genome-wide significant in meta-analysis of the Arctic cohorts (-0.10 SD (0.02), p = 4.7 × 10⁻⁸). Moreover, the variant was associated with a leaner body type (weight, -1.68 (0.37) kg; waist circumference, -1.52 (0.33) cm; hip circumference, -0.85 (0.24) cm; lean mass, -0.84 (0.19) kg; fat mass and percent, -1.66 (0.33) kg and -1.39 (0.27) %; visceral adipose tissue, -0.30 (0.07) cm; subcutaneous adipose tissue, -0.16 (0.05) cm, all p<0.0002), lower insulin resistance (HOMA-IR, -0.12 (0.04), p = 0.00021), and favorable lipid levels (triglyceride, -0.05 (0.02) mmol/l, p = 0.025; HDL-cholesterol, 0.04 (0.01) mmol/l, p = 0.0015). In conclusion, we identified a novel variant, where the derived G-allele is possibly associated with lower BMI in Arctic populations, and as a consequence also leaner body type, lower insulin resistance, and a favorable lipid profile.


Subjects
Body Mass Index , Human Chromosome Pair 11/genetics , Inuit/genetics , Single Nucleotide Polymorphism , Adiposity , Cholesterol/blood , Intergenic DNA/genetics , Female , Greenland , Humans , Insulin Resistance , Male , Metabolome , Waist Circumference
11.
Bioinformatics ; 37(13): 1868-1875, 2021 Jul 27.
Article in English | MEDLINE | ID: mdl-33459779

ABSTRACT

MOTIVATION: Principal component analysis (PCA) is a commonly used tool in genetics to capture and visualize population structure. Due to technological advances in sequencing, such as the widely used non-invasive prenatal test, massive datasets of ultra-low coverage sequencing are being generated. These datasets are characterized by having a large amount of missing genotype information. RESULTS: We present EMU, a method for inferring population structure in the presence of rampant non-random missingness. We show through simulations that several commonly used PCA methods cannot handle missing data arising from various sources, which leads to biased results as individuals are projected into the PC space based on their amount of missingness. In terms of accuracy, EMU outperforms an existing method that also accommodates missingness while being competitively fast. We further tested EMU on around 100K individuals of the Phase 1 dataset of the Chinese Millionome Project, who were shallowly sequenced to around 0.08×. From these data we are able to capture the population structure of the Han Chinese and to reproduce previous analyses in a matter of CPU hours instead of CPU years. EMU's capability to accurately infer population structure in the presence of missingness will be of increasing importance with the rising number of large-scale genetic datasets. AVAILABILITY AND IMPLEMENTATION: EMU is written in Python and is freely available at https://github.com/rosemeis/emu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
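
Illustrative note on entry 11: the abstract describes PCA on data with many missing entries. One common, generic way to handle this is iterative low-rank imputation: fill the gaps with column means, compute a truncated SVD, overwrite the gaps with the low-rank reconstruction, and repeat. The sketch below shows that generic idea only. It is not taken from the EMU code base, and the function name `iterative_svd_impute` and its parameters are assumptions made for this example.

```python
import numpy as np


def iterative_svd_impute(X, k, n_iter=100):
    """Fill NaN entries of X using a rank-k iterative SVD scheme.

    Hypothetical helper for illustration; not the EMU implementation.
    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Start from mean imputation: each gap gets its column's observed mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iter):
        # Rank-k approximation of the current completed matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :k] * s[:k]) @ Vt[:k]
        # Observed entries stay fixed; only the gaps are updated.
        filled = np.where(missing, low_rank, X)

    return filled


def demo_missing(seed=1, frac_missing=0.3):
    """Simulate a rank-2 matrix with entries removed at random."""
    rng = np.random.default_rng(seed)
    truth = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 50))
    mask = rng.random(truth.shape) < frac_missing
    observed = truth.copy()
    observed[mask] = np.nan
    return truth, observed, mask
```

Principal components can then be computed from the completed matrix. Note that the simulation above removes entries at random, whereas the abstract stresses non-random missingness, which is a harder setting than this toy example covers.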

12.
Nature ; 538(7624): 207-214, 2016 Oct 13.
Article in English | MEDLINE | ID: mdl-27654914

ABSTRACT

The population history of Aboriginal Australians remains largely uncharacterized. Here we generate high-coverage genomes for 83 Aboriginal Australians (speakers of Pama-Nyungan languages) and 25 Papuans from the New Guinea Highlands. We find that Papuan and Aboriginal Australian ancestors diversified 25-40 thousand years ago (kya), suggesting pre-Holocene population structure in the ancient continent of Sahul (Australia, New Guinea and Tasmania). However, all of the studied Aboriginal Australians descend from a single founding population that differentiated ~10-32 kya. We infer a population expansion in northeast Australia during the Holocene epoch (past 10,000 years) associated with limited gene flow from this region to the rest of Australia, consistent with the spread of the Pama-Nyungan languages. We estimate that Aboriginal Australians and Papuans diverged from Eurasians 51-72 kya, following a single out-of-Africa dispersal, and subsequently admixed with archaic populations. Finally, we report evidence of selection in Aboriginal Australians potentially associated with living in the desert.


Subjects
Human Genome/genetics , Genomics , Native Hawaiian or Other Pacific Islander/genetics , Phylogeny , Racial Groups/genetics , Africa/ethnology , Australia , Datasets as Topic , Desert Climate , Gene Flow , Population Genetics , Ancient History , Human Migration/history , Humans , Language , New Guinea , Population Dynamics , Tasmania
14.
BMC Bioinformatics ; 22(1): 470, 2021 Sep 29.
Article in English | MEDLINE | ID: mdl-34587903

ABSTRACT

BACKGROUND: Identification of selection signatures between populations is often an important part of a population genetic study. With high-throughput DNA sequencing, larger sample sizes of populations with similar ancestries have become increasingly common. This has led to the need for methods capable of identifying signals of selection in populations with a continuous cline of genetic differentiation. Individuals from continuous populations are inherently challenging to group into meaningful units, which is why existing methods rely on principal components analysis for inference of the selection signals. These existing methods require called genotypes as input, which is problematic for studies based on low-coverage sequencing data. MATERIALS AND METHODS: We have extended two principal component analysis based selection statistics to genotype likelihood data and applied them to low-coverage sequencing data from the 1000 Genomes Project for populations with European and East Asian ancestry to detect signals of selection in samples with continuous population structure. RESULTS: Here, we present two selection statistics which we have implemented in the PCAngsd framework. These methods account for genotype uncertainty, opening the opportunity to conduct selection scans in continuous populations from low and/or variable coverage sequencing data. To illustrate their use, we applied the methods to low-coverage sequencing data from human populations of East Asian and European ancestries and show that the implemented selection statistics can control the false positive rate and that they identify the same signatures of selection from low-coverage sequencing data as state-of-the-art software using high quality called genotypes. CONCLUSION: We show that selection scans of low-coverage sequencing data of populations with similar ancestry perform on par with those obtained from high quality genotype data. Moreover, we demonstrate that PCAngsd outperforms selection statistics obtained from called genotypes from low-coverage sequencing data without the need for ad-hoc filtering.


Subjects
Population Genetics , High-Throughput Nucleotide Sequencing , Genome , Genotype , Humans , Single Nucleotide Polymorphism , Principal Component Analysis
15.
Diabetologia ; 64(8): 1795-1804, 2021 08.
Article in English | MEDLINE | ID: mdl-33912980

ABSTRACT

AIMS/HYPOTHESIS: The common muscle-specific TBC1D4 p.Arg684Ter loss-of-function variant defines a subtype of non-autoimmune diabetes in Arctic populations. Homozygous carriers are characterised by elevated postprandial glucose and insulin levels. Because 3.8% of the Greenlandic population are homozygous carriers, it is important to explore possibilities for precision medicine. We aimed to investigate whether physical activity attenuates the effect of this variant on 2 h plasma glucose levels after an oral glucose load. METHODS: In a Greenlandic population cohort (n = 2655), 2 h plasma glucose levels were obtained after an OGTT, physical activity was estimated as physical activity energy expenditure and TBC1D4 genotype was determined. We performed TBC1D4-physical activity interaction analysis, applying a linear mixed model to correct for genetic admixture and relatedness. RESULTS: Physical activity was inversely associated with 2 h plasma glucose levels (β[main effect of physical activity] -0.0033 [mmol/l] / [kJ kg⁻¹ day⁻¹], p = 6.5 × 10⁻⁵), and significantly more so among homozygous carriers of the TBC1D4 risk variant compared with heterozygous carriers and non-carriers (β[interaction] -0.015 [mmol/l] / [kJ kg⁻¹ day⁻¹], p = 0.0085). The estimated effect size suggests that 1 h of vigorous physical activity per day (compared with resting) reduces 2 h plasma glucose levels by an additional ~0.7 mmol/l in homozygous carriers of the risk variant. CONCLUSIONS/INTERPRETATION: Physical activity improves glucose homeostasis particularly in homozygous TBC1D4 risk variant carriers via a skeletal muscle TBC1 domain family member 4-independent pathway. This provides a rationale to implement physical activity as lifestyle precision medicine in Arctic populations.
DATA REPOSITORY: The Greenlandic Cardio-Metabochip data for the Inuit Health in Transition study has been deposited at the European Genome-phenome Archive ( https://www.ebi.ac.uk/ega/dacs/EGAC00001000736 ) under accession EGAD00010001428.


Subjects
Type 2 Diabetes Mellitus/genetics , Type 2 Diabetes Mellitus/physiopathology , Exercise/physiology , GTPase-Activating Proteins/genetics , Hyperglycemia/prevention & control , Loss of Function Mutation/genetics , Postprandial Period/physiology , Adult , Blood Glucose/metabolism , Type 2 Diabetes Mellitus/blood , Female , Gene-Environment Interaction , Genetic Predisposition to Disease , Genotyping Techniques , Glucose Tolerance Test , Greenland/epidemiology , Humans , Hyperglycemia/genetics , Insulin/blood , Inuit/genetics , Lifestyle , Male , Middle Aged
16.
Bioinformatics ; 36(3): 828-841, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31504166

ABSTRACT

MOTIVATION: The presence of present-day human contaminating DNA fragments is one of the challenges defining ancient DNA (aDNA) research. This is especially relevant to the ancient human DNA field where it is difficult to distinguish endogenous molecules from human contaminants due to their genetic similarity. Recently, with the advent of high-throughput sequencing and new aDNA protocols, hundreds of ancient human genomes have become available. Contamination in those genomes has been measured with computational methods often developed specifically for these empirical studies. Consequently, some of these methods have not been implemented and tested for general use while few are aimed at low-depth nuclear data, a common feature in aDNA datasets. RESULTS: We develop a new X-chromosome-based maximum likelihood method for estimating present-day human contamination in low-depth sequencing data from male individuals. We implement our method for general use, assess its performance under conditions typical of ancient human DNA research, and compare it to previous nuclear data-based methods through extensive simulations. For low-depth data, we show that existing methods can produce unusable estimates or substantially underestimate contamination. In contrast, our method provides accurate estimates for a depth of coverage as low as 0.5× on the X-chromosome when contamination is below 25%. Moreover, our method still yields meaningful estimates in very challenging situations, i.e. when the contaminant and the target come from closely related populations or with increased error rates. With a running time below 5 min, our method is applicable to large scale aDNA genomic studies. AVAILABILITY AND IMPLEMENTATION: The method is implemented in C++ and R and is available in github.com/sapfo/contaminationX and popgen.dk/angsd.


Subjects
Ancient DNA , High-Throughput Nucleotide Sequencing , Chromosomes , Humans , Likelihood Functions , Male , DNA Sequence Analysis
17.
Mol Ecol ; 30(2): 528-544, 2021 01.
Article in English | MEDLINE | ID: mdl-33226701

ABSTRACT

Grant's gazelles have recently been proposed to be a species complex comprising three highly divergent mtDNA lineages (Nanger granti, N. notata and N. petersii). The three lineages have nonoverlapping distributions in East Africa, but without any obvious geographical divisions, making them an interesting model for studying the early-stage evolutionary dynamics of allopatric speciation in detail. Here, we use genomic data obtained by restriction site-associated (RAD) sequencing of 106 gazelle individuals to shed light on the evolutionary processes underlying Grant's gazelle divergence, to characterize their genetic structure and to assess the presence of gene flow between the main lineages in the species complex. We date the species divergence to 134,000 years ago, which is recent in evolutionary terms. We find population subdivision within N. granti, which coincides with the previously suggested two subspecies, N. g. granti and N. g. robertsii. Moreover, these two lineages seem to have hybridized in Masai Mara. Perhaps more surprisingly given their extreme genetic differentiation, N. granti and N. petersii also show signs of prolonged admixture in Mkomazi, which we identified as a hybrid population most likely founded by allopatric lineages coming into secondary contact. Despite the admixed composition of this population, elevated X chromosomal differentiation suggests that selection may be shaping the outcome of hybridization in this population. Our results therefore provide detailed insights into the processes of allopatric speciation and secondary contact in a recently radiated species complex.


Subjects
Antelopes , Gene Flow , Eastern Africa , Animals , Antelopes/genetics , Mitochondrial DNA/genetics , Genetic Speciation , Genetic Hybridization , Phylogeny
18.
Nature ; 523(7561): 455-458, 2015 Jul 23.
Article in English | MEDLINE | ID: mdl-26087396

ABSTRACT

Kennewick Man, referred to as the Ancient One by Native Americans, is a male human skeleton discovered in Washington state (USA) in 1996 and initially radiocarbon dated to 8,340-9,200 calibrated years before present (BP). His population affinities have been the subject of scientific debate and legal controversy. Based on an initial study of cranial morphology it was asserted that Kennewick Man was neither Native American nor closely related to the claimant Plateau tribes of the Pacific Northwest, who claimed ancestral relationship and requested repatriation under the Native American Graves Protection and Repatriation Act (NAGPRA). The morphological analysis was important to judicial decisions that Kennewick Man was not Native American and that therefore NAGPRA did not apply. Instead of repatriation, additional studies of the remains were permitted. Subsequent craniometric analysis affirmed Kennewick Man to be more closely related to circumpacific groups such as the Ainu and Polynesians than he is to modern Native Americans. In order to resolve Kennewick Man's ancestry and affiliations, we have sequenced his genome to ∼1× coverage and compared it to worldwide genomic data including for the Ainu and Polynesians. We find that Kennewick Man is closer to modern Native Americans than to any other population worldwide. Among the Native American groups for whom genome-wide data are available for comparison, several seem to be descended from a population closely related to that of Kennewick Man, including the Confederated Tribes of the Colville Reservation (Colville), one of the five tribes claiming Kennewick Man. We revisit the cranial analyses and find that, as opposed to genome-wide comparisons, it is not possible on that basis to affiliate Kennewick Man to specific contemporary groups. We therefore conclude based on genetic comparisons that Kennewick Man shows continuity with Native North Americans over at least the last eight millennia.


Subjects
Indians, North American/genetics , Phylogeny , Skeleton , Americas , Genome, Human/genetics , Genomics , Humans , Male , Skull/anatomy & histology , Washington
19.
Nucleic Acids Res ; 47(20): e126, 2019 11 18.
Article in English | MEDLINE | ID: mdl-31504776

ABSTRACT

Methylation of guanosine at position N7 (m7G) at internal RNA positions has been found in all domains of life and has been implicated in human disease. Here, we present m7G Mutational Profiling sequencing (m7G-MaP-seq), which allows high-throughput detection of m7G modifications at nucleotide resolution. In our method, m7G-modified positions are converted to abasic sites by reduction with sodium borohydride, directly recorded as cDNA mutations through reverse transcription, and sequenced. We detect positions with increased mutation rates in the reduced and control samples, taking the possibility of sequencing/alignment error into account, and use replicates to calculate statistical significance based on log likelihood ratio tests. We show that m7G-MaP-seq efficiently detects known m7G modifications in rRNA with mutational rates up to 25%, and we map a previously uncharacterised, evolutionarily conserved rRNA modification at position 1581 in Arabidopsis thaliana SSU rRNA. Furthermore, we identify m7G modifications in budding yeast, human and Arabidopsis tRNAs and demonstrate that m7G modification occurs before tRNA splicing. We do not find any evidence for internal m7G modifications being present in other small RNAs, such as miRNA, snoRNA and sRNA, including human Let-7e. Likewise, high sequencing depth m7G-MaP-seq analysis of mRNA from E. coli or yeast cells did not identify any internal m7G modifications.


Subjects
Guanosine/analogs & derivatives , Mutation , RNA Processing, Post-Transcriptional , RNA/chemistry , Sequence Analysis, RNA/methods , Arabidopsis , Guanosine/analysis , HeLa Cells , Humans , Methylation , RNA/genetics , RNA/metabolism , Saccharomyces cerevisiae
20.
Genet Epidemiol ; 43(5): 506-521, 2019 07.
Article in English | MEDLINE | ID: mdl-30883944

ABSTRACT

During the last decade, genome-wide association studies have proven to be a powerful approach to identifying disease-causing variants. However, for admixed populations, most current methods for association testing are based on the assumption that the effect of a genetic variant is the same regardless of its ancestry. This is a reasonable assumption for a causal variant but may not hold for the genetic variants that are tested in genome-wide association studies, which are usually not causal. The effects of noncausal genetic variants depend on how strongly their presence correlates with the presence of the causal variant, which may vary between ancestral populations because of different linkage disequilibrium patterns and allele frequencies. Motivated by this, we here introduce a new statistical method for association testing in recently admixed populations, where the effect size is allowed to depend on the ancestry of a given allele. Our method does not rely on accurate inference of local ancestry, yet using simulations we show that in some scenarios it gives a substantial increase in statistical power to detect associations. In addition, the method allows for testing for a difference in effect size between ancestral populations, which can be used to help determine if a given genetic variant is causal. We demonstrate the usefulness of the method on data from the Greenlandic population.


Subjects
Genetics, Population , Genome-Wide Association Study , Phylogeny , Alleles , Case-Control Studies , Cohort Studies , Computer Simulation , Greenland , Humans , Models, Genetic , Phenotype , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics