Results 1 - 20 of 70
1.
Cell ; 179(3): 736-749.e15, 2019 10 17.
Article in English | MEDLINE | ID: mdl-31626772

ABSTRACT

Underrepresentation of Asian genomes has hindered population and medical genetics research on Asians, leading to population disparities in precision medicine. By whole-genome sequencing of 4,810 Singapore Chinese, Malays, and Indians, we found 98.3 million SNPs and small insertions or deletions, over half of which are novel. Population structure analysis demonstrated great representation of Asian genetic diversity by three ethnicities in Singapore and revealed a Malay-related novel ancestry component. Furthermore, demographic inference suggested that Malays split from Chinese ∼24,800 years ago and experienced significant admixture with East Asians ∼1,700 years ago, coinciding with the Austronesian expansion. Additionally, we identified 20 candidate loci for natural selection, 14 of which harbored robust associations with complex traits and diseases. Finally, we show that our data can substantially improve genotype imputation in diverse Asian and Oceanian populations. These results highlight the value of our data as a resource to empower human genetics discovery across broad geographic regions.


Subject(s)
Genetics, Population , Genome, Human/genetics , Selection, Genetic , Whole Genome Sequencing , Asian People/genetics , Female , Genotype , Humans , Malaysia/epidemiology , Male , Polymorphism, Single Nucleotide/genetics , Singapore/epidemiology
2.
PLoS Genet ; 18(4): e1010134, 2022 04.
Article in English | MEDLINE | ID: mdl-35404934

ABSTRACT

The inference of positive selection in genomes is a problem of great interest in evolutionary genomics. By identifying putative regions of the genome that contain adaptive mutations, we are able to learn about the biology of organisms and their evolutionary history. Here we introduce a composite likelihood method that identifies recently completed or ongoing positive selection by searching for extreme distortions in the spatial distribution of the haplotype frequency spectrum along the genome relative to the genome-wide expectation taken as neutrality. Furthermore, the method simultaneously infers two parameters of the sweep: the number of sweeping haplotypes and the "width" of the sweep, which is related to the strength and timing of selection. We demonstrate that this method outperforms the leading haplotype-based selection statistics, though strong signals in low-recombination regions merit extra scrutiny. As a positive control, we apply it to two well-studied human populations from the 1000 Genomes Project and examine haplotype frequency spectrum patterns at the LCT and MHC loci. We also apply it to a data set of brown rats sampled in NYC and identify genes related to olfactory perception. To facilitate use of this method, we have implemented it in user-friendly open source software.


Subject(s)
Models, Genetic , Selection, Genetic , Animals , Genetics, Population , Genomics , Haplotypes/genetics , Rats , Software
3.
Proc Natl Acad Sci U S A ; 119(13): e2111533119, 2022 03 29.
Article in English | MEDLINE | ID: mdl-35312358

ABSTRACT

Significance: California supports a high cultural and linguistic diversity of Indigenous peoples. In a partnership of researchers with the Muwekma Ohlone tribe, we studied genomes of eight present-day tribal members and 12 ancient individuals from two archaeological sites in the San Francisco Bay Area, spanning ∼2,000 y. We find that compared to genomes of Indigenous individuals from throughout the Americas, the 12 ancient individuals are most genetically similar to ancient individuals from Southern California, and that despite spanning a large time period, they share distinctive ancestry. This ancestry is also shared with present-day tribal members, providing evidence of genetic continuity between past and present Indigenous individuals in the region, in contrast to some popular reconstructions based on archaeological and linguistic information.


Subject(s)
Genomics , Indigenous Peoples , Archaeology , DNA, Ancient , Genetics, Population , History, Ancient , Humans , Linguistics , San Francisco
4.
Mol Biol Evol ; 40(7), 2023 07 05.
Article in English | MEDLINE | ID: mdl-37440530

ABSTRACT

Likelihood-based tests of phylogenetic trees are a foundation of modern systematics. Over the past decade, an enormous wealth and diversity of model-based approaches have been developed for phylogenetic inference of both gene trees and species trees. However, while many techniques exist for conducting formal likelihood-based tests of gene trees, such frameworks are comparatively underdeveloped and underutilized for testing species tree hypotheses. To date, widely used tests of tree topology are designed to assess the fit of classical models of molecular sequence data and individual gene trees and thus are not readily applicable to the problem of species tree inference. To address this issue, we derive several analogous likelihood-based approaches for testing topologies using modern species tree models and heuristic algorithms that use gene tree topologies as input for maximum likelihood estimation under the multispecies coalescent. For the purpose of comparing support for species trees, these tests leverage the statistical procedures of their original gene tree-based counterparts that have an extended history for testing phylogenetic hypotheses at a single locus. We discuss and demonstrate a number of applications, limitations, and important considerations of these tests using simulated and empirical phylogenomic data sets that include both bifurcating topologies and reticulate network models of species relationships. Finally, we introduce the open-source R package SpeciesTopoTestR (SpeciesTopology Tests in R) that includes a suite of functions for conducting formal likelihood-based tests of species topologies given a set of input gene tree topologies.


Subject(s)
Algorithms , Models, Genetic , Phylogeny , Likelihood Functions
5.
Mol Biol Evol ; 40(7), 2023 07 05.
Article in English | MEDLINE | ID: mdl-37433019

ABSTRACT

Natural selection leaves a spatial pattern along the genome, with a haplotype distribution distortion near the selected locus that fades with distance. Evaluating the spatial signal of a population-genetic summary statistic across the genome allows for patterns of natural selection to be distinguished from neutrality. Considering the genomic spatial distribution of multiple summary statistics is expected to aid in uncovering subtle signatures of selection. In recent years, numerous methods have been devised that consider genomic spatial distributions across summary statistics, utilizing both classical machine learning and deep learning architectures. However, better predictions may be attainable by improving the way in which features are extracted from these summary statistics. We apply wavelet transform, multitaper spectral analysis, and S-transform to summary statistic arrays to achieve this goal. Each analysis method converts one-dimensional summary statistic arrays to two-dimensional images of spectral analysis, allowing simultaneous temporal and spectral assessment. We feed these images into convolutional neural networks and consider combining models using ensemble stacking. Our modeling framework achieves high accuracy and power across a diverse set of evolutionary settings, including population size changes and test sets of varying sweep strength, softness, and timing. A scan of central European whole-genome sequences recapitulated well-established sweep candidates and predicted novel cancer-associated genes as sweeps with high support. Given that this modeling framework is also robust to missing genomic segments, we believe that it will represent a welcome addition to the population-genomic toolkit for learning about adaptive processes from genomic data.


Subject(s)
Genetics, Population , Selection, Genetic , Genomics/methods , Neural Networks, Computer , Haplotypes
6.
Mol Biol Evol ; 40(10), 2023 10 04.
Article in English | MEDLINE | ID: mdl-37772983

ABSTRACT

Inferences of adaptive events are important for learning about traits, such as human digestion of lactose after infancy and the rapid spread of viral variants. Early efforts toward identifying footprints of natural selection from genomic data involved development of summary statistic and likelihood methods. However, such techniques are grounded in simple patterns or theoretical models that limit the complexity of settings they can explore. Due to the renaissance in artificial intelligence, machine learning methods have taken center stage in recent efforts to detect natural selection, with strategies such as convolutional neural networks applied to images of haplotypes. Yet, limitations of such techniques include estimation of large numbers of model parameters under nonconvex settings and feature identification without regard to location within an image. An alternative approach is to use tensor decomposition to extract features from multidimensional data while preserving the latent structure of the data, and to feed these features to machine learning models. Here, we adopt this framework and present a novel approach termed T-REx, which extracts features from images of haplotypes across sampled individuals using tensor decomposition, and then makes predictions from these features using classical machine learning methods. As a proof of concept, we explore the performance of T-REx on simulated neutral and selective sweep scenarios and find that it has high power and accuracy to discriminate sweeps from neutrality, robustness to common technical hurdles, and easy visualization of feature importance. Therefore, T-REx is a powerful addition to the toolkit for detecting adaptive processes from genomic data.


Subject(s)
Artificial Intelligence , Genomics , Humans , Genomics/methods , Neural Networks, Computer , Machine Learning , Selection, Genetic
7.
Genome Res ; 31(7): 1136-1149, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34187812

ABSTRACT

Approximately 1% of the human genome has the ability to fold into G-quadruplexes (G4s), noncanonical strand-specific DNA structures forming at G-rich motifs. G4s regulate several key cellular processes (e.g., transcription) and have been hypothesized to participate in others (e.g., firing of replication origins). Moreover, G4s differ in their thermostability, and this may affect their function. Yet, G4s may also hinder replication, transcription, and translation and may increase genome instability and mutation rates. Therefore, depending on their genomic location, thermostability, and functionality, G4 loci might evolve under different selective pressures, which has never been investigated. Here we conducted the first genome-wide analysis of G4 distribution, thermostability, and selection. We found an overrepresentation, high thermostability, and purifying selection for G4s within genic components in which they are expected to be functional: promoters, CpG islands, and 5' and 3' UTRs. A similar pattern was observed for G4s within replication origins, enhancers, eQTLs, and TAD boundary regions, strongly suggesting their functionality. In contrast, G4s on the nontranscribed strand of exons were underrepresented, were unstable, and evolved neutrally. In general, G4s on the nontranscribed strand of genic components had lower density and were less stable than those on the transcribed strand, suggesting that the former are avoided at the RNA level. Across the genome, purifying selection was stronger at stable G4s. Our results suggest that purifying selection preserves the sequences of functional G4s, whereas nonfunctional G4s are too costly to be tolerated in the genome. Thus, G4s are emerging as fundamental, functional genomic elements.

8.
Syst Biol ; 2023 Nov 30.
Article in English | MEDLINE | ID: mdl-38035624

ABSTRACT

Modern comparative biology owes much to phylogenetic regression. At its conception, this technique sparked a revolution that armed biologists with phylogenetic comparative methods (PCMs) for disentangling evolutionary correlations from those arising from hierarchical phylogenetic relationships. Over the past few decades, the phylogenetic regression framework has become a paradigm of modern comparative biology that has been widely embraced as a remedy for shared ancestry. However, recent evidence has sown doubt over the efficacy of phylogenetic regression, and PCMs more generally, with the suggestion that many of these methods fail to provide an adequate defense against unreplicated evolution, the primary justification for using them in the first place. Importantly, some of the most compelling examples of biological innovation in nature result from abrupt lineage-specific evolutionary shifts, which current regression models are largely ill-equipped to deal with. Here we explore a solution to this problem by applying robust linear regression to comparative trait data. We formally introduce robust phylogenetic regression to the PCM toolkit with linear estimators that are less sensitive to model violations than the standard least-squares estimator, while still retaining high power to detect true trait associations. Our analyses also highlight an ingenuity of the original algorithm for phylogenetic regression based on independent contrasts, whereby robust estimators are particularly effective. Collectively, we find that robust estimators hold promise for improving tests of trait associations and offer a path forward in scenarios where classical approaches may fail. Our study joins recent arguments for increased vigilance against unreplicated evolution and a better understanding of evolutionary model performance in challenging, yet biologically important, settings.

9.
Bioinformatics ; 38(3): 861-863, 2022 01 12.
Article in English | MEDLINE | ID: mdl-34664624

ABSTRACT

SUMMARY: The growing availability of genomewide polymorphism data has fueled interest in detecting diverse selective processes affecting population diversity. However, no model-based approaches exist to jointly detect and distinguish the two complementary processes of balancing and positive selection. We extend the BalLeRMix B-statistic framework described in Cheng and DeGiorgio (2020) for detecting balancing selection and present BalLeRMix+, which implements five B-statistic extensions based on mixture models to robustly identify both types of selection. BalLeRMix+ is implemented in Python and computes the composite likelihood ratios and associated model parameters for each genomic test position. AVAILABILITY AND IMPLEMENTATION: BalLeRMix+ is freely available at https://github.com/bioXiaoheng/BallerMixPlus. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Polymorphism, Genetic , Software
10.
PLoS Genet ; 16(8): e1008896, 2020 08.
Article in English | MEDLINE | ID: mdl-32853200

ABSTRACT

Identifying regions of positive selection in genomic data remains a challenge in population genetics. Most current approaches rely on comparing values of summary statistics calculated in windows. We present an approach termed SURFDAWave, which translates measures of genetic diversity calculated in genomic windows to functional data. By transforming our discrete data points to be outputs of continuous functions defined over genomic space, we are able to learn the features of these functions that signify selection. This enables us to confidently identify complex modes of natural selection, including adaptive introgression. We are also able to predict important selection parameters that are responsible for shaping the inferred selection events. By applying our model to human population-genomic data, we recapitulate previously identified regions of selective sweeps, such as OCA2 in Europeans, and predict that its beneficial mutation reached a frequency of 0.02 before it swept 1,802 generations ago, a time when humans were relatively new to Europe. In addition, we identify BNC2 in Europeans as a target of adaptive introgression, and predict that it harbors a beneficial mutation that arose in an archaic human population that split from modern humans within the hypothesized modern human-Neanderthal divergence range.


Subject(s)
Models, Genetic , Mutation Rate , White People/genetics , Animals , DNA-Binding Proteins/genetics , Genetic Variation , Humans , Membrane Transport Proteins , Neanderthals/genetics , Selection, Genetic , Software
11.
PLoS Genet ; 16(6): e1008867, 2020 06.
Article in English | MEDLINE | ID: mdl-32555579

ABSTRACT

Recent research shows that introgression between closely-related species is an important source of adaptive alleles for a wide range of taxa. Typically, detection of adaptive introgression from genomic data relies on comparative analyses that require sequence data from both the recipient and the donor species. However, in many cases, the donor is unknown or the data is not currently available. Here, we introduce a genome-scan method, VolcanoFinder, to detect recent events of adaptive introgression using polymorphism data from the recipient species only. VolcanoFinder detects adaptive introgression sweeps from the pattern of excess intermediate-frequency polymorphism they produce in the flanking region of the genome, a pattern which appears as a volcano-shape in pairwise genetic diversity. Using coalescent theory, we derive analytical predictions for these patterns. Based on these results, we develop a composite-likelihood test to detect signatures of adaptive introgression relative to the genomic background. Simulation results show that VolcanoFinder has high statistical power to detect these signatures, even for older sweeps and for soft sweeps initiated by multiple migrant haplotypes. Finally, we implement VolcanoFinder to detect archaic introgression in European and sub-Saharan African human populations, and uncovered interesting candidates in both populations, such as TSHR in Europeans and TCHH-RPTN in Africans. We discuss their biological implications and provide guidelines for identifying and circumventing artifactual signals during empirical applications of VolcanoFinder.


Subject(s)
Genetic Introgression , Genetics, Population/methods , Genome, Human/genetics , Models, Genetic , Polymorphism, Genetic , Africa South of the Sahara , Alleles , Antigens/genetics , Black People/genetics , Computer Simulation , Europe , Evolution, Molecular , Haplotypes , Humans , Intermediate Filament Proteins/genetics , Receptors, Thyrotropin/genetics , S100 Proteins/genetics , Selection, Genetic , Software , White People/genetics
12.
Mol Biol Evol ; 38(3): 1209-1224, 2021 03 09.
Article in English | MEDLINE | ID: mdl-33045078

ABSTRACT

Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions evolve. A previous method for achieving this goal, CDROM, employs gene expression distances as proxies for functional divergence and then classifies the evolutionary mechanisms retaining duplicate genes from comparisons of these distances in a decision tree framework. However, CDROM does not account for stochastic shifts in gene expression or leverage advances in contemporary statistical learning for performing classification, nor is it capable of predicting the parameters driving duplicate gene evolution. Thus, here we develop CLOUD, a multi-layer neural network built on a model of gene expression evolution that can both classify duplicate gene retention mechanisms and predict their underlying evolutionary parameters. We show that not only is the CLOUD classifier substantially more powerful and accurate than CDROM, but that it also yields accurate parameter predictions, enabling a better understanding of the specific forces driving the evolution and long-term retention of duplicate genes. Further, application of the CLOUD classifier and predictor to empirical data from Drosophila recapitulates many previous findings about gene duplication in this lineage, showing that new functions often emerge rapidly and asymmetrically in younger duplicate gene copies, and that functional divergence is driven by strong natural selection. Hence, CLOUD represents a major advancement in classifying retention mechanisms and predicting evolutionary parameters of duplicate genes, thereby highlighting the utility of incorporating sophisticated statistical learning techniques to address long-standing questions about evolution after gene duplication.


Subject(s)
Evolution, Molecular , Gene Duplication , Gene Expression , Models, Genetic , Neural Networks, Computer , Software , Animals , Drosophila
13.
Proc Biol Sci ; 289(1986): 20221078, 2022 11 09.
Article in English | MEDLINE | ID: mdl-36322514

ABSTRACT

An increasing body of archaeological and genomic evidence has hinted at a complex settlement process of the Americas by humans. This is especially true for South America, where unexpected ancestral signals have raised perplexing scenarios for the early migrations into different regions of the continent. Here, we present ancient human genomes from the archaeologically rich Northeast Brazil and compare them to ancient and present-day genomic data. We find a distinct relationship between ancient genomes from Northeast Brazil, Lagoa Santa, Uruguay and Panama, representing evidence for ancient migration routes along South America's Atlantic coast. To further add to the existing complexity, we also detect greater Denisovan than Neanderthal ancestry in ancient Uruguay and Panama individuals. Moreover, we find a strong Australasian signal in an ancient genome from Panama. This work sheds light on the deep demographic history of eastern South America and presents a starting point for future fine-scale investigations on the regional level.


Subject(s)
Human Migration , Neanderthals , Humans , History, Ancient , Animals , Genomics , Genome, Human , Brazil
14.
Bioinformatics ; 37(13): 1923-1925, 2021 07 27.
Article in English | MEDLINE | ID: mdl-33051672

ABSTRACT

SUMMARY: Here, we present PhyloWGA, an open source R package for conducting phylogenetic analysis and investigation of whole genome data. AVAILABILITY AND IMPLEMENTATION: Available at Github (https://github.com/radamsRHA/PhyloWGA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome , Software , Chromosomes , Phylogeny
15.
Syst Biol ; 70(4): 660-680, 2021 06 16.
Article in English | MEDLINE | ID: mdl-33587145

ABSTRACT

Stochastic models of character trait evolution have become a cornerstone of evolutionary biology in an array of contexts. While probabilistic models have been used extensively for statistical inference, they have largely been ignored for the purpose of measuring distances between phylogeny-aware models. Recent contributions to the problem of phylogenetic distance computation have highlighted the importance of explicitly considering evolutionary model parameters and their impacts on molecular sequence data when quantifying dissimilarity between trees. By comparing two phylogenies in terms of their induced probability distributions that are functions of many model parameters, these distances can be more informative than traditional approaches that rely strictly on differences in topology or branch lengths alone. Currently, however, these approaches are designed for comparing models of nucleotide substitution and gene tree distributions, and thus, are unable to address other classes of traits and associated models that may be of interest to evolutionary biologists. Here, we expand the principles of probabilistic phylogenetic distances to compute tree distances under models of continuous trait evolution along a phylogeny. By explicitly considering both the degree of relatedness among species and the evolutionary processes that collectively give rise to character traits, these distances provide a foundation for comparing models and their predictions, and for quantifying the impacts of assuming one phylogenetic background over another while studying the evolution of a particular trait. We demonstrate the properties of these approaches using theory, simulations, and several empirical data sets that highlight potential uses of probabilistic distances in many scenarios. 
We also introduce an open-source R package named PRDATR for easy application by the scientific community for computing phylogenetic distances under models of character trait evolution. [Brownian motion; comparative methods; phylogeny; quantitative traits.]


Subject(s)
Models, Statistical , Phenotype , Phylogeny , Probability
16.
Nature ; 538(7624): 238-242, 2016 Oct 13.
Article in English | MEDLINE | ID: mdl-27654910

ABSTRACT

High-coverage whole-genome sequence studies have so far focused on a limited number of geographically restricted populations, or been targeted at specific diseases, such as cancer. Nevertheless, the availability of high-resolution genomic data has led to the development of new methodologies for inferring population history and refuelled the debate on the mutation rate in humans. Here we present the Estonian Biocentre Human Genome Diversity Panel (EGDP), a dataset of 483 high-coverage human genomes from 148 populations worldwide, including 379 new genomes from 125 populations, which we group into diversity and selection sets. We analyse this dataset to refine estimates of continent-wide patterns of heterozygosity, long- and short-distance gene flow, archaic admixture, and changes in effective population size through time as well as for signals of positive or balancing selection. We find a genetic signature in present-day Papuans that suggests that at least 2% of their genome originates from an early and largely extinct expansion of anatomically modern humans (AMHs) out of Africa. Together with evidence from the western Asian fossil record, and admixture between AMHs and Neanderthals predating the main Eurasian expansion, our results contribute to the mounting evidence for the presence of AMHs out of Africa earlier than 75,000 years ago.


Subject(s)
Genome, Human/genetics , Genomics , Human Migration/history , Racial Groups/genetics , Africa/ethnology , Animals , Asia , Datasets as Topic , Estonia , Europe , Fossils , Gene Flow , Genetics, Population , Heterozygote , History, Ancient , Humans , Native Hawaiian or Other Pacific Islander/genetics , Neanderthals/genetics , New Guinea , Population Dynamics
17.
Mol Biol Evol ; 37(11): 3267-3291, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32462188

ABSTRACT

Long-term balancing selection typically leaves narrow footprints of increased genetic diversity, and therefore most detection approaches only achieve optimal performances when sufficiently small genomic regions (i.e., windows) are examined. Such methods are sensitive to window sizes and suffer substantial losses in power when windows are large. Here, we employ mixture models to construct a set of five composite likelihood ratio test statistics, which we collectively term B statistics. These statistics are agnostic to window sizes and can operate on diverse forms of input data. Through simulations, we show that they exhibit comparable power to the best-performing current methods, and retain substantially high power regardless of window sizes. They also display considerable robustness to high mutation rates and uneven recombination landscapes, as well as an array of other common confounding scenarios. Moreover, we applied a specific version of the B statistics, termed B2, to a human population-genomic data set and recovered many top candidates from prior studies, including the then-uncharacterized STPG2 and CCDC169-SOHLH2, both of which are related to gamete functions. We further applied B2 on a bonobo population-genomic data set. In addition to the MHC-DQ genes, we uncovered several novel candidate genes, such as KLRD1, involved in viral defense, and SCN9A, associated with pain perception. Finally, we show that our methods can be extended to account for multiallelic balancing selection and integrated the set of statistics into open-source software named BalLeRMix for future applications by the scientific community.


Subject(s)
Models, Genetic , Selection, Genetic , Animals , HLA-D Antigens/genetics , Humans , Mutation Rate , Pan paniscus/genetics
18.
Mol Biol Evol ; 37(10): 3023-3046, 2020 10 01.
Article in English | MEDLINE | ID: mdl-32392293

ABSTRACT

Selective sweeps are frequent and varied signatures in the genomes of natural populations, and detecting them is consequently important in understanding mechanisms of adaptation by natural selection. Following a selective sweep, haplotypic diversity surrounding the site under selection decreases, and this deviation from the background pattern of variation can be applied to identify sweeps. Multiple methods exist to locate selective sweeps in the genome from haplotype data, but none leverages the power of a model-based approach to make their inference. Here, we propose a likelihood ratio test statistic T to probe whole-genome polymorphism data sets for selective sweep signatures. Our framework uses a simple but powerful model of haplotype frequency spectrum distortion to find sweeps and additionally make an inference on the number of presently sweeping haplotypes in a population. We found that the T statistic is suitable for detecting both hard and soft sweeps across a variety of demographic models, selection strengths, and ages of the beneficial allele. Accordingly, we applied the T statistic to variant calls from European and sub-Saharan African human populations, yielding primarily literature-supported candidates, including LCT, RSPH3, and ZNF211 in CEU, SYT1, RGS18, and NNT in YRI, and HLA genes in both populations. We also searched for sweep signatures in Drosophila melanogaster, finding expected candidates at Ace, Uhg1, and Pimet. Finally, we provide open-source software to compute the T statistic and the inferred number of presently sweeping haplotypes from whole-genome data.


Subject(s)
Adaptation, Biological , Genetic Techniques , Models, Genetic , Selection, Genetic , Software , Animals , Drosophila melanogaster , Haplotypes , Humans
19.
Am J Hum Genet ; 102(5): 806-815, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29706345

ABSTRACT

The effects of European colonization on the genomes of Native Americans may have produced excesses of potentially deleterious features, mainly due to the severe reductions in population size and corresponding losses of genetic diversity. This assumption, however, neither considers actual genomic patterns that existed before colonization nor does it adequately capture the effects of admixture. In this study, we analyze the whole-exome sequences of modern and ancient individuals from a Northwest Coast First Nation, with a demographic history similar to other indigenous populations from the Americas. We show that in approximately ten generations from initial European contact, the modern individuals exhibit reduced levels of novel and low-frequency variants, a lower proportion of potentially deleterious alleles, and decreased heterozygosity when compared to their ancestors. This pattern can be explained by a dramatic population decline, resulting in the loss of potentially damaging low-frequency variants, and subsequent admixture. We also find evidence that the indigenous population was on a steady decline in effective population size for several thousand years before contact, which emphasizes regional demography over the common conception of a uniform expansion after entry into the Americas. This study examines the genomic consequences of colonialism on an indigenous group and describes the continuing role of gene flow among modern populations.


Subject(s)
Genetic Variation , Indians, North American/genetics , White People/genetics , Base Pairing/genetics , Gene Frequency/genetics , Gene Pool , Heterozygote , Humans , Polymorphism, Single Nucleotide/genetics , Time Factors
20.
Mol Biol Evol ; 36(1): 177-199, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30380122

ABSTRACT

Trans-species polymorphism has been widely used as a key sign of long-term balancing selection across multiple species. However, such sites are often rare in the genome and could result from mutational processes or technical artifacts. Few methods are yet available to specifically detect footprints of trans-species balancing selection without using trans-species polymorphic sites. In this study, we develop summary- and model-based approaches that are each specifically tailored to uncover regions of long-term balancing selection shared by a set of species by using genomic patterns of intraspecific polymorphism and interspecific fixed differences. We demonstrate that our trans-species statistics have substantially higher power than single-species approaches to detect footprints of trans-species balancing selection, and are robust to those that do not affect all tested species. We further apply our model-based methods to human and chimpanzee whole-genome sequencing data. In addition to the previously established major histocompatibility complex and malaria resistance-associated FREM3/GYPE regions, we also find outstanding genomic regions involved in barrier integrity and innate immunity, such as the GRIK1/CLDN17 intergenic region, and the SLC35F1 and ABCA13 genes. Our findings not only echo the significance of pathogen defense but also reveal novel candidates in maintaining balanced polymorphisms across human and chimpanzee lineages. Finally, we show that these trans-species statistics can be applied to and work well for an arbitrary number of species, and integrate them into open-source software packages for ease of use by the scientific community.


Subject(s)
Genetic Techniques , Models, Genetic , Polymorphism, Genetic , Selection, Genetic , Animals , Extracellular Matrix Proteins/genetics , Gene Frequency , Humans , Major Histocompatibility Complex , Mutation Rate , Pan troglodytes/genetics , Receptors, Kainic Acid/genetics , Recombination, Genetic