ABSTRACT
Over the past 500 years, North America has been the site of ongoing mixing of Native Americans, European settlers, and Africans (brought largely by the trans-Atlantic slave trade), shaping the early history of what became the United States. We studied the genetic ancestry of 5,269 self-described African Americans, 8,663 Latinos, and 148,789 European Americans who are 23andMe customers and show that the legacy of these historical interactions is visible in the genetic ancestry of present-day Americans. We document pervasive mixed ancestry and asymmetrical male and female ancestry contributions in all groups studied. We show that regional ancestry differences reflect historical events, such as early Spanish colonization, waves of immigration from many regions of Europe, and forced relocation of Native Americans within the US. This study sheds light on the fine-scale differences in ancestry within and across the United States and informs our understanding of the relationship between racial and ethnic identities and genetic ancestry.
Subject(s)
Black or African American/genetics , Hispanic or Latino/genetics , White People/genetics , Cohort Studies , DNA, Mitochondrial/genetics , Female , Genetic Association Studies , Genetic Variation , Genome, Human , Genotype , Genotyping Techniques , Humans , Logistic Models , Male , Reproducibility of Results , Surveys and Questionnaires , United States
ABSTRACT
Despite the recent rapid growth in genome-wide data, much of human variation remains entirely unexplained. A significant challenge in the pursuit of the genetic basis for variation in common human traits is the efficient, coordinated collection of genotype and phenotype data. We have developed a novel research framework that facilitates the parallel study of a wide assortment of traits within a single cohort. The approach takes advantage of the interactivity of the Web both to gather data and to present genetic information to research participants, while taking care to correct for the population structure inherent to this study design. Here we report initial results from a participant-driven study of 22 traits. Replications of associations (in the genes OCA2, HERC2, SLC45A2, SLC24A4, IRF4, TYR, TYRP1, ASIP, and MC1R) for hair color, eye color, and freckling validate the Web-based, self-reporting paradigm. The identification of novel associations for hair morphology (rs17646946, near TCHH; rs7349332, near WNT10A; and rs1556547, near OFCC1), freckling (rs2153271, in BNC2), the ability to smell the methanethiol produced after eating asparagus (rs4481887, near OR2M7), and photic sneeze reflex (rs10427255, near ZEB2, and rs11856995, near NR2F2) illustrates the power of the approach.
Subject(s)
Genetic Variation , Genome-Wide Association Study/methods , Chromosomes, Human , Genomics , Genotype , Hair , Humans , Internet , Models, Genetic , Phenotype
ABSTRACT
Much effort and interest have focused on assessing the importance of natural selection, particularly positive natural selection, in shaping the human genome. Although scans for positive selection have identified candidate loci that may be associated with positive selection in humans, such scans do not indicate whether adaptation is frequent in general in humans. Studies based on the reasoning of the McDonald-Kreitman test, which, in principle, can be used to evaluate the extent of positive selection, suggested that adaptation is detectable in the human genome but that it is less common than in Drosophila or Escherichia coli. Both positive and purifying natural selection at functional sites should affect levels and patterns of polymorphism at linked nonfunctional sites. Here, we search for these effects by analyzing patterns of neutral polymorphism in humans in relation to the rates of recombination, functional density, and functional divergence with chimpanzees. We find that the levels of neutral polymorphism are lower in the regions of lower recombination and in the regions of higher functional density or divergence. These correlations persist after controlling for the variation in GC content, density of simple repeats, selective constraint, mutation rate, and depth of sequencing coverage. We argue that these results are most plausibly explained by the effects of natural selection at functional sites (either recurrent selective sweeps or background selection) on the levels of linked neutral polymorphism. Natural selection at both coding and regulatory sites appears to affect linked neutral polymorphism, reducing neutral polymorphism by 6% genome-wide and by 11% in the gene-rich half of the human genome. These findings suggest that the effects of natural selection at linked sites cannot be ignored in the study of neutral human polymorphism.
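The McDonald-Kreitman reasoning mentioned above contrasts divergence and polymorphism at nonsynonymous and synonymous sites. The sketch below is not the method of this study, which relates neutral polymorphism to recombination and functional density. It computes the commonly used estimate of the fraction of adaptive nonsynonymous substitutions, alpha = 1 - (Ds * Pn) / (Dn * Ps), together with the neutrality index. The counts in the example are hypothetical.

```python
def mk_alpha(dn: int, ds: int, pn: int, ps: int) -> float:
    """Fraction of nonsynonymous substitutions attributed to positive
    selection, alpha = 1 - (Ds * Pn) / (Dn * Ps).

    dn, ds: nonsynonymous / synonymous fixed differences between species
    pn, ps: nonsynonymous / synonymous polymorphisms within a species
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("Dn and Ps must be positive")
    return 1.0 - (ds * pn) / (dn * ps)


def neutrality_index(dn: int, ds: int, pn: int, ps: int) -> float:
    """NI = (Pn / Ps) / (Dn / Ds); NI < 1 indicates excess nonsynonymous
    divergence, the pattern read as evidence of adaptation."""
    if dn <= 0 or ds <= 0 or ps <= 0:
        raise ValueError("Dn, Ds and Ps must be positive")
    return (pn / ps) / (dn / ds)


if __name__ == "__main__":
    # Hypothetical gene: Dn = 20, Ds = 40, Pn = 10, Ps = 40
    print(mk_alpha(20, 40, 10, 40))          # 0.5
    print(neutrality_index(20, 40, 10, 40))  # 0.5
```

The two quantities are linked by alpha = 1 - NI. The study instead asks whether selection leaves a footprint on linked neutral sites, which these per-gene counts do not capture.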
Subject(s)
Polymorphism, Genetic , Regulatory Sequences, Nucleic Acid/genetics , Selection, Genetic , Evolution, Molecular , Genetic Variation , Genome, Human , Humans , Recombination, Genetic
ABSTRACT
Although transposable elements (TEs) are known to be potent sources of mutation, their contribution to the generation of recent adaptive changes has never been systematically assessed. In this work, we conduct a genome-wide screen for adaptive TE insertions in Drosophila melanogaster that have taken place during or after the spread of this species out of Africa. We determine population frequencies of 902 of the 1,572 TEs in Release 3 of the D. melanogaster genome and identify a set of 13 putatively adaptive TEs. These 13 TEs increased in population frequency sharply after the spread out of Africa. We argue that many of these TEs are in fact adaptive by demonstrating that the regions flanking five of these TEs display signatures of partial selective sweeps. Furthermore, we show that eight out of the 13 putatively adaptive elements show population frequency heterogeneity consistent with these elements playing a role in adaptation to temperate climates. We conclude that TEs have contributed considerably to recent adaptive evolution (one TE-induced adaptation every 200-1,250 years). The majority of these adaptive insertions are likely to be involved in regulatory changes. Our results also suggest that TE-induced adaptations arise more often from standing variants than from new mutations. Such a high rate of TE-induced adaptation is inconsistent with the number of fixed TEs in the D. melanogaster genome, and we discuss possible explanations for this discrepancy.
Subject(s)
Adaptation, Physiological/genetics , DNA Transposable Elements/genetics , Drosophila melanogaster/genetics , Genome, Insect , Animals , Evolution, Molecular , Mutagenesis, Insertional , Mutation , Polymerase Chain Reaction
ABSTRACT
Molecular identification of mixed-species pollen samples has a range of applications in various fields of research. To date, such molecular identification has primarily been carried out via amplicon sequencing, but whole-genome shotgun (WGS) sequencing of pollen DNA has potential advantages, including (1) more genetic information per sample and (2) the potential for better quantitative matching. In this study, we tested the performance of WGS sequencing methodology and publicly available reference sequences in identifying species and quantifying their relative abundance in pollen mock communities. Using mock communities previously analyzed with DNA metabarcoding, we sequenced approximately 200 Mbp for each sample using Illumina HiSeq and MiSeq. Taxonomic identifications were based on the Kraken k-mer identification method with reference libraries constructed from full-genome and Short Read Archive data from the NCBI database. We found WGS to be a reliable method for taxonomic identification of pollen, identifying nearly 100% of species in mixtures, but it generated higher rates of false positives (reads not identified to the correct taxon at the required taxonomic level) than rbcL and ITS2 amplicon sequencing. For quantification of relative species abundance, WGS data provided a stronger correlation between pollen grain proportion and sequence read proportion but diverged more from a 1:1 relationship, likely due to the higher rate of false positives. Currently, a limitation of WGS-based pollen identification is the lack of representation of plant diversity in publicly available genome databases. As databases improve and costs drop, we expect that genomic methods will eventually become the methods of choice for species identification and quantification of mixed-species pollen samples.
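Kraken assigns reads using exact k-mer matches against a reference library. The sketch below is a deliberately simplified stand-in and not Kraken's algorithm, which maps shared k-mers to their lowest common ancestor in a taxonomy. It counts only k-mers unique to a single reference taxon and assigns each read by majority vote, leaving unmatched or tied reads unclassified. It then converts classified read counts to relative abundances. The reference sequences, taxon names, and the value of k are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield all overlapping k-mers of a sequence (case-insensitive)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Map every reference k-mer to the set of taxa that contain it."""
    index: Dict[str, Set[str]] = {}
    for taxon, seq in references.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(taxon)
    return index


def classify_read(read: str, index: Dict[str, Set[str]], k: int,
                  min_hits: int = 1) -> Optional[str]:
    """Majority vote over taxon-specific k-mers; None if unmatched or tied."""
    votes: Counter = Counter()
    for km in kmers(read, k):
        taxa = index.get(km)
        if taxa is not None and len(taxa) == 1:  # ignore k-mers shared by taxa
            votes[next(iter(taxa))] += 1
    if not votes:
        return None
    top = votes.most_common(2)
    best, hits = top[0]
    if hits < min_hits or (len(top) > 1 and top[1][1] == hits):
        return None
    return best


def read_proportions(reads: Iterable[str], index: Dict[str, Set[str]],
                     k: int) -> Dict[str, float]:
    """Relative abundance of each taxon among the classified reads."""
    counts = Counter(classify_read(r, index, k) for r in reads)
    counts.pop(None, None)  # drop unclassified reads
    total = sum(counts.values())
    return {taxon: c / total for taxon, c in counts.items()} if total else {}


if __name__ == "__main__":
    refs = {"taxon_A": "ACGTACGTTTGACC", "taxon_B": "GGGCCCAAATTTGG"}
    idx = build_index(refs, k=5)
    # Two reads from taxon_A, one from taxon_B, one unmatched read.
    print(read_proportions(["ACGTACGT", "ACGTACGT", "CCCAAATTT", "TTTTTTTT"], idx, 5))
```

Discarding shared k-mers is a crude guard against false positives. With incomplete reference databases, as the abstract notes, reads from unrepresented species can still be pulled toward the closest reference.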
ABSTRACT
A recent genomewide screen identified 13 transposable elements that are likely to have been adaptive during or after the spread of Drosophila melanogaster out of Africa. One of these insertions, Bari-Juvenile hormone epoxy hydrolase (Bari-Jheh), was associated with the selective sweep of its flanking neutral variation and with reduced expression of one of its neighboring genes, Jheh3. Here, we provide further evidence that the Bari-Jheh insertion is adaptive. We delimit the extent of the selective sweep and show that Bari-Jheh is the only mutation linked to the sweep. Bari-Jheh also lowers the expression of its other flanking gene, Jheh2. Subtle consequences of the Bari-Jheh insertion on life-history traits are consistent with the effects of reduced expression of the Jheh genes. Finally, we analyze the molecular evolution of the Jheh genes over both long and short timescales and conclude that Bari-Jheh appears to be a very rare adaptive event in the history of these genes. We discuss the implications of these findings for the detection and understanding of adaptation.
Subject(s)
Conserved Sequence , DNA Transposable Elements/genetics , Drosophila melanogaster/growth & development , Drosophila melanogaster/genetics , Mutagenesis, Insertional/genetics , Quantitative Trait Loci/genetics , Alleles , Amino Acid Substitution/genetics , Animals , Base Pairing/genetics , Base Sequence , Cell Survival , DNA, Intergenic/genetics , Evolution, Molecular , Gene Expression Regulation , Genes, Insect , Models, Genetic , Molecular Sequence Data , Mutation/genetics , Open Reading Frames/genetics , Ovum/cytology , Phenotype , Polymorphism, Genetic , Selection, Genetic
ABSTRACT
Transposable elements (TEs) constitute a substantial fraction of the genomes of many species, and it is thus important to understand their population dynamics. The strength of natural selection against TEs is a key parameter in understanding these dynamics. In principle, the strength of selection can be inferred from the frequencies of a sample of TEs. However, complicated demographic histories, such as that of Drosophila melanogaster, could substantially distort the TE frequency distribution compared with that expected for a panmictic, constant-sized population. The current methodology for estimating the intensity of selection against TEs does not take demographic history into account and might generate erroneous estimates, especially for TE families under weak selection. Here, we develop a flexible maximum likelihood methodology that explicitly accounts both for demographic history and for the ascertainment biases of identifying TEs. We apply this method to newly generated frequency data for the BS family of non-long terminal repeat retrotransposons in D. melanogaster, in concert with two recent models of the demographic history of the species, to infer the intensity of selection against this family. We find that this estimate differs substantially from a prior estimate made under the assumption of constant population size. Further, we find that the derived non-African frequency data contain relatively little information about selection intensity and that the ancestral African subpopulation is much more informative in this respect. These findings highlight the importance of accounting for demographic history and bear on study design for the inference of selection coefficients generally.
Subject(s)
Drosophila melanogaster/genetics , Genetics, Population , Selection, Genetic , Animals , DNA Transposable Elements , Methods , Models, Genetic
ABSTRACT
A beneficial mutation that has nearly but not yet fixed in a population produces a characteristic haplotype configuration, called a partial selective sweep. Whether nonadaptive processes might generate similar haplotype configurations has not been extensively explored. Here, we consider five population genetic data sets taken from regions flanking high-frequency transposable elements in North American strains of Drosophila melanogaster, each of which appears to be consistent with the expectations of a partial selective sweep. We use coalescent simulations to explore whether incorporating the species' demographic history, purifying selection against the element, or suppression of recombination caused by the element could generate putatively adaptive haplotype configurations. Whereas most of the data sets would be rejected as nonneutral under the standard neutral null model, only the data set for which there is strong external evidence in support of an adaptive transposition appears to be nonneutral under the more complex null model, in particular when demography is taken into account. High-frequency, derived mutations from a recently bottlenecked population, such as those we study here, are of great interest to evolutionary genetics in the context of scans for adaptive events; we discuss the broader implications of our findings in this context.
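The coalescent simulations described above generate null distributions against which observed data are compared. Below is a minimal sketch of the simplest such null, the standard neutral model that the abstract contrasts with more complex models. It assumes constant population size, no recombination, and infinite sites, and it simulates the number of segregating sites S. The study itself examined haplotype configurations and added demography, purifying selection, and recombination suppression, none of which is modeled here. The parameter values are hypothetical.

```python
import random


def harmonic_a(n: int) -> float:
    """Watterson's a_n = sum_{i=1}^{n-1} 1/i, so that E[S] = theta * a_n."""
    return sum(1.0 / i for i in range(1, n))


def poisson_draw(rng: random.Random, lam: float) -> int:
    """Poisson(lam) draw: count unit-rate exponential arrivals in [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_segregating_sites(n: int, theta: float, rng: random.Random) -> int:
    """One replicate of S for a sample of n sequences. Time is in units of
    2N generations and theta = 4*N*mu, so mutations fall on each lineage
    at rate theta / 2 per unit of branch length."""
    total_branch_length = 0.0
    for k in range(n, 1, -1):
        t_k = rng.expovariate(k * (k - 1) / 2.0)  # waiting time while k lineages remain
        total_branch_length += k * t_k
    return poisson_draw(rng, theta / 2.0 * total_branch_length)


def lower_tail_prob(observed_s: int, n: int, theta: float,
                    reps: int = 10_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(S <= observed_s) under the neutral null."""
    rng = random.Random(seed)
    hits = sum(simulate_segregating_sites(n, theta, rng) <= observed_s
               for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    # Hypothetical sample: 10 sequences, theta = 5, only 3 segregating sites seen
    print(lower_tail_prob(3, n=10, theta=5.0))
```

A demographic null of the kind the abstract favors would replace the constant coalescence rates with rates that change over time, for example during a bottleneck. The Monte Carlo comparison stays the same.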
Subject(s)
Adaptation, Biological/genetics , Drosophila melanogaster/genetics , Drosophila melanogaster/physiology , Models, Genetic , Mutation , Animals , Base Sequence , Computer Simulation , DNA Transposable Elements , Genomics , Molecular Sequence Data , Recombination, Genetic
ABSTRACT
The effect of recurrent selective sweeps is a spatially heterogeneous reduction in neutral polymorphism throughout the genome. The pattern of reduction depends on the selective advantage and recurrence rate of the sweeps. Because many adaptive substitutions responsible for these sweeps also contribute to nonsynonymous divergence, the spatial distribution of nonsynonymous divergence also reflects the distribution of adaptive substitutions. Thus, the spatial correspondence between neutral polymorphism and nonsynonymous divergence may be especially informative about the process of adaptation. Here we study this correspondence using genomewide polymorphism data from Drosophila simulans and the divergence between D. simulans and D. melanogaster. Focusing on highly recombining portions of the autosomes, at a spatial scale appropriate to the study of selective sweeps, we find that neutral polymorphism is both lower and, as measured by a new statistic, Q_S, less homogeneous where nonsynonymous divergence is higher, and that the spatial structure of this correlation is best explained by the action of strong recurrent selective sweeps. We introduce a method to infer, from the spatial correspondence between polymorphism and divergence, the rate and selective strength of adaptation. Our results independently confirm a high rate of adaptive substitution (approximately 1/3000 generations) and newly suggest that many adaptations are of surprisingly great selective effect (approximately 1%), reducing the effective population size by approximately 15% even in highly recombining regions of the genome.
Subject(s)
Adaptation, Physiological/genetics , Drosophila/genetics , Genome, Insect/genetics , Polymorphism, Genetic , Animals , Data Collection , Drosophila melanogaster , Genomics/methods , Recombination, Genetic
ABSTRACT
BACKGROUND: Levels of molecular diversity in Drosophila have repeatedly been shown to be higher in ancestral, African populations than in derived, non-African populations. This pattern holds for both coding and noncoding regions for a variety of molecular markers, including single nucleotide polymorphisms and microsatellites. Comparisons of X-linked and autosomal diversity have yielded results that depend largely on population of origin. RESULTS: In an attempt to further elucidate patterns of sequence diversity in Drosophila melanogaster, we studied nucleotide variation at putatively nonfunctional X-linked and autosomal loci in sub-Saharan African and North American strains of D. melanogaster. We combine our experimental results with data from previous studies of molecular polymorphism in this species. We confirm that levels of diversity are consistently higher in African than in North American strains. The relative reduction of diversity for X-linked and autosomal loci in the derived, North American strains depends heavily on the loci studied. While the compiled dataset, composed primarily of regions within or in close proximity to genes, shows a much more severe reduction of diversity on the X chromosome than on the autosomes in derived strains, the dataset consisting of intergenic loci located far from genes shows very similar reductions in diversity for X-linked and autosomal loci in derived strains. In addition, levels of diversity at X-linked and autosomal loci in the presumably ancestral African population are more similar than expected under an assumption of neutrality and equal numbers of breeding males and females. CONCLUSION: We show that simple demographic scenarios under the assumptions of neutral theory cannot explain all of the observed patterns of molecular diversity.
We suggest that the simplest model is a population bottleneck that retains an ancestral female-biased sex ratio, coupled with higher rates of positive selection at X-linked loci in close proximity to genes specifically in derived, non-African populations.
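The expectation invoked above can be made concrete with the standard effective-size formulas for a population of N_m breeding males and N_f breeding females: Ne(autosomal) = 4 N_m N_f / (N_m + N_f) and Ne(X) = 9 N_m N_f / (4 N_m + 2 N_f). Under neutrality, and assuming equal mutation rates on the X and the autosomes, expected X/autosome diversity equals the ratio of these two effective sizes. The equal-mutation-rate assumption is a simplification, and studies often correct for it using divergence. The ratio is 0.75 for an equal sex ratio and rises toward 9/8 as breeding becomes female-biased. The population sizes in the example are hypothetical.

```python
def ne_autosomal(n_males: float, n_females: float) -> float:
    """Effective size for autosomal loci with separate sexes."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def ne_x_linked(n_males: float, n_females: float) -> float:
    """Effective size for X-linked loci (males carry one X copy)."""
    return 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females)


def expected_x_to_autosome_ratio(n_males: float, n_females: float) -> float:
    """Expected pi_X / pi_A under neutrality and equal mutation rates."""
    return ne_x_linked(n_males, n_females) / ne_autosomal(n_males, n_females)


if __name__ == "__main__":
    print(expected_x_to_autosome_ratio(100, 100))  # 0.75 (equal sex ratio)
    print(expected_x_to_autosome_ratio(50, 150))   # about 0.9 (female-biased)
    print(expected_x_to_autosome_ratio(150, 50))   # below 0.75 (male-biased)
```

An observed X/autosome ratio above 0.75, as reported for the African population, is therefore what a female-biased breeding sex ratio would predict.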
Subject(s)
Drosophila melanogaster/genetics , Genes, Insect , Genes, X-Linked , Genetic Variation , Animals , Base Sequence , Female , Male , Models, Genetic , Polymorphism, Single Nucleotide , Sequence Alignment , Species Specificity
ABSTRACT
As phylogenetically controlled experimental designs become increasingly common in ecology, the need arises for a standardized statistical treatment of these datasets. Phylogenetically paired designs circumvent the need for resolved phylogenies and have been used to compare species groups, particularly in the areas of invasion biology and adaptation. Despite the widespread use of this approach, the statistical analysis of paired designs has not been critically evaluated. We propose a mixed model approach that includes random effects for pair and species. These random effects introduce a "two-layer" compound symmetry variance structure that captures both the correlations between observations on related species within a pair as well as the correlations between the repeated measurements within species. We conducted a simulation study to assess the effect of model misspecification on Type I and II error rates. We also provide an illustrative example with data containing taxonomically similar species and several outcome variables of interest. We found that a mixed model with species and pair as random effects performed better in these phylogenetically explicit simulations than two commonly used reference models (no or single random effect) by optimizing Type I error rates and power. The proposed mixed model produces acceptable Type I and II error rates despite the absence of a phylogenetic tree. This design can be generalized to a variety of datasets to analyze repeated measurements in clusters of related subjects/species.
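The "two-layer" compound symmetry structure implied by random effects for pair and species can be written out directly. In the minimal sketch below, two observations on the same species share a covariance of sigma^2_pair + sigma^2_species. Observations on different species within a pair share only sigma^2_pair, and observations in different pairs are uncorrelated. A small simulator draws data from the same model, as one might for a Type I error study. The variance components and labels are hypothetical, and fitting the model with mixed-model software is not shown.

```python
import random
from typing import Dict, Hashable, List, Tuple

Label = Tuple[Hashable, Hashable]  # (pair_id, species_id) for one observation


def two_layer_cs_covariance(labels: List[Label], var_pair: float,
                            var_species: float, var_resid: float) -> List[List[float]]:
    """Marginal covariance matrix of y = mu + b_pair + b_species + e."""
    n = len(labels)
    cov = [[0.0] * n for _ in range(n)]
    for i, (pair_i, sp_i) in enumerate(labels):
        for j, (pair_j, sp_j) in enumerate(labels):
            if pair_i != pair_j:
                continue                 # different pairs: independent
            c = var_pair                 # same pair: shared pair effect
            if sp_i == sp_j:
                c += var_species         # same species within the pair
                if i == j:
                    c += var_resid       # diagonal: residual variance
            cov[i][j] = c
    return cov


def simulate_outcomes(labels: List[Label], mu: float, var_pair: float,
                      var_species: float, var_resid: float,
                      rng: random.Random) -> List[float]:
    """Draw one data set from the two-random-effect model."""
    pair_eff: Dict[Hashable, float] = {}
    species_eff: Dict[Label, float] = {}
    ys = []
    for pair, sp in labels:
        if pair not in pair_eff:
            pair_eff[pair] = rng.gauss(0.0, var_pair ** 0.5)
        if (pair, sp) not in species_eff:  # species nested within pair
            species_eff[(pair, sp)] = rng.gauss(0.0, var_species ** 0.5)
        ys.append(mu + pair_eff[pair] + species_eff[(pair, sp)]
                  + rng.gauss(0.0, var_resid ** 0.5))
    return ys


if __name__ == "__main__":
    obs = [("P1", "native"), ("P1", "native"), ("P1", "invasive"), ("P2", "native")]
    for row in two_layer_cs_covariance(obs, 1.0, 2.0, 3.0):
        print(row)
```

Dropping either random effect forces one of the two off-diagonal levels to zero or merges them. This is the kind of misspecification the reference models in the simulation study represent.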
ABSTRACT
Although a few hundred single nucleotide polymorphisms (SNPs) suffice to infer close familial relationships, high-density genome-wide SNP data make possible the inference of more distant relationships, such as 2nd to 9th cousinships. In order to characterize the relationship between genetic similarity and degree of kinship given a timeframe of 100-300 years, we analyzed the sharing of DNA inferred to be identical by descent (IBD) in a subset of individuals from the 23andMe customer database (n = 22,757) and from the Human Genome Diversity Panel (HGDP-CEPH, n = 952). With data from 121 populations, we show that the average amount of DNA shared IBD in most ethnolinguistically defined populations, for example Native American groups, Finns, and Ashkenazi Jews, differs from that in continentally defined populations by several orders of magnitude. Via extensive pedigree-based simulations, we determined bounds for predicted degrees of relationship given the amount of genomic IBD sharing in both endogamous and 'unrelated' population samples. Using these bounds as a guide, we detected tens of thousands of 2nd to 9th degree cousin pairs within a heterogeneous set of 5,000 Europeans. The ubiquity of distant relatives, detected via IBD segments, in both ethnolinguistic populations and large 'unrelated' population samples has important implications for genetic genealogy, forensics, and genotype/phenotype mapping studies.
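The link between relationship degree and expected genomic sharing can be sketched with pedigree arithmetic. For full n-th cousins in an outbred pedigree, the kinship coefficient is (1/2)^(2n+2). The expected fraction of the genome shared IBD, twice the kinship, is therefore (1/2)^(2n+1): 1/8 for first cousins and 1/32 for second cousins. The minimal sketch below computes that expectation and inverts it to the nearest cousin degree. It gives only a point estimate under an assumption of outbreeding. As the abstract notes, realized sharing varies widely and endogamy inflates it, which is why simulation-based bounds are needed.

```python
import math


def expected_ibd_fraction(n: int) -> float:
    """Expected genome fraction shared IBD (2 * kinship) for full n-th
    cousins, assuming an outbred pedigree: (1/2) ** (2n + 1)."""
    if n < 1:
        raise ValueError("n must be >= 1 (1 = first cousins)")
    return 0.5 ** (2 * n + 1)


def nearest_cousin_degree(ibd_fraction: float) -> int:
    """Invert the expectation: n = (log2(1/f) - 1) / 2, rounded, minimum 1.
    A point estimate only; realized sharing scatters widely around it."""
    if not 0.0 < ibd_fraction <= 1.0:
        raise ValueError("ibd_fraction must be in (0, 1]")
    n = (math.log2(1.0 / ibd_fraction) - 1.0) / 2.0
    return max(1, round(n))


if __name__ == "__main__":
    for n in (1, 2, 3, 9):
        print(n, expected_ibd_fraction(n))
```

At 9th cousins the expected fraction is about 2 x 10^-6 of the genome. Such relationships are detectable only through the occasional long shared segment, which is why the bounds are probabilistic.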
Subject(s)
Computational Biology , Genome, Human/genetics , Phylogeny , Base Sequence , Evolution, Molecular , Female , Genetic Variation/genetics , Homozygote , Humans , Male , Pedigree , Polymorphism, Single Nucleotide/genetics
ABSTRACT
While the cost and speed of generating genomic data have come down dramatically in recent years, the slow pace of collecting medical data for large cohorts continues to hamper genetic research. Here we evaluate a novel online framework for obtaining large amounts of medical information from a recontactable cohort by assessing our ability to replicate genetic associations using these data. Using web-based questionnaires, we gathered self-reported data on 50 medical phenotypes from a generally unselected cohort of over 20,000 genotyped individuals. Of a list of genetic associations curated by NHGRI, we successfully replicated about 75% of the associations that we expected to (based on the number of cases in our cohort and reported odds ratios, and excluding a set of associations with contradictory published evidence). Altogether we replicated over 180 previously reported associations, including many for type 2 diabetes, prostate cancer, cholesterol levels, and multiple sclerosis. We found significant variation across categories of conditions in the percentage of expected associations that we were able to replicate, which may reflect systematic inflation of the effects in some initial reports, or differences across diseases in the likelihood of misdiagnosis or misreport. We also demonstrated that we could improve replication success by taking advantage of our recontactable cohort, offering more in-depth questions to refine self-reported diagnoses. Our data suggest that online collection of self-reported data from a recontactable cohort may be a viable method for both broad and deep phenotyping in large populations.
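Judging which associations a cohort is "expected" to replicate amounts to a power calculation from case counts and published odds ratios. The minimal sketch below uses a two-sided allelic test with a normal approximation. It assumes that the published per-allele odds ratio is the true effect and that the control allele frequency is known, and for simplicity it uses the variance under the alternative hypothesis. Summing the per-association powers gives an expected number of replications. The cohort sizes, frequencies, and odds ratios are hypothetical, and this is not necessarily the calculation the authors used.

```python
from statistics import NormalDist
from typing import Iterable, Tuple

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    odds = odds_ratio * p_control / (1.0 - p_control)
    return odds / (1.0 + odds)


def allelic_test_power(n_cases: int, n_controls: int, p_control: float,
                       odds_ratio: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing allele frequencies
    between cases and controls (two alleles per person)."""
    p_case = case_allele_freq(p_control, odds_ratio)
    se = (p_case * (1 - p_case) / (2 * n_cases)
          + p_control * (1 - p_control) / (2 * n_controls)) ** 0.5
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(p_case - p_control) / se
    # Probability the test statistic lands in either rejection region
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def expected_replications(studies: Iterable[Tuple[int, int, float, float]],
                          alpha: float = 0.05) -> float:
    """Sum of powers over (n_cases, n_controls, p_control, odds_ratio) tuples."""
    return sum(allelic_test_power(nc, nn, p, orr, alpha)
               for nc, nn, p, orr in studies)


if __name__ == "__main__":
    hypothetical = [(200, 2000, 0.30, 1.2), (2000, 20000, 0.30, 1.2)]
    print(expected_replications(hypothetical))
```

If published odds ratios are inflated, as the abstract suggests for some categories, these powers overstate the expected number of replications.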
Subject(s)
Genetic Association Studies/methods , Genome, Human/genetics , Genome-Wide Association Study/methods , Surveys and Questionnaires , Adult , Aged , Cohort Studies , Female , Genotype , Humans , Logistic Models , Male , Middle Aged , Odds Ratio , Polymorphism, Single Nucleotide/genetics , Young Adult
ABSTRACT
To study adaptation, it is essential to identify multiple adaptive mutations and to characterize their molecular, phenotypic, selective, and ecological consequences. Here we describe a genomic screen for adaptive insertions of transposable elements in Drosophila. Using a pilot application of this screen, we have identified an adaptive transposable element insertion, which truncates a gene and apparently generates a functional protein in the process. The insertion of this transposable element confers increased resistance to an organophosphate pesticide and has spread in D. melanogaster recently.
Subject(s)
DNA Transposable Elements , Drosophila Proteins/genetics , Drosophila melanogaster/genetics , Evolution, Molecular , Genes, Insect , Insecticide Resistance/genetics , Adaptation, Physiological , Alleles , Amino Acid Substitution , Animals , Azinphosmethyl/pharmacology , Base Sequence , Choline/metabolism , Crosses, Genetic , Drosophila/drug effects , Drosophila/genetics , Drosophila/physiology , Drosophila Proteins/chemistry , Drosophila Proteins/physiology , Drosophila melanogaster/drug effects , Drosophila melanogaster/physiology , Exons , Female , Gene Expression , Haplotypes , Insecticides/pharmacology , Introns , Long Interspersed Nucleotide Elements , Molecular Sequence Data , Mutation , Polymorphism, Genetic , Recombination, Genetic , Selection, Genetic
ABSTRACT
The force exerted by skeletal muscle is modulated by the compliance of the tissues to which it is connected. Force of the muscle sarcomere is modulated by the compliance of the myofilaments. We tested the hypothesis that myofilament compliance influences Ca2+ regulation of muscle by constructing a computational model of the muscle half-sarcomere that includes filament compliance as a variable. The biomechanical model consists of three half-filaments of myosin and 13 thin filaments. Initial spacing of the motor domains of myosin on thick filaments and of myosin-binding sites on thin filaments was taken to be that measured experimentally in unstrained filaments. Monte Carlo simulations were used to determine transitions around a three-state cycle for each cross-bridge and between two states for each thin-filament regulatory unit. This multifilament model exhibited less "tuning" of maximum force than an earlier two-filament model. Significantly, both the apparent Ca2+ sensitivity and the cooperativity of activation of steady-state isometric force were modulated by myofilament compliance. The activation dependence of the kinetics of tension development was also modulated by filament compliance. Tuning in the full myofilament lattice appears to be more significant at submaximal levels of thin-filament activation.
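The Monte Carlo treatment of each cross-bridge as a small state machine can be illustrated with a toy model. The sketch below makes strong simplifying assumptions. It drops the filament lattice, compliance, and Ca2+-dependent thin-filament regulation. Each cross-bridge steps irreversibly around a three-state cycle with fixed per-step probabilities, and the state labels (detached, weakly bound, strongly bound) are hypothetical. Because flux around the cycle must balance at steady state, the occupancy of each state is proportional to 1/p for that state, which the simulation should reproduce.

```python
import random
from typing import List, Sequence

STATE_NAMES = ("detached", "weakly bound", "strongly bound")  # hypothetical labels


def simulate_cycle(n_bridges: int, n_steps: int, p_forward: Sequence[float],
                   seed: int = 0) -> List[float]:
    """Monte Carlo: at each time step, a bridge in state s advances to
    (s + 1) % 3 with probability p_forward[s]. Returns final occupancies."""
    rng = random.Random(seed)
    states = [0] * n_bridges  # all bridges start detached
    for _ in range(n_steps):
        for i, s in enumerate(states):
            if rng.random() < p_forward[s]:
                states[i] = (s + 1) % 3
    return [states.count(k) / n_bridges for k in range(3)]


def steady_state_occupancy(p_forward: Sequence[float]) -> List[float]:
    """Flux balance (pi_s * p_s equal for all s) gives pi_s proportional to 1/p_s."""
    inverse = [1.0 / p for p in p_forward]
    total = sum(inverse)
    return [x / total for x in inverse]


if __name__ == "__main__":
    probs = (0.10, 0.20, 0.05)  # hypothetical per-step transition probabilities
    simulated = simulate_cycle(2000, 500, probs)
    for name, sim, exact in zip(STATE_NAMES, simulated, steady_state_occupancy(probs)):
        print(f"{name:>15}: simulated {sim:.3f}, expected {exact:.3f}")
```

In the full model, the transition probabilities would depend on local filament strain and on the state of the adjacent thin-filament regulatory unit. This dependence is what lets compliance influence apparent Ca2+ sensitivity.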
Subject(s)
Calcium/metabolism , Models, Biological , Sarcomeres/physiology , Animals , Humans
ABSTRACT
Polymorphisms in microsatellites on the human Y chromosome have been used to estimate important demographic parameters of human history. We compare two coalescent-based statistical methods that give estimates for a number of demographic parameters using the seven Y chromosome polymorphisms in the HGDP-CEPH Cell Line Panel, a collection of samples from 52 worldwide populations. The estimates for the time to the most recent common ancestor vary according to the method used and the assumptions about the prior distributions of model parameters, but are generally consistent with other global Y chromosome studies. We explore the sensitivity of these results to assumptions about the prior distributions and the evolutionary models themselves.