ABSTRACT
Population genetic studies of North Asian ethnic groups have focused on genetic variation of sex chromosomes and mitochondria. Studies of the extensive variation available in autosomal markers have appeared infrequently. We focus on relationships among population samples using new North Asia microhaplotype data. We combined genotypes from our laboratory on 58 microhaplotypes, distributed across 18 autosomes, on 3945 individuals from 75 populations with corresponding data extracted for 26 populations from the Thousand Genomes consortium and for 22 populations from the GenomeAsia 100K project. A total of 7107 individuals in 122 total populations are analyzed using STRUCTURE, Principal Component Analysis, and phylogenetic tree analyses. North Asia populations sampled in Mongolia include Buryats, Mongolians, Altai Kazakhs, and Tsaatans. Available Siberians include samples of Yakut, Khanty, and Komi Zyriane. Analyses of all 122 populations confirm many known relationships and show that most populations from North Asia form a cluster distinct from all other groups. Refinement of analyses on smaller subsets of populations reinforces the distinctiveness of North Asia and shows that the North Asia cluster identifies a region that is ancestral to Native Americans.
Subject(s)
Asian People; Genetics, Population; Asian People/genetics; Ethnicity/genetics; Genetic Variation; Haplotypes; Humans; Phylogeny; Principal Component Analysis
ABSTRACT
The Southwest Asian, circum-Mediterranean, and Southern European populations (collectively, SWAMSE) together with Northern European populations form one of five "continental" groups of global populations in many analyses of population relationships. This region is of great anthropologic and forensic interest but relationships of large numbers of populations within the region have not been able to be cleanly resolved with autosomal genetic markers. To examine the genetic boundaries to the SWAMSE region and whether internal structure can be detected we have assembled data for a total of 151 separate autosomal genetic markers on populations in this region and other parts of the world for a global set of 95 populations. The markers include 83 ancestry informative SNPs as singletons and 68 microhaplotype loci defined by 204 SNPs. The 151 loci are ancestry informative on a global scale, identifying at least five biogeographic clusters. One of those clusters is a clear grouping of 37 populations containing the SWAMSE plus northern European populations to the exclusion of populations in South Central Asia and populations from farther East. A refined analysis of the 37 populations shows the northern European populations clustering separately from the SWAMSE populations. Within Southwest Asia the Samaritans and Shabaks are distinct outliers. The Yemenite Jews, Saudi, Kuwaiti, Palestinian Arabs, and Southern Tunisians cluster together loosely while the remaining populations from Northern Iraq, Mediterranean Europe, the Caucasus region, and Iran cluster in a more complex graded fashion. The majority of the SWAMSE populations from the mainland of Southwest Asia form a cluster with little internal structure reflecting a very complex history of endogamy and migrations. The set of 151 DNA polymorphisms not only distinguishes major geographical regions globally but can distinguish ancestry to a small degree within geographical regions such as SWAMSE. 
We discuss forensic characteristics of the polymorphisms and also identify those that rank highest by Rosenberg's I_n (informativeness for assignment) measure for the SWAMSE region populations and for the global set of populations analyzed. DATA AVAILABILITY: Genotypes on all 151 markers for all 3790 individuals typed in the Kidd Lab on the 72 Kidd Lab populations have been deposited in the Zenodo archive and can be freely accessed at https://doi.org/10.5281/zenodo.4658892. Some of the data have been made public previously as supplemental files appended to publications. Data for the additional individuals included in the analyses were taken from already public datasets as indicated in the text.
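As an illustration of ranking markers by Rosenberg's I_n, the following minimal Python sketch computes the informativeness-for-assignment statistic for one locus from per-population allele frequencies, assuming K equally weighted source populations. The function name and the example frequencies are hypothetical and are not taken from the study.

```python
import math


def informativeness_for_assignment(freqs_by_pop):
    """Rosenberg's I_n for one locus.

    freqs_by_pop: list of K lists; each inner list holds the allele
    frequencies of one population at this locus (summing to 1).
    Populations are weighted equally (an assumption of this sketch).
    """
    k = len(freqs_by_pop)
    n_alleles = len(freqs_by_pop[0])
    total = 0.0
    for j in range(n_alleles):
        # Average frequency of allele j across the K populations.
        p_j = sum(pop[j] for pop in freqs_by_pop) / k
        if p_j > 0:
            total -= p_j * math.log(p_j)
        # Per-population term, skipping zero frequencies (0 * log 0 = 0).
        for pop in freqs_by_pop:
            if pop[j] > 0:
                total += pop[j] * math.log(pop[j]) / k
    return total


# Hypothetical biallelic marker in two populations.
identical = informativeness_for_assignment([[0.3, 0.7], [0.3, 0.7]])
fixed_difference = informativeness_for_assignment([[1.0, 0.0], [0.0, 1.0]])
```

A marker with identical frequencies in every population scores zero, and a marker fixed for different alleles in K populations reaches the maximum of log K (natural logarithm here).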
Subject(s)
Ethnicity/genetics; Genetics, Population; Polymorphism, Single Nucleotide; Asia; Haplotypes; Humans; Mediterranean Region; Principal Component Analysis; Racial Groups/genetics
ABSTRACT
The benefits of ancestry informative SNP (AISNP) panels can best accrue and be properly evaluated only as sufficient reference population data become readily accessible. Ideally the set of reference populations should approximate the genetic diversity of human populations worldwide. The Kidd and Seldin AISNP sets are two panels that have separately accumulated thus far the largest and most diverse collections of data on human reference populations from the major continental regions. A recent tally in the ALFRED allele frequency database finds 164 reference populations available for all the 55 Kidd AISNPs and 132 reference populations for all the 128 Seldin AISNPs. Although much more of the genetic diversity in human populations around the world still needs to be documented, 81 populations have genotype data available for all 170 AISNPs in the union of the Kidd and Seldin panels. In this report we examine admixture and principal component analyses on these 81 worldwide populations and some regional subsets of these reference populations to determine how well the combined panel illuminates population relationships. Analyses of this dataset that focused on Native American populations revealed very strong cluster patterns associated with many of the individual populations studied.
Subject(s)
Gene Frequency; Genetic Variation; Genotype; Polymorphism, Single Nucleotide; Databases, Genetic; Genetics, Population; High-Throughput Nucleotide Sequencing; Humans
ABSTRACT
The set of 55 ancestry informative SNPs (AISNPs) originally developed by the Kidd Lab has been studied on a large number of populations and continues to be applied to new population samples. The existing reference database of population samples allows the relationships of new population samples to be inferred on a global level. Analyses show that these autosomal markers constitute one of the better panels of AISNPs. Continuing to build this reference database enhances its value. Because more than half of the 25 ethnic groups recently studied with these AISNPs are from Southwest Asia and the Mediterranean region, we present here various analyses focused on populations from these regions along with selected reference populations from nearby regions where genotype data are available. Many of these ethnic groups have not been previously studied for forensic markers. Data on populations from other world regions have also been added to the database but are not included in these focused analyses. The new population samples added to ALFRED and FROG-kb increase the total to 164 population samples that have been studied for all 55 AISNPs.
Subject(s)
Ethnicity/genetics; Genetics, Population; Polymorphism, Single Nucleotide/genetics; Racial Groups/genetics; Asian People/genetics; Europe/epidemiology; Female; Gene Frequency; Genotype; Humans; Male; Mediterranean Region/epidemiology
ABSTRACT
The derived human alcohol dehydrogenase (ADH)1B*48His allele of the ADH1B Arg48His polymorphism (rs1229984) has been identified as one component of an East Asian specific core haplotype that underwent recent positive selection. Our study has been extended to Southwest Asia and additional markers in East Asia. Fst values (Sewall Wright's fixation index) and long-range haplotype analyses identify a strong signature of selection not only in East Asian but also in Southwest Asian populations. However, except for the ADH1B*48His allele, different core haplotypes occur in Southwest Asia compared to East Asia and the extended haplotypes also differ. Thus, the ADH1B*48His allele, as part of a core haplotype of 10 kb, has undergone recent rapid increases in frequency independently in the two regions after divergence of the respective populations. Emergence of agriculture may be the common factor underlying the evident selection.
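For readers unfamiliar with the fixation index mentioned above, a minimal Python sketch of Wright's Fst at a single biallelic site is given here, computed as (H_T - H_S) / H_T from per-population allele frequencies with equal population weights. The function name and the frequencies are hypothetical, and the estimator actually used in the study may differ (for example, by correcting for sample size).

```python
def fst_biallelic(freqs):
    """Wright's F_ST for one biallelic site.

    freqs: frequency of the same allele in each population.
    Populations are weighted equally (an assumption of this sketch).
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    # Expected heterozygosity of the pooled population.
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # site is monomorphic everywhere
    # Mean expected heterozygosity within populations.
    h_s = sum(2 * p * (1 - p) for p in freqs) / k
    return (h_t - h_s) / h_t


# Hypothetical frequencies of a derived allele in two populations.
example_fst = fst_biallelic([0.1, 0.7])
```

Values near 0 indicate similar frequencies across populations, and values near 1 indicate near-fixed differences; unusually high values at a site relative to the rest of the genome are one of the signals used to flag possible selection.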
ABSTRACT
Crohn's disease (CD) involves chronic inflammation in the gastrointestinal tract due to dysregulation of the host immune response to the gut microbiome. Even though host-microbiome interactions are likely contributors to the development of CD, only a few studies have detected genetic variants that change bacterial compositions and increase CD risk. We focus on one of the well-replicated susceptibility genes, tumor necrosis factor superfamily member 15 (TNFSF15), and apply statistical analyses to personal profiles of genotypes and salivary microbiota collected from CD cases and controls in the Ryukyu Islands, the southernmost islands of the Japanese archipelago. Our association test confirmed the susceptibility of TNFSF15 in the Ryukyu Islands. We found that the recessive model fit the observed genotype frequency of risk alleles slightly better than the additive model, so we defined the genetic effect on CD as present when both chromosomes of an individual carry only risk alleles. The combined analysis of haplotypes and salivary microbiome from a small set of samples showed a significant association of the genetic effect with the increase of Prevotella, which led to a significant increase of CD risk. However, the genetic effect on CD disappeared if the abundance of Prevotella was low, suggesting the genetic contribution to CD is conditionally independent given a fixed amount of Prevotella. Although our statistical power is limited due to the small sample size, these results support the idea that the genetic susceptibility of TNFSF15 to CD may be confounded, in part, by the increase of Prevotella.
Subject(s)
Crohn Disease/genetics; Genetic Predisposition to Disease; Microbiota; TNF-Related Apoptosis-Inducing Ligand/genetics; Case-Control Studies; Confounding Factors, Epidemiologic; Humans; Japan; Logistic Models; Polymorphism, Single Nucleotide; Saliva/microbiology
ABSTRACT
Ancestry inference for an individual can only be as good as the reference populations with allele frequency data on the SNPs being used. If the most relevant ancestral population(s) does not have data available for the SNPs studied, then analyses based on DNA evidence may indicate a quite distantly related population, albeit one among the more closely related of the existing reference populations. We have added reference population allele frequencies for 14 additional population samples (with >1100 individuals studied) to the 125 population samples previously published for the Kidd Lab 55 AISNP panel. Allele frequencies are now publicly available for all 55 SNPs in ALFRED and FROG-kb for a total of 139 population samples. This Kidd Lab panel of 55 ancestry informative SNPs has been incorporated in commercial kits by both ThermoFisher Scientific and Illumina for massively parallel sequencing. Researchers employing those kits will find the enhanced set of reference populations useful.
Subject(s)
Ethnicity/genetics; Gene Frequency; Genetics, Population; Polymorphism, Single Nucleotide; Racial Groups/genetics; Databases, Genetic; Genotype; High-Throughput Nucleotide Sequencing; Humans
ABSTRACT
OBJECTIVES: North Africa has a complex demographic history of migrations from within Africa, Europe, and the Middle East. However, population genetic studies, especially for autosomal genetic markers, are few relative to other world regions. We examined autosomal markers for eight Tunisian and Libyan populations in order to place them in a global context. MATERIALS AND METHODS: Data were collected by TaqMan on 399 autosomal single nucleotide polymorphisms on 331 individuals from Tunisia and Libya. These data were combined with data on the same SNPs previously typed on 2585 individuals from 57 populations from around the world. Where meaningful, close by SNPs were combined into multiallelic haplotypes. Data were evaluated by clustering, principal components, and population tree analyses. For a subset of 102 SNPs, data from the literature on seven additional North African populations were included in analyses. RESULTS: Average heterozygosity of the North African populations is high relative to our global samples, consistent with a complex demographic history. The Tunisian and Libyan samples form a discrete cluster in the global and regional views and can be separated from sub-Sahara, Middle East, and Europe. Within Tunisia the Nebeur and Smar are outlier groups. Across North Africa, pervasive East-West geographical patterns were not found. DISCUSSION: Known historical migrations and invasions did not displace or homogenize the genetic variation in the region but rather enriched it. Even a small region like Tunisia contains considerable genetic diversity. Future studies across North Africa have the potential to increase our understanding of the historical demographic factors influencing the region. Am J Phys Anthropol 161:62-71, 2016. © 2016 The Authors American Journal of Physical Anthropology Published by Wiley Periodicals, Inc.
Subject(s)
Genetic Variation/genetics; Human Migration; Anthropology, Physical; Europe; Genetics, Population; Haplotypes/genetics; Humans; Libya; Phylogeny; Polymorphism, Single Nucleotide/genetics; Principal Component Analysis; Tunisia
ABSTRACT
Many ancestry informative SNP (AISNP) panels have been published. Ancestry resolution in them varies from three to eight continental clusters of populations depending on the panel used. However, none of these panels differentiates well among East Asian populations. To meet this need, we have developed a 74 AISNP panel after analyzing a much larger number of SNPs for Fst and allele frequency differences between two geographically close population groups within East Asia. The 74 AISNP panel can now distinguish at least 10 biogeographic groups of populations globally: Sub-Saharan Africa, North Africa, Europe, Southwest Asia, South Asia, North Asia, East Asia, Southeast Asia, Pacific and Americas. Compared with our previous 55-AISNP panel, Southeast Asia and North Asia are two newly assignable clusters. For individual ancestry assignment, the likelihood ratio and ancestry components were analyzed on a different set of 500 test individuals from 11 populations. All individuals from five of the test populations - Yoruba (YRI), European (CEU), Han Chinese in Henan (CHNH), Rondonian Surui (SUR) and Ticuna (TIC) - were assigned to their appropriate geographical regions unambiguously. For the other test populations, most of the individuals were assigned to their self-identified geographical regions with a certain degree of overlap with adjacent populations. These alternative ancestry components for each individual thus help give a clearer picture of the possible group origins of the individual. We have demonstrated that the new AISNP panel can achieve a deeper resolution of global ancestry.
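The likelihood-ratio step used for individual ancestry assignment can be sketched as follows. This is a minimal Python illustration, assuming Hardy-Weinberg proportions within each reference population and independence among SNPs; the function name, the clamping bounds, and the allele frequencies are hypothetical and do not come from the panel's reference data.

```python
import math


def genotype_log_likelihood(genotypes, allele_freqs, floor=0.01):
    """Log-likelihood of a multi-SNP genotype in one reference population.

    genotypes: copies (0, 1 or 2) of the counted allele at each SNP.
    allele_freqs: frequency of that allele at each SNP in the population.
    floor: frequencies are clamped to [floor, 1 - floor] so that a single
           unobserved allele does not give a likelihood of zero
           (an assumption of this sketch).
    """
    log_l = 0.0
    for g, p in zip(genotypes, allele_freqs):
        p = min(max(p, floor), 1 - floor)
        q = 1 - p
        # Hardy-Weinberg genotype probabilities for 0, 1, 2 copies.
        probs = (q * q, 2 * p * q, p * p)
        log_l += math.log(probs[g])
    return log_l


# Hypothetical individual typed at two SNPs, compared in two populations.
person = [2, 0]
pop_a = [0.9, 0.1]
pop_b = [0.1, 0.9]
log_lr = genotype_log_likelihood(person, pop_a) - genotype_log_likelihood(person, pop_b)
```

A positive log likelihood ratio favors the first population. Ranking an individual's likelihood across all reference populations is what allows the "alternative ancestry components" described above to be reported alongside the best match.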
Subject(s)
Asian People/genetics; Gene Frequency; Genetics, Population; Polymorphism, Single Nucleotide; Asia, Southeastern; Ethnicity/genetics; Humans; Likelihood Functions
ABSTRACT
Ancestry inference for a person using a panel of SNPs depends on the variation of frequencies of those SNPs around the world and the amount of reference data available for calculation/comparison. The Kidd Lab panel of 55 AISNPs has been incorporated in commercial kits by both Life Technologies and Illumina for massively parallel sequencing. Therefore, a larger set of reference populations will be useful for researchers using those kits. We have added reference population allele frequencies for 52 population samples to the 73 previously entered so that there are now allele frequencies publicly available in ALFRED and FROG-kb for a total of 125 population samples.
Subject(s)
Genetics, Population; DNA/genetics; Databases, Genetic; Humans; Polymorphism, Single Nucleotide
ABSTRACT
The Y chromosome is one of the best genetic materials for exploring the evolutionary history of human populations. Global analyses of Y chromosomal short tandem repeat (STR) data can reveal world population structures and histories. However, previous Y-STR studies tended to focus on small geographical ranges or included only limited sample sizes. In this study, we have investigated population structure and demographic history using data on 17 Y chromosomal STRs from 979 males from 44 worldwide populations. The largest genetic distances have been observed between pairs of African and non-African populations. American populations with the lowest genetic diversities also showed large genetic distances and coancestry coefficients with other populations, whereas Eurasian populations displayed close genetic affinities. African populations tend to have the oldest times to the most recent common ancestors (TMRCAs), the largest effective population sizes and the earliest expansion times, whereas the American, Siberian, Melanesian, and isolated Atayal populations have the most recent TMRCAs and expansion times, and the smallest effective population sizes. This clear geographic pattern is consistent with the serial founder model for the origin of populations outside Africa. The Y-STR dataset presented here provides the most detailed view of worldwide population structure and human male demographic history, and additionally will be of great benefit to future forensic applications and population genetic studies.
Subject(s)
Chromosomes, Human, Y/genetics; Genetics, Population; Microsatellite Repeats/genetics; Demography; Genealogy and Heraldry; Humans; Male; Molecular Sequence Data; Population Density; Time Factors
ABSTRACT
Genetic data on North and Central Asian populations are underrepresented in the literature, especially for autosomal markers. In the present study we used 812 single nucleotide polymorphisms (SNPs) distributed across all the human autosomes and extensively studied at Yale to examine the affinities of two recently collected samples of populations: rural and cosmopolitan Mongolians from Ulaanbaatar and nomadic, Turkic-speaking Tsaatan from Mongolia near the Siberian border. We compare these two populations with each other and with a global set of populations and discuss their relationships to New World populations. Specifically, we analyze data on 521 autosomal loci (single SNPs and multi-SNP haplotypes) studied in 57 populations representing all the major geographical regions of the world. We conclude that these North and Central Asian populations are genetically distinct from all other populations in our study and may be close to the ancestral lineage leading to the New World populations.
Subject(s)
Archaeology/methods; Asian People/genetics; Asia, Central/ethnology; DNA/chemistry; DNA/genetics; Evolution, Molecular; Gene Frequency; Genetics, Population; Haplotypes; Humans; Mongolia; Polymorphism, Single Nucleotide; Saliva/chemistry
ABSTRACT
SNPs that are molecularly very close (<10 kb) will generally have extremely low recombination rates, much less than 10^-4. Multiple haplotypes will often exist because of the history of the origins of the variants at the different sites, rare recombinants, and the vagaries of random genetic drift and/or selection. Such multiallelic haplotype loci are potentially important in forensic work for individual identification, for defining ancestry, and for identifying familial relationships. The new DNA sequencing capabilities currently available make possible continuous runs of a few hundred base pairs so that we can now determine the allelic combination of multiple SNPs on each chromosome of an individual, i.e., the phase, for multiple SNPs within a small segment of DNA. Therefore, we have begun to identify regions, encompassing two to four SNPs with an extent of <200 bp, that define multiallelic haplotype loci. We have identified candidate regions and have collected pilot data on many candidate microhaplotype loci. Here we present 31 microhaplotype loci that have at least three alleles, have high heterozygosity, are globally informative, and are statistically independent at the population level. This study of microhaplotype loci (microhaps) provides proof of principle that such markers exist and validates their usefulness for ancestry inference, lineage-clan-family inference, and individual identification. The true value of microhaplotypes will come with sequencing methods that can establish alleles unambiguously, including disentangling of mixtures, because a single sequencing run on a single strand of DNA will encompass all of the SNPs.
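The selection criteria named above (at least three alleles, high heterozygosity) can be checked with a short calculation. The Python sketch below treats each phased haplotype as one allele and computes the expected heterozygosity as 1 minus the sum of squared allele frequencies; the function name and the haplotype strings are hypothetical, and no small-sample correction is applied.

```python
from collections import Counter


def microhap_summary(haplotypes):
    """Summarize one microhaplotype locus from phased haplotypes.

    haplotypes: one string per sampled chromosome, e.g. "ACG" for a
    three-SNP microhaplotype. Each distinct string is one allele.
    Returns (number of alleles, expected heterozygosity).
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    # Probability that two randomly drawn chromosomes carry the same allele.
    homozygosity = sum((c / n) ** 2 for c in counts.values())
    return len(counts), 1 - homozygosity


# Hypothetical sample of four chromosomes at a three-SNP locus.
n_alleles, het = microhap_summary(["ACG", "ACG", "ATG", "GTA"])
```

Because a multi-SNP locus can carry more than two alleles, its heterozygosity can exceed the 0.5 ceiling of a single biallelic SNP, which is the property that makes such loci attractive for identification and mixture analysis.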
Subject(s)
Forensic Genetics; Haplotypes; Genetic Markers; Humans; Polymorphism, Single Nucleotide
ABSTRACT
Many panels of ancestry informative single nucleotide polymorphisms have been proposed in recent years for various purposes including detecting stratification in biomedical studies and determining an individual's ancestry in a forensic context. All of the panels have limitations in their generality and efficiency for routine forensic work. Some panels have used only a few populations to validate them. Some panels are based on very large numbers of SNPs thereby limiting the ability of others to test different populations. We have been working toward an efficient and globally useful panel of ancestry informative markers that is comprised of a small number of highly informative SNPs. We have developed a panel of 55 SNPs analyzed on 73 populations from around the world. We present the details of the panel and discuss its strengths and limitations.
Subject(s)
Pedigree; Polymorphism, Single Nucleotide; Forensic Genetics; Humans
ABSTRACT
BACKGROUND: Accurate determination of genetic ancestry is of high interest for many areas such as biomedical research, personal genomics and forensics. It remains an important topic in genetic association studies, as it has been shown that population stratification, if not appropriately considered, can lead to false-positive and false-negative results. While large association studies typically extract ancestry information from available genome-wide SNP genotypes, many important clinical data sets on rare phenotypes and historical collections assembled before the GWAS era are in need of a feasible method (i.e., ease of genotyping, small number of markers) to infer the geographic origin and potential admixture of the study subjects. Here we report on the development, application and limitations of a small, multiplexable ancestry informative marker (AIM) panel of SNPs (or AISNP) developed specifically for this purpose. RESULTS: Based on worldwide populations from the HGDP, a 41-AIM AISNP panel for multiplex application with the ABI SNPlex and a subset with 31 AIMs for the Sequenom iPLEX system were selected and found to be highly informative for inferring ancestry among the seven continental regions Africa, the Middle East, Europe, Central/South Asia, East Asia, the Americas and Oceania. The panel was found to be least informative for Eurasian populations, and additional AIMs for a higher resolution are suggested. A large reference set including over 4,000 subjects collected from 120 global populations was assembled to facilitate accurate ancestry determination. We show practical applications of this AIM panel, discuss its limitations for admixed individuals and suggest ways to incorporate ancestry information into genetic association studies. CONCLUSION: We demonstrated the utility of a small AISNP panel specifically developed to discern global ancestry. We believe that it will find wide application because of its feasibility and potential for a wide range of applications.
ABSTRACT
BACKGROUND: ADH1B is one of the most studied human genes with many polymorphic sites. One of its single nucleotide polymorphisms (SNPs), rs1229984, which codes for the Arg48His substitution, has been associated with many serious diseases including alcoholism and cancers of the digestive system. The derived allele, ADH1B*48His, reaches high frequency only in East Asia and Southwest Asia, and is highly associated with agriculture. Micro-evolutionary study has defined seven haplogroups for ADH1B based on seven SNPs encompassing the gene. Three of those haplogroups, H5, H6, and H7, contain the ADH1B*48His allele. H5 occurs in Southwest Asia and the other two are found in East Asia. H7 is derived from H6 by the derived allele of rs3811801. The H7 haplotype has been shown to have undergone significant positive selection in Han Chinese, Hmong, Koreans, Japanese, Kazakhs, Mongols, and other populations. METHODS: In the present study, we tested whether Tibetans also showed evidence for selection by typing 23 SNPs in the region covering the ADH1B gene in 1,175 individuals from 12 Tibetan populations representing all districts of the Tibet Autonomous Region. Multiple statistics were estimated to examine the gene diversities and positive selection signals among the Tibetans and other populations in East Asia. RESULTS: The larger Tibetan populations (Qamdo, Lhasa, Nagqu, Nyingchi, Shannan, and Shigatse), comprising mostly farmers, have around 12% of H7 and 2% of H6. The smaller populations, which live by hunting or have only recently switched to farming, have lower H7 frequencies (Tingri 9%, Gongbo 8%, Monba and Sherpa 6%). Luoba (2%) and Deng (0%) have even lower frequencies. Long-range haplotype analyses revealed very weak signals of positive selection for H7 among Tibetans. Interestingly, the haplotype diversity of H7 is higher in Tibetans than in any other population studied, indicating a longer diversification history for that haplogroup in Tibetans. Network analysis on the long-range haplotypes revealed that H7 in the Han Chinese did not come from the Tibetans but from a common ancestor of the two populations. CONCLUSIONS: We argue that H7 of ADH1B originated in the ancestors of Sino-Tibetan populations and flowed to Tibetans very early. However, Tibetans depend less on crops and therefore were not significantly affected by selection. Thus, H7 has not risen to a high frequency, whereas the diversity of the haplogroup has accumulated to a very high level.
ABSTRACT
Studies of the genomic structure of the Greek population and Southeastern Europe are limited, despite the central position of the area as a gateway for human migrations into Europe. HapMap has provided a unique tool for the analysis of human genetic variation. Europe is represented by the CEU (Northwestern Europe) and the TSI populations (Tuscan Italians from Southern Europe), which serve as reference for the design of genetic association studies. Furthermore, genetic association findings are often transferred to unstudied populations. Although initial studies suggest that the CEU can, in general, be used as reference for the selection of tagging SNPs in European populations, this has not been extensively studied across Europe. We set out to explore the genomic structure of the Greek population (56 individuals) and compare it to the HapMap TSI and CEU populations. We studied 1112 SNPs (27 regions, 13 chromosomes). Although the HapMap European populations are, in general, a good reference for the Greek population, regions of population differentiation do exist and results should not be generalized uncritically. We conclude that, perhaps due to the individual evolutionary history of each genomic region, geographic proximity is not always a perfect guide for selecting a reference population for an unstudied population.
Subject(s)
Genomics; HapMap Project; White People/genetics; Alleles; Ethnicity/genetics; Gene Frequency; Genome-Wide Association Study; Greece/ethnology; Humans; Polymorphism, Single Nucleotide
ABSTRACT
We propose that haplotyped loci with high heterozygosity can be useful in human identification, especially within families, if recombination is very low among the sites. Three or more SNPs extending over small molecular intervals (<10 kb) can be identified in the human genome to define miniature haplotypes with moderate levels of linkage disequilibrium. Properly selected, these mini-haplotypes (or minihaps) consist of multiple haplotype lineages (alleles) that have evolved from the ancestral human haplotype but show no evidence of recurring recombination, allowing each distinct haplotype to be equated with an allele, all copies of which are essentially identical by descent. Historic recombinants, representing rare events that have drifted to common frequencies over many generations, can be identified in some cases, but they do not equate to frequently recurring recombination. We have identified examples in our data collected on various projects and present eight such mini-haplotypes comprised of informative SNPs. We also discuss the ideal characteristics and advantages of minihaps for human familial identification and ancestry inference, and compare them to other types of forensic markers that are in use and/or have been proposed. We expect that it is possible to carry out a systematic search and identify a useful panel of mini-haplotypes, with even better properties than the examples presented here.
Subject(s)
Haplotypes; Polymorphism, Single Nucleotide; Alleles; Genome, Human; Heterozygote; Humans; Linkage Disequilibrium; Population Groups/genetics; Recombination, Genetic
ABSTRACT
The potential value of SNPs for individual identification has been recognized by many researchers and different panels have been proposed. Here we present a new interface in the ALFRED database to access compendia of allele frequencies for several published panels of markers for forensic uses. One of those is our panel of individual identification SNPs (IISNPs) based on samples of 44 populations originating from many parts of the world. Here we also present additional data and additional statistical analyses that continue to support the value of our panel of IISNPs as a universal panel. We also describe initial developments of multiplex methods and various robustness analyses for our 45 marker IISNP panel.
Subject(s)
Forensic Anthropology; Polymorphism, Single Nucleotide; Forensic Genetics; Gene Frequency; Humans
ABSTRACT
Risk alleles for complex diseases are widely spread throughout human populations. However, little is known about the geographic distribution and frequencies of risk alleles, which may contribute to differences in disease susceptibility and prevalence among populations. Here, we focus on Crohn's disease (CD) as a model for the evolutionary study of complex disease alleles. Recent genome-wide association studies and classical linkage analyses have identified more than 70 susceptible genomic regions for CD in Europeans, but only a few have been confirmed in non-European populations. Our analysis of eight European-specific susceptibility genes using HapMap data shows that at the NOD2 locus the CD-risk alleles are linked with a haplotype specific to CEU at a frequency that is significantly higher compared with the entire genome. We subsequently examined nine global populations and found that the CD-risk alleles spread through hitchhiking with a high-frequency haplotype (H1) exclusive to Europeans. To examine the neutrality of NOD2, we performed phylogenetic network analyses, coalescent simulation, protein structural prediction, characterization of mutation patterns, and estimations of population growth and time to most recent common ancestor (TMRCA). We found that while H1 was significantly prevalent in European populations, the H1 TMRCA predated human migration out of Africa. H1 is likely to have undergone negative selection because 1) the root of H1 genealogy is defined by a preexisting amino acid substitution that causes serious conformational changes to the NOD2 protein, 2) the haplotype has almost become extinct in Africa, and 3) the haplotype has not been affected by the recent European expansion reflected in the other haplotypes. Nevertheless, H1 has survived in European populations, suggesting that the haplotype is advantageous to this group. 
We propose that several CD-risk alleles, which destabilize and disrupt the NOD2 protein, have been maintained by natural selection on standing variation because the deleterious haplotype of NOD2 is advantageous in diploid individuals due to heterozygote advantage and/or intergenic interactions.