ABSTRACT
Estimation of admixture proportions has become one of the most commonly used computational tools in population genomics. However, there is remarkably little population genetic theory on statistical properties of these variables. We develop theoretical results that can accurately predict means and variances of admixture proportions within a population using models with recombination and genetic drift. Based on established theory on measures of multilocus disequilibrium, we show that there is a set of recurrence relations that can be used to derive expectations for higher moments of the admixture proportions distribution. We obtain closed form solutions for some special cases. Using these results, we develop a method for estimating admixture parameters from estimated admixture proportions obtained from programs such as Structure or Admixture. We apply this method to HapMap 3 data and find that the population history of African Americans, as expected, is not best explained by a single admixture event between people of European and African ancestry. The model of constant gene flow starting at 8 generations and ending at 2 generations before present gives the best fit.
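The idea of predicting the mean and variance of admixture proportions can be illustrated with a much simpler model than the one in the abstract. The sketch below assumes a single admixture pulse, random mating, and that each child's ancestry fraction is the plain average of its two parents' fractions; it ignores recombination and finite-genome effects, so it is not the paper's recurrence. Under these assumptions the mean stays at the founding proportion `m` and the variance roughly halves each generation. The function names and parameters are hypothetical.

```python
import random
from statistics import fmean, pvariance


def simulate_single_pulse(m, generations, pop_size, seed=0):
    """Simulate individual ancestry fractions after one admixture pulse.

    Assumption: a child's fraction is the average of two randomly chosen
    parents (no recombination model, no selection).
    """
    rng = random.Random(seed)
    # Founders: fraction 1.0 (source A) with probability m, otherwise 0.0.
    pop = [1.0 if rng.random() < m else 0.0 for _ in range(pop_size)]
    for _ in range(generations):
        pop = [(rng.choice(pop) + rng.choice(pop)) / 2 for _ in range(pop_size)]
    return pop


def expected_variance(m, generations):
    """Variance of ancestry fractions under the parental-averaging assumption."""
    return m * (1 - m) / 2 ** generations


if __name__ == "__main__":
    fractions = simulate_single_pulse(m=0.2, generations=4, pop_size=20000)
    print("simulated mean:", fmean(fractions))
    print("simulated variance:", pvariance(fractions))
    print("predicted variance:", expected_variance(0.2, 4))
```

Comparing the simulated and predicted variance for several values of `generations` shows why the spread of individual admixture proportions carries information about the timing of admixture.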
Subject(s)
Gene Flow , Genetics, Population , Linkage Disequilibrium , Mathematical Concepts , Models, Genetic , White , Humans , Black or African American/genetics , Genetic Drift , Genetics, Population/statistics & numerical data , Recombination, Genetic , White/genetics
ABSTRACT
Sex-biased gene expression differs across human populations; however, the underlying genetic basis and molecular mechanisms remain largely unknown. Here, we explore the influence of ancestry on sex differences in the human transcriptome and its genetic effects on a Eurasian admixed population: Uyghurs living in Xinjiang (XJU), by analyzing whole-genome sequencing data and transcriptome data of 90 XJU and 40 unrelated Han Chinese individuals. We identified 302 sex-biased expressed genes and 174 sex-biased cis-expression quantitative trait loci (sb-cis-eQTLs) in XJU, which were enriched in innate immune-related functions, indicating sex differences in immunity. Notably, approximately one-quarter of the sb-cis-eQTLs showed a strong correlation with ancestry composition; i.e., populations of similar ancestry tended to show similar patterns of sex-biased gene expression. Our analysis further suggested that genetic admixture induced a moderate degree of sex-biased gene expression. Interestingly, analysis of chromosome interactions revealed that the X chromosome acted on autosomal immunity-associated genes, partially explaining the sex-biased phenotypic differences. Our work extends the knowledge of sex-biased gene expression from the perspective of genetic admixture and bridges the gap in the exploration of sex-biased phenotypes shaped by autosome and X-chromosome interactions. Notably, we demonstrated that sex chromosomes cannot fully explain sex differentiation in immune-related phenotypes.
Subject(s)
Central Asian People , East Asian People , Quantitative Trait Loci , Female , Humans , Male , China , Chromosomes, Human, X/genetics , Gene Expression Profiling/methods , Gene Expression Regulation , Genetics, Population , Sex Characteristics , Transcriptome , East Asian People/genetics , Central Asian People/genetics
ABSTRACT
Human populations have interacted throughout history, and a considerable portion of modern human populations show evidence of admixture. Local ancestry inference (LAI) is focused on detecting the genetic ancestry of chromosomal segments in admixed individuals and has wide applications. In this work, we proposed a new LAI method based on population-specific single-nucleotide polymorphisms (SNPs) and applied it in the analysis of admixed populations in the 1000 Genomes Project (1KGP). Based on population-specific SNPs in a sliding window, we computed local ancestry information vectors, which are moment estimators of local ancestral proportions, for two haplotypes of an admixed individual and inferred the local ancestral origins. Then we used African (AFR), East Asian (EAS), European (EUR) and South Asian (SAS) populations from the 1KGP and indigenous American (AMR) populations from the Human Genome Diversity Project (HGDP) as reference populations and conducted the proposed LAI analysis on African American populations and American populations in the 1KGP. The results were compared with those obtained by RFMix, G-Nomix and FLARE. We demonstrated that the existence of alleles in a chromosomal region that are specific to a particular reference population and the absence of alleles specific to the other reference populations provide reasonable evidence for determining the ancestral origin of the region. Contemporary AFR, AMR and EUR populations approximate ancestral populations of the admixed populations well, and the results from RFMix, G-Nomix and FLARE largely agree with those from the Ancestral Spectrum Analyzer (ASA), in which the proposed method was implemented. When admixtures are ancient and contemporary reference populations do not satisfactorily approximate ancestral populations, the performances of RFMix, G-Nomix and FLARE deteriorate with increased error rates and fragmented chromosomal segments. In contrast, our method provides fair results.
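The sliding-window idea described above can be sketched as follows. This is a minimal illustration, not the estimator implemented in ASA: it assumes the "local ancestry information vector" can be approximated by the share of carried population-specific alleles per reference population in each window, and it calls a window only when one population dominates. The input layout, function names, and the `min_share` threshold are all hypothetical.

```python
from collections import Counter


def window_ancestry_vectors(snps, window_size, step):
    """Summarize population-specific alleles carried by one haplotype.

    snps: list of (position, population, carried) tuples, where `population`
    is the reference population the allele is specific to and `carried`
    says whether this haplotype has that allele.
    Returns a list of (start, end, {population: share}) per window.
    """
    if not snps:
        return []
    last = max(pos for pos, _, _ in snps)
    results = []
    start = 0
    while start <= last:
        end = start + window_size
        counts = Counter(
            pop for pos, pop, carried in snps if start <= pos < end and carried
        )
        total = sum(counts.values())
        vector = {pop: n / total for pop, n in counts.items()} if total else {}
        results.append((start, end, vector))
        start += step
    return results


def call_ancestry(vector, min_share=0.8):
    """Return the dominant population, or None if the window is ambiguous."""
    if not vector:
        return None
    pop, share = max(vector.items(), key=lambda item: item[1])
    return pop if share >= min_share else None


if __name__ == "__main__":
    demo = [
        (100, "AFR", True), (200, "AFR", True), (300, "EUR", False),
        (1200, "EUR", True), (1300, "EUR", True), (1400, "AFR", True),
    ]
    for start, end, vector in window_ancestry_vectors(demo, 1000, 1000):
        print(start, end, vector, call_ancestry(vector))
```

A window that contains alleles specific to one reference population and none specific to the others yields a share of 1.0 for that population, which mirrors the evidence rule stated in the abstract.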
Subject(s)
Genetics, Population , Genome, Human , Polymorphism, Single Nucleotide , Humans , Asian People/genetics , Black People/genetics , Genetics, Population/methods , Haplotypes , Human Genome Project , White People/genetics , Indigenous Peoples/genetics
ABSTRACT
Admixture is the evolutionary mechanism that most rapidly changes the genetic composition of a population. In animal breeding, genetic admixture has provided short-term advantages by exploiting complementarity and heterosis in several traits, and long-term advantages by increasing genetic diversity. Pedigree-based admixture analysis has largely been replaced by genome-wide marker data, which enable more precise estimates. Among these markers, SNPs are the most popular choice because they are cost-effective, less laborious, and easy to genotype automatically. Ancestry-informative markers (AIMs) are markers that can indicate the likely population of origin of a DNA sample when the source individual is unknown or unwilling to disclose their lineage. Admixture resolved at the level of individual loci is termed local ancestry; it can be exploited to identify signs of recent selective response and can account for genetic drift. Given the importance of genetic admixture and local ancestry, this mini-review illustrates both concepts, covering the basics, estimation and identification methods, the tools and software used, and their applications.
ABSTRACT
Variants in the UGT1A1 gene are strongly associated with circulating bilirubin levels in several populations, as are other variants of modest effect across the genome. However, the effects of such variants are unknown regarding the Native American ancestry of the admixed Latino population. Our objective was to assess the Native American genetic determinants of serum bilirubin in Chilean admixed adolescents using the local ancestry deconvolution approach. We measured total serum bilirubin levels in 707 adolescents of the Chilean Growth and Obesity Cohort Study (GOCS) and performed high-density genotyping using the Illumina-MEGA array (>1.7 million genotypes). We constructed a local ancestry reference panel with participants from the 1000 Genomes Project, the Human Genome Diversity Project, and our GOCS cohort. Then, we inferred and isolated haplotype tracts of Native American, European, or African origin to perform genome-wide association studies. In the whole cohort, the rs887829 variant and others near UGT1A1 were the only signals achieving genome-wide statistical significance (b = 0.30; p = 3.34 × 10⁻⁵⁷). After applying deconvolution methods, we found that significance is also maintained in Native American (b = 0.35; p = 3.29 × 10⁻¹⁷) and European (b = 0.28; p = 1.14 × 10⁻²³) ancestry components. The rs887829 variant explained a higher percentage of the variance of bilirubin in the Native American (37.6%) compared to European ancestry (28.4%). In Native American ancestry, carriers of the TT genotype of this variant averaged 4-fold higher bilirubinemia compared to the CC genotype (p = 2.82 × 10⁻¹²). We showed for the first time that UGT1A1 variants are the primary determinant of bilirubin levels in Native American ancestry, confirming its pan-ethnic relevance. Our study illustrates the general value of the local ancestry deconvolution approach to assessing isolated ancestry effects in admixed populations.
ABSTRACT
Admixture is a common biological phenomenon among populations of the same or different species. Identifying admixed tracts within individual genomes can provide valuable information to date admixture events, reconstruct ancestry-specific demographic histories, or detect adaptive introgression, genetic incompatibilities, as well as regions of the genomes affected by (associative-) overdominance. Although many local ancestry inference (LAI) methods have been developed in the last decade, their performance was assessed using large reference panels, which are rarely available for non-model organisms or ancient samples. Moreover, the demographic conditions for which LAI becomes unreliable have not been explicitly outlined. Here, we identify the demographic conditions for which local ancestries can be best estimated using very small reference panels. Furthermore, we compare the performance of two LAI methods (RFMix and MOSAIC) with the performance of a newly developed approach (simpLAI) that can be used even when reference populations consist of single individuals. Based on simulations of various demographic models, we also determine the limits of these LAI tools and propose post-painting filtering steps to reduce false-positive rates and improve the precision and accuracy of the inferred admixed tracts. Besides providing a guide for using LAI, our work shows that reasonable inferences can be obtained from a single diploid genome per reference under demographic conditions that are not uncommon among past human groups and non-model organisms.
Subject(s)
Genetics, Population , Genetics, Population/methods , Humans , Computational Biology/methods
ABSTRACT
Introduction: The development of reproducible tools for the rapid genotyping of thousands of genetic markers (SNPs) has promoted cross-border collaboration in the study of sheep genetic diversity on a global scale. Methods: In this study, we collected a comprehensive dataset of 239 African and Eurasian sheep breeds genotyped at 37,638 filtered SNP markers, with the aim of understanding the genetic structure of 22 North African (NA) sheep breeds within a global context. Results and discussion: We revealed a substantial enrichment of the gene pool between the north and south shores of the Mediterranean Sea, which corroborates the importance of the maritime route in the history of livestock. The genetic structure of North African breeds mirrors the differential composition of genetic backgrounds following the breed history. Indeed, Maghrebin sheep stocks constitute a geographically and historically coherent unit without any breed-level genetic distinctness among them, owing to considerable gene flow. We detected a broad east-west pattern describing the most important trend in NA fat-tailed populations, exhibited by the genetic closeness of Egyptian and Libyan fat-tailed sheep to Middle Eastern breeds rather than Maghrebin ones. A Bayesian FST scan analysis revealed a set of genes with potentially key adaptive roles in lipid metabolism (BMP2, PDGFD, VEGFA, TBX15, and WARS2), coat pigmentation (SOX10, PICK1, PDGFRA, MC1R, and MTIF) and horn morphology (RXFP2) in Tunisian sheep. The local ancestry method detected a Merino signature in Tunisian Noire de Thibar sheep near the SULF1 gene, introgressed from European Merino breeds. This study will contribute to the general picture of worldwide sheep genetic diversity.
ABSTRACT
By conducting hierarchical clustering along a sliding window, we generated haplotypes across hundreds of re-sequenced genomes in a few hours. We leveraged our method to define cryptic introgressions underlying disease resistance in tomato (Solanum lycopersicum L.) and to discover resistant germplasm in the tomato seed bank. The genomes of 9 accessions with early blight (Alternaria linariae) disease resistance were newly sequenced and analyzed together with published sequences for 770 tomato and wild species accessions, most of which are available in germplasm collections. Identification of common ancestral haplotypes among resistant germplasm enabled rapid fine mapping of recently discovered quantitative trait loci (QTL) conferring resistance and the identification of possible causal variants. The source of the early blight QTL EB-9 was traced to a vintage tomato named 'Devon Surprise'. Another QTL, EB-5, as well as resistance to bacterial spot disease (Xanthomonas spp.), was traced to Hawaii 7998. A genomic survey of all accessions forecasted EB-9-derived resistance in several heirloom tomatoes, accessions of S. lycopersicum var. cerasiforme, and S. pimpinellifolium PI 37009. Our haplotype-based predictions were validated by screening the accessions against the causal pathogen. There was little evidence of EB-5 prevalence in surveyed contemporary germplasm, presenting an opportunity to bolster tomato disease resistance by adding this rare locus. Our work demonstrates practical insights that can be derived from the efficient processing of large genome-scale datasets, including rapid functional prediction of disease resistance QTL in diverse genetic backgrounds. Finally, our work finds more efficient ways to leverage public genetic resources for crop improvement.
Subject(s)
Solanum lycopersicum , Solanum lycopersicum/genetics , Quantitative Trait Loci/genetics , Disease Resistance/genetics , Phenotype , Genomics , Plant Diseases/genetics , Plant Diseases/microbiology
ABSTRACT
The European wildcat population in Scotland is considered critically endangered as a result of hybridization with introduced domestic cats [1,2], though the time frame over which this gene flow has taken place is unknown. Here, using genome data from modern, museum, and ancient samples, we reconstructed the trajectory and dated the decline of the local wildcat population from viable to severely hybridized. We demonstrate that although domestic cats have been present in Britain for over 2,000 years [3], the onset of hybridization was only within the last 70 years. Our analyses reveal that the domestic ancestry present in modern wildcats is markedly over-represented in many parts of the genome, including the major histocompatibility complex (MHC). We hypothesize that introgression provides wildcats with protection against diseases harbored and introduced by domestic cats, and that this selection contributes to maladaptive genetic swamping through linkage drag. Using the case of the Scottish wildcat, we demonstrate the importance of local ancestry estimates to both understand the impacts of hybridization in wild populations and support conservation efforts to mitigate the consequences of anthropogenic and environmental change.
Subject(s)
Gene Flow , Hybridization, Genetic , Animals , Cats , Scotland
ABSTRACT
The Sunong black pig is a new composite breed under development, generated from Chinese indigenous pig breeds (i.e., Taihu and Huai) and intensive pig breeds (i.e., Landrace and Berkshire), and is an important genetic resource for studying breeding mechanisms. However, there is currently limited knowledge about the genetic structure and germplasm characteristics of Sunong black pigs. To comprehensively understand their genetic composition and ancestry proportions, we performed population structure and local ancestry inference analysis based on whole-genome sequencing information. The results showed that Sunong black pigs could be clustered independently into a group, whose pedigree was intermediate between indigenous and commercial pig breeds, but closer to commercial pigs. Furthermore, local ancestry inference analysis revealed that Sunong black pigs inherited immune and reproductive traits from indigenous pig breeds, including the CC and CXC chemokine families, the Toll-like receptor family, the IFN gene family, and the ESR1, AREG and EREG genes, while growth and development-related traits were inherited from commercial pig breeds, including the IGF1 and GSY2 genes. Overall, Sunong black pigs have formed a relatively stable genome structure with some advantageous traits inherited from their ancestral breeds. This study deepened the understanding of the breeding mechanism of Sunong black pigs and provided a reference for cross-breeding programmes in livestock.
Subject(s)
Polymorphism, Single Nucleotide , Sus scrofa , Swine/genetics , Animals , Sus scrofa/genetics , Pedigree , Genome , Sequence Analysis, DNA/veterinary , Genetic Variation
ABSTRACT
The heritability explained by local ancestry markers in an admixed population (h_γ²) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of h_γ² can be susceptible to biases due to population structure in ancestral populations. Here, we present heritability estimation from admixture mapping summary statistics (HAMSTA), an approach that uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA h_γ² estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ~5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe ĥ_γ² in the 20 phenotypes range from 0.0025 to 0.033 (mean ĥ_γ² = 0.012 ± 9.2 × 10⁻⁴), which translates to ĥ² ranging from 0.062 to 0.85 (mean ĥ² = 0.30 ± 0.023). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.
Subject(s)
Black or African American , Genetics, Population , Humans , Chromosome Mapping , Phenotype , Polymorphism, Single Nucleotide/genetics
ABSTRACT
The majority of native cattle are taurine × indicine cattle of diverse phenotypes in the central region of China. Sanjiang cattle, a typical breed in the central region, play a central role in human livelihood and have good adaptability, including resistance to dampness, heat, roughage, and disease, and are thus regarded as an important genetic resource. However, the genetic history of the successful breed remains unknown. Here, we sequenced 10 Sanjiang cattle genomes and compared them to the 70 genomes of 5 representative populations worldwide. We characterized the genomic diversity and breed formation process of Sanjiang cattle and found that Sanjiang cattle have a mixed ancestry of indicine (55.6%) and taurine (33.2%) dating to approximately 30 generations ago, which has shaped the genome of Sanjiang cattle. Through ancestral fragment inference, selective sweep and transcriptomic analysis, we identified several genes linked to lipid metabolism, immune regulation, and stress reactions across the mosaic genome of Sanjiang cattle showing an excess of taurine or indicine ancestry. Taurine ancestry might contribute to meat quality, and indicine ancestry is more conducive to adaptation to hot climate conditions, making Sanjiang cattle a valuable genetic resource for the central region of China. Our results will help us understand the evolutionary history and ancestry components of Sanjiang cattle, which will provide a reference for resource conservation and selective breeding of Chinese native cattle.
ABSTRACT
Demographic models of Latin American populations often fail to fully capture their complex evolutionary history, which has been shaped by both recent admixture and deeper-in-time demographic events. To address this gap, we used high-coverage whole-genome data from Indigenous American ancestries in present-day Mexico and existing genomes from across Latin America to infer multiple demographic models that capture the impact of different timescales on genetic diversity. Our approach, which combines analyses of allele frequencies and ancestry tract length distributions, represents a significant improvement over current models in predicting patterns of genetic variation in admixed Latin American populations. We jointly modeled the contribution of European, African, East Asian, and Indigenous American ancestries into present-day Latin American populations. We infer that the ancestors of Indigenous Americans and East Asians diverged ~30 thousand years ago, and we characterize genetic contributions of recent migrations from East and Southeast Asia to Peru and Mexico. Our inferred demographic histories are consistent across different genomic regions and annotations, suggesting that our inferences are robust to the potential effects of linked selection. In conjunction with published distributions of fitness effects for new nonsynonymous mutations in humans, we show in large-scale simulations that our models recover important features of both neutral and deleterious variation. By providing a more realistic framework for understanding the evolutionary history of Latin American populations, our models can help address the historical under-representation of admixed groups in genomics research and can be a valuable resource for future studies of populations with complex admixture and demographic histories.
Subject(s)
Genetics, Population , Genome, Human , Humans , Latin America , Genome, Human/genetics , Demography , White
ABSTRACT
Secondary contact zones between deeply divergent, yet interfertile, lineages provide windows into the speciation process. North American grey foxes (Urocyon cinereoargenteus) are divided into western and eastern lineages that diverged approximately 1 million years ago. These ancient lineages currently hybridize in a relatively narrow zone of contact in the southern Great Plains, a pattern more commonly observed in smaller-bodied taxa, which suggests relatively recent contact after a long period of allopatry. Based on local ancestry inference with whole-genome sequencing (n = 43), we identified two distinct Holocene pulses of admixture. The older pulse (500-3500 YBP) reflected unidirectional gene flow from east to west, whereas the more recent pulse (70-200 YBP) of admixture was bi-directional. Augmented with genotyping-by-sequencing data from 216 additional foxes, demographic analyses indicated that the eastern lineage declined precipitously after divergence, remaining small throughout most of the late Pleistocene, and expanding only during the Holocene. Genetic diversity in the eastern lineage was highest in the southeast and lowest near the contact zone, consistent with a westward expansion. Concordantly, distribution modelling indicated that during their isolation, the most suitable habitat occurred far east of today's contact zone or west of the Great Plains. Thus, long-term isolation was likely caused by the small, distant location of the eastern refugium, with recent contact reflecting a large increase in suitable habitat and corresponding demographic expansion from the eastern refugium. Ultimately, long-term isolation in grey foxes may reflect their specialized bio-climatic niche. This system presents an opportunity for future investigation of potential pre- and post-zygotic isolating mechanisms.
Subject(s)
Foxes , Genetic Variation , Animals , Foxes/genetics , Gene Flow , Phylogeny , DNA, Mitochondrial/genetics , Demography
ABSTRACT
BACKGROUND: Metabolic pathways are related to physiological functions and disease states and are influenced by genetic variation and environmental factors. Hispanic/Latino individuals have ancestry-derived genomic regions (local ancestry) from their recent admixture that have been less characterized for associations with metabolite abundance and disease risk. METHODS: We performed admixture mapping of 640 circulating metabolites in 3887 Hispanic/Latino individuals from the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). Metabolites were quantified in fasting serum through non-targeted mass spectrometry (MS) analysis using ultra-performance liquid chromatography-MS/MS. Replication was performed in 1856 nonoverlapping HCHS/SOL participants with metabolomic data. RESULTS: By leveraging local ancestry, this study identified significant ancestry-enriched associations for 78 circulating metabolites at 484 independent regions, including 116 novel metabolite-genomic region associations that replicated in an independent sample. Among the main findings, we identified Native American enriched genomic regions at chromosomes 11 and 15, mapping to FADS1/FADS2 and LIPC, respectively, associated with reduced long-chain polyunsaturated fatty acid metabolites implicated in metabolic and inflammatory pathways. An African-derived genomic region at chromosome 2 was associated with N-acetylated amino acid metabolites. This region, mapped to ALMS1, is associated with chronic kidney disease, a disease that disproportionately burdens individuals of African descent. CONCLUSIONS: Our findings provide important insights into differences in metabolite quantities related to ancestry in admixed populations including metabolites related to regulation of lipid polyunsaturated fatty acids and N-acetylated amino acids, which may have implications for common diseases in populations.
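At its simplest, admixture mapping tests whether a trait varies with the number of copies of a given ancestry at a locus. The sketch below is a deliberately reduced illustration, not the HCHS/SOL analysis: it fits an ordinary least-squares regression of a metabolite level on local ancestry dosage (0, 1, or 2 copies) at one locus, and it omits the global ancestry, covariate, and relatedness adjustments a real study would need. The function name and inputs are hypothetical.

```python
from math import sqrt


def admixture_mapping_test(dosage, phenotype):
    """Regress a quantitative trait on local ancestry dosage at one locus.

    dosage: copies (0, 1, 2) of the ancestry of interest per individual.
    phenotype: trait value per individual, in the same order.
    Returns (slope, standard_error, t_statistic).
    """
    n = len(dosage)
    if n != len(phenotype) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosage) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    if sxx == 0:
        raise ValueError("local ancestry does not vary at this locus")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, phenotype))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares gives the error variance of the fit.
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(dosage, phenotype))
    se = sqrt(rss / (n - 2) / sxx)
    t_stat = slope / se if se > 0 else float("inf")
    return slope, se, t_stat


if __name__ == "__main__":
    toy_dosage = [0, 0, 1, 1, 2, 2]
    toy_metabolite = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]
    print(admixture_mapping_test(toy_dosage, toy_metabolite))
```

Repeating such a test along the genome and correcting for the number of effectively independent ancestry blocks is what turns a per-locus regression into a genome-wide admixture mapping scan.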
Subject(s)
Genome-Wide Association Study , Hispanic or Latino , Tandem Mass Spectrometry , Humans , Black People/genetics , Genome, Human , Genome-Wide Association Study/methods , Hispanic or Latino/genetics , Polymorphism, Single Nucleotide , American Indian or Alaska Native/genetics , Metabolism/genetics , Population Groups/ethnology , Population Groups/genetics
ABSTRACT
The wild boar (Sus scrofa meridionalis) arrived in Sardinia with the first human settlers in the early Neolithic with the potential to hybridize with the domestic pig (S. s. domesticus) throughout its evolution on the island. In this paper, we investigated the possible microevolutionary effects of such introgressive hybridization on the present wild boar population, comparing Sardinian wild specimens with several commercial pig breeds and Sardinian local pigs, along with a putatively unadmixed wild boar population from Central Italy, all genotyped with a medium density SNP chip. We first aimed at identifying hybrids in the population using different approaches, then examined genomic regions enriched for domestic alleles in the hybrid group, and finally we applied two methods to find regions under positive selection to possibly highlight instances of domestic adaptive introgression into a wild population. We found three hybrids within the Sardinian sample (3.1% out of the whole dataset). We reported 11 significant windows under positive selection with a method that looks for overly differentiated loci in the target population, compared with other two populations. We also identified 82 genomic regions with signs of selection in the domestic pig but not in the wild boar, two of which overlapped with genomic regions enriched for domestic alleles in the hybrid pool. Genes in these regions can be linked with reproductive success. Given our results, domestic introgression does not seem to be pervasive in the Sardinian wild boar. Nevertheless, we suggest monitoring the possible spread of advantageous domestic alleles in the coming years.
Subject(s)
Adaptation, Biological , Sus scrofa , Animals , Sus scrofa/genetics , Hybridization, Genetic , Genome , Selection, Genetic
ABSTRACT
Alzheimer disease (AD) is the most common form of senile dementia, with high incidence late in life in many populations including Caribbean Hispanic (CH) populations. Such admixed populations, descended from more than one ancestral population, can present challenges for genetic studies, including limited sample sizes and unique analytical constraints. Therefore, CH populations and other admixed populations have not been well represented in studies of AD, and much of the genetic variation contributing to AD risk in these populations remains unknown. Here, we conduct genome-wide analysis of AD in multiplex CH families from the Alzheimer Disease Sequencing Project (ADSP). We developed, validated, and applied an implementation of a logistic mixed model for admixture mapping with binary traits that leverages genetic ancestry to identify ancestry-of-origin loci contributing to AD. We identified three loci on chromosome 13q33.3 associated with reduced risk of AD, where associations were driven by Native American (NAM) ancestry. This AD admixture mapping signal spans the FAM155A, ABHD13, TNFSF13B, LIG4, and MYO16 genes and was supported by evidence for association in an independent sample from the Alzheimer's Genetics in Argentina-Alzheimer Argentina consortium (AGA-ALZAR) study with considerable NAM ancestry. We also provide evidence of NAM haplotypes and key variants within 13q33.3 that segregate with AD in the ADSP whole-genome sequencing data. Interestingly, the widely used genome-wide association study approach failed to identify associations in this region. Our findings underscore the potential of leveraging genetic ancestry diversity in recently admixed populations to improve genetic mapping, in this case for AD-relevant loci.
Subject(s)
Alzheimer Disease , Humans , Alzheimer Disease/genetics , Genome-Wide Association Study , Hispanic or Latino/genetics , Genetic Loci/genetics , Ethnicity
ABSTRACT
Genome-wide association studies (GWASs) have identified thousands of variants for disease risk. These studies have predominantly been conducted in individuals of European ancestries, which raises questions about their transferability to individuals of other ancestries. Of particular interest are admixed populations, usually defined as populations with recent ancestry from two or more continental sources. Admixed genomes contain segments of distinct ancestries that vary in composition across individuals in the population, allowing for the same allele to induce risk for disease on different ancestral backgrounds. This mosaicism raises unique challenges for GWASs in admixed populations, such as the need to correctly adjust for population stratification. In this work we quantify the impact of differences in estimated allelic effect sizes for risk variants between ancestry backgrounds on association statistics. Specifically, while the possibility of estimated allelic effect-size heterogeneity by ancestry (HetLanc) can be modeled when performing a GWAS in admixed populations, the extent of HetLanc needed to overcome the penalty from an additional degree of freedom in the association statistic has not been thoroughly quantified. Using extensive simulations of admixed genotypes and phenotypes, we find that controlling for and conditioning effect sizes on local ancestry can reduce statistical power by up to 72%. This finding is especially pronounced in the presence of allele frequency differentiation. We replicate simulation results using 4,327 African-European admixed genomes from the UK Biobank for 12 traits to find that for most significant SNPs, HetLanc is not large enough for GWASs to benefit from modeling heterogeneity in this way.
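The "penalty from an additional degree of freedom" can be made concrete with chi-square tail probabilities. The sketch below assumes the standard association test is a 1-degree-of-freedom chi-square test and the ancestry-aware test is a 2-degree-of-freedom joint test of two ancestry-specific effects; that framing is an assumption for illustration, and the function names are hypothetical. Both tail probabilities have closed forms, so only the standard library is needed.

```python
from math import erfc, exp, log, sqrt


def p_value_1df(stat):
    """Upper-tail probability of a chi-square statistic with 1 df."""
    return erfc(sqrt(stat / 2))


def p_value_2df(stat):
    """Upper-tail probability of a chi-square statistic with 2 df."""
    return exp(-stat / 2)


def threshold_2df(alpha):
    """Statistic needed to reach significance level alpha with 2 df."""
    return -2 * log(alpha)


def threshold_1df(alpha, lo=0.0, hi=200.0):
    """Statistic needed to reach significance level alpha with 1 df (bisection)."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if p_value_1df(mid) > alpha:
            lo = mid  # not yet significant: a larger statistic is needed
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    alpha = 5e-8  # conventional genome-wide significance level
    print("1-df threshold:", threshold_1df(alpha))
    print("2-df threshold:", threshold_2df(alpha))
    print("same statistic, two tests:", p_value_1df(30.0), p_value_2df(30.0))
```

The 2-df test demands a larger statistic to reach the same significance level, so modeling heterogeneity only pays off when the ancestry-specific effects differ enough to raise the statistic by more than that gap.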
Subject(s)
Genetics, Population , Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Gene Frequency/genetics , Genotype , Phenotype , Polymorphism, Single Nucleotide/genetics
ABSTRACT
Humans have had a major influence on the dissemination of crops beyond their native range, thereby offering new hybridization opportunities. Characterizing admixed genomes with mosaic origins generates valuable insight into the adaptive history of crops and the impact on current varietal diversity. We applied the ELAI tool, an efficient local ancestry inference method based on a two-layer hidden Markov model, to track segments of wild origin in cultivated accessions in the case of multiway admixtures. Source populations, which may actually be limited and partially admixed, must generally be specified when using such inference models. We thus developed a framework to identify local ancestry with admixed source populations. Using sequencing data for wild and cultivated Coffea canephora (commonly called Robusta), our approach was found to be highly efficient and accurate on simulated hybrids. Application of the method to assess elite Robusta varieties from Vietnam led to the identification of an accession derived from a likely backcross between two genetic groups from the Congo Basin and the western coastal region of Central Africa. Admixtures resulting from crop hybridization and diffusion could thus lead to the generation of elite high-yielding varieties. Our methods should be widely applicable to gain insight into the role of hybridization during plant and animal evolutionary history.
Subject(s)
Coffea , Coffee , Humans , Animals , Coffea/genetics , Chromosome Mapping , Genome, Plant , Software , Crops, Agricultural/genetics
ABSTRACT
The regulatory elements in proximal and distal regions of genes are involved in the regulation of gene expression. Risk alleles in intronic and intergenic regions may alter gene expression by modifying the binding affinity and stability of diverse DNA-binding proteins implicated in gene expression regulation. By focusing on the local ancestral structure of coding and regulatory regions using the paired whole-genome sequence and tissue-wide transcriptome datasets from the Genotype-Tissue Expression project, we investigated the impact of genetic variants, in aggregate, on tissue-specific gene expression regulation. Local ancestral origins of the coding region, immediate and distant upstream regions, and distal regulatory region were determined using RFMix with the reference panel from the 1000 Genomes Project. For each tissue, inter-individual variation of gene expression levels explained by concordant or discordant local ancestry between coding and regulatory regions was estimated. Compared to European, African descent showed more frequent change in local ancestral structure, with shorter haplotype blocks. The expression level of the Adenosine Deaminase Like (ADAL) gene was significantly associated with admixed ancestral structure in the regulatory region across multiple tissue types. Further validations are required to understand the impact of the local ancestral structure of regulatory regions on gene expression regulation in humans and other species.