1.
Cell ; 184(8): 2068-2083.e11, 2021 04 15.
Article in English | MEDLINE | ID: mdl-33861964

ABSTRACT

Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture disease burdens and environmental factors impacting specific sub-populations. Here, we propose a framework for repurposing data from electronic health records (EHRs) in concert with genomic data to explore the demographic ties that can impact disease burdens. Using data from a diverse biobank in New York City, we identified 17 communities sharing recent genetic ancestry. We observed 1,177 health outcomes that were statistically associated with a specific group and demonstrated significant differences in the segregation of genetic variants contributing to Mendelian diseases. We also demonstrated that fine-scale population structure can impact the prediction of complex disease risk within groups. This work reinforces the utility of linking genomic data to EHRs and provides a framework toward fine-scale monitoring of population health.


Subject(s)
Ethnicity/genetics , Population Health , Databases, Genetic , Electronic Health Records , Genomics , Humans , Self Report
2.
Cell ; 171(6): 1340-1353.e14, 2017 Nov 30.
Article in English | MEDLINE | ID: mdl-29195075

ABSTRACT

Approximately 15 genes have been directly associated with skin pigmentation variation in humans, leading to its characterization as a relatively simple trait. However, by assembling a global survey of quantitative skin pigmentation phenotypes, we demonstrate that pigmentation is more complex than previously assumed, with genetic architecture varying by latitude. We investigate polygenicity in the KhoeSan populations indigenous to southern Africa who have considerably lighter skin than equatorial Africans. We demonstrate that skin pigmentation is highly heritable, but known pigmentation loci explain only a small fraction of the variance. Rather, baseline skin pigmentation is a complex, polygenic trait in the KhoeSan. Despite this, we identify canonical and non-canonical skin pigmentation loci, including near SLC24A5, TYRP1, SMARCA2/VLDLR, and SNX13, using a genome-wide association approach complemented by targeted resequencing. By considering diverse, under-studied African populations, we show how the architecture of skin pigmentation can vary across humans subject to different local evolutionary pressures.


Subject(s)
Skin Pigmentation , Africa , Black People/genetics , Humans , Polymorphism, Single Nucleotide
3.
Nature ; 622(7984): 775-783, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821706

ABSTRACT

Latin America continues to be severely underrepresented in genomics research, and fine-scale genetic histories and complex trait architectures remain hidden owing to insufficient data [1]. To fill this gap, the Mexican Biobank project genotyped 6,057 individuals from 898 rural and urban localities across all 32 states in Mexico at a resolution of 1.8 million genome-wide markers with linked complex trait and disease information, creating a valuable nationwide genotype-phenotype database. Here, using ancestry deconvolution and inference of identity-by-descent segments, we inferred ancestral population sizes across Mesoamerican regions over time, unravelling Indigenous, colonial and postcolonial demographic dynamics [2-6]. We observed variation in runs of homozygosity among genomic regions with different ancestries reflecting distinct demographic histories and, in turn, different distributions of rare deleterious variants. We conducted genome-wide association studies (GWAS) for 22 complex traits and found that several traits are better predicted using the Mexican Biobank GWAS compared to the UK Biobank GWAS [7,8]. We identified genetic and environmental factors associating with trait variation, such as the length of the genome in runs of homozygosity as a predictor for body mass index, triglycerides, glucose and height. This study provides insights into the genetic histories of individuals in Mexico and dissects their complex trait architectures, both crucial for making precision and preventive medicine initiatives accessible worldwide.


Subject(s)
Biological Specimen Banks , Genetics, Medical , Genome, Human , Genomics , Hispanic or Latino , Humans , Blood Glucose/genetics , Blood Glucose/metabolism , Body Height/genetics , Body Mass Index , Gene-Environment Interaction , Genetic Markers/genetics , Genome-Wide Association Study , Hispanic or Latino/classification , Hispanic or Latino/genetics , Homozygote , Mexico , Phenotype , Triglycerides/blood , Triglycerides/genetics , United Kingdom , Genome, Human/genetics
4.
Nat Rev Genet ; 23(11): 665-679, 2022 11.
Article in English | MEDLINE | ID: mdl-35581355

ABSTRACT

Genome-wide association studies using large-scale genome and exome sequencing data have become increasingly valuable in identifying associations between genetic variants and disease, transforming basic research and translational medicine. However, this progress has not been equally shared across all people and conditions, in part due to limited resources. Leveraging publicly available sequencing data as external common controls, rather than sequencing new controls for every study, can better allocate resources by augmenting control sample sizes or providing controls where none existed. However, common control studies must be carefully planned and executed as even small differences in sample ascertainment and processing can result in substantial bias. Here, we discuss challenges and opportunities for the robust use of common controls in high-throughput sequencing studies, including study design, quality control and statistical approaches. Thoughtful generation and use of large and valuable genetic sequencing data sets will enable investigation of a broader and more representative set of conditions, environments and genetic ancestries than otherwise possible.


Subject(s)
Exome , Genome-Wide Association Study , Exome/genetics , Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Humans , Exome Sequencing
5.
Am J Hum Genet ; 111(1): 11-23, 2024 Jan 04.
Article in English | MEDLINE | ID: mdl-38181729

ABSTRACT

Precision medicine initiatives across the globe have led to a revolution of repositories linking large-scale genomic data with electronic health records, enabling genomic analyses across the entire phenome. Many of these initiatives focus solely on research insights, leading to limited direct benefit to patients. We describe the biobank at the Colorado Center for Personalized Medicine (CCPM Biobank) that was jointly developed by the University of Colorado Anschutz Medical Campus and UCHealth to serve as a unique, dual-purpose research and clinical resource accelerating personalized medicine. This living resource currently has more than 200,000 participants with ongoing recruitment. We highlight the clinical, laboratory, regulatory, and HIPAA-compliant informatics infrastructure along with our stakeholder engagement, consent, recontact, and participant engagement strategies. We characterize aspects of genetic and geographic diversity unique to the Rocky Mountain region, the primary catchment area for CCPM Biobank participants. We leverage linked health and demographic information of the CCPM Biobank participant population to demonstrate the utility of the CCPM Biobank to replicate complex trait associations in the first 33,674 genotyped individuals across multiple disease domains. Finally, we describe our current efforts toward return of clinical genetic test results, including high-impact pathogenic variants and pharmacogenetic information, and our broader goals as the CCPM Biobank continues to grow. Bringing clinical and research interests together fosters unique clinical and translational questions that can be addressed from the large EHR-linked CCPM Biobank resource within a HIPAA- and CLIA-certified environment.


Subject(s)
Learning Health System , Precision Medicine , Humans , Biological Specimen Banks , Colorado , Genomics
6.
Nature ; 597(7877): 522-526, 2021 09.
Article in English | MEDLINE | ID: mdl-34552258

ABSTRACT

Polynesia was settled in a series of extraordinary voyages across an ocean spanning one third of the Earth [1], but the sequences of islands settled remain unknown and their timings disputed. Currently, several centuries separate the dates suggested by different archaeological surveys [2-4]. Here, using genome-wide data from merely 430 modern individuals from 21 key Pacific island populations and novel ancestry-specific computational analyses, we unravel the detailed genetic history of this vast, dispersed island network. Our reconstruction of the branching Polynesian migration sequence reveals a serial founder expansion, characterized by directional loss of variants, that originated in Samoa and spread first through the Cook Islands (Rarotonga), then to the Society (Totaiete ma) Islands (11th century), the western Austral (Tuha'a Pae) Islands and Tuamotu Archipelago (12th century), and finally to the widely separated, but genetically connected, megalithic statue-building cultures of the Marquesas (Te Henua 'Enana) Islands in the north, Raivavae in the south, and Easter Island (Rapa Nui), the easternmost of the Polynesian islands, settled in approximately AD 1200 via Mangareva.


Subject(s)
Genome, Human/genetics , Genomics , Human Migration/history , Native Hawaiian or Other Pacific Islander/genetics , Female , History, Medieval , Humans , Male , Polynesia
7.
Am J Hum Genet ; 110(11): 1853-1862, 2023 11 02.
Article in English | MEDLINE | ID: mdl-37875120

ABSTRACT

The heritability explained by local ancestry markers in an admixed population ($h_\gamma^2$) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of $h_\gamma^2$ can be susceptible to biases due to population structure in ancestral populations. Here, we present heritability estimation from admixture mapping summary statistics (HAMSTA), an approach that uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA $h_\gamma^2$ estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ∼5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe $\hat{h}_\gamma^2$ in the 20 phenotypes range from 0.0025 to 0.033 (mean $\hat{h}_\gamma^2 = 0.012 \pm 9.2 \times 10^{-4}$), which translates to $\hat{h}^2$ ranging from 0.062 to 0.85 (mean $\hat{h}^2 = 0.30 \pm 0.023$). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.


Subject(s)
Black or African American , Genetics, Population , Humans , Chromosome Mapping , Phenotype , Polymorphism, Single Nucleotide/genetics
8.
Nature ; 583(7817): 572-577, 2020 07.
Article in English | MEDLINE | ID: mdl-32641827

ABSTRACT

The possibility of voyaging contact between prehistoric Polynesian and Native American populations has long intrigued researchers. Proponents have pointed to the existence of New World crops, such as the sweet potato and bottle gourd, in the Polynesian archaeological record, but nowhere else outside the pre-Columbian Americas [1-6], while critics have argued that these botanical dispersals need not have been human mediated [7]. The Norwegian explorer Thor Heyerdahl controversially suggested that prehistoric South American populations had an important role in the settlement of east Polynesia and particularly of Easter Island (Rapa Nui) [2]. Several limited molecular genetic studies have reached opposing conclusions, and the possibility continues to be as hotly contested today as it was when first suggested [8-12]. Here we analyse genome-wide variation in individuals from islands across Polynesia for signs of Native American admixture, analysing 807 individuals from 17 island populations and 15 Pacific coast Native American groups. We find conclusive evidence for prehistoric contact of Polynesian individuals with Native American individuals (around AD 1200) contemporaneous with the settlement of remote Oceania [13-15]. Our analyses suggest strongly that a single contact event occurred in eastern Polynesia, before the settlement of Rapa Nui, between Polynesian individuals and a Native American group most closely related to the indigenous inhabitants of present-day Colombia.


Subject(s)
Gene Flow/genetics , Genome, Human/genetics , Human Migration/history , Indians, Central American/genetics , Indians, South American/genetics , Islands , Native Hawaiian or Other Pacific Islander/genetics , Central America/ethnology , Colombia/ethnology , Europe/ethnology , Genetics, Population , History, Medieval , Humans , Polymorphism, Single Nucleotide/genetics , Polynesia , South America/ethnology , Time Factors
9.
Am J Hum Genet ; 109(4): 680-691, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35298919

ABSTRACT

Identification of rare-variant associations is crucial to full characterization of the genetic architecture of complex traits and diseases. Essential in this process is the evaluation of novel methods in simulated data that mirror the distribution of rare variants and haplotype structure in real data. Additionally, importing real-variant annotation enables in silico comparison of methods, such as rare-variant association tests and polygenic scoring methods, that focus on putative causal variants. Existing simulation methods are either unable to employ real-variant annotation or severely under- or overestimate the number of singletons and doubletons, thereby reducing the ability to generalize simulation results to real studies. We present RAREsim, a flexible and accurate rare-variant simulation algorithm. Using parameters and haplotypes derived from real sequencing data, RAREsim efficiently simulates the expected variant distribution and enables real-variant annotations. We highlight RAREsim's utility across various genetic regions, sample sizes, ancestries, and variant classes.


Subject(s)
Genetic Variation , Research Design , Computer Simulation , Genetic Variation/genetics , Haplotypes/genetics , Humans , Models, Genetic , Multifactorial Inheritance
10.
Am J Hum Genet ; 109(4): 669-679, 2022 04 07.
Article in English | MEDLINE | ID: mdl-35263625

ABSTRACT

One mechanism by which genetic factors influence complex traits and diseases is altering gene expression. Direct measurement of gene expression in relevant tissues is rarely tenable; however, genetically regulated gene expression (GReX) can be estimated using prediction models derived from large multi-omic datasets. These approaches have led to the discovery of many gene-trait associations, but whether models derived from predominantly European ancestry (EA) reference panels can map novel associations in ancestrally diverse populations remains unclear. We applied PrediXcan to impute GReX in 51,520 ancestrally diverse Population Architecture using Genomics and Epidemiology (PAGE) participants (35% African American, 45% Hispanic/Latino, 10% Asian, and 7% Hawaiian) across 25 key cardiometabolic traits and relevant tissues to identify 102 novel associations. We then compared associations in PAGE to those in a random subset of 50,000 White British participants from UK Biobank (UKBB50k) for height and body mass index (BMI). We identified 517 associations across 47 tissues in PAGE but not UKBB50k, demonstrating the importance of diverse samples in identifying trait-associated GReX. We observed that variants used in PrediXcan models were either more or less differentiated across continental-level populations than matched-control variants depending on the specific population reflecting sampling bias. Additionally, variants from identified genes specific to either PAGE or UKBB50k analyses were more ancestrally differentiated than those in genes detected in both analyses, underlining the value of population-specific discoveries. This suggests that while EA-derived transcriptome imputation models can identify new associations in non-EA populations, models derived from closely matched reference panels may yield further insights. Our findings call for more diversity in reference datasets of tissue-specific gene expression.


Subject(s)
Cardiovascular Diseases , Genome-Wide Association Study , Genetic Predisposition to Disease , Humans , Life Style , Polymorphism, Single Nucleotide , Transcriptome
11.
Am J Hum Genet ; 109(6): 1117-1139, 2022 06 02.
Article in English | MEDLINE | ID: mdl-35588731

ABSTRACT

Preeclampsia is a multi-organ complication of pregnancy characterized by sudden hypertension and proteinuria that is among the leading causes of preterm delivery and maternal morbidity and mortality worldwide. The heterogeneity of preeclampsia poses a challenge for understanding its etiology and molecular basis. Intriguingly, risk for the condition increases in high-altitude regions such as the Peruvian Andes. To investigate the genetic basis of preeclampsia in a population living at high altitude, we characterized genome-wide variation in a cohort of preeclamptic and healthy Andean families (n = 883) from Puno, Peru, a city located above 3,800 meters of altitude. Our study collected genomic DNA and medical records from case-control trios and duos in local hospital settings. We generated genotype data for 439,314 SNPs, determined global ancestry patterns, and mapped associations between genetic variants and preeclampsia phenotypes. A transmission disequilibrium test (TDT) revealed variants near genes of biological importance for placental and blood vessel function. The top candidate region was found on chromosome 13 of the fetal genome and contains clotting factor genes PROZ, F7, and F10. These findings provide supporting evidence that common genetic variants within coagulation genes play an important role in preeclampsia. A selection scan revealed a potential adaptive signal around the ADAM12 locus on chromosome 10, implicated in pregnancy disorders. Our discovery of an association in a functional pathway relevant to pregnancy physiology in an understudied population of Native American origin demonstrates the increased power of family-based study design and underscores the importance of conducting genetic research in diverse populations.


Subject(s)
Pre-Eclampsia , Altitude , Blood Coagulation Factors , Blood Proteins/genetics , Case-Control Studies , Factor VII/genetics , Factor X/genetics , Female , Humans , Peru/epidemiology , Placenta , Pre-Eclampsia/epidemiology , Pre-Eclampsia/genetics , Pregnancy
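
The transmission disequilibrium test (TDT) named in the abstract above can be illustrated with a minimal sketch. It assumes the standard McNemar-type form of the statistic for a biallelic marker, (b - c)^2 / (b + c), where b and c count heterozygous parents who did and did not transmit the allele of interest. The function name and the counts in the usage note are hypothetical and are not taken from the study.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Return (chi-square, p-value) for a biallelic TDT with 1 degree of freedom.

    `transmitted` and `untransmitted` are counts of heterozygous parents who
    did / did not pass the allele of interest to an affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square variable with 1 df:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

For example, `tdt_statistic(60, 40)` gives a chi-square of 4.0, just past the conventional 0.05 threshold for a single test. A genome-wide scan such as the one described would require a far stricter threshold to account for multiple testing.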
12.
Nature ; 570(7762): 514-518, 2019 06.
Article in English | MEDLINE | ID: mdl-31217584

ABSTRACT

Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry [1-3]. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific [4-10]. Additionally, effect sizes and their derived risk prediction scores derived in one population may not accurately extrapolate to other populations [11,12]. Here we demonstrate the value of diverse, multi-ethnic participants in large-scale genomic studies. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioural phenotypes in 49,839 non-European individuals. Using strategies tailored for analysis of multi-ethnic and admixed populations, we describe a framework for analysing diverse populations, identify 27 novel loci and 38 secondary signals at known loci, as well as replicate 1,444 GWAS catalogue associations across these traits. Our data show evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts and insights into clinical implications. In the United States, where minority populations have a disproportionately higher burden of chronic conditions [13], the lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease. We strongly advocate for continued, large genome-wide efforts in diverse populations to maximize genetic discovery and reduce health disparities.


Subject(s)
Asian People/genetics , Black People/genetics , Genome-Wide Association Study/methods , Hispanic or Latino/genetics , Minority Groups , Multifactorial Inheritance/genetics , Women's Health , Body Height/genetics , Cohort Studies , Female , Genetics, Medical/methods , Health Equity/trends , Health Status Disparities , Humans , Male , United States
13.
Am J Hum Genet ; 108(11): 2099-2111, 2021 11 04.
Article in English | MEDLINE | ID: mdl-34678161

ABSTRACT

The integration of genomic data into health systems offers opportunities to identify genomic factors underlying the continuum of rare and common disease. We applied a population-scale haplotype association approach based on identity-by-descent (IBD) in a large multi-ethnic biobank to a spectrum of disease outcomes derived from electronic health records (EHRs) and uncovered a risk locus for liver disease. We used genome sequencing and in silico approaches to fine-map the signal to a non-coding variant (c.2784-12T>C) in the gene ABCB4. In vitro analysis confirmed the variant disrupted splicing of the ABCB4 pre-mRNA. Four of five homozygotes had evidence of advanced liver disease, and there was a significant association with liver disease among heterozygotes, suggesting the variant is linked to increased risk of liver disease in an allele dose-dependent manner. Population-level screening revealed the variant to be at a carrier rate of 1.95% in Puerto Rican individuals, likely as the result of a Puerto Rican founder effect. This work demonstrates that integrating EHR and genomic data at a population scale can facilitate strategies for understanding the continuum of genomic risk for common diseases, particularly in populations underrepresented in genomic medicine.


Subject(s)
Delivery of Health Care/organization & administration , Genetic Predisposition to Disease , Liver Diseases/genetics , ATP Binding Cassette Transporter, Subfamily B/genetics , Electronic Health Records , Haplotypes , Heterozygote , Hispanic or Latino/genetics , Homozygote , Humans , Puerto Rico
14.
Am J Hum Genet ; 108(7): 1270-1282, 2021 07 01.
Article in English | MEDLINE | ID: mdl-34157305

ABSTRACT

Publicly available genetic summary data have high utility in research and the clinic, including prioritizing putative causal variants, polygenic scoring, and leveraging common controls. However, summarizing individual-level data can mask population structure, resulting in confounding, reduced power, and incorrect prioritization of putative causal variants. This limits the utility of publicly available data, especially for understudied or admixed populations where additional research and resources are most needed. Although several methods exist to estimate ancestry in individual-level data, methods to estimate ancestry proportions in summary data are lacking. Here, we present Summix, a method to efficiently deconvolute ancestry and provide ancestry-adjusted allele frequencies (AFs) from summary data. Using continental reference ancestry, African (AFR), non-Finnish European (EUR), East Asian (EAS), Indigenous American (IAM), South Asian (SAS), we obtain accurate and precise estimates (within 0.1%) for all simulation scenarios. We apply Summix to gnomAD v.2.1 exome and genome groups and subgroups, finding heterogeneous continental ancestry for several groups, including African/African American (∼84% AFR, ∼14% EUR) and American/Latinx (∼4% AFR, ∼5% EAS, ∼43% EUR, ∼46% IAM). Compared to the unadjusted gnomAD AFs, Summix's ancestry-adjusted AFs more closely match respective African and Latinx reference samples. Even on modern, dense panels of summary statistics, Summix yields results in seconds, allowing for estimation of confidence intervals via block bootstrap. Given an accompanying R package, Summix increases the utility and equity of public genetic resources, empowering novel research opportunities.


Subject(s)
Data Interpretation, Statistical , Metagenomics/methods , Pedigree , Racial Groups/genetics , Alleles , Computer Simulation , Gene Frequency , Humans , Inheritance Patterns , Software
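
The idea of estimating ancestry proportions from summary allele frequencies, as described in the abstract above, can be sketched for the simplest two-ancestry case. This is an illustration only and does not reproduce Summix or its R package. It assumes that the observed frequencies are a convex mixture of two reference panels and estimates the mixing proportion by least squares. All names and numbers are hypothetical.

```python
def estimate_mixture_proportion(observed, ref_a, ref_b):
    """Least-squares estimate of the proportion of ancestry A in a two-way mixture.

    Assumed model: observed[i] ~ p * ref_a[i] + (1 - p) * ref_b[i],
    with one allele frequency per variant in each list.
    """
    if not (len(observed) == len(ref_a) == len(ref_b)):
        raise ValueError("allele-frequency vectors must have equal length")
    # Closed-form minimiser of sum((obs - b) - p * (a - b))^2 over p.
    numerator = sum((o - b) * (a - b) for o, a, b in zip(observed, ref_a, ref_b))
    denominator = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if denominator == 0:
        raise ValueError("reference frequencies are identical; p is not identifiable")
    # Clip to [0, 1] so the result is a valid proportion.
    return min(1.0, max(0.0, numerator / denominator))
```

With more than two reference ancestries the same objective becomes a constrained optimisation over a vector of proportions that sum to one, which needs a numerical solver rather than this closed form.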
15.
Mol Biol Evol ; 39(4), 2022 04 11.
Article in English | MEDLINE | ID: mdl-35460423

ABSTRACT

Throughout human evolutionary history, large-scale migrations have led to intermixing (i.e., admixture) between previously separated human groups. Although classical and recent work have shown that studying admixture can yield novel historical insights, the extent to which this process contributed to adaptation remains underexplored. Here, we introduce a novel statistical model, specific to admixed populations, that identifies loci under selection while determining whether the selection likely occurred post-admixture or prior to admixture in one of the ancestral source populations. Through extensive simulations, we show that this method is able to detect selection, even in recently formed admixed populations, and to accurately differentiate between selection occurring in the ancestral or admixed population. We apply this method to genome-wide SNP data of ∼4,000 individuals in five admixed Latin American cohorts from Brazil, Chile, Colombia, Mexico, and Peru. Our approach replicates previous reports of selection in the human leukocyte antigen region that are consistent with selection post-admixture. We also report novel signals of selection in genomic regions spanning 47 genes, reinforcing many of these signals with an alternative, commonly used local-ancestry-inference approach. These signals include several genes involved in immunity, which may reflect responses to endemic pathogens of the Americas and to the challenge of infectious disease brought by European contact. In addition, some of the strongest signals inferred to be under selection in the Native American ancestral groups of modern Latin Americans overlap with genes implicated in energy metabolism phenotypes, plausibly reflecting adaptations to novel dietary sources available in the Americas.


Subject(s)
Genetics, Population , Genome, Human , Genomics/methods , Hispanic or Latino/genetics , Humans , Polymorphism, Single Nucleotide/genetics , White People/genetics
16.
Hum Genet ; 142(10): 1477-1489, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37658231

ABSTRACT

Inadequate representation of non-European ancestry populations in genome-wide association studies (GWAS) has limited opportunities to isolate functional variants. Fine-mapping in multi-ancestry populations should improve the efficiency of prioritizing variants for functional interrogation. To evaluate this hypothesis, we leveraged ancestry architecture to perform comparative GWAS and fine-mapping of obesity-related phenotypes in European ancestry populations from the UK Biobank (UKBB) and multi-ancestry samples from the Population Architecture for Genetic Epidemiology (PAGE) consortium with comparable sample sizes. In the investigated regions with genome-wide significant associations for obesity-related traits, fine-mapping in our ancestrally diverse sample led to 95% and 99% credible sets (CS) with fewer variants than in the European ancestry sample. Lead fine-mapped variants in PAGE regions had higher average coding scores, and higher average posterior probabilities for causality compared to UKBB. Importantly, 99% CS in PAGE loci contained strong expression quantitative trait loci (eQTLs) in adipose tissues or harbored more variants in tighter linkage disequilibrium (LD) with eQTLs. Leveraging ancestrally diverse populations with heterogeneous ancestry architectures, coupled with functional annotation, increased fine-mapping efficiency and performance, and reduced the set of candidate variants for consideration for future functional studies. Significant overlap in genetic causal variants across populations suggests generalizability of genetic mechanisms underpinning obesity-related traits across populations.


Subject(s)
Genome-Wide Association Study , Obesity , Humans , Molecular Epidemiology , Linkage Disequilibrium , Obesity/genetics , Quantitative Trait Loci/genetics
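
The 95% and 99% credible sets mentioned in the abstract above are commonly built by ranking variants by posterior probability of causality and adding them until the cumulative probability reaches the target coverage. The sketch below assumes that construction. It is not the study's fine-mapping pipeline, and the variant identifiers and probabilities in the usage note are hypothetical.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest list of variants whose posteriors sum to >= coverage.

    `posteriors` maps a variant identifier to its posterior probability of
    being causal within one locus (values assumed to sum to about 1).
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for variant, probability in ranked:
        selected.append(variant)
        cumulative += probability
        if cumulative >= coverage:
            break
    return selected
```

For posteriors of 0.70, 0.20, 0.07 and 0.03, the 95% set holds the first three variants and the 99% set holds all four. A locus where one variant carries most of the posterior mass yields a smaller set, which is the sense in which the abstract reports fewer variants per credible set in the ancestrally diverse sample.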
17.
PLoS Comput Biol ; 18(2): e1009059, 2022 02.
Article in English | MEDLINE | ID: mdl-35192601

ABSTRACT

Highly polymorphic interaction of KIR3DL1 and KIR3DS1 with HLA class I ligands modulates the effector functions of natural killer (NK) cells and some T cells. This genetically determined diversity affects severity of infections, immune-mediated diseases, and some cancers, and impacts the course of immunotherapies, including transplantation. KIR3DL1 is an inhibitory receptor, and KIR3DS1 is an activating receptor encoded by the KIR3DL1/S1 gene that has more than 200 diverse and divergent alleles. Determination of KIR3DL1/S1 genotypes for medical application is hampered by complex sequence and structural variation, requiring targeted approaches to generate and analyze high-resolution allele data. To overcome these obstacles, we developed and optimized a model for imputing KIR3DL1/S1 alleles at high-resolution from whole-genome SNP data. We designed the model to represent a substantial component of human genetic diversity. Our Global imputation model is effective at genotyping KIR3DL1/S1 alleles with an accuracy ranging from 88% in Africans to 97% in East Asians, with mean specificity of 99% and sensitivity of 95% for alleles >1% frequency. We used the established algorithm of the HIBAG program, in a modification named Pulling Out Natural killer cell Genomics (PONG). Because HIBAG was designed to impute HLA alleles also from whole-genome SNP data, PONG allows combinatorial diversity of KIR3DL1/S1 with HLA-A and -B to be analyzed using complementary techniques on a single data source. The use of PONG thus negates the need for targeted sequencing data in very large-scale association studies where such methods might not be tractable.


Subject(s)
Receptors, KIR3DL1 , Receptors, KIR3DS1 , Alleles , Genotype , HLA-B Antigens/genetics , Humans , Receptors, KIR/genetics , Receptors, KIR3DL1/genetics , Receptors, KIR3DS1/genetics
18.
PLoS Genet ; 16(8): e1008927, 2020 08.
Article in English | MEDLINE | ID: mdl-32797036

ABSTRACT

The genetic control of gene expression is a core component of human physiology. For the past several years, transcriptome-wide association studies have leveraged large datasets of linked genotype and RNA sequencing information to create a powerful gene-based test of association that has been used in dozens of studies. While numerous discoveries have been made, the populations in the training data are overwhelmingly of European descent, and little is known about the generalizability of these models to other populations. Here, we test for cross-population generalizability of gene expression prediction models using a dataset of African American individuals with RNA-Seq data in whole blood. We find that the default models trained in large datasets such as GTEx and DGN fare poorly in African Americans, with a notable reduction in prediction accuracy when compared to European Americans. We replicate these limitations in cross-population generalizability using the five populations in the GEUVADIS dataset. Via realistic simulations of both populations and gene expression, we show that accurate cross-population generalizability of transcriptome prediction only arises when eQTL architecture is substantially shared across populations. In contrast, models with non-identical eQTLs showed patterns similar to real-world data. Therefore, generating RNA-Seq data in diverse populations is a critical step towards multi-ethnic utility of gene expression prediction.


Subject(s)
Black or African American/genetics , Genome-Wide Association Study/methods , Models, Genetic , Transcriptome , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Genome-Wide Association Study/standards , Humans , Quantitative Trait Loci , RNA-Seq/methods , RNA-Seq/standards , Reference Standards
19.
J Allergy Clin Immunol ; 149(1): 145-155, 2022 01.
Article in English | MEDLINE | ID: mdl-34111454

ABSTRACT

BACKGROUND: While numerous genetic loci associated with atopic dermatitis (AD) have been discovered, to date, work leveraging the combined burden of AD risk variants across the genome to predict disease risk has been limited. OBJECTIVES: This study aims to determine whether polygenic risk scores (PRSs) relying on genetic determinants for AD provide useful predictions for disease occurrence and severity. It also explicitly tests the value of including genome-wide association studies of related allergic phenotypes and known FLG loss-of-function (LOF) variants. METHODS: AD PRSs were constructed for 1619 European American individuals from the Atopic Dermatitis Research Network using an AD training dataset and an atopic training dataset including AD, childhood onset asthma, and general allergy. Additionally, whole genome sequencing data were used to explore genetic scoring specific to FLG LOF mutations. RESULTS: Genetic scores derived from the AD-only genome-wide association studies were predictive of AD cases (PRSAD: odds ratio [OR], 1.70; 95% CI, 1.49-1.93). Accuracy was first improved when PRSs were built off the larger atopy genome-wide association studies (PRSAD+: OR, 2.16; 95% CI, 1.89-2.47) and further improved when including FLG LOF mutations (PRSAD++: OR, 3.23; 95% CI, 2.57-4.07). Importantly, while all 3 PRSs correlated with AD severity, the best prediction was from PRSAD++, which distinguished individuals with severe AD from control subjects with OR of 3.86 (95% CI, 2.77-5.36). CONCLUSIONS: This study demonstrates how PRSs for AD that include genetic determinants across atopic phenotypes and FLG LOF variants may be a promising tool for identifying individuals at high risk for developing disease and specifically severe disease.


Subject(s)
Dermatitis, Atopic/genetics , Filaggrin Proteins/genetics , Female , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Infant , Linkage Disequilibrium , Loss of Function Mutation , Male , Phenotype
20.
Mol Biol Evol ; 38(6): 2582-2596, 2021 05 19.
Article in English | MEDLINE | ID: mdl-33616658

ABSTRACT

Human natural killer (NK) cells are essential for controlling infection, cancer, and fetal development. NK cell functions are modulated by interactions between polymorphic inhibitory killer cell immunoglobulin-like receptors (KIR) and polymorphic HLA-A, -B, and -C ligands expressed on tissue cells. All HLA-C alleles encode a KIR ligand and contribute to reproduction and immunity. In contrast, only some HLA-A and -B alleles encode KIR ligands and they focus on immunity. By high-resolution analysis of KIR and HLA-A, -B, and -C genes, we show that the Chinese Southern Han (CHS) are significantly enriched for interactions between inhibitory KIR and HLA-A and -B. This enrichment has had substantial input through population admixture with neighboring populations, who contributed HLA class I haplotypes expressing the KIR ligands B*46:01 and B*58:01, which subsequently rose to high frequency by natural selection. Consequently, over 80% of Southern Han HLA haplotypes encode more than one KIR ligand. Complementing the high number of KIR ligands, the CHS KIR locus combines a high frequency of genes expressing potent inhibitory KIR, with a low frequency of those expressing activating KIR. The Southern Han centromeric KIR region encodes strong, conserved, inhibitory HLA-C-specific receptors, and the telomeric region provides a high number and diversity of inhibitory HLA-A and -B-specific receptors. In all these characteristics, the CHS represent other East Asians, whose NK cell repertoires are thus enhanced in quantity, diversity, and effector strength, likely augmenting resistance to endemic viral infections.


Subject(s)
Evolution, Molecular , Genes, MHC Class I , Killer Cells, Natural/physiology , Receptors, KIR/genetics , China , HLA-A Antigens/metabolism , HLA-B Antigens/metabolism , Humans , Receptors, KIR/metabolism