ABSTRACT
Genome-wide association studies (GWASs) can require immense sample sizes to identify variants associated with human health across the frequency spectrum. As the Global Biobank Meta-analysis Initiative (GBMI), Zhou et al. describe a collaborative network across 23 biobanks and 2.2 million participants to address challenges of underrepresentation of diversity in genomic research.
Subjects
Genome-Wide Association Study, Genomics, Humans, Biological Specimen Banks
ABSTRACT
Understanding population health disparities is an essential component of equitable precision health efforts. Epidemiology research often relies on definitions of race and ethnicity, but these population labels may not adequately capture disease burdens and environmental factors impacting specific sub-populations. Here, we propose a framework for repurposing data from electronic health records (EHRs) in concert with genomic data to explore the demographic ties that can impact disease burdens. Using data from a diverse biobank in New York City, we identified 17 communities sharing recent genetic ancestry. We observed 1,177 health outcomes that were statistically associated with a specific group and demonstrated significant differences in the segregation of genetic variants contributing to Mendelian diseases. We also demonstrated that fine-scale population structure can impact the prediction of complex disease risk within groups. This work reinforces the utility of linking genomic data to EHRs and provides a framework toward fine-scale monitoring of population health.
Subjects
Ethnicity/genetics, Population Health, Genetic Databases, Electronic Health Records, Genomics, Humans, Self Report
ABSTRACT
Gene-environment interactions (G × E), the interplay of genetic variation with environmental factors, have a pivotal impact on human complex traits and diseases. Statistically, G × E can be assessed by determining the deviation from expectation of predictive models based solely on the phenotypic effects of genetics or environmental exposures. Despite the unprecedented, widespread and diverse use of G × E analytical frameworks, heterogeneity in their application and reporting hinders their applicability in public health. In this Review, we discuss study design considerations as well as G × E analytical frameworks to assess polygenic liability dependent on the environment, to identify specific genetic variants exhibiting G × E, and to characterize environmental context for these dynamics. We conclude with recommendations to address the most common challenges and pitfalls in the conceptualization, methodology and reporting of G × E studies, as well as future directions.
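The idea of assessing G × E as a deviation from what genetics and environment would predict on their own can be illustrated with a minimal sketch. It assumes a binary genotype indicator, a binary exposure, and an additive scale; the function name `interaction_contrast` and the toy data are hypothetical and are not taken from the Review.

```python
from statistics import mean


def interaction_contrast(records):
    """Additive-scale G x E contrast for binary G and E.

    records: iterable of (g, e, y), with g and e in {0, 1} and y a numeric phenotype.
    Returns observed mean(G=1, E=1) minus the mean expected under pure additivity.
    """
    cells = {(g, e): [] for g in (0, 1) for e in (0, 1)}
    for g, e, y in records:
        cells[(g, e)].append(y)
    m = {cell: mean(values) for cell, values in cells.items()}
    # Expected doubly exposed mean if genetic and environmental effects simply add.
    expected = m[(1, 0)] + m[(0, 1)] - m[(0, 0)]
    return m[(1, 1)] - expected


# Toy data (hypothetical): (genotype, exposure, phenotype)
toy = [
    (0, 0, 1.0), (0, 0, 1.2),
    (1, 0, 1.5), (1, 0, 1.7),
    (0, 1, 2.0), (0, 1, 2.2),
    (1, 1, 3.5), (1, 1, 3.7),
]
gxe = interaction_contrast(toy)
print(f"interaction contrast: {gxe:.2f}")
```

A contrast near zero is what a purely additive model would give; whether a non-zero value counts as an interaction depends on the scale chosen (additive versus multiplicative), which is one of the reporting choices the abstract flags as heterogeneous.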
Subjects
Gene-Environment Interaction, Multifactorial Inheritance, Humans, Multifactorial Inheritance/genetics, Genetic Variation, Genetic Predisposition to Disease, Phenotype, Genetic Models
ABSTRACT
Genetic variation that influences gene expression and splicing is a key source of phenotypic diversity1-5. Although invaluable, studies investigating these links in humans have been strongly biased towards participants of European ancestries, which constrains generalizability and hinders evolutionary research. Here to address these limitations, we developed MAGE, an open-access RNA sequencing dataset of lymphoblastoid cell lines from 731 individuals from the 1000 Genomes Project6, spread across 5 continental groups and 26 populations. Most variation in gene expression (92%) and splicing (95%) was distributed within versus between populations, which mirrored the variation in DNA sequence. We mapped associations between genetic variants and expression and splicing of nearby genes (cis-expression quantitative trait loci (eQTLs) and cis-splicing QTLs (sQTLs), respectively). We identified more than 15,000 putatively causal eQTLs and more than 16,000 putatively causal sQTLs that are enriched for relevant epigenomic signatures. These include 1,310 eQTLs and 1,657 sQTLs that are largely private to underrepresented populations. Our data further indicate that the magnitude and direction of causal eQTL effects are highly consistent across populations. Moreover, the apparent 'population-specific' effects observed in previous studies were largely driven by low resolution or additional independent eQTLs of the same genes that were not detected. Together, our study expands our understanding of human gene expression diversity and provides an inclusive resource for studying the evolution and function of human genomes.
Subjects
Gene Expression Regulation, Genetic Variation, Human Genome, Internationality, Quantitative Trait Loci, RNA Splicing, Racial Groups, Female, Humans, Male, Artifacts, Bias, Cell Line, Cohort Studies, Datasets as Topic, Epigenomics, Molecular Evolution, Gene Expression Regulation/genetics, Population Genetics, Human Genome/genetics, Lymphocytes/cytology, Lymphocytes/metabolism, Quantitative Trait Loci/genetics, Racial Groups/genetics, RNA Splicing/genetics, RNA Sequence Analysis
ABSTRACT
Latin America continues to be severely underrepresented in genomics research, and fine-scale genetic histories and complex trait architectures remain hidden owing to insufficient data1. To fill this gap, the Mexican Biobank project genotyped 6,057 individuals from 898 rural and urban localities across all 32 states in Mexico at a resolution of 1.8 million genome-wide markers with linked complex trait and disease information creating a valuable nationwide genotype-phenotype database. Here, using ancestry deconvolution and inference of identity-by-descent segments, we inferred ancestral population sizes across Mesoamerican regions over time, unravelling Indigenous, colonial and postcolonial demographic dynamics2-6. We observed variation in runs of homozygosity among genomic regions with different ancestries reflecting distinct demographic histories and, in turn, different distributions of rare deleterious variants. We conducted genome-wide association studies (GWAS) for 22 complex traits and found that several traits are better predicted using the Mexican Biobank GWAS compared to the UK Biobank GWAS7,8. We identified genetic and environmental factors associating with trait variation, such as the length of the genome in runs of homozygosity as a predictor for body mass index, triglycerides, glucose and height. This study provides insights into the genetic histories of individuals in Mexico and dissects their complex trait architectures, both crucial for making precision and preventive medicine initiatives accessible worldwide.
Subjects
Biological Specimen Banks, Medical Genetics, Human Genome, Genomics, Hispanic or Latino, Humans, Blood Glucose/genetics, Blood Glucose/metabolism, Body Height/genetics, Body Mass Index, Gene-Environment Interaction, Genetic Markers/genetics, Genome-Wide Association Study, Hispanic or Latino/classification, Hispanic or Latino/genetics, Homozygote, Mexico, Phenotype, Triglycerides/blood, Triglycerides/genetics, United Kingdom, Human Genome/genetics
ABSTRACT
Genome-wide association studies using large-scale genome and exome sequencing data have become increasingly valuable in identifying associations between genetic variants and disease, transforming basic research and translational medicine. However, this progress has not been equally shared across all people and conditions, in part due to limited resources. Leveraging publicly available sequencing data as external common controls, rather than sequencing new controls for every study, can better allocate resources by augmenting control sample sizes or providing controls where none existed. However, common control studies must be carefully planned and executed as even small differences in sample ascertainment and processing can result in substantial bias. Here, we discuss challenges and opportunities for the robust use of common controls in high-throughput sequencing studies, including study design, quality control and statistical approaches. Thoughtful generation and use of large and valuable genetic sequencing data sets will enable investigation of a broader and more representative set of conditions, environments and genetic ancestries than otherwise possible.
Subjects
Exome, Genome-Wide Association Study, Exome/genetics, Genetic Predisposition to Disease, High-Throughput Nucleotide Sequencing, Humans, Exome Sequencing
ABSTRACT
Polygenic risk scores (PRSs), which often aggregate results from genome-wide association studies, can bridge the gap between initial discovery efforts and clinical applications for the estimation of disease risk using genetics. However, there is notable heterogeneity in the application and reporting of these risk scores, which hinders the translation of PRSs into clinical care. Here, in a collaboration between the Clinical Genome Resource (ClinGen) Complex Disease Working Group and the Polygenic Score (PGS) Catalog, we present the Polygenic Risk Score Reporting Standards (PRS-RS), in which we update the Genetic Risk Prediction Studies (GRIPS) Statement to reflect the present state of the field. Drawing on the input of experts in epidemiology, statistics, disease-specific applications, implementation and policy, this comprehensive reporting framework defines the minimal information that is needed to interpret and evaluate PRSs, especially with respect to downstream clinical applications. Items span detailed descriptions of study populations, statistical methods for the development and validation of PRSs and considerations for the potential limitations of these scores. In addition, we emphasize the need for data availability and transparency, and we encourage researchers to deposit and share PRSs through the PGS Catalog to facilitate reproducibility and comparative benchmarking. By providing these criteria in a structured format that builds on existing standards and ontologies, the use of this framework in publishing PRSs will facilitate translation into clinical care and progress towards defining best practice.
Subjects
Genetic Predisposition to Disease, Medical Genetics/standards, Multifactorial Inheritance/genetics, Humans, Reproducibility of Results, Risk Assessment/standards
ABSTRACT
Polygenic scores (PGSs) aggregate the effects of variants across the genome to estimate genetic liability, but have lower performance in external study populations. A new study by Ding et al. has applied a novel framework to estimate the individual-level predictive accuracy of PGSs, and demonstrates that performance reduction occurs linearly with genetic distance.
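As a rough illustration of what "aggregating the effects of variants" means, a PGS is commonly computed as a weighted sum of effect-allele dosages. The sketch below assumes per-variant weights from some prior GWAS and treats missing genotypes as dosage 0, which is a simplification; the variant identifiers and weights are hypothetical and this is not the framework of Ding et al.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) over the variants in `weights`.

    dosages: dict mapping variant ID -> effect-allele dosage for one individual.
    weights: dict mapping variant ID -> per-allele effect size (for example, a GWAS beta).
    Variants absent from `dosages` contribute 0 (a simplifying assumption).
    """
    return sum(beta * dosages.get(variant, 0.0) for variant, beta in weights.items())


# Hypothetical weights and genotypes, for illustration only.
weights = {"rsA": 0.10, "rsB": -0.05, "rsC": 0.20}
person = {"rsA": 2, "rsB": 1, "rsC": 0}
score = polygenic_score(person, weights)
print(f"PGS = {score:.2f}")
```

Because the weights are estimated in one study population, the same sum can be less informative for individuals who are genetically distant from that population, which is the performance drop the abstract describes.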
ABSTRACT
Genome-wide association studies (GWASs) have been performed to identify host genetic factors for a range of phenotypes, including for infectious diseases. The use of population-based common control subjects from biobanks and extensive consortia is a valuable resource to increase sample sizes in the identification of associated loci with minimal additional expense. Non-differential misclassification of the outcome has been reported when the control subjects are not well characterized, which often attenuates the true effect size. However, for infectious diseases the comparison of affected subjects to population-based common control subjects regardless of pathogen exposure can also result in selection bias. Through simulated comparisons of pathogen-exposed cases and population-based common control subjects, we demonstrate that not accounting for pathogen exposure can result in biased effect estimates and spurious genome-wide significant signals. Further, the observed association can be distorted depending upon strength of the association between a locus and pathogen exposure and the prevalence of pathogen exposure. We also used a real data example from the hepatitis C virus (HCV) genetic consortium comparing HCV spontaneous clearance to persistent infection with both well-characterized control subjects and population-based common control subjects from the UK Biobank. We find biased effect estimates for known HCV clearance-associated loci and potentially spurious HCV clearance associations. These findings suggest that the choice of control subjects is especially important for infectious diseases or outcomes that are conditional upon environmental exposures.
Subjects
Communicable Diseases, Hepatitis C, Humans, Genome-Wide Association Study, Communicable Diseases/genetics, Phenotype, Hepatitis C/genetics, Hepacivirus
ABSTRACT
The heritability explained by local ancestry markers in an admixed population (h²_γ) provides crucial insight into the genetic architecture of a complex disease or trait. Estimation of h²_γ can be susceptible to biases due to population structure in ancestral populations. Here, we present heritability estimation from admixture mapping summary statistics (HAMSTA), an approach that uses summary statistics from admixture mapping to infer heritability explained by local ancestry while adjusting for biases due to ancestral stratification. Through extensive simulations, we demonstrate that HAMSTA h²_γ estimates are approximately unbiased and are robust to ancestral stratification compared to existing approaches. In the presence of ancestral stratification, we show that a HAMSTA-derived sampling scheme provides a calibrated family-wise error rate (FWER) of ~5% for admixture mapping, unlike existing FWER estimation approaches. We apply HAMSTA to 20 quantitative phenotypes of up to 15,988 self-reported African American individuals in the Population Architecture using Genomics and Epidemiology (PAGE) study. We observe that ĥ²_γ in the 20 phenotypes ranges from 0.0025 to 0.033 (mean ĥ²_γ = 0.012 ± 9.2 × 10⁻⁴), which translates to ĥ² ranging from 0.062 to 0.85 (mean ĥ² = 0.30 ± 0.023). Across these phenotypes we find little evidence of inflation due to ancestral population stratification in current admixture mapping studies (mean inflation factor of 0.99 ± 0.001). Overall, HAMSTA provides a fast and powerful approach to estimate genome-wide heritability and evaluate biases in test statistics of admixture mapping studies.
Subjects
Black or African American, Population Genetics, Humans, Chromosome Mapping, Phenotype, Single Nucleotide Polymorphism/genetics
ABSTRACT
The possibility of voyaging contact between prehistoric Polynesian and Native American populations has long intrigued researchers. Proponents have pointed to the existence of New World crops, such as the sweet potato and bottle gourd, in the Polynesian archaeological record, but nowhere else outside the pre-Columbian Americas1-6, while critics have argued that these botanical dispersals need not have been human mediated7. The Norwegian explorer Thor Heyerdahl controversially suggested that prehistoric South American populations had an important role in the settlement of east Polynesia and particularly of Easter Island (Rapa Nui)2. Several limited molecular genetic studies have reached opposing conclusions, and the possibility continues to be as hotly contested today as it was when first suggested8-12. Here we analyse genome-wide variation in individuals from islands across Polynesia for signs of Native American admixture, analysing 807 individuals from 17 island populations and 15 Pacific coast Native American groups. We find conclusive evidence for prehistoric contact of Polynesian individuals with Native American individuals (around AD 1200) contemporaneous with the settlement of remote Oceania13-15. Our analyses suggest strongly that a single contact event occurred in eastern Polynesia, before the settlement of Rapa Nui, between Polynesian individuals and a Native American group most closely related to the indigenous inhabitants of present-day Colombia.
Subjects
Gene Flow/genetics, Human Genome/genetics, Human Migration/history, Central American Indians/genetics, South American Indians/genetics, Islands, Native Hawaiian or Other Pacific Islander/genetics, Central America/ethnology, Colombia/ethnology, Europe/ethnology, Population Genetics, Medieval History, Humans, Single Nucleotide Polymorphism/genetics, Polynesia, South America/ethnology, Time Factors
ABSTRACT
Since 2005, genome-wide association (GWA) datasets have been largely biased toward sampling European ancestry individuals, and recent studies have shown that GWA results estimated from self-identified European individuals are not transferable to non-European individuals because of various confounding challenges. Here, we demonstrate that enrichment analyses that aggregate SNP-level association statistics at multiple genomic scales (from genes to genomic regions and pathways) have been underutilized in the GWA era and can generate biologically interpretable hypotheses regarding the genetic basis of complex trait architecture. We illustrate examples of the robust associations generated by enrichment analyses while studying 25 continuous traits assayed in 566,786 individuals from seven diverse self-identified human ancestries in the UK Biobank and the Biobank Japan as well as 44,348 admixed individuals from the PAGE consortium including cohorts of African American, Hispanic and Latin American, Native Hawaiian, and American Indian/Alaska Native individuals. We identify 1,000 gene-level associations that are genome-wide significant in at least two ancestry cohorts across these 25 traits as well as highly conserved pathway associations with triglyceride levels in European, East Asian, and Native Hawaiian cohorts.
Subjects
Genome-Wide Association Study, Single Nucleotide Polymorphism, Genome-Wide Association Study/methods, Humans, Multifactorial Inheritance, Phenotype, Single Nucleotide Polymorphism/genetics, Racial Groups
ABSTRACT
Spontaneous clearance of acute hepatitis C virus (HCV) infection is associated with single nucleotide polymorphisms (SNPs) in the MHC class II region. We fine-mapped the MHC region in European (n = 1,600; 594 HCV clearance/1,006 HCV persistence) and African (n = 1,869; 340 HCV clearance/1,529 HCV persistence) ancestry individuals and evaluated HCV peptide binding affinity of classical alleles. In both populations, HLA-DQβ1Leu26 (meta-analysis p value = 1.24 × 10⁻¹⁴) located in pocket 4 was negatively associated with HCV spontaneous clearance and HLA-DQβ1Pro55 (meta-analysis p value = 8.23 × 10⁻¹¹) located in the peptide binding region was positively associated, independently of HLA-DQβ1Leu26. These two amino acids are not in linkage disequilibrium (r² < 0.1) and explain the SNPs and classical allele associations represented by rs2647011, rs9274711, HLA-DQB1∗03:01, and HLA-DRB1∗01:01. Additionally, HCV persistence classical alleles tagged by HLA-DQβ1Leu26 had fewer HCV binding epitopes and lower predicted binding affinities compared to clearance alleles (geometric mean of combined IC50 of persistence versus clearance: 2,321 nM versus 761.7 nM, p value = 1.35 × 10⁻³⁸). In summary, MHC class II fine-mapping revealed key amino acids in HLA-DQβ1 explaining allelic and SNP associations with HCV outcomes. This mechanistic advance in understanding of natural recovery and immunogenetics of HCV might set the stage for the much-needed enhancement and design of vaccines to promote spontaneous clearance of HCV infection.
Subjects
HLA-DQ beta-Chains/genetics, Hepacivirus/pathogenicity, Hepatitis C/genetics, Host-Pathogen Interactions/genetics, Single Nucleotide Polymorphism, Acute Disease, Alleles, Amino Acid Substitution, Black People, Female, Gene Expression, Genome-Wide Association Study, Genotype, HLA-DQ beta-Chains/immunology, Hepacivirus/growth & development, Hepacivirus/immunology, Hepatitis C/ethnology, Hepatitis C/immunology, Hepatitis C/virology, Host-Pathogen Interactions/immunology, Humans, Leucine/immunology, Leucine/metabolism, Male, Proline/immunology, Proline/metabolism, Protein Isoforms/genetics, Protein Isoforms/immunology, Spontaneous Remission, White People
ABSTRACT
Preeclampsia is a multi-organ complication of pregnancy characterized by sudden hypertension and proteinuria that is among the leading causes of preterm delivery and maternal morbidity and mortality worldwide. The heterogeneity of preeclampsia poses a challenge for understanding its etiology and molecular basis. Intriguingly, risk for the condition increases in high-altitude regions such as the Peruvian Andes. To investigate the genetic basis of preeclampsia in a population living at high altitude, we characterized genome-wide variation in a cohort of preeclamptic and healthy Andean families (n = 883) from Puno, Peru, a city located above 3,800 meters of altitude. Our study collected genomic DNA and medical records from case-control trios and duos in local hospital settings. We generated genotype data for 439,314 SNPs, determined global ancestry patterns, and mapped associations between genetic variants and preeclampsia phenotypes. A transmission disequilibrium test (TDT) revealed variants near genes of biological importance for placental and blood vessel function. The top candidate region was found on chromosome 13 of the fetal genome and contains clotting factor genes PROZ, F7, and F10. These findings provide supporting evidence that common genetic variants within coagulation genes play an important role in preeclampsia. A selection scan revealed a potential adaptive signal around the ADAM12 locus on chromosome 10, implicated in pregnancy disorders. Our discovery of an association in a functional pathway relevant to pregnancy physiology in an understudied population of Native American origin demonstrates the increased power of family-based study design and underscores the importance of conducting genetic research in diverse populations.
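For readers unfamiliar with the transmission disequilibrium test, the classic single-allele form compares how often heterozygous parents transmit versus do not transmit an allele to an affected child. The sketch below shows that textbook McNemar-type statistic only; it is not the study's pipeline, and the counts are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Classic TDT for one allele.

    transmitted: number of heterozygous parents who passed the allele to an affected child.
    untransmitted: number of heterozygous parents who did not.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a 1-df chi-square equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, for illustration only.
chi2, p = tdt(60, 40)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Because the comparison is made within families, the statistic is not inflated by differences in allele frequency between subpopulations, which is part of the appeal of family-based designs in admixed cohorts.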
Subjects
Pre-Eclampsia, Altitude, Blood Coagulation Factors, Blood Proteins/genetics, Case-Control Studies, Factor VII/genetics, Factor X/genetics, Female, Humans, Peru/epidemiology, Placenta, Pre-Eclampsia/epidemiology, Pre-Eclampsia/genetics, Pregnancy
ABSTRACT
One mechanism by which genetic factors influence complex traits and diseases is altering gene expression. Direct measurement of gene expression in relevant tissues is rarely tenable; however, genetically regulated gene expression (GReX) can be estimated using prediction models derived from large multi-omic datasets. These approaches have led to the discovery of many gene-trait associations, but whether models derived from predominantly European ancestry (EA) reference panels can map novel associations in ancestrally diverse populations remains unclear. We applied PrediXcan to impute GReX in 51,520 ancestrally diverse Population Architecture using Genomics and Epidemiology (PAGE) participants (35% African American, 45% Hispanic/Latino, 10% Asian, and 7% Hawaiian) across 25 key cardiometabolic traits and relevant tissues to identify 102 novel associations. We then compared associations in PAGE to those in a random subset of 50,000 White British participants from UK Biobank (UKBB50k) for height and body mass index (BMI). We identified 517 associations across 47 tissues in PAGE but not UKBB50k, demonstrating the importance of diverse samples in identifying trait-associated GReX. We observed that variants used in PrediXcan models were either more or less differentiated across continental-level populations than matched-control variants depending on the specific population reflecting sampling bias. Additionally, variants from identified genes specific to either PAGE or UKBB50k analyses were more ancestrally differentiated than those in genes detected in both analyses, underlining the value of population-specific discoveries. This suggests that while EA-derived transcriptome imputation models can identify new associations in non-EA populations, models derived from closely matched reference panels may yield further insights. Our findings call for more diversity in reference datasets of tissue-specific gene expression.
Subjects
Cardiovascular Diseases, Genome-Wide Association Study, Genetic Predisposition to Disease, Humans, Life Style, Single Nucleotide Polymorphism, Transcriptome
ABSTRACT
Genome-wide association studies (GWAS) have laid the foundation for investigations into the biology of complex traits, drug development and clinical guidelines. However, the majority of discovery efforts are based on data from populations of European ancestry1-3. In light of the differential genetic architecture that is known to exist between populations, bias in representation can exacerbate existing disease and healthcare disparities. Critical variants may be missed if they have a low frequency or are completely absent in European populations, especially as the field shifts its attention towards rare variants, which are more likely to be population-specific4-10. Additionally, effect sizes estimated in one population, and the risk prediction scores derived from them, may not accurately extrapolate to other populations11,12. Here we demonstrate the value of diverse, multi-ethnic participants in large-scale genomic studies. The Population Architecture using Genomics and Epidemiology (PAGE) study conducted a GWAS of 26 clinical and behavioural phenotypes in 49,839 non-European individuals. Using strategies tailored for analysis of multi-ethnic and admixed populations, we describe a framework for analysing diverse populations, identify 27 novel loci and 38 secondary signals at known loci, as well as replicate 1,444 GWAS catalogue associations across these traits. Our data show evidence of effect-size heterogeneity across ancestries for published GWAS associations, substantial benefits for fine-mapping using diverse cohorts and insights into clinical implications. In the United States, where minority populations have a disproportionately higher burden of chronic conditions13, the lack of representation of diverse populations in genetic research will result in inequitable access to precision medicine for those with the highest burden of disease.
We strongly advocate for continued, large genome-wide efforts in diverse populations to maximize genetic discovery and reduce health disparities.
Subjects
Asian People/genetics, Black People/genetics, Genome-Wide Association Study/methods, Hispanic or Latino/genetics, Minority Groups, Multifactorial Inheritance/genetics, Women's Health, Body Height/genetics, Cohort Studies, Female, Medical Genetics/methods, Health Equity/trends, Health Status Disparities, Humans, Male, United States
ABSTRACT
BACKGROUND: Diarrhea is the second leading cause of death in children under 5 years old worldwide. Known diarrhea risk factors include sanitation, water sources, and pathogens but do not fully explain the heterogeneity in frequency and duration of diarrhea in young children. We evaluated the role of host genetics in diarrhea. METHODS: Using 3 well-characterized birth cohorts from an impoverished area of Dhaka, Bangladesh, we compared infants with no diarrhea in the first year of life to those with an abundance, measured by either frequency or duration. We performed a genome-wide association analysis for each cohort under an additive model and then meta-analyzed across the studies. RESULTS: For diarrhea frequency, we identified 2 genome-wide significant loci associated with not having any diarrhea, on chromosome 21 within the noncoding RNA AP000959 (C allele odds ratio [OR] = 0.31, P = 4.01 × 10⁻⁸), and on chromosome 8 within SAMD12 (T allele OR = 0.35, P = 4.74 × 10⁻⁷). For duration of diarrhea, we identified 2 loci associated with no diarrhea, including the same locus on chromosome 21 (C allele OR = 0.31, P = 1.59 × 10⁻⁸) and another locus on chromosome 17 near WSCD1 (C allele OR = 0.35, P = 1.09 × 10⁻⁷). CONCLUSIONS: These loci are in or near genes involved in enteric nervous system development and intestinal inflammation and may be potential targets for diarrhea therapeutics.
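The abstract does not say which meta-analysis method was used. One common choice for combining per-cohort odds ratios is a fixed-effect, inverse-variance weighted average on the log-odds scale, sketched below as an assumption. The per-cohort estimates and standard errors are invented for illustration and are not the study's data.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect meta-analysis.

    estimates: list of (log_odds_ratio, standard_error), one pair per cohort.
    Returns (pooled log OR, pooled standard error, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    z = pooled / pooled_se
    # Two-sided normal p-value.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, p_value


# Hypothetical per-cohort results for one variant: (log OR, SE).
cohorts = [(math.log(0.30), 0.40), (math.log(0.35), 0.40), (math.log(0.28), 0.40)]
pooled, pooled_se, p_meta = fixed_effect_meta(cohorts)
print(f"pooled OR = {math.exp(pooled):.2f}, p = {p_meta:.2e}")
```

Cohorts with smaller standard errors receive more weight, so a large, precisely estimated cohort can dominate the pooled estimate.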
Subjects
Diarrhea, Genome-Wide Association Study, Child, Humans, Infant, Preschool Child, Bangladesh/epidemiology, Risk Factors, Diarrhea/epidemiology, Diarrhea/genetics, Alleles
ABSTRACT
Genetic association studies of child health outcomes often employ family-based study designs. One of the most popular family-based designs is the case-parent trio design that considers the smallest possible nuclear family consisting of two parents and their affected child. This trio design is particularly advantageous for studying relatively rare disorders because it is less prone to type I error inflation due to population stratification compared to population-based study designs (e.g., case-control studies). However, obtaining genetic data from both parents is difficult, from a practical perspective, and many large studies predominantly measure genetic variants in mother-child dyads. While some statistical methods for analyzing parent-child dyad data (most commonly involving mother-child pairs) exist, it is not clear if they provide the same advantage as trio methods in protecting against population stratification, or if a specific dyad design (e.g., case-mother dyads vs. case-mother/control-mother dyads) is more advantageous. In this article, we review existing statistical methods for analyzing genome-wide marker data on dyads and perform extensive simulation experiments to benchmark their type I errors and statistical power under different scenarios. We extend our evaluation to existing methods for analyzing a combination of case-parent trios and dyads together. We apply these methods to genotyped and imputed data from multiethnic mother-child pairs only, case-parent trios only or combinations of both dyads and trios from the Gene, Environment Association Studies consortium (GENEVA), where each family was ascertained through a child affected by nonsyndromic cleft lip with or without cleft palate. Results from the GENEVA study corroborate the findings from our simulation experiments. Finally, we provide recommendations for using statistical genetic association methods for dyads.
Subjects
Cleft Lip, Cleft Palate, Benchmarking, Cleft Lip/genetics, Cleft Palate/genetics, Female, Genetic Association Studies, Humans, Genetic Models, Mothers, Parent-Child Relations, Single Nucleotide Polymorphism
ABSTRACT
Throughout human evolutionary history, large-scale migrations have led to intermixing (i.e., admixture) between previously separated human groups. Although classical and recent work have shown that studying admixture can yield novel historical insights, the extent to which this process contributed to adaptation remains underexplored. Here, we introduce a novel statistical model, specific to admixed populations, that identifies loci under selection while determining whether the selection likely occurred post-admixture or prior to admixture in one of the ancestral source populations. Through extensive simulations, we show that this method is able to detect selection, even in recently formed admixed populations, and to accurately differentiate between selection occurring in the ancestral or admixed population. We apply this method to genome-wide SNP data of ~4,000 individuals in five admixed Latin American cohorts from Brazil, Chile, Colombia, Mexico, and Peru. Our approach replicates previous reports of selection in the human leukocyte antigen region that are consistent with selection post-admixture. We also report novel signals of selection in genomic regions spanning 47 genes, reinforcing many of these signals with an alternative, commonly used local-ancestry-inference approach. These signals include several genes involved in immunity, which may reflect responses to endemic pathogens of the Americas and to the challenge of infectious disease brought by European contact. In addition, some of the strongest signals inferred to be under selection in the Native American ancestral groups of modern Latin Americans overlap with genes implicated in energy metabolism phenotypes, plausibly reflecting adaptations to novel dietary sources available in the Americas.
Subjects
Population Genetics, Human Genome, Genomics/methods, Hispanic or Latino/genetics, Humans, Single Nucleotide Polymorphism/genetics, White People/genetics
ABSTRACT
AIMS/HYPOTHESIS: Type 2 diabetes is a growing global public health challenge. Investigating quantitative traits, including fasting glucose, fasting insulin and HbA1c, that serve as early markers of type 2 diabetes progression may lead to a deeper understanding of the genetic aetiology of type 2 diabetes development. Previous genome-wide association studies (GWAS) have identified over 500 loci associated with type 2 diabetes, glycaemic traits and insulin-related traits. However, most of these findings were based only on populations of European ancestry. To address this research gap, we examined the genetic basis of fasting glucose, fasting insulin and HbA1c in participants of the diverse Population Architecture using Genomics and Epidemiology (PAGE) Study. METHODS: We conducted a GWAS of fasting glucose (n = 52,267), fasting insulin (n = 48,395) and HbA1c (n = 23,357) in participants without diabetes from the diverse PAGE Study (23% self-reported African American, 46% Hispanic/Latino, 40% European, 4% Asian, 3% Native Hawaiian, 0.8% Native American), performing transethnic and population-specific GWAS meta-analyses, followed by fine-mapping to identify and characterise novel loci and independent secondary signals in known loci. RESULTS: Four novel associations were identified (p < 5 × 10⁻⁹), including three loci associated with fasting insulin, and a novel, low-frequency African American-specific locus associated with fasting glucose. Additionally, seven secondary signals were identified, including novel independent secondary signals for fasting glucose at the known GCK locus and for fasting insulin at the known PPP1R3B locus in transethnic meta-analysis. CONCLUSIONS/INTERPRETATION: Our findings provide new insights into the genetic architecture of glycaemic traits and highlight the continued importance of conducting genetic studies in diverse populations.
DATA AVAILABILITY: Full summary statistics from each of the population-specific and transethnic results are available at NHGRI-EBI GWAS catalog ( https://www.ebi.ac.uk/gwas/downloads/summary-statistics ).