ABSTRACT
Fully understanding autism spectrum disorder (ASD) genetics requires whole-genome sequencing (WGS). We present the latest release of the Autism Speaks MSSNG resource, which includes WGS data from 5,100 individuals with ASD and 6,212 non-ASD parents and siblings (total n = 11,312). Examining a wide variety of genetic variants in MSSNG and the Simons Simplex Collection (SSC; n = 9,205), we identified ASD-associated rare variants in 718/5,100 individuals with ASD from MSSNG (14.1%) and 350/2,419 from SSC (14.5%). Considering genomic architecture, 52% were nuclear sequence-level variants, 46% were nuclear structural variants (including copy-number variants, inversions, large insertions, uniparental isodisomies, and tandem repeat expansions), and 2% were mitochondrial variants. Our study provides a guidebook for exploring genotype-phenotype correlations in families who carry ASD-associated rare variants and serves as an entry point to the expanded studies required to dissect the etiology in the ~85% of the ASD population that remain idiopathic.
Subjects
Autism Spectrum Disorder, Autistic Disorder, Humans, Autism Spectrum Disorder/genetics, Genetic Predisposition to Disease, DNA Copy Number Variations/genetics, Genomics
ABSTRACT
The increasing proportion of variance in human complex traits explained by polygenic scores, along with progress in preimplantation genetic diagnosis, suggests the possibility of screening embryos for traits such as height or cognitive ability. However, the expected outcomes of embryo screening are unclear, which undermines discussion of associated ethical concerns. Here, we use theory, simulations, and real data to evaluate the potential gain of embryo screening, defined as the difference in trait value between the top-scoring embryo and the average embryo. The gain increases very slowly with the number of embryos but more rapidly with the variance explained by the score. Given current technology, the average gain due to screening would be ≈2.5 cm for height and ≈2.5 IQ points for cognitive ability. These mean values are accompanied by wide prediction intervals, and indeed, in large nuclear families, the majority of children top-scoring for height are not the tallest.
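The definition of gain used above (top-scoring embryo minus average embryo) can be illustrated with a small Monte Carlo sketch. It rests on a simplifying assumption that is not stated in the abstract: polygenic scores among sibling embryos are treated as independent normal draws whose standard deviation is sqrt(r2 / 2) times the trait standard deviation, where r2 is the proportion of variance explained by the score. The function name `expected_gain` and all input values are hypothetical and are not taken from the study.

```python
import random
import statistics


def expected_gain(n_embryos, r2, trait_sd, n_sims=20000, seed=0):
    """Monte Carlo estimate of the mean gain from picking the top-scoring embryo.

    Gain is defined as (highest score) minus (mean score) within one family.
    Assumption: sibling scores are i.i.d. normal with SD sqrt(r2 / 2) * trait_sd.
    """
    rng = random.Random(seed)
    sib_sd = (r2 / 2) ** 0.5 * trait_sd
    gains = []
    for _ in range(n_sims):
        scores = [rng.gauss(0.0, sib_sd) for _ in range(n_embryos)]
        gains.append(max(scores) - statistics.fmean(scores))
    return statistics.fmean(gains)


if __name__ == "__main__":
    # Illustrative inputs only: trait SD of 6 units, score explaining 25% of variance.
    for n in (2, 5, 10, 20):
        print(n, round(expected_gain(n, r2=0.25, trait_sd=6.0), 2))
```

Under these assumptions the gain scales with the square root of r2 but only with the expected maximum of n normal draws, which grows slowly in n. This is consistent with the qualitative pattern described in the abstract.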
Subjects
Mammalian Embryo/metabolism, Genetic Testing, Multifactorial Inheritance/genetics, Adult, Family, Genome-Wide Association Study, Humans, Phenotype
ABSTRACT
Lennon et al. recently proposed a clinical polygenic score (PGS) pipeline as part of the Electronic Medical Records and Genomics (eMERGE) network initiative. In this spotlight article we discuss the broader context for the use of PGS in preventive medicine and highlight key limitations and challenges facing their inclusion in prediction models.
Subjects
Multifactorial Inheritance, Multifactorial Inheritance/genetics, Humans, Genomics, Genetic Predisposition to Disease, Genome-Wide Association Study, Electronic Health Records, Preventive Medicine
ABSTRACT
Tumor mutational burden (TMB), the total number of somatic mutations in the tumor, and copy number burden (CNB), the corresponding measure of aneuploidy, are established fundamental somatic features and emerging biomarkers for immunotherapy. However, the genetic and non-genetic influences on TMB/CNB and, critically, the manner by which they influence patient outcomes remain poorly understood. Here, we present a large germline-somatic study of TMB/CNB with >23,000 individuals across 17 cancer types, of which 12,000 also have extensive clinical, treatment, and overall survival (OS) measurements available. We report dozens of clinical associations with TMB/CNB, observing older age and male sex to have a strong effect on TMB and weaker impact on CNB. We additionally identified significant germline influences on TMB/CNB, including fine-scale European ancestry and germline polygenic risk scores (PRSs) for smoking, tanning, white blood cell counts, and educational attainment. We quantify the causal effect of exposures on somatic mutational processes using Mendelian randomization. Many of the identified features associated with TMB/CNB were additionally associated with OS for individuals treated at a single tertiary cancer center. For individuals receiving immunotherapy, we observed a complex relationship between PRSs for educational attainment, self-reported college attainment, TMB, and survival, suggesting that the influence of this biomarker may be substantially modified by socioeconomic status. While the accumulation of somatic alterations is a stochastic process, our work demonstrates that it can be shaped by host characteristics including germline genetics.
Subjects
Neoplasms, Humans, Male, Mutation/genetics, Neoplasms/genetics, Neoplasms/pathology, Immunotherapy, Tumor Biomarkers/genetics, Germ Cells/pathology
ABSTRACT
Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
Subjects
Biological Specimen Banks, Genome-Wide Association Study, Multifactorial Inheritance, Humans, Multifactorial Inheritance/genetics, Phenotype, Type 1 Diabetes Mellitus/genetics, Single Nucleotide Polymorphism, Machine Learning
ABSTRACT
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods face several limitations, encompassing issues related to computational burden, predictive accuracy, and adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages an L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures. In extensive large-scale simulations across a wide range of polygenicity and GWAS sample sizes, ALL-Sum consistently outperformed popular alternative methods in terms of prediction accuracy, runtime, and memory usage by 10%, 20-fold, and threefold, respectively, and demonstrated robustness to diverse genetic architectures. We validated the performance of ALL-Sum in real data analysis of 11 complex traits using GWAS summary statistics from nine data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen Biobank, with validation in the UK Biobank. Our results show that, on average, ALL-Sum obtained PRS with 25% higher accuracy, with 15 times faster computation and half the memory than the current state-of-the-art methods, and had robust performance across a wide range of traits and diseases. Furthermore, our method demonstrates stable prediction when using linkage disequilibrium computed from different data sources. ALL-Sum is available as a user-friendly R software package with publicly available reference data for streamlined analysis.
Subjects
Genome-Wide Association Study, Multifactorial Inheritance, Humans, Multifactorial Inheritance/genetics, Genome-Wide Association Study/methods, Machine Learning, Genetic Predisposition to Disease, Single Nucleotide Polymorphism
ABSTRACT
Multi-omic analysis is an effective approach for dissecting the mechanisms of diseases; however, collecting multi-omic data in large populations is time-consuming and costly. Recently, Xu et al. developed genetic scores for multi-omic traits and demonstrated their utilization to gain novel insights, advancing the application of multi-omic data in disease research.
Subjects
Multiomics, Phenotype
ABSTRACT
Polygenic scores (PGSs) aggregate the effects of variants across the genome to estimate genetic liability, but have lower performance in external study populations. A new study by Ding et al. has applied a novel framework to estimate the individual-level predictive accuracy of PGSs, and demonstrates that performance reduction occurs linearly with genetic distance.
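As a minimal illustration of what "aggregating the effects of variants" means, a PGS is commonly computed as a weighted sum of an individual's allele dosages, with per-variant weights taken from association effect-size estimates. The sketch below assumes that additive form. The function name `polygenic_score` and the example values are hypothetical, and the sketch does not reproduce the individual-level accuracy framework discussed above.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies of the effect allele).

    dosages: per-variant effect-allele counts for one individual.
    weights: per-variant effect-size estimates, in the same variant order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


if __name__ == "__main__":
    # Three hypothetical variants with made-up effect sizes.
    print(polygenic_score([0, 1, 2], [0.10, -0.20, 0.05]))
```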
ABSTRACT
Autoimmunity and cancer represent two different aspects of immune dysfunction. Autoimmunity is characterized by breakdowns in immune self-tolerance, while impaired immune surveillance can allow for tumorigenesis. The class I major histocompatibility complex (MHC-I), which displays derivatives of the cellular peptidome for immune surveillance by CD8+ T cells, serves as a common genetic link between these conditions. As melanoma-specific CD8+ T cells have been shown to target melanocyte-specific peptide antigens more often than melanoma-specific antigens, we investigated whether vitiligo- and psoriasis-predisposing MHC-I alleles conferred a melanoma-protective effect. In individuals with cutaneous melanoma from both The Cancer Genome Atlas (n = 451) and an independent validation set (n = 586), MHC-I autoimmune-allele carrier status was significantly associated with a later age of melanoma diagnosis. Furthermore, MHC-I autoimmune-allele carriers were significantly associated with decreased risk of developing melanoma in the Million Veteran Program (OR = 0.962, p = 0.024). Existing melanoma polygenic risk scores (PRSs) did not predict autoimmune-allele carrier status, suggesting these alleles provide orthogonal risk-relevant information. Mechanisms of autoimmune protection were neither associated with improved melanoma-driver mutation association nor improved gene-level conserved antigen presentation relative to common alleles. However, autoimmune alleles showed higher affinity relative to common alleles for particular windows of melanocyte-conserved antigens and loss of heterozygosity of autoimmune alleles caused the greatest reduction in presentation for several conserved antigens across individuals with loss of HLA alleles. Overall, this study presents evidence that MHC-I autoimmune-risk alleles modulate melanoma risk unaccounted for by current PRSs.
Subjects
Melanoma, Skin Neoplasms, Humans, Alleles, Melanoma/genetics, Melanoma/metabolism, CD8-Positive T-Lymphocytes/metabolism, Skin Neoplasms/genetics, Histocompatibility, Histocompatibility Antigens Class I/genetics
ABSTRACT
Accurate polygenic scores (PGSs) facilitate the genetic prediction of complex traits and aid in the development of personalized medicine. Here, we develop a statistical method called multi-trait assisted PGS (mtPGS), which can construct accurate PGSs for a target trait of interest by leveraging multiple traits relevant to the target trait. Specifically, mtPGS borrows SNP effect size similarity information between the target trait and its relevant traits to improve the effect size estimation on the target trait, thus achieving accurate PGSs. In the process, mtPGS flexibly models the shared genetic architecture between the target and the relevant traits to achieve robust performance, while explicitly accounting for the environmental covariance among them to accommodate different study designs with various sample overlap patterns. In addition, mtPGS uses only summary statistics as input and relies on a deterministic algorithm with several algebraic techniques for scalable computation. We evaluate the performance of mtPGS through comprehensive simulations and applications to 25 traits in the UK Biobank, where in the real data mtPGS achieves an average of 0.90%-52.91% accuracy gain compared to the state-of-the-art PGS methods. Overall, mtPGS represents an accurate, fast, and robust solution for PGS construction in biobank-scale datasets.
Subjects
Genome-Wide Association Study, Multifactorial Inheritance, Humans, Multifactorial Inheritance/genetics, Genome-Wide Association Study/methods, Phenotype, Algorithms, Research Design
ABSTRACT
The coefficient of determination (R²) is a well-established measure to indicate the predictive ability of polygenic scores (PGSs). However, the sampling variance of R² is rarely considered so that 95% confidence intervals (CI) are not usually reported. Moreover, when comparisons are made between PGSs based on different discovery samples, the sampling covariance of R² is required to test the difference between them. Here, we show how to estimate the variance and covariance of R² values to assess the 95% CI and p value of the R² difference. We apply this approach to real data calculating PGSs in 28,880 European participants derived from UK Biobank (UKBB) and Biobank Japan (BBJ) GWAS summary statistics for cholesterol and BMI. We quantify the significantly higher predictive ability of UKBB PGSs compared to BBJ PGSs (p value 7.6e-31 for cholesterol and 1.4e-50 for BMI). A joint model of UKBB and BBJ PGSs significantly improves the predictive ability, compared to a model of UKBB PGS only (p value 3.5e-05 for cholesterol and 1.3e-28 for BMI). We also show that the predictive ability of regulatory SNPs is significantly enriched over non-regulatory SNPs for cholesterol (p value 8.9e-26 for UKBB and 3.8e-17 for BBJ). We suggest that the proposed approach (available in R package r2redux) should be used to test the statistical significance of difference between pairs of PGSs, which may help to draw a correct conclusion about the comparative predictive ability of PGSs.
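To make the idea of a confidence interval for a single R² concrete, the sketch below uses the standard large-sample (delta-method) approximation Var(R²) ≈ 4R²(1 − R²)²/N for a squared correlation. This is an illustrative approximation, not necessarily the estimator implemented in r2redux, and it does not cover the covariance term needed to compare two PGSs. The function name `r2_confidence_interval` and the example R² of 0.10 are hypothetical; only the sample size of 28,880 comes from the abstract.

```python
import math


def r2_confidence_interval(r2, n, z=1.96):
    """Approximate CI for R^2 from its large-sample sampling variance.

    Assumption: Var(R^2) ~= 4 * R^2 * (1 - R^2)^2 / n (delta method on a
    single-predictor squared correlation); z = 1.96 gives a 95% interval.
    """
    variance = 4.0 * r2 * (1.0 - r2) ** 2 / n
    se = math.sqrt(variance)
    return r2 - z * se, r2 + z * se


if __name__ == "__main__":
    # Hypothetical R^2 of 0.10 in a sample of 28,880 individuals.
    low, high = r2_confidence_interval(0.10, 28880)
    print(f"95% CI: ({low:.4f}, {high:.4f})")
```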
Subjects
Multifactorial Inheritance, Single Nucleotide Polymorphism, Humans, Genome-Wide Association Study
ABSTRACT
Genetic variants used as instruments for exposures in Mendelian randomisation (MR) analyses may have horizontal pleiotropic effects (i.e., influence outcomes via pathways other than through the exposure), which can undermine the validity of results. We examined the extent of this using smoking behaviours as an example. We first ran a phenome-wide association study in UK Biobank, using a smoking initiation genetic instrument. From the most strongly associated phenotypes, we selected those we considered could either plausibly or not plausibly be caused by smoking. We examined associations between genetic instruments for smoking initiation, smoking heaviness and lifetime smoking and these phenotypes in UK Biobank and the Avon Longitudinal Study of Parents and Children (ALSPAC). We conducted negative control analyses among never smokers, including children. We found evidence that smoking-related genetic instruments were associated with phenotypes not plausibly caused by smoking in UK Biobank and (to a lesser extent) ALSPAC. We observed associations with phenotypes among never smokers. Our results demonstrate that smoking-related genetic risk scores are associated with unexpected phenotypes that are less plausibly downstream of smoking. This may reflect horizontal pleiotropy in these genetic risk scores, and we would encourage researchers to exercise caution when using these and genetic risk scores for other complex behavioural exposures. We outline approaches that could be taken to consider this and overcome issues caused by potential horizontal pleiotropy. For example, in genetically informed causal inference analyses (e.g., MR), it is important to consider negative control outcomes and triangulation approaches to avoid arriving at incorrect conclusions.
ABSTRACT
BACKGROUND: Life's Simple 7 (LS7) is an easily calculated and interpreted metric of cardiovascular health based on 7 domains: smoking, diet, physical activity, body mass index, blood pressure, cholesterol, and fasting glucose. The Life's Essential 8 (LE8) metric was subsequently introduced, adding sleep metrics and revisions of the previous 7 domains. Although calculating LE8 requires additional information, we hypothesized that it would be a more reliable index of cardiovascular health. METHODS: Both the LS7 and LE8 metrics yield scores with higher values indicating lower risk. These were calculated among 11 609 Black and White participants free of baseline cardiovascular disease (CVD) in the Reasons for Geographic and Racial Differences in Stroke study, enrolled in 2003 to 2007, and followed for a median of 13 years. Differences in 10-year risk of incident CVD (coronary heart disease or stroke) as a function of LS7 and LE8 scores were calculated using Kaplan-Meier and proportional hazards analyses. Differences in incident CVD discrimination were quantified by difference in the c-statistic. RESULTS: For both LS7 and LE8, the 10-year risk was approximately 5% for participants around the 99th percentile of scores, and a 4× higher 20% risk for participants around the first percentile. Comparing LS7 to LE8, 10-year risk was nearly identical for individuals at the same relative position in score distribution. For example, the "cluster" of 2013 participants with an LS7 score of 7 was at the 35.8th percentile in distribution of LS7 scores, and had an estimated 10-year CVD risk of 8.4% (95% CI, 7.2%-9.8%). In a similar location in the LE8 distribution, the 1457 participants with an LE8 score of 60±2.5 at the 39.4th percentile of LE8 scores had a 10-year risk of CVD of 8.5% (95% CI, 7.1%-10.1%), similar to the cluster defined by LS7. The age-race-sex adjusted c-statistic of the LS7 model was 0.691 (95% CI, 0.667-0.705), and 0.695 for LE8 (95% CI, 0.681-0.709) (P for difference, 0.12). CONCLUSIONS: Both LS7 and LE8 were associated with incident CVD, with discrimination of the 2 indices practically indistinguishable. As a simpler metric, LS7 may be favored for use by the general population and clinicians.
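For readers unfamiliar with the c-statistic, the sketch below computes a simplified version for a binary outcome: the fraction of case/non-case pairs in which the case has the higher predicted risk, with ties counted as one half. It ignores censoring and follow-up time, so it is not the survival-model c-statistic reported in the study. The function name `c_statistic` and the toy data are hypothetical. Because higher LS7 or LE8 scores indicate lower risk, such a score would need to be negated before being used as the risk input here.

```python
def c_statistic(risks, events):
    """Concordance for a binary outcome (equivalent to the area under the ROC curve).

    risks: predicted risk per participant (higher means more likely to have an event).
    events: 1 if the participant had an event, 0 otherwise.
    """
    case_risks = [r for r, e in zip(risks, events) if e]
    control_risks = [r for r, e in zip(risks, events) if not e]
    if not case_risks or not control_risks:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for rc in case_risks:
        for rn in control_risks:
            if rc > rn:
                concordant += 1.0   # case ranked above non-case
            elif rc == rn:
                concordant += 0.5   # tie counts as half
    return concordant / (len(case_risks) * len(control_risks))


if __name__ == "__main__":
    # Toy example: two cases and two non-cases.
    print(c_statistic([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect ranking of cases above non-cases.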
Subjects
Cardiovascular Diseases, Stroke, Humans, United States/epidemiology, Cardiovascular Diseases/diagnosis, Cardiovascular Diseases/epidemiology, Risk Factors, Smoking/epidemiology, Heart Disease Risk Factors, Stroke/diagnosis, Stroke/epidemiology
ABSTRACT
The low portability of polygenic scores (PGSs) across global populations is a major concern that must be addressed before PGSs can be used for everyone in the clinic. Indeed, prediction accuracy has been shown to decay as a function of the genetic distance between the training and test cohorts. However, such cohorts differ not only in their genetic distance but also in their geographical distance and their data collection and assaying, conflating multiple factors. In this study, we examine the extent to which PGSs are transferable between ancestries by deriving polygenic scores for 245 curated traits from the UK Biobank data and applying them in nine ancestry groups from the same cohort. By restricting both training and testing to the UK Biobank data, we reduce the risk of environmental and genotyping confounding from using different cohorts. We define the nine ancestry groups at a sub-continental level, based on a simple, robust, and effective method that we introduce here. We then apply two different predictive methods to derive polygenic scores for all 245 phenotypes and show a systematic and dramatic reduction in portability of PGSs trained using Northwestern European individuals and applied to nine ancestry groups. These analyses demonstrate that prediction already drops off within European ancestries and reduces globally in proportion to genetic distance. Altogether, our study provides unique and robust insights into the PGS portability problem.
Subjects
Genetic Association Studies/methods, Genetic Predisposition to Disease, Population Genetics/methods, Multifactorial Inheritance, Algorithms, Alleles, Biological Specimen Banks, Genetic Variation, Genome-Wide Association Study, Genotype, Humans, Genetic Models, Phenotype, Reproducibility of Results, United Kingdom
ABSTRACT
Complex traits are influenced by genetic risk factors, lifestyle, and environmental variables, so-called exposures. Some exposures, e.g., smoking or lipid levels, have common genetic modifiers identified in genome-wide association studies. Because measurements are often unfeasible, exposure polygenic risk scores (ExPRSs) offer an alternative to study the influence of exposures on various phenotypes. Here, we collected publicly available summary statistics for 28 exposures and applied four common PRS methods to generate ExPRSs in two large biobanks: the Michigan Genomics Initiative and the UK Biobank. We established ExPRSs for 27 exposures and demonstrated their applicability in phenome-wide association studies and as predictors for common chronic conditions. Especially the addition of multiple ExPRSs showed, for several chronic conditions, an improvement compared to prediction models that only included traditional, disease-focused PRSs. To facilitate follow-up studies, we share all ExPRS constructs and generated results via an online repository called ExPRSweb.
Subjects
Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Lipids, Multifactorial Inheritance/genetics, Risk Factors
ABSTRACT
Polygenic risk scores (PRSs) quantify the contribution of multiple genetic loci to an individual's likelihood of a complex trait or disease. However, existing PRSs estimate this likelihood with common genetic variants, excluding the impact of rare variants. Here, we report on a method to identify rare variants associated with outlier gene expression and integrate their impact into PRS predictions for body mass index (BMI), obesity, and bariatric surgery. Between the top and bottom 10%, we observed a 20.8% increase in risk for obesity (p = 3 × 10⁻¹⁴), 62.3% increase in risk for severe obesity (p = 1 × 10⁻⁶), and median 5.29 years earlier onset for bariatric surgery (p = 0.008), as a function of expression outlier-associated rare variant burden when controlling for common variant PRS. We show that these predictions were more significant than integrating the effects of rare protein-truncating variants (PTVs), observing a mean 19% increase in phenotypic variance explained with expression outlier-associated rare variants when compared with PTVs (p = 2 × 10⁻¹⁵). We replicated these findings by using data from the Million Veteran Program and demonstrated that PRSs across multiple traits and diseases can benefit from the inclusion of expression outlier-associated rare variants identified through population-scale transcriptome sequencing.
Subjects
Multifactorial Inheritance, Obesity, Body Mass Index, Genetic Predisposition to Disease, Genome-Wide Association Study, Humans, Multifactorial Inheritance/genetics, Obesity/genetics, Phenotype, Risk Factors
ABSTRACT
Attempts to identify and prioritize functional DNA elements in coding and non-coding regions, particularly through use of in silico functional annotation data, continue to increase in popularity. However, specific functional roles can vary widely from one variant to another, making it challenging to summarize different aspects of variant function with a one-dimensional rating. Here we propose multi-dimensional annotation-class integrative estimation (MACIE), an unsupervised multivariate mixed-model framework capable of integrating annotations of diverse origin to assess multi-dimensional functional roles for both coding and non-coding variants. Unlike existing one-dimensional scoring methods, MACIE views variant functionality as a composite attribute encompassing multiple characteristics and estimates the joint posterior functional probabilities of each genomic position. This estimate offers more comprehensive and interpretable information in the presence of multiple aspects of functionality. Applied to a variety of independent coding and non-coding datasets, MACIE demonstrates powerful and robust performance in discriminating between functional and non-functional variants. We also show an application of MACIE to fine-mapping and heritability enrichment analysis by using the lipids GWAS summary statistics data from the European Network for Genetic and Genomic Epidemiology Consortium.
Subjects
Human Genome, Genome-Wide Association Study, Human Genome/genetics, Genome-Wide Association Study/methods, Genomics, Humans, Molecular Sequence Annotation, Single Nucleotide Polymorphism/genetics, Probability
ABSTRACT
The identification of genes that evolve under recessive natural selection is a long-standing goal of population genetics research that has important applications to the discovery of genes associated with disease. We found that commonly used methods to evaluate selective constraint at the gene level are highly sensitive to genes under heterozygous selection but ubiquitously fail to detect recessively evolving genes. Additionally, more sophisticated likelihood-based methods designed to detect recessivity similarly lack power for a human gene of realistic length from current population sample sizes. However, extensive simulations suggested that recessive genes may be detectable in aggregate. Here, we offer a method informed by population genetics simulations designed to detect recessive purifying selection in gene sets. Applying this to empirical gene sets produced significant enrichments for strong recessive selection in genes previously inferred to be under recessive selection in a consanguineous cohort and in genes involved in autosomal recessive monogenic disorders.
Subjects
Gene Frequency, Recessive Genes, Population Genetics, Genetic Selection, Algorithms, Alleles, Dominant Genes, Genetic Predisposition to Disease, Genetic Variation, Population Genetics/methods, Genomics/methods, Genotype, Humans, Inheritance Patterns, Likelihood Functions, Genetic Models, Mutation, United Kingdom
ABSTRACT
Quantifying an individual's risk for common diseases is an important goal of precision health. The polygenic risk score (PRS), which aggregates multiple risk alleles of candidate diseases, has emerged as a standard approach for identifying high-risk individuals. Although several studies have been performed to benchmark the PRS calculation tools and assess their potential to guide future clinical applications, some issues remain to be further investigated, such as the lack of (i) varied simulated data with different genetic effects; (ii) evaluation of machine learning models; and (iii) evaluation in multi-ancestry studies. In this study, we systematically validated and compared 13 statistical methods, 5 machine learning models and 2 ensemble models using simulated data with additive and genetic interaction models, 22 common diseases with internal training sets, 4 common diseases with external summary statistics and 3 common diseases for trans-ancestry studies in UK Biobank. The statistical methods performed better on simulated data from additive models, whereas machine learning models had an edge on data that included genetic interactions. Ensemble models, which integrate various statistical methods, were generally the best choice. LDpred2 outperformed the other standalone tools, whereas PRS-CS, lassosum and DBSLMM showed comparable performance. We also identified that disease heritability strongly affected the predictive performance of all methods. Both the number and effect sizes of risk SNPs are important, and sample size strongly influences the performance of all methods. For the trans-ancestry studies, we found that the performance of most methods became worse when training and testing sets were from different populations.
Subjects
Machine Learning, Multifactorial Inheritance, Humans, Risk Factors, Genomics, Genetic Predisposition to Disease, Genome-Wide Association Study/methods
ABSTRACT
Trans-ethnic genome-wide association studies have revealed that many loci identified in European populations can be reproducible in non-European populations, indicating widespread trans-ethnic genetic similarity. However, how to leverage such shared information more efficiently in association analysis is less investigated for traits in underrepresented populations. We here propose a statistical framework, trans-ethnic genetic risk score informed gene-based association mixed model (GAMM), by hierarchically modeling single-nucleotide polymorphism effects in the target population as a function of effects of the same trait in well-studied populations. GAMM powerfully integrates genetic similarity across distinct ancestral groups to enhance power in understudied populations, as confirmed by extensive simulations. We illustrate the usefulness of GAMM via the application to 13 blood cell traits (i.e. basophil count, eosinophil count, hematocrit, hemoglobin concentration, lymphocyte count, mean corpuscular hemoglobin, mean corpuscular hemoglobin concentration, mean corpuscular volume, monocyte count, neutrophil count, platelet count, red blood cell count and total white blood cell count) in Africans of the UK Biobank (n = 3204) while utilizing genetic overlap shared in Europeans (n = 746 667) and East Asians (n = 162 255). We discovered multiple new associated genes, which had otherwise been missed by existing methods, and revealed that the trans-ethnic information indirectly contributed much to the phenotypic variance. Overall, GAMM represents a flexible and powerful statistical framework of association analysis for complex traits in underrepresented populations by integrating trans-ethnic genetic similarity across well-studied populations, and helps attenuate health inequities in current genetics research for people of minority populations.