ABSTRACT
Most loci identified by GWASs have been found in populations of European ancestry (EUR). In trans-ethnic meta-analyses for 15 hematological traits in 746,667 participants, including 184,535 non-EUR individuals, we identified 5,552 trait-variant associations at p < 5 × 10⁻⁹, including 71 novel associations not found in EUR populations. We also identified 28 additional novel variants in ancestry-specific, non-EUR meta-analyses, including an IL7 missense variant in South Asians associated with lymphocyte count in vivo and IL-7 secretion levels in vitro. Fine-mapping prioritized variants annotated as functional and generated 95% credible sets that were 30% smaller when using the trans-ethnic as opposed to the EUR-only results. We explored the clinical significance and predictive value of trans-ethnic variants in multiple populations and compared genetic architecture and the effect of natural selection on these blood phenotypes between populations. Altogether, our results for hematological traits highlight the value of a more global representation of populations in genetic studies.
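For readers unfamiliar with the term, a 95% credible set is commonly built by ranking the variants at a locus by posterior probability and keeping the top-ranked ones until their cumulative probability reaches 0.95. The sketch below illustrates that generic construction only. The function name `credible_set` and the example probabilities are hypothetical and are not taken from the study, whose fine-mapping procedure is not described in this abstract.

```python
def credible_set(posteriors, level=0.95):
    """Return indices of the smallest set of variants whose normalized
    posterior probabilities sum to at least `level`.

    `posteriors` is a list of non-negative per-variant posterior
    probabilities for one locus (hypothetical input for illustration).
    """
    total = sum(posteriors)
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")

    # Rank variants from most to least probable.
    order = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)

    chosen = []
    cumulative = 0.0
    for i in order:
        chosen.append(i)
        cumulative += posteriors[i] / total
        if cumulative >= level:
            break
    return chosen


# Made-up probabilities: the top three variants are needed to reach 95%.
example = credible_set([0.6, 0.3, 0.06, 0.04])
```

Under this construction, a smaller credible set means the posterior probability is concentrated on fewer candidate variants.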
Subjects
Asian People/genetics, Mutation, Missense/genetics, Polymorphism, Single Nucleotide/genetics, White People/genetics, Genetics, Genome-Wide Association Study/methods, HEK293 Cells, Humans, Interleukin-7/genetics, Phenotype
ABSTRACT
Blood cells play essential roles in human health, underpinning physiological processes such as immunity, oxygen transport, and clotting, which when perturbed cause a significant global health burden. Here we integrate data from UK Biobank and a large-scale international collaborative effort, including data for 563,085 European ancestry participants, and discover 5,106 new genetic variants independently associated with 29 blood cell phenotypes covering a range of variation impacting hematopoiesis. We holistically characterize the genetic architecture of hematopoiesis, assess the relevance of the omnigenic model to blood cell phenotypes, delineate relevant hematopoietic cell states influenced by regulatory genetic variants and gene networks, identify novel splice-altering variants mediating the associations, and assess the polygenic prediction potential for blood traits and clinical disorders at the interface of complex and Mendelian genetics. These results show the power of large-scale blood cell trait GWAS to interrogate clinically meaningful variants across a wide allelic spectrum of human variation.
Subjects
Genetic Predisposition to Disease/genetics, Multifactorial Inheritance/genetics, Female, Gene Regulatory Networks/genetics, Genome-Wide Association Study/methods, Hematopoiesis/genetics, Humans, Male, Phenotype, Polymorphism, Single Nucleotide/genetics
ABSTRACT
Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing-based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight that can be provided by a multi-ancestry, whole-genome-based association study, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38 465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program (with varying sample size by trait, where the minimum sample size was n = 737 for MMP-1). We identified 22 distinct single-variant associations across 6 traits (E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin) that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that further identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.
Subjects
Biomarkers, Genome-Wide Association Study, Inflammation, Precision Medicine, Whole Genome Sequencing, Humans, Precision Medicine/methods, Inflammation/genetics, Genome-Wide Association Study/methods, Whole Genome Sequencing/methods, Polymorphism, Single Nucleotide, Quantitative Trait Loci, Genetic Predisposition to Disease, Female, Interleukin-6/genetics
ABSTRACT
Malignant progression of normal tissue is typically driven by complex networks of somatic changes, including genetic mutations, copy number aberrations, epigenetic changes, and transcriptional reprogramming. To delineate aberrant multi-omic tumor features that correlate with clinical outcomes, we present a novel pathway-centric tool based on the multiple factor analysis framework called padma. Using a multi-omic consensus representation, padma quantifies and characterizes individualized pathway-specific multi-omic deviations and their underlying drivers, with respect to the sampled population. We demonstrate the utility of padma to correlate patient outcomes with complex genetic, epigenetic, and transcriptomic perturbations in clinically actionable pathways in breast and lung cancer.
Subjects
Neoplasms, Factor Analysis, Statistical, Humans, Neoplasms/genetics, Transcriptome
ABSTRACT
In this study, the asymptotic distributions of the likelihood ratio test (LRT), the restricted likelihood ratio test (RLRT), the F and the sequence kernel association test (SKAT) statistics for testing an additive effect of the expected familial relatedness (FR) in a linear mixed model (LMM) are examined based on an eigenvalue approach. First, the covariance structure for modeling the FR effect in an LMM is presented. Then, the multiplicity of eigenvalues for the log-likelihood and restricted log-likelihood is established under a replicate family setting and extended to a more general replicate family setting (GRFS) as well. After that, the asymptotic null distributions of LRT, RLRT, F and SKAT statistics under GRFS are derived. The asymptotic null distribution of SKAT for testing genetic rare variants is also constructed. In addition, a simple formula for sample size calculation is provided based on the restricted maximum likelihood estimate of the effect size for the expected FR. Finally, the power of these test statistics for testing the expected FR effect is compared via simulation. The four test statistics are also applied to a data set from the UK Biobank.
Subjects
Models, Genetic, Humans, Likelihood Functions, Computer Simulation, Linear Models
ABSTRACT
Familial relatedness (FR) and population structure (PS) are two major sources for genetic correlation. In the human population, both FR and PS can further break down into additive and dominant components to account for potential additive and dominant genetic effects. In this study, besides the classical additive genomic relationship matrix, a dominant genomic relationship matrix is introduced. A link between the additive/dominant genomic relationship matrices and the coancestry (or kinship)/double coancestry coefficients is also established. In addition, a way to separate the FR and PS correlations based on the estimates of coancestry and double coancestry coefficients from the genomic relationship matrices is proposed. A unified linear mixed model is also developed, which can account for both the additive and dominance effects of FR and PS correlations as well as their possible random interactions. Finally, this unified linear mixed model is applied to analyze two study cohorts from UK Biobank.
Subjects
Genome, Models, Genetic, Genes, Dominant, Genetic Association Studies, Genomics, Humans
ABSTRACT
BACKGROUND AND PURPOSE: Stroke is the leading cause of death and long-term disability worldwide. Previous genome-wide association studies identified 51 loci associated with stroke (mostly ischemic) and its subtypes among predominantly European populations. Using whole-genome sequencing in ancestrally diverse populations from the Trans-Omics for Precision Medicine (TOPMed) Program, we aimed to identify novel variants, especially low-frequency or ancestry-specific variants, associated with all stroke, ischemic stroke and its subtypes (large artery, cardioembolic, and small vessel), and hemorrhagic stroke and its subtypes (intracerebral and subarachnoid). METHODS: Whole-genome sequencing data were available for 6833 stroke cases and 27 116 controls, including 22 315 European, 7877 Black, 2616 Hispanic/Latino, 850 Asian, 54 Native American, and 237 other ancestry participants. In TOPMed, we performed single variant association analysis examining 40 million common variants and aggregated association analysis focusing on rare variants. We also combined TOPMed European populations with over 28 000 additional European participants from the UK Biobank genome-wide array data through meta-analysis. RESULTS: In the single variant association analysis in TOPMed, we identified one novel locus, 13q33, for large artery stroke at whole-genome-wide significance (P<5.00×10⁻⁹) and 4 novel loci at genome-wide significance (P<5.00×10⁻⁸), all of which need confirmation in independent studies. Lead variants in all 5 loci are low-frequency but are more common in non-European populations. An aggregation of synonymous rare variants within the gene C6orf26 demonstrated suggestive evidence of association for hemorrhagic stroke (P<3.11×10⁻⁶). By meta-analyzing European ancestry samples in TOPMed and UK Biobank, we replicated several previously reported stroke loci including PITX2, HDAC9, ZFHX3, and LRCH1.
CONCLUSIONS: We present the first association analysis for stroke and its subtypes using whole-genome sequencing data from ancestrally diverse populations. While our findings suggest the potential benefits of combining whole-genome sequencing data with populations of diverse genetic backgrounds to identify possible low-frequency or ancestry-specific variants, they also highlight the need to increase genome coverage and sample sizes.
Subjects
Genetic Loci, Genetic Predisposition to Disease, Polymorphism, Single Nucleotide, Precision Medicine, Racial Groups/genetics, Stroke/genetics, Aged, Aged, 80 and over, Female, Genome-Wide Association Study, Humans, Male, Middle Aged, Whole Genome Sequencing
ABSTRACT
The effects of assortative mating (AM) on estimates from genetic studies have been receiving increasing attention in recent years. We extend existing AM theory to more general models of sorting and conclude that correct theory-based AM adjustments require knowledge of complicated, unknown historical sorting patterns. We propose a simple, general-purpose approach using polygenic indexes (PGIs). Our approach can estimate the fraction of genetic variance and genetic correlation that is driven by AM. Our approach is less effective when applied to Mendelian randomization (MR) studies for two reasons: AM can induce a form of selection bias in MR studies that remains after our adjustment; and, in the MR context, the adjustment is particularly sensitive to PGI estimation error. Using data from the UK Biobank, we find that AM inflates genetic correlation estimates between health traits and education by 14% on average. Our results suggest caution in interpreting genetic correlations or MR estimates for traits subject to AM.
ABSTRACT
Inflammation biomarkers can provide valuable insight into the role of inflammatory processes in many diseases and conditions. Sequencing-based analyses of such biomarkers can also serve as an exemplar of the genetic architecture of quantitative traits. To evaluate the biological insight that can be provided by a multi-ancestry, whole-genome-based association study, we performed a comprehensive analysis of 21 inflammation biomarkers from up to 38,465 individuals with whole-genome sequencing from the Trans-Omics for Precision Medicine (TOPMed) program. We identified 22 distinct single-variant associations across 6 traits (E-selectin, intercellular adhesion molecule 1, interleukin-6, lipoprotein-associated phospholipase A2 activity and mass, and P-selectin) that remained significant after conditioning on previously identified associations for these inflammatory biomarkers. We further expanded upon known biomarker associations by pairing the single-variant analysis with a rare variant set-based analysis that further identified 19 significant rare variant set-based associations with 5 traits. These signals were distinct from both significant single variant association signals within TOPMed and genetic signals observed in prior studies, demonstrating the complementary value of performing both single and rare variant analyses when analyzing quantitative traits. We also confirm several previously reported signals from semi-quantitative proteomics platforms. Many of these signals demonstrate the extensive allelic heterogeneity and ancestry-differentiated variant-trait associations common for inflammation biomarkers, a characteristic we hypothesize will be increasingly observed with well-powered, large-scale analyses of complex traits.
ABSTRACT
Prenatal socioeconomic disadvantage (SD) has been linked to DNA methylation (DNAm) in adulthood, but whether such epigenetic alterations are present at birth remains unclear. We carried out an epigenome-wide analysis of the association between several measures of individual- and area-level prenatal SD and DNAm assessed in neonatal cord blood via the Infinium EPIC BeadChip among offspring born to mothers of White British (N = 455) and Pakistani (N = 493) origin in the Born in Bradford Study. Models were adjusted for mother's age, ethnicity, and education level as well as cell-type fractions, then for maternal health behaviours and neonate characteristics, and finally stratified by mother's ethnicity. P-values were corrected for multiple testing and a permutation-based approach was used to account for small cell sizes. Among all children, housing tenure (owning versus renting) as well as father's occupation (manual versus non-manual) were each associated with DNAm of one CpG site and index of multiple deprivation (IMD) was associated with DNAm of 11 CpG sites. Among children born to White British mothers, father's occupation (student or unemployed versus non-manual) was associated with DNAm of 1 CpG site and IMD with DNAm of 3 CpG sites. Among children born to Pakistani mothers, IMD was associated with DNAm of 1 CpG site. Associations were largely unchanged after further adjustment for maternal health behaviours or neonate characteristics and remained statistically significant. Our findings suggest that individual- and area-level prenatal SD may shape alterations to the neonatal epigenome, but associations vary across ethnic groups.