ABSTRACT
Epidemiological studies reveal that marijuana increases the risk of cardiovascular disease (CVD); however, little is known about the mechanism. Δ9-tetrahydrocannabinol (Δ9-THC), the psychoactive component of marijuana, binds to cannabinoid receptor 1 (CB1/CNR1) in the vasculature and is implicated in CVD. A UK Biobank analysis found that cannabis was a risk factor for CVD. We found that marijuana smoking activated inflammatory cytokines implicated in CVD. In silico virtual screening identified genistein, a soybean isoflavone, as a putative CB1 antagonist. Human induced pluripotent stem cell-derived endothelial cells were used to model Δ9-THC-induced inflammation and oxidative stress via NF-κB signaling. Knockdown of the CB1 receptor with siRNA, CRISPR interference, and genistein attenuated the effects of Δ9-THC. In mice, genistein blocked Δ9-THC-induced endothelial dysfunction in wire myograph, reduced atherosclerotic plaque, and had minimal penetration of the central nervous system. Genistein is a CB1 antagonist that attenuates Δ9-THC-induced atherosclerosis.
Subjects
Cannabis , Cardiovascular Diseases , Hallucinogens , Analgesics , Animals , Cannabinoid Receptor Agonists/pharmacology , Dronabinol/pharmacology , Endothelial Cells , Genistein/pharmacology , Genistein/therapeutic use , Inflammation/drug therapy , Mice , Cannabinoid Receptor CB1 , Cannabinoid Receptors
ABSTRACT
Blood cells play essential roles in human health, underpinning physiological processes such as immunity, oxygen transport, and clotting, which when perturbed cause a significant global health burden. Here we integrate data from UK Biobank and a large-scale international collaborative effort, including data for 563,085 European ancestry participants, and discover 5,106 new genetic variants independently associated with 29 blood cell phenotypes covering a range of variation impacting hematopoiesis. We holistically characterize the genetic architecture of hematopoiesis, assess the relevance of the omnigenic model to blood cell phenotypes, delineate relevant hematopoietic cell states influenced by regulatory genetic variants and gene networks, identify novel splice-altering variants mediating the associations, and assess the polygenic prediction potential for blood traits and clinical disorders at the interface of complex and Mendelian genetics. These results show the power of large-scale blood cell trait GWAS to interrogate clinically meaningful variants across a wide allelic spectrum of human variation.
Subjects
Genetic Predisposition to Disease/genetics , Multifactorial Inheritance/genetics , Female , Gene Regulatory Networks/genetics , Genome-Wide Association Study/methods , Hematopoiesis/genetics , Humans , Male , Phenotype , Single Nucleotide Polymorphism/genetics
ABSTRACT
The melanocortin 4 receptor (MC4R) is a G protein-coupled receptor whose disruption causes obesity. We functionally characterized 61 MC4R variants identified in 0.5 million people from UK Biobank and examined their associations with body mass index (BMI) and obesity-related cardiometabolic diseases. We found that the maximal efficacy of β-arrestin recruitment to MC4R, rather than canonical Gαs-mediated cyclic adenosine monophosphate production, explained 88% of the variance in the association of MC4R variants with BMI. While most MC4R variants caused loss of function, a subset caused gain of function; these variants were associated with significantly lower BMI and lower odds of obesity, type 2 diabetes, and coronary artery disease. Protective associations were driven by MC4R variants exhibiting signaling bias toward β-arrestin recruitment and increased mitogen-activated protein kinase pathway activation. Harnessing β-arrestin-biased MC4R signaling may represent an effective strategy for weight loss and the treatment of obesity-related cardiometabolic diseases.
Subjects
Gain of Function Mutation/genetics , Obesity/pathology , Melanocortin Receptor Type 4/genetics , Signal Transduction , Adult , Aged , Body Mass Index , Coronary Artery Disease/complications , Coronary Artery Disease/metabolism , Coronary Artery Disease/pathology , Cyclic AMP/metabolism , Factual Databases , Type 2 Diabetes Mellitus/complications , Type 2 Diabetes Mellitus/metabolism , Type 2 Diabetes Mellitus/pathology , Female , Gs GTP-Binding Protein alpha Subunits/metabolism , Genetic Predisposition to Disease , Genotype , Humans , Male , Middle Aged , Obesity/complications , Obesity/metabolism , Single Nucleotide Polymorphism , Melanocortin Receptor Type 4/chemistry , Melanocortin Receptor Type 4/metabolism , beta-Arrestins/metabolism
ABSTRACT
Severe obesity is a rapidly growing global health threat. Although often attributed to unhealthy lifestyle choices or environmental factors, obesity is known to be heritable and highly polygenic; the majority of inherited susceptibility is related to the cumulative effect of many common DNA variants. Here we derive and validate a new polygenic predictor comprised of 2.1 million common variants to quantify this susceptibility and test this predictor in more than 300,000 individuals ranging from middle age to birth. Among middle-aged adults, we observe a 13-kg gradient in weight and a 25-fold gradient in risk of severe obesity across polygenic score deciles. In a longitudinal birth cohort, we note minimal differences in birthweight across score deciles, but a significant gradient emerged in early childhood and reached 12 kg by 18 years of age. This new approach to quantify inherited susceptibility to obesity affords new opportunities for clinical prevention and mechanistic assessment.
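A polygenic predictor of this kind is conventionally a weighted sum of allele dosages, with individuals then ranked into score deciles. The sketch below illustrates only that general idea on toy data; the function names, weights, and dosages are hypothetical and are not taken from the study.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def decile_of(score, reference_scores):
    """Return the decile (1..10) of `score` within a reference distribution.

    Hypothetical helper: counts how many reference scores lie strictly
    below `score` and maps that fraction onto ten equal-sized bins.
    """
    below = sum(1 for s in reference_scores if s < score)
    decile = below * 10 // len(reference_scores) + 1
    return min(decile, 10)


# Toy example: three variants with made-up effect weights.
toy_weights = [0.25, -0.5, 0.75]
toy_dosages = [2, 1, 0]
toy_score = polygenic_score(toy_dosages, toy_weights)
```

In practice the reference distribution would be the scores of the whole cohort, so that the weight and risk gradients described above can be compared between the bottom and top deciles.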
Subjects
Body Weight , Multifactorial Inheritance/genetics , Obesity/pathology , Adolescent , Body Mass Index , Child , Factual Databases , Female , Genome-Wide Association Study , Humans , Newborn Infant , Longitudinal Studies , Male , Middle Aged , Obesity/genetics , Risk Factors , Severity of Illness Index
ABSTRACT
Infectious agents contribute significantly to the global burden of diseases through both acute infection and their chronic sequelae. We leveraged the UK Biobank to identify genetic loci that influence humoral immune response to multiple infections. From 45 genome-wide association studies in 9,611 participants from UK Biobank, we identified NFKB1 as a locus associated with quantitative antibody responses to multiple pathogens, including those from the herpes, retro-, and polyoma-virus families. An insertion-deletion variant thought to affect NFKB1 expression (rs28362491) was mapped as the likely causal variant and could play a key role in regulation of the immune response. Using 121 infection- and inflammation-related traits in 487,297 UK Biobank participants, we show that the deletion allele was associated with an increased risk of infection from diverse pathogens but had a protective effect against allergic disease. We propose that altered expression of NFKB1, as a result of the deletion, modulates hematopoietic pathways and likely impacts cell survival, antibody production, and inflammation. Taken together, we show that disruptions to the tightly regulated immune processes may tip the balance between exacerbated immune responses and allergy, or increased risk of infection and impaired resolution of inflammation.
Subjects
Genetic Predisposition to Disease , Hypersensitivity , Inflammation , Humans , Genome-Wide Association Study , Hypersensitivity/genetics , Inflammation/genetics , NF-kappa B p50 Subunit/genetics , UK Biobank
ABSTRACT
We propose TetraHer, a method for estimating the liability heritability of binary phenotypes. TetraHer has five key features. First, it can be applied to data from complex pedigrees that contain multiple types of relationships. Second, it can correct for ascertainment of cases or controls. Third, it can accommodate covariates. Fourth, it can model the contribution of common environment. Fifth, it produces a likelihood that can be used for significance testing. We first demonstrate the validity of TetraHer on simulated data. We then use TetraHer to estimate liability heritability for 229 codes from the tenth International Classification of Diseases (ICD-10). We identify 107 codes with significant heritability (p < 0.05/229), which can be used in future analyses for investigating the genetic architecture of human diseases.
Subjects
Genome-Wide Association Study , Genetic Models , Humans , Pedigree , Phenotype , Single Nucleotide Polymorphism
ABSTRACT
Understanding the contribution of gene-environment interactions (GxE) to complex trait variation can provide insights into disease mechanisms, explain sources of heritability, and improve genetic risk prediction. While large biobanks with genetic and deep phenotypic data hold promise for obtaining novel insights into GxE, our understanding of GxE architecture in complex traits remains limited. We introduce a method to estimate the proportion of trait variance explained by GxE (GxE heritability) and additive genetic effects (additive heritability) across the genome and within specific genomic annotations. We show that our method is accurate in simulations and computationally efficient for biobank-scale datasets. We applied our method to common array SNPs (MAF ≥1%), fifty quantitative traits, and four environmental variables (smoking, sex, age, and statin usage) in unrelated white British individuals in the UK Biobank. We found 68 trait-E pairs with significant genome-wide GxE heritability (p<0.05/200) with a ratio of GxE to additive heritability of ≈6.8% on average. Analyzing ≈8 million imputed SNPs (MAF ≥0.1%), we documented an approximate 28% increase in genome-wide GxE heritability compared to array SNPs. We partitioned GxE heritability across minor allele frequency (MAF) and local linkage disequilibrium (LD) values, revealing that, like additive allelic effects, GxE allelic effects tend to increase with decreasing MAF and LD. Analyzing GxE heritability near genes highly expressed in specific tissues, we find significant brain-specific enrichment for body mass index (BMI) and basal metabolic rate in the context of smoking and adipose-specific enrichment for waist-hip ratio (WHR) in the context of sex.
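The significance threshold p<0.05/200 matches a Bonferroni correction over 50 traits × 4 environmental variables = 200 trait-E pairs. The sketch below spells out that arithmetic; reading the threshold as a Bonferroni correction is an assumption based on the numbers in the abstract, and the variable and function names are illustrative only.

```python
# Numbers taken from the abstract.
n_traits = 50          # quantitative traits
n_environments = 4     # smoking, sex, age, statin usage
alpha = 0.05           # family-wise error rate

# One test per trait-environment pair.
n_tests = n_traits * n_environments

# Bonferroni-corrected per-test threshold (assumed interpretation).
threshold = alpha / n_tests


def is_significant(p_value):
    """True if a trait-E pair passes the corrected threshold."""
    return p_value < threshold
```

With these values the per-test threshold is 2.5 × 10⁻⁴, so a pair with p = 1 × 10⁻⁴ would count as significant and one with p = 1 × 10⁻³ would not.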
Subjects
Gene-Environment Interaction , Genome-Wide Association Study , Multifactorial Inheritance , Single Nucleotide Polymorphism , Humans , Multifactorial Inheritance/genetics , Male , Female , Heritable Quantitative Trait , Phenotype , Genetic Models , Quantitative Trait Loci
ABSTRACT
Sex-differential selection (SDS), which occurs when the fitness effects of alleles differ between males and females, can have profound impacts on the maintenance of genetic variation, disease risk, and other key aspects of natural populations. Because the sexes mix their autosomal genomes each generation, quantifying SDS is not possible using conventional population genetic approaches. Here, we introduce a method that exploits subtle sex differences in haplotype frequencies resulting from SDS acting in the current generation. Using data from 300K individuals in the UK Biobank, we estimate the strength of SDS throughout the genome. While only a handful of loci under SDS are individually significant, we uncover highly polygenic signals of genome-wide SDS for both viability and fecundity. Selection coefficients of [Formula: see text] may be typical. Despite its ubiquity, SDS may impose a mortality load of less than 1%. An interesting life-history tradeoff emerges: Alleles that increase viability more strongly in females than males tend to increase fecundity more strongly in males than in females. Finally, we find marginal evidence of SDS on fecundity acting on alleles affecting arm fat-free mass. Taken together, our findings connect the long-standing evidence of SDS acting on human phenotypes with its impact on the genome.
Subjects
Fertility , Multifactorial Inheritance , Humans , Male , Female , Multifactorial Inheritance/genetics , Fertility/genetics , Genetic Selection , Haplotypes , Alleles , Sex Characteristics , Genome-Wide Association Study , Human Genome
ABSTRACT
CYP2A6, a genetically variable enzyme, inactivates nicotine, activates carcinogens, and metabolizes many pharmaceuticals. Variation in CYP2A6 influences smoking behaviors and tobacco-related disease risk. This phenome-wide association study examined associations between a reconstructed version of our weighted genetic risk score (wGRS) for CYP2A6 activity with diseases in the UK Biobank (N = 395 887). Causal effects of phenotypic CYP2A6 activity (measured as the nicotine metabolite ratio: 3'-hydroxycotinine/cotinine) on the phenome-wide significant (PWS) signals were then estimated in two-sample Mendelian Randomization using the wGRS as the instrument. Time-to-diagnosis age was compared between faster versus slower CYP2A6 metabolizers for the PWS signals in survival analyses. In the total sample, six PWS signals were identified: two lung cancers and four obstructive respiratory disease PheCodes, where faster CYP2A6 activity was associated with greater disease risk (Ps < 1 × 10⁻⁶). A significant CYP2A6-by-smoking status interaction was found (Ps for interaction < 0.05); in current smokers, the same six PWS signals were found as identified in the total group, whereas no PWS signals were found in former or never smokers. In the total sample and current smokers, CYP2A6 activity causal estimates on the six PWS signals were significant in Mendelian Randomization (Ps < 5 × 10⁻⁵). Additionally, faster CYP2A6 metabolizer status was associated with younger age of disease diagnosis for the six PWS signals (Ps < 5 × 10⁻⁴, in current smokers). These findings support a role for faster CYP2A6 activity as a causal risk factor for lung cancers and obstructive respiratory diseases among current smokers, and a younger onset of these diseases. This research utilized the UK Biobank Resource.
Subjects
Lung Neoplasms , Respiratory Tract Diseases , Humans , Nicotine/genetics , Mendelian Randomization Analysis , Smoking/adverse effects , Smoking/genetics , Lung Neoplasms/genetics , Respiratory Tract Diseases/complications , Cytochrome P-450 CYP2A6/genetics , Cytochrome P-450 CYP2A6/metabolism
ABSTRACT
Whole genome sequencing (WGS) from large clinically unselected cohorts provides a unique opportunity to assess the penetrance and expressivity of rare and/or known pathogenic mitochondrial variants in the population. Using WGS from 179 862 clinically unselected individuals from the UK Biobank, we performed extensive single and rare variant aggregation association analyses of 15 881 mtDNA variants and 73 known pathogenic variants with 15 mitochondrial disease-relevant phenotypes. We identified 12 homoplasmic and one heteroplasmic variant (m.3243A>G) with genome-wide significant associations in our clinically unselected cohort. Heteroplasmic m.3243A>G (MAF = 0.0002, a known pathogenic variant) was associated with diabetes, deafness and heart failure, and 12 homoplasmic variants increased aspartate aminotransferase levels, including three low-frequency variants (MAF ~0.002 and beta ~0.3 SD). Most pathogenic mitochondrial disease variants (n = 66/74) were rare in the population (<1:9000). Aggregated or single variant analysis of pathogenic variants showed low penetrance in unselected settings for the relevant phenotypes, except m.3243A>G. Multi-system disease risk and penetrance of diabetes, deafness and heart failure greatly increased with m.3243A>G level ≥ 10%. The odds ratio of these traits increased from 5.61, 12.3 and 10.1 to 25.1, 55.0 and 39.5, respectively. Diabetes risk with m.3243A>G was further influenced by type 2 diabetes genetic risk. Our study of mitochondrial variation in a large unselected population identified novel associations and demonstrated that pathogenic mitochondrial variants have lower penetrance in clinically unselected settings. m.3243A>G was an exception: at higher heteroplasmy it showed a significant impact on health, making it a good candidate for incidental reporting.
Subjects
Deafness , Type 2 Diabetes Mellitus , Heart Failure , Mitochondrial Diseases , Humans , Penetrance , Type 2 Diabetes Mellitus/genetics , Mitochondrial DNA/genetics , Mitochondrial Diseases/genetics , Deafness/genetics , Mutation
ABSTRACT
The first release of UK Biobank whole-genome sequence data contains 150,119 genomes. We present an open-source pipeline for filtering, phasing, and indexing these genomes on the cloud-based UK Biobank Research Analysis Platform. This pipeline makes it possible to apply haplotype-based methods to UK Biobank whole-genome sequence data. The pipeline uses BCFtools for marker filtering, Beagle for genotype phasing, and Tabix for VCF indexing. We used the pipeline to phase 406 million single-nucleotide variants on chromosomes 1-22 and X at a cost of £2,309. The maximum time required to process a chromosome was 2.6 days. In order to assess phase accuracy, we modified the pipeline to exclude trio parents. We observed a switch error rate of 0.0016 on chromosome 20 in the White British trio offspring. If we exclude markers with nonmajor allele frequency < 0.1% after phasing, this switch error rate decreases by 80% to 0.00032.
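A switch error rate is commonly computed as the fraction of consecutive heterozygous-site pairs at which the inferred phase flips relative to the truth. The sketch below implements that common definition on toy haplotypes and checks the reported 80% reduction (0.0016 to 0.00032); it is not the authors' evaluation code, and the function name and inputs are hypothetical.

```python
def switch_error_rate(true_hap, inferred_hap):
    """Switch error rate over heterozygous sites.

    Both arguments list the allele (0 or 1) carried on the first
    haplotype at each heterozygous site, in genomic order.
    """
    if len(true_hap) != len(inferred_hap):
        raise ValueError("haplotypes must cover the same sites")
    # At each site, does the inferred orientation match the truth?
    agree = [t == i for t, i in zip(true_hap, inferred_hap)]
    # A switch occurs whenever the orientation changes between neighbors.
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    opportunities = len(agree) - 1
    return switches / opportunities if opportunities > 0 else 0.0


# Toy example: one switch between the second and third sites.
toy_rate = switch_error_rate([0, 1, 0, 1, 1], [0, 1, 1, 0, 0])

# Arithmetic check of the reported figures: an 80% decrease from 0.0016.
rate_after_filter = 0.0016 * (1 - 0.80)
```

In the toy example there are four neighboring pairs and one switch, giving a rate of 0.25; the second calculation reproduces the reported 0.00032.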
Subjects
Biological Specimen Banks , Genome , Humans , Dogs , Animals , Genotype , Haplotypes/genetics , Single Nucleotide Polymorphism/genetics , United Kingdom , Algorithms , DNA Sequence Analysis/methods
ABSTRACT
We present LDAK-GBAT, a tool for gene-based association testing using summary statistics from genome-wide association studies that is computationally efficient, produces well-calibrated p values, and is significantly more powerful than existing tools. LDAK-GBAT takes approximately 30 min to analyze imputed data (2.9M common, genic SNPs), requiring less than 10 Gb memory. It shows good control of type 1 error given an appropriate reference panel. Across 109 phenotypes (82 from the UK Biobank, 18 from the Million Veteran Program, and nine from the Psychiatric Genetics Consortium), LDAK-GBAT finds on average 19% (SE: 1%) more significant genes than the existing tool sumFREGAT-ACAT, with even greater gains in comparison with MAGMA, GCTA-fastBAT, sumFREGAT-SKAT-O, and sumFREGAT-PCA.
Subjects
Genetic Testing , Genome-Wide Association Study , Phenotype , Single Nucleotide Polymorphism/genetics
ABSTRACT
Genome-wide association studies (GWASs) have established the contribution of common and low-frequency variants to metabolic blood measurements in the UK Biobank (UKB). To complement existing GWAS findings, we assessed the contribution of rare protein-coding variants in relation to 355 metabolic blood measurements (325 predominantly lipid-related nuclear magnetic resonance (NMR)-derived blood metabolite measurements from Nightingale Health Plc and 30 clinical blood biomarkers) using 412,393 exome sequences from four genetically diverse ancestries in the UKB. Gene-level collapsing analyses were conducted to evaluate a diverse range of rare-variant architectures for the metabolic blood measurements. Altogether, we identified significant associations (p < 1 × 10⁻⁸) for 205 distinct genes that involved 1,968 significant relationships for the Nightingale blood metabolite measurements and 331 for the clinical blood biomarkers. These include associations for rare non-synonymous variants in PLIN1 and CREB3L3 with lipid metabolite measurements and SYT7 with creatinine, among others, which may not only provide insights into novel biology but also deepen our understanding of established disease mechanisms. Of the study-wide significant clinical biomarker associations, 40% were not previously detected on analyzing coding variants in a GWAS in the same cohort, reinforcing the importance of studying rare variation to fully understand the genetic architecture of metabolic blood measurements.
Subjects
Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Biological Specimen Banks , Biomarkers , Lipids , United Kingdom , Single Nucleotide Polymorphism
ABSTRACT
Several breast cancer susceptibility genes have been discovered, but more are likely to exist. To identify additional breast cancer susceptibility genes, we used the founder population of Poland and performed whole-exome sequencing on 510 women with familial breast cancer and 308 control subjects. We identified a rare mutation in ATRIP (GenBank: NM_130384.3: c.1152_1155del [p.Gly385Ter]) in two women with breast cancer. At the validation phase, we found this variant in 42/16,085 unselected Polish breast cancer-affected individuals and in 11/9,285 control subjects (OR = 2.14, 95% CI = 1.13-4.28, p = 0.02). By analyzing the sequence data of the UK Biobank study participants (450,000 individuals), we identified ATRIP loss-of-function variants among 13/15,643 breast cancer-affected individuals versus 40/157,943 control subjects (OR = 3.28, 95% CI = 1.76-6.14, p < 0.001). Immunohistochemistry and functional studies showed the ATRIP c.1152_1155del variant allele is weakly expressed compared to the wild-type allele, and truncated ATRIP fails to perform its normal function to prevent replicative stress. We showed that tumors of women with breast cancer who have a germline ATRIP mutation have loss of heterozygosity at the site of ATRIP mutation and genomic homologous recombination deficiency. ATRIP is a critical partner of ATR that binds to RPA coating single-stranded DNA at sites of stalled DNA replication forks. Proper activation of ATR-ATRIP elicits a DNA damage checkpoint crucial in regulating cellular responses to DNA replication stress. Based on our observations, we conclude ATRIP is a breast cancer susceptibility gene candidate linking DNA replication stress to breast cancer.
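The reported odds ratios can be roughly checked from the carrier counts. The sketch below computes a crude (unadjusted) odds ratio, assuming each pair such as 13/15,643 means carriers out of all individuals in that group; the function name is hypothetical. Under that assumption the UK Biobank counts reproduce the reported OR of 3.28, while the Polish validation counts give about 2.21 rather than the reported 2.14, so the published figure presumably reflects a different estimator or adjustment.

```python
def crude_odds_ratio(case_carriers, case_total, control_carriers, control_total):
    """Unadjusted odds ratio from carrier counts in cases and controls."""
    case_noncarriers = case_total - case_carriers
    control_noncarriers = control_total - control_carriers
    case_odds = case_carriers / case_noncarriers
    control_odds = control_carriers / control_noncarriers
    return case_odds / control_odds


# UK Biobank: loss-of-function carriers among cases vs. controls.
or_ukb = crude_odds_ratio(13, 15_643, 40, 157_943)

# Polish validation phase: c.1152_1155del carriers among cases vs. controls.
or_poland = crude_odds_ratio(42, 16_085, 11, 9_285)
```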
Subjects
Signal Transducing Adaptor Proteins , Breast Neoplasms , DNA-Binding Proteins , Female , Humans , Signal Transducing Adaptor Proteins/genetics , Biological Specimen Banks , Breast Neoplasms/genetics , Cell Cycle Proteins/genetics , DNA Damage , DNA-Binding Proteins/genetics , Poland/epidemiology , Replication Protein A/genetics , Replication Protein A/metabolism , United Kingdom/epidemiology
ABSTRACT
The advent of large-scale genome-wide association studies (GWASs) has motivated the development of statistical methods for phenotype prediction with single-nucleotide polymorphism (SNP) array data. These polygenic risk score (PRS) methods use a multiple linear regression framework to infer joint effect sizes of all genetic variants on the trait. Among the subset of PRS methods that operate on GWAS summary statistics, sparse Bayesian methods have shown competitive predictive ability. However, most existing Bayesian approaches employ Markov chain Monte Carlo (MCMC) algorithms for posterior inference, which are computationally inefficient and do not scale favorably to higher dimensions. Here, we introduce variational inference of polygenic risk scores (VIPRS), a Bayesian summary statistics-based PRS method that utilizes variational inference techniques to approximate the posterior distribution for the effect sizes. Our experiments with 36 simulation configurations and 12 real phenotypes from the UK Biobank dataset demonstrated that VIPRS is consistently competitive with the state-of-the-art in prediction accuracy while being more than twice as fast as popular MCMC-based approaches. This performance advantage is robust across a variety of genetic architectures, SNP heritabilities, and independent GWAS cohorts. In addition to its competitive accuracy on the "White British" samples, VIPRS showed improved transferability when applied to other ethnic groups, with up to 1.7-fold increase in R² among individuals of Nigerian ancestry for low-density lipoprotein (LDL) cholesterol. To illustrate its scalability, we applied VIPRS to a dataset of 9.6 million genetic markers, which conferred further improvements in prediction accuracy for highly polygenic traits, such as height.
Subjects
Genome-Wide Association Study , Multifactorial Inheritance , Humans , Multifactorial Inheritance/genetics , Genome-Wide Association Study/methods , Bayes Theorem , Single Nucleotide Polymorphism/genetics , Risk Factors , Genetic Predisposition to Disease
ABSTRACT
Pedigree analysis showed that a large proportion of Leber hereditary optic neuropathy (LHON) family members who carry a mitochondrial risk variant never lose vision. Mitochondrial haplotype appears to be a major factor influencing the risk of vision loss from LHON. Mitochondrial variants, including m.14484T>C and m.11778G>A, have been added to gene arrays, and thus many patients and research participants are tested for LHON mutations. Analysis of the UK Biobank and Australian cohort studies found more than 1 in 1,000 people in the general population carry either the m.14484T>C or the m.11778G>A LHON variant. None of the subset of carriers examined had visual acuity at 20/200 or worse, suggesting a very low penetrance of LHON. Haplogroup analysis of m.14484T>C carriers showed a high rate of haplogroup U subclades, previously shown to have low penetrance in pedigrees. Penetrance calculations of the general population are lower than pedigree calculations, most likely because of modifier genetic factors. This Matters Arising Response paper addresses the Watson et al. (2022) Matters Arising paper, published concurrently in The American Journal of Human Genetics.
Subjects
Mitochondrial DNA , Leber Hereditary Optic Atrophy , Humans , Penetrance , Mitochondrial DNA/genetics , Leber Hereditary Optic Atrophy/genetics , Australia/epidemiology , Mutation/genetics , Pedigree
ABSTRACT
While extensively studied in clinical cohorts, the phenotypic consequences of 22q11.2 copy-number variants (CNVs) in the general population remain understudied. To address this gap, we performed a phenome-wide association scan in 405,324 unrelated UK Biobank (UKBB) participants by using CNV calls from genotyping array. We mapped 236 Human Phenotype Ontology terms linked to any of the 90 genes encompassed by the region to 170 UKBB traits and assessed the association between these traits and the copy-number state of 504 genotyping array probes in the region. We found significant associations for eight continuous and nine binary traits associated under different models (duplication-only, deletion-only, U-shape, and mirror models). The causal effect of the expression level of 22q11.2 genes on associated traits was assessed through transcriptome-wide Mendelian randomization (TWMR), revealing that increased expression of ARVCF increased BMI. Similarly, increased DGCR6 expression causally reduced mean platelet volume, in line with the corresponding CNV effect. Furthermore, cross-trait multivariable Mendelian randomization (MVMR) suggested a predominant role of genuine (horizontal) pleiotropy in the CNV region. Our findings show that within the general population, 22q11.2 CNVs are associated with traits previously linked to genes in the region, and duplications and deletions act upon traits in different fashions. We also showed that gain or loss of distinct segments within 22q11.2 may impact a trait under different association models. Our results have provided new insights to help further the understanding of the complex 22q11.2 region.
Subjects
DNA Copy Number Variations , Phenomics , Humans , DNA Copy Number Variations/genetics , Phenotype , Human Chromosome Pair 22
ABSTRACT
Admixed individuals offer unique opportunities for addressing limited transferability in polygenic scores (PGSs), given the substantial trans-ancestry genetic correlation in many complex traits. However, they are rarely considered in PGS training, given the challenges in representing ancestry-matched linkage-disequilibrium reference panels for admixed individuals. Here we present inclusive PGS (iPGS), which captures ancestry-shared genetic effects by finding the exact solution for penalized regression on individual-level data and is thus naturally applicable to admixed individuals. We validate our approach in a simulation study across 33 configurations with varying heritability, polygenicity, and ancestry composition in the training set. When iPGS is applied to n = 237,055 ancestry-diverse individuals in the UK Biobank, it shows the greatest improvements in Africans by 48.9% on average across 60 quantitative traits and up to 50-fold improvements for some traits (neutrophil count, R² = 0.058) over the baseline model trained on the same number of European individuals. When we allowed iPGS to use n = 284,661 individuals, we observed an average improvement of 60.8% for African, 11.6% for South Asian, 7.3% for non-British White, 4.8% for White British, and 17.8% for the other individuals. We further developed iPGS+refit to jointly model the ancestry-shared and -dependent genetic effects when heterogeneous genetic associations were present. For neutrophil count, for example, iPGS+refit showed the highest predictive performance in the African group (R² = 0.115), which exceeds the best predictive performance for the White British group (R² = 0.090 in the iPGS model), even though only 1.49% of individuals used in the iPGS training are of African ancestry. Our results indicate the power of including diverse individuals for developing more equitable PGS models.
Subjects
Multifactorial Inheritance , White People , Humans , Multifactorial Inheritance/genetics , White People/genetics , Phenotype , Black People/genetics , Asian People/genetics , Genome-Wide Association Study/methods
ABSTRACT
Uterine leiomyomas (ULs) are benign smooth muscle tumors that are common in premenopausal women. Somatic alterations in MED12, HMGA2, FH, genes encoding subunits of the SRCAP complex, and genes involved in Cullin 3-RING E3 ligase neddylation are mutually exclusive UL drivers. Established predisposition genes explain only partially the estimated heritability of leiomyomas. Here, we examined loss-of-function variants across 18,899 genes in a cohort of 233,614 White European women, revealing variants in four genes encoding SRCAP complex subunits (YEATS4, ZNHIT1, DMAP1, and ACTL6A) with a significant association to ULs, and YEATS4 and ZNHIT1 strikingly rank first and second, respectively. Positive mutation status was also associated with younger age at diagnosis and hysterectomy. Moderate-penetrance UL risk was largely attributed to rare non-synonymous mutations affecting the SRCAP complex. To examine this disease phenotype more closely, we set out to identify inherited mutations affecting the SRCAP complex in our in-house sample collection of Finnish individuals with ULs (n = 860). We detected one individual with an ACTL6A splice-site mutation, two individuals with a YEATS4 missense mutation, and four individuals with DMAP1 mutations: one splice-site, one nonsense, and two missense variants. These individuals had large and/or multiple ULs, were often diagnosed at an early age, and many had family history of ULs. When a somatic second hit was found, ACTL6A and DMAP1 were silenced in tumors by somatic mutation and YEATS4 by promoter hypermethylation. Decreased H2A.Z staining was observed in the tumors, providing further evidence for the pathogenic nature of the germline mutations. Our results establish inactivation of genes encoding SRCAP complex subunits as a central contributor to moderate-penetrance UL predisposition.
Subjects
Leiomyoma , Uterine Neoplasms , Humans , Female , Uterine Neoplasms/genetics , Uterine Neoplasms/pathology , Germ-Line Mutation , Penetrance , DNA Mutational Analysis , Leiomyoma/genetics , Leiomyoma/pathology , Mutation , Mediator Complex/genetics , Actins/genetics , Non-Histone Chromosomal Proteins/genetics , DNA-Binding Proteins/genetics , Adenosine Triphosphatases/genetics
ABSTRACT
The ongoing release of large-scale sequencing data in the UK Biobank allows for the identification of associations between rare variants and complex traits. SAIGE-GENE+ is a valid approach to conducting set-based association tests for quantitative and binary traits. However, for ordinal categorical phenotypes, applying SAIGE-GENE+ by treating the trait as quantitative or by binarizing it can cause inflated type I error rates or power loss. In this study, we propose a scalable and accurate method for rare-variant association tests, POLMM-GENE, in which we used a proportional odds logistic mixed model to characterize ordinal categorical phenotypes while adjusting for sample relatedness. POLMM-GENE fully utilizes the categorical nature of phenotypes and thus controls type I error rates well while remaining powerful. In the analyses of UK Biobank 450k whole-exome-sequencing data for five ordinal categorical traits, POLMM-GENE identified 54 gene-phenotype associations.