1.
Curr Protoc ; 4(6): e1055, 2024 Jun.
Article En | MEDLINE | ID: mdl-38837690

Data harmonization involves combining data from multiple independent sources and processing the data to produce one uniform dataset. Merging separate genotypes or whole-genome sequencing datasets has been proposed as a strategy to increase the statistical power of association tests by increasing the effective sample size. However, data harmonization is not a widely adopted strategy due to the difficulties with merging data (including confounding produced by batch effects and population stratification). Detailed data harmonization protocols are scarce and are often conflicting. Moreover, data harmonization protocols that accommodate samples of admixed ancestry are practically non-existent. Existing data harmonization procedures must be modified to ensure the heterogeneous ancestry of admixed individuals is incorporated into additional downstream analyses without confounding results. Here, we propose a set of guidelines for merging multi-platform genetic data from admixed samples that can be adopted by any investigator with elementary bioinformatics experience. We have applied these guidelines to aggregate 1544 tuberculosis (TB) case-control samples from six separate in-house datasets and conducted a genome-wide association study (GWAS) of TB susceptibility. The GWAS performed on the merged dataset had improved power over analyzing the datasets individually and produced summary statistics free from bias introduced by batch effects and population stratification. © 2024 Wiley Periodicals LLC. Basic Protocol 1: Processing separate datasets comprising array genotype data; Alternate Protocol 1: Processing separate datasets comprising array genotype and whole-genome sequencing data; Alternate Protocol 2: Performing imputation using a local reference panel; Basic Protocol 2: Merging separate datasets; Basic Protocol 3: Ancestry inference using ADMIXTURE and RFMix; Basic Protocol 4: Batch effect correction using pseudo-case-control comparisons.


Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Genome-Wide Association Study/standards , Genomics/methods , Genomics/standards , Tuberculosis/genetics , Case-Control Studies , Guidelines as Topic , Genetic Predisposition to Disease
2.
HLA ; 103(6): e15543, 2024 Jun.
Article En | MEDLINE | ID: mdl-38837862

The MHC class I region contains crucial genes for the innate and adaptive immune response, playing a key role in susceptibility to many autoimmune and infectious diseases. Genome-wide association studies have identified numerous disease-associated SNPs within this region. However, these associations do not fully capture the immune-biological relevance of specific HLA alleles. HLA imputation techniques may leverage available SNP arrays by predicting allele genotypes based on the linkage disequilibrium between SNPs and specific HLA alleles. Successful imputation requires diverse and large reference panels, especially for admixed populations. This study employed a bioinformatics approach to call SNPs and HLA alleles in multi-ethnic samples from the 1000 Genomes (1KG) dataset and admixed individuals from Brazil (SABE), utilising 30X whole-genome sequencing data. Using HIBAG, we created three reference panels: 1KG (n = 2504), SABE (n = 1171), and the full model (n = 3675) encompassing all samples. In extensive cross-validation of these reference panels, the multi-ethnic 1KG reference performed better overall than the reference with only Brazilian samples. However, the best results were achieved with the full model. Additionally, we expanded the scope of imputation by developing reference panels for non-classical, MICA, MICB and HLA-H genes, previously unavailable for multi-ethnic populations. Validation in an independent Brazilian dataset showed that our reference panels outperformed the Michigan Imputation Server, particularly in predicting HLA-B alleles among Brazilians. Our investigations underscored the need to enhance or adapt reference panels to encompass the target population's genetic diversity, emphasising the significance of multi-ethnic references for accurate imputation across different populations.


Alleles , Ethnicity , Gene Frequency , Polymorphism, Single Nucleotide , Humans , Brazil , Ethnicity/genetics , HLA Antigens/genetics , Linkage Disequilibrium , Genome-Wide Association Study/methods , Genotype , Genetics, Population/methods , Histocompatibility Antigens Class I/genetics , Computational Biology/methods
4.
Clin Epigenetics ; 16(1): 75, 2024 Jun 06.
Article En | MEDLINE | ID: mdl-38845005

BACKGROUND AND AIMS: Stroke is the leading cause of adult-onset disability. Although clinical factors influence stroke outcome, there is significant variability among individuals that may be attributed to genetics and epigenetics, including DNA methylation (DNAm). We aimed to study the association between DNAm and stroke prognosis. METHODS AND RESULTS: We conducted a two-phase study (discovery-replication and meta-analysis) in Caucasian patients with ischemic stroke from two independent centers (BasicMar [discovery, N = 316] and St. Pau [replication, N = 92]). Functional outcome was assessed using the modified Rankin Scale (mRS) at three months after stroke, with poor outcome defined as mRS > 2. DNAm was determined using the 450K and EPIC BeadChips in whole-blood samples collected within the first 24 h. We searched for differentially methylated positions (DMPs) in 370,344 CpGs, and candidates with p-value < 10⁻⁵ were subsequently tested in the replication cohort. We then meta-analyzed DMP results from both cohorts and used them to identify differentially methylated regions (DMRs). In the epigenome-wide association study, we found 29 DMPs at p-value < 10⁻⁵, one of which was replicated: cg24391982, annotated to the thrombospondin-2 (THBS2) gene (discovery p-value = 1.54 × 10⁻⁶; replication p-value = 9.17 × 10⁻⁴; meta-analysis p-value = 6.39 × 10⁻⁹). In addition, four DMRs were identified in patients with poor outcome, annotated to the zinc finger protein 57 homolog (ZFP57), Arachidonate 12-Lipoxygenase 12S Type (ALOX12), ABI Family Member 3 (ABI3) and Allantoicase (ALLC) genes (p-value < 1 × 10⁻⁹ in all cases). DISCUSSION: Patients with poor outcome showed a DMP at THBS2 and four DMRs annotated to the ZFP57, ALOX12, ABI3 and ALLC genes. This suggests an association between stroke outcome and DNAm, which may help identify new stroke recovery mechanisms.


DNA Methylation , Epigenesis, Genetic , Genome-Wide Association Study , Humans , DNA Methylation/genetics , Female , Prognosis , Male , Genome-Wide Association Study/methods , Aged , Middle Aged , Epigenesis, Genetic/genetics , Epigenome/genetics , Stroke/genetics , CpG Islands/genetics , Ischemic Stroke/genetics , Thrombospondins/genetics
5.
Clin Epigenetics ; 16(1): 74, 2024 Jun 06.
Article En | MEDLINE | ID: mdl-38840168

BACKGROUND: Epigenetic modifications, particularly DNA methylation (DNAm) in cord blood, are an important biological marker of how external exposures during gestation can influence the in-utero environment and subsequent offspring development. Despite the recognized importance of DNAm during gestation, comparative studies to determine the consistency of these epigenetic signals across different ethnic groups are largely absent. To address this gap, we first performed epigenome-wide association studies (EWAS) of gestational age (GA) using newborn cord blood DNAm comparatively in a white European (n = 342) and a South Asian (n = 490) birth cohort living in Canada. Then, we capitalized on established cord blood epigenetic GA clocks to examine the associations between maternal exposures, offspring characteristics and epigenetic GA, as well as GA acceleration, defined as the residual difference between epigenetic and chronological GA at birth. RESULTS: Individual EWASs confirmed 1,211 and 1,543 differentially methylated CpGs previously reported to be associated with GA, in white European and South Asian cohorts, respectively, with a similar distribution of effects. We confirmed that Bohlin's cord blood GA clock was robustly correlated with GA in white Europeans (r = 0.71; p = 6.0 × 10⁻⁵⁴) and South Asians (r = 0.66; p = 6.9 × 10⁻⁶⁴). In both cohorts, Bohlin's clock was positively associated with newborn weight and length and negatively associated with parity, newborn female sex, and gestational diabetes. Exclusive to South Asians, the GA clock was positively associated with the newborn ponderal index, while pre-pregnancy weight and gestational weight gain were strongly predictive of increased epigenetic GA in white Europeans. Important predictors of GA acceleration included gestational diabetes mellitus, newborn sex, and parity in both cohorts.
CONCLUSIONS: These results demonstrate the consistent DNAm signatures of GA and the utility of Bohlin's GA clock across the two populations. Although the overall pattern of DNAm is similar, its connections with the mother's environment and the baby's anthropometrics can differ between the two groups. Further research is needed to understand these unique relationships.
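
The definition of GA acceleration given in the abstract above can be illustrated with a minimal sketch. It assumes that the "residual difference" is the residual from an ordinary least-squares regression of epigenetic GA on chronological GA, which is a common convention but is not confirmed by the abstract. The function name and the example values are hypothetical.

```python
def ga_acceleration(epigenetic_ga, chronological_ga):
    """Residuals of epigenetic GA regressed on chronological GA (simple OLS).

    Assumption: "acceleration" means the regression residual; a positive
    value means the clock estimate is older than expected for that newborn.
    """
    n = len(chronological_ga)
    mean_x = sum(chronological_ga) / n
    mean_y = sum(epigenetic_ga) / n
    # Closed-form slope and intercept for a one-predictor least-squares fit.
    sxx = sum((x - mean_x) ** 2 for x in chronological_ga)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(chronological_ga, epigenetic_ga))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x)
            for x, y in zip(chronological_ga, epigenetic_ga)]


# Hypothetical values in weeks, for illustration only.
clock_estimates = [38.0, 39.5, 40.0, 41.2]
recorded_ga = [37.0, 39.0, 40.0, 41.0]
acceleration = ga_acceleration(clock_estimates, recorded_ga)
```

Because the residuals come from a fit with an intercept, they sum to zero across the cohort, so acceleration is always relative to the other newborns in the same analysis.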


Asian People , DNA Methylation , Epigenesis, Genetic , Fetal Blood , Gestational Age , White People , Adult , Female , Humans , Infant, Newborn , Pregnancy , Asian People/genetics , Canada , Cohort Studies , CpG Islands/genetics , DNA Methylation/genetics , Epigenesis, Genetic/genetics , Fetal Blood/chemistry , Genome-Wide Association Study/methods , White People/genetics
6.
Alzheimers Res Ther ; 16(1): 120, 2024 Jun 01.
Article En | MEDLINE | ID: mdl-38824563

BACKGROUND: Transcriptome-wide association study (TWAS) is an influential tool for identifying genes associated with complex diseases whose genetic effects are likely mediated through the transcriptome. TWAS utilizes reference genetic and transcriptomic data to estimate effect sizes of genetic variants on gene expression (i.e., effect sizes of a broad sense of expression quantitative trait loci, eQTL). These estimated effect sizes are employed as variant weights in gene-based association tests, facilitating the mapping of risk genes with genome-wide association study (GWAS) data. However, most existing TWAS of Alzheimer's disease (AD) dementia are limited to studying only cis-eQTL proximal to the test gene. To overcome this limitation, we applied the Bayesian Genome-wide TWAS (BGW-TWAS) method to leverage both cis- and trans-eQTL of brain and blood tissues, to improve the mapping of risk genes for AD dementia. METHODS: We first applied BGW-TWAS to the Genotype-Tissue Expression (GTEx) V8 dataset to estimate cis- and trans-eQTL effect sizes of the prefrontal cortex, cortex, and whole blood tissues. Estimated eQTL effect sizes were integrated with the summary data of the most recent GWAS of AD dementia to obtain BGW-TWAS (i.e., gene-based association test) p-values of AD dementia per gene per tissue type. Then we used the aggregated Cauchy association test to combine TWAS p-values across three tissues to obtain omnibus TWAS p-values per gene. RESULTS: We identified 85 genes in prefrontal cortex, 82 in cortex, and 76 in whole blood that were significantly associated with AD dementia. By combining BGW-TWAS p-values across these three tissues, we obtained 141 significant risk genes including 34 genes primarily due to trans-eQTL and 35 mapped risk genes in GWAS Catalog.
With these 141 significant risk genes, protein-protein interaction network analysis detected functional clusters comprising both known mapped GWAS risk genes of AD in GWAS Catalog and our identified TWAS risk genes, as well as several enriched phenotypes related to AD. CONCLUSION: We applied BGW-TWAS and aggregated Cauchy test methods to integrate both cis- and trans-eQTL data of brain and blood tissues with GWAS summary data, identifying 141 TWAS risk genes of AD dementia. These identified risk genes provide novel insights into the underlying biological mechanisms of AD dementia and potential gene targets for therapeutics development.
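
The step of combining per-tissue p-values into one omnibus p-value can be sketched as follows. This is a minimal illustration of the aggregated Cauchy association test with equal weights by default; the function name, the choice of weights, and the example p-values are assumptions and are not taken from the abstract.

```python
import math


def acat(p_values, weights=None):
    """Combine p-values with the aggregated Cauchy association test.

    Each p-value is mapped to a standard Cauchy variate, the variates are
    averaged with the given weights, and the upper Cauchy tail of that
    average is the combined p-value. Inputs must lie strictly in (0, 1).
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if weights is None:
        weights = [1.0] * k  # assumption: equal weight per tissue
    total_weight = sum(weights)
    statistic = sum(w * math.tan((0.5 - p) * math.pi)
                    for w, p in zip(weights, p_values)) / total_weight
    return 0.5 - math.atan(statistic) / math.pi


# Hypothetical p-values for one gene in three tissues.
omnibus_p = acat([1e-6, 0.03, 0.20])
```

A useful sanity check is that combining identical p-values returns the same value, and that one very small p-value dominates the result, which is why the test suits signals present in only one tissue.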


Alzheimer Disease , Bayes Theorem , Brain , Genetic Predisposition to Disease , Genome-Wide Association Study , Quantitative Trait Loci , Transcriptome , Humans , Alzheimer Disease/genetics , Alzheimer Disease/blood , Genome-Wide Association Study/methods , Brain/metabolism , Genetic Predisposition to Disease/genetics , Quantitative Trait Loci/genetics , Polymorphism, Single Nucleotide , Gene Expression Profiling/methods
7.
Clin Respir J ; 18(6): e13775, 2024 Jun.
Article En | MEDLINE | ID: mdl-38830831

Pulmonary heart disease (PHD) involves altered structure and function of the right ventricle caused by an abnormal respiratory system that causes pulmonary hypertension. However, the association between changes in plasma proteomics and PHD remains unclear. Hence, we aimed to identify causal associations between genetically predicted plasma protein levels and PHD. Mendelian randomization was performed to test the target proteins associated with PHD. Summary statistics for the human plasma proteome and pulmonary heart disease were acquired from the UK Biobank (6038 cases and 426 977 controls) and the FinnGen study (6753 cases and 302 401 controls). Publicly available pQTL datasets for human plasma proteins were obtained from a large-scale genome-wide association study in the INTERVAL study. The results were validated using a case-control cohort. We first enrolled 3622 plasma proteins with conditionally independent genetic variants; three proteins (histo-blood group ABO system transferase, activating signal cointegration 1 complex subunit 1, and calcium/calmodulin-dependent protein kinase I [CAMK1]) were significantly associated with the risk of pulmonary heart disease in the UK Biobank cohort. Only CAMK1 was successfully replicated (odds ratio: 1.1056, 95% confidence interval: 1.019-1.095, p = 0.0029) in the FinnGen population. In addition, the level of CAMK1 in 40 patients with PHD was significantly higher (p = 0.023) than that in the control group. This work proposes that CAMK1 is associated with PHD, underscoring the importance of the calcium signaling pathway in the pathophysiology of PHD and its potential for improving therapies.


Genome-Wide Association Study , Mendelian Randomization Analysis , Proteome , Pulmonary Heart Disease , Humans , Mendelian Randomization Analysis/methods , Genome-Wide Association Study/methods , Male , Female , Proteome/metabolism , Case-Control Studies , Pulmonary Heart Disease/genetics , Pulmonary Heart Disease/blood , Pulmonary Heart Disease/epidemiology , Middle Aged , United Kingdom/epidemiology , Blood Proteins/genetics , Blood Proteins/metabolism , ABO Blood-Group System/genetics , Aged , Proteomics/methods , Adult , Polymorphism, Single Nucleotide
8.
Arthritis Res Ther ; 26(1): 114, 2024 Jun 03.
Article En | MEDLINE | ID: mdl-38831441

BACKGROUND: Gout is a prevalent manifestation of metabolic osteoarthritis induced by elevated blood uric acid levels. The purpose of this study was to investigate the mechanisms of gene expression regulation in gout and elucidate its pathogenesis. METHODS: The study integrated gout genome-wide association study (GWAS) data, single-cell transcriptomics (scRNA-seq), expression quantitative trait loci (eQTL), and methylation quantitative trait loci (mQTL) data for analysis, and used a two-sample Mendelian randomization study to assess the causal relationship between proteins and gout. RESULTS: We identified 17 association signals for gout at unique genetic loci, including four genes related by protein-protein interaction network (PPI) analysis: TRIM46, THBS3, MTX1, and KRTCAP2. Additionally, we identified 22 methylation sites associated with gout. The study also found that genes such as TRIM46, MAP3K11, KRTCAP2, and TM7SF2 could potentially elevate the risk of gout. Through a Mendelian randomization (MR) analysis, we identified three proteins causally associated with gout: ADH1B, BMP1, and HIST1H3A. CONCLUSION: According to our findings, gout is linked with the expression and function of particular genes and proteins. These genes and proteins have the potential to function as novel diagnostic and therapeutic targets for gout. These discoveries shed new light on the pathological mechanisms of gout and clear the way for future research on this condition.


Genetic Predisposition to Disease , Genome-Wide Association Study , Gout , Mendelian Randomization Analysis , Quantitative Trait Loci , Single-Cell Analysis , Gout/genetics , Humans , Mendelian Randomization Analysis/methods , Genome-Wide Association Study/methods , Genetic Predisposition to Disease/genetics , Quantitative Trait Loci/genetics , Single-Cell Analysis/methods , DNA Methylation/genetics , Polymorphism, Single Nucleotide , Protein Interaction Maps/genetics , Alcohol Dehydrogenase
9.
PLoS One ; 19(6): e0298501, 2024.
Article En | MEDLINE | ID: mdl-38833463

Quantitative trait loci (QTL) denote regions of DNA whose variation is associated with variations in quantitative traits. QTL discovery is a powerful approach to understand how changes in molecular and clinical phenotypes may be related to DNA sequence changes. However, QTL discovery analysis encompasses multiple analytical steps and the processing of multiple input files, which can be laborious, error prone, and hard to reproduce if performed manually. To facilitate and automate large-scale QTL analysis, we developed the yQTL Pipeline, where the 'y' indicates the dependent quantitative variable being modeled. Prior to the association test, the pipeline supports the calculation or the direct input of pre-defined genome-wide principal components and genetic relationship matrix when applicable. User-specified covariates can also be provided. Depending on whether familial relatedness exists among the subjects, genome-wide association tests will be performed using either a linear mixed-effect model or a linear model. Options to run an ANOVA model or to test the interaction with a covariate are also available. Using the workflow management tool Nextflow, the pipeline parallelizes the analysis steps to optimize run-time and ensure reproducibility of results. In addition, a user-friendly R Shiny App was developed to facilitate result visualization. It can generate Manhattan and Miami plots of phenotype traits, genotype-phenotype boxplots, and trait-QTL connection networks. We applied the yQTL Pipeline to analyze metabolomics profiles of blood serum from the New England Centenarians Study (NECS) participants. A total of 9.1M SNPs and 1,052 metabolites across 194 participants were analyzed. Using a p-value cutoff of 5e-8, we found 14,983 mQTLs associated with 312 metabolites. The built-in parallelization of our pipeline reduced the run time from ~90 min to ~26 min. Visualization using the R Shiny App revealed multiple mQTLs shared across multiple metabolites.
The yQTL Pipeline is available with documentation on GitHub at https://github.com/montilab/yQTLpipeline.


Genome-Wide Association Study , Quantitative Trait Loci , Workflow , Humans , Genome-Wide Association Study/methods , Software , Phenotype , Computational Biology/methods , Polymorphism, Single Nucleotide , Male
10.
Sci Adv ; 10(19): eadj1424, 2024 May 10.
Article En | MEDLINE | ID: mdl-38718126

The ongoing expansion of human genomic datasets propels therapeutic target identification; however, extracting gene-disease associations from gene annotations remains challenging. Here, we introduce Mantis-ML 2.0, a framework integrating AstraZeneca's Biological Insights Knowledge Graph and numerous tabular datasets, to assess gene-disease probabilities throughout the phenome. We use graph neural networks, capturing the graph's holistic structure, and train them on hundreds of balanced datasets via a robust semi-supervised learning framework to provide gene-disease probabilities across the human exome. Mantis-ML 2.0 incorporates natural language processing to automate disease-relevant feature selection for thousands of diseases. The enhanced models demonstrate a 6.9% average classification power boost, achieving a median receiver operating characteristic (ROC) area under curve (AUC) score of 0.90 across 5220 diseases from Human Phenotype Ontology, OpenTargets, and Genomics England. Notably, Mantis-ML 2.0 prioritizes associations from an independent UK Biobank phenome-wide association study (PheWAS), providing a stronger form of triaging and mitigating against underpowered PheWAS associations. Results are exposed through an interactive web resource.


Biological Specimen Banks , Neural Networks, Computer , Humans , Genome-Wide Association Study/methods , Phenotype , United Kingdom , Phenomics/methods , Genetic Predisposition to Disease , Genomics/methods , Databases, Genetic , Algorithms , Computational Biology/methods , UK Biobank
11.
Eur J Med Res ; 29(1): 261, 2024 May 02.
Article En | MEDLINE | ID: mdl-38698427

BACKGROUND: Prior observational research has investigated the association between dietary patterns and Alzheimer's disease (AD) risk. Nevertheless, due to constraints in past observational studies, establishing a causal link between dietary habits and AD remains challenging. METHODS: We used large cohorts of European descent from publicly accessible genome-wide association study (GWAS) datasets to conduct Mendelian randomization (MR) analyses. The principal analytical technique was the inverse-variance weighted (IVW) method. RESULTS: The MR analysis found no statistically significant causal association between 20 dietary habits and the risk of AD (all p > 0.05). These results were consistent across the MR methods employed, including MR-Egger, weighted median, simple mode, and weighted mode approaches. Moreover, no evidence of horizontal pleiotropy was detected (all p > 0.05). CONCLUSION: In this MR analysis, our findings did not provide evidence to support causal genetic relationships between dietary habits and AD risk.
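
The inverse-variance weighted method named in the abstract above can be sketched in a few lines. This is a minimal fixed-effect version that assumes independent variants and ignores uncertainty in the variant-exposure effects; the function name and all numbers are hypothetical, and the study's own software and settings are not stated in the abstract.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    Each variant gives a ratio estimate beta_outcome / beta_exposure; the
    ratios are averaged with weights beta_exposure**2 / se_outcome**2.
    Returns the pooled causal estimate and its standard error.
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    std_error = math.sqrt(1.0 / total_weight)
    return estimate, std_error


# Hypothetical summary statistics for three instruments.
effect, effect_se = ivw_estimate(
    beta_exposure=[0.10, 0.20, 0.40],
    beta_outcome=[0.02, 0.01, 0.05],
    se_outcome=[0.05, 0.05, 0.05],
)
```

Variants with a stronger effect on the exposure and a more precise outcome estimate receive more weight, so the pooled estimate is driven mainly by the strongest instruments.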


Alzheimer Disease , Genome-Wide Association Study , Mendelian Randomization Analysis , Alzheimer Disease/genetics , Alzheimer Disease/epidemiology , Alzheimer Disease/etiology , Humans , Mendelian Randomization Analysis/methods , Genome-Wide Association Study/methods , Risk Factors , Feeding Behavior/physiology , Diet/adverse effects , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease
12.
BMC Bioinformatics ; 25(1): 192, 2024 May 15.
Article En | MEDLINE | ID: mdl-38750431

BACKGROUND: Researchers have long studied the regulatory processes of genes to uncover their functions. Gene regulatory network analysis is one of the popular approaches for understanding these processes, requiring accurate identification of interactions among the genes to establish the gene regulatory network. Advances in genome-wide association studies and expression quantitative trait loci studies have led to a wealth of genomic data, facilitating more accurate inference of gene-gene interactions. However, unknown confounding factors may influence these interactions, complicating their interpretation. Mendelian randomization (MR) has emerged as a valuable tool for causal inference in genetics, addressing confounding effects by estimating causal relationships using instrumental variables. In this paper, we propose a new statistical method, MR-GGI, for accurately inferring gene-gene interactions using Mendelian randomization. RESULTS: MR-GGI applies one gene as the exposure and another as the outcome, using causal cis-single-nucleotide polymorphisms as instrumental variables in the inverse-variance weighted MR model. Through simulations, we have demonstrated MR-GGI's ability to control type 1 error and maintain statistical power despite confounding effects. MR-GGI performed the best when compared to other methods using the F1 score on the DREAM5 dataset. Additionally, when applied to yeast genomic data, MR-GGI successfully identified six clusters. Through gene ontology analysis, we have confirmed that each cluster in our study performs distinct functional roles by gathering genes with specific functions. CONCLUSION: These findings demonstrate that MR-GGI accurately infers gene-gene interactions despite the confounding effects in real biological environments.


Mendelian Randomization Analysis , Polymorphism, Single Nucleotide , Genome-Wide Association Study/methods , Gene Regulatory Networks/genetics , Epistasis, Genetic/genetics , Quantitative Trait Loci , Humans , Saccharomyces cerevisiae/genetics
13.
Clin Epigenetics ; 16(1): 69, 2024 May 22.
Article En | MEDLINE | ID: mdl-38778395

Adverse neonatal outcomes are a prevailing risk factor for both short- and long-term mortality and morbidity in infants. Given the importance of these outcomes, refining their assessment is paramount for improving prevention and care. Here we aim to enhance the assessment of these often correlated and multifaceted neonatal outcomes. To achieve this, we employ factor analysis to identify common and unique effects and further confirm these effects using criterion-related validity testing. This validation leverages methylome-wide profiles from neonatal blood. Specifically, we investigate nine neonatal health risk variables, including gestational age, Apgar score, three indicators of body size, jaundice, birth diagnosis, maternal preeclampsia, and maternal age. The methylomic profiles used for this research capture data from nearly all 28 million methylation sites in human blood, derived from the blood spot collected from 333 neonates, within 72 h post-birth. Our factor analysis revealed two common factors: a size factor that captured the shared effects of weight, head size, height, and gestational age, and a disease factor that captured the orthogonal shared effects of gestational age, combined with jaundice and birth diagnosis. To minimize false positives in the validation studies, validation was limited to variables with significant cumulative association as estimated through an in-sample replication procedure. After this screening, the two common factors and the unique effects for gestational age, jaundice and Apgar were further investigated with full-scale cell-type specific methylome-wide association analyses. Highly significant, cell-type specific associations were detected for both common effect factors and for Apgar. Gene Ontology analyses revealed multiple significant biologically relevant terms for the five fully investigated neonatal health risk variables.
Given the established links between adverse neonatal outcomes and both immediate and long-term health, the distinct factor effects (representing the common and unique effects of the risk variables) and their biological profiles confirmed in our work, suggest their potential role as clinical biomarkers for assessing health risks and enhancing personalized care.


DNA Methylation , Epigenome , Genome-Wide Association Study , Humans , Infant, Newborn , Female , DNA Methylation/genetics , Genome-Wide Association Study/methods , Epigenome/genetics , Pregnancy , Gestational Age , Male , Risk Factors , Infant Health , Apgar Score , Maternal Age , Adult , Epigenesis, Genetic/genetics
14.
Respir Res ; 25(1): 217, 2024 May 23.
Article En | MEDLINE | ID: mdl-38783236

BACKGROUND: Idiopathic pulmonary fibrosis (IPF) is a chronic fibrotic interstitial lung disease characterized by progressive dyspnea and decreased lung function, yet its exact etiology remains unclear. It is of great significance to discover new drug targets for IPF. METHODS: We obtained the cis-expression quantitative trait locus (cis-eQTL) of druggable genes from eQTLGen Consortium as exposure and the genome-wide association study (GWAS) of IPF from the International IPF Genetics Consortium as outcomes to simulate the effects of drugs on IPF by employing Mendelian randomization analysis. Then colocalization analysis was performed to calculate the probability of both cis-eQTL of druggable genes and IPF sharing a causal variant. For further validation, we conducted protein quantitative trait locus (pQTL) analysis to reaffirm our findings. RESULTS: The expression of 45 druggable genes was significantly associated with IPF susceptibility at FDR < 0.05. The expression of 23 and 15 druggable genes was significantly associated with decreased forced vital capacity (FVC) and diffusing capacity of the lungs for carbon monoxide (DLco) in IPF patients, respectively. IPF susceptibility and two significant genes (IL-7 and ABCB2) were likely to share a causal variant. The results of the pQTL analysis demonstrated that high levels of IL-7 in plasma are associated with a reduced risk of IPF (OR = 0.67, 95% CI: 0.47-0.97). CONCLUSION: IL-7 stands out as the most promising potential drug target to mitigate the risk of IPF. Our study not only sheds light on potential drug targets but also provides a direction for future drug development in IPF.
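
The "FDR < 0.05" criterion in the abstract above can be illustrated with a small sketch. The abstract does not say which FDR procedure was used; the example assumes the Benjamini-Hochberg step-up procedure, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: this is one common way to control the false discovery
    rate; it is not confirmed as the procedure used in the study.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values get smaller.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical per-gene MR p-values.
raw_p = [0.01, 0.04, 0.03, 0.005]
significant = [q < 0.05 for q in benjamini_hochberg(raw_p)]
```

Genes whose adjusted value falls below 0.05 would be reported as significant under this assumed procedure.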


Genome-Wide Association Study , Idiopathic Pulmonary Fibrosis , Mendelian Randomization Analysis , Humans , Idiopathic Pulmonary Fibrosis/genetics , Idiopathic Pulmonary Fibrosis/drug therapy , Idiopathic Pulmonary Fibrosis/diagnosis , Mendelian Randomization Analysis/methods , Genome-Wide Association Study/methods , Quantitative Trait Loci , Genetic Predisposition to Disease , Female , Molecular Targeted Therapy/methods , Male
15.
PLoS Genet ; 20(5): e1011273, 2024 May.
Article En | MEDLINE | ID: mdl-38728357

Existing imaging genetics studies have been mostly limited in scope by using imaging-derived phenotypes defined by human experts. Here, leveraging new breakthroughs in self-supervised deep representation learning, we propose a new approach, image-based genome-wide association study (iGWAS), for identifying genetic factors associated with phenotypes discovered from medical images using contrastive learning. Using retinal fundus photos, our model extracts a 128-dimensional vector representing features of the retina as phenotypes. After training the model on 40,000 images from the EyePACS dataset, we generated phenotypes from 130,329 images of 65,629 British White participants in the UK Biobank. We conducted GWAS on these phenotypes and identified 14 loci with genome-wide significance (p < 5 × 10⁻⁸ and intersection of hits from left and right eyes). We also performed GWAS on retina color, defined as the average color of the center region of the retinal fundus photos. The GWAS of retina color identified 34 loci, 7 of which overlap with the GWAS of the raw image phenotype. Our results establish the feasibility of this new framework of genomic study based on self-supervised phenotyping of medical images.
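
The significance rule described above, a genome-wide threshold combined with an intersection of left-eye and right-eye hits, can be sketched as a set operation. The data layout (a mapping from locus label to the smallest p-value at that locus), the function name, and the locus labels are assumptions made for illustration.

```python
GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold


def shared_hits(left_eye_p, right_eye_p, threshold=GENOME_WIDE_P):
    """Loci that pass the threshold in both the left-eye and right-eye GWAS.

    Each argument maps a locus label to its smallest p-value in that scan.
    """
    left = {locus for locus, p in left_eye_p.items() if p < threshold}
    right = {locus for locus, p in right_eye_p.items() if p < threshold}
    return left & right


# Hypothetical loci and p-values.
left_scan = {"locus_A": 1e-9, "locus_B": 3e-8, "locus_C": 2e-7}
right_scan = {"locus_A": 4e-10, "locus_B": 6e-8, "locus_D": 1e-12}
replicated = shared_hits(left_scan, right_scan)  # only locus_A passes in both
```

Requiring a hit in both eyes acts as a built-in replication filter, at the cost of dropping loci that narrowly miss the threshold in one scan.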


Fundus Oculi , Genome-Wide Association Study , Phenotype , Retina , Humans , Genome-Wide Association Study/methods , Retina/diagnostic imaging , Male , Polymorphism, Single Nucleotide , Female , Image Processing, Computer-Assisted/methods
16.
PLoS Genet ; 20(5): e1011245, 2024 May.
Article En | MEDLINE | ID: mdl-38728360

Joint analysis of multiple correlated phenotypes for genome-wide association studies (GWAS) can identify and interpret pleiotropic loci, which are essential to understanding pleiotropy in diseases and complex traits. Meanwhile, constructing a network based on associations between phenotypes and genotypes provides a new way to analyze multiple phenotypes, making it possible to explore whether phenotypes and genotypes might be related to each other at a higher level of cellular and organismal organization. In this paper, we first develop a bipartite signed network by linking phenotypes and genotypes into a Genotype and Phenotype Network (GPN). The GPN can be constructed from a mixture of quantitative and qualitative phenotypes and is applicable to binary phenotypes with extremely unbalanced case-control ratios in large-scale biobank datasets. We then apply a powerful community detection method to partition phenotypes into disjoint network modules based on GPN. Finally, we jointly test the association between multiple phenotypes in a network module and a single nucleotide polymorphism (SNP). Simulations and analyses of 72 complex traits in the UK Biobank show that multiple-phenotype association tests based on network modules detected by GPN are much more powerful than those that do not consider network modules. The newly proposed GPN provides a new way to investigate the genetic architecture among different types of phenotypes. Multiple-phenotype association studies based on GPN are improved by incorporating the genetic information into the phenotype clustering. Notably, it might broaden the understanding of the genetic architecture that exists between diagnoses, genes, and pleiotropy.


Genome-Wide Association Study , Genotype , Phenotype , Polymorphism, Single Nucleotide , Humans , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide/genetics , Models, Genetic , Genetic Pleiotropy , Genetic Association Studies/methods , Quantitative Trait Loci/genetics
17.
Clin Epigenetics ; 16(1): 70, 2024 May 27.
Article En | MEDLINE | ID: mdl-38802969

BACKGROUND: Obesity is a global public health concern linked to chronic diseases such as cardiovascular disease and type 2 diabetes (T2D). Emerging evidence suggests that epigenetic modifications, particularly DNA methylation, may contribute to obesity. However, the molecular mechanism underlying the longitudinal change of BMI has not been well explored, especially in East Asian populations. METHODS: This study performed a longitudinal epigenome-wide association analysis of DNA methylation to uncover novel loci associated with BMI change in 533 individuals across two Chinese cohorts with repeated DNA methylation and BMI measurements over four years. RESULTS: We identified three novel CpG sites (cg14671384, cg25540824, and cg10848724) significantly associated with BMI change. Two of the identified CpG sites were located in regions previously associated with body shape and basal metabolic rate. Annotation of the top 20 BMI change-associated CpGs revealed strong connections to obesity and T2D. Notably, these CpGs exhibited active regulatory roles and were located in genes with high expression in the liver and digestive tract, suggesting a potential regulatory pathway from genome to phenotypes of energy metabolism and absorption via DNA methylation. Cross-sectional and longitudinal EWAS comparisons indicated different mechanisms between CpGs related to BMI and BMI change. CONCLUSION: This study enhances our understanding of the epigenetic dynamics underlying BMI change and emphasizes the value of longitudinal analyses in deciphering the complex interplay between epigenetics and obesity.


Asian People , Body Mass Index , CpG Islands , DNA Methylation , Epigenesis, Genetic , Genome-Wide Association Study , Obesity , Humans , DNA Methylation/genetics , Longitudinal Studies , Male , Female , CpG Islands/genetics , Obesity/genetics , Middle Aged , Genome-Wide Association Study/methods , Epigenesis, Genetic/genetics , Asian People/genetics , Diabetes Mellitus, Type 2/genetics , Adult , Epigenome/genetics , China , Cross-Sectional Studies , East Asian People
18.
Nat Commun ; 15(1): 4433, 2024 May 29.
Article En | MEDLINE | ID: mdl-38811555

Dominance heritability in complex traits has received increasing recognition. However, most polygenic score (PGS) approaches do not incorporate non-additive effects. Here, we present GenoBoost, a flexible PGS modeling framework capable of considering both additive and non-additive effects, specifically focusing on genetic dominance. Building on statistical boosting theory, we derive provably optimal GenoBoost scores and provide its efficient implementation for analyzing large-scale cohorts. We benchmark it against seven commonly used PGS methods and demonstrate its competitive predictive performance. GenoBoost is ranked the best for four traits and second-best for three traits among twelve tested disease outcomes in UK Biobank. We reveal that GenoBoost improves prediction for autoimmune diseases by incorporating non-additive effects localized in the MHC locus and, more broadly, works best in less polygenic traits. We further demonstrate that GenoBoost can infer the mode of genetic inheritance without requiring prior knowledge. For example, GenoBoost finds non-zero genetic dominance effects for 602 of 900 selected genetic variants, resulting in 2.5% improvements in predicting psoriasis cases. Lastly, we show that GenoBoost can prioritize genetic loci with genetic dominance not previously reported in the GWAS catalog. Our results highlight the increased accuracy and biological insights from incorporating non-additive effects in PGS models.
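The contrast between additive and non-additive scoring described above can be illustrated with a minimal sketch. It is not the GenoBoost implementation: the function names, the per-genotype score table, and all numbers are hypothetical, and the sketch only assumes that a non-additive model may assign a separate score to each genotype (0, 1, or 2 copies of the effect allele) instead of one per-allele effect.

```python
def additive_score(genotypes, effects):
    """Additive PGS: one per-allele effect per variant, multiplied by the dosage."""
    return sum(effect * dosage for dosage, effect in zip(genotypes, effects))


def genotype_specific_score(genotypes, score_table):
    """Non-additive PGS sketch: look up a separate score for genotype 0, 1 or 2."""
    return sum(scores[dosage] for dosage, scores in zip(genotypes, score_table))


def is_additive(scores, tol=1e-12):
    """A score triple is additive when the 0->1 and 1->2 steps are equal."""
    return abs((scores[1] - scores[0]) - (scores[2] - scores[1])) < tol


# Hypothetical example: variant 1 acts dominantly (one copy already gives the
# full score), variant 2 is additive.
score_table = [(0.0, 0.4, 0.4), (0.0, 0.1, 0.2)]
individual = [1, 2]  # heterozygous at variant 1, homozygous at variant 2

non_additive = genotype_specific_score(individual, score_table)
additive = additive_score(individual, [0.2, 0.1])
```

For the heterozygous carrier, the genotype-specific table credits the full score of the dominant variant, whereas the additive model credits only half of the homozygous effect.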


Genome-Wide Association Study , Models, Genetic , Multifactorial Inheritance , Multifactorial Inheritance/genetics , Humans , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease , Autoimmune Diseases/genetics , Genes, Dominant , Psoriasis/genetics
19.
Nat Comput Sci ; 4(5): 360-366, 2024 May.
Article En | MEDLINE | ID: mdl-38745108

For many genome-wide association studies, imputing genotypes from a haplotype reference panel is a necessary step. Over the past 15 years, reference panels have become larger and more diverse, leading to improvements in imputation accuracy. However, the latest generation of reference panels is subject to restrictions on data sharing due to concerns about privacy, limiting their usefulness for genotype imputation. In this context, here we propose RESHAPE, a method that employs a recombination Poisson process on a reference panel to simulate the genomes of hypothetical descendants after multiple generations. This data transformation helps to protect against re-identification threats and preserves data attributes, such as linkage disequilibrium patterns and, to some degree, identity-by-descent sharing, allowing for genotype imputation. Our experiments on gold-standard datasets show that simulated descendants up to eight generations can serve as reference panels without substantially reducing genotype imputation accuracy.
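The idea of simulating descendants with a recombination Poisson process can be sketched as follows. This is an illustration under simplifying assumptions, not the RESHAPE algorithm itself: the function names are hypothetical, the crossover rate is uniform per site (no genetic map), and parents are drawn at random from the current generation.

```python
import random
from collections import Counter


def crossover_points(length, rate, rng):
    """Draw crossover positions from a homogeneous Poisson process.

    Gaps between successive crossovers are exponential with the given rate
    (expected crossovers per site), which defines a Poisson process.
    """
    points = []
    position = rng.expovariate(rate)
    while position < length:
        points.append(int(position))
        position += rng.expovariate(rate)
    return points


def recombine(hap_a, hap_b, rate, rng):
    """Build one child haplotype as a mosaic of two parental haplotypes."""
    switches = Counter(crossover_points(len(hap_a), rate, rng))
    parents = (hap_a, hap_b)
    current = rng.randrange(2)  # which parent the child copies first
    child = []
    for site in range(len(hap_a)):
        if switches[site] % 2:  # an odd number of crossovers swaps the parent
            current = 1 - current
        child.append(parents[current][site])
    return child


def simulate_descendants(panel, generations, rate, seed=0):
    """Replace the panel by recombined offspring for several generations."""
    rng = random.Random(seed)
    current = [list(hap) for hap in panel]
    for _ in range(generations):
        current = [
            recombine(*rng.sample(current, 2), rate, rng)
            for _ in range(len(current))
        ]
    return current


# Hypothetical toy panel: 4 haplotypes, 10 biallelic sites.
panel = [
    [0, 0, 1, 1, 0, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 0, 1, 1, 0, 0, 1],
    [0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 0, 1, 1, 0, 0],
]
descendants = simulate_descendants(panel, generations=3, rate=0.2)
```

Each simulated haplotype is a mosaic of the originals, so no descendant carries an allele that was absent from the panel at that site, while local allele combinations are preserved between crossover points.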


Genome-Wide Association Study , Genotype , Humans , Genome-Wide Association Study/methods , Linkage Disequilibrium , Haplotypes/genetics , Polymorphism, Single Nucleotide/genetics , Information Dissemination/methods , Computer Simulation , Models, Genetic , Algorithms , Genome, Human/genetics , Poisson Distribution
20.
BMC Geriatr ; 24(1): 469, 2024 May 29.
Article En | MEDLINE | ID: mdl-38811889

BACKGROUND: Recent genetic evidence supports a causal role for sarcopenia in osteoarthritis, which may be mediated by the occurrence of obesity or changes in circulating inflammatory protein levels. Here, we leveraged publicly available genome-wide association study data to investigate the intrinsic causal relationship between sarcopenia, obesity, circulating inflammatory protein levels, and osteoarthritis. METHODS: In this study, we used Mendelian randomization analyses to explore the causal relationship between sarcopenia phenotypes (Appendicular lean mass [ALM], Low hand-grip strength [LHG], and usual walking pace [UWP]) and osteoarthritis (Knee osteoarthritis [KOA], and Hip osteoarthritis [HOA]). Univariable Mendelian randomization (UVMR) analyses were performed using the inverse variance weighted (IVW) method, MR-Egger, weighted median method, simple mode, and weighted mode, with the IVW method being the primary analytical technique. Subsequently, the independent causal effects of sarcopenia phenotype on osteoarthritis were investigated using multivariate Mendelian randomization (MVMR) analysis. To further explore the mechanisms involved, obesity and circulating inflammatory proteins were introduced as the mediator variables, and a two-step Mendelian randomization analysis was used to explore the mediating effects of obesity and circulating inflammatory proteins between ALM and KOA as well as the mediating proportions. RESULTS: UVMR analysis showed a causal relationship between ALM, LHG, UWP and KOA [(OR = 1.151, 95% CI: 1.087-1.218, P = 1.19 × 10⁻⁶, P_FDR = 7.14 × 10⁻⁶) (OR = 1.215, 95% CI: 1.004-1.470; P = 0.046, P_FDR = 0.055) (OR = 0.503, 95% CI: 0.292-0.867; P = 0.013, P_FDR = 0.027)], and a causal relationship between ALM, UWP and HOA [(OR = 1.181, 95% CI: 1.103-1.265, P = 2.05 × 10⁻⁶, P_FDR = 6.15 × 10⁻⁶) (OR = 0.438, 95% CI: 0.226-0.849, P = 0.014, P_FDR = 0.022)]. In the MVMR analyses adjusting for confounders (body mass index, insomnia, sedentary behavior, and bone density), causal relationships were observed between ALM, LHG, UWP and KOA [(ALM: OR = 1.323, 95% CI: 1.224-1.431, P = 2.07 × 10⁻¹²), (LHG: OR = 1.161, 95% CI: 1.044-1.292, P = 0.006), (UWP: OR = 0.511, 95% CI: 0.290-0.899, P = 0.020)], and between ALM and HOA (ALM: OR = 1.245, 95% CI: 1.149-1.348, P = 7.65 × 10⁻⁸). In a two-step MR analysis, obesity was identified to play a potential mediating role in ALM and KOA (proportion mediated: 5.9%). CONCLUSIONS: The results of this study suggest that decreased appendicular lean mass, grip strength, and walking speed increase the risk of KOA and decreased appendicular lean mass increases the risk of HOA in patients with sarcopenia in a European population. Obesity plays a mediator role in the occurrence of KOA due to appendicular lean body mass reduction.
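The IVW estimator named in the methods, and the proportion mediated reported for the two-step analysis, can be sketched from summary statistics as below. This is a minimal illustration, not the authors' pipeline: it assumes the standard fixed-effect IVW formula and the product-of-coefficients approach to mediation, and every function name and number is hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-variant summary statistics.

    Each variant contributes the ratio beta_outcome / beta_exposure, weighted
    by beta_exposure^2 / se_outcome^2 (the inverse variance of that ratio).
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator, math.sqrt(1.0 / denominator)


def proportion_mediated(beta_exposure_mediator, beta_mediator_outcome, beta_total):
    """Two-step MR, product of coefficients: indirect effect / total effect."""
    return beta_exposure_mediator * beta_mediator_outcome / beta_total


# Hypothetical summary statistics for three instruments.
beta_x = [0.1, 0.2, 0.3]     # variant-exposure effects
beta_y = [0.05, 0.10, 0.15]  # variant-outcome effects (log odds)
se_y = [0.01, 0.02, 0.01]    # standard errors of the outcome effects

beta_ivw, se_ivw = ivw_estimate(beta_x, beta_y, se_y)
odds_ratio = math.exp(beta_ivw)
ci_low = math.exp(beta_ivw - 1.96 * se_ivw)
ci_high = math.exp(beta_ivw + 1.96 * se_ivw)
mediated = proportion_mediated(0.2, 0.1, 0.4)
```

For a binary outcome the estimate is on the log-odds scale, so exponentiating it and its confidence limits gives an odds ratio with a 95% CI in the form reported above.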


Genome-Wide Association Study , Mendelian Randomization Analysis , Obesity , Sarcopenia , Humans , Mendelian Randomization Analysis/methods , Sarcopenia/epidemiology , Sarcopenia/genetics , Sarcopenia/diagnosis , Obesity/epidemiology , Obesity/genetics , Obesity/complications , Genome-Wide Association Study/methods , Osteoarthritis, Hip/genetics , Osteoarthritis, Hip/epidemiology , Osteoarthritis, Hip/diagnosis , Aged , Hand Strength/physiology , Male , Osteoarthritis, Knee/genetics , Osteoarthritis, Knee/epidemiology , Osteoarthritis, Knee/diagnosis , Female , Osteoarthritis/genetics , Osteoarthritis/epidemiology , Multivariate Analysis , Phenotype
...