RESUMEN
Primary open-angle glaucoma (POAG), the leading cause of irreversible blindness worldwide, disproportionately affects individuals of African ancestry. We conducted a genome-wide association study (GWAS) for POAG in 11,275 individuals of African ancestry (6,003 cases; 5,272 controls). We detected 46 risk loci associated with POAG at genome-wide significance. Replication and post-GWAS analyses, including functionally informed fine-mapping, multiple trait co-localization, and in silico validation, implicated two previously undescribed variants (rs1666698 mapping to DBF4P2; rs34957764 mapping to ROCK1P1) and one previously associated variant (rs11824032 mapping to ARHGEF12) as likely causal. For individuals of African ancestry, a polygenic risk score (PRS) for POAG from our mega-analysis (African ancestry individuals) outperformed a PRS from summary statistics of a much larger GWAS derived from European ancestry individuals. This study quantifies the genetic architecture similarities and differences between African and non-African ancestry populations for this blinding disease.
Asunto(s)
Estudio de Asociación del Genoma Completo , Glaucoma de Ángulo Abierto , Humanos , Predisposición Genética a la Enfermedad , Glaucoma de Ángulo Abierto/genética , Población Negra/genética , Polimorfismo de Nucleótido Simple/genéticaRESUMEN
Metabolic dysfunction-associated Fatty Liver Disease (MAFLD) has emerged as one of the leading cardiometabolic diseases. Friend of GATA2 (FOG2) is a transcriptional co-regulator that has been shown to regulate hepatic lipid metabolism and accumulation. Using meta-analysis from several different biobank datasets, we identified a coding variant of FOG2 (rs28374544, A1969G, S657G) predominantly found in individuals of African ancestry (minor allele frequency~20%), which is associated with liver failure/cirrhosis phenotype and liver injury. To gain insight into potential pathways associated with this variant, we interrogated a previously published genomics dataset of 38 human induced pluripotent stem cell (iPSCs) lines differentiated into hepatocytes (iHeps). Using Differential Gene Expression Analysis and Gene Set Enrichment Analysis, we identified the mTORC1 pathway as differentially regulated between iHeps from individuals with and without the variant. Transient lipid-based transfections were performed on the human hepatoma cell line (Huh7) using wild-type FOG2 and FOG2S657G and demonstrated that FOG2S657G increased mTORC1 signaling, de novo lipogenesis, and cellular triglyceride synthesis and mass. In addition, we observed a significant downregulation of oxidative phosphorylation in FOG2S657G cells in fatty acid-loaded cells but not untreated cells, suggesting that FOG2S657G may also reduce fatty acid to promote lipid accumulation. Taken together, our multi-pronged approach suggests a model whereby the FOG2S657G may promote MAFLD through mTORC1 activation, increased de novo lipogenesis, and lipid accumulation. Our results provide insights into the molecular mechanisms by which FOG2S657G may affect the complex molecular landscape underlying MAFLD.
Asunto(s)
Proteínas de Unión al ADN , Diana Mecanicista del Complejo 1 de la Rapamicina , Transducción de Señal , Factores de Transcripción , Humanos , Diana Mecanicista del Complejo 1 de la Rapamicina/genética , Diana Mecanicista del Complejo 1 de la Rapamicina/metabolismo , Transducción de Señal/genética , Factores de Transcripción/genética , Factores de Transcripción/metabolismo , Proteínas de Unión al ADN/genética , Proteínas de Unión al ADN/metabolismo , Hepatocitos/metabolismo , Polimorfismo de Nucleótido Simple , Células Madre Pluripotentes Inducidas/metabolismo , Metabolismo de los Lípidos/genética , Línea Celular Tumoral , Genotipo , Hepatopatías/genética , Hepatopatías/metabolismo , Hepatopatías/patologíaRESUMEN
Background: Endometriosis affects 10% of reproductive-age women, and yet, it goes undiagnosed for 3.6 years on average after symptoms onset. Despite large GWAS meta-analyses (N > 750,000), only a few dozen causal loci have been identified. We hypothesized that the challenges in identifying causal genes for endometriosis stem from heterogeneity across clinical and biological factors underlying endometriosis diagnosis. Methods: We extracted known endometriosis risk factors, symptoms, and concomitant conditions from the Penn Medicine Biobank (PMBB) and performed unsupervised spectral clustering on 4,078 women with endometriosis. The 5 clusters were characterized by utilizing additional electronic health record (EHR) variables, such as endometriosis-related comorbidities and confirmed surgical phenotypes. From four EHR-linked genetic datasets, PMBB, eMERGE, AOU, and UKBB, we extracted lead variants and tag variants 39 known endometriosis loci for association testing. We meta-analyzed ancestry-stratified case/control tests for each locus and cluster in addition to a positive control (Total N endometriosis cases = 10,108). Results: We have designated the five subtype clusters as pain comorbidities, uterine disorders, pregnancy complications, cardiometabolic comorbidities, and EHR-asymptomatic based on enriched features from each group. One locus, RNLS , surpassed the genome-wide significant threshold in the positive control. Thirteen more loci reached a Bonferroni threshold of 1.3 x 10 -3 (0.05 / 39) in the positive control. The cluster-stratified tests yielded more significant associations than the positive control for anywhere from 5 to 15 loci depending on the cluster. Bonferroni significant loci were identified for four out of five clusters, including WNT4 and GREB1 for the uterine disorders cluster, RNLS for the cardiometabolic cluster, FSHB for the pregnancy complications cluster, and SYNE1 and CDKN2B-AS1 for the EHR-asymptomatic cluster. This study enhances our understanding of the clinical presentation patterns of endometriosis subtypes, showcasing the innovative approach employed to investigate this complex disease.
RESUMEN
Endometriosis is a complex and heterogeneous condition affecting 10% of reproductive-age women, and yet, it often goes undiagnosed for several years. Limited observed heritability (7%) of large genetic association studies may be attributable to underlying heterogeneity of disease mechanisms. Therefore, we conducted this study to investigate genetic associations across sub-phenotypes of endometriosis. We performed unsupervised clustering of 4,078 women with endometriosis based on known endometriosis risk factors, symptoms, and concomitant conditions. The clusters were characterized by examining electronic health record (EHR) data and comprehensive chart reviews. We then performed genetic association for each cluster with 39 endometriosis-associated loci (Total Nendometriosis cases = 12,350). We identified five sub-phenotype clusters: (1) pain comorbidities, (2) uterine disorders, (3) pregnancy complications, (4) cardiometabolic comorbidities, and (5) HER-asymptomatic. Bonferroni significant loci included PDLIM5 for the cluster 1, GREB1 for cluster 2, WNT4 for cluster 3, RNLS for cluster 4, and ABO for cluster 5. The difference in associations between the groups suggests complex and varied genetic mechanisms of endometriosis and its symptoms. This study enhances our understanding of the clinical patterns of endometriosis sub-phenotypes, showcasing the innovative approach employed to investigate this complex disease.
RESUMEN
Primary open-angle glaucoma (POAG), a leading cause of irreversible blindness globally, shows disparity in prevalence and manifestations across ancestries. We perform meta-analysis across 15 biobanks (of the Global Biobank Meta-analysis Initiative) (n = 1,487,441: cases = 26,848) and merge with previous multi-ancestry studies, with the combined dataset representing the largest and most diverse POAG study to date (n = 1,478,037: cases = 46,325) and identify 17 novel significant loci, 5 of which were ancestry specific. Gene-enrichment and transcriptome-wide association analyses implicate vascular and cancer genes, a fifth of which are primary ciliary related. We perform an extensive statistical analysis of SIX6 and CDKN2B-AS1 loci in human GTEx data and across large electronic health records showing interaction between SIX6 gene and causal variants in the chr9p21.3 locus, with expression effect on CDKN2A/B. Our results suggest that some POAG risk variants may be ancestry specific, sex specific, or both, and support the contribution of genes involved in programmed cell death in POAG pathogenesis.
Asunto(s)
Predisposición Genética a la Enfermedad , Glaucoma de Ángulo Abierto , Masculino , Femenino , Humanos , Predisposición Genética a la Enfermedad/genética , Glaucoma de Ángulo Abierto/genética , Glaucoma de Ángulo Abierto/epidemiología , Polimorfismo de Nucleótido Simple , Proliferación Celular , BiologíaRESUMEN
One of the justifiable criticisms of human genetic studies is the underrepresentation of participants from diverse populations. Lack of inclusion must be addressed at-scale to identify causal disease factors and understand the genetic causes of health disparities. We present genome-wide associations for 2068 traits from 635,969 participants in the Department of Veterans Affairs Million Veteran Program, a longitudinal study of diverse United States Veterans. Systematic analysis revealed 13,672 genomic risk loci; 1608 were only significant after including non-European populations. Fine-mapping identified causal variants at 6318 signals across 613 traits. One-third (n = 2069) were identified in participants from non-European populations. This reveals a broadly similar genetic architecture across populations, highlights genetic insights gained from underrepresented groups, and presents an extensive atlas of genetic associations.
Asunto(s)
Predisposición Genética a la Enfermedad , Estudio de Asociación del Genoma Completo , Sitios de Carácter Cuantitativo , Veteranos , Humanos , Masculino , Variación Genética , Estudios Longitudinales , Polimorfismo de Nucleótido Simple , Estados Unidos , United States Department of Veterans Affairs , FemeninoRESUMEN
Breast density, the amount of fibroglandular versus fatty tissue in the breast, is a strong breast cancer risk factor. Understanding genetic factors associated with breast density may help in clarifying mechanisms by which breast density increases cancer risk. To date, 50 genetic loci have been associated with breast density, however, these studies were performed among predominantly European ancestry populations. We utilized a cohort of women aged 40-85 years who underwent screening mammography and had genetic information available from the Penn Medicine BioBank to conduct a Genome-Wide Association Study (GWAS) of breast density among 1323 women of African ancestry. For each mammogram, the publicly available "LIBRA" software was used to quantify dense area and area percent density. We identified 34 significant loci associated with dense area and area percent density, with the strongest signals in GACAT3, CTNNA3, HSD17B6, UGDH, TAAR8, ARHGAP10, BOD1L2, and NR3C2. There was significant overlap between previously identified breast cancer SNPs and SNPs identified as associated with breast density. Our results highlight the importance of breast density GWAS among diverse populations, including African ancestry populations. They may provide novel insights into genetic factors associated with breast density and help in elucidating mechanisms by which density increases breast cancer risk.
RESUMEN
Modeling with longitudinal electronic health record (EHR) data proves challenging given the high dimensionality, redundancy, and noise captured in EHR. In order to improve precision medicine strategies and identify predictors of disease risk in advance, evaluating meaningful patient disease trajectories is essential. In this study, we develop the algorithm DiseasE Trajectory fEature extraCTion (DETECT) for feature extraction and trajectory generation in high-throughput temporal EHR data. This algorithm can 1) simulate longitudinal individual-level EHR data, specified to user parameters of scale, complexity, and noise and 2) use a convergent relative risk framework to test intermediate codes occurring between specified index code(s) and outcome code(s) to determine if they are predictive features of the outcome. Temporal range can be specified to investigate predictors occurring during a specific period of time prior to onset of the outcome. We benchmarked our method on simulated data and generated real-world disease trajectories using DETECT in a cohort of 145,575 individuals diagnosed with hypertension in Penn Medicine EHR for severe cardiometabolic outcomes.
RESUMEN
Genome-wide association studies (GWAS) have underrepresented individuals from non-European populations, impeding progress in characterizing the genetic architecture and consequences of health and disease traits. To address this, we present a population-stratified phenome-wide GWAS followed by a multi-population meta-analysis for 2,068 traits derived from electronic health records of 635,969 participants in the Million Veteran Program (MVP), a longitudinal cohort study of diverse U.S. Veterans genetically similar to the respective African (121,177), Admixed American (59,048), East Asian (6,702), and European (449,042) superpopulations defined by the 1000 Genomes Project. We identified 38,270 independent variants associating with one or more traits at experiment-wide P<4.6×10-11 significance; fine-mapping 6,318 signals identified from 613 traits to single-variant resolution. Among these, a third (2,069) of the associations were found only among participants genetically similar to non-European reference populations, demonstrating the importance of expanding diversity in genetic studies. Our work provides a comprehensive atlas of phenome-wide genetic associations for future studies dissecting the architecture of complex traits in diverse populations.
RESUMEN
There are currently >1.3 million human -omics samples that are publicly available. This valuable resource remains acutely underused because discovering particular samples from this ever-growing data collection remains a significant challenge. The major impediment is that sample attributes are routinely described using varied terminologies written in unstructured natural language. We propose a natural-language-processing-based machine learning approach (NLP-ML) to infer tissue and cell-type annotations for genomics samples based only on their free-text metadata. NLP-ML works by creating numerical representations of sample descriptions and using these representations as features in a supervised learning classifier that predicts tissue/cell-type terms. Our approach significantly outperforms an advanced graph-based reasoning annotation method (MetaSRA) and a baseline exact string matching method (TAGGER). Model similarities between related tissues demonstrate that NLP-ML models capture biologically-meaningful signals in text. Additionally, these models correctly classify tissue-associated biological processes and diseases based on their text descriptions alone. NLP-ML models are nearly as accurate as models based on gene-expression profiles in predicting sample tissue annotations but have the distinct capability to classify samples irrespective of the genomics experiment type based on their text metadata. Python NLP-ML prediction code and trained tissue models are available at https://github.com/krishnanlab/txt2onto .
Asunto(s)
Metadatos , Procesamiento de Lenguaje Natural , Humanos , Aprendizaje Automático , Genómica , LenguajeRESUMEN
The Penn Medicine BioBank (PMBB) is an electronic health record (EHR)-linked biobank at the University of Pennsylvania (Penn Medicine). A large variety of health-related information, ranging from diagnosis codes to laboratory measurements, imaging data and lifestyle information, is integrated with genomic and biomarker data in the PMBB to facilitate discoveries and translational science. To date, 174,712 participants have been enrolled into the PMBB, including approximately 30% of participants of non-European ancestry, making it one of the most diverse medical biobanks. There is a median of seven years of longitudinal data in the EHR available on participants, who also consent to permission to recontact. Herein, we describe the operations and infrastructure of the PMBB, summarize the phenotypic architecture of the enrolled participants, and use body mass index (BMI) as a proof-of-concept quantitative phenotype for PheWAS, LabWAS, and GWAS. The major representation of African-American participants in the PMBB addresses the essential need to expand the diversity in genetic and translational research. There is a critical need for a "medical biobank consortium" to facilitate replication, increase power for rare phenotypes and variants, and promote harmonized collaboration to optimize the potential for biological discovery and precision medicine.
RESUMEN
Biobanks facilitate genome-wide association studies (GWASs), which have mapped genomic loci across a range of human diseases and traits. However, most biobanks are primarily composed of individuals of European ancestry. We introduce the Global Biobank Meta-analysis Initiative (GBMI)-a collaborative network of 23 biobanks from 4 continents representing more than 2.2 million consented individuals with genetic data linked to electronic health records. GBMI meta-analyzes summary statistics from GWASs generated using harmonized genotypes and phenotypes from member biobanks for 14 exemplar diseases and endpoints. This strategy validates that GWASs conducted in diverse biobanks can be integrated despite heterogeneity in case definitions, recruitment strategies, and baseline characteristics. This collaborative effort improves GWAS power for diseases, benefits understudied diseases, and improves risk prediction while also enabling the nomination of disease genes and drug candidates by incorporating gene and protein expression data and providing insight into the underlying biology of human diseases and traits.