ABSTRACT
Venous thromboembolism (VTE) is a significant contributor to morbidity and mortality, with large disparities in incidence rates between Black and White Americans. Polygenic risk scores (PRSs) limited to variants discovered in genome-wide association studies in European-ancestry samples can identify European-ancestry individuals at high risk of VTE. However, there is limited evidence on whether high-dimensional PRSs constructed using more sophisticated methods and more diverse training data can enhance predictive ability and utility across diverse populations. We developed PRSs for VTE using summary statistics from the International Network against Venous Thrombosis (INVENT) consortium genome-wide association studies meta-analyses of European- (71 771 cases and 1 059 740 controls) and African-ancestry samples (7482 cases and 129 975 controls). We used LDpred2 and PRS-CSx to construct ancestry-specific and multi-ancestry PRSs and evaluated their performance in an independent European- (6781 cases and 103 016 controls) and African-ancestry sample (1385 cases and 12 569 controls). Multi-ancestry PRSs with weights tuned in European-ancestry samples slightly outperformed ancestry-specific PRSs in European-ancestry test samples (e.g. the area under the receiver operating characteristic curve [AUC] was 0.609 for PRS-CSx_combinedEUR and 0.608 for PRS-CSxEUR [P = 0.00029]). Multi-ancestry PRSs with weights tuned in African-ancestry samples also outperformed ancestry-specific PRSs in African-ancestry test samples (PRS-CSxAFR: AUC = 0.58, PRS-CSx_combined AFR: AUC = 0.59), although this difference was not statistically significant (P = 0.34). The highest fifth percentile of the best-performing PRS was associated with 1.9-fold and 1.68-fold increased risk for VTE among European- and African-ancestry subjects, respectively, relative to those in the middle stratum.
These findings suggest that the multi-ancestry PRS might be used to improve performance across diverse populations to identify individuals at highest risk for VTE.
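For readers less familiar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from risk scores, using the rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen case has a higher score than a randomly chosen control. The function name and the toy scores are illustrative assumptions and are not taken from the study.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Fraction of case/control pairs in which the case scores higher.

    Ties count as half a win. This is the pairwise (Mann-Whitney)
    formulation of the area under the ROC curve.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy, made-up polygenic scores (hypothetical, not study data).
toy_cases = [0.9, 0.4, 0.7]
toy_controls = [0.1, 0.5, 0.3]
print(auc(toy_cases, toy_controls))
```

The pairwise loop is quadratic in sample size, so for cohorts as large as those described here a rank-based implementation would be the more practical choice.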
Subject(s)
Genetic Risk Score , Venous Thromboembolism , Female , Humans , Male , Black or African American/genetics , Case-Control Studies , Genetic Predisposition to Disease , Genome-Wide Association Study , Polymorphism, Single Nucleotide , Venous Thromboembolism/genetics , Venous Thromboembolism/epidemiology , White/genetics
ABSTRACT
We describe the Mitochondrial and Nuclear rRNA fragment database (MINRbase), a knowledge repository aimed at facilitating the study of ribosomal RNA-derived fragments (rRFs). MINRbase provides interactive access to the profiles of 130 238 expressed rRFs arising from the four human nuclear rRNAs (18S, 5.8S, 28S, 5S), two mitochondrial rRNAs (12S, 16S) or four spacers of 45S pre-rRNA. We compiled these profiles by analyzing 11 632 datasets, including the GEUVADIS and The Cancer Genome Atlas (TCGA) repositories. MINRbase offers a user-friendly interface that lets researchers issue complex queries based on one or more criteria, such as parental rRNA identity, nucleotide sequence, rRF minimum abundance and metadata keywords (e.g. tissue type, disease). A 'summary' page for each rRF provides a granular breakdown of its expression by tissue type, disease, sex, ancestry and other variables; it also allows users to create publication-ready plots at the click of a button. MINRbase has already allowed us to generate support for three novel observations: the internal spacers of 45S are prolific producers of abundant rRFs; many abundant rRFs straddle the known boundaries of rRNAs; rRF production is regimented and depends on 'personal attributes' (sex, ancestry) and 'context' (tissue type, tissue state, disease). MINRbase is available at https://cm.jefferson.edu/MINRbase/.
Subject(s)
Databases, Nucleic Acid , RNA, Mitochondrial , RNA, Ribosomal , Humans , Base Sequence , Mitochondria/genetics , Ribosomes , RNA, Mitochondrial/genetics , RNA, Ribosomal/genetics
ABSTRACT
BACKGROUND: MicroRNA isoforms (isomiRs), tRNA-derived fragments (tRFs), and rRNA-derived fragments (rRFs) represent most of the small non-coding RNAs (sncRNAs) found in cells. Members of these three classes modulate messenger RNA (mRNA) and protein abundance and are dysregulated in diseases. Experimental studies to date have assumed that the subcellular distribution of these molecules is well-understood, independent of cell type, and the same for all isoforms of a sncRNA. RESULTS: We tested these assumptions by investigating the subcellular distribution of isomiRs, tRFs, and rRFs in biological replicates from three cell lines from the same tissue and same-sex donors that model the same cancer subtype. In each cell line, we profiled the isomiRs, tRFs, and rRFs in the nucleus, cytoplasm, whole mitochondrion (MT), mitoplast (MP), and whole cell. Using a rigorous mathematical model we developed, we accounted for cross-fraction contamination and technical errors and adjusted the measured abundances accordingly. Analyses of the adjusted abundances show that isomiRs, tRFs, and rRFs exhibit complex patterns of subcellular distributions. These patterns depend on each sncRNA's exact sequence and the cell type. Even in the same cell line, isoforms of the same sncRNA whose sequences differ by a few nucleotides (nts) can have different subcellular distributions. CONCLUSIONS: SncRNAs with similar sequences have different subcellular distributions within and across cell lines, suggesting that each isoform could have a different function. Future computational and experimental studies of isomiRs, tRFs, and rRFs will need to distinguish among each molecule's various isoforms and account for differences in each isoform's subcellular distribution in the cell line at hand. 
While the findings add to a growing body of evidence that isomiRs, tRFs, rRFs, tRNAs, and rRNAs follow complex intracellular trafficking rules, further investigation is needed to exclude alternative explanations for the observed subcellular distribution of sncRNAs.
Subject(s)
MicroRNAs , RNA, Ribosomal , RNA, Transfer , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Transfer/genetics , RNA, Transfer/metabolism , Humans , RNA, Ribosomal/genetics , RNA, Ribosomal/metabolism , Base Sequence , RNA Isoforms/genetics , Cell Line, Tumor , Cell Line
ABSTRACT
BACKGROUND: The advent of next generation sequencing (NGS) has allowed the discovery of short and long non-coding RNAs (ncRNAs) in an unbiased manner using reverse genetics approaches, enabling the discovery of multiple categories of ncRNAs and characterization of the way their expression is regulated. We previously showed that the identities and abundances of microRNA isoforms (isomiRs) and transfer RNA-derived fragments (tRFs) are tightly regulated, and that they depend on a person's sex and population origin, as well as on tissue type, tissue state, and disease type. Here, we characterize the regulation and distribution of fragments derived from ribosomal RNAs (rRNAs). rRNAs form a group that includes four (5S, 5.8S, 18S, 28S) rRNAs encoded by the human nuclear genome and two (12S, 16S) by the mitochondrial genome. rRNAs constitute the most abundant RNA type in eukaryotic cells. RESULTS: We analyzed rRNA-derived fragments (rRFs) across 434 transcriptomic datasets obtained from lymphoblastoid cell lines (LCLs) derived from healthy participants of the 1000 Genomes Project. The 434 datasets represent five human populations and both sexes. We examined each of the six rRNAs and their respective rRFs, and did so separately for each population and sex. Our analysis shows that all six rRNAs produce rRFs with unique identities, normalized abundances, and lengths. The rRFs arise from the 5'-end (5'-rRFs), the interior (i-rRFs), and the 3'-end (3'-rRFs) or straddle the 5' or 3' terminus of the parental rRNA (x-rRFs). Notably, a large number of rRFs are produced in a population-specific or sex-specific manner. Preliminary evidence suggests that rRF production is also tissue-dependent. Of note, we find that rRF production is not affected by the identity of the processing laboratory or the library preparation kit. 
CONCLUSIONS: Our findings suggest that rRFs are produced in a regimented manner by currently unknown processes that are influenced by both ubiquitous as well as population-specific and sex-specific factors. The properties of rRFs mirror the previously reported properties of isomiRs and tRFs and have implications for the study of homeostasis and disease.
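The four rRF categories named in the abstract above (5'-rRFs, i-rRFs, 3'-rRFs, x-rRFs) can be illustrated with a small coordinate-based classifier. This is only a sketch under stated assumptions: the function name, the use of 1-based inclusive coordinates on a shared reference, and the handling of a full-length fragment are hypothetical choices, not the authors' pipeline.

```python
def classify_rrf(frag_start: int, frag_end: int,
                 rrna_start: int, rrna_end: int) -> str:
    """Label a fragment by its position relative to its parental rRNA.

    All coordinates are assumed to be 1-based and inclusive on the same
    reference sequence (for example, the rRNA plus flanking regions).
    """
    if frag_start > frag_end or rrna_start > rrna_end:
        raise ValueError("start must not exceed end")
    if frag_end < rrna_start or frag_start > rrna_end:
        raise ValueError("fragment does not overlap the rRNA")
    # Fragment extends past either terminus of the rRNA: it straddles it.
    if frag_start < rrna_start or frag_end > rrna_end:
        return "x-rRF"
    # Assumption: a fragment covering the entire rRNA is labeled 5'-rRF.
    if frag_start == rrna_start:
        return "5'-rRF"
    if frag_end == rrna_end:
        return "3'-rRF"
    return "i-rRF"


# Hypothetical rRNA occupying positions 101..200 of a longer reference.
print(classify_rrf(101, 120, 101, 200))
print(classify_rrf(95, 115, 101, 200))
```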
Subject(s)
MicroRNAs/genetics , RNA, Ribosomal/genetics , Aged , Cell Line , Female , Humans , Male , MicroRNAs/metabolism , Middle Aged , RNA, Ribosomal/metabolism , Sex Factors , Transcriptome
ABSTRACT
Polycystic ovary syndrome (PCOS) is a prevalent endocrine disorder in women, often accompanied by various symptoms including significant pain, such as dysmenorrhea, abdominal, and pelvic pain, which remains underexplored. This retrospective study examines electronic health records (EHR) data to assess the prevalence of pain in women with PCOS. Conducted on May 29, 2024, using data from 120 Health Care Organizations within the TriNetX Global Network, the study involved 76,859,666 women from diverse racial backgrounds. The analysis focused on the prevalence of pain among women with PCOS, both overall and in those prescribed PCOS-related medications. Relative risk ratios (RR) were calculated for future health outcomes and stratified by self-reported race. The study found that 19.21% of women with PCOS experienced pain, with the highest prevalence among Black or African American (32.11%) and White (30.75%) populations. Both the PCOS cohort and the PCOS and Pain cohort exhibited increased RR for various health conditions, with significant differences noted across racial groups for infertility, ovarian cysts, obesity, and respiratory diseases. Additionally, women with PCOS who were treated with PCOS-related medications showed a decrease in pain diagnoses following treatment. In conclusion, this study highlights the critical need to address pain in the diagnosis and management of PCOS due to its significant impact on patient health outcomes. Impact Statement: Insufficient data exist on the prevalence of pain in women with a PCOS diagnosis and its associations with future health outcomes. Among 444,348 women with PCOS in the TriNetX Global Network, 19.21% have dysmenorrhea, abdominal, and pelvic pain. Women with PCOS and Pain are at increased risk for developing ovarian cysts, infertility, T2D, and fatty liver disease and are at further risk when stratified by self-reported race groups.
ABSTRACT
CONTEXT: Patients with PCOS are at high risk of depression, anxiety, and metabolic syndrome (MetSyn), a key predictor of cardiovascular disease. The impact of depression and/or anxiety on MetSyn is unknown in this population. OBJECTIVE: To compare the risk of developing MetSyn in patients with PCOS with and without a history of depression and/or anxiety. DESIGN: Retrospective longitudinal cohort study (2008-2022) with median follow-up of 7 years. SETTING: Tertiary care ambulatory practice. PATIENTS OR OTHER PARTICIPANTS: Patients with hyperandrogenic PCOS and at least 2 evaluations for MetSyn ≥3 years apart (n=321). INTERVENTION(S): N/A. MAIN OUTCOME MEASURE(S): The primary outcome was risk of developing MetSyn. We hypothesized that this risk would be higher with a history of depression and/or anxiety. RESULTS: At the first visit, 33.0% had a history of depression and/or anxiety, with a third prescribed antidepressants or anxiolytics. Depression and/or anxiety increased risk of developing MetSyn during the study period (adjusted hazard ratio [aHR] 1.45, 95% CI 1.02-2.06, p=0.04) with an incidence of MetSyn of 75.3 compared to 47.6 cases per 100 person-years among those without (p=0.002). This was primarily driven by depression (aHR 1.56, 95% CI 1.10-2.20, p=0.01). CONCLUSIONS: Patients with PCOS and depression and/or anxiety have a high risk of developing MetSyn, with a stronger association between depression and MetSyn. Our findings highlight the urgent need for guideline-directed screening for depression and anxiety at time of diagnosis of PCOS as well as screening at subsequent visits to facilitate risk stratification for metabolic monitoring and early intervention in this high-risk group.
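The incidence figures above are expressed as cases per 100 person-years. As a minimal sketch of that unit, assuming only the standard definition (events divided by total follow-up time, scaled by 100), the calculation looks like this; the function name and the numbers are hypothetical and are not study data.

```python
def incidence_per_100_person_years(events: int, person_years: float) -> float:
    """Incidence rate scaled to 100 person-years of follow-up."""
    if events < 0:
        raise ValueError("events must be non-negative")
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 100.0 * events / person_years


# Hypothetical example: 30 new cases observed over 60 person-years.
rate = incidence_per_100_person_years(30, 60.0)
print(rate)
```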
ABSTRACT
Metabolic dysfunction-associated Fatty Liver Disease (MAFLD) has emerged as one of the leading cardiometabolic diseases. Friend of GATA2 (FOG2) is a transcriptional co-regulator that has been shown to regulate hepatic lipid metabolism and accumulation. Using meta-analysis from several different biobank datasets, we identified a coding variant of FOG2 (rs28374544, A1969G, S657G) predominantly found in individuals of African ancestry (minor allele frequency~20%), which is associated with liver failure/cirrhosis phenotype and liver injury. To gain insight into potential pathways associated with this variant, we interrogated a previously published genomics dataset of 38 human induced pluripotent stem cell (iPSCs) lines differentiated into hepatocytes (iHeps). Using Differential Gene Expression Analysis and Gene Set Enrichment Analysis, we identified the mTORC1 pathway as differentially regulated between iHeps from individuals with and without the variant. Transient lipid-based transfections were performed on the human hepatoma cell line (Huh7) using wild-type FOG2 and FOG2S657G and demonstrated that FOG2S657G increased mTORC1 signaling, de novo lipogenesis, and cellular triglyceride synthesis and mass. In addition, we observed a significant downregulation of oxidative phosphorylation in FOG2S657G cells in fatty acid-loaded cells but not untreated cells, suggesting that FOG2S657G may also reduce fatty acid to promote lipid accumulation. Taken together, our multi-pronged approach suggests a model whereby the FOG2S657G may promote MAFLD through mTORC1 activation, increased de novo lipogenesis, and lipid accumulation. Our results provide insights into the molecular mechanisms by which FOG2S657G may affect the complex molecular landscape underlying MAFLD.
Subject(s)
DNA-Binding Proteins , Mechanistic Target of Rapamycin Complex 1 , Signal Transduction , Transcription Factors , Humans , Mechanistic Target of Rapamycin Complex 1/genetics , Mechanistic Target of Rapamycin Complex 1/metabolism , Signal Transduction/genetics , Transcription Factors/genetics , Transcription Factors/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Hepatocytes/metabolism , Polymorphism, Single Nucleotide , Induced Pluripotent Stem Cells/metabolism , Lipid Metabolism/genetics , Cell Line, Tumor , Genotype , Liver Diseases/genetics , Liver Diseases/metabolism , Liver Diseases/pathology
ABSTRACT
Venous thromboembolism (VTE) is a significant contributor to morbidity and mortality, with large disparities in incidence rates between Black and White Americans. Polygenic risk scores (PRSs) limited to variants discovered in genome-wide association studies in European-ancestry samples can identify European-ancestry individuals at high risk of VTE. However, there is limited evidence on whether high-dimensional PRS constructed using more sophisticated methods and more diverse training data can enhance the predictive ability and their utility across diverse populations. We developed PRSs for VTE using summary statistics from the International Network against Venous Thrombosis (INVENT) consortium GWAS meta-analyses of European- (71,771 cases and 1,059,740 controls) and African-ancestry samples (7,482 cases and 129,975 controls). We used LDpred2 and PRSCSx to construct ancestry-specific and multi-ancestry PRSs and evaluated their performance in an independent European- (6,261 cases and 88,238 controls) and African-ancestry sample (1,385 cases and 12,569 controls). Multi-ancestry PRSs with weights tuned in European- and African-ancestry samples, respectively, outperformed ancestry-specific PRSs in European- (PRSCSXEUR: AUC=0.61 (0.60, 0.61), PRSCSX_combinedEUR: AUC=0.61 (0.60, 0.62)) and African-ancestry test samples (PRSCSXAFR: AUC=0.58 (0.57, 0.6), PRSCSX_combined AFR: AUC=0.59 (0.57, 0.60)). The highest fifth percentile of the best-performing PRS was associated with 1.9-fold and 1.68-fold increased risk for VTE among European- and African-ancestry subjects, respectively, relative to those in the middle stratum. These findings suggest that the multi-ancestry PRS may be used to identify individuals at highest risk for VTE and provide guidance for the most effective treatment strategy across diverse populations.
ABSTRACT
Transfer RNA-derived fragments (tRFs) are noncoding RNAs that arise from either mature transfer RNAs (tRNAs) or their precursors. One important category of tRFs comprises the tRNA halves, which are generated through cleavage at the anticodon. A given tRNA typically gives rise to several co-expressed 5'-tRNA halves (5'-tRHs) that differ in the location of their 3' ends. These 5'-tRHs, even though distinct, have traditionally been treated as indistinguishable from one another due to their near-identical sequences and lengths. We focused on co-expressed 5'-tRHs that arise from the same tRNA and systematically examined their exact sequences and abundances across 10 different human tissues. To this end, we manually curated and analyzed several hundred human RNA-seq datasets from NCBI's Sequence Read Archive (SRA). We grouped datasets from the same tissue into their own collection and examined each group separately. We found that a given tRNA produces different groups of co-expressed 5'-tRHs in different tissues, different cell lines, and different diseases. Importantly, the co-expressed 5'-tRHs differ in their sequences, absolute abundances, and relative abundances, even among tRNAs with near-identical sequences from the same isodecoder or isoacceptor group. The findings suggest that co-expressed 5'-tRHs that are produced from the same tRNA or closely related tRNAs have distinct, context-dependent roles. Moreover, our analyses show that cell lines modeling the same tissue type and disease may not be interchangeable when it comes to experimenting with tRFs.
ABSTRACT
Genome-wide association studies (GWAS) have yielded significant insights into the genetic architecture of myocardial infarction (MI), although studies in non-European populations are still lacking. Saudi Arabian cohorts offer an opportunity to discover novel genetic variants impacting disease risk due to a high rate of consanguinity. Genome-wide genotyping (GWG), imputation and GWAS followed by meta-analysis were performed based on two independent Saudi Arabian studies comprising 3950 MI patients and 2324 non-MI controls. Meta-analyses were then performed with these two Saudi MI studies and the CardioGRAMplusC4D and UK Biobank GWAS as controls. Meta-analyses of the two Saudi MI studies resulted in 17 SNPs with genome-wide significance. Meta-analyses of all 4 studies revealed 66 loci with genome-wide significance levels of p < 5 × 10⁻⁸. All of these variants, except rs2764203, have previously been reported as MI-associated loci or to have high linkage disequilibrium with known loci. One SNP association in Shisa family member 5 (SHISA5) (rs11707229) was evident at a much higher frequency in the Saudi MI populations (> 12% MAF). In conclusion, our results replicated many MI associations, whereas in Saudi-only GWAS (meta-analyses), several new loci were implicated that require future validation and functional analyses.
Subject(s)
Genome-Wide Association Study , Myocardial Infarction , Humans , Genome-Wide Association Study/methods , Saudi Arabia , Genotype , Myocardial Infarction/genetics , Polymorphism, Single Nucleotide , Genetic Predisposition to Disease
ABSTRACT
Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of many complex diseases. Regardless of the context, the practical utility of this information ultimately depends upon the quality of the data used for statistical analyses. Quality control (QC) procedures for GWAS are constantly evolving. Here, we enumerate some of the challenges in QC of genotyped GWAS data and describe the approaches involving genotype imputation of a sample dataset along with post-imputation quality assurance, thereby minimizing potential bias and error in GWAS results. We discuss common issues associated with QC of the GWAS data (genotyped and imputed), including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We provide detailed guidelines along with a sample dataset to suggest current best practices and discuss areas of ongoing and future research. © 2022 Wiley Periodicals LLC.
Subject(s)
Genome-Wide Association Study , Research Design , Humans , Quality Control , Genotype , Sex Chromosome Aberrations
ABSTRACT
Nonalcoholic fatty liver disease is common and highly heritable. Genetic studies of hepatic fat have not sufficiently addressed non-European and rare variants. In a medical biobank, we quantitate hepatic fat from clinical computed tomography (CT) scans via deep learning in 10,283 participants with whole-exome sequences available. We conduct exome-wide associations of single variants and rare predicted loss-of-function (pLOF) variants with CT-based hepatic fat and perform cross-modality replication in the UK Biobank (UKB) by linking whole-exome sequences to MRI-based hepatic fat. We confirm single variants previously associated with hepatic fat and identify several additional variants, including two (FGD5 H600Y and CITED2 S198_G199del) that replicated in UKB. A burden of rare pLOF variants in LMF2 is associated with increased hepatic fat and replicates in UKB. Quantitative phenotypes generated from clinical imaging studies and intersected with genomic data in medical biobanks have the potential to identify molecular pathways associated with human traits and disease.
Subject(s)
Exome , Non-alcoholic Fatty Liver Disease , Humans , Exome/genetics , Biological Specimen Banks , Phenotype , Tomography, X-Ray Computed , Non-alcoholic Fatty Liver Disease/diagnostic imaging , Non-alcoholic Fatty Liver Disease/genetics , Repressor Proteins/genetics , Trans-Activators/genetics
ABSTRACT
We sought to determine whether commercial quantitative polymerase chain reaction (qPCR) methods are capable of distinguishing isomiRs: variants of mature microRNAs (miRNAs) with sequence endpoint differences. We used two commercially available miRNA qPCR methods to quantify miR-21-5p in both synthetic and real cell contexts. We find that although these miRNA qPCR methods possess high sensitivity for specific sequences, they also pick up background signals from closely related isomiRs, which influences the reliable quantification of individual isomiRs. We conclude that these methods do not possess the requisite specificity for reliable isomiR quantification.