
Transfer RNA genes experience exceptionally elevated mutation rates.

Thornlow, Bryan P; Hough, Josh; Roger, Jacquelyn M; Gong, Henry; Lowe, Todd M; Corbett-Detig, Russell B.

Proc Natl Acad Sci U S A ; 115(36): 8996-9001, 2018 09 04.

Article in English | MEDLINE | ID: mdl-30127029

ABSTRACT

Transfer RNAs (tRNAs) are a central component for the biological synthesis of proteins, and they are among the most highly conserved and frequently transcribed genes in all living things. Despite their clear significance for fundamental cellular processes, the forces governing tRNA evolution are poorly understood. We present evidence that transcription-associated mutagenesis and strong purifying selection are key determinants of patterns of sequence variation within and surrounding tRNA genes in humans and diverse model organisms. Remarkably, the mutation rate at broadly expressed cytosolic tRNA loci is likely between 7 and 10 times greater than the nuclear genome average. Furthermore, evolutionary analyses provide strong evidence that tRNA genes, but not their flanking sequences, experience strong purifying selection acting against this elevated mutation rate. We also find a strong correlation between tRNA expression levels and the mutation rates in their immediate flanking regions, suggesting a simple method for estimating individual tRNA gene activity. Collectively, this study illuminates the extreme competing forces in tRNA gene evolution and indicates that mutations at tRNA loci contribute disproportionately to mutational load and have unexplored fitness consequences in human populations.

Subject(s)

Arabidopsis/genetics , Genes, Helminth , Genes, Plant , Mutation , RNA, Helminth/genetics , RNA, Plant/genetics , RNA, Transfer/genetics , Animals , Drosophila melanogaster , Mice

Leveraging electronic health records and knowledge networks for Alzheimer's disease prediction and sex-specific biological insights.

Tang, Alice S; Rankin, Katherine P; Cerono, Gabriel; Miramontes, Silvia; Mills, Hunter; Roger, Jacquelyn; Zeng, Billy; Nelson, Charlotte; Soman, Karthik; Woldemariam, Sarah; Li, Yaqiao; Lee, Albert; Bove, Riley; Glymour, Maria; Aghaeepour, Nima; Oskotsky, Tomiko T; Miller, Zachary; Allen, Isabel E; Sanders, Stephan J; Baranzini, Sergio; Sirota, Marina.

Nat Aging ; 4(3): 379-395, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38383858

ABSTRACT

Identification of Alzheimer's disease (AD) onset risk can facilitate interventions before irreversible disease progression. We demonstrate that electronic health records from the University of California, San Francisco, followed by knowledge networks (for example, SPOKE) allow for (1) prediction of AD onset, (2) prioritization of biological hypotheses, and (3) contextualization of sex dimorphism. We trained random forest models and predicted AD onset on a cohort of 749 individuals with AD and 250,545 controls with a mean area under the receiver operating characteristic of 0.72 (7 years prior) to 0.81 (1 day prior). We further harnessed matched cohort models to identify conditions with predictive power before AD onset. Knowledge networks highlight shared genes between multiple top predictors and AD (for example, APOE, ACTB, IL6 and INS). Genetic colocalization analysis supports AD association with hyperlipidemia at the APOE locus, as well as a stronger female AD association with osteoporosis at a locus near MS4A6A. We therefore show how clinical data can be utilized for early AD prediction and identification of personalized biological hypotheses.

Subject(s)

Alzheimer Disease , Male , Humans , Female , Alzheimer Disease/diagnosis , Electronic Health Records , Apolipoproteins E/genetics , San Francisco

Retrospective analysis of wildfire smoke exposure and birth weight outcomes in the San Francisco Bay Area of California.

Fernández, Anna Claire G; Basilio, Emilia; Benmarhnia, Tarik; Roger, Jacquelyn; Gaw, Stephanie L; Robinson, Joshua F; Padula, Amy M.

Environ Res Health ; 1(2): 025009, 2023 Jun 01.

Article in English | MEDLINE | ID: mdl-37324234

ABSTRACT

Despite the occurrence of wildfires quadrupling over the past four decades, the health effects associated with wildfire smoke exposures during pregnancy remain unknown. Particulate matter smaller than 2.5 µm (PM2.5) is among the major pollutants emitted in wildfire smoke. Previous studies found PM2.5 to be associated with lower birthweight; however, the relationship between wildfire-specific PM2.5 and birthweight is uncertain. Our study of 7923 singleton births in San Francisco between January 1, 2017 and March 12, 2020 examines associations between wildfire smoke exposure during pregnancy and birthweight. We linked daily estimates of wildfire-specific PM2.5 to maternal residence at the ZIP code level. We used linear and log-binomial regression to examine the relationship between wildfire smoke exposure by trimester and birthweight and adjusted for gestational age, maternal age, race/ethnicity, and educational attainment. We stratified by infant sex to examine potential effect modification. Exposure to wildfire-specific PM2.5 during the second trimester of pregnancy was associated with increased risk of large for gestational age (OR = 1.13; 95% CI: 1.03, 1.24), as was the number of days of wildfire-specific PM2.5 above 5 µg/m³ in the second trimester (OR = 1.03; 95% CI: 1.01, 1.06). We found consistent results with wildfire smoke exposure in the second trimester and increased continuous birthweight-for-gestational age z-score. Differences by infant sex were not consistent. Counter to our hypothesis, results suggest that wildfire smoke exposures are associated with increased risk for higher birthweight. We observed the strongest associations during the second trimester. These investigations should be expanded to other populations exposed to wildfire smoke and aim to identify vulnerable communities. Additional research is needed to clarify the biological mechanisms in this relationship between wildfire smoke exposure and adverse birth outcomes.

Associations with spontaneous and indicated preterm birth in a densely phenotyped EHR cohort.

Costello, Jean M; Takasuka, Hannah; Roger, Jacquelyn; Yin, Ophelia; Tang, Alice; Oskotsky, Tomiko; Sirota, Marina; Capra, John A.

medRxiv ; 2023 Nov 30.

Article in English | MEDLINE | ID: mdl-38077057

ABSTRACT

Background: Preterm birth (PTB) is the leading cause of infant mortality and follows multiple biological pathways, many of which are poorly understood. Some PTBs result from medically indicated labor following complications from hypertension and/or diabetes, while many others are spontaneous with unknown causes. Previously, investigation of potential risk factors has been limited by lack of data on maternal medical history and the difficulty of classifying PTBs as indicated or spontaneous. Here, we leverage electronic health record (EHR) data (patient health information including demographics, diagnoses, and medications) and a supplemental curated pregnancy database to overcome these limitations. Novel associations may provide new insight into the pathophysiology of PTB as well as help identify individuals who would be at risk of PTB. Methods: We quantified associations between maternal diagnoses and preterm birth using logistic regression controlling for maternal age and socioeconomic factors within a University of California, San Francisco (UCSF), EHR cohort with 10,643 births (n_term = 9692, n_spontaneous_preterm = 449, n_indicated_preterm = 418) and maternal pre-conception diagnosis phenotypes derived from International Classification of Diseases (ICD) 9 and 10 codes. Results: Eighteen conditions were significantly and robustly (False Discovery Rate (FDR) < 0.05) associated with PTBs compared to term. We discovered known (hypertension, diabetes, and chronic kidney disease) and less established (blood, cardiac, gynecological, and liver conditions) associations. Type 1 diabetes was the most significant overall association (adjusted p = 1.6 × 10⁻¹⁴, adjusted OR = 7 (95% CI 5, 12)), and the odds ratios for the significant phenotypes ranged from 3 to 13. We further carried out analysis stratified by spontaneous vs. indicated PTB. No phenotypes were significantly associated with spontaneous PTB; however, the results for indicated PTB largely recapitulated the phenotype associations with all PTBs. Conclusions: Our study underscores the limitations of approaches that combine indicated and spontaneous births together. When combined, significant associations were almost entirely driven by indicated PTBs, although our spontaneous and indicated groups were of a similar size. Investigating the spontaneous population has the potential to reveal new pathways and understanding of the heterogeneity of PTB.

Leveraging electronic health records to identify risk factors for recurrent pregnancy loss across two medical centers: a case-control study.

Roger, Jacquelyn; Xie, Feng; Costello, Jean; Tang, Alice; Liu, Jay; Oskotsky, Tomiko; Woldemariam, Sarah; Kosti, Idit; Le, Brian; Snyder, Michael P; Giudice, Linda C; Torgerson, Dara; Shaw, Gary M; Stevenson, David K; Rajkovic, Aleksandar; Glymour, M Maria; Aghaeepour, Nima; Cakmak, Hakan; Lathi, Ruth B; Sirota, Marina.

Res Sq ; 2023 Mar 31.

Article in English | MEDLINE | ID: mdl-36993325

ABSTRACT

Recurrent pregnancy loss (RPL), defined as 2 or more pregnancy losses, affects 5-6% of ever-pregnant individuals. Approximately half of these cases have no identifiable explanation. To generate hypotheses about RPL etiologies, we implemented a case-control study comparing the history of over 1,600 diagnoses between RPL and live-birth patients, leveraging the University of California San Francisco (UCSF) and Stanford University electronic health record databases. In total, our study included 8,496 RPL (UCSF: 3,840, Stanford: 4,656) and 53,278 Control (UCSF: 17,259, Stanford: 36,019) patients. Menstrual abnormalities and infertility-associated diagnoses were significantly positively associated with RPL in both medical centers. Age-stratified analysis revealed that the majority of RPL-associated diagnoses had higher odds ratios for patients <35 compared with 35+ patients. While Stanford results were sensitive to control for healthcare utilization, UCSF results were stable across analyses with and without utilization. Intersecting significant results between medical centers was an effective filter to identify associations that are robust across center-specific utilization patterns.

Development and testing of a polygenic risk score for breast cancer aggressiveness.

Shieh, Yiwey; Roger, Jacquelyn; Yau, Christina; Wolf, Denise M; Hirst, Gillian L; Swigart, Lamorna Brown; Huntsman, Scott; Hu, Donglei; Nierenberg, Jovia L; Middha, Pooja; Heise, Rachel S; Shi, Yushu; Kachuri, Linda; Zhu, Qianqian; Yao, Song; Ambrosone, Christine B; Kwan, Marilyn L; Caan, Bette J; Witte, John S; Kushi, Lawrence H; 't Veer, Laura van; Esserman, Laura J; Ziv, Elad.

NPJ Precis Oncol ; 7(1): 42, 2023 May 15.

Article in English | MEDLINE | ID: mdl-37188791

ABSTRACT

Aggressive breast cancers portend a poor prognosis, but current polygenic risk scores (PRSs) for breast cancer do not reliably predict aggressive cancers. Aggressiveness can be effectively recapitulated using tumor gene expression profiling. Thus, we sought to develop a PRS for the risk of recurrence score weighted on proliferation (ROR-P), an established prognostic signature. Using 2363 breast cancers with tumor gene expression data and single nucleotide polymorphism (SNP) genotypes, we examined the associations between ROR-P and known breast cancer susceptibility SNPs using linear regression models. We constructed PRSs based on varying p-value thresholds and selected the optimal PRS based on model r² in 5-fold cross-validation. We then used Cox proportional hazards regression to test the ROR-P PRS's association with breast cancer-specific survival in two independent cohorts totaling 10,196 breast cancers and 785 events. In meta-analysis of these cohorts, higher ROR-P PRS was associated with worse survival, HR per SD = 1.13 (95% CI 1.06-1.21, p = 4.0 × 10⁻⁴). The ROR-P PRS had a similar magnitude of effect on survival as a comparator PRS for estrogen receptor (ER)-negative versus positive cancer risk (PRS_ER-/ER+). Furthermore, its effect was minimally attenuated when adjusted for PRS_ER-/ER+, suggesting that the ROR-P PRS provides additional prognostic information beyond ER status. In summary, we used integrated analysis of germline SNP and tumor gene expression data to construct a PRS associated with aggressive tumor biology and worse survival. These findings could potentially enhance risk stratification for breast cancer screening and prevention.

Data-driven longitudinal characterization of neonatal health and morbidity.

De Francesco, Davide; Reiss, Jonathan D; Roger, Jacquelyn; Tang, Alice S; Chang, Alan L; Becker, Martin; Phongpreecha, Thanaphong; Espinosa, Camilo; Morin, Susanna; Berson, Eloïse; Thuraiappah, Melan; Le, Brian L; Ravindra, Neal G; Payrovnaziri, Seyedeh Neelufar; Mataraso, Samson; Kim, Yeasul; Xue, Lei; Rosenstein, Melissa G; Oskotsky, Tomiko; Maric, Ivana; Gaudilliere, Brice; Carvalho, Brendan; Bateman, Brian T; Angst, Martin S; Prince, Lawrence S; Blumenfeld, Yair J; Benitz, William E; Fuerch, Janene H; Shaw, Gary M; Sylvester, Karl G; Stevenson, David K; Sirota, Marina; Aghaeepour, Nima.

Sci Transl Med ; 15(683): eadc9854, 2023 02 15.

Article in English | MEDLINE | ID: mdl-36791208

ABSTRACT

Although prematurity is the single largest cause of death in children under 5 years of age, the current definition of prematurity, based on gestational age, lacks the precision needed for guiding care decisions. Here, we propose a longitudinal risk assessment for adverse neonatal outcomes in newborns based on a deep learning model that uses electronic health records (EHRs) to predict a wide range of outcomes over a period starting shortly before conception and ending months after birth. By linking the EHRs of the Lucile Packard Children's Hospital and the Stanford Healthcare Adult Hospital, we developed a cohort of 22,104 mother-newborn dyads delivered between 2014 and 2018. Maternal and newborn EHRs were extracted and used to train a multi-input multitask deep learning model, featuring a long short-term memory neural network, to predict 24 different neonatal outcomes. An additional cohort of 10,250 mother-newborn dyads delivered at the same Stanford Hospitals from 2019 to September 2020 was used to validate the model. Areas under the receiver operating characteristic curve at delivery exceeded 0.9 for 10 of the 24 neonatal outcomes considered and were between 0.8 and 0.9 for 7 additional outcomes. Moreover, comprehensive association analysis identified multiple known associations between various maternal and neonatal features and specific neonatal outcomes. This study used linked EHRs from more than 30,000 mother-newborn dyads and would serve as a resource for the investigation and prediction of neonatal outcomes. An interactive website is available for independent investigators to leverage this unique dataset: https://maternal-child-health-associations.shinyapps.io/shiny_app/.

Subject(s)

Infant Health , Infant, Premature , Adult , Child , Infant, Newborn , Humans , Child, Preschool , Gestational Age , Morbidity , Risk Assessment

Translational Bioinformatics to Enable Precision Medicine for All: Elevating Equity across Molecular, Clinical, and Digital Realms.

Tang, Alice; Woldemariam, Sarah; Roger, Jacquelyn; Sirota, Marina.

Yearb Med Inform ; 31(1): 106-115, 2022 Aug.

Article in English | MEDLINE | ID: mdl-36463867

ABSTRACT

OBJECTIVES: Over the past few years, challenges from the pandemic have led to an explosion of data sharing and algorithmic development efforts in the areas of molecular measurements, clinical data, and digital health. We aim to characterize and describe recent advanced computational approaches in translational bioinformatics across these domains in the context of issues or progress related to equity and inclusion. METHODS: We conducted a literature assessment of the trends and approaches in translational bioinformatics in the past few years. RESULTS: We present a review of recent computational approaches across molecular, clinical, and digital realms. We discuss applications of phenotyping, disease subtype characterization, predictive modeling, biomarker discovery, and treatment selection. We consider these methods and applications through the lens of equity and inclusion in biomedicine. CONCLUSION: Equity and inclusion should be incorporated at every step of translational bioinformatics projects, including project design, data collection, model creation, and clinical implementation. These considerations, coupled with the exciting breakthroughs in big data and machine learning, are pivotal to reach the goals of precision medicine for all.

Subject(s)

Biomedical Research , Precision Medicine , Computational Biology , Big Data , Machine Learning

The case for using mapped exonic non-duplicate reads when reporting RNA-sequencing depth: examples from pediatric cancer datasets.

Beale, Holly C; Roger, Jacquelyn M; Cattle, Matthew A; McKay, Liam T; Thompson, Drew K A; Learned, Katrina; Lyle, A Geoffrey; Kephart, Ellen T; Currie, Rob; Lam, Du Linh; Sanders, Lauren; Pfeil, Jacob; Vivian, John; Bjork, Isabel; Salama, Sofie R; Haussler, David; Vaske, Olena M.

Gigascience ; 10(3), 2021 03 13.

Article in English | MEDLINE | ID: mdl-33712853

ABSTRACT

BACKGROUND: The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) is dependent on the sequencing depth. While unmapped or non-exonic reads do not contribute to gene expression quantification, duplicate reads contribute to the quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of reproducibility of RNA-Seq datasets used for gene expression analysis. FINDINGS: In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1-77% of all reads (median [IQR], 3% [3-6%]); duplicate reads constitute 3-100% of mapped reads (median [IQR], 27% [13-43%]); and non-exonic reads constitute 4-97% of mapped, non-duplicate reads (median [IQR], 25% [16-37%]). MEND reads constitute 0-79% of total reads (median [IQR], 50% [30-61%]). CONCLUSIONS: Because not all reads in an RNA-Seq dataset are informative for reproducibility of gene expression measurements and the fraction of reads that are informative varies, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth.
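
The nested fractions in the FINDINGS above (unmapped as a share of all reads, duplicates as a share of mapped reads, non-exonic as a share of mapped non-duplicate reads) suggest how a MEND read count relates to total depth. The following is a minimal sketch of that arithmetic only, not the authors' script or Docker image; the function name `mend_reads` and its parameters are hypothetical.

```python
def mend_reads(total_reads: int,
               unmapped_frac: float,
               duplicate_frac: float,
               non_exonic_frac: float) -> float:
    """Estimate mapped, exonic, non-duplicate (MEND) reads.

    Hypothetical helper: each fraction is relative to the reads that
    survive the previous filter, mirroring how the abstract reports them.
    """
    for name, frac in (("unmapped_frac", unmapped_frac),
                       ("duplicate_frac", duplicate_frac),
                       ("non_exonic_frac", non_exonic_frac)):
        if not 0.0 <= frac <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {frac}")

    mapped = total_reads * (1.0 - unmapped_frac)        # drop unmapped reads
    non_duplicate = mapped * (1.0 - duplicate_frac)     # drop duplicates among mapped
    return non_duplicate * (1.0 - non_exonic_frac)      # drop non-exonic among the rest


if __name__ == "__main__":
    # Illustrative input only: a 100M-read dataset with the median fractions
    # quoted in the abstract (3% unmapped, 27% duplicate, 25% non-exonic).
    estimate = mend_reads(100_000_000, 0.03, 0.27, 0.25)
    print(f"Estimated MEND reads: {estimate:,.0f}")
```

Because medians of separate distributions do not compose exactly, plugging in the three median fractions gives only a rough figure for a typical dataset; per-sample counts from the actual alignment files are what the abstract recommends reporting.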

Subject(s)

Neoplasms , RNA , Child , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Neoplasms/genetics , Reproducibility of Results , Sequence Analysis, RNA , Exome Sequencing
