ABSTRACT
Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding disease mechanisms, and it requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform that combines data about genes, phenotypes, and diseases across species. Monarch's APIs provide access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease in diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system to a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we extended Monarch's analytic tools with a customized plugin for OpenAI's ChatGPT that increases the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art large language models. The resources of the Monarch Initiative can be found at monarchinitiative.org, and the corresponding code repository at github.com/monarch-initiative/monarch-app.
Subject(s)
Databases, Factual , Disease , Genes , Phenotype , Humans , Internet , Databases, Factual/standards , Software , Genes/genetics , Disease/genetics
ABSTRACT
High expression of the oncoprotein Myc has been linked to poor outcome in human tumors. Although MYC gene amplification and translocations have been observed, these events account for Myc overexpression in only a subset of human tumors. Myc expression is controlled in part by its protein stability, which can be regulated by phosphorylation at threonine 58 (T58) and serine 62 (S62). We now report that Myc protein stability is increased in a number of breast cancer cell lines and that this correlates with increased phosphorylation at S62 and decreased phosphorylation at T58. Moreover, we find the same shift in phosphorylation in primary breast cancers. The signaling cascade that controls phosphorylation at T58 and S62 is coordinated by the scaffold protein Axin1. We therefore examined Axin1 in breast cancer and report decreased AXIN1 expression and a shift in the ratio of expression of two naturally occurring AXIN1 splice variants. We demonstrate that this contributes to increased Myc protein stability, altered phosphorylation at S62 and T58, and increased oncogenic activity of Myc in breast cancer. Thus, our results reveal an important mode of Myc activation in human breast cancer and a mechanism of Myc deregulation that provides new insight into how the Axin1 tumor suppressor is inactivated in breast cancer.
Subject(s)
Axin Protein/metabolism , Breast Neoplasms/metabolism , Proto-Oncogene Proteins c-myc/metabolism , Alternative Splicing/genetics , Animals , Axin Protein/genetics , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Cell Line, Tumor , Female , Gene Expression Regulation, Neoplastic , Humans , Mice , Phosphorylation , Phosphoserine/metabolism , Protein Stability
ABSTRACT
BACKGROUND: The causes and symptoms of long COVID are poorly understood, and it is challenging to predict whether a given COVID-19 patient will go on to develop long COVID. METHODS: We used electronic health record (EHR) data from the National COVID Cohort Collaborative (N3C) to predict the incidence of long COVID. We trained two machine learning (ML) models: logistic regression (LR) and random forest (RF). Features used to train the predictors included symptoms and drugs ordered during acute infection, measures of COVID-19 treatment, pre-COVID comorbidities, and demographic information. We assigned the 'long COVID' label to patients diagnosed with the U09.9 ICD-10-CM code. The cohorts included patients with (a) EHRs reported by data partners that use the U09.9 ICD-10-CM code and (b) at least one EHR in each feature category. We analysed three cohorts: all patients (n = 2,190,579; diagnosed with long COVID = 17,036), inpatients (n = 149,319; long COVID = 3,295), and outpatients (n = 2,041,260; long COVID = 13,741). FINDINGS: The LR and RF models yielded median AUROCs of 0.76 and 0.75, respectively. An ablation study revealed that drugs had the highest influence on the prediction task. The SHAP method identified age, gender, cough, fatigue, albuterol, obesity, diabetes, and chronic lung disease as explanatory features. Models trained on data from one N3C partner and tested on data from the other partners had an average AUROC of 0.75. INTERPRETATION: ML-based classification using EHR information from the acute infection period is effective in predicting long COVID. SHAP methods identified important features for prediction, and cross-site analysis demonstrated the generalizability of the proposed methodology. FUNDING: NCATS U24 TR002306, NCATS UL1 TR003015, Axle Informatics Subcontract: NCATS-P00438-B, NIH/NIDDK/OD, PSR2015-1720GVALE_01, G43C22001320007, and Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231.
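To make the AUROC values reported above concrete, the following is a minimal sketch of how AUROC can be computed from binary labels (1 = assigned the long COVID label, 0 = not) and model scores, using the rank-based Mann-Whitney U formulation. It is not the authors' pipeline; the function name `auroc` is hypothetical. An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation of positives from negatives.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 1 for a positive case (hypothetically, long COVID), 0 otherwise.
    scores: model-predicted probabilities or risk scores.
    Tied scores receive their average rank.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sort indices by score, then assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the positive class; AUROC = U / (n_pos * n_neg).
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Under this formulation, AUROC is the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one; a cross-site figure such as the one above would be an average of this quantity over held-out partners.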