ABSTRACT
Rheumatoid arthritis is a prototypical autoimmune disease that causes joint inflammation and destruction [1]. There is currently no cure for rheumatoid arthritis, and the effectiveness of treatments varies across patients, suggesting an undefined pathogenic diversity [1,2]. Here, to deconstruct the cell states and pathways that characterize this pathogenic heterogeneity, we profiled the full spectrum of cells in inflamed synovium from patients with rheumatoid arthritis. We used multi-modal single-cell RNA-sequencing and surface protein data coupled with histology of synovial tissue from 79 donors to build a single-cell atlas of rheumatoid arthritis synovial tissue that includes more than 314,000 cells. We stratified tissues into six groups, referred to as cell-type abundance phenotypes (CTAPs), each characterized by selectively enriched cell states. These CTAPs demonstrate the diversity of synovial inflammation in rheumatoid arthritis, ranging from samples enriched for T and B cells to those largely lacking lymphocytes. Disease-relevant cell states, cytokines, risk genes, histology and serology metrics are associated with particular CTAPs. CTAPs are dynamic and can predict treatment response, highlighting the clinical utility of classifying rheumatoid arthritis synovial phenotypes. This comprehensive atlas and molecular, tissue-based stratification of rheumatoid arthritis synovial tissue reveal new insights into rheumatoid arthritis pathology and heterogeneity that could inform novel targeted treatments.
Subject(s)
Rheumatoid Arthritis , Humans , Rheumatoid Arthritis/complications , Rheumatoid Arthritis/genetics , Rheumatoid Arthritis/immunology , Rheumatoid Arthritis/pathology , Cytokines/metabolism , Inflammation/complications , Inflammation/genetics , Inflammation/immunology , Inflammation/pathology , Synovial Membrane/pathology , T-Lymphocytes/immunology , B-Lymphocytes/immunology , Genetic Predisposition to Disease/genetics , Phenotype , Single-Cell Gene Expression Analysis
ABSTRACT
Understanding protein function and developing molecular therapies require deciphering the cell types in which proteins act as well as the interactions between proteins. However, modeling protein interactions across biological contexts remains challenging for existing algorithms. Here we introduce PINNACLE, a geometric deep learning approach that generates context-aware protein representations. Leveraging a multiorgan single-cell atlas, PINNACLE learns on contextualized protein interaction networks to produce 394,760 protein representations from 156 cell type contexts across 24 tissues. PINNACLE's embedding space reflects cellular and tissue organization, enabling zero-shot retrieval of the tissue hierarchy. Pretrained protein representations can be adapted for downstream tasks: enhancing 3D structure-based representations for resolving immuno-oncological protein interactions, and investigating drugs' effects across cell types. PINNACLE outperforms state-of-the-art models in nominating therapeutic targets for rheumatoid arthritis and inflammatory bowel diseases and pinpoints cell type contexts with higher predictive capability than context-free models. PINNACLE's ability to adjust its outputs on the basis of the context in which it operates paves the way for large-scale context-specific predictions in biology.
Subject(s)
Deep Learning , Single-Cell Analysis , Humans , Single-Cell Analysis/methods , Algorithms , Protein Interaction Maps , Proteins/metabolism , Proteins/chemistry , Computational Biology/methods
ABSTRACT
The study aims to determine the shared genetic architecture between COVID-19 severity and existing medical conditions using electronic health record (EHR) data. We conducted a Phenome-Wide Association Study (PheWAS) of genetic variants associated with critical illness (n = 35) or hospitalization (n = 42) due to severe COVID-19 using genome-wide association summary data from the Host Genetics Initiative. PheWAS analysis was performed using genotype-phenotype data from the Veterans Affairs Million Veteran Program (MVP). Phenotypes were defined by International Classification of Diseases (ICD) codes mapped to clinically relevant groups using published PheWAS methods. Among 658,582 Veterans, variants associated with severe COVID-19 were tested for association across 1,559 phenotypes. Variants at the ABO locus (rs495828, rs505922) were associated with the largest number of phenotypes (n = 53 for rs495828 and n = 59 for rs505922); the strongest associations were with venous embolism (rs495828: odds ratio (OR) 1.33, p = 1.32 × 10⁻¹⁹⁹) and thrombosis (rs505922: OR 1.33, p = 2.2 × 10⁻²⁶⁵). Among 67 respiratory conditions tested, 11 had significant associations, including the MUC5B locus (rs35705950) with increased risk of idiopathic fibrosing alveolitis (OR 2.83, p = 4.12 × 10⁻¹⁹¹) and CRHR1 (rs61667602) with reduced risk of pulmonary fibrosis (OR 0.84, p = 2.26 × 10⁻¹²). The TYK2 locus (rs11085727) was associated with reduced risk for autoimmune conditions, e.g., psoriasis (OR 0.88, p = 6.48 × 10⁻²³) and lupus (OR 0.84, p = 3.97 × 10⁻⁶). PheWAS stratified by ancestry demonstrated differences in genotype-phenotype associations. LMNA (rs581342) was associated with neutropenia (OR 1.29, p = 4.1 × 10⁻¹³) among Veterans of African and Hispanic ancestry but not European ancestry. Overall, we observed a shared genetic architecture between COVID-19 severity and conditions related to underlying risk factors for severe and poor COVID-19 outcomes.
Differing genotype-phenotype associations across ancestries may inform the heterogeneous outcomes observed with COVID-19. Divergent associations between risk for severe COVID-19 and autoimmune inflammatory conditions, both respiratory and non-respiratory, highlight the shared pathways and fine balance between immune host response and autoimmunity, and the caution required when considering treatment targets.
Subject(s)
COVID-19 , Veterans , COVID-19/epidemiology , COVID-19/genetics , Genetic Association Studies , Genome-Wide Association Study/methods , Humans , Single Nucleotide Polymorphism/genetics
ABSTRACT
MOTIVATION: Predicting molecule-disease indications and side effects is important for drug development and pharmacovigilance. Comprehensively mining molecule-molecule, molecule-disease and disease-disease semantic dependencies can potentially improve prediction performance. METHODS: We introduce a Multi-Modal REpresentation Mapping Approach to Predicting molecular-disease relations (M2REMAP) by incorporating clinical semantics learned from electronic health records (EHR) of 12.6 million patients. Specifically, M2REMAP first learns a multimodal molecule representation that synthesizes chemical property and clinical semantic information by mapping molecule chemicals via a deep neural network onto the clinical semantic embedding space shared by drugs, diseases and other common clinical concepts. To infer molecule-disease relations, M2REMAP combines multimodal molecule representation and disease semantic embedding to jointly infer indications and side effects. RESULTS: We extensively evaluate M2REMAP on molecule indications, side effects and interactions. Results show that incorporating EHR embeddings improves performance significantly, for example, attaining an improvement over the baseline models by 23.6% in PRC-AUC on indications and 23.9% on side effects. Further, M2REMAP overcomes the limitation of existing methods and effectively predicts drugs for novel diseases and emerging pathogens. AVAILABILITY AND IMPLEMENTATION: The code is available at https://github.com/celehs/M2REMAP, and prediction results are provided at https://shiny.parse-health.org/drugs-diseases-dev/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
Drug-Related Side Effects and Adverse Reactions , Humans , Drug Development , Electronic Health Records , Computer Neural Networks , Pharmacovigilance
ABSTRACT
OBJECTIVES: Rheumatoid arthritis (RA) and atherosclerosis share many common inflammatory pathways. We studied whether a multi-biomarker panel for RA disease activity (MBDA) would associate with changes in arterial inflammation in an interventional trial. METHODS: In the TARGET Trial, RA patients with active disease despite methotrexate were randomly assigned to the addition of either a TNF inhibitor or sulfasalazine plus hydroxychloroquine (triple therapy). Baseline and 24-week follow-up 18F-fluorodeoxyglucose (FDG) positron emission tomography/computed tomography scans were assessed for change in arterial inflammation, measured as the maximal arterial target-to-blood background ratio of FDG uptake in the most diseased segment of the carotid arteries or aorta (MDS-TBRmax). The MBDA test, measured at baseline and weeks 6, 18, and 24, was assessed for its association with the change in MDS-TBRmax. RESULTS: Interpretable scans were available at baseline and week 24 for 112 patients. The MBDA score at week 24 was significantly correlated with the change in MDS-TBRmax (Spearman's rho = 0.239; p = 0.011) and remained significantly associated after adjustment for relevant confounders. Those with low MBDA at week 24 had a statistically significant adjusted reduction in arterial inflammation of 0.35 units vs no significant reduction in those who did not achieve low MBDA. Neither DAS28-CRP nor CRP predicted change in arterial inflammation. The MBDA component with the strongest association with change in arterial inflammation was serum amyloid A (SAA). CONCLUSIONS: Among treated RA patients, achieved MBDA predicts changes in arterial inflammation. Achieving low MBDA at 24 weeks was associated with clinically meaningful reductions in arterial inflammation, regardless of treatment.
ABSTRACT
In many modern machine learning applications, changes in covariate distributions and difficulty in acquiring outcome information have posed challenges to robust model training and evaluation. Numerous transfer learning methods have been developed to robustly adapt the model itself to an unlabeled target population using existing labeled data in a source population. However, there is a paucity of literature on transferring performance metrics, especially receiver operating characteristic (ROC) parameters, of a trained model. In this paper, we aim to evaluate the performance of a trained binary classifier on an unlabeled target population based on ROC analysis. We propose Semisupervised Transfer lEarning of Accuracy Measures (STEAM), an efficient three-step estimation procedure that employs (1) double-index modeling to construct calibrated density ratio weights and (2) robust imputation to leverage the large amount of unlabeled data to improve estimation efficiency. We establish the consistency and asymptotic normality of the proposed estimator under the correct specification of either the density ratio model or the outcome model. We also use cross-validation to correct for potential overfitting bias in the estimators in finite samples. We compare our proposed estimators to existing methods and show reductions in bias and gains in efficiency through simulations. We illustrate the practical utility of the proposed method by evaluating the prediction performance of a phenotyping model for rheumatoid arthritis (RA) on a temporally evolving EHR cohort.
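To make the density-ratio weighting idea concrete, the following is a minimal Python sketch of an importance-weighted AUC computed on labeled source data. It is not the STEAM estimator: the double-index calibration, imputation, and cross-validation steps are omitted, and the function name `weighted_auc` and the assumption that weights are already estimated are hypothetical choices for illustration.

```python
def weighted_auc(scores, labels, weights):
    """Importance-weighted AUC on labeled source data.

    scores  : classifier scores
    labels  : 1 for cases, 0 for controls
    weights : assumed, pre-estimated density ratio weights
              (target density / source density) for each observation
    """
    num = 0.0  # weighted count of correctly ordered case-control pairs
    den = 0.0  # total weight of all case-control pairs
    for s_i, y_i, w_i in zip(scores, labels, weights):
        if y_i != 1:
            continue
        for s_j, y_j, w_j in zip(scores, labels, weights):
            if y_j != 0:
                continue
            pair_weight = w_i * w_j
            den += pair_weight
            if s_i > s_j:
                num += pair_weight
            elif s_i == s_j:
                num += 0.5 * pair_weight  # ties count as half
    if den == 0:
        raise ValueError("need at least one weighted case and one weighted control")
    return num / den
```

With all weights equal to 1 this reduces to the usual empirical AUC; unequal weights shift the estimate toward the covariate mix of the target population.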
Subject(s)
Machine Learning , Supervised Machine Learning , Humans , ROC Curve , Research Design , Bias
ABSTRACT
BACKGROUND: We aimed to determine whether integrating concepts extracted from electronic health record (EHR) notes using natural language processing (NLP) could improve the identification of gout flares. METHODS: Using Medicare claims linked with EHR data, we selected gout patients who initiated urate-lowering therapy (ULT). Patients' 12-month baseline period and on-treatment follow-up were segmented into 1-month units. We retrieved EHR notes for months with gout diagnosis codes and processed notes for NLP concepts. We selected a random sample of 500 patients and reviewed each of their notes for the presence of a physician-documented gout flare. Months containing at least 1 note mentioning gout flares were considered months with events. We used 60% of patients to train predictive models with LASSO. We evaluated the models by the area under the curve (AUC) in the validation data and examined positive and negative predictive values (PPV and NPV). RESULTS: We extracted and labeled 839 months of follow-up (280 with gout flares). The claims-only model selected 20 variables (AUC = 0.69). The NLP concept-only model selected 15 variables (AUC = 0.69). The combined model selected 32 claims variables and 13 NLP concepts (AUC = 0.73). The claims-only model had a PPV of 0.64 [0.50, 0.77] and an NPV of 0.71 [0.65, 0.76], whereas the combined model had a PPV of 0.76 [0.61, 0.88] and an NPV of 0.71 [0.65, 0.76]. CONCLUSION: Adding NLP concept variables to claims variables resulted in a small improvement in the identification of gout flares. Our data-driven claims-only model and our combined claims/NLP-concept model outperformed existing rule-based claims algorithms reliant on medication use, diagnosis, and procedure codes.
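For readers less familiar with the reported metrics, here is a short Python sketch of how PPV and NPV follow from confusion-matrix counts. The function name `predictive_values` and the example counts are hypothetical and are not taken from the study.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): share of predicted flare months that were true flares.
    NPV = TN / (TN + FN): share of predicted non-flare months that were truly flare-free.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("PPV/NPV undefined when a predicted class is empty")
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Hypothetical counts for illustration only
ppv, npv = predictive_values(tp=30, fp=10, tn=50, fn=10)
```

Both values depend on event prevalence in the evaluated sample, so they are not directly comparable across cohorts with different flare rates.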
Subject(s)
Gout , Aged , Humans , United States/epidemiology , Gout/diagnosis , Gout/epidemiology , Natural Language Processing , Electronic Health Records , Medicare , Symptom Flare Up , Algorithms
ABSTRACT
Rheumatoid arthritis (RA) affects 24.5 million people worldwide and has been associated with increased cancer risks. However, the extent to which the observed risks are related to the pathophysiology of rheumatoid arthritis or its treatments is unknown. Leveraging nationwide health insurance claims data with 85.97 million enrollees across 8 years, we identified 92 864 patients without cancers at the time of rheumatoid arthritis diagnoses. We matched 68 415 of these patients with participants without rheumatoid arthritis by sex, race, age and inferred health and economic status and compared their risks of developing all cancer types. By 12 months after the diagnosis of rheumatoid arthritis, rheumatoid arthritis patients were 1.21 (95% confidence interval [CI] [1.14, 1.29]) times more likely to develop any cancer compared with matched enrollees without rheumatoid arthritis. In particular, the risk of developing lymphoma was 2.08 (95% CI [1.67, 2.58]) times higher in the rheumatoid arthritis group, and the risk of developing lung cancer was 1.69 (95% CI [1.32, 2.13]) times higher. We further identified the five most commonly used drugs in treating rheumatoid arthritis, and the log-rank test showed that none of them was associated with a significantly increased cancer risk compared with rheumatoid arthritis patients not taking that specific drug. Our study suggests that the pathophysiology of rheumatoid arthritis, rather than its treatments, is implicated in the development of subsequent cancers. Our method is extensible to investigating the connections among drugs, diseases and comorbidities at scale.
Subject(s)
Rheumatoid Arthritis , Lung Neoplasms , Lymphoma , Humans , Rheumatoid Arthritis/complications , Rheumatoid Arthritis/epidemiology , Rheumatoid Arthritis/drug therapy , Comorbidity , Lung Neoplasms/etiology , Lung Neoplasms/complications , Data Analysis
ABSTRACT
With the worldwide digitalisation of medical records, electronic health records (EHRs) have become an increasingly important source of real-world data (RWD). RWD can complement traditional study designs because it captures almost the complete variety of patients, leading to more generalisable results. For rheumatology, these data are particularly interesting as our diseases are uncommon and often take years to develop. In this review, we discuss the following concepts related to the use of EHR for research and considerations for translation into clinical care: EHR data contain a broad collection of healthcare data covering the multitude of real-life patients and the healthcare processes related to their care. Machine learning (ML) is a powerful method that allows us to leverage a large amount of heterogeneous clinical data for clinical algorithms, but requires extensive training, testing, and validation. Patterns discovered in EHR data using ML are applicable to real-life settings; however, they are also prone to capturing the local EHR structure, which limits generalisability outside the EHR(s) from which they were developed. Population studies on EHR necessitate knowledge of the factors influencing the data available in the EHR, for example, access to medical care and insurance status, in order to circumvent biases. In summary, EHR data represent a rapidly growing and key resource for real-world studies. However, transforming RWD EHR data for research and for real-world evidence using ML requires knowledge of EHR systems and their differences from existing observational data to ensure that studies incorporate rigorous methods that acknowledge or address factors such as access to care, noise in the data, missingness and indication bias.
Subject(s)
Artificial Intelligence , Electronic Health Records , Humans , Algorithms , Machine Learning , Research Design
ABSTRACT
OBJECTIVE: Recent large-scale randomised trials demonstrate that immunomodulators reduce cardiovascular (CV) events among the general population. However, it is uncertain whether these effects apply to rheumatoid arthritis (RA) and if certain treatment strategies in RA reduce CV risk to a greater extent. METHODS: Patients with active RA despite use of methotrexate were randomly assigned to addition of a tumour necrosis factor (TNF) inhibitor (TNFi) or addition of sulfasalazine and hydroxychloroquine (triple therapy) for 24 weeks. Baseline and follow-up 18F-fluorodeoxyglucose-positron emission tomography/CT scans were assessed for change in arterial inflammation, an index of CV risk, measured as an arterial target-to-background ratio (TBR) in the carotid arteries and aorta. RESULTS: 115 patients completed the protocol. The two treatment groups were well balanced with a median age of 58 years, 71% women, 57% seropositive and a baseline disease activity score in 28 joints of 4.8 (IQR 4.0, 5.6). Baseline TBR was similar across the two groups. Significant TBR reductions were observed in both groups (ΔTNFi: -0.24 (SD=0.51); Δtriple therapy: -0.19 (SD=0.51)) without difference between groups (difference in Δs: -0.02, 95% CI -0.19 to 0.15, p=0.79). While disease activity was significantly reduced across both treatment groups, there was no association with change in TBR (β=0.04, 95% CI -0.03 to 0.10). CONCLUSION: We found that addition of either a TNFi or triple therapy resulted in clinically important improvements in vascular inflammation. However, the addition of a TNFi did not reduce arterial inflammation more than triple therapy. TRIAL REGISTRATION NUMBER: NCT02374021.
Subject(s)
Antirheumatic Agents , Arteritis , Rheumatoid Arthritis , Cardiovascular Diseases , Humans , Female , Middle Aged , Male , Antirheumatic Agents/adverse effects , Cardiovascular Diseases/prevention & control , Cardiovascular Diseases/chemically induced , Tumor Necrosis Factor-alpha , Risk Factors , Rheumatoid Arthritis/diagnostic imaging , Rheumatoid Arthritis/drug therapy , Rheumatoid Arthritis/chemically induced , Methotrexate/therapeutic use , Immunologic Factors/therapeutic use , Heart Disease Risk Factors , Arteritis/chemically induced , Arteritis/drug therapy , Treatment Outcome
ABSTRACT
OBJECTIVE: Electronic health records (EHR), containing detailed longitudinal clinical information on a large number of patients and covering broad patient populations, open opportunities for comprehensive predictive modeling of disease progression and treatment response. However, since EHRs were originally constructed for administrative purposes rather than for research, it is often not feasible in EHR-linked studies to capture reliable information for analytical variables, especially in the survival setting, where both accurate event status and event times are needed for model building. For example, progression-free survival (PFS), a commonly used survival outcome for cancer patients, often involves complex information embedded in free-text clinical notes and cannot be extracted reliably. Proxies of PFS time, such as time to the first mention of progression in the notes, are at best good approximations to the true event time. This leads to difficulty in efficiently estimating event rates for an EHR patient cohort. Estimating survival rates based on error-prone outcome definitions can lead to biased results and hamper the power of downstream analysis. On the other hand, extracting accurate event time information via manual annotation is time- and resource-intensive. The objective of this study is to develop a calibrated survival rate estimator using noisy outcomes from EHR data. MATERIALS AND METHODS: In this paper, we propose a two-stage semi-supervised calibration of noisy event rate (SCANER) estimator that can effectively overcome censoring-induced dependency and attains more robust performance (i.e., not sensitive to misspecification of the imputation model) by fully utilizing both a small labeled set of gold-standard survival outcomes annotated via manual chart review and a set of proxy features automatically captured via EHR in the unlabeled set.
We validate the SCANER estimator by estimating the PFS rates for a virtual cohort of lung cancer patients from one large tertiary care center and the ICU-free survival rates for COVID patients from two large tertiary care centers. RESULTS: In terms of survival rate estimates, the SCANER estimator had very similar point estimates compared to the complete-case Kaplan-Meier (KM) estimator. In contrast, other benchmark methods, which fail to account for the induced dependency between the event time and the censoring time conditional on surrogate outcomes, produced biased results across all three case studies. In terms of standard errors, the SCANER estimator was more efficient than the KM estimator, with up to 50% efficiency gain. CONCLUSION: The SCANER estimator achieves more efficient, robust, and accurate survival rate estimates compared to existing approaches. This promising new approach can also improve the resolution (i.e., granularity of event time) by using labels conditional on multiple surrogates, particularly among less common or poorly coded conditions.
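The Kaplan-Meier estimator used as the benchmark above can be sketched in a few lines of Python. This is the standard product-limit estimator on fully labeled data, not the SCANER estimator; the function name `kaplan_meier` and the toy inputs in the usage line are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after t
        n_at_risk -= n_removed
    return curve


# Toy data for illustration only
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Patients censored at time t are still counted as at risk at t, which is the usual convention.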
Subject(s)
COVID-19 , Lung Neoplasms , Humans , Electronic Health Records , Calibration , Survival Analysis
ABSTRACT
Rationale: A common MUC5B gene polymorphism, rs35705950-T, is associated with idiopathic pulmonary fibrosis (IPF), but its role in severe acute respiratory syndrome coronavirus 2 infection and disease severity is unclear. Objectives: To assess whether rs35705950-T confers differential risk for clinical outcomes associated with coronavirus disease (COVID-19) infection among participants in the Million Veteran Program (MVP). Methods: The MUC5B rs35705950-T allele was directly genotyped among MVP participants; clinical events and comorbidities were extracted from the electronic health records. Associations between the incidence or severity of COVID-19 and rs35705950-T were analyzed within each ancestry group in the MVP followed by transancestry meta-analysis. Replication and joint meta-analysis were conducted using summary statistics from the COVID-19 Host Genetics Initiative (HGI). Sensitivity analyses with adjustment for additional covariates (body mass index, Charlson comorbidity index, smoking, asbestosis, rheumatoid arthritis with interstitial lung disease, and IPF) and associations with post-COVID-19 pneumonia were performed in MVP subjects. Measurements and Main Results: The rs35705950-T allele was associated with fewer COVID-19 hospitalizations in transancestry meta-analyses within the MVP (N_cases = 4,325; N_controls = 507,640; OR, 0.89 [0.82-0.97]; P = 6.86 × 10⁻³) and joint meta-analyses with the HGI (N_cases = 13,320; N_controls = 1,508,841; OR, 0.90 [0.86-0.95]; P = 8.99 × 10⁻⁵). The rs35705950-T allele was not associated with reduced COVID-19 positivity in transancestry meta-analysis within the MVP (N_cases = 19,168; N_controls = 492,854; OR, 0.98 [0.95-1.01]; P = 0.06) but was nominally significant (P < 0.05) in the joint meta-analysis with the HGI (N_cases = 44,820; N_controls = 1,775,827; OR, 0.97 [0.95-1.00]; P = 0.03). Associations were not observed with severe outcomes or mortality.
Among individuals of European ancestry in the MVP, rs35705950-T was associated with fewer post-COVID-19 pneumonia events (OR, 0.82 [0.72-0.93]; P = 0.001). Conclusions: The MUC5B variant rs35705950-T may confer protection against COVID-19 hospitalization.
Subject(s)
COVID-19 , Idiopathic Pulmonary Fibrosis , Humans , COVID-19/epidemiology , COVID-19/genetics , Mucin-5B/genetics , Genetic Polymorphism , Idiopathic Pulmonary Fibrosis/genetics , Genotype , Hospitalization , Genetic Predisposition to Disease/genetics
ABSTRACT
Although randomized controlled trials (RCTs) are the gold standard for establishing the efficacy and safety of a medical treatment, real-world evidence (RWE) generated from real-world data has been vital in postapproval monitoring and is being promoted for the regulatory process of experimental therapies. An emerging source of real-world data is electronic health records (EHRs), which contain detailed information on patient care in both structured (eg, diagnosis codes) and unstructured (eg, clinical notes and images) forms. Despite the granularity of the data available in EHRs, the critical variables required to reliably assess the relationship between a treatment and clinical outcome are challenging to extract. To address this fundamental challenge and accelerate the reliable use of EHRs for RWE, we introduce an integrated data curation and modeling pipeline consisting of 4 modules that leverage recent advances in natural language processing, computational phenotyping, and causal modeling techniques with noisy data. Module 1 consists of techniques for data harmonization. We use natural language processing to recognize clinical variables from RCT design documents and map the extracted variables to EHR features with description matching and knowledge networks. Module 2 then develops techniques for cohort construction using advanced phenotyping algorithms to both identify patients with diseases of interest and define the treatment arms. Module 3 introduces methods for variable curation, including a list of existing tools to extract baseline variables from different sources (eg, codified, free text, and medical imaging) and end points of various types (eg, death, binary, temporal, and numerical). Finally, module 4 presents validation and robust modeling methods, and we propose a strategy to create gold-standard labels for EHR variables of interest to validate data curation quality and perform subsequent causal modeling for RWE. 
In addition to the workflow proposed in our pipeline, we also develop a reporting guideline for RWE that covers the necessary information to facilitate transparent reporting and reproducibility of results. Moreover, our pipeline is highly data driven, enhancing study data with a rich variety of publicly available information and knowledge sources. We also showcase our pipeline and provide guidance on the deployment of relevant tools by revisiting the emulation of the Clinical Outcomes of Surgical Therapy Study Group Trial on laparoscopy-assisted colectomy versus open colectomy in patients with early-stage colon cancer. We also draw on existing literature on EHR emulation of RCTs together with our own studies with the Mass General Brigham EHR.
Subject(s)
Colonic Neoplasms , Electronic Health Records , Humans , Algorithms , Informatics , Research Design
ABSTRACT
BACKGROUND: Psoriasis is a common chronic inflammatory skin disorder that is associated with excess cardiovascular risk. Inflammation is a key mediator in the onset and progression of these cardiometabolic abnormalities; however, the excess cardiovascular risk conferred by psoriatic disease remains understudied. We investigated the prevalence and severity of coronary microvascular dysfunction (CMD) in patients with psoriasis and determined whether CMD is a result of cardiovascular (CV) risk factors and atherosclerotic burden. METHODS: This was a consecutive retrospective cohort study of patients with psoriasis, normal myocardial perfusion, and left ventricular (LV) ejection fraction (EF) > 50% (N = 62) and matched controls without psoriasis (N = 112). Myocardial perfusion and myocardial flow reserve (MFR) were quantified using PET imaging. Atherosclerotic burden was determined by semi-quantitative computed tomography (CT) coronary calcium assessment. RESULTS: The prevalence of CMD (defined as MFR < 2) was 61.3% in patients with psoriatic disease, compared to 38.4% in a matched control population (P = .004). Furthermore, patients with psoriasis had a more severe reduction in adjusted MFR (2.3 ± .81 vs 1.92 ± .65, respectively, P = .001). The degree of atherosclerotic burden, as assessed by qualitative calcium score, was similar between psoriasis and controls. CONCLUSIONS: Patients with psoriasis without overt coronary artery disease (CAD) demonstrated a high prevalence of coronary vasomotor abnormalities that are not entirely accounted for by the commonly associated coronary risk factors or the burden of atherosclerosis.
Subject(s)
Coronary Artery Disease , Myocardial Ischemia , Myocardial Perfusion Imaging , Psoriasis , Calcium , Coronary Artery Disease/diagnostic imaging , Coronary Artery Disease/epidemiology , Coronary Circulation , Humans , Myocardial Perfusion Imaging/methods , Positron-Emission Tomography/methods , Psoriasis/complications , Psoriasis/diagnostic imaging , Psoriasis/epidemiology , Retrospective Studies
ABSTRACT
OBJECTIVE: Accurately assigning phenotype information to individual patients via computational phenotyping using Electronic Health Records (EHRs) has been seen as the first step towards enabling EHRs for precision medicine research. Chart review labels annotated by clinical experts, also known as "gold standard" labels, are essential for the development and validation of computational phenotyping algorithms. However, given the complexity of EHR systems, the process of chart review is both labor-intensive and time-consuming. We propose a fully automated algorithm, referred to as pGUESS, to rank EHR notes according to their relevance to a given phenotype. By identifying the most relevant notes, pGUESS can greatly improve the efficiency and accuracy of chart reviews. METHOD: pGUESS uses prior guided semantic similarity to measure the informativeness of a clinical note to a given phenotype. We first select candidate clinical concepts from a pool of comprehensive medical concepts using public knowledge sources and then derive the semantic embedding vector (SEV) for a reference article (SEVref) and each note (SEVnote). The algorithm scores the relevance of a note as the cosine similarity between SEVnote and SEVref. RESULTS: The algorithm was validated against four sets of 200 notes that were manually annotated by clinical experts to assess their informativeness to one of three disease phenotypes. The pGUESS algorithm substantially outperforms existing unsupervised approaches for classifying the relevance status with respect to both accuracy and scalability across phenotypes. Averaging over the three phenotypes, the rank correlation between the algorithm ranking and gold standard label was 0.64 for pGUESS, but only 0.47 and 0.35 for the next two best performing algorithms. pGUESS is also much more computationally scalable than existing algorithms. CONCLUSION: The pGUESS algorithm can substantially reduce the burden of chart review and has the potential to improve the efficiency and accuracy of human annotation.
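Illustrative note: the scoring step described in the abstract above (relevance of a note as the cosine similarity between its embedding and a reference embedding) can be sketched roughly as follows. This is a minimal sketch, not the pGUESS implementation: the function names, note identifiers and three-dimensional vectors are hypothetical, and the step that derives the embeddings from selected clinical concepts is not shown.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_u * norm_v)

def rank_notes(sev_ref, sev_notes):
    """Return note ids ordered from most to least similar to the reference."""
    scores = {nid: cosine_similarity(vec, sev_ref) for nid, vec in sev_notes.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy, made-up embeddings (hypothetical values, not taken from the study).
sev_ref = [1.0, 0.0, 1.0]
sev_notes = {
    "note_a": [2.0, 0.0, 2.0],  # same direction as the reference
    "note_b": [1.0, 1.0, 0.0],  # partially aligned
    "note_c": [0.0, 3.0, 0.0],  # orthogonal to the reference
}
ranking = rank_notes(sev_ref, sev_notes)
print(ranking)
```

Because cosine similarity ignores vector length, a long note and a short note pointing in the same semantic direction receive the same score under this sketch.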
Subject(s)
Algorithms , Semantics , Electronic Health Records , Humans , Natural Language Processing , Phenotype , Precision Medicine
ABSTRACT
OBJECTIVE: The growing availability of electronic health records (EHR) data opens opportunities for integrative analysis of multi-institutional EHR to produce generalizable knowledge. A key barrier to such integrative analyses is the lack of semantic interoperability across different institutions due to coding differences. We propose a Multiview Incomplete Knowledge Graph Integration (MIKGI) algorithm to integrate information from multiple sources with partially overlapping EHR concept codes to enable translations between healthcare systems. METHODS: The MIKGI algorithm combines knowledge graph information from (i) embeddings trained from the co-occurrence patterns of medical codes within each EHR system and (ii) semantic embeddings of the textual strings of all medical codes obtained from the Self-Aligning Pretrained BERT (SAPBERT) algorithm. Due to the heterogeneity in the coding across healthcare systems, each EHR source provides partial coverage of the available codes. MIKGI synthesizes the incomplete knowledge graphs derived from these multi-source embeddings by minimizing a spherical loss function that combines the pairwise directional similarities of embeddings computed from all available sources. MIKGI outputs harmonized semantic embedding vectors for all EHR codes, which improves the quality of the embeddings and enables direct assessment of both similarity and relatedness between any pair of codes from multiple healthcare systems. RESULTS: With EHR co-occurrence data from Veterans Affairs (VA) healthcare and Mass General Brigham (MGB), the MIKGI algorithm produces high-quality embeddings for a variety of downstream tasks including detecting known similar or related entity pairs and mapping VA local codes to the relevant EHR codes used at MGB. Based on the cosine similarity of the MIKGI trained embeddings, the AUC was 0.918 for detecting similar entity pairs and 0.809 for detecting related pairs. For cross-institutional medical code mapping, the top-1 and top-5 accuracies were 91.0% and 97.5% when mapping medication codes at VA to RxNorm medication codes at MGB, and 59.1% and 75.8% when mapping VA local laboratory codes to the LOINC hierarchy. When trained with 500 labels, the lab code mapping attained top-1 and top-5 accuracies of 77.7% and 87.9%. MIKGI also attained the best performance in selecting VA local lab codes for desired laboratory tests and COVID-19 related features for COVID EHR studies. Compared to existing methods, MIKGI attained the most robust performance, with the highest or near-highest accuracy across all tasks. CONCLUSIONS: The proposed MIKGI algorithm can effectively integrate incomplete summary data from biomedical text and EHR data to generate harmonized embeddings for EHR codes for knowledge graph modeling and cross-institutional translation of EHR codes.
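Illustrative note: the top-1 and top-5 mapping accuracies reported above are a form of top-k accuracy over nearest neighbours in embedding space. The sketch below shows how such a metric could be computed once harmonized embeddings exist. It is an assumption-laden illustration, not the MIKGI code: the code identifiers, the two-dimensional vectors and the gold mapping are hypothetical, and the embedding training itself (the spherical loss) is not reproduced.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_accuracy(source_emb, target_emb, gold, k):
    """Fraction of source codes whose gold target is among the k most similar targets."""
    hits = 0
    for src, vec in source_emb.items():
        ranked = sorted(target_emb, key=lambda t: cosine(vec, target_emb[t]), reverse=True)
        if gold[src] in ranked[:k]:
            hits += 1
    return hits / len(source_emb)

# Toy, made-up embeddings for two local codes and three standard codes (hypothetical).
source_emb = {"local_1": [1.0, 0.1], "local_2": [0.5, 0.6]}
target_emb = {"T1": [1.0, 0.0], "T2": [0.0, 1.0], "T3": [1.0, 1.0]}
gold = {"local_1": "T1", "local_2": "T2"}

top1 = top_k_accuracy(source_emb, target_emb, gold, k=1)
top2 = top_k_accuracy(source_emb, target_emb, gold, k=2)
print(top1, top2)
```

In this toy example the second local code is closest to a different target, so its gold mapping is recovered only when two candidates are allowed, which is why top-k accuracy rises with k.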
Subject(s)
COVID-19 , Electronic Health Records , Algorithms , Humans , Logical Observation Identifiers Names and Codes , Automated Pattern Recognition
ABSTRACT
OBJECTIVE: Examine the association of methotrexate (MTX) use with cardiovascular disease (CVD) in rheumatoid arthritis (RA) using marginal structural models (MSM) and determine if CVD risk is mediated through modification of disease activity. METHODS: We identified incident CVD events (coronary artery disease (CAD), stroke, heart failure (HF) hospitalisation, CVD death) within a multicentre, prospective cohort of US Veterans with RA. A 28-joint Disease Activity Score with C-reactive protein (DAS28-CRP) was collected at regular visits and medication exposures were determined by linking to pharmacy dispensing data. MSMs were used to estimate the treatment effect of MTX on risk of incident CVD, accounting for time-varying confounders between receiving MTX and CVD events. A mediation analysis was performed to estimate the indirect effects of methotrexate on CVD risk through modification of RA disease activity. RESULTS: Among 2044 RA patients (90% male, mean age 63.9 years, baseline DAS28-CRP 3.6), there were 378 incident CVD events. Using MSM, MTX use was associated with a 24% reduced risk of composite CVD events (HR 0.76, 95% CI 0.58 to 0.99) including a 57% reduction in HF hospitalisations (HR 0.43, 95% CI 0.24 to 0.77). Individual associations with CAD, stroke and CVD death were not statistically significant. In mediation analyses, there was no evidence of indirect effects of MTX on CVD risk through disease activity modification (HR 1.03, 95% CI 0.80 to 1.32). CONCLUSIONS: MTX use in RA was associated with a reduced risk of CVD events, particularly HF-related hospitalisations. These associations were not mediated through reductions in RA disease activity, suggesting alternative MTX-related mechanisms may modify CVD risk in this population.
Subject(s)
Antirheumatic Agents/therapeutic use , Rheumatoid Arthritis/drug therapy , Coronary Artery Disease/epidemiology , Heart Disease Risk Factors , Heart Failure/epidemiology , Hospitalization/statistics & numerical data , Methotrexate/therapeutic use , Stroke/epidemiology , Aged , Rheumatoid Arthritis/epidemiology , Rheumatoid Arthritis/physiopathology , Cardiovascular Diseases/mortality , Female , Humans , Incidence , Male , Middle Aged , Proportional Hazards Models
ABSTRACT
PURPOSE OF REVIEW: Patients with chronic inflammatory disease have an increased risk of cardiovascular disease. This article reviews the current evidence on cardiovascular prevention in three common systemic inflammatory disorders (SIDs): psoriasis, rheumatoid arthritis, and systemic lupus erythematosus. RECENT FINDINGS: General population cardiovascular risk assessment tools currently underestimate cardiovascular risk and disease-specific risk assessment tools are an area of active investigation. A disease-specific cardiovascular risk estimator has not been shown to more accurately predict risk compared with the current guidelines. Rheumatoid arthritis-specific risk estimators have been shown to better predict cardiovascular risk in some cohorts and not others. Systemic lupus erythematosus-specific scores have also been proposed and require further validation, whereas psoriasis is an open area of active investigation. The current role of universal prevention treatment with statin therapy in patients with SID remains unclear. Aggressive risk factor modification and control of disease activity are important interventions to reduce cardiovascular risk. SUMMARY: A comprehensive approach that includes cardiovascular risk factor modification, control of systemic inflammation, and increased patient and physician awareness is needed for cardiovascular prevention in chronic inflammatory disease. Clinical trials are currently underway to test whether disease-specific anti-inflammatory therapies will reduce cardiovascular risk.
Subject(s)
Rheumatoid Arthritis , Cardiovascular Diseases , Psoriasis , Rheumatoid Arthritis/complications , Rheumatoid Arthritis/drug therapy , Cardiovascular Diseases/etiology , Cardiovascular Diseases/prevention & control , Chronic Disease , Humans , Inflammation/complications , Risk Factors
ABSTRACT
Crohn's disease (CD) and ulcerative colitis (UC) are heterogeneous. With the availability of therapeutic classes with distinct immunologic mechanisms of action, it has become imperative to identify markers that predict the likelihood of response to each drug class. However, robust development of such tools has been challenging because of the need for large prospective cohorts with systematic and careful assessment of treatment response using validated indices. Most hospitals in the United States use electronic health records (EHRs) that warehouse a large amount of narrative (free-text) and codified (administrative) data generated during routine clinical care. These data have been used to construct virtual disease cohorts for epidemiologic research as well as for defining the genetic basis of disease states or discrete laboratory values.1-3 Whether EHR-based data can be used to validate genetic associations for more nuanced outcomes such as treatment response has not been examined previously.
Subject(s)
Ulcerative Colitis , Crohn Disease , Inflammatory Bowel Diseases , Electronic Health Records , Humans , Inflammatory Bowel Diseases/drug therapy , Prospective Studies , United States
ABSTRACT
OBJECTIVES: To develop classification algorithms that accurately identify axial spondyloarthritis (axSpA) patients in electronic health records, and compare the performance of algorithms incorporating free-text data against approaches using only International Classification of Diseases (ICD) codes. METHODS: An enriched cohort of 7853 eligible patients was created from electronic health records of two large hospitals using automated searches (≥1 ICD code combined with simple text searches). Key disease concepts from free-text data were extracted using natural language processing (NLP) and combined with ICD codes to develop algorithms. We created both supervised regression-based algorithms (on a training set of 127 axSpA cases and 423 non-cases) and unsupervised algorithms to identify patients with high probability of having axSpA from the enriched cohort. Their performance was compared against classifications using ICD codes only. RESULTS: NLP extracted four disease concepts of high predictive value: ankylosing spondylitis (AS), sacroiliitis, HLA-B27 and spondylitis. The unsupervised algorithm, incorporating both the NLP concept and ICD code for AS, identified the greatest number of patients. By setting the probability threshold to attain 80% positive predictive value, it identified 1509 axSpA patients (mean age 53 years, 71% male). Sensitivity was 0.78, specificity 0.94 and area under the curve 0.93. The two supervised algorithms performed similarly but identified fewer patients. All three outperformed traditional approaches using ICD codes alone (area under the curve 0.80-0.87). CONCLUSION: Algorithms incorporating free-text data can accurately identify axSpA patients in electronic health records. Large cohorts identified using these novel methods offer exciting opportunities for future clinical research.
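Illustrative note: the abstract above describes setting a probability threshold so that the classifier attains a target positive predictive value (PPV), then reporting sensitivity and specificity at that threshold. A minimal sketch of that selection step is given below, assuming a set of chart-reviewed labels is available. The function names, probabilities and labels are hypothetical toy values, and choosing the lowest qualifying cut-off (to flag as many patients as possible) is an assumption of this sketch rather than a documented detail of the study.

```python
def ppv_at(threshold, probs, labels):
    """PPV among patients flagged at or above the threshold (None if nobody is flagged)."""
    flagged = [y for p, y in zip(probs, labels) if p >= threshold]
    return sum(flagged) / len(flagged) if flagged else None

def lowest_threshold_for_ppv(probs, labels, target_ppv):
    """Smallest observed probability cut-off whose PPV meets the target, else None."""
    for t in sorted(set(probs)):
        ppv = ppv_at(t, probs, labels)
        if ppv is not None and ppv >= target_ppv:
            return t
    return None

def sensitivity_specificity(threshold, probs, labels):
    """Sensitivity and specificity at a cut-off; assumes both classes are present."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Toy predicted probabilities and chart-review labels (hypothetical, 1 = case).
probs = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

threshold = lowest_threshold_for_ppv(probs, labels, target_ppv=0.80)
sens, spec = sensitivity_specificity(threshold, probs, labels)
print(threshold, sens, spec)
```

PPV is not guaranteed to rise monotonically with the threshold, so the sketch scans every observed cut-off instead of using a binary search.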