1 - 15 of 15
1.
Sci Rep ; 14(1): 8021, 2024 04 05.
Article En | MEDLINE | ID: mdl-38580710

The Phenome-Wide Association Study (PheWAS) is increasingly used to screen broadly for potential treatment effects, e.g., using an IL6R variant as a proxy for IL6R antagonists. This approach offers an opportunity to address the limited power of clinical trials to study differential treatment effects across patient subgroups. However, few methods exist to efficiently test for differences across subgroups among the thousands of multiple comparisons generated in a PheWAS. In this study, we developed an approach that maximizes the power to test for heterogeneous genotype-phenotype associations and applied it to an IL6R PheWAS among individuals of African (AFR) and European (EUR) ancestries. We identified 29 traits with differences in IL6R variant-phenotype associations, including a lower risk of type 2 diabetes (T2D) in AFR (OR 0.96) vs EUR (OR 1.0; p-value for heterogeneity = 8.5 × 10⁻³) and a higher white blood cell count (p-value for heterogeneity = 8.5 × 10⁻¹³¹). These data suggest a more salutary effect of IL6R blockade on T2D among individuals of AFR vs EUR ancestry and provide data to inform ongoing clinical trials targeting IL6 for an expanding number of conditions. Moreover, the method to test for heterogeneity of associations can be applied broadly to other large-scale genotype-phenotype screens in diverse populations.


Diabetes Mellitus, Type 2 , Humans , Diabetes Mellitus, Type 2/drug therapy , Diabetes Mellitus, Type 2/genetics , Genetic Association Studies , Phenotype , Polymorphism, Single Nucleotide , Receptors, Interleukin-6/genetics
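The abstract above does not spell out its power-maximizing heterogeneity test. As a point of reference only, the sketch below shows the textbook comparison of two independent ancestry-specific estimates on the log-odds scale (equivalent to Cochran's Q with one degree of freedom). It is a minimal illustration, not the authors' method, and the standard errors in the usage lines are hypothetical.

```python
import math


def heterogeneity_p(beta1: float, se1: float, beta2: float, se2: float) -> float:
    """Two-sided p-value for H0: beta1 == beta2.

    Assumes the two estimates (e.g., log odds ratios in AFR and EUR
    individuals) come from independent samples, so their variances add.
    """
    z = (beta1 - beta2) / math.sqrt(se1 ** 2 + se2 ** 2)
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Hypothetical standard errors; only the odds ratios come from the abstract.
    p = heterogeneity_p(math.log(0.96), 0.01, math.log(1.0), 0.01)
    print(f"p for heterogeneity: {p:.3g}")
```

Squaring the z statistic gives the one-degree-of-freedom chi-square form of the same test.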
2.
J Am Heart Assoc ; 13(9): e030387, 2024 May 07.
Article En | MEDLINE | ID: mdl-38686879

BACKGROUND: Coronary microvascular dysfunction, as measured by myocardial flow reserve (MFR), is associated with increased cardiovascular risk in rheumatoid arthritis (RA). The objective of this study was to determine the association of reduced inflammation with MFR and other measures of cardiovascular risk. METHODS AND RESULTS: Patients with active RA who were about to initiate a tumor necrosis factor inhibitor were enrolled (NCT02714881). All subjects underwent cardiac perfusion positron emission tomography to quantify MFR at baseline, before tumor necrosis factor inhibitor initiation, and 24 weeks after initiation. MFR <2.5 in the absence of obstructive coronary artery disease was defined as coronary microvascular dysfunction. Blood samples collected at baseline and 24 weeks were assayed for inflammatory markers (eg, high-sensitivity C-reactive protein [hsCRP] and interleukin-1β) and for high-sensitivity cardiac troponin T (hs-cTnT). The primary outcome was mean MFR before and after tumor necrosis factor inhibitor initiation, with Δhs-cTnT as the secondary outcome. Secondary and exploratory analyses included the correlation of ΔhsCRP and other inflammatory markers with MFR and hs-cTnT. We studied 66 subjects, 82% of whom were women, with a mean RA duration of 7.4 years. The median atherosclerotic cardiovascular disease risk was 2.5%; 47% had coronary microvascular dysfunction and 23% had detectable hs-cTnT. We observed no change in mean MFR before (2.65) and after treatment (2.64; P=0.6), nor in hs-cTnT. Reductions in hsCRP and interleukin-1β correlated with a reduction in hs-cTnT. CONCLUSIONS: In this RA cohort with a low prevalence of cardiovascular risk factors, nearly 50% of subjects had coronary microvascular dysfunction at baseline. A reduction in inflammation was not associated with improved MFR.
However, a modest reduction in interleukin-1β, but not in other inflammatory pathways, was correlated with a reduction in subclinical myocardial injury. REGISTRATION: URL: https://www.clinicaltrials.gov; Unique identifier: NCT02714881.


Arthritis, Rheumatoid , Biomarkers , Coronary Circulation , Inflammation , Microcirculation , Aged , Female , Humans , Male , Middle Aged , Antirheumatic Agents/therapeutic use , Arthritis, Rheumatoid/physiopathology , Arthritis, Rheumatoid/complications , Arthritis, Rheumatoid/blood , Biomarkers/blood , C-Reactive Protein/metabolism , Coronary Artery Disease/physiopathology , Coronary Artery Disease/blood , Coronary Artery Disease/diagnosis , Coronary Circulation/physiology , Coronary Vessels/physiopathology , Coronary Vessels/diagnostic imaging , Fractional Flow Reserve, Myocardial/physiology , Heart Disease Risk Factors , Inflammation/blood , Inflammation/physiopathology , Inflammation Mediators/blood , Interleukin-1beta/blood , Myocardial Perfusion Imaging/methods , Positron-Emission Tomography , Treatment Outcome , Troponin T/blood , Tumor Necrosis Factor Inhibitors/therapeutic use
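The coronary microvascular dysfunction (CMD) definition in the abstract above can be written as a small rule. This minimal sketch assumes MFR is computed in the conventional way, as stress divided by rest myocardial blood flow (the abstract does not state the formula), and it simply reports "no CMD" when obstructive coronary artery disease is present, because the definition does not apply to those patients.

```python
def myocardial_flow_reserve(stress_mbf: float, rest_mbf: float) -> float:
    """MFR as stress / rest myocardial blood flow (same units, e.g. mL/min/g)."""
    if rest_mbf <= 0:
        raise ValueError("rest myocardial blood flow must be positive")
    return stress_mbf / rest_mbf


def has_cmd(mfr: float, obstructive_cad: bool, cutoff: float = 2.5) -> bool:
    """CMD per the abstract: MFR < 2.5 without obstructive coronary artery disease.

    Patients with obstructive CAD are reported as not meeting the
    definition here (a simplification for illustration).
    """
    return not obstructive_cad and mfr < cutoff
```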
3.
Semin Arthritis Rheum ; 66: 152421, 2024 Jun.
Article En | MEDLINE | ID: mdl-38457949

OBJECTIVE: Switching biologic and targeted synthetic DMARD (b/tsDMARD) medications occurs commonly in RA patients; however, data on the reasons for these changes are limited. The objective of the study was to identify and categorize reasons for b/tsDMARD switching and to investigate characteristics associated with treatment refractory RA. METHODS: In a multi-hospital RA electronic health record (EHR) cohort, we identified RA patients prescribed ≥1 b/tsDMARD between 2001 and 2017. Consistent with the EULAR "difficult to treat" (D2T) RA definition, we further identified patients who discontinued ≥2 b/tsDMARDs with different mechanisms of action. We performed manual chart review to determine reasons for medication discontinuation. We defined "treatment refractory" RA as not achieving low disease activity (<3 tender or swollen joints on <7.5 mg of daily prednisone equivalent) despite treatment with two different b/tsDMARD mechanisms of action. We compared demographic, lifestyle, and clinical factors between treatment refractory RA and b/tsDMARD initiators not meeting D2T criteria. RESULTS: We identified 6040 RA patients prescribed ≥1 b/tsDMARD, including 404 meeting D2T criteria. The most common reasons for medication discontinuation were inadequate response (43.3%), loss of efficacy (25.8%), and non-allergic adverse events (13.7%). Of patients with D2T RA, 15% had treatment refractory RA. Treatment refractory RA patients were younger at b/tsDMARD initiation (mean 47.2 vs. 55.2 years, p < 0.001) and more commonly female (91.8% vs. 76.1%, p = 0.006) and ever smokers (68.9% vs. 49.9%, p = 0.005). No RA clinical factors differentiated treatment refractory RA patients from b/tsDMARD initiators. CONCLUSIONS: In a large EHR-based RA cohort, the most common reasons for b/tsDMARD switching were inadequate response, loss of efficacy, and non-allergic adverse events (e.g., infections, leukopenia, psoriasis). 
Clinical RA factors were insufficient for differentiating b/tsDMARD responders from nonresponders.


Antirheumatic Agents , Arthritis, Rheumatoid , Biological Products , Drug Substitution , Humans , Arthritis, Rheumatoid/drug therapy , Female , Male , Middle Aged , Antirheumatic Agents/therapeutic use , Biological Products/therapeutic use , Aged , Adult
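The "treatment refractory" definition in the abstract above can be sketched as a rule-based classifier. This is an illustration under stated assumptions, not the study's chart-review procedure: the tender/swollen joint criterion is treated as one combined count, a patient counts as having reached low disease activity if any assessment meets both thresholds, and the record layout (`Assessment`) and mechanism-of-action labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass
class Assessment:
    # Hypothetical record layout: one clinic visit during b/tsDMARD treatment.
    tender_or_swollen_joints: int   # assumption: a single combined count
    prednisone_mg_per_day: float    # daily prednisone-equivalent dose


def low_disease_activity(a: Assessment) -> bool:
    """<3 tender or swollen joints while on <7.5 mg/day prednisone equivalent."""
    return a.tender_or_swollen_joints < 3 and a.prednisone_mg_per_day < 7.5


def meets_d2t(discontinued_moas: Sequence[str]) -> bool:
    """Discontinued >=2 b/tsDMARDs with different mechanisms of action."""
    return len(set(discontinued_moas)) >= 2


def treatment_refractory(discontinued_moas: Sequence[str],
                         assessments: Iterable[Assessment]) -> bool:
    """D2T patients who never reached low disease activity."""
    return meets_d2t(discontinued_moas) and not any(
        low_disease_activity(a) for a in assessments
    )
```

Using a set of mechanism labels means two discontinued drugs from the same class do not satisfy the D2T criterion.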
4.
Arthritis Res Ther ; 25(1): 93, 2023 06 02.
Article En | MEDLINE | ID: mdl-37269020

BACKGROUND: Many patients with rheumatoid arthritis (RA) require a trial of multiple biologic disease-modifying anti-rheumatic drugs (bDMARDs) to control their disease. With several bDMARD options available, bDMARD history may provide an alternative approach to understanding subphenotypes of RA. The objective of this study was to determine whether distinct clusters of RA patients exist based on bDMARD prescription history, as a way to subphenotype RA. METHODS: We studied patients from a validated electronic health record (EHR)-based RA cohort with data from January 1, 2008, through July 31, 2019; all subjects prescribed ≥ 1 bDMARD or targeted synthetic (ts) DMARD were included. To determine whether subjects had similar b/tsDMARD sequences, the sequences were modeled as a Markov chain over a state space of 5 classes of b/tsDMARDs. A maximum likelihood estimation (MLE)-based approach was used to estimate the Markov chain parameters and determine the clusters. The EHR data of study subjects were further linked with a registry containing prospectively collected data on RA disease activity, i.e., the Clinical Disease Activity Index (CDAI). As a proof of concept, we tested whether the clusters derived from b/tsDMARD sequences correlated with clinical measures, specifically differing trajectories of CDAI. RESULTS: We studied 2172 RA subjects (mean age 52 years, RA duration 3.4 years, 62% seropositive). We observed 550 unique b/tsDMARD sequences and identified 4 main clusters: (1) tumor necrosis factor inhibitor (TNFi) persisters (65.7%), (2) TNFi and abatacept therapy (8.0%), (3) on rituximab or multiple b/tsDMARDs (12.7%), (4) prescribed multiple therapies with tocilizumab predominant (13.6%). Compared with the other groups, TNFi persisters had the most favorable trajectory of CDAI over time. 
CONCLUSION: We observed that RA subjects can be clustered based on the sequence of b/tsDMARD prescriptions over time and that the clusters were correlated with differing trajectories of disease activity over time. This study highlights an alternative approach to consider subphenotyping of patients with RA for studies aimed at understanding treatment response.


Antirheumatic Agents , Arthritis, Rheumatoid , Biological Products , Humans , Middle Aged , Arthritis, Rheumatoid/drug therapy , Antirheumatic Agents/therapeutic use , Rituximab/therapeutic use , Abatacept/therapeutic use , Biological Products/therapeutic use
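The abstract above models b/tsDMARD sequences as a Markov chain fit by maximum likelihood. A minimal sketch of the core step follows, assuming a first-order chain: the MLE of each transition probability is the observed transition count divided by the number of transitions out of that state. The `assign_cluster` function (pick the cluster whose matrix gives the highest log-likelihood) is a simplified hard-assignment stand-in for a mixture model, not necessarily the authors' procedure, and the drug-class labels are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence

Matrix = Dict[Hashable, Dict[Hashable, float]]


def transition_mle(sequences: Sequence[Sequence[Hashable]]) -> Matrix:
    """MLE of a first-order transition matrix: P(j | i) = n_ij / sum_k n_ik."""
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for seq in sequences:
        for src, dst in zip(seq, seq[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


def log_likelihood(seq: Sequence[Hashable], P: Matrix, floor: float = 1e-12) -> float:
    """Log-likelihood of the transitions in seq; unseen transitions get a small floor."""
    return sum(math.log(P.get(src, {}).get(dst, floor)) for src, dst in zip(seq, seq[1:]))


def assign_cluster(seq: Sequence[Hashable], clusters: List[Matrix]) -> int:
    """Index of the cluster whose transition matrix best explains seq."""
    return max(range(len(clusters)), key=lambda k: log_likelihood(seq, clusters[k]))
```

This sketch ignores the initial-state distribution, so a single-drug sequence carries no transition information and scores 0 under every cluster; a full mixture model would also weight clusters by their prior probabilities.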
5.
EBioMedicine ; 92: 104581, 2023 Jun.
Article En | MEDLINE | ID: mdl-37121095

BACKGROUND: Rheumatoid arthritis (RA) shares genetic variants with other autoimmune conditions, but existing studies test the association of RA variants with a pre-defined set of phenotypes. The objective of this study was to perform a large-scale, systematic screen to determine phenotypes that share genetic architecture with RA to inform our understanding of shared pathways. METHODS: In the UK Biobank (UKB), we constructed RA genetic risk scores (GRS) incorporating human leukocyte antigen (HLA) and non-HLA risk alleles. Phenotypes were defined using groupings of International Classification of Diseases (ICD) codes. Patients with an RA code were excluded to mitigate the possibility of associations being driven by the diagnosis or management of RA. We performed a phenome-wide association study, testing the association of the RA GRS with phenotypes using multivariate generalized estimating equations adjusted for age, sex, and the first five principal components. Statistical significance was defined using Bonferroni correction. Results were replicated in an independent cohort, and replicated phenotypes were validated by medical record review. FINDINGS: We studied n = 316,166 subjects from UKB without evidence of RA and screened for association between the RA GRS and n = 1317 phenotypes. In the UKB, 20 phenotypes were significantly associated with the RA GRS, of which 13 (65%) were immune-mediated conditions, including polymyalgia rheumatica, granulomatosis with polyangiitis (GPA), type 1 diabetes, and multiple sclerosis. We further identified a novel association with celiac disease, in which the HLA and non-HLA alleles had strong associations in opposite directions. Strikingly, the non-HLA GRS was exclusively associated with greater risk of the validated conditions, suggesting shared underlying pathways outside the HLA region. 
INTERPRETATION: This study replicated and identified novel autoimmune phenotypes verified by medical record review that share immune pathways with RA and may inform opportunities for shared treatment targets, as well as risk assessment for conditions with a paucity of genomic data, such as GPA. FUNDING: This research was funded by the US National Institutes of Health (P30AR072577, R21AR078339, R35GM142879, T32AR007530) and the Harold and DuVal Bowen Fund.


Arthritis, Rheumatoid , Genetic Predisposition to Disease , Humans , Genotype , Arthritis, Rheumatoid/diagnosis , Arthritis, Rheumatoid/genetics , Risk Factors , Phenotype , HLA Antigens/genetics , Histocompatibility Antigens Class II/genetics , HLA-DRB1 Chains/genetics , Alleles
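The abstract above does not detail how the genetic risk score was built. A common construction, shown here as an assumption, is a weighted sum of risk-allele dosages (0, 1, or 2) multiplied by each allele's log odds ratio; computing it separately over HLA and non-HLA alleles would give the two component scores. The Bonferroni threshold below assumes a family-wise α of 0.05 spread over the 1317 phenotypes screened.

```python
import math
from typing import Sequence


def weighted_grs(dosages: Sequence[float], odds_ratios: Sequence[float]) -> float:
    """Weighted GRS: sum of risk-allele dosage x log(OR) over the selected SNPs."""
    if len(dosages) != len(odds_ratios):
        raise ValueError("need one odds ratio per SNP dosage")
    return sum(d * math.log(or_) for d, or_ in zip(dosages, odds_ratios))


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate at alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    print(f"{bonferroni_threshold(0.05, 1317):.2e}")  # 3.80e-05
```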
6.
Arthritis Care Res (Hoboken) ; 75(5): 1036-1045, 2023 05.
Article En | MEDLINE | ID: mdl-34623035

OBJECTIVE: In rheumatoid arthritis (RA), there are limited data on risk factors for the clinical heart failure (HF) subtypes of HF with reduced ejection fraction (HFrEF) and HF with preserved ejection fraction (HFpEF). This study examined the association between inflammation and incident HF subtypes in RA. Because inflammation changes over time with disease activity, we hypothesized that the effect of inflammation may be stronger at 5 years of follow-up than at the 10-year follow-up that is standard in general population studies of cardiovascular risk. METHODS: We studied an electronic health record (EHR)-based RA cohort with data before and after RA incidence. We applied a validated approach to identify HF and extract ejection fraction to classify HFrEF and HFpEF. Follow-up started at the RA incidence date (index date) and ended at the earliest of incident HF, death, last EHR encounter, or 10 years. Baseline inflammation was assessed using erythrocyte sedimentation rate or C-reactive protein values. Covariates included demographic characteristics, established HF risk factors, and RA-related factors. We tested the association of baseline inflammation with incident HF and its subtypes using Cox proportional hazards models. RESULTS: We studied 9,087 patients with RA; 8.2% developed HF during 10 years of follow-up. Elevated inflammation was associated with increased risk of HF at both 5- and 10-year follow-up (hazard ratio [HR] 1.66, 95% confidence interval [95% CI] 1.12-2.46 and HR 1.46, 95% CI 1.13-1.90, respectively), as it was for HFpEF at 5 years (HR 1.72, 95% CI 1.09-2.70) and 10 years (HR 1.45, 95% CI 1.07-1.94). HFrEF was not associated with inflammation at either follow-up time. CONCLUSION: Elevated inflammation early after RA diagnosis was associated with HF; this association was driven by HFpEF rather than HFrEF, suggesting a window of opportunity for prevention of HFpEF in RA.


Arthritis, Rheumatoid , Heart Failure , Humans , Stroke Volume , Heart Failure/diagnosis , Heart Failure/epidemiology , Risk Factors , Inflammation , Prognosis
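The follow-up definition in the abstract above (start at the RA index date; stop at the earliest of incident HF, death, last EHR encounter, or the administrative horizon) can be sketched as follows, producing the (time, event) pair that a Cox model consumes. This is a minimal illustration: it assumes patients with HF before the index date were already excluded, approximates a year as 365.25 days, and counts HF as the event when it falls on the same day as another stopping point.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def follow_up(index_date: date,
              hf_date: Optional[date],
              death_date: Optional[date],
              last_encounter: date,
              max_years: int = 10) -> Tuple[date, bool, float]:
    """Return (end_date, hf_event, years_of_follow_up) for one patient."""
    horizon = index_date + timedelta(days=round(365.25 * max_years))
    # Each candidate stopping point is (date, is_hf_event).
    stops = [(horizon, False), (last_encounter, False)]
    if death_date is not None:
        stops.append((death_date, False))
    if hf_date is not None:
        stops.append((hf_date, True))
    # Earliest date wins; on ties the HF event sorts first (not True == False).
    end, event = min(stops, key=lambda s: (s[0], not s[1]))
    return end, event, (end - index_date).days / 365.25
```

Passing `max_years=5` gives the 5-year analysis from the same records.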
7.
J Am Heart Assoc ; 11(15): e026014, 2022 08 02.
Article En | MEDLINE | ID: mdl-35904194

Background Models predicting atrial fibrillation (AF) risk, such as Cohorts for Heart and Aging Research in Genomic Epidemiology AF (CHARGE-AF), have not performed as well when applied to electronic health record data. Natural language processing (NLP) may improve models by using narrative electronic health record text. Methods and Results From a primary care network, we included patients aged ≥65 years with visits between 2003 and 2013 in development (n=32 960) and internal validation cohorts (n=13 992). An external validation cohort from a separate network from 2015 to 2020 included 39 051 patients. Model features were defined using codified electronic health record data and narrative data processed with NLP. We developed 2 models to predict 5-year AF incidence using (1) codified+NLP data and (2) codified data only, and evaluated model performance. The analysis included 2839 incident AF cases in the development cohort and 1057 and 2226 cases in the internal and external validation cohorts, respectively. The C-statistic was greater (P<0.001) for the codified+NLP model (0.744 [95% CI, 0.735-0.753]) than for the codified-only model (0.730 [95% CI, 0.720-0.739]) in the development cohort. In internal validation, the C-statistic of codified+NLP was modestly higher (0.735 [95% CI, 0.720-0.749]) than that of codified-only (0.729 [95% CI, 0.715-0.744]; P=0.06) and CHARGE-AF (0.717 [95% CI, 0.703-0.731]; P=0.002). Codified+NLP and codified-only were well calibrated, whereas CHARGE-AF underestimated AF risk. In external validation, the C-statistic of codified+NLP (0.750 [95% CI, 0.740-0.760]) remained higher (P<0.001) than that of codified-only (0.738 [95% CI, 0.727-0.748]) and CHARGE-AF (0.735 [95% CI, 0.725-0.746]). Conclusions Estimation of 5-year risk of AF can be modestly improved using NLP to incorporate narrative electronic health record data.


Atrial Fibrillation , Natural Language Processing , Atrial Fibrillation/diagnosis , Atrial Fibrillation/epidemiology , Cohort Studies , Electronic Health Records , Humans , Incidence , Risk Assessment/methods
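The C-statistic compared across models in the abstract above can be illustrated, for a binary outcome such as incident AF within 5 years, as the fraction of case/non-case pairs in which the case received the higher predicted risk (ties count one half); this equals the area under the ROC curve. The brute-force O(n²) sketch ignores censoring; a time-to-event analysis would use a censoring-aware concordance index instead, and the abstract does not say which variant was used.

```python
from typing import Sequence


def c_statistic(scores: Sequence[float], outcomes: Sequence[bool]) -> float:
    """Concordance for a binary outcome (equal to ROC AUC).

    Counts case/non-case pairs where the case has the higher predicted
    risk; tied predictions count as half-concordant.
    """
    cases = [s for s, y in zip(scores, outcomes) if y]
    non_cases = [s for s, y in zip(scores, outcomes) if not y]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))
```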
8.
JAMA Netw Open ; 5(6): e2218371, 2022 06 01.
Article En | MEDLINE | ID: mdl-35737384

Importance: Temporal shifts in clinical knowledge and practice need to be accounted for when assessing treatment outcomes with clinical evidence. Objective: To use electronic health record (EHR) data to (1) assess temporal trends in treatment decisions and patient outcomes and (2) emulate a randomized clinical trial (RCT) using EHR data with proper adjustment for temporal trends. Design, Setting, and Participants: The Clinical Outcomes of Surgical Therapy (COST) Study Group Trial, which assessed overall survival of patients with stage I to III (early-stage) colon cancer, was chosen as the target trial. The RCT was emulated using EHR data from a single health care system, Mass General Brigham, on patients who underwent colectomy for early-stage colon cancer from January 1, 2006, to December 31, 2017, and were followed up to January 1, 2020. Analyses were conducted from December 2, 2019, to January 24, 2022. Exposures: Laparoscopy-assisted colectomy (LAC) vs open colectomy (OC). Main Outcomes and Measures: The primary outcome was 5-year overall survival. To address confounding in the emulation, pretreatment variables were selected and adjusted for. Temporal trends were adjusted for by stratifying on the calendar year in which the colectomy was performed, with cotraining of models across strata. 
Results: A total of 943 patients met key RCT eligibility criteria in the EHR emulation cohort, including 518 undergoing LAC (median age, 63 [range, 20-95] years; 268 [52%] women; 121 [23%] with stage I, 165 [32%] with stage II, and 232 [45%] with stage III cancer; 32 [6%] with colon adhesion; 278 [54%] with right-sided colon cancer; 18 [3%] with left-sided colon cancer; and 222 [43%] with sigmoid colon cancer) and 425 undergoing OC (median age, 65 [range, 28-99] years; 223 [52%] women; 61 [14%] with stage I, 153 [36%] with stage II, and 211 [50%] with stage III cancer; 39 [9%] with colon adhesion; 202 [47%] with right-sided colon cancer; 39 [9%] with left-sided colon cancer; and 201 [47%] with sigmoid colon cancer). Tests for temporal trends in treatment assignment (χ² = 60.3; P < .001) and overall survival (χ² = 137.2; P < .001) were significant. The adjusted EHR emulation reached the same conclusion as the RCT: LAC is noninferior to OC in overall survival, with a 5-year risk difference of -0.007 (95% CI, -0.070 to 0.057). The results were consistent in stratified analyses within each temporal period. Conclusions and Relevance: These findings suggest that confounding bias from temporal trends should be considered when conducting clinical evidence studies with long time spans. Stratifying on calendar time and cotraining models across strata is one solution. With proper adjustment, clinical evidence may supplement RCTs in the assessment of treatment outcome over time.


Laparoscopy , Sigmoid Neoplasms , Aged , Colectomy/methods , Electronic Health Records , Female , Humans , Laparoscopy/methods , Male , Middle Aged
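To illustrate stratifying on calendar time as described in the abstract above, the sketch below computes a risk difference within each calendar-period stratum and averages the stratum estimates, weighting by stratum size. It is deliberately naive: it ignores censoring as well as the pretreatment-covariate adjustment and cotraining used in the study, and the stratum labels in the usage are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def stratified_risk_difference(records: Iterable[Tuple[Hashable, bool, bool]]) -> float:
    """records: (stratum, treated, event) tuples. Returns the size-weighted mean of
    within-stratum risk differences, P(event | treated) - P(event | untreated)."""
    # counts[stratum][treated] = [events, patients]
    counts: Dict[Hashable, Dict[bool, List[int]]] = defaultdict(
        lambda: {True: [0, 0], False: [0, 0]}
    )
    for stratum, treated, event in records:
        cell = counts[stratum][bool(treated)]
        cell[0] += int(event)
        cell[1] += 1
    total = sum(c[True][1] + c[False][1] for c in counts.values())
    rd = 0.0
    for c in counts.values():
        (e1, n1), (e0, n0) = c[True], c[False]
        if n1 == 0 or n0 == 0:
            raise ValueError("each stratum needs both treatment arms")
        rd += (n1 + n0) / total * (e1 / n1 - e0 / n0)
    return rd
```

Comparing arms only within the same calendar period keeps a shift in practice (e.g., more laparoscopy in later years, when survival was also better) from masquerading as a treatment effect.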
9.
Int J Med Inform ; 162: 104753, 2022 Apr 01.
Article En | MEDLINE | ID: mdl-35405530

OBJECTIVE: The use of electronic health record (EHR) systems has grown over the past decade, and with it the need to extract information from unstructured clinical narratives. Clinical notes, however, frequently contain acronyms with several potential senses (meanings), and traditional natural language processing (NLP) techniques cannot differentiate between these senses. In this study, we introduce a semi-supervised method for binary acronym disambiguation, the task of classifying whether an acronym in clinical EHR notes carries a target sense. METHODS: We developed a semi-supervised ensemble machine learning (CASEml) algorithm to automatically identify when an acronym carries a target sense by leveraging semantic embeddings, visit-level text, and billing information. The algorithm was validated using note data from the Veterans Affairs hospital system to classify the meaning of three acronyms: RA, MS, and MI. We compared the performance of CASEml against another standard semi-supervised method and a baseline metric selecting the most frequent acronym sense. In addition to evaluating the performance of these methods for specific instances of acronyms, we evaluated the impact of acronym disambiguation on NLP-driven phenotyping of rheumatoid arthritis. RESULTS: CASEml achieved accuracies of 0.947, 0.911, and 0.706 for RA, MS, and MI, respectively, higher than the baseline metric and (on average) higher than a state-of-the-art semi-supervised method. We also demonstrated that applying CASEml to medical notes improves the AUC of a phenotype algorithm for rheumatoid arthritis. CONCLUSION: CASEml is a novel method that accurately disambiguates acronyms in clinical notes and has advantages over commonly used supervised and semi-supervised machine learning approaches. In addition, CASEml improves the performance of NLP tasks that rely on ambiguous acronyms, such as phenotyping.

10.
Mult Scler Relat Disord ; 57: 103333, 2022 Jan.
Article En | MEDLINE | ID: mdl-35158446

BACKGROUND: Long-term data on multiple sclerosis (MS) inflammatory disease activity are limited. We examined electronic health record (EHR) indicators of disease activity in people with MS. METHODS: We analyzed prospectively collected research registry data and linked EHR data in a clinic-based cohort from 2000 to 2016. We used the trend in the yearly incident relapse rate from the registry data as the benchmark. We then calculated the temporal trends of potentially relevant EHR measures, including the mean count of MS diagnostic codes, mentions of MS-related concepts, MS-related health care utilization, and selected prescriptions. RESULTS: 1,555 MS patients had both registry and EHR data. Between 2000 and 2016, the registry data showed a declining trend in the yearly incident relapse rate, paralleling an increasing trend in disease-modifying therapy (DMT) use. Among the EHR measures, the covariate-adjusted frequency of MS diagnostic codes, procedure codes for MS-related imaging studies and emergency room visits, and electronic prescriptions for steroids declined over time, mirroring the temporal trend of the benchmark yearly incident relapse rate. CONCLUSION: This study highlights EHR indicators of MS relapse that could enable large-scale examination of long-term disease activity or inform individual patient monitoring in clinical settings where EHR data are available.


Multiple Sclerosis , Cohort Studies , Electronic Health Records , Humans , Multiple Sclerosis/epidemiology , Recurrence , Registries
11.
JAMA Netw Open ; 4(11): e2134627, 2021 11 01.
Article En | MEDLINE | ID: mdl-34783826

Importance: As disease-modifying treatment options for multiple sclerosis increase, comparisons of the options based on real-world evidence may guide clinical decision-making. Objective: To compare the relapse outcomes between 2 pairs of disease-modifying treatments: dimethyl fumarate vs fingolimod and natalizumab vs rituximab. Design, Setting, and Participants: This comparative effectiveness study integrated data from a clinic-based multiple sclerosis research registry and its linked electronic health records (EHR) system between January 1, 2006, and December 31, 2016, and built treatment groups for each pairwise disease-modifying treatment comparison according to both registry records and electronic prescriptions. Parallel analyses were conducted from October 11, 2019, to July 7, 2021. Main Outcomes and Measures: The main outcomes were the 1-year and 2-year relapse rates as well as the time to relapse. To compare relapse outcomes, the study adjusted for covariates from 2 sources (registry and EHR) and corrected for confounding biases among the covariates by the doubly robust estimation. Results: The study included 4 treatment groups: dimethyl fumarate (n = 260; 198 women [76.2%]; 227 non-Hispanic White individuals [87.3%]; mean [SD] age at diagnosis, 41.7 [10.4] years), fingolimod (n = 267; 190 women [71.2%]; 222 non-Hispanic White individuals [83.1%]; mean [SD] age at diagnosis, 37.9 [9.9] years), natalizumab (n = 204; 160 women [78.4%]; 172 non-Hispanic White individuals [84.3%]; mean [SD] age at diagnosis, 37.2 [10.6] years), and rituximab (n = 115; 83 women [72.2%]; 99 non-Hispanic White individuals [86.1%]; mean [SD] age at diagnosis, 44.1 [11.1] years). 
No significant differences were found in the relapse outcomes between dimethyl fumarate and fingolimod after correcting for confounding biases and multiple testing (difference in 1-year relapse rate, 0.028 [95% CI, -0.031 to 0.084]; difference in 2-year relapse rate, 0.071 [95% CI, 0.008-0.128]; relative risk of 2-year non-relapse, 0.957 [95% CI, 0.884-1.035] with dimethyl fumarate as reference). When compared with rituximab, natalizumab was associated with a higher relapse rate for all 3 outcomes after bias correction and multiple testing (difference in 1-year relapse rate, 0.080 [95% CI, 0.013-0.137]; difference in 2-year relapse rate, 0.132 [95% CI, 0.043-0.189]; relative risk of 2-year non-relapse, 0.903 [95% CI, 0.822-0.944]). Confounders were identified from EHR data not recorded in the registry data through data-driven feature selection. Conclusions and Relevance: This study reports real-world evidence of equivalent relapse outcomes between dimethyl fumarate and fingolimod and relapse reduction in favor of rituximab relative to natalizumab. This approach illustrates the value of incorporating EHR data as high-dimensional covariates in real-world treatment comparison.


Dimethyl Fumarate/therapeutic use , Fingolimod Hydrochloride/therapeutic use , Multiple Sclerosis, Relapsing-Remitting/prevention & control , Multiple Sclerosis/drug therapy , Natalizumab/therapeutic use , Rituximab/therapeutic use , Adult , Female , Humans , Immunosuppressive Agents/therapeutic use , Male , Middle Aged
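The abstract above corrects for confounding with doubly robust estimation. A textbook instance is the augmented inverse-probability-weighted (AIPW) estimator, sketched below for a difference in relapse rates; it remains consistent if either the propensity model or the outcome model is correctly specified. The nuisance inputs (propensity scores and per-arm outcome predictions) are assumed to come from models fit elsewhere, and this sketch is not presented as the study's exact implementation.

```python
from typing import Sequence


def aipw_difference(a: Sequence[int], y: Sequence[float], e: Sequence[float],
                    m1: Sequence[float], m0: Sequence[float]) -> float:
    """AIPW estimate of E[Y(1)] - E[Y(0)].

    a: treatment indicator (1 = treated), y: outcome (e.g., relapse 0/1),
    e: propensity scores P(A=1 | X), m1/m0: outcome-model predictions of Y
    under treatment and under control for every patient.
    """
    n = len(a)
    total = 0.0
    for ai, yi, ei, m1i, m0i in zip(a, y, e, m1, m0):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        # Outcome-model prediction plus an inverse-probability-weighted residual.
        mu1 = m1i + ai * (yi - m1i) / ei
        mu0 = m0i + (1 - ai) * (yi - m0i) / (1 - ei)
        total += mu1 - mu0
    return total / n
```

With a constant propensity of 0.5 and outcome predictions of zero, the estimator reduces to the plain difference in observed relapse rates between arms.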
12.
ACR Open Rheumatol ; 3(9): 593-600, 2021 Sep.
Article En | MEDLINE | ID: mdl-34296815

OBJECTIVE: Efficiently identifying eligible patients is a crucial first step for a successful clinical trial. The objective of this study was to test whether an approach using electronic health record (EHR) data and an ensemble machine learning algorithm incorporating billing codes and data from clinical notes processed by natural language processing (NLP) can improve the efficiency of eligibility screening. METHODS: We studied patients from a tertiary care center and a community hospital who were screened for a clinical trial of rheumatoid arthritis (RA), with one or more International Classification of Diseases (ICD) codes for RA and age greater than 35 years. The following three groups of EHR features were considered for the algorithm: 1) structured features, 2) counts of NLP concepts from notes, and 3) health care utilization. All features were linked to dates. We compared random forest and logistic regression with a least absolute shrinkage and selection operator penalty against the following two standard approaches: 1) one or more RA ICD codes and no ICD codes related to exclusion criteria (ScreenRAICD1+EX) and 2) two or more RA ICD codes (ScreenRAICD2). To test portability, we trained the algorithm at one institution and tested it at the other. RESULTS: In total, 3359 patients at Brigham and Women's Hospital (BWH) and 642 patients at Faulkner Hospital (FH) were studied, with 461 (13.7%) eligible patients at BWH and 84 (13.4%) at FH. Applying the algorithm reduced the number of ineligible patients requiring chart review by 40.5% at the tertiary care center and by 57.0% at the community hospital. In contrast, ScreenRAICD2 reduced the number of patients requiring chart review by 2.7% to 11.3%; ScreenRAICD1+EX reduced it by 63% to 65% but excluded 22% to 27% of eligible patients. 
CONCLUSION: The ensemble machine learning algorithm incorporating billing codes and NLP data increased the efficiency of eligibility screening by reducing the number of patients requiring chart review while not excluding eligible patients. Moreover, this approach can be trained at one institution and applied at another for multicenter clinical trials.

13.
Ann Clin Transl Neurol ; 8(4): 800-810, 2021 04.
Article En | MEDLINE | ID: mdl-33626237

OBJECTIVE: No relapse risk prediction tool is currently available to guide treatment selection for multiple sclerosis (MS). Leveraging electronic health record (EHR) data readily available at the point of care, we developed a clinical tool for predicting MS relapse risk. METHODS: Using data from a clinic-based research registry and linked EHR system between 2006 and 2016, we developed models predicting relapse events from the registry in a training set (n = 1435) and tested model performance in an independent validation set of MS patients (n = 186). This iterative process identified prior 1-year relapse history as a key predictor of future relapse; however, ascertaining relapse history through labor-intensive chart review is impractical. We therefore pursued two-stage algorithm development: (1) L1-regularized logistic regression (LASSO) to phenotype past 1-year relapse status from contemporaneous EHR data, and (2) LASSO to predict future 1-year relapse risk using imputed prior 1-year relapse status and other algorithm-selected features. RESULTS: The final model, comprising age, disease duration, and imputed prior 1-year relapse history, achieved a predictive AUC of 0.707 and an F score of 0.307. Its performance was significantly better than that of the baseline model (age, sex, race/ethnicity, and disease duration) and noninferior to a model containing actual prior 1-year relapse history. The predicted risk probability declined with disease duration and age. CONCLUSION: Our novel machine-learning algorithm predicts 1-year MS relapse with accuracy comparable to other clinical prediction tools and is applicable at the point of care. This EHR-based two-stage approach to outcome prediction may have applications in neurological diseases beyond MS.


Electronic Health Records , Machine Learning , Multiple Sclerosis/diagnosis , Registries , Adult , Disease Progression , Female , Humans , Longitudinal Studies , Male , Middle Aged , Prognosis , Recurrence
14.
Arthritis Care Res (Hoboken) ; 73(3): 442-448, 2021 03.
Article En | MEDLINE | ID: mdl-31910317

OBJECTIVE: Identifying pseudogout in large data sets is difficult due to its episodic nature and the lack of billing codes specific to this acute subtype of calcium pyrophosphate (CPP) deposition disease. The objective of this study was to evaluate a novel machine learning approach for classifying pseudogout using electronic health record (EHR) data. METHODS: We created an EHR data mart of patients with ≥1 relevant billing code or ≥2 natural language processing (NLP) mentions of pseudogout or chondrocalcinosis from 1991 to 2017. We selected 900 subjects for gold standard chart review for definite pseudogout (synovitis + synovial fluid CPP crystals), probable pseudogout (synovitis + chondrocalcinosis), or not pseudogout. We applied a topic modeling approach to identify definite/probable pseudogout. A combined algorithm included topic modeling plus manually reviewed CPP crystal results. We compared algorithm performance and the cohorts identified by billing codes, the presence of CPP crystals, topic modeling, and the combined algorithm. RESULTS: Among 900 subjects, 123 (13.7%) had pseudogout by chart review (68 definite, 55 probable). Billing codes had a sensitivity of 65% and a positive predictive value (PPV) of 22% for pseudogout. The presence of CPP crystals had a sensitivity of 29% and a PPV of 92%. Without using CPP crystal results, topic modeling had a sensitivity of 29% and a PPV of 79%. The combined algorithm yielded a sensitivity of 42% and a PPV of 81%. The combined algorithm identified 50% more patients than the presence of CPP crystals; the latter captured only a portion of definite pseudogout and missed probable pseudogout. CONCLUSION: For pseudogout, an episodic disease with no specific billing code, combining NLP, machine learning methods, and synovial fluid laboratory results yielded an algorithm that significantly boosted the PPV compared with billing codes.


Chondrocalcinosis/diagnosis , Data Mining , Electronic Health Records , Machine Learning , Natural Language Processing , Aged , Aged, 80 and over , Chondrocalcinosis/classification , Chondrocalcinosis/drug therapy , Female , Humans , Male , Middle Aged
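The sensitivity and positive predictive value (PPV) reported in the abstract above are both computed against the chart-review gold standard. A minimal sketch with hypothetical inputs: sensitivity is the share of true cases the method flags, and PPV is the share of flagged patients who are true cases.

```python
import math
from typing import Sequence, Tuple


def sensitivity_ppv(flagged: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) of a classifier against a gold standard."""
    tp = sum(1 for f, t in zip(flagged, truth) if f and t)
    fn = sum(1 for f, t in zip(flagged, truth) if not f and t)
    fp = sum(1 for f, t in zip(flagged, truth) if f and not t)
    sensitivity = tp / (tp + fn) if tp + fn else math.nan
    ppv = tp / (tp + fp) if tp + fp else math.nan
    return sensitivity, ppv
```

The trade-off in the abstract follows directly: flagging more patients (billing codes) raises sensitivity but adds false positives that lower PPV.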
15.
Rheumatology (Oxford) ; 59(12): 3759-3766, 2020 12 01.
Article En | MEDLINE | ID: mdl-32413107

OBJECTIVE: The objective of this study was to compare the performance of an RA algorithm, developed and trained in 2010 using natural language processing and machine learning, on updated data containing ICD10 codes, new RA treatments, and a new electronic medical record (EMR) system. METHODS: We extracted data from subjects with ≥1 RA International Classification of Diseases (ICD) code from the EMR of two large academic centres to create a data mart. Gold standard RA cases were identified by reviewing a random sample of 200 subjects from the data mart and a random sample of 100 subjects who had only RA ICD10 codes. We compared the performance of the following algorithms using the original 2010 data with updated data: (i) a published 2010 RA algorithm; (ii) an updated algorithm incorporating ICD10 RA codes and new DMARDs; and (iii) a published algorithm using ICD codes only (≥3 RA ICD codes). RESULTS: The gold standard RA cases had a mean age of 65.5 years; 78.7% were female and 74.1% were positive for RF or antibodies to cyclic citrullinated peptide (anti-CCP). The positive predictive value (PPV) for ≥3 RA ICD codes was 54%, compared with 56% in 2010. At a specificity of 95%, the PPVs of the 2010 algorithm and the updated version were both 91%, compared with 94% (95% CI: 91, 96%) in 2010. In subjects with ICD10 data only, the PPV for the updated 2010 RA algorithm was 93%. CONCLUSION: The 2010 RA algorithm validated on the updated data, with performance characteristics similar to those in the 2010 data. While the 2010 algorithm continued to perform better than the rule-based approach, the PPV of the latter also remained stable over time.


Arthritis, Rheumatoid , International Classification of Diseases , Algorithms , Electronic Health Records , Humans
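The abstract above reports PPV at a fixed specificity of 95%. One straightforward way to compute that, sketched here as an assumption rather than the study's exact procedure, is to choose the lowest observed score threshold whose specificity on the gold-standard negatives reaches the target, then compute PPV among patients scoring at or above it.

```python
from typing import Optional, Sequence, Tuple


def ppv_at_specificity(scores: Sequence[float], labels: Sequence[bool],
                       target_specificity: float = 0.95
                       ) -> Optional[Tuple[float, float, float]]:
    """Return (threshold, specificity, PPV) for the lowest observed-score
    threshold whose specificity meets the target; patients scoring at or
    above the threshold are classified positive. None if no threshold works."""
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not negatives:
        raise ValueError("specificity needs at least one gold-standard negative")
    # Specificity never decreases as the threshold rises, so the first
    # threshold (in ascending order) that meets the target keeps sensitivity highest.
    for threshold in sorted(set(scores)):
        specificity = sum(1 for s in negatives if s < threshold) / len(negatives)
        if specificity >= target_specificity:
            flagged = [y for s, y in zip(scores, labels) if s >= threshold]
            ppv = sum(1 for y in flagged if y) / len(flagged)
            return threshold, specificity, ppv
    return None
```

Fixing specificity rather than the threshold lets algorithms with differently scaled scores, such as the 2010 and updated versions, be compared at the same operating point.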
...