1.
Arthritis Care Res (Hoboken); 75(5): 1036-1045, 2023 May.
Article in English | MEDLINE | ID: mdl-34623035

ABSTRACT

OBJECTIVE: In rheumatoid arthritis (RA), there are limited data on risk factors for the clinical heart failure (HF) subtypes of HF with reduced ejection fraction (HFrEF) and HF with preserved ejection fraction (HFpEF). This study examined the association between inflammation and incident HF subtypes in RA. Because inflammation changes over time with disease activity, we hypothesized that the effect of inflammation may be stronger at the 5-year follow-up than at the standard 10-year follow-up from general population studies of cardiovascular risk. METHODS: We studied an electronic health record (EHR)-based RA cohort with data pre- and post-RA incidence. We applied a validated approach to identify HF and extract ejection fraction to classify HFrEF and HFpEF. Follow-up started from the RA incidence date (index date) and ended at the earliest occurrence of incident HF, death, last EHR encounter, or 10 years. Baseline inflammation was assessed using erythrocyte sedimentation rate or C-reactive protein values. Covariates included demographic characteristics, established HF risk factors, and RA-related factors. We tested the association of baseline inflammation with incident HF and its subtypes using Cox proportional hazards models. RESULTS: We studied 9,087 patients with RA; 8.2% developed HF during 10 years of follow-up. Elevated inflammation was associated with increased risk for HF at both 5- and 10-year follow-ups (hazard ratio [HR] 1.66, 95% confidence interval [95% CI] 1.12-2.46 and HR 1.46, 95% CI 1.13-1.90, respectively), an association also seen for HFpEF at 5 years (HR 1.72, 95% CI 1.09-2.70) and 10 years (HR 1.45, 95% CI 1.07-1.94). HFrEF was not associated with inflammation at either follow-up time. CONCLUSION: Elevated inflammation early in RA diagnosis was associated with HF; this association was driven by HFpEF and not HFrEF, suggesting a window of opportunity for prevention of HFpEF in RA.


Subjects
Rheumatoid Arthritis, Heart Failure, Humans, Stroke Volume, Heart Failure/diagnosis, Heart Failure/epidemiology, Risk Factors, Inflammation, Prognosis
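
A Cox model reports each hazard ratio as exp(beta), and a Wald-type 95% CI is exp(beta ± 1.96·SE). Assuming the intervals in this abstract are of that type (the abstract does not say how they were computed), each reported HR should sit close to the geometric mean of its confidence limits. The sketch below runs that consistency check on the reported figures; the function names are illustrative and not taken from the paper.

```python
import math

def implied_log_se(lower, upper, z=1.96):
    """Standard error of log(HR) implied by a Wald-type 95% CI (assumed form)."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

def geometric_midpoint(lower, upper):
    """Point estimate implied by a CI that is symmetric on the log scale."""
    return math.sqrt(lower * upper)

# Values copied from the abstract: (HR, CI lower, CI upper).
reported = {
    "HF, 5 years": (1.66, 1.12, 2.46),
    "HF, 10 years": (1.46, 1.13, 1.90),
    "HFpEF, 5 years": (1.72, 1.09, 2.70),
    "HFpEF, 10 years": (1.45, 1.07, 1.94),
}

for label, (hr, lo, hi) in reported.items():
    mid = geometric_midpoint(lo, hi)
    se = implied_log_se(lo, hi)
    print(f"{label}: reported HR {hr:.2f}, implied midpoint {mid:.2f}, "
          f"implied SE(log HR) {se:.3f}")
```

Small gaps between the reported HR and the implied midpoint are expected, because the published limits are rounded to two decimals.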
2.
J Biomed Inform; 132: 104109, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35660521

ABSTRACT

OBJECTIVE: Accurately assigning phenotype information to individual patients via computational phenotyping using Electronic Health Records (EHRs) has been seen as the first step towards enabling EHRs for precision medicine research. Chart review labels annotated by clinical experts, also known as "gold standard" labels, are essential for the development and validation of computational phenotyping algorithms. However, given the complexity of EHR systems, the process of chart review is both labor intensive and time consuming. We propose a fully automated algorithm, referred to as pGUESS, to rank EHR notes according to their relevance to a given phenotype. By identifying the most relevant notes, pGUESS can greatly improve the efficiency and accuracy of chart reviews. METHOD: pGUESS uses prior guided semantic similarity to measure the informativeness of a clinical note to a given phenotype. We first select candidate clinical concepts from a pool of comprehensive medical concepts using public knowledge sources and then derive the semantic embedding vector (SEV) for a reference article (SEVref) and each note (SEVnote). The algorithm scores the relevance of a note as the cosine similarity between SEVnote and SEVref. RESULTS: The algorithm was validated against four sets of 200 notes that were manually annotated by clinical experts to assess their informativeness to one of three disease phenotypes. The pGUESS algorithm substantially outperforms existing unsupervised approaches for classifying the relevance status with respect to both accuracy and scalability across phenotypes. Averaging over the three phenotypes, the rank correlation between the algorithm ranking and gold standard label was 0.64 for pGUESS, but only 0.47 and 0.35 for the next two best performing algorithms. pGUESS is also much more computationally scalable than existing algorithms. CONCLUSION: The pGUESS algorithm can substantially reduce the burden of chart review and holds potential for improving the efficiency and accuracy of human annotation.


Subjects
Algorithms, Semantics, Electronic Health Records, Humans, Natural Language Processing, Phenotype, Precision Medicine
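
The scoring step described in the METHOD section, cosine similarity between SEVnote and SEVref, can be sketched as follows. This is a minimal illustration only: the three-dimensional vectors are made up, the names `cosine_similarity` and `rank_notes` are hypothetical, and the concept selection and embedding steps that produce the SEVs in pGUESS are not reproduced here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_notes(sev_ref, sev_notes):
    """Return (note_id, score) pairs sorted from most to least similar to the reference."""
    scored = [(note_id, cosine_similarity(vec, sev_ref))
              for note_id, vec in sev_notes.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy embeddings, invented for illustration only.
sev_ref = [0.9, 0.1, 0.4]
sev_notes = {
    "note_a": [0.8, 0.2, 0.5],
    "note_b": [0.0, 1.0, 0.0],
    "note_c": [0.45, 0.05, 0.2],  # a scaled copy of sev_ref
}

ranking = rank_notes(sev_ref, sev_notes)
for note_id, score in ranking:
    print(f"{note_id}: {score:.3f}")
```

Because cosine similarity ignores vector length, a note whose embedding is a scaled copy of the reference scores as highly as the reference itself, which is why `note_c` ranks first in this toy example.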
3.
NPJ Digit Med; 4(1): 151, 2021 Oct 27.
Article in English | MEDLINE | ID: mdl-34707226

ABSTRACT

The increasing availability of electronic health record (EHR) systems has created enormous potential for translational research. However, it is difficult to know all the relevant codes related to a phenotype due to the large number of codes available. Traditional data mining approaches often require the use of patient-level data, which hinders the ability to share data across institutions. In this project, we demonstrate that multi-center large-scale code embeddings can be used to efficiently identify relevant features related to a disease of interest. We constructed large-scale code embeddings for a wide range of codified concepts from EHRs from two large medical centers. We developed knowledge extraction via sparse embedding regression (KESER) for feature selection and integrative network analysis. We evaluated the quality of the code embeddings and assessed the performance of KESER in feature selection for eight diseases. Besides, we developed an integrated clinical knowledge map combining embedding data from both institutions. The features selected by KESER were comprehensive compared to lists of codified data generated by domain experts. Features identified via KESER resulted in comparable performance to those built upon features selected manually or with patient-level data. The knowledge map created using an integrative analysis identified disease-disease and disease-drug pairs more accurately compared to those identified using single institution data. Analysis of code embeddings via KESER can effectively reveal clinical knowledge and infer relatedness among codified concepts. KESER bypasses the need for patient-level data in individual analyses providing a significant advance in enabling multi-center studies using EHR data.

4.
Stat Med; 40(18): 4035-4052, 2021 Aug 15.
Article in English | MEDLINE | ID: mdl-33915597

ABSTRACT

The nested case-control (NCC) design has been widely adopted as a cost-effective sampling design for biomarker research. Under the NCC design, markers are only measured for the NCC subcohort consisting of all cases and a fraction of the controls selected randomly from the matched risk sets of the cases. Robust methods for evaluating prediction performance of risk models have been derived under the inverse probability weighting framework. The probabilities of samples being included in the NCC cohort can be calculated based on the study design (a previous study) or estimated non-parametrically (a previous study). Neither strategy works well due to model mis-specification and the curse of dimensionality in practical settings where the sampling does not entirely follow the study design or depends on many factors. In this paper, we propose an alternative strategy to estimate the sampling probabilities based on a varying coefficient model, which attains a balance between robustness and the curse of dimensionality. The complex correlation structure induced by repeated finite risk set sampling makes the standard resampling procedure for variance estimation fail. We propose a perturbation resampling procedure that provides valid interval estimation for the proposed estimators. Simulation studies show that the proposed method performs well in finite samples. We apply the proposed method to the Nurses' Health Study II to develop and evaluate prediction models using clinical biomarkers for cardiovascular risk.


Subjects
Case-Control Studies, Biomarkers, Cohort Studies, Epidemiologic Studies, Humans, Probability
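
To make the design-based option concrete, here is a sketch of one commonly used formula for the probability that a non-case is ever sampled into an NCC cohort: one minus the product, over every case event time at which the subject is still at risk, of the chance of not being drawn from that risk set. This illustrates the design-based baseline only, not the varying coefficient estimator proposed in the paper. It assumes no matching factors, no tied event times, a common entry time for all subjects, and `m` controls drawn per case; the function name and toy data are hypothetical.

```python
def ncc_inclusion_probabilities(times, events, m):
    """Design-based probability of ever being included in the NCC sample.

    times:  follow-up time for each subject (all assumed to enter at time 0)
    events: 1 if the subject is a case at `times[i]`, 0 if censored
    m:      number of controls sampled per case from the risk set
    """
    case_times = [t for t, e in zip(times, events) if e == 1]
    probs = []
    for t_i, e_i in zip(times, events):
        if e_i == 1:
            probs.append(1.0)  # all cases are included by design
            continue
        p_never_sampled = 1.0
        for t_case in case_times:
            if t_i >= t_case:  # subject is still at risk at this case time
                n_at_risk = sum(1 for t in times if t >= t_case)
                eligible_controls = n_at_risk - 1  # exclude the case itself
                p_never_sampled *= 1.0 - min(1.0, m / eligible_controls)
        probs.append(1.0 - p_never_sampled)
    return probs

# Toy cohort, invented for illustration: two cases (times 2 and 7), four controls.
times = [2, 3, 5, 7, 8, 9]
events = [1, 0, 0, 1, 0, 0]
probs = ncc_inclusion_probabilities(times, events, m=1)

# Inverse probability weights for subjects that end up in the NCC sample.
weights = [1.0 / p for p in probs]
for t, e, p, w in zip(times, events, probs, weights):
    print(f"time={t} case={e} inclusion_prob={p:.2f} weight={w:.2f}")
```

Subjects followed longer are exposed to more risk sets and so have higher inclusion probabilities and smaller weights. The abstract's point is that such design-based probabilities can be misspecified when the actual sampling departs from the stated design.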
5.
Arthritis Rheumatol; 73(6): 970-979, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33615723

ABSTRACT

OBJECTIVE: Patients with rheumatoid arthritis (RA) are 1.5 times more likely to develop cardiovascular disease (CVD) attributed to chronic inflammation. A decrease in inflammation in patients with RA is associated with increased low-density lipoprotein (LDL) cholesterol. This study was undertaken to prospectively evaluate the changes in lipid levels among RA patients experiencing changes in inflammation and determine the association with concomitant temporal patterns in markers of myocardial injury. METHODS: A total of 196 patients were evaluated in a longitudinal RA cohort, with blood samples and high-sensitivity C-reactive protein (hsCRP) levels measured annually. Patients were stratified based on whether they experienced either a significant increase in inflammation (an increase in hsCRP of ≥10 mg/liter between any 2 time points 1 year apart; designated the increased inflammation cohort [n = 103]) or decrease in inflammation (a decrease in hsCRP of ≥10 mg/liter between any 2 time points 1 year apart; designated the decreased inflammation cohort [n = 93]). Routine and advanced lipids, markers of inflammation (interleukin-6, hsCRP, soluble tumor necrosis factor receptor II), and markers of subclinical myocardial injury (high-sensitivity cardiac troponin T [hs-cTnT], N-terminal pro-brain natriuretic peptide) were measured. RESULTS: Among the patients in the increased inflammation cohort, the mean age was 59 years, 81% were women, and the mean RA disease duration was 17.9 years. The average increase in hsCRP levels was 36 mg/liter, and this increase was associated with significant reductions in LDL cholesterol, triglycerides, total cholesterol, apolipoprotein B (Apo B), and Apo A-I levels. In the increased inflammation cohort at baseline, 45.6% of patients (47 of 103) had detectable circulating hs-cTnT, which further increased during inflammation (P = 0.02). In the decreased inflammation cohort, hs-cTnT levels remained stable despite a reduction in inflammation over follow-up. In both cohorts, hs-cTnT levels were associated with the overall estimated risk of CVD. CONCLUSION: Among RA patients who experienced an increase in inflammation, a significant decrease in routinely measured lipids, including LDL cholesterol, and an increase in markers of subclinical myocardial injury were observed. These findings highlight the divergence in biomarkers of CVD risk and suggest a role in future studies examining the benefit of including hs-cTnT for CVD risk stratification in RA.


Subjects
Rheumatoid Arthritis/metabolism, LDL Cholesterol/metabolism, Heart Diseases/metabolism, Inflammation/metabolism, Myocardium/metabolism, Brain Natriuretic Peptide/metabolism, Peptide Fragments/metabolism, Troponin T/metabolism, Aged, Apolipoprotein A-I/metabolism, Apolipoproteins B/metabolism, Asymptomatic Diseases, C-Reactive Protein/metabolism, Cardiovascular Diseases/metabolism, Cholesterol/metabolism, Female, Heart Disease Risk Factors, Humans, Interleukin-6/metabolism, Male, Middle Aged, Prospective Studies, Tumor Necrosis Factor Receptors Type II/metabolism, Risk Assessment, Triglycerides/metabolism
6.
Arthritis Care Res (Hoboken); 73(3): 442-448, 2021 Mar.
Article in English | MEDLINE | ID: mdl-31910317

ABSTRACT

OBJECTIVE: Identifying pseudogout in large data sets is difficult due to its episodic nature and a lack of billing codes specific to this acute subtype of calcium pyrophosphate (CPP) deposition disease. The objective of this study was to evaluate a novel machine learning approach for classifying pseudogout using electronic health record (EHR) data. METHODS: We created an EHR data mart of patients with ≥1 relevant billing code or ≥2 natural language processing (NLP) mentions of pseudogout or chondrocalcinosis, 1991-2017. We selected 900 subjects for gold standard chart review for definite pseudogout (synovitis + synovial fluid CPP crystals), probable pseudogout (synovitis + chondrocalcinosis), or not pseudogout. We applied a topic modeling approach to identify definite/probable pseudogout. A combined algorithm included topic modeling plus manually reviewed CPP crystal results. We compared algorithm performance and cohorts identified by billing codes, the presence of CPP crystals, topic modeling, and a combined algorithm. RESULTS: Among 900 subjects, 123 (13.7%) had pseudogout by chart review (68 definite, 55 probable). Billing codes had a sensitivity of 65% and a positive predictive value (PPV) of 22% for pseudogout. The presence of CPP crystals had a sensitivity of 29% and a PPV of 92%. Without using CPP crystal results, topic modeling had a sensitivity of 29% and a PPV of 79%. The combined algorithm yielded a sensitivity of 42% and a PPV of 81%. The combined algorithm identified 50% more patients than the presence of CPP crystals; the latter captured a portion of definite pseudogout and missed probable pseudogout. CONCLUSION: For pseudogout, an episodic disease with no specific billing code, combining NLP, machine learning methods, and synovial fluid laboratory results yielded an algorithm that significantly boosted the PPV compared to billing codes.


Subjects
Chondrocalcinosis/diagnosis, Data Mining, Electronic Health Records, Machine Learning, Natural Language Processing, Aged, Aged 80 and Over, Chondrocalcinosis/classification, Chondrocalcinosis/drug therapy, Female, Humans, Male, Middle Aged
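
The two metrics used to compare the algorithms in this abstract, sensitivity and positive predictive value (PPV), are both computed from a comparison of algorithm output against chart-review labels. The sketch below shows the arithmetic on a small invented label set; it does not use the study's data, and the function name is hypothetical.

```python
def sensitivity_and_ppv(gold, predicted):
    """Compute (sensitivity, PPV) from paired gold-standard and predicted labels.

    gold, predicted: sequences of 0/1 (or False/True) of equal length.
    Returns NaN for a metric whose denominator is zero.
    """
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv

# Toy labels, invented for illustration: 4 true cases among 10 subjects.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

sens, ppv = sensitivity_and_ppv(gold, predicted)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

The trade-off reported in the abstract follows from these definitions: a permissive rule such as billing codes flags many subjects, which raises sensitivity but lowers PPV, whereas a strict rule such as requiring CPP crystals does the opposite.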
7.
Dig Dis Sci; 66(3): 760-767, 2021 Mar.
Article in English | MEDLINE | ID: mdl-32436120

ABSTRACT

INTRODUCTION: Crohn's disease (CD) and ulcerative colitis (UC) are associated with considerable direct healthcare costs. There have been few comprehensive analyses of all IBD and non-IBD comorbidities that determine direct costs in this population. METHODS: We used data from a validated cohort of patients with inflammatory bowel disease (IBD). Total healthcare costs were estimated as a sum of costs associated with IBD-related hospitalizations and surgery, imaging (CT or MR scans), outpatient visits, endoscopic evaluation, and emergency room (ER) care. All ICD-9 codes were extracted for each patient and clustered into 1804 distinct phecode clusters representing individual phenotypes. A phenome-wide association analysis (PheWAS) was performed using logistic regression to identify predictors of being in the top decile of costs. RESULTS: Our cohort comprised 10,721 patients with IBD, among whom 50% had CD. The median age was 46 years. The median total cost per patient was $11,203 (IQR $2396-30,563). The strongest associations with total healthcare costs were intestinal obstruction without mention of hernia (p = 5.93 × 10^-156) and other intestinal obstruction (p = 9.24 × 10^-131). In addition, strong associations were observed for symptoms consistent with severity of IBD, including the presence of fluid-electrolyte imbalance (p = 1.90 × 10^-130), hypovolemia (p = 1.65 × 10^-114), abdominal pain (p = 7.29 × 10^-60), and anemia (p = 1.90 × 10^-83). Cardiopulmonary diseases and psychological comorbidity also demonstrated significant associations with total costs, with the latter being more strongly associated with ER visit-related costs. CONCLUSIONS: Surrogate markers suggesting possible irreversible bowel damage and active disease demonstrate the greatest influence on IBD-related healthcare costs.


Subjects
Ulcerative Colitis/economics, Crohn Disease/economics, Health Care Costs/statistics & numerical data, Hospitalization/statistics & numerical data, Inflammatory Bowel Diseases/economics, Adult, Cohort Studies, Cost of Illness, Female, Humans, Male, Middle Aged
8.
Arthritis Care Res (Hoboken); 73(2): 159-165, 2021 Feb.
Article in English | MEDLINE | ID: mdl-31705724

ABSTRACT

OBJECTIVE: Coronary microvascular dysfunction (CMD) is a predictor of cardiac death in diabetes mellitus (DM) independent of traditional cardiovascular (CV) risk factors. Rheumatoid arthritis (RA) is a chronic inflammatory condition, with excess CV risk compared to the general population, in which CMD is hypothesized to play a role; however, there are limited data on CMD in RA and any association with clinical outcomes. The objective of this study was to compare the prevalence of CMD in RA to that in DM and to test the association with all-cause mortality. METHODS: We performed a retrospective cohort study using data from a registry of all patients undergoing stress myocardial perfusion positron emission tomography as part of routine clinical care from 2006 to 2017. The inclusion criterion was a normal perfusion scan. Patients with RA or DM were classified using previously published approaches. Coronary flow reserve (CFR) was calculated for all patients in the registry and linked with mortality data. CMD was defined as CFR <2.0. RESULTS: We studied 73 patients with RA and 441 patients with DM. Among patients with a normal perfusion scan, the prevalence of CMD in RA was similar to that in DM (P = 0.2). CMD was associated with increased risk for all-cause mortality in RA (hazard ratio 2.4 [95% confidence interval 1.4-4.2]) as well as increased risk for cardiac-related death at rates similar to those in DM. CONCLUSION: These findings suggest an important role for CMD as a potential contributor to excess CV risk and mortality in RA, as previously observed in DM, as well as evidence for a mechanistic link between inflammation and cardiovascular disease.


Subjects
Rheumatoid Arthritis/mortality, Coronary Artery Disease/mortality, Coronary Circulation, Coronary Vessels/physiopathology, Diabetes Mellitus/mortality, Microcirculation, Aged, Rheumatoid Arthritis/diagnosis, Cause of Death, Coronary Artery Disease/diagnostic imaging, Coronary Artery Disease/physiopathology, Coronary Vessels/diagnostic imaging, Diabetes Mellitus/diagnosis, Female, Heart Disease Risk Factors, Humans, Male, Middle Aged, Prevalence, Prognosis, Registries, Retrospective Studies, Risk Assessment
9.
Arterioscler Thromb Vasc Biol; 40(11): 2714-2727, 2020 Nov.
Article in English | MEDLINE | ID: mdl-32907368

ABSTRACT

OBJECTIVE: HDL (high-density lipoprotein) contains functional proteins that define single subspecies, each comprising 1% to 12% of the total HDL. We studied the differential association with coronary heart disease (CHD) of 15 such subspecies. Approach and Results: We measured plasma apoA1 (apolipoprotein A1) concentrations of 15 protein-defined HDL subspecies in 4 US-based prospective studies. Among participants without CVD at baseline, 932 developed CHD during 10 to 25 years. They were matched 1:1 to controls who did not experience CHD. In each cohort, hazard ratios for each subspecies were computed by conditional logistic regression and combined by meta-analysis. Higher levels of HDL subspecies containing alpha-2 macroglobulin, CoC3 (complement C3), HP (haptoglobin), or PLMG (plasminogen) were associated with higher relative risk compared with the HDL counterpart lacking the defining protein (hazard ratio range, 0.96-1.11 per 1 SD increase versus 0.73-0.81, respectively; P for heterogeneity <0.05). In contrast, HDL containing apoC1 or apoE were associated with lower relative risk compared with the counterpart (hazard ratio, 0.74; P=0.002 and 0.77, P=0.001, respectively). CONCLUSIONS: Several subspecies of HDL defined by single proteins that are involved in thrombosis, inflammation, immunity, and lipid metabolism are found in small fractions of total HDL and are associated with higher relative risk of CHD compared with HDL that lacks the defining protein. In contrast, HDL containing apoC1 or apoE are robustly associated with lower risk. The balance between beneficial and harmful subspecies in a person's HDL sample may determine the risk of CHD pertaining to HDL and paths to treatment.


Subjects
Coronary Disease/blood, HDL Lipoproteins/blood, Adult, Black or African American, Aged, Aged 80 and Over, Apolipoprotein A-I/blood, Apolipoproteins E/blood, Biomarkers/blood, Coronary Disease/diagnosis, Coronary Disease/ethnology, Female, Heart Disease Risk Factors, Humans, Male, Middle Aged, Multicenter Studies as Topic, Prospective Studies, Race Factors, Risk Assessment, Triglycerides/blood, United States/epidemiology, White People
10.
J Am Med Inform Assoc; 27(8): 1235-1243, 2020 Aug 1.
Article in English | MEDLINE | ID: mdl-32548637

ABSTRACT

OBJECTIVE: A major bottleneck hindering utilization of electronic health record data for translational research is the lack of precise phenotype labels. Chart review as well as rule-based and supervised phenotyping approaches require laborious expert input, hampering applicability to studies that require many phenotypes to be defined and labeled de novo. Though International Classification of Diseases codes are often used as surrogates for true labels in this setting, these sometimes suffer from poor specificity. We propose a fully automated topic modeling algorithm to simultaneously annotate multiple phenotypes. MATERIALS AND METHODS: Surrogate-guided ensemble latent Dirichlet allocation (sureLDA) is a label-free multidimensional phenotyping method. It first uses the PheNorm algorithm to initialize probabilities based on 2 surrogate features for each target phenotype, and then leverages these probabilities to constrain the LDA topic model to generate phenotype-specific topics. Finally, it combines phenotype-feature counts with surrogates via clustering ensemble to yield final phenotype probabilities. RESULTS: sureLDA achieves reliably high accuracy and precision across a range of simulated and real-world phenotypes. Its performance is robust to phenotype prevalence and relative informativeness of surrogate vs nonsurrogate features. It also exhibits powerful feature selection properties. DISCUSSION: sureLDA combines attractive properties of PheNorm and LDA to achieve high accuracy and precision robust to diverse phenotype characteristics. It offers particular improvement for phenotypes insufficiently captured by a few surrogate features. Moreover, sureLDA's feature selection ability enables it to handle high feature dimensions and produce interpretable computational phenotypes. CONCLUSIONS: sureLDA is well suited to large-scale electronic health record phenotyping for highly multiphenotype applications such as phenome-wide association studies.


Subjects
Algorithms, Electronic Health Records, Natural Language Processing, Electronic Health Records/classification, Humans, Precision Medicine, ROC Curve, Translational Biomedical Research
11.
Rheumatology (Oxford); 59(12): 3759-3766, 2020 Dec 1.
Article in English | MEDLINE | ID: mdl-32413107

ABSTRACT

OBJECTIVE: The objective of this study was to compare the performance of an RA algorithm developed and trained in 2010 utilizing natural language processing and machine learning, using updated data containing ICD10, new RA treatments, and a new electronic medical records (EMR) system. METHODS: We extracted data from subjects with ≥1 RA International Classification of Diseases (ICD) code from the EMR of two large academic centres to create a data mart. Gold standard RA cases were identified by reviewing a random sample of 200 subjects from the data mart and a random sample of 100 subjects who only had RA ICD10 codes. We compared the performance of the following algorithms using the original 2010 data with updated data: (i) a published 2010 RA algorithm; (ii) an updated algorithm, incorporating ICD10 RA codes and new DMARDs; and (iii) a published algorithm using ICD codes only, ICD RA code ≥3. RESULTS: The gold standard RA cases had mean age 65.5 years, 78.7% female, 74.1% RF or antibodies to cyclic citrullinated peptide (anti-CCP) positive. The positive predictive value (PPV) for ≥3 RA ICD was 54%, compared with 56% in 2010. At a specificity of 95%, the PPV of the 2010 algorithm and the updated version were both 91%, compared with 94% (95% CI: 91, 96%) in 2010. In subjects with ICD10 data only, the PPV for the updated 2010 RA algorithm was 93%. CONCLUSION: The 2010 RA algorithm, validated with the updated data, showed performance characteristics similar to those observed with the 2010 data. While the 2010 algorithm continued to perform better than the rule-based approach, the PPV of the latter also remained stable over time.


Subjects
Rheumatoid Arthritis, International Classification of Diseases, Algorithms, Electronic Health Records, Humans
12.
ACR Open Rheumatol; 2(2): 79-83, 2020 Feb.
Article in English | MEDLINE | ID: mdl-32043831

ABSTRACT

OBJECTIVE: It is unclear if biosimilars of biologics for inflammatory arthritis are realizing their promise to increase competition and improve accessibility. This study evaluates biosimilar tumor necrosis factor inhibitor (TNFi) utilization across rheumatology practices in the United States and compares whether patients initiating biosimilars remain on these treatments at least as long as new initiators of bio-originators. METHODS: We identified a cohort of patients initiating a TNFi biosimilar between January 2017 and September 2018 from an electronic health record registry containing data from 218 rheumatology practices and over 1 million rheumatology patients in the United States. We also identified a cohort of patients who initiated the bio-originator TNFi during the same period. We calculated the proportion of biosimilar prescriptions compared with other TNFi's and compared persistence on these therapies, adjusting for age, sex, diagnosis codes, and insurance type. RESULTS: We identified 909 patients prescribed the biosimilar infliximab-dyyb, the only biosimilar prescribed, and 4413 patients with a new prescription for the bio-originator infliximab. Biosimilar patients tended to be older, to have a diagnosis code for rheumatoid arthritis, and to be covered by Medicare insurance. Over the study period, biosimilar prescriptions reached a maximum of 3.5% of all TNFi prescriptions. Patients persisted on the biosimilar at least as long as on the bio-originator infliximab (hazard ratio [HR] 0.83, P = 0.07). CONCLUSION: The uptake of biosimilars in the United States remains low despite persistence on infliximab-dyyb being similar to that on the infliximab bio-originator. These results add to clinical studies that should provide greater confidence to patients and physicians regarding biosimilar use.

13.
J Biomed Inform; 100: 103322, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31672532

ABSTRACT

OBJECTIVE: With their increasingly widespread adoption, electronic health records (EHRs) have enabled phenotypic information extraction at an unprecedented granularity and scale. However, a medical concept (e.g. diagnosis, prescription, symptom) is often described by various synonyms across different EHR systems, hindering data integration for signal enhancement and complicating dimensionality reduction for knowledge discovery. Despite existing ontologies and hierarchies, tremendous human effort is needed for curation and maintenance - a process that is both unscalable and susceptible to subjective biases. This paper aims to develop a data-driven approach to automate grouping medical terms into clinically relevant concepts by combining multiple up-to-date data sources in an unbiased manner. METHODS: We present a novel data-driven grouping approach - multi-view banded spectral clustering (mvBSC) - combining summary data from multiple healthcare systems. The proposed method consists of a banding step that leverages the prior knowledge from the existing coding hierarchy, and a combining step that performs spectral clustering on an optimally weighted matrix. RESULTS: We apply the proposed method to group ICD-9 and ICD-10-CM codes together by integrating data from two healthcare systems. We show grouping results and hierarchies for 13 representative disease categories. Individual grouping qualities were evaluated using normalized mutual information, adjusted Rand index, and F1-measure, and were found to consistently exhibit great similarity to the existing manual grouping counterpart. The resulting ICD groupings also offer comparable interpretability and are well aligned with the current ICD hierarchy. CONCLUSION: The proposed approach, by systematically leveraging multiple data sources, is able to overcome bias while maximizing consensus to achieve generalizability. It has the advantage of being efficient, scalable, and adaptive to the evolving human knowledge reflected in the data, representing a significant step toward automating medical knowledge integration.


Subjects
Electronic Health Records, International Classification of Diseases, Algorithms, Automation, Cluster Analysis, Humans
14.
Arterioscler Thromb Vasc Biol; 38(12): 2827-2842, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30571168

ABSTRACT

OBJECTIVE: HDL (high-density lipoprotein) in plasma is a heterogeneous group of lipoproteins typically containing apo AI as the principal protein. Most HDLs contain additional proteins from a palette of nearly 100 HDL-associated polypeptides. We hypothesized that some of these proteins define distinct and stable apo AI HDL subspecies with unique proteomes that drive function and associations with disease. APPROACH AND RESULTS: We produced 17 plasma pools from 80 normolipidemic human participants (32 men, 48 women; aged 21-66 years). Using immunoaffinity isolation techniques, we isolated apo AI-containing species from plasma and then used antibodies to 16 additional HDL protein components to isolate compositional subspecies. We characterized previously described HDL subspecies containing apo AII, apo CIII, and apo E; and 13 novel HDL subspecies defined by presence of apo AIV, apo CI, apo CII, apo J, α-1-antitrypsin, α-2-macroglobulin, plasminogen, fibrinogen, ceruloplasmin, haptoglobin, paraoxonase-1, apo LI, or complement C3. The novel species ranged in abundance from 1% to 18% of total plasma apo AI. Their concentrations were stable over time as demonstrated by intraclass correlations in repeated sampling from the same participants over 3 to 24 months (0.33-0.86; mean 0.62). Some proteomes of the subspecies relative to total HDL were strongly correlated, often among subspecies defined by similar functions: lipid metabolism, hemostasis, antioxidant, or anti-inflammatory. Permutation analysis showed that the proteomes of 12 of the 16 subspecies differed significantly from that of total HDL. CONCLUSIONS: Taken together, correlation and permutation analyses support speciation of HDL. Functional studies of these novel subspecies and determination of their relation to diseases may provide new avenues to understand the HDL system of lipoproteins.


Subjects
Apolipoprotein A-I/blood, HDL Lipoproteins/blood, Proteomics/methods, Adult, Aged, Antioxidants/metabolism, Enzyme-Linked Immunosorbent Assay, Female, Hemostasis, Humans, Inflammation/blood, Inflammation/prevention & control, Lipid Metabolism, Male, Middle Aged, Protein Binding, Protein Stability, Time Factors, Young Adult