Results 1 - 20 of 26
1.
J Biomed Inform ; 156: 104671, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38876452

ABSTRACT

Electronic phenotyping is a fundamental task that identifies specific groups of patients, and it plays an important role in precision medicine in the era of digital health. Phenotyping provides real-world evidence for related biomedical research and clinical tasks such as disease diagnosis, drug development, and clinical trials. With the development of electronic health records, the performance of electronic phenotyping has been significantly boosted by advanced machine learning techniques. In the healthcare domain, precision and fairness are both essential aspects that should be taken into consideration. However, most related efforts go into designing phenotyping models with higher accuracy, and little attention is paid to the fairness of phenotyping. Neglecting bias in phenotyping leads to subgroups of patients being underrepresented, which in turn affects downstream healthcare activities such as patient recruitment in clinical trials. In this work, we are motivated to bridge this gap through a comprehensive experimental study that identifies the bias existing in electronic phenotyping models and evaluates the performance of widely used debiasing methods on these models. We choose pneumonia and sepsis as our phenotyping target diseases. We benchmark 9 kinds of electronic phenotyping methods, ranging from rule-based to data-driven methods. We also evaluate the performance of 5 bias mitigation strategies covering pre-processing, in-processing, and post-processing. From these extensive experiments, we summarize several insights into the bias identified in phenotyping and key points about bias mitigation strategies in phenotyping.

2.
medRxiv ; 2024 May 23.
Article in English | MEDLINE | ID: mdl-38826460

ABSTRACT

Objective: Long COVID, marked by persistent, recurring, or new symptoms post-COVID-19 infection, impacts children's well-being yet lacks a unified clinical definition. This study evaluates the performance of an empirically derived Long COVID case identification algorithm, or computable phenotype, with manual chart review in a pediatric sample. This approach aims to facilitate large-scale research efforts to understand this condition better. Methods: The algorithm, composed of diagnostic codes empirically associated with Long COVID, was applied to a cohort of pediatric patients with SARS-CoV-2 infection in the RECOVER PCORnet EHR database. The algorithm classified 31,781 patients with conclusive, probable, or possible Long COVID and 307,686 patients without evidence of Long COVID. A chart review was performed on a subset of patients (n=651) to determine the overlap between the two methods. Instances of discordance were reviewed to understand the reasons for differences. Results: The sample comprised 651 pediatric patients (339 females, M age = 10.10 years) across 16 hospital systems. Results showed moderate overlap between phenotype and chart review Long COVID identification (accuracy = 0.62, PPV = 0.49, NPV = 0.75); however, there were also numerous cases of disagreement. No notable differences were found when the analyses were stratified by age at infection or era of infection. Further examination of the discordant cases revealed that the most common cause of disagreement was the clinician reviewers' tendency to attribute Long COVID-like symptoms to prior medical conditions. The performance of the phenotype improved when prior medical conditions were considered (accuracy = 0.71, PPV = 0.65, NPV = 0.74). Conclusions: Although there was moderate overlap between the two methods, the discrepancies between the two sources are likely attributed to the lack of consensus on a Long COVID clinical definition. 
It is essential to consider the strengths and limitations of each method when developing Long COVID classification algorithms.
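
The agreement statistics reported above (accuracy, PPV, NPV) all derive from a 2x2 table comparing the computable phenotype with chart review. A minimal Python sketch follows. The function name and the counts in the usage note are hypothetical and are not the study's data, and chart review is assumed to serve as the reference standard.

```python
from dataclasses import dataclass


@dataclass
class Agreement:
    accuracy: float
    ppv: float
    npv: float


def agreement_metrics(tp: int, fp: int, fn: int, tn: int) -> Agreement:
    """Compute accuracy, PPV and NPV from 2x2 counts.

    tp/fp: phenotype-positive patients confirmed / not confirmed by chart review.
    fn/tn: phenotype-negative patients who were / were not cases on chart review.
    """
    total = tp + fp + fn + tn
    if total == 0 or (tp + fp) == 0 or (tn + fn) == 0:
        raise ValueError("need at least one phenotype-positive and one phenotype-negative patient")
    return Agreement(
        accuracy=(tp + tn) / total,  # share of patients on whom both methods agree
        ppv=tp / (tp + fp),          # P(case on chart review | phenotype positive)
        npv=tn / (tn + fn),          # P(non-case on chart review | phenotype negative)
    )
```

For example, `agreement_metrics(tp=40, fp=10, fn=20, tn=30)` gives an accuracy of 0.70, a PPV of 0.80 and an NPV of 0.60.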

3.
J Viral Hepat ; 30(9): 765-774, 2023 09.
Article in English | MEDLINE | ID: mdl-37309273

ABSTRACT

The World Health Organization (WHO) aims to reduce HCV mortality, but estimates are difficult to obtain. We aimed to identify electronic health records of individuals with HCV infection, and assess mortality and morbidity. We applied electronic phenotyping strategies on routinely collected data from patients hospitalized at a tertiary referral hospital in Switzerland between 2009 and 2017. Individuals with HCV infection were identified using International Classification of Disease (ICD)-10 codes, prescribed medications and laboratory results (antibody, PCR, antigen or genotype test). Controls were selected using propensity score methods (matching by age, sex, intravenous drug use, alcohol abuse and HIV co-infection). Main outcomes were in-hospital mortality and attributable mortality (in HCV cases and study population). The non-matched dataset included records from 165,972 individuals (287,255 hospital stays). Electronic phenotyping identified 2285 stays with evidence of HCV infection (1677 individuals). Propensity score matching yielded 6855 stays (2285 with HCV, 4570 controls). In-hospital mortality was higher in HCV cases (RR 2.10, 95%CI 1.64 to 2.70). Among those infected, 52.5% of the deaths were attributable to HCV (95%CI 38.9 to 63.1). When cases were matched, the fraction of deaths attributable to HCV was 26.9% (HCV prevalence: 33%), whilst in the non-matched dataset, it was 0.92% (HCV prevalence: 0.8%). In this study, HCV infection was strongly associated with increased mortality. Our methodology may be used to monitor the efforts towards meeting the WHO elimination targets and underline the importance of electronic cohorts as a basis for national longitudinal surveillance.
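
The abstract does not state how the attributable fractions were computed. As an illustration only, two standard epidemiological formulas, the attributable fraction among the exposed, (RR - 1)/RR, and Levin's population attributable fraction, come close to the reported figures when given the rounded relative risk and the matched case/control counts quoted above. The function names are hypothetical, and it is an assumption that the reported RR of 2.10 refers to the matched dataset. Small differences from the published values are to be expected, since the authors worked from unrounded estimates and possibly a different estimator.

```python
def attributable_fraction_exposed(rr: float) -> float:
    """Share of deaths among the exposed attributable to the exposure: (RR - 1) / RR."""
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    return (rr - 1.0) / rr


def population_attributable_fraction(prevalence: float, rr: float) -> float:
    """Levin's formula: p(RR - 1) / (1 + p(RR - 1)), with p the exposure prevalence."""
    if not 0.0 <= prevalence <= 1.0 or rr <= 0:
        raise ValueError("need 0 <= prevalence <= 1 and rr > 0")
    excess = prevalence * (rr - 1.0)
    return excess / (1.0 + excess)


if __name__ == "__main__":
    rr = 2.10                    # rounded RR from the abstract
    matched_prev = 2285 / 6855   # HCV stays / all stays after matching
    print(f"AF among exposed: {attributable_fraction_exposed(rr):.1%}")
    print(f"PAF (matched):    {population_attributable_fraction(matched_prev, rr):.1%}")
```

With these rounded inputs the sketch yields roughly 52.4% and 26.8%, against the 52.5% and 26.9% reported in the abstract.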


Subjects
HIV Infections , Hepatitis C , Humans , Adult , Hepacivirus , Propensity Score , HIV Infections/complications , Morbidity , Prevalence
4.
J Am Med Inform Assoc ; 30(2): 213-221, 2023 01 18.
Article in English | MEDLINE | ID: mdl-36069977

ABSTRACT

BACKGROUND: Electronic (e)-phenotype specification by noninformaticist investigators remains a challenge. Although validation of each patient returned by an e-phenotype could ensure accuracy of cohort representation, this approach is not practical. Understanding the factors leading to successful e-phenotype specification may reveal generalizable strategies leading to better results. MATERIALS AND METHODS: Noninformaticist experts (n = 21) were recruited to produce expert-mediated e-phenotypes using i2b2, assisted by an honest data broker and a project coordinator. Patient- and visit-sets were reidentified and a random sample of 20 charts matching each e-phenotype was returned to experts for chart validation. Attributes of the queries and expert characteristics were captured and related to chart-validation rates using generalized linear regression models. RESULTS: E-phenotype validation rates varied according to experts' domains and query characteristics (mean = 61%, range 20-100%). Clinical domains that performed better included infectious, rheumatic, neonatal, and cancers, whereas other domains performed worse (psychiatric, GI, skin, and pulmonary). Match rate was negatively impacted when specification of temporal constraints was required. In general, an increase in e-phenotype specificity contributed positively to match rate. DISCUSSIONS AND CONCLUSIONS: Clinical experts and informaticists experience a variety of challenges when building e-phenotypes, including the inability to differentiate clinical events from patient characteristics or appropriately configure temporal constraints; a lack of access to available and quality data; and difficulty in specifying routes of medication administration. The importance of biomedical query mediation by informaticists and honest data brokers in designing e-phenotypes cannot be overstated. Although tools such as i2b2 may be widely available to noninformaticists, successful utilization depends not on users' confidence, but rather on creating highly specific e-phenotypes.


Subjects
Mental Processes , Research Design , Phenotype , Electronic Health Records
5.
BMC Med Ethics ; 23(1): 112, 2022 11 16.
Article in English | MEDLINE | ID: mdl-36384545

ABSTRACT

BACKGROUND: As the use of AI becomes more pervasive, and computerised systems are used in clinical decision-making, the role of trust in, and the trustworthiness of, AI tools will need to be addressed. Using the case of computational phenotyping to support the diagnosis of rare disease in dysmorphology, this paper explores under what conditions we could place trust in medical AI tools, which employ machine learning. METHODS: Semi-structured qualitative interviews (n = 20) with stakeholders (clinical geneticists, data scientists, bioinformaticians, industry and patient support group spokespersons) who design and/or work with computational phenotyping (CP) systems. The method of constant comparison was used to analyse the interview data. RESULTS: Interviewees emphasized the importance of establishing trust in the use of CP technology in identifying rare diseases. Trust was formulated in two interrelated ways in these data. First, interviewees talked about the importance of using CP tools within the context of a trust relationship; arguing that patients will need to trust clinicians who use AI tools and that clinicians will need to trust AI developers, if they are to adopt this technology. Second, they described a need to establish trust in the technology itself, or in the knowledge it provides-epistemic trust. Interviewees suggested CP tools used for the diagnosis of rare diseases might be perceived as more trustworthy if the user is able to vouchsafe for the technology's reliability and accuracy and the person using/developing them is trusted. CONCLUSION: This study suggests we need to take deliberate and meticulous steps to design reliable or confidence-worthy AI systems for use in healthcare. In addition, we need to devise reliable or confidence-worthy processes that would give rise to reliable systems; these could take the form of RCTs and/or systems of accountability transparency and responsibility that would signify the epistemic trustworthiness of these tools. 


Subjects
Rare Diseases , Trust , Humans , Rare Diseases/diagnosis , Reproducibility of Results , Machine Learning , Algorithms
6.
Front Public Health ; 10: 815674, 2022.
Article in English | MEDLINE | ID: mdl-35677768

ABSTRACT

The COVID-19 pandemic disrupted care processes and created a need for immediately effective reorganization procedures. In the context of digital health, it is of paramount importance to determine how a specific patient population is reflected in the healthcare dynamics of the hospital and to investigate how patient subgroups or strata respond to different care processes, in order to generate novel hypotheses regarding the most effective healthcare strategies. We present an analysis pipeline, based on heterogeneous collected data, aimed at identifying the most frequent patterns of healthcare processes, jointly analyzing them with demographic and physiological disease trajectories, and stratifying the observed cohort on the basis of the mined patterns. This process-oriented pipeline integrates process mining algorithms with trajectory mining by topological data analysis and pseudo-time approaches. Data were collected for 1,179 COVID-19-positive patients hospitalized at the Italian hospital "Istituti Clinici Salvatore Maugeri" in Lombardy, integrating different sources including text admission letters, EHR and hospital infrastructure data. From laboratory value trajectories, we identified five temporal phenotypes characterized by statistically significantly different death risk estimates. The process mining algorithms allowed the data to be split into sub-cohorts as a function of the pandemic waves and of the temporal trajectories, showing statistically significant differences in event characteristics.


Subjects
COVID-19 , Electronic Health Records , Algorithms , COVID-19/epidemiology , Humans , Pandemics , Phenotype
7.
Stud Health Technol Inform ; 294: 271-272, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612071

ABSTRACT

Electronic phenotyping is an important method to identify a disease group by collecting clinical data from hospital information systems. This study aimed to extract accurate cases of supraventricular arrhythmia, ventricular arrhythmia, and bradycardia from clinical data of a hospital information system. The electronic phenotyping algorithm was improved using a machine learning method, after which it showed a higher area under the curve for prediction and higher specificity. However, the algorithm needs further improvement to classify each arrhythmia accurately. In conclusion, phenotyping using clinical data from hospital information systems has some affinities and issues depending on the disease.


Subjects
Electronic Health Records , Hospital Information Systems , Algorithms , Arrhythmias, Cardiac , Electronics , Humans , Machine Learning
8.
J Biomed Inform ; 121: 103879, 2021 09.
Article in English | MEDLINE | ID: mdl-34329789

ABSTRACT

PURPOSE: Standardized approaches for rigorous validation of phenotyping from large-scale electronic health record (EHR) data have not been widely reported. We proposed a methodologically rigorous and efficient approach to guide such validation, including strategies for sampling cases and controls, determining sample sizes, estimating algorithm performance, and terminating the validation process, hereafter referred to as the San Diego Approach to Variable Validation (SDAVV). METHODS: We propose sample size formulae which should be used prior to chart review, based on pre-specified critical lower bounds for positive predictive value (PPV) and negative predictive value (NPV). We also propose a stepwise strategy for iterative algorithm development/validation cycles, updating sample sizes for data abstraction until both PPV and NPV achieve target performance. RESULTS: We applied the SDAVV to a Department of Veterans Affairs study in which we created two phenotyping algorithms, one for distinguishing normal colonoscopy cases from abnormal colonoscopy controls and one for identifying aspirin exposure. Estimated PPV and NPV both reached 0.970 with a 95% confidence lower bound of 0.915, estimated sensitivity was 0.963 and specificity was 0.975 for identifying normal colonoscopy cases. The phenotyping algorithm for identifying aspirin exposure reached a PPV of 0.990 (a 95% lower bound of 0.950), an NPV of 0.980 (a 95% lower bound of 0.930), and sensitivity and specificity were 0.960 and 1.000. CONCLUSIONS: A structured approach for prospectively developing and validating phenotyping algorithms from large-scale EHR data can be successfully implemented, and should be considered to improve the quality of "big data" research.
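
The abstract does not reproduce the SDAVV sample size formulae. The sketch below is therefore a stand-in, not the authors' method: it uses the one-sided Wilson score lower bound to find the smallest chart-review sample for which an anticipated PPV (or NPV) would clear a pre-specified critical lower bound. The function names, the choice of the Wilson interval, and the default z = 1.645 (one-sided 95%) are all assumptions made for illustration.

```python
import math


def wilson_lower_bound(p_hat: float, n: int, z: float = 1.645) -> float:
    """One-sided Wilson score lower confidence bound for a proportion."""
    if n <= 0 or not 0.0 <= p_hat <= 1.0:
        raise ValueError("need n > 0 and 0 <= p_hat <= 1")
    z2 = z * z
    centre = p_hat + z2 / (2 * n)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n))
    return (centre - margin) / (1 + z2 / n)


def min_sample_size(expected: float, critical_lower: float,
                    z: float = 1.645, n_max: int = 1_000_000) -> int:
    """Smallest n whose Wilson lower bound reaches critical_lower,
    assuming the observed proportion equals the anticipated value `expected`."""
    if not critical_lower < expected <= 1.0:
        raise ValueError("expected must exceed critical_lower and be at most 1")
    for n in range(1, n_max + 1):
        if wilson_lower_bound(expected, n, z) >= critical_lower:
            return n
    raise ValueError("no n <= n_max reaches the critical lower bound")
```

A call such as `min_sample_size(0.97, 0.915)` asks how many charts would be needed if the PPV were expected to be 0.970 and the lower bound had to reach 0.915. In an iterative development/validation cycle of the kind the abstract describes, such a calculation would be repeated with the updated estimate after each round of chart review.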


Subjects
Algorithms , Electronic Health Records , Big Data , Predictive Value of Tests , Sensitivity and Specificity
9.
Stud Health Technol Inform ; 281: 148-152, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042723

ABSTRACT

2,719 distinctive phenotyping variables from 176 electronic phenotypes were compared with 57,150 distinctive clinical trial eligibility criteria concepts to assess the phenotype knowledge overlap between them. We observed a high percentage (69.5%) of eMERGE phenotype features and a lower percentage (47.6%) of OHDSI phenotype features matched to clinical trial eligibility criteria, possibly due to the relative emphasis on specificity for eMERGE phenotypes and the relative emphasis on sensitivity for OHDSI phenotypes. The study results show the potential of reusing clinical trial eligibility criteria for phenotyping feature selection and moderate benefits of using them for local cohort query implementation.


Subjects
Algorithms , Electronic Health Records , Electronics , Phenotype
10.
Stud Health Technol Inform ; 281: 243-247, 2021 May 27.
Article in English | MEDLINE | ID: mdl-34042742

ABSTRACT

Heart failure (HF) is a grave problem in the clinical and public health sectors. The aim of this study is to develop a phenotyping algorithm to identify patients with HF by using the medical information database network (MID-NET) in Japan. METHODS: From April 1 to December 31, 2013, clinical data of patients with HF were obtained from MID-NET. A phenotyping algorithm was developed with machine learning by using disease names, examinations, and medications. Two doctors validated the cases by manually reviewing the medical records according to the Japanese HF guidelines. The algorithm was also validated with different cohorts from an inpatient database of the Department of Cardiovascular Medicine at Tohoku University Hospital. RESULTS: The algorithm, which initially had low precision, was improved by incorporating the value of B-type natriuretic peptide and the combination of medications related to HF. Finally, the algorithm on a different cohort was verified with higher precision (35.0% → 87.8%). CONCLUSIONS: Proper algorithms can be used to identify patients with HF.


Subjects
Electronic Health Records , Heart Failure , Algorithms , Electronics , Heart Failure/diagnosis , Heart Failure/epidemiology , Humans , Japan/epidemiology , Natriuretic Peptide, Brain
11.
J Am Med Inform Assoc ; 28(7): 1507-1517, 2021 07 14.
Article in English | MEDLINE | ID: mdl-33712852

ABSTRACT

OBJECTIVE: Claims-based algorithms are used in the Food and Drug Administration Sentinel Active Risk Identification and Analysis System to identify occurrences of health outcomes of interest (HOIs) for medical product safety assessment. This project aimed to apply machine learning classification techniques to demonstrate the feasibility of developing a claims-based algorithm to predict an HOI in structured electronic health record (EHR) data. MATERIALS AND METHODS: We used the 2015-2019 IBM MarketScan Explorys Claims-EMR Data Set, linking administrative claims and EHR data at the patient level. We focused on a single HOI, rhabdomyolysis, defined by EHR laboratory test results. Using claims-based predictors, we applied machine learning techniques to predict the HOI: logistic regression, LASSO (least absolute shrinkage and selection operator), random forests, support vector machines, artificial neural nets, and an ensemble method (Super Learner). RESULTS: The study cohort included 32 956 patients and 39 499 encounters. Model performance (positive predictive value [PPV], sensitivity, specificity, area under the receiver-operating characteristic curve) varied considerably across techniques. The area under the receiver-operating characteristic curve exceeded 0.80 in most model variations. DISCUSSION: For the main Food and Drug Administration use case of assessing risk of rhabdomyolysis after drug use, a model with a high PPV is typically preferred. The Super Learner ensemble model without adjustment for class imbalance achieved a PPV of 75.6%, substantially better than a previously used human expert-developed model (PPV = 44.0%). CONCLUSIONS: It is feasible to use machine learning methods to predict an EHR-derived HOI with claims-based predictors. Modeling strategies can be adapted for intended uses, including surveillance, identification of cases for chart review, and outcomes research.


Subjects
Electronic Health Records , Machine Learning , Electronics , Humans , Outcome Assessment, Health Care , Pilot Projects
12.
Artif Intell Med ; 108: 101930, 2020 08.
Article in English | MEDLINE | ID: mdl-32972659

ABSTRACT

Temporal phenotyping enables clinicians to better understand observable characteristics of a disease as it progresses. Modelling disease progression that captures interactions between phenotypes is inherently challenging. Temporal models that capture change in disease over time can identify the key features that characterize disease subtypes that underpin these trajectories. These models will enable clinicians to identify early warning signs of progression in specific sub-types and therefore to make informed decisions tailored to individual patients. In this paper, we explore two approaches to building temporal phenotypes based on the topology of data: topological data analysis and pseudo time-series. Using type 2 diabetes data, we show that the topological data analysis approach is able to identify disease trajectories and that pseudo time-series can infer a state space model characterized by transitions between hidden states that represent distinct temporal phenotypes. Both approaches highlight lipid profiles as key factors in distinguishing the phenotypes.


Subjects
Diabetes Mellitus, Type 2 , Electronic Health Records , Data Analysis , Humans , Phenotype
13.
Stud Health Technol Inform ; 270: 838-842, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570500

ABSTRACT

Despite recommendations for the routine HIV testing of all sexually active individuals, a significant percentage of HIV-positive adults are unaware of their HIV status. Therefore, a number of strategies have been implemented to expand HIV testing, which in turn makes it necessary to develop tools for identifying patients with unknown HIV status. This study presents the results of an external validation of an electronic phenotyping algorithm for identifying HIV status and its application on a retrospective cohort in order to explore temporal trends of HIV knowledge status and associated factors.


Subjects
Electronic Health Records , HIV Infections , Algorithms , Humans , Mass Screening , Retrospective Studies
14.
Artif Intell Med ; 105: 101855, 2020 05.
Article in English | MEDLINE | ID: mdl-32505422

ABSTRACT

In this work we describe the application of a careflow mining algorithm to detect the most frequent patterns of care in a cohort of 3000 breast cancer patients. The applied method relies on longitudinal data extracted from electronic health records, recorded from the first surgical procedure after a breast cancer diagnosis. Careflows are mined from event data recorded for administrative purposes, including procedures from ICD-9-CM billing codes and chemotherapy treatments. Event data have been pre-processed with Topic Modelling to create composite events based on concurrent procedures. The results of the careflow mining algorithm allow the discovery of electronic temporal phenotypes across the studied population. These phenotypes are further characterized on the basis of clinical traits and tumour histopathology, as well as in terms of relapses, metastasis occurrence and 5-year survival rates. Results are highly significant from a clinical perspective, since phenotypes describe well-characterized pathology classes, and the careflows are well matched with existing clinical guidelines. The analysis thus facilitates deriving real-world evidence that can inform clinicians as well as hospital decision makers.


Subjects
Breast Neoplasms , Algorithms , Breast Neoplasms/surgery , Data Mining , Electronic Health Records , Female , Humans , Neoplasm Recurrence, Local/epidemiology
15.
J Am Med Inform Assoc ; 27(6): 877-883, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32374408

ABSTRACT

OBJECTIVE: Accurate electronic phenotyping is essential to support collaborative observational research. Supervised machine learning methods can be used to train phenotype classifiers in a high-throughput manner using imperfectly labeled data. We developed 10 phenotype classifiers using this approach and evaluated performance across multiple sites within the Observational Health Data Sciences and Informatics (OHDSI) network. MATERIALS AND METHODS: We constructed classifiers using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation (APHRODITE) R-package, an open-source framework for learning phenotype classifiers using datasets in the Observational Medical Outcomes Partnership Common Data Model. We labeled training data based on the presence of multiple mentions of disease-specific codes. Performance was evaluated on cohorts derived using rule-based definitions and real-world disease prevalence. Classifiers were developed and evaluated across 3 medical centers, including 1 international site. RESULTS: Compared to the multiple mentions labeling heuristic, classifiers showed a mean recall boost of 0.43 with a mean precision loss of 0.17. Performance decreased slightly when classifiers were shared across medical centers, with mean recall and precision decreasing by 0.08 and 0.01, respectively, at a site within the USA, and by 0.18 and 0.10, respectively, at an international site. DISCUSSION AND CONCLUSION: We demonstrate a high-throughput pipeline for constructing and sharing phenotype classifiers across sites within the OHDSI network using APHRODITE. Classifiers exhibit good portability between sites within the USA, however limited portability internationally, indicating that classifier generalizability may have geographic limitations, and, consequently, sharing the classifier-building recipe, rather than the pretrained classifiers, may be more useful for facilitating collaborative observational research.
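
The "multiple mentions" labeling heuristic described above can be pictured as a simple count threshold over each patient's coded records. The Python sketch below is only an illustration and is not the APHRODITE implementation (which is an R package). The function name, the default threshold of 2 mentions, and the record layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def label_by_multiple_mentions(
    records: Iterable[Tuple[str, str]],
    disease_codes: Set[str],
    min_mentions: int = 2,
) -> Dict[str, bool]:
    """Noisily label patients as cases when they have at least
    `min_mentions` disease-specific codes.

    records: (patient_id, code) pairs, one per coded mention.
    Every patient seen in `records` receives a label.
    """
    mentions: Counter = Counter()
    patients = set()
    for patient_id, code in records:
        patients.add(patient_id)
        if code in disease_codes:
            mentions[patient_id] += 1
    # Counter returns 0 for patients with no disease-specific mention.
    return {pid: mentions[pid] >= min_mentions for pid in patients}
```

Labels produced this way are imperfect by design. A classifier trained on them can generalize beyond the heuristic, which is consistent with the trade-off the abstract reports (higher recall at some cost in precision).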


Subjects
Electronic Health Records/classification , Medical Informatics , Supervised Machine Learning , Classification/methods , Data Science , Humans , Observational Studies as Topic
16.
Methodist Debakey Cardiovasc J ; 16(4): 296-303, 2020.
Article in English | MEDLINE | ID: mdl-33500758

ABSTRACT

The wide gap between the development of new healthcare technologies and their integration into clinical practice argues for a deeper understanding of how effective quality improvement can be designed to meet the needs of patients and their clinical teams. The COVID-19 pandemic has forced us to address this gap and create long-term strategies to bridge it. On the one hand, it has enabled the rapid implementation of telehealth. On the other hand, it has raised important questions about our preparedness to adopt and employ new digital tools as part of a new process of care. While healthcare organizations are seeking to improve the quality of care by integrating innovations in digital health, they must also address key issues such as patient experience, develop clinical decision support systems that analyze digital health data trends, and create efficient clinical workflows. Given the breadth of such requirements, embracing new technologies as a core competency of a modern healthcare system introduces a host of questions, such as "How best do patients participate in digital health programs that promote behavioral changes and mitigate risk?" and "What type of data analytics are required that enable a deeper understanding of disease phenotypes and corresponding treatment decisions?" This review presents the challenges in implementing digital health technology and discusses how patient-centered digital health programs are designed within real-world models of remote monitoring. It also provides a framework for developing new devices and wearables for the next generation of data-driven, technology-enabled cardiovascular care.


Subjects
COVID-19/epidemiology , Cardiovascular Diseases/therapy , Pandemics , Telemedicine/trends , Cardiovascular Diseases/epidemiology , Comorbidity , Humans , SARS-CoV-2
17.
Curr Protoc Hum Genet ; 100(1): e80, 2019 01.
Article in English | MEDLINE | ID: mdl-30516347

ABSTRACT

Electronic health records contain patient-level data collected during and for clinical care. Data within the electronic health record include diagnostic billing codes, procedure codes, vital signs, laboratory test results, clinical imaging, and physician notes. With repeated clinic visits, these data are longitudinal, providing important information on disease development, progression, and response to treatment or intervention strategies. The near universal adoption of electronic health records nationally has the potential to provide population-scale real-world clinical data accessible for biomedical research, including genetic association studies. For this research potential to be realized, high-quality research-grade variables must be extracted from these clinical data warehouses. We describe here common and emerging electronic phenotyping approaches applied to electronic health records, as well as current limitations of both the approaches and the biases associated with these clinically collected data that impact their use in research. © 2018 by John Wiley & Sons, Inc.


Subjects
Data Mining , Electronic Health Records , Algorithms , Humans , Phenotype
18.
Annu Rev Biomed Data Sci ; 1: 53-68, 2018 Jul.
Article in English | MEDLINE | ID: mdl-31218278

ABSTRACT

With the widespread adoption of electronic health records (EHRs), large repositories of structured and unstructured patient data are becoming available to conduct observational studies. Finding patients with specific conditions or outcomes, known as phenotyping, is one of the most fundamental research problems encountered when using these new EHR data. Phenotyping forms the basis of translational research, comparative effectiveness studies, clinical decision support, and population health analyses using routinely collected EHR data. We review the evolution of electronic phenotyping, from the early rule-based methods to the cutting edge of supervised and unsupervised machine learning models. We aim to cover the most influential papers in commensurate detail, with a focus on both methodology and implementation. Finally, future research directions are explored.

19.
Comput Methods Programs Biomed ; 152: 53-70, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29054261

ABSTRACT

BACKGROUND AND OBJECTIVE: Recent progress towards precision medicine has encouraged the use of electronic health records (EHRs) as a source of the large amounts of data required for studying the effect of treatments or risk factors in more specific subpopulations. Phenotyping algorithms make it possible to automatically classify patients according to their particular electronic phenotype, thus facilitating the setup of retrospective cohorts. Our objective is to compare the performance of different classification strategies (using only standardized problems, rule-based algorithms, statistical learning algorithms (six learners) and stacked generalization (five versions)) for the categorization of patients according to their diabetic status (diabetic, not diabetic and inconclusive; diabetes of any type) using information extracted from EHRs. METHODS: Patient information was extracted from the EHR at Hospital Italiano de Buenos Aires, Buenos Aires, Argentina. For the derivation and validation datasets, two probabilistic samples of patients from different years (2005: n = 1663; 2015: n = 800) were extracted. The only inclusion criterion was age (≥40 & <80 years). Four researchers manually reviewed all records and classified patients according to their diabetic status (diabetic: diabetes registered as a health problem or fulfilling the ADA criteria; non-diabetic: not fulfilling the ADA criteria and having at least one fasting glycemia below 126 mg/dL; inconclusive: no data regarding their diabetic status or only one abnormal value). The best performing algorithms within each strategy were tested on the validation set. RESULTS: The standardized codes algorithm achieved a Kappa coefficient value of 0.59 (95% CI 0.49, 0.59) in the validation set. The Boolean logic algorithm reached 0.82 (95% CI 0.76, 0.88). A slightly higher value was achieved by the Feedforward Neural Network (0.9, 95% CI 0.85, 0.94). The best performing learner was the stacked generalization meta-learner, which reached a Kappa coefficient value of 0.95 (95% CI 0.91, 0.98). CONCLUSIONS: The stacked generalization strategy and the feedforward neural network showed the best classification metrics in the validation set. The implementation of these algorithms enables the accurate exploitation of data from thousands of patients.


Subjects
Algorithms, Diabetes Mellitus/classification, Electronic Health Records, Phenotype, Adult, Aged, Argentina, Humans, Middle Aged
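
The abstract above reports agreement between each algorithm and manual review as a Kappa coefficient. As a rough illustration only, the following sketch computes an unweighted Cohen's kappa for two lists of class labels. The function name and the toy labels are assumptions made for this example; the abstract does not say how the authors computed kappa or its confidence intervals.

```python
from collections import Counter


def cohen_kappa(reference, predicted):
    """Unweighted Cohen's kappa between two equally long label sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(reference)
    # Observed agreement: share of records with identical labels.
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    # Chance agreement: product of the marginal frequencies, summed over classes.
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical class; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy labels (hypothetical, not study data) using the three classes in the abstract.
manual = ["diabetic", "diabetic", "non-diabetic", "non-diabetic", "inconclusive"]
algorithm = ["diabetic", "non-diabetic", "non-diabetic", "non-diabetic", "inconclusive"]
kappa = cohen_kappa(manual, algorithm)
```

Kappa corrects raw accuracy for the agreement expected by chance, which matters here because the three diabetic-status classes are unlikely to be equally frequent.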
20.
BMC Res Notes ; 10(1): 281, 2017 Jul 14.
Article in English | MEDLINE | ID: mdl-28705240

ABSTRACT

BACKGROUND: The implementation of electronic medical records (EMRs) is becoming increasingly common. Reducing errors and data loss, increasing patient-care efficiency, assisting decision-making, and facilitating event surveillance are some of the many processes that EMRs help improve. In addition, they show considerable promise for data collection in observational epidemiological studies, and their use for this purpose has increased significantly in recent years. Although EMRs clearly improve the quantity and availability of data, the problem of data quality remains. This is especially important when attempting to determine whether an event has actually occurred. We sought to assess the sensitivity, specificity, and agreement level of a code-based algorithm for the detection of clinically relevant cardiovascular (CaVD) and cerebrovascular (CeVD) disease cases, using data from EMRs. METHODS: Three family physicians from the research group selected clinically relevant CaVD and CeVD terms from the International Classification of Primary Care, Second Edition (ICPC-2), ICD-10 version 2015, and SNOMED-CT 2015 Edition. These terms included signs, symptoms, diagnoses, and procedures associated with CaVD and CeVD. Terms not related to symptoms, signs, diagnoses, or procedures of CaVD or CeVD, as well as those describing incidental findings without clinical relevance, were excluded. The algorithm yielded a positive result if the patient had at least one of the selected terms in their medical records, as long as it was not recorded as an error. If no such terms were found, the patient was classified as negative. This algorithm was applied to a randomly selected sample of the patients active in the hospital's HMO as of 1/1/2005 who were 40-79 years old, had at least one year of seniority in the HMO, and had at least one clinical encounter. Patients were thus classified into four groups: (1) negative patients; (2) patients with CaVD but without CeVD; (3) patients with CeVD but without CaVD; (4) patients with both diseases. To facilitate the validation process, a stratified sample was taken so that each of the groups represented approximately 25% of the sample. Manual chart review was used as the gold standard for assessing the algorithm's performance. One-third of the patients were assigned randomly to each reviewer (Cohen's kappa 0.91). Both coded and uncoded (free-text) sections of the EMR were reviewed, from the first clinical note present in the patient's chart to the last one registered prior to 1/1/2005. RESULTS: The performance of the algorithm was compared against manual chart review. It yielded high sensitivity (0.99, 95% CI 0.938-0.9971) and acceptable specificity (0.86, 95% CI 0.818-0.895) for detecting cases of CaVD and CeVD combined. A qualitative analysis of the false positives and false negatives was performed. CONCLUSIONS: We developed a simple algorithm, using only standardized and non-standardized coded terms within an EMR, that can properly detect clinically relevant events and symptoms of CaVD and CeVD. We believe that combining it with an analysis of the free text using an NLP approach would yield even better results.


Subjects
Algorithms, Cardiovascular Diseases/diagnosis, Electronic Health Records, Adult, Aged, Humans, Middle Aged, Phenotype
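
The decision rule described in the abstract above (positive if at least one selected term is present and not recorded as an error, then assignment to one of four groups) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the term identifiers are hypothetical placeholders rather than real ICPC-2, ICD-10, or SNOMED-CT codes, and the record layout of `(term, recorded_as_error)` pairs is invented for the example.

```python
# Hypothetical placeholder term sets; the actual selected terms are not listed in the abstract.
CAVD_TERMS = {"cavd-term-a", "cavd-term-b"}
CEVD_TERMS = {"cevd-term-a"}


def classify(entries, cavd_terms=CAVD_TERMS, cevd_terms=CEVD_TERMS):
    """Assign a patient to one of the four groups named in the abstract.

    entries: iterable of (term, recorded_as_error) pairs from the coded record.
    """
    # Terms flagged as recording errors never count toward a positive result.
    valid = {term for term, recorded_as_error in entries if not recorded_as_error}
    has_cavd = bool(valid & cavd_terms)
    has_cevd = bool(valid & cevd_terms)
    if has_cavd and has_cevd:
        return "both"
    if has_cavd:
        return "CaVD without CeVD"
    if has_cevd:
        return "CeVD without CaVD"
    return "negative"


def sensitivity_specificity(truth, predicted):
    """Sensitivity and specificity of boolean predictions against a gold standard."""
    pairs = list(zip(truth, predicted))
    tp = sum(t and p for t, p in pairs)
    fn = sum(t and not p for t, p in pairs)
    tn = sum(not t and not p for t, p in pairs)
    fp = sum(not t and p for t, p in pairs)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true positive case and one true negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy usage (hypothetical records, not study data).
group = classify([("cavd-term-a", False), ("cevd-term-a", True)])
sens, spec = sensitivity_specificity([True, True, False, False], [True, False, False, True])
```

For the combined CaVD and CeVD result reported in the abstract, a patient would count as algorithm-positive whenever `classify` returns anything other than `"negative"`, with manual chart review supplying the gold-standard labels.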