Pesquisa | Biblioteca Virtual em Saúde

1.

Large language models for biomedicine: foundations, opportunities, challenges, and best practices.

Sahoo, Satya S; Plasek, Joseph M; Xu, Hua; Uzuner, Özlem; Cohen, Trevor; Yetisgen, Meliha; Liu, Hongfang; Meystre, Stéphane; Wang, Yanshan.

J Am Med Inform Assoc ; 2024 Apr 24.

Artigo em Inglês | MEDLINE | ID: mdl-38657567

RESUMO

OBJECTIVES: Generative large language models (LLMs) are a subset of transformers-based neural network architecture models. LLMs have successfully leveraged a combination of an increased number of parameters, improvements in computational efficiency, and large pre-training datasets to perform a wide spectrum of natural language processing (NLP) tasks. Using a few examples (few-shot) or no examples (zero-shot) for prompt-tuning has enabled LLMs to achieve state-of-the-art performance in a broad range of NLP applications. This article by the American Medical Informatics Association (AMIA) NLP Working Group characterizes the opportunities, challenges, and best practices for our community to leverage and advance the integration of LLMs in downstream NLP applications effectively. This can be accomplished through a variety of approaches, including augmented prompting, instruction prompt tuning, and reinforcement learning from human feedback (RLHF). TARGET AUDIENCE: Our focus is on making LLMs accessible to the broader biomedical informatics community, including clinicians and researchers who may be unfamiliar with NLP. Additionally, NLP practitioners may gain insight from the described best practices. SCOPE: We focus on 3 broad categories of NLP tasks, namely natural language understanding, natural language inferencing, and natural language generation. We review the emerging trends in prompt tuning, instruction fine-tuning, and evaluation metrics used for LLMs while drawing attention to several issues that impact biomedical NLP applications, including falsehoods in generated text (confabulation/hallucinations), toxicity, and dataset contamination leading to overfitting. We also review potential approaches to address some of these current challenges in LLMs, such as chain of thought prompting, and the phenomena of emergent capabilities observed in LLMs that can be leveraged to address complex NLP challenge in biomedical applications.

2.

High Accuracy Open-Source Clinical Data De-Identification: The CliniDeID Solution.

Meystre, Stéphane; Heider, Paul.

Stud Health Technol Inform ; 310: 1370-1371, 2024 Jan 25.

Artigo em Inglês | MEDLINE | ID: mdl-38270048

RESUMO

Clinical data de-identification offers patient data privacy protection and eases reuse of clinical data. As an open-source solution to de-identify unstructured clinical text with high accuracy, CliniDeID applies an ensemble method combining deep and shallow machine learning with rule-based algorithms. It reached high recall and precision when recently evaluated with a selection of clinical text corpora.

Assuntos

Algoritmos , Aprendizado de Máquina , Humanos

3.

AI-Based Gut-Brain Axis Digital Twins.

Meystre, Stéphane; van Stiphout, Ruud; Goris, Annelies; Gaitan, Santiago.

Stud Health Technol Inform ; 302: 1007-1008, 2023 May 18.

Artigo em Inglês | MEDLINE | ID: mdl-37203554

RESUMO

More than 40% of the adult population suffers from functional gastrointestinal disorders, now considered disorders of the "gut-brain axis" (GBA) interactions, a very complex bidirectional neural, endocrine, immune, and humoral communication system modulated by the microbiota. To help discover, understand, and manage GBA disorders, the OnePlanet research center is developing digital twins focused on the GBA, combining novel sensors with artificial intelligence algorithms, providing descriptive, diagnostic, predictive or prescriptive feed-back.

Assuntos

Microbioma Gastrointestinal , Microbiota , Encéfalo , Inteligência Artificial

4.

Piloting an automated clinical trial eligibility surveillance and provider alert system based on artificial intelligence and standard data models.

Meystre, Stéphane M; Heider, Paul M; Cates, Andrew; Bastian, Grace; Pittman, Tara; Gentilin, Stephanie; Kelechi, Teresa J.

BMC Med Res Methodol ; 23(1): 88, 2023 04 11.

Artigo em Inglês | MEDLINE | ID: mdl-37041475

RESUMO

BACKGROUND: To advance new therapies into clinical care, clinical trials must recruit enough participants. Yet, many trials fail to do so, leading to delays, early trial termination, and wasted resources. Under-enrolling trials make it impossible to draw conclusions about the efficacy of new therapies. An oft-cited reason for insufficient enrollment is lack of study team and provider awareness about patient eligibility. Automating clinical trial eligibility surveillance and study team and provider notification could offer a solution. METHODS: To address this need for an automated solution, we conducted an observational pilot study of our TAES (TriAl Eligibility Surveillance) system. We tested the hypothesis that an automated system based on natural language processing and machine learning algorithms could detect patients eligible for specific clinical trials by linking the information extracted from trial descriptions to the corresponding clinical information in the electronic health record (EHR). To evaluate the TAES information extraction and matching prototype (i.e., TAES prototype), we selected five open cardiovascular and cancer trials at the Medical University of South Carolina and created a new reference standard of 21,974 clinical text notes from a random selection of 400 patients (including at least 100 enrolled in the selected trials), with a small subset of 20 notes annotated in detail. We also developed a simple web interface for a new database that stores all trial eligibility criteria, corresponding clinical information, and trial-patient match characteristics using the Observational Medical Outcomes Partnership (OMOP) common data model. Finally, we investigated options for integrating an automated clinical trial eligibility system into the EHR and for notifying health care providers promptly of potential patient eligibility without interrupting their clinical workflow. RESULTS: Although the rapidly implemented TAES prototype achieved only moderate accuracy (recall up to 0.778; precision up to 1.000), it enabled us to assess options for integrating an automated system successfully into the clinical workflow at a healthcare system. CONCLUSIONS: Once optimized, the TAES system could exponentially enhance identification of patients potentially eligible for clinical trials, while simultaneously decreasing the burden on research teams of manual EHR review. Through timely notifications, it could also raise physician awareness of patient eligibility for clinical trials.

Assuntos

Inteligência Artificial , Processamento de Linguagem Natural , Humanos , Projetos Piloto , Seleção de Pacientes , Aprendizado de Máquina

5.

A Natural Language Processing Tool Offering Data Extraction for COVID-19 Related Information (DECOVRI).

Heider, Paul M; Pipaliya, Ronak M; Meystre, Stéphane M.

Stud Health Technol Inform ; 290: 1062-1063, 2022 Jun 06.

Artigo em Inglês | MEDLINE | ID: mdl-35673206

RESUMO

A new natural language processing (NLP) application for COVID-19 related information extraction from clinical text notes is being developed as part of our pandemic response efforts. This NLP application called DECOVRI (Data Extraction for COVID-19 Related Information) will be released as a free and open source tool to convert unstructured notes into structured data within an OMOP CDM-based ecosystem. The DECOVRI prototype is being continuously improved and will be released early (beta) and in a full version.

Assuntos

COVID-19 , Processamento de Linguagem Natural , Ecossistema , Registros Eletrônicos de Saúde , Humanos , Armazenamento e Recuperação da Informação , Pandemias

6.

Comparing Multiple Models for Section Header Classification with Feature Evaluation.

Pipaliya, Ronak; Heider, Paul M; Meystre, Stéphane M.

Stud Health Technol Inform ; 290: 1064-1065, 2022 Jun 06.

Artigo em Inglês | MEDLINE | ID: mdl-35673207

RESUMO

We present on the performance evaluation of machine learning (ML) and Natural Language Processing (NLP) based Section Header classification. The section headers classification task was performed as a two-pass system. The first pass detects a section header while the second pass classifies it. Recall, precision, and F1-measure metrics were reported to explore the best approach for ML based section header classification for use in downstream NLP tasks.

Assuntos

Aprendizado de Máquina , Processamento de Linguagem Natural

7.

Natural language processing enabling COVID-19 predictive analytics to support data-driven patient advising and pooled testing.

Meystre, Stéphane M; Heider, Paul M; Kim, Youngjun; Davis, Matthew; Obeid, Jihad; Madory, James; Alekseyenko, Alexander V.

J Am Med Inform Assoc ; 29(1): 12-21, 2021 12 28.

Artigo em Inglês | MEDLINE | ID: mdl-34415311

RESUMO

OBJECTIVE: The COVID-19 (coronavirus disease 2019) pandemic response at the Medical University of South Carolina included virtual care visits for patients with suspected severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. The telehealth system used for these visits only exports a text note to integrate with the electronic health record, but structured and coded information about COVID-19 (eg, exposure, risk factors, symptoms) was needed to support clinical care and early research as well as predictive analytics for data-driven patient advising and pooled testing. MATERIALS AND METHODS: To capture COVID-19 information from multiple sources, a new data mart and a new natural language processing (NLP) application prototype were developed. The NLP application combined reused components with dictionaries and rules crafted by domain experts. It was deployed as a Web service for hourly processing of new data from patients assessed or treated for COVID-19. The extracted information was then used to develop algorithms predicting SARS-CoV-2 diagnostic test results based on symptoms and exposure information. RESULTS: The dedicated data mart and NLP application were developed and deployed in a mere 10-day sprint in March 2020. The NLP application was evaluated with good accuracy (85.8% recall and 81.5% precision). The SARS-CoV-2 testing predictive analytics algorithms were configured to provide patients with data-driven COVID-19 testing advices with a sensitivity of 81% to 92% and to enable pooled testing with a negative predictive value of 90% to 91%, reducing the required tests to about 63%. CONCLUSIONS: SARS-CoV-2 testing predictive analytics and NLP successfully enabled data-driven patient advising and pooled testing.

Assuntos

COVID-19 , Teste para COVID-19 , Humanos , Processamento de Linguagem Natural , Pandemias , SARS-CoV-2

8.

A Hybrid Model for Family History Information Identification and Relation Extraction: Development and Evaluation of an End-to-End Information Extraction System.

Kim, Youngjun; Heider, Paul M; Lally, Isabel Rh; Meystre, Stéphane M.

JMIR Med Inform ; 9(4): e22797, 2021 Apr 22.

Artigo em Inglês | MEDLINE | ID: mdl-33885370

RESUMO

BACKGROUND: Family history information is important to assess the risk of inherited medical conditions. Natural language processing has the potential to extract this information from unstructured free-text notes to improve patient care and decision making. We describe the end-to-end information extraction system the Medical University of South Carolina team developed when participating in the 2019 National Natural Language Processing Clinical Challenge (n2c2)/Open Health Natural Language Processing (OHNLP) shared task. OBJECTIVE: This task involves identifying mentions of family members and observations in electronic health record text notes and recognizing the 2 types of relations (family member-living status relations and family member-observation relations). Our system aims to achieve a high level of performance by integrating heuristics and advanced information extraction methods. Our efforts also include improving the performance of 2 subtasks by exploiting additional labeled data and clinical text-based embedding models. METHODS: We present a hybrid method that combines machine learning and rule-based approaches. We implemented an end-to-end system with multiple information extraction and attribute classification components. For entity identification, we trained bidirectional long short-term memory deep learning models. These models incorporated static word embeddings and context-dependent embeddings. We created a voting ensemble that combined the predictions of all individual models. For relation extraction, we trained 2 relation extraction models. The first model determined the living status of each family member. The second model identified observations associated with each family member. We implemented online gradient descent models to extract related entity pairs. As part of postchallenge efforts, we used the BioCreative/OHNLP 2018 corpus and trained new models with the union of these 2 datasets. We also pretrained language models using clinical notes from the Medical Information Mart for Intensive Care (MIMIC-III) clinical database. RESULTS: The voting ensemble achieved better performance than individual classifiers. In the entity identification task, our top-performing system reached a precision of 78.90% and a recall of 83.84%. Our natural language processing system for entity identification took 3rd place out of 17 teams in the challenge. We ranked 4th out of 9 teams in the relation extraction task. Our system substantially benefited from the combination of the 2 datasets. Compared to our official submission with F1 scores of 81.30% and 64.94% for entity identification and relation extraction, respectively, the revised system yielded significantly better performance (P<.05) with F1 scores of 86.02% and 72.48%, respectively. CONCLUSIONS: We demonstrated that a hybrid model could be used to successfully extract family history information recorded in unstructured free-text notes. In this study, our approach to entity identification as a sequence labeling problem produced satisfactory results. Our postchallenge efforts significantly improved performance by leveraging additional labeled data and using word vector representations learned from large collections of clinical notes.

9.

Leveraging health system telehealth and informatics infrastructure to create a continuum of services for COVID-19 screening, testing, and treatment.

Ford, Dee; Harvey, Jillian B; McElligott, James; King, Kathryn; Simpson, Kit N; Valenta, Shawn; Warr, Emily H; Walsh, Tasia; Debenham, Ellen; Teasdale, Carla; Meystre, Stephane; Obeid, Jihad S; Metts, Christopher; Lenert, Leslie A.

J Am Med Inform Assoc ; 27(12): 1871-1877, 2020 12 09.

Artigo em Inglês | MEDLINE | ID: mdl-32602884

RESUMO

OBJECTIVES: We describe our approach in using health information technology to provide a continuum of services during the coronavirus disease 2019 (COVID-19) pandemic. COVID-19 challenges and needs required health systems to rapidly redesign the delivery of care. MATERIALS AND METHODS: Our health system deployed 4 COVID-19 telehealth programs and 4 biomedical informatics innovations to screen and care for COVID-19 patients. Using programmatic and electronic health record data, we describe the implementation and initial utilization. RESULTS: Through collaboration across multidisciplinary teams and strategic planning, 4 telehealth program initiatives have been deployed in response to COVID-19: virtual urgent care screening, remote patient monitoring for COVID-19-positive patients, continuous virtual monitoring to reduce workforce risk and utilization of personal protective equipment, and the transition of outpatient care to telehealth. Biomedical informatics was integral to our institutional response in supporting clinical care through new and reconfigured technologies. Through linking the telehealth systems and the electronic health record, we have the ability to monitor and track patients through a continuum of COVID-19 services. DISCUSSION: COVID-19 has facilitated the rapid expansion and utilization of telehealth and health informatics services. We anticipate that patients and providers will view enhanced telehealth services as an essential aspect of the healthcare system. Continuation of telehealth payment models at the federal and private levels will be a key factor in whether this new uptake is sustained. CONCLUSIONS: There are substantial benefits in utilizing telehealth during the COVID-19, including the ability to rapidly scale the number of patients being screened and providing continuity of care.

Assuntos

Teste para COVID-19/métodos , COVID-19/diagnóstico , COVID-19/terapia , Informática Médica , Telemedicina , Continuidade da Assistência ao Paciente , Humanos , Programas de Rastreamento , Pandemias , SARS-CoV-2 , Telemedicina/estatística & dados numéricos

10.

A Comparative Analysis of Speed and Accuracy for Three Off-the-Shelf De-Identification Tools.

Heider, Paul M; Obeid, Jihad S; Meystre, Stéphane M.

AMIA Jt Summits Transl Sci Proc ; 2020: 241-250, 2020.

Artigo em Inglês | MEDLINE | ID: mdl-32477643

RESUMO

A growing quantity of health data is being stored in Electronic Health Records (EHR). The free-text section of these clinical notes contains important patient and treatment information for research but also contains Personally Identifiable Information (PII), which cannot be freely shared within the research community without compromising patient confidentiality and privacy rights. Significant work has been invested in investigating automated approaches to text de-identification, the process of removing or redacting PII. Few studies have examined the performance of existing de-identification pipelines in a controlled comparative analysis. In this study, we use publicly available corpora to analyze speed and accuracy differences between three de-identification systems that can be run off-the-shelf: Amazon Comprehend Medical PHId, Clinacuity's CliniDeID, and the National Library of Medicine's Scrubber. No single system dominated all the compared metrics. NLM Scrubber was the fastest while CliniDeID generally had the highest accuracy.

11.

An artificial intelligence approach to COVID-19 infection risk assessment in virtual visits: A case report.

Obeid, Jihad S; Davis, Matthew; Turner, Matthew; Meystre, Stephane M; Heider, Paul M; O'Bryan, Edward C; Lenert, Leslie A.

J Am Med Inform Assoc ; 27(8): 1321-1325, 2020 08 01.

Artigo em Inglês | MEDLINE | ID: mdl-32449766

RESUMO

OBJECTIVE: In an effort to improve the efficiency of computer algorithms applied to screening for coronavirus disease 2019 (COVID-19) testing, we used natural language processing and artificial intelligence-based methods with unstructured patient data collected through telehealth visits. MATERIALS AND METHODS: After segmenting and parsing documents, we conducted analysis of overrepresented words in patient symptoms. We then developed a word embedding-based convolutional neural network for predicting COVID-19 test results based on patients' self-reported symptoms. RESULTS: Text analytics revealed that concepts such as smell and taste were more prevalent than expected in patients testing positive. As a result, screening algorithms were adapted to include these symptoms. The deep learning model yielded an area under the receiver-operating characteristic curve of 0.729 for predicting positive results and was subsequently applied to prioritize testing appointment scheduling. CONCLUSIONS: Informatics tools such as natural language processing and artificial intelligence methods can have significant clinical impacts when applied to data streams early in the development of clinical systems for outbreak response.

Assuntos

Inteligência Artificial , Infecções por Coronavirus/diagnóstico , Processamento de Linguagem Natural , Pneumonia Viral/diagnóstico , Telemedicina , Algoritmos , Betacoronavirus , COVID-19 , Teste para COVID-19 , Técnicas de Laboratório Clínico , Aprendizado Profundo , Registros Eletrônicos de Saúde , Humanos , Redes Neurais de Computação , Estudos de Casos Organizacionais , Pandemias , Curva ROC , Medição de Risco , SARS-CoV-2 , South Carolina

12.

Comparative Study of Various Approaches for Ensemble-based De-identification of Electronic Health Record Narratives.

Kim, Youngjun; Heider, Paul M; Meystre, Stéphane M.

AMIA Annu Symp Proc ; 2020: 648-657, 2020.

Artigo em Inglês | MEDLINE | ID: mdl-33936439

RESUMO

De-identification of electric health record narratives is a fundamental task applying natural language processing to better protect patient information privacy. We explore different types of ensemble learning methods to improve clinical text de-identification. We present two ensemble-based approaches for combining multiple predictive models. The first method selects an optimal subset of de-identification models by greedy exclusion. This ensemble pruning allows one to save computational time or physical resources while achieving similar or better performance than the ensemble of all members. The second method uses a sequence of words to train a sequential model. For this sequence labelling-based stacked ensemble, we employ search-based structured prediction and bidirectional long short-term memory algorithms. We create ensembles consisting of de-identification models trained on two clinical text corpora. Experimental results show that our ensemble systems can effectively integrate predictions from individual models and offer better generalization across two different corpora.

Assuntos

Registros Eletrônicos de Saúde , Algoritmos , Confidencialidade , Anonimização de Dados , Humanos , Narração , Processamento de Linguagem Natural , Privacidade

13.

Ensemble method-based extraction of medication and related information from clinical texts.

Kim, Youngjun; Meystre, Stéphane M.

J Am Med Inform Assoc ; 27(1): 31-38, 2020 01 01.

Artigo em Inglês | MEDLINE | ID: mdl-31282932

RESUMO

OBJECTIVE: Accurate and complete information about medications and related information is crucial for effective clinical decision support and precise health care. Recognition and reduction of adverse drug events is also central to effective patient care. The goal of this research is the development of a natural language processing (NLP) system to automatically extract medication and adverse drug event information from electronic health records. This effort was part of the 2018 n2c2 shared task on adverse drug events and medication extraction. MATERIALS AND METHODS: The new NLP system implements a stacked generalization based on a search-based structured prediction algorithm for concept extraction. We trained 4 sequential classifiers using a variety of structured learning algorithms. To enhance accuracy, we created a stacked ensemble consisting of these concept extraction models trained on the shared task training data. We implemented a support vector machine model to identify related concepts. RESULTS: Experiments with the official test set showed that our stacked ensemble achieved an F1 score of 92.66%. The relation extraction model with given concepts reached a 93.59% F1 score. Our end-to-end system yielded overall micro-averaged recall, precision, and F1 score of 92.52%, 81.88% and 86.88%, respectively. Our NLP system for adverse drug events and medication extraction ranked within the top 5 of teams participating in the challenge. CONCLUSION: This study demonstrated that a stacked ensemble with a search-based structured prediction algorithm achieved good performance by effectively integrating the output of individual classifiers and could provide a valid solution for other clinical concept extraction tasks.

Assuntos

Algoritmos , Efeitos Colaterais e Reações Adversas Relacionados a Medicamentos , Registros Eletrônicos de Saúde , Armazenamento e Recuperação da Informação/métodos , Processamento de Linguagem Natural , Humanos , Narração

14.

A Study of Medical Problem Extraction for Better Disease Management.

Kim, Youngjun; Meystre, Stéphane M.

Stud Health Technol Inform ; 264: 193-197, 2019 Aug 21.

Artigo em Inglês | MEDLINE | ID: mdl-31437912

RESUMO

This study focuses on the extraction of medical problems mentioned in electric health records to support disease management. We experimented with a variety of information extraction methods based on rules, on knowledge bases, and on machine learning, and combined them in an ensemble method approach. A new dataset drawn from cancer patient medical records at the University of Utah Healthcare was manually annotated for all mentions of a selection of the most frequent medical problems in this institution. Our experimental results show that a medical knowledge base can improve shallow and deep learning-based sequence labeling methods. The voting ensemble method combining information extraction models outperformed individual models and yielded more precise extraction of medical problems. As an example of applications benefiting from acurate medical problems extraction, we compared document-level cancer type classifiers and demonstrated that using only medical concepts yielded more accurate classification than using all the words in a clinical note.

Assuntos

Registros Eletrônicos de Saúde , Armazenamento e Recuperação da Informação , Gerenciamento Clínico , Humanos , Bases de Conhecimento , Aprendizado de Máquina

15.

Impact of De-Identification on Clinical Text Classification Using Traditional and Deep Learning Classifiers.

Obeid, Jihad S; Heider, Paul M; Weeda, Erin R; Matuskowitz, Andrew J; Carr, Christine M; Gagnon, Kevin; Crawford, Tami; Meystre, Stephane M.

Stud Health Technol Inform ; 264: 283-287, 2019 Aug 21.

Artigo em Inglês | MEDLINE | ID: mdl-31437930

RESUMO

Clinical text de-identification enables collaborative research while protecting patient privacy and confidentiality; however, concerns persist about the reduction in the utility of the de-identified text for information extraction and machine learning tasks. In the context of a deep learning experiment to detect altered mental status in emergency department provider notes, we tested several classifiers on clinical notes in their original form and on their automatically de-identified counterpart. We tested both traditional bag-of-words based machine learning models as well as word-embedding based deep learning models. We evaluated the models on 1,113 history of present illness notes. A total of 1,795 protected health information tokens were replaced in the de-identification process across all notes. The deep learning models had the best performance with accuracies of 95% on both original and de-identified notes. However, there was no significant difference in the performance of any of the models on the original vs. the de-identified notes.

Assuntos

Anonimização de Dados , Aprendizado Profundo , Confidencialidade , Registros Eletrônicos de Saúde , Humanos , Aprendizado de Máquina

16.

Patient-Pivoted Automated Trial Eligibility Pipeline: The First of Three Phases in a Modular Architecture.

Heider, Paul M; Meystre, Stéphane M.

Stud Health Technol Inform ; 264: 1476-1477, 2019 Aug 21.

Artigo em Inglês | MEDLINE | ID: mdl-31438189

RESUMO

Automated extraction of patient trial eligibility for clinical research studies can increase enrollment at a decreased time and money cost. We have developed a modular trial eligibility pipeline including patient-batched processing and an internal webservice backed by a uimaFIT pipeline as part of a multi-phase approach to include note-batched processing, the ability to query trials matching patients or patients matching trials, and an external alignment engine to connect patients to trials.

Assuntos

Definição da Elegibilidade , Custos e Análise de Custo , Humanos , Seleção de Pacientes

17.

Automatic trial eligibility surveillance based on unstructured clinical data.

Meystre, Stéphane M; Heider, Paul M; Kim, Youngjun; Aruch, Daniel B; Britten, Carolyn D.

Int J Med Inform ; 129: 13-19, 2019 09.

Artigo em Inglês | MEDLINE | ID: mdl-31445247

RESUMO

INTRODUCTION: Insufficient patient enrollment in clinical trials remains a serious and costly problem and is often considered the most critical issue to solve for the clinical trials community. In this project, we assessed the feasibility of automatically detecting a patient's eligibility for a sample of breast cancer clinical trials by mapping coded clinical trial eligibility criteria to the corresponding clinical information automatically extracted from text in the EHR. METHODS: Three open breast cancer clinical trials were selected by oncologists. Their eligibility criteria were manually abstracted from trial descriptions using the OHDSI ATLAS web application. Patients enrolled or screened for these trials were selected as 'positive' or 'possible' cases. Other patients diagnosed with breast cancer were selected as 'negative' cases. A selection of the clinical data and all clinical notes of these 229 selected patients was extracted from the MUSC clinical data warehouse and stored in a database implementing the OMOP common data model. Eligibility criteria were extracted from clinical notes using either manually crafted pattern matching (regular expressions) or a new natural language processing (NLP) application. These extracted criteria were then compared with reference criteria from trial descriptions. This comparison was realized with three different versions of a new application: rule-based, cosine similarity-based, and machine learning-based. RESULTS: For eligibility criteria extraction from clinical notes, the machine learning-based NLP application allowed for the highest accuracy with a micro-averaged recall of 90.9% and precision of 89.7%. For trial eligibility determination, the highest accuracy was reached by the machine learning-based approach with a per-trial AUC between 75.5% and 89.8%. CONCLUSION: NLP can be used to extract eligibility criteria from EHR clinical notes and automatically discover patients possibly eligible for a clinical trial with good accuracy, which could be leveraged to reduce the workload of humans screening patients for trials.

Assuntos

Definição da Elegibilidade , Automação , Neoplasias da Mama , Data Warehousing , Bases de Dados Factuais , Feminino , Humanos , Aprendizado de Máquina , Masculino , Pessoa de Meia-Idade , Processamento de Linguagem Natural , Seleção de Pacientes , Carga de Trabalho

18.

Automated Extraction and Classification of Cancer Stage Mentions fromUnstructured Text Fields in a Central Cancer Registry.

AAlAbdulsalam, Abdulrahman K; Garvin, Jennifer H; Redd, Andrew; Carter, Marjorie E; Sweeny, Carol; Meystre, Stephane M.

AMIA Jt Summits Transl Sci Proc ; 2017: 16-25, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-29888032

RESUMO

Cancer stage is one of the most important prognostic parameters in most cancer subtypes. The American Joint Com-mittee on Cancer (AJCC) specifies criteria for staging each cancer type based on tumor characteristics (T), lymph node involvement (N), and tumor metastasis (M) known as TNM staging system. Information related to cancer stage is typically recorded in clinical narrative text notes and other informal means of communication in the Electronic Health Record (EHR). As a result, human chart-abstractors (known as certified tumor registrars) have to search through volu-minous amounts of text to extract accurate stage information and resolve discordance between different data sources. This study proposes novel applications of natural language processing and machine learning to automatically extract and classify TNM stage mentions from records at the Utah Cancer Registry. Our results indicate that TNM stages can be extracted and classified automatically with high accuracy (extraction sensitivity: 95.5%-98.4% and classification sensitivity: 83.5%-87%).

19.

Automating Quality Measures for Heart Failure Using Natural Language Processing: A Descriptive Study in the Department of Veterans Affairs.

Garvin, Jennifer Hornung; Kim, Youngjun; Gobbel, Glenn Temple; Matheny, Michael E; Redd, Andrew; Bray, Bruce E; Heidenreich, Paul; Bolton, Dan; Heavirland, Julia; Kelly, Natalie; Reeves, Ruth; Kalsy, Megha; Goldstein, Mary Kane; Meystre, Stephane M.

JMIR Med Inform ; 6(1): e5, 2018 Jan 15.

Artigo em Inglês | MEDLINE | ID: mdl-29335238

RESUMO

BACKGROUND: We developed an accurate, stakeholder-informed, automated, natural language processing (NLP) system to measure the quality of heart failure (HF) inpatient care, and explored the potential for adoption of this system within an integrated health care system. OBJECTIVE: To accurately automate a United States Department of Veterans Affairs (VA) quality measure for inpatients with HF. METHODS: We automated the HF quality measure Congestive Heart Failure Inpatient Measure 19 (CHI19) that identifies whether a given patient has left ventricular ejection fraction (LVEF) <40%, and if so, whether an angiotensin-converting enzyme inhibitor or angiotensin-receptor blocker was prescribed at discharge if there were no contraindications. We used documents from 1083 unique inpatients from eight VA medical centers to develop a reference standard (RS) to train (n=314) and test (n=769) the Congestive Heart Failure Information Extraction Framework (CHIEF). We also conducted semi-structured interviews (n=15) for stakeholder feedback on implementation of the CHIEF. RESULTS: The CHIEF classified each hospitalization in the test set with a sensitivity (SN) of 98.9% and positive predictive value of 98.7%, compared with an RS and SN of 98.5% for available External Peer Review Program assessments. Of the 1083 patients available for the NLP system, the CHIEF evaluated and classified 100% of cases. Stakeholders identified potential implementation facilitators and clinical uses of the CHIEF. CONCLUSIONS: The CHIEF provided complete data for all patients in the cohort and could potentially improve the efficiency, timeliness, and utility of HF quality measurements.

20.

Ensemble-based Methods to Improve De-identification of Electronic Health Record Narratives.

Kim, Youngjun; Heider, Paul; Meystre, Stéphane.

AMIA Annu Symp Proc ; 2018: 663-672, 2018.

Artigo em Inglês | MEDLINE | ID: mdl-30815108

RESUMO

Text de-identification is an application of clinical natural language processing that offers significant efficiency and scalability advantages. Hence, various learning algorithms have been applied to this task to yield better performance. Instead of choosing the best individual learning algorithm, we aim to improve de-identification by constructing ensembles that lead to more accurate classification. We present three different ensemble methods that combine multiple de-identification models trained from deep learning, shallow learning, and rule-based approaches. Each model is capable of automated de-identification without manual medical expertise. Our experimental results show that the stacked learning ensemble is more effective than other ensemble methods, producing the highest recall, the most important metric for de-identification. The stacked ensemble achieved state-of-the-art performance on the 2014 i2b2 dataset with 97.04% precision, 94.45% recall, and 95.73% F1 score.

Assuntos

Algoritmos , Anonimização de Dados , Registros Eletrônicos de Saúde , Aprendizado de Máquina , Processamento de Linguagem Natural , Humanos , Métodos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA