Results 1 - 20 of 38
1.
J Med Internet Res ; 26: e55676, 2024 May 28.
Article in English | MEDLINE | ID: mdl-38805692

ABSTRACT

BACKGROUND: Clinical natural language processing (NLP) researchers need access to directly comparable evaluation results for applications such as text deidentification across a range of corpus types and the means to easily test new systems or corpora within the same framework. Current systems, reported metrics, and the personally identifiable information (PII) categories evaluated are not easily comparable. OBJECTIVE: This study presents an open-source and extensible end-to-end framework for comparing clinical NLP system performance across corpora even when the annotation categories do not align. METHODS: As a use case for this framework, we use 6 off-the-shelf text deidentification systems (ie, CliniDeID, deid from PhysioNet, MITRE Identity Scrubber Toolkit [MIST], NeuroNER, National Library of Medicine [NLM] Scrubber, and Philter) across 3 standard clinical text corpora for the task (2 of which are publicly available) and 1 private corpus (all in English), with annotation categories that are not directly analogous. The framework is built on shell scripts that can be extended to include new systems, corpora, and performance metrics. We present this open tool, multiple means for aligning PII categories during evaluation, and our initial timing and performance metric findings. Code for running this framework, with all settings needed to run all pairs, is available via Codeberg and GitHub. RESULTS: From this case study, we found large differences in processing speed between systems. The fastest system (ie, MIST) processed an average of 24.57 (SD 26.23) notes per second, while the slowest (ie, CliniDeID) processed an average of 1.00 notes per second. No system uniformly outperformed the others at identifying PII across corpora and categories. Instead, a rich tapestry of performance trade-offs emerged for PII categories. CliniDeID and Philter prioritize recall over precision (with an average recall 6.9 and 11.2 points higher, respectively, for partially matching spans of text matching any PII category), while the other 4 systems consistently have higher precision (with MIST's precision scoring 20.2 points higher, NLM Scrubber scoring 4.4 points higher, NeuroNER scoring 7.2 points higher, and deid scoring 17.1 points higher). The macroaverage recall across corpora for identifying names, one of the more sensitive PII categories, ranged from deid (48.8%) and MIST (66.9%) at the low end to NeuroNER (84.1%), NLM Scrubber (88.1%), and CliniDeID (95.9%) at the high end. A variety of metrics across categories and corpora are reported, with a wider selection (eg, F2-score) available via the tool. CONCLUSIONS: NLP systems in general, and deidentification systems and corpora in our use case, tend to be evaluated in stand-alone research articles that include only a limited set of comparators. We hold that a single evaluation pipeline across multiple systems and corpora allows for more nuanced comparisons. Our open pipeline should reduce barriers to evaluation and system advancement.
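
A minimal sketch of the two evaluation mechanics described above, aligning disparate PII categories and computing micro-averaged precision, recall, and F2, follows; the category names, the alignment map, and the exact-match-only scoring are illustrative assumptions, not the framework's actual configuration (which also scores partial matches).

    # Minimal sketch: map corpus-specific PII labels onto a shared category
    # set, then score span predictions with micro-averaged precision/recall/F2.
    # All labels and spans below are hypothetical.

    CATEGORY_MAP = {
        "PATIENT": "NAME", "DOCTOR": "NAME",
        "HOSPITAL": "LOCATION", "CITY": "LOCATION",
        "PHONE": "CONTACT", "EMAIL": "CONTACT",
    }

    def align(annotations):
        """Map (start, end, label) spans onto the shared category set."""
        return {(s, e, CATEGORY_MAP.get(label, label)) for s, e, label in annotations}

    def micro_scores(gold, predicted, beta=2.0):
        gold, predicted = align(gold), align(predicted)
        tp = len(gold & predicted)
        fp = len(predicted - gold)
        fn = len(gold - predicted)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        b2 = beta * beta
        f_beta = ((1 + b2) * precision * recall / (b2 * precision + recall)
                  if precision + recall else 0.0)
        return precision, recall, f_beta

    gold = [(0, 8, "PATIENT"), (20, 30, "HOSPITAL")]
    pred = [(0, 8, "DOCTOR"), (40, 45, "PHONE")]
    print(micro_scores(gold, pred))   # (0.5, 0.5, 0.5): NAME matches across label sets

With beta=2, the F-measure weights recall twice as heavily as precision, in the spirit of the F2-score the tool exposes.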


Subjects
Natural Language Processing
2.
BMC Med Res Methodol ; 23(1): 88, 2023 04 11.
Article in English | MEDLINE | ID: mdl-37041475

ABSTRACT

BACKGROUND: To advance new therapies into clinical care, clinical trials must recruit enough participants. Yet, many trials fail to do so, leading to delays, early trial termination, and wasted resources. Under-enrolling trials make it impossible to draw conclusions about the efficacy of new therapies. An oft-cited reason for insufficient enrollment is lack of study team and provider awareness about patient eligibility. Automating clinical trial eligibility surveillance and study team and provider notification could offer a solution. METHODS: To address this need for an automated solution, we conducted an observational pilot study of our TAES (TriAl Eligibility Surveillance) system. We tested the hypothesis that an automated system based on natural language processing and machine learning algorithms could detect patients eligible for specific clinical trials by linking the information extracted from trial descriptions to the corresponding clinical information in the electronic health record (EHR). To evaluate the TAES information extraction and matching prototype (i.e., TAES prototype), we selected five open cardiovascular and cancer trials at the Medical University of South Carolina and created a new reference standard of 21,974 clinical text notes from a random selection of 400 patients (including at least 100 enrolled in the selected trials), with a small subset of 20 notes annotated in detail. We also developed a simple web interface for a new database that stores all trial eligibility criteria, corresponding clinical information, and trial-patient match characteristics using the Observational Medical Outcomes Partnership (OMOP) common data model. Finally, we investigated options for integrating an automated clinical trial eligibility system into the EHR and for notifying health care providers promptly of potential patient eligibility without interrupting their clinical workflow. RESULTS: Although the rapidly implemented TAES prototype achieved only moderate accuracy (recall up to 0.778; precision up to 1.000), it enabled us to assess options for integrating an automated system successfully into the clinical workflow at a healthcare system. CONCLUSIONS: Once optimized, the TAES system could substantially enhance identification of patients potentially eligible for clinical trials, while simultaneously decreasing the burden of manual EHR review on research teams. Through timely notifications, it could also raise physician awareness of patient eligibility for clinical trials.
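
As a purely illustrative view of the core matching step, the sketch below compares concept codes extracted from a trial's eligibility criteria against concept codes found for a patient; the trial identifier, the SNOMED codes, and the flat set representation are hypothetical, since TAES links NLP output to OMOP-modeled EHR data rather than plain sets.

    # Illustrative sketch of eligibility matching as set operations over
    # concept codes. All identifiers and codes here are made up.

    trial_criteria = {
        "NCT-HYPOTHETICAL-1": {
            "require": {"SNOMED:38341003"},   # e.g., hypertension
            "exclude": {"SNOMED:73211009"},   # e.g., diabetes mellitus
        },
    }

    def eligible(patient_concepts, criteria):
        return (criteria["require"] <= patient_concepts
                and not (criteria["exclude"] & patient_concepts))

    patient = {"SNOMED:38341003", "SNOMED:271737000"}
    for trial_id, criteria in trial_criteria.items():
        if eligible(patient, criteria):
            print(f"Patient potentially eligible for {trial_id}")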


Subjects
Artificial Intelligence, Natural Language Processing, Humans, Pilot Projects, Patient Selection, Machine Learning
3.
J Biomed Inform ; 67: 42-48, 2017 03.
Article in English | MEDLINE | ID: mdl-28163196

ABSTRACT

Efforts to improve the treatment of congestive heart failure, a common and serious medical condition, include the use of quality measures to assess guideline-concordant care. The goal of this study is to identify left ventricular ejection fraction (LVEF) information from various types of clinical notes, and to then use this information for heart failure quality measurement. We analyzed the annotation differences between a new corpus of clinical notes from the Echocardiography, Radiology, and Text Integrated Utility package and other corpora annotated for natural language processing (NLP) research in the Department of Veterans Affairs. These reports contain varying degrees of structure. To examine whether existing LVEF extraction modules we developed in prior research improve the accuracy of LVEF information extraction from the new corpus, we created two sequence-tagging NLP modules trained with a new data set, with or without predictions from the existing LVEF extraction modules. We also conducted a set of experiments to examine the impact of training data size on information extraction accuracy. We found that less training data is needed when reports are highly structured, and that combining predictions from existing LVEF extraction modules improves information extraction when reports have less structured formats and a rich set of vocabulary.
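
For intuition, a toy rule-based baseline for pulling LVEF mentions out of note text follows; the regular expression is invented for illustration, whereas the modules described above are trained sequence taggers, not regexes.

    # Toy regex baseline for LVEF mention extraction; patterns are illustrative.
    import re

    LVEF_PATTERN = re.compile(
        r"(?:LVEF|left\s+ventricular\s+ejection\s+fraction|ejection\s+fraction|EF)"
        r"\s*(?:is|of|was|[:=])?\s*"
        r"(\d{1,2}(?:\.\d)?)\s*(?:-|to)?\s*(\d{1,2}(?:\.\d)?)?\s*%",
        re.IGNORECASE,
    )

    def extract_lvef(text):
        for match in LVEF_PATTERN.finditer(text):
            low = float(match.group(1))
            high = float(match.group(2)) if match.group(2) else low
            yield match.group(0), (low + high) / 2   # mention and midpoint value

    note = "Echo today: LVEF 50-55%. Prior ejection fraction of 35 %."
    for mention, value in extract_lvef(note):
        print(mention, "->", value)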


Subjects
Heart Failure/diagnosis, Information Storage and Retrieval, Natural Language Processing, Heart Failure/therapy, Humans, Stroke Volume
4.
J Biomed Inform ; 50: 162-72, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24859155

ABSTRACT

The Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor method requires removal of 18 types of protected health information (PHI) from clinical documents for them to be considered "de-identified" prior to use for research purposes. Human review of PHI elements from a large corpus of clinical documents can be tedious and error-prone. Indeed, multiple annotators may be required to consistently redact information that represents each PHI class. Automated de-identification has the potential to improve annotation quality and reduce annotation time. One such approach is machine-assisted annotation, which combines de-identification system outputs, used as pre-annotations, with an interactive annotation interface so that annotators "curate" existing PHI annotations rather than annotate raw clinical documents from "scratch." To assess whether machine-assisted annotation improves the reliability and accuracy of the reference standard and reduces annotation effort, we conducted an annotation experiment. In this annotation study, we assessed the generalizability of the VA Consortium for Healthcare Informatics Research (CHIR) annotation schema and guidelines applied to a corpus of publicly available clinical documents called MTSamples. Specifically, our goals were to (1) characterize a heterogeneous corpus of clinical documents manually annotated for risk-ranked PHI and other annotation types (clinical eponyms and person relations), (2) evaluate how well annotators apply the CHIR schema to the heterogeneous corpus, (3) compare whether machine-assisted annotation (experiment) improves annotation quality and reduces annotation time compared to manual annotation (control), and (4) assess the change in quality of reference standard coverage with each added annotator's annotations.


Subjects
Electronic Health Records, User-Computer Interface, Health Insurance Portability and Accountability Act, United States
5.
J Biomed Inform ; 50: 142-50, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24502938

ABSTRACT

As more and more electronic clinical information is becoming easier to access for secondary uses such as clinical research, approaches that enable faster and more collaborative research while protecting patient privacy and confidentiality are becoming more important. Clinical text de-identification offers such advantages but is typically a tedious manual process. Automated Natural Language Processing (NLP) methods can alleviate this process, but their impact on subsequent uses of the automatically de-identified clinical narratives has barely been investigated. In the context of a larger project to develop and investigate automated text de-identification for Veterans Health Administration (VHA) clinical notes, we studied the impact of automated text de-identification on clinical information in a stepwise manner. Our approach started with a high-level assessment of clinical notes informativeness and formatting, and ended with a detailed study of the overlap of select clinical information types and Protected Health Information (PHI). To investigate the informativeness (i.e., document type information, select clinical data types, and interpretation or conclusion) of VHA clinical notes, we used five different existing text de-identification systems. The informativeness was only minimally altered by these systems, while formatting was modified by only one system. To examine the impact of de-identification on clinical information extraction, we compared counts of SNOMED-CT concepts found by an open source information extraction application in the original (i.e., not de-identified) version of a corpus of VHA clinical notes, and in the same corpus after de-identification. Only about 1.2-3% fewer SNOMED-CT concepts were found in de-identified versions of our corpus, and many of these concepts were PHI that was erroneously identified as clinical information. To study this impact in more detail and assess how generalizable our findings were, we examined the overlap between select clinical information annotated in the 2010 i2b2 NLP challenge corpus and automatic PHI annotations from our best-of-breed VHA clinical text de-identification system (nicknamed 'BoB'). Overall, only 0.81% of the clinical information exactly overlapped with PHI, and 1.78% partly overlapped. We conclude that automated text de-identification's impact on clinical information is small, but not negligible, and that improved disambiguation of clinical acronyms and eponyms could significantly reduce this impact.
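
The overlap measurement at the heart of this study can be sketched as follows, assuming annotations are simple (start, end) character-offset spans; the spans and resulting percentages are fabricated for the example.

    # Minimal sketch: classify each clinical concept span as exactly
    # overlapping, partly overlapping, or disjoint from any PHI span.

    def overlap_type(concept, phi_spans):
        c_start, c_end = concept
        for p_start, p_end in phi_spans:
            if (c_start, c_end) == (p_start, p_end):
                return "exact"
            if c_start < p_end and p_start < c_end:   # any shared characters
                return "partial"
        return "none"

    phi = [(10, 18), (42, 50)]
    concepts = [(10, 18), (45, 60), (70, 80)]
    counts = {"exact": 0, "partial": 0, "none": 0}
    for span in concepts:
        counts[overlap_type(span, phi)] += 1
    total = len(concepts)
    print({k: f"{100 * v / total:.2f}%" for k, v in counts.items()})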


Subjects
Electronic Health Records, Privacy, Automation, Systematized Nomenclature of Medicine, United States, United States Department of Veterans Affairs
6.
BMC Med Res Methodol ; 12: 109, 2012 Jul 27.
Article in English | MEDLINE | ID: mdl-22839356

ABSTRACT

BACKGROUND: The increased use and adoption of Electronic Health Records (EHR) causes a tremendous growth in digital information useful for clinicians, researchers, and many other operational purposes. However, this information is rich in Protected Health Information (PHI), which severely restricts its access and possible uses. A number of investigators have developed methods for automatically de-identifying EHR documents by removing PHI, as specified in the Health Insurance Portability and Accountability Act "Safe Harbor" method. This study focuses on the evaluation of existing automated text de-identification methods and tools, as applied to Veterans Health Administration (VHA) clinical documents, to assess which methods perform better with each category of PHI found in our clinical notes, and where new methods are needed to improve performance. METHODS: We installed and evaluated five text de-identification systems "out-of-the-box" using a corpus of VHA clinical documents. The systems based on machine learning methods were trained with the 2006 i2b2 de-identification corpora and evaluated with our VHA corpus, and also evaluated with a ten-fold cross-validation experiment using our VHA corpus. We counted exact, partial, and fully contained matches with reference annotations, considering each PHI type separately, or only one unique 'PHI' category. Performance of the systems was assessed using recall (equivalent to sensitivity) and precision (equivalent to positive predictive value) metrics, as well as the F(2)-measure. RESULTS: Overall, systems based on rules and pattern matching achieved better recall, and precision was always better with systems based on machine learning approaches. The highest "out-of-the-box" F(2)-measure was 67% for partial matches; the best precision and recall were 95% and 78%, respectively. Finally, the ten-fold cross-validation experiment increased the F(2)-measure to 79% with partial matches. CONCLUSIONS: The "out-of-the-box" evaluation of text de-identification systems provided us with compelling insight about the best methods for de-identification of VHA clinical documents. The error analysis demonstrated an important need for customization to PHI formats specific to VHA documents. This study informed the planning and development of a "best-of-breed" automatic de-identification application for VHA clinical text.
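
A small sketch of the three matching regimes named above (exact, partial, and fully contained), applied to character-offset spans, follows; reading "fully contained" as the system span lying inside the reference span is one plausible interpretation, not necessarily the paper's exact definition.

    # Sketch of the three matching regimes on (start, end) offsets; data is made up.

    def match_regimes(system, reference):
        s1, s2 = system
        r1, r2 = reference
        return {
            "exact": (s1, s2) == (r1, r2),
            "partial": s1 < r2 and r1 < s2,            # any overlap at all
            "fully_contained": r1 <= s1 and s2 <= r2,  # system span inside reference
        }

    print(match_regimes((5, 12), (5, 12)))   # exact, partial, and fully contained
    print(match_regimes((7, 10), (5, 12)))   # partial and fully contained
    print(match_regimes((3, 8), (5, 12)))    # partial only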


Subjects
Electronic Health Records, United States Department of Veterans Affairs, Artificial Intelligence, Computer Security/standards, Confidentiality, Humans, Reference Standards, United States, Veterans Health
7.
J Biomed Inform ; 45(4): 703-10, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22197801

ABSTRACT

An increasing need for collaboration and resource sharing in the Natural Language Processing (NLP) research and development community motivates efforts to create and share a common data model and a common terminology for all information annotated and extracted from clinical text. We have combined two existing standards, the HL7 Clinical Document Architecture (CDA) and the ISO Graph Annotation Format (GrAF; in development), to develop such a data model, entitled "CDA+GrAF". We experimented with several methods to combine these existing standards, and eventually selected a method wrapping separate CDA and GrAF parts in a common standoff annotation (i.e., separate from the annotated text) XML document. Two use cases, clinical document sections and the 2010 i2b2/VA NLP Challenge (i.e., problems, tests, and treatments, with their assertions and relations), were used to create examples of such standoff annotation documents, which were successfully validated with the XML schemata provided with both standards. We developed a tool to automatically translate annotation documents from the 2010 i2b2/VA NLP Challenge format to GrAF, and automatically generated 50 annotation documents using this tool, all successfully validated. Finally, we adapted the XSL stylesheet provided with HL7 CDA to allow viewing annotation XML documents in a web browser, and plan to adapt existing tools for translating annotation documents between CDA+GrAF and the UIMA and GATE frameworks. This common data model may ease direct comparison of NLP tools and applications, combination of their output, transformation and "translation" of annotations between different NLP applications, and eventually "plug-and-play" of different modules in NLP applications.
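
To make the standoff idea concrete, the sketch below builds a tiny annotation document that points into the source text by character offsets rather than embedding markup in it; this is a simplified analogue for illustration, not the actual CDA+GrAF schema.

    # Illustrative-only standoff annotation: the XML lives apart from the
    # source text and references it by offsets. Element names are invented.
    import xml.etree.ElementTree as ET

    text = "Patient denies chest pain."

    root = ET.Element("annotationDocument", source="note-001.txt")
    ann = ET.SubElement(root, "annotation", id="a1", type="problem",
                        start="15", end="25")   # offsets of "chest pain"
    ET.SubElement(ann, "assertion").text = "absent"

    print(ET.tostring(root, encoding="unicode"))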


Subjects
Medical Informatics/methods, Models, Theoretical, Natural Language Processing, Documentation/methods, Electronic Health Records, Reproducibility of Results
8.
Stud Health Technol Inform ; 290: 1062-1063, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673206

ABSTRACT

A new natural language processing (NLP) application for extracting COVID-19-related information from clinical text notes is being developed as part of our pandemic response efforts. This NLP application, called DECOVRI (Data Extraction for COVID-19 Related Information), will be released as a free and open-source tool to convert unstructured notes into structured data within an OMOP CDM-based ecosystem. The DECOVRI prototype is being continuously improved and will be released both as an early (beta) version and as a full version.


Subjects
COVID-19, Natural Language Processing, Ecosystem, Electronic Health Records, Humans, Information Storage and Retrieval, Pandemics
9.
Stud Health Technol Inform ; 290: 1064-1065, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673207

ABSTRACT

We present a performance evaluation of machine learning (ML) and natural language processing (NLP) based section header classification. The section header classification task was performed as a two-pass system: the first pass detects a section header, while the second pass classifies it. Recall, precision, and F1-measure metrics were reported to explore the best approach to ML-based section header classification for use in downstream NLP tasks.
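
A rough sketch of such a two-pass setup on toy data follows; the example lines, labels, and scikit-learn feature choices are invented, and the real classifiers would be trained on much larger annotated corpora.

    # Two-pass sketch: pass 1 decides whether a line is a section header at
    # all; pass 2 assigns a header type. Training data here is a toy stand-in.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    detector = make_pipeline(
        TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
        LogisticRegression(max_iter=1000))
    classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

    lines = ["CHIEF COMPLAINT:", "Patient is a 54 yo male.", "MEDICATIONS:"]
    is_header = [1, 0, 1]
    header_type = ["chief_complaint", "medications"]   # labels for header lines only

    detector.fit(lines, is_header)
    classifier.fit([l for l, h in zip(lines, is_header) if h], header_type)

    for line in ["ALLERGIES:", "He reports no fever."]:
        if detector.predict([line])[0]:                # pass 1: header or not
            print(line, "->", classifier.predict([line])[0])   # pass 2: which type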


Subjects
Machine Learning, Natural Language Processing
10.
JMIR Med Inform ; 9(4): e22797, 2021 Apr 22.
Article in English | MEDLINE | ID: mdl-33885370

ABSTRACT

BACKGROUND: Family history information is important to assess the risk of inherited medical conditions. Natural language processing has the potential to extract this information from unstructured free-text notes to improve patient care and decision making. We describe the end-to-end information extraction system the Medical University of South Carolina team developed when participating in the 2019 National Natural Language Processing Clinical Challenge (n2c2)/Open Health Natural Language Processing (OHNLP) shared task. OBJECTIVE: This task involves identifying mentions of family members and observations in electronic health record text notes and recognizing the 2 types of relations (family member-living status relations and family member-observation relations). Our system aims to achieve a high level of performance by integrating heuristics and advanced information extraction methods. Our efforts also include improving the performance of 2 subtasks by exploiting additional labeled data and clinical text-based embedding models. METHODS: We present a hybrid method that combines machine learning and rule-based approaches. We implemented an end-to-end system with multiple information extraction and attribute classification components. For entity identification, we trained bidirectional long short-term memory deep learning models. These models incorporated static word embeddings and context-dependent embeddings. We created a voting ensemble that combined the predictions of all individual models. For relation extraction, we trained 2 relation extraction models. The first model determined the living status of each family member. The second model identified observations associated with each family member. We implemented online gradient descent models to extract related entity pairs. As part of postchallenge efforts, we used the BioCreative/OHNLP 2018 corpus and trained new models with the union of these 2 datasets. We also pretrained language models using clinical notes from the Medical Information Mart for Intensive Care (MIMIC-III) clinical database. RESULTS: The voting ensemble achieved better performance than individual classifiers. In the entity identification task, our top-performing system reached a precision of 78.90% and a recall of 83.84%. Our natural language processing system for entity identification took 3rd place out of 17 teams in the challenge. We ranked 4th out of 9 teams in the relation extraction task. Our system substantially benefited from the combination of the 2 datasets. Compared to our official submission with F1 scores of 81.30% and 64.94% for entity identification and relation extraction, respectively, the revised system yielded significantly better performance (P<.05) with F1 scores of 86.02% and 72.48%, respectively. CONCLUSIONS: We demonstrated that a hybrid model could be used to successfully extract family history information recorded in unstructured free-text notes. In this study, our approach to entity identification as a sequence labeling problem produced satisfactory results. Our postchallenge efforts significantly improved performance by leveraging additional labeled data and using word vector representations learned from large collections of clinical notes.
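
The per-token voting idea can be sketched in a few lines, assuming each base model emits one BIO-style tag per token; the tag names and model outputs below are fabricated.

    # Minimal sketch of a voting ensemble over sequence labels: keep the
    # majority tag for each token position.
    from collections import Counter

    def vote(per_model_tags):
        """per_model_tags: list of tag sequences, one per model, same length."""
        return [Counter(token_tags).most_common(1)[0][0]
                for token_tags in zip(*per_model_tags)]

    model_a = ["B-FamilyMember", "O", "B-Observation"]
    model_b = ["B-FamilyMember", "O", "O"]
    model_c = ["O", "O", "B-Observation"]
    print(vote([model_a, model_b, model_c]))
    # -> ['B-FamilyMember', 'O', 'B-Observation']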

11.
J Am Med Inform Assoc ; 29(1): 12-21, 2021 12 28.
Article in English | MEDLINE | ID: mdl-34415311

ABSTRACT

OBJECTIVE: The COVID-19 (coronavirus disease 2019) pandemic response at the Medical University of South Carolina included virtual care visits for patients with suspected severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. The telehealth system used for these visits only exports a text note to integrate with the electronic health record, but structured and coded information about COVID-19 (eg, exposure, risk factors, symptoms) was needed to support clinical care and early research as well as predictive analytics for data-driven patient advising and pooled testing. MATERIALS AND METHODS: To capture COVID-19 information from multiple sources, a new data mart and a new natural language processing (NLP) application prototype were developed. The NLP application combined reused components with dictionaries and rules crafted by domain experts. It was deployed as a Web service for hourly processing of new data from patients assessed or treated for COVID-19. The extracted information was then used to develop algorithms predicting SARS-CoV-2 diagnostic test results based on symptoms and exposure information. RESULTS: The dedicated data mart and NLP application were developed and deployed in a mere 10-day sprint in March 2020. The NLP application was evaluated with good accuracy (85.8% recall and 81.5% precision). The SARS-CoV-2 testing predictive analytics algorithms were configured to provide patients with data-driven COVID-19 testing advice with a sensitivity of 81% to 92% and to enable pooled testing with a negative predictive value of 90% to 91%, reducing the number of tests required to about 63%. CONCLUSIONS: SARS-CoV-2 testing predictive analytics and NLP successfully enabled data-driven patient advising and pooled testing.
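
For intuition about the pooled-testing arithmetic, a back-of-the-envelope Dorfman pooling sketch follows; the prevalence and pool sizes are illustrative assumptions, not the parameters used in this study.

    # Dorfman pooling: each person costs 1/k tests for the pool, plus a
    # retest whenever the pool comes back positive.

    def expected_tests_per_person(p, k):
        pool_positive = 1 - (1 - p) ** k   # probability the pool needs retesting
        return 1 / k + pool_positive

    for k in (4, 5, 8):
        frac = expected_tests_per_person(p=0.10, k=k)
        print(f"pool size {k}: {frac:.2%} of individual testing")

With p = 0.10 and k = 5, for example, the expected cost works out to roughly 61% of testing everyone individually, in the same ballpark as the approximately 63% reported above.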


Subjects
COVID-19, COVID-19 Testing, Humans, Natural Language Processing, Pandemics, SARS-CoV-2
12.
BMC Med Res Methodol ; 10: 70, 2010 Aug 02.
Article in English | MEDLINE | ID: mdl-20678228

ABSTRACT

BACKGROUND: In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires the informed consent of the patient and approval of the Institutional Review Board to use data for research purposes, but these requirements can be waived if data is de-identified. For clinical data to be considered de-identified, the HIPAA "Safe Harbor" technique requires 18 data elements (called PHI: Protected Health Information) to be removed. The de-identification of narrative text documents is often performed manually and requires significant resources. Well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here. METHODS: This review focuses on recently published research (after 1995), and includes relevant publications from bibliographic queries in PubMed, conference proceedings, the ACM Digital Library, as well as publications of interest referenced in already-included papers. RESULTS: The literature search returned more than 200 publications. The majority focused only on structured data de-identification instead of narrative text, focused on image de-identification, or described manual de-identification, and were therefore excluded. Finally, 18 publications describing automated text de-identification were selected for detailed analysis of the architecture and methods used, the types of PHI detected and removed, the external resources used, and the types of clinical documents targeted. All text de-identification systems aimed to identify and remove person names, and many included other types of PHI. Most systems used only one or two specific clinical document types, and were mostly based on two different groups of methodologies: pattern matching and machine learning. Many systems combined both approaches for different types of PHI, but the majority relied only on pattern matching, rules, and dictionaries. CONCLUSIONS: In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize. Methods based on machine learning tend to perform better, especially with PHI that is not mentioned in the dictionaries used. Finally, the issues of anonymization, sufficient performance, and "over-scrubbing" are discussed in this publication.


Subjects
Confidentiality, Electronic Health Records, Software, Algorithms, Confidentiality/legislation & jurisprudence, Health Insurance Portability and Accountability Act, Humans, United States
13.
Stud Health Technol Inform ; 160(Pt 2): 944-8, 2010.
Article in English | MEDLINE | ID: mdl-20841823

ABSTRACT

An important proportion of the information about the medications a patient is taking is mentioned only in narrative text in the electronic health record. Automated information extraction can make this information accessible for decision support, research, or any other automated processing. In the context of the "i2b2 medication extraction challenge," we have developed a new NLP application called Textractor to automatically extract medications and details about them (e.g., dosage, frequency, reason for their prescription). This application and its evaluation with part of the reference standard for this "challenge" are presented here, along with an analysis of the development of this reference standard. During this evaluation, Textractor reached a system-level overall F1-measure, the reference metric for this challenge, of about 77% for exact matches. The best performance was measured with medication routes (F1-measure 86.4%), and the worst with prescription reasons (F1-measure 29%). These results are consistent with the agreement observed between human annotators when developing the reference standard, and with other published research.


Subjects
Drug Prescriptions, Information Storage and Retrieval/methods, Electronic Health Records/standards, Humans, Natural Language Processing, Vocabulary, Controlled
14.
J Am Med Inform Assoc ; 27(1): 31-38, 2020 01 01.
Article in English | MEDLINE | ID: mdl-31282932

ABSTRACT

OBJECTIVE: Accurate and complete information about medications and related information is crucial for effective clinical decision support and precise health care. Recognition and reduction of adverse drug events is also central to effective patient care. The goal of this research is the development of a natural language processing (NLP) system to automatically extract medication and adverse drug event information from electronic health records. This effort was part of the 2018 n2c2 shared task on adverse drug events and medication extraction. MATERIALS AND METHODS: The new NLP system implements a stacked generalization based on a search-based structured prediction algorithm for concept extraction. We trained 4 sequential classifiers using a variety of structured learning algorithms. To enhance accuracy, we created a stacked ensemble consisting of these concept extraction models trained on the shared task training data. We implemented a support vector machine model to identify related concepts. RESULTS: Experiments with the official test set showed that our stacked ensemble achieved an F1 score of 92.66%. The relation extraction model with given concepts reached a 93.59% F1 score. Our end-to-end system yielded overall micro-averaged recall, precision, and F1 score of 92.52%, 81.88% and 86.88%, respectively. Our NLP system for adverse drug events and medication extraction ranked within the top 5 of teams participating in the challenge. CONCLUSION: This study demonstrated that a stacked ensemble with a search-based structured prediction algorithm achieved good performance by effectively integrating the output of individual classifiers and could provide a valid solution for other clinical concept extraction tasks.
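
A hedged sketch of stacked generalization on synthetic data follows; generic scikit-learn classifiers stand in for the paper's sequence taggers and search-based structured prediction models, and the cv argument illustrates the usual trick of training the meta-learner on out-of-fold base predictions.

    # Stacked generalization sketch: base classifiers' predictions feed a
    # meta-learner. Data is synthetic; models are stand-ins.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    stack = StackingClassifier(
        estimators=[("rf", RandomForestClassifier(random_state=0)),
                    ("svm", LinearSVC(random_state=0))],
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,   # out-of-fold base predictions avoid leaking training labels
    )
    stack.fit(X, y)
    print("training accuracy:", stack.score(X, y))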


Subjects
Algorithms, Drug-Related Side Effects and Adverse Reactions, Electronic Health Records, Information Storage and Retrieval/methods, Natural Language Processing, Humans, Narration
15.
AMIA Annu Symp Proc ; 2020: 648-657, 2020.
Article in English | MEDLINE | ID: mdl-33936439

ABSTRACT

De-identification of electronic health record narratives is a fundamental natural language processing task for better protecting patient information privacy. We explore different types of ensemble learning methods to improve clinical text de-identification. We present two ensemble-based approaches for combining multiple predictive models. The first method selects an optimal subset of de-identification models by greedy exclusion. This ensemble pruning saves computational time or physical resources while achieving similar or better performance than the ensemble of all members. The second method uses a sequence of words to train a sequential model. For this sequence labeling-based stacked ensemble, we employ search-based structured prediction and bidirectional long short-term memory algorithms. We create ensembles consisting of de-identification models trained on two clinical text corpora. Experimental results show that our ensemble systems can effectively integrate predictions from individual models and offer better generalization across two different corpora.
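
The greedy-exclusion pruning loop can be sketched as follows; the toy "models" and majority-vote scorer are placeholders for real de-identification models and a real validation metric.

    # Ensemble pruning by greedy exclusion: repeatedly drop the member whose
    # removal most improves (or least hurts) a validation score.

    def greedy_prune(members, score_fn, min_size=1):
        current = list(members)
        best_score = score_fn(current)
        while len(current) > min_size:
            candidates = [(score_fn([m for m in current if m is not out]), out)
                          for out in current]
            trial_score, worst = max(candidates, key=lambda c: c[0])
            if trial_score < best_score:
                break                       # every removal hurts; stop pruning
            best_score = trial_score
            current.remove(worst)
        return current, best_score

    # Toy members: tuples of per-example predictions, scored by majority vote.
    gold = [1, 0, 1, 1, 0, 1]
    models = [
        (1, 0, 1, 1, 0, 0),     # strong member
        (1, 0, 1, 0, 0, 1),     # decent member
        (0, 1, 0, 0, 1, 0),     # weak member the pruner should drop first
    ]

    def vote_accuracy(members):
        hits = 0
        for i, answer in enumerate(gold):
            votes = sum(m[i] for m in members)
            majority = 1 if 2 * votes > len(members) else 0
            hits += majority == answer
        return hits / len(gold)

    print(greedy_prune(models, vote_accuracy, min_size=2))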


Subjects
Electronic Health Records, Algorithms, Confidentiality, Data Anonymization, Humans, Narration, Natural Language Processing, Privacy
16.
AMIA Jt Summits Transl Sci Proc ; 2020: 241-250, 2020.
Article in English | MEDLINE | ID: mdl-32477643

ABSTRACT

A growing quantity of health data is being stored in Electronic Health Records (EHR). The free-text section of these clinical notes contains important patient and treatment information for research but also contains Personally Identifiable Information (PII), which cannot be freely shared within the research community without compromising patient confidentiality and privacy rights. Significant work has been invested in automated approaches to text de-identification, the process of removing or redacting PII. Few studies have examined the performance of existing de-identification pipelines in a controlled comparative analysis. In this study, we use publicly available corpora to analyze speed and accuracy differences between three de-identification systems that can be run off-the-shelf: Amazon Comprehend Medical PHId, Clinacuity's CliniDeID, and the National Library of Medicine's Scrubber. No single system dominated all the compared metrics. NLM Scrubber was the fastest, while CliniDeID generally had the highest accuracy.

17.
J Am Med Inform Assoc ; 27(8): 1321-1325, 2020 08 01.
Article in English | MEDLINE | ID: mdl-32449766

ABSTRACT

OBJECTIVE: In an effort to improve the efficiency of computer algorithms applied to screening for coronavirus disease 2019 (COVID-19) testing, we used natural language processing and artificial intelligence-based methods with unstructured patient data collected through telehealth visits. MATERIALS AND METHODS: After segmenting and parsing documents, we conducted analysis of overrepresented words in patient symptoms. We then developed a word embedding-based convolutional neural network for predicting COVID-19 test results based on patients' self-reported symptoms. RESULTS: Text analytics revealed that concepts such as smell and taste were more prevalent than expected in patients testing positive. As a result, screening algorithms were adapted to include these symptoms. The deep learning model yielded an area under the receiver-operating characteristic curve of 0.729 for predicting positive results and was subsequently applied to prioritize testing appointment scheduling. CONCLUSIONS: Informatics tools such as natural language processing and artificial intelligence methods can have significant clinical impacts when applied to data streams early in the development of clinical systems for outbreak response.
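
A compact sketch of a word-embedding convolutional classifier of the kind described, using PyTorch on fabricated mini-data, follows; the vocabulary, layer sizes, and symptom strings are invented, and the real model and telehealth data are not reproduced here.

    # Word-embedding CNN for binary text classification; everything below is toy.
    import torch
    import torch.nn as nn

    class TextCNN(nn.Module):
        def __init__(self, vocab_size, embed_dim=32, n_filters=16, kernel=3):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size=kernel)
            self.out = nn.Linear(n_filters, 1)

        def forward(self, token_ids):                       # (batch, seq)
            x = self.embed(token_ids).transpose(1, 2)       # (batch, embed, seq)
            x = torch.relu(self.conv(x)).max(dim=2).values  # global max pooling
            return self.out(x).squeeze(1)                   # one logit per note

    vocab = {"<pad>": 0, "fever": 1, "cough": 2, "loss": 3, "of": 4,
             "smell": 5, "taste": 6, "headache": 7}

    def encode(words, length=6):
        ids = [vocab[w] for w in words][:length]
        return ids + [0] * (length - len(ids))              # pad to fixed length

    X = torch.tensor([encode(["loss", "of", "smell"]),
                      encode(["loss", "of", "taste", "fever"]),
                      encode(["cough", "headache"]),
                      encode(["headache"])])
    y = torch.tensor([1.0, 1.0, 0.0, 0.0])                  # fabricated outcomes

    model = TextCNN(len(vocab))
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(200):                                    # tiny training loop
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        print(torch.sigmoid(model(X)))                      # positive probabilities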


Subjects
Artificial Intelligence, Coronavirus Infections/diagnosis, Natural Language Processing, Pneumonia, Viral/diagnosis, Telemedicine, Algorithms, Betacoronavirus, COVID-19, COVID-19 Testing, Clinical Laboratory Techniques, Deep Learning, Electronic Health Records, Humans, Neural Networks, Computer, Organizational Case Studies, Pandemics, ROC Curve, Risk Assessment, SARS-CoV-2, South Carolina
18.
BMC Med Res Methodol ; 9: 70, 2009 Oct 28.
Article in English | MEDLINE | ID: mdl-19863809

ABSTRACT

BACKGROUND: Selecting patient cohorts is a critical, iterative, and often time-consuming aspect of studies involving human subjects; informatics tools for helping streamline the process have been identified as important infrastructure components for enabling clinical and translational research. We describe the evaluation of a free and open source cohort selection tool from the Informatics for Integrating Biology and the Bedside (i2b2) group: the i2b2 hive. METHODS: Our evaluation included the usability and functionality of the i2b2 hive, using several real-world examples of research data requests received electronically at the University of Utah Health Sciences Center between 2006 and 2008. The hive server component and the visual query tool application were evaluated for their suitability as a cohort selection tool on the basis of the types of data elements requested, as well as the effort required to fulfill each research data request using the i2b2 hive alone. RESULTS: We found the i2b2 hive to be suitable for obtaining estimates of cohort sizes and generating research cohorts based on simple inclusion/exclusion criteria, which covered about 44% of the clinical research data requests sampled at our institution. Data requests that relied on post-coordinated clinical concepts, aggregate values of clinical findings, or temporal conditions in their inclusion/exclusion criteria could not be fulfilled using the i2b2 hive alone, and required one or more intermediate data steps in the form of pre- or post-processing, modifications to the hive metadata, etc. CONCLUSION: The i2b2 hive was found to be a useful cohort-selection tool for fulfilling common types of requests for research data, and especially for the estimation of initial cohort sizes. For another institution that might want to use the i2b2 hive for clinical research, we recommend having structured, coded clinical data and metadata available that can be transformed to fit the logical data models of the i2b2 hive, strategies for extracting relevant clinical data from source systems, and the ability to perform substantial pre- and post-processing of these data.


Subjects
Biomedical Research/organization & administration, Medical Informatics Applications, Patient Selection, Technology Transfer, Electronic Health Records, Humans, Information Storage and Retrieval, Point-of-Care Systems, Terminology as Topic
19.
Stud Health Technol Inform ; 264: 193-197, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437912

ABSTRACT

This study focuses on the extraction of medical problems mentioned in electronic health records to support disease management. We experimented with a variety of information extraction methods based on rules, on knowledge bases, and on machine learning, and combined them in an ensemble method approach. A new dataset drawn from cancer patient medical records at the University of Utah Healthcare was manually annotated for all mentions of a selection of the most frequent medical problems in this institution. Our experimental results show that a medical knowledge base can improve shallow and deep learning-based sequence labeling methods. The voting ensemble method combining information extraction models outperformed individual models and yielded more precise extraction of medical problems. As an example of applications benefiting from accurate medical problem extraction, we compared document-level cancer type classifiers and demonstrated that using only medical concepts yielded more accurate classification than using all the words in a clinical note.


Subjects
Electronic Health Records, Information Storage and Retrieval, Disease Management, Humans, Knowledge Bases, Machine Learning
20.
Stud Health Technol Inform ; 264: 1476-1477, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31438189

ABSTRACT

Automated extraction of patient trial eligibility for clinical research studies can increase enrollment while decreasing time and cost. We have developed a modular trial eligibility pipeline that includes patient-batched processing and an internal web service backed by a uimaFIT pipeline, as part of a multi-phase approach that will add note-batched processing, the ability to query trials matching patients or patients matching trials, and an external alignment engine to connect patients to trials.


Subjects
Eligibility Determination, Costs and Cost Analysis, Humans, Patient Selection