Results 1 - 20 of 34
1.
J Biomed Inform ; 50: 162-72, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24859155

ABSTRACT

The Health Insurance Portability and Accountability Act (HIPAA) Safe Harbor method requires removal of 18 types of protected health information (PHI) from clinical documents before they can be considered "de-identified" for research use. Human review of PHI elements in a large corpus of clinical documents can be tedious and error-prone, and multiple annotators may be required to consistently redact information for each PHI class. Automated de-identification has the potential to improve annotation quality and reduce annotation time. One such approach is machine-assisted annotation, which combines de-identification system outputs, used as pre-annotations, with an interactive annotation interface, so that annotators "curate" PHI annotations rather than annotating raw clinical documents from scratch. To assess whether machine-assisted annotation improves the reliability and accuracy of the reference standard and reduces annotation effort, we conducted an annotation experiment. In this annotation study, we assessed the generalizability of the VA Consortium for Healthcare Informatics Research (CHIR) annotation schema and guidelines applied to a corpus of publicly available clinical documents called MTSamples. Specifically, our goals were to (1) characterize a heterogeneous corpus of clinical documents manually annotated for risk-ranked PHI and other annotation types (clinical eponyms and person relations), (2) evaluate how well annotators apply the CHIR schema to the heterogeneous corpus, (3) determine whether machine-assisted annotation (experiment) improves annotation quality and reduces annotation time compared to manual annotation (control), and (4) assess the change in reference standard coverage with each added annotator's annotations.


Subject(s)
Electronic Health Records, User-Computer Interface, Health Insurance Portability and Accountability Act, United States
2.
J Biomed Inform ; 50: 142-50, 2014 Aug.
Article in English | MEDLINE | ID: mdl-24502938

ABSTRACT

As more and more electronic clinical information becomes easier to access for secondary uses such as clinical research, approaches that enable faster and more collaborative research while protecting patient privacy and confidentiality are becoming more important. Clinical text de-identification offers such advantages but is typically a tedious manual process. Automated Natural Language Processing (NLP) methods can alleviate this process, but their impact on subsequent uses of the automatically de-identified clinical narratives has barely been investigated. In the context of a larger project to develop and investigate automated text de-identification for Veterans Health Administration (VHA) clinical notes, we studied the impact of automated text de-identification on clinical information in a stepwise manner. Our approach started with a high-level assessment of clinical note informativeness and formatting, and ended with a detailed study of the overlap between selected clinical information types and Protected Health Information (PHI). To investigate the informativeness (i.e., document type information, selected clinical data types, and interpretation or conclusion) of VHA clinical notes, we used five different existing text de-identification systems. The informativeness was only minimally altered by these systems, while formatting was modified by only one system. To examine the impact of de-identification on clinical information extraction, we compared counts of SNOMED-CT concepts found by an open source information extraction application in the original (i.e., not de-identified) version of a corpus of VHA clinical notes and in the same corpus after de-identification. Only about 1.2-3% fewer SNOMED-CT concepts were found in de-identified versions of our corpus, and many of these concepts were PHI that was erroneously identified as clinical information.
To study this impact in more detail and assess how generalizable our findings were, we examined the overlap between selected clinical information annotated in the 2010 i2b2 NLP challenge corpus and automatic PHI annotations from our best-of-breed VHA clinical text de-identification system (nicknamed 'BoB'). Overall, only 0.81% of the clinical information exactly overlapped with PHI, and 1.78% partly overlapped. We conclude that automated text de-identification's impact on clinical information is small, but not negligible, and that improved disambiguation of clinical acronyms and eponyms could significantly reduce this impact.
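The exact and partial overlap percentages reported above can be computed from character offsets alone. A minimal sketch, assuming annotations are available as (start, end) spans; the span data below are made up for illustration:

```python
# Count concept spans that overlap PHI spans exactly vs. partially.
def overlap_stats(concepts, phi):
    """concepts, phi: lists of (start, end) character spans."""
    phi_set = set(phi)
    exact = sum(1 for c in concepts if c in phi_set)
    partial = sum(
        1 for c in concepts
        if c not in phi_set
        and any(c[0] < p[1] and p[0] < c[1] for p in phi)  # ranges intersect
    )
    n = len(concepts)
    return exact / n, partial / n

# Hypothetical annotation spans
concepts = [(0, 10), (20, 30), (50, 60), (70, 80)]
phi = [(20, 30), (55, 65)]
exact_pct, partial_pct = overlap_stats(concepts, phi)
```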


Subject(s)
Electronic Health Records, Privacy, Automation, Systematized Nomenclature of Medicine, United States, United States Department of Veterans Affairs
3.
BMC Med Res Methodol ; 12: 109, 2012 Jul 27.
Article in English | MEDLINE | ID: mdl-22839356

ABSTRACT

BACKGROUND: The increased use and adoption of Electronic Health Records (EHR) causes a tremendous growth in digital information useful for clinicians, researchers, and many other operational purposes. However, this information is rich in Protected Health Information (PHI), which severely restricts its access and possible uses. A number of investigators have developed methods for automatically de-identifying EHR documents by removing PHI, as specified in the Health Insurance Portability and Accountability Act "Safe Harbor" method. This study focuses on the evaluation of existing automated text de-identification methods and tools, as applied to Veterans Health Administration (VHA) clinical documents, to assess which methods perform better with each category of PHI found in our clinical notes and to determine when new methods are needed to improve performance. METHODS: We installed and evaluated five text de-identification systems "out-of-the-box" using a corpus of VHA clinical documents. The systems based on machine learning methods were trained with the 2006 i2b2 de-identification corpora and evaluated with our VHA corpus, and also evaluated with a ten-fold cross-validation experiment using our VHA corpus. We counted exact, partial, and fully contained matches with reference annotations, considering each PHI type separately or as a single generic 'PHI' category. Performance of the systems was assessed using recall (equivalent to sensitivity) and precision (equivalent to positive predictive value) metrics, as well as the F(2)-measure. RESULTS: Overall, systems based on rules and pattern matching achieved better recall, while precision was always better with systems based on machine learning approaches. The highest "out-of-the-box" F(2)-measure was 67% for partial matches; the best precision and recall were 95% and 78%, respectively. Finally, the ten-fold cross-validation experiment increased the F(2)-measure to 79% with partial matches.
CONCLUSIONS: The "out-of-the-box" evaluation of text de-identification systems provided us with compelling insight about the best methods for de-identification of VHA clinical documents. The error analysis demonstrated an important need for customization to PHI formats specific to VHA documents. This study informed the planning and development of a "best-of-breed" automatic de-identification application for VHA clinical text.
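The F(2)-measure used above is the F-beta score with beta = 2, which weights recall twice as heavily as precision (appropriate when missed PHI is costlier than a false positive). A minimal sketch with illustrative counts, not figures taken from the study:

```python
# F-beta score from precision and recall; beta > 1 favors recall.
def f_beta(precision, recall, beta=2.0):
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical match counts against a reference standard
tp, fp, fn = 70, 10, 30
precision = tp / (tp + fp)   # 0.875
recall = tp / (tp + fn)      # 0.70
f2 = f_beta(precision, recall)
```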


Subject(s)
Electronic Health Records, United States Department of Veterans Affairs, Artificial Intelligence, Computer Security/standards, Confidentiality, Humans, Reference Standards, United States, Veterans Health
4.
JMIR Hum Factors ; 9(4): e39646, 2022 Dec 16.
Article in English | MEDLINE | ID: mdl-36525294

ABSTRACT

BACKGROUND: Extended foster care programs help prepare transitional-aged youth (TAY) to step into adulthood and live independent lives. Aspiranet, one of California's largest social service organizations, used a social care management solution (SCMS) to meet TAY's needs. OBJECTIVE: We aimed to investigate the impact of an SCMS, IBM Watson Care Manager (WCM), in transforming foster program service delivery and improving TAY outcomes. METHODS: We used a mixed methods study design, collecting primary data from stakeholders through semistructured interviews in 2021 and pulling secondary data from annual reports, system use logs, and data repositories from 2014 to 2021. Thematic analysis based on grounded theory was used to analyze qualitative data using NVivo software. Descriptive analysis of aggregated outcome metrics in the quantitative data was performed and compared across 2 periods: pre-SCMS implementation (before October 31, 2016) and post-SCMS implementation (November 1, 2016, to March 31, 2021). RESULTS: In total, 6 Aspiranet employees (4 leaders and 2 life coaches) were interviewed, with a median interview duration of 56 (IQR 53-67) minutes. The majority (5/6, 83%) were female and over 30 years of age (median 37, IQR 32-39), with a median of 6 (IQR 5-10) years of experience at Aspiranet and 10 (IQR 7-14) years of overall field experience. Most (4/6, 67%) participants rated their technological skills as expert. Thematic analysis of participants' interview transcripts yielded 24 subthemes that were grouped into 6 superordinate themes: study context, the impact of the new tool, key strengths, commonly used features, expectations with WCM, and limitations and recommendations. The tool met users' initial expectations of streamlining tasks and adopting essential functionalities.
Median satisfaction scores for pre- and post-WCM workflow processes remained constant for the 2 life coaches (3.25, IQR 2.5-4); however, among leaders, post-WCM scores (median 4, IQR 4-5) were higher than pre-WCM scores (median 3, IQR 3-3). Across the 2 study phases, Aspiranet served 1641 TAY with consistent population demographics (median age 18, IQR 18-19 years; female: 903/1641, 55.03%; race and ethnicity: Hispanic or Latino: 621/1641, 37.84%; Black: 470/1641, 28.64%; White: 397/1641, 24.19%; other: 153/1641, 9.32%). Between the pre- and post-WCM periods, there was an increase in full-time school enrollment (359/531, 67.6% to 833/1110, 75.04%) and a reduction in part-time school enrollment (61/531, 11.5% to 91/1110, 8.2%). The median number of days spent in the foster care program remained the same (247, IQR 125-468 days); however, the number of incidents reported monthly per hundred youth showed a steady decline, even with an exponentially increasing number of enrolled youth and incidents. CONCLUSIONS: The SCMS for coordinating care and delivering tailored services to TAY streamlined Aspiranet's workflows and processes and positively impacted youth outcomes. Further enhancements are needed to better align with user and youth needs.

5.
J Am Med Inform Assoc ; 28(4): 850-855, 2021 03 18.
Article in English | MEDLINE | ID: mdl-33517402

ABSTRACT

The rapidly evolving science about the Coronavirus Disease 2019 (COVID-19) pandemic created unprecedented health information needs and dramatic changes in policies globally. We describe a platform, Watson Assistant (WA), which has been used to develop conversational agents to deliver COVID-19 related information. We characterized the diverse use cases and implementations during the early pandemic and measured adoption through the number of users, messages sent, and conversational turns (i.e., pairs of interactions between users and agents). Thirty-seven institutions in 9 countries deployed COVID-19 conversational agents with WA between March 30 and August 10, 2020, including 24 governmental agencies, 7 employers, 5 provider organizations, and 1 health plan. Over 6.8 million messages were delivered through the platform. The mean number of conversational turns per session ranged between 1.9 and 3.5. Our experience demonstrates that conversational technologies can be rapidly deployed for pandemic response and are adopted globally by a wide range of users.
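Conversational turns, as defined above, are user-agent interaction pairs, so mean turns per session can be derived from a raw message log. A toy sketch with hypothetical log data (real platforms log richer events):

```python
from collections import defaultdict

def mean_turns_per_session(messages):
    """messages: list of (session_id, sender) events in time order.
    A turn is one user message followed by one agent reply."""
    per_session = defaultdict(int)
    pending_user = {}
    for sid, sender in messages:
        if sender == "user":
            pending_user[sid] = True
        elif sender == "agent" and pending_user.pop(sid, False):
            per_session[sid] += 1
    counts = per_session.values()
    return sum(counts) / len(counts)

# Hypothetical two-session log: session s1 has 2 turns, s2 has 1
log = [("s1", "user"), ("s1", "agent"), ("s1", "user"), ("s1", "agent"),
       ("s2", "user"), ("s2", "agent")]
```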


Subject(s)
Artificial Intelligence, COVID-19, Communication, Health Education/methods, Consumer Health Informatics, Humans, Natural Language Processing, Telemedicine
6.
BMC Med Res Methodol ; 10: 70, 2010 Aug 02.
Article in English | MEDLINE | ID: mdl-20678228

ABSTRACT

BACKGROUND: In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires the informed consent of the patient and approval of the Institutional Review Board to use data for research purposes, but these requirements can be waived if data is de-identified. For clinical data to be considered de-identified, the HIPAA "Safe Harbor" technique requires 18 data elements (called PHI: Protected Health Information) to be removed. The de-identification of narrative text documents is often performed manually and requires significant resources. Well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here. METHODS: This review focuses on recently published research (after 1995), and includes relevant publications from bibliographic queries in PubMed, conference proceedings, the ACM Digital Library, and publications referenced in already-included papers. RESULTS: The literature search returned more than 200 publications. The majority focused only on structured data de-identification instead of narrative text, focused on image de-identification, or described manual de-identification, and were therefore excluded. Finally, 18 publications describing automated text de-identification were selected for detailed analysis of the architecture and methods used, the types of PHI detected and removed, the external resources used, and the types of clinical documents targeted. All text de-identification systems aimed to identify and remove person names, and many included other types of PHI. Most systems targeted only one or two specific clinical document types, and were mostly based on two different groups of methodologies: pattern matching and machine learning.
Many systems combined both approaches for different types of PHI, but the majority relied only on pattern matching, rules, and dictionaries. CONCLUSIONS: In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize. Methods based on machine learning tend to perform better, especially with PHI that is not mentioned in the dictionaries used. Finally, the issues of anonymization, sufficient performance, and "over-scrubbing" are also discussed.
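The pattern-matching family of methods discussed in this review can be illustrated with a few regular expressions plus a name dictionary. This is a deliberately toy sketch; the patterns, dictionary, and replacement labels are assumptions, and real systems use far richer rules:

```python
import re

# Toy name dictionary (real systems use large census/name lists)
NAME_DICT = {"smith", "johnson"}

# Toy regex patterns for a few HIPAA PHI types
PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
    (re.compile(r"\b\(?\d{3}\)?[ -]\d{3}-\d{4}\b"), "[PHONE]"),
]

def deidentify(text):
    for pattern, label in PATTERNS:
        text = pattern.sub(label, text)
    # Dictionary lookup for person names, token by token
    tokens = [
        "[NAME]" if t.lower().strip(".,") in NAME_DICT else t
        for t in text.split(" ")
    ]
    return " ".join(tokens)

note = "Mr. Smith seen on 3/14/2019, SSN 123-45-6789, call 801-555-1234."
```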


Subject(s)
Confidentiality, Electronic Health Records, Software, Algorithms, Confidentiality/legislation & jurisprudence, Health Insurance Portability and Accountability Act, Humans, United States
7.
Stud Health Technol Inform ; 160(Pt 2): 944-8, 2010.
Article in English | MEDLINE | ID: mdl-20841823

ABSTRACT

An important proportion of the information about the medications a patient is taking is mentioned only in narrative text in the electronic health record. Automated information extraction can make this information accessible for decision support, research, or any other automated processing. In the context of the "i2b2 medication extraction challenge," we have developed a new NLP application called Textractor to automatically extract medications and details about them (e.g., dosage, frequency, reason for their prescription). This application and its evaluation with part of the reference standard for this "challenge" are presented here, along with an analysis of the development of this reference standard. During this evaluation, Textractor reached a system-level overall F1-measure, the reference metric for this challenge, of about 77% for exact matches. The best performance was measured with medication routes (F1-measure 86.4%), and the worst with prescription reasons (F1-measure 29%). These results are consistent with the agreement observed between human annotators when developing the reference standard, and with other published research.
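Medication-detail extraction of the kind described above can be sketched with a single regex capturing drug, dosage, and frequency. The pattern and drug list are illustrative assumptions, not Textractor's actual approach:

```python
import re

# Toy pattern: known drug name, dose in mg, optional frequency phrase
MED_PATTERN = re.compile(
    r"(?P<drug>lisinopril|metformin|aspirin)\s+"
    r"(?P<dose>\d+\s?mg)"
    r"(?:\s+(?P<freq>once daily|twice daily|bid|qd))?",
    re.IGNORECASE,
)

def extract_medications(text):
    return [
        {"drug": m["drug"], "dose": m["dose"], "freq": m["freq"]}
        for m in MED_PATTERN.finditer(text)
    ]

note = "Started lisinopril 10 mg once daily; continue metformin 500 mg bid."
meds = extract_medications(note)
```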


Subject(s)
Drug Prescriptions, Information Storage and Retrieval/methods, Electronic Health Records/standards, Humans, Natural Language Processing, Controlled Vocabulary
8.
JAMIA Open ; 3(3): 332-337, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33215067

ABSTRACT

OBJECTIVES: Describe an augmented intelligence approach to facilitate the update of evidence for associations in knowledge graphs. METHODS: New publications are filtered through multiple machine learning study classifiers, and filtered publications are combined with articles already included as evidence in the knowledge graph. The corpus is then subjected to named entity recognition, semantic dictionary mapping, term vector space modeling, pairwise similarity, and focal entity match to identify highly related publications. Subject matter experts review recommended articles to assess inclusion in the knowledge graph; discrepancies are resolved by consensus. RESULTS: Study classifiers achieved F-scores from 0.88 to 0.94, and similarity thresholds for each study type were determined by experimentation. Our approach reduces human literature review load by 99%, and over the past 12 months, 41% of recommendations were accepted to update the knowledge graph. CONCLUSION: Integrated search and recommendation exploiting current evidence in a knowledge graph is useful for reducing human cognition load.
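The term vector space modeling and pairwise similarity steps above can be sketched with bag-of-words vectors and cosine similarity. The toy documents below are made up, and the real pipeline applies named entity recognition and dictionary mapping before vectorizing:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def vectorize(text):
    return Counter(text.lower().split())

# Article already included as evidence in the knowledge graph
evidence = vectorize("statin therapy reduces cardiovascular risk")
# New candidate publications to rank for reviewer recommendation
candidates = {
    "paper_a": vectorize("statin therapy and cardiovascular outcomes"),
    "paper_b": vectorize("annotation tools for clinical text"),
}
ranked = sorted(candidates, key=lambda k: cosine(evidence, candidates[k]),
                reverse=True)
```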

9.
BMC Bioinformatics ; 10 Suppl 9: S12, 2009 Sep 17.
Article in English | MEDLINE | ID: mdl-19761566

ABSTRACT

BACKGROUND: Natural Language Processing (NLP) systems can be used for specific Information Extraction (IE) tasks such as extracting phenotypic data from the electronic medical record (EMR). These data are useful for translational research and are often found only in free-text clinical notes. A key required step for IE is the manual annotation of clinical corpora and the creation of a reference standard for (1) training and validation tasks and (2) focusing and clarifying NLP system requirements. These tasks are time-consuming, expensive, and require considerable effort on the part of human reviewers. METHODS: Using a set of clinical documents from the VA EMR for a particular use case of interest, we identify specific challenges and present several opportunities for annotation tasks. We demonstrate specific methods using an open source annotation tool, a customized annotation schema, and a corpus of clinical documents for patients known to have a diagnosis of Inflammatory Bowel Disease (IBD). We report clinician annotator agreement at the document, concept, and concept attribute levels. We estimate concept yield in terms of annotated concepts within specific note sections and document types. RESULTS: Annotator agreement at the document level for documents that contained concepts of interest for IBD, using an estimated Kappa statistic (95% CI), was very high at 0.87 (0.82, 0.93). At the concept level, F-measure ranged from 0.61 to 0.83. However, agreement varied greatly at the specific concept attribute level. For this particular use case (IBD), clinical documents producing the highest concept yield per document included GI clinic notes and primary care notes. Within the various types of notes, the highest concept yield was in sections representing patient assessment and history of presenting illness. Ancillary service documents and family history and plan note sections produced the lowest concept yield.
CONCLUSION: Challenges include defining and building appropriate annotation schemas, adequately training clinician annotators, and determining the appropriate level of information to be annotated. Opportunities include narrowing the focus of information extraction to use case specific note types and sections, especially in cases where NLP systems will be used to extract information from large repositories of electronic clinical note documents.
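Document-level agreement above is reported as a kappa statistic, which corrects raw agreement for agreement expected by chance. A minimal Cohen's kappa for two annotators over binary document-level judgments (the label sequences below are hypothetical):

```python
def cohens_kappa(a, b):
    """a, b: equal-length lists of labels from two annotators."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    # Chance agreement from each annotator's marginal label frequencies
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical binary judgments (1 = document contains a concept of interest)
ann1 = [1, 1, 1, 0, 1, 0, 1, 0, 1, 1]
ann2 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
kappa = cohens_kappa(ann1, ann2)
```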


Subject(s)
Computational Biology/methods, Inflammatory Bowel Diseases/diagnosis, Phenotype, Humans, Controlled Vocabulary
10.
J Biomed Semantics ; 10(1): 6, 2019 04 11.
Article in English | MEDLINE | ID: mdl-30975223

ABSTRACT

BACKGROUND: Social risk factors are important dimensions of health and are linked to access to care, quality of life, health outcomes, and life expectancy. However, in the Electronic Health Record, data related to many social risk factors are primarily recorded in free-text clinical notes, rather than as more readily computable structured data, and hence cannot currently be easily incorporated into automated assessments of health. In this paper, we present Moonstone, a new, highly configurable rule-based clinical natural language processing system designed to automatically extract information that requires inferencing from clinical notes. Our initial use case for the tool is the automatic extraction of social risk factor information (in this case, housing situation, living alone, and social support) from clinical notes. Nursing notes, social work notes, emergency room physician notes, primary care notes, hospital admission notes, and discharge summaries, all derived from the Veterans Health Administration, were used for algorithm development and evaluation. RESULTS: An evaluation of Moonstone demonstrated that the system is highly accurate in extracting and classifying the three variables of interest (housing situation, living alone, and social support). The system achieved positive predictive value (i.e., precision) scores ranging from 0.66 (homeless/marginally housed) to 0.98 (lives at home/not homeless), accuracy scores ranging from 0.63 (lives in facility) to 0.95 (lives alone), and sensitivity (i.e., recall) scores ranging from 0.75 (lives in facility) to 0.97 (lives alone). CONCLUSIONS: The Moonstone system is, to the best of our knowledge, the first freely available, open source natural language processing system designed to extract social risk factors from clinical text, with good (lives in facility) to excellent (lives alone) performance.
Although developed with the social risk factor identification task in mind, Moonstone provides a powerful tool to address a range of clinical natural language processing tasks, especially those tasks that require nuanced linguistic processing in conjunction with inference capabilities.
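Rule-based extraction of the three variables above (housing situation, living alone, social support) can be caricatured with keyword rules. Moonstone's actual rules perform nuanced linguistic processing and inference; the cue phrases and class labels below are assumptions:

```python
# Toy cue-phrase rules mapping each variable to candidate classes
RULES = {
    "housing": {
        "homeless/marginally housed": ["homeless", "staying in a shelter"],
        "lives at home": ["lives at home", "resides at home"],
    },
    "living_alone": {
        "lives alone": ["lives alone", "lives by himself", "lives by herself"],
    },
    "social_support": {
        "has support": ["supportive family", "daughter checks on"],
    },
}

def classify(note):
    note = note.lower()
    found = {}
    for variable, classes in RULES.items():
        for label, cues in classes.items():
            if any(cue in note for cue in cues):
                found[variable] = label
    return found

note = "Patient lives alone; daughter checks on him daily."
result = classify(note)
```

A real system also needs negation and context handling ("not homeless" must not fire the homeless rule), which is exactly the inferencing Moonstone is built for.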


Subject(s)
Natural Language Processing, Social Environment, Health, Humans, Risk Factors
11.
Stud Health Technol Inform ; 264: 1660-1661, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31438280

ABSTRACT

The Department of Defense (DoD) and Department of Veterans Affairs (VA) Infrastructure for Clinical Intelligence (DaVINCI) creates an electronic network between the two United States federal agencies that provides a consolidated view of electronic medical record data for both service members and Veterans. This inter-agency collaboration has created new opportunities for supporting transitions in clinical care, reporting to Congress, and longitudinal research.


Subject(s)
United States Department of Veterans Affairs, Veterans, Factual Databases, Electronic Health Records, Government Agencies, Humans, Intelligence, United States
12.
J Biomed Semantics ; 7: 26, 2016.
Article in English | MEDLINE | ID: mdl-27175226

ABSTRACT

BACKGROUND: In the United States, 795,000 people suffer strokes each year; 10-15% of these strokes can be attributed to stenosis caused by plaque in the carotid artery, a major stroke phenotype risk factor. Studies comparing treatments for the management of asymptomatic carotid stenosis are challenging for at least two reasons: (1) administrative billing codes (i.e., Current Procedural Terminology (CPT) codes) that identify carotid images do not denote which neurovascular arteries are affected, and (2) the majority of the image reports are negative for carotid stenosis. Studies that rely on manual chart abstraction can be labor-intensive, expensive, and time-consuming. Natural Language Processing (NLP) can expedite manual chart abstraction by automatically filtering reports with no/insignificant carotid stenosis findings and flagging reports with significant findings, thus potentially reducing effort, costs, and time. METHODS: In this pilot study, we conducted an information content analysis of carotid stenosis mentions in terms of their report location (sections), report formats (structures), and linguistic descriptions (expressions) in Veterans Health Administration free-text reports. We assessed the ability of an NLP algorithm, pyConText, to discern reports with significant carotid stenosis findings from reports with no/insignificant findings given these three document composition factors for two report types: radiology (RAD) reports and text integration utility (TIU) notes. RESULTS: We observed that most carotid mentions are recorded in prose using categorical expressions, within the Findings and Impression sections for RAD reports and within neither of these designated sections for TIU notes. For RAD reports, pyConText performed with high sensitivity (88%), specificity (84%), and negative predictive value (95%), and reasonable positive predictive value (70%).
For TIU notes, pyConText performed with high specificity (87%) and negative predictive value (92%), reasonable sensitivity (73%), and moderate positive predictive value (58%). pyConText achieved the highest sensitivity when processing the full report rather than the Findings or Impression sections independently. CONCLUSION: We conclude that pyConText can reduce chart review effort by filtering reports with no/insignificant carotid stenosis findings and flagging reports with significant findings in the Veterans Health Administration electronic health record, and hence has utility for expediting a comparative effectiveness study of treatment strategies for stroke prevention.
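The report-filtering step described above can be caricatured as section-aware keyword matching: flag a report as significant only when a severity expression appears in or after the Findings/Impression section. The section pattern and cue lists below are illustrative assumptions, not pyConText's actual lexicon, which also handles negation and uncertainty contexts:

```python
import re

SECTIONS = re.compile(r"^(FINDINGS|IMPRESSION):", re.MULTILINE)
SIGNIFICANT = ["severe stenosis", "high-grade stenosis", "70%", "80%", "90%"]
NEGATED = ["no significant stenosis", "without stenosis"]

def flag_report(report):
    # Keep only text from Findings/Impression onward (toy sectioning)
    match = SECTIONS.search(report)
    scope = report[match.start():].lower() if match else report.lower()
    if any(cue in scope for cue in NEGATED):
        return "no/insignificant"
    if any(cue in scope for cue in SIGNIFICANT):
        return "significant"
    return "no/insignificant"

rad = "HISTORY: stroke workup.\nIMPRESSION: severe stenosis of the right ICA."
```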


Subject(s)
Data Mining, Government Agencies, Natural Language Processing, Phenotype, Stroke, Veterans, Algorithms, Carotid Stenosis/complications, Electronic Health Records, Humans, Risk Factors, Stroke/complications
13.
J Biomed Semantics ; 7: 43, 2016 Jul 01.
Article in English | MEDLINE | ID: mdl-27370271

ABSTRACT

BACKGROUND: The ShARe/CLEF eHealth challenge lab aims to stimulate development of natural language processing and information retrieval technologies to aid patients in understanding their clinical reports. In clinical text, acronyms and abbreviations, also referred to as short forms, can be difficult for patients to understand. For one of three shared tasks in 2013 (Task 2), we generated a reference standard of clinical short forms normalized to the Unified Medical Language System. This reference standard can be used to improve patient understanding by linking annotated short forms to web sources with lay descriptions or by substituting short forms with simpler lay terms. METHODS: In this study, we (1) evaluate the accuracy of participating systems in normalizing short forms compared to a majority sense baseline approach, (2) evaluate the performance of participants' systems for short forms with variable majority sense distributions, and (3) report the accuracy of participating systems in normalizing the concepts shared between the test set and the Consumer Health Vocabulary, a vocabulary of lay medical terms. RESULTS: The best systems submitted by the five participating teams performed with accuracies ranging from 43 to 72%. A majority sense baseline approach achieved the second best performance. The performance of participating systems for normalizing short forms with two or more senses ranged from 52 to 78% accuracy for low ambiguity (majority sense greater than 80%), from 23 to 57% accuracy for moderate ambiguity (majority sense between 50 and 80%), and from 2 to 45% accuracy for high ambiguity (majority sense less than 50%). With respect to the ShARe test set, 69% of short form annotations contained concept unique identifiers in common with the Consumer Health Vocabulary. For these 2594 possible annotations, the performance of participating systems ranged from 50 to 75% accuracy.
CONCLUSION: Short form normalization continues to be a challenging problem. Short form normalization systems perform with moderate to reasonable accuracies. The Consumer Health Vocabulary could enrich its knowledge base with missed concept unique identifiers from the ShARe test set to further support patient understanding of unfamiliar medical terms.
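The majority sense baseline mentioned above normalizes every occurrence of a short form to its most frequent sense in the training data. A minimal sketch; the short forms and UMLS-style concept identifiers below are illustrative, not taken from the ShARe corpus:

```python
from collections import Counter, defaultdict

def train_majority_baseline(examples):
    """examples: (short_form, concept_id) pairs from annotated training text.
    Returns a lookup mapping each short form to its majority sense."""
    counts = defaultdict(Counter)
    for short_form, cui in examples:
        counts[short_form][cui] += 1
    return {sf: c.most_common(1)[0][0] for sf, c in counts.items()}

# Hypothetical annotated training pairs (sense IDs are placeholders)
train = [("pt", "C0030705"), ("pt", "C0030705"), ("pt", "C0949766"),
         ("ra", "C0003873"), ("ra", "C0003873")]
baseline = train_majority_baseline(train)
```

At prediction time, every "pt" is mapped to the majority sense regardless of context, which is why such a baseline is strong for low-ambiguity short forms and weak for high-ambiguity ones.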


Subject(s)
Biological Ontologies, Natural Language Processing, Telemedicine, Humans
14.
J Am Med Inform Assoc ; 22(1): 143-54, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25147248

ABSTRACT

OBJECTIVE: The ShARe/CLEF eHealth 2013 Evaluation Lab Task 1 was organized to evaluate the state of the art on clinical text for (i) disorder mention identification/recognition based on the Unified Medical Language System (UMLS) definition (Task 1a) and (ii) disorder mention normalization to an ontology (Task 1b). Such a community evaluation had not been previously executed. Task 1a included a total of 22 system submissions, and Task 1b included 17. Most of the systems employed a combination of rules and machine learners. MATERIALS AND METHODS: We used a subset of the Shared Annotated Resources (ShARe) corpus of annotated clinical text: 199 clinical notes for training and 99 for testing (roughly 180,000 words in total). We provided the community with the annotated gold standard training documents to build systems to identify and normalize disorder mentions. The systems were tested on a held-out gold standard test set to measure their performance. RESULTS: For Task 1a, the best-performing system achieved an F1 score of 0.75 (0.80 precision; 0.71 recall). For Task 1b, another system performed best with an accuracy of 0.59. DISCUSSION: Most of the participating systems used a hybrid approach, supplementing machine-learning algorithms with features generated by rules and gazetteers created from the training data and from external resources. CONCLUSIONS: The task of disorder normalization is more challenging than that of identification. The ShARe corpus is available to the community as a reference standard for future studies.
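The Task 1a numbers above are internally consistent: an F1 of 0.75 is the harmonic mean of precision 0.80 and recall 0.71, which can be checked directly:

```python
# F1 is the harmonic mean of precision and recall
def f1(precision, recall):
    return 2 * precision * recall / (precision + recall)

score = f1(0.80, 0.71)  # ~0.752, which rounds to the reported 0.75
```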


Subject(s)
Disease, Electronic Health Records, Natural Language Processing, Controlled Vocabulary, Biological Ontologies, Datasets as Topic, Humans, Information Storage and Retrieval/methods, Systematized Nomenclature of Medicine, Unified Medical Language System
16.
Stud Health Technol Inform ; 192: 1213, 2013.
Article in English | MEDLINE | ID: mdl-23920987

ABSTRACT

Human annotation and chart review are important processes in biomedical informatics research, but which humans are best suited to the job? Domain expertise, such as medical or linguistic knowledge, is desirable, but other factors may be equally important. The University of Utah has a group of more than 20 reviewers with backgrounds in medicine and linguistics, and 10 key traits have surfaced in those best able to annotate quickly and with high quality. To identify reviewers with these key traits, we created a hiring process that includes interviewing candidates, testing their medical and linguistic knowledge, and having them complete an annotation exercise on realistic medical text. Each step is designed to assess the key traits and allow the investigator to choose the skill set required for each project.


Subject(s)
Data Curation/methods , Electronic Health Records , Job Description , Meaningful Use/organization & administration , Medical Informatics , Personnel Selection/methods , Utah , Workforce
17.
Article in English | MEDLINE | ID: mdl-24303260

ABSTRACT

Clinical text de-identification can inadvertently redact clinical information, such as medical problems or treatments, when protected health information (PHI) annotations overlap with it, causing this information to be lost. In this study, we analyzed the overlap between the 2010 i2b2 NLP challenge concept annotations and the PHI annotations of our best-of-breed clinical text de-identification application. Overall, 0.81% of the annotations overlapped exactly, and 1.78% overlapped partially.
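Exact versus partial overlap between two sets of character-offset annotations reduces to a simple interval check; a minimal sketch (the spans below are invented examples, not data from the study):

```python
def overlap(a: tuple, b: tuple) -> str:
    """Classify overlap between two (start, end) character spans."""
    if a == b:
        return "exact"
    if a[0] < b[1] and b[0] < a[1]:  # the intervals intersect
        return "partial"
    return "none"

concept = (10, 25)  # hypothetical i2b2 concept annotation span
phi = (10, 25)      # hypothetical PHI annotation span
print(overlap(concept, phi))        # → exact
print(overlap((10, 25), (20, 30)))  # → partial
print(overlap((10, 25), (30, 40)))  # → none
```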

18.
J Am Med Inform Assoc ; 20(1): 77-83, 2013 Jan 01.
Article in English | MEDLINE | ID: mdl-22947391

ABSTRACT

OBJECTIVE: De-identification allows faster and more collaborative clinical research while protecting patient confidentiality. Clinical narrative de-identification is a tedious process that can be alleviated by automated natural language processing methods. The goal of this research is the development of an automated text de-identification system for Veterans Health Administration (VHA) clinical documents. MATERIALS AND METHODS: We devised a novel stepwise hybrid approach designed to improve the current strategies used for text de-identification. The proposed system is based on a previous study on the best de-identification methods for VHA documents. This best-of-breed automated clinical text de-identification system (aka BoB) tackles the problem as two separate tasks: (1) maximize patient confidentiality by redacting as much protected health information (PHI) as possible; and (2) leave de-identified documents in a usable state preserving as much clinical information as possible. RESULTS: We evaluated BoB with a manually annotated corpus of a variety of VHA clinical notes, as well as with the 2006 i2b2 de-identification challenge corpus. We present evaluations at the instance- and token-level, with detailed results for BoB's main components. Moreover, an existing text de-identification system was also included in our evaluation. DISCUSSION: BoB's design efficiently takes advantage of the methods implemented in its pipeline, resulting in high sensitivity values (especially for sensitive PHI categories) and a limited number of false positives. CONCLUSIONS: Our system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact.
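A stepwise hybrid design of this kind typically applies high-precision rules to structured PHI first, then a statistical tagger to the remainder. A highly simplified sketch of the rule-based step (the patterns and category tags are illustrative assumptions, not BoB's actual implementation):

```python
import re

# High-precision patterns for structured PHI (illustrative subset)
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def redact_structured_phi(text: str) -> str:
    """Replace structured PHI with category tags, leaving clinical content intact."""
    for label, pattern in PHI_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

note = "Seen on 03/14/2012; call 801-555-1234 re: diabetes follow-up."
print(redact_structured_phi(note))
# → Seen on [DATE]; call [PHONE] re: diabetes follow-up.
```

Replacing PHI with category tags rather than deleting it is one way to keep de-identified documents usable, since readers can still tell what kind of information was removed.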


Subject(s)
Confidentiality , Electronic Health Records , Information Dissemination , Natural Language Processing , Artificial Intelligence , Data Mining , Humans , Biomedical Technology Assessment , United States , United States Department of Veterans Affairs
19.
Article in English | MEDLINE | ID: mdl-24303238

ABSTRACT

Patients report their symptoms and subjective experiences in their own words. These expressions may be clinically meaningful yet are difficult to capture using automated methods. We annotated subjective symptom expressions in 750 clinical notes from the Veterans Affairs EHR. Within each document, subjective symptom expressions were compared to mentions of symptoms in clinical terms and to the assigned ICD-9-CM codes for the encounter. A total of 543 subjective symptom expressions were identified, of which 66.5% were categorized as mental/behavioral experiences and 33.5% somatic experiences. Only two subjective expressions were coded using ICD-9-CM. Subjective expressions were restated in semantically related clinical terms in 246 (45.3%) instances. Nearly one third (31%) of subjective expressions were not coded or restated in standard terminology. The results highlight the diversity of symptom descriptions and the opportunities to further develop natural language processing to extract symptom expressions that are unobtainable by other automated methods.

20.
J Am Med Inform Assoc ; 19(5): 786-91, 2012.
Article in English | MEDLINE | ID: mdl-22366294

ABSTRACT

BACKGROUND: The fifth i2b2/VA Workshop on Natural Language Processing Challenges for Clinical Records conducted a systematic review on resolution of noun phrase coreference in medical records. Informatics for Integrating Biology and the Bedside (i2b2) and the Veterans Affairs (VA) Consortium for Healthcare Informatics Research (CHIR) partnered to organize the coreference challenge. They provided the research community with two corpora of medical records for the development and evaluation of coreference resolution systems. These corpora contained various record types (ie, discharge summaries, pathology reports) from multiple institutions. METHODS: The coreference challenge provided the community with two annotated ground truth corpora and evaluated systems on coreference resolution in two ways: first, it evaluated systems for their ability to identify mentions of concepts and to link together those mentions. Second, it evaluated the ability of the systems to link together ground truth mentions that refer to the same entity. Twenty teams representing 29 organizations and nine countries participated in the coreference challenge. RESULTS: The teams' system submissions showed that machine-learning and rule-based approaches worked best when augmented with external knowledge sources and coreference clues extracted from document structure. The systems performed better in coreference resolution when provided with ground truth mentions. Overall, the systems struggled in solving coreference resolution for cases that required domain knowledge.
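When systems are given ground-truth mentions, coreference evaluation reduces to comparing which mention pairs each clustering links together. A minimal pairwise-link sketch (the cluster assignments below are invented examples, not challenge data):

```python
from itertools import combinations

def link_pairs(clusters):
    """All unordered mention pairs linked within each coreference cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(frozenset(p) for p in combinations(sorted(cluster), 2))
    return pairs

gold = [{"m1", "m2", "m3"}, {"m4"}]    # e.g. "the mass", "it", "the lesion" corefer
pred = [{"m1", "m2"}, {"m3"}, {"m4"}]  # the system missed one link

g, p = link_pairs(gold), link_pairs(pred)
precision = len(g & p) / len(p)
recall = len(g & p) / len(g)
print(precision, recall)  # → 1.0 0.3333333333333333
```

Pairwise linking is only one of several coreference metrics (MUC, B-cubed, and CEAF weight clusters differently), but it illustrates why systems score higher when ground-truth mentions are supplied: mention-detection errors no longer propagate into the linking step.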


Subject(s)
Electronic Health Records , Information Storage and Retrieval/methods , Natural Language Processing , Humans