Results 1 - 20 of 45
1.
Med Care ; 57 Suppl 6 Suppl 2: S149-S156, 2019 06.
Article in English | MEDLINE | ID: mdl-31095054

ABSTRACT

BACKGROUND: Despite national screening efforts, military sexual trauma (MST) is underreported. Little is known of racial/ethnic differences in MST reporting in the Veterans Health Administration (VHA). OBJECTIVE: This study aimed to compare patterns of MST disclosure in VHA by race/ethnicity. RESEARCH DESIGN: Retrospective cohort study of MST disclosures in a national, random sample of Veterans who served in Afghanistan and Iraq and completed MST screens from October 2009 to 2014. We used natural language processing (NLP) to extract MST concepts from electronic medical notes in the year following Veterans' first MST screen. MEASURE(S): Any evidence of MST (positive MST screen or NLP concepts) and late MST disclosure (NLP concepts following a negative MST screen). Multivariable logistic regressions, stratified by sex, tested racial/ethnic differences in any MST evidence, and late disclosure. RESULTS: Of 6618 male and 6716 female Veterans with MST screen results, 1473 had a positive screen (68 male, 1%; 1405 female, 21%). Of those with a negative screen, 257 evidenced late MST disclosure by NLP (44 male, 39%; 213 female, 13%). Late MST disclosure was usually documented during mental health visits. There were no significant racial/ethnic differences in MST disclosure among men. Among women, blacks were less likely than whites to have any MST evidence (adjusted odds ratio=0.75). In the subsample with any MST evidence, black and Hispanic women were more likely than whites to disclose MST late (adjusted odds ratio=1.89 and 1.59, respectively). CONCLUSIONS: Combining NLP results with MST screen data facilitated the identification of under-reported sexual trauma experiences among men and racial/ethnic minority women.


Subject(s)
Disclosure/statistics & numerical data, Documentation, Natural Language Processing, Sex Offenses, Veterans/statistics & numerical data, Adult, Female, Humans, Male, Middle Aged, Retrospective Studies, Sex Offenses/ethnology, Sex Offenses/statistics & numerical data, United States, United States Department of Veterans Affairs
2.
J Biomed Inform ; 71S: S68-S76, 2017 07.
Article in English | MEDLINE | ID: mdl-27497780

ABSTRACT

RATIONALE: Templates in text notes pose challenges for automated information extraction algorithms. We propose a method that identifies novel templates in plain text medical notes. The identification can then be used to either include or exclude templates when processing notes for information extraction. METHODS: The two-module method is based on the framework of information foraging and addresses the hypothesis that documents containing templates and the templates within those documents can be identified by common features. The first module takes documents from the corpus and groups those with common templates. This is accomplished through a binned word count hierarchical clustering algorithm. The second module extracts the templates. It uses the groupings and performs a longest common subsequence (LCS) algorithm to obtain the constituent parts of the templates. The method was developed and tested on a random document corpus of 750 notes derived from a large database of US Department of Veterans Affairs (VA) electronic medical notes. RESULTS: The grouping module, using hierarchical clustering, identified 23 groups with 3 documents or more, consisting of 120 documents from the 750 documents in our test corpus. Of these, 18 groups had at least one common template that was present in all documents in the group for a positive predictive value of 78%. The LCS extraction module performed with 100% positive predictive value, 94% sensitivity, and 83% negative predictive value. The human review determined that in 4 groups the template covered the entire document, with the remaining 14 groups containing a common section template. Among documents with templates, the number of templates per document ranged from 1 to 14. The mean and median number of templates per group was 5.9 and 5, respectively. DISCUSSION: The grouping method was successful in finding like documents containing templates. 
Of the groups of documents containing templates, the LCS module was successful in deciphering text belonging to the template and text that was extraneous. Major obstacles to improved performance included documents composed of multiple templates, templates that included other templates embedded within them, and variants of templates. We demonstrate proof of concept of the grouping and extraction method of identifying templates in electronic medical records in this pilot study and propose methods to improve performance and scaling up.
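
The extraction step described above can be illustrated with a minimal sketch. It assumes whitespace tokenization and folds a pairwise longest common subsequence (LCS) across the documents in a group; the function names `lcs` and `group_template` and the sample notes are hypothetical and are not taken from the paper. Folding pairwise LCS yields a subsequence common to all documents, which is not guaranteed to be the longest one.

```python
from functools import reduce


def lcs(a, b):
    """Longest common subsequence of two token lists (dynamic programming)."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[i:] and b[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one LCS.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def group_template(docs):
    """Tokens shared, in order, by every document in a group (hypothetical helper)."""
    return reduce(lcs, (d.split() for d in docs))


# Hypothetical notes that share a small template.
notes = [
    "Pain level: 3 Location: knee",
    "Pain level: 7 Location: lower back",
    "Pain level: 0 Location: none",
]
template_tokens = group_template(notes)
```

In this sketch the tokens that survive the fold are candidate template text, and the tokens that drop out are the free-text values filled in by the author of each note.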


Subject(s)
Algorithms, Electronic Health Records, Heuristics, Natural Language Processing, Humans, Pilot Projects
3.
J Biomed Inform ; 71S: S39-S45, 2017 07.
Article in English | MEDLINE | ID: mdl-27404849

ABSTRACT

OBJECTIVE: To develop a natural language processing pipeline to extract positively asserted concepts related to the presence of an indwelling urinary catheter in hospitalized patients from the free text of the electronic medical note. The goal is to assist infection preventionists and other healthcare professionals in determining whether a patient has an indwelling urinary catheter when a catheter-associated urinary tract infection is suspected. Currently, data on indwelling urinary catheters is not consistently captured in the electronic medical record in structured format and thus cannot be reliably extracted for clinical and research purposes. MATERIALS AND METHODS: We developed a lexicon of terms related to indwelling urinary catheters and urinary symptoms based on domain knowledge, prior experience in the field, and review of medical notes. A reference standard of 1595 randomly selected documents from inpatient admissions was annotated by human reviewers to identify all positively and negatively asserted concepts related to indwelling urinary catheters. We trained a natural language processing pipeline based on the V3NLP framework using 1050 documents and tested on 545 documents to determine agreement with the human reference standard. Metrics reported are positive predictive value and recall. RESULTS: The lexicon contained 590 terms related to the presence of an indwelling urinary catheter in various categories including insertion, care, change, and removal of urinary catheters and 67 terms for urinary symptoms. Nursing notes were the most frequent inpatient note titles in the reference standard document corpus; these also yielded the highest number of positively asserted concepts with respect to urinary catheters. 
Comparing the performance of the natural language processing pipeline against the human reference standard, the overall recall was 75% and positive predictive value was 99% on the training set; on the testing set, the recall was 72% and positive predictive value was 98%. The performance on extracting urinary symptoms (including fever) was high with recall and precision greater than 90%. CONCLUSIONS: We have shown that it is possible to identify the presence of an indwelling urinary catheter and urinary symptoms from the free text of electronic medical notes from inpatients using natural language processing. These are two key steps in developing automated protocols to assist humans in large-scale review of patient charts for catheter-associated urinary tract infection. The challenges associated with extracting indwelling urinary catheter-related concepts also inform the design of electronic medical record templates to reliably and consistently capture data on indwelling urinary catheters.
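
The two metrics reported above, positive predictive value (PPV, also called precision) and recall, can be computed by comparing pipeline output against a reference standard. The sketch below assumes each concept mention is represented as a hashable tuple; the function name and the sample annotations are hypothetical and do not come from the study.

```python
def ppv_and_recall(predicted, reference):
    """Return (PPV, recall) for predicted mentions against a reference standard."""
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)   # mentions found by both
    fp = len(predicted - reference)   # pipeline-only mentions
    fn = len(reference - predicted)   # mentions the pipeline missed
    ppv = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return ppv, recall


# Hypothetical annotations: (document id, character offset, concept text).
reference = {
    ("doc1", 10, "foley"),
    ("doc1", 55, "catheter removed"),
    ("doc2", 3, "foley"),
    ("doc3", 8, "indwelling catheter"),
}
predicted = {
    ("doc1", 10, "foley"),
    ("doc1", 55, "catheter removed"),
    ("doc2", 3, "foley"),
}
ppv, recall = ppv_and_recall(predicted, reference)
```

A pipeline that rarely asserts a concept incorrectly but misses some mentions shows the pattern reported in the abstract: high PPV with lower recall.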


Subject(s)
Electronic Health Records, Natural Language Processing, Urinary Catheters, Urinary Tract Infections, Data Mining, Humans
4.
J Med Syst ; 41(2): 32, 2017 Feb.
Article in English | MEDLINE | ID: mdl-28050745

ABSTRACT

In an ideal clinical Natural Language Processing (NLP) ecosystem, researchers and developers would be able to collaborate with others, undertake validation of NLP systems, components, and related resources, and disseminate them. We captured requirements and formative evaluation data from the Veterans Affairs (VA) Clinical NLP Ecosystem stakeholders using semi-structured interviews and meeting discussions. We developed a coding rubric to code interviews. We assessed inter-coder reliability using percent agreement and the kappa statistic. We undertook 15 interviews and held two workshop discussions. The main areas of requirements related to design and functionality, resources, and information. Stakeholders also confirmed the vision of the second generation of the Ecosystem, and recommendations included adding mechanisms to better understand terms, measuring collaboration to demonstrate value, and datasets/tools to navigate spelling errors with consumer language, among others. Stakeholders also recommended the capability to communicate with developers working on the next version of the VA electronic health record (VistA Evolution), to provide a mechanism to automatically monitor downloads of tools, and to automatically provide a summary of the downloads to Ecosystem contributors and funders. After three rounds of coding and discussion, we determined the percent agreement of two coders to be 97.2% and the kappa to be 0.7851. The vision of the VA Clinical NLP Ecosystem met stakeholder needs. Interviews and discussion provided key requirements that inform the design of the VA Clinical NLP Ecosystem.
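
The abstract reports both percent agreement and kappa because kappa discounts the agreement two coders would reach by chance. A minimal sketch of both statistics for two coders follows; it assumes the standard Cohen's kappa for two raters, and the code labels are hypothetical rather than drawn from the study's rubric.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items to which both coders assigned the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two coders."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each coder's marginal code frequencies.
    p_e = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / n ** 2
    if p_e == 1:
        return 1.0  # both coders used a single identical code throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical codes assigned by two coders to ten interview segments.
coder_1 = ["yes"] * 5 + ["no"] * 5
coder_2 = ["yes"] * 4 + ["no"] * 5 + ["yes"]
agreement = percent_agreement(coder_1, coder_2)
kappa = cohens_kappa(coder_1, coder_2)
```

In this toy data the coders agree on 8 of 10 segments, but half of that agreement is expected by chance, so kappa is noticeably lower than the raw agreement.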


Subject(s)
Electronic Health Records/organization & administration, Natural Language Processing, United States Department of Veterans Affairs/organization & administration, Communication, Cooperative Behavior, Electronic Health Records/standards, Humans, Interviews as Topic, Reproducibility of Results, Terminology as Topic, United States
5.
J Biomed Inform ; 58: 19-27, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26362345

ABSTRACT

OBJECTIVE: To develop a method to exploit the UMLS Metathesaurus for extracting and categorizing concepts found in clinical text representing signs and symptoms to anatomically related organ systems. The overarching goal is to classify patient reported symptoms to organ systems for population health and epidemiological analyses. MATERIALS AND METHODS: Using the concepts' semantic types and the inter-concept relationships as guidance, a selective portion of the concepts within the UMLS Metathesaurus was traversed starting from the concepts representing the highest level organ systems. The traversed concepts were chosen, filtered, and reviewed to obtain the concepts representing clinical signs and symptoms by blocking deviations, pruning superfluous concepts, and manual review. The mapping process was applied to signs and symptoms annotated in a corpus of 750 clinical notes. RESULTS: The mapping process yielded a total of 91,000 UMLS concepts (with approximately 300,000 descriptions) possibly representing physical and mental signs and symptoms that were extracted and categorized to the anatomically related organ systems. Of 1864 distinct descriptions of signs and symptoms found in the 750 document corpus, 1635 of these (88%) were successfully mapped to the set of concepts extracted from the UMLS. Of 668 unique concepts mapped, 603 (90%) were correctly categorized to their organ systems. CONCLUSION: We present a process that facilitates mapping of signs and symptoms to their organ systems. By providing a smaller set of UMLS concepts to use for comparing and matching patient records, this method has the potential to increase efficiency of information extraction pipelines.
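
The traversal described above can be pictured as a breadth-first walk from an organ-system concept that refuses to enter blocked branches. The sketch below is only an illustration under that assumption: the toy graph, the concept names, and the `collect_descendants` helper are hypothetical and do not reflect the actual UMLS relationships or the authors' filtering rules.

```python
from collections import deque


def collect_descendants(children, root, blocked=frozenset()):
    """Breadth-first traversal from root, skipping blocked concepts."""
    seen = {root}
    queue = deque([root])
    while queue:
        concept = queue.popleft()
        for child in children.get(concept, ()):
            if child in blocked or child in seen:
                continue  # prune deviations and avoid revisiting concepts
            seen.add(child)
            queue.append(child)
    return seen - {root}


# Hypothetical concept graph: parent -> narrower concepts.
toy_graph = {
    "respiratory system": ["cough", "dyspnea", "lung neoplasm"],
    "cough": ["productive cough"],
    "lung neoplasm": ["small cell carcinoma"],
}
symptoms = collect_descendants(
    toy_graph, "respiratory system", blocked={"lung neoplasm"}
)
```

Blocking a concept also excludes everything reachable only through it, which is how a single pruning decision can remove a large superfluous branch.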


Subject(s)
Anatomy, Concept Formation, Unified Medical Language System, Humans
6.
Disabil Rehabil ; : 1-10, 2023 Sep 13.
Article in English | MEDLINE | ID: mdl-37702040

ABSTRACT

PURPOSE OF THE ARTICLE: This article describes a conceptual and methodological approach to integrating functional information into an ontology to categorize mental functioning, which to date is an under-developed area of classification, and supports our work with the United States (U.S.) Social Security Administration (SSA). DESIGN AND METHODOLOGICAL PROCEDURES: Conceptualizing and defining mental functioning was paramount to develop natural language processing (NLP) tools to support our use case. The International Classification of Functioning, Disability, and Health (ICF) was the framework used to conceptualize mental functioning at the activities and participation level in clinical records. To address challenges that arose when applying the ICF as to what should or should not be classified as mental functioning, a mental functioning domain ontology was developed that rearranged, reclassified and incorporated all ICF key components, concepts, classifications, and their definitions. CONCLUSIONS: Challenges emerged in the extent to which we could directly align components in the ICF into an applied ontology of mental functioning. These conceptual challenges required rearrangement of ICF components to adequately support our use case within the social security disability determination process. Findings also have implications to support future NLP efforts for behavioral health outcomes and policy research.


Mental functioning in everyday life is an important area of inquiry from the perspectives of public health, health policy, healthcare, and overall individual level health and well-being. A domain ontology of mental functioning that defines concepts and their relationships, and provides a common terminology with definitions, would enable interdisciplinary communication, research, and collaboration. A clearer conceptual model of mental functioning can improve the development of software that can identify, codify, and organize mental functioning information within clinical records into data that can be analyzed. The International Classification of Functioning, Disability and Health was utilized to conceptualize mental functioning and to guide the development of a proposed domain ontology of mental functioning.

7.
Psychiatr Serv ; 74(1): 56-62, 2023 01 01.
Article in English | MEDLINE | ID: mdl-35652194

ABSTRACT

The disability determination process of the Social Security Administration's (SSA's) disability program requires assessing work-related functioning for individual claimants alleging disability due to mental impairment. This task is particularly challenging because the determination process involves the review of a large file of information, including objective medical evidence and self-reports from claimants, families, and former employers. To improve this decision-making process, SSA entered an interagency agreement with the Rehabilitation Medicine Department, Epidemiology and Biostatistics Section, in the Clinical Center of the National Institutes of Health, intending to use data science and informatics to develop decision support tools. This collaborative effort over the past decade has led to the development of the Work Disability-Functional Assessment Battery and has initiated an approach to applying natural language processing to the review of claimants' files for information on mental health functioning. This informatics research collaboration holds promise for improving the process of disability determination for individuals with mental impairments who make claims at the SSA.


Subject(s)
Persons with Disabilities, Mental Health, United States, Humans, United States Social Security Administration, Social Security, Disability Evaluation, Informatics
8.
Front Digit Health ; 4: 914171, 2022.
Article in English | MEDLINE | ID: mdl-36148210

ABSTRACT

This paper describes the identification of body function (BF) mentions within the clinical text within a large, national, heterogeneous corpus to highlight structural challenges presented by the clinical text. BF in clinical documents provides information on dysfunction or impairments in the function or structure of organ systems or organs. BF mentions are embedded in highly formatted structures where the formats include implied scoping boundaries that confound existing natural language processing segmentation and document decomposition techniques. This paper describes follow-up work to adapt a rule-based system created using National Institutes of Health records to a larger, more challenging corpus of Social Security Administration data. Results of these systems provide a baseline for future work to improve document decomposition techniques.

9.
J Am Med Inform Assoc ; 28(3): 516-532, 2021 03 01.
Article in English | MEDLINE | ID: mdl-33319905

ABSTRACT

OBJECTIVES: Normalizing mentions of medical concepts to standardized vocabularies is a fundamental component of clinical text analysis. Ambiguity (words or phrases that may refer to different concepts) has been extensively researched as part of information extraction from biomedical literature, but less is known about the types and frequency of ambiguity in clinical text. This study characterizes the distribution and distinct types of ambiguity exhibited by benchmark clinical concept normalization datasets, in order to identify directions for advancing medical concept normalization research. MATERIALS AND METHODS: We identified ambiguous strings in datasets derived from the 2 available clinical corpora for concept normalization and categorized the distinct types of ambiguity they exhibited. We then compared observed string ambiguity in the datasets with potential ambiguity in the Unified Medical Language System (UMLS) to assess how representative available datasets are of ambiguity in clinical language. RESULTS: We found that <15% of strings were ambiguous within the datasets, while over 50% were ambiguous in the UMLS, indicating only partial coverage of clinical ambiguity. The percentage of strings in common between any pair of datasets ranged from 2% to only 36%; of these, 40% were annotated with different sets of concepts, severely limiting generalization. Finally, we observed 12 distinct types of ambiguity, distributed unequally across the available datasets, reflecting diverse linguistic and medical phenomena. DISCUSSION: Existing datasets are not sufficient to cover the diversity of clinical concept ambiguity, limiting both training and evaluation of normalization methods for clinical text. Additionally, the UMLS offers important semantic information for building and evaluating normalization methods. 
CONCLUSIONS: Our findings identify 3 opportunities for concept normalization research, including a need for ambiguity-specific clinical datasets and leveraging the rich semantics of the UMLS in new methods and evaluation measures for normalization.


Subject(s)
Datasets as Topic, Electronic Health Records, Terminology as Topic, Unified Medical Language System, Deep Learning, Natural Language Processing, Semantics, Controlled Vocabulary
10.
Stud Health Technol Inform ; 264: 452-456, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437964

ABSTRACT

Misspellings in clinical free text present potential challenges to pharmacovigilance tasks, such as monitoring for potential ineffective treatment of drug-resistant infections. We developed a novel method using Word2Vec, Levenshtein edit distance constraints, and a customized lexicon to identify correct and misspelled pharmaceutical word forms. We processed a large corpus of clinical notes in a real-world pharmacovigilance task, achieving positive predictive values of 0.929 and 0.909 in identifying valid misspellings and correct spellings, respectively, and negative predictive values of 0.994 and 0.333 as assessments where the program did not produce output. In a specific Methicillin-Resistant Staphylococcus Aureus use case, the method identified 9,815 additional instances in the corpus for potential ineffective drug administration inspection. The findings suggest that this method could potentially achieve satisfactory results for other pharmacovigilance tasks.
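
One ingredient named above, the Levenshtein edit distance constraint, can be sketched on its own. The example below assumes a lowercase lexicon of correctly spelled drug names and a maximum distance of 2; the threshold, the lexicon, and the `misspelling_candidates` helper are hypothetical. The Word2Vec component of the published method is not reproduced here.

```python
def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def misspelling_candidates(token, lexicon, max_distance=2):
    """Lexicon entries within max_distance edits of an out-of-lexicon token."""
    token = token.lower()
    if token in lexicon:
        return []  # already a correct spelling
    return sorted(w for w in lexicon if levenshtein(token, w) <= max_distance)


# Hypothetical lexicon of correctly spelled drug names.
drug_lexicon = {"vancomycin", "daptomycin", "linezolid"}
candidates = misspelling_candidates("Vancomicin", drug_lexicon)
```

Used alone, an edit distance filter would also pair unrelated words that happen to be spelled similarly, which is presumably why the method combines it with distributional similarity and a lexicon.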


Subject(s)
Pharmaceutical Preparations, Pharmacovigilance, Algorithms, Language, Methicillin-Resistant Staphylococcus aureus
11.
BMC Res Notes ; 12(1): 42, 2019 Jan 18.
Article in English | MEDLINE | ID: mdl-30658682

ABSTRACT

OBJECTIVE: Misspellings in clinical free text present challenges to natural language processing. With an objective to identify misspellings and their corrections, we developed a prototype spelling analysis method that implements Word2Vec, Levenshtein edit distance constraints, a lexical resource, and corpus term frequencies. We used the prototype method to process two different corpora, surgical pathology reports, and emergency department progress and visit notes, extracted from Veterans Health Administration resources. We evaluated performance by measuring positive predictive value and performing an error analysis of false positive output, using four classifications. We also performed an analysis of spelling errors in each corpus, using common error classifications. RESULTS: In this small-scale study utilizing a total of 76,786 clinical notes, the prototype method achieved positive predictive values of 0.9057 and 0.8979, respectively, for the surgical pathology reports, and emergency department progress and visit notes, in identifying and correcting misspelled words. False positives varied by corpus. Spelling error types were similar between the two corpora; however, the authors of emergency department progress and visit notes made over four times as many errors. Overall, the results of this study suggest that this method could also perform sufficiently in identifying misspellings in other clinical document types.


Subject(s)
Dictionaries as Topic, Medical Informatics/methods, Natural Language Processing, Controlled Vocabulary, Algorithms, Humans, Language, Medical Informatics/standards, Medical Informatics/statistics & numerical data, Computerized Medical Records Systems/standards, Computerized Medical Records Systems/statistics & numerical data, Surgical Pathology/methods, Reproducibility of Results, Research Report/standards, Unified Medical Language System/standards, Unified Medical Language System/statistics & numerical data
12.
AMIA Annu Symp Proc ; 2019: 514-522, 2019.
Article in English | MEDLINE | ID: mdl-32308845

ABSTRACT

Background: Experiences of sexual trauma are associated with adverse patient and health system outcomes, but are not systematically documented in electronic health records (EHR). Objective: To describe variations in how sexual trauma is documented in the Veterans Health Administration's EHR. Methods: Sexual trauma concepts were extracted from 362,559 clinical notes using a natural language processing pipeline. Results: We observed variations in the presence of sexual trauma in notes across five United States regions: Pacific, Continental, Midwest, North Atlantic, Southeast. We also observed variations in the types of notes used to document sexual trauma (e.g., mental health, primary care) and sources of sexual trauma (e.g., adult, childhood, military) mentioned in the EHR. Our findings illustrate potential differences in cultural norms related to patient disclosure of sensitive information, and provider documentation. Standardized protocols for eliciting and documenting sexual trauma histories are needed to ensure Veteran access to high quality, trauma-informed care.


Subject(s)
Electronic Health Records, Natural Language Processing, Sex Offenses, Veterans, Adult, Child, Disclosure, Documentation, Female, Humans, Male, Mental Health Services, Military Personnel, Primary Health Care, United States, United States Department of Veterans Affairs
13.
J Am Med Inform Assoc ; 15(4): 496-505, 2008.
Article in English | MEDLINE | ID: mdl-18436906

ABSTRACT

OBJECTIVE: This study has two objectives: first, to identify and characterize consumer health terms not found in the Unified Medical Language System (UMLS) Metathesaurus (2007 AB); second, to describe the procedure for creating new concepts in the process of building a consumer health vocabulary. How do the unmapped consumer health concepts relate to the existing UMLS concepts? What is the place of these new concepts in professional medical discourse? DESIGN: The consumer health terms were extracted from two large corpora derived in the process of Open Access Collaboratory Consumer Health Vocabulary (OAC CHV) building. Terms that could not be mapped to existing UMLS concepts via machine and manual methods prompted creation of new concepts, which were then ascribed semantic types, related to existing UMLS concepts, and coded according to specified criteria. RESULTS: This approach identified 64 unmapped concepts, 17 of which were labeled as uniquely "lay" and not feasible for inclusion in professional health terminologies. The remaining terms constituted potential candidates for inclusion in professional vocabularies, or could be constructed by post-coordinating existing UMLS terms. The relationship between new and existing concepts differed depending on the corpora from which they were extracted. CONCLUSION: Non-mapping concepts constitute a small proportion of consumer health terms, but a proportion that is likely to affect the process of consumer health vocabulary building. We have identified a novel approach for identifying such concepts.


Subject(s)
Consumer Health Information/classification, Unified Medical Language System, Vocabulary, Humans, Terminology as Topic
14.
J Med Internet Res ; 9(1): e4, 2007 Feb 28.
Article in English | MEDLINE | ID: mdl-17478413

ABSTRACT

BACKGROUND: The development of consumer health information applications such as health education websites has motivated the research on consumer health vocabulary (CHV). Term identification is a critical task in vocabulary development. Because of the heterogeneity and ambiguity of consumer expressions, term identification for CHV is more challenging than for professional health vocabularies. OBJECTIVE: For the development of a CHV, we explored several term identification methods, including collaborative human review and automated term recognition methods. METHODS: A set of criteria was established to ensure consistency in the collaborative review, which analyzed 1893 strings. Using the results from the human review, we tested two automated methods: the C-value formula and a logistic regression model. RESULTS: The study identified 753 consumer terms and found the logistic regression model to be highly effective for CHV term identification (area under the receiver operating characteristic curve = 95.5%). CONCLUSIONS: The collaborative human review and logistic regression methods were effective for identifying terms for CHV development.


Subject(s)
Health Education/methods, Controlled Vocabulary, Automation/methods, Cooperative Behavior, Humans, Logistic Models, Theoretical Models, ROC Curve
15.
Stud Health Technol Inform ; 129(Pt 1): 545-9, 2007.
Article in English | MEDLINE | ID: mdl-17911776

ABSTRACT

The National Library of Medicine has developed a tool to identify medical concepts from the Unified Medical Language System in free text. This tool, MetaMap (and its Java version, MMTx), has been used extensively for biomedical text mining applications. We have developed a module for MetaMap which has a high performance in terms of processing speed. We evaluated our module independently against MetaMap for the task of identifying UMLS concepts in free text clinical radiology reports. A set of 1000 sentences from neuro-radiology reports was collected and processed using our technique and the MMTx program. An evaluation showed that our technique was able to identify 91% of the concepts found by MMTx in 14% of the time taken by MMTx. An error analysis showed that the missing concepts were largely those which were not direct lexical matches but inferential matches of multiple concepts. Our method also identified multi-phrase concepts which MMTx failed to identify. We suggest that this module be implemented as an option in MMTx for real-time text mining applications where single concepts found in the UMLS need to be identified.


Subject(s)
Information Storage and Retrieval/methods, Natural Language Processing, Unified Medical Language System, Humans, Computerized Medical Records Systems, Neurology, Hospital Radiology Department, Radiology Information Systems
16.
Stud Health Technol Inform ; 238: 136-139, 2017.
Article in English | MEDLINE | ID: mdl-28679906

ABSTRACT

We investigate options for grouping templates for the purpose of template identification and extraction from electronic medical records. We sampled a corpus of 1000 documents originating from the Veterans Health Administration (VA) electronic medical record. We grouped documents through hashing and binning tokens (Hashed) as well as by the top 5% of tokens identified as important through the term frequency inverse document frequency metric (TF-IDF). We then compared the approaches on the number of groups with 3 or more documents and the resulting longest common subsequences (LCSs) common to all documents in the group. We found that the Hashed method had a higher success rate for finding LCSs, and found longer LCSs, than the TF-IDF method. However, the TF-IDF approach found more groups than the Hashed method and consequently more long sequences, although the average length of its LCSs was lower. In conclusion, each algorithm appears to have areas where it is superior.
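
The TF-IDF grouping idea can be sketched as follows. This is an illustration only: it assumes whitespace tokenization, a plain `tf * log(N / df)` weighting, and that documents are grouped by an identical set of top-scoring tokens. The abstract does not state the exact weighting or grouping rule, so every detail of `tfidf_signatures` is an assumption, as are the sample documents.

```python
import math
from collections import Counter


def tfidf_signatures(docs, top_fraction=0.05):
    """Return, per document, the frozenset of its highest-scoring TF-IDF tokens."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    signatures = []
    for toks in tokenized:
        tf = Counter(toks)
        scores = {
            t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in tf.items()
        }
        k = max(1, math.ceil(top_fraction * len(scores)))
        # Sort by descending score, breaking ties alphabetically for determinism.
        top = sorted(scores, key=lambda t: (-scores[t], t))[:k]
        signatures.append(frozenset(top))
    return signatures


# Hypothetical documents: the first two share a template.
sample_docs = [
    "pain level location",
    "pain level location",
    "medication list allergies reviewed",
]
sigs = tfidf_signatures(sample_docs)
```

Documents with equal signatures would then be placed in the same group before a longest common subsequence step is run on each group.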


Subject(s)
Algorithms, Electronic Health Records, Natural Language Processing, United States, United States Department of Veterans Affairs, Veterans
17.
Stud Health Technol Inform ; 245: 356-360, 2017.
Article in English | MEDLINE | ID: mdl-29295115

ABSTRACT

There is a need for cataloging signs and symptoms, but not all are documented in structured data. The text from clinical records is an additional source of signs and symptoms. We describe a Natural Language Processing (NLP) technique to identify symptoms from text. Using a human-annotated reference corpus from VA electronic medical notes we trained and tested an NLP pipeline to identify and categorize symptoms. The technique includes a model created from an automatic machine learning model selection tool. Tested on a hold-out set, its precision at the mention level was 0.80, recall 0.74 and an overall f-score of 0.80. The tool was scaled-up to process a large corpus of 964,105 patient records.


Subject(s)
Data Mining, Machine Learning, Natural Language Processing, Electronic Health Records, Humans
18.
Stud Health Technol Inform ; 245: 351-355, 2017.
Article in English | MEDLINE | ID: mdl-29295114

ABSTRACT

Patient history of sexual trauma is of clinical relevance to healthcare providers as survivors face adverse health-related outcomes. This paper describes a method for identifying mentions of sexual trauma within the free text of electronic medical notes. A natural language processing pipeline for information extraction was developed and scaled to handle a large corpus of electronic medical notes used for this study from US Veterans Health Administration medical facilities. The tool was used to identify sexual trauma mentions and create snippets around every asserted mention based on a domain-specific lexicon developed for this purpose. All snippets were evaluated by trained human reviewers. An overall positive predictive value (PPV) of 0.90 for identifying sexual trauma mentions from the free text and a PPV of 0.71 at the patient level are reported. The metrics are superior for records from female patients.


Subject(s)
Electronic Health Records, Natural Language Processing, Female, Humans, Information Storage and Retrieval
19.
Stud Health Technol Inform ; 238: 128-131, 2017.
Article in English | MEDLINE | ID: mdl-28679904

ABSTRACT

Sexual trauma survivors are reluctant to disclose such a history due to stigma. This is likely the case when estimating the prevalence of sexual trauma experienced in the military. The Veterans Health Administration has a program by which all former US military service members (Veterans) are screened for military sexual trauma (MST) using a questionnaire. Administrative data on MST screens and a change of status from an initial negative answer to positive and natural language processing (NLP) on electronic medical notes to extract concepts related to MST were used to refine initial estimates of MST among a random sample of 20,000 Veterans. The initial MST positive screen of 15.4% among women was revised upward to 21.8% using administrative data and further to 24.5% by adding NLP results. The overall estimate of MST status in women and men in this sample was revised from 8.1% to 13.1% using both data elements.


Subject(s)
Electronic Health Records, Military Personnel, Sex Offenses, United States Department of Veterans Affairs, Veterans, Adult, Data Collection, Female, Humans, Male, Sexual Behavior, Post-Traumatic Stress Disorders/epidemiology, United States
20.
Stud Health Technol Inform ; 226: 33-6, 2016.
Article in English | MEDLINE | ID: mdl-27350459

ABSTRACT

Medical text contains boilerplated content, an artifact of pull-down forms from EMRs. Boilerplated content is the source of challenges for concept extraction on clinical text. This paper introduces PlateRunner, a search engine on boilerplates from the US Department of Veterans Affairs (VA) EMR. Boilerplates containing concepts should be identified and reviewed to recognize challenging formats, identify high yield document titles, and fine tune section zoning. The search engine can filter negated and asserted concepts and search within query results. It can also save queries, search results, and retrieved documents for later analysis.


Subject(s)
Electronic Health Records/organization & administration, Search Engine/methods, Humans, United States, United States Department of Veterans Affairs