Search | Portal Regional da BVS

Natural Language Processing for Enterprise-scale De-identification of Protected Health Information in Clinical Notes.

Abu-El-Rub, Noor; Urbain, Jay; Kowalski, George; Osinski, Kristen; Spaniol, Robert; Liu, Mei; Taylor, Bradley; Waitman, Lemuel R.

AMIA Jt Summits Transl Sci Proc ; 2022: 92-101, 2022.

Article in English | MEDLINE | ID: mdl-35854742

ABSTRACT

Patient privacy is a major concern when allowing data sharing and the flow of health information. Hence, de-identification and anonymization techniques are used to protect patient health information while supporting secondary uses of data to advance the healthcare system and improve patient outcomes. Several de-identification tools have been developed for free text; this research, however, focuses on developing a notes de-identification and adjudication framework that has been tested for i2b2 searches. The aim is to facilitate clinical notes research without an additional HIPAA approval process or consent by a clinician or patient, especially for narrative free-text notes such as physician and nursing notes. In this paper, we build a scalable, accurate, and maintainable pipeline for notes de-identification that uses natural language processing, with a REDCap database as the method of adjudication verification. The system is deployed at enterprise scale, where researchers can search and visualize over 45 million de-identified notes hosted in an i2b2 instance.
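The abstract above does not describe the pipeline's internals, so the following is only a toy illustration of what replacing protected health information in a note can look like. It is a rule-based sketch, not the authors' method: the pattern set, the `PHI_PATTERNS` name, the `deidentify` function, and the sample note are all hypothetical, and a real system would cover far more PHI categories and would typically combine rules with a trained model and human adjudication.

```python
import re

# Hypothetical patterns for illustration only (assumed, not from the paper).
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE),
}


def deidentify(note):
    """Replace each matched span with a bracketed category placeholder."""
    for label, pattern in PHI_PATTERNS.items():
        note = pattern.sub("[" + label + "]", note)
    return note


# Made-up sample note; the identifiers are fictional.
example = "Seen on 03/14/2021, MRN: 123456. Call 414-555-0100."
print(deidentify(example))
```

Placeholders that keep the category label, rather than deleting the span outright, leave the note readable and make it easier for a reviewer to spot misses during adjudication.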

Mining heart disease risk factors in clinical text with named entity recognition and distributional semantic models.

Urbain, Jay.

J Biomed Inform ; 58 Suppl: S143-S149, 2015 Dec.

Article in English | MEDLINE | ID: mdl-26305514

ABSTRACT

We present the design and analyze the performance of a multi-stage natural language processing system employing named entity recognition, Bayesian statistics, and rule logic to identify and characterize heart disease risk factor events in diabetic patients over time. The system was originally developed for the 2014 i2b2 Challenges in Natural Language in Clinical Data. The system's strengths included a high level of accuracy for identifying named entities associated with heart disease risk factor events. The system's primary weakness was inaccuracy when characterizing the attributes of some events: for example, determining the relative time of an event with respect to the record date, deciding whether an event is attributable to the patient's history or the patient's family history, and differentiating between current and prior smoking status. We believe these inaccuracies were due in large part to the lack of an effective approach for integrating context into our event detection model. To address these inaccuracies, we explore the addition of a distributional semantic model for characterizing contextual evidence of heart disease risk factor events. Using this semantic model, we raise our initial 2014 i2b2 Challenges in Natural Language in Clinical Data F1 score from 0.838 to 0.890 and increase precision by 10.3% without using any lexicons that might bias our results.

Subjects

Cardiovascular Diseases/epidemiology; Data Mining/methods; Diabetes Complications/epidemiology; Electronic Health Records/organization & administration; Narration; Natural Language Processing; Aged; Cardiovascular Diseases/diagnosis; Cohort Studies; Comorbidity; Computer Security; Confidentiality; Diabetes Complications/diagnosis; Female; Humans; Incidence; Longitudinal Studies; Male; Middle Aged; Models, Statistical; Pattern Recognition, Automated/methods; Risk Assessment/methods; Semantics; Vocabulary, Controlled; Wisconsin/epidemiology

Passage relevance models for genomics search.

Urbain, Jay; Frieder, Ophir; Goharian, Nazli.

BMC Bioinformatics ; 10 Suppl 3: S3, 2009 Mar 19.

Article in English | MEDLINE | ID: mdl-19344479

ABSTRACT

We present a passage relevance model for integrating syntactic and semantic evidence of biomedical concepts and topics using a probabilistic graphical model. Component models of topics, concepts, terms, and documents are represented as potential functions within a Markov Random Field. The probability of a passage being relevant to a biologist's information need is represented as the joint distribution across all potential functions. Relevance model feedback of top-ranked passages is used to improve distributional estimates of query concepts and topics in context, and a dimensional indexing strategy is used for efficient aggregation of concept and term statistics. By integrating multiple sources of evidence, including dependencies between topics, concepts, and terms, we seek to improve genomics literature passage retrieval precision. Using this model, we demonstrate statistically significant improvements in retrieval precision on a large genomics literature corpus.

Subjects

Genomics/methods; Information Storage and Retrieval/methods; Models, Statistical; Algorithms; Markov Chains; Natural Language Processing
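The abstract of this record describes relevance as a joint distribution over potential functions in a Markov Random Field. As a rough illustration of that idea, the sketch below ranks passages by an unnormalized log-score, the weighted sum of log-potentials, which is the usual log-linear way to handle a product of potentials. The potential names, weights, and values are assumptions made up for the example; the paper's actual potentials and parameter estimates are not given in the abstract.

```python
import math


def passage_log_score(potentials, weights):
    """Unnormalized log-score: weighted sum of log-potentials.

    Equivalent to the log of a product of potentials raised to their weights.
    Every potential value must be strictly positive.
    """
    return sum(weights[name] * math.log(value) for name, value in potentials.items())


def rank_passages(passages, weights):
    """Return passage ids ordered from highest to lowest log-score."""
    return sorted(
        passages,
        key=lambda pid: passage_log_score(passages[pid], weights),
        reverse=True,
    )


# Hypothetical component weights and potential values (assumed, not from the paper).
weights = {"topic": 1.0, "concept": 1.0, "term": 1.0, "document": 1.0}
passages = {
    "p1": {"topic": 0.2, "concept": 0.5, "term": 0.4, "document": 0.3},
    "p2": {"topic": 0.6, "concept": 0.5, "term": 0.4, "document": 0.3},
}
print(rank_passages(passages, weights))
```

Because the normalizing constant is the same for every passage under a fixed query, it can be dropped when the scores are used only for ranking.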

A dimensional retrieval model for integrating semantics and statistical evidence in context for genomics literature search.

Urbain, Jay; Goharian, Nazli; Frieder, Ophir.

Comput Biol Med ; 39(1): 61-8, 2009 Jan.

Article in English | MEDLINE | ID: mdl-19147128

ABSTRACT

We present a dimensional information retrieval model for combining concept-based semantics and term statistics within multiple levels of document context to identify concise, variable-length passages of text that answer a user query. Our results demonstrate improved search results in the presence of varying levels of semantic evidence, and higher performance using retrieval functions that combine document-level information with sentence- and passage-level information. Experimental results are promising. When ranking documents based on the most relevant extracted passages, the results exceed the state of the art by 15.28%, as assessed on the TREC 2005 Genomics track collection of 4.5 million MEDLINE citations.

Subjects

Databases, Genetic; Genomics; Information Storage and Retrieval; Models, Statistical; Abstracting and Indexing
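The abstract of this record mentions retrieval functions that combine document-, passage-, and sentence-level information and rank documents by their most relevant extracted passage. One simple way to picture this is a linear interpolation of the three scores followed by a per-document maximum, sketched below. The interpolation form, the weights, and the sample scores are assumptions for illustration; the abstract does not state the actual combination function.

```python
def combined_score(doc, passage, sentence, w_doc=0.2, w_passage=0.5, w_sentence=0.3):
    """Linear interpolation of three context-level scores (weights are assumed)."""
    return w_doc * doc + w_passage * passage + w_sentence * sentence


def rank_documents(candidates):
    """Rank document ids by the best combined score among their passages.

    Each candidate is (doc_id, doc_score, passage_score, sentence_score).
    """
    best = {}
    for doc_id, doc_s, passage_s, sentence_s in candidates:
        score = combined_score(doc_s, passage_s, sentence_s)
        best[doc_id] = max(best.get(doc_id, float("-inf")), score)
    return sorted(best, key=best.get, reverse=True)


# Made-up scores: docA matches well overall but has no strong passage.
candidates = [
    ("docA", 0.9, 0.2, 0.1),
    ("docB", 0.4, 0.8, 0.7),
    ("docB", 0.4, 0.3, 0.2),
]
print(rank_documents(candidates))
```

With these weights a document with one strong passage outranks a document that matches the query only at the whole-document level, which is the behavior passage-based ranking is meant to capture.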
