DeIDNER Corpus: Annotation of Clinical Discharge Summary Notes for Named Entity Recognition Using BRAT Tool.

Syed, Mahanazuddin; Al-Shukri, Shaymaa; Syed, Shorabuddin; Sexton, Kevin; Greer, Melody L; Zozus, Meredith; Bhattacharyya, Sudeepa; Prior, Fred

Affiliation

Syed M; Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA.
Al-Shukri S; Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA.
Syed S; Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA.
Sexton K; Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA.
Greer ML; Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA.
Zozus M; Department of Population Health Sciences, University of Texas Health Science Center at San Antonio, San Antonio, TX, USA.
Bhattacharyya S; Department of Biological Sciences and Arkansas Biosciences Institute, Arkansas State University, Jonesboro, AR, USA.
Prior F; Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA.

Stud Health Technol Inform; 281: 432-436, 2021 May 27.

Article in English | MEDLINE | ID: mdl-34042780

ABSTRACT

Named Entity Recognition (NER), which aims to identify and classify entities into predefined categories, is a critical pre-processing task in the Natural Language Processing (NLP) pipeline. Readily available off-the-shelf NER algorithms or programs are trained on general corpora and often need to be retrained when applied to a different domain. Because the named entities these NER models generate feed into the downstream NLP task, the end model's performance depends on their quality. To improve NER model accuracy, researchers build domain-specific corpora for both model training and evaluation. However, in the clinical domain, privacy concerns have left a dearth of training data, forcing many studies to use NER models trained in non-clinical domains to generate the NER feature set. This, in turn, affects the performance of downstream NLP tasks such as information extraction and de-identification. In this paper, our objective is to create a high-quality annotated clinical corpus for training NER models that generalize easily and can be used in a downstream de-identification task to generate the named-entity feature set.
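The record does not describe how the BRAT annotations are turned into NER training data. The following is a minimal sketch, assuming BRAT's standard standoff `.ann` output (text-bound lines of the form `T1<TAB>LABEL start end<TAB>surface text`), contiguous spans only, and simple whitespace tokenization. The helper names `parse_brat_ann` and `to_bio`, the sample note, and the `PERSON`/`LOCATION` labels are hypothetical illustrations, not the DeIDNER label set or the authors' pipeline.

```python
import re
from dataclasses import dataclass


@dataclass
class Entity:
    """One text-bound entity from a BRAT standoff .ann file (hypothetical helper type)."""
    label: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


def parse_brat_ann(ann_text: str) -> list[Entity]:
    """Parse text-bound (T) annotations; relations, attributes, and notes are ignored."""
    entities = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip R (relation), A (attribute), # (note) lines, etc.
        _ann_id, type_and_offsets, surface = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        if ";" in offsets:
            continue  # discontinuous spans are not handled in this sketch
        start, end = (int(x) for x in offsets.split())
        entities.append(Entity(label, start, end, surface))
    return entities


def to_bio(text: str, entities: list[Entity]) -> list[tuple[str, str]]:
    """Whitespace-tokenize text and assign BIO tags by character-offset overlap."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for ent in entities:
            if tok_start < ent.end and tok_end > ent.start:  # token overlaps entity
                prefix = "B" if tok_start <= ent.start else "I"
                tag = f"{prefix}-{ent.label}"
                break
        tagged.append((match.group(), tag))
    return tagged


if __name__ == "__main__":
    # Hypothetical example note and annotations, for illustration only.
    note = "Pt John Smith seen at Little Rock clinic."
    ann = "T1\tPERSON 3 13\tJohn Smith\nT2\tLOCATION 22 33\tLittle Rock"
    for token, tag in to_bio(note, parse_brat_ann(ann)):
        print(token, tag)
```

Run on the sample note, this tags `John`/`Smith` as `B-PERSON`/`I-PERSON` and `Little`/`Rock` as `B-LOCATION`/`I-LOCATION`, with the remaining tokens as `O`. A real clinical pipeline would likely swap the whitespace tokenizer for a clinical-text tokenizer and decide explicitly how to treat discontinuous or overlapping spans.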

Subjects

Names; Patient Discharge; Algorithms; Humans; Information Storage and Retrieval; Natural Language Processing

Keywords

Annotation; Clinical Corpus; De-identification; Named Entity Recognition; Natural Language Processing

Full text: 1 | Collections: 01-internacional | Database: MEDLINE | Main subject: Patient Discharge / Names | Limits: Humans | Language: English | Publication year: 2021 | Document type: Article