Search | Secretaria de Estado da Saúde (State Department of Health)

TAX-Corpus: Taxonomy based Annotations for Colonoscopy Evaluation.

Syed, Shorabuddin; Angel, Adam Jackson; Syeda, Hafsa Bareen; Jennings, Carole Franc; VanScoy, Joseph; Syed, Mahanazuddin; Greer, Melody; Bhattacharyya, Sudeepa; Al-Shukri, Shaymaa; Zozus, Meredith; Prior, Fred; Tharian, Benjamin.

Biomed Eng Syst Technol Int Jt Conf BIOSTEC Revis Sel Pap ; 2022: 162-169, 2022 Feb.

Article in English | MEDLINE | ID: mdl-35300321

ABSTRACT

Colonoscopy plays a critical role in screening for colorectal carcinoma (CC). Unfortunately, the data related to this procedure are stored in disparate documents: colonoscopy, pathology, and radiology reports. The lack of integrated, standardized documentation impedes accurate reporting of quality metrics and hinders clinical and translational research. Natural language processing (NLP) has been used as an alternative to manual data abstraction. The performance of machine learning (ML) based NLP solutions depends heavily on the accuracy of annotated corpora. The availability of large annotated corpora is limited by data privacy laws and by the cost and effort required to build them. In addition, manual annotation is error-prone, which makes the lack of high-quality annotated corpora the largest bottleneck in deploying ML solutions. The objective of this study is to identify clinical entities critical to colonoscopy quality and to build a high-quality annotated corpus using domain-specific taxonomies and standardized annotation guidelines. The annotated corpus can be used to train ML models for a variety of downstream tasks.

DeIDNER Corpus: Annotation of Clinical Discharge Summary Notes for Named Entity Recognition Using BRAT Tool.

Syed, Mahanazuddin; Al-Shukri, Shaymaa; Syed, Shorabuddin; Sexton, Kevin; Greer, Melody L; Zozus, Meredith; Bhattacharyya, Sudeepa; Prior, Fred.

Stud Health Technol Inform ; 281: 432-436, 2021 May 27.

Article in English | MEDLINE | ID: mdl-34042780

ABSTRACT

Named Entity Recognition (NER), which aims to identify entities and classify them into predefined categories, is a critical pre-processing step in the Natural Language Processing (NLP) pipeline. Readily available off-the-shelf NER algorithms and programs are trained on general corpora and often need to be retrained when applied to a different domain. The end model's performance depends on the quality of the named entities that these NER models generate for the NLP task. To improve NER accuracy, researchers build domain-specific corpora for both model training and evaluation. In the clinical domain, however, training data are scarce for privacy reasons, forcing many studies to generate the NER feature set with models trained on non-clinical text. This, in turn, affects the performance of downstream NLP tasks such as information extraction and de-identification. In this paper, our objective is to create a high-quality annotated clinical corpus for training NER models that generalize easily and can generate the named-entity feature set for a downstream de-identification task.

Subjects

Names, Patient Discharge, Algorithms, Humans, Information Storage and Retrieval, Natural Language Processing
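The DeIDNER corpus was annotated with the BRAT tool, which saves annotations in its standoff format: a `.ann` file stored next to the raw text, in which each text-bound entity line reads `ID<TAB>Type start end<TAB>covered text`. The Python below is a minimal sketch, not the authors' pipeline, that reads those entity lines and checks that every stored span actually matches the source note, a cheap guard against one kind of manual annotation error. The function names, the `Entity` class, and any entity labels used with it (e.g. `PATIENT`, `DATE`) are hypothetical illustrations, not the corpus's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:
    ann_id: str                    # e.g. "T1"
    label: str                     # entity type (labels here are hypothetical)
    spans: List[Tuple[int, int]]   # character offsets, end exclusive
    text: str                      # covered text as stored in the .ann file


def parse_brat_entities(ann_text: str) -> List[Entity]:
    """Parse the text-bound (T) lines of a BRAT standoff .ann file."""
    entities = []
    for line in ann_text.splitlines():
        # Only text-bound annotations start with "T"; relations (R),
        # events (E), attributes (A/M), notes (#) and blank lines are skipped.
        if not line.startswith("T"):
            continue
        ann_id, type_and_offsets, text = line.split("\t", 2)
        label, offsets = type_and_offsets.split(" ", 1)
        spans = []
        # Discontinuous entities list several "start end" fragments joined by ";".
        for fragment in offsets.split(";"):
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        entities.append(Entity(ann_id, label, spans, text))
    return entities


def find_offset_mismatches(note: str, entities: List[Entity]) -> List[str]:
    """Return IDs of entities whose offsets do not reproduce their stored text."""
    bad = []
    for ent in entities:
        # BRAT stores the text of discontinuous fragments joined by a space.
        covered = " ".join(note[start:end] for start, end in ent.spans)
        if covered != ent.text:
            bad.append(ent.ann_id)
    return bad
```

For example, with the note `"John Smith was discharged on 05/27/2021."` and the lines `T1<TAB>PATIENT 0 10<TAB>John Smith` and `T2<TAB>DATE 29 39<TAB>05/27/2021`, both entities parse and no mismatches are reported; shifting `T2` to `28 38` would flag it. Checking offsets against the text, rather than trusting the stored strings, catches spans that drifted after the note was edited or that were selected one character off.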