Search | BVS Regional Portal

Chemical identification and indexing in full-text articles: an overview of the NLM-Chem track at BioCreative VII.

Leaman, Robert; Islamaj, Rezarta; Adams, Virginia; Alliheedi, Mohammed A; Almeida, João Rafael; Antunes, Rui; Bevan, Robert; Chang, Yung-Chun; Erdengasileng, Arslan; Hodgskiss, Matthew; Ida, Ryuki; Kim, Hyunjae; Li, Keqiao; Mercer, Robert E; Mertová, Lukrécia; Mobasher, Ghadeer; Shin, Hoo-Chang; Sung, Mujeen; Tsujimura, Tomoki; Yeh, Wen-Chao; Lu, Zhiyong.

Database (Oxford) ; 2023, 2023 03 07.

Article in English | MEDLINE | ID: mdl-36882099

ABSTRACT

The BioCreative National Library of Medicine (NLM)-Chem track calls for a community effort to fine-tune automated recognition of chemical names in the biomedical literature. Chemicals are one of the most searched biomedical entities in PubMed, and, as highlighted during the coronavirus disease 2019 pandemic, their identification may significantly advance research in multiple biomedical subfields. While previous community challenges focused on identifying chemical names mentioned in titles and abstracts, the full text contains valuable additional detail. We, therefore, organized the BioCreative NLM-Chem track as a community effort to address automated chemical entity recognition in full-text articles. The track consisted of two tasks: (i) chemical identification and (ii) chemical indexing. The chemical identification task required predicting all chemicals mentioned in recently published full-text articles, both span [i.e. named entity recognition (NER)] and normalization (i.e. entity linking), using Medical Subject Headings (MeSH). The chemical indexing task required identifying which chemicals reflect topics for each article and should therefore appear in the listing of MeSH terms for the document in the MEDLINE article indexing. This manuscript summarizes the BioCreative NLM-Chem track and post-challenge experiments. We received a total of 85 submissions from 17 teams worldwide. The highest performance achieved for the chemical identification task was 0.8672 F-score (0.8759 precision and 0.8587 recall) for strict NER performance and 0.8136 F-score (0.8621 precision and 0.7702 recall) for strict normalization performance. The highest performance achieved for the chemical indexing task was 0.6073 F-score (0.7417 precision and 0.5141 recall).
This community challenge demonstrated that (i) the current substantial achievements in deep learning technologies can be utilized to improve automated prediction accuracy further and (ii) the chemical indexing task is substantially more challenging. We look forward to further developing biomedical text-mining methods to respond to the rapid growth of biomedical literature. The NLM-Chem track dataset and other challenge materials are publicly available at https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/BC7-NLM-Chem-track/.
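The reported scores can be cross-checked with a short calculation. The sketch below assumes that "F-score" in the abstract means the balanced F1 score, i.e. the harmonic mean of precision and recall; the abstract does not state the formula, and the function name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) triples taken from the abstract above.
reported = {
    "strict NER": (0.8759, 0.8587, 0.8672),
    "strict normalization": (0.8621, 0.7702, 0.8136),
    "chemical indexing": (0.7417, 0.5141, 0.6073),
}

for task, (p, r, f_reported) in reported.items():
    f_computed = f_score(p, r)
    # Allow for rounding of the published values to four decimals.
    agrees = abs(f_computed - f_reported) < 5e-5
    print(f"{task}: computed {f_computed:.4f}, reported {f_reported:.4f}, agrees={agrees}")
```

Under this assumption, all three reported F-scores agree with the stated precision and recall to within rounding.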

Subjects

COVID-19 , United States , Humans , National Library of Medicine (U.S.) , Data Mining , Databases, Factual , MEDLINE

BERN2: an advanced neural biomedical named entity recognition and normalization tool.

Sung, Mujeen; Jeong, Minbyul; Choi, Yonghwa; Kim, Donghyeon; Lee, Jinhyuk; Kang, Jaewoo.

Bioinformatics ; 38(20): 4837-4839, 2022 10 14.

Article in English | MEDLINE | ID: mdl-36053172

ABSTRACT

In biomedical natural language processing, named entity recognition (NER) and named entity normalization (NEN) are key tasks that enable the automatic extraction of biomedical entities (e.g. diseases and drugs) from the ever-growing biomedical literature. In this article, we present BERN2 (Advanced Biomedical Entity Recognition and Normalization), a tool that improves the previous neural network-based NER tool by employing a multi-task NER model and neural network-based NEN models to achieve much faster and more accurate inference. We hope that our tool can help annotate large-scale biomedical texts for various tasks such as biomedical knowledge graph construction. AVAILABILITY AND IMPLEMENTATION: Web service of BERN2 is publicly available at http://bern2.korea.ac.kr. We also provide local installation of BERN2 at https://github.com/dmis-lab/BERN2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Neural Networks, Computer , Software , Natural Language Processing

Full-text chemical identification with improved generalizability and tagging consistency.

Kim, Hyunjae; Sung, Mujeen; Yoon, Wonjin; Park, Sungjoon; Kang, Jaewoo.

Database (Oxford) ; 2022, 2022 09 28.

Article in English | MEDLINE | ID: mdl-36170114

ABSTRACT

Chemical identification involves finding chemical entities in text (i.e. named entity recognition) and assigning unique identifiers to the entities (i.e. named entity normalization). While current models are developed and evaluated based on article titles and abstracts, their effectiveness has not been thoroughly verified in full text. In this paper, we identify two limitations of models in tagging full-text articles: (1) low generalizability to unseen mentions and (2) tagging inconsistency. To address these limitations, we use simple training and post-processing methods such as transfer learning and mention-wise majority voting. We also present a hybrid model for the normalization task that utilizes the high recall of a neural model while maintaining the high precision of a dictionary model. In the BioCreative VII NLM-Chem track challenge, our best model achieves 86.72 and 78.31 F1 scores in named entity recognition and normalization, significantly outperforming the median (83.73 and 77.49 F1 scores) and taking first place in named entity recognition. In a post-challenge evaluation, we re-implement our model and obtain an 84.70 F1 score in the normalization task, outperforming the best score in the challenge by 3.34 F1 points. Database URL: https://github.com/dmis-lab/bc7-chem-id.
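The idea behind mention-wise majority voting can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote`, the label strings, the case-insensitive grouping of mentions and the tie-breaking rule are all assumptions made for illustration. Each distinct mention string in a document receives the label it was assigned most often, which removes inconsistent tags for the same string.

```python
from collections import Counter, defaultdict


def majority_vote(mentions):
    """Relabel each mention with the most frequent label of its surface form.

    `mentions` is a list of (surface_text, predicted_label) pairs from one
    document. Grouping is case-insensitive (an assumption of this sketch).
    Ties go to the label seen first for that surface form.
    """
    votes = defaultdict(Counter)
    for text, label in mentions:
        votes[text.lower()][label] += 1

    winners = {
        key: counts.most_common(1)[0][0] for key, counts in votes.items()
    }
    return [(text, winners[text.lower()]) for text, _ in mentions]


# Hypothetical tagger output for one document: "aspirin" is tagged
# inconsistently, once as a chemical-free token ("O").
predictions = [
    ("aspirin", "Chemical"),
    ("aspirin", "O"),
    ("Aspirin", "Chemical"),
    ("water", "O"),
]
print(majority_vote(predictions))
```

In this example the second "aspirin" is relabeled as "Chemical" because two of its three occurrences carry that label, while "water" keeps its single label.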

Subjects

Data Mining , Data Mining/methods , Databases, Factual

Pandemics are catalysts of scientific novelty: Evidence from COVID-19.

Liu, Meijun; Bu, Yi; Chen, Chongyan; Xu, Jian; Li, Daifeng; Leng, Yan; Freeman, Richard B; Meyer, Eric T; Yoon, Wonjin; Sung, Mujeen; Jeong, Minbyul; Lee, Jinhyuk; Kang, Jaewoo; Min, Chao; Song, Min; Zhai, Yujia; Ding, Ying.

J Assoc Inf Sci Technol ; 73(8): 1065-1078, 2022 Aug.

Article in English | MEDLINE | ID: mdl-35441082

ABSTRACT

Scientific novelty drives the efforts to invent new vaccines and solutions during the pandemic. First-time collaboration and international collaboration are two pivotal channels to expand teams' search activities for a broader scope of resources required to address the global challenge, which might facilitate the generation of novel ideas. Our analysis of 98,981 coronavirus papers suggests that scientific novelty, measured by a BioBERT model pretrained on 29 million PubMed articles, and first-time collaboration both increased after the outbreak of COVID-19, whereas international collaboration decreased suddenly. During COVID-19, papers with more first-time collaboration were found to be more novel, and international collaboration did not hamper novelty as it had done in normal periods. The findings suggest the necessity of reaching out for distant resources and the importance of maintaining a collaborative scientific community beyond nationalism during a pandemic.
