Search | BVS Bolivia

Pipelined biomedical event extraction rivaling joint learning.

Wu, Pengchao; Li, Xuefeng; Gu, Jinghang; Qian, Longhua; Zhou, Guodong.

Methods ; 226: 9-18, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38604412

ABSTRACT

Biomedical event extraction is an information extraction task that obtains events from biomedical text; its targets include the event type, the trigger, and the arguments involved in the event. Traditional biomedical event extraction usually adopts a pipelined approach consisting of trigger identification, argument role recognition, and finally event construction, using either specific rules or machine learning. In this paper, we propose an n-ary relation extraction method based on the BERT pre-trained model to construct Binding events, in order to capture semantic information about an event's context and its participants. Experimental results show that our method achieves promising results on the GE11 and GE13 corpora of the BioNLP shared task, with F1 scores of 63.14% and 59.40%, respectively. This demonstrates that, by significantly improving performance on Binding events, the overall performance of the pipelined event extraction approach rivals or even exceeds that of current joint learning methods.
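Treating Binding event construction as n-ary relation extraction means each candidate pairs one trigger with a whole set of Theme arguments, rather than deciding each trigger-argument edge separately. The sketch below enumerates such candidates and marks them in the sentence for a BERT-style classifier. It is an illustration under assumptions, not the authors' code: the cap of two Themes (`max_themes`), the marker strings `[TRG]`/`[ARG]`, the helper names, and the toy sentence are all hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def binding_candidates(trigger_idx: int, protein_idxs: Sequence[int],
                       max_themes: int = 2) -> List[Tuple[int, ...]]:
    """Enumerate n-ary candidates (trigger, theme_1, ..., theme_n) for one Binding trigger.

    Every combination of 1..max_themes protein mentions becomes one candidate,
    so a classifier can accept or reject the whole argument set at once.
    """
    candidates = []
    for n in range(1, max_themes + 1):
        for themes in combinations(protein_idxs, n):
            candidates.append((trigger_idx, *themes))
    return candidates


def mark_candidate(tokens: Sequence[str], trigger_idx: int, theme_idxs: Set[int]) -> str:
    """Wrap the trigger and its candidate Themes in hypothetical marker tokens."""
    out = []
    for i, tok in enumerate(tokens):
        if i == trigger_idx:
            out.append(f"[TRG] {tok} [/TRG]")
        elif i in theme_idxs:
            out.append(f"[ARG] {tok} [/ARG]")
        else:
            out.append(tok)
    return " ".join(out)


# Toy sentence: one trigger ("binds") and two protein mentions.
tokens = ["TRAF2", "binds", "CD40"]
for cand in binding_candidates(1, [0, 2]):
    print(mark_candidate(tokens, cand[0], set(cand[1:])))
```

Each printed string would be one classifier input; the last one, with both proteins marked, corresponds to a two-Theme Binding event.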

Subject(s)

Data Mining; Machine Learning; Data Mining/methods; Humans; Semantics; Natural Language Processing; Algorithms

Joint learning-based causal relation extraction from biomedical literature.

Li, Dongling; Wu, Pengchao; Dong, Yuehu; Gu, Jinghang; Qian, Longhua; Zhou, Guodong.

J Biomed Inform ; 139: 104318, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36781035

ABSTRACT

Causal relation extraction of biomedical entities is one of the most complex tasks in biomedical text mining, as it involves two kinds of information: entity relations and entity functions. One feasible approach is to treat relation extraction and function detection as two independent sub-tasks. However, this separate learning method ignores the intrinsic correlation between them and leads to unsatisfactory performance. In this paper, we propose a joint learning model that combines entity relation extraction and entity function detection to exploit their commonality and capture their inter-relationship, thereby improving the performance of biomedical causal relation extraction. Experimental results on the BioCreative-V Track 4 corpus show that our joint learning model outperforms the separate models in BEL statement extraction, achieving F1 scores of 57.0% and 37.3% on the test set in the Stage 2 and Stage 1 evaluations, respectively. This demonstrates that our joint learning system reaches state-of-the-art performance in Stage 2 compared with other systems.
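Joint learning of the two sub-tasks is commonly implemented as one shared encoder with two output heads whose losses are added, so gradients from function detection also shape the relation representation and vice versa. The abstract does not give the model's loss, so the following is only a minimal sketch under that assumption: a cross-entropy term for the entity-pair relation plus one cross-entropy term per entity for its function, balanced by a hypothetical `func_weight`. The encoder itself is omitted; the functions take its output logits.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], gold: int) -> float:
    return -math.log(softmax(logits)[gold])


def joint_loss(rel_logits: Sequence[float], rel_gold: int,
               func_logits: Sequence[Sequence[float]], func_golds: Sequence[int],
               func_weight: float = 1.0) -> float:
    """Relation loss plus weighted per-entity function losses (assumed formulation).

    rel_logits:  scores over relation types for one entity pair.
    func_logits: one score vector per entity over function labels (including "no function").
    func_weight: hypothetical weight balancing the two sub-tasks.
    """
    rel = cross_entropy(rel_logits, rel_gold)
    func = sum(cross_entropy(l, g) for l, g in zip(func_logits, func_golds))
    return rel + func_weight * func


# Uniform scores over two classes give log(2) per term: 3 * log(2) in total here.
print(joint_loss([0.0, 0.0], 0, [[0.0, 0.0], [0.0, 0.0]], [0, 1]))
```

Setting `func_weight` to 0 recovers a relation-only model, which is one way to compare against the separate-learning baseline the abstract mentions.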

Subject(s)

Data Mining; Machine Learning; Data Mining/methods; Knowledge Discovery

LitCovid ensemble learning for COVID-19 multi-label classification.

Gu, Jinghang; Chersoni, Emmanuele; Wang, Xing; Huang, Chu-Ren; Qian, Longhua; Zhou, Guodong.

Database (Oxford) ; 2022, 2022 Nov 25.

Article in English | MEDLINE | ID: mdl-36426767

ABSTRACT

The Coronavirus Disease 2019 (COVID-19) pandemic has shifted the focus of research worldwide, and more than 10 000 new articles per month have concentrated on COVID-19-related topics. Given this rapidly growing literature, efficient and precise extraction of the main topics of COVID-19-relevant articles is of great importance. Manual curation of this information from the biomedical literature is labor-intensive and time-consuming, and is therefore insufficient and difficult to sustain. In response, the BioCreative VII community proposed a challenging task, the LitCovid Track, calling for a global effort to automatically extract semantic topics from COVID-19 literature. This article describes our work on the BioCreative VII LitCovid Track. We proposed the LitCovid Ensemble Learning (LCEL) method, which integrates multiple biomedical pretrained models to address the COVID-19 multi-label classification problem. Specifically, seven different transformer-based pretrained models were initialized and fine-tuned independently and then ensembled. To enhance the representation abilities of the deep neural models, diverse additional biomedical knowledge was used to enrich the semantic representations. Simple yet effective data augmentation was also leveraged to address learning deficiencies during the training phase. In addition, given the imbalanced label distribution of the task, a novel asymmetric loss function was applied to the LCEL model; it explicitly adjusts the relative importance of negative and positive samples by assigning them different exponential decay factors, helping the model focus on the positive samples. After training, an ensemble bagging strategy was adopted to merge the outputs of the individual models into final predictions. The experimental results show the effectiveness of the proposed approach: LCEL achieves state-of-the-art performance on the LitCovid dataset.
Database URL: https://github.com/JHnlp/LCEL.
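The two training-side ideas in this abstract, an asymmetric loss with separate exponential decay factors for positive and negative labels, and merging several models' outputs, can be sketched in a few lines. This is a minimal sketch, assuming the widely used asymmetric loss formulation of Ridnik et al. (2021), including its probability-shifting `clip` for negatives, and assuming plain probability averaging with a 0.5 threshold as the merge rule. LCEL's exact loss variant, hyperparameters (`gamma_pos`, `gamma_neg`, `clip` below are illustrative values), and bagging procedure may differ.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def asymmetric_loss(logits: Sequence[float], targets: Sequence[int],
                    gamma_pos: float = 0.0, gamma_neg: float = 4.0,
                    clip: float = 0.05, eps: float = 1e-8) -> float:
    """Asymmetric loss summed over the topic labels of one article.

    Positives are weighted by (1 - p) ** gamma_pos and negatives by p_m ** gamma_neg,
    where p_m = max(p - clip, 0); a larger gamma_neg decays easy negatives faster,
    so the many absent topics contribute less than the few present ones.
    """
    loss = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        if y == 1:
            loss -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            p_m = max(p - clip, 0.0)
            loss -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return loss


def average_ensemble(model_probs: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Average per-label probabilities across models, then threshold each label."""
    n = len(model_probs)
    avg = [sum(col) / n for col in zip(*model_probs)]
    return [1 if p >= threshold else 0 for p in avg]


# An easy negative (p ~ 0.047) is shifted below zero by clip and costs nothing.
print(asymmetric_loss([-3.0], [0]))
# Three models, two labels: averages 0.6 and 0.3.
print(average_ensemble([[0.9, 0.2], [0.7, 0.4], [0.2, 0.3]]))
```

With `gamma_pos = gamma_neg = clip = 0` the function reduces to ordinary binary cross-entropy, which is a convenient sanity check when tuning the decay factors.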

Subject(s)

COVID-19; Humans; COVID-19/epidemiology; Databases, Factual; Semantics; Machine Learning

Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations.

Chen, Qingyu; Allot, Alexis; Leaman, Robert; Islamaj, Rezarta; Du, Jingcheng; Fang, Li; Wang, Kai; Xu, Shuo; Zhang, Yuefu; Bagherzadeh, Parsa; Bergler, Sabine; Bhatnagar, Aakash; Bhavsar, Nidhir; Chang, Yung-Chun; Lin, Sheng-Jie; Tang, Wentai; Zhang, Hongtong; Tavchioski, Ilija; Pollak, Senja; Tian, Shubo; Zhang, Jinfeng; Otmakhova, Yulia; Yepes, Antonio Jimeno; Dong, Hang; Wu, Honghan; Dufour, Richard; Labrak, Yanis; Chatterjee, Niladri; Tandon, Kushagri; Laleye, Fréjus A A; Rakotoson, Loïc; Chersoni, Emmanuele; Gu, Jinghang; Friedrich, Annemarie; Pujari, Subhash Chandra; Chizhikova, Mariia; Sivadasan, Naveen; Vg, Saipradeep; Lu, Zhiyong.

Database (Oxford) ; 2022, 2022 Aug 31.

Article in English | MEDLINE | ID: mdl-36043400

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic has been severely impacting global society since December 2019. Related findings, such as vaccine and drug development, have been reported in the biomedical literature at a rate of about 10 000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200 000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g. Diagnosis and Treatment) to the articles in LitCovid. The annotated topics have been widely used for navigating the COVID-19 literature, rapidly locating articles of interest, and other downstream studies. However, annotating the topics has been the bottleneck of manual curation. Despite continuing advances in biomedical text-mining methods, few have been dedicated to topic annotation of COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30 000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multi-label classification datasets in biomedical scientific literature. Nineteen teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest-performing submissions achieved 0.8875, 0.9181 and 0.9394 for macro-F1-score, micro-F1-score and instance-based F1-score, respectively. Notably, these scores are substantially higher (e.g. 12% higher for macro-F1-score) than the corresponding scores of the state-of-the-art multi-label classification method. The level of participation and the results demonstrate a successful track and help close the gap between dataset curation and method development.
The dataset is publicly available via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/ for benchmarking and further development. Database URL: https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/.
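The three reported scores average differently: macro-F1 computes F1 per topic and then averages over topics, micro-F1 pools true/false positives and false negatives across all topics first, and instance-based F1 computes F1 per article and averages over articles. A minimal sketch of the three metrics follows; the zero-division convention (F1 = 0 when a label has no gold or predicted instances) is an assumption and may differ from the official evaluation script, and the topic labels in the usage line are toy data.

```python
from typing import List, Set, Tuple


def _f1(tp: int, fp: int, fn: int) -> float:
    # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(gold: List[Set[str]], pred: List[Set[str]],
                  labels: List[str]) -> Tuple[float, float, float]:
    """Return (macro-F1, micro-F1, instance-based F1) for per-article label sets."""
    per_label = []
    tp_all = fp_all = fn_all = 0
    for lab in labels:
        tp = sum(1 for g, p in zip(gold, pred) if lab in g and lab in p)
        fp = sum(1 for g, p in zip(gold, pred) if lab not in g and lab in p)
        fn = sum(1 for g, p in zip(gold, pred) if lab in g and lab not in p)
        per_label.append(_f1(tp, fp, fn))
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
    macro = sum(per_label) / len(per_label)
    micro = _f1(tp_all, fp_all, fn_all)
    # Per article: |G & P| true positives, |P - G| false positives, |G - P| false negatives.
    instance = sum(_f1(len(g & p), len(p - g), len(g - p))
                   for g, p in zip(gold, pred)) / len(gold)
    return macro, micro, instance


gold = [{"Treatment", "Diagnosis"}, {"Prevention"}]
pred = [{"Treatment"}, {"Prevention", "Treatment"}]
print(multilabel_f1(gold, pred, ["Treatment", "Diagnosis", "Prevention"]))
```

In the toy example the missed Diagnosis label drags macro-F1 (5/9) below micro-F1 (2/3), illustrating why macro-F1 is the strictest of the three on rare topics.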

Subject(s)

COVID-19; COVID-19/epidemiology; Data Mining/methods; Databases, Factual; Humans; PubMed; Publications

Multi-probe attention neural network for COVID-19 semantic indexing.

Gu, Jinghang; Xiang, Rong; Wang, Xing; Li, Jing; Li, Wenjie; Qian, Longhua; Zhou, Guodong; Huang, Chu-Ren.

BMC Bioinformatics ; 23(1): 259, 2022 Jun 29.

Article in English | MEDLINE | ID: mdl-35768777

ABSTRACT

BACKGROUND: The COVID-19 pandemic has greatly accelerated the publication pace of scientific literature. Efficiently curating and indexing this large volume of biomedical literature during the current crisis is of great importance. Literature indexing has mainly been performed by human experts using Medical Subject Headings (MeSH), which is labor-intensive and time-consuming. Therefore, to reduce time and monetary costs, there is an urgent need for automatic semantic indexing technologies for the emerging COVID-19 domain. RESULTS: To investigate the semantic indexing problem for COVID-19, we first construct a new COVID-19 Semantic Indexing dataset, which consists of more than 80 thousand biomedical articles. We then propose a novel semantic indexing framework based on the multi-probe attention neural network (MPANN) to address the COVID-19 semantic indexing problem. Specifically, we employ a k-nearest-neighbour-based MeSH masking approach to generate candidate topic terms for each input article. We encode the selected candidate terms, together with other contextual information, and feed them as probes into the downstream attention-based neural network. Each semantic probe carries specific aspects of biomedical knowledge and provides discriminative features for the input article. After extracting semantic features at both the term level and the document level through the attention-based neural network, MPANN adopts a linear multi-view classifier to make the final topic predictions for COVID-19 semantic indexing. CONCLUSION: The experimental results suggest that MPANN is promising for representing the semantic features of biomedical texts and is effective in predicting semantic topics for COVID-19-related biomedical articles.
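The k-nearest-neighbour step narrows the full MeSH vocabulary to a short candidate list per article before the neural network scores it. A minimal sketch, assuming articles are already embedded as vectors, that neighbours are ranked by cosine similarity, and that candidates are the MeSH terms occurring most often among the k neighbours; the function names, `k`, `top_n`, and the toy vectors are hypothetical, and MPANN's actual masking rule may differ.

```python
import math
from collections import Counter
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_mesh_candidates(query: Sequence[float], corpus_vecs: List[Sequence[float]],
                        corpus_mesh: List[List[str]], k: int = 2, top_n: int = 3) -> List[str]:
    """Return the MeSH terms most frequent among the k indexed articles nearest to `query`."""
    ranked = sorted(range(len(corpus_vecs)),
                    key=lambda i: cosine(query, corpus_vecs[i]), reverse=True)
    counts = Counter()
    for i in ranked[:k]:
        counts.update(corpus_mesh[i])
    return [term for term, _ in counts.most_common(top_n)]


# Toy index of three already-annotated articles.
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
mesh = [["Humans", "Pneumonia, Viral"], ["Humans", "COVID-19"], ["Animals"]]
print(knn_mesh_candidates([1.0, 0.0], vecs, mesh, k=2, top_n=3))
```

Terms from dissimilar articles (here "Animals") never become candidates, which is the sense in which the rest of the vocabulary is masked out.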

Subject(s)

COVID-19; Semantics; Humans; Medical Subject Headings; Neural Networks, Computer; Pandemics

Extraction of causal relations based on SBEL and BERT model.

Shao, Yifan; Li, Haoru; Gu, Jinghang; Qian, Longhua; Zhou, Guodong.

Database (Oxford) ; 2021, 2021 Feb 18.

Article in English | MEDLINE | ID: mdl-33570092

ABSTRACT

Extraction of causal relations between biomedical entities in the form of Biological Expression Language (BEL) poses a new challenge to the biomedical text mining community due to the complexity of BEL statements. We propose a simplified form of BEL statements [Simplified Biological Expression Language (SBEL)] to facilitate BEL extraction, and employ BERT (Bidirectional Encoder Representations from Transformers) to improve the performance of causal relation extraction (RE). On the one hand, BEL statement extraction is transformed into the extraction of an intermediate form, the SBEL statement, which is then further decomposed into two subtasks: entity RE and entity function detection. On the other hand, we use a powerful pretrained BERT model both to extract entity relations and to detect entity functions, aiming to improve the performance of the two subtasks. Entity relations and functions are then combined into SBEL statements and finally merged into BEL statements. Experimental results on the BioCreative-V Track 4 corpus demonstrate that our method achieves state-of-the-art performance in BEL statement extraction, with F1 scores of 54.8% in the Stage 2 evaluation and 30.1% in the Stage 1 evaluation. Database URL: https://github.com/grapeff/SBEL_datasets.
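The final step, combining a predicted entity relation with the detected entity functions into a BEL statement, is essentially string composition. The sketch below shows the idea under simplifying assumptions: every entity is treated as a protein abundance `p(NAMESPACE:NAME)`, a detected function such as `act` simply wraps that term, and the relation label is inserted verbatim (e.g. `increases`). The `Entity` class and helper names are hypothetical and do not reproduce the paper's SBEL format; real BEL also uses other abundance types and functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    namespace: str                  # e.g. "HGNC"
    name: str                       # e.g. "AKT1"
    function: Optional[str] = None  # e.g. "act"; None when no function was detected


def bel_term(e: Entity) -> str:
    # Assumption: every entity is a protein abundance p(NS:NAME).
    term = f"p({e.namespace}:{e.name})"
    return f"{e.function}({term})" if e.function else term


def to_bel(subj: Entity, relation: str, obj: Entity) -> str:
    """Compose 'subject relation object' from the two sub-task outputs."""
    return f"{bel_term(subj)} {relation} {bel_term(obj)}"


# Relation classifier output: "increases"; function detector output: "act" on the subject only.
print(to_bel(Entity("HGNC", "AKT1", "act"), "increases", Entity("HGNC", "CDKN1A")))
```

Keeping relations and functions as separate predictions until this last step is what lets each subtask be handled by its own BERT classifier.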

Subject(s)

Data Mining; Language; Databases, Factual; Natural Language Processing

Chemical-induced disease relation extraction via attention-based distant supervision.

Gu, Jinghang; Sun, Fuqing; Qian, Longhua; Zhou, Guodong.

BMC Bioinformatics ; 20(1): 403, 2019 Jul 22.

Article in English | MEDLINE | ID: mdl-31331263

ABSTRACT

BACKGROUND: Automatically understanding chemical-disease relations (CDRs) is crucial in various areas of biomedical research and health care. Supervised machine learning provides a feasible way to automatically extract relations between biomedical entities from scientific literature; its success, however, depends heavily on large-scale biomedical corpora manually annotated at the cost of intensive labor and tremendous investment. RESULTS: We present an attention-based distant supervision paradigm for the BioCreative-V CDR extraction task. Training examples at both the intra- and inter-sentence levels are generated automatically from the Comparative Toxicogenomics Database (CTD) without any human intervention. An attention-based neural network and a stacked auto-encoder network are applied, respectively, to induce learning models and extract relations at the two levels. After the results of both levels are merged, the document-level CDRs are extracted. The system achieves a precision/recall/F1-score of 60.3%/73.8%/66.4%, outperforming state-of-the-art supervised learning systems without using any annotated corpus. CONCLUSION: Our experiments demonstrate that distant supervision is promising for extracting chemical-disease relations from biomedical literature, and that capturing both local and global attention features simultaneously is effective in attention-based distantly supervised learning.
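Merging the two levels and scoring at document level can be sketched with sets of (chemical, disease) concept pairs. This sketch assumes the merge is a simple union of intra- and inter-sentence predictions; the abstract does not state the exact merge rule, and the concept IDs in the tests are toy values. The last line checks that the reported precision and recall are consistent with the reported F1-score, since F1 is their harmonic mean.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (chemical concept ID, disease concept ID)


def merge_levels(intra: Set[Pair], inter: Set[Pair]) -> Set[Pair]:
    """Document-level relations as the union of both levels' predictions (assumed rule)."""
    return intra | inter


def harmonic_f1(p: float, r: float) -> float:
    return 2 * p * r / (p + r) if p + r else 0.0


def prf(pred: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    tp = len(pred & gold)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return p, r, harmonic_f1(p, r)


# Reported P = 60.3%, R = 73.8%:
print(round(100 * harmonic_f1(0.603, 0.738), 1))  # prints 66.4
```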

Subject(s)

Algorithms; Disease; Supervised Machine Learning; Toxicogenetics; Databases as Topic; Databases, Factual; Humans; Neural Networks, Computer; Workflow

Chemical-induced disease relation extraction via convolutional neural network.

Gu, Jinghang; Sun, Fuqing; Qian, Longhua; Zhou, Guodong.

Database (Oxford) ; 2017(1), 2017 Jan 01.

Article in English | MEDLINE | ID: mdl-28415073

ABSTRACT

This article describes our work on the BioCreative-V chemical-disease relation (CDR) extraction task, which employed a maximum entropy (ME) model and a convolutional neural network model for relation extraction at the inter- and intra-sentence levels, respectively. In our work, relation extraction between entity concepts in documents was simplified to relation extraction between entity mentions. We first constructed pairs of chemical and disease mentions as relation instances for the training and testing stages; we then trained and applied the ME model and the convolutional neural network model at the inter- and intra-sentence levels, respectively. Finally, we merged the classification results from the mention level to the document level to obtain the final relations between chemical and disease concepts. The evaluation on the BioCreative-V CDR corpus shows the effectiveness of our proposed approach. Database URL: http://www.biocreative.org/resources/corpora/biocreative-v-cdr-corpus/.
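Because relations are classified between mentions but evaluated between concepts, every chemical-disease concept pair may receive several mention-level scores that must be collapsed into one decision. A minimal sketch, assuming the rule "a concept pair is related if its best-scoring mention pair passes the threshold" (max aggregation); the abstract only says results were merged, so both the aggregation and the 0.5 threshold are assumptions, and the concept IDs are toy values.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def to_document_level(mention_scores: Iterable[Tuple[str, str, float]],
                      threshold: float = 0.5) -> Set[Tuple[str, str]]:
    """Collapse (chemical_id, disease_id, score) mention-pair predictions to concept pairs.

    Each concept pair keeps the maximum score over all of its mention pairs
    (assumed aggregation) and is accepted if that score reaches the threshold.
    """
    best: Dict[Tuple[str, str], float] = defaultdict(float)
    for chem, dis, score in mention_scores:
        best[(chem, dis)] = max(best[(chem, dis)], score)
    return {pair for pair, s in best.items() if s >= threshold}


# Two mention pairs for (C1, D1), one for (C2, D1).
print(to_document_level([("C1", "D1", 0.2), ("C1", "D1", 0.7), ("C2", "D1", 0.4)]))
```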

Subject(s)

Disease; Neural Networks, Computer; Toxicology; Entropy; Humans

Chemical-induced disease relation extraction with various linguistic features.

Gu, Jinghang; Qian, Longhua; Zhou, Guodong.

Database (Oxford) ; 2016, 2016.

Article in English | MEDLINE | ID: mdl-27052618

ABSTRACT

Understanding the relations between chemicals and diseases is crucial in various biomedical tasks such as new drug discovery and new therapy development. Manually mining these relations from the biomedical literature is costly and time-consuming, and such a procedure is difficult to keep up to date. To address these issues, the BioCreative-V community proposed a challenging task of automatically extracting chemical-induced disease (CID) relations in order to benefit biocuration. This article describes our work on the CID relation extraction task in BioCreative-V. We built a machine-learning-based system that used simple yet effective linguistic features to extract relations with maximum entropy models. In addition to leveraging various features, we employed hypernym relations between entity concepts, derived from the Medical Subject Headings (MeSH) controlled vocabulary, during both the training and testing stages to obtain more accurate classification models and better extraction performance, respectively. We reduced relation extraction between entities in documents to relation extraction between entity mentions. In our system, pairs of chemical and disease mentions at both the intra- and inter-sentence levels were first constructed as relation instances for training and testing; then two classification models, one for each level, were trained on the training examples and applied to the testing examples. Finally, we merged the classification results from the mention level to the document level to obtain the final relations between chemicals and diseases. Our system achieved promising F-scores of 60.4% on the development dataset and 58.3% on the test dataset using gold-standard entity annotations. Database URL: https://github.com/JHnlp/BC5CIDTask.
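One plausible way to use MeSH hypernym relations is a filter: if a chemical is predicted to cause both a disease and a more specific descendant of it, keep only the more specific disease. The abstract does not spell out the rule, so this is a hedged sketch of that assumption only. It also simplifies MeSH to a single-parent map (real MeSH descriptors can sit under several tree numbers), and the concept IDs are hypothetical.

```python
from typing import Dict, Set, Tuple


def ancestors(concept: str, parent: Dict[str, str]) -> Set[str]:
    """All hypernyms of `concept`, following a simplified single-parent hierarchy."""
    out = set()
    while concept in parent:
        concept = parent[concept]
        out.add(concept)
    return out


def drop_general_diseases(relations: Set[Tuple[str, str]],
                          parent: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Remove (chemical, disease) pairs whose disease is a hypernym of another
    disease related to the same chemical (assumed most-specific-disease rule)."""
    keep = set()
    for chem, dis in relations:
        has_more_specific = any(c == chem and dis in ancestors(d, parent)
                                for c, d in relations)
        if not has_more_specific:
            keep.add((chem, dis))
    return keep


parent = {"D_specific": "D_general"}
rels = {("C1", "D_general"), ("C1", "D_specific"), ("C2", "D_general")}
print(sorted(drop_general_diseases(rels, parent)))
```

The general disease survives for C2 because no more specific disease is related to C2; the filter is applied per chemical.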

Subject(s)

Data Mining/methods; Disease; Linguistics; Databases as Topic; Humans
