Search | VHL Regional Portal

Overview of DrugProt task at BioCreative VII: data and methods for large-scale text mining and knowledge graph generation of heterogenous chemical-protein relations.

Miranda-Escalada, Antonio; Mehryary, Farrokh; Luoma, Jouni; Estrada-Zavala, Darryl; Gasco, Luis; Pyysalo, Sampo; Valencia, Alfonso; Krallinger, Martin.

Database (Oxford) ; 2023, 2023 11 28.

Article in English | MEDLINE | ID: mdl-38015956

ABSTRACT

It is getting increasingly challenging to efficiently exploit drug-related information described in the growing amount of scientific literature. Indeed, for drug-gene/protein interactions, the challenge is even bigger, considering the scattered information sources and types of interactions. However, their systematic, large-scale exploitation is key for developing tools, impacting knowledge fields as diverse as drug design or metabolic pathway research. Previous efforts in the extraction of drug-gene/protein interactions from the literature did not address these scalability and granularity issues. To tackle them, we have organized the DrugProt track at BioCreative VII. In the context of the track, we have released the DrugProt Gold Standard corpus, a collection of 5000 PubMed abstracts, manually annotated with granular drug-gene/protein interactions. We have proposed a novel large-scale track to evaluate the capacity of natural language processing systems to scale to the range of millions of documents, and generate with their predictions a silver standard knowledge graph of 53 993 602 nodes and 19 367 406 edges. Its use exceeds the shared task and points toward pharmacological and biological applications such as drug discovery or continuous database curation. Finally, we have created a persistent evaluation scenario on CodaLab to continuously evaluate new relation extraction systems that may arise. Thirty teams from four continents, which involved 110 people, sent 107 submission runs for the Main DrugProt track, and nine teams submitted 21 runs for the Large Scale DrugProt track. Most participants implemented deep learning approaches based on pretrained transformer-like language models (LMs) such as BERT or BioBERT, reaching precision and recall values as high as 0.9167 and 0.9542 for some relation types. Finally, some initial explorations of the applicability of the knowledge graph have shown its potential to explore the chemical-protein relations described in the literature, or chemical compound-enzyme interactions. Database URL: https://doi.org/10.5281/zenodo.4955410.

Subject(s)

Data Mining , Pattern Recognition, Automated , Humans , Databases, Factual , Data Mining/methods , Proteins/metabolism

S1000: a better taxonomic name corpus for biomedical information extraction.

Luoma, Jouni; Nastou, Katerina; Ohta, Tomoko; Toivonen, Harttu; Pafilis, Evangelos; Jensen, Lars Juhl; Pyysalo, Sampo.

Bioinformatics ; 39(6), 2023 06 01.

Article in English | MEDLINE | ID: mdl-37289518

ABSTRACT

MOTIVATION: The recognition of mentions of species names in text is a critically important task for biomedical text mining. While deep learning-based methods have made great advances in many named entity recognition tasks, results for species name recognition remain poor. We hypothesize that this is primarily due to the lack of appropriate corpora. RESULTS: We introduce the S1000 corpus, a comprehensive manual re-annotation and extension of the S800 corpus. We demonstrate that S1000 makes highly accurate recognition of species names possible (F-score = 93.1%), both for deep learning and dictionary-based methods. AVAILABILITY AND IMPLEMENTATION: All resources introduced in this study are available under open licenses from https://jensenlab.org/resources/s1000/. The webpage contains links to a Zenodo project and three GitHub repositories associated with the study.

Subject(s)

Data Mining , Data Mining/methods

Automatic extraction of brain surface and mid-sagittal plane from PET images applying deformable models.

Mykkänen, Jouni; Tohka, Jussi; Luoma, Jouni; Ruotsalainen, Ulla.

Comput Methods Programs Biomed ; 79(1): 1-17, 2005 Jul.

Article in English | MEDLINE | ID: mdl-15885848

ABSTRACT

In this study, we propose and evaluate new methods for automatic extraction of the brain surface and the mid-sagittal plane from functional positron emission tomography (PET) images. Designing methods for these segmentation tasks is challenging because the spatial distribution of intensity values in a PET image depends on the applied radiopharmaceutical and the contrast to noise ratio in a PET image is typically low. We extracted the brain surface with a deformable model which is based on a global optimization algorithm. The global optimization allows reliable automation of the extraction task. Based on the extracted brain surface, the mid-sagittal plane was determined. The method was tested with the image of the Hoffman brain phantom (FDG) and the images from the brain studies with the FDG (17 images) and the C11-Raclopride tracers (4 images). In addition to the brain surfaces, we applied the deformable model for extraction of the coarse cortical structure based on the tracer uptake from FDG-PET brain images. The proposed segmentation methods provide a promising direction for automatic processing and analysis of PET brain images.

Subject(s)

Brain/diagnostic imaging , Phantoms, Imaging , Positron-Emission Tomography/methods , Adult , Algorithms , Computers , Fluorodeoxyglucose F18/administration & dosage , Humans , Image Processing, Computer-Assisted , Raclopride/administration & dosage , Software
