pathCLIP: Detection of Genes and Gene Relations from Biological Pathway Figures through Image-Text Contrastive Learning.
bioRxiv; 2023 Nov 02.
Article | English | MEDLINE | ID: mdl-37961680
In biomedical literature, biological pathways are commonly described through a combination of images and text. These pathways contain valuable information, including genes and their relationships, which offers insights into biological mechanisms and supports precision medicine. Curating pathway information across the literature makes it possible to integrate it into a comprehensive knowledge base. While some studies have extracted pathway information from images and text independently, they often overlook the correspondence between the two modalities. In this paper, we present a pathway figure curation system named pathCLIP for identifying genes and gene relations from pathway figures. Our key innovation is the use of an image-text contrastive learning model to learn coordinated embeddings of image snippets and text descriptions of genes and gene relations, thereby improving curation. Validation on pathway figures from PubMed showed that our multimodal model outperforms models using only a single modality. Additionally, our system effectively curates genes and gene relations from multiple literature sources. A case study on extracting pathway information from non-small cell lung cancer literature further demonstrates how the curated pathway information can enhance related pathways in the KEGG database.
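To illustrate what "image-text contrastive learning" of coordinated embeddings generally means, the sketch below implements a generic CLIP-style symmetric contrastive (InfoNCE) loss and a nearest-text lookup in plain Python. It is not pathCLIP's implementation: the abstract does not specify encoders, loss details, or hyperparameters. The toy 2-D vectors stand in for embeddings that image and text encoders would produce (an assumption), and the function names and the temperature value of 0.07 are illustrative choices, not values taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def similarity_logits(images: List[Vector], texts: List[Vector],
                      temperature: float = 0.07) -> List[List[float]]:
    """Row i, column j: cosine similarity of image i and text j, divided by temperature."""
    imgs = [l2_normalize(v) for v in images]
    txts = [l2_normalize(v) for v in texts]
    return [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]


def diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where row i's correct class is column i (the matched pair)."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(images: List[Vector], texts: List[Vector],
                    temperature: float = 0.07) -> float:
    """Symmetric loss: average of image-to-text and text-to-image cross-entropy."""
    logits = similarity_logits(images, texts, temperature)
    transposed = [list(col) for col in zip(*logits)]  # rows become texts
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def best_match(image: Vector, texts: List[Vector]) -> int:
    """Index of the text embedding most similar to one image-snippet embedding."""
    img = l2_normalize(image)
    scores = [sum(a * b for a, b in zip(img, l2_normalize(t))) for t in texts]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    snippets = [[1.0, 0.0], [0.0, 1.0]]        # hypothetical image-snippet embeddings
    aligned = [[0.9, 0.1], [0.1, 0.9]]         # hypothetical matching text embeddings
    shuffled = [[0.1, 0.9], [0.9, 0.1]]        # same texts, paired with the wrong snippets
    print(clip_style_loss(snippets, aligned), clip_style_loss(snippets, shuffled))
    print(best_match(snippets[0], shuffled))
```

Correctly paired snippets and descriptions yield a much lower loss than shuffled pairs. Training minimizes this loss so that, afterwards, an image snippet of a gene or gene relation can be matched to its most similar text description, which is the sense in which the two modalities share coordinated embeddings.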