pathCLIP: Detection of Genes and Gene Relations from Biological Pathway Figures through Image-Text Contrastive Learning.

He, Fei; Liu, Kai; Yang, Zhiyuan; Chen, Yibo; Hammer, Richard D; Xu, Dong; Popescu, Mihail

Affiliation

He F; School of Information Science and Technology, Northeast Normal University, Changchun 130000, China; Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA.
Liu K; School of Information Science and Technology, Northeast Normal University, Changchun 130000, China.
Yang Z; School of Information Science and Technology, Northeast Normal University, Changchun 130000, China.
Chen Y; Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA.
Hammer RD; School of Medicine, University of Missouri, Columbia, MO 65211, USA.
Xu D; Department of Electrical Engineering and Computer Science, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA.
Popescu M; School of Medicine, University of Missouri, Columbia, MO 65211, USA.

bioRxiv; 2023 Nov 02.

Article in English | MEDLINE | ID: mdl-37961680

ABSTRACT

In biomedical literature, biological pathways are commonly described through a combination of images and text. These pathways contain valuable information, including genes and their relationships, which provide insight into biological mechanisms and precision medicine. Curating pathway information across the literature enables the integration of this information to build a comprehensive knowledge base. While some studies have extracted pathway information from images and text independently, they often overlook the correspondence between the two modalities. In this paper, we present a pathway figure curation system named pathCLIP for identifying genes and gene relations from pathway figures. Our key innovation is the use of an image-text contrastive learning model to learn coordinated embeddings of image snippets and text descriptions of genes and gene relations, thereby improving curation. Our validation results, using pathway figures from PubMed, showed that our multimodal model outperforms models using only a single modality. Additionally, our system effectively curates genes and gene relations from multiple literature sources. A case study on extracting pathway information from non-small cell lung cancer literature further demonstrates the usefulness of our curated pathway information in enhancing related pathways in the KEGG database.
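The abstract does not state which contrastive objective pathCLIP uses. As an illustration only, the sketch below assumes the symmetric, temperature-scaled cross-entropy loss popularized by CLIP, applied to a batch where the i-th image snippet and the i-th text description form the matching pair. The function names, the toy embeddings, and the temperature value of 0.07 are hypothetical and are not taken from the paper.

```python
import math

def l2_normalize(vec):
    # Scale a vector to unit length; an all-zero vector is left unchanged.
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def log_softmax(row):
    # Numerically stable log-softmax over a list of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_z for v in row]

def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Assumption: image_embs[i] and text_embs[i] describe the same gene or relation.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine similarity of every image against every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
              for img in imgs]
    # Image-to-text direction: each row should put its mass on the diagonal entry.
    i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: the same, but over columns.
    columns = [list(col) for col in zip(*logits)]
    t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (i2t + t2i)

# Toy two-pair batch (made-up 2-D embeddings).
images = [[1.0, 0.0], [0.0, 1.0]]
texts_matched = [[0.9, 0.1], [0.1, 0.9]]
texts_swapped = [[0.1, 0.9], [0.9, 0.1]]

print(symmetric_contrastive_loss(images, texts_matched))
print(symmetric_contrastive_loss(images, texts_swapped))
```

With these toy inputs, the correctly paired batch yields a much smaller loss than the swapped one. Minimizing a loss of this kind pulls corresponding image and text embeddings together, which is the sense in which the two modalities become "coordinated".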

Keywords

Contrastive learning; Entity detection; Literature mining; Pathway figures; Relation extraction

Full text: 1 Collection: 01-international Database: MEDLINE Language: English Journal: bioRxiv Year: 2023 Document type: Article
