edge2vec: Representation learning using edge semantics for biomedical knowledge discovery.

Gao, Zheng; Fu, Gang; Ouyang, Chunping; Tsutsui, Satoshi; Liu, Xiaozhong; Yang, Jeremy; Gessner, Christopher; Foote, Brian; Wild, David; Ding, Ying; Yu, Qi

Affiliation

Gao Z; School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA.
Fu G; Microsoft Corporation, Seattle, Washington, USA.
Ouyang C; University of South China, Hengyang, Hunan, China.
Tsutsui S; School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA.
Liu X; School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA.
Yang J; School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA.
Gessner C; Microsoft Corporation, Seattle, Washington, USA.
Foote B; School of Medicine, University of New Mexico, Albuquerque, NM, USA.
Wild D; School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA.
Ding Y; Data2Discovery, Inc., Bloomington, IN, USA.
Yu Q; School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN, USA.

BMC Bioinformatics ; 20(1): 306, 2019 Jun 10.

Article in English | MEDLINE | ID: mdl-31238875

ABSTRACT

BACKGROUND:

Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology to richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real-world biomedical problems.

RESULTS:

In this paper, we propose the edge2vec model, which represents graphs while taking edge semantics into account. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embeddings on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by incorporating edge types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks.
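
As an illustration of how an edge-type transition matrix could steer learning on a heterogeneous graph, the following is a minimal sketch, not the authors' implementation. It assumes that node sequences are generated by random walks in which the probability of following an edge depends on the type of the previously traversed edge, and that these walks would then be passed to a skip-gram style model trained by stochastic gradient descent. The Expectation-Maximization fit of the matrix is not shown; the matrix here is filled in by hand. The names `build_adjacency`, `edge_type_walk`, and `trans`, and the toy graph, are hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Map each node to a list of (neighbor, edge_type) pairs (undirected)."""
    adj = defaultdict(list)
    for u, v, edge_type in edges:
        adj[u].append((v, edge_type))
        adj[v].append((u, edge_type))
    return adj


def edge_type_walk(adj, trans, start, length, rng):
    """Random walk where trans[prev_type][next_type] weights the next step.

    `trans` stands in for a trained edge-type transition matrix; here it is
    an assumed, hand-written list of lists indexed by integer edge type.
    """
    walk = [start]
    prev_type = None
    while len(walk) < length:
        neighbors = adj.get(walk[-1], [])
        if not neighbors:
            break
        if prev_type is None:
            # No previous edge yet: choose uniformly among neighbors.
            weights = [1.0] * len(neighbors)
        else:
            weights = [trans[prev_type][t] for _, t in neighbors]
        if sum(weights) <= 0:
            break  # every outgoing edge type is disallowed from here
        next_node, prev_type = rng.choices(neighbors, weights=weights, k=1)[0]
        walk.append(next_node)
    return walk


# Toy graph with three edge types: 0 = drug-gene, 1 = gene-disease, 2 = gene-gene.
edges = [
    ("drugA", "gene1", 0),
    ("gene1", "disease1", 1),
    ("gene1", "gene2", 2),
]
adj = build_adjacency(edges)

# Hand-written matrix: after any edge, only type-1 edges may follow.
trans = [
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]
walk = edge_type_walk(adj, trans, "drugA", 4, random.Random(0))
print(walk)
```

With this deliberately restrictive matrix the walk is forced along gene-disease edges once it leaves the drug node. A learned matrix would instead assign graded weights to every pair of edge types.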

CONCLUSIONS:

We propose this method for its added value relative to existing graph analytical methodology and for its applicability to real-world biomedical knowledge discovery.

Subjects

Informatics/methods; Knowledge; Learning; Algorithms; Biomedical Research; Humans; Neural Networks, Computer; Semantics

Keywords

Applied machine learning; Biomedical knowledge discovery; Data science; Edge semantics; Graph embedding; Heterogeneous network; Knowledge graph; Linked data; Network science; Node embedding; Representation learning; Semantic web; Systems biology

Full text: 1 | Database: MEDLINE | Main subject: Knowledge / Informatics / Learning | Type of study: Prognostic_studies | Limits: Humans | Language: En | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year of publication: 2019 | Document type: Article | Country of affiliation: United States
