Supervised learning is an accurate method for network-based gene classification.

Liu, Renming; Mancuso, Christopher A; Yannakopoulos, Anna; Johnson, Kayla A; Krishnan, Arjun

Affiliation

Liu R; Department of Computational Mathematics, Science and Engineering.
Mancuso CA; Department of Computational Mathematics, Science and Engineering.
Yannakopoulos A; Department of Computational Mathematics, Science and Engineering.
Johnson KA; Department of Computational Mathematics, Science and Engineering.
Krishnan A; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA.

Bioinformatics; 36(11): 3457-3465, 2020 Jun 01.

Article in English | MEDLINE | ID: mdl-32129827

ABSTRACT

BACKGROUND:

Assigning every human gene to specific functions, diseases and traits is a grand challenge in modern genetics. Key to addressing this challenge are computational methods, such as supervised learning and label propagation, that can leverage molecular interaction networks to predict gene attributes. Although supervised learning is a popular machine-learning technique across fields, it has been applied in only a few network-based studies for predicting pathway-, phenotype- or disease-associated genes. It is unknown how supervised learning performs broadly across different networks and diverse gene classification tasks, and how it compares with label propagation, the widely benchmarked canonical approach for this problem.
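To make the label-propagation side of this comparison concrete, here is a minimal sketch that diffuses known gene labels over a network. It assumes a random-walk-with-restart formulation, which is one common variant; the abstract does not state which formulation the authors benchmarked. The function name `label_propagation`, the restart parameter `alpha=0.85`, the toy network, and the seed genes are all hypothetical choices for illustration.

```python
import numpy as np


def label_propagation(adj, seeds, alpha=0.85, n_iter=1000, tol=1e-10):
    """Random-walk-with-restart style label propagation (one common variant, assumed here).

    adj   : symmetric (n, n) non-negative adjacency matrix
    seeds : boolean (n,) array, True for genes already known to carry the label
    alpha : probability of stepping to a neighbour instead of restarting at a seed
    Returns one score per gene; larger means closer (in the network) to the seeds.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=0)
    deg[deg == 0] = 1.0  # isolated genes: avoid dividing by zero
    # Column-normalised transition matrix: W[i, j] = adj[i, j] / deg[j]
    W = adj / deg[np.newaxis, :]
    seeds = np.asarray(seeds, dtype=bool)
    restart = seeds / max(seeds.sum(), 1)  # restart mass spread evenly over seeds
    scores = restart.copy()
    for _ in range(n_iter):
        new = alpha * (W @ scores) + (1 - alpha) * restart
        converged = np.abs(new - scores).sum() < tol
        scores = new
        if converged:
            break
    return scores


# Toy network (hypothetical): two 4-gene cliques joined by a single edge 3-4
n = 8
adj = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adj[i, j] = 1.0
adj[3, 4] = adj[4, 3] = 1.0

seeds = np.zeros(n, dtype=bool)
seeds[[0, 1]] = True  # hypothetical known positives: genes 0 and 1
scores = label_propagation(adj, seeds)
print(np.round(scores, 3))
```

Genes in the same clique as the seeds receive higher scores than genes across the bridge, which is the "naturally using network topology" property that label propagation is valued for. Because the transition matrix is column-stochastic, the scores of a network with no isolated genes sum to 1.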

RESULTS:

In this study, we present a comprehensive benchmarking of supervised learning for network-based gene classification, evaluating this approach and a classic label propagation technique on hundreds of diverse prediction tasks and multiple networks using stringent evaluation schemes. We demonstrate that supervised learning on a gene's full network connectivity outperforms label propagation and achieves high prediction accuracy by efficiently capturing local network properties, rivaling label propagation's appeal of naturally using network topology. We further show that supervised learning on the full network is also superior to learning on node embeddings derived using node2vec, an increasingly popular approach for concisely representing network connectivity. These results show that supervised learning is an accurate approach for prioritizing genes associated with diverse functions, diseases and traits and should be considered a staple of network-based gene classification workflows.

AVAILABILITY AND IMPLEMENTATION:

The datasets and the code used to reproduce the results and add new gene classification methods have been made freely available.

CONTACT:

arjun@msu.edu

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
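The RESULTS describe supervised learning on a gene's full network connectivity. A minimal sketch of that idea follows, assuming each gene's adjacency-matrix row is used directly as its feature vector and that L2-regularised logistic regression, fitted here by plain gradient descent, serves as the classifier; the abstract does not name the classifier. The helper names (`sigmoid`, `fit_logistic`), the toy network, the labels, and the train/test split are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, l2=1.0, lr=0.5, n_iter=2000):
    """L2-regularised logistic regression fitted by batch gradient descent."""
    n, p = X.shape
    w = np.zeros(p)
    b = 0.0
    for _ in range(n_iter):
        err = sigmoid(X @ w + b) - y          # gradient of the log-loss w.r.t. the logit
        w -= lr * (X.T @ err + l2 * w) / n    # penalise the weights, not the intercept
        b -= lr * err.mean()
    return w, b


# Toy network (hypothetical): two 5-gene cliques joined by the edge 4-5
n = 10
adj = np.zeros((n, n))
for block in (range(0, 5), range(5, 10)):
    for i in block:
        for j in block:
            if i != j:
                adj[i, j] = 1.0
adj[4, 5] = adj[5, 4] = 1.0

# Hypothetical labels: genes 0-4 carry the attribute, genes 5-9 do not
labels = np.array([1] * 5 + [0] * 5, dtype=float)

# "Full network connectivity": each gene's features are its adjacency row
X = adj
train = np.array([0, 1, 2, 7, 8, 9])
test = np.array([3, 4, 5, 6])  # held-out genes, never seen as training rows

w, b = fit_logistic(X[train], labels[train])
proba = sigmoid(X[test] @ w + b)
print(dict(zip(test.tolist(), np.round(proba, 3).tolist())))
```

Each feature weight learns how informative a connection to a particular gene is for the label, so the classifier can exploit local neighbourhood structure without an explicit diffusion step. The held-out genes 3 and 4 score above 0.5, and genes 5 and 6 score below it, even though gene 4 sits on the bridge between the two cliques.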

Subjects

Computational Biology; Gene Regulatory Networks; Humans; Supervised Machine Learning

Database: MEDLINE | Main subject: Computational Biology / Gene Regulatory Networks | Limits: Humans | Language: English | Publication year: 2020 | Document type: Article
