Improving deconvolution methods in biology through open innovation competitions: an application to the connectivity map.
Blasco, Andrea; Natoli, Ted; Endres, Michael G; Sergeev, Rinat A; Randazzo, Steven; Paik, Jin H; Macaluso, N J Maximilian; Narayan, Rajiv; Lu, Xiaodong; Peck, David; Lakhani, Karim R; Subramanian, Aravind.
Affiliation
  • Blasco A; Harvard Business School, Harvard University, Boston, MA 02163, USA.
  • Natoli T; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.
  • Endres MG; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.
  • Sergeev RA; Harvard Business School, Harvard University, Boston, MA 02163, USA.
  • Randazzo S; Harvard Business School, Harvard University, Boston, MA 02163, USA.
  • Paik JH; Harvard Business School, Harvard University, Boston, MA 02163, USA.
  • Macaluso NJM; Harvard Business School, Harvard University, Boston, MA 02163, USA.
  • Narayan R; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.
  • Lu X; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.
  • Peck D; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.
  • Lakhani KR; Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA.
  • Subramanian A; Harvard Business School, Harvard University, Boston, MA 02163, USA.
Bioinformatics; 37(18): 2889-2895, 2021 09 29.
Article in English | MEDLINE | ID: mdl-33824954
ABSTRACT
MOTIVATION:

Do machine learning methods improve standard deconvolution techniques for gene expression data? This article uses a unique new dataset combined with an open innovation competition to evaluate a wide range of approaches developed by 294 competitors from 20 countries. The competition's objective was to address a deconvolution problem critical to analyzing genetic perturbations from the Connectivity Map. The problem consists of separating the expression of individual genes from raw measurements obtained from gene pairs. We evaluated the outcomes using ground-truth data (direct measurements for single genes) obtained from the same samples.
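As a rough illustration of the deconvolution problem described above, the following sketch splits pooled measurements for one gene pair into two per-gene estimates. It is not any competitor's algorithm and not the Connectivity Map pipeline: it uses a plain 1-D two-cluster k-means, and it assumes (for illustration only) that the two genes contribute unequal numbers of measurements, so that the larger cluster can be attributed to one gene. The function names `split_two_clusters` and `deconvolve_pair` and the simulated values are hypothetical.

```python
import random
import statistics


def split_two_clusters(values, n_iter=100):
    """Partition 1-D values into a low and a high cluster (Lloyd's k-means, k=2)."""
    lo, hi = min(values), max(values)  # start the two centres at the extremes
    low, high = list(values), []
    for _ in range(n_iter):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values identical: nothing to separate
            return low, low
        new_lo, new_hi = statistics.fmean(low), statistics.fmean(high)
        if (new_lo, new_hi) == (lo, hi):  # centres stopped moving
            break
        lo, hi = new_lo, new_hi
    return low, high


def deconvolve_pair(intensities):
    """Return (estimate for the more abundant gene, estimate for the other gene).

    Assumption (illustrative): the gene contributing more measurements
    corresponds to the larger cluster.
    """
    values = list(intensities)
    if not values:
        raise ValueError("no measurements supplied")
    low, high = split_two_clusters(values)
    major, minor = (low, high) if len(low) >= len(high) else (high, low)
    return statistics.median(major), statistics.median(minor)


if __name__ == "__main__":
    random.seed(0)
    # Simulated log-intensities: 60 measurements for gene A, 30 for gene B.
    pooled = [random.gauss(8.0, 0.2) for _ in range(60)]
    pooled += [random.gauss(11.0, 0.2) for _ in range(30)]
    random.shuffle(pooled)
    gene_a, gene_b = deconvolve_pair(pooled)
    print(f"gene A estimate: {gene_a:.2f}, gene B estimate: {gene_b:.2f}")
```

With ground-truth single-gene measurements such as those described in the abstract, estimates like these could be scored against the direct values. A sketch this simple would likely struggle when the two expression levels are close together, which is the regime where more sophisticated methods matter.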

RESULTS:

We find that the top-ranked algorithm, based on random forest regression, beat the other methods in accuracy and reproducibility; more traditional Gaussian-mixture methods performed well and tended to be faster, and the best deep learning approach yielded outcomes slightly inferior to the above methods. We anticipate researchers in the field will find the dataset and algorithms developed in this study to be a powerful research tool for benchmarking their deconvolution methods and a resource useful for multiple applications.

AVAILABILITY AND IMPLEMENTATION:

The data is freely available at clue.io/data (section Contests) and the software is on GitHub at https://github.com/cmap/gene_deconvolution_challenge.

SUPPLEMENTARY INFORMATION:

Supplementary data are available at Bioinformatics online.
Subjects

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Algorithms / Software Language: En Publication year: 2021 Document type: Article