Comparison of multiple modalities for drug response prediction with learning curves using neural networks and XGBoost.

Branson, Nikhil; Cutillas, Pedro R; Bessant, Conrad

Affiliation

Branson N; School of Biological and Behavioural Sciences, Queen Mary University of London, London E1 4NS, United Kingdom.
Cutillas PR; Digital Environment Research Institute, Queen Mary University of London, London E1 1HH, United Kingdom.
Bessant C; Centre for Genomics and Computational Biology, Barts Cancer Institute, Queen Mary University of London, London EC1M 6BQ, United Kingdom.

Bioinform Adv ; 4(1): vbad190, 2024.

Article in En | MEDLINE | ID: mdl-38282976

ABSTRACT

Motivation:

Anti-cancer drug response prediction is a central problem within stratified medicine. Transcriptomic profiles of cancer cell lines are typically used for drug response prediction, but we hypothesize that proteomics or phosphoproteomics might be more suitable as they give a more direct insight into cellular processes. However, there has not yet been a systematic comparison between all three of these datatypes using consistent evaluation criteria.

Results:

Due to the limited number of cell lines with phosphoproteomics profiles, we use learning curves, plots of predictive performance as a function of dataset size, to compare the current performance and predict the future performance of the three omics datasets with more data. We use neural networks and XGBoost and compare them against a simple rule-based benchmark. We show that phosphoproteomics slightly outperforms RNA-seq and proteomics using the 38 cell lines with profiles of all three omics data types. Furthermore, using the 877 cell lines with proteomics and RNA-seq profiles, we show that RNA-seq slightly outperforms proteomics. With the learning curves we predict that the mean squared error using the phosphoproteomics dataset would decrease by ∼15% if a dataset of the same size as the proteomics/transcriptomics dataset were collected. For the cell lines with proteomics and RNA-seq profiles, the learning curves reveal that neural networks outperform XGBoost for smaller dataset sizes and vice versa for larger datasets. Furthermore, the trajectory of the XGBoost curve suggests that it will improve faster than the neural networks as more data are collected.

Availability and implementation:

See https://github.com/Nik-BB/Learning-curves-for-DRP for the code used.
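
The learning-curve extrapolation described under Results can be illustrated with a minimal sketch. It assumes the error follows an inverse power law in the dataset size, a common choice for learning curves that is not necessarily the functional form used by the authors. The function names and the data points below are hypothetical and synthetic; they are not values from the paper.

```python
import numpy as np


def fit_power_law(sizes, errors):
    """Fit errors = a * sizes**(-b) by least squares in log-log space.

    Assumption: a pure power law with no irreducible-error term.
    """
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), deg=1)
    return float(np.exp(intercept)), float(-slope)


def predict_error(a, b, size):
    """Evaluate the fitted curve at a given dataset size."""
    return a * size ** (-b)


# Synthetic learning-curve points (illustrative only, NOT results from the paper).
sizes = np.array([10, 15, 20, 25, 30, 38], dtype=float)
errors = 2.0 * sizes ** (-0.05)

a, b = fit_power_law(sizes, errors)

# Extrapolate from the largest observed size to a larger, hypothetical one.
current = predict_error(a, b, 38)
projected = predict_error(a, b, 877)
relative_drop = 1.0 - projected / current

print(f"a={a:.3f}, b={b:.3f}, projected relative drop in MSE={relative_drop:.1%}")
```

In practice, each point on the curve would be the mean test error over repeated train/test splits at that training-set size, and the extrapolation is only as reliable as the assumed functional form.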

Full text: 1 | Collection: 01-internacional | Database: MEDLINE | Type of study: Prognostic_studies / Risk_factors_studies | Language: En | Journal: Bioinform Adv | Year: 2024 | Document type: Article | Affiliation country: United Kingdom | Country of publication: United Kingdom
