Search | VHL Regional Portal

Improving five-year survival prediction via multitask learning across HPV-related cancers.

Goncalves, Andre; Soper, Braden; Nygård, Mari; Nygård, Jan F; Ray, Priyadip; Widemann, David; Sales, Ana Paula.

PLoS One ; 15(11): e0241225, 2020.

Article in English | MEDLINE | ID: mdl-33196642

ABSTRACT

Oncology is a highly siloed field of research in which sub-disciplinary specialization has limited the amount of information shared between researchers of distinct cancer types. This can be attributed to legitimate differences in the physiology and carcinogenesis of cancers affecting distinct anatomical sites. However, underlying processes that are shared across seemingly disparate cancers probably affect prognosis. The objective of the current study is to investigate whether multitask learning improves 5-year survival prediction for cancer patients by leveraging information across anatomically distinct HPV-related cancers. Data were obtained from the Surveillance, Epidemiology, and End Results (SEER) program database. The study cohort consisted of 29,768 primary cancer cases diagnosed in the United States between 2004 and 2015. Ten different cancer diagnoses were selected, all with a known association with HPV risk. In the analysis, the cancer diagnoses were categorized into three distinct topography groupings of varying specificity. The most specific topography grouping consisted of the 10 original cancer diagnoses, differentiated by the first two digits of the ICD-O-3 topography code. The second topography grouping consisted of cancer diagnoses categorized into six distinct organ groups. Finally, the third topography grouping consisted of just two groups, head-neck cancers and ano-genital cancers. The tasks were to predict 5-year survival for patients within the different topography groups using 14 predictive features selected from the descriptive variables available in the SEER database.
The information from the predictive features was shared between tasks in three different ways, resulting in three distinct predictive models: 1) Information was not shared between patients assigned to different tasks (single-task learning); 2) Information was shared between all patients, regardless of task (pooled model); 3) Only relevant information was shared between patients assigned to different tasks (multitask learning). Prediction performance was evaluated with Brier scores. All three models were evaluated against one another on each of the three distinct topography-defined tasks. The results showed that multitask classifiers achieved relative improvement for the majority of the scenarios studied compared to single-task learning and pooled baseline methods. In this study, we have demonstrated that sharing information among anatomically distinct cancer types can lead to improved predictive survival models.
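
The three information-sharing schemes and the Brier score evaluation described in this abstract can be illustrated with a small sketch. This is not the authors' model: it ignores the 14 predictive features and predicts only a per-task survival rate, and a fixed blend between the task-specific and pooled rates stands in for multitask learning. The function names, the `share` parameter, and the toy data are all assumptions made for illustration. The Brier score is the mean squared difference between the predicted probability and the observed 0/1 outcome, so lower is better.

```python
from collections import defaultdict


def brier_score(y_true, p_pred):
    """Mean squared gap between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(p_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def fit_rates(tasks, outcomes, share):
    """Estimate one survival rate per task (hypothetical helper).

    share = 0.0     -> single-task: each task uses only its own patients
    share = 1.0     -> pooled: every task uses the overall rate
    0 < share < 1   -> partial sharing, a crude stand-in for multitask learning
    """
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must lie in [0, 1]")
    pooled_rate = sum(outcomes) / len(outcomes)
    totals = defaultdict(int)
    counts = defaultdict(int)
    for task, y in zip(tasks, outcomes):
        totals[task] += y
        counts[task] += 1
    return {
        task: (1.0 - share) * totals[task] / counts[task] + share * pooled_rate
        for task in counts
    }


def evaluate(rates, tasks, outcomes):
    """Brier score of per-task rate predictions on held-out patients."""
    predictions = [rates[task] for task in tasks]
    return brier_score(outcomes, predictions)


# Toy data invented for this sketch (1 = alive at 5 years, 0 = not).
train_tasks = ["head-neck"] * 4 + ["ano-genital"] * 4
train_y = [1, 1, 1, 0, 1, 0, 0, 0]
test_tasks = ["head-neck", "head-neck", "ano-genital", "ano-genital"]
test_y = [1, 0, 0, 1]

schemes = [("single-task", 0.0), ("pooled", 1.0), ("partial sharing", 0.5)]
results = {
    name: evaluate(fit_rates(train_tasks, train_y, share), test_tasks, test_y)
    for name, share in schemes
}
for name, score in results.items():
    print(f"{name}: Brier = {score:.4f}")
```

In a real multitask model the amount and kind of sharing would be learned from the data rather than fixed by hand, which is the point of the third scheme in the abstract.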

Subjects

Learning; Multitasking Behavior; Neoplasms/mortality; Neoplasms/virology; Papillomavirus Infections/mortality; Adult; Aged; Aged, 80 and over; Algorithms; Cohort Studies; Female; Humans; Male; Middle Aged; SEER Program; Sample Size; Survival Analysis; Young Adult

Bayesian multitask learning regression for heterogeneous patient cohorts.

Goncalves, Andre; Ray, Priyadip; Soper, Braden; Widemann, David; Nygård, Mari; Nygård, Jan F; Sales, Ana Paula.

J Biomed Inform ; 100S: 100059, 2019.

Article in English | MEDLINE | ID: mdl-34384572

ABSTRACT

Multitask learning (MTL) leverages commonalities across related tasks with the aim of improving individual task performance. A key modeling choice in designing MTL models is the structure of the tasks' relatedness, which may not be known. Here we propose a Bayesian multitask learning model that is able to infer the task relationship structure directly from the data. We present two variations of the model that differ in the a priori information about task relatedness. First, a diffuse Wishart prior is placed on a task precision matrix so that all tasks are assumed to be equally related a priori. Second, a Bayesian graphical LASSO prior is used on the task precision matrix to impose sparsity in the task relatedness. Motivated by machine learning applications in the biomedical domain, we emphasize interpretability and uncertainty quantification in our models. To encourage model interpretability, linear mappings from the shared input spaces to task-dependent output spaces are used. To encourage uncertainty quantification, conjugate priors are used so that full posterior inference is possible. Using synthetic data, we show that our model is able to recover the underlying task relationships as well as features jointly relevant for all tasks. We demonstrate the utility of our model on three distinct biomedical applications: Alzheimer's disease progression, Parkinson's disease assessment, and cervical cancer screening compliance. We show that our model outperforms single-task learning (STL) models in terms of predictive performance, and performs better than existing MTL methods for the majority of the scenarios.
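
To make concrete what a sparse task precision matrix says about task relatedness, the sketch below converts a precision matrix into partial correlations and lists the task pairs that remain directly related. It does not reproduce the paper's inference procedure; the helper names and the 3-task matrix are hypothetical. It relies only on the standard identity that the partial correlation between tasks i and j is the negated off-diagonal precision entry divided by the square root of the product of the two diagonal entries, so a zero entry means the two tasks are conditionally independent given the others.

```python
import math


def partial_correlations(precision):
    """Convert a precision matrix (list of lists) to partial correlations."""
    n = len(precision)
    if any(len(row) != n for row in precision):
        raise ValueError("precision matrix must be square")
    if any(precision[i][i] <= 0 for i in range(n)):
        raise ValueError("diagonal entries must be positive")
    result = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                result[i][j] = -precision[i][j] / scale
    return result


def related_pairs(precision, tol=1e-8):
    """List (i, j, partial correlation) for task pairs with a nonzero link."""
    pcorr = partial_correlations(precision)
    n = len(precision)
    return [
        (i, j, pcorr[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if abs(pcorr[i][j]) > tol
    ]


# Hypothetical precision matrix for three tasks: tasks 0 and 1 are linked,
# task 2 is conditionally independent of both (zero off-diagonal entries).
omega = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]

pairs = related_pairs(omega)
for i, j, rho in pairs:
    print(f"tasks {i} and {j}: partial correlation {rho:.2f}")
```

A sparsity-inducing prior such as the graphical LASSO prior mentioned in the abstract pushes many off-diagonal entries toward zero, so only a few task pairs would survive a filter like `related_pairs`.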
