Machine learning pipeline to analyze clinical and proteomics data: experiences on a prostate cancer case.

Vizza, Patrizia; Aracri, Federica; Guzzi, Pietro Hiram; Gaspari, Marco; Veltri, Pierangelo; Tradigo, Giuseppe

Affiliations

Vizza P; Department of Surgical and Medical Sciences, Magna Græcia University, 88100, Catanzaro, Italy.
Aracri F; Department of Surgical and Medical Sciences, Magna Græcia University, 88100, Catanzaro, Italy. federica.aracri@unicz.it.
Guzzi PH; Department of Surgical and Medical Sciences, Magna Græcia University, 88100, Catanzaro, Italy.
Gaspari M; Department of Experimental and Clinical Medicine, Magna Græcia University, 88100, Catanzaro, Italy.
Veltri P; Department of Computers, Modeling, Electronics and Systems Engineering, University of Calabria, 87036, Rende, Italy.
Tradigo G; Department of Theoretical and Applied Sciences, eCampus University, 22060, Novedrate, CO, Italy.

BMC Med Inform Decis Mak ; 24(1): 93, 2024 Apr 08.

Article in English | MEDLINE | ID: mdl-38584282

ABSTRACT

Proteomics-based analysis is used to identify biomarkers in blood samples and tissues. Data produced by instruments such as mass spectrometers require dedicated platforms to identify and quantify proteins (or peptides). Clinical information can be related to mass spectrometry data to identify diseases at an early stage. Machine learning techniques can support physicians and biologists in studying and classifying pathologies. We present the application of machine learning techniques to define a pipeline for studying and classifying proteomics data enriched with clinical information. The pipeline allows users to relate established blood biomarkers to clinical parameters and proteomics data. The proposed pipeline comprises three main phases: (i) feature selection, (ii) model training, and (iii) model ensembling. We report the experience of applying this pipeline to prostate-related diseases. Models were trained on several biological datasets. We report experimental results on two datasets that result from integrating clinical and mass spectrometry-based data in the contexts of serum and urine analysis. The pipeline receives input data from blood analytes, tissue samples, proteomic analysis, and urine biomarkers. It then trains different models for feature selection, classification, and voting. The pipeline was applied to two datasets obtained in a two-year research project that aimed to extract hidden information from mass spectrometry, serum, and urine samples from hundreds of patients. We report results on a serum dataset with 143 samples, including 79 prostate cancer (PCa) and 84 benign prostatic hyperplasia (BPH) patients, and a urine dataset with 121 samples, including 67 PCa and 54 BPH patients. The pipeline identified peptides of interest in both datasets: 6 in the serum dataset and 2 in the urine dataset.
The best models for the serum (AUC=0.87, Accuracy=0.83, F1=0.81, Sensitivity=0.84, Specificity=0.81) and urine (AUC=0.88, Accuracy=0.83, F1=0.83, Sensitivity=0.85, Specificity=0.80) datasets both showed good predictive performance. We made the pipeline code available on GitHub, and we are confident that it will be successfully adopted in similar clinical settings.
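As a rough illustration of the ensembling phase and of the metrics reported above, the following minimal sketch combines the binary predictions of three classifiers by majority (hard) voting and then computes accuracy, sensitivity, specificity, and F1. It is not the authors' GitHub code: the function names, the label encoding (1 = PCa, 0 = BPH), and the toy predictions are assumptions made for illustration only, and the abstract does not state which voting scheme the pipeline uses.

```python
from collections import Counter


def majority_vote(per_model_predictions):
    """Hard voting: return the most frequent label per sample across models."""
    voted = []
    for sample_votes in zip(*per_model_predictions):
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity, and F1 from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical toy data: 1 = PCa, 0 = BPH (assumed encoding, not from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 0, 1, 0, 0, 1, 0]
model_b = [1, 0, 1, 1, 0, 1, 0, 0]
model_c = [1, 1, 1, 0, 0, 0, 1, 0]

voted = majority_vote([model_a, model_b, model_c])
metrics = binary_metrics(y_true, voted)
print(voted)
print(metrics)
```

Using an odd number of models with two classes avoids ties in the vote. AUC is omitted here because it needs predicted scores rather than hard labels.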

Subjects

Prostatic Hyperplasia; Prostatic Neoplasms; Male; Humans; Proteomics; Prostate; Prostatic Neoplasms/diagnosis; Machine Learning; Biomarkers; Peptides

Keywords

Biological pipeline; Data enhancing; Machine learning; Prostate cancer

Full text: 1 | Collections: 01-international | Database: MEDLINE | Main subject: Prostatic Hyperplasia / Prostatic Neoplasms | Limits: Humans / Male | Language: English | Journal: BMC Med Inform Decis Mak | Publication year: 2024 | Document type: Article
