Machine-Learning-Based Late Fusion on Multi-Omics and Multi-Scale Data for Non-Small-Cell Lung Cancer Diagnosis.

Carrillo-Perez, Francisco; Morales, Juan Carlos; Castillo-Secilla, Daniel; Gevaert, Olivier; Rojas, Ignacio; Herrera, Luis Javier.

J Pers Med ; 12(4), 2022 Apr 08.

Article in English | MEDLINE | ID: mdl-35455716

ABSTRACT

Differentiating between non-small-cell lung cancer subtypes is crucial for providing effective treatment to the patient. For this purpose, machine learning techniques have been applied in recent years to the biological data available from patients. However, in most cases the problem has been treated with a single-modality approach, without exploring the potential of the multi-scale and multi-omic nature of cancer data for classification. In this work, we study the fusion of five multi-scale and multi-omic modalities (RNA-Seq, miRNA-Seq, whole-slide imaging, copy number variation, and DNA methylation) using a late fusion strategy and machine learning techniques. We train an independent machine learning model for each modality and explore the interactions and gains obtained by fusing their outputs incrementally, using a novel optimization approach to compute the parameters of the late fusion. The final classification model, using all modalities, obtains an F1 score of 96.81±1.07, an AUC of 0.993±0.004, and an AUPRC of 0.980±0.016, improving on the results obtained by each independent model and on those reported in the literature for this problem. These results show that leveraging the multi-scale and multi-omic nature of cancer data can enhance the performance of single-modality clinical decision support systems in personalized medicine, consequently improving the diagnosis of the patient.
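
As a rough illustration of the late fusion idea described in this abstract, the sketch below combines the class probabilities produced by independent per-modality models through a weighted average. It is a minimal sketch under stated assumptions, not the authors' method: the function names (late_fusion, predict_class), the modality keys, the probabilities, and the hand-picked weights are all hypothetical, and the abstract's optimization procedure for computing the fusion parameters is not reproduced here.

```python
from typing import Dict, List, Sequence


def late_fusion(
    probs_by_modality: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of per-modality class probabilities for one sample.

    Hypothetical helper: the weights are supplied by the caller and are
    renormalised so that they sum to 1 over the modalities provided.
    """
    if not probs_by_modality:
        raise ValueError("at least one modality is required")
    total = sum(weights[m] for m in probs_by_modality)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(next(iter(probs_by_modality.values())))
    fused = [0.0] * n_classes
    for modality, probs in probs_by_modality.items():
        if len(probs) != n_classes:
            raise ValueError("all modalities must score the same classes")
        w = weights[modality] / total  # normalised weight of this modality
        for k, p in enumerate(probs):
            fused[k] += w * p
    return fused


def predict_class(fused: Sequence[float]) -> int:
    """Index of the class with the highest fused probability."""
    return max(range(len(fused)), key=lambda k: fused[k])


# Illustrative values only (not taken from the paper).
example_probs = {
    "rna_seq": [0.7, 0.2, 0.1],
    "wsi": [0.3, 0.6, 0.1],
}
example_weights = {"rna_seq": 0.6, "wsi": 0.4}

fused_example = late_fusion(example_probs, example_weights)
print(predict_class(fused_example))  # prints 0
```

Because each model is trained separately and only the output probabilities are combined, a modality can be added or dropped by changing the dictionary entries, which is one reason late fusion suits an incremental study of modality combinations.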

Non-small-cell lung cancer classification via RNA-Seq and histology imaging probability fusion.

Carrillo-Perez, Francisco; Morales, Juan Carlos; Castillo-Secilla, Daniel; Molina-Castro, Yésica; Guillén, Alberto; Rojas, Ignacio; Herrera, Luis Javier.

BMC Bioinformatics ; 22(1): 454, 2021 Sep 22.

Article in English | MEDLINE | ID: mdl-34551733

ABSTRACT

BACKGROUND: Adenocarcinoma and squamous cell carcinoma are the two most prevalent lung cancer types, and distinguishing them requires different screenings, such as the visual inspection of histology slides by an expert pathologist, the analysis of gene expression, or computed tomography scans, among others. In recent years, a growing amount of biological data has been gathered for decision support systems in diagnosis (e.g. histology imaging, next-generation sequencing data, clinical information, etc.). Using all these sources to design integrative classification approaches may improve the final diagnosis of a patient, in the same way that doctors can use multiple types of screenings to reach a final decision on the diagnosis. In this work, we present a late fusion classification model using histology and RNA-Seq data for adenocarcinoma, squamous-cell carcinoma and healthy lung tissue. RESULTS: The classification model improves on the results obtained with each source of information separately, reducing the diagnosis error rate by up to 64% relative to the histology-only classifier and by up to 24% relative to the gene-expression-only classifier, and reaching a mean F1-Score of 95.19% and a mean AUC of 0.991. CONCLUSIONS: These findings suggest that a classification model using a late fusion methodology can considerably help clinicians distinguish between the aforementioned lung cancer subtypes, compared with using each source of information separately. This approach can also be applied to any cancer type or disease with heterogeneous sources of information.
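
The error-rate reductions quoted in the RESULTS are relative figures. The sketch below shows one common way to compute such a figure, assuming the reduction is measured as the drop in error divided by the baseline error; the abstract does not state the exact definition used, and the function name and the numbers in the example are hypothetical.

```python
def relative_error_reduction(baseline_error: float, fused_error: float) -> float:
    """Fraction of the baseline error removed by the fused model.

    Assumes both errors are rates in [0, 1] and the baseline error is
    non-zero. This definition is an assumption, not taken from the paper.
    """
    if not 0.0 < baseline_error <= 1.0:
        raise ValueError("baseline_error must be in (0, 1]")
    if not 0.0 <= fused_error <= 1.0:
        raise ValueError("fused_error must be in [0, 1]")
    return (baseline_error - fused_error) / baseline_error


# Hypothetical numbers: a baseline error of 10% falling to 3.6%
# corresponds to a relative reduction of about 64%.
reduction = relative_error_reduction(0.10, 0.036)
print(f"{reduction:.0%}")
```

Under this definition, a large relative reduction can correspond to a small absolute change when the baseline error is already low, so both figures are worth reading together.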

Subject(s)

Adenocarcinoma , Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , Carcinoma, Non-Small-Cell Lung/diagnostic imaging , Carcinoma, Non-Small-Cell Lung/genetics , Humans , Lung Neoplasms/genetics , Probability , RNA-Seq

KnowSeq R-Bioc package: The automatic smart gene expression tool for retrieving relevant biological knowledge.

Castillo-Secilla, Daniel; Gálvez, Juan Manuel; Carrillo-Perez, Francisco; Verona-Almeida, Marta; Redondo-Sánchez, Daniel; Ortuno, Francisco Manuel; Herrera, Luis Javier; Rojas, Ignacio.

Comput Biol Med ; 133: 104387, 2021 06.

Article in English | MEDLINE | ID: mdl-33872966

ABSTRACT

The KnowSeq R/Bioc package is designed as powerful, scalable and modular software focused on automating and assembling renowned bioinformatic tools with new features and functionalities. It comprises a unified environment to perform complex gene expression analyses, covering all the processing steps needed to identify a gene signature for a specific disease and to gather understandable knowledge. This process may be initiated from raw files either available at well-known platforms or provided by the users themselves, and in either case coming from different information sources and different transcriptomic technologies. The pipeline makes use of a set of advanced algorithms, including the adaptation of a novel procedure for the selection of the most representative genes in a given multiclass problem. An intelligent system able to classify new patients is also embedded, letting the user choose among a number of well-known and widespread classification and feature selection methods in bioinformatics. Furthermore, KnowSeq is engineered to automatically generate a complete and detailed HTML report of the whole process, which is also modular and scalable. Biclass breast cancer and multiclass lung cancer case studies were addressed to rigorously assess the usability and efficiency of KnowSeq. The models built using the differentially expressed genes obtained from both experiments reach high classification rates. Furthermore, biological knowledge was extracted in terms of Gene Ontologies, pathways and related diseases with the aim of helping the expert in the decision-making process. KnowSeq is available at Bioconductor (https://bioconductor.org/packages/KnowSeq), GitHub (https://github.com/CasedUgr/KnowSeq) and Docker (https://hub.docker.com/r/casedugr/knowseq).

Subject(s)

Computational Biology , Software , Algorithms , Humans , Transcriptome

Towards Improving Skin Cancer Diagnosis by Integrating Microarray and RNA-Seq Datasets.

Galvez, Juan M; Castillo-Secilla, Daniel; Herrera, Luis J; Valenzuela, Olga; Caba, Octavio; Prados, Jose C; Ortuno, Francisco M; Rojas, Ignacio.

IEEE J Biomed Health Inform ; 24(7): 2119-2130, 2020 07.

Article in English | MEDLINE | ID: mdl-31871000

ABSTRACT

Many clinical studies have revealed the high biological similarities existing among different skin pathological states. These similarities create difficulties in the efficient diagnosis of skin cancer, and encourage the study and design of new intelligent clinical decision support systems. In this sense, gene expression analysis can help find differentially expressed genes (DEGs) that simultaneously discern multiple skin pathological states in a single test. The integration of multiple heterogeneous transcriptomic datasets requires different pipeline stages to be properly designed: from suitable batch merging and efficient biomarker selection to automated classification assessment. This article presents a novel approach addressing all these technical issues, with the intention of providing new insights about skin cancer diagnosis. Although further efforts will have to be made in the search for better biomarkers recognizing specific skin pathological states, our study found a panel of 8 highly relevant multiclass DEGs for discerning up to 10 skin pathological states: 2 healthy skin conditions a priori, 2 cataloged precancerous skin diseases and 6 cancerous skin states. Their diagnostic power over new samples was widely tested with previously trained classification models. Robust performance metrics, namely the overall and mean multiclass F1-scores, exceeded 94% and 80%, respectively. Clinicians should give special attention to the highlighted multiclass DEGs that show high gene expression changes among them, and understand their biological relationship to different skin pathological states.

Subject(s)

Diagnosis, Computer-Assisted/methods , Gene Expression Profiling/methods , Machine Learning , RNA-Seq/methods , Skin Neoplasms/diagnosis , Biomarkers, Tumor/analysis , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Computational Biology , Humans , Skin Neoplasms/genetics , Skin Neoplasms/metabolism
