TULIP: An RNA-seq-based Primary Tumor Type Prediction Tool Using Convolutional Neural Networks.
Jones, Sara; Beyers, Matthew; Shukla, Maulik; Xia, Fangfang; Brettin, Thomas; Stevens, Rick; Weil, M Ryan; Ranganathan Ganakammal, Satishkumar.
Affiliation
  • Jones S; Frederick National Laboratory for Cancer Research, Cancer Data Science Initiatives, Cancer Research Technology Program, Rockville, MD, USA.
  • Beyers M; Frederick National Laboratory for Cancer Research, Cancer Data Science Initiatives, Cancer Research Technology Program, Rockville, MD, USA.
  • Shukla M; Argonne National Laboratory, Computing, Environment and Life Sciences, Lemont, IL, USA.
  • Xia F; Argonne National Laboratory, Computing, Environment and Life Sciences, Lemont, IL, USA.
  • Brettin T; Argonne National Laboratory, Computing, Environment and Life Sciences, Lemont, IL, USA.
  • Stevens R; Argonne National Laboratory, Computing, Environment and Life Sciences, Lemont, IL, USA.
  • Weil MR; Frederick National Laboratory for Cancer Research, Cancer Data Science Initiatives, Cancer Research Technology Program, Rockville, MD, USA.
  • Ranganathan Ganakammal S; Frederick National Laboratory for Cancer Research, Cancer Data Science Initiatives, Cancer Research Technology Program, Rockville, MD, USA.
Cancer Inform ; 21: 11769351221139491, 2022.
Article in En | MEDLINE | ID: mdl-36507076
ABSTRACT

Background:

With cancer as one of the leading causes of death worldwide, accurate primary tumor type prediction is critical in identifying genetic factors that can inhibit or slow tumor progression. Over the last several years, there have been efforts to categorize primary tumor types from gene expression data using machine learning and, more recently, deep learning.

Methods:

In this paper, we developed four 1-dimensional (1D) Convolutional Neural Network (CNN) models to classify RNA-seq count data as one of 17 highly represented primary tumor types or as one of 32 primary tumor types regardless of imbalanced representation. We adapted the models to take as input either all Ensembl genes (60,483) or protein-coding genes only (19,758). Unlike previous work, we avoided selection bias by not filtering genes based on expression values. To train and validate the models, RNA-seq count data expressed as FPKM-UQ were downloaded from the Genomic Data Commons (GDC) for 9,025 and 10,940 samples from The Cancer Genome Atlas (TCGA), corresponding to 17 and 32 primary tumor types, respectively.
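
To make the idea of a 1D CNN over a gene expression vector concrete, the following is a minimal forward-pass sketch in plain Python. It is not the TULIP architecture: the layer sizes, kernel count, pooling size, random weights, and all function names (`conv1d`, `max_pool1d`, `predict`) are assumptions chosen for illustration, and the input is a toy vector of 20 values rather than 19,758 or 60,483 genes. It only shows the general pattern of convolution, ReLU, max pooling, and a dense softmax layer that outputs one probability per tumor type.

```python
import math
import random


def conv1d(signal, kernel):
    """Valid (no padding) 1D cross-correlation of a signal with one kernel."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(expression, kernels, dense_weights, dense_bias, pool_size=2):
    """Conv -> ReLU -> max pool per kernel, then one dense softmax layer."""
    features = []
    for kernel in kernels:
        features.extend(max_pool1d(relu(conv1d(expression, kernel)), pool_size))
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(dense_weights, dense_bias)
    ]
    return softmax(logits)


# Toy setup (all sizes are hypothetical, not taken from the paper).
rng = random.Random(0)
n_genes, n_classes = 20, 17          # 17 classes mirrors the 17-type setting
kernel_size, n_kernels, pool_size = 3, 2, 2

# Random stand-in for one sample's expression values.
expression = [rng.uniform(0.0, 10.0) for _ in range(n_genes)]
# Untrained random weights; a real model would learn these from TCGA data.
kernels = [[rng.uniform(-1, 1) for _ in range(kernel_size)] for _ in range(n_kernels)]
n_features = n_kernels * ((n_genes - kernel_size + 1) // pool_size)
dense_weights = [
    [rng.uniform(-0.1, 0.1) for _ in range(n_features)] for _ in range(n_classes)
]
dense_bias = [0.0] * n_classes

probs = predict(expression, kernels, dense_weights, dense_bias, pool_size)
predicted_class = max(range(n_classes), key=probs.__getitem__)
print(f"predicted class index: {predicted_class}")
```

Switching between the 17-type and 32-type settings in a sketch like this would only change `n_classes`, and switching between gene sets would only change `n_genes`; in practice a deep learning framework would replace these hand-written loops.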

Results:

All four 1D-CNN models had an overall accuracy of 94.7% to 97.6% on the test dataset. Further evaluation indicated that the models using only protein-coding genes as features were more accurate than the models using all Ensembl genes, for both the 17 and 32 primary tumor type settings. For all models, accuracy was above 80% for most individual primary tumor types.
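
The distinction between overall accuracy and accuracy by primary tumor type matters when classes are imbalanced. The sketch below illustrates it on made-up labels; it assumes that "accuracy by primary tumor type" means the fraction of samples of each true type that were predicted correctly, which the abstract does not state explicitly. The function names and the toy predictions are hypothetical, and the labels are used only as example class names.

```python
from collections import defaultdict


def overall_accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted type matches the true type."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def accuracy_by_type(y_true, y_pred):
    """Per-type accuracy: correct predictions divided by samples of that true type."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Made-up example: two well-represented types and one rare type.
y_true = ["BRCA"] * 4 + ["LUAD"] * 4 + ["UVM"] * 2
y_pred = ["BRCA"] * 4 + ["LUAD", "LUAD", "LUAD", "LUSC"] + ["UVM", "SKCM"]

overall = overall_accuracy(y_true, y_pred)
per_type = accuracy_by_type(y_true, y_pred)

print(f"overall: {overall:.2f}")
for label, acc in sorted(per_type.items()):
    print(f"{label}: {acc:.2f}")
```

In this toy case the overall accuracy is 0.80 while the rare type reaches only 0.50, which is the kind of gap that a single overall figure can hide.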

Conclusions:

We packaged all four models as a Python-based deep learning classification tool called TULIP (TUmor CLassIfication Predictor) for performing quality control on primary tumor samples and characterizing cancer samples of unknown tumor type. Further optimization of the models is needed to improve accuracy for certain primary tumor types.
Keywords

Full text: 1 Collections: 01-international Database: MEDLINE Study type: Prognostic_studies / Risk_factors_studies Language: En Journal: Cancer Inform Publication year: 2022 Document type: Article Affiliation country: United States
