Improved prediction of smoking status via isoform-aware RNA-seq deep learning models.
Wang, Zifeng; Masoomi, Aria; Xu, Zhonghui; Boueiz, Adel; Lee, Sool; Zhao, Tingting; Bowler, Russell; Cho, Michael; Silverman, Edwin K; Hersh, Craig; Dy, Jennifer; Castaldi, Peter J.
Affiliation
  • Wang Z; Department of ECE, Northeastern University, Boston, Massachusetts, United States.
  • Masoomi A; Department of ECE, Northeastern University, Boston, Massachusetts, United States.
  • Xu Z; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States.
  • Boueiz A; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States.
  • Lee S; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States.
  • Zhao T; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States.
  • Bowler R; Department of ECE, Northeastern University, Boston, Massachusetts, United States.
  • Cho M; Division of Pulmonary and Critical Care Medicine, National Jewish Health, Denver, Colorado, United States.
  • Silverman EK; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States.
  • Hersh C; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States.
  • Dy J; Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States.
  • Castaldi PJ; Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, Massachusetts, United States.
PLoS Comput Biol; 17(10): e1009433, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34634029
ABSTRACT
Most predictive models based on gene expression data do not leverage information related to gene splicing, despite the fact that splicing is a fundamental feature of eukaryotic gene expression. Cigarette smoking is an important environmental risk factor for many diseases, and it has profound effects on gene expression. Using smoking status as a prediction target, we developed deep neural network predictive models using gene, exon, and isoform level quantifications from RNA sequencing data in 2,557 subjects in the COPDGene Study. We observed that models using exon and isoform quantifications clearly outperformed gene-level models when using data from 5 genes from a previously published prediction model. Whereas the test set performance of the previously published model was 0.82 in the original publication, our exon-based models including an exon-to-isoform mapping layer achieved a test set AUC (area under the receiver operating characteristic curve) of 0.88, which improved to an AUC of 0.94 using exon quantifications from a larger set of genes. Isoform variability is an important source of latent information in RNA-seq data that can be used to improve clinical prediction models.
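The abstract does not describe how the exon-to-isoform mapping layer is built, so the following is only a minimal sketch of one plausible reading: a linear layer whose weights are masked by a binary exon-to-isoform annotation, so that each isoform unit only receives input from the exons it contains. The function names, the toy annotation `MASK`, and the weights `WEIGHTS` are all hypothetical and are not taken from the paper. The `auc` helper computes the area under the ROC curve as the fraction of (positive, negative) pairs that are ranked correctly, with ties counted as half.

```python
def exon_to_isoform_layer(exon_values, weights, mask):
    """Masked linear layer (assumed design, not the authors' code).

    Isoform j receives input only from exons i where mask[j][i] == 1;
    all other weights are zeroed out by the mask.
    """
    isoform_values = []
    for w_row, m_row in zip(weights, mask):
        total = sum(w * m * x for w, m, x in zip(w_row, m_row, exon_values))
        isoform_values.append(total)
    return isoform_values


def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy gene with 4 exons and 2 isoforms.
# Isoform 0 contains exons 0, 1, 3; isoform 1 contains exons 0, 2, 3.
MASK = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
]
# Illustrative weights; the 9.9 entries are masked out and never used.
WEIGHTS = [
    [0.5, 1.0, 9.9, 0.5],
    [0.5, 9.9, -1.0, 0.5],
]

if __name__ == "__main__":
    exons = [1.0, 2.0, 3.0, 4.0]  # made-up exon quantifications
    print(exon_to_isoform_layer(exons, WEIGHTS, MASK))
    # Made-up labels (1 = current smoker) and model scores.
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In a trained network the masked weights would be learned and followed by nonlinear layers; the mask simply encodes prior knowledge about which exons belong to which isoform, which is one way such a layer could expose isoform-level signal from exon-level input.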
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Smoking / Models, Statistical / Deep Learning / RNA-Seq Type of study: Prognostic_studies / Risk_factors_studies Limits: Aged / Female / Humans / Male / Middle aged Language: En Journal: PLoS Comput Biol Journal subject: BIOLOGY / MEDICAL INFORMATICS Year: 2021 Document type: Article Affiliation country: United States