Transcriptomics and machine learning to advance schizophrenia genetics: A case-control study using post-mortem brain data.
Qi, Bill; Boscenco, Sonia; Ramamurthy, Janani; Trakadis, Yannis J.
Affiliation
  • Qi B; Department of Human Genetics, McGill University, Montreal, QC, Canada.
  • Boscenco S; Faculty of Science, McGill University, Montreal, QC, Canada.
  • Ramamurthy J; Faculty of Science, McGill University, Montreal, QC, Canada.
  • Trakadis YJ; Department of Human Genetics, McGill University, Montreal, QC, Canada; Department of Medical Genetics, McGill University Health Center, Montreal, QC, Canada. Electronic address: yannis.trakadis@mcgill.ca.
Comput Methods Programs Biomed ; 214: 106590, 2022 Feb.
Article in En | MEDLINE | ID: mdl-34954633
ABSTRACT
BACKGROUND AND OBJECTIVE:

Alterations of the expression of a variety of genes have been reported in patients with schizophrenia (SCZ). Moreover, machine learning (ML) analysis of gene expression microarray data has shown promising preliminary results in the study of SCZ. Our objective was to evaluate the performance of ML in classifying SCZ cases and controls based on gene expression microarray data from the dorsolateral prefrontal cortex.

METHODS:

We applied a state-of-the-art ML algorithm (XGBoost) to train and evaluate a classification model using 201 SCZ cases and 278 controls. We utilized 10-fold cross-validation for model selection and a held-out testing set to evaluate the model. The metric used to evaluate classification performance was the area under the receiver operating characteristic curve (AUC).
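As a minimal sketch of the evaluation described above (not the authors' code), the following pure-Python functions compute the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, and assign samples to class-balanced folds for cross-validation. The function names `roc_auc` and `stratified_folds`, the 0/1 label coding, and the use of stratification are assumptions made for illustration; the abstract does not state how the folds or the held-out set were drawn.

```python
import random


def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for case, 0 for control (assumed coding).
    scores: predicted probability or score, higher means "more likely case".
    A tie between a case and a control counts as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


# Toy usage with made-up labels and scores (not study data).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]
print(roc_auc(toy_labels, toy_scores))
print(stratified_folds(toy_labels, k=3))
```

In a cross-validation loop, each fold would serve once as the validation set while a model is fit on the remaining folds, and the per-fold AUC values would then be averaged.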

RESULTS:

We report an average AUC of 0.76 on 10-fold cross-validation and an AUC of 0.76 on testing data not used during training. Analysis of the rolling balanced classification accuracy from high to low prediction confidence levels showed that the accuracy for the most certain subset of predictions ranged between 80% and 90%. The ML model utilized 182 gene expression probes. Further improvement in classification performance was observed when applying an automated ML strategy to the 182 features, which achieved an AUC of 0.79 on the same testing data. We found literature evidence linking all of the top ten ML-ranked genes to SCZ. Furthermore, we leveraged information from the full set of available microarray gene expression data through univariate differential gene expression analysis. We then prioritized differentially expressed gene sets using the piano gene set analysis package. We augmented the ranking of the prioritized gene sets with genes from the complex multivariate ML model, using hypergeometric tests to identify more robust gene sets. We identified two significant Gene Ontology molecular function gene sets: "oxidoreductase activity, acting on the CH-NH2 group of donors" and "integrin binding." Lastly, we present candidate treatments for SCZ based on findings from our study.
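To illustrate the kind of hypergeometric test mentioned above, here is a minimal sketch, assuming the standard over-representation setup: from a universe of N genes, of which K belong to a gene set, n genes are selected by the ML model and k of them fall in the set. The function name `hypergeom_sf` and all numbers in the usage example are hypothetical; the abstract does not report the gene universe size, the gene set sizes, or the overlaps used by the authors.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Upper-tail probability P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: size of the gene universe
    K: number of genes in the gene set
    n: number of genes selected (e.g. by the ML model)
    k: observed overlap between the selection and the gene set
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    lo = max(k, 0)
    hi = min(K, n)
    total = comb(N, n)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible overlaps contribute nothing to the sum.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(lo, hi + 1))
    return favourable / total


# Hypothetical numbers for illustration only (not taken from the study).
p_value = hypergeom_sf(k=4, N=20000, K=100, n=150)
print(p_value)
```

A small upper-tail probability indicates that the selected genes overlap the gene set more than would be expected by chance under random sampling without replacement.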

CONCLUSIONS:

Overall, we observed above-chance performance from ML classification of SCZ cases and controls based on brain gene expression microarray data, and found that ML analysis of gene expressions could further our understanding of the pathophysiology of SCZ and help identify novel treatments.

Full text: 1 Database: MEDLINE Main subject: Schizophrenia Language: En Publication year: 2022 Document type: Article