Search | BVS Enfermagem Research Portal

Explainable multi-task learning improves the parallel estimation of polygenic risk scores for many diseases through shared genetic basis.

Badré, Adrien; Pan, Chongle.

PLoS Comput Biol ; 19(7): e1011211, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37418352

ABSTRACT

Many complex diseases share common genetic determinants and are comorbid in a population. We hypothesized that the co-occurrences of diseases and their overlapping genetic etiology can be exploited to simultaneously improve multiple diseases' polygenic risk scores (PRS). This hypothesis was tested using a multi-task learning (MTL) approach based on an explainable neural network architecture. We found that parallel estimations of the PRS for 17 prevalent cancers in a pan-cancer MTL model were generally more accurate than independent estimations for individual cancers in comparable single-task learning (STL) models. Such performance improvement conferred by positive transfer learning was also observed consistently for 60 prevalent non-cancer diseases in a pan-disease MTL model. Interpretation of the MTL models revealed significant genetic correlations between the important sets of single nucleotide polymorphisms used by the neural network for PRS estimation. This suggested a well-connected network of diseases with a shared genetic basis.
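
As a rough illustration of the multi-task idea described above, the sketch below computes one shared hidden representation from a genotype vector and passes it to several disease-specific output heads. It is not the authors' architecture: the layer sizes, the random untrained weights, and names such as `shared_layer`, `task_heads`, and `mtl_forward` are hypothetical, and no training step is shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def make_layer(n_out, n_in, rng):
    """Hypothetical helper: small random weights, zero biases."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def mtl_forward(genotype, shared_layer, task_heads):
    """Return one score in (0, 1) per task, all computed from the same hidden layer."""
    w, b = shared_layer
    hidden = [max(0.0, h) for h in dense(genotype, w, b)]  # ReLU, shared by every task
    scores = {}
    for task, (head_w, head_b) in task_heads.items():
        scores[task] = sigmoid(dense(hidden, head_w, head_b)[0])
    return scores


rng = random.Random(0)
n_snps, n_hidden = 20, 8  # assumed sizes, for illustration only
shared_layer = make_layer(n_hidden, n_snps, rng)
task_heads = {name: make_layer(1, n_hidden, rng)
              for name in ("disease_a", "disease_b", "disease_c")}
genotype = [rng.choice([0, 1, 2]) for _ in range(n_snps)]  # minor-allele counts
scores = mtl_forward(genotype, shared_layer, task_heads)
```

A single-task counterpart would give each disease its own copy of `shared_layer`. In a multi-task setting every head would update the same shared weights during training, which is one plausible route for the positive transfer the abstract reports.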

Subjects

Learning , Computer Neural Networks , Humans , Risk Factors , Multifactorial Inheritance/genetics , Single Nucleotide Polymorphism/genetics , Genetic Predisposition to Disease/genetics

Deep neural network improves the estimation of polygenic risk scores for breast cancer.

Badré, Adrien; Zhang, Li; Muchero, Wellington; Reynolds, Justin C; Pan, Chongle.

J Hum Genet ; 66(4): 359-369, 2021 Apr.

Article in English | MEDLINE | ID: mdl-33009504

ABSTRACT

Polygenic risk scores (PRS) estimate the genetic risk of an individual for a complex disease based on many genetic variants across the whole genome. In this study, we compared a series of computational models for estimation of breast cancer PRS. A deep neural network (DNN) was found to outperform alternative machine learning techniques and established statistical algorithms, including BLUP, BayesA, and LDpred. In the test cohort with 50% prevalence, the area under the receiver operating characteristic curve (AUC) was 67.4% for DNN, 64.2% for BLUP, 64.5% for BayesA, and 62.4% for LDpred. BLUP, BayesA, and LDpred all generated PRS that followed a normal distribution in the case population. However, the PRS generated by DNN in the case population followed a bimodal distribution composed of two normal distributions with distinctly different means. This suggests that DNN was able to separate the case population into a high-genetic-risk case subpopulation with an average PRS significantly higher than the control population and a normal-genetic-risk case subpopulation with an average PRS similar to the control population. This allowed DNN to achieve 18.8% recall at 90% precision in the test cohort with 50% prevalence, which can be extrapolated to 65.4% recall at 20% precision in a general population with 12% prevalence. Interpretation of the DNN model identified salient variants that were assigned insignificant p values by association studies, but were important for DNN prediction. These variants may be associated with the phenotype through nonlinear relationships.
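
The dependence of precision on prevalence mentioned above can be made concrete with a small sketch. It assumes that, for a fixed score threshold, recall and the false-positive rate do not change when the case fraction changes; the function names `implied_fpr` and `precision_at` are hypothetical and this is not the extrapolation procedure used in the paper.

```python
def implied_fpr(recall, precision, prevalence):
    """False-positive rate implied by a recall/precision pair at a given prevalence."""
    true_pos = recall * prevalence                         # fraction of population: true positives
    false_pos = true_pos * (1.0 - precision) / precision   # fraction of population: false positives
    return false_pos / (1.0 - prevalence)


def precision_at(recall, fpr, prevalence):
    """Precision for a fixed threshold (fixed recall and FPR) at a new prevalence."""
    true_pos = recall * prevalence
    false_pos = fpr * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Operating point quoted for the test cohort (50% prevalence).
fpr_test = implied_fpr(recall=0.188, precision=0.90, prevalence=0.50)

# Same threshold, re-evaluated at the general-population prevalence of 12%.
precision_general = precision_at(recall=0.188, fpr=fpr_test, prevalence=0.12)

# False-positive rate implied by the quoted general-population operating point.
fpr_general = implied_fpr(recall=0.654, precision=0.20, prevalence=0.12)
```

Because `fpr_general` is much larger than `fpr_test`, the two quoted operating points correspond to different score thresholds: the 65.4% recall figure trades precision for recall by lowering the threshold, in addition to the shift in prevalence.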

Subjects

Tumor Biomarkers/genetics , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Genetic Predisposition to Disease , Multifactorial Inheritance , Computer Neural Networks , Single Nucleotide Polymorphism , Algorithms , Case-Control Studies , Female , Genome-Wide Association Study , Humans , Phenotype , ROC Curve , Risk Factors

LINA: A Linearizing Neural Network Architecture for Accurate First-Order and Second-Order Interpretations.

Badré, Adrien; Pan, Chongle.

IEEE Access ; 10: 36166-36176, 2022.

Article in English | MEDLINE | ID: mdl-35462722

ABSTRACT

While neural networks can provide high predictive performance, it has been a challenge to identify the salient features and important feature interactions used for their predictions. This represents a key hurdle for deploying neural networks in many biomedical applications that require interpretability, including predictive genomics. In this paper, a linearizing neural network architecture (LINA) was developed to provide both first-order and second-order interpretations at both the instance-wise and the model-wise levels. LINA combines the representational capacity of a deep inner attention neural network with a linearized intermediate representation for model interpretation. In comparison with DeepLIFT, LIME, Grad*Input and L2X, the first-order interpretation of LINA had better Spearman correlation with the ground-truth importance rankings of features in synthetic datasets. In comparison with NID and GEH, the second-order interpretation results from LINA achieved better precision for identification of the ground-truth feature interactions in synthetic datasets. These algorithms were further benchmarked using predictive genomics as a real-world application. LINA identified larger numbers of important single nucleotide polymorphisms (SNPs) and salient SNP interactions than the other algorithms at given false discovery rates. The results showed accurate and versatile model interpretation using LINA.
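
The first-order benchmark above scores an interpretation method by the Spearman correlation between its feature-importance values and a known ground-truth ranking. A minimal sketch of that metric follows; the toy vectors `ground_truth` and `attributions` are made up for illustration and are not data from the paper.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


ground_truth = [0.9, 0.1, 0.5, 0.3, 0.7]    # hypothetical true feature importances
attributions = [0.8, 0.2, 0.35, 0.4, 0.6]   # hypothetical scores from an interpreter
rho = spearman(ground_truth, attributions)
```

Only the ordering of the scores matters here, so methods whose attribution values live on different scales can still be compared on the same footing.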
