An Explainable Deep Learning Classifier of Bovine Mastitis Based on Whole-Genome Sequence Data - Circumventing the p >> n Problem.
Kotlarz, Krzysztof; Mielczarek, Magda; Biecek, Przemyslaw; Wojdak-Maksymiec, Katarzyna; Suchocki, Tomasz; Topolski, Piotr; Jagusiak, Wojciech; Szyda, Joanna.
Affiliation
  • Kotlarz K; Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631 Wroclaw, Poland.
  • Mielczarek M; University Cancer Diagnostic Center, Poznan University of Medical Science, 61-701 Poznan, Poland.
  • Biecek P; Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631 Wroclaw, Poland.
  • Wojdak-Maksymiec K; University Cancer Diagnostic Center, Poznan University of Medical Science, 61-701 Poznan, Poland.
  • Suchocki T; Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland.
  • Topolski P; Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662 Warsaw, Poland.
  • Jagusiak W; Department of Genetics and Animal Breeding, West Pomeranian University of Technology, Aleja Piastow 45, 70-311 Szczecin, Poland.
  • Szyda J; Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631 Wroclaw, Poland.
Int J Mol Sci; 25(9), 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38731932
ABSTRACT
The serious drawback underlying the biological annotation of whole-genome sequence data is the p >> n problem, which means that the number of polymorphic variants (p) is much larger than the number of available phenotypic records (n). We propose a way to circumvent the problem by combining a LASSO logistic regression with deep learning to classify cows as susceptible or resistant to mastitis, based on single nucleotide polymorphism (SNP) genotypes. Among several architectures, the one with 204,642 SNPs was selected as the best. This architecture was composed of two layers with, respectively, 7 and 46 units per layer implementing respective drop-out rates of 0.210 and 0.358. The classification of the test data resulted in AUC = 0.750, accuracy = 0.650, sensitivity = 0.600, and specificity = 0.700. Significant SNPs were selected based on the SHapley Additive exPlanation (SHAP). As a final result, one GO term related to the biological process and thirteen GO terms related to molecular function were significantly enriched in the gene set that corresponded to the significant SNPs. Our findings revealed that the optimal approach can correctly predict susceptibility or resistance status for approximately 65% of cows. Genes marked by the most significant SNPs are related to the immune response and protein synthesis.
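The three threshold-based metrics reported in the abstract are linked by simple arithmetic. The sketch below is only an illustration: the confusion-matrix counts are hypothetical and are not taken from the paper, and the function name `classification_metrics` is an assumption introduced here. It shows that, when the test set contains equal numbers of susceptible and resistant cows, accuracy equals the mean of sensitivity and specificity, which agrees with the reported values: (0.600 + 0.700) / 2 = 0.650.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of susceptible cows classified as susceptible
    specificity = tn / (tn + fp)  # share of resistant cows classified as resistant
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all cows classified correctly
    return accuracy, sensitivity, specificity


# Hypothetical counts, NOT from the paper: 10 susceptible and 10 resistant test cows.
accuracy, sensitivity, specificity = classification_metrics(tp=6, fn=4, tn=7, fp=3)
print(round(accuracy, 3), round(sensitivity, 3), round(specificity, 3))
```

With unbalanced classes the identity no longer holds, so accuracy alone would not reveal how sensitivity and specificity trade off; that is one reason to report all three alongside the AUC.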

Full text: 1 | Database: MEDLINE | Main subject: Polymorphism, Single Nucleotide / Whole Genome Sequencing / Deep Learning / Mastitis, Bovine | Limits: Animals | Language: En | Publication year: 2024 | Document type: Article
