Your browser doesn't support javascript.
loading
Detecting anomalous proteins using deep representations.
Michael-Pitschaze, Tomer; Cohen, Niv; Ofer, Dan; Hoshen, Yedid; Linial, Michal.
Affiliation
  • Michael-Pitschaze T; The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
  • Cohen N; The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
  • Ofer D; Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel.
  • Hoshen Y; The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
  • Linial M; Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel.
NAR Genom Bioinform ; 6(1): lqae021, 2024 Mar.
Article in En | MEDLINE | ID: mdl-38486884
ABSTRACT
Many advances in biomedicine can be attributed to identifying unusual proteins and genes. Many of these proteins' unique properties were discovered by manual inspection, which is becoming infeasible at the scale of modern protein datasets. Here, we propose to tackle this challenge using anomaly detection methods that automatically identify unexpected properties. We adopt a state-of-the-art anomaly detection paradigm from computer vision, to highlight unusual proteins. We generate meaningful representations without labeled inputs, using pretrained deep neural network models. We apply these protein language models (pLM) to detect anomalies in function, phylogenetic families, and segmentation tasks. We compute protein anomaly scores to highlight human prion-like proteins, distinguish viral proteins from their host proteome, and mark non-classical ion/metal binding proteins and enzymes. Other tasks concern segmentation of protein sequences into folded and unstructured regions. We provide candidates for rare functionality (e.g. prion proteins). Additionally, we show the anomaly score is useful in 3D folding-related segmentation. Our novel method shows improved performance over strong baselines and has objectively high performance across a variety of tasks. We conclude that the combination of pLM and anomaly detection techniques is a valid method for discovering a range of global and local protein characteristics.

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: NAR Genom Bioinform Year: 2024 Document type: Article

Full text: 1 Collection: 01-internacional Database: MEDLINE Language: En Journal: NAR Genom Bioinform Year: 2024 Document type: Article