PLMSearch: Protein language model powers accurate and fast sequence search for remote homology.
Liu, Wei; Wang, Ziye; You, Ronghui; Xie, Chenghan; Wei, Hong; Xiong, Yi; Yang, Jianyi; Zhu, Shanfeng.
Affiliation
  • Liu W; Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China.
  • Wang Z; Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China.
  • You R; Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China.
  • Xie C; School of Mathematical Sciences, Fudan University, 200433, Shanghai, China.
  • Wei H; School of Mathematical Sciences, Nankai University, 300071, Tianjin, China.
  • Xiong Y; Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University, 200240, Shanghai, China.
  • Yang J; Ministry of Education Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Science, Shandong University, 266237, Qingdao, China. yangjy@sdu.edu.cn.
  • Zhu S; Institute of Science and Technology for Brain-Inspired Intelligence and MOE Frontiers Center for Brain Science, Fudan University, 200433, Shanghai, China. zhusf@fudan.edu.cn.
Nat Commun ; 15(1): 2775, 2024 Mar 30.
Article in En | MEDLINE | ID: mdl-38555371
ABSTRACT
Homologous protein search is one of the most commonly used methods for protein annotation and analysis. Compared to structure search, detecting distant evolutionary relationships from sequences alone remains challenging. Here we propose PLMSearch (Protein Language Model), a homologous protein search method with only sequences as input. PLMSearch uses deep representations from a pre-trained protein language model and trains its similarity prediction model on a large number of real structure similarities. This enables PLMSearch to capture the remote homology information concealed behind the sequences. Extensive experimental results show that PLMSearch can search millions of query-target protein pairs in seconds, as MMseqs2 does, while increasing sensitivity more than threefold, and that it is comparable to state-of-the-art structure search methods. In particular, unlike traditional sequence search methods, PLMSearch can recall most remote homology pairs with dissimilar sequences but similar structures. PLMSearch is freely available at https://dmiip.sjtu.edu.cn/PLMSearch.
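The general idea of searching by learned representations can be illustrated with a minimal sketch. This is not the PLMSearch implementation: the short toy vectors stand in for protein language model embeddings, plain cosine similarity stands in for the trained similarity prediction model, and the names `cosine_similarity`, `rank_targets`, and the target labels are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_targets(query_embedding, target_embeddings, top_k=3):
    """Score every target against the query and return the top_k (name, score) pairs."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in target_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors (assumption: real embeddings would come from a
# pre-trained protein language model and have far more dimensions).
targets = {
    "target_A": [0.9, 0.1, 0.0],
    "target_B": [0.0, 1.0, 0.0],
    "target_C": [0.7, 0.7, 0.1],
}
query = [1.0, 0.0, 0.0]

ranking = rank_targets(query, targets, top_k=2)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

Because each protein is reduced to a fixed-length vector, scoring a query against a target costs a handful of arithmetic operations regardless of sequence length, which is one plausible reason representation-based search can handle many query-target pairs quickly.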
Subject(s)

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Proteins / Biological Evolution Language: En Journal: Nat Commun Journal subject: BIOLOGY / SCIENCE Year: 2024 Document type: Article Affiliation country: China
