A representation and deep learning model for annotating ubiquitylation sentences stating E3 ligase - substrate interaction.
Luo, Mengqi; Li, Zhongyan; Li, Shangfu; Lee, Tzong-Yi.
Affiliation
  • Luo M; Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China.
  • Li Z; School of Life Sciences, University of Science and Technology of China, Hefei, China.
  • Li S; School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, China.
  • Lee TY; Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, China.
BMC Bioinformatics ; 22(1): 507, 2021 Oct 18.
Article in English | MEDLINE | ID: mdl-34663215
BACKGROUND: Ubiquitylation is an important post-translational modification of proteins that not only plays a central role in cellular coding but is also closely associated with the development of a variety of diseases. The specific selection of substrates by E3 ligases is the key step in ubiquitylation. As high-throughput analytical techniques continue to be applied to the study of ubiquitylation, large amounts of ubiquitylation site data and records of E3-substrate interactions continue to be generated. Biomedical literature is an important vehicle for information on E3-substrate interactions and related new discoveries, as well as an important channel through which researchers obtain up-to-date data. The rapid growth of ubiquitylation-related literature makes it difficult for researchers to acquire and analyze this information. Automatic annotation of E3-substrate interaction sentences in the available literature is therefore urgently needed.
RESULTS: We propose a model that combines representation learning with attention-based deep learning to automatically annotate E3-substrate interaction sentences in biomedical literature. Focusing on sentences that mention an E3 protein, we applied several natural language processing methods and a Long Short-Term Memory (LSTM)-based deep learning classifier to train the model. Experimental results demonstrate the effectiveness of the proposed model, and the attention-based deep learning method outperforms other statistical machine learning methods. To construct the model, we also created a manually annotated corpus of E3-substrate interaction sentences in which the E3 proteins and substrate proteins are labeled. The corpus and model should be a useful resource for advancing ubiquitylation-related research.
CONCLUSION: Having the entire manual corpus of E3-substrate interaction sentences readily available in electronic form will greatly facilitate subsequent text mining and machine learning analyses. Automatic annotation of ubiquitylation sentences stating E3 ligase-substrate interactions benefits significantly from semantic representation and deep learning. The model enables rapid access to information and can assist in further screening of key ubiquitylation ligase substrates for in-depth studies.

Full text: 1 | Databases: MEDLINE | Main subject: Ubiquitin-Protein Ligases / Deep Learning | Language: En | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2021 | Document type: Article | Affiliation country: China
