Int&in: A machine learning-based web server for active split site identification in inteins.
Schmitz, Mirko; Ballestin, Jara Ballestin; Liang, Junsheng; Tomas, Franziska; Freist, Leon; Voigt, Karsten; Di Ventura, Barbara; Öztürk, Mehmet Ali.
Affiliation
  • Schmitz M; BIOSS and CIBSS Research Signalling Centers, University of Freiburg, Freiburg, Germany.
  • Ballestin JB; Institute of Biology II, University of Freiburg, Freiburg, Germany.
  • Liang J; 4HF Biotec GmbH, Freiburg, Germany.
  • Tomas F; BIOSS and CIBSS Research Signalling Centers, University of Freiburg, Freiburg, Germany.
  • Freist L; Institute of Biology II, University of Freiburg, Freiburg, Germany.
  • Voigt K; Bioprocess Innovation Unit, ViraTherapeutics GmbH, Rum, Austria.
  • Di Ventura B; BIOSS and CIBSS Research Signalling Centers, University of Freiburg, Freiburg, Germany.
  • Öztürk MA; Institute of Biology II, University of Freiburg, Freiburg, Germany.
Protein Sci; 33(6): e4985, 2024 Jun.
Article in En | MEDLINE | ID: mdl-38717278
ABSTRACT
Inteins are proteins that excise themselves from host proteins and ligate the flanking polypeptides in an autocatalytic process called protein splicing. In nature, inteins are either contiguous or split. In split inteins, the two fragments must first form a complex for splicing to occur. Contiguous inteins have previously been artificially split into two fragments because split inteins enable applications distinct from those of contiguous ones. Even naturally split inteins have been split at unnatural sites to obtain fragments with reduced affinity for one another, which are useful for creating conditional inteins or for studying protein-protein interactions. So far, split sites in inteins have been identified heuristically. We developed Int&in, a web server freely available for academic research (https://intein.biologie.uni-freiburg.de) that runs a machine learning model based on logistic regression to predict active and inactive split sites in inteins with high accuracy. The model was trained on a dataset of 126 split sites generated using the gp41-1, Npu DnaE and CL inteins and validated using 97 split sites extracted from the literature. Despite the limited size of the dataset, the model, which uses various protein structural features as well as sequence conservation information, achieves an accuracy of 0.79 and 0.78 on the training and testing sets, respectively. We envision that Int&in will facilitate the engineering of novel split inteins for applications in synthetic and cell biology.
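To make the modelling approach named in the abstract more concrete, the following is a minimal sketch of a logistic regression classifier that labels candidate split sites as active (1) or inactive (0). It is not the authors' implementation and does not reproduce Int&in. The two features (`exposure`, `conservation`), the rule used to generate labels, and all function names are hypothetical, and the data are synthetic; only the set sizes (126 and 97) are borrowed from the abstract to mirror its train/test split.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    # Probability that a site with feature vector x is active.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def predict(w, b, x, threshold=0.5):
    return 1 if predict_proba(w, b, x) >= threshold else 0


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    # Batch gradient descent on the mean log-loss.
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    hits = sum(1 for xi, yi in zip(X, y) if predict(w, b, xi) == yi)
    return hits / len(y)


def make_synthetic_sites(n, rng):
    # HYPOTHETICAL data: two made-up features per candidate split site.
    X, y = [], []
    for _ in range(n):
        exposure = rng.random()      # stand-in for a structural feature
        conservation = rng.random()  # stand-in for sequence conservation
        # Made-up labelling rule, chosen only so the toy problem is learnable.
        p_active = sigmoid(4.0 * exposure - 4.0 * conservation)
        X.append([exposure, conservation])
        y.append(1 if rng.random() < p_active else 0)
    return X, y


rng = random.Random(0)
X_train, y_train = make_synthetic_sites(126, rng)  # size mirrors the abstract
X_test, y_test = make_synthetic_sites(97, rng)     # size mirrors the abstract
w, b = train_logistic_regression(X_train, y_train)
print("training accuracy:", round(accuracy(w, b, X_train, y_train), 2))
print("testing accuracy:", round(accuracy(w, b, X_test, y_test), 2))
```

The printed accuracies describe only this synthetic toy problem and are unrelated to the 0.79 and 0.78 reported above. A linear model such as this has few parameters, which is one reason it can be a reasonable choice when only a couple of hundred labelled examples are available.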

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Software / Protein Splicing / Internet / Inteins / Machine Learning Language: En Journal: Protein Sci Journal subject: BIOCHEMISTRY Year of publication: 2024 Document type: Article Country of affiliation: Germany
