1.
ACS Omega ; 8(44): 41930-41942, 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37969991

ABSTRACT

As one of the most important post-translational modifications (PTMs), lysine acetylation (Kace) plays an important role in various biological activities. Traditional experimental methods for identifying Kace sites are inefficient and expensive. Instead, several machine learning methods have been developed for Kace site prediction, and hand-crafted features have been used to encode the protein sequences. However, two challenges remain: the complex biological information may be under-represented by these hand-crafted features, and the small-sample problem for some species needs to be addressed. We propose a novel model, MSTL-Kace, which was developed based on a transfer learning strategy with a pretrained bidirectional encoder representations from transformers (BERT) model. In this model, the high-level embeddings were extracted from species-specific BERT models, and a two-stage fine-tuning strategy was used to deal with the small-sample problem. Specifically, a domain-specific BERT model was pretrained using all of the sequences in our data sets, and was then fine-tuned, or two-stage fine-tuned, on the training data set of each species to obtain the species-specific BERT models. Afterward, the embeddings of residues were extracted from the fine-tuned model and fed to different downstream learning algorithms. After comparison, the best model for the six prokaryotic species was built by using a random forest. The results for the independent test sets show that our model outperforms the state-of-the-art methods on all six species. The source code and data for MSTL-Kace are available at https://github.com/leo97king/MSTL-Kace.

2.
Comput Biol Chem ; 107: 107970, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37866116

ABSTRACT

The identification of hotspot residues at protein-DNA binding interfaces plays a crucial role in areas such as drug discovery and disease treatment. Although experimental methods such as alanine scanning mutagenesis have been developed to determine the hotspot residues on protein-DNA interfaces, they are both inefficient and costly. Therefore, it is highly necessary to develop efficient and accurate computational methods for predicting hotspot residues. Several computational methods have been developed; however, they are mainly based on hand-crafted features, which may not be able to represent all the information of proteins. In this regard, we propose a model called PDH-EH, which utilizes fused features of embeddings extracted from a protein language model (PLM) and hand-crafted features. After extracting a total of 1141 feature dimensions, we used mRMR to select the optimal feature subset. Based on the optimal feature subset, several different learning algorithms such as Random Forest, Support Vector Machine, and XGBoost were used to build the models. The cross-validation results on the training dataset show that the model built by using Random Forest achieves the highest AUROC. Further evaluation on the independent test set shows that our model outperforms the existing state-of-the-art models. Moreover, the effectiveness and interpretability of embeddings extracted from the PLM were demonstrated in our analysis. The code and datasets used in this study are available at: https://github.com/lixiangli01/PDH-EH.
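The abstract ranks its candidate models by AUROC. As a minimal sketch of what that metric measures (not the authors' evaluation code), the function below computes AUROC from binary labels and predicted scores using the rank-sum identity: the result is the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as half. The function name `auroc` and the toy labels and scores are illustrative assumptions.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity.

    labels: iterable of 0/1 class labels (1 = hotspot, by assumption).
    scores: iterable of predicted scores, higher means more likely positive.
    """
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)

    # Assign 1-based ranks, giving tied scores their average rank.
    i = 0
    while i < len(pairs):
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    # U statistic of the positive class, normalised by the number of pairs.
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


if __name__ == "__main__":
    # Toy example: one positive is ranked below one negative.
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In a cross-validation setting, a function like this would be applied to the held-out predictions of each fold and the values averaged before comparing classifiers.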


Subjects
Algorithms , Proteins , Protein Databases , Proteins/chemistry , Protein Binding , DNA/chemistry
3.
Front Bioinform ; 2: 834153, 2022.
Article in English | MEDLINE | ID: mdl-36304324

ABSTRACT

As one of the most important posttranslational modifications (PTMs), protein lysine glycation changes the characteristics of proteins and leads to their dysfunction, which may cause diseases. Accurately detecting glycation sites is of great benefit for understanding the biological function and potential mechanism of glycation in the treatment of diseases. However, experimental methods for lysine glycation site identification are expensive and time-consuming. Computational methods, with their higher efficiency and lower cost, could be an important supplement to the experimental methods. In this study, we proposed a novel predictor, BERT-Kgly, for protein lysine glycation site prediction, which was developed by extracting embedding features of protein segments from pretrained Bidirectional Encoder Representations from Transformers (BERT) models. Three pretrained BERT models were explored to get the embeddings with optimal representability, and three downstream deep networks were employed to build our models. Our results showed that the model based on embeddings extracted from the BERT model pretrained on 556,603 protein sequences from UniProt outperforms the other models. In addition, an independent test set was used to evaluate and compare our model with other existing methods, which indicated that our model was superior to other existing models.
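The abstract says embeddings are extracted from protein segments but does not give the segment length. As a hedged illustration of the usual preprocessing step for site predictors of this kind, the sketch below cuts a fixed-length window centred on each lysine (K) and pads the sequence ends. The function name `lysine_segments`, the flank size of 15 residues, and the padding character `X` are all assumptions made for this example and are not taken from the paper.

```python
def lysine_segments(sequence, flank=15, pad="X"):
    """Return (position, segment) pairs for every lysine in `sequence`.

    position is 1-based; each segment has length 2 * flank + 1 with the
    lysine at its centre. `flank` and `pad` are illustrative assumptions.
    """
    # Pad both ends so lysines near the termini still get full-length windows.
    padded = pad * flank + sequence + pad * flank
    segments = []
    for index, residue in enumerate(sequence):
        if residue == "K":
            # In `padded`, the residue sits at index + flank, so the window
            # starting at `index` places it exactly in the middle.
            segments.append((index + 1, padded[index:index + 2 * flank + 1]))
    return segments


if __name__ == "__main__":
    for position, segment in lysine_segments("MKAK", flank=2):
        print(position, segment)
```

Each segment would then be tokenised and passed through the pretrained model to obtain the embedding that the downstream network consumes.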

4.
Org Lett ; 23(14): 5545-5548, 2021 Jul 16.
Article in English | MEDLINE | ID: mdl-34231355

ABSTRACT

Deuteriodifluoromethylthio (SCF2D) and deuteriodifluoromethyl (CF2D) are important functional groups in pharmaceutical and agrochemical compounds, and the introduction of these functional groups remains challenging. We herein report a robust reagent for deuteriodifluoromethylthiolation and deuteriodifluoromethylation. Its potential was successfully showcased by deuteriodifluoromethylation and deuteriodifluoromethylthiolation of indoles with high-level deuterium incorporation. The reagent also has potential for deuteriodifluoromethylation and deuteriodifluoromethylthiolation of a wide range of other natural or synthetic bioactive molecules.


Subjects
Deuterium/chemistry , Sodium/chemistry , Indicators and Reagents , Indoles/chemistry , Methylation , Molecular Structure