1.
Comput Biol Med ; 173: 108339, 2024 May.
Article in English | MEDLINE | ID: mdl-38547658

ABSTRACT

The application of Artificial Intelligence (AI) to screen drug molecules with potential therapeutic effects has revolutionized the drug discovery process, with significantly lower economic cost and time consumption than the traditional drug discovery pipeline. With the great power of AI, it is possible to rapidly search the vast chemical space for potential drug-target interactions (DTIs) between candidate drug molecules and disease protein targets. However, only a small proportion of molecules have labelled DTIs, consequently limiting the performance of AI-based drug screening. To solve this problem, a machine learning-based approach with great ability to generalize DTI prediction across molecules is desirable. Many existing machine learning approaches for DTI identification failed to exploit the full information with respect to the topological structures of candidate molecules. To develop a better approach for DTI prediction, we propose GraphormerDTI, which employs the powerful Graph Transformer neural network to model molecular structures. GraphormerDTI embeds molecular graphs into vector-format representations through iterative Transformer-based message passing, which encodes molecules' structural characteristics by node centrality encoding, node spatial encoding and edge encoding. With a strong structural inductive bias, the proposed GraphormerDTI approach can effectively infer informative representations for out-of-sample molecules and as such, it is capable of predicting DTIs across molecules with an exceptional performance. GraphormerDTI integrates the Graph Transformer neural network with a 1-dimensional Convolutional Neural Network (1D-CNN) to extract the drugs' and target proteins' representations and leverages an attention mechanism to model the interactions between them. 
To examine GraphormerDTI's performance for DTI prediction, we conduct experiments on three benchmark datasets, where GraphormerDTI outperforms five state-of-the-art baselines for out-of-molecule DTI prediction, including GNN-CPI, GNN-PT, DeepEmbedding-DTI, MolTrans and HyperAttentionDTI, and is on a par with the best baseline for transductive DTI prediction. The source code and datasets are publicly accessible at https://github.com/mengmeng34/GraphormerDTI.


Subject(s)
Artificial Intelligence , Drug Discovery , Drug Evaluation, Preclinical , Neural Networks, Computer , Benchmarking
2.
Mol Biosyst ; 12(9): 2849-58, 2016 08 16.
Article in English | MEDLINE | ID: mdl-27364688

ABSTRACT

Protein S-sulfenylation (SOH) is a type of post-translational modification through the oxidation of cysteine thiols to sulfenic acids. It acts as a redox switch to modulate versatile cellular processes and plays important roles in signal transduction, protein folding and enzymatic catalysis. Reversible SOH is also a key component for maintaining redox homeostasis and has been implicated in a variety of human diseases, such as cancer, diabetes, and atherosclerosis, due to redox imbalance. Despite its significance, the in situ trapping of the entire 'sulfenome' remains a major challenge. Yang et al. have recently experimentally identified about 1000 SOH sites, providing an enriched benchmark SOH dataset. In this work, we developed a new ensemble learning tool SOHPRED for identifying protein SOH sites based on the compositions of enriched amino acids and the physicochemical properties of residues surrounding SOH sites. SOHPRED was built based on four complementary predictors, i.e. a naive Bayesian predictor, a random forest predictor and two support vector machine predictors, whose training features are, respectively, amino acid occurrences, physicochemical properties, frequencies of k-spaced amino acid pairs and sequence profiles. Benchmarking experiments on the 5-fold cross validation and independent tests show that SOHPRED achieved AUC values of 0.784 and 0.799, respectively, which outperforms several previously developed tools. As a real application of SOHPRED, we predicted potential SOH sites for 193 S-sulfenylated substrates, which had been experimentally detected through a global sulfenome profiling in living cells, though the actual SOH sites were not determined. The web server of SOHPRED has been made publicly available at for the wider research community. The source codes and the benchmark datasets can be downloaded from the website.
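The AUC values quoted above can be read as the probability that a randomly chosen true SOH site receives a higher score than a randomly chosen non-site. The following is a minimal sketch of that rank-based definition, not the evaluation code used for SOHPRED; the function name `roc_auc` and the toy scores are illustrative assumptions.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Compare every positive with every negative (fine for small illustrative inputs).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictor scores for four candidate cysteines (1 = modified site).
print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported 0.784 and 0.799.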


Subject(s)
Computational Biology/methods , Cysteine/metabolism , Protein Processing, Post-Translational , Algorithms , Amino Acid Sequence , Bayes Theorem , Catalysis , Cysteine/chemistry , Datasets as Topic , Humans , Oxidation-Reduction , Peptides/chemistry , Peptides/metabolism , Position-Specific Scoring Matrices , Protein Folding , ROC Curve , Reproducibility of Results , Sensitivity and Specificity , Sulfenic Acids/chemistry , Sulfhydryl Compounds/chemistry , Support Vector Machine , Web Browser
3.
Bioinformatics ; 31(9): 1411-9, 2015 May 01.
Article in English | MEDLINE | ID: mdl-25568279

ABSTRACT

MOTIVATION: Glycosylation is a ubiquitous type of protein post-translational modification (PTM) in eukaryotic cells, which plays vital roles in various biological processes (BPs) such as cellular communication, ligand recognition and subcellular recognition. It is estimated that >50% of the entire human proteome is glycosylated. However, it is still a significant challenge to identify glycosylation sites, which requires expensive/laborious experimental research. Thus, bioinformatics approaches that can predict the glycan occupancy at specific sequons in protein sequences would be useful for understanding and utilizing this important PTM. RESULTS: In this study, we present a novel bioinformatics tool called GlycoMine, which is a comprehensive tool for the systematic in silico identification of C-linked, N-linked, and O-linked glycosylation sites in the human proteome. GlycoMine was developed using the random forest algorithm and evaluated based on a well-prepared up-to-date benchmark dataset that encompasses all three types of glycosylation sites, which was curated from multiple public resources. Heterogeneous sequences and functional features were derived from various sources, and subjected to further two-step feature selection to characterize a condensed subset of optimal features that contributed most to the type-specific prediction of glycosylation sites. Five-fold cross-validation and independent tests show that this approach significantly improved the prediction performance compared with four existing prediction tools: NetNGlyc, NetOGlyc, EnsembleGly and GPP. We demonstrated that this tool could identify candidate glycosylation sites in case study proteins and applied it to identify many high-confidence glycosylation target proteins by screening the entire human proteome. 
AVAILABILITY AND IMPLEMENTATION: The webserver, Java Applet, user instructions, datasets, and predicted glycosylation sites in the human proteome are freely available at http://www.structbioinfor.org/Lab/GlycoMine/. CONTACT: Jiangning.Song@monash.edu or James.Whisstock@monash.edu or zhangyang@nwsuaf.edu.cn SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Artificial Intelligence , Protein Processing, Post-Translational , Proteome/metabolism , Software , Algorithms , Amino Acids/metabolism , Glycosylation , Humans , Proteome/chemistry , Sequence Analysis, Protein
4.
Biochim Biophys Acta ; 1834(8): 1461-7, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23603789

ABSTRACT

As one of the most common post-translational modifications, ubiquitination regulates the quantity and function of a variety of proteins. Experimental and clinical investigations have also suggested the crucial roles of ubiquitination in several human diseases. The complicated sequence context of human ubiquitination sites revealed by proteomic studies highlights the need to develop effective computational strategies to predict human ubiquitination sites. Here we report the establishment of a novel human-specific ubiquitination site predictor through the integration of multiple complementary classifiers. Firstly, a Support Vector Machine (SVM) classifier was constructed based on the composition of k-spaced amino acid pairs (CKSAAP) encoding, which has been utilized in our previous yeast ubiquitination site predictor. To further exploit the pattern and properties of the ubiquitination sites and their flanking residues, three additional SVM classifiers were constructed using the binary amino acid encoding, the AAindex physicochemical property encoding and the protein aggregation propensity encoding, respectively. Through an integration that relied on logistic regression, the resulting predictor termed hCKSAAP_UbSite achieved an area under ROC curve (AUC) of 0.770 in a 5-fold cross-validation test on a class-balanced training dataset. When tested on a class-balanced independent testing dataset that contains 3419 ubiquitination sites, hCKSAAP_UbSite has also achieved a robust performance with an AUC of 0.757. Specifically, it has consistently performed better than the predictor using the CKSAAP encoding alone and two other publicly available predictors which are not human-specific. Given its promising performance in our large-scale datasets, hCKSAAP_UbSite has been made publicly available at our server (http://protein.cau.edu.cn/cksaap_ubsite/).
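To make the CKSAAP encoding concrete, the sketch below counts ordered residue pairs separated by k intervening positions within a sequence window and normalizes each count by the number of valid pairs. It is an illustration under stated assumptions, not the hCKSAAP_UbSite implementation: the function name `cksaap`, the default `k_max = 2`, and the handling of non-standard characters are all hypothetical choices, and the abstract does not state the window length or the range of k actually used.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so feature indices are stable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(window: str, k_max: int = 2) -> list[float]:
    """Return k-spaced pair compositions for k = 0..k_max (400 values per k)."""
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        # Pair residue i with the residue k positions further along (gap of k).
        for i in range(len(window) - k - 1):
            pair = window[i] + window[i + k + 1]
            if pair in counts:  # assumption: skip padding or non-standard residues
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


vector = cksaap("ACAC", k_max=1)
print(len(vector))  # 400 features for k=0 plus 400 for k=1
```

The resulting fixed-length vector can be passed to any standard classifier; the abstract reports using an SVM for this encoding.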


Subject(s)
Amino Acids/metabolism , Computational Biology , Proteins/chemistry , Proteins/metabolism , Support Vector Machine , Ubiquitination/physiology , Binding Sites , Humans , Protein Processing, Post-Translational
5.
PLoS One ; 7(2): e30361, 2012.
Article in English | MEDLINE | ID: mdl-22319565

ABSTRACT

Protein backbone torsion angles (Phi) and (Psi) involve two rotation angles rotating around the C(α)-N bond (Phi) and the C(α)-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angle from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered region as well as other global sequence features. When evaluated based on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle prediction are 27.8° and 44.6°, respectively, which are 1% and 3% respectively lower than that using one of the state-of-the-art prediction tools ANGLOR. Moreover, the prediction of TANGLE is significantly better than a random predictor that was built on the amino acid-specific basis, with the p-value<1.46e-147 and 7.97e-150, respectively by the Wilcoxon signed rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/.
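Because torsion angles are periodic, a mean absolute error over angles is usually computed on the shorter arc between prediction and observation, so that 170° and -170° differ by 20° rather than 340°. The sketch below shows that calculation; whether TANGLE's reported MAEs use exactly this wrap-around convention is an assumption, and the function name `angular_mae` is illustrative.

```python
def angular_mae(predicted, observed):
    """Mean absolute error in degrees, treating angles as periodic over 360."""
    errors = []
    for p, o in zip(predicted, observed):
        d = abs(p - o) % 360.0
        # Take the shorter way around the circle.
        errors.append(min(d, 360.0 - d))
    return sum(errors) / len(errors)


# Hypothetical predicted vs. observed Phi angles for two residues.
print(angular_mae([170.0, -60.0], [-170.0, -50.0]))
```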


Subject(s)
Algorithms , Proteins/chemistry , Amino Acids , Internet , Protein Conformation , Protein Structure, Secondary , Solvents/chemistry
6.
J Theor Biol ; 241(2): 390-401, 2006 Jul 21.
Article in English | MEDLINE | ID: mdl-16427089

ABSTRACT

High-quality data about protein structures and their gene sequences are essential to the understanding of the relationship between protein folding and protein coding sequences. Firstly we constructed the EcoPDB database, which is a high-quality database of Escherichia coli genes and their corresponding PDB structures. Based on EcoPDB, we presented a novel approach based on information theory to investigate the correlation between cysteine synonymous codon usages and local amino acids flanking cysteines, the correlation between cysteine synonymous codon usages and synonymous codon usages of local amino acids flanking cysteines, as well as the correlation between cysteine synonymous codon usages and the disulfide bonding states of cysteines in the E. coli genome. The results indicate that the nearest neighboring residues and their synonymous codons of the C-terminus have the greatest influence on the usages of the synonymous codons of cysteines and the usage of the synonymous codons has a specific correlation with the disulfide bond formation of cysteines in proteins. The correlations may result from the regulation mechanism of protein structures at gene sequence level and reflect the biological function restriction that cysteines pair to form disulfide bonds. The results may also be helpful in identifying residues that are important for synonymous codon selection of cysteines to introduce disulfide bridges in protein engineering and molecular biology. The approach presented in this paper can also be utilized as a complementary computational method and be applicable to analyse the synonymous codon usages in other model organisms.
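One common information-theoretic way to quantify a correlation of this kind is the mutual information between the cysteine codon (TGT or TGC) and a neighboring residue. The abstract does not name the specific measure used, so the sketch below is an assumption for illustration only; the function name `mutual_information` and the toy observations are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(pairs):
    """Mutual information in bits between two discrete variables, from (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed combinations.
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Toy data: (cysteine codon, residue at the next position). Not real E. coli counts.
observations = [("TGT", "A"), ("TGC", "G"), ("TGT", "A"), ("TGC", "G")]
print(mutual_information(observations))
```

A value of 0 bits means codon choice and neighbor identity are independent in the sample; larger values indicate stronger association. Real analyses also need a correction for small sample sizes, which this sketch omits.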


Subject(s)
Codon/genetics , Cysteine/genetics , Disulfides/metabolism , Escherichia coli/genetics , Genome, Bacterial/genetics , Amino Acids/analysis , Computational Biology , Cysteine/metabolism , Databases, Genetic , Escherichia coli Proteins/genetics , Models, Genetic