RESUMEN
Antiviral peptides (AVPs) have shown potential in inhibiting viral attachment, preventing viral fusion with host cells and disrupting viral replication due to their unique action mechanisms. They have now become a broad-spectrum, promising antiviral therapy. However, identifying effective AVPs is traditionally slow and costly. This study proposed a new two-stage computational framework for AVP identification. The first stage identifies AVPs from a wide range of peptides, and the second stage recognizes AVPs targeting specific families or viruses. This method integrates contrastive learning and multi-feature fusion strategy, focusing on sequence information and peptide characteristics, significantly enhancing predictive ability and interpretability. The evaluation results of the model show excellent performance, with accuracy of 0.9240 and Matthews correlation coefficient (MCC) score of 0.8482 on the non-AVP independent dataset, and accuracy of 0.9934 and MCC score of 0.9869 on the non-AMP independent dataset. Furthermore, our model can predict antiviral activities of AVPs against six key viral families (Coronaviridae, Retroviridae, Herpesviridae, Paramyxoviridae, Orthomyxoviridae, Flaviviridae) and eight viruses (FIV, HCV, HIV, HPIV3, HSV1, INFVA, RSV, SARS-CoV). Finally, to facilitate user accessibility, we built a user-friendly web interface deployed at https://awi.cuhk.edu.cn/â¼dbAMP/AVP/.
Asunto(s)
Antivirales , Biología Computacional , Péptidos , Antivirales/farmacología , Péptidos/química , Biología Computacional/métodos , Humanos , Virus , Aprendizaje Automático , AlgoritmosRESUMEN
Cancer is a severe illness that significantly threatens human life and health. Anticancer peptides (ACPs) represent a promising therapeutic strategy for combating cancer. In silico methods enable rapid and accurate identification of ACPs without extensive human and material resources. This study proposes a two-stage computational framework called ACP-CapsPred, which can accurately identify ACPs and characterize their functional activities across different cancer types. ACP-CapsPred integrates a protein language model with evolutionary information and physicochemical properties of peptides, constructing a comprehensive profile of peptides. ACP-CapsPred employs a next-generation neural network, specifically capsule networks, to construct predictive models. Experimental results demonstrate that ACP-CapsPred exhibits satisfactory predictive capabilities in both stages, reaching state-of-the-art performance. In the first stage, ACP-CapsPred achieves accuracies of 80.25% and 95.71%, as well as F1-scores of 79.86% and 95.90%, on benchmark datasets Set 1 and Set 2, respectively. In the second stage, tasked with characterizing the functional activities of ACPs across five selected cancer types, ACP-CapsPred attains an average accuracy of 90.75% and an F1-score of 91.38%. Furthermore, ACP-CapsPred demonstrates excellent interpretability, revealing regions and residues associated with anticancer activity. Consequently, ACP-CapsPred presents a promising solution to expedite the development of ACPs and offers a novel perspective for other biological sequence analyses.
Asunto(s)
Antineoplásicos , Biología Computacional , Redes Neurales de la Computación , Péptidos , Humanos , Antineoplásicos/química , Antineoplásicos/farmacología , Péptidos/química , Biología Computacional/métodos , Neoplasias/tratamiento farmacológico , Neoplasias/metabolismo , Bases de Datos de ProteínasRESUMEN
MicroRNA (miRNA)-target interaction (MTI) plays a substantial role in various cell activities, molecular regulations and physiological processes. Published biomedical literature is the carrier of high-confidence MTI knowledge. However, digging out this knowledge in an efficient manner from large-scale published articles remains challenging. To address this issue, we were motivated to construct a deep learning-based model. We applied the pre-trained language models to biomedical text to obtain the representation, and subsequently fed them into a deep neural network with gate mechanism layers and a fully connected layer for the extraction of MTI information sentences. Performances of the proposed models were evaluated using two datasets constructed on the basis of text data obtained from miRTarBase. The validation and test results revealed that incorporating both PubMedBERT and SciBERT for sentence level encoding with the long short-term memory (LSTM)-based deep neural network can yield an outstanding performance, with both F1 and accuracy being higher than 80% on validation data and test data. Additionally, the proposed deep learning method outperformed the following machine learning methods: random forest, support vector machine, logistic regression and bidirectional LSTM. This work would greatly facilitate studies on MTI analysis and regulations. It is anticipated that this work can assist in large-scale screening of miRNAs, thereby revealing their functional roles in various diseases, which is important for the development of highly specific drugs with fewer side effects. Source code and corpus are publicly available at https://github.com/qi29.
Asunto(s)
Aprendizaje Profundo , MicroARNs , MicroARNs/genética , Procesamiento de Lenguaje Natural , Redes Neurales de la Computación , LenguajeRESUMEN
Tuberculosis (TB) is a severe disease caused by Mycobacterium tuberculosis that poses a significant threat to human health. The emergence of drug-resistant strains has made the global fight against TB even more challenging. Antituberculosis peptides (ATPs) have shown promising results as a potential treatment for TB. However, conventional wet lab-based approaches to ATP discovery are time-consuming and costly and often fail to discover peptides with desired properties. To address these challenges, we propose a novel machine learning-based framework called ATPfinder that can significantly accelerate the discovery of ATP. Our approach integrates various efficient peptide descriptors and utilizes the deep forest algorithm to construct the model. This neural network-like cascading structure can effectively process and mine features without complex hyperparameter tuning. Our experimental results show that ATPfinder outperforms existing ATP prediction tools, achieving state-of-the-art performance with an accuracy of 89.3% and an MCC of 0.70. Moreover, our framework exhibits better robustness than baseline algorithms commonly used for other sequence analysis tasks. Additionally, the excellent interpretability of our model can assist researchers in understanding the critical features of ATP. Finally, we developed a downloadable desktop application to simplify the use of our framework for researchers. Therefore, ATPfinder can facilitate the discovery of peptide drugs and provide potential solutions for TB treatment. Our framework is freely available at https://github.com/lantianyao/ATPfinder/ (data sets and code) and https://awi.cuhk.edu.cn/dbAMP/ATPfinder.html (software).
Asunto(s)
Mycobacterium tuberculosis , Tuberculosis , Humanos , Péptidos/farmacología , Antituberculosos/farmacología , Algoritmos , Tuberculosis/tratamiento farmacológico , Bosques , Adenosina TrifosfatoRESUMEN
Enhancers are a class of noncoding DNA, serving as crucial regulatory elements in governing gene expression by binding to transcription factors. The identification of enhancers holds paramount importance in the field of biology. However, traditional experimental methods for enhancer identification demand substantial human and material resources. Consequently, there is a growing interest in employing computational methods for enhancer prediction. In this study, we propose a two-stage framework based on deep learning, termed CapsEnhancer, for the identification of enhancers and their strengths. CapsEnhancer utilizes chaos game representation to encode DNA sequences into unique images and employs a capsule network to extract local and global features from sequence "images". Experimental results demonstrate that CapsEnhancer achieves state-of-the-art performance in both stages. In the first and second stages, the accuracy surpasses the previous best methods by 8 and 3.5%, reaching accuracies of 94.5 and 95%, respectively. Notably, this study represents the pioneering application of computer vision methods to enhancer identification tasks. Our work not only contributes novel insights to enhancer identification but also provides a fresh perspective for other biological sequence analysis tasks.
Asunto(s)
Biología Computacional , Elementos de Facilitación Genéticos , Biología Computacional/métodos , Humanos , Dinámicas no Lineales , Aprendizaje ProfundoRESUMEN
Protein post-translational modifications (PTMs) play an important role in different cellular processes. In view of the importance of PTMs in cellular functions and the massive data accumulated by the rapid development of mass spectrometry (MS)-based proteomics, this paper presents an update of dbPTM with over 2 777 000 PTM substrate sites obtained from existing databases and manual curation of literature, of which more than 2 235 000 entries are experimentally verified. This update has manually curated over 42 new modification types that were not included in the previous version. Due to the increasing number of studies on the mechanism of PTMs in the past few years, a great deal of upstream regulatory proteins of PTM substrate sites have been revealed. The updated dbPTM thus collates regulatory information from databases and literature, and merges them into a protein-protein interaction network. To enhance the understanding of the association between PTMs and molecular functions/cellular processes, the functional annotations of PTMs are curated and integrated into the database. In addition, the existing PTM-related resources, including annotation databases and prediction tools are also renewed. Overall, in this update, we would like to provide users with the most abundant data and comprehensive annotations on PTMs of proteins. The updated dbPTM is now freely accessible at https://awi.cuhk.edu.cn/dbPTM/.
Asunto(s)
Bases de Datos de Proteínas , Redes Reguladoras de Genes , Procesamiento Proteico-Postraduccional , Proteínas/metabolismo , Programas Informáticos , Animales , Arabidopsis/genética , Arabidopsis/metabolismo , Bacterias/genética , Bacterias/metabolismo , Humanos , Internet , Ratones , Modelos Moleculares , Anotación de Secuencia Molecular , Unión Proteica , Conformación Proteica , Mapeo de Interacción de Proteínas , Proteínas/química , Proteínas/genética , Ratas , Saccharomyces cerevisiae/genética , Saccharomyces cerevisiae/metabolismoRESUMEN
Circular RNAs (circRNAs), which are single-stranded RNA molecules that have individually formed into a covalently closed continuous loop, act as sponges of microRNAs to regulate transcription and translation. CircRNAs are important molecules in the field of cancer diagnosis, as growing evidence suggests that they are closely related to pathological cancer features. Therefore, they have high potential for clinical use as novel cancer biomarkers. In this article, we present our updates to CircNet (version 2.0), into which circRNAs from circAtlas and MiOncoCirc, and novel circRNAs from The Cancer Genome Atlas database have been integrated. In total, 2732 samples from 37 types of cancers were integrated into CircNet 2.0 and analyzed using several of the most reliable circRNA detection algorithms. Furthermore, target miRNAs were predicted from the full-length circRNA sequence using three reliable tools (PITA, miRanda and TargetScan). Additionally, 384 897 experimentally verified miRNA-target interactions from miRTarBase were integrated into our database to facilitate the construction of high-quality circRNA-miRNA-gene regulatory networks. These improvements, along with the user-friendly interactive web interface for data presentation, search, and visualization, showcase the updated CircNet database as a powerful, experimentally validated resource, for providing strong data support in the biomedical fields. CircNet 2.0 is currently accessible at https://awi.cuhk.edu.cn/â¼CircNet.
Asunto(s)
Biomarcadores de Tumor/genética , Bases de Datos Genéticas , Neoplasias/genética , ARN Circular/genética , Perfilación de la Expresión Génica , Redes Reguladoras de Genes/genética , Humanos , ARN Circular/clasificaciónRESUMEN
The last 18 months, or more, have seen a profound shift in our global experience, with many of us navigating a once-in-100-year pandemic. To date, COVID-19 remains a life-threatening pandemic with little to no targeted therapeutic recourse. The discovery of novel antiviral agents, such as vaccines and drugs, can provide therapeutic solutions to save human beings from severe infections; however, there is no specifically effective antiviral treatment confirmed for now. Thus, great attention has been paid to the use of natural or artificial antimicrobial peptides (AMPs) as these compounds are widely regarded as promising solutions for the treatment of harmful microorganisms. Given the biological significance of AMPs, it was obvious that there was a significant need for a single platform for identifying and engaging with AMP data. This led to the creation of the dbAMP platform that provides comprehensive information about AMPs and facilitates their investigation and analysis. To date, the dbAMP has accumulated 26 447 AMPs and 2262 antimicrobial proteins from 3044 organisms using both database integration and manual curation of >4579 articles. In addition, dbAMP facilitates the evaluation of AMP structures using I-TASSER for automated protein structure prediction and structure-based functional annotation, providing predictive structure information for clinical drug development. Next-generation sequencing (NGS) and third-generation sequencing have been applied to generate large-scale sequencing reads from various environments, enabling greatly improved analysis of genome structure. In this update, we launch an efficient online tool that can effectively identify AMPs from genome/metagenome and proteome data of all species in a short period. In conclusion, these improvements promote the dbAMP as one of the most abundant and comprehensively annotated resources for AMPs. The updated dbAMP is now freely accessible at http://awi.cuhk.edu.cn/dbAMP.
Asunto(s)
Péptidos Antimicrobianos , Bases de Datos Factuales , Programas Informáticos , Péptidos Antimicrobianos/química , Péptidos Antimicrobianos/farmacología , Genómica , Sistemas de Lectura Abierta , Conformación Proteica , ProteómicaRESUMEN
MicroRNAs (miRNAs) are noncoding RNAs with 18-26 nucleotides; they pair with target mRNAs to regulate gene expression and produce significant changes in various physiological and pathological processes. In recent years, the interaction between miRNAs and their target genes has become one of the mainstream directions for drug development. As a large-scale biological database that mainly provides miRNA-target interactions (MTIs) verified by biological experiments, miRTarBase has undergone five revisions and enhancements. The database has accumulated >2 200 449 verified MTIs from 13 389 manually curated articles and CLIP-seq data. An optimized scoring system is adopted to enhance this update's critical recognition of MTI-related articles and corresponding disease information. In addition, single-nucleotide polymorphisms and disease-related variants related to the binding efficiency of miRNA and target were characterized in miRNAs and gene 3' untranslated regions. miRNA expression profiles across extracellular vesicles, blood and different tissues, including exosomal miRNAs and tissue-specific miRNAs, were integrated to explore miRNA functions and biomarkers. For the user interface, we have classified attributes, including RNA expression, specific interaction, protein expression and biological function, for various validation experiments related to the role of miRNA. We also used seed sequence information to evaluate the binding sites of miRNA. In summary, these enhancements render miRTarBase as one of the most research-amicable MTI databases that contain comprehensive and experimentally verified annotations. The newly updated version of miRTarBase is now available at https://miRTarBase.cuhk.edu.cn/.
Asunto(s)
Regiones no Traducidas 3' , Bases de Datos de Ácidos Nucleicos , Redes Reguladoras de Genes , MicroARNs/genética , Neoplasias/genética , ARN no Traducido/genética , Animales , Sitios de Unión , Biomarcadores/metabolismo , Minería de Datos/estadística & datos numéricos , Exosomas/química , Exosomas/metabolismo , Regulación de la Expresión Génica , Humanos , Internet , Ratones , MicroARNs/clasificación , MicroARNs/metabolismo , Anotación de Secuencia Molecular , Neoplasias/metabolismo , Neoplasias/patología , Polimorfismo de Nucleótido Simple , ARN no Traducido/clasificación , ARN no Traducido/metabolismo , Células Tumorales Cultivadas , Interfaz Usuario-ComputadorRESUMEN
Antiviral peptide (AVP) is a kind of antimicrobial peptide (AMP) that has the potential ability to fight against virus infection. Machine learning-based prediction with a computational biology approach can facilitate the development of the novel therapeutic agents. In this study, we proposed a double-stage classification scheme, named AVPIden, for predicting the AVPs and their functional activities against different viruses. The first stage is to distinguish the AVP from a broad-spectrum peptide collection, including not only the regular peptides (non-AMP) but also the AMPs without antiviral functions (non-AVP). The second stage is responsible for characterizing one or more virus families or species that the AVP targets. Imbalanced learning is utilized to improve the performance of prediction. The AVPIden uses multiple descriptors to precisely demonstrate the peptide properties and adopts explainable machine learning strategies based on Shapley value to exploit how the descriptors impact the antiviral activities. Finally, the evaluation performance of the proposed model suggests its ability to predict the antivirus activities and their potential functions against six virus families (Coronaviridae, Retroviridae, Herpesviridae, Paramyxoviridae, Orthomyxoviridae, Flaviviridae) and eight kinds of virus (FIV, HCV, HIV, HPIV3, HSV1, INFVA, RSV, SARS-CoV). The AVPIden gives an option for reinforcing the development of AVPs with the computer-aided method and has been deployed at http://awi.cuhk.edu.cn/AVPIden/.
Asunto(s)
Antivirales/química , Tratamiento Farmacológico de COVID-19 , Péptidos/química , SARS-CoV-2/química , Algoritmos , Secuencia de Aminoácidos/genética , Antivirales/uso terapéutico , COVID-19/genética , COVID-19/virología , Biología Computacional , Humanos , Aprendizaje Automático , Péptidos/uso terapéutico , SARS-CoV-2/efectos de los fármacos , SARS-CoV-2/genética , Programas InformáticosRESUMEN
MOTIVATION: Antimicrobial peptides (AMPs) have the potential to inhibit multiple types of pathogens and to heal infections. Computational strategies can assist in characterizing novel AMPs from proteome or collections of synthetic sequences and discovering their functional abilities toward different microbial targets without intensive labor. RESULTS: Here, we present a deep learning-based method for computer-aided novel AMP discovery that utilizes the transformer neural network architecture with knowledge from natural language processing to extract peptide sequence information. We implemented the method for two AMP-related tasks: the first is to discriminate AMPs from other peptides, and the second task is identifying AMPs functional activities related to seven different targets (gram-negative bacteria, gram-positive bacteria, fungi, viruses, cancer cells, parasites and mammalian cell inhibition), which is a multi-label problem. In addition, asymmetric loss was adopted to resolve the intrinsic imbalance of dataset, particularly for the multi-label scenarios. The evaluation showed that our proposed scheme achieves the best performance for the first task (96.85% balanced accuracy) and has a more unbiased prediction for the second task (79.83% balanced accuracy averaged across all functional activities) when compared with that of strategies without imbalanced learning or deep learning. AVAILABILITY AND IMPLEMENTATION: The source code and data of this study are available at https://github.com/BiOmicsLab/TransImbAMP. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Asunto(s)
Péptidos Antimicrobianos , Programas Informáticos , Animales , Redes Neurales de la Computación , PéptidosRESUMEN
Inflammation is a biological response to harmful stimuli, aiding in the maintenance of tissue homeostasis. However, excessive or persistent inflammation can precipitate a myriad of pathological conditions. Although current treatments such as NSAIDs, corticosteroids, and immunosuppressants are effective, they can have side effects and resistance issues. In this backdrop, anti-inflammatory peptides (AIPs) have emerged as a promising therapeutic approach against inflammation. Leveraging machine learning methods, we have the opportunity to accelerate the discovery and investigation of these AIPs more effectively. In this study, we proposed an advanced framework by ensemble machine learning and deep learning for AIP prediction. Initially, we constructed three individual models with extremely randomized trees (ET), gated recurrent unit (GRU), and convolutional neural networks (CNNs) with attention mechanism and then used stacking architecture to build the final predictor. By utilizing various sequence encodings and combining the strengths of different algorithms, our predictor demonstrated exemplary performance. On our independent test set, our model achieved an accuracy, MCC, and F1-score of 0.757, 0.500, and 0.707, respectively, clearly outperforming other contemporary AIP prediction methods. Additionally, our model offers profound insights into the feature interpretation of AIPs, establishing a valuable knowledge foundation for the design and development of future anti-inflammatory strategies.
Asunto(s)
Aprendizaje Profundo , Humanos , Antiinflamatorios/farmacología , Antiinflamatorios/uso terapéutico , Péptidos/farmacología , Inflamación/tratamiento farmacológico , Algoritmos , Aprendizaje AutomáticoRESUMEN
One of the major challenges in cancer therapy lies in the limited targeting specificity exhibited by existing anti-cancer drugs. Tumor-homing peptides (THPs) have emerged as a promising solution to this issue, due to their capability to specifically bind to and accumulate in tumor tissues while minimally impacting healthy tissues. THPs are short oligopeptides that offer a superior biological safety profile, with minimal antigenicity, and faster incorporation rates into target cells/tissues. However, identifying THPs experimentally, using methods such as phage display or in vivo screening, is a complex, time-consuming task, hence the need for computational methods. In this study, we proposed StackTHPred, a novel machine learning-based framework that predicts THPs using optimal features and a stacking architecture. With an effective feature selection algorithm and three tree-based machine learning algorithms, StackTHPred has demonstrated advanced performance, surpassing existing THP prediction methods. It achieved an accuracy of 0.915 and a 0.831 Matthews Correlation Coefficient (MCC) score on the main dataset, and an accuracy of 0.883 and a 0.767 MCC score on the small dataset. StackTHPred also offers favorable interpretability, enabling researchers to better understand the intrinsic characteristics of THPs. Overall, StackTHPred is beneficial for both the exploration and identification of THPs and facilitates the development of innovative cancer therapies.
Asunto(s)
Neoplasias , Péptidos , Humanos , Péptidos/metabolismo , Oligopéptidos , Algoritmos , Aprendizaje AutomáticoRESUMEN
Cancer is one of the leading diseases threatening human life and health worldwide. Peptide-based therapies have attracted much attention in recent years. Therefore, the precise prediction of anticancer peptides (ACPs) is crucial for discovering and designing novel cancer treatments. In this study, we proposed a novel machine learning framework (GRDF) that incorporates deep graphical representation and deep forest architecture for identifying ACPs. Specifically, GRDF extracts graphical features based on the physicochemical properties of peptides and integrates their evolutionary information along with binary profiles for constructing models. Moreover, we employ the deep forest algorithm, which adopts a layer-by-layer cascade architecture similar to deep neural networks, enabling excellent performance on small datasets but without complicated tuning of hyperparameters. The experiment shows GRDF exhibits state-of-the-art performance on two elaborate datasets (Set 1 and Set 2), achieving 77.12% accuracy and 77.54% F1-score on Set 1, as well as 94.10% accuracy and 94.15% F1-score on Set 2, exceeding existing ACP prediction methods. Our models exhibit greater robustness than the baseline algorithms commonly used for other sequence analysis tasks. In addition, GRDF is well-interpretable, enabling researchers to better understand the features of peptide sequences. The promising results demonstrate that GRDF is remarkably effective in identifying ACPs. Therefore, the framework presented in this study could assist researchers in facilitating the discovery of anticancer peptides and contribute to developing novel cancer treatments.
Asunto(s)
Neoplasias , Péptidos , Humanos , Péptidos/química , Algoritmos , Secuencia de Aminoácidos , Redes Neurales de la ComputaciónRESUMEN
Drug-target interactions (DTIs) are considered a crucial component of drug design and drug discovery. To date, many computational methods were developed for drug-target interactions, but they are insufficiently informative for accurately predicting DTIs due to the lack of experimentally verified negative datasets, inaccurate molecular feature representation, and ineffective DTI classifiers. Therefore, we address the limitations of randomly selecting negative DTI data from unknown drug-target pairs by establishing two experimentally validated datasets and propose a capsule network-based framework called CapBM-DTI to capture hierarchical relationships of drugs and targets, which adopts pre-trained bidirectional encoder representations from transformers (BERT) for contextual sequence feature extraction from target proteins through transfer learning and the message-passing neural network (MPNN) for the 2-D graph feature extraction of compounds to accurately and robustly identify drug-target interactions. We compared the performance of CapBM-DTI with state-of-the-art methods using four experimentally validated DTI datasets of different sizes, including human (Homo sapiens) and worm (Caenorhabditis elegans) species datasets, as well as three subsets (new compounds, new proteins, and new pairs). Our results demonstrate that the proposed model achieved robust performance and powerful generalization ability in all experiments. The case study on treating COVID-19 demonstrates the applicability of the model in virtual screening.
RESUMEN
The emergence and spread of antibiotic-resistant bacteria pose a significant public health threat, necessitating the exploration of alternative antibacterial strategies. Antibacterial peptide (ABP) is a kind of antimicrobial peptide (AMP) that has the potential ability to fight against bacteria infection, offering a promising avenue for developing novel therapeutic interventions. This study introduces AMPActiPred, a three-stage computational framework designed to identify ABPs, characterize their activity against diverse bacterial species, and predict their activity levels. AMPActiPred employed multiple effective peptide descriptors to effectively capture the compositional features and physicochemical properties of peptides. AMPActiPred utilized deep forest architecture, a cascading architecture similar to deep neural networks, capable of effectively processing and exploring original features to enhance predictive performance. In the first stage, AMPActiPred focuses on ABP identification, achieving an Accuracy of 87.6% and an MCC of 0.742 on an elaborate dataset, demonstrating state-of-the-art performance. In the second stage, AMPActiPred achieved an average GMean at 82.8% in identifying ABPs targeting 10 bacterial species, indicating AMPActiPred can achieve balanced predictions regarding the functional activity of ABP across this set of species. In the third stage, AMPActiPred demonstrates robust predictive capabilities for ABP activity levels with an average PCC of 0.722. Furthermore, AMPActiPred exhibits excellent interpretability, elucidating crucial features associated with antibacterial activity. AMPActiPred is the first computational framework capable of predicting targets and activity levels of ABPs. Finally, to facilitate the utilization of AMPActiPred, we have established a user-friendly web interface deployed at https://awi.cuhk.edu.cn/â¼AMPActiPred/.
Asunto(s)
Antibacterianos , Antibacterianos/farmacología , Antibacterianos/química , Péptidos Antimicrobianos/química , Péptidos Antimicrobianos/farmacología , Bacterias/efectos de los fármacos , Biología Computacional/métodos , Redes Neurales de la Computación , Pruebas de Sensibilidad MicrobianaRESUMEN
Kinases as important enzymes can transfer phosphate groups from high-energy and phosphate-donating molecules to specific substrates and play essential roles in various cellular processes. Existing algorithms for kinase activity from phosphorylated proteomics data are often costly, requiring valuable samples. Moreover, methods to extract kinase activities from bulk RNA sequencing data remain undeveloped. In this study, we propose a computational framework KinPred-RNA to derive kinase activities from bulk RNA-sequencing data in cancer samples. KinPred-RNA framework, using the extreme gradient boosting (XGBoost) regression model, outperforms random forest regression, multiple linear regression, and support vector machine regression models in predicting kinase activities from cancer-related RNA sequencing data. Efficient gene signatures from the LINCS-L1000 dataset were used as inputs for KinPred-RNA. The results highlight its potential to be related to biological function. In conclusion, KinPred RNA constitutes a significant advance in cancer research by potentially facilitating the identification of cancer.
RESUMEN
The purpose of this work is to enhance KinasePhos, a machine learning-based kinase-specific phosphorylation site prediction tool. Experimentally verified kinase-specific phosphorylation data were collected from PhosphoSitePlus, UniProtKB, the GPS 5.0, and Phospho.ELM. In total, 41,421 experimentally verified kinase-specific phosphorylation sites were identified. A total of 1380 unique kinases were identified, including 753 with existing classification information from KinBase and the remaining 627 annotated by building a phylogenetic tree. Based on this kinase classification, a total of 771 predictive models were built at the individual, family, and group levels, using at least 15 experimentally verified substrate sites in positive training datasets. The improved models demonstrated their effectiveness compared with other prediction tools. For example, the prediction of sites phosphorylated by the protein kinase B, casein kinase 2, and protein kinase A families had accuracies of 94.5%, 92.5%, and 90.0%, respectively. The average prediction accuracy for all 771 models was 87.2%. For enhancing interpretability, the SHapley Additive exPlanations (SHAP) method was employed to assess feature importance. The web interface of KinasePhos 3.0 has been redesigned to provide comprehensive annotations of kinase-specific phosphorylation sites on multiple proteins. Additionally, considering the large scale of phosphoproteomic data, a downloadable prediction tool is available at https://awi.cuhk.edu.cn/KinasePhos/download.html or https://github.com/tom-209/KinasePhos-3.0-executable-file.
Asunto(s)
Proteínas Quinasas , Humanos , Fosforilación , Filogenia , Proteínas Quinasas/genética , Proteínas Quinasas/metabolismoRESUMEN
Fungal infections have become a significant global health issue, affecting millions worldwide. Antifungal peptides (AFPs) have emerged as a promising alternative to conventional antifungal drugs due to their low toxicity and low propensity for inducing resistance. In this study, we developed a deep learning-based framework called DeepAFP to efficiently identify AFPs. DeepAFP fully leverages and mines composition information, evolutionary information, and physicochemical properties of peptides by employing combined kernels from multiple branches of convolutional neural network with bi-directional long short-term memory layers. In addition, DeepAFP integrates a transfer learning strategy to obtain efficient representations of peptides for improving model performance. DeepAFP demonstrates strong predictive ability on carefully curated datasets, yielding an accuracy of 93.29% and an F1-score of 93.45% on the DeepAFP-Main dataset. The experimental results show that DeepAFP outperforms existing AFP prediction tools, achieving state-of-the-art performance. Finally, we provide a downloadable AFP prediction tool to meet the demands of large-scale prediction and facilitate the usage of our framework by the public or other researchers. Our framework can accurately identify AFPs in a short time without requiring significant human and material resources, and hence can accelerate the development of AFPs as well as contribute to the treatment of fungal infections. Furthermore, our method can provide new perspectives for other biological sequence analysis tasks.