Results 1 - 20 of 195
1.
Proteomics ; : e202400157, 2024 Sep 20.
Article in English | MEDLINE | ID: mdl-39305039

ABSTRACT

Prediction of antifreeze proteins (AFPs) holds significant importance due to their diverse applications in healthcare. An inherent limitation of current AFP prediction methods is their reliance on unreviewed proteins for evaluation. This study evaluates proposed and existing methods on an independent dataset containing 80 AFPs and 73 non-AFPs obtained from UniProt, which have already been reviewed by experts. Initially, we constructed machine learning models for AFP prediction using selected composition-based protein features and achieved a peak AUROC of 0.90 with an MCC of 0.69 on the independent dataset. Subsequently, we observed a notable enhancement in model performance, with the AUROC increasing from 0.90 to 0.93 upon incorporating evolutionary information instead of relying solely on the primary sequence of proteins. Furthermore, we explored hybrid models integrating our machine learning approaches with BLAST-based similarity and motif-based methods. However, the performance of these hybrid models either matched or was inferior to that of our best machine-learning model. Our best model, based on evolutionary information, outperforms all existing methods on the independent/validation dataset. To facilitate users, a user-friendly web server with a standalone package named "AFPropred" was developed (https://webs.iiitd.edu.in/raghava/afpropred).
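
Editorial illustration: the abstract refers to "composition-based protein features" without listing them. One common feature of this kind is amino acid composition, the percentage of each of the 20 standard residues in a sequence. The sketch below shows that calculation only; the function name and the toy sequence are assumptions for illustration and are not taken from AFPropred.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Return {residue: percentage} over the standard residues in `sequence`."""
    seq = sequence.upper()
    # Non-standard symbols (X, B, gaps, ...) are ignored rather than counted.
    counts = Counter(residue for residue in seq if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


# Toy sequence chosen for easy arithmetic; it is not a real antifreeze protein.
composition = amino_acid_composition("ACDAAK")
```

The resulting 20-value vector could then be passed to any classifier; which features and classifier the authors actually selected is not stated in the abstract.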

2.
Front Bioinform ; 4: 1425419, 2024.
Article in English | MEDLINE | ID: mdl-39119181

ABSTRACT

Transcription factors are essential DNA-binding proteins that regulate the transcription rate of several genes and control the expression of genes inside a cell. The prediction of transcription factors with high precision is important for understanding biological processes such as cell differentiation, intracellular signaling, and cell-cycle control. In this study, we developed a hybrid method that combines alignment-based and alignment-free methods for predicting transcription factors with higher accuracy. All models have been trained, tested, and evaluated on a large dataset that contains 19,406 transcription factors and 523,560 non-transcription factor protein sequences. To avoid biases in evaluation, the datasets were divided into training and validation/independent datasets, where 80% of the data was used for training, and the remaining 20% was used for external validation. In the case of alignment-free methods, models were developed using machine learning techniques and the composition-based features of a protein. Our best alignment-free model obtained an AUC of 0.97 on an independent dataset. In the case of the alignment-based method, we used BLAST at different cut-offs to predict the transcription factors. Although the alignment-based method demonstrated excellent performance, it was unable to cover all transcription factors due to instances of no hits. To combine the strengths of both methods, we developed a hybrid method that combines alignment-free and alignment-based methods. In the hybrid method, we added the scores of the alignment-free and alignment-based methods and achieved a maximum AUC of 0.99 on the independent dataset. The method proposed in this study performs better than existing methods. We incorporated the best models in the webserver/Python Package Index/standalone package of "TransFacPred" (https://webs.iiitd.edu.in/raghava/transfacpred).
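
Editorial illustration: the abstract states that the hybrid method adds the alignment-free and alignment-based scores, but gives no further detail. The sketch below assumes one simple way this could work: a fixed bonus when the top BLAST hit is a known positive, a fixed penalty when it is a known negative, and no change when BLAST returns no hit. The weight of 0.5, the function name, and the hit encoding are all assumptions, not the TransFacPred implementation.

```python
from typing import Optional


def hybrid_score(ml_probability: float,
                 blast_hit_label: Optional[int],
                 blast_weight: float = 0.5) -> float:
    """Combine an alignment-free score with an alignment-based adjustment.

    blast_hit_label: 1 if the top BLAST hit is a known positive,
                     0 if it is a known negative,
                     None if BLAST returned no hit at the chosen cut-off.
    blast_weight is a hypothetical value chosen for illustration.
    """
    if blast_hit_label is None:
        # No hit: fall back to the machine learning score alone,
        # which is what gives the hybrid full coverage.
        return ml_probability
    if blast_hit_label == 1:
        return ml_probability + blast_weight
    return ml_probability - blast_weight
```

Under these assumptions a sequence with a borderline model score of 0.4 would move to 0.9 with a positive hit, and would stay at 0.4 when BLAST finds nothing.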

3.
Comput Biol Med ; 179: 108926, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39038391

ABSTRACT

Toxicity emerges as a prominent challenge in the design of therapeutic peptides, causing the failure of numerous peptides during clinical trials. In 2013, our group developed ToxinPred, a computational method that has been extensively adopted by the scientific community for predicting peptide toxicity. In this paper, we propose a refined variant of ToxinPred that showcases improved reliability and accuracy in predicting peptide toxicity. Initially, we utilized a similarity/alignment-based approach employing BLAST to predict toxic peptides, which yielded satisfactory accuracy; however, the method suffered from inadequate coverage. Subsequently, we employed a motif-based approach using MERCI software to uncover specific patterns or motifs that are exclusively observed in toxic peptides. The search for these motifs in peptides allowed us to predict toxic peptides with a high level of specificity but poor sensitivity. To overcome the coverage limitations, we developed alignment-free methods using machine/deep learning techniques to balance sensitivity and specificity of prediction. A deep learning model (ANN - LSTM with fixed sequence length) developed using one-hot encoding achieved a maximum AUROC of 0.93 with an MCC of 0.71 on an independent dataset. A machine learning model (extra tree) developed using compositional features of peptides achieved a maximum AUROC of 0.95 with an MCC of 0.78. We also developed large language models and achieved a maximum AUC of 0.93 using ESM2-t33. Finally, we developed hybrid or ensemble methods combining two or more methods to enhance performance. Our specific hybrid method, which combines a motif-based approach with a machine learning-based model, achieved a maximum AUROC of 0.98 with an MCC of 0.81 on an independent dataset. In this study, all models were trained and tested on 80% of the data using five-fold cross-validation and evaluated on the remaining 20% of the data, called the independent dataset. The evaluation of all methods on an independent dataset revealed that the method proposed in this study exhibited better performance than existing methods. To cater to the needs of the scientific community, we have developed a standalone software, pip package and web-based server ToxinPred3 (https://github.com/raghavagps/toxinpred3 and https://webs.iiitd.edu.in/raghava/toxinpred3/).
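
Editorial illustration: the evaluation protocol described above (80% of the data for five-fold cross-validation, 20% held out as an independent dataset) can be sketched with index bookkeeping alone. The function name, the fixed seed, and the round-robin fold assignment are assumptions; the abstract does not say whether the authors stratified the split or how folds were drawn.

```python
import random


def split_and_fold(n_samples, test_fraction=0.2, n_folds=5, seed=0):
    """Return (folds, independent) as lists of sample indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible

    n_independent = round(n_samples * test_fraction)
    independent = indices[:n_independent]   # never used during training
    training = indices[n_independent:]

    # Deal the training indices round-robin into n_folds validation folds.
    folds = [training[i::n_folds] for i in range(n_folds)]
    return folds, independent


folds, independent = split_and_fold(100)

# In each cross-validation round one fold validates and the other four train.
for k, validation_fold in enumerate(folds):
    train_indices = [i for j, fold in enumerate(folds) if j != k for i in fold]
```

The independent indices are set aside before any fold is built, so no sample can appear in both the cross-validation data and the final evaluation set.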


Subject(s)
Peptides, Software, Peptides/chemistry, Humans, Computational Biology/methods, Deep Learning, Protein Databases
4.
Drug Discov Today ; 29(7): 104047, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38830503

ABSTRACT

During the past 20 years, there has been a significant increase in the number of protein-based drugs approved by the US Food and Drug Administration (FDA). This paper presents THPdb2, an updated version of the THPdb database, which holds information about all types of protein-based drugs, including peptides, antibodies, and biosimilar proteins. THPdb2 contains a total of 6,385 entries, providing comprehensive information about 894 FDA-approved therapeutic proteins, including 354 monoclonal antibodies and 85 peptides or polypeptides. Each entry includes the name of the therapeutic molecule, the amino acid sequence, physical and chemical properties, and the route of drug administration. The therapeutic molecules included in the database target a wide range of biological molecules, such as receptors, factors, and proteins, and have been approved for the treatment of various diseases, including cancers, infectious diseases, and immune disorders.


Subject(s)
Drug Approval, Peptides, United States Food and Drug Administration, United States, Peptides/therapeutic use, Peptides/pharmacology, Peptides/chemistry, Humans, Proteins/chemistry, Proteins/therapeutic use, Biosimilar Pharmaceuticals/therapeutic use, Biosimilar Pharmaceuticals/pharmacology
5.
Front Mol Biosci ; 11: 1395721, 2024.
Article in English | MEDLINE | ID: mdl-38872916

ABSTRACT

Background: Head and Neck Squamous Cell Carcinoma (HNSCC) is the seventh most prevalent cancer type worldwide. Early detection of HNSCC is one of the important challenges in managing the treatment of cancer patients. Existing techniques for detecting HNSCC are costly and invasive. Methods: In this study, we aimed to address this issue by developing classification models using machine learning and deep learning techniques, focusing on single-cell transcriptomics to distinguish between HNSCC and normal samples. Furthermore, we built models to classify HNSCC samples into HPV-positive (HPV+) and HPV-negative (HPV-) categories. We used the GSE181919 dataset, from which we extracted 20 primary cancer (HNSCC) samples and 9 normal tissue samples. The primary cancer samples contained 13 HPV- and 7 HPV+ samples. The models developed in this study were trained on 80% of the dataset and validated on the remaining 20%. To develop an efficient model, we performed feature selection using the mRMR method to shortlist a small number of genes from the full gene set. We also performed Gene Ontology (GO) enrichment analysis on the 100 shortlisted genes. Results: An artificial neural network based model trained on 100 genes outperformed the other classifiers, with an AUROC of 0.91 for HNSCC classification on the validation set. The same algorithm achieved an AUROC of 0.83 for the classification of HPV+ and HPV- patients on the validation set. The GO enrichment analysis found that most genes were involved in binding and catalytic activities. Conclusion: A software package has been developed in Python that allows users to identify HNSCC in patients along with their HPV status. It is available at https://webs.iiitd.edu.in/raghava/hnscpred/.

6.
Proteomics ; : e2400004, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38803012

ABSTRACT

Peptide hormones serve as genome-encoded signal transduction molecules that play essential roles in multicellular organisms, and their dysregulation can lead to various health problems. In this study, we propose a method for predicting hormonal peptides with high accuracy. The dataset used for training, testing, and evaluating our models consisted of 1174 hormonal and 1174 non-hormonal peptide sequences. Initially, we developed similarity-based methods utilizing BLAST and MERCI software. Although these similarity-based methods provided a high probability of correct prediction, they had limitations, such as no hits or prediction of limited sequences. To overcome these limitations, we further developed machine and deep learning-based models. Our logistic regression-based model achieved a maximum AUROC of 0.93 with an accuracy of 86% on an independent/validation dataset. To harness the power of similarity-based and machine learning-based models, we developed an ensemble method that achieved an AUROC of 0.96 with an accuracy of 89.79% and a Matthews correlation coefficient (MCC) of 0.8 on the validation set. To facilitate researchers in predicting and designing hormone peptides, we developed a web-based server called HOPPred. This server offers a unique feature that allows the identification of hormone-associated motifs within hormone peptides. The server can be accessed at: https://webs.iiitd.edu.in/raghava/hoppred/.
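
Editorial illustration: the abstract reports accuracy and the Matthews correlation coefficient (MCC) side by side. Both follow from the four cells of a binary confusion matrix, as the sketch below shows. The counts in the example are invented for easy arithmetic and are not the HOPPred results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Common convention when a whole row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Invented counts for a balanced test set of 100 positives and 100 negatives.
toy_accuracy = accuracy(tp=80, tn=80, fp=20, fn=20)
toy_mcc = matthews_corrcoef(tp=80, tn=80, fp=20, fn=20)
```

On a balanced dataset such as the 1174 versus 1174 peptides described above, MCC and accuracy move together; on imbalanced data MCC is usually the more conservative of the two.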

7.
Adv Protein Chem Struct Biol ; 139: 383-403, 2024.
Article in English | MEDLINE | ID: mdl-38448141

ABSTRACT

An uncommon opportunistic fungal infection known as mucormycosis is caused by a class of molds called mucoromycetes. Currently, antifungal therapy and surgical debridement are the primary treatment options for mucormycosis. Despite the importance of comprehensive knowledge on mucormycosis, there is a lack of well-annotated databases that provide all relevant information. In this study, we have gathered and organized all available information related to mucormycosis, including the disease's genome, proteins, and diagnostic methods. Furthermore, using the AlphaFold2.0 prediction tool, we have predicted the tertiary structures of potential drug targets. We have categorized the information into three major sections: "genomics/proteomics," "immunotherapy," and "drugs." The genomics/proteomics module contains information on different strains responsible for mucormycosis. The immunotherapy module includes putative sequence-based therapeutics predicted using established tools. The drugs module provides information on available drugs for treating the disease. It also offers prerequisite information for computer-aided drug design, such as putative targets and predicted structures. To provide comprehensive information over the internet, we developed a web-based platform, MucormyDB (https://webs.iiitd.edu.in/raghava/mucormydb/).


Subject(s)
Anti-HIV Agents, Mucormycosis, Humans, Mucormycosis/drug therapy, Mucormycosis/genetics, Genomics, Factual Databases, Drug Delivery Systems
8.
Antibiotics (Basel) ; 13(2), 2024 Feb 08.
Article in English | MEDLINE | ID: mdl-38391554

ABSTRACT

Most existing methods for predicting antibacterial peptides (ABPs) are designed to target either gram-positive or gram-negative bacteria. In this study, we describe a method that allows us to predict ABPs against gram-positive, gram-negative, and gram-variable bacteria. Firstly, we developed an alignment-based approach using BLAST to identify ABPs, which achieved poor sensitivity. Secondly, we employed a motif-based approach to predict ABPs and obtained high precision with low sensitivity. To address the issue of poor sensitivity, we developed alignment-free methods for predicting ABPs using machine/deep learning techniques. In the case of alignment-free methods, we utilized a wide range of peptide features that include different types of composition, binary profiles of terminal residues, and fastText word embedding. In this study, a five-fold cross-validation technique has been used to build machine/deep learning models on training datasets. These models were evaluated on an independent dataset with no common peptide between training and independent datasets. Our machine learning-based model developed using the amino acid binary profile of terminal residues achieved a maximum AUC of 0.93, 0.98, and 0.94 for gram-positive, gram-negative, and gram-variable bacteria, respectively, on an independent dataset. Our method performs better than existing approaches when compared on an independent dataset. A user-friendly web server, standalone package and pip package have been developed to facilitate peptide-based therapeutics.
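
Editorial illustration: a "binary profile of terminal residues" is usually understood as a one-hot encoding of a fixed number of residues taken from the N- or C-terminus of a peptide. The sketch below assumes a window of five residues and 20 standard amino acids, giving a 100-value vector; the window length, function name, and toy peptide are assumptions, since the abstract does not give these details.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # column order of each one-hot block


def terminal_binary_profile(peptide, n_residues=5, terminus="N"):
    """One-hot encode the first (N) or last (C) `n_residues` of `peptide`."""
    if terminus not in ("N", "C"):
        raise ValueError("terminus must be 'N' or 'C'")
    seq = peptide.upper()
    if len(seq) < n_residues:
        raise ValueError("peptide is shorter than the requested window")

    window = seq[:n_residues] if terminus == "N" else seq[-n_residues:]
    profile = []
    for residue in window:
        # Each residue contributes 20 values; non-standard residues give all zeros.
        profile.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return profile


# Arbitrary toy peptide used only to show the shape of the output.
n_profile = terminal_binary_profile("GLFDIVKKVV", n_residues=5, terminus="N")
c_profile = terminal_binary_profile("GLFDIVKKVV", n_residues=5, terminus="C")
```

Because the window length is fixed, peptides of different lengths all map to vectors of the same size, which is what makes the profile usable as classifier input.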

9.
Front Bioinform ; 4: 1341479, 2024.
Article in English | MEDLINE | ID: mdl-38379813

ABSTRACT

In the past, several methods have been developed for predicting the single-label subcellular localization of messenger RNA (mRNA). However, only a few methods are designed to predict the multi-label subcellular localization of mRNA. Furthermore, the existing methods are slow and cannot be implemented at a transcriptome scale. In this study, a fast and reliable method has been developed for predicting the multi-label subcellular localization of mRNA that can be implemented at a genome scale. Machine learning-based methods have been developed using mRNA sequence composition, where the XGBoost-based classifier achieved an average area under the receiver operating characteristic curve (AUROC) of 0.709 (0.668-0.732). In addition to alignment-free methods, we developed alignment-based methods using motif search techniques. Finally, a hybrid technique that combines the XGBoost model and the motif-based approach has been developed, achieving an average AUROC of 0.742 (0.708-0.816). Our method, MRSLpred, outperforms the existing state-of-the-art classifier in terms of performance and computational efficiency. A publicly accessible webserver and a standalone tool have been developed to facilitate researchers (webserver: https://webs.iiitd.edu.in/raghava/mrslpred/).
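
Editorial illustration: in a multi-label setting, an "average AUROC" with a range in parentheses is typically the mean of one AUROC per label, reported with the lowest and highest per-label values. The sketch below computes AUROC by direct pairwise comparison (the probability that a random positive scores above a random negative) and averages it over labels. The function names and toy data are assumptions, and the abstract does not state which averaging scheme MRSLpred used.

```python
def auroc(labels, scores):
    """Pairwise AUROC; ties between a positive and a negative count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def mean_auroc(label_matrix, score_matrix):
    """Return (mean, per_label) AUROC over the columns of a multi-label problem."""
    n_labels = len(label_matrix[0])
    per_label = [
        auroc([row[j] for row in label_matrix], [row[j] for row in score_matrix])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels, per_label


# Toy data: four transcripts (rows), two hypothetical localization labels (columns).
toy_labels = [[1, 0], [0, 1], [1, 1], [0, 0]]
toy_scores = [[0.9, 0.2], [0.1, 0.8], [0.8, 0.3], [0.3, 0.6]]
average, per_label = mean_auroc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of examples, which is fine for a sketch; a rank-based formula gives the same value more efficiently on large datasets.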

10.
Comput Biol Med ; 170: 108083, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38295479

ABSTRACT

B cells are an essential component of the immune system and play a vital role in the immune response against pathogenic infection by producing antibodies. Existing methods predict either linear or conformational B-cell epitopes in an antigen. In this study, a single method was developed for predicting both types (linear/conformational) of B-cell epitopes. The dataset used in this study contains 3875 B-cell epitopes and 3996 non-B-cell epitopes, where B-cell epitopes consist of both linear and conformational B-cell epitopes. Our primary analysis indicates that certain residues (like Asp, Glu, Lys, and Asn) are more prominent in B-cell epitopes. We developed machine-learning based methods using different types of sequence composition and achieved the highest AUROC of 0.80 using dipeptide composition. In addition, models were developed on selected features, but no further improvement was observed. Our similarity-based method implemented using BLAST shows a high probability of correct prediction with poor sensitivity. Finally, we developed a hybrid model that combines alignment-free (dipeptide based random forest model) and alignment-based (BLAST-based similarity) models. Our hybrid model attained a maximum AUROC of 0.83 with an MCC of 0.49 on the independent dataset. Our hybrid model performs better than existing methods on an independent dataset used in this study. All models were trained and tested on 80% of the data using a cross-validation technique, and the final model was evaluated on 20% of the data, called an independent or validation dataset. A webserver and standalone package named "CLBTope" has been developed for predicting, designing, and scanning B-cell epitopes in an antigen sequence; it is available at https://webs.iiitd.edu.in/raghava/clbtope/.
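
Editorial illustration: dipeptide composition, the feature set that gave the best alignment-free AUROC above, is the percentage of each of the 400 possible ordered residue pairs among all adjacent pairs in a sequence. The sketch below shows the calculation; the function name and toy sequence are assumptions and the code is not taken from CLBTope.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# 20 x 20 = 400 ordered pairs, in a fixed order so feature columns line up.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-value list: percentage of each dipeptide among adjacent pairs."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs containing non-standard residues are skipped
            counts[pair] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [100.0 * counts[dp] / total for dp in DIPEPTIDES]


# Toy sequence built from residues the abstract lists as prominent in epitopes.
features = dipeptide_composition("DEKDEK")
```

Unlike single-residue composition, this vector retains some local order information, since "DE" and "ED" are counted separately.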


Subject(s)
Antigens, B-Lymphocyte Epitopes, B-Lymphocyte Epitopes/chemistry, Amino Acid Sequence, Antigens/chemistry, Molecular Conformation, Dipeptides
11.
Proteomics ; 24(6): e2300231, 2024 Mar.
Article in English | MEDLINE | ID: mdl-37525341

ABSTRACT

Non-invasive diagnostics and therapies are crucial to prevent patients from undergoing painful procedures. Exosomal proteins can serve as important biomarkers for such advancements. In this study, we attempted to build a model to predict exosomal proteins. All models are trained, tested, and evaluated on a non-redundant dataset comprising 2831 exosomal and 2831 non-exosomal proteins, where no two proteins have more than 40% similarity. Initially, the standard similarity-based method Basic Local Alignment Search Tool (BLAST) was used to predict exosomal proteins, which failed due to the low level of similarity in the dataset. To overcome this challenge, machine learning (ML) based models were developed using compositional and evolutionary features of proteins, achieving an area under the receiver operating characteristic curve (AUROC) of 0.73. Our analysis also indicated that exosomal proteins have a variety of sequence-based motifs that can be used to predict exosomal proteins. Hence, we developed a hybrid method combining motif-based and ML-based approaches for predicting exosomal proteins, achieving a maximum AUROC of 0.85 and MCC of 0.56 on an independent dataset. This hybrid model performs better than presently available methods when assessed on an independent dataset. A web server and standalone software, ExoProPred (https://webs.iiitd.edu.in/raghava/exopropred/), have been created to help scientists predict and discover exosomal proteins and find functional motifs present in them.


Subject(s)
Random Forest, Protein Sequence Analysis, Humans, Amino Acid Sequence, Protein Sequence Analysis/methods, Proteins/metabolism, Software
12.
Comput Biol Med ; 167: 107594, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37918263

ABSTRACT

Advancements in cancer immunotherapy have shown significant outcomes in treating cancers. To design effective immunotherapy, it is important to understand the immune response of a patient based on their genomic profile. However, such analyses require proficiency in bioinformatic methods. Rapidly growing sequencing technologies and statistical methods create a barrier for scientists who want to find biomarkers for different cancers but lack detailed knowledge of coding or tools. Here, we provide a web-based resource that gives scientists with no bioinformatics expertise the ability to obtain prognostic biomarkers for different cancer types at different levels. We computed prognostic biomarkers from 8346 cancer patients for twenty cancer types. These biomarkers were computed based on i) the presence of 352 human leukocyte antigen class I (HLA-I) alleles, ii) 660959 tumor-specific HLA1 neobinders, and iii) the expression profile of 153 cytokines. It was observed that the survival risk of cancer patients depends on the presence of certain types of HLA-I alleles; for example, liver hepatocellular carcinoma patients with HLA-A*03:01 are at lower risk. Our analysis indicates that neobinders of HLA-I alleles have a high correlation with the overall survival of certain types of cancer patients. For example, HLA-B*07:02 binders have a correlation of 0.49 with survival of lung squamous cell carcinoma patients and -0.77 with survival of kidney chromophobe patients. Additionally, we computed prognostic biomarkers based on cytokine expression. Higher expression of a few cytokines is favorable for survival, such as IL-2 for bladder urothelial carcinoma, whereas IL-5R is unfavorable for survival in kidney chromophobe patients. CancerHLA-I, which is freely accessible to the public, maintains raw and analysed data (https://webs.iiitd.edu.in/raghava/cancerhla1/).


Subject(s)
Transitional Cell Carcinoma, Lung Neoplasms, Urinary Bladder Neoplasms, Humans, Cytokines/genetics, Alleles, Transitional Cell Carcinoma/genetics, Urinary Bladder Neoplasms/genetics, Biomarkers, Lung Neoplasms/genetics, Risk Assessment
13.
Protein Sci ; 32(11): e4785, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37733481

ABSTRACT

The identification of B-cell epitopes (BCEs) in antigens is a crucial step in developing recombinant vaccines or immunotherapies for various diseases. Over the past four decades, numerous in silico methods have been developed for predicting BCEs. However, existing reviews have only covered specific aspects, such as the progress in predicting conformational or linear BCEs. Therefore, in this paper, we have undertaken a systematic approach to provide a comprehensive review covering all aspects associated with the identification of BCEs. First, we have covered the experimental techniques developed over the years for identifying linear and conformational epitopes, including the limitations and challenges associated with these techniques. Second, we have briefly described the historical perspectives and resources that maintain experimentally validated information on BCEs. Third, we have extensively reviewed the computational methods developed for predicting conformational BCEs from the structure of the antigen, as well as the methods for predicting conformational epitopes from the sequence. Fourth, we have systematically reviewed the in silico methods developed in the last four decades for predicting linear or continuous BCEs. Finally, we have discussed the overall challenge of identifying continuous or conformational BCEs. In this review, we only listed major computational resources; a complete list with the URL is available from the BCinfo website (https://webs.iiitd.edu.in/raghava/bcinfo/).


Subject(s)
Antigens, B-Lymphocyte Epitopes, B-Lymphocyte Epitopes/chemistry, Amino Acid Sequence
14.
Comput Biol Med ; 160: 106929, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37126926

ABSTRACT

Tumor Necrosis Factor alpha (TNF-α) is a pleiotropic pro-inflammatory cytokine that is crucial in controlling the signaling pathways within immune cells. Recent studies reported that higher expression levels of TNF-α are associated with the progression of several diseases, including cancers, cytokine release syndrome in COVID-19, and autoimmune disorders. Thus, there is a pressing need to develop immunotherapies or subunit vaccines to manage TNF-α progression in various disease conditions. In this pilot study, we proposed a host-specific in silico tool for predicting, designing, and scanning TNF-α inducing epitopes. The prediction models were trained and validated on experimentally validated TNF-α inducing/non-inducing epitopes from human and mouse hosts. Firstly, we developed alignment-free methods (machine learning based models using composition-based features of peptides) for predicting TNF-α inducing peptides and achieved a maximum AUROC of 0.79 and 0.74 for human and mouse hosts, respectively. Secondly, an alignment-based method (using BLAST) has been used for predicting TNF-α inducing epitopes. Finally, a hybrid method (a combination of the alignment-free and alignment-based methods) has been developed for predicting epitopes. The hybrid approach achieved a maximum AUROC of 0.83 and 0.77 on an independent dataset for human and mouse hosts, respectively. We have also identified potential TNF-α inducing peptides in different proteins of HIV-1, HIV-2, SARS-CoV-2, and human insulin. The best models developed in this study have been incorporated in the webserver TNFepitope (https://webs.iiitd.edu.in/raghava/tnfepitope/), a standalone package and GitLab (https://gitlab.com/raghavalab/tnfepitope).


Subject(s)
COVID-19, Tumor Necrosis Factor-alpha, Humans, Animals, Mice, Epitopes, Pilot Projects, SARS-CoV-2, Peptides
15.
Methods Mol Biol ; 2673: 317-327, 2023.
Article in English | MEDLINE | ID: mdl-37258924

ABSTRACT

Interleukin 6 (IL6) is a major pro-inflammatory cytokine that plays a pivotal role in both innate and adaptive immune responses. In the past, a number of studies reported that high levels of IL6 promote the proliferation of cancer, autoimmune disorders, and cytokine storm in COVID-19 patients. Thus, it is extremely important to identify and remove the antigenic regions from a therapeutic protein or vaccine candidate that may induce IL6-associated immunotoxicity. To overcome this challenge, our group has developed a computational tool, IL6pred, for discovering IL6-inducing peptides in a vaccine candidate. The aim of this chapter is to describe the potential applications and methodology of IL6pred. It sheds light on the prediction, designing, and scanning modules of the IL6pred webserver and standalone package (https://webs.iiitd.edu.in/raghava/il6pred/).


Subject(s)
COVID-19, Vaccines, Humans, Interleukin-6/genetics, COVID-19/prevention & control, Cytokines/metabolism, Internet
16.
Methods Mol Biol ; 2673: 329-338, 2023.
Article in English | MEDLINE | ID: mdl-37258925

ABSTRACT

Interleukins are a distinctive class of molecules exhibiting various immune signaling functions. The immunoregulatory cytokine interleukin 13 (IL13) is primarily synthesized by activated T-helper 2 cells, mast cells, and basophils. IL13 is known to stimulate many allergic and autoimmune diseases, such as asthma, rheumatoid arthritis, systemic sclerosis, ulcerative colitis, airway hyperresponsiveness, glycoprotein hypersecretion, and goblet cell hyperplasia. In addition to such disorders, IL13 also leads to carcinogenesis by inhibiting tumor immunosurveillance. Due to its role in various diseases, predicting IL13-inducing peptides or regions in a protein is vital to designing safe protein vaccines and therapeutics. IL13pred is an in silico tool that aids in identifying, predicting, and designing IL13-inducing peptides. The IL13pred web server and standalone package are easily accessible at https://webs.iiitd.edu.in/raghava/il13pred/.


Subject(s)
Asthma, Interleukin-13, Humans, Cytokines, Interleukins, Peptides
17.
Comput Biol Med ; 158: 106864, 2023 May.
Article in English | MEDLINE | ID: mdl-37058758

ABSTRACT

Interleukin-5 (IL-5) can act as an enticing therapeutic target due to its pivotal role in several eosinophil-mediated diseases. The aim of this study is to develop a model for predicting IL-5 inducing antigenic regions in a protein with high precision. All models in this study have been trained, tested and validated on 1907 experimentally validated IL-5 inducing and 7759 non-IL-5 inducing peptides obtained from IEDB. Our primary analysis indicates that IL-5 inducing peptides are dominated by certain residues like Ile, Asn, and Tyr. It was also observed that binders of a wide range of HLA alleles can induce IL-5. Initially, alignment-based methods were developed using similarity and motif search. These alignment-based methods provide high precision but poor coverage. To overcome this limitation, we explored alignment-free methods, which are mainly machine learning-based models. Firstly, models were developed using binary profiles, and an eXtreme Gradient Boosting-based model achieved a maximum AUC of 0.59. Secondly, composition-based models were developed, and our dipeptide-based random forest model achieved a maximum AUC of 0.74. Thirdly, a random forest model developed using 250 selected dipeptides achieved an AUC of 0.75 and an MCC of 0.29 on the validation dataset, the best among the alignment-free models. To improve the performance, we developed an ensemble or hybrid method that combined alignment-based and alignment-free methods. Our hybrid method achieved an AUC of 0.94 with an MCC of 0.60 on a validation/independent dataset. The best hybrid model developed in this study has been incorporated into a user-friendly web server and a standalone package named 'IL5pred' (https://webs.iiitd.edu.in/raghava/il5pred/).


Subject(s)
Interleukin-5, Peptides, Computer Simulation, Peptides/chemistry, Computers, Antigens, Protein Databases
18.
Front Microbiol ; 14: 1148579, 2023.
Article in English | MEDLINE | ID: mdl-37032893

ABSTRACT

Phage therapy is a viable alternative to antibiotics for treating microbial infections, particularly for managing drug-resistant strains of bacteria. One of the major challenges in designing phage-based therapy is to identify the most appropriate phage candidate to treat a bacterial infection. In this study, an attempt has been made to predict phage-host interactions with high accuracy to identify the potential bacteriophage that can be used for treating a bacterial infection. The models were developed using a training dataset containing 826 phage-host interactions and evaluated on a validation dataset comprising 1,201 phage-host interactions. Firstly, alignment-based models were developed using similarity between phage-phage (BLASTPhage), host-host (BLASTHost) and phage-CRISPR (CRISPRPred), where we achieved accuracies of 42.4-66.2% for BLASTPhage, 55-78.4% for BLASTHost, and 43.7-80.2% for CRISPRPred across five taxonomic levels. Secondly, alignment-free models were developed using machine learning techniques. Thirdly, hybrid models were developed by integrating the alignment-free models and the similarity scores, where we achieved a maximum performance of 60.6-93.5%. Finally, an ensemble model was developed that combines the hybrid and alignment-based models. Our ensemble model achieved the highest accuracy of 67.9, 80.6, 85.5, 90, and 93.5% at the Genus, Family, Order, Class, and Phylum levels, respectively, on the validation dataset. To serve the scientific community, we have also developed a webserver named PhageTB and provided a standalone software package (https://webs.iiitd.edu.in/raghava/phagetb/).
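
Editorial illustration: the accuracies above are reported separately for five taxonomic levels, and they rise from Genus to Phylum because a host predicted in the wrong genus can still fall in the correct family or phylum. The sketch below shows one way such per-level accuracy could be tallied. The data layout, the function name, and the placeholder taxon names (G1, F1, ...) are assumptions for illustration and are not PhageTB's evaluation code.

```python
LEVELS = ("genus", "family", "order", "class", "phylum")


def accuracy_by_level(predicted, actual):
    """Percentage of hosts predicted correctly at each taxonomic level.

    `predicted` and `actual` are equal-length lists of dicts that map
    every level in LEVELS to a taxon name.
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    result = {}
    for level in LEVELS:
        correct = sum(1 for p, a in zip(predicted, actual) if p[level] == a[level])
        result[level] = 100.0 * correct / len(actual)
    return result


# Placeholder taxa: the first prediction misses the genus but hits the family.
toy_actual = [
    {"genus": "G2", "family": "F1", "order": "O1", "class": "C1", "phylum": "P1"},
    {"genus": "G3", "family": "F2", "order": "O2", "class": "C2", "phylum": "P2"},
]
toy_predicted = [
    {"genus": "G1", "family": "F1", "order": "O1", "class": "C1", "phylum": "P1"},
    {"genus": "G3", "family": "F2", "order": "O2", "class": "C2", "phylum": "P2"},
]
toy_result = accuracy_by_level(toy_predicted, toy_actual)
```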

19.
Front Immunol ; 14: 1056101, 2023.
Article in English | MEDLINE | ID: mdl-36742312

ABSTRACT

Introduction: Celiac disease (CD) is an autoimmune gastrointestinal disorder that causes immune-mediated enteropathy against gluten. Gluten immunogenic peptides have the potential to trigger immune responses that damage the small intestine. HLA-DQ2/DQ8 are the major alleles that bind to epitopes/antigenic regions of gluten and induce celiac disease. There is a need to identify CD-associated epitopes in protein-based foods and therapeutics. Methods: In this study, computational tools have been developed to predict CD-associated epitopes and motifs. The dataset used for training, testing and evaluation contains experimentally validated CD-associated and non-CD-associated peptides. We performed positional analysis to identify the most significant position of an amino acid residue in the peptide and checked the frequency of HLA alleles. We also computed amino acid composition to develop machine learning based models, and developed an ensemble method that combines the motif-based approach with machine learning based models. Results and Discussion: Our analysis supports the existing hypothesis that proline (P) and glutamine (Q) are highly abundant in CD-associated peptides. A model based on the density of P and Q in peptides was developed for predicting CD-associated peptides; it achieves a maximum AUROC of 0.98 on independent data. We discovered motifs (e.g., QPF, QPQ, PYP) that occur specifically in CD-associated peptides. We also developed machine learning based models using peptide composition and achieved a maximum AUROC of 0.99. Finally, we developed an ensemble method that combines the motif-based approach and machine learning based models. The ensemble model predicts CD-associated motifs with 100% accuracy on an independent dataset not used for training. The best models and motifs have been integrated into a web server and standalone software package, "CDpred". We hope this server will assist the scientific community in the prediction, designing and scanning of CD-associated peptides as well as CD-associated motifs in a protein/peptide sequence (https://webs.iiitd.edu.in/raghava/cdpred/).
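
Editorial illustration: two of the signals described above are simple enough to sketch directly: the density of proline and glutamine in a peptide, and a scan for the example motifs quoted in the abstract (QPF, QPQ, PYP). The function names and the toy peptide are assumptions. The sketch does not include a decision threshold, because the abstract does not report one, and the full CDpred motif list is not given.

```python
CD_MOTIFS = ("QPF", "QPQ", "PYP")  # only the examples quoted in the abstract


def pq_density(peptide):
    """Fraction of residues in `peptide` that are proline (P) or glutamine (Q)."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("peptide is empty")
    return (seq.count("P") + seq.count("Q")) / len(seq)


def find_motifs(peptide, motifs=CD_MOTIFS):
    """Return {motif: [0-based start positions]} for every motif that occurs."""
    seq = peptide.upper()
    hits = {}
    for motif in motifs:
        # startswith(motif, i) checks for the motif at offset i, so
        # overlapping occurrences are all reported.
        positions = [i for i in range(len(seq) - len(motif) + 1)
                     if seq.startswith(motif, i)]
        if positions:
            hits[motif] = positions
    return hits


toy_peptide = "PQPQLPYPQ"  # toy gluten-like peptide used only for illustration
toy_density = pq_density(toy_peptide)
toy_hits = find_motifs(toy_peptide)
```

For this toy peptide, seven of the nine residues are P or Q, and the scan finds QPQ at position 1 and PYP at position 5.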


Subject(s)
Celiac Disease, Humans, Epitopes, Glutens, Peptides, Amino Acids
20.
Database (Oxford) ; 2023, 2023 Feb 07.
Article in English | MEDLINE | ID: mdl-36747479

ABSTRACT

Saliva as a non-invasive diagnostic fluid has immense potential as a tool for early diagnosis and prognosis of patients. The information about salivary biomarkers is broadly scattered across various resources and research papers. It is important to bring together all the information on salivary biomarkers on a single platform. This will accelerate research and development in non-invasive diagnosis and prognosis of complex diseases. We collected widespread information on five types of salivary biomarkers found in humans: proteins, metabolites, microbes, micro-ribonucleic acids (miRNAs) and genes. This information was collected from different resources that include PubMed, the Human Metabolome Database and SalivaTecDB. Our database SalivaDB contains a total of 15,821 entries for 201 different diseases and 48 disease categories. These entries can be classified into five categories based on the type of biomolecule; 6067, 3987, 2909, 2272 and 586 entries belong to proteins, metabolites, microbes, miRNAs and genes, respectively. The information maintained in this database includes analysis methods, associated diseases, biomarker type, regulation status, exosomal origin, fold change and sequence. The entries are linked to relevant biological databases to provide users with comprehensive information. We developed a web-based interface that provides a wide range of options like browse, keyword search and advanced search. In addition, a similarity search module has been integrated that allows users to perform a similarity search using the Basic Local Alignment Search Tool and the Smith-Waterman algorithm against biomarker sequences in SalivaDB. We created a web-based database, SalivaDB, which provides information about salivary biomarkers found in humans. A wide range of web-based facilities have been integrated to provide services to the scientific community. SalivaDB is available at https://webs.iiitd.edu.in/raghava/salivadb/.
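
Editorial illustration: the Smith-Waterman algorithm named above finds the best-scoring local alignment between two sequences by dynamic programming, with every cell floored at zero so that an alignment can start anywhere. The sketch below returns only the best score and uses a simple match/mismatch scheme with a linear gap penalty. These scoring values are assumptions chosen for readability; protein searches normally use a substitution matrix, and the abstract does not describe SalivaDB's settings.

```python
def smith_waterman_score(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between seq_a and seq_b (linear gap penalty)."""
    previous = [0] * (len(seq_b) + 1)  # DP row for the previous residue of seq_a
    best = 0
    for a in seq_a:
        current = [0]  # first column is always zero
        for j, b in enumerate(seq_b, start=1):
            diagonal = previous[j - 1] + (match if a == b else mismatch)
            up = previous[j] + gap        # gap in seq_b
            left = current[j - 1] + gap   # gap in seq_a
            # Flooring at zero lets a new local alignment start at this cell.
            score = max(0, diagonal, up, left)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


identical = smith_waterman_score("ACGT", "ACGT")
shared_core = smith_waterman_score("TTACGTT", "GGACGGG")
unrelated = smith_waterman_score("AAAA", "TTTT")
```

Only two rows of the matrix are kept in memory because the score alone is needed; recovering the aligned residues would require storing the full matrix and tracing back from the best cell.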


Subject(s)
Factual Databases, MicroRNAs, Humans, Algorithms, Biomarkers, MicroRNAs/genetics, Software, Saliva