Search | BVS IEC

1.

DMPPred: a tool for identification of antigenic regions responsible for inducing type 1 diabetes mellitus.

Kumar, Nishant; Patiyal, Sumeet; Choudhury, Shubham; Tomer, Ritu; Dhall, Anjali; Raghava, Gajendra P S.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36524996

ABSTRACT

There are a number of antigens that induce an autoimmune response against β-cells, leading to type 1 diabetes mellitus (T1DM). Recently, several antigen-specific immunotherapies have been developed to treat T1DM. Thus, identifying T1DM-associated peptides with antigenic regions or epitopes is important for peptide-based therapeutics (e.g. immunotherapeutics). In this study, for the first time, an attempt has been made to develop a method for predicting, designing, and scanning T1DM-associated peptides with high precision. We analysed 815 T1DM-associated peptides and observed that these peptides are not associated with a specific class of HLA alleles. Thus, HLA binder prediction methods are not suitable for predicting T1DM-associated peptides. First, we developed a similarity/alignment-based method using the Basic Local Alignment Search Tool (BLAST) and achieved a high probability of correct hits but poor coverage. Second, we developed an alignment-free method using machine learning techniques and obtained a maximum area under the receiver operating characteristic curve (AUROC) of 0.89 using dipeptide composition. Finally, we developed a hybrid method that combines the strengths of the alignment-free and alignment-based methods and achieves a maximum AUROC of 0.95 with a Matthews correlation coefficient (MCC) of 0.81 on an independent dataset. We developed a web server, 'DMPPred', and a stand-alone version for predicting, designing and scanning T1DM-associated peptides (https://webs.iiitd.edu.in/raghava/dmppred/).

Subjects

Diabetes Mellitus, Type 1 , Humans , Diabetes Mellitus, Type 1/genetics , Computer Simulation , Peptides/chemistry , Epitopes/chemistry , Software
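DMPPred's best alignment-free model used dipeptide composition (DPC), which represents a peptide as the percentage of each of the 400 possible ordered amino-acid pairs. The sketch below is a minimal illustration; the function name `dipeptide_composition`, the choice to skip pairs containing non-standard residues, and the percentage scaling are assumptions, not necessarily how DMPPred computes its features.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered dipeptides: "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence: str) -> dict[str, float]:
    """Return the percentage of each of the 400 dipeptides in `sequence`.

    Overlapping pairs are counted (a peptide of length L has L - 1 pairs).
    Pairs containing a non-standard residue are skipped (an assumption).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i : i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Too short (or no valid pairs): return an all-zero feature vector.
        return {dp: 0.0 for dp in DIPEPTIDES}
    return {dp: 100.0 * c / total for dp, c in counts.items()}


if __name__ == "__main__":
    features = dipeptide_composition("GIVEQCC")
    print(len(features))      # 400 features per peptide
    print(features["CC"])     # CC is one of the 6 overlapping pairs
```

The resulting 400-dimensional vector is what a classifier would be trained on; because it has a fixed length, peptides of different lengths can be compared directly.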

2.

Prediction of RNA-interacting residues in a protein using CNN and evolutionary profile.

Patiyal, Sumeet; Dhall, Anjali; Bajaj, Khushboo; Sahu, Harshita; Raghava, Gajendra P S.

Brief Bioinform ; 24(1)2023 01 19.

Article in English | MEDLINE | ID: mdl-36516298

ABSTRACT

This paper describes Pprint2, an improved version of Pprint developed for predicting RNA-interacting residues in a protein. The training and independent/validation datasets used in this study comprise 545 and 161 non-redundant RNA-binding proteins, respectively. All models were trained on the training dataset and evaluated on the validation dataset. The preliminary analysis reveals that positively charged amino acids, such as H, R and K, are more prominent in RNA-interacting residues. Initially, machine learning-based models were developed using a binary profile, obtaining a maximum area under the curve (AUC) of 0.68 on the validation dataset. The performance of this model improved significantly, from AUC 0.68 to 0.76, when an evolutionary profile was used instead of the binary profile. The performance of our evolutionary profile-based model improved further, from AUC 0.76 to 0.82, when a convolutional neural network was used to develop the model. Our final model, based on a convolutional neural network using evolutionary information, achieved an AUC of 0.82 with a Matthews correlation coefficient of 0.49 on the validation dataset. Our best model outperforms existing methods when evaluated on the independent/validation dataset. A user-friendly standalone software package and web-based server named 'Pprint2' have been developed for predicting RNA-interacting residues (https://webs.iiitd.edu.in/raghava/pprint2 and https://github.com/raghavagps/pprint2).

Subjects

Amino Acids , RNA , Binding Sites , RNA/metabolism , Software , RNA-Binding Proteins/metabolism
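Residue-level predictors such as Pprint2 score each residue from a window of neighbouring residues. The sketch below builds a binary (one-hot) profile for such windows; the window length of 17, the padding character `X` and its all-zero encoding are assumptions for illustration, not parameters reported in the abstract. Substituting each residue's PSSM row for its one-hot vector would give an evolutionary-profile counterpart of the same input.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(residue: str) -> list[int]:
    """20-dimensional binary vector; unknown or padding residues map to all zeros."""
    vec = [0] * len(AMINO_ACIDS)
    if residue in INDEX:
        vec[INDEX[residue]] = 1
    return vec


def window_profiles(sequence: str, window: int = 17) -> list[list[int]]:
    """One flattened binary profile per residue, centred on that residue.

    The sequence is padded with 'X' at both ends so that terminal residues
    also get full-length windows (window * 20 features each).
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd")
    half = window // 2
    padded = "X" * half + sequence.upper() + "X" * half
    profiles = []
    for centre in range(len(sequence)):
        # padded[centre + half] is sequence[centre], the residue being scored.
        segment = padded[centre : centre + window]
        profiles.append([bit for residue in segment for bit in one_hot(residue)])
    return profiles


if __name__ == "__main__":
    rows = window_profiles("MKRHR", window=5)
    print(len(rows), len(rows[0]))  # 5 residues, 5 * 20 = 100 features each
```

Each row would be paired with that residue's interacting/non-interacting label to form one training example.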

3.

A random forest model for predicting exosomal proteins using evolutionary information and motifs.

Arora, Akanksha; Patiyal, Sumeet; Sharma, Neelam; Devi, Naorem Leimarembi; Kaur, Dashleen; Raghava, Gajendra P S.

Proteomics ; 24(6): e2300231, 2024 Mar.

Article in English | MEDLINE | ID: mdl-37525341

ABSTRACT

Non-invasive diagnostics and therapies are crucial to spare patients painful procedures. Exosomal proteins can serve as important biomarkers for such advancements. In this study, we attempted to build a model to predict exosomal proteins. All models are trained, tested, and evaluated on a non-redundant dataset comprising 2831 exosomal and 2831 non-exosomal proteins, in which no two proteins share more than 40% similarity. Initially, the standard similarity-based method, Basic Local Alignment Search Tool (BLAST), was used to predict exosomal proteins, but it failed because of the low level of similarity in the dataset. To overcome this challenge, machine learning (ML)-based models were developed using compositional and evolutionary features of proteins, achieving an area under the receiver operating characteristic curve (AUROC) of 0.73. Our analysis also indicated that exosomal proteins have a variety of sequence-based motifs that can be used to predict them. Hence, we developed a hybrid method combining motif-based and ML-based approaches for predicting exosomal proteins, achieving a maximum AUROC of 0.85 and a Matthews correlation coefficient (MCC) of 0.56 on an independent dataset. This hybrid model performs better than currently available methods when assessed on an independent dataset. A web server and standalone software, ExoProPred (https://webs.iiitd.edu.in/raghava/exopropred/), have been created to help scientists predict and discover exosomal proteins and find functional motifs present in them.

Subjects

Random Forest , Sequence Analysis, Protein , Humans , Amino Acid Sequence , Sequence Analysis, Protein/methods , Proteins/metabolism , Software

4.

HLAncPred: a method for predicting promiscuous non-classical HLA binding sites.

Dhall, Anjali; Patiyal, Sumeet; Raghava, Gajendra P S.

Brief Bioinform ; 23(5)2022 09 20.

Article in English | MEDLINE | ID: mdl-35580839

ABSTRACT

Human leukocyte antigens (HLA) regulate various innate and adaptive immune responses and play a crucial immunomodulatory role. Recent studies revealed that immunotherapies based on non-classical HLA (HLA-E and HLA-G) have many advantages over traditional HLA-based immunotherapy, particularly against cancer and COVID-19 infection. In the last two decades, several methods have been developed to predict the binders of classical HLA alleles. In contrast, few attempts have been made to develop methods for predicting non-classical HLA binding peptides, owing to the scarcity of experimental data. To assist the scientific community, we have developed an artificial intelligence-based method for predicting binders of class-Ib HLA alleles. All the models were trained and tested on experimentally validated data obtained from the recent release of IEDB. The machine learning models achieved an AUC above 0.98 for HLA-G alleles on the validation dataset. Similarly, our models achieved the highest AUC of 0.96 and 0.94 on the validation dataset for HLA-E*01:01 and HLA-E*01:03, respectively. We have summarized the models developed in the past for non-classical HLA and compared their performance with that of the models developed in this study. Moreover, we have used our tool to predict potential non-classical HLA binding peptides in the spike protein of different variants of the virus that causes COVID-19, including Omicron (B.1.1.529). One of the major challenges in the field of immunotherapy is identifying promiscuous binders or antigenic regions that can bind to a large number of HLA alleles. To predict promiscuous binders for the non-classical HLA alleles, we developed a web server, HLAncPred (https://webs.iiitd.edu.in/raghava/hlancpred), and a standalone package.

Subjects

Artificial Intelligence , COVID-19 , Binding Sites , COVID-19/genetics , HLA-G Antigens/metabolism , Humans , Peptides/chemistry , Protein Binding , Spike Glycoprotein, Coronavirus/metabolism
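A peptide is called promiscuous when it is predicted to bind many HLA alleles. Given per-allele prediction scores, one simple way to flag promiscuous binders is to count the alleles whose score passes a threshold. In the sketch below, the allele names follow the nomenclature in the abstract, but the peptides, scores, the 0.5 score threshold and the two-allele minimum are all hypothetical values chosen for illustration, not HLAncPred's actual settings.

```python
def promiscuous_binders(
    scores: dict[str, dict[str, float]],
    score_threshold: float = 0.5,
    min_alleles: int = 2,
) -> dict[str, list[str]]:
    """Return peptides predicted to bind at least `min_alleles` alleles.

    `scores` maps peptide -> {allele: predicted binding score}.
    The result maps each promiscuous peptide to its sorted list of alleles.
    """
    result = {}
    for peptide, per_allele in scores.items():
        binders = sorted(a for a, s in per_allele.items() if s >= score_threshold)
        if len(binders) >= min_alleles:
            result[peptide] = binders
    return result


if __name__ == "__main__":
    # Hypothetical scores for two 9-mers; these are not real predictions.
    example = {
        "PEPTIDEAA": {"HLA-E*01:01": 0.91, "HLA-E*01:03": 0.88, "HLA-G*01:01": 0.32},
        "PEPTIDEBB": {"HLA-E*01:01": 0.12, "HLA-E*01:03": 0.67, "HLA-G*01:01": 0.21},
    }
    print(promiscuous_binders(example))
```

Raising `min_alleles` makes the screen stricter; scanning a whole antigen would apply the same function to every overlapping window of the protein.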

5.

A deep learning-based method for the prediction of DNA interacting residues in a protein.

Patiyal, Sumeet; Dhall, Anjali; Raghava, Gajendra P S.

Brief Bioinform ; 23(5)2022 09 20.

Article in English | MEDLINE | ID: mdl-35943134

ABSTRACT

DNA-protein interaction is one of the most crucial interactions in biological systems, determining the fate of many processes such as transcription, regulation and splicing of genes. In this study, we trained our models on a training dataset of 646 DNA-binding proteins having 15 636 DNA-interacting and 298 503 non-interacting residues. Our trained models were evaluated on an independent dataset of 46 DNA-binding proteins having 965 DNA-interacting and 9911 non-interacting residues. All proteins in the independent dataset have less than 30% sequence similarity with proteins in the training dataset. A wide range of models based on traditional machine learning and deep learning (1D-CNN) techniques have been developed using binary, physicochemical-property and Position-Specific Scoring Matrix (PSSM)/evolutionary profiles. Among the traditional machine learning techniques, the eXtreme Gradient Boosting-based model achieved a maximum area under the receiver operating characteristic curve (AUROC) of 0.77 on the independent dataset using the PSSM profile. The deep learning-based model achieved the highest AUROC of 0.79 on the independent dataset using a combination of all three profiles. We evaluated the performance of existing methods on the independent dataset and observed that our proposed method outperformed all of them. To facilitate the scientific community, we developed standalone software and a web server, which are accessible from https://webs.iiitd.edu.in/raghava/dbpred.

Subjects

Deep Learning , DNA/chemistry , DNA/genetics , DNA-Binding Proteins , Databases, Protein , Position-Specific Scoring Matrices
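Nearly every method in this list is compared by AUROC. AUROC equals the probability that a randomly chosen positive (here, a DNA-interacting residue) receives a higher score than a randomly chosen negative, with ties counted as one half. The sketch below computes it directly from that definition; the pairwise loop is O(P × N) and is meant only to make the definition concrete, since a rank-based implementation would be preferred for datasets with hundreds of thousands of residues.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the probability that a positive outscores a negative.

    `labels` are 1 (positive) or 0 (negative); ties count as 0.5.
    This pairwise version is O(P * N) and meant only for illustration.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [1, 1, 0, 0, 1, 0]
    s = [0.9, 0.7, 0.6, 0.2, 0.4, 0.4]
    # 7.5 of the 9 positive-negative pairs are ordered correctly.
    print(auroc(y, s))
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the values reported above (0.77 to 0.79) are read relative to 0.5 rather than to 0.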

6.

Computer-aided prediction and design of IL-6 inducing peptides: IL-6 plays a crucial role in COVID-19.

Dhall, Anjali; Patiyal, Sumeet; Sharma, Neelam; Usmani, Salman Sadullah; Raghava, Gajendra P S.

Brief Bioinform ; 22(2): 936-945, 2021 03 22.

Article in English | MEDLINE | ID: mdl-33034338

ABSTRACT

Interleukin 6 (IL-6) is a pro-inflammatory cytokine that stimulates acute phase responses, hematopoiesis and specific immune reactions. Recently, it was found that IL-6 plays a vital role in the progression of COVID-19, which is responsible for the high mortality rate. To help the scientific community fight COVID-19, we have developed a method for predicting IL-6 inducing peptides/epitopes. The models were trained and tested on 365 experimentally validated IL-6 inducing and 2991 non-inducing peptides extracted from the Immune Epitope Database (IEDB). Initially, 9149 features were computed for each peptide using Pfeature and then reduced to 186 features using the SVC-L1 technique. These features were ranked by their classification ability, and the top 10 features were used to develop prediction models. A wide range of machine learning techniques has been deployed to develop models. The Random Forest-based model achieves a maximum AUROC of 0.84 and 0.83 on the training and independent validation datasets, respectively. Using our best models, we have also identified IL-6 inducing peptides in different proteins of SARS-CoV-2 to aid vaccine design against COVID-19. A web server named IL-6Pred and a standalone package have been developed for predicting, designing and screening IL-6 inducing peptides (https://webs.iiitd.edu.in/raghava/il6pred/).

Subjects

COVID-19/physiopathology , Computer Simulation , Interleukin-6/biosynthesis , Peptides/metabolism , COVID-19/virology , Databases, Protein , Datasets as Topic , Humans , Interleukin-6/physiology , Machine Learning , SARS-CoV-2/isolation & purification

7.

AlgPred 2.0: an improved method for predicting allergenic proteins and mapping of IgE epitopes.

Sharma, Neelam; Patiyal, Sumeet; Dhall, Anjali; Pande, Akshara; Arora, Chakit; Raghava, Gajendra P S.

Brief Bioinform ; 22(4)2021 07 20.

Article in English | MEDLINE | ID: mdl-33201237

ABSTRACT

AlgPred 2.0 is a web server developed for predicting allergenic proteins and allergenic regions in a protein. It is an updated version of AlgPred, which was developed in 2006. The dataset used for training, testing and validation consists of 10 075 allergens and 10 075 non-allergens. In addition, 10 451 experimentally validated immunoglobulin E (IgE) epitopes were used to identify antigenic regions in a protein. All models were trained on 80% of the data (the training dataset), and their performance was evaluated using 5-fold cross-validation. The final model trained on the training dataset was then evaluated on the remaining 20% of the data (the validation dataset); no two proteins in any two sets have more than 40% similarity. First, a Basic Local Alignment Search Tool (BLAST) search was performed against the dataset, and allergens were predicted based on their level of similarity with known allergens. Second, IgE epitopes obtained from the IEDB database were searched in the dataset to predict allergens based on their presence in a protein. Third, motif-based approaches such as Multiple EM for Motif Elicitation (MEME)/Motif Alignment and Search Tool (MAST) were used to predict allergens. Fourth, allergen prediction models were developed using a wide range of machine learning techniques. Finally, an ensemble approach was used to predict allergenic proteins by combining the prediction scores of the different approaches. Our best model achieved a maximum area under the receiver operating characteristic curve of 0.98 with a Matthews correlation coefficient of 0.85 on the validation dataset. The AlgPred 2.0 web server allows the prediction of allergens, mapping of IgE epitopes, motif search and BLAST search (https://webs.iiitd.edu.in/raghava/algpred2/).

Subjects

Allergens/chemistry , Databases, Protein , Epitope Mapping , Epitopes/chemistry , Hypersensitivity , Immunoglobulin E/chemistry , Sequence Analysis, Protein , Software , Allergens/immunology , Epitopes/immunology , Humans , Immunoglobulin E/immunology , Predictive Value of Tests
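Several abstracts in this list report the Matthews correlation coefficient (MCC) alongside AUROC. Unlike AUROC, MCC is computed from a confusion matrix at one fixed score threshold: MCC = (TP × TN − FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The sketch below is a minimal implementation; the 0.5 default threshold and returning 0.0 when a marginal sum is zero are common conventions assumed here, not details from the papers.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returns 0.0 if any marginal sum is zero.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


def confusion_counts(labels, scores, threshold=0.5):
    """Return (TP, TN, FP, FN) after calling scores >= threshold positive."""
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        elif predicted == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


if __name__ == "__main__":
    counts = confusion_counts([1, 1, 0, 0], [0.8, 0.3, 0.6, 0.1])
    print(counts, mcc(*counts))
```

Because MCC uses all four cells of the confusion matrix, it stays informative on imbalanced data where plain accuracy can look deceptively high.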

8.

In silico method for predicting infectious strains of influenza A virus from its genome and protein sequences.

Roy, Trinita; Sharma, Khushal; Dhall, Anjali; Patiyal, Sumeet; Raghava, Gajendra Pal Singh.

J Gen Virol ; 103(11)2022 11.

Article in English | MEDLINE | ID: mdl-36318663

ABSTRACT

Influenza A is a contagious viral disease responsible for four pandemics in the past and a major public health concern. Being zoonotic, the virus can cross the species barrier and be transmitted from wild aquatic bird reservoirs to humans via intermediate hosts. In this study, we have developed a computational method for the prediction of human-associated and non-human-associated influenza A virus sequences. The models were trained and validated on protein and genome sequences of influenza A virus. Firstly, we developed prediction models for 15 types of influenza A proteins using composition-based and one-hot-encoding features. We achieved the highest AUC of 0.98 for the HA protein on a validation dataset using dipeptide composition-based features. Of note, we obtained a maximum AUC of 0.99 using one-hot-encoding features for protein-based models on a validation dataset. Secondly, we built models using whole-genome sequences, which achieved an AUC of 0.98 on a validation dataset. In addition, we showed that our method outperforms a similarity-based approach (i.e., BLAST) on the same validation dataset. Finally, we integrated our best models into a user-friendly web server, 'FluSPred' (https://webs.iiitd.edu.in/raghava/fluspred/index.html), and a standalone version (https://github.com/raghavagps/FluSPred) for the prediction of human-associated/non-human-associated influenza A virus strains.

Subjects

Communicable Diseases , Influenza A virus , Influenza, Human , Humans , Amino Acid Sequence , Leukocytes

9.

CancerEnD: A database of cancer associated enhancers.

Kumar, Rajesh; Lathwal, Anjali; Kumar, Vinod; Patiyal, Sumeet; Raghav, Pawan Kumar; Raghava, Gajendra P S.

Genomics ; 112(5): 3696-3702, 2020 09.

Article in English | MEDLINE | ID: mdl-32360910

ABSTRACT

CancerEnD is an integrated resource developed for annotating 8524 unique expressed enhancers, associated genes, somatic mutations and copy number variations from 8063 cancer samples across 18 TCGA cancer types. Somatic mutation data were taken from the COSMIC repository. To delineate the relationship between copy number changes in enhancer elements and the prognosis of cancer patients, survival analysis was performed using the survival package in R. We identified 1762 overall survival-associated enhancers, which can be used for the prognosis of cancer patients in a tissue-specific manner. CancerEnD (https://webs.iiitd.edu.in/raghava/cancerend/) is built on a user-friendly, responsive template that enables searching, browsing and downloading of the annotated enhancer elements in terms of gene expression, copy number variation and survival association. We hope it will help researchers understand enhancer deregulation in tumorigenesis and identify new biomarkers for therapy and disease diagnosis.

Subjects

Biomarkers, Tumor/genetics , Databases, Genetic , Enhancer Elements, Genetic , Neoplasms/genetics , DNA Copy Number Variations , Humans , Neoplasms/pathology , Prognosis , Survival Analysis

10.

In Silico Analysis of Gene Expression Change Associated with Copy Number of Enhancers in Pancreatic Adenocarcinoma.

Kumar, Rajesh; Patiyal, Sumeet; Kumar, Vinod; Nagpal, Gandharva; Raghava, Gajendra P S.

Int J Mol Sci ; 20(14)2019 Jul 22.

Article in English | MEDLINE | ID: mdl-31336658

ABSTRACT

Understanding the gene regulatory network that governs cancer initiation and progression is necessary, yet this network remains largely unexplored. Enhancer elements represent the center of this regulatory circuit. This study aims to identify gene expression changes driven by copy number variation in enhancer elements in pancreatic adenocarcinoma (PAAD). Pancreatic tissue-specific enhancer and target gene data were taken from EnhancerAtlas. Gene expression and copy number data were taken from The Cancer Genome Atlas (TCGA). Differentially expressed genes (DEGs) and copy number variations (CNVs) were identified between matched tumor-normal samples of PAAD. Significant CNVs were mapped onto enhancer coordinates using the genomic intersection functionality of BEDTools. By combining the gene expression and CNV data, we identified 169 genes whose expression shows a positive correlation with the CNV of enhancers. We further identified 16 genes that are regulated by a super-enhancer and 15 genes with high prognostic potential (Z-score > 1.96). Cox proportional hazards analysis indicates that these genes are better predictors of survival. Taken together, our integrative analytical approach identifies enhancer CNV-driven gene expression changes in PAAD, which could lead to a better understanding of PAAD pathogenesis and to the design of enhancer-based cancer treatment strategies.

Subjects

Adenocarcinoma/genetics , Computational Biology , DNA Copy Number Variations , Enhancer Elements, Genetic , Gene Expression Regulation, Neoplastic , Pancreatic Neoplasms/genetics , Adenocarcinoma/mortality , Adenocarcinoma/pathology , Biomarkers, Tumor , Computational Biology/methods , Gene Expression Profiling , Gene Regulatory Networks , Humans , Pancreatic Neoplasms/mortality , Pancreatic Neoplasms/pathology , Prognosis , Proportional Hazards Models , Transcriptome
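The integration step in the PAAD study, keeping genes whose expression rises with the copy number of their linked enhancer, can be sketched as a per-gene Pearson correlation across the same tumour samples. This is only an illustration: the function names, gene names, values and the 0.5 correlation cut-off are hypothetical, and the abstract does not state the exact statistic or significance test the authors used.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant input; treated as no correlation
    return cov / math.sqrt(var_x * var_y)


def cnv_driven_genes(enhancer_cnv: dict[str, list[float]],
                     expression: dict[str, list[float]],
                     min_r: float = 0.5) -> dict[str, float]:
    """Genes whose expression correlates positively (r >= min_r) with the
    copy number of their linked enhancer across the same samples."""
    hits = {}
    for gene, cnv in enhancer_cnv.items():
        if gene in expression:
            r = pearson(cnv, expression[gene])
            if r >= min_r:
                hits[gene] = r
    return hits


if __name__ == "__main__":
    # Hypothetical enhancer copy-number values and expression in 4 tumours.
    cnv = {"GENE1": [0.1, 0.5, 1.0, 1.5], "GENE2": [0.1, 0.5, 1.0, 1.5]}
    expr = {"GENE1": [1.0, 2.0, 3.0, 4.0], "GENE2": [4.0, 3.0, 2.0, 1.0]}
    print(cnv_driven_genes(cnv, expr))  # only GENE1 rises with enhancer CNV
```

In a real analysis the correlation would be paired with a p-value and multiple-testing correction before a gene is called CNV-driven.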

11.

THPdb2: compilation of FDA approved therapeutic peptides and proteins.

Jain, Shipra; Gupta, Srijanee; Patiyal, Sumeet; Raghava, Gajendra P S.

Drug Discov Today ; 29(7): 104047, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38830503

ABSTRACT

During the past 20 years, there has been a significant increase in the number of protein-based drugs approved by the US Food and Drug Administration (FDA). This paper presents THPdb2, an updated version of the THPdb database, which holds information about all types of protein-based drugs, including peptides, antibodies, and biosimilar proteins. THPdb2 contains a total of 6,385 entries, providing comprehensive information about 894 FDA-approved therapeutic proteins, including 354 monoclonal antibodies and 85 peptides or polypeptides. Each entry includes the name of therapeutic molecule, the amino acid sequence, physical and chemical properties, and route of drug administration. The therapeutic molecules that are included in the database target a wide range of biological molecules, such as receptors, factors, and proteins, and have been approved for the treatment of various diseases, including cancers, infectious diseases, and immune disorders.

Subjects

Drug Approval , Peptides , United States Food and Drug Administration , United States , Peptides/therapeutic use , Peptides/pharmacology , Peptides/chemistry , Humans , Proteins/chemistry , Proteins/therapeutic use , Biosimilar Pharmaceuticals/therapeutic use , Biosimilar Pharmaceuticals/pharmacology

12.

MRSLpred-a hybrid approach for predicting multi-label subcellular localization of mRNA at the genome scale.

Choudhury, Shubham; Bajiya, Nisha; Patiyal, Sumeet; Raghava, Gajendra P S.

Front Bioinform ; 4: 1341479, 2024.

Article in English | MEDLINE | ID: mdl-38379813

ABSTRACT

In the past, several methods have been developed for predicting the single-label subcellular localization of messenger RNA (mRNA). However, only a few methods are designed to predict the multi-label subcellular localization of mRNA. Furthermore, the existing methods are slow and cannot be implemented at a transcriptome scale. In this study, a fast and reliable method has been developed for predicting the multi-label subcellular localization of mRNA that can be implemented at a genome scale. Machine learning-based models were developed using mRNA sequence composition, and the XGBoost-based classifier achieved an average area under the receiver operating characteristic curve (AUROC) of 0.709 (0.668-0.732). In addition to these alignment-free methods, we developed alignment-based methods using motif search techniques. Finally, a hybrid technique that combines the XGBoost model and the motif-based approach was developed, achieving an average AUROC of 0.742 (0.708-0.816). Our method, MRSLpred, outperforms the existing state-of-the-art classifier in terms of performance and computational efficiency. A publicly accessible web server and a standalone tool have been developed to facilitate researchers (webserver: https://webs.iiitd.edu.in/raghava/mrslpred/).

13.

Genome-based solutions for managing mucormycosis.

Tomer, Ritu; Patiyal, Sumeet; Kaur, Dilraj; Choudhury, Shubham; Raghava, Gajendra P S.

Adv Protein Chem Struct Biol ; 139: 383-403, 2024.

Article in English | MEDLINE | ID: mdl-38448141

ABSTRACT

Mucormycosis, an uncommon opportunistic fungal infection, is caused by a class of molds called mucoromycetes. Currently, antifungal therapy and surgical debridement are the primary treatment options for mucormycosis. Despite the importance of comprehensive knowledge of mucormycosis, there is a lack of well-annotated databases that provide all the relevant information. In this study, we have gathered and organized all available information related to mucormycosis, including the disease's genomes, proteins and diagnostic methods. Furthermore, using the AlphaFold2.0 prediction tool, we have predicted the tertiary structures of potential drug targets. We have categorized the information into three major sections: "genomics/proteomics," "immunotherapy," and "drugs." The genomics/proteomics module contains information on the different strains responsible for mucormycosis. The immunotherapy module includes putative sequence-based therapeutics predicted using established tools. The drugs module provides information on available drugs for treating the disease and also offers prerequisite information for computer-aided drug design, such as putative targets and predicted structures. To provide comprehensive information over the internet, we developed a web-based platform, MucormyDB (https://webs.iiitd.edu.in/raghava/mucormydb/).

Subjects

Anti-HIV Agents , Mucormycosis , Humans , Mucormycosis/drug therapy , Mucormycosis/genetics , Genomics , Databases, Factual , Drug Delivery Systems

14.

A deep learning method for classification of HNSCC and HPV patients using single-cell transcriptomics.

Jarwal, Akanksha; Dhall, Anjali; Arora, Akanksha; Patiyal, Sumeet; Srivastava, Aman; Raghava, Gajendra P S.

Front Mol Biosci ; 11: 1395721, 2024.

Article in English | MEDLINE | ID: mdl-38872916

ABSTRACT

Background: Head and Neck Squamous Cell Carcinoma (HNSCC) is the seventh most prevalent cancer type worldwide. Early detection of HNSCC is one of the important challenges in managing the treatment of cancer patients. Existing techniques for detecting HNSCC are expensive and invasive. Methods: In this study, we aimed to address this issue by developing classification models using machine learning and deep learning techniques, focusing on single-cell transcriptomics to distinguish between HNSCC and normal samples. Furthermore, we built models to classify HNSCC samples into HPV-positive (HPV+) and HPV-negative (HPV-) categories. We used the GSE181919 dataset, from which we extracted 20 primary cancer (HNSCC) samples and 9 normal tissue samples. The primary cancer samples comprised 13 HPV- and 7 HPV+ samples. The models developed in this study were trained on 80% of the dataset and validated on the remaining 20%. To develop an efficient model, we performed feature selection using the minimum redundancy maximum relevance (mRMR) method to shortlist a small number of genes from a plethora of genes. We also performed Gene Ontology (GO) enrichment analysis on the 100 shortlisted genes. Results: An Artificial Neural Network-based model trained on 100 genes outperformed the other classifiers, with an AUROC of 0.91 for HNSCC classification on the validation set. The same algorithm achieved an AUROC of 0.83 for classifying HPV+ and HPV- patients on the validation set. GO enrichment analysis showed that most genes were involved in binding and catalytic activities. Conclusion: A Python software package has been developed that allows users to identify HNSCC in patients along with their HPV status. It is available at https://webs.iiitd.edu.in/raghava/hnscpred/.
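mRMR (minimum redundancy, maximum relevance) selects features greedily: at each step it picks the feature whose mutual information with the class label is highest after subtracting its mean mutual information with the features already chosen. The sketch below implements this difference form (often called MID) for features that have already been discretised, for example expression binned into low/high. The discretisation, the MID criterion and the toy data are assumptions; the abstract does not say which mRMR implementation or variant was used.

```python
import math
from collections import Counter


def mutual_information(x: list, y: list) -> float:
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in Counter(zip(x, y)).items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_score(name, features, relevance, selected):
    """MID criterion: relevance minus mean mutual information with selected features."""
    if not selected:
        return relevance[name]
    redundancy = sum(
        mutual_information(features[name], features[s]) for s in selected
    ) / len(selected)
    return relevance[name] - redundancy


def mrmr(features: dict[str, list], labels: list, k: int) -> list[str]:
    """Greedily select up to k feature names by the mRMR (MID) criterion."""
    relevance = {name: mutual_information(v, labels) for name, v in features.items()}
    selected: list[str] = []
    remaining = set(features)
    while remaining and len(selected) < k:
        # Iterate in sorted order so exact ties are broken by name.
        best = max(
            sorted(remaining),
            key=lambda name: mrmr_score(name, features, relevance, selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical discretised expression (0 = low, 1 = high) in 8 cells.
    label = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = tumour cell, 0 = normal cell
    genes = {
        "geneA": [1, 1, 1, 1, 0, 0, 0, 1],  # agrees with the label in 7 of 8 cells
        "geneB": [1, 1, 1, 1, 0, 0, 0, 1],  # exact copy of geneA: fully redundant
        "geneC": [1, 1, 1, 0, 0, 0, 0, 0],  # also 7 of 8, but errs on a different cell
    }
    print(mrmr(genes, label, k=2))
```

The redundancy term is what keeps geneB, an exact copy of geneA, from being chosen alongside it: geneC adds complementary information, so the two selected genes are geneA and geneC.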

15.

A hybrid approach for predicting transcription factors.

Patiyal, Sumeet; Tiwari, Palak; Ghai, Mohit; Dhapola, Aman; Dhall, Anjali; Raghava, Gajendra P S.

Front Bioinform ; 4: 1425419, 2024.

Article in English | MEDLINE | ID: mdl-39119181

ABSTRACT

Transcription factors are essential DNA-binding proteins that regulate the transcription rate of several genes and control gene expression inside a cell. Predicting transcription factors with high precision is important for understanding biological processes such as cell differentiation, intracellular signaling, and cell-cycle control. In this study, we developed a hybrid method that combines alignment-based and alignment-free methods for predicting transcription factors with higher accuracy. All models have been trained, tested, and evaluated on a large dataset that contains 19,406 transcription factor and 523,560 non-transcription factor protein sequences. To avoid bias in evaluation, the data were divided into training and validation/independent datasets: 80% of the data was used for training, and the remaining 20% for external validation. For the alignment-free methods, models were developed using machine learning techniques and composition-based protein features. Our best alignment-free model obtained an AUC of 0.97 on an independent dataset. For the alignment-based method, we used BLAST at different cut-offs to predict transcription factors. Although the alignment-based method demonstrated excellent performance, it could not cover all transcription factors because some sequences produced no hits. To combine the strengths of both, we developed a hybrid method in which the scores of the alignment-free and alignment-based methods are added, achieving a maximum AUC of 0.99 on the independent dataset. The method proposed in this study performs better than existing methods. We incorporated the best models into the web server, Python Package Index and standalone package of "TransFacPred" (https://webs.iiitd.edu.in/raghava/transfacpred).
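The hybrid step in TransFacPred adds an alignment-free (ML) score to an alignment-based (BLAST) score. The abstract does not give the BLAST contribution or the decision threshold, so the +0.5 / −0.5 / 0 scheme for a positive hit, a negative hit and no hit, the 0.5 threshold, and the hit labels `"TF"` and `"non-TF"` below are all assumptions used for illustration.

```python
from typing import Optional

# Assumed BLAST contributions; the abstract only says the two scores are added.
POSITIVE_HIT_SCORE = 0.5
NEGATIVE_HIT_SCORE = -0.5


def blast_score(top_hit_label: Optional[str]) -> float:
    """Score from the class of the best BLAST hit in the training set.

    "TF" -> +0.5, "non-TF" -> -0.5, None (no hit at the chosen cut-off) -> 0.0.
    """
    if top_hit_label is None:
        return 0.0
    if top_hit_label == "TF":
        return POSITIVE_HIT_SCORE
    if top_hit_label == "non-TF":
        return NEGATIVE_HIT_SCORE
    raise ValueError(f"unexpected hit label: {top_hit_label!r}")


def hybrid_prediction(ml_probability: float, top_hit_label: Optional[str],
                      threshold: float = 0.5) -> tuple[float, bool]:
    """Add the ML probability and the BLAST score, then apply a threshold."""
    score = ml_probability + blast_score(top_hit_label)
    return score, score >= threshold


if __name__ == "__main__":
    # No BLAST hit: the prediction falls back on the ML model alone.
    print(hybrid_prediction(0.42, None))
    # A hit to a known transcription factor pushes a borderline score over the threshold.
    print(hybrid_prediction(0.42, "TF"))
```

This additive design explains the coverage argument in the abstract: when BLAST finds no hit it contributes nothing, so the ML score alone decides, while a confident hit can override a borderline ML score.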

16.

A method for predicting linear and conformational B-cell epitopes in an antigen from its primary sequence.

Kumar, Nishant; Tripathi, Sadhana; Sharma, Neelam; Patiyal, Sumeet; Devi, Naorem Leimarembi; Raghava, Gajendra P S.

Comput Biol Med ; 170: 108083, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38295479

ABSTRACT

B cells are an essential component of the immune system and play a vital role in the immune response against pathogenic infections by producing antibodies. Existing methods predict either linear or conformational B-cell epitopes in an antigen. In this study, a single method was developed for predicting both types (linear/conformational) of B-cell epitopes. The dataset used in this study contains 3875 B-cell epitopes and 3996 non-B-cell epitopes, where the B-cell epitopes include both linear and conformational epitopes. Our primary analysis indicates that certain residues (such as Asp, Glu, Lys, and Asn) are more prominent in B-cell epitopes. We developed machine learning-based methods using different types of sequence composition and achieved the highest AUROC of 0.80 using dipeptide composition. Models were also developed on selected features, but no further improvement was observed. Our similarity-based method, implemented using BLAST, shows a high probability of correct prediction but poor sensitivity. Finally, we developed a hybrid model that combines alignment-free (dipeptide-based random forest) and alignment-based (BLAST similarity) models. Our hybrid model attained a maximum AUROC of 0.83 with an MCC of 0.49 on the independent dataset and performs better than existing methods on the independent dataset used in this study. All models were trained and tested on 80% of the data using cross-validation, and the final model was evaluated on the remaining 20% of the data, called the independent or validation dataset. A web server and standalone package named "CLBTope" have been developed for predicting, designing, and scanning B-cell epitopes in an antigen sequence (https://webs.iiitd.edu.in/raghava/clbtope/).

Subjects

Antigens , Epitopes, B-Lymphocyte , Epitopes, B-Lymphocyte/chemistry , Amino Acid Sequence , Antigens/chemistry , Molecular Conformation , Dipeptides

17.

Prediction of celiac disease associated epitopes and motifs in a protein.

Tomer, Ritu; Patiyal, Sumeet; Dhall, Anjali; Raghava, Gajendra P S.

Front Immunol ; 14: 1056101, 2023.

Article in English | MEDLINE | ID: mdl-36742312

ABSTRACT

Introduction: Celiac disease (CD) is an autoimmune gastrointestinal disorder that causes immune-mediated enteropathy against gluten. Gluten immunogenic peptides can trigger immune responses that damage the small intestine. HLA-DQ2/DQ8 are the major alleles that bind to epitopes/antigenic regions of gluten and induce celiac disease. There is a need to identify CD-associated epitopes in protein-based foods and therapeutics. Methods: In this study, computational tools have been developed to predict CD-associated epitopes and motifs. The dataset used for training, testing and evaluation contains experimentally validated CD-associated and non-CD-associated peptides. We performed positional analysis to identify the most significant positions of amino acid residues in the peptides and checked the frequency of HLA alleles. We also computed amino acid composition to develop machine learning-based models, and we developed an ensemble method that combines a motif-based approach with machine learning-based models. Results and Discussion: Our analysis supports the existing hypothesis that proline (P) and glutamine (Q) are highly abundant in CD-associated peptides. A model based on the density of P and Q in peptides was developed for predicting CD-associated peptides, achieving a maximum AUROC of 0.98 on independent data. We discovered motifs (e.g., QPF, QPQ, PYP) that occur specifically in CD-associated peptides. We also developed machine learning-based models using peptide composition and achieved a maximum AUROC of 0.99. Finally, we developed an ensemble method that combines the motif-based approach and machine learning-based models. The ensemble model predicted CD-associated motifs with 100% accuracy on an independent dataset not used for training. The best models and motifs have been integrated into a web server and standalone software package, "CDpred".
We hope this server will assist the scientific community in predicting, designing and scanning CD-associated peptides as well as CD-associated motifs in a protein/peptide sequence (https://webs.iiitd.edu.in/raghava/cdpred/).

Subjects

Celiac Disease, Humans, Epitopes, Glutens, Peptides, Amino Acids
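The CDpred abstract describes features built from amino acid composition, the density of proline and glutamine, and motif occurrence. A minimal Python sketch of such features follows. It assumes that "density of P&Q" means the fraction of residues that are P or Q and that motif scanning is a plain substring search; the function names and the example peptide are hypothetical and are not taken from the CDpred software.

```python
from collections import Counter

# The 20 standard amino acids used for composition features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Motifs the CDpred abstract reports as specific to CD-associated peptides.
CD_MOTIFS = ("QPF", "QPQ", "PYP")


def _clean(peptide: str) -> str:
    """Upper-case the sequence and reject empty input."""
    seq = peptide.strip().upper()
    if not seq:
        raise ValueError("peptide sequence is empty")
    return seq


def aa_composition(peptide: str) -> dict[str, float]:
    """Percentage of each standard amino acid in the peptide."""
    seq = _clean(peptide)
    counts = Counter(seq)
    return {aa: 100.0 * counts[aa] / len(seq) for aa in AMINO_ACIDS}


def pq_density(peptide: str) -> float:
    """Fraction of residues that are proline (P) or glutamine (Q); an assumed definition."""
    seq = _clean(peptide)
    return sum(residue in "PQ" for residue in seq) / len(seq)


def find_motifs(peptide: str, motifs: tuple[str, ...] = CD_MOTIFS) -> list[tuple[str, int]]:
    """List (motif, 0-based start) for every occurrence, overlapping matches included."""
    seq = _clean(peptide)
    hits = []
    for motif in motifs:
        start = seq.find(motif)
        while start != -1:
            hits.append((motif, start))
            start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    peptide = "PQPQLPYPQPQPF"  # hypothetical P/Q-rich input chosen for illustration
    print(round(pq_density(peptide), 3))  # 0.769
    print(find_motifs(peptide))  # [('QPQ', 1), ('PYP', 5), ('QPQ', 8), ('QPF', 10)]
```

Overlapping matches are kept because motifs in P/Q-rich repeats can share residues; in the example, QPQ at position 8 and QPF at position 10 share the Q at position 10.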

18.

Hmrbase2: a comprehensive database of hormones and their receptors.

Kaur, Dashleen; Arora, Akanksha; Patiyal, Sumeet; Raghava, Gajendra Pal Singh.

Hormones (Athens) ; 22(3): 359-366, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37291365

ABSTRACT

PURPOSE: Hormones play a critical role in regulating various physiological processes, and hormonal imbalances can lead to major endocrine disorders. Studying hormones is therefore essential for both the therapeutics and the diagnostics of hormonal diseases. To meet this need, we have developed Hmrbase2, a comprehensive platform that provides extensive information on hormones. METHODS: Hmrbase2 is a web-based database that updates a previously published database, Hmrbase (http://crdd.osdd.net/raghava/hmrbase/). We collected a large amount of information on peptide and non-peptide hormones and hormone receptors from Hmrbase, HMDB, UniProt, HORDB, ENDONET, PubChem, and the medical literature. RESULTS: Hmrbase2 contains a total of 12,056 entries, more than twice the number in the previous version, Hmrbase. These include 7406, 753, and 3897 entries for peptide hormones, non-peptide hormones, and hormone receptors, respectively, from 803 organisms, compared with 562 organisms in the previous version. The database also hosts 5662 hormone-receptor pairs. The source organism, function, and subcellular location are provided for peptide hormones and receptors, and properties such as melting point and water solubility are provided for non-peptide hormones. Besides browsing and keyword search, an advanced search option is also available. Additionally, a similarity search module enables users to run similarity searches against peptide hormone sequences using BLAST and Smith-Waterman. CONCLUSIONS: To make the database accessible to a wide range of users, we designed a user-friendly, responsive website that works on smartphones, tablets, and desktop computers. The updated version, Hmrbase2, offers improved data content compared with the previous version. Hmrbase2 is freely available at https://webs.iiitd.edu.in/raghava/hmrbase2.

Subjects

Hormones, Peptide Hormones, Humans, Databases, Protein
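Hmrbase2 lets users search peptide hormone sequences with BLAST and Smith-Waterman. To illustrate what a Smith-Waterman search computes, the sketch below returns the best local-alignment score between a query and each database sequence and ranks the entries. It assumes a toy scoring scheme (match +2, mismatch -1, linear gap -2) rather than the substitution matrices and affine gap penalties that production tools typically use; the entry names and sequences are hypothetical, and this is not Hmrbase2's implementation.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local-alignment score between a and b, using a linear gap penalty."""
    a, b = a.upper(), b.upper()
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap  # a[i-1] aligned to a gap
            left = curr[j - 1] + gap  # b[j-1] aligned to a gap
            # Clamping at zero lets a new local alignment start at any cell.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def search(query: str, database: dict[str, str], top: int = 3) -> list[tuple[str, int]]:
    """Rank database entries by their Smith-Waterman score against the query."""
    scored = [(name, smith_waterman_score(query, seq)) for name, seq in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    # Hypothetical toy entries, not Hmrbase2 records.
    toy_db = {"entry_a": "GSHLVEALYLV", "entry_b": "KKKKK", "entry_c": "HLVDALY"}
    print(search("HLVEALY", toy_db, top=2))  # [('entry_a', 14), ('entry_c', 11)]
```

Only scores are computed here; recovering the aligned residues would additionally require a traceback through the full dynamic-programming matrix, which this sketch omits so that it keeps just two rows in memory.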

19.

In Silico Tool for Identification, Designing, and Searching of IL13-Inducing Peptides in Antigens.

Jain, Shipra; Dhall, Anjali; Patiyal, Sumeet; Raghava, Gajendra P S.

Methods Mol Biol ; 2673: 329-338, 2023.

Article in English | MEDLINE | ID: mdl-37258925

ABSTRACT

Interleukins are a distinctive class of molecules exhibiting various immune signaling functions. The immunoregulatory cytokine interleukin 13 (IL13) is primarily synthesized by activated T-helper 2 cells, mast cells, and basophils. IL13 is known to contribute to many allergic and autoimmune diseases, such as asthma, rheumatoid arthritis, systemic sclerosis, and ulcerative colitis, as well as to airway hyperresponsiveness, glycoprotein hypersecretion, and goblet cell hyperplasia. In addition to these disorders, IL13 also contributes to carcinogenesis by inhibiting tumor immunosurveillance. Because of its role in various diseases, predicting IL13-inducing peptides or regions in a protein is vital for designing safe protein vaccines and therapeutics. IL13pred is an in silico tool that aids in identifying, predicting, and designing IL13-inducing peptides. The IL13pred web server and standalone package are easily accessible at https://webs.iiitd.edu.in/raghava/il13pred/.

Subjects

Asthma, Interleukin-13, Humans, Cytokines, Interleukins, Peptides

20.

Risk assessment of cancer patients based on HLA-I alleles, neobinders and expression of cytokines.

Dhall, Anjali; Patiyal, Sumeet; Kaur, Harpreet; Raghava, Gajendra P S.

Comput Biol Med ; 167: 107594, 2023 12.

Article in English | MEDLINE | ID: mdl-37918263

ABSTRACT

Advances in cancer immunotherapy have shown significant results in treating cancers. To design effective immunotherapy, it is important to understand a patient's immune response based on their genomic profile. However, such analyses require proficiency in bioinformatic methods. The rapid growth of sequencing technologies and statistical methods creates a barrier for scientists who want to identify biomarkers for different cancers but lack detailed knowledge of coding or bioinformatics tools. Here, we provide a web-based resource that enables scientists without bioinformatics expertise to obtain prognostic biomarkers for different cancer types at different levels. We computed prognostic biomarkers from 8346 cancer patients across twenty cancer types. These biomarkers were computed based on i) the presence of 352 human leukocyte antigen class I (HLA-I) alleles, ii) 660,959 tumor-specific HLA-I neobinders, and iii) the expression profiles of 153 cytokines. We observed that the survival risk of cancer patients depends on the presence of certain HLA-I alleles; for example, liver hepatocellular carcinoma patients with HLA-A*03:01 are at lower risk. Our analysis indicates that neobinders of HLA-I alleles are highly correlated with the overall survival of patients with certain cancer types. For example, HLA-B*07:02 binders show a correlation of 0.49 with survival in lung squamous cell carcinoma and -0.77 in kidney chromophobe patients. Additionally, we computed prognostic biomarkers based on cytokine expression. Higher expression of some cytokines is associated with favorable survival, such as IL-2 in bladder urothelial carcinoma, whereas higher IL-5R expression is associated with unfavorable survival in kidney chromophobe patients. Freely accessible to the public, CancerHLA-I maintains raw and analysed data (https://webs.iiitd.edu.in/raghava/cancerhla1/).

Subjects

Carcinoma, Transitional Cell, Lung Neoplasms, Urinary Bladder Neoplasms, Humans, Cytokines/genetics, Alleles, Carcinoma, Transitional Cell/genetics, Urinary Bladder Neoplasms/genetics, Biomarkers, Lung Neoplasms/genetics, Risk Assessment
