1.
BMC Plant Biol ; 24(1): 379, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720284

ABSTRACT

BACKGROUND: Rice bean (Vigna umbellata), an underrated legume, adapts to diverse climatic conditions with the potential to support food and nutritional security worldwide. It is used as a vegetable, minor food crop and a fodder crop, being a rich source of proteins, minerals, and essential fatty acids. However, little effort has been made to decipher the genetic and molecular basis of various useful traits in this crop. Therefore, we considered three economically important traits i.e., flowering, maturity and seed weight of rice bean and identified the associated candidate genes employing an associative transcriptomics approach on 100 diverse genotypes out of 1800 evaluated rice bean accessions from the Indian National Genebank. RESULTS: The transcriptomics-based genotyping of one-hundred diverse rice bean cultivars followed by pre-processing of genotypic data resulted in 49,271 filtered markers. The STRUCTURE, PCA and Neighbor-Joining clustering of 100 genotypes revealed three putative sub-populations. The marker-trait association analysis involving various genome-wide association study (GWAS) models revealed significant association of 82 markers on 48 transcripts for flowering, 26 markers on 22 transcripts for maturity and 22 markers on 21 transcripts for seed weight. The transcript annotation provided information on the putative candidate genes for the considered traits. The candidate genes identified for flowering include HSC80, P-II PsbX, phospholipid-transporting-ATPase-9, pectin-acetylesterase-8 and E3-ubiquitin-protein-ligase-RHG1A. Further, the WRKY1 and DEAD-box-RH27 were found to be associated with seed weight. Furthermore, the associations of PIF3 and pentatricopeptide-repeat-containing-gene with maturity and seed weight, and aldo-keto-reductase with flowering and maturity were revealed. CONCLUSION: This study offers insights into the genetic basis of key agronomic traits in rice bean, including flowering, maturity, and seed weight. 
The identified markers and associated candidate genes provide valuable resources for future exploration and targeted breeding, aiming to enhance the agronomic performance of rice bean cultivars. Notably, this research represents the first transcriptome-wide association study in a pulse crop, uncovering candidate genes for agronomically useful traits.


Subject(s)
Flowers , Genome-Wide Association Study , Seeds , Transcriptome , Seeds/genetics , Seeds/growth & development , Flowers/genetics , Flowers/growth & development , Vigna/genetics , Vigna/growth & development , Genes, Plant , Genotype , Gene Expression Profiling , Chromosome Mapping , Quantitative Trait Loci/genetics , Phenotype
2.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35998895

ABSTRACT

Linear B-cell epitopes have a prominent role in the development of peptide-based vaccines and disease diagnosis. High variability in the length of these epitopes is a major reason for low accuracy in their prediction. Most of the B-cell epitope prediction methods considered fixed length of epitope sequences and achieved good accuracy. Though a number of tools are available for the prediction of flexible length linear B-cell epitopes with reasonable accuracy, further improvement in the prediction performance is still expected. Thus, here we made an attempt to analyze the performance of machine learning approaches (MLA) with 18 different amino acid encoding schemes in the prediction of flexible length linear B-cell epitopes. We considered B-cell epitope sequences of variable lengths (11-56 amino acids) from well-established public resources. The performances of machine learning algorithms with the encoded epitope sequence datasets were evaluated. Besides, the feasible combinations of encoding schemes were also explored and analyzed. The results revealed that amino-acid composition (AC) and distribution component of composition-transition-distribution encoding schemes are suitable for heterogeneous epitope data, whereas amino-acid-anchoring-pair-composition (APC), dipeptide-composition and amino-acids-pair-propensity-scale (APP) are more appropriate for homogeneous data. Further, two combinations of peptide encoding schemes, i.e. APC + AC and APC + APP with random forest classifier were identified to have improved performance over the state-of-the-art tools for flexible length linear B-cell epitope prediction. The study also revealed better performance of random forest over other considered MLAs in the prediction of flexible length linear B-cell epitopes.


Subject(s)
Epitopes, B-Lymphocyte , Vaccines , Amino Acids/genetics , Dipeptides , Peptides/chemistry
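
The amino-acid composition (AC) encoding named in the abstract above maps a peptide of any length to a fixed-length vector, which is what lets one classifier handle epitopes of 11 to 56 residues. The following is a minimal sketch of that idea, not the authors' implementation; the function name and the two example peptides are hypothetical, and residues outside the 20 standard amino acids are simply not counted.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, fixed order


def amino_acid_composition(peptide):
    """Return the fraction of each standard amino acid in the peptide.

    The output always has 20 entries, whatever the peptide length.
    """
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide)
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]


# Hypothetical peptides of different lengths (11 and 40 residues).
short = amino_acid_composition("ACDEFGHIKLM")
long_ = amino_acid_composition("ACDEFGHIKLMNPQRSTVWY" * 2)
print(len(short), len(long_))
```

Both vectors have the same dimension, so they could be stacked into one feature matrix and passed to any classifier, such as the random forest the abstract reports as performing best.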
3.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36040109

ABSTRACT

Maintaining duplicate germplasms in genebanks hampers effective conservation and utilization of genebank resources. The redundant germplasm adds to the cost of germplasm conservation by diverting a large proportion of the genebank's financial resources towards conservation rather than enriching the diversity. Besides, genome-wide association analysis using an association panel with over-represented germplasms can be biased, resulting in spurious marker-trait associations. The conventional methods of germplasm duplicate removal using passport information suffer from incomplete or missing passport information and data handling errors at various stages of germplasm enrichment. This limitation is less likely in the case of genotypic data. Therefore, we developed a web-based tool, Germplasm Duplicate Identification and Removal Tool (G-DIRT), which allows germplasm duplicate identification based on identity-by-state analysis using single-nucleotide polymorphism genotyping information along with pre-processing of genotypic data. A homozygous genotypic difference threshold of 0.1% for germplasm duplicates has been determined using tetraploid wheat genotypic data with 94.97% accuracy. Based on the genotypic difference, the tool also builds a dendrogram that visually depicts the relationship between genotypes. To overcome the constraint of high-dimensional genotypic data, an offline version of G-DIRT with an R interface has also been developed. G-DIRT is expected to help genebank curators, breeders and other researchers across the world in identifying germplasm duplicates from the global genebank collections by using only the easily sharable genotypic data instead of physically exchanging the seeds or propagating materials. The web server, freely accessible at http://webtools.nbpgr.ernet.in/gdirt/, will complement the existing methods of germplasm duplicate identification based on passport or phenotypic information.


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genotype , Seeds/genetics
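
The duplicate rule described in the abstract above (a homozygous genotypic difference of at most 0.1% between two accessions) can be sketched as follows. This is an illustration under stated assumptions, not the G-DIRT code: the 0/1/2 genotype coding, the handling of missing calls, the use of "less than or equal" at the threshold, and all function and accession names are hypothetical.

```python
from itertools import combinations

THRESHOLD = 0.001  # 0.1% homozygous genotypic difference, from the abstract


def homozygous_difference(g1, g2):
    """Fraction of markers whose homozygous calls differ between two samples.

    Assumed coding: 0 and 2 are the two homozygous classes, 1 is
    heterozygous, None is missing. Only markers that are homozygous in
    both samples are compared.
    """
    compared = 0
    differing = 0
    for a, b in zip(g1, g2):
        if a in (0, 2) and b in (0, 2):
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no markers are homozygous in both samples")
    return differing / compared


def find_duplicates(genotypes, threshold=THRESHOLD):
    """Return name pairs whose homozygous difference is within the threshold."""
    return [
        (n1, n2)
        for n1, n2 in combinations(sorted(genotypes), 2)
        if homozygous_difference(genotypes[n1], genotypes[n2]) <= threshold
    ]


def flip(genotype, indices):
    """Swap the homozygous class (0 <-> 2) at the given marker indices."""
    return [2 - g if i in indices else g for i, g in enumerate(genotype)]


# Toy data: 2000 markers; acc_B differs from acc_A at 1 marker (0.05%),
# acc_C differs from acc_A at 10 markers (0.5%).
base = [0, 2] * 1000
toy = {
    "acc_A": base,
    "acc_B": flip(base, {0}),
    "acc_C": flip(base, set(range(10))),
}
duplicates = find_duplicates(toy)
print(duplicates)
```

On this toy panel only acc_A and acc_B fall within the threshold, so they would be flagged as a putative duplicate pair.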
4.
Funct Integr Genomics ; 23(2): 169, 2023 May 20.
Article in English | MEDLINE | ID: mdl-37209309

ABSTRACT

Stripe rust (Sr), caused by Puccinia striiformis f. sp. tritici (Pst), is the most devastating disease posing a serious threat to wheat-growing nations across the globe. Developing resistant cultivars is the most challenging aspect of wheat breeding. The function of resistance genes (R genes) and the mechanisms by which they influence plant-host interactions are poorly understood. In the present investigation, comparative transcriptome analysis was carried out on two near-isogenic lines (NILs), PBW343 and FLW29. The seedlings of both genotypes were inoculated with Pst pathotype 46S119. In total, 1106 differentially expressed genes (DEGs) were identified at the early stage of infection (12 hpi), whereas 877 and 1737 DEGs were observed at later stages (48 and 72 hpi) in FLW29. The identified DEGs comprised defense-related genes including putative R genes, 7 WRKY transcription factors, and genes associated with calcium and hormonal signaling. Moreover, pathways involved in signaling of receptor kinases, G protein, and light showed higher expression in the resistant cultivar and were common across different time points. Quantitative real-time PCR was used to further confirm the transcriptional expression of eight critical genes involved in the plant defense mechanism against stripe rust. This information on genes is likely to improve our knowledge of the genetic mechanism that controls stripe rust resistance in wheat, and the data on resistance response-linked genes and pathways will be a significant resource for future research.


Subject(s)
Basidiomycota , Triticum , Triticum/genetics , Plant Breeding , Basidiomycota/genetics , Genotype , Gene Expression Profiling , Plant Diseases/genetics , Disease Resistance/genetics
5.
Int J Mol Sci ; 23(3)2022 Jan 30.
Article in English | MEDLINE | ID: mdl-35163534

ABSTRACT

MicroRNAs (miRNAs) play a significant role in plant response to different abiotic stresses. Thus, identification of abiotic stress-responsive miRNAs holds immense importance in crop breeding programmes to develop cultivars resistant to abiotic stresses. In this study, we developed a machine learning-based computational method for prediction of miRNAs associated with abiotic stresses. Three types of datasets were used for prediction, i.e., miRNA, Pre-miRNA, and Pre-miRNA + miRNA. The pseudo K-tuple nucleotide compositional features were generated for each sequence to transform the sequence data into numeric feature vectors. Support vector machine (SVM) was employed for prediction. The area under receiver operating characteristics curve (auROC) of 70.21, 69.71, 77.94 and area under precision-recall curve (auPRC) of 69.96, 65.64, 77.32 percentages were obtained for miRNA, Pre-miRNA, and Pre-miRNA + miRNA datasets, respectively. Overall prediction accuracies for the independent test set were 62.33, 64.85, 69.21 percentages, respectively, for the three datasets. The SVM also achieved higher accuracy than other learning methods such as random forest, extreme gradient boosting, and adaptive boosting. To implement our method with ease, an online prediction server "ASRmiRNA" has been developed. The proposed approach is believed to supplement the existing effort for identification of abiotic stress-responsive miRNAs and Pre-miRNAs.


Subject(s)
Computational Biology/methods , MicroRNAs/genetics , Plants/genetics , Algorithms , Area Under Curve , Gene Expression Regulation, Plant , RNA, Plant/genetics , Stress, Physiological , Support Vector Machine
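
The abstract above turns each sequence into a numeric vector through pseudo K-tuple nucleotide composition. The sketch below covers only the plain K-tuple (k-mer) frequency part of such an encoding and omits the physicochemical correlation terms that the "pseudo" variant adds, so it is a simplification rather than the authors' feature set. The function name and the example sequence are hypothetical.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Return normalized k-mer frequencies over the RNA alphabet ACGU.

    DNA-style input is accepted: T is read as U. Windows containing any
    other symbol (for example N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence has no valid k-mers")
    return [counts[km] / total for km in kmers]


# Illustrative 22-nt sequence; with k=2 the vector has 4**2 = 16 entries.
features = kmer_composition("UGAGGUAGUAGGUUGUAUAGUU", k=2)
print(len(features))
```

Vectors of this kind have a fixed length of 4**k regardless of sequence length, which is why they can feed a support vector machine or the other learners the abstract compares.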
6.
Physiol Mol Biol Plants ; 28(1): 1-16, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35221569

ABSTRACT

In plants, GIGANTEA (GI) protein plays different biological functions including carbon and sucrose metabolism, cell wall deposition, transpiration and hypocotyl elongation. This suggests that GI is an important class of proteins. So far, the resource-intensive experimental methods have been mostly utilized for identification of GI proteins. Thus, we made an attempt in this study to develop a computational model for fast and accurate prediction of GI proteins. Ten different supervised learning algorithms i.e., SVM, RF, JRIP, J48, LMT, IBK, NB, PART, BAGG and LGB were employed for prediction, where the amino acid composition (AAC), FASGAI features and physico-chemical (PHYC) properties were used as numerical inputs for the learning algorithms. Higher accuracies i.e., 96.75% of AUC-ROC and 86.7% of AUC-PR were observed for SVM coupled with AAC + PHYC feature combination, while evaluated with five-fold cross validation. With leave-one-out cross validation, 97.29% of AUC-ROC and 87.89% of AUC-PR were respectively achieved. While the performance of the model was evaluated with an independent dataset of 18 GI sequences, 17 were observed as correctly predicted. We have also performed proteome-wide identification of GI proteins in wheat, followed by functional annotation using Gene Ontology terms. A prediction server "GIpred" is freely accessible at http://cabgrid.res.in:8080/gipred/ for proteome-wide recognition of GI proteins. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s12298-022-01130-6.

7.
BMC Genet ; 20(1): 2, 2019 01 07.
Article in English | MEDLINE | ID: mdl-30616524

ABSTRACT

BACKGROUND: Identification of unknown fungal species aids the conservation of fungal diversity. As many fungal species cannot be cultured, morphological identification of those species is almost impossible. However, the DNA barcoding technique can be employed for identification of such species. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as the barcode. Though the computational prediction of fungal species has become feasible with the availability of a huge volume of barcode sequences in the public domain, prediction of fungal species is challenging due to the high degree of variability among ITS regions within species. RESULTS: A Random Forest (RF)-based predictor was built for identification of unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base pair compositions, and then used as training and test sets respectively for prediction of fungal species using RF. More than 85% accuracy was found when 4 sequences per species in the reference set were utilized, whereas accuracy stabilized at ~88% if ≥7 sequences per species in the reference set were used for training the model. The proposed model achieved comparable accuracy when evaluated against existing methods through a cross-validation procedure. The proposed model also outperformed several existing models used for identification of species other than fungi. CONCLUSIONS: An online prediction server "funbarRF" is established at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. Besides, an R-package funbarRF ( https://cran.r-project.org/web/packages/funbarRF/ ) is also available for prediction using high throughput sequence data. The effort put into this work will supplement future endeavors in the direction of fungal taxonomy assignments based on DNA barcodes.


Subject(s)
Computational Biology/methods , DNA Barcoding, Taxonomic/methods , Fungi/classification , Fungi/genetics , Supervised Machine Learning , DNA, Fungal/genetics , Software
8.
BMC Bioinformatics ; 18(1): 190, 2017 Mar 24.
Article in English | MEDLINE | ID: mdl-28340571

ABSTRACT

BACKGROUND: Insecticide resistance is a major challenge for the control program of insect pests in the fields of crop protection, human and animal health etc. Resistance to different insecticides is conferred by the proteins encoded from certain class of genes of the insects. To distinguish the insecticide resistant proteins from non-resistant proteins, no computational tool is available till date. Thus, development of such a computational tool will be helpful in predicting the insecticide resistant proteins, which can be targeted for developing appropriate insecticides. RESULTS: Five different sets of feature viz., amino acid composition (AAC), di-peptide composition (DPC), pseudo amino acid composition (PAAC), composition-transition-distribution (CTD) and auto-correlation function (ACF) were used to map the protein sequences into numeric feature vectors. The encoded numeric vectors were then used as input in support vector machine (SVM) for classification of insecticide resistant and non-resistant proteins. Higher accuracies were obtained under RBF kernel than that of other kernels. Further, accuracies were observed to be higher for DPC feature set as compared to others. The proposed approach achieved an overall accuracy of >90% in discriminating resistant from non-resistant proteins. Further, the two classes of resistant proteins i.e., detoxification-based and target-based were discriminated from non-resistant proteins with >95% accuracy. Besides, >95% accuracy was also observed for discrimination of proteins involved in detoxification- and target-based resistance mechanisms. The proposed approach not only outperformed Blastp, PSI-Blast and Delta-Blast algorithms, but also achieved >92% accuracy while assessed using an independent dataset of 75 insecticide resistant proteins. CONCLUSIONS: This paper presents the first computational approach for discriminating the insecticide resistant proteins from non-resistant proteins. 
Based on the proposed approach, an online prediction server DIRProt has also been developed for computational prediction of insecticide resistant proteins, which is accessible at http://cabgrid.res.in:8080/dirprot/ . The proposed approach is believed to supplement the efforts needed to develop dynamic insecticides in wet-lab by targeting the insecticide resistant proteins.


Subject(s)
Cytochrome P-450 Enzyme System/metabolism , Insecticides/metabolism , Proteins/chemistry , gamma-Aminobutyric Acid/metabolism , Animals , Humans , Insecticides/analysis
9.
J Theor Biol ; 404: 285-294, 2016 09 07.
Article in English | MEDLINE | ID: mdl-27302911

ABSTRACT

Identification of splice sites is important due to their key role in predicting the exon-intron structure of protein coding genes. Though several approaches have been developed for the prediction of splice sites, further improvement in the prediction accuracy will help predict gene structure more accurately. This paper presents a computational approach for prediction of donor splice sites with higher accuracy. In this approach, true and false splice sites were first encoded into numeric vectors and then used as input in artificial neural network (ANN), support vector machine (SVM) and random forest (RF) for prediction. ANN and SVM were found to perform equally and better than RF, while tested on HS3D and NN269 datasets. Further, the performance of ANN, SVM and RF were analyzed by using an independent test set of 50 genes and found that the prediction accuracy of ANN was higher than that of SVM and RF. All the predictors achieved higher accuracy while compared with the existing methods like NNsplice, MEM, MDD, WMM, MM1, FSPLICE, GeneID and ASSP, using the independent test set. We have also developed an online prediction server (PreDOSS) available at http://cabgrid.res.in:8080/predoss, for prediction of donor splice sites using the proposed approach.


Subject(s)
Computational Biology/methods , RNA Splice Sites/genetics , Area Under Curve , Base Sequence , Chi-Square Distribution , Databases, Nucleic Acid , Humans , Internet , Neural Networks, Computer , Nucleotide Motifs/genetics , ROC Curve , Sequence Homology, Nucleic Acid , Support Vector Machine
10.
Indian J Biochem Biophys ; 52(1): 34-44, 2015 Feb.
Article in English | MEDLINE | ID: mdl-26040110

ABSTRACT

Viral diseases like foot-and-mouth disease (FMD), calf scour (CS), bovine viral diarrhea (BVD), infectious bovine rhinotracheitis (IBR) etc. affect the growth and milk production of cattle (Bos taurus), causing severe economic loss. Epitope-based vaccine design has evolved to provide a new strategy for therapeutic application of pathogen-specific immunity in animals. Therefore, identification of major histocompatibility complex (MHC) binding peptides as potential T-cell epitopes is widely applied in peptide vaccine design and immunotherapy. In this study, the MetaMHCI tool was used with seven different algorithms to predict the potential T-cell epitopes for FMD, BVD, IBR and CS in cattle. A total of 54 protein sequences were filtered out from a total set of 6351 sequences of the pathogens causing the said diseases using bioinformatics approaches. These selected protein sequences were used as the key inputs for the MetaMHCI tool to predict the epitopes for the BoLA-A11 MHC class I allele of B. taurus. Further, the epitopes were ranked based on a proposed principal component analysis based epitope score (PbES). The best epitope for each disease, based on its predictability through the maximum number of predictors and low PbES, was modeled in the PEP-FOLD server and docked with the BoLA-A11 protein for understanding the MHC-epitope interaction. Finally, a total of 78 epitopes were predicted, out of which 27 were for FMD, 25 for BVD, 12 for CS and 14 for IBR. These epitopes could be artificially synthesized and recommended to vaccinate cattle against the considered diseases. Besides, the methodology adopted here could also be used to predict and analyze the epitopes for other microbial diseases of important animal species.


Subject(s)
Cattle Diseases/immunology , Epitopes/immunology , Histocompatibility Antigens Class I/immunology , Virus Diseases/veterinary , Animals , Cattle , Virus Diseases/immunology
11.
BMC Bioinformatics ; 15: 362, 2014 Nov 25.
Article in English | MEDLINE | ID: mdl-25420551

ABSTRACT

BACKGROUND: Most of the approaches for splice site prediction are based on machine learning techniques. Though these approaches provide high prediction accuracy, the window lengths they use are long. Hence, these approaches may not be suitable for predicting novel splice variants using the short sequence reads generated from next generation sequencing technologies. Further, machine learning techniques require numerically encoded data and produce different accuracy with different encoding procedures. Therefore, splice site prediction with short sequence motifs and without encoding sequence data became a motivation for the present study. RESULTS: An approach for finding association among nucleotide bases in the splice site motifs is developed and used further to determine the appropriate window size. Besides, an approach for prediction of donor splice sites using the sum of absolute error criterion has also been proposed. The proposed approach has been compared with commonly used approaches, i.e., Maximum Entropy Modeling (MEM), Maximal Dependency Decomposition (MDD), Weighted Matrix Method (WMM) and Markov Model of first order (MM1), and was found to perform equally with MEM and MDD and better than WMM and MM1 in terms of prediction accuracy. CONCLUSIONS: The proposed prediction approach can be used in the prediction of donor splice sites with higher accuracy using short sequence motifs and hence can be used as a complementary method to the existing approaches. Based on the proposed methodology, a web server was also developed for easy prediction of donor splice sites by users and is available at http://cabgrid.res.in:8080/sspred .


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing/methods , Models, Statistical , Nucleotide Motifs/genetics , RNA Splice Sites/genetics , RNA Splicing/genetics , Sequence Analysis, RNA/methods , Artificial Intelligence , Base Sequence , Computational Biology , Humans , Molecular Sequence Data , ROC Curve
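
Of the baselines named in the abstract above, the Weighted Matrix Method (WMM) is the simplest to illustrate: it scores a short motif from position-specific base frequencies and treats positions as independent. The sketch below is a generic position weight matrix with log-odds scoring, offered as an assumption about how such a baseline is usually built; it is not the paper's proposed sum-of-absolute-error method, whose details the abstract does not give. The training motifs, pseudocount, and uniform background are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Estimate position-specific base probabilities from aligned motifs."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training motifs must have the same length")
    pwm = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({b: counts[b] / total for b in BASES})
    return pwm


def log_odds_score(seq, pwm, background=0.25):
    """Sum of per-position log2 odds against a uniform background."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must match the matrix length")
    return sum(math.log2(pwm[i][b] / background) for i, b in enumerate(seq))


# Hypothetical aligned donor-like motifs (8 nt each) used as toy training data.
true_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGT", "AGGTAAGA"]
pwm = build_pwm(true_sites)
score_site = log_odds_score("AGGTAAGT", pwm)
score_other = log_odds_score("CCCCCCCC", pwm)
print(score_site > score_other)
```

A candidate window would be called a donor site when its score exceeds a chosen cutoff; because each position is scored independently, this baseline cannot capture the between-position associations that the abstract's approach is built around.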
12.
Genes (Basel) ; 15(4)2024 04 01.
Article in English | MEDLINE | ID: mdl-38674383

ABSTRACT

MicroRNAs (miRNAs) are small non-coding conserved molecules with lengths varying between 18 and 25 nt. Plant miRNAs are very stable, and they may have been transferred across kingdoms via food intake. Such miRNAs are also called exogenous miRNAs, which regulate gene expression in host organisms. The miRNAs present in the cluster bean, a drought tolerant legume crop of high commercial value, might also have played a regulatory role for the genes involved in nutrient synthesis or disease pathways in animals including humans, due to dietary intake of plant parts of cluster beans. However, the predictive role of cluster bean miRNAs for gene-disease association across kingdoms, such as in cattle and humans, is not yet fully explored. Thus, the aim of the present study is to (i) find the cluster bean miRNAs (cb-miRs) functionally similar to miRNAs of cattle and humans and predict their target genes' involvement in the occurrence of complex diseases, and (ii) identify the role of cb-miRs that are functionally non-similar to the miRNAs of cattle and humans and predict their targeted genes' association with complex diseases in host systems. Here, we predicted a total of 33 and 15 functionally similar cb-miRs (fs-cb-miRs) to human and cattle miRNAs, respectively. Further, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed the participation of targeted genes of fs-cb-miRs in 24 and 12 different pathways in humans and cattle, respectively. A few targeted genes in humans, like LCP2, GABRA6, and MYH14, were predicted to be associated with disease pathways of Yersinia infection (hsa05135), neuroactive ligand-receptor interaction (hsa04080), and pathogenic Escherichia coli infection (hsa05130), respectively. However, targeted genes of fs-cb-miRs in humans like KLHL20, TNS1, and PAPD4 are associated with Alzheimer's, malignant tumor of the breast, and hepatitis C virus infection disease, respectively. 
Similarly, in cattle, targeted genes like ATG2B and DHRS11 of fs-cb-miRs participate in the pathways of Huntington disease and steroid biosynthesis, respectively. Additionally, the targeted genes like SURF4 and EDME2 of fs-cb-miRs are associated with mastitis and bovine osteoporosis, respectively. We also found a few cb-miRs that do not have functional similarity with human and cattle miRNAs but target genes in the host organisms and are also associated with human and cattle diseases. Interestingly, a few genes such as NRM, PTPRE and SUZ12 were observed to be associated with Rheumatoid Arthritis, Asthma and Endometrial Stromal Sarcoma, respectively, in humans, and genes like SCNN1B with renal disease in cattle.


Subject(s)
MicroRNAs , Cattle , Animals , MicroRNAs/genetics , Humans , Cyamopsis/genetics , RNA, Plant/genetics , Cattle Diseases/genetics
13.
Genes (Basel) ; 14(5)2023 05 14.
Article in English | MEDLINE | ID: mdl-37239442

ABSTRACT

The rapidly evolving high-throughput sequencing (HTS) technologies generate voluminous genomic and metagenomic sequences, which can help classify the microbial communities with high accuracy in many ecosystems. Conventionally, the rule-based binning techniques are used to classify the contigs or scaffolds based on either sequence composition or sequence similarity. However, the accurate classification of the microbial communities remains a major challenge due to massive data volumes at hand as well as a requirement of efficient binning methods and classification algorithms. Therefore, we attempted here to implement iterative K-Means clustering for the initial binning of metagenomics sequences and applied various machine learning algorithms (MLAs) to classify the newly identified unknown microbes. The cluster annotation was achieved through the BLAST program of NCBI, which resulted in the grouping of assembled scaffolds into five classes, i.e., bacteria, archaea, eukaryota, viruses and others. The annotated cluster sequences were used to train machine learning algorithms (MLAs) to develop prediction models to classify unknown metagenomic sequences. In this study, we used metagenomic datasets of samples collected from the Ganga (Kanpur and Farakka) and the Yamuna (Delhi) rivers in India for clustering and training the MLA models. Further, the performance of MLAs was evaluated by 10-fold cross validation. The results revealed that the developed model based on the Random Forest had a superior performance compared to the other considered learning algorithms. The proposed method can be used for annotating the metagenomic scaffolds/contigs being complementary to existing methods of metagenomic data analysis. An offline predictor source code with the best prediction model is available at (https://github.com/Nalinikanta7/metagenomics).


Subject(s)
Microbiota , Rivers , Software , Machine Learning , Metagenome/genetics , Microbiota/genetics
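
The initial binning step in the abstract above groups scaffolds by K-Means clustering. The sketch below shows plain K-Means (Lloyd's algorithm) on dinucleotide frequencies, as one plausible composition-based reading of that step; the abstract does not state which features were clustered or how the iteration was organized, so the feature choice, the deterministic initialization, and the toy contigs are all assumptions. The code in the linked repository may differ.

```python
from itertools import product

DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def composition(seq):
    """Normalized dinucleotide frequencies of a contig (16 values)."""
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values()) or 1
    return [counts[d] / total for d in DINUCLEOTIDES]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy contigs: two GC-rich and two AT-rich sequences.
contigs = ["GCGCGCGGCC", "ATATATTAAT", "GGCCGCGCGC", "TTAATATATA"]
labels = kmeans([composition(c) for c in contigs], k=2)
print(labels)
```

In the workflow the abstract describes, each resulting cluster would then be annotated (there via BLAST) and the annotated sequences used to train the classifiers.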
14.
Plant Genome ; : e20259, 2022 Sep 13.
Article in English | MEDLINE | ID: mdl-36098562

ABSTRACT

One of the thrust areas of research in plant breeding is to develop crop cultivars with enhanced tolerance to abiotic stresses. Thus, identifying abiotic stress-responsive genes (SRGs) and proteins is important for plant breeding research. However, identifying such genes via established genetic approaches is laborious and resource intensive. Although transcriptome profiling has remained a reliable method of SRG identification, it is species specific. Additionally, identifying multistress responsive genes using gene expression studies is cumbersome. Thus, endorsing the need to develop a computational method for identifying the genes associated with different abiotic stresses. In this work, we aimed to develop a computational model for identifying genes responsive to six abiotic stresses: cold, drought, heat, light, oxidative, and salt. The predictions were performed using support vector machine (SVM), random forest, adaptive boosting (ADB), and extreme gradient boosting (XGB), where the autocross covariance (ACC) and K-mer compositional features were used as input. With ACC, K-mer, and ACC + K-mer compositional features, the overall accuracy of ∼60-77, ∼75-86, and ∼61-78% were respectively obtained using the SVM algorithm with fivefold cross-validation. The SVM also achieved higher accuracy than the other three algorithms. The proposed model was also assessed with an independent dataset and obtained an accuracy consistent with cross-validation. The proposed model is the first of its kind and is expected to serve the requirement of experimental biologists; however, the prediction accuracy was modest. Given its importance for the research community, the online prediction application, ASRpro, is made freely available (https://iasri-sg.icar.gov.in/asrpro/) for predicting abiotic SRGs and proteins.

15.
Plants (Basel) ; 11(15)2022 Jul 28.
Article in English | MEDLINE | ID: mdl-35956445

ABSTRACT

Wheat leaf rust caused by Puccinia triticina Eriks is an important disease that causes yield losses of up to 40% in susceptible varieties. Tetraploid emmer wheat (T. turgidum ssp. Dicoccum), commonly called Khapli wheat in India, is known to have evolved from wild emmer (Triticum turgidum var. dicoccoides), and harbors a good number of leaf rust resistance genes. In the present study, we are reporting on the screening of one hundred and twenty-three dicoccum wheat germplasm accessions against the leaf rust pathotype 77-5. Among these, an average of 45.50% of the germplasms were resistant, 46.74% were susceptible, and 8.53% had mesothetic reactions. Further, selected germplasm lines with accession numbers IC138898, IC47022, IC535116, IC535133, IC535139, IC551396, and IC534144 showed high level of resistance against the eighteen prevalent pathotypes. The infection type varied from ";", ";N", ";N1" to ";NC". PCR-based analysis of the resistant dicoccum lines with SSR marker gwm508 linked to the Lr53 gene, a leaf rust resistance gene effective against all the prevalent pathotypes of leaf rust in India and identified from a T. turgidum var. dicoccoides germplasm, indicated that Lr53 is not present in the selected accessions. Moreover, we have also generated 35K SNP genotyping data of seven lines and the susceptible control, Mandsaur Local, to study their relationships. The GDIRT tool based on homozygous genotypic differences revealed that the seven genotypes are unique to each other and may carry different resistance genes for leaf rust.

16.
3 Biotech ; 11(6): 305, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34194898

ABSTRACT

Protein-protein interactions of Interleukin-17 (IL17) play a vital role in autoimmune and inflammatory diseases, such as rheumatoid arthritis, multiple sclerosis, and psoriasis. Potent therapeutics for these diseases could be developed by blocking or modulating these interactions through biologics, peptide inhibitors and small molecule inhibitors. Unlike biologics, peptide inhibitors are cost effective and can be orally available. Unlike small molecules, peptide inhibitors also do not require a binding groove. Therefore, the crystal structure of IL17A in complex with a high affinity peptide inhibitor (HAP) (1-IHVTIPADLWDWIN-14) is investigated with the aim of finding hot spots that could improve its potency. An in silico mutagenesis strategy was implemented using FoldX PSSM to scan for positions tolerant to amino acid substitution. Three positions, T4, A7, and N14, showed improved stability when mutated to 'F/M/Y', 'P' and 'F/M/Y', respectively. A set of 31 mutant peptides was designed through combinations of these tolerant mutations using the Build Model application of FoldX. Binding affinity and interactions of the 31 peptides were assessed through protein-peptide docking and binding free energy calculations. Two peptides, namely P1 ("1-IHVTIPPDLWDWIY-14") and P2 ("1-IHVMIPPDLWDWIF-14"), showed better binding affinity to the IL17A dimerization site compared to HAP. Interactions of P1, P2 and HAP were also analyzed through 100 ns molecular dynamics simulations using GROMACS v5.0. The results revealed that the P2 peptide is likely to offer better potency compared to HAP and P1. Therefore, the P2 peptide can be synthesized to develop oral therapies for autoimmune and inflammatory diseases with further experimental evaluations. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13205-021-02856-y.
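
The count of 31 mutant peptides in the abstract above is consistent with taking every combination of the tolerated substitutions: 4 choices at T4 (wild type or F/M/Y), 2 at A7 (wild type or P) and 4 at N14 (wild type or F/M/Y) give 4 × 2 × 4 = 32 sequences, or 31 once the unmodified HAP is excluded. The sketch below enumerates them on that assumption. The abstract does not say how the set was generated, so the enumeration rule and the function name are hypothetical; the sequences and positions are taken from the abstract.

```python
from itertools import product

HAP = "IHVTIPADLWDWIN"  # wild-type peptide, residues 1 to 14

# Tolerated substitutions by 1-based position, as listed in the abstract.
SUBSTITUTIONS = {4: "FMY", 7: "P", 14: "FMY"}


def enumerate_mutants(wild_type, substitutions):
    """Return every peptide carrying at least one tolerated substitution."""
    positions = sorted(substitutions)
    # At each position, allow the wild-type residue plus its substitutions.
    choices = [wild_type[p - 1] + substitutions[p] for p in positions]
    mutants = []
    for combo in product(*choices):
        residues = list(wild_type)
        for pos, aa in zip(positions, combo):
            residues[pos - 1] = aa
        peptide = "".join(residues)
        if peptide != wild_type:
            mutants.append(peptide)
    return mutants


mutants = enumerate_mutants(HAP, SUBSTITUTIONS)
print(len(mutants))
```

Both peptides highlighted in the abstract, P1 (IHVTIPPDLWDWIY) and P2 (IHVMIPPDLWDWIF), appear in the enumerated set.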

17.
Front Genet ; 12: 782366, 2021.
Article in English | MEDLINE | ID: mdl-35222517

ABSTRACT

Globally, sodicity is one of the major abiotic stresses limiting wheat productivity in arid and semi-arid regions. An investigation of the complex gene network associated with sodicity stress tolerance is therefore required to identify transcriptional changes in plants under abiotic stress conditions. For this purpose, we sequenced the flag leaf transcriptome of a highly tolerant bread wheat germplasm (KRL 3-4) to better understand the molecular basis of sodicity tolerance. A total of 1,980 genes were differentially expressed in the flag leaf due to sodicity stress. Among these differentially expressed genes (DEGs), 872 were upregulated and 1,108 were downregulated. Furthermore, annotation of the DEGs revealed that 1,384 genes were assigned to 2,267 GO terms: 502 for biological process, 638 for cellular component, and 1,127 for molecular function. GO annotation also revealed the involvement of genes related to several transcription factors; the important ones are expansins, peroxidase, glutathione-S-transferase, and metal ion transporters in response to sodicity. Additionally, of 127 KEGG pathways, only 40 were confidently enriched at a p-value <0.05, covering the five main KEGG categories, i.e., metabolism, environmental information processing, genetic information processing, organismal systems, and cellular processes. The most enriched pathways were prioritized using MapMan software, which revealed that lipid metabolism, nutrient uptake, and protein homeostasis were paramount. We also found 39 SNPs that mapped to important sodicity stress-responsive genes associated with various pathways such as ROS scavenging, serine/threonine protein kinase, calcium signaling, and metal ion transporters. In summary, 19 important candidate genes contributing to sodicity tolerance in bread wheat were identified, and these genes might be helpful for better understanding and further improvement of sodicity tolerance in bread wheat.

18.
Sci Rep ; 10(1): 21593, 2020 12 09.
Article in English | MEDLINE | ID: mdl-33299096

ABSTRACT

Foot-and-mouth disease (FMD), a highly contagious viral infection of wild and domestic cloven-hoofed animals, endangers large livestock populations across the globe. It adversely affects the socioeconomic status of millions of households. Vaccination has been used to protect animals against FMD virus (FMDV) to some extent, but the effectiveness of available vaccines has decreased due to high genetic variability in the FMDV genome. Another reason the current vaccines are not favored is that they do not make it possible to differentiate between infected and vaccinated animals. RNA interference (RNAi), a potential strategy for controlling virus replication, has therefore opened up a new avenue for controlling viral transmission. Hence, an attempt has been made here to establish the role of RNAi in therapeutic developments for FMD by computationally identifying (i) microRNA (miRNA) targets in FMDV using target prediction algorithms, (ii) targetable genomic regions in FMDV based on their dissimilarity with the host genome, and (iii) plausible anti-FMDV miRNA-like simulated nucleotide sequences (SNSs). The results revealed 12 mature host miRNAs that have 284 targets in 98 distinct FMDV genomic sequences. Wet-lab validation of the anti-FMDV properties of 8 host miRNAs was carried out, and all were observed to confer an antiviral effect of variable magnitude. In addition, 14 miRBase miRNAs were found to have better target accessibility in FMDV than in Bos taurus. Further, 8 putative targetable regions having the sense-strand properties of siRNAs were identified on FMDV genes that are highly dissimilar to the host genome. A total of 16 SNSs having > 90% identity with mature miRNAs were also identified that have targets in FMDV genes. The information generated from this study is available at http://bioinformatics.iasri.res.in/fmdisc/ to cater to the needs of biologists, veterinarians, and animal scientists working on FMD.


Subject(s)
Cattle Diseases/therapy , Foot-and-Mouth Disease/therapy , RNAi Therapeutics , Algorithms , Animals , Cattle , Cattle Diseases/genetics , Computational Biology , Foot-and-Mouth Disease/genetics , Foot-and-Mouth Disease Virus/genetics
19.
J Biomol Struct Dyn ; 37(10): 2641-2651, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30051760

ABSTRACT

Investigating the behaviour of biomolecules through computational mutagenesis is gaining interest as a way to facilitate the development of new therapeutic solutions for infectious diseases. The antigenically variant genotypes of foot-and-mouth disease virus (FMDV) and their subsequent infections are challenging to tackle with traditional vaccination. In such a scenario, neutralizing antibodies might provide an alternative way to manage FMDV infection. Thus, we analysed the interaction of the mAb 4C4 with a synthetic G-H loop of FMDV-VP1 through in silico mutagenesis and molecular modelling. Initially, a set of 25,434 mutants was designed, and the mutants with better energetic stability than 4C4 were clustered based on sequence identity. The best mutant representing each cluster was selected and evaluated for its binding affinity with the antigen in terms of docking scores, interaction energy, and binding energy. Six mutants showed better binding affinities towards the antigen than 4C4. Further, the interaction of these mutants with the natural G-H loop that is bound to mAb SD6 was also evaluated. One 4C4 variant with mutations at positions 2034(N→L), 2096(N→C), 2098(D→Y), 2532(T→K), and 2599(A→G) showed better binding affinities towards both the synthetic and natural G-H loops than 4C4 and SD6, respectively. A 50 ns molecular dynamics simulation of the mutant and wild-type antibody structures supported the pre-simulation results. Therefore, these mutations in mAb 4C4 are believed to provide a better antibody-based therapeutic option for FMD. Communicated by Ramaswamy H. Sarma.


Subject(s)
Antibodies, Monoclonal/chemistry , Antibodies, Neutralizing/chemistry , Antiviral Agents/chemistry , Capsid Proteins/chemistry , Models, Molecular , Protein Conformation , Amino Acid Sequence , Antibodies, Monoclonal/genetics , Antibodies, Monoclonal/pharmacology , Antibodies, Neutralizing/genetics , Antibodies, Neutralizing/pharmacology , Antigen-Antibody Complex/chemistry , Antiviral Agents/pharmacology , Binding Sites , Capsid Proteins/antagonists & inhibitors , Drug Discovery , Molecular Docking Simulation , Molecular Dynamics Simulation , Mutagenesis, Site-Directed , Protein Binding , Structure-Activity Relationship
20.
Gene ; 705: 113-126, 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-31009682

ABSTRACT

Identification of splice sites is imperative for the prediction of gene structure. Machine learning-based approaches (MLAs) have been reported to be more successful than rule-based methods for identifying splice sites. However, nucleotide strings must be transformed into numeric features through sequence encoding before they can be used as input to MLAs. In this study, we evaluated the performance of 8 different sequence encoding schemes, i.e., Bayes kernel, density and sparse (DS), distribution of tri-nucleotide and 1st order Markov model (DM), frequency difference distance measure (FDDM), paired-nucleotide frequency difference between true and false sites (FDTF), 1st order Markov model (MM1), combination of both 1st and 2nd order Markov model (MM1 + MM2), and 2nd order Markov model (MM2), for predicting donor and acceptor splice sites using 5 supervised learning methods (ANN, Bagging, Boosting, RF, and SVM). The encoding schemes and machine learning methods were first evaluated in 4 species, i.e., A. thaliana, C. elegans, D. melanogaster, and H. sapiens, and the performances were then validated with another four species, i.e., Ciona intestinalis, Dictyostelium discoideum, Phaeodactylum tricornutum, and Trypanosoma brucei. In terms of ROC (receiver operating characteristic) and PR (precision-recall) curves, the FDTF encoding approach achieved the highest accuracy, followed by either MM2 or FDDM. Further, SVM achieved the highest accuracy (in terms of ROC and PR curves), followed by RF, across encoding schemes and species. In terms of prediction accuracy across species, the SVM-FDTF combination outperformed the other combinations of classifiers and encoding schemes. Further, splice site prediction accuracies were higher for species with low intron density. To the best of our knowledge, this is the first comprehensive evaluation of sequence encoding schemes for the prediction of splice sites. We have also developed an R package, EncDNA (https://cran.r-project.org/web/packages/EncDNA/index.html), for encoding splice site motifs with different encoding schemes, which is expected to supplement the existing nucleotide sequence encoding approaches. This study is believed to be useful to computational biologists for predicting different functional elements in genomic DNA.
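To illustrate what "sequence encoding" means here, the following Python sketch turns a fixed-length motif into numeric features using position-specific first-order Markov transition probabilities. It is a simplified illustration of the general MM1 idea and not the EncDNA implementation: the function names, the pseudocount, and the toy training sequences are all assumptions made for the example, and the scheme evaluated in the study may differ in its details.

```python
BASES = "ACGT"


def fit_mm1(train_seqs, pseudocount=1.0):
    """Estimate P(base at i | base at i-1) for each position i >= 1.

    train_seqs: aligned, equal-length sequences over A/C/G/T.
    Returns a list of tables, one per transition, tables[i-1][prev][cur].
    The pseudocount (an assumption) avoids zero probabilities.
    """
    length = len(train_seqs[0])
    if any(len(s) != length for s in train_seqs):
        raise ValueError("training sequences must have equal length")
    tables = []
    for i in range(1, length):
        counts = {a: {b: pseudocount for b in BASES} for a in BASES}
        for s in train_seqs:
            counts[s[i - 1]][s[i]] += 1
        # Normalize each row so the conditional probabilities sum to 1.
        table = {
            a: {b: counts[a][b] / sum(counts[a].values()) for b in BASES}
            for a in BASES
        }
        tables.append(table)
    return tables


def encode_mm1(seq, tables):
    """Encode a sequence as one transition probability per adjacent pair."""
    if len(seq) != len(tables) + 1:
        raise ValueError("sequence length does not match the fitted model")
    return [tables[i - 1][seq[i - 1]][seq[i]] for i in range(1, len(seq))]


# Hypothetical toy motifs, for illustration only (not data from the study).
train = ["AGGTAAGT", "CAGGTGAG", "AGGTGAGT"]
tables = fit_mm1(train)
features = encode_mm1("AGGTAAGT", tables)
print(features)
```

A sequence of length L yields L - 1 features, which could then be passed to any of the classifiers named in the abstract.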


Asunto(s)
Computational Biology/methods , RNA Splice Sites , RNA, Messenger/metabolism , Algorithms , Animals , Arabidopsis , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Machine Learning , RNA Splicing , ROC Curve , Trypanosoma brucei brucei/genetics