1.
BMC Plant Biol ; 24(1): 379, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720284

ABSTRACT

BACKGROUND: Rice bean (Vigna umbellata), an underrated legume, adapts to diverse climatic conditions with the potential to support food and nutritional security worldwide. It is used as a vegetable, minor food crop and a fodder crop, being a rich source of proteins, minerals, and essential fatty acids. However, little effort has been made to decipher the genetic and molecular basis of various useful traits in this crop. Therefore, we considered three economically important traits i.e., flowering, maturity and seed weight of rice bean and identified the associated candidate genes employing an associative transcriptomics approach on 100 diverse genotypes out of 1800 evaluated rice bean accessions from the Indian National Genebank. RESULTS: The transcriptomics-based genotyping of one-hundred diverse rice bean cultivars followed by pre-processing of genotypic data resulted in 49,271 filtered markers. The STRUCTURE, PCA and Neighbor-Joining clustering of 100 genotypes revealed three putative sub-populations. The marker-trait association analysis involving various genome-wide association study (GWAS) models revealed significant association of 82 markers on 48 transcripts for flowering, 26 markers on 22 transcripts for maturity and 22 markers on 21 transcripts for seed weight. The transcript annotation provided information on the putative candidate genes for the considered traits. The candidate genes identified for flowering include HSC80, P-II PsbX, phospholipid-transporting-ATPase-9, pectin-acetylesterase-8 and E3-ubiquitin-protein-ligase-RHG1A. Further, the WRKY1 and DEAD-box-RH27 were found to be associated with seed weight. Furthermore, the associations of PIF3 and pentatricopeptide-repeat-containing-gene with maturity and seed weight, and aldo-keto-reductase with flowering and maturity were revealed. CONCLUSION: This study offers insights into the genetic basis of key agronomic traits in rice bean, including flowering, maturity, and seed weight. 
The identified markers and associated candidate genes provide valuable resources for future exploration and targeted breeding aimed at enhancing the agronomic performance of rice bean cultivars. Notably, this research represents the first transcriptome-wide association study in a pulse crop, uncovering candidate genes for agronomically useful traits.


Subject(s)
Flowers , Genome-Wide Association Study , Seeds , Transcriptome , Seeds/genetics , Seeds/growth & development , Flowers/genetics , Flowers/growth & development , Vigna/genetics , Vigna/growth & development , Genes, Plant , Genotype , Gene Expression Profiling , Chromosome Mapping , Quantitative Trait Loci/genetics , Phenotype
2.
Genes (Basel) ; 15(4)2024 04 01.
Article in English | MEDLINE | ID: mdl-38674383

ABSTRACT

MicroRNAs (miRNAs) are small, conserved non-coding molecules 18-25 nt in length. Plant miRNAs are very stable and may be transferred across kingdoms via food intake. Such miRNAs are also called exogenous miRNAs, which regulate gene expression in host organisms. The miRNAs present in the cluster bean, a drought-tolerant legume crop of high commercial value, might also play a regulatory role for genes involved in nutrient synthesis or disease pathways in animals, including humans, through dietary intake of plant parts of cluster beans. However, the predictive role of cluster bean miRNAs in gene-disease association across kingdoms, such as in cattle and humans, is not yet fully explored. Thus, the aim of the present study is to (i) find the cluster bean miRNAs (cb-miRs) functionally similar to miRNAs of cattle and humans and predict their target genes' involvement in the occurrence of complex diseases, and (ii) identify the role of cb-miRs that are functionally non-similar to the miRNAs of cattle and humans and predict their targeted genes' association with complex diseases in host systems. Here, we predicted a total of 33 and 15 functionally similar cb-miRs (fs-cb-miRs) to human and cattle miRNAs, respectively. Further, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis revealed the participation of targeted genes of fs-cb-miRs in 24 and 12 different pathways in humans and cattle, respectively. A few targeted genes in humans, such as LCP2, GABRA6, and MYH14, were predicted to be associated with the pathways of Yersinia infection (hsa05135), neuroactive ligand-receptor interaction (hsa04080), and pathogenic Escherichia coli infection (hsa05130), respectively. However, targeted genes of fs-cb-miRs in humans like KLHL20, TNS1, and PAPD4 are associated with Alzheimer's disease, malignant tumor of the breast, and hepatitis C virus infection, respectively. 
Similarly, in cattle, targeted genes like ATG2B and DHRS11 of fs-cb-miRs participate in the pathways of Huntington disease and steroid biosynthesis, respectively. Additionally, the targeted genes like SURF4 and EDME2 of fs-cb-miRs are associated with mastitis and bovine osteoporosis, respectively. We also found a few cb-miRs that do not have functional similarity with human and cattle miRNAs but target genes in the host organisms that are associated with human and cattle diseases. Interestingly, a few genes such as NRM, PTPRE and SUZ12 were observed to be associated with rheumatoid arthritis, asthma and endometrial stromal sarcoma, respectively, in humans, and genes like SCNN1B with renal disease in cattle.


Subject(s)
MicroRNAs , Cattle , Animals , MicroRNAs/genetics , Humans , Cyamopsis/genetics , RNA, Plant/genetics , Cattle Diseases/genetics
3.
Genes (Basel) ; 14(5)2023 05 14.
Article in English | MEDLINE | ID: mdl-37239442

ABSTRACT

The rapidly evolving high-throughput sequencing (HTS) technologies generate voluminous genomic and metagenomic sequences, which can help classify the microbial communities with high accuracy in many ecosystems. Conventionally, the rule-based binning techniques are used to classify the contigs or scaffolds based on either sequence composition or sequence similarity. However, the accurate classification of the microbial communities remains a major challenge due to massive data volumes at hand as well as a requirement of efficient binning methods and classification algorithms. Therefore, we attempted here to implement iterative K-Means clustering for the initial binning of metagenomics sequences and applied various machine learning algorithms (MLAs) to classify the newly identified unknown microbes. The cluster annotation was achieved through the BLAST program of NCBI, which resulted in the grouping of assembled scaffolds into five classes, i.e., bacteria, archaea, eukaryota, viruses and others. The annotated cluster sequences were used to train machine learning algorithms (MLAs) to develop prediction models to classify unknown metagenomic sequences. In this study, we used metagenomic datasets of samples collected from the Ganga (Kanpur and Farakka) and the Yamuna (Delhi) rivers in India for clustering and training the MLA models. Further, the performance of MLAs was evaluated by 10-fold cross validation. The results revealed that the developed model based on the Random Forest had a superior performance compared to the other considered learning algorithms. The proposed method can be used for annotating the metagenomic scaffolds/contigs being complementary to existing methods of metagenomic data analysis. An offline predictor source code with the best prediction model is available at (https://github.com/Nalinikanta7/metagenomics).
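The initial binning step described above can be illustrated with a minimal K-Means sketch. The abstract does not state which features or initialisation were used, so the feature vectors (plain tuples of numbers, for example composition profiles of scaffolds) and the deterministic "first k points" initialisation below are assumptions made only for illustration; the authors' own implementation is in the repository linked in the abstract.

```python
def kmeans(points, k, max_iter=100):
    """Cluster numeric feature vectors with plain Lloyd-style K-Means.

    Assumption for this sketch: the first k points seed the centroids,
    which keeps the example deterministic.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid (squared Euclidean).
        new_labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so stop iterating
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical two-dimensional feature vectors for five scaffolds.
scaffold_features = [(0.0, 0.0), (10.0, 10.0), (0.5, 0.2), (9.5, 10.2), (0.1, 0.4)]
bin_labels, bin_centroids = kmeans(scaffold_features, k=2)
```

In the workflow the abstract describes, the resulting bins would then be annotated (there via BLAST) and the annotated sequences used to train the classifiers.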


Subject(s)
Microbiota , Rivers , Software , Machine Learning , Metagenome/genetics , Microbiota/genetics
4.
Funct Integr Genomics ; 23(2): 169, 2023 May 20.
Article in English | MEDLINE | ID: mdl-37209309

ABSTRACT

Stripe rust (Sr), caused by Puccinia striiformis f. sp. tritici (Pst), is the most devastating disease and poses a serious threat to wheat-growing nations across the globe. Developing resistant cultivars is the most challenging aspect of wheat breeding. The function of resistance genes (R genes) and the mechanisms by which they influence plant-host interactions are poorly understood. In the present investigation, comparative transcriptome analysis was carried out on two near-isogenic lines (NILs), PBW343 and FLW29. The seedlings of both genotypes were inoculated with Pst pathotype 46S119. In total, 1106 differentially expressed genes (DEGs) were identified at the early stage of infection (12 hpi), whereas 877 and 1737 DEGs were observed at later stages (48 and 72 hpi) in FLW29. The identified DEGs comprised defense-related genes including putative R genes, 7 WRKY transcription factors, and genes associated with calcium and hormonal signaling. Moreover, pathways involved in signaling of receptor kinases, G protein, and light showed higher expression in the resistant cultivar and were common across different time points. Quantitative real-time PCR was used to further confirm the transcriptional expression of eight critical genes involved in the plant defense mechanism against stripe rust. The information about these genes is likely to improve our knowledge of the genetic mechanism that controls stripe rust resistance in wheat, and data on resistance response-linked genes and pathways will be a significant resource for future research.


Subject(s)
Basidiomycota , Triticum , Triticum/genetics , Plant Breeding , Basidiomycota/genetics , Genotype , Gene Expression Profiling , Plant Diseases/genetics , Disease Resistance/genetics
5.
Plant Genome ; : e20259, 2022 Sep 13.
Article in English | MEDLINE | ID: mdl-36098562

ABSTRACT

One of the thrust areas of research in plant breeding is to develop crop cultivars with enhanced tolerance to abiotic stresses. Thus, identifying abiotic stress-responsive genes (SRGs) and proteins is important for plant breeding research. However, identifying such genes via established genetic approaches is laborious and resource intensive. Although transcriptome profiling has remained a reliable method of SRG identification, it is species specific. Additionally, identifying multistress responsive genes using gene expression studies is cumbersome. This underscores the need to develop a computational method for identifying the genes associated with different abiotic stresses. In this work, we aimed to develop a computational model for identifying genes responsive to six abiotic stresses: cold, drought, heat, light, oxidative, and salt. The predictions were performed using support vector machine (SVM), random forest, adaptive boosting (ADB), and extreme gradient boosting (XGB), where the autocross covariance (ACC) and K-mer compositional features were used as input. With ACC, K-mer, and ACC + K-mer compositional features, overall accuracies of ∼60-77%, ∼75-86%, and ∼61-78%, respectively, were obtained using the SVM algorithm with fivefold cross-validation. The SVM also achieved higher accuracy than the other three algorithms. The proposed model was also assessed with an independent dataset and obtained an accuracy consistent with cross-validation. The proposed model is the first of its kind and is expected to serve the requirements of experimental biologists; however, the prediction accuracy was modest. Given its importance for the research community, the online prediction application, ASRpro, is made freely available (https://iasri-sg.icar.gov.in/asrpro/) for predicting abiotic SRGs and proteins.
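As an illustration of the K-mer compositional features mentioned above, the following sketch converts a nucleotide sequence into a fixed-length vector of relative k-mer frequencies. The function name, the default k, and the choice to skip windows containing ambiguous bases are assumptions for this example, not details taken from ASRpro.

```python
from itertools import product


def kmer_composition(seq, k=3, alphabet="ACGT"):
    """Return relative frequencies of all k-mers, in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows with ambiguous bases (e.g. N) are skipped
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# 2-mer composition of a short toy sequence: a vector of 4**2 = 16 values.
features = kmer_composition("ATGCGATGCA", k=2)
```

Because the vector length depends only on k, sequences of different lengths map to inputs of the same dimension, which is what a classifier such as an SVM requires.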

6.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-36040109

ABSTRACT

Maintaining duplicate germplasms in genebanks hampers effective conservation and utilization of genebank resources. The redundant germplasm adds to the cost of germplasm conservation by directing a large proportion of the genebank's financial resources towards conservation rather than enriching the diversity. Besides, genome-wide association analysis using an association panel with over-represented germplasms can be biased, resulting in spurious marker-trait associations. The conventional methods of germplasm duplicate removal using passport information suffer from incomplete or missing passport information and data handling errors at various stages of germplasm enrichment. This limitation is less likely in the case of genotypic data. Therefore, we developed a web-based tool, Germplasm Duplicate Identification and Removal Tool (G-DIRT), which allows germplasm duplicate identification based on identity-by-state analysis using single-nucleotide polymorphism genotyping information along with pre-processing of genotypic data. A homozygous genotypic difference threshold of 0.1% for germplasm duplicates has been determined using tetraploid wheat genotypic data with 94.97% accuracy. Based on the genotypic difference, the tool also builds a dendrogram that can visually depict the relationship between genotypes. To overcome the constraint of high-dimensional genotypic data, an offline version of G-DIRT with an R interface has also been developed. G-DIRT is expected to help genebank curators, breeders and other researchers across the world in identifying germplasm duplicates from the global genebank collections by using only the easily sharable genotypic data instead of physically exchanging the seeds or propagating materials. The web server will complement the existing methods of germplasm duplicate identification based on passport or phenotypic information and is freely accessible at http://webtools.nbpgr.ernet.in/gdirt/.
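The thresholding idea described above can be sketched as follows. The genotype coding (two-letter calls such as "AA", with "NN" for missing data), the function names, and the exact way the difference is computed are assumptions for illustration; G-DIRT's own pre-processing and identity-by-state calculation may differ. Only the 0.1% threshold comes from the abstract.

```python
def homozygous_difference(geno_a, geno_b, missing="NN"):
    """Fraction of jointly homozygous, non-missing markers that differ."""
    if len(geno_a) != len(geno_b):
        raise ValueError("genotype vectors must cover the same markers")
    compared = 0
    differing = 0
    for a, b in zip(geno_a, geno_b):
        if a == missing or b == missing:
            continue  # skip markers missing in either accession
        if a[0] != a[1] or b[0] != b[1]:
            continue  # skip heterozygous calls
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable homozygous markers")
    return differing / compared


def is_duplicate(geno_a, geno_b, threshold=0.001):
    """Flag a pair as duplicates at the 0.1% threshold reported in the abstract."""
    return homozygous_difference(geno_a, geno_b) <= threshold


# Hypothetical accessions genotyped at six SNP markers.
acc_1 = ["AA", "GG", "CC", "TT", "AG", "NN"]
acc_2 = ["AA", "GG", "CC", "TT", "AA", "GG"]
acc_3 = ["AA", "GG", "TT", "TT", "AG", "CC"]
```

With real data the comparison would run over thousands of markers for every pair of accessions, and the resulting pairwise differences could also feed a dendrogram.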


Subject(s)
Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genotype , Seeds/genetics
7.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35998895

ABSTRACT

Linear B-cell epitopes have a prominent role in the development of peptide-based vaccines and disease diagnosis. High variability in the length of these epitopes is a major reason for low accuracy in their prediction. Most of the B-cell epitope prediction methods considered fixed length of epitope sequences and achieved good accuracy. Though a number of tools are available for the prediction of flexible length linear B-cell epitopes with reasonable accuracy, further improvement in the prediction performance is still expected. Thus, here we made an attempt to analyze the performance of machine learning approaches (MLA) with 18 different amino acid encoding schemes in the prediction of flexible length linear B-cell epitopes. We considered B-cell epitope sequences of variable lengths (11-56 amino acids) from well-established public resources. The performances of machine learning algorithms with the encoded epitope sequence datasets were evaluated. Besides, the feasible combinations of encoding schemes were also explored and analyzed. The results revealed that amino-acid composition (AC) and distribution component of composition-transition-distribution encoding schemes are suitable for heterogeneous epitope data, whereas amino-acid-anchoring-pair-composition (APC), dipeptide-composition and amino-acids-pair-propensity-scale (APP) are more appropriate for homogeneous data. Further, two combinations of peptide encoding schemes, i.e. APC + AC and APC + APP with random forest classifier were identified to have improved performance over the state-of-the-art tools for flexible length linear B-cell epitope prediction. The study also revealed better performance of random forest over other considered MLAs in the prediction of flexible length linear B-cell epitopes.
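Two of the simpler encoding schemes named above, amino-acid composition and dipeptide composition, can be sketched as below. Both map a peptide of any length to a fixed-length vector (20 and 400 values, respectively), which is why such encodings suit flexible-length epitopes. The function names and the handling of non-standard residues are assumptions for this example; the anchoring-pair and propensity-scale schemes from the abstract are not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(peptide):
    """Relative frequency of each standard residue (20 values)."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("empty peptide")
    return [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


def dipeptide_composition(peptide):
    """Relative frequency of each adjacent residue pair (400 values)."""
    peptide = peptide.upper()
    if len(peptide) < 2:
        raise ValueError("peptide must have at least two residues")
    pairs = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(peptide) - 1):
        pair = peptide[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are ignored
            counts[pair] += 1
    total = len(peptide) - 1
    return [counts[p] / total for p in pairs]


# Encoding schemes can be combined by concatenating their vectors.
toy_peptide = "ACDA"
combined = amino_acid_composition(toy_peptide) + dipeptide_composition(toy_peptide)
```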


Subject(s)
Epitopes, B-Lymphocyte , Vaccines , Amino Acids/genetics , Dipeptides , Peptides/chemistry
8.
Plants (Basel) ; 11(15)2022 Jul 28.
Article in English | MEDLINE | ID: mdl-35956445

ABSTRACT

Wheat leaf rust caused by Puccinia triticina Eriks is an important disease that causes yield losses of up to 40% in susceptible varieties. Tetraploid emmer wheat (T. turgidum ssp. dicoccum), commonly called Khapli wheat in India, is known to have evolved from wild emmer (Triticum turgidum var. dicoccoides), and harbors a good number of leaf rust resistance genes. In the present study, we report on the screening of one hundred and twenty-three dicoccum wheat germplasm accessions against the leaf rust pathotype 77-5. Among these, an average of 45.50% of the germplasms were resistant, 46.74% were susceptible, and 8.53% had mesothetic reactions. Further, selected germplasm lines with accession numbers IC138898, IC47022, IC535116, IC535133, IC535139, IC551396, and IC534144 showed a high level of resistance against the eighteen prevalent pathotypes. The infection type varied from ";", ";N", ";N1" to ";NC". PCR-based analysis of the resistant dicoccum lines with SSR marker gwm508 linked to the Lr53 gene, a leaf rust resistance gene effective against all the prevalent pathotypes of leaf rust in India and identified from a T. turgidum var. dicoccoides germplasm, indicated that Lr53 is not present in the selected accessions. Moreover, we have also generated 35K SNP genotyping data of the seven lines and the susceptible control, Mandsaur Local, to study their relationships. The G-DIRT tool, based on homozygous genotypic differences, revealed that the seven genotypes are distinct from each other and may carry different resistance genes for leaf rust.

9.
Physiol Mol Biol Plants ; 28(1): 1-16, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35221569

ABSTRACT

In plants, the GIGANTEA (GI) protein performs diverse biological functions including carbon and sucrose metabolism, cell wall deposition, transpiration and hypocotyl elongation. This suggests that GI is an important class of proteins. So far, resource-intensive experimental methods have mostly been used for identification of GI proteins. Thus, we made an attempt in this study to develop a computational model for fast and accurate prediction of GI proteins. Ten different supervised learning algorithms, i.e., SVM, RF, JRIP, J48, LMT, IBK, NB, PART, BAGG and LGB, were employed for prediction, where the amino acid composition (AAC), FASGAI features and physico-chemical (PHYC) properties were used as numerical inputs for the learning algorithms. The highest accuracies, i.e., 96.75% AUC-ROC and 86.7% AUC-PR, were observed for SVM coupled with the AAC + PHYC feature combination when evaluated with five-fold cross validation. With leave-one-out cross validation, 97.29% AUC-ROC and 87.89% AUC-PR were achieved, respectively. When the performance of the model was evaluated with an independent dataset of 18 GI sequences, 17 were correctly predicted. We have also performed proteome-wide identification of GI proteins in wheat, followed by functional annotation using Gene Ontology terms. A prediction server "GIpred" is freely accessible at http://cabgrid.res.in:8080/gipred/ for proteome-wide recognition of GI proteins. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s12298-022-01130-6.

10.
Int J Mol Sci ; 23(3)2022 Jan 30.
Article in English | MEDLINE | ID: mdl-35163534

ABSTRACT

MicroRNAs (miRNAs) play a significant role in plant response to different abiotic stresses. Thus, identification of abiotic stress-responsive miRNAs holds immense importance in crop breeding programmes to develop cultivars resistant to abiotic stresses. In this study, we developed a machine learning-based computational method for prediction of miRNAs associated with abiotic stresses. Three types of datasets were used for prediction, i.e., miRNA, Pre-miRNA, and Pre-miRNA + miRNA. The pseudo K-tuple nucleotide compositional features were generated for each sequence to transform the sequence data into numeric feature vectors. Support vector machine (SVM) was employed for prediction. The area under receiver operating characteristics curve (auROC) of 70.21, 69.71, 77.94 and area under precision-recall curve (auPRC) of 69.96, 65.64, 77.32 percentages were obtained for miRNA, Pre-miRNA, and Pre-miRNA + miRNA datasets, respectively. Overall prediction accuracies for the independent test set were 62.33, 64.85, 69.21 percentages, respectively, for the three datasets. The SVM also achieved higher accuracy than other learning methods such as random forest, extreme gradient boosting, and adaptive boosting. To implement our method with ease, an online prediction server "ASRmiRNA" has been developed. The proposed approach is believed to supplement the existing effort for identification of abiotic stress-responsive miRNAs and Pre-miRNAs.
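For readers unfamiliar with the auROC metric reported above, the sketch below computes it from classifier scores through its rank interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The function name and toy data are hypothetical, and this is not the evaluation code used for ASRmiRNA.

```python
def au_roc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for positive (e.g. stress-responsive), 0 for negative.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(positives) * len(negatives))


# Toy decision scores for five sequences, two of them positive.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
toy_labels = [1, 0, 1, 0, 0]
toy_auc = au_roc(toy_scores, toy_labels)  # multiply by 100 for a percentage
```

The pairwise loop is quadratic in the number of examples, which is acceptable for a demonstration; sorting-based implementations are preferable for large test sets.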


Subject(s)
Computational Biology/methods , MicroRNAs/genetics , Plants/genetics , Algorithms , Area Under Curve , Gene Expression Regulation, Plant , RNA, Plant/genetics , Stress, Physiological , Support Vector Machine
11.
3 Biotech ; 11(6): 305, 2021 Jun.
Article in English | MEDLINE | ID: mdl-34194898

ABSTRACT

Protein-protein interactions of Interleukin-17 (IL17) play a vital role in autoimmune and inflammatory diseases, such as rheumatoid arthritis, multiple sclerosis, and psoriasis. Potent therapeutics for these diseases could be developed by blocking or modulating these interactions through biologics, peptide inhibitors and small molecule inhibitors. Unlike biologics, peptide inhibitors are cost effective and can be orally available. Unlike small molecules, peptide inhibitors also do not require a binding groove. Therefore, the crystal structure of IL17A in complex with a high affinity peptide inhibitor (HAP) (1-IHVTIPADLWDWIN-14) is investigated with the aim of finding hot spots that could improve its potency. An in silico mutagenesis strategy was implemented using FoldX PSSM to scan for positions tolerant to amino acid substitution. Three positions, T4, A7, and N14, showed improved stability when mutated to 'F/M/Y', 'P' and 'F/M/Y', respectively. A set of 31 mutant peptides was designed through combinations of these tolerant mutations using the Build Model application of FoldX. Binding affinity and interactions of the 31 peptides were assessed through protein-peptide docking and binding free energy calculations. Two peptides, namely P1 ("1-IHVTIPPDLWDWIY-14") and P2 ("1-IHVMIPPDLWDWIF-14"), showed better binding affinity to the IL17A dimerization site compared to HAP. Interactions of P1, P2 and HAP were also analyzed through 100 ns molecular dynamics simulations using GROMACS v5.0. The results revealed that the P2 peptide is likely to offer better potency compared to HAP and P1. Therefore, the P2 peptide can be synthesized to develop oral therapies for autoimmune and inflammatory diseases with further experimental evaluations. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13205-021-02856-y.
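The combinatorial design step can be illustrated by enumerating every combination of the tolerated substitutions at positions 4, 7 and 14 of the HAP sequence. Allowing the wild-type residue at each position gives 4 × 2 × 4 = 32 sequences, or 31 once the unmodified HAP is excluded. This matches the number of peptides reported, although the abstract does not confirm that the set was built exactly this way. The sketch only enumerates sequences; it does not call FoldX, and the function name is hypothetical.

```python
from itertools import product

HAP = "IHVTIPADLWDWIN"
# 1-based positions and tolerated substitutions, as listed in the abstract.
SUBSTITUTIONS = {4: "FMY", 7: "P", 14: "FMY"}


def enumerate_mutants(wild_type, substitutions):
    """All sequences combining the given substitutions, excluding wild type."""
    positions = sorted(substitutions)
    # At each position, allow the wild-type residue or any listed substitute.
    choices = [wild_type[p - 1] + substitutions[p] for p in positions]
    mutants = []
    for combo in product(*choices):
        residues = list(wild_type)
        for pos, aa in zip(positions, combo):
            residues[pos - 1] = aa
        peptide = "".join(residues)
        if peptide != wild_type:
            mutants.append(peptide)
    return mutants


mutants = enumerate_mutants(HAP, SUBSTITUTIONS)
```

Both peptides highlighted in the abstract, P1 (IHVTIPPDLWDWIY) and P2 (IHVMIPPDLWDWIF), appear in the enumerated set.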

12.
Front Genet ; 12: 782366, 2021.
Article in English | MEDLINE | ID: mdl-35222517

ABSTRACT

Globally, sodicity is one of the major abiotic stresses limiting wheat productivity in arid and semi-arid regions. An investigation of the complex gene network associated with sodicity stress tolerance is therefore required to identify transcriptional changes in plants during abiotic stress conditions. For this purpose, we sequenced the flag leaf transcriptome of a highly tolerant bread wheat germplasm (KRL 3-4) in order to better understand the molecular basis of sodicity tolerance. A total of 1,980 genes were differentially expressed in the flag leaf due to sodicity stress. Among these genes, 872 DEGs were upregulated and 1,108 were downregulated. Furthermore, annotation of DEGs revealed that a total of 1,384 genes were assigned to 2,267 GO terms corresponding to 502 (biological process), 638 (cellular component), and 1,127 (molecular function). GO annotation also revealed the involvement of genes related to several transcription factors; the important ones are expansins, peroxidase, glutathione-S-transferase, and metal ion transporters in response to sodicity. Additionally, from 127 KEGG pathways, only 40 were confidently enriched at a p-value <0.05, covering the five main KEGG categories, i.e., metabolism, environmental information processing, genetic information processing, organismal systems, and cellular processes. Most enriched pathways were prioritized using MapMan software and revealed that lipid metabolism, nutrient uptake, and protein homeostasis were paramount. We have also found 39 SNPs that mapped to the important sodicity stress-responsive genes associated with various pathways such as ROS scavenging, serine/threonine protein kinase, calcium signaling, and metal ion transporters. 
In a nutshell, only 19 important candidate genes contributing to sodicity tolerance in bread wheat were identified, and these genes might be helpful for better understanding and further improvement of sodicity tolerance in bread wheat.

13.
Sci Rep ; 10(1): 21593, 2020 12 09.
Article in English | MEDLINE | ID: mdl-33299096

ABSTRACT

Foot-and-mouth disease (FMD), a highly contagious viral infection of wild and domestic cloven-hoofed animals, endangers a large number of livestock populations across the globe. It adversely affects the socioeconomic status of millions of households. Vaccination has been used to protect animals against FMD virus (FMDV) to some extent, but the effectiveness of available vaccines has decreased due to high genetic variability in the FMDV genome. Another reason the current vaccines are not favored is that they do not provide the ability to differentiate between infected and vaccinated animals. RNA interference (RNAi), a potential strategy to control virus replication, has thus opened up a new avenue for controlling viral transmission. Hence, an attempt has been made here to establish the role of RNAi in therapeutic developments for FMD by computationally identifying (i) microRNA (miRNA) targets in FMDV using target prediction algorithms, (ii) targetable genomic regions in FMDV based on their dissimilarity with the host genome, and (iii) plausible anti-FMDV miRNA-like simulated nucleotide sequences (SNSs). The results revealed 12 mature host miRNAs that have 284 targets in 98 distinct FMDV genomic sequences. Wet-lab validation for anti-FMDV properties of 8 host miRNAs was carried out and all were observed to confer a variable magnitude of antiviral effect. In addition, 14 miRBase miRNAs were found with better target accessibility in FMDV than in Bos taurus. Further, 8 putative targetable regions having sense strand properties of siRNAs were identified on FMDV genes that are highly dissimilar to the host genome. A total of 16 SNSs having > 90% identity with mature miRNAs were also identified that have targets in FMDV genes. The information generated from this study is available at http://bioinformatics.iasri.res.in/fmdisc/ to cater to the needs of biologists, veterinarians and animal scientists working on FMD.


Subject(s)
Cattle Diseases/therapy , Foot-and-Mouth Disease/therapy , RNAi Therapeutics , Algorithms , Animals , Cattle , Cattle Diseases/genetics , Computational Biology , Foot-and-Mouth Disease/genetics , Foot-and-Mouth Disease Virus/genetics
14.
Gene ; 705: 113-126, 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-31009682

ABSTRACT

Identification of splice sites is imperative for prediction of gene structure. Machine learning-based approaches (MLAs) have been reported to be more successful than rule-based methods for identification of splice sites. However, the nucleotide strings must be transformed into numeric features through sequence encoding before they are used as input to MLAs. In this study, we evaluated the performances of 8 different sequence encoding schemes, i.e., Bayes kernel, density and sparse (DS), distribution of tri-nucleotide and 1st order Markov model (DM), frequency difference distance measure (FDDM), paired-nucleotide frequency difference between true and false sites (FDTF), 1st order Markov model (MM1), combination of both 1st and 2nd order Markov model (MM1 + MM2) and 2nd order Markov model (MM2), in predicting donor and acceptor splice sites using 5 supervised learning methods (ANN, Bagging, Boosting, RF and SVM). The encoding schemes and machine learning methods were first evaluated in 4 species, i.e., A. thaliana, C. elegans, D. melanogaster and H. sapiens, and then performances were validated with another four species, i.e., Ciona intestinalis, Dictyostelium discoideum, Phaeodactylum tricornutum and Trypanosoma brucei. In terms of ROC (receiver-operating-characteristics) and PR (precision-recall) curves, the FDTF encoding approach achieved the highest accuracy, followed by either MM2 or FDDM. Further, SVM was found to achieve the highest accuracy (in terms of ROC and PR curves), followed by RF, across encoding schemes and species. In terms of prediction accuracy across species, the SVM-FDTF combination outperformed other combinations of classifiers and encoding schemes. Further, splice site prediction accuracies were observed to be higher for species with low intron density. To the best of our knowledge, this is the first comprehensive evaluation of sequence encoding schemes for prediction of splice sites. 
We have also developed an R-package EncDNA (https://cran.r-project.org/web/packages/EncDNA/index.html) for encoding of splice site motifs with different encoding schemes, which is expected to supplement the existing nucleotide sequence encoding approaches. This study is believed to be useful for the computational biologists for predicting different functional elements on the genomic DNA.
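To give a sense of what a first-order Markov model encoding involves, the sketch below estimates position-specific transition probabilities from aligned training windows and encodes a query window as the probabilities of its consecutive nucleotide pairs. This is a simplified illustration in Python under stated assumptions (uppercase ACGT input, add-one smoothing, hypothetical function names); it is not the EncDNA implementation, whose exact definitions should be taken from the package documentation.

```python
from collections import defaultdict

BASES = "ACGT"


def fit_mm1(aligned_seqs, pseudocount=1.0):
    """Estimate P_i(next base | current base) for each position i."""
    length = len(aligned_seqs[0])
    counts = [defaultdict(float) for _ in range(length - 1)]
    for seq in aligned_seqs:
        if len(seq) != length:
            raise ValueError("training windows must have equal length")
        for i in range(length - 1):
            counts[i][(seq[i], seq[i + 1])] += 1.0
    model = []
    for pos_counts in counts:
        table = {}
        for a in BASES:
            # Smoothed row total over the four possible next bases.
            row_total = sum(pos_counts[(a, b)] for b in BASES) + pseudocount * len(BASES)
            for b in BASES:
                table[(a, b)] = (pos_counts[(a, b)] + pseudocount) / row_total
        model.append(table)
    return model


def encode_mm1(seq, model):
    """Encode a window as its sequence of transition probabilities."""
    return [model[i][(seq[i], seq[i + 1])] for i in range(len(seq) - 1)]


# Toy aligned windows standing in for true splice-site motifs.
training_windows = ["AGGT", "AGGT", "CGGT"]
mm1_model = fit_mm1(training_windows)
encoded = encode_mm1("AGGT", mm1_model)
```

A window of length L yields L - 1 numeric features, which can then be passed to a classifier such as an SVM or random forest.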


Subject(s)
Computational Biology/methods , RNA Splice Sites , RNA, Messenger/metabolism , Algorithms , Animals , Arabidopsis , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Machine Learning , RNA Splicing , ROC Curve , Trypanosoma brucei brucei/genetics
15.
Sci Rep ; 9(1): 778, 2019 01 28.
Article in English | MEDLINE | ID: mdl-30692561

ABSTRACT

Herbicide resistance (HR) is a major concern for agricultural producers as well as environmentalists. Resistance to commonly used herbicides is conferred by mutation(s) in the genes encoding herbicide target sites/proteins (GETS). Identification of these genes through wet-lab experiments is time consuming and expensive. Thus, a supervised learning-based computational model has been proposed in this study, which is the first of its kind for the prediction of seven classes of GETS. The cDNA sequences of the genes were initially transformed into numeric features based on the k-mer compositions and then supplied as input to the support vector machine. In the proposed SVM-based model, the prediction occurs in two stages: a binary classifier in the first stage discriminates the genes involved in conferring resistance to herbicides from other genes, and a multi-class classifier in the second stage assigns the genes predicted as herbicide resistant in the first stage to one of the seven resistance classes. Overall classification accuracies were observed to be ~89% and >97% for binary and multi-class classifications, respectively. The proposed model achieved higher accuracy than the homology-based algorithms, viz., BLAST and Hidden Markov Model. Besides, the developed computational model achieved ~87% accuracy when tested with an independent dataset. An online prediction server HRGPred ( http://cabgrid.res.in:8080/hrgpred ) has also been established to facilitate the prediction of GETS by the scientific community.


Subject(s)
Computational Biology/methods , Herbicide Resistance , Plant Proteins/genetics , Plants/genetics , Algorithms , Gene Expression Regulation, Plant , Models, Genetic , Sequence Analysis, DNA , Sequence Homology, Nucleic Acid , Support Vector Machine
16.
BMC Genet ; 20(1): 2, 2019 01 07.
Article in English | MEDLINE | ID: mdl-30616524

ABSTRACT

BACKGROUND: Identification of unknown fungal species aids the conservation of fungal diversity. As many fungal species cannot be cultured, morphological identification of those species is almost impossible. However, DNA barcoding can be employed to identify such species. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as the barcode. Although computational prediction of fungal species has become feasible with the availability of a large volume of barcode sequences in the public domain, it remains challenging because of the high degree of variability among ITS regions within species. RESULTS: A Random Forest (RF)-based predictor was built for identification of unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base pair compositions, and then used as training and test sets, respectively, for prediction of fungal species using RF. More than 85% accuracy was obtained when 4 sequences per species in the reference set were used, and accuracy stabilized at ~88% when ≥7 sequences per species in the reference set were used for training the model. The proposed model achieved comparable accuracy when evaluated against existing methods through cross-validation. The proposed model also outperformed several existing models used for identification of species other than fungi. CONCLUSIONS: An online prediction server "funbarRF" is established at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. In addition, an R package funbarRF ( https://cran.r-project.org/web/packages/funbarRF/ ) is available for prediction using high-throughput sequence data. This work is expected to support future efforts in fungal taxonomy assignment based on DNA barcodes.
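The abstract does not define "gapped base pair compositions", so the sketch below shows one plausible reading: for each gap size g, count the pairs of bases separated by g intervening positions and normalize the counts. The function name, the default `max_gap`, and the normalization are assumptions made for illustration and may differ from what funbarRF actually computes.

```python
from itertools import product


def gapped_pair_composition(seq, max_gap=2):
    """Return pair frequencies for gaps 0..max_gap (16 values per gap).

    For gap g, a pair is (seq[i], seq[i + g + 1]); gap 0 is the ordinary
    dinucleotide composition.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product("ACGT", repeat=2)]
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(pairs, 0)
        total = 0
        step = gap + 1
        for i in range(len(seq) - step):
            pair = seq[i] + seq[i + step]
            if pair in counts:  # ignore pairs containing ambiguous bases
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in pairs)
    return features
```

Under this reading, every barcode maps to a vector of fixed length 16 × (max_gap + 1) regardless of sequence length, which is what lets reference and query sequences share one feature space for a classifier such as RF.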


Subject(s)
Computational Biology/methods , DNA Barcoding, Taxonomic/methods , Fungi/classification , Fungi/genetics , Supervised Machine Learning , DNA, Fungal/genetics , Software
17.
J Biomol Struct Dyn ; 37(10): 2641-2651, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30051760

ABSTRACT

Investigating the behaviour of bio-molecules through computational mutagenesis is gaining interest as a way to facilitate the development of new therapeutic solutions for infectious diseases. The antigenically variant genotypes of foot and mouth disease virus (FMDV) and their subsequent infections are challenging to tackle with traditional vaccination. In such a scenario, neutralizing antibodies might provide an alternative solution for managing FMDV infection. Thus, we analysed the interaction of the mAb 4C4 with a synthetic G-H loop of FMDV-VP1 through in silico mutagenesis and molecular modelling. Initially, a set of 25,434 mutants was designed, and the mutants with better energetic stability than 4C4 were clustered based on sequence identity. The best mutant representing each cluster was selected and evaluated for its binding affinity with the antigen in terms of docking scores, interaction energy and binding energy. Six mutants showed better binding affinities towards the antigen than 4C4. Further, the interaction of these mutants with the natural G-H loop that is bound to mAb SD6 was also evaluated. One 4C4 variant with mutations at positions 2034(N→L), 2096(N→C), 2098(D→Y), 2532(T→K) and 2599(A→G) showed better binding affinities towards both the synthetic and natural G-H loops than 4C4 and SD6, respectively. A molecular dynamics simulation of 50 ns was conducted for the mutant and wild-type antibody structures, which supported the pre-simulation results. Therefore, these mutations on mAb 4C4 are believed to provide a better antibody-based therapeutic option for FMD. Communicated by Ramaswamy H. Sarma.


Subject(s)
Antibodies, Monoclonal/chemistry , Antibodies, Neutralizing/chemistry , Antiviral Agents/chemistry , Capsid Proteins/chemistry , Models, Molecular , Protein Conformation , Amino Acid Sequence , Antibodies, Monoclonal/genetics , Antibodies, Monoclonal/pharmacology , Antibodies, Neutralizing/genetics , Antibodies, Neutralizing/pharmacology , Antigen-Antibody Complex/chemistry , Antiviral Agents/pharmacology , Binding Sites , Capsid Proteins/antagonists & inhibitors , Drug Discovery , Molecular Docking Simulation , Molecular Dynamics Simulation , Mutagenesis, Site-Directed , Protein Binding , Structure-Activity Relationship
18.
BMC Bioinformatics ; 18(1): 190, 2017 Mar 24.
Article in English | MEDLINE | ID: mdl-28340571

ABSTRACT

BACKGROUND: Insecticide resistance is a major challenge for insect pest control programs in crop protection and in human and animal health. Resistance to different insecticides is conferred by proteins encoded by certain classes of insect genes. To date, no computational tool is available to distinguish insecticide resistant proteins from non-resistant proteins. Development of such a tool would help in predicting the insecticide resistant proteins, which can be targeted for developing appropriate insecticides. RESULTS: Five different feature sets, viz., amino acid composition (AAC), di-peptide composition (DPC), pseudo amino acid composition (PAAC), composition-transition-distribution (CTD) and auto-correlation function (ACF), were used to map the protein sequences into numeric feature vectors. The encoded numeric vectors were then used as input to a support vector machine (SVM) for classification of insecticide resistant and non-resistant proteins. Higher accuracies were obtained with the RBF kernel than with other kernels. Further, accuracies were higher for the DPC feature set than for the others. The proposed approach achieved an overall accuracy of >90% in discriminating resistant from non-resistant proteins. Further, the two classes of resistant proteins, i.e., detoxification-based and target-based, were discriminated from non-resistant proteins with >95% accuracy. In addition, >95% accuracy was observed in discriminating between proteins involved in detoxification-based and target-based resistance mechanisms. The proposed approach not only outperformed the Blastp, PSI-Blast and Delta-Blast algorithms, but also achieved >92% accuracy when assessed using an independent dataset of 75 insecticide resistant proteins. CONCLUSIONS: This paper presents the first computational approach for discriminating insecticide resistant proteins from non-resistant proteins. Based on the proposed approach, an online prediction server, DIRProt, has also been developed for computational prediction of insecticide resistant proteins, which is accessible at http://cabgrid.res.in:8080/dirprot/ . The proposed approach is believed to supplement the efforts needed to develop dynamic insecticides in wet-lab by targeting the insecticide resistant proteins.
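Two of the feature sets named in this abstract, AAC and DPC, have simple standard definitions and can be sketched as below. This is an illustrative sketch rather than the DIRProt code: the function names and the handling of non-standard residues (they are skipped) are assumptions, and the resulting vectors would still need to be passed to a trained classifier such as an SVM.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(protein):
    """AAC: fraction of each of the 20 standard residues (20 values)."""
    valid = [aa for aa in protein.upper() if aa in AMINO_ACIDS]
    n = len(valid)
    return [valid.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(protein):
    """DPC: fraction of each adjacent residue pair (400 values)."""
    protein = protein.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(protein) - 1):
        dipeptide = protein[i:i + 2]
        if dipeptide in counts:  # skip pairs with non-standard residues
            counts[dipeptide] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]
```

AAC discards residue order entirely, whereas DPC keeps local order between neighbouring residues, which is one possible reason a 400-dimensional DPC vector can separate classes that AAC cannot.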


Subject(s)
Cytochrome P-450 Enzyme System/metabolism , Insecticides/metabolism , Proteins/chemistry , gamma-Aminobutyric Acid/metabolism , Animals , Humans , Insecticides/analysis
19.
Sci Rep ; 7: 42362, 2017 02 13.
Article in English | MEDLINE | ID: mdl-28205576

ABSTRACT

Antimicrobial peptides (AMPs) are important components of the innate immune system that have been found to be effective against disease-causing pathogens. Identification of AMPs through wet-lab experiments is expensive. Therefore, development of an efficient computational tool is essential to identify the best candidate AMPs prior to in vitro experimentation. In this study, we developed a support vector machine (SVM)-based computational approach for prediction of AMPs with improved accuracy. Initially, compositional, physico-chemical and structural features of the peptides were generated and subsequently used as input to the SVM for prediction of AMPs. The proposed approach achieved higher accuracy than several existing approaches when compared on a benchmark dataset. Based on the proposed approach, an online prediction server, iAMPpred, has also been developed to help the scientific community in predicting AMPs, which is freely accessible at http://cabgrid.res.in:8080/amppred/ . The proposed approach is believed to supplement the tools and techniques that have been developed in the past for prediction of AMPs.


Subject(s)
Antimicrobial Cationic Peptides/chemistry , Chemical Phenomena , Support Vector Machine , Antifungal Agents/chemistry , Antiviral Agents/chemistry , Area Under Curve , Databases, Protein , ROC Curve
20.
Gene ; 592(2): 316-24, 2016 Nov 05.
Article in English | MEDLINE | ID: mdl-27393648

ABSTRACT

DNA barcoding is a molecular diagnostic method that allows automated and accurate identification of species based on a short and standardized fragment of DNA. To this end, this study develops a computational approach for identifying a species by comparing its barcode with the barcode sequences of known species present in the reference library. Each barcode sequence was first mapped onto a numeric feature vector based on k-mer frequencies, and the Random Forest methodology was then applied to the transformed dataset for species identification. The proposed approach outperformed similarity-based, tree-based and diagnostic-based approaches and was found comparable with existing supervised learning-based approaches in terms of species identification success rate when compared using real and simulated datasets. Based on the proposed approach, an online web interface, SPIDBAR, has also been developed and made freely available at http://cabgrid.res.in:8080/spidbar/ for species identification by taxonomists.


Subject(s)
DNA Barcoding, Taxonomic/methods , Software , Animals , Invertebrates/classification , Invertebrates/genetics , Plants/classification , Plants/genetics