1.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36416116

ABSTRACT

DNA-binding proteins (DBPs) play crucial roles in numerous cellular processes, including nucleotide recognition, transcriptional control, and the regulation of gene expression. Most existing computational techniques for identifying DBPs are applicable mainly to human and mouse datasets. Even though some models have been tested on Arabidopsis, they produce poor accuracy when applied to other plant species. Therefore, it is imperative to develop an effective computational model for predicting plant DBPs. In this study, we developed a comprehensive computational model for plant-specific DBP identification. Five shallow learning and six deep learning models were initially used for prediction, and the shallow learning methods outperformed the deep learning algorithms. In particular, the support vector machine (SVM) achieved the highest repeated 5-fold cross-validation performance: 94.0% area under the receiver operating characteristic curve (AUC-ROC) and 93.5% area under the precision-recall curve (AUC-PR). On an independent dataset, the developed approach achieved 93.8% AUC-ROC and 94.6% AUC-PR. When compared with existing state-of-the-art tools on an independent dataset, the proposed model achieved much higher accuracy. Overall, the results suggest that the developed computational model is more efficient and reliable than existing models for predicting DBPs in plants. For the convenience of experimental scientists, the prediction server PlDBPred is publicly accessible at https://iasri-sg.icar.gov.in/pldbpred/. The source code is also provided at https://iasri-sg.icar.gov.in/pldbpred/source_code.php for prediction on large datasets.
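AUC-ROC, the main metric quoted above, can be read as the probability that a randomly chosen positive example (here, a DBP) is scored above a randomly chosen negative one. The Python sketch below computes it directly from that definition; the scores and labels are made-up illustrative values, and this is not PlDBPred's code. Under repeated 5-fold cross-validation, a value like this would be computed on each held-out fold and then averaged.

```python
from itertools import product

def auc_roc(scores, labels):
    """AUC-ROC as the fraction of (positive, negative) pairs in which the
    positive example scores higher; tied pairs count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores for four proteins (1 = DBP, 0 = non-DBP)
print(auc_roc([0.9, 0.8, 0.35, 0.1], [1, 0, 1, 0]))  # 0.75
```

The pairwise loop is quadratic in dataset size; it is written this way only because it mirrors the definition. Rank-based formulas give the same value faster.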


Subjects
Arabidopsis , DNA-Binding Proteins , Algorithms , Arabidopsis/genetics , Arabidopsis/metabolism , Computational Biology/methods , Computer Simulation , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , ROC Curve , Software
2.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35998895

ABSTRACT

Linear B-cell epitopes play a prominent role in the development of peptide-based vaccines and in disease diagnosis. High variability in the length of these epitopes is a major reason for low accuracy in their prediction. Most B-cell epitope prediction methods consider fixed-length epitope sequences and achieve good accuracy. Although a number of tools are available for predicting flexible-length linear B-cell epitopes with reasonable accuracy, further improvement in prediction performance is still needed. Here, we analyzed the performance of machine learning approaches (MLAs) with 18 different amino acid encoding schemes in the prediction of flexible-length linear B-cell epitopes. We considered B-cell epitope sequences of variable length (11-56 amino acids) from well-established public resources and evaluated the performance of machine learning algorithms on the encoded epitope sequence datasets. Feasible combinations of encoding schemes were also explored and analyzed. The results revealed that amino-acid composition (AC) and the distribution component of the composition-transition-distribution encoding scheme are suitable for heterogeneous epitope data, whereas amino-acid-anchoring-pair-composition (APC), dipeptide composition, and amino-acids-pair-propensity-scale (APP) are more appropriate for homogeneous data. Further, two combinations of peptide encoding schemes, APC + AC and APC + APP, with a random forest classifier were found to perform better than state-of-the-art tools for flexible-length linear B-cell epitope prediction. The study also showed that random forest performed better than the other MLAs considered for predicting flexible-length linear B-cell epitopes.
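Amino acid composition (AC) and dipeptide composition, two of the encoding schemes named above, both turn a peptide of any length into a fixed-length vector, which is what makes them usable on epitopes of 11-56 residues. The Python sketch below uses the standard textbook definitions (20 residue frequencies and 400 overlapping pair frequencies); it is an illustration, not the authors' implementation, and the other schemes (APC, APP, CTD) are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical by one-letter code

def amino_acid_composition(seq):
    """Relative frequency of each standard amino acid (20 features)."""
    seq = seq.upper()
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Relative frequency of each ordered residue pair among the len(seq) - 1
    overlapping dipeptides (400 features). Needs at least two residues."""
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence too short for dipeptide composition")
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    return [pairs.count(a + b) / len(pairs) for a, b in product(AMINO_ACIDS, repeat=2)]

# Epitopes of different lengths map to vectors of the same size.
print(len(amino_acid_composition("GILGFVFTL")), len(dipeptide_composition("GILGFVFTLTV")))  # 20 400
```

Concatenating such vectors (for example AC with another scheme) is one simple way to form the combined encodings the abstract evaluates.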


Subjects
Epitopes, B-Lymphocyte , Vaccines , Amino Acids/genetics , Dipeptides , Peptides/chemistry
3.
Funct Integr Genomics ; 23(2): 113, 2023 Mar 31.
Article in English | MEDLINE | ID: mdl-37000299

ABSTRACT

Abiotic stresses are detrimental to plant growth and development and have a major negative impact on crop yields. A growing body of evidence indicates that a large number of long non-coding RNAs (lncRNAs) are key to many abiotic stress responses. Thus, identifying abiotic stress-responsive lncRNAs is essential in crop breeding programs aimed at developing cultivars resistant to abiotic stresses. In this study, we developed the first machine learning-based computational model for predicting abiotic stress-responsive lncRNAs. The lncRNA sequences that were responsive and non-responsive to abiotic stresses served as the two classes for binary classification with machine learning algorithms. The training dataset comprised 263 stress-responsive and 263 non-stress-responsive sequences, whereas the independent test set consisted of 101 sequences from both classes. Because machine learning models accept only numeric input, k-mer features of sizes 1 to 6 were used to represent lncRNAs numerically. Four different feature selection strategies were used to select important features. Among the seven learning algorithms, the support vector machine (SVM) achieved the highest cross-validation accuracy with the selected feature sets, with a 5-fold cross-validation accuracy, AU-ROC, and AU-PRC of 68.84%, 72.78%, and 75.86%, respectively. The robustness of the developed model (SVM with the selected features) was further evaluated on the independent test dataset, where the overall accuracy, AU-ROC, and AU-PRC were 76.23%, 87.71%, and 88.49%, respectively. The developed computational approach was also implemented in an online prediction tool, ASLncR, accessible at https://iasri-sg.icar.gov.in/aslncr/ . The proposed computational model and prediction tool are expected to supplement existing efforts to identify abiotic stress-responsive lncRNAs in plants.
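The k-mer representation described above can be sketched as follows, assuming each k-mer count is normalized by the number of valid windows for that k (the abstract does not state the normalization). With k = 1 to 6 over the four nucleotides, the vector has 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 entries, the same size reported for mLoc-mRNA further down this list. The function name `kmer_features` is hypothetical.

```python
from itertools import product

def kmer_features(seq, k_values=range(1, 7), alphabet="ACGT"):
    """Concatenated k-mer frequencies for every k in k_values.

    Each k contributes one slot per possible k-mer (4**k slots), so k = 1..6
    gives 4 + 16 + 64 + 256 + 1024 + 4096 = 5460 features in total.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA input like DNA
    features = []
    for k in k_values:
        index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
        counts = [0] * len(index)
        windows = 0
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in index:  # skip windows containing N or other symbols
                counts[index[kmer]] += 1
                windows += 1
        features.extend(c / windows if windows else 0.0 for c in counts)
    return features

print(len(kmer_features("ACGUACGU")))  # 5460
```

Every lncRNA, whatever its length, maps to the same 5460-dimensional space, after which a feature selection step can keep only the most informative k-mers.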


Subjects
RNA, Long Noncoding , RNA, Long Noncoding/genetics , Computational Biology , Plant Breeding , Algorithms , Plants/genetics , Stress, Physiological/genetics
4.
Funct Integr Genomics ; 23(2): 92, 2023 Mar 20.
Article in English | MEDLINE | ID: mdl-36939943

ABSTRACT

Abiotic stresses have become a major challenge in recent years due to their pervasive nature and severe impacts on plant growth, development, and quality. MicroRNAs (miRNAs) play a significant role in plant responses to different abiotic stresses. Thus, identifying specific abiotic stress-responsive miRNAs is highly important in crop breeding programmes to develop cultivars resistant to abiotic stresses. In this study, we developed a machine learning-based computational model for predicting miRNAs associated with four specific abiotic stresses: cold, drought, heat, and salt. Pseudo K-tuple nucleotide composition features with k-mer sizes 1 to 5 were used to represent miRNAs in numeric form, and a feature selection strategy was employed to select important features. With the selected feature sets, the support vector machine (SVM) achieved the highest cross-validation accuracy under all four abiotic stress conditions. The highest cross-validated areas under the precision-recall curve were 90.15%, 90.09%, 87.71%, and 89.25% for cold, drought, heat, and salt, respectively. On the independent dataset, the overall prediction accuracies for the same four stresses were 84.57%, 80.62%, 80.38%, and 82.78%, respectively. The SVM also outperformed several deep learning models in predicting abiotic stress-responsive miRNAs. To make the method easy to use, an online prediction server, "ASmiR", has been established at https://iasri-sg.icar.gov.in/asmir/ . The proposed computational model and prediction tool are expected to supplement existing efforts to identify specific abiotic stress-responsive miRNAs in plants.
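The area under the precision-recall curve, the measure reported for each stress, is often estimated as average precision: the mean precision at the rank of each positive example. A minimal Python sketch, assuming one binary classifier per stress (for example cold-responsive vs. not) and hypothetical scores; ties in score are simply broken by sort order here, which a production implementation would handle more carefully.

```python
def average_precision(scores, labels):
    """Mean of the precision values at the rank of each positive example when
    examples are sorted by decreasing score (a step-wise AUC-PR estimate)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` examples
    return precision_sum / n_pos

# Hypothetical scores from a "cold-responsive vs. not" classifier
print(average_precision([0.9, 0.8, 0.35, 0.1], [1, 0, 1, 0]))  # about 0.833
```

Unlike AUC-ROC, this value depends on the proportion of positives, so it is most informative when classes are imbalanced.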


Subjects
MicroRNAs , MicroRNAs/genetics , Plant Breeding , Plants/genetics , Machine Learning , Sodium Chloride , Stress, Physiological/genetics , Gene Expression Regulation, Plant
5.
Heredity (Edinb) ; 128(6): 519-530, 2022 06.
Article in English | MEDLINE | ID: mdl-35508540

ABSTRACT

We evaluated the performance of three BLUP and five Bayesian methods for genomic prediction using nine actual and 54 simulated datasets. Genomic prediction accuracy was measured as Pearson's correlation coefficient between the genomic estimated breeding values (GEBVs) and the observed phenotypes, using fivefold cross-validation with 100 replications. The Bayesian alphabets performed better for traits governed by a few genes/QTLs with relatively large effects. In contrast, the BLUP alphabets (GBLUP and CBLUP) exhibited higher genomic prediction accuracy for traits controlled by several small-effect QTLs. Additionally, Bayesian methods performed better for highly heritable traits and performed on par with the BLUP methods for other traits. Further, genomic BLUP (GBLUP) was identified as the least biased method for GEBV estimation. Among the Bayesian methods, Bayesian ridge regression and Bayesian LASSO were less biased than the other Bayesian alphabets. Across scenarios, genomic prediction accuracy increased with trait heritability, irrespective of sample size, marker density, and QTL type (major/minor effect). In summary, this study provides valuable information for choosing a selection method for genomic prediction in different breeding programs.
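The accuracy measure described above (Pearson's r between GEBVs and observed phenotypes under fivefold cross-validation with 100 replications) can be sketched in Python as below. The `predict` callback is a hypothetical stand-in for any of the BLUP or Bayesian models compared in the study; fitting those models is not shown. Folds are reshuffled with a different seed in each replication, and the per-fold r values are averaged, which is an assumption about how results were summarized.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either input is constant

def kfold_indices(n, k=5, seed=0):
    """Shuffle range(n) and deal it into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_cv_accuracy(predict, phenotypes, k=5, replications=100):
    """Mean Pearson r between predicted GEBVs and observed phenotypes over
    k folds x replications. `predict(train_idx, test_idx)` is a hypothetical
    callback that fits a model on train_idx and returns GEBVs for test_idx."""
    n = len(phenotypes)
    rs = []
    for rep in range(replications):
        for test_idx in kfold_indices(n, k, seed=rep):
            held_out = set(test_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            gebv = predict(train_idx, test_idx)
            observed = [phenotypes[i] for i in test_idx]
            rs.append(pearson_r(gebv, observed))
    return sum(rs) / len(rs)
```

Because r ignores scale and offset, it rewards correct ranking of individuals, which is what selection decisions depend on; bias in GEBVs has to be assessed separately, as the abstract does.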


Subjects
Genomics , Models, Genetic , Bayes Theorem , Genomics/methods , Genotype , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci
6.
Int J Mol Sci ; 23(3)2022 Jan 30.
Article in English | MEDLINE | ID: mdl-35163534

ABSTRACT

MicroRNAs (miRNAs) play a significant role in plant responses to different abiotic stresses. Thus, identifying abiotic stress-responsive miRNAs is highly important in crop breeding programmes to develop cultivars resistant to abiotic stresses. In this study, we developed a machine learning-based computational method for predicting miRNAs associated with abiotic stresses. Three types of datasets were used for prediction, namely miRNA, Pre-miRNA, and Pre-miRNA + miRNA. Pseudo K-tuple nucleotide composition features were generated for each sequence to transform the sequence data into numeric feature vectors, and a support vector machine (SVM) was employed for prediction. Areas under the receiver operating characteristic curve (auROC) of 70.21%, 69.71%, and 77.94% and areas under the precision-recall curve (auPRC) of 69.96%, 65.64%, and 77.32% were obtained for the miRNA, Pre-miRNA, and Pre-miRNA + miRNA datasets, respectively. Overall prediction accuracies on the independent test set were 62.33%, 64.85%, and 69.21%, respectively, for the three datasets. The SVM also achieved higher accuracy than other learning methods such as random forest, extreme gradient boosting, and adaptive boosting. To make the method easy to use, an online prediction server, "ASRmiRNA", has been developed. The proposed approach is expected to supplement existing efforts to identify abiotic stress-responsive miRNAs and Pre-miRNAs.


Subjects
Computational Biology/methods , MicroRNAs/genetics , Plants/genetics , Algorithms , Area Under Curve , Gene Expression Regulation, Plant , RNA, Plant/genetics , Stress, Physiological , Support Vector Machine
7.
Physiol Mol Biol Plants ; 28(1): 1-16, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35221569

ABSTRACT

In plants, the GIGANTEA (GI) protein performs diverse biological functions, including carbon and sucrose metabolism, cell wall deposition, transpiration, and hypocotyl elongation. This suggests that GI is an important class of proteins. So far, resource-intensive experimental methods have mostly been used to identify GI proteins. Thus, in this study we developed a computational model for fast and accurate prediction of GI proteins. Ten supervised learning algorithms (SVM, RF, JRIP, J48, LMT, IBK, NB, PART, BAGG, and LGB) were employed for prediction, with amino acid composition (AAC), FASGAI features, and physico-chemical (PHYC) properties as numerical inputs. The highest accuracies, 96.75% AUC-ROC and 86.7% AUC-PR, were observed for SVM coupled with the AAC + PHYC feature combination under five-fold cross-validation. With leave-one-out cross-validation, 97.29% AUC-ROC and 87.89% AUC-PR were achieved. When the model was evaluated on an independent dataset of 18 GI sequences, 17 were correctly predicted. We also performed proteome-wide identification of GI proteins in wheat, followed by functional annotation using Gene Ontology terms. A prediction server, "GIpred", is freely accessible at http://cabgrid.res.in:8080/gipred/ for proteome-wide recognition of GI proteins. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s12298-022-01130-6.
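Leave-one-out cross-validation, used above as a second evaluation scheme, trains on all sequences but one and tests on the held-out sequence, repeating this for every sequence. The Python sketch below pairs it with a 1-nearest-neighbour classifier, the idea behind IBK (instance-based learning) in the list above. The feature vectors are assumed to be already computed (for example AAC values concatenated with physico-chemical properties), and plain accuracy is reported rather than AUC, for brevity.

```python
import math

def nearest_neighbour_label(train_X, train_y, x):
    """Label of the closest training vector (Euclidean distance), i.e. the
    k = 1 case of instance-based learning such as IBk."""
    best = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[best]

def leave_one_out_accuracy(X, y):
    """Hold out each sample once, predict it from all the others, and return
    the fraction predicted correctly."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if nearest_neighbour_label(rest_X, rest_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

# Toy 2-D feature vectors: two well-separated classes (1 = GI, 0 = non-GI)
print(leave_one_out_accuracy([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]], [0, 0, 1, 1]))
```

LOOCV uses the data maximally, which suits small datasets, but it needs one model fit per sample, so it becomes expensive for slow learners.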

8.
Physiol Mol Biol Plants ; 28(3): 651-668, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35465203

ABSTRACT

In the present study, a genome-wide association study (GWAS) was conducted in wheat to identify marker-trait associations (MTAs) for the following six grain morphology traits: (1) grain cross-sectional area (GCSA), (2) grain perimeter (GP), (3) grain length (GL), (4) grain width (GWid), (5) grain length-width ratio (GLWR), and (6) grain form-density (GFD). The data were recorded on a subset of the spring wheat reference set (SWRS) comprising 225 diverse genotypes, which were genotyped using 10,904 SNPs and phenotyped for two consecutive years (2017-2018, 2018-2019). GWAS was conducted using five different models: two single-locus models (CMLM, SUPER), one multi-locus model (FarmCPU), one multi-trait model (mvLMM), and a model for Q x Q epistatic interactions. False discovery rate (FDR) control [-log10(p) ≥ 5] and Bonferroni correction [-log10(p) ≥ 6] (corrected p value < 0.05) were applied to eliminate false positives arising from multiple testing. This yielded 88 main-effect and 29 epistatic MTAs after FDR correction, and 13 main-effect and 6 epistatic MTAs after Bonferroni correction. The MTAs retained after Bonferroni correction were further used to identify 55 candidate genes (CGs). In silico expression analysis of the CGs was also carried out in different tissues and seed parts at different developmental stages. The MTAs and CGs identified in the present study are a useful addition to the resources available for marker-assisted selection (MAS) to supplement wheat breeding programmes after due validation, and for future strategic basic research. Supplementary Information: The online version contains supplementary material available at 10.1007/s12298-022-01164-w.
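The two multiple-testing filters mentioned above can be illustrated in Python. The Bonferroni cut-off on the -log10(p) scale follows from alpha / m for m tests; for 10,904 SNPs at alpha = 0.05 it is about 5.34, so the threshold of 6 used in the study is more stringent. For FDR, the sketch assumes the Benjamini-Hochberg step-up procedure, one common choice; the abstract does not name the FDR procedure that was used.

```python
import math

def bonferroni_threshold(n_tests, alpha=0.05):
    """-log10(p) cut-off that keeps the family-wise error rate at alpha over n_tests."""
    return -math.log10(alpha / n_tests)

def benjamini_hochberg(p_values, q=0.05):
    """Indices declared significant by the Benjamini-Hochberg step-up procedure:
    find the largest rank r with p_(r) <= r / m * q and keep the r smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

print(round(bonferroni_threshold(10904), 2))  # 5.34 for the 10,904 SNPs above
print(benjamini_hochberg([0.01, 0.2, 0.03, 0.005]))  # [0, 2, 3]
```

Bonferroni controls the chance of any false positive, whereas FDR control tolerates a small expected fraction of false positives, which is why the FDR filter retains more MTAs (88 vs. 13 main-effect MTAs above).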

9.
BMC Bioinformatics ; 22(1): 342, 2021 Jun 24.
Article in English | MEDLINE | ID: mdl-34167457

ABSTRACT

BACKGROUND: Localization of messenger RNAs (mRNAs) plays a crucial role in the growth and development of cells. In particular, it plays a major role in regulating spatio-temporal gene expression. In situ hybridization is a promising experimental technique for determining the localization of mRNAs, but it is costly and laborious. Moreover, a single mRNA can be present in more than one location, whereas existing computational tools can predict only a single location for such mRNAs. Thus, a computational tool is needed for reliable and timely prediction of the multiple subcellular locations of mRNAs, and we developed the present computational model for this purpose. RESULTS: mRNA sequences from 9 different localizations were considered. Each sequence was first transformed into a numeric feature vector of size 5460, based on k-mer features of sizes 1-6. Of the 5460 k-mer features, 1812 important features were selected using the Elastic Net statistical model. The Random Forest supervised learning algorithm was then employed to predict the localizations with the selected features. Five-fold cross-validation accuracies of 70.87, 68.32, 68.36, 68.79, 96.46, 73.44, 70.94, 97.42, and 71.77% were obtained for the cytoplasm, cytosol, endoplasmic reticulum, exosome, mitochondrion, nucleus, pseudopodium, posterior, and ribosome, respectively. With an independent test set, accuracies of 65.33, 73.37, 75.86, 72.99, 94.26, 70.91, 65.53, 93.60, and 73.45% were obtained for the respective localizations. The developed approach also achieved higher accuracies than existing localization prediction tools. CONCLUSIONS: This study presents a novel computational tool for predicting multiple localizations of mRNAs. Based on the proposed approach, an online prediction server, "mLoc-mRNA", is accessible at http://cabgrid.res.in:8080/mlocmrna/ . The developed approach is expected to supplement existing tools and techniques for the localization prediction of mRNAs.
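One common way to let a single mRNA receive several locations, consistent with the per-location accuracies reported above, is to train one binary classifier per location and report every location whose score passes a threshold. The Python sketch below shows only the label bookkeeping for that scheme; the 0.5 threshold and the function names are assumptions, and the Elastic Net feature selection and Random Forest models are not shown.

```python
LOCATIONS = ["cytoplasm", "cytosol", "endoplasmic reticulum", "exosome", "mitochondrion",
             "nucleus", "pseudopodium", "posterior", "ribosome"]

def binary_targets(location_sets, locations=LOCATIONS):
    """One 0/1 label vector per location: 1 if the mRNA is annotated there.
    Each vector can train an independent binary classifier."""
    return {loc: [1 if loc in known else 0 for known in location_sets] for loc in locations}

def predicted_locations(scores_by_location, threshold=0.5):
    """All locations whose classifier score reaches the threshold, so a single
    mRNA can be assigned to several compartments at once."""
    return [loc for loc, score in scores_by_location.items() if score >= threshold]

# Three hypothetical mRNAs; the second is annotated in two compartments.
targets = binary_targets([{"nucleus"}, {"cytosol", "nucleus"}, {"ribosome"}])
print(targets["nucleus"])  # [1, 1, 0]
print(predicted_locations({"nucleus": 0.8, "cytosol": 0.6, "exosome": 0.2}))  # ['nucleus', 'cytosol']
```

Treating each location independently ignores correlations between compartments, but it keeps every model simple and lets accuracy be reported per location.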


Subjects
Algorithms , Computational Biology , Cell Nucleus , RNA, Messenger/genetics , Ribosomes
10.
Mol Breed ; 41(7): 46, 2021 Jul.
Article in English | MEDLINE | ID: mdl-37309385

ABSTRACT

A genome-wide association study (GWAS) for 10 yield and yield component traits was conducted using an association panel comprising 225 diverse spring wheat genotypes. The panel was genotyped using 10,904 SNPs and evaluated for three years (2016-2019), which constituted three environments (E1, E2, and E3). Heritability of the different traits ranged from 29.21 to 97.69%. Marker-trait associations (MTAs) were identified for each trait using data from each environment separately and also using BLUP values. Four models were used: three single-trait models (CMLM, FarmCPU, SUPER) and one multi-trait model (mvLMM). Hundreds of MTAs were obtained with each model, but after Bonferroni correction only 6 MTAs for 3 traits were retained using CMLM and 21 MTAs for 4 traits using FarmCPU; none of the 525 MTAs obtained using SUPER qualified after Bonferroni correction. Using BLUP values, 20 MTAs were retained, five of which also figured among the MTAs identified for individual environments. Using the mvLMM model, 38 multi-trait MTAs for 15 different trait combinations were retained after Bonferroni correction. Epistatic interactions involving 28 pairs of MTAs were also detected for seven of the 10 traits; no epistatic interactions were detected for GNPS, PH, and BYPP. As many as 164 putative candidate genes (CGs) were identified using all 50 MTAs (CMLM, 3; FarmCPU, 9; mvLMM, 6; epistasis, 21; BLUP, 11), with the number of CGs per source ranging from 20 (CMLM) to 66 (epistasis). In silico expression analysis of the CGs was also conducted in different tissues at different developmental stages. The information generated in the present study proved useful for developing a better understanding of the genetics of each of the 10 traits; the study also provided novel markers for marker-assisted selection (MAS) in the development of wheat cultivars with improved agronomic traits. Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-021-01240-1.

11.
Plant Dis ; 104(1): 71-81, 2020 Jan.
Article in English | MEDLINE | ID: mdl-31697221

ABSTRACT

The ToxA-Tsn1 system is an example of an inverse gene-for-gene relationship. The gene ToxA encodes a host-selective toxin (HST) that functions as a necrotrophic effector and is often responsible for the virulence of the pathogen. The genomes of several fungal pathogens (e.g., Pyrenophora tritici-repentis, Parastagonospora nodorum, and Bipolaris sorokiniana) have been shown to carry the ToxA gene. Tsn1 is a sensitivity gene in the host whose presence generally helps a ToxA-positive pathogen cause spot blotch in wheat. Cultivars lacking Tsn1 are generally resistant to spot blotch; this resistance is attributed to a number of other known genes that impart resistance in the absence of Tsn1. In the present study, 110 B. sorokiniana isolates collected from the ME5A and ME4C megaenvironments of India were screened for the presence of the ToxA gene; 77 (70%) were ToxA positive. Similarly, 220 Indian wheat cultivars were screened for the presence of the Tsn1 gene; 81 (36.8%) were Tsn1 positive. When 20 wheat cultivars (11 with Tsn1 and 9 with tsn1) were inoculated with ToxA-positive isolates, only seedlings carrying the Tsn1 allele (not tsn1) developed necrotic spots surrounded by a chlorotic halo. No such distinction between Tsn1 and tsn1 carriers was observed when adult plants were inoculated. This study suggests that the absence of Tsn1 facilitates resistance against spot blotch of wheat. Therefore, selecting wheat genotypes for the absence of the Tsn1 allele can improve resistance to spot blotch.


Subjects
Ascomycota , Host-Pathogen Interactions , Triticum , Virulence , Ascomycota/genetics , Ascomycota/pathogenicity , Disease Resistance/genetics , Genes, Fungal/genetics , Genes, Plant/genetics , Host-Pathogen Interactions/genetics , India , Triticum/genetics , Triticum/microbiology , Virulence/genetics
12.
BMC Genet ; 20(1): 2, 2019 01 07.
Article in English | MEDLINE | ID: mdl-30616524

ABSTRACT

BACKGROUND: Identification of unknown fungal species aids the conservation of fungal diversity. Because many fungal species cannot be cultured, morphological identification of those species is almost impossible. However, DNA barcoding can be employed to identify such species. For fungal taxonomy prediction, the ITS (internal transcribed spacer) region of rDNA (ribosomal DNA) is used as the barcode. Although computational prediction of fungal species has become feasible with the huge volume of barcode sequences available in the public domain, it remains challenging due to the high degree of variability among ITS regions within species. RESULTS: A Random Forest (RF)-based predictor was built for identification of unknown fungal species. The reference and query sequences were mapped onto numeric features based on gapped base pair compositions, and then used as training and test sets, respectively, for prediction of fungal species using RF. More than 85% accuracy was obtained when 4 sequences per species were used in the reference set, and accuracy stabilized at ~88% when ≥7 sequences per species were used to train the model. The proposed model achieved accuracy comparable to existing methods when evaluated through a cross-validation procedure. The proposed model also outperformed several existing models used for identification of species other than fungi. CONCLUSIONS: An online prediction server, "funbarRF", is established at http://cabgrid.res.in:8080/funbarrf/ for fungal species identification. In addition, an R package, funbarRF ( https://cran.r-project.org/web/packages/funbarRF/ ), is available for prediction using high-throughput sequence data. This work is expected to supplement future endeavors in fungal taxonomy assignment based on DNA barcodes.
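Gapped base-pair composition, the encoding named above, is assumed here to mean the relative frequency of each of the 16 nucleotide pairs whose bases are separated by g intervening positions, computed for a range of gaps. The Python sketch below follows that assumed definition; the `max_gap` default is arbitrary, and the code does not reproduce the funbarRF package. Reference and query ITS sequences would both pass through the same function before RF training and prediction.

```python
from itertools import product

PAIRS = ["".join(p) for p in product("ACGT", repeat=2)]  # AA, AC, ..., TT

def gapped_pair_composition(seq, max_gap=2):
    """Relative frequency of every base pair (seq[i], seq[i + g + 1]) for each
    gap g = 0..max_gap, where g bases lie between the two members of the pair.
    Returns 16 * (max_gap + 1) features; pairs containing N etc. are skipped."""
    seq = seq.upper()
    features = []
    for g in range(max_gap + 1):
        pairs = [seq[i] + seq[i + g + 1] for i in range(len(seq) - g - 1)]
        valid = [p for p in pairs if p in PAIRS]
        features.extend(valid.count(p) / len(valid) if valid else 0.0 for p in PAIRS)
    return features

print(len(gapped_pair_composition("ACGTTGCA", max_gap=2)))  # 48
```

Because the vector length depends only on `max_gap`, ITS sequences of different lengths become directly comparable inputs for the classifier.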


Subjects
Computational Biology/methods , DNA Barcoding, Taxonomic/methods , Fungi/classification , Fungi/genetics , Supervised Machine Learning , DNA, Fungal/genetics , Software
13.
BMC Bioinformatics ; 18(1): 190, 2017 Mar 24.
Article in English | MEDLINE | ID: mdl-28340571

ABSTRACT

BACKGROUND: Insecticide resistance is a major challenge for insect pest control programmes in crop protection and in human and animal health. Resistance to different insecticides is conferred by proteins encoded by certain classes of insect genes. To date, no computational tool is available to distinguish insecticide-resistant proteins from non-resistant proteins. Such a tool would help predict insecticide-resistant proteins, which can be targeted for developing appropriate insecticides. RESULTS: Five feature sets, namely amino acid composition (AAC), dipeptide composition (DPC), pseudo amino acid composition (PAAC), composition-transition-distribution (CTD), and auto-correlation function (ACF), were used to map the protein sequences into numeric feature vectors. The encoded numeric vectors were then used as input to a support vector machine (SVM) to classify insecticide-resistant and non-resistant proteins. Accuracies were higher with the RBF kernel than with other kernels, and higher for the DPC feature set than for the other feature sets. The proposed approach achieved an overall accuracy of >90% in discriminating resistant from non-resistant proteins. Further, the two classes of resistant proteins, detoxification-based and target-based, were each discriminated from non-resistant proteins with >95% accuracy. In addition, >95% accuracy was observed in discriminating proteins involved in detoxification-based and target-based resistance mechanisms. The proposed approach not only outperformed the Blastp, PSI-Blast, and Delta-Blast algorithms but also achieved >92% accuracy when assessed on an independent dataset of 75 insecticide-resistant proteins. CONCLUSIONS: This paper presents the first computational approach for discriminating insecticide-resistant proteins from non-resistant proteins. Based on the proposed approach, an online prediction server, DIRProt, has also been developed for computational prediction of insecticide-resistant proteins and is accessible at http://cabgrid.res.in:8080/dirprot/ . The proposed approach is expected to supplement wet-lab efforts to develop dynamic insecticides by targeting insecticide-resistant proteins.
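The RBF (radial basis function) kernel that gave the best SVM accuracies above measures the similarity between two feature vectors, such as 400-dimensional DPC vectors, as exp(-gamma * ||x - y||^2). A minimal Python sketch follows; the `gamma` value is an arbitrary placeholder, since the abstract does not report a tuned value, and training the SVM itself is not shown.

```python
import math

def rbf_kernel(x, y, gamma=0.1):
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2); equals 1.0 for identical vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def gram_matrix(vectors, gamma=0.1):
    """Symmetric matrix of pairwise kernel values, the input a kernel SVM works with."""
    return [[rbf_kernel(u, v, gamma) for v in vectors] for u in vectors]

# Similarity decays with distance: nearby feature vectors score close to 1.
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))
```

Larger `gamma` values make similarity fall off faster, which lets the SVM fit more local decision boundaries but raises the risk of overfitting; in practice it is chosen by cross-validation.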


Subjects
Cytochrome P-450 Enzyme System/metabolism , Insecticides/metabolism , Proteins/chemistry , gamma-Aminobutyric Acid/metabolism , Animals , Humans , Insecticides/analysis
14.
J Theor Biol ; 404: 285-294, 2016 09 07.
Article in English | MEDLINE | ID: mdl-27302911

ABSTRACT

Identification of splice sites is important because of their key role in predicting the exon-intron structure of protein-coding genes. Although several approaches have been developed for splice site prediction, further improvement in prediction accuracy will help predict gene structure more accurately. This paper presents a computational approach for predicting donor splice sites with higher accuracy. In this approach, true and false splice sites were first encoded into numeric vectors and then used as input to an artificial neural network (ANN), a support vector machine (SVM), and a random forest (RF) for prediction. ANN and SVM performed equally well, and better than RF, when tested on the HS3D and NN269 datasets. On an independent test set of 50 genes, the prediction accuracy of ANN was higher than that of SVM and RF. On the same independent test set, all three predictors achieved higher accuracy than existing methods such as NNsplice, MEM, MDD, WMM, MM1, FSPLICE, GeneID, and ASSP. We have also developed an online prediction server (PreDOSS), available at http://cabgrid.res.in:8080/predoss, for predicting donor splice sites using the proposed approach.
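The abstract does not say how splice-site windows were converted to numeric vectors. One-hot encoding, shown in the Python sketch below, is a common choice and is used here purely as an assumption: each nucleotide becomes four 0/1 indicators. The helper `looks_like_candidate_donor` is hypothetical and reflects only the general fact that canonical donor sites begin the intron with GT; the window length and GT position are illustrative.

```python
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}

def one_hot(window):
    """Four 0/1 indicators (A, C, G, T) per position, concatenated;
    ambiguous bases such as N become four zeros."""
    vector = []
    for base in window.upper():
        bits = [0, 0, 0, 0]
        if base in BASE_INDEX:
            bits[BASE_INDEX[base]] = 1
        vector.extend(bits)
    return vector

def looks_like_candidate_donor(window, gt_position):
    """Hypothetical filter: keep windows with the canonical GT dinucleotide at
    the expected intron start; both true and false sites pass this test."""
    return window[gt_position:gt_position + 2].upper() == "GT"

# A 9-nt window with GT at (0-based) positions 3-4 becomes a 36-dimensional vector.
window = "CAGGTAAGT"
print(looks_like_candidate_donor(window, 3), len(one_hot(window)))  # True 36
```

The resulting fixed-length vectors are the kind of input that ANN, SVM, and RF classifiers all accept, which is why a single encoding can feed the three predictors compared above.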


Subjects
Computational Biology/methods , RNA Splice Sites/genetics , Area Under Curve , Base Sequence , Chi-Square Distribution , Databases, Nucleic Acid , Humans , Internet , Neural Networks, Computer , Nucleotide Motifs/genetics , ROC Curve , Sequence Homology, Nucleic Acid , Support Vector Machine
15.
Indian J Biochem Biophys ; 52(1): 34-44, 2015 Feb.
Article in English | MEDLINE | ID: mdl-26040110

ABSTRACT

Viral diseases such as foot-and-mouth disease (FMD), calf scour (CS), bovine viral diarrhea (BVD), and infectious bovine rhinotracheitis (IBR) affect the growth and milk production of cattle (Bos taurus), causing severe economic loss. Epitope-based vaccine design has evolved to provide a new strategy for the therapeutic application of pathogen-specific immunity in animals. Therefore, identification of major histocompatibility complex (MHC)-binding peptides as potential T-cell epitopes is widely applied in peptide vaccine design and immunotherapy. In this study, the MetaMHCI tool was used with seven different algorithms to predict potential T-cell epitopes for FMD, BVD, IBR, and CS in cattle. A total of 54 protein sequences were selected, using bioinformatics approaches, from a total set of 6351 sequences of the pathogens causing these diseases. The selected protein sequences were used as the key inputs to the MetaMHCI tool to predict epitopes for the BoLA-A11 MHC class I allele of B. taurus. Further, the epitopes were ranked using a proposed principal component analysis-based epitope score (PbES). The best epitope for each disease, chosen on the basis of being predicted by the maximum number of predictors and having a low PbES, was modeled with the PEP-FOLD server and docked with the BoLA-A11 protein to understand the MHC-epitope interaction. Finally, a total of 78 epitopes were predicted: 27 for FMD, 25 for BVD, 12 for CS, and 14 for IBR. These epitopes could be artificially synthesized and recommended for vaccinating cattle against the considered diseases. In addition, the methodology adopted here could be used to predict and analyze epitopes for other microbial diseases of important animal species.


Subjects
Cattle Diseases/immunology , Epitopes/immunology , Histocompatibility Antigens Class I/immunology , Virus Diseases/veterinary , Animals , Cattle , Virus Diseases/immunology
16.
BMC Bioinformatics ; 15: 362, 2014 Nov 25.
Article in English | MEDLINE | ID: mdl-25420551

ABSTRACT

BACKGROUND: Most approaches to splice site prediction are based on machine learning techniques. Although these approaches provide high prediction accuracy, they use long windows. Hence, they may not be suitable for predicting novel splice variants from the short sequence reads generated by next-generation sequencing technologies. Further, machine learning techniques require numerically encoded data and produce different accuracies with different encoding procedures. This motivated the present study of splice site prediction using short sequence motifs without encoding the sequence data. RESULTS: An approach for finding associations among nucleotide bases in splice site motifs was developed and further used to determine the appropriate window size. In addition, an approach for predicting donor splice sites using a sum of absolute error criterion was proposed. The proposed approach was compared with commonly used approaches, namely Maximum Entropy Modeling (MEM), Maximal Dependency Decomposition (MDD), Weighted Matrix Method (WMM), and first-order Markov Model (MM1), and was found to perform on par with MEM and MDD and better than WMM and MM1 in terms of prediction accuracy. CONCLUSIONS: The proposed approach can predict donor splice sites with high accuracy using short sequence motifs and can therefore complement existing approaches. Based on the proposed methodology, a web server was also developed for easy prediction of donor splice sites and is available at http://cabgrid.res.in:8080/sspred .
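The exact sum of absolute error criterion is not given in the abstract. The Python sketch below is one hypothetical reading: build position-wise nucleotide frequency profiles from aligned true and false donor windows, then assign a query window to whichever profile it deviates from least. It works directly on the sequence characters rather than on a separately encoded feature vector. All function names and the toy sequences are illustrative, not taken from the study.

```python
BASES = "ACGT"

def position_profile(windows):
    """Per-position nucleotide frequencies from equal-length, aligned windows."""
    profile = []
    for pos in range(len(windows[0])):
        column = [w[pos] for w in windows]
        profile.append({b: column.count(b) / len(column) for b in BASES})
    return profile

def sum_absolute_error(window, profile):
    """Sum over positions and bases of |observed indicator - profile frequency|."""
    total = 0.0
    for pos, freqs in enumerate(profile):
        for b in BASES:
            observed = 1.0 if window[pos] == b else 0.0
            total += abs(observed - freqs[b])
    return total

def is_donor(window, true_profile, false_profile):
    """Hypothetical decision rule: call a donor site when the window is closer
    (smaller error) to the true-site profile than to the false-site profile."""
    return sum_absolute_error(window, true_profile) < sum_absolute_error(window, false_profile)

# Toy aligned 8-nt windows (illustrative, not from any dataset)
true_profile = position_profile(["AGGTAAGT", "AGGTGAGT", "AAGTAAGT"])
false_profile = position_profile(["CTGTCCTA", "TCGTTCCA", "GCGTACTC"])
print(is_donor("AGGTAAGT", true_profile, false_profile))  # True
```

Because the profiles have one column per position, the window length directly controls model size, which fits the study's emphasis on choosing a short, appropriate window.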


Subjects
Algorithms , High-Throughput Nucleotide Sequencing/methods , Models, Statistical , Nucleotide Motifs/genetics , RNA Splice Sites/genetics , RNA Splicing/genetics , Sequence Analysis, RNA/methods , Artificial Intelligence , Base Sequence , Computational Biology , Humans , Molecular Sequence Data , ROC Curve
17.
Comput Struct Biotechnol J ; 23: 1631-1640, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38660008

ABSTRACT

RNA-binding proteins (RBPs) are central to key functions in prokaryotes, such as post-transcriptional regulation, mRNA stability and adaptation to varied environmental conditions. While most research has concentrated on eukaryotic RBPs, recent developments underscore the crucial involvement of prokaryotic RBPs. Although computational methods for identifying RBPs have emerged in recent years, their generic nature means they fall short in accurately identifying prokaryotic RBPs. To bridge this gap, we introduce RBProkCNN, a novel machine learning-driven computational model designed specifically for the accurate prediction of prokaryotic RBPs. Eight shallow learning algorithms and four deep learning models were evaluated using PSSM-based evolutionary features. Using a convolutional neural network (CNN) with evolutionarily significant features selected through the extreme gradient boosting variable importance measure, RBProkCNN achieved the highest five-fold cross-validation accuracy, with 98.04% auROC and 98.19% auPRC. RBProkCNN also performed robustly on an independent dataset, with 95.77% auROC and 95.78% auPRC, and showed higher predictive accuracy than several state-of-the-art existing models. RBProkCNN is freely available as an online prediction tool (https://iasri-sg.icar.gov.in/rbprokcnn/). This tool adds to the resources available for the accurate and efficient prediction of prokaryotic RBPs.
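auROC values such as those reported for RBProkCNN can be computed in several equivalent ways. As a minimal Python sketch, not RBProkCNN's code, auROC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (the Mann-Whitney statistic), with ties counted as one half. The function name is hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5).

    labels: 1 for positives (e.g. RBPs), 0 for negatives; scores: predicted probabilities.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    # Compare every positive with every negative.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop costs P x N comparisons; for large datasets a rank-based formula or an established implementation such as scikit-learn's `roc_auc_score` is more practical.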

18.
Protein Sci ; 33(6): e5015, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38747369

ABSTRACT

Prokaryotic DNA-binding proteins (DBPs) play pivotal roles in gene regulation, DNA replication and various cellular functions. Accurate computational models for predicting prokaryotic DBPs could accelerate the discovery of novel proteins, deepen the understanding of prokaryotic biology and facilitate the development of therapeutics for potential disease interventions. However, existing generic prediction models often show lower accuracy in predicting prokaryotic DBPs. To address this gap, we introduce ProkDBP, a novel machine learning-driven computational model for the prediction of prokaryotic DBPs. Nine shallow learning algorithms and five deep learning models were evaluated, and the shallow learning models showed higher performance metrics than their deep learning counterparts. The light gradient boosting machine (LGBM), coupled with evolutionarily significant features selected via the random forest variable importance measure (RF-VIM), yielded the highest five-fold cross-validation accuracy, with the highest auROC (0.9534) and auPRC (0.9575) among the 14 machine learning models evaluated. ProkDBP also performed well on an independent dataset (auROC 0.9332, auPRC 0.9371) and, when benchmarked against several existing state-of-the-art models, showed superior predictive accuracy. To promote accessibility and usability, ProkDBP (https://iasri-sg.icar.gov.in/prokdbp/) is freely available as an online prediction tool. This tool adds to the resources for accurate and efficient prediction of prokaryotic DBPs.
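auPRC is often estimated as average precision: the mean of the precision values at the rank of each positive example. The Python sketch below is an illustrative assumption; the abstract does not say which auPRC estimator ProkDBP uses, and trapezoidal interpolation of the precision-recall curve can give slightly different values. The function name is hypothetical.

```python
def average_precision(labels, scores):
    """auPRC estimated as average precision over the positives in the ranking.

    labels: 1 for positives (e.g. DBPs), 0 for negatives; scores: predicted probabilities.
    Tied scores keep their input order (Python's sort is stable).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(labels)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision among the top `rank` predictions, recorded at each positive.
            precision_sum += hits / rank
    return precision_sum / total_pos
```

Unlike auROC, this measure is sensitive to class imbalance, which is one reason studies such as this one report both.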


Subjects
Bacterial Proteins , DNA-Binding Proteins , Machine Learning , Algorithms , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Bacterial Proteins/genetics , Computational Biology/methods , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism
19.
Biochim Biophys Acta Gen Subj ; 1868(6): 130597, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38490467

ABSTRACT

BACKGROUND: Abiotic stresses pose a serious threat to the growth and yield of crop plants. Several studies suggest that transcription factors (TFs) are important regulators of gene expression in plants, especially in coping with abiotic stresses. Identifying TFs associated with abiotic stress response is therefore crucial for breeding abiotic stress-tolerant crop cultivars. METHODS: A computational model based on a machine learning framework was developed to predict TFs associated with abiotic stress response in plants. Four distinct sequence-derived features were generated to numerically encode TF sequences. Prediction was performed using ten shallow learning and four deep learning algorithms, and feature selection techniques were employed so that prediction used more pertinent and informative features. RESULTS: Using the features chosen by the light gradient boosting machine variable importance measure (LGBM-VIM), the LGBM achieved the highest cross-validation performance metrics (accuracy: 86.81%, auROC: 92.98% and auPRC: 94.03%). The proposed model (LGBM prediction method + LGBM-VIM selected features) was further evaluated on an independent test dataset, where the accuracy, auROC and auPRC were 81.98%, 90.65% and 91.30%, respectively. CONCLUSIONS: To facilitate adoption by users, the approach was implemented as a prediction server called ASPTF, accessible at https://iasri-sg.icar.gov.in/asptf/. The developed approach and the corresponding web application are expected to supplement experimental methods in identifying TFs responsive to abiotic stress in plants.
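Cross-validation metrics like those reported for ASPTF depend on how the data are split. As a hedged sketch (the abstract does not describe ASPTF's splitting scheme), stratified k-fold cross-validation keeps the class balance of each fold close to that of the full dataset, and repeating it with different seeds gives repeated cross-validation. The function names below are hypothetical.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's samples round-robin, continuing where the last class stopped.
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        offset += len(idx)
    return folds


def cv_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

A model would be trained on each `train` split and scored on the matching `test` split, and the metrics averaged over folds (and over seeds, for repeated cross-validation).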


Subjects
Machine Learning , Stress, Physiological , Transcription Factors , Transcription Factors/metabolism , Transcription Factors/genetics , Algorithms , Gene Expression Regulation, Plant , Computational Biology/methods , Plant Proteins/genetics , Plant Proteins/metabolism , Plants/metabolism , Plants/genetics
20.
Plant Genome ; 16(4): e20332, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37122189

ABSTRACT

In wheat, genomic prediction accuracy (GPA) was assessed for three micronutrient traits (grain iron, grain zinc and β-carotenoid concentrations) using eight Bayesian regression models. Data on 246 accessions, each genotyped with 17,937 DArT markers, were used for this purpose. Phenotypic data were available for 2013-2014 from Powerkheda (Madhya Pradesh) and for 2014-2015 from Meerut (Uttar Pradesh), India. Model accuracy was measured in terms of reliability, computed using a repeated cross-validation approach. Predictions were obtained independently for each of the two environments after adjusting for local effects, and across environments after adjusting for environmental effects. The Bayes ridge regression (BayesRR) model outperformed the other seven models, whereas BayesLASSO (BayesL) was the least efficient. GPA increased with both training set size and marker density. GPA values differed among the three traits and were higher for the best linear unbiased estimate (BLUE), obtained after adjusting for environmental effects, than for the two individual environments. GPA was also unaffected by accounting for population structure. These results suggest that only the best model should be used to estimate genomic estimated breeding values (GEBVs) before they are used in genomic selection to improve grain micronutrient content.
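The study fits eight Bayesian regression models whose details the abstract does not give. The Python sketch below swaps in plain frequentist ridge regression with a fixed penalty `lam`, which shrinks all marker effects in a way related to Bayes ridge regression, and measures accuracy as the Pearson correlation between predicted and observed phenotypes in a held-out set. The study's reliability measure may be defined differently, and all names and toy data here are hypothetical.

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge marker effects from centered data: (Xc'Xc + lam*I) beta = Xc'(y - mean(y))."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    A = [[sum(Xc[i][a] * Xc[i][c] for i in range(n)) + (lam if a == c else 0.0)
          for c in range(p)] for a in range(p)]
    rhs = [sum(Xc[i][a] * (y[i] - y_mean) for i in range(n)) for a in range(p)]
    return x_mean, y_mean, solve(A, rhs)


def ridge_predict(model, X):
    """Predicted genetic values for new genotypes (rows of marker codes)."""
    x_mean, y_mean, beta = model
    return [y_mean + sum(b * (row[j] - x_mean[j]) for j, b in enumerate(beta)) for row in X]


def pearson(a, b):
    """Pearson correlation, used here as a simple prediction-accuracy measure."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)
```

In a repeated cross-validation, `ridge_fit` would be applied to each training fold and `pearson` computed on the matching test fold; with thousands of DArT markers, a linear algebra library would replace the hand-written solver.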


Subjects
Micronutrients , Triticum , Triticum/genetics , Bayes Theorem , Reproducibility of Results , Bread , Plant Breeding , Genomics/methods , Edible Grain/genetics