1.
PLoS One; 19(2): e0298636, 2024.
Article in English | MEDLINE | ID: mdl-38394324

ABSTRACT

Information on genetic diversity and population structure helps breeders devise strategies to broaden the genetic base of hybrid parental lines. The present study determined the population structure and genetic diversity of 109 pearl millet hybrid parental lines, known for their better adaptation and performance in drought-prone environments, using 16,472 single nucleotide polymorphism (SNP) markers generated by genotyping-by-sequencing (GBS). The SNPs were distributed uniformly across the pearl millet genome and showed considerable genetic diversity (0.337) and expected heterozygosity (0.334), with low observed heterozygosity (0.031). Most pairs of lines (78.36%) had Identity-by-State (IBS) genetic distances greater than 0.3, indicating substantial genetic diversity among the parental lines. Bayesian model-based population stratification, neighbor-joining phylogenetic analysis, and principal coordinate analysis (PCoA) separated the hybrid parental lines into two distinct major groups, one for seed parents (B-lines) and one for pollinators (R-lines). The majority of parental lines sharing common parentage grouped in the same cluster. Analysis of molecular variance (AMOVA) attributed 7% of the variation to differences among subpopulations and 93% to variation within subpopulations. Chromosome 3 had the highest number of linkage disequilibrium (LD) regions. The genome-wide LD decay distance was 0.69 Mb and varied across chromosomes. Genetic diversity based on 11 agro-morphological and grain quality traits likewise grouped the majority of the B- and R-lines into two major clusters with few overlaps. In addition, the combined analysis of phenotypic and genotypic data showed similar population grouping patterns.
The present study revealed that most of the inbred lines are genetically distinct; they can serve as a valuable source of new alleles and help breeders use these inbred lines to develop hybrids for drought-prone environments.
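The diversity statistics above can be illustrated with a minimal sketch, assuming SNP genotypes coded as alternate-allele counts (0, 1, 2, or None for a missing call). It computes per-SNP observed and expected heterozygosity and a pairwise IBS distance, then reports the share of line pairs whose distance exceeds 0.3. The function names and toy genotypes are hypothetical, and the exact estimators used in the study (for example, any sample-size correction of gene diversity) are not stated here, so this is not the authors' pipeline.

```python
from itertools import combinations
from typing import Dict, List, Optional

# Genotypes coded as alternate-allele counts: 0 = homozygous reference,
# 1 = heterozygous, 2 = homozygous alternate, None = missing call.
Genotype = Optional[int]


def _called(calls: List[Genotype]) -> List[int]:
    """Drop missing calls; fail loudly if nothing is left."""
    called = [g for g in calls if g is not None]
    if not called:
        raise ValueError("no called genotypes at this SNP")
    return called


def observed_heterozygosity(calls: List[Genotype]) -> float:
    """Fraction of called individuals that are heterozygous at one SNP."""
    called = _called(calls)
    return sum(1 for g in called if g == 1) / len(called)


def expected_heterozygosity(calls: List[Genotype]) -> float:
    """Biallelic He = 1 - p^2 - q^2 from the alternate-allele frequency p."""
    called = _called(calls)
    p = sum(called) / (2 * len(called))
    return 1 - p ** 2 - (1 - p) ** 2


def ibs_distance(a: List[Genotype], b: List[Genotype]) -> float:
    """1 - (alleles shared identical-by-state / alleles compared) over jointly called SNPs."""
    shared = compared = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue
        shared += 2 - abs(ga - gb)  # 2 = IBS2, 1 = IBS1, 0 = IBS0
        compared += 2
    if compared == 0:
        raise ValueError("no jointly called SNPs")
    return 1 - shared / compared


def fraction_of_pairs_above(lines: Dict[str, List[Genotype]], threshold: float) -> float:
    """Share of line pairs whose IBS distance exceeds `threshold`."""
    pairs = list(combinations(lines.values(), 2))
    return sum(ibs_distance(a, b) > threshold for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy genotypes (invented): keys are lines, list positions are SNPs.
    lines = {
        "B1": [0, 0, 2, 1],
        "B2": [0, 2, 2, 0],
        "R1": [2, 2, 0, None],
    }
    snp0 = [g[0] for g in lines.values()]
    print(observed_heterozygosity(snp0), expected_heterozygosity(snp0))
    print(fraction_of_pairs_above(lines, 0.3))
```

Observed heterozygosity far below expected heterozygosity, as in the reported 0.031 versus 0.334, is what one would expect from largely inbred parental lines.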


Subjects
Pennisetum , Pennisetum/genetics , Phylogeny , Droughts , Bayes Theorem , Plant Breeding , Genetic Variation , Polymorphism, Single Nucleotide
2.
Physiol Mol Biol Plants; 28(1): 1-16, 2022 Jan.
Article in English | MEDLINE | ID: mdl-35221569

ABSTRACT

In plants, the GIGANTEA (GI) protein performs diverse biological functions, including carbon and sucrose metabolism, cell wall deposition, transpiration and hypocotyl elongation, which suggests that GI is an important class of proteins. So far, GI proteins have mostly been identified with resource-intensive experimental methods. In this study, we therefore developed a computational model for fast and accurate prediction of GI proteins. Ten supervised learning algorithms, i.e., SVM, RF, JRIP, J48, LMT, IBK, NB, PART, BAGG and LGB, were employed for prediction, with amino acid composition (AAC), FASGAI features and physico-chemical (PHYC) properties used as numerical inputs. The highest accuracies (96.75% AUC-ROC and 86.7% AUC-PR) were observed for SVM with the AAC + PHYC feature combination under five-fold cross-validation. With leave-one-out cross-validation, 97.29% AUC-ROC and 87.89% AUC-PR were achieved. On an independent dataset of 18 GI sequences, 17 were correctly predicted. We also performed proteome-wide identification of GI proteins in wheat, followed by functional annotation using Gene Ontology terms. A prediction server, "GIpred", is freely accessible at http://cabgrid.res.in:8080/gipred/ for proteome-wide recognition of GI proteins. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s12298-022-01130-6.
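Amino acid composition (AAC), one of the feature sets named above, maps a protein of any length to a fixed 20-dimensional vector of residue frequencies, which is what lets a classifier such as SVM consume it. The sketch below assumes that non-standard residue symbols are simply skipped; that rule, the helper names, and the idea of concatenating AAC with a hypothetical physico-chemical block are illustrative assumptions, not GIpred's documented implementation.

```python
from typing import List

# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> List[float]:
    """Return the 20-dimensional AAC vector (relative frequency of each residue).

    Non-standard symbols (X, B, Z, U, '*', ...) are skipped; this is an
    assumption of the sketch, not a documented GIpred rule.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for ch in seq.upper():
        if ch in counts:
            counts[ch] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def combine_features(*blocks: List[float]) -> List[float]:
    """Concatenate feature blocks, e.g. AAC followed by a (hypothetical) PHYC block."""
    combined: List[float] = []
    for block in blocks:
        combined.extend(block)
    return combined


if __name__ == "__main__":
    aac = amino_acid_composition("MKKAX")  # X is ignored, so 4 residues are counted
    print(dict(zip(AMINO_ACIDS, aac)))
```

A combined AAC + PHYC input would then be `combine_features(aac, phyc)`, where `phyc` stands in for whatever physico-chemical descriptors are chosen.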

3.
3 Biotech; 11(11): 484, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34790508

ABSTRACT

Identification of splice sites is an important step in gene structure prediction. Most existing splice site prediction studies have successfully employed machine learning algorithms coupled with sequence-derived features. However, splice site identification that incorporates secondary structure information has been lacking, particularly in plant species. In this study, we therefore evaluated the effect of structural features on splice site prediction accuracy in Arabidopsis thaliana. Prediction accuracies were evaluated with the sequence-derived features alone as well as with the structural features added to the sequence-derived features, using support vector machine (SVM) as the prediction algorithm. Both short (40 base pairs) and long (105 base pairs) sequence datasets were considered. After the secondary structure features were incorporated, accuracy improved only for the longer sequence dataset, and the improvement was larger with the sequence-derived features that account for nucleotide dependencies. In contrast, little or no improvement in accuracy was found for the short sequence dataset. The performance of SVM was further compared with that of the LogitBoost, Random Forest (RF), AdaBoost and XGBoost machine learning methods. The prediction accuracies of SVM, AdaBoost and XGBoost were on par with one another and higher than those of RF and LogitBoost. When all the sequence-derived features were combined with the structural features, accuracy improved only slightly compared with the combinations of individual sequence-based features and structural features.
To the best of our knowledge, this is the first attempt at computational prediction of splice sites with machine learning methods that incorporates secondary structure information into the sequence-derived features. All the source code is available at https://github.com/meher861982/SSFeature. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13205-021-03036-8.
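One simple way for sequence-derived features to "account for nucleotide dependencies" is to encode each adjacent nucleotide pair at each position rather than each nucleotide independently. The sketch below shows such a positional dinucleotide one-hot encoding; it is a generic illustration chosen by the editor, not necessarily one of the feature sets used in the study (those are in the linked SSFeature repository), and the function name is hypothetical. It also makes the cost of longer windows concrete: a 40-bp window yields 39 × 16 = 624 features and a 105-bp window yields 104 × 16 = 1664.

```python
from itertools import product
from typing import List

NUCLEOTIDES = "ACGT"
# All 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ..., TT.
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]
INDEX = {d: i for i, d in enumerate(DINUCLEOTIDES)}


def positional_dinucleotide_onehot(seq: str) -> List[int]:
    """Encode each adjacent pair (i, i+1) as a 16-way one-hot block.

    The output has 16 * (len(seq) - 1) entries; pairs containing N or any
    other non-ACGT symbol are left as an all-zero block.
    """
    seq = seq.upper()
    features: List[int] = []
    for i in range(len(seq) - 1):
        block = [0] * 16
        pair = seq[i:i + 2]
        if pair in INDEX:
            block[INDEX[pair]] = 1
        features.extend(block)
    return features


if __name__ == "__main__":
    enc = positional_dinucleotide_onehot("ACGT")  # pairs AC, CG, GT
    print([i for i, v in enumerate(enc) if v])
```

Structural features, such as those derived from predicted secondary structure, could then be appended to this vector before training.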

4.
Plant Methods; 17(1): 46, 2021 Apr 26.
Article in English | MEDLINE | ID: mdl-33902670

ABSTRACT

BACKGROUND: Circadian rhythms regulate several physiological and developmental processes in plants. Hence, identifying genes with underlying circadian rhythmic features is pivotal. Although computational methods have been developed for identifying circadian genes, all of them are based on gene expression datasets. We could not find any sequence-based model, which motivated us to develop the present computational method to identify the proteins encoded by circadian genes. RESULTS: Support vector machine (SVM) with seven kernels, i.e., linear, polynomial, radial, sigmoid, hyperbolic, Bessel and Laplace, was used for prediction with compositional, transitional and physico-chemical features. The highest accuracy, 62.48%, was achieved with the Laplace kernel under fivefold cross-validation. The developed model further achieved 62.96% accuracy on an independent dataset. The SVM also outperformed other state-of-the-art machine learning algorithms, i.e., Random Forest, Bagging, AdaBoost, XGBoost and LASSO. We also performed proteome-wide identification of circadian proteins in two cereal crops, namely Oryza sativa and Sorghum bicolor, followed by functional annotation of the predicted circadian proteins with Gene Ontology (GO) terms. CONCLUSIONS: To the best of our knowledge, this is the first computational method to identify circadian genes from sequence data. Based on the proposed method, we have developed an R package, PredCRG (https://cran.r-project.org/web/packages/PredCRG/index.html), for proteome-wide identification of circadian genes by the scientific community. The present study supplements existing computational methods as well as wet-lab experiments for recognizing circadian genes.
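The two evaluation ingredients named above, a Laplace kernel and fivefold cross-validation, can be sketched as follows. The kernel follows the form used by R's kernlab `laplacedot`, k(x, y) = exp(-σ‖x − y‖); the value of σ used for PredCRG is not stated here, so `sigma` is a free parameter. The fold assignment is stratified by class, which is an assumption of this sketch rather than a documented detail of the study, and all function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence


def laplace_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Laplace kernel exp(-sigma * ||x - y||), with ||.|| the Euclidean norm."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return math.exp(-sigma * dist)


def gram_matrix(X: List[Sequence[float]], sigma: float = 1.0) -> List[List[float]]:
    """Pairwise kernel matrix that a kernel SVM would be trained on."""
    return [[laplace_kernel(a, b, sigma) for b in X] for a in X]


def stratified_kfold(labels: List[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds, keeping class proportions similar per fold."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    by_class: Dict[int, List[int]] = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    offset = 0  # continue round-robin across classes so fold sizes stay balanced
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(j + offset) % k].append(i)
        offset += len(idx)
    return folds


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 15
    for f, test_idx in enumerate(stratified_kfold(labels, k=5)):
        # Each fold serves once as the test set; the other four form the training set.
        print(f, sorted(test_idx))
```

In each of the five rounds, the model is trained on four folds and scored on the held-out fold, and the reported accuracy is averaged over the five rounds.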

6.
Sci Rep; 10(1): 14557, 2020 Sep 3.
Article in English | MEDLINE | ID: mdl-32884018

ABSTRACT

MicroRNAs (miRNAs) are a class of non-coding RNA that play vital roles in regulating several physiological and developmental processes. The subcellular localization of miRNAs and their abundance in the native cell are central to maintaining physiological homeostasis. In addition, the RNA-silencing activity of miRNAs is influenced by their localization and stability. A computational method for predicting the subcellular localization of miRNAs is therefore desirable. In this work, we propose a computational method for predicting the subcellular localizations of miRNAs based on principal component scores of thermodynamic and structural properties and pseudo-dinucleotide compositions. Prediction accuracy was assessed with fivefold cross-validation, which gave AUC-ROC values of ~63-71% and AUC-PR values of ~69-76%. On an independent test set, more than 50% of localizations were correctly predicted. The developed model also achieved higher accuracy than existing methods. A user-friendly prediction server, "miRNALoc", is freely accessible at https://cabgrid.res.in:8080/mirnaloc/, where users can predict the localizations of miRNAs.
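AUC-ROC and AUC-PR, the two metrics reported above, summarize how well a predictor's scores rank true members of a class (for example, miRNAs in one localization, treated one-vs-rest) above non-members. A minimal pure-Python sketch, assuming binary 0/1 labels: AUC-ROC is computed as the probability that a random positive outscores a random negative (ties count one half), and AUC-PR is approximated by average precision, the mean precision at the rank of each positive. Libraries may interpolate the PR curve differently, so values can differ slightly from other tools; the function names are hypothetical.

```python
from typing import List, Sequence


def auc_roc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(random positive scores above random negative); ties count 0.5. O(n^2) for clarity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise area under the precision-recall curve (mean precision at each positive)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])  # ties keep input order
    tp = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank  # precision at this cut-off
    return total / n_pos


if __name__ == "__main__":
    scores: List[float] = [0.9, 0.8, 0.7, 0.6]
    labels: List[int] = [1, 0, 1, 0]
    print(auc_roc(scores, labels), average_precision(scores, labels))
```

In the toy example, one negative (0.8) is ranked above one positive (0.7), so three of the four positive-negative pairs are ordered correctly.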


Subjects
Algorithms , Computational Biology/methods , MicroRNAs/analysis , Nucleotides/chemistry , Principal Component Analysis/methods , RNA Precursors/chemistry , Subcellular Fractions/metabolism , Humans , MicroRNAs/chemistry , MicroRNAs/genetics , RNA Precursors/genetics , Thermodynamics
7.
Gene; 705: 113-126, 2019 Jul 15.
Article in English | MEDLINE | ID: mdl-31009682

ABSTRACT

Identification of splice sites is essential for predicting gene structure. Machine learning-based approaches (MLAs) have been reported to be more successful than rule-based methods for identifying splice sites. However, nucleotide sequences, which are strings of characters, must be transformed into numeric features through sequence encoding before they can be used as input to MLAs. In this study, we evaluated the performance of 8 sequence encoding schemes, i.e., Bayes kernel, density and sparse (DS), distribution of tri-nucleotide and 1st order Markov model (DM), frequency difference distance measure (FDDM), paired-nucleotide frequency difference between true and false sites (FDTF), 1st order Markov model (MM1), combination of 1st and 2nd order Markov models (MM1 + MM2) and 2nd order Markov model (MM2), for predicting donor and acceptor splice sites with 5 supervised learning methods (ANN, Bagging, Boosting, RF and SVM). The encoding schemes and machine learning methods were first evaluated in 4 species, i.e., A. thaliana, C. elegans, D. melanogaster and H. sapiens, and their performance was then validated in four other species, i.e., Ciona intestinalis, Dictyostelium discoideum, Phaeodactylum tricornutum and Trypanosoma brucei. In terms of ROC (receiver operating characteristic) and PR (precision-recall) curves, the FDTF encoding approach achieved the highest accuracy, followed by either MM2 or FDDM. Further, SVM achieved the highest accuracy (in terms of ROC and PR curves), followed by RF, across encoding schemes and species. Across species, the SVM-FDTF combination outperformed the other combinations of classifiers and encoding schemes. Splice site prediction accuracies were also higher for species with low intron density. To the best of our knowledge, this is the first comprehensive evaluation of sequence encoding schemes for splice site prediction.
We have also developed an R package, EncDNA (https://cran.r-project.org/web/packages/EncDNA/index.html), for encoding splice site motifs with different encoding schemes, which is expected to supplement existing nucleotide sequence encoding approaches. This study is expected to be useful to computational biologists for predicting various functional elements in genomic DNA.
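To make the idea of a Markov-model encoding concrete, here is a minimal sketch of a first-order (MM1-style) scheme, assuming aligned splice site motifs of equal length. Position-specific transition probabilities P(next base | current base) are estimated separately from true and false sites (with a pseudocount), and each motif is encoded as the vector of per-position log-odds log(P_true / P_false) of its transitions. This is the editor's illustration of the general technique; EncDNA's actual MM1 formula may differ (for example, in how the first base or pseudocounts are handled), and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"
Table = Dict[Tuple[str, str], float]


def transition_probs(seqs: List[str], pseudocount: float = 1.0) -> List[Table]:
    """Position-specific first-order transition probabilities P(next | current).

    All sequences must be aligned motifs of equal length (e.g. windows centred
    on candidate donor sites). The pseudocount avoids zero probabilities.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all motifs must have the same length")
    tables: List[Table] = []
    for i in range(length - 1):
        counts = {(a, b): pseudocount for a in BASES for b in BASES}
        for s in seqs:
            pair = (s[i].upper(), s[i + 1].upper())
            if pair in counts:  # skip pairs containing N or other symbols
                counts[pair] += 1
        table: Table = {}
        for a in BASES:
            row_total = sum(counts[(a, b)] for b in BASES)
            for b in BASES:
                table[(a, b)] = counts[(a, b)] / row_total
        tables.append(table)
    return tables


def mm1_encode(seq: str, true_tables: List[Table], false_tables: List[Table]) -> List[float]:
    """Encode a motif as per-position log-odds log(P_true / P_false) of its transitions."""
    features: List[float] = []
    for i, (t, f) in enumerate(zip(true_tables, false_tables)):
        pair = (seq[i].upper(), seq[i + 1].upper())
        # Unknown pairs (e.g. containing N) contribute a neutral 0.0.
        features.append(math.log(t[pair] / f[pair]) if pair in t else 0.0)
    return features


if __name__ == "__main__":
    true_t = transition_probs(["GT", "GT"])   # toy "true" donor dinucleotides
    false_t = transition_probs(["GA", "GA"])  # toy "false" sites
    print(mm1_encode("GT", true_t, false_t), mm1_encode("GA", true_t, false_t))
```

Positive entries mark transitions more typical of true sites, and negative entries mark those more typical of false sites. The resulting fixed-length vector can be passed to any of the classifiers listed above.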


Subjects
Computational Biology/methods , RNA Splice Sites , RNA, Messenger/metabolism , Algorithms , Animals , Arabidopsis , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Machine Learning , RNA Splicing , ROC Curve , Trypanosoma brucei brucei/genetics