Search | BVS CLAP/SMR-OPS/OMS

Application of sequence semantic and integrated cellular geography approach to study alternative biogenesis of exonic circular RNA.

Kumar, Rajnish; Mondal, Rajkrishna; Lahiri, Tapobrata; Pal, Manoj Kumar.

BMC Bioinformatics ; 24(1): 148, 2023 Apr 17.

Article in English | MEDLINE | ID: mdl-37069509

ABSTRACT

BACKGROUND: The concurrent existence of lncRNA and circular RNA in both the nucleus and the cytosol of a cell, in different proportions, is well reported. Previous studies showed that circular RNAs are synthesized in the nucleus and then transported across the nuclear membrane, and that their export is primarily determined by their length. lncRNAs primarily originate through inefficient splicing and seem to use NXF1 for export to the cytoplasm. However, it is not clear whether circularization of lncRNA happens only in the nucleus or also occurs in the cytoplasm. Studies indicate that circular RNAs arise when the splicing apparatus undergoes back-splicing. Minor spliceosome (U12-type) mediated splicing occurs in the cytoplasm and is responsible for the splicing of 0.5% of the introns in human cells. Therefore, the possibility of cRNA biogenesis mediated by the minor spliceosome in the cytoplasm cannot be ruled out. Secondly, information on genes transcribing both circular RNAs and lncRNAs, along with the total number of RBP binding sites for both RNA types, is extractable from databases. This study showed how these apparently unconnected reports could be put together to build a model for exploring the biogenesis of circular RNA. RESULTS: A model was built under the premise that sequences with special semantics were molecular precursors in the biogenesis of circular RNA, which occurred through the catalytic role of some specific RBPs. The model outcome was further strengthened by the fulfillment of three logical lemmas, which were extracted and assimilated in this work using a novel data analytic approach, Integrated Cellular Geography. The results of the study were in good agreement with the proposed model. Furthermore, this study also indicated that the biogenesis of circular RNA was a post-transcriptional event.
CONCLUSIONS: Overall, this study provides a novel systems biology based model under the paradigm of Integrated Cellular Geography, which can assimilate independently performed experimental results and data published by researchers worldwide on RNA biology to provide important information on the biogenesis of circular RNAs, considering lncRNAs as precursor molecules. This study also suggests possible RBP-mediated circularization of RNA in the cytoplasm through back-splicing using the minor spliceosome.

Subject(s)

Circular RNA , Long Noncoding RNA , Humans , Circular RNA/metabolism , Long Noncoding RNA/genetics , Long Noncoding RNA/metabolism , Semantics , RNA/chemistry , RNA Splicing , Introns , RNA Precursors/genetics

An improved protein structure evaluation using a semi-empirically derived structure property.

Pal, Manoj Kumar; Lahiri, Tapobrata; Tanwar, Garima; Kumar, Rajnish.

BMC Struct Biol ; 18(1): 16, 2018 Dec 12.

Article in English | MEDLINE | ID: mdl-30541545

ABSTRACT

BACKGROUND: Given the challenge of obtaining a protein structure under the known limitations of both experimental and theoretical techniques, there is still a need for a fast and accurate protein structure evaluation method to substantially reduce the huge gap between the number of known sequences and structures. Among currently practiced theoretical techniques, homology modelling backed by molecular dynamics based optimization appears to be the most popular. However, it suffers from contradictory indications from different validation parameters generated for a set of protein models predicted against a particular target protein. For example, in one model the Ramachandran Score may be quite high, making it acceptable, whereas its potential energy may not be very low, making it unacceptable, and vice versa. To resolve this problem, the main objective of this study was to use a simple experimentally derived output, the Surface Roughness Index of the protein of unknown structure, as an intervening agent; this index can be obtained from ordinary microscopic images of heat-denatured aggregates of the same protein. RESULT: It was intriguing to observe that direct experimental knowledge of the protein concerned, however simple, might give insight into the acceptability of a particular structural model out of a confusion set of models generated by a database-driven comparative technique for structure prediction. The results obtained from proteins of widely varying structural classes indicated that the speed of protein structure evaluation can be further enhanced without compromising accuracy by recruiting a simple experimental output. CONCLUSION: In this work, a semi-empirical methodological approach was provided for improving protein structure evaluation. It showed that, once structure models of a protein were obtained through the homology technique, the problem of selecting the best model out of a confusion set of Pareto-optimal structures could be resolved by employing a structure agent directly obtainable through experiment with the same protein as the experimental ingredient. Overall, given the need to obtain reasonably accurate protein structures of pathogens causing epidemics or used in biological warfare, such an approach could be useful as a plausible solution for fast drug design.

Subject(s)

Molecular Models , Proteins/chemistry , Cytochromes c/chemistry , Hemoglobins/chemistry , Protein Conformation , Serum Albumin/chemistry

TemPred: A Novel Protein Template Search Engine to Improve Protein Structure Prediction.

Tripathi, Asmita; Mondal, Rajkrishna; Lahiri, Tapobrata; Chaurasiya, Deepak; Pal, Manoj Kumar.

IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 2112-2121, 2023.

Article in English | MEDLINE | ID: mdl-37018272

ABSTRACT

Among new protein structure predictors, the recently developed AlphaFold predictor relies on a contact map, in line with the contact map potential based threading model, which basically relies on fold recognition. In parallel, the sequence similarity based homology model relies on homologue recognition. Both of these methods rely on sequence-structure or sequence-sequence similarity with a protein of known structure, in the absence of which, as argued in the development of AlphaFold, structure prediction becomes quite challenging. However, the term "known structure" depends on the similarity method adopted to identify it, for example, a sequence match yielding a homologue or a sequence-structure match yielding a fold. Also, quite often, AlphaFold structures are found to be unacceptable by the gold-standard structure evaluation parameters. In this context, this work used the concept of ordered local physicochemical property, ProtPCV, by Pal et al. (2020), which provides a new similarity criterion for identifying a template protein of known structure. Finally, a template search engine, TemPred, was developed using the ProtPCV similarity criterion. It was intriguing to find that, quite often, the templates generated by TemPred were better than those produced by conventional search engines. This pointed to the need for a combined approach to obtain a better structural model of a protein.

Subject(s)

Algorithms , Search Engine , Molecular Models , Proteins/chemistry , Software

APT: An Automated Probe Tracker From Gene Expression Data.

Kumar, Gautam; Kumar, Rajnish; Pal, Manoj Kumar; Pramanik, Nilotpal; Lahiri, Tapobrata; Gupta, Ankita; Pandey, Saket.

IEEE/ACM Trans Comput Biol Bioinform ; 18(5): 1864-1874, 2021.

Article in English | MEDLINE | ID: mdl-31825870

ABSTRACT

Of the currently available semi-automatic tools for detecting diagnostic probes relevant to a pathophysiological condition, ArrayMining and GEO2R of NCBI are the most popular. A shortcoming of both tools is that they list probes ordered by their individual statistical significance levels, differing only in the statistical methods they use. While the more recent tool, GEO2R, outputs either the top 250 or all genes according to its own ranking mechanism, ArrayMining requires the user to input the number of probes. This study provided a way to automatically select a probe set by voting over the outputs of three statistical methods: the t-test, the Mann-Whitney test and the empirical Bayes moderated t-test. It was also intriguing to find that the parameters of these statistical methods can be represented as a mathematical function of the group Fisher's discriminant ratio of a disease-control expression data pair. This fully automatic method, APT, shows 88.97 percent success in including reported probes, compared with 80.40 and 87.60 percent for ArrayMining and GEO2R, respectively. Furthermore, in 10-fold cross-validation and on 5 new test cases, APT shows better performance than both ArrayMining and GEO2R with regard to sensitivity and specificity.
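
The voting step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the voting rule, so a simple majority vote is assumed here; the function name `vote_probes`, the `min_votes` parameter and the probe IDs are all hypothetical and are not taken from APT itself.

```python
from collections import Counter


def vote_probes(selections, min_votes=2):
    """Return probes selected by at least `min_votes` methods (assumed majority rule)."""
    counts = Counter(
        probe for chosen in selections.values() for probe in set(chosen)
    )
    return sorted(probe for probe, votes in counts.items() if votes >= min_votes)


# Hypothetical probe IDs flagged as significant by each statistical test.
selections = {
    "t_test": {"probe_A", "probe_B", "probe_C"},
    "mann_whitney": {"probe_B", "probe_C", "probe_D"},
    "moderated_t": {"probe_C", "probe_E"},
}

consensus = vote_probes(selections)
print(consensus)
```

Raising `min_votes` to the number of methods would keep only probes on which every test agrees, which trades sensitivity for specificity.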

Subject(s)

Computational Biology/methods , Gene Expression Profiling/methods , Statistical Models , Bayes Theorem , Automated Pattern Recognition/methods

ProtPCV: A Fixed Dimensional Numerical Representation of Protein Sequence to Significantly Reduce Sequence Search Time.

Pal, Manoj Kumar; Lahiri, Tapobrata; Kumar, Rajnish.

Interdiscip Sci ; 12(3): 276-287, 2020 Sep.

Article in English | MEDLINE | ID: mdl-32524529

ABSTRACT

A protein sequence holds a wealth of experimental information which is yet to be fully exploited to extract information on protein homologues. It is observed from publications that dynamic programming, heuristic and HMM profile-based alignment techniques, along with alignment-free techniques, do not directly utilize the ordered profile of physicochemical properties of a protein to identify its homologue. Also, it is found that these works lack crucial benchmarking or validation, in the absence of which their incorporation in search engines may appear questionable. In this direction, this research offers a fixed dimensional numerical representation of protein sequences, extending the concept of the periodicity count value of nucleotide types (2017) to accommodate Euclidean distance as a direct similarity measure between two proteins. Instead of benchmarking against BLAST and PSI-BLAST only, this new similarity measure was also compared with Needleman-Wunsch and Smith-Waterman. To strengthen the comparison, this work for the first time introduces two novel benchmarking methods based on correlation of "similarity scores" and "proximity of ranked outputs from a standard sequence alignment method" between all possible pairs of search techniques, including the new one presented in this paper. It is found that the novel and unique numerical representation of a protein can reduce the computational complexity of protein sequence search to the order of O(log(n)). It may also make possible the implementation of various other similarity-based operations, such as clustering, phylogenetic analysis and classification of proteins, on the basis of the properties used to build this numerical representation.
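
As a rough illustration of using Euclidean distance as a direct similarity measure between fixed-dimensional vectors, the sketch below ranks database entries by distance to a query. The three-dimensional toy vectors, the protein names and the helper functions are hypothetical; the actual ProtPCV encoding is not described in this abstract and is not reproduced here. The sketch uses a linear scan, not the tree-based search that would be needed for the O(log(n)) behaviour the abstract reports.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_candidates(query, database):
    """Return (name, distance) pairs sorted from nearest to farthest."""
    return sorted(
        ((name, euclidean_distance(query, vec)) for name, vec in database.items()),
        key=lambda item: item[1],
    )


# Hypothetical toy vectors standing in for fixed-dimensional protein encodings.
database = {
    "protein_X": [1.0, 2.0, 2.0],
    "protein_Y": [4.0, 6.0, 0.0],
    "protein_Z": [0.0, 0.0, 1.0],
}
query = [0.0, 0.0, 0.0]

ranking = rank_candidates(query, database)
for name, distance in ranking:
    print(f"{name}: {distance:.3f}")
```

Because every sequence maps to a vector of the same length, no alignment is needed before comparing two entries.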

Subject(s)

Software , Cluster Analysis , Computational Biology/methods , Phylogeny , Protein Sequence Analysis/methods

PCV: An Alignment Free Method for Finding Homologous Nucleotide Sequences and its Application in Phylogenetic Study.

Kumar, Rajnish; Mishra, Bharat Kumar; Lahiri, Tapobrata; Kumar, Gautam; Kumar, Nilesh; Gupta, Rahul; Pal, Manoj Kumar.

Interdiscip Sci ; 9(2): 173-183, 2017 Jun.

Article in English | MEDLINE | ID: mdl-26825665

ABSTRACT

Online retrieval of homologous nucleotide sequences from a given sequence database through existing alignment techniques is common practice. The salient point of these techniques is their dependence on local alignment techniques and scoring matrices, the reliability of which is limited by computational complexity and accuracy. To address this, this work offers a novel numerical representation of genes, which can further help in dividing the data space into smaller partitions, aiding the formation of a search tree. In this context, this paper introduces a 36-dimensional Periodicity Count Value (PCV), which is representative of a particular nucleotide sequence and was created by adapting the stochastic model concept of Kolekar et al. (American Institute of Physics 1298:307-312, 2010. doi: 10.1063/1.3516320). The PCV construct uses information on the physicochemical properties of nucleotides and their positional distribution pattern within a gene. It is observed that the PCV representation of a gene reduces the computational cost of calculating distances between a pair of genes while being consistent with existing methods. The validity of the PCV-based method was further tested through its use in molecular phylogeny constructs, in comparison with those built using existing sequence alignment methods.

Subject(s)

Computational Biology/methods , Molecular Evolution , Phylogeny , Algorithms , Base Sequence , Sequence Alignment , DNA Sequence Analysis/methods

Protein structure validation using a semi-empirical method.

Lahiri, Tapobrata; Singh, Kalpana; Pal, Manoj Kumar; Verma, Gaurav.

Bioinformation ; 8(20): 984-7, 2012.

Article in English | MEDLINE | ID: mdl-23275692

ABSTRACT

The current practice of validating a predicted protein structural model is knowledge-based: scoring parameters are derived from already known structures, and the validation decision is made from this structure information. For example, the scoring parameter Ramachandran Score gives the percentage conformity with steric properties, a higher value of which implies higher acceptability. On the other hand, the Force-Field Energy Score gives conformity with energy-wise stability, a higher value of which implies lower acceptability. Naturally, setting these two scoring parameters as target objectives sometimes yields a set of multiple models for the same protein, for which acceptance based on one parameter, say, the Ramachandran score, may not agree well with acceptance of the same model based on the other parameter, say, the energy score. The confusion set of such models can be further resolved by introducing parameters whose values are easily obtainable through experiment on the same protein. In this work, it was found that the confusion regarding final acceptance of a model out of multiple models of the same protein can be removed using a parameter, the Surface Rough Index, which can be obtained through a semi-empirical method from an ordinary microscopic image of the heat-denatured protein.
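
The "confusion set" described here can be read as the set of models that are not dominated on the two objectives (higher Ramachandran score, lower energy score). The sketch below computes such a set under that assumption; the model names, the scores and the function `confusion_set` are hypothetical, and the abstract does not specify how the authors identify the set or how the Surface Rough Index is then applied to it.

```python
def confusion_set(models):
    """Return names of models not dominated on (higher Ramachandran, lower energy)."""
    front = []
    for name, (rama, energy) in models.items():
        dominated = any(
            r >= rama and e <= energy and (r > rama or e < energy)
            for other, (r, e) in models.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Hypothetical (Ramachandran score in percent, force-field energy) pairs.
models = {
    "model_1": (95.0, -1200.0),
    "model_2": (92.0, -1500.0),
    "model_3": (90.0, -1100.0),
}

candidates = confusion_set(models)
print(candidates)
```

In this toy case `model_3` is worse than `model_1` on both scores and drops out, while `model_1` and `model_2` each win on one score, so neither score alone can choose between them.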
