Search | BVS - Ministério da Saúde

Computational Methods for Predicting Functions at the mRNA Isoform Level.

Mishra, Sambit K; Muthye, Viraj; Kandoi, Gaurav.

Int J Mol Sci ; 21(16), 2020 Aug 08.

Article in English | MEDLINE | ID: mdl-32784445

ABSTRACT

Multiple mRNA isoforms of the same gene are produced via alternative splicing, a biological mechanism that regulates protein diversity while maintaining genome size. Alternatively spliced mRNA isoforms of the same gene may sometimes have very similar sequences, but they can have significantly diverse effects on cellular function and regulation. The products of alternative splicing have important and diverse functional roles, such as response to environmental stress, regulation of gene expression, and involvement in human heritable and plant diseases. The mRNA isoforms of the same gene can have dramatically different functions. Despite the functional importance of mRNA isoforms, very little has been done to annotate their functions. Recent years have, however, seen the development of several computational methods aimed at predicting biological functions at the mRNA isoform level. These methods use a wide array of proteo-genomic data to develop machine learning-based mRNA isoform function prediction tools. In this review, we discuss the computational methods developed for predicting biological function at the individual mRNA isoform level.

Subjects

Computational Biology/methods; RNA Isoforms/metabolism; Alternative Splicing/genetics; Animals; Gene Regulatory Networks; Humans; Machine Learning; RNA Isoforms/genetics; RNA, Messenger/genetics; RNA, Messenger/metabolism

MMPdb and MitoPredictor: Tools for facilitating comparative analysis of animal mitochondrial proteomes.

Muthye, Viraj; Kandoi, Gaurav; Lavrov, Dennis V.

Mitochondrion ; 51: 118-125, 2020 03.

Article in English | MEDLINE | ID: mdl-31972373

ABSTRACT

Data on experimentally characterized animal mitochondrial proteomes (mt-proteomes) are limited to a few model organisms and are scattered across multiple databases, impeding comparative analysis. We developed two resources to address these problems. First, we re-analyzed proteomic data from six species with experimentally characterized mt-proteomes: animals (Homo sapiens, Mus musculus, Caenorhabditis elegans, and Drosophila melanogaster) and outgroups (Acanthamoeba castellanii and Saccharomyces cerevisiae), and created the Metazoan Mitochondrial Proteome Database (MMPdb) to host the results. Second, we developed a novel pipeline, "MitoPredictor", that uses a Random Forest classifier to infer mitochondrial localization of proteins based on orthology, mitochondrial targeting signal prediction, and protein domain analyses. Both tools generate an R Shiny applet that can be used to visualize and interact with the results and can be run on a personal computer. MMPdb is also available online at https://mmpdb.eeob.iastate.edu/.

Subjects

Databases, Protein; Machine Learning; Mitochondria/metabolism; Mitochondrial Proteins/metabolism; Acanthamoeba castellanii; Animals; Caenorhabditis elegans; Drosophila melanogaster; Energy Metabolism/physiology; Humans; Mice; Proteome/genetics; Saccharomyces cerevisiae

Tissue-specific mouse mRNA isoform networks.

Kandoi, Gaurav; Dickerson, Julie A.

Sci Rep ; 9(1): 13949, 2019 09 27.

Article in English | MEDLINE | ID: mdl-31562339

ABSTRACT

Alternative splicing produces multiple mRNA isoforms of genes, which have important and diverse roles such as regulation of gene expression, human heritable diseases, and response to environmental stresses. However, little has been done to assign functions at the mRNA isoform level. Functional networks, where the interactions are quantified by their probability of being involved in the same biological process, are typically generated at the gene level. We use a diverse array of tissue-specific RNA-seq datasets and sequence information to train random forest models that predict the functional networks. Since there is no mRNA isoform-level gold standard, we treat single-isoform genes co-annotated to Gene Ontology biological process annotations, Kyoto Encyclopedia of Genes and Genomes pathways, BioCyc pathways, and protein-protein interactions as functionally related (positive pairs). To generate the non-functional pairs (negative pairs), we use the Gene Ontology annotations tagged with the "NOT" qualifier. We describe 17 Tissue-spEcific mrNa iSoform functIOnal Networks (TENSION) following a leave-one-tissue-out strategy, in addition to an organism-level reference functional network for mouse. We validate our predictions by comparing their performance with previous methods, randomized positive and negative class labels, updated Gene Ontology annotations, and literature evidence. We demonstrate the ability of our networks to reveal tissue-specific functional differences between the isoforms of the same genes. All scripts and data from TENSION are available at: https://doi.org/10.25380/iastate.c.4275191 .

Subjects

Gene Regulatory Networks/physiology; RNA Isoforms/metabolism; RNA, Messenger/metabolism; Algorithms; Alternative Splicing; Animals; Mice; Models, Genetic; Organ Specificity; RNA Isoforms/genetics; RNA, Messenger/genetics

Coupling dynamics and evolutionary information with structure to identify protein regulatory and functional binding sites.

Mishra, Sambit K; Kandoi, Gaurav; Jernigan, Robert L.

Proteins ; 87(10): 850-868, 2019 10.

Article in English | MEDLINE | ID: mdl-31141211

ABSTRACT

Binding sites in proteins can be either functional binding sites (active sites), which bind specific substrates with high affinity, or regulatory binding sites (allosteric sites), which modulate the activity of functional binding sites through effector molecules. Owing to their significance in determining protein function, the identification of protein functional and regulatory binding sites is widely acknowledged as an important biological problem. In this work, we present a novel binding site prediction method, Active and Regulatory site Prediction (AR-Pred), which supplements protein geometry, evolutionary, and physicochemical features with information about protein dynamics to predict putative active and allosteric site residues. As the intrinsic dynamics of globular proteins plays an essential role in controlling binding events, we find it to be an important feature for the identification of protein binding sites. We train and validate our predictive models on multiple balanced training and validation sets with random forest machine learning and obtain an ensemble of discrete models for each prediction type. Our models for active site prediction yield a median area under the curve (AUC) of 91% and a Matthews correlation coefficient (MCC) of 0.68, whereas the less well-defined allosteric sites are predicted at a lower level, with a median AUC of 80% and MCC of 0.48. When tested on an independent set of proteins, our models for active site prediction show comparable performance to two existing methods and gains compared to two others, while the allosteric site models show gains when tested against three existing prediction methods. AR-Pred is available as a free downloadable package at https://github.com/sambitmishra0628/AR-PRED_source.
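For readers unfamiliar with the Matthews correlation coefficient reported in the abstract above, the following is a minimal sketch of how MCC is computed from a binary confusion matrix. It uses the standard textbook formula; the function name and the example counts are illustrative assumptions and are not taken from AR-Pred.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard MCC for a binary confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives (hypothetical inputs here).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Illustrative counts only (not data from the paper).
example_mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
perfect_mcc = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
```

MCC ranges from -1 to 1 and uses all four cells of the confusion matrix, which makes it a common companion to AUC when summarizing binary classifiers.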

Subjects

Artificial Intelligence; Evolution, Molecular; Molecular Dynamics Simulation; Protein Conformation; Proteins/chemistry; Proteins/metabolism; Algorithms; Allosteric Regulation; Allosteric Site; Binding Sites; Databases, Protein; Humans; Machine Learning; Protein Binding

Predicting Protein Secondary Structure Using Consensus Data Mining (CDM) Based on Empirical Statistics and Evolutionary Information.

Kandoi, Gaurav; Leelananda, Sumudu P; Jernigan, Robert L; Sen, Taner Z.

Methods Mol Biol ; 1484: 35-44, 2017.

Article in English | MEDLINE | ID: mdl-27787818

ABSTRACT

Predicting the secondary structure of a protein from its sequence remains a challenging problem. Prediction accuracies remain around 80%, even across very diverse methods. The use of evolutionary information and machine learning algorithms in particular has had the most impact. In this chapter, we first define secondary structures, then review the Consensus Data Mining (CDM) technique, which is based on the robust GOR algorithm and the Fragment Database Mining (FDM) approach. GOR V is an empirical method that uses a sliding window approach to model the secondary structural elements of a protein by making use of generalized evolutionary information. FDM uses data mining from experimental structure fragments and is able to successfully predict the secondary structure of a protein by combining experimentally determined structural fragments based on sequence similarities of the fragments. The CDM method combines predictions from GOR V and FDM in a hierarchical manner to produce consensus predictions for secondary structure. In other words, if sequence fragments are not available, it uses GOR V to make the secondary structure prediction. The online server of CDM is available at http://gor.bb.iastate.edu/cdm/ .
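The hierarchical combination described in the abstract above can be illustrated with a small sketch: prefer the fragment-based (FDM) call where one exists and fall back to the GOR V call otherwise. This is not the CDM implementation. The per-residue granularity, the use of `None` to mark positions without a usable fragment, and the function name are all assumptions made for illustration.

```python
from typing import List, Optional


def consensus_prediction(fdm_calls: List[Optional[str]], gor_calls: str) -> str:
    """Combine two per-residue secondary-structure predictions.

    fdm_calls: one entry per residue, 'H', 'E' or 'C', or None where
               no fragment-based prediction is available (assumed encoding).
    gor_calls: string of 'H', 'E' or 'C' with one character per residue.
    """
    if len(fdm_calls) != len(gor_calls):
        raise ValueError("predictions must cover the same residues")
    # Hierarchy: take the FDM call when present, otherwise the GOR V call.
    return "".join(
        fdm if fdm is not None else gor
        for fdm, gor in zip(fdm_calls, gor_calls)
    )


# Hypothetical four-residue example.
combined = consensus_prediction(["H", None, "E", None], "CCCH")
```

In this toy input the second and fourth residues have no fragment-based call, so the result takes those two positions from the GOR V string.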

Subjects

Protein Structure, Secondary/genetics; Proteins/genetics; Software; Algorithms; Amino Acid Sequence/genetics; Data Mining; Proteins/chemistry; Sequence Alignment/methods

Prediction of Druggable Proteins Using Machine Learning and Systems Biology: A Mini-Review.

Kandoi, Gaurav; Acencio, Marcio L; Lemke, Ney.

Front Physiol ; 6: 366, 2015.

Article in English | MEDLINE | ID: mdl-26696900

ABSTRACT

The emergence of -omics technologies has allowed the collection of vast amounts of data on biological systems. Although the pace of such collection has been exponential, the impact of these data remains small on many critical biomedical applications such as drug development. Limited resources, high costs, and a low hit-to-lead ratio have led researchers to search for more cost-effective methodologies. A possible alternative is to incorporate computational methods of potential drug target prediction early in the drug discovery workflow. Computational methods based on systems approaches have the advantage of taking into account the global properties of a molecule, not limited to its sequence, structure, or function. Machine learning techniques are powerful tools that can extract relevant information from massive and noisy data sets. In recent years, the scientific community has explored the combined power of these fields to develop increasingly accurate and low-cost methods for proposing interesting drug targets. In this mini-review, we describe promising approaches based on the simultaneous use of systems biology and machine learning to assess gene and protein druggability. Moreover, we discuss the state of the art of this emerging and interdisciplinary field, covering data sources, algorithms, and the performance of the different methodologies. Finally, we indicate interesting avenues of research and some remaining open challenges.

HGV&TB: a comprehensive online resource on human genes and genetic variants associated with tuberculosis.

Sahajpal, Ruchika; Kandoi, Gaurav; Dhiman, Heena; Raj, Sweety; Scaria, Vinod; Bhartiya, Deeksha; Hasija, Yasha.

Database (Oxford) ; 2014: bau112, 2014.

Article in English | MEDLINE | ID: mdl-25502817

ABSTRACT

Tuberculosis (TB) is an infectious disease caused by the fastidious pathogen Mycobacterium tuberculosis. TB has emerged as one of the major causes of mortality in the developing world. The role of host genetic factors that modulate disease susceptibility has not been studied widely. Recent studies have reported a few genetic loci that provide impetus to this area of research. The availability of tools has enabled genome-wide scans for disease susceptibility loci associated with infectious diseases. Until now, information on human genetic variations and their associated genes that modulate TB susceptibility has not been systematically compiled. In this work, we have created a resource, HGV&TB, which hosts genetic variations reported to be associated with TB susceptibility in humans. It currently houses information on 307 variations in 98 genes. In total, 101 of these variations are exonic, whereas 78 fall in intronic regions. We also analysed the pathogenicity of the genetic variations, their phenotypic consequences, and ethnic origin. Using various computational analyses, 30 of the 101 exonic variations were predicted to be pathogenic. The resource is freely available at http://genome.igib.res.in/hgvtb/index.html. Using integrative analysis, we have shown that the disease-associated variants are selectively enriched in the immune signalling pathways that are crucial in the pathophysiology of TB. Database URL: http://genome.igib.res.in/hgvtb/index.html

Subjects

Genetic Predisposition to Disease; Genetic Variation; Internet; Tuberculosis/genetics; Chromosome Mapping; Databases, Genetic; Genetic Loci; Genetics, Population; Genome, Human/genetics; Humans; Phenotype; Quantitative Trait, Heritable; Software

A case for pharmacogenomics in management of cardiac arrhythmias.

Kandoi, Gaurav; Nanda, Anjali; Scaria, Vinod; Sivasubbu, Sridhar.

Indian Pacing Electrophysiol J ; 12(2): 54-64, 2012 Mar.

Article in English | MEDLINE | ID: mdl-22557843

ABSTRACT

Disorders of the cardiac rhythm are quite prevalent in clinical practice. Though the variability in drug response between individuals has been extensively studied, this information has not been widely used in clinical practice. Rapid advances in the field of pharmacogenomics have provided us with crucial insights into inter-individual genetic variability and its impact on drug metabolism and action. Technologies for faster and cheaper genetic testing, and even personal genome sequencing, would enable clinicians to optimize prescriptions based on the genetic makeup of the individual, which would open up new avenues in the area of personalized medicine. We have systematically examined literature evidence on pharmacogenomic markers for anti-arrhythmic agents from the OpenPGx consortium collection and discuss the applicability of genetics in the management of arrhythmia. We also discuss potential issues that need to be resolved before personalized pharmacogenomics becomes a reality in regular clinical practice.
