Search | BVS - Ministry of Health

Graph Theory-Based Sequence Descriptors as Remote Homology Predictors.

Agüero-Chapin, Guillermin; Galpert, Deborah; Molina-Ruiz, Reinaldo; Ancede-Gallardo, Evys; Pérez-Machado, Gisselle; de la Riva, Gustavo A; Antunes, Agostinho.

Biomolecules ; 10(1), 2019 Dec 23.

Article in English | MEDLINE | ID: mdl-31878100

ABSTRACT

Alignment-free (AF) methodologies have grown in popularity over the last decades as alternatives to alignment-based (AB) algorithms for comparative sequence analysis. They have been especially useful for detecting remote homologs within the twilight zone of highly diverse gene/protein families and superfamilies. The most popular AF methodologies, and their applications to classification problems, have been described in previous reviews. However, a new set of graph theory-derived sequence/structural descriptors, which has been gaining relevance in remote homology detection, has been left out of the AF predictors covered when the topic is addressed. Here, we first review the most popular AF approaches used for detecting homology signals within the twilight zone and then present the state-of-the-art tools that encode graph theory-derived sequence/structure descriptors, together with their success in identifying remote homologs. We also highlight the trend of integrating AF features/measures with AB ones, either within the same prediction model or by combining the predictions of different algorithms through voting/weighting strategies, to improve the detection of remote signals. Lastly, we briefly discuss the efforts made to scale up AB and AF features/measures for the comparison of multiple genomes and proteomes. Alongside the experience gained in remote homology detection with both the most popular AF tools and less known ones, we report our own experience with the graphical-numerical methodologies MARCH-INSIDE, TI2BioP, and ProtDCal. We also present a new Python-based tool (SeqDivA) with a friendly graphical user interface (GUI) for delimiting the twilight zone by using several similar criteria.
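As a rough illustration of what an alignment-free comparison can look like, the sketch below scores two sequences by the cosine similarity of their overlapping k-mer count vectors. This is a generic AF measure chosen for illustration only; it is not the method of MARCH-INSIDE, TI2BioP, ProtDCal, or SeqDivA, and the function names and the default `k = 2` are assumptions.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int = 2) -> Counter:
    """Count overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sequence shorter than k has no k-mers
    return dot / (norm_a * norm_b)


def af_similarity(s1: str, s2: str, k: int = 2) -> float:
    """Hypothetical helper: k-mer based similarity, no alignment involved."""
    return cosine_similarity(kmer_counts(s1, k), kmer_counts(s2, k))
```

No residue-by-residue alignment is computed at any point; the score depends only on which short words the two sequences share and how often.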

Subjects

Computational Biology/methods , Computer Graphics , Sequence Analysis, Protein , Sequence Homology , Amino Acid Sequence

Non-linear models based on simple topological indices to identify RNase III protein members.

Agüero-Chapin, Guillermin; de la Riva, Gustavo A; Molina-Ruiz, Reinaldo; Sánchez-Rodríguez, Aminael; Pérez-Machado, Gisselle; Vasconcelos, Vítor; Antunes, Agostinho.

J Theor Biol ; 273(1): 167-78, 2011 Mar 21.

Article in English | MEDLINE | ID: mdl-21192951

ABSTRACT

Alignment-free classifiers are especially useful in the functional classification of protein classes with variable homology and different domain structures. The Topological Indices to BioPolymers (TI2BioP) methodology (Agüero-Chapin et al., 2010), inspired by both the TOPS-MODE and the MARCH-INSIDE methodologies, allows the calculation of simple topological indices (TIs) as alignment-free classifiers. These indices were derived from the clustering of the amino acids into four classes of hydrophobicity and polarity, revealing higher sequence-order information beyond the amino acid composition level. The predictive power of such TIs was evaluated for the first time on the RNase III family, chosen for the high diversity of its members (primary sequence and domain organization). Three non-linear models were developed for RNase III class prediction: a Decision Tree Model (DTM), an Artificial Neural Network (ANN) model, and a Hidden Markov Model (HMM). The first two are alignment-free approaches that use TIs as input predictors. Their performances were compared with that of a non-classical HMM, modified according to our amino acid clustering strategy. The alignment-free models showed similar performances on the training and test sets, reaching values above 90% in the overall classification. The non-classical HMM showed the highest classification rate, with values above 95% in training and 100% in test. Despite the higher accuracy of the HMM, the DTM offered a simple RNase III classification at low computational cost. This simplicity was evaluated with respect to the HMM and ANN models in the functional annotation of a new bacterial RNase III class member, isolated and annotated by our group.
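The idea of clustering amino acids into four classes and reading off a topological index that is sensitive to sequence order can be sketched as follows. This is a minimal illustration, not the TI2BioP implementation: the four-class assignment, the lattice directions, and all names are assumptions, and the index used (the first Zagreb index, the sum of squared node degrees) is simply one well-known simple TI.

```python
from collections import Counter

# ASSUMPTION: an illustrative four-class grouping; the abstract does not
# list which residues belong to which hydrophobicity/polarity class.
CLASSES = {
    "nonpolar": "AVLIMFWPG",
    "polar": "STNQCY",
    "acidic": "DE",
    "basic": "KRH",
}
# ASSUMPTION: each class moves the walk one step in a fixed lattice direction.
STEP = {"nonpolar": (0, 1), "polar": (0, -1), "acidic": (-1, 0), "basic": (1, 0)}
AA_TO_CLASS = {aa: name for name, aas in CLASSES.items() for aa in aas}


def lattice_walk(seq: str) -> list:
    """Map a sequence to the 2D lattice points it visits, starting at (0, 0)."""
    x, y = 0, 0
    path = [(x, y)]
    for aa in seq.upper():
        dx, dy = STEP[AA_TO_CLASS[aa]]  # KeyError for non-standard residues
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


def lattice_edges(path: list) -> set:
    """Undirected edges between consecutively visited points (duplicates merged)."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def first_zagreb_index(edges: set) -> int:
    """Sum of squared node degrees of the lattice graph."""
    degree = Counter()
    for edge in edges:
        for node in edge:
            degree[node] += 1
    return sum(d * d for d in degree.values())


def sequence_index(seq: str) -> int:
    return first_zagreb_index(lattice_edges(lattice_walk(seq)))
```

Under these assumptions, `sequence_index("AKS")` and `sequence_index("ASK")` differ even though both sequences have the same amino acid composition, which is the kind of sequence-order information the abstract refers to.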

Subjects

Nonlinear Dynamics , Ribonuclease III/chemistry , Amino Acid Sequence , Decision Trees , Enzyme Assays , Escherichia coli/enzymology , Markov Chains , Molecular Sequence Data , Neural Networks, Computer , Protein Conformation , ROC Curve , Recombinant Proteins/chemistry , Recombinant Proteins/metabolism , Reproducibility of Results , Ribonuclease III/isolation & purification , Sequence Alignment

Alignment-free prediction of polygalacturonases with pseudofolding topological indices: experimental isolation from Coffea arabica and prediction of a new sequence.

Agüero-Chapin, Guillermín; Varona-Santos, Javier; de la Riva, Gustavo A; Antunes, Agostinho; González-Vlla, Tomás; Uriarte, Eugenio; González-Díaz, Humberto.

J Proteome Res ; 8(4): 2122-8, 2009 Apr.

Article in English | MEDLINE | ID: mdl-19296677

ABSTRACT

Polygalacturonases (PGs) have attracted the attention of microbiologists and of the biotechnology and pharmaceutical industries because these enzymes are relevant to phytopathogen invasion and fruit ripening and are potential antimicrobial drug targets. Numeric Topological Indices (TIs) of protein pseudofolding lattices can be used as input for classification algorithms in Quantitative Structure-Activity Relationship (QSAR) studies. However, a comparative study of different QSAR models for PGs has not been reported. In this study, we calculated for the first time two classes of TIs (Spectral moments (pik) and Entropy (thetak) values) for the Markov matrices associated with the pseudofolding lattices of 108 PGs and 100 heterogeneous non-PG proteins. Afterward, we developed different linear classifiers based on Linear Discriminant Analysis (LDA) and four types of nonlinear Artificial Neural Networks (ANN). The pik-LDA model correctly classified 98.8% of PGs and 100% of non-PGs used to train the model, as well as 98.1% of all sequences used as the external validation series. The rk-LDA model was the most accurate and/or simplest model found. In addition, we report for the first time the experimental isolation and successful prediction of a new PG sequence from Coffea arabica. This sequence was deposited in GenBank by our group with accession number GDQ336394. Models of this type are an interesting alignment-free complement to alignment-based procedures.
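To make the two TI classes more concrete, the sketch below computes spectral moments and entropy values from a Markov matrix. It assumes, without confirmation from the abstract, that the k-th spectral moment is the trace of the k-th power of the row-stochastic matrix and that the k-th entropy is the Shannon entropy of the node probability vector after k steps from a uniform start. How the matrix is built from a pseudofolding lattice is not covered here, and all function names are hypothetical.

```python
from math import log


def markov_matrix(weights: list) -> list:
    """Row-normalize a non-negative square matrix into a stochastic matrix."""
    matrix = []
    for row in weights:
        total = sum(row)
        if total == 0:
            raise ValueError("every node needs at least one outgoing weight")
        matrix.append([value / total for value in row])
    return matrix


def matmul(a: list, b: list) -> list:
    """Plain-Python product of two square matrices of equal size."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def spectral_moments(p: list, kmax: int) -> list:
    """Return [Tr(P^1), ..., Tr(P^kmax)] (assumed definition of pi_k)."""
    moments, power = [], p
    for _ in range(kmax):
        moments.append(sum(power[i][i] for i in range(len(p))))
        power = matmul(power, p)
    return moments


def entropies(p: list, kmax: int) -> list:
    """Shannon entropy of the k-step distribution (assumed definition of theta_k)."""
    n = len(p)
    dist = [1.0 / n] * n  # ASSUMPTION: uniform initial probabilities
    values = []
    for _ in range(kmax):
        dist = [sum(dist[i] * p[i][j] for i in range(n)) for j in range(n)]
        values.append(-sum(q * log(q) for q in dist if q > 0))
    return values
```

Each protein would then be represented by a short vector such as `spectral_moments(P, 5) + entropies(P, 5)`, which a classifier like LDA or an ANN could take as input.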

Subjects

Coffea/enzymology , Computer Simulation , Plant Proteins/metabolism , Polygalacturonase/metabolism , Protein Folding , Polygalacturonase/isolation & purification , Quantitative Structure-Activity Relationship
