Results 1 - 3 of 3
1.
Methods Mol Biol ; 1484: 275-300, 2017.
Article in English | MEDLINE | ID: mdl-27787833

ABSTRACT

Here, we present two perspectives on the task of predicting post-translational modifications (PTMs) from local sequence fragments using machine learning algorithms. The first is a description of the fundamental steps required to construct a PTM predictor from scratch. These steps include data gathering, feature extraction, and machine-learning classifier selection. The second part of our work contains a detailed discussion of more advanced problems encountered in the PTM prediction task. Probably the most challenging issues covered here are: (1) how to address the class imbalance problem in training data (we also present statistics describing the problem); (2) how to properly set up cross-validation folds so that they take into account the homology of protein data records, for which we present our folds-over-clusters algorithm; and (3) how to efficiently reach for new sources of learning features. The presented techniques and notes result from intensive studies in the field, performed by our group and others, and can be useful both for researchers beginning in the field of PTM prediction and for those who want to extend the repertoire of their research techniques.


Subjects
Computational Biology/methods , Protein Processing, Post-Translational/genetics , Proteins/chemistry , Software , Algorithms , Amino Acid Sequence/genetics , Machine Learning , Proteins/genetics , Sequence Analysis, Protein
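
The homology-aware cross-validation mentioned in the abstract above can be illustrated with a small sketch. The abstract does not describe the folds-over-clusters algorithm itself, so the code below is not the authors' method. It only shows the general idea under two assumptions: each record already carries a homology cluster label (for example from a sequence clustering tool), and whole clusters are assigned greedily to the currently smallest fold. The function name `cluster_folds` and the toy labels are hypothetical.

```python
from collections import defaultdict


def cluster_folds(cluster_ids, n_folds):
    """Split record indices into folds without splitting any cluster.

    cluster_ids: one cluster label per record (assumed to be precomputed).
    Returns a list of n_folds lists of record indices.
    """
    # Group record indices by their cluster label.
    clusters = defaultdict(list)
    for index, cluster_id in enumerate(cluster_ids):
        clusters[cluster_id].append(index)

    folds = [[] for _ in range(n_folds)]
    # Place the largest clusters first; each goes to the smallest fold so far.
    ordered = sorted(clusters, key=lambda c: (-len(clusters[c]), str(c)))
    for cluster_id in ordered:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(clusters[cluster_id])
    return folds


# Toy example: eight records belonging to four homology clusters.
labels = ["a", "a", "a", "b", "b", "c", "d", "d"]
folds = cluster_folds(labels, n_folds=2)
```

Because homologous records always stay in the same fold, a classifier is never tested on a sequence that is a close relative of one it was trained on, which is the leakage that record-level random splitting would allow.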
2.
Mol Reprod Dev ; 83(2): 144-8, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26660717

ABSTRACT

Glyceraldehyde-3-phosphate dehydrogenase from human sperm (GAPDHS) provides energy to the sperm flagellum, and is therefore essential for sperm motility and male fertility. This isoform is distinct from somatic GAPDH, not only in being specific to the testis but also in containing an additional amino-terminal region with a proline-rich motif that is known to bind to the fibrous sheath of the sperm tail. By conducting a large-scale sequence comparison on low-complexity sequences available in databases, we identified a strong similarity between the proline-rich motif from GAPDHS and the proline-rich sequence from Ena/vasodilator-stimulated phosphoprotein-like (EVL), which is known to bind an SH3 domain of dynamin-binding protein (DNMBP). The putative binding partners of the proline-rich GAPDHS motif include SH3 domain-binding protein 4 (SH3BP4) and the IL2-inducible T-cell kinase/tyrosine-protein kinase ITK/TSK (ITK). This result implies that GAPDHS participates in specific signal-transduction pathways. Gene Ontology category-enrichment analysis showed several functional classes shared by both proteins, of which the most interesting ones are related to signal transduction and regulation of hydrolysis. Furthermore, a mutation of one EVL proline to leucine is known to cause colorectal cancer, suggesting that mutation of the homologous amino acid residue in the GAPDHS motif may be functionally deleterious.


Subjects
Glyceraldehyde-3-Phosphate Dehydrogenase (Phosphorylating) , Mutation, Missense , Sperm Tail/enzymology , Adaptor Proteins, Signal Transducing/genetics , Adaptor Proteins, Signal Transducing/metabolism , Amino Acid Substitution , Cell Adhesion Molecules/genetics , Cell Adhesion Molecules/metabolism , Glyceraldehyde-3-Phosphate Dehydrogenase (Phosphorylating)/genetics , Glyceraldehyde-3-Phosphate Dehydrogenase (Phosphorylating)/metabolism , Humans , Leucine/genetics , Leucine/metabolism , Male , Proline/genetics , Proline/metabolism , Protein-Tyrosine Kinases/genetics , Protein-Tyrosine Kinases/metabolism , Signal Transduction , src Homology Domains/genetics
3.
PeerJ ; 3: e1041, 2015.
Article in English | MEDLINE | ID: mdl-26157620

ABSTRACT

Accurate identification of protein-protein interactions (PPI) is the key step in understanding proteins' biological functions, which are typically context-dependent. Many existing PPI predictors rely on aggregated features from protein sequences; however, only a few methods exploit local information about specific residue contacts. In this work we present a two-stage machine learning approach for the prediction of protein-protein interactions. We start with carefully filtered data on protein complexes available for Saccharomyces cerevisiae in the Protein Data Bank (PDB). First, we build linear descriptions of interacting and non-interacting sequence segment pairs based on their inter-residue distances. Second, we train machine learning classifiers to predict binary segment interactions for any two short sequence fragments. The final prediction of the protein-protein interaction is made using a 2D matrix representation of all-against-all possible interacting sequence segments of both analysed proteins. The level-I predictor achieves 0.88 AUC for micro-scale, i.e., residue-level, prediction. The level-II predictor improves the results further through a more complex learning paradigm. We perform a 30-fold macro-scale, i.e., protein-level, cross-validation experiment. The level-II predictor using PSIPRED-predicted secondary structure reaches 0.70 precision, 0.68 recall, and 0.70 AUC, whereas other popular methods provide results below the 0.6 threshold (recall, precision, AUC). Our results demonstrate that a multi-scale sequence feature aggregation procedure is able to improve the machine learning results by more than 10% as compared to other sequence representations. Prepared datasets and source code for our experimental pipeline are freely available for download from: http://zubekj.github.io/mlppi/ (open source Python implementation, OS independent).
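
The 2D matrix of all-against-all segment pairs described in the abstract above can be sketched as follows. This is only an illustration of the data structure, not the published pipeline: the window width, the function names `segments` and `interaction_matrix`, and especially `toy_score` are assumptions. In the paper a trained level-I classifier would produce the per-pair score; here a crude opposite-charge count stands in for it.

```python
def segments(sequence, width):
    """Return all overlapping windows of the given width."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


def interaction_matrix(seq_a, seq_b, width, score_pair):
    """Score every segment of seq_a against every segment of seq_b.

    Rows correspond to segments of seq_a, columns to segments of seq_b.
    """
    rows = segments(seq_a, width)
    cols = segments(seq_b, width)
    return [[score_pair(a, b) for b in cols] for a in rows]


def toy_score(seg_a, seg_b):
    """Placeholder score (an assumption, not the paper's classifier).

    Fraction of aligned positions pairing a positively charged residue
    (K, R) with a negatively charged one (D, E).
    """
    positive, negative = set("KR"), set("DE")
    hits = sum(
        (x in positive and y in negative) or (x in negative and y in positive)
        for x, y in zip(seg_a, seg_b)
    )
    return hits / len(seg_a)


# Two short hypothetical sequences, scored with windows of three residues.
matrix = interaction_matrix("MKRDE", "AEDKL", width=3, score_pair=toy_score)
```

A second-stage (level-II) model would then take features derived from such a matrix as input to decide whether the two proteins interact; that stage is not reproduced here.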
