1.
BMC Bioinformatics ; 12: 412, 2011 Oct 25.
Article in English | MEDLINE | ID: mdl-22026913

ABSTRACT

BACKGROUND: Machine learning methods are now used for many biological prediction problems involving drugs, ligands or polypeptide segments of a protein. To build a prediction model, a so-called training data set of molecules with measured target properties is needed. For many such problems the size of the training data set is limited because measurements have to be performed in a wet lab. Furthermore, the problems considered are often complex, so it is not clear which molecular descriptors (features) are suitable for establishing a strong correlation with the target property. In many applications all available descriptors are used. This can lead to difficult machine learning problems when thousands of descriptors are considered and only a few (e.g. fewer than one hundred) molecules are available for training. RESULTS: The CoEPrA contest provides four data sets that are typical of biological regression problems (few molecules in the training data set and thousands of descriptors). We applied the same two-step training procedure to all four regression tasks. In the first stage, we used optimized L1 regularization to select the most relevant features, reducing the initial set of more than 6,000 features to about 50. In the second stage, we used only the features selected in the first stage and applied a milder L2 regularization, which generally yielded a further improvement in prediction performance. Our linear model employed a soft loss function that minimizes the influence of outliers. CONCLUSIONS: The proposed two-step method showed good results on all four CoEPrA regression tasks. Thus, it may be useful for many other biological prediction problems in which only a small number of molecules, described by thousands of descriptors, are available for training.


Subjects
Artificial Intelligence ; Computational Biology/methods ; Animals ; Databases, Genetic ; Humans ; Internet ; Principal Component Analysis ; Regression Analysis
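
The two-step procedure described in record 1 (L1 regularization to select features, then a milder L2 regularization on the selected features) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain squared loss instead of the soft loss function mentioned in the abstract, the regularization strengths `alpha` and `lam` are arbitrary rather than optimized, and the data are synthetic.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1/2n)*||y - Xw||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0) / n_samples
    residual = y - X @ w
    for _ in range(n_iter):
        for j in range(n_features):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = X[:, j] @ (residual + X[:, j] * w[j]) / n_samples
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            residual += X[:, j] * (w[j] - new_wj)
            w[j] = new_wj
    return w


def ridge_fit(X, y, lam):
    """Closed-form minimizer of ||y - Xw||^2 + lam*||w||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def two_stage_fit(X, y, alpha, lam):
    """Stage 1: L1 selects features. Stage 2: L2 refit on the selected ones."""
    w_l1 = lasso_coordinate_descent(X, y, alpha)
    selected = np.flatnonzero(w_l1)
    w_l2 = ridge_fit(X[:, selected], y, lam)
    return selected, w_l2


# Synthetic stand-in for "few molecules, many descriptors" (assumed data).
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 300))
true_w = np.zeros(300)
true_w[[3, 50, 120]] = [2.0, -1.5, 1.0]
y = X @ true_w + 0.05 * rng.standard_normal(40)

selected, weights = two_stage_fit(X, y, alpha=0.3, lam=1.0)
print(f"{selected.size} of {X.shape[1]} descriptors kept")
```

In practice both regularization strengths would be chosen by cross-validation, and a robust loss (for example a Huber-type loss) would replace the squared loss to limit the influence of outliers.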
2.
Bioinformatics ; 26(5): 603-9, 2010 Mar 01.
Article in English | MEDLINE | ID: mdl-20097914

ABSTRACT

MOTIVATION: In silico methods that classify compounds as potential drugs binding to a specific target are becoming increasingly important for drug design. To build classification devices, training sets of drugs with known activities are needed. For many such classification problems, not only qualitative but also quantitative information on a specific property (e.g. binding affinity) is available. The latter can be used to build a regression scheme to predict this property for new compounds. Predicting a compound property explicitly is generally more difficult than classifying whether the property lies below or above a given threshold value. Hence, an indirect classification that is based on regression may lead to poorer results than a direct classification scheme. In fact, researchers are initially interested only in classifying compounds as potential drugs. The activities of these compounds are subsequently measured in a wet lab. RESULTS: We propose a novel approach that uses available quantitative information directly for classification rather than first using a regression scheme. It uses a new type of loss function called weighted biased regression. Application of this method to four widely studied datasets of the CoEPrA contest (Comparative Evaluation of Prediction Algorithms, http://coepra.org) shows that it can outperform simple classification methods that do not make use of this additional quantitative information. AVAILABILITY: A standalone application, which can be used to build a model for a peptide training set to be submitted, is available at the webpage http://agknapp.chemie.fu-berlin.de/agknapp/index.php?menu=software&page=PeptideClassifier.


Subjects
Algorithms ; Peptides/chemistry ; Binding Sites ; Databases, Factual ; Drug Design ; Ligands ; Quantitative Structure-Activity Relationship ; Regression Analysis
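
For orientation, the indirect scheme that record 2 argues against (fit a regression, then threshold the predicted property) can be sketched as below. This shows only that baseline; the abstract does not give the form of the weighted biased regression loss, so it is not reproduced here. The function names, the toy data and the threshold are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept column appended."""
    X1 = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def classify_via_regression(X, coef, threshold):
    """Indirect classification: predict the property, then compare to a threshold."""
    X1 = np.column_stack([X, np.ones(len(X))])
    predicted = X1 @ coef
    return predicted >= threshold


# Toy data (assumed): one descriptor, property y = 2*x + 1.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])
coef = fit_linear_regression(X_train, y_train)

X_new = np.array([[0.5], [2.5]])
labels = classify_via_regression(X_new, coef, threshold=4.0)
print(labels)
```

The regression step spends model capacity on predicting exact property values, although only the side of the threshold matters for the final label, which is the weakness the abstract points out.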
3.
J Comput Aided Mol Des ; 25(12): 1121-33, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22101402

ABSTRACT

In silico methods characterizing molecular compounds with respect to pharmacologically relevant properties can accelerate the identification of new drugs and reduce their development costs. Quantitative structure-activity/-property relationships (QSAR/QSPR) correlate the structure and physico-chemical properties of molecular compounds with a specific functional activity/property under study. Typically a large number of molecular features are generated for the compounds. In many cases the number of generated features exceeds the number of molecular compounds with known property values that are available for learning. Machine learning methods tend to overfit the training data in such situations, i.e. the method adjusts to very specific features of the training data that are not characteristic of the property considered. This problem can be alleviated by diminishing the influence of unimportant, redundant or even misleading features. A better strategy is to eliminate such features completely. Ideally, a molecular property can be described by a small number of features that are chemically interpretable. The purpose of the present contribution is to provide a predictive modeling approach that combines feature generation, feature selection, model building and control of overtraining in a single application called DemQSAR. DemQSAR is used to predict human volume of distribution (VD(ss)) and human clearance (CL). To control overtraining, quadratic and linear regularization terms were employed. A recursive feature selection approach is used to reduce the number of descriptors. The prediction performance is as good as the best predictions reported in the recent literature. The example presented here demonstrates that DemQSAR can generate a model that uses very few features while maintaining high predictive power.
A standalone DemQSAR Java application for model building of any user defined property as well as a web interface for the prediction of human VD(ss) and CL is available on the webpage of DemPRED: http://agknapp.chemie.fu-berlin.de/dempred/ .


Subjects
Pharmaceutical Preparations/chemistry ; Pharmacokinetics ; Quantitative Structure-Activity Relationship ; Artificial Intelligence ; Humans ; Metabolic Clearance Rate ; Models, Biological
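
The idea of recursive feature selection under regularization, as mentioned in record 3, can be sketched as follows. This is not DemQSAR's algorithm: it assumes only a quadratic (ridge) penalty, drops one feature per round by smallest absolute weight, and runs on synthetic data with standardized-scale descriptors. All names and parameter values are hypothetical.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form minimizer of ||y - Xw||^2 + lam*||w||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


def recursive_feature_elimination(X, y, n_keep, lam=1.0):
    """Repeatedly refit and drop the feature with the smallest |weight|."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        w = ridge_fit(X[:, active], y, lam)
        weakest = int(np.argmin(np.abs(w)))
        del active[weakest]
    return active, ridge_fit(X[:, active], y, lam)


# Synthetic data (assumed): 20 descriptors, only three carry signal.
rng = np.random.default_rng(1)
X = rng.standard_normal((80, 20))
true_w = np.zeros(20)
true_w[[2, 7, 11]] = [3.0, -2.0, 1.5]
y = X @ true_w + 0.1 * rng.standard_normal(80)

kept, weights = recursive_feature_elimination(X, y, n_keep=3)
print(kept, weights)
```

Ranking by absolute weight is only meaningful when descriptors share a comparable scale, so real descriptors would be standardized first, and the number of features to keep would be chosen by validation rather than fixed in advance.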
4.
Chemphyschem ; 11(6): 1196-206, 2010 Apr 26.
Article in English | MEDLINE | ID: mdl-20411561

ABSTRACT

Haehnel et al. synthesized 399 different artificial cytochrome b (aCb) models. They consist of a template-assisted four-helix bundle with one embedded heme group. Their redox potentials were measured and cover the range from -148 to -89 mV. No crystal structures of these aCb are available. Therefore, we use the chemical composition and general structural principles to generate atomic coordinates of 31 of these aCb mutants, which are chosen to cover the whole interval of redox potentials. We start by modeling the coordinates of one aCb from scratch. Its structure remains stable after energy minimization and during molecular dynamics simulation over 2 ns. Based on this structure, coordinates of the other 30 aCb mutants are modeled. The calculated redox potentials for these 31 aCb agree with the experimental values to within 10 mV in terms of root mean square deviation. Analysis of the dependence of heme redox potential on protein environment shows that the shifts in redox potentials relative to the model systems in water are due to the low-dielectric medium of the protein and the protonation states of the heme propionic acid groups, which are influenced by the surrounding amino acids. Alternatively, we perform a blind prediction of the same redox potentials using an empirical approach based on a linear scoring function and reach a similar accuracy. Both methods are useful for understanding and predicting heme redox potentials. Based on the modeled structure we can understand the detailed structural differences between aCb mutants that give rise to shifts in heme redox potential. On the other hand, one can explore the correlation between sequence variations and aCb redox potentials more directly and on a much larger scale using the empirical prediction scheme, which, thanks to its simplicity, is much faster.


Subjects
Coenzymes/chemistry ; Cytochromes b/chemistry ; Amino Acid Sequence ; Amino Acid Substitution ; Heme/chemistry ; Molecular Dynamics Simulation ; Molecular Sequence Data ; Mutation ; Oxidation-Reduction ; Protein Structure, Secondary ; Protein Structure, Tertiary ; Static Electricity
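
Record 4 reports agreement between calculated and measured redox potentials as a root mean square deviation (RMSD). A minimal sketch of that metric follows. The three potential values are invented for illustration and are not data from the study.

```python
import math


def rmsd(calculated, experimental):
    """Root mean square deviation between paired values (same units)."""
    if len(calculated) != len(experimental):
        raise ValueError("inputs must have the same length")
    squared = [(c - e) ** 2 for c, e in zip(calculated, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical redox potentials in mV (not taken from the paper).
calculated_mv = [-140.0, -120.0, -100.0]
experimental_mv = [-145.0, -118.0, -95.0]

deviation_mv = rmsd(calculated_mv, experimental_mv)
print(f"RMSD = {deviation_mv:.2f} mV")
```

Because the measured potentials span only about 60 mV (from -148 to -89 mV), an RMSD should be read relative to that range rather than in isolation.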
5.
Genome Inform ; 24: 21-30, 2010.
Article in English | MEDLINE | ID: mdl-22081586

ABSTRACT

Protein-protein interactions play an important role in many cellular processes. However, experimental determination of the structure of a protein complex is difficult and time-consuming. Hence, there is a need for fast and accurate in silico protein docking methods. These methods generally consist of two stages: (i) a sampling algorithm that generates a large number of candidate complex geometries (decoys), and (ii) a scoring function that ranks these decoys such that near-native decoys are ranked higher than other decoys. We have recently developed a neural network based scoring function that performed better than other state-of-the-art scoring functions on a benchmark of 65 protein complexes. Here, we use similar ideas to develop a method that is based on linear scoring functions. We compare the linear scoring function of the present study with other knowledge-based scoring functions such as ZDOCK 3.0, ZRANK and the previously developed neural network. Despite its simplicity, the linear scoring function performs as well as the state-of-the-art methods it is compared with, and its predictions are simple and rapid to compute.


Subjects
Protein Interaction Mapping/methods ; Proteins/chemistry ; Software ; Algorithms ; Computational Biology/methods ; Databases, Protein ; Molecular Docking Simulation ; Neural Networks, Computer ; Programming Languages ; Reproducibility of Results
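
The second docking stage described in record 5, ranking decoys with a linear scoring function, amounts to a weighted sum of per-decoy features followed by a sort. The sketch below assumes three unnamed features per decoy and hand-picked weights. The abstract does not specify the actual features or how the weights were trained, so every value here is hypothetical.

```python
import numpy as np


def rank_decoys(features, weights):
    """Score each decoy as a weighted feature sum and sort best-first."""
    scores = features @ weights
    order = np.argsort(-scores)  # descending: highest score ranks first
    return order, scores


# Hypothetical feature matrix: one row per decoy, one column per feature.
features = np.array([
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.5],
    [0.6, 0.3, 0.1],
])
# Hypothetical weights: reward the first feature, penalize the other two.
weights = np.array([1.0, -0.5, -1.0])

order, scores = rank_decoys(features, weights)
print(order, scores)
```

Scoring reduces to one matrix-vector product, which is why a linear function stays fast even when the sampling stage produces a very large number of decoys.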