1.
IEEE Trans Pattern Anal Mach Intell ; 43(9): 3055-3066, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33539291

ABSTRACT

Automated machine learning (AutoML) seeks to automatically find so-called machine learning pipelines that maximize the prediction performance when being used to train a model on a given dataset. One of the main and yet open challenges in AutoML is an effective use of computational resources: An AutoML process involves the evaluation of many candidate pipelines, which are costly but often ineffective because they are canceled due to a timeout. In this paper, we present an approach to predict the runtime of two-step machine learning pipelines with up to one pre-processor, which can be used to anticipate whether or not a pipeline will time out. Separate runtime models are trained offline for each algorithm that may be used in a pipeline, and an overall prediction is derived from these models. We empirically show that the approach increases successful evaluations made by an AutoML tool while preserving or even improving on the previously best solutions.

2.
IEEE Trans Pattern Anal Mach Intell ; 43(9): 3037-3054, 2021 Sep.
Article in English | MEDLINE | ID: mdl-33439834

ABSTRACT

Automated machine learning (AutoML) supports the algorithmic construction and data-specific customization of machine learning pipelines, including the selection, combination, and parametrization of machine learning algorithms as main constituents. Generally speaking, AutoML approaches comprise two major components: a search space model and an optimizer for traversing the space. Recent approaches have shown impressive results in the realm of supervised learning, most notably (single-label) classification (SLC). Moreover, first attempts at extending these approaches towards multi-label classification (MLC) have been made. While the space of candidate pipelines is already huge in SLC, the complexity of the search space is raised to an even higher power in MLC. One may wonder, therefore, whether and to what extent optimizers established for SLC can scale to this increased complexity, and how they compare to each other. This paper makes the following contributions: First, we survey existing approaches to AutoML for MLC. Second, we augment these approaches with optimizers not previously tried for MLC. Third, we propose a benchmarking framework that supports a fair and systematic comparison. Fourth, we conduct an extensive experimental study, evaluating the methods on a suite of MLC problems. We find a grammar-based best-first search to compare favorably to other optimizers.

3.
BioData Min ; 9: 10, 2016.
Article in English | MEDLINE | ID: mdl-26933450

ABSTRACT

BACKGROUND: Antiretroviral therapy is essential for human immunodeficiency virus (HIV) infected patients to inhibit viral replication and therewith to slow progression of disease and prolong a patient's life. However, the high mutation rate of HIV can lead to a fast adaptation of the virus under drug pressure and thereby to the evolution of resistant variants. In turn, these variants will lead to the failure of antiretroviral treatment. Moreover, these mutations can lead not only to resistance against single drugs, but also to cross-resistance, i.e., resistance against drugs that have not yet been applied. METHODS: 662 protease sequences and 715 reverse transcriptase sequences with complete resistance profiles were analyzed using machine learning techniques, namely binary relevance classifiers, classifier chains, and ensembles of classifier chains. RESULTS: In our study, we applied multi-label classification models incorporating cross-resistance information to predict drug resistance for two of the major drug classes used in antiretroviral therapy for HIV-1, namely protease inhibitors (PIs) and non-nucleoside reverse transcriptase inhibitors (NNRTIs). By means of multi-label learning, namely classifier chains (CCs) and ensembles of classifier chains (ECCs), we were able to improve overall prediction accuracy for all drugs compared to hitherto applied binary classification models. CONCLUSIONS: The development of fast and precise models to predict drug resistance in HIV-1 is highly important to enable a highly effective personalized therapy. Cross-resistance information can be exploited to improve prediction accuracy of computational drug resistance models.

4.
J Bioinform Comput Biol ; 12(1): 1350016, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24467755

ABSTRACT

Predicting the sub-cellular localization of proteins is an important task in bioinformatics, for which many standard prediction tools are available. While these tools are powerful in general and capable of predicting protein localization for the most common compartments, their performance strongly depends on the organism of interest. More importantly, there are special compartments, such as the apicoplast of apicomplexan parasites, for which these tools cannot provide a prediction at all. In the absence of a highly conserved targeting signal, even motif searches may not be able to provide a lead for the accurate prediction of protein localization for a compartment of interest. In order to approach difficult cases of that kind, we propose an alternative method that complements existing approaches by using a more targeted protein sequence model. Moreover, our method makes use of (weighted) measures for time series comparison. To demonstrate its performance, we use this method for predicting localization in special compartments of three different species, for which existing methods yield only sub-optimal results. As shown experimentally, our method is indeed capable of producing reliable predictions of sub-cellular localization for difficult cases, i.e. if training data is scarce and a potential protein targeting signal may not be well conserved.


Subjects
Computational Biology/methods, Proteins/metabolism, Protein Sequence Analysis/methods, Apicoplasts/metabolism, Plasmodium falciparum/metabolism, Plastids/metabolism, Protein Sorting Signals, Subcellular Fractions, Toxoplasma/metabolism
6.
J Clin Epidemiol ; 67(2): 124-32, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24290150

ABSTRACT

OBJECTIVES: The classical diagnostic cross-sectional study has a focus on one disease only. Generalist clinicians, however, are confronted with a wide range of diagnoses. We propose the "comprehensive diagnostic study design" to evaluate diagnostic tests regarding more than one disease outcome. STUDY DESIGN AND SETTING: We present the secondary analysis of a data set obtained from patients presenting with chest pain in primary care. Participating clinicians recorded 42 items of the history and physical examination. Diagnostic outcomes were reviewed by an independent panel after 6-month follow-up (n = 710 complete cases). We used Shannon entropy as a measure of uncertainty before and after testing. Four different analytical strategies modeling specific clinical ways of reasoning were evaluated. RESULTS: Although the "global entropy" strategy reduced entropy most, it is unlikely to be of clinical use because of its complexity. "Inductive" and "fixed-set" strategies turned out to be efficient requiring a small amount of data only. The "deductive" procedure resulted in the smallest reduction of entropy. CONCLUSION: We suggest that the comprehensive diagnostic study design is a feasible and valid option to improve our understanding of the diagnostic process. It is also promising as a justification for clinical recommendations.
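The idea of measuring uncertainty before and after testing with Shannon entropy can be sketched as follows. This is a minimal illustration only, not the study's analysis: the function names, the dictionary-based data layout, and all probabilities in the usage note are assumptions made for the example.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_posterior_entropy(prior, likelihoods):
    """Expected entropy over diagnoses after observing one finding.

    prior:       dict mapping diagnosis -> P(diagnosis)
    likelihoods: dict mapping diagnosis -> dict of outcome -> P(outcome | diagnosis)
    Both structures are hypothetical; the abstract does not describe a data format.
    """
    outcomes = {o for per_dx in likelihoods.values() for o in per_dx}
    total = 0.0
    for outcome in outcomes:
        # Joint probability of each diagnosis together with this outcome.
        joint = {d: prior[d] * likelihoods[d].get(outcome, 0.0) for d in prior}
        p_outcome = sum(joint.values())
        if p_outcome == 0:
            continue
        # Bayes' rule gives the posterior over diagnoses for this outcome.
        posterior = [j / p_outcome for j in joint.values()]
        total += p_outcome * shannon_entropy(posterior)
    return total


def entropy_reduction(prior, likelihoods):
    """Expected drop in uncertainty (bits) from observing the finding."""
    return shannon_entropy(prior.values()) - expected_posterior_entropy(prior, likelihoods)
```

With made-up numbers, a uniform prior over four diagnoses carries 2 bits of uncertainty, and a finding that is equally likely under every diagnosis reduces it by nothing. A finding that separates the diagnoses perfectly would remove all of it.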


Subjects
Clinical Decision Support Systems, Routine Diagnostic Tests/methods, Routine Diagnostic Tests/standards, Chest Pain/diagnosis, Cross-Sectional Studies, Humans, Physical Examination, Primary Health Care/methods, Uncertainty
7.
Article in English | MEDLINE | ID: mdl-26356860

ABSTRACT

To calculate similarities between molecular structures, measures based on the maximum common subgraph are frequently applied. For the comparison of protein binding sites, these measures are not fully appropriate since graphs representing binding sites on a detailed atomic level tend to get very large. In combination with an NP-hard problem, a large graph leads to a computationally demanding task. Therefore, for the comparison of binding sites, a less detailed coarse graph model is used building upon so-called pseudocenters. Consistently, a loss of structural data is caused since many atoms are discarded and no information about the shape of the binding site is considered. This is usually resolved by performing subsequent calculations based on additional information. These steps are usually quite expensive, making the whole approach very slow. The main drawback of a graph-based model solely based on pseudocenters, however, is the loss of information about the shape of the protein surface. In this study, we propose a novel and efficient modeling formalism that does not increase the size of the graph model compared to the original approach, but leads to graphs containing considerably more information assigned to the nodes. More specifically, additional descriptors considering surface characteristics are extracted from the local surface and attributed to the pseudocenters stored in Cavbase. These properties are evaluated as additional node labels, which lead to a gain of information and allow for much faster but still very accurate comparisons between different structures.


Subjects
Binding Sites, Computational Biology/methods, Molecular Models, Protein Binding, Proteins/chemistry, Proteins/metabolism, Algorithms, Protein Databases
8.
Article in English | MEDLINE | ID: mdl-26357052

ABSTRACT

Enzyme sequences and structures are routinely used in the biological sciences as queries to search for functionally related enzymes in online databases. To this end, one usually departs from some notion of similarity, comparing two enzymes by looking for correspondences in their sequences, structures or surfaces. For a given query, the search operation results in a ranking of the enzymes in the database, from very similar to dissimilar enzymes, while information about the biological function of annotated database enzymes is ignored. In this work, we show that rankings of that kind can be substantially improved by applying kernel-based learning algorithms. This approach enables the detection of statistical dependencies between similarities of the active cleft and the biological function of annotated enzymes. This is in contrast to search-based approaches, which do not take annotated training data into account. Similarity measures based on the active cleft are known to outperform sequence-based or structure-based measures under certain conditions. We consider the Enzyme Commission (EC) classification hierarchy for obtaining annotated enzymes during the training phase. The results of a set of sizeable experiments indicate a consistent and significant improvement for a set of similarity measures that exploit information about small cavities in the surface of enzymes.


Subjects
Binding Sites, Computational Biology/methods, Enzymes/chemistry, Machine Learning, Algorithms, Protein Databases, Protein Conformation
9.
Z Evid Fortbild Qual Gesundhwes ; 107(9-10): 585-91, 2013.
Article in German | MEDLINE | ID: mdl-24315328

ABSTRACT

In a primary care setting the diagnostic process typically starts with a symptom or sign reported by the patient. Primary care physicians face the challenge to consider a broad spectrum of possible aetiologies or differential diagnoses when choosing appropriate diagnostic tests. The classical diagnostic cross-sectional study investigates the accuracy of a diagnostic test or a combination of several tests in regard to just one target disease. The complexity facing the clinician remains unconsidered or is being split and presented in several parts which the clinician has to combine. In this paper we suggest a design for diagnostic studies that considers the requirements of diagnosis in primary care more comprehensively: the comprehensive diagnostic study. The essential characteristic of the design is the simultaneous consideration of the whole spectrum of relevant aetiologies when evaluating several diagnostic tests. We present single characteristics and specific features of this design in regard to research question, study sampling, index test, reference standard and analysis, and illustrate them using the example of a study investigating chest pain in primary care.


Subjects
Chest Pain/etiology, Primary Health Care, Artificial Intelligence, Cross-Sectional Studies, Differential Diagnosis, Routine Diagnostic Tests, Evidence-Based Medicine, Germany, Humans, Information Theory, Medical History Taking, Physical Examination, Predictive Value of Tests, Research Design
10.
BMC Fam Pract ; 14: 154, 2013 Oct 18.
Article in English | MEDLINE | ID: mdl-24138299

ABSTRACT

BACKGROUND: Chest pain is a common complaint and reason for consultation in primary care. Traditional textbooks still assign pain localization a certain discriminative role in the differential diagnosis of chest pain. The aim of our study was to synthesize pain drawings from a large sample of chest pain patients and to examine whether pain localizations differ for different underlying etiologies. METHODS: We conducted a cross-sectional study including 1212 consecutive patients with chest pain recruited in 74 primary care offices in Germany. Primary care providers (PCPs) marked pain localization and radiation of each patient on a pictogram. After 6 months, an independent interdisciplinary reference panel reviewed clinical data of every patient, deciding on the etiology of chest pain at the time of patient recruitment. PCP drawings were entered in a specially designed computer program to produce merged pain charts for different etiologies. Dissimilarities between individual pain localizations and differences on the level of diagnostic groups were analyzed using the Hausdorff distance and the C-index. RESULTS: Pain location in patients with coronary heart disease (CHD) did not differ from the combined group of all other patients, including patients with chest wall syndrome (CWS), gastro-esophageal reflux disease (GERD) or psychogenic chest pain. There was also no difference in chest pain location between male and female CHD patients. CONCLUSIONS: Pain localization is not helpful in discriminating CHD from other common chest pain etiologies.
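The Hausdorff distance named in the methods can be illustrated with a short sketch. It assumes, purely for the example, that each pain drawing is reduced to a finite set of 2D points; the abstract does not say how the drawings were encoded, and the function names here are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Two illustrative "drawings" as point sets (made-up coordinates).
drawing_1 = [(0.0, 0.0), (1.0, 0.0)]
drawing_2 = [(0.0, 0.0), (4.0, 0.0)]
distance = hausdorff(drawing_1, drawing_2)  # 3.0: the point (4, 0) is 3 away from drawing_1
```

The measure is driven by the single worst-matched point, so two drawings that overlap almost everywhere still score as far apart if one of them contains an isolated mark.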


Subjects
Chest Pain/physiopathology, Coronary Disease/diagnosis, Gastroesophageal Reflux/diagnosis, Psychophysiologic Disorders/diagnosis, Tietze's Syndrome/diagnosis, Chest Pain/etiology, Cohort Studies, Coronary Disease/complications, Cross-Sectional Studies, Differential Diagnosis, Female, Gastroesophageal Reflux/complications, Humans, Hypertension/complications, Hypertension/diagnosis, Male, Physical Examination, Pleuropneumonia/complications, Pleuropneumonia/diagnosis, Primary Health Care, Psychophysiologic Disorders/complications, Respiratory Tract Infections/complications, Respiratory Tract Infections/diagnosis, Stomach Diseases/complications, Stomach Diseases/diagnosis, Thoracic Injuries/complications, Thoracic Injuries/diagnosis, Tietze's Syndrome/complications
11.
Bioinformatics ; 29(16): 1946-52, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23793752

ABSTRACT

MOTIVATION: Antiretroviral treatment regimens can sufficiently suppress viral replication in human immunodeficiency virus (HIV)-infected patients and prevent the progression of the disease. However, one of the factors contributing to the progression of the disease despite ongoing antiretroviral treatment is the emergence of drug resistance. The high mutation rate of HIV can lead to a fast adaptation of the virus under drug pressure, thus to failure of antiretroviral treatment due to the evolution of drug-resistant variants. Moreover, cross-resistance phenomena have been frequently found in HIV-1, leading to resistance not only against a drug from the current treatment, but also to other not yet applied drugs. Automatic classification and prediction of drug resistance is increasingly important in HIV research as well as in clinical settings, and to this end, machine learning techniques have been widely applied. Nevertheless, cross-resistance information has not yet been taken explicitly into account. RESULTS: In our study, we demonstrated the use of cross-resistance information to predict drug resistance in HIV-1. We tested a set of more than 600 reverse transcriptase sequences and corresponding resistance information for six nucleoside analogues. Based on multilabel classification models and cross-resistance information, we were able to significantly improve overall prediction accuracy for all drugs, compared with single binary classifiers without any additional information. Moreover, we identified drug-specific patterns within the reverse transcriptase sequences that can be used to determine an optimal order of the classifiers within the classifier chains. These patterns are in good agreement with known resistance mutations and support the use of cross-resistance information in such prediction models. CONTACT: dominik.heider@uni-due.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Anti-HIV Agents/pharmacology, HIV-1/drug effects, Reverse Transcriptase Inhibitors/pharmacology, Viral Drug Resistance/genetics, HIV Reverse Transcriptase/genetics, HIV-1/genetics, Mutation, Sequence Analysis, Virus Replication/drug effects
12.
Mol Inform ; 31(6-7): 443-52, 2012 Jul.
Article in English | MEDLINE | ID: mdl-27477463

ABSTRACT

A key task in structural biology is to define a meaningful similarity measure for the comparison of protein structures. Recently, the use of graphs as modeling tools for molecular data has gained increasing importance. In this context, kernel functions have attracted a lot of attention, especially since they allow for the application of a rich repertoire of methods from the field of kernel-based machine learning. However, most of the existing graph kernels have been designed for unlabeled and/or unweighted graphs, although proteins are often more naturally and more exactly represented in terms of node-labeled and edge-weighted graphs. Here we analyze kernel-based protein comparison methods and propose extensions to existing graph kernels to exploit node-labeled and edge-weighted graphs. Moreover, we propose an instance of the substructure fingerprint kernel suitable for the analysis of protein binding sites. By using fuzzy fingerprints, we solve the problem of discontinuity on bin-boundaries arising in the case of labeled graphs.

13.
BMC Med Res Methodol ; 11: 155, 2011 Nov 22.
Article in English | MEDLINE | ID: mdl-22108386

ABSTRACT

BACKGROUND: In chest pain, physicians are confronted with numerous interrelationships between symptoms and with evidence for or against classifying a patient into different diagnostic categories. The aim of our study was to find natural groups of patients on the basis of risk factors, history and clinical examination data which should then be validated with patients' final diagnoses. METHODS: We conducted a cross-sectional diagnostic study in 74 primary care practices to establish the validity of symptoms and findings for the diagnosis of coronary heart disease. A total of 1199 patients above age 35 presenting with chest pain were included in the study. General practitioners took a standardized history and performed a physical examination. They also recorded their preliminary diagnoses, investigations and management related to the patient's chest pain. We used multiple correspondence analysis (MCA) to examine associations on variable level, and multidimensional scaling (MDS), k-means and fuzzy cluster analyses to search for subgroups on patient level. We further used heatmaps to graphically illustrate the results. RESULTS: A multiple correspondence analysis supported our data collection strategy on variable level. Six factors emerged from this analysis: "chest wall syndrome", "vital threat", "stomach and bowel pain", "angina pectoris", "chest infection syndrome", and "self-limiting chest pain". MDS, k-means and fuzzy cluster analysis on patient level were not able to find distinct groups. The resulting cluster solutions were not interpretable and had insufficient statistical quality criteria. CONCLUSIONS: Chest pain is a heterogeneous clinical category with no coherent associations between signs and symptoms on patient level.


Subjects
Chest Pain/diagnosis, Coronary Disease/diagnosis, General Practitioners/statistics & numerical data, Physicians' Practice Patterns/statistics & numerical data, Adult, Aged, Chest Pain/classification, Chest Pain/etiology, Cluster Analysis, Coronary Disease/classification, Coronary Disease/complications, Cross-Sectional Studies, Differential Diagnosis, Female, Humans, Male, Middle Aged, Multivariate Analysis
14.
Article in English | MEDLINE | ID: mdl-21358005

ABSTRACT

Geometric objects are often represented approximately in terms of a finite set of points in three-dimensional Euclidean space. In this paper, we extend this representation to what we call labeled point clouds. A labeled point cloud is a finite set of points, where each point is not only associated with a position in three-dimensional space, but also with a discrete class label that represents a specific property. This type of model is especially suitable for modeling biomolecules such as proteins and protein binding sites, where a label may represent an atom type or a physico-chemical property. Proceeding from this representation, we address the question of how to compare two labeled point clouds in terms of their similarity. Using fuzzy modeling techniques, we develop a suitable similarity measure as well as an efficient evolutionary algorithm to compute it. Moreover, we consider the problem of establishing an alignment of the structures in the sense of a one-to-one correspondence between their basic constituents. From a biological point of view, alignments of this kind are of great interest, since mutually corresponding molecular constituents offer important information about evolution and heredity, and can also serve as a means to explain a degree of similarity. In this paper, we therefore develop a method for computing pairwise or multiple alignments of labeled point clouds. To this end, we proceed from an optimal superposition of the corresponding point clouds and construct an alignment which is as much as possible in agreement with the neighborhood structure established by this superposition. We apply our methods to the structural analysis of protein binding sites.


Subjects
Algorithms, Proteins/chemistry, Sequence Alignment/methods, Binding Sites, Protein Sequence Analysis/methods
15.
Article in English | MEDLINE | ID: mdl-21339532

ABSTRACT

Comparative analysis is a topic of utmost importance in structural bioinformatics. Recently, a structural counterpart to sequence alignment, called multiple graph alignment, was introduced as a tool for the comparison of protein structures in general and protein binding sites in particular. Using approximate graph matching techniques, this method enables the identification of approximately conserved patterns in functionally related structures. In this paper, we introduce a new method for computing graph alignments motivated by two problems of the original approach, a conceptual and a computational one. First, the existing approach is of limited usefulness for structures that only share common substructures. Second, the goal to find a globally optimal alignment leads to an optimization problem that is computationally intractable. To overcome these disadvantages, we propose a semiglobal approach to graph alignment in analogy to semiglobal sequence alignment that combines the advantages of local and global graph matching.


Subjects
Algorithms, Amino Acid Motifs, Computational Biology/methods, Proteins/chemistry, Sequence Alignment/methods, Animals, Bacterial Proteins/chemistry, Binding Sites, Mice, Molecular Models, Protein Sequence Analysis
16.
J Chem Inf Model ; 50(9): 1644-59, 2010 Sep 27.
Article in English | MEDLINE | ID: mdl-20795677

ABSTRACT

In combinatorial chemistry, molecules are assembled according to combinatorial principles by linking suitable reagents or decorating a given scaffold with appropriate substituents from a large chemical space of starting materials. Often the number of possible combinations greatly exceeds the number feasible to handle by an in-depth in silico approach or even more if it should be experimentally synthesized. Therefore, powerful tools to efficiently enumerate large chemical spaces are required. They can be provided by genetic algorithms, which mimic Darwinian evolution. GARLig (genetic algorithm using reagents to compose ligands) has been developed to perform subset selection in large chemical compound spaces subject to target-specific 3D-scoring criteria. GARLig uses different scoring schemes, such as AutoDock4 Score, GOLDScore, and DrugScore(CSD), as fitness functions. Its genetic parameters have been optimized to characterize combinatorial libraries with respect to the binding to various targets of pharmaceutical interest. A large tripeptide library of 20³ members has been used to profile amino acid frequencies in putative substrates for trypsin, thrombin, factor Xa, and plasmin. A peptidomimetic scaffold assembled from a selection of a 25³ building block was used to test the performance of the evolutionary algorithm in suggesting potent inhibitors of the enzyme cathepsin D. In a final case study, our program was used to characterize and rank a combinatorial drug-like library comprising 33,750 potential thrombin inhibitors. These case studies demonstrate that GARLig finds experimentally confirmed potent leads by processing a significantly smaller subset of the fully enumerated combinatorial library. Furthermore, the profiles of amino acids computed by the genetic algorithm match the observed amino acid frequencies found by screening peptide libraries in substrate cleavage assays.


Subjects
Algorithms, Automation, Proteins/chemistry, Ligands
17.
Bioinformatics ; 25(16): 2110-7, 2009 Aug 15.
Article in English | MEDLINE | ID: mdl-19286830

ABSTRACT

The concept of multiple graph alignment (MGA) has recently been introduced as a novel method for the structural analysis of biomolecules. Using approximate graph matching techniques, this method enables the robust identification of approximately conserved patterns in biologically related structures. In particular, MGA enables the characterization of functional protein families independent of sequence or fold homology. This article first recalls the concept of MGA and then addresses the problem of computing optimal alignments from an algorithmic point of view. In this regard, a method from the field of evolutionary algorithms is proposed and empirically compared with a hitherto existing heuristic approach. Empirically, it is shown that the former yields significantly better results than the latter, albeit at the cost of an increased runtime.


Subjects
Computational Biology/methods, Proteins/chemistry, Algorithms, Binding Sites, Computer Graphics, Protein Databases, Protein Interaction Mapping, Sequence Alignment/methods, Protein Sequence Analysis/methods
18.
Proteins ; 76(2): 317-30, 2009 Aug 01.
Article in English | MEDLINE | ID: mdl-19173307

ABSTRACT

Structure-based drug design tries to mutually map pharmacological space populated by putative target proteins onto chemical space comprising possible small molecule drug candidates. Both spaces are connected where proteins and ligands recognize each other: in the binding pockets. Therefore, it is highly relevant to study the properties of the space composed by all possible binding cavities. In the present contribution, a global mapping of protein cavity space is presented by extracting consensus cavities from individual members of protein families and clustering them in terms of their shape and exposed physicochemical properties. Discovered similarities indicate common binding epitopes in binding pockets independent of any possibly given similarity in sequence and fold space. Unexpected links between remote targets indicate possible cross-reactivity of ligands and suggest putative side effects. The global clustering of cavity space is compared to a similar clustering of sequence and fold space and compared to chemical ligand space spanned by the chemical properties of small molecules found in binding pockets of crystalline complexes. The overall similarity architecture of sequence, fold, and cavity space differs significantly. Similarities in cavity space can be mapped best to similarities in ligand binding space indicating possible cross-reactivities. Most cross-reactivities affect co-factor and other endogenous ligand binding sites.


Subjects
Computational Biology/methods, Enzymes/chemistry, Proteins/chemistry, Algorithms, Binding Sites, Enzymes/metabolism, Protein Conformation, Protein Folding, Proteins/metabolism, Structure-Activity Relationship
19.
ChemMedChem ; 2(10): 1432-47, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17694525

ABSTRACT

Increasingly, drug-discovery processes focus on complete gene families. Tools for analyzing similarities and differences across protein families are important for the understanding of key functional features of proteins. Herein we present a method for classifying protein families on the basis of the properties of their active sites. We have developed Cavbase, a method for describing and comparing protein binding pockets, and show its application to the functional classification of the binding pockets of the protein family of protein kinases. A diverse set of kinase cavities is mutually compared and analyzed in terms of recurring functional recognition patterns in the active sites. We are able to propose a relevant classification based on the binding motifs in the active sites. The obtained classification provides a novel perspective on functional properties across protein space. The classification of the MAP and the c-Abl kinases is analyzed in detail, showing a clear separation of the respective kinase subfamilies. Remarkable cross-relations among protein kinases are detected, in contrast to sequence-based classifications, which are not able to detect these relations. Furthermore, our classification is able to highlight features important in the optimization of protein kinase inhibitors. Using small-molecule inhibition data we could rationalize cross-reactivities between unrelated kinases which become apparent in the structural comparison of their binding sites. This procedure helps in the identification of other possible kinase targets that behave similarly in "binding pocket space" to the kinase under consideration.


Subjects
Protein Databases, Protein Kinases/metabolism, Adenosine Triphosphate/metabolism, Binding Sites, Molecular Models, Protein Conformation, Protein Folding, Protein Kinases/chemistry
20.
IEEE Trans Syst Man Cybern B Cybern ; 37(4): 1039-43, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17702300

ABSTRACT

This short correspondence is a reply to a recently published paper by Verlinde et al. in which the authors empirically compared fuzzy and nonfuzzy association analysis and, on the basis of their results, questioned the usefulness of a fuzzy approach. Although we highly welcome the critical examination of the topic and definitely agree that fuzzy extensions of existing methods call for a thorough justification, the empirical comparison presented in the aforementioned paper is in our opinion not objective and extensive enough to fully warrant the conclusions drawn from the results. Apart from some general comments on the claims raised in their paper, we present empirical results based on an alternative experimental setup that lead to different conclusions.


Subjects
Algorithms, Artificial Intelligence, Factual Databases, Decision Support Techniques, Fuzzy Logic, Information Storage and Retrieval/methods, Statistical Models, Computer Simulation