Results 1 - 11 of 11
1.
Proteins ; 84(4): 411-26, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26799916

ABSTRACT

Energy functions, fragment libraries, and search methods constitute three key components of fragment-assembly methods for protein structure prediction, which are all crucial for their ability to generate high-accuracy predictions. All of these components are tightly coupled; efficient searching becomes more important as the quality of fragment libraries decreases. Given these relationships, the strengths and weaknesses of the sampling approaches used in fragment-assembly techniques are currently poorly understood. Here, we determine how the performance of search techniques can be assessed in a meaningful manner, given the above problems. We describe a set of techniques that aim to reduce the impact of the energy function, and assess exploration in view of the search space defined by a given fragment library. We illustrate our approach using Rosetta and EdaFold, and show how certain features of these methods encourage or limit conformational exploration. We demonstrate that individual trajectories of Rosetta are susceptible to local minima in the energy landscape, and that this can be linked to non-uniform sampling across the protein chain. We show that EdaFold's novel approach can help balance broad exploration with locating good low-energy conformations. This occurs through two mechanisms which cannot be readily differentiated using standard performance measures: exclusion of false minima, followed by an increasingly focused search in low-energy regions of conformational space. Measures such as ours can be helpful in characterizing new fragment-based methods in terms of the quality of conformational exploration realized.


Subjects
Algorithms , Gene Library , Peptide Fragments/chemistry , Computer Simulation , Molecular Models , Peptide Fragments/genetics , Protein Conformation , Protein Folding , Thermodynamics
2.
Evol Comput ; 24(4): 577-607, 2016.
Article in English | MEDLINE | ID: mdl-26908350

ABSTRACT

Computational approaches to de novo protein tertiary structure prediction, including those based on the preeminent "fragment-assembly" technique, have failed to scale up fully to larger proteins (on the order of 100 residues and above). A number of limiting factors are thought to contribute to the scaling problem over and above the simple combinatorial explosion, but the key ones relate to the lack of exploration of properly diverse protein folds, and to an acute form of "deception" in the energy function, whereby low-energy conformations do not reliably equate with native structures. In this article, solutions to both of these problems are investigated through a multistage memetic algorithm incorporating the successful Rosetta method as a local search routine. We found that specialised genetic operators significantly add to structural diversity and that this translates well to reaching low energies. The use of a generalised stochastic ranking procedure for selection enables the memetic algorithm to handle and traverse deep energy wells that can be considered deceptive, which further adds to the ability of the algorithm to obtain a much-improved diversity of folds. The results should translate to a tangible improvement in the performance of protein structure prediction algorithms in blind experiments such as CASP, and potentially to a further step towards the more challenging problem of predicting the three-dimensional shape of large proteins.


Subjects
Algorithms , Proteins/chemistry , Computational Biology , Molecular Evolution , Molecular Dynamics Simulation , Peptide Fragments/chemistry , Peptide Fragments/genetics , Protein Conformation , Protein Secondary Structure , Protein Tertiary Structure , Proteins/genetics , Stochastic Processes
3.
Proteins ; 80(2): 490-504, 2012 Feb.
Article in English | MEDLINE | ID: mdl-22095594

ABSTRACT

In fragment-assembly techniques for protein structure prediction, models of protein structure are assembled from fragments of known protein structures. This process is typically guided by a knowledge-based energy function and uses a heuristic optimization method. The fragments play two important roles in this process: they define the set of structural parameters available, and they also assume the role of the main variation operators that are used by the optimiser. Previous analysis has typically focused on the first of these roles. In particular, the relationship between local amino acid sequence and local protein structure has been studied by a range of authors. The correlation between the two has been shown to vary with the window length considered, and the results of these analyses have directly informed the choice of fragment length in state-of-the-art prediction techniques. Here, we focus on the second role of fragments and aim to determine the effect of fragment length from an optimization perspective. We use theoretical analyses to reveal how the size and structure of the search space change as a function of insertion length. Furthermore, empirical analyses are used to explore additional ways in which the size of the fragment insertion influences the search, both in a simulation model and for the fragment-assembly technique Rosetta.


Subjects
Molecular Models , Peptide Fragments/chemistry , Proteins/chemistry , Algorithms , Markov Chains , Protein Conformation
4.
Bioinformatics ; 26(10): 1324-31, 2010 May 15.
Article in English | MEDLINE | ID: mdl-20363732

ABSTRACT

MOTIVATION: Studying biological systems, not just at an individual component level but at a system-wide level, gives us great potential to understand fundamental functions and essential biological properties. Despite considerable advances in the topological analysis of metabolic networks, inadequate knowledge of the enzyme kinetic rate laws and their associated parameter values still hampers large-scale kinetic modelling. Furthermore, the integration of gene expression and protein levels into kinetic models is not straightforward.
RESULTS: The focus of our research is on streamlining the construction of large-scale kinetic models. A novel software tool was developed, which enables the generation of generic rate equations for all reactions in a model. It encompasses an algorithm for estimating the concentration of proteins for a reaction to reach a particular steady state when kinetic parameters are unknown, and two robust methods for parameter estimation. It also allows for the seamless integration of gene expression or protein levels into a reaction and can generate equations for both transcription and translation. We applied this methodology to model the yeast glycolysis pathway; our results show that the behaviour of the system can be accurately described using generic kinetic equations.
AVAILABILITY AND IMPLEMENTATION: The software tool, together with its source code in Java, is available from our project web site at http://www.bioinf.manchester.ac.uk/schwartz/grape
CONTACT: jean-marc.schwartz@manchester.ac.uk
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computational Biology/methods , Metabolic Networks and Pathways , Glycolysis , Kinetics , Proteome/metabolism , Saccharomyces cerevisiae/metabolism , Software
5.
Article in English | MEDLINE | ID: mdl-33477442

ABSTRACT

With a reduction in the mortality rate of burn patients, length of stay (LOS) has been increasingly adopted as an outcome measure. Some studies have attempted to identify factors that explain a burn patient's LOS. However, few have investigated the association between LOS and a patient's mental and socioeconomic status. There is anecdotal evidence for links between these factors; uncovering these will aid in better addressing the specific physical and emotional needs of burn patients and facilitate the planning of scarce hospital resources. Here, we employ machine learning (clustering) and statistical models (regression) to investigate whether segmentation by socioeconomic/mental status can improve the performance and interpretability of an upstream predictive model, relative to a unitary model. Although we found no significant difference between the performance of the unitary model and that of the segment-specific models, the interpretation of the segment-specific models reveals a reduced impact of burn severity in LOS prediction with increasing adverse socioeconomic and mental status. Furthermore, the socioeconomic segments' models highlight an increased influence of living circumstances and source of injury on LOS. These findings suggest that in addition to ensuring that patients' physical needs are met, management of their mental status is crucial for delivering an effective care plan.


Subjects
Burns , Humans , Length of Stay , Statistical Models , Retrospective Studies
6.
Bioinformatics ; 25(10): 1271-9, 2009 May 15.
Article in English | MEDLINE | ID: mdl-19297350

ABSTRACT

MOTIVATION: Decoy datasets, consisting of a solved protein structure and numerous alternative native-like structures, are in common use for the evaluation of scoring functions in protein structure prediction. Several pitfalls with the use of these datasets have been identified in the literature, as well as useful guidelines for generating more effective decoy datasets. We contribute to this ongoing discussion an empirical assessment of several decoy datasets commonly used in experimental studies.
RESULTS: We find that artefacts and sampling issues in the large majority of these data make it trivial to discriminate the native structure. This underlines that evaluation based on the rank/z-score of the native is a weak test of scoring function performance. Moreover, sampling biases present in the way decoy sets are generated or used can strongly affect other types of evaluation measures such as the correlation between score and root mean squared deviation (RMSD) to the native. We demonstrate how, depending on type of bias and evaluation context, sampling biases may lead to both over- or under-estimation of the quality of scoring terms, functions or methods.
AVAILABILITY: Links to the software and data used in this study are available at http://dbkgroup.org/handl/decoy_sets.


Subjects
Computational Biology/methods , Protein Conformation , Proteins/chemistry , Artifacts , Protein Databases , Protein Folding
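The evaluation measures named in this abstract (rank and z-score of the native, and the correlation between score and RMSD) can be illustrated with a minimal Python sketch. It assumes that lower scores are better and uses the population standard deviation; the function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, pstdev


def native_rank(native_score, decoy_scores):
    """1-based rank of the native when structures are sorted by ascending score."""
    return 1 + sum(1 for s in decoy_scores if s < native_score)


def native_z_score(native_score, decoy_scores):
    """Z-score of the native relative to the decoy score distribution."""
    return (native_score - mean(decoy_scores)) / pstdev(decoy_scores)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up decoy set: scores and RMSD to the native (in angstroms).
decoy_scores = [-90.0, -85.0, -80.0, -75.0, -70.0]
decoy_rmsd = [2.0, 3.0, 4.0, 5.0, 6.0]
native_score = -100.0

rank = native_rank(native_score, decoy_scores)
z = native_z_score(native_score, decoy_scores)
r = pearson(decoy_scores, decoy_rmsd)
```

In this toy set the native ranks first with a strongly negative z-score, which is the kind of result the abstract warns can arise trivially from artefacts in the decoys rather than from a good scoring function.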
7.
Biomolecules ; 9(10), 2019 Oct 15.
Article in English | MEDLINE | ID: mdl-31618996

ABSTRACT

Our previous work with fragment-assembly methods has demonstrated specific deficiencies in conformational sampling behaviour that, when addressed through improved sampling algorithms, can lead to more reliable prediction of tertiary protein structure when good fragments are available, and when score values can be relied upon to guide the search to the native basin. In this paper, we present preliminary investigations into two important questions arising from more difficult prediction problems. First, we investigated the extent to which native-like conformational states are generated during multiple runs of our search protocols. We determined that, in cases of difficult prediction, native-like decoys are rarely or never generated. Second, we developed a scheme for decoy retention that balances the objectives of retaining low-scoring structures and retaining conformationally diverse structures sampled during the course of the search. Our method succeeds at retaining more diverse sets of structures, and, for a few targets, more native-like solutions are retained as compared to our original, energy-based retention scheme. However, in general, we found that the rate at which native-like structural states are generated has a much stronger effect on eventual distributions of predictive accuracy in the decoy sets, as compared to the specific decoy retention strategy used. We found that our protocols show differences in their ability to access native-like states for some targets, and this may explain some of the differences in predictive performance seen between these methods. There appears to be an interaction between fragment sets and move operators, which influences the accessibility of native-like structures for given targets. Our results point to clear directions for further improvements in fragment-based methods, which are likely to enable higher accuracy predictions.


Subjects
Proteins/chemistry , Algorithms , Protein Conformation , Thermodynamics
8.
Sci Rep ; 8(1): 13694, 2018 Sep 12.
Article in English | MEDLINE | ID: mdl-30209258

ABSTRACT

Difficulty in sampling large and complex conformational spaces remains a key limitation in fragment-based de novo prediction of protein structure. Our previous work has shown that even for small-to-medium-sized proteins, some current methods inadequately sample alternative structures. We have developed two new conformational sampling techniques, one employing a bilevel optimisation framework and the other employing iterated local search. We combine strategies of forced structural perturbation (where some fragment insertions are accepted regardless of their impact on scores) and greedy local optimisation, allowing greater exploration of the available conformational space. Comparisons against the Rosetta Abinitio method indicate that our protocols more frequently generate native-like predictions for many targets, even following the low-resolution phase, using a given set of fragment libraries. By contrasting results across two different fragment sets, we show that our methods are able to better take advantage of high-quality fragments. These improvements can also translate into more reliable identification of near-native structures in a simple clustering-based model selection procedure. We show that when fragment libraries are sufficiently well-constructed, improved breadth of exploration within runs improves prediction accuracy. Our results also suggest that in benchmarking scenarios, a total exclusion of fragments drawn from homologous templates can make performance differences between methods appear less pronounced.


Subjects
Peptide Fragments/chemistry , Proteins/chemistry , Benchmarking/methods , Cluster Analysis , Computer Simulation , Heuristics , Molecular Models , Protein Conformation
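The combination of forced perturbation and greedy local optimisation described in this abstract can be sketched as a generic iterated local search in Python. This is a toy illustration on a vector of integer choices, not the authors' protocol: the score function, parameter names, and the rule of always continuing from the newest local optimum are all assumptions made for the example.

```python
import random


def greedy_descent(state, score, n_choices, rng, steps):
    """Greedy local optimisation: keep a random single-position change only if it lowers the score."""
    current, current_score = list(state), score(state)
    for _ in range(steps):
        candidate = list(current)
        candidate[rng.randrange(len(candidate))] = rng.randrange(n_choices)
        candidate_score = score(candidate)
        if candidate_score < current_score:
            current, current_score = candidate, candidate_score
    return current, current_score


def iterated_local_search(score, length, n_choices, iterations=50, kick=2, steps=200, seed=0):
    """Alternate forced perturbations with greedy descent and remember the best state seen."""
    rng = random.Random(seed)
    start = [rng.randrange(n_choices) for _ in range(length)]
    current, current_score = greedy_descent(start, score, n_choices, rng, steps)
    best, best_score = current, current_score
    for _ in range(iterations):
        perturbed = list(current)
        # Forced perturbation: these changes are kept regardless of their effect on the score.
        for pos in rng.sample(range(length), kick):
            perturbed[pos] = rng.randrange(n_choices)
        current, current_score = greedy_descent(perturbed, score, n_choices, rng, steps)
        if current_score < best_score:
            best, best_score = current, current_score
    return best, best_score


# Hypothetical toy objective: distance to a hidden target vector.
target = [3, 1, 4, 1, 0, 2, 4, 3]


def toy_score(state):
    return sum(abs(a - b) for a, b in zip(state, target))


best, best_score = iterated_local_search(toy_score, length=len(target), n_choices=5)
```

Keeping a separate record of the best state means the forced moves can leave a local minimum without losing the lowest-scoring structure found so far.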
9.
Article in English | MEDLINE | ID: mdl-17473320

ABSTRACT

This paper reviews the application of multiobjective optimization in the fields of bioinformatics and computational biology. A survey of existing work, organized by application area, forms the main body of the review, following an introduction to the key concepts in multiobjective optimization. An original contribution of the review is the identification of five distinct "contexts," giving rise to multiple objectives: These are used to explain the reasons behind the use of multiobjective optimization in each application area and also to point the way to potential future uses of the technique.


Subjects
Algorithms , Artificial Intelligence , Computational Biology/methods , Gene Expression Profiling/methods , Biological Models , Automated Pattern Recognition/methods , Sequence Analysis/methods , Computational Biology/trends , Gene Expression Profiling/trends , Automated Pattern Recognition/trends , Sequence Analysis/trends
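A central concept in multiobjective optimization is Pareto dominance, which the following minimal Python sketch illustrates. It assumes every objective is to be minimised; the function names and the example points are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates, in their original order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Each tuple holds the values of two objectives for one candidate solution.
points = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
front = pareto_front(points)
```

Here `(3, 4)` is dominated by `(2, 3)` and `(5, 5)` by `(1, 5)`, so the front contains the three remaining trade-off solutions.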
10.
Metabolomics ; 11(2): 323-339, 2015.
Article in English | MEDLINE | ID: mdl-25750602

ABSTRACT

We exploit the recent availability of a community reconstruction of the human metabolic network ('Recon2') to study how close, in structural terms, marketed drugs are to the nearest known metabolite(s) that Recon2 contains. While other encodings using different kinds of chemical fingerprints give greater differences, we find using the 166 Public MDL Molecular Access (MACCS) keys that 90% of marketed drugs have a Tanimoto similarity of more than 0.5 to the (structurally) 'nearest' human metabolite. This suggests a 'rule of 0.5' mnemonic for assessing the metabolite-like properties that characterise successful, marketed drugs. Multiobjective clustering leads to a similar conclusion, while artificial (synthetic) structures are seen to be less human-metabolite-like. This 'rule of 0.5' may have considerable predictive value in chemical biology and drug discovery, and may represent a powerful filter for decision making processes.
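The Tanimoto similarity underlying the 'rule of 0.5' can be illustrated with a minimal Python sketch. It assumes each fingerprint is represented as the set of indices of its on bits; the function names and toy fingerprints are hypothetical and are not real MACCS keys.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity: shared on-bits divided by on-bits present in either fingerprint."""
    if not a and not b:
        return 0.0  # convention assumed here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_similarity(drug: Set[int], metabolites: List[Set[int]]) -> float:
    """Similarity of a drug to its structurally 'nearest' metabolite."""
    return max(tanimoto(drug, m) for m in metabolites)


# Made-up fingerprints for one drug and three metabolites.
drug = {1, 4, 9, 16, 25}
metabolites = [{1, 4, 9, 16, 30}, {2, 3, 5}, {16, 25}]

best = nearest_similarity(drug, metabolites)
passes_rule = best > 0.5  # the 'rule of 0.5' threshold from the abstract
```

The first metabolite shares four of six distinct on-bits with the drug, so the nearest-metabolite similarity is 4/6 and this toy drug passes the threshold.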

11.
Bioinformatics ; 21(15): 3201-12, 2005 Aug 01.
Article in English | MEDLINE | ID: mdl-15914541

ABSTRACT

MOTIVATION: The discovery of novel biological knowledge from the ab initio analysis of post-genomic data relies upon the use of unsupervised processing methods, in particular clustering techniques. Much recent research in bioinformatics has therefore been focused on the transfer of clustering methods introduced in other scientific fields and on the development of novel algorithms specifically designed to tackle the challenges posed by post-genomic data. The partitions returned by a clustering algorithm are commonly validated using visual inspection and concordance with prior biological knowledge; whether the clusters actually correspond to the real structure in the data is somewhat less frequently considered. Suitable computational cluster validation techniques are available in the general data-mining literature, but have been given only a fraction of the same attention in bioinformatics.
RESULTS: This review paper aims to familiarize the reader with the battery of techniques available for the validation of clustering results, with a particular focus on their application to post-genomic data analysis. Synthetic and real biological datasets are used to demonstrate the benefits, and also some of the perils, of analytical cluster validation.
AVAILABILITY: The software used in the experiments is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/.
SUPPLEMENTARY INFORMATION: Enlarged colour plots are provided in the Supplementary Material, which is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/.


Subjects
Algorithms , Chromosome Mapping/methods , Cluster Analysis , Computational Biology/methods , Gene Expression Profiling/methods , Automated Pattern Recognition/methods , Software , Artificial Intelligence , Statistical Data Interpretation , Oligonucleotide Array Sequence Analysis/methods
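As one simple illustration of computational cluster validation, the Python sketch below computes the Rand index between a clustering and a reference partition. The abstract does not name specific measures, so the choice of the Rand index is an assumption made for this example, and the function name and labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree (same cluster in both, or different in both)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two items are required")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Cluster labels are arbitrary names, so a relabelled identical partition scores 1.0.
same_partition = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_partition = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Because the measure compares pairs rather than label values, it does not depend on how the clusters happen to be numbered.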