1.
Nucleic Acids Res ; 2024 May 15.
Article in English | MEDLINE | ID: mdl-38747351

ABSTRACT

The PSIPRED Workbench is a long-established and popular bioinformatics web service offering a wide range of machine-learning-based analyses for characterizing protein structure and function. In this paper we provide an update of the recent additions and developments to the webserver, with a focus on new deep-learning-based methods. We briefly discuss some trends in server usage since the publication of AlphaFold2 and we give an overview of some upcoming developments for the service. The PSIPRED Workbench is available at http://bioinf.cs.ucl.ac.uk/psipred.

2.
Nat Commun ; 14(1): 8445, 2023 Dec 19.
Article in English | MEDLINE | ID: mdl-38114456

ABSTRACT

The AlphaFold Protein Structure Database, containing predictions for over 200 million proteins, has been met with enthusiasm over its potential in enriching structural biological research and beyond. Currently, access to the database is precluded by an urgent need for tools that allow the efficient traversal, discovery, and documentation of its contents. Identifying domain regions in the database is a non-trivial endeavour and doing so will aid our understanding of protein structure and function, while facilitating drug discovery and comparative genomics. Here, we describe a deep learning method for domain segmentation called Merizo, which learns to cluster residues into domains in a bottom-up manner. Merizo is trained on CATH domains and fine-tuned on AlphaFold2 models via self-distillation, enabling it to be applied to both experimental and AlphaFold2 models. As proof of concept, we apply Merizo to the human proteome, identifying 40,818 putative domains that can be matched to CATH representative domains.


Subject(s)
Genomics , Proteins , Humans , Protein Domains , Protein Structure, Tertiary , Proteins/genetics , Proteins/chemistry , Databases, Protein
3.
Curr Opin Struct Biol ; 81: 102627, 2023 08.
Article in English | MEDLINE | ID: mdl-37320955

ABSTRACT

Recent breakthroughs in protein structure prediction have increasingly relied on the use of deep neural networks. These recent methods are notable in that they produce 3-D atomic coordinates as a direct output of the networks, a feature which presents many advantages. Although most techniques of this type make use of multiple sequence alignments as their primary input, a new wave of methods have attempted to use just single sequences as the input. We discuss the make-up and operating principles of these models, and highlight new developments in these areas, as well as areas for future development.


Subject(s)
Machine Learning , Proteins , Proteins/chemistry , Neural Networks, Computer , Sequence Alignment
4.
Proc Natl Acad Sci U S A ; 119(4), 2022 01 25.
Article in English | MEDLINE | ID: mdl-35074909

ABSTRACT

Deep learning-based prediction of protein structure usually begins by constructing a multiple sequence alignment (MSA) containing homologs of the target protein. The most successful approaches combine large feature sets derived from MSAs, and considerable computational effort is spent deriving these input features. We present a method that greatly reduces the amount of preprocessing required for a target MSA, while producing main chain coordinates as a direct output of a deep neural network. The network makes use of just three recurrent networks and a stack of residual convolutional layers, making the predictor very fast to run, and easy to install and use. Our approach constructs a directly learned representation of the sequences in an MSA, starting from a one-hot encoding of the sequences. When supplemented with an approximate precision matrix, the learned representation can be used to produce structural models of comparable or greater accuracy as compared to our original DMPfold method, while requiring less than a second to produce a typical model. This level of accuracy and speed allows very large-scale three-dimensional modeling of proteins on minimal hardware, and we demonstrate this by producing models for over 1.3 million uncharacterized regions of proteins extracted from the BFD sequence clusters. After constructing an initial set of approximate models, we select a confident subset of over 30,000 models for further refinement and analysis, revealing putative novel protein folds. We also provide updated models for over 5,000 Pfam families studied in the original DMPfold paper.


Subject(s)
Models, Molecular , Protein Conformation , Software , Algorithms , Caspases/chemistry , Computational Biology , Databases, Protein , Deep Learning , High-Throughput Screening Assays , Proteins/chemistry
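
The abstract in this record says the network starts from a one-hot encoding of the sequences in an MSA. As an illustration only, that input encoding might be sketched as follows. The 21-symbol alphabet, the function name `one_hot_msa`, and the choice to map unknown symbols to the gap column are assumptions made here, not details taken from the paper.

```python
# Assumed alphabet: 20 standard amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: k for k, aa in enumerate(ALPHABET)}


def one_hot_msa(alignment):
    """Encode an alignment as nested lists shaped [sequence][column][symbol]."""
    encoded = []
    for seq in alignment:
        rows = []
        for residue in seq:
            vec = [0] * len(ALPHABET)
            # Assumption: any symbol outside the alphabet is treated as a gap.
            vec[INDEX.get(residue, INDEX["-"])] = 1
            rows.append(vec)
        encoded.append(rows)
    return encoded


# Hypothetical two-sequence alignment with two columns.
toy_encoding = one_hot_msa(["AC", "-Y"])
```

Each alignment position becomes a vector with a single 1, so the encoding carries no hand-derived features; any further representation would have to be learned from it.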
5.
Nat Rev Mol Cell Biol ; 23(1): 40-55, 2022 01.
Article in English | MEDLINE | ID: mdl-34518686

ABSTRACT

The expanding scale and inherent complexity of biological data have encouraged a growing use of machine learning in biology to build informative and predictive models of the underlying biological processes. All machine learning techniques fit models to data; however, the specific methods are quite varied and can at first glance seem bewildering. In this Review, we aim to provide readers with a gentle introduction to a few key machine learning techniques, including the most recently developed and widely used techniques involving deep neural networks. We describe how different techniques may be suited to specific types of biological data, and also discuss some best practices and points to consider when one is embarking on experiments involving machine learning. Some emerging directions in machine learning methodology are also discussed.


Subject(s)
Biology , Machine Learning , Animals , Deep Learning , Humans , Neural Networks, Computer
6.
Biomolecules ; 9(10), 2019 10 15.
Article in English | MEDLINE | ID: mdl-31618996

ABSTRACT

Our previous work with fragment-assembly methods has demonstrated specific deficiencies in conformational sampling behaviour that, when addressed through improved sampling algorithms, can lead to more reliable prediction of tertiary protein structure when good fragments are available, and when score values can be relied upon to guide the search to the native basin. In this paper, we present preliminary investigations into two important questions arising from more difficult prediction problems. First, we investigated the extent to which native-like conformational states are generated during multiple runs of our search protocols. We determined that, in cases of difficult prediction, native-like decoys are rarely or never generated. Second, we developed a scheme for decoy retention that balances the objectives of retaining low-scoring structures and retaining conformationally diverse structures sampled during the course of the search. Our method succeeds at retaining more diverse sets of structures, and, for a few targets, more native-like solutions are retained as compared to our original, energy-based retention scheme. However, in general, we found that the rate at which native-like structural states are generated has a much stronger effect on eventual distributions of predictive accuracy in the decoy sets, as compared to the specific decoy retention strategy used. We found that our protocols show differences in their ability to access native-like states for some targets, and this may explain some of the differences in predictive performance seen between these methods. There appears to be an interaction between fragment sets and move operators, which influences the accessibility of native-like structures for given targets. Our results point to clear directions for further improvements in fragment-based methods, which are likely to enable higher accuracy predictions.


Subject(s)
Proteins/chemistry , Algorithms , Protein Conformation , Thermodynamics
7.
Proteins ; 87(12): 1179-1189, 2019 12.
Article in English | MEDLINE | ID: mdl-31589782

ABSTRACT

Although many structural bioinformatics tools have been using neural network models for a long time, deep neural network (DNN) models have attracted considerable interest in recent years. Methods employing DNNs have had a significant impact in recent CASP experiments, notably in CASP12 and especially CASP13. In this article, we offer a brief introduction to some of the key principles and properties of DNN models and discuss why they are naturally suited to certain problems in structural bioinformatics. We also briefly discuss methodological improvements that have enabled these successes. Using the contact prediction task as an example, we also speculate why DNN models are able to produce reasonably accurate predictions even in the absence of many homologues for a given target sequence, a result that can at first glance appear surprising given the lack of input information. We end on some thoughts about how and why these types of models can be so effective, as well as a discussion on potential pitfalls.


Subject(s)
Computational Biology , Deep Learning , Protein Conformation , Models, Molecular , Neural Networks, Computer , Proteins/chemistry , Proteins/genetics , Proteins/ultrastructure , Structural Homology, Protein
8.
Nat Commun ; 10(1): 3977, 2019 09 04.
Article in English | MEDLINE | ID: mdl-31484923

ABSTRACT

The inapplicability of amino acid covariation methods to small protein families has limited their use for structural annotation of whole genomes. Recently, deep learning has shown promise in allowing accurate residue-residue contact prediction even for shallow sequence alignments. Here we introduce DMPfold, which uses deep learning to predict inter-atomic distance bounds, the main chain hydrogen bond network, and torsion angles, which it uses to build models in an iterative fashion. DMPfold produces more accurate models than two popular methods for a test set of CASP12 domains, and works just as well for transmembrane proteins. Applied to all Pfam domains without known structures, confident models for 25% of these so-called dark families were produced in under a week on a small 200 core cluster. DMPfold provides models for 16% of human proteome UniProt entries without structures, generates accurate models with fewer than 100 sequences in some cases, and is freely available.


Subject(s)
Computational Biology/methods , Deep Learning , Models, Molecular , Protein Conformation , Proteome/chemistry , Proteomics/methods , Algorithms , Animals , Binding Sites/genetics , Humans , Proteome/genetics , Proteome/metabolism , Reproducibility of Results
9.
Proteins ; 87(12): 1092-1099, 2019 12.
Article in English | MEDLINE | ID: mdl-31298436

ABSTRACT

In this article, we describe our efforts in contact prediction in the CASP13 experiment. We employed a new deep learning-based contact prediction tool, DeepMetaPSICOV (or DMP for short), together with new methods and data sources for alignment generation. DMP evolved from MetaPSICOV and DeepCov and combines the input feature sets used by these methods as input to a deep, fully convolutional residual neural network. We also improved our method for multiple sequence alignment generation and included metagenomic sequences in the search. We discuss successes and failures of our approach and identify areas where further improvements may be possible. DMP is freely available at: https://github.com/psipred/DeepMetaPSICOV.


Subject(s)
Computational Biology , Protein Conformation , Proteins/ultrastructure , Algorithms , Amino Acid Sequence/genetics , Deep Learning , Machine Learning , Metagenome/genetics , Neural Networks, Computer , Proteins/chemistry , Proteins/genetics , Sequence Analysis, Protein
10.
Sci Rep ; 9(1): 7083, 2019 05 08.
Article in English | MEDLINE | ID: mdl-31068650

ABSTRACT

RAS genotyping is mandatory to predict resistance to anti-EGFR monoclonal antibody (mAb) therapy, and BRAF genotyping is a relevant prognostic marker in patients with metastatic colorectal cancer. Although the role of hotspot mutations is well defined, the impact of uncommon mutations is still unknown. In this study, we aimed to discuss the potential utility of detecting uncommon RAS and BRAF mutation profiles with next-generation sequencing. A total of 779 FFPE samples from patients with metastatic colorectal cancer with valid NGS results were screened and 22 uncommon mutational profiles of KRAS, NRAS and BRAF genes were selected. In silico prediction of mutation impact was then assessed using two predictive scores and structural protein modelling. Three samples carry a single KRAS non-hotspot mutation, one a single NRAS non-hotspot mutation, four a single BRAF non-hotspot mutation and fourteen carry several mutations. This in silico study shows that some non-hotspot RAS mutations seem to behave like hotspot mutations and warrant further examination to assess whether they should confer a resistance to anti-EGFR mAb therapy for patients bearing these non-hotspot RAS mutations. For the BRAF gene, non-V600E mutations may characterise a novel subtype of mCRC with better prognosis, potentially implying a modification of therapeutic strategy.


Subject(s)
Colorectal Neoplasms/genetics , Diagnostic Tests, Routine/methods , Genotype , Genotyping Techniques/methods , High-Throughput Nucleotide Sequencing/methods , Mutation , Antineoplastic Agents, Immunological/pharmacology , Antineoplastic Agents, Immunological/therapeutic use , Colorectal Neoplasms/drug therapy , Computer Simulation , Drug Resistance, Neoplasm/genetics , ErbB Receptors/antagonists & inhibitors , GTP Phosphohydrolases/genetics , Humans , Membrane Proteins/genetics , Neoplasm Metastasis/genetics , Polymorphism, Single Nucleotide , Proto-Oncogene Proteins B-raf/genetics , Proto-Oncogene Proteins p21(ras)/genetics , Retrospective Studies
11.
Sci Rep ; 8(1): 13694, 2018 09 12.
Article in English | MEDLINE | ID: mdl-30209258

ABSTRACT

Difficulty in sampling large and complex conformational spaces remains a key limitation in fragment-based de novo prediction of protein structure. Our previous work has shown that even for small-to-medium-sized proteins, some current methods inadequately sample alternative structures. We have developed two new conformational sampling techniques, one employing a bilevel optimisation framework and the other employing iterated local search. We combine strategies of forced structural perturbation (where some fragment insertions are accepted regardless of their impact on scores) and greedy local optimisation, allowing greater exploration of the available conformational space. Comparisons against the Rosetta Abinitio method indicate that our protocols more frequently generate native-like predictions for many targets, even following the low-resolution phase, using a given set of fragment libraries. By contrasting results across two different fragment sets, we show that our methods are able to better take advantage of high-quality fragments. These improvements can also translate into more reliable identification of near-native structures in a simple clustering-based model selection procedure. We show that when fragment libraries are sufficiently well-constructed, improved breadth of exploration within runs improves prediction accuracy. Our results also suggest that in benchmarking scenarios, a total exclusion of fragments drawn from homologous templates can make performance differences between methods appear less pronounced.


Subject(s)
Peptide Fragments/chemistry , Proteins/chemistry , Benchmarking/methods , Cluster Analysis , Computer Simulation , Heuristics , Models, Molecular , Protein Conformation
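
The abstract in this record combines forced perturbation (moves accepted regardless of score) with greedy local optimisation inside an iterated local search. The general pattern can be sketched on a toy problem as below. This is not the authors' protocol: the one-dimensional integer landscape, the neighbourhood, the perturbation range, and all function names are hypothetical stand-ins for fragment insertions and a real scoring function.

```python
import random


def greedy_local_search(x, score, neighbours):
    """Move to the best-scoring neighbour until no neighbour improves."""
    while True:
        best = min(neighbours(x), key=score)
        if score(best) >= score(x):
            return x
        x = best


def iterated_local_search(start, score, neighbours, perturb, rounds, rng):
    """Alternate forced perturbation with greedy descent; keep the best state seen."""
    current = greedy_local_search(start, score, neighbours)
    best = current
    for _ in range(rounds):
        # Forced perturbation: the perturbed state is always accepted as the
        # new starting point, whatever it does to the score.
        current = greedy_local_search(perturb(current, rng), score, neighbours)
        if score(current) < score(best):
            best = current
    return best


# Hypothetical toy landscape: every multiple of 4 is a local minimum,
# and the global minimum is at x = 0.
def toy_score(x):
    return abs(x) + 5 * (x % 4 != 0)


def toy_neighbours(x):
    return [x - 1, x + 1]


def toy_perturb(x, rng):
    return x + rng.randint(-6, 6)


rng = random.Random(0)  # fixed seed so the run is repeatable
result = iterated_local_search(17, toy_score, toy_neighbours, toy_perturb, 200, rng)
```

Greedy descent alone stops at the first local minimum it reaches (x = 16 from a start of 17). The forced perturbation is what lets the search leave that basin, and tracking the best state separately means an unfavourable perturbation never loses a good solution.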
12.
Bioinformatics ; 34(19): 3308-3315, 2018 10 01.
Article in English | MEDLINE | ID: mdl-29718112

ABSTRACT

Motivation: In addition to substitution frequency data from protein sequence alignments, many state-of-the-art methods for contact prediction rely on additional sources of information, or features, of protein sequences in order to predict residue-residue contacts, such as solvent accessibility, predicted secondary structure, and scores from other contact prediction methods. It is unclear how much of this information is needed to achieve state-of-the-art results. Here, we show that using deep neural network models, simple alignment statistics contain sufficient information to achieve state-of-the-art precision. Our prediction method, DeepCov, uses fully convolutional neural networks operating on amino-acid pair frequency or covariance data derived directly from sequence alignments, without using global statistical methods such as sparse inverse covariance or pseudolikelihood estimation. Results: Comparisons against CCMpred and MetaPSICOV2 show that using pairwise covariance data calculated from raw alignments as input allows us to match or exceed the performance of both of these methods. Almost all of the achieved precision is obtained when considering relatively local windows (around 15 residues) around any member of a given residue pairing; larger window sizes have comparable performance. Assessment on a set of shallow sequence alignments (fewer than 160 effective sequences) indicates that the new method is substantially more precise than CCMpred and MetaPSICOV2 in this regime, suggesting that improved precision is attainable on smaller sequence families. Overall, the performance of DeepCov is competitive with the state of the art, and our results demonstrate that global models, which employ features from all parts of the input alignment when predicting individual contacts, are not strictly needed in order to attain precise contact predictions. Availability and implementation: DeepCov is freely available at https://github.com/psipred/DeepCov. 
Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms , Neural Networks, Computer , Protein Interaction Domains and Motifs , Proteins/chemistry , Amino Acid Sequence , Computational Biology , Sequence Alignment
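
The DeepCov abstract in this record describes input features built from amino-acid pair frequencies or covariances computed directly from an alignment. A minimal sketch of the standard pairwise covariance, cov(a, b) = f_ij(a, b) - f_i(a) f_j(b), follows. The alphabet, the function name `pair_covariance`, and the absence of sequence weighting or pseudocounts are simplifying assumptions made here; the abstract does not specify those details.

```python
from collections import Counter
from itertools import product

# Assumed alphabet: 20 standard amino acids plus a gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def pair_covariance(alignment, i, j):
    """Return {(a, b): f_ij(a, b) - f_i(a) * f_j(b)} for alignment columns i and j."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)   # single-column counts
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)  # joint counts
    return {
        (a, b): pairs[(a, b)] / n - (col_i[a] / n) * (col_j[b] / n)
        for a, b in product(ALPHABET, repeat=2)
    }


# Hypothetical alignment: columns 0 and 2 co-vary, column 1 is fully conserved.
toy_msa = ["ACD", "ACD", "GCE", "GCE"]
cov_02 = pair_covariance(toy_msa, 0, 2)
cov_01 = pair_covariance(toy_msa, 0, 1)
```

In the toy alignment, A in column 0 always occurs with D in column 2, so `cov_02[("A", "D")]` is 0.25, while the conserved column gives `cov_01[("A", "C")]` equal to 0. For each column pair this yields a 21 x 21 table of values, the kind of purely local statistic that the abstract contrasts with global methods such as sparse inverse covariance estimation.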
13.
Virus Evol ; 3(2): vex019, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28852572

ABSTRACT

Despite the use of combination antiretroviral drugs for the treatment of HIV-1 infection, the emergence of drug resistance remains a problem. Resistance may be conferred either by a single mutation or a concerted set of mutations. The involvement of multiple mutations can arise due to interactions between sites in the amino acid sequence as a consequence of the need to maintain protein structure. To better understand the nature of such epistatic interactions, we reconstructed the ancestral sequences of HIV-1's Pol protein, and traced the evolutionary trajectories leading to mutations associated with drug resistance. Using contemporary and ancestral sequences we modelled the effects of mutations (i.e. amino acid replacements) on protein structure to understand the functional effects of residue changes. Although the majority of resistance-associated sequences tend to destabilise the protein structure, we find there is a general tendency for protein stability to decrease across HIV-1's evolutionary history. That a similar pattern is observed in the non-drug resistance lineages indicates that non-resistant mutations, for example those associated with escape from the immune response, also affect protein stability. Maintenance of optimal protein structure therefore represents a major constraining factor to the evolution of HIV-1.

14.
Evol Comput ; 24(4): 577-607, 2016.
Article in English | MEDLINE | ID: mdl-26908350

ABSTRACT

Computational approaches to de novo protein tertiary structure prediction, including those based on the preeminent "fragment-assembly" technique, have failed to scale up fully to larger proteins (on the order of 100 residues and above). A number of limiting factors are thought to contribute to the scaling problem over and above the simple combinatorial explosion, but the key ones relate to the lack of exploration of properly diverse protein folds, and to an acute form of "deception" in the energy function, whereby low-energy conformations do not reliably equate with native structures. In this article, solutions to both of these problems are investigated through a multistage memetic algorithm incorporating the successful Rosetta method as a local search routine. We found that specialised genetic operators significantly add to structural diversity and that this translates well to reaching low energies. The use of a generalised stochastic ranking procedure for selection enables the memetic algorithm to handle and traverse deep energy wells that can be considered deceptive, which further adds to the ability of the algorithm to obtain a much-improved diversity of folds. The results should translate to a tangible improvement in the performance of protein structure prediction algorithms in blind experiments such as CASP, and potentially to a further step towards the more challenging problem of predicting the three-dimensional shape of large proteins.


Subject(s)
Algorithms , Proteins/chemistry , Computational Biology , Evolution, Molecular , Molecular Dynamics Simulation , Peptide Fragments/chemistry , Peptide Fragments/genetics , Protein Conformation , Protein Structure, Secondary , Protein Structure, Tertiary , Proteins/genetics , Stochastic Processes
15.
Proteins ; 84(4): 411-26, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26799916

ABSTRACT

Energy functions, fragment libraries, and search methods constitute three key components of fragment-assembly methods for protein structure prediction, which are all crucial for their ability to generate high-accuracy predictions. All of these components are tightly coupled; efficient searching becomes more important as the quality of fragment libraries decreases. Given these relationships, there is currently a poor understanding of the strengths and weaknesses of the sampling approaches currently used in fragment-assembly techniques. Here, we determine how the performance of search techniques can be assessed in a meaningful manner, given the above problems. We describe a set of techniques that aim to reduce the impact of the energy function, and assess exploration in view of the search space defined by a given fragment library. We illustrate our approach using Rosetta and EdaFold, and show how certain features of these methods encourage or limit conformational exploration. We demonstrate that individual trajectories of Rosetta are susceptible to local minima in the energy landscape, and that this can be linked to non-uniform sampling across the protein chain. We show that EdaFold's novel approach can help balance broad exploration with locating good low-energy conformations. This occurs through two mechanisms which cannot be readily differentiated using standard performance measures: exclusion of false minima, followed by an increasingly focused search in low-energy regions of conformational space. Measures such as ours can be helpful in characterizing new fragment-based methods in terms of the quality of conformational exploration realized.


Subject(s)
Algorithms , Gene Library , Peptide Fragments/chemistry , Computer Simulation , Models, Molecular , Peptide Fragments/genetics , Protein Conformation , Protein Folding , Thermodynamics
16.
Spectrochim Acta A Mol Biomol Spectrosc ; 136 Pt A: 32-41, 2015 Feb 05.
Article in English | MEDLINE | ID: mdl-24274986

ABSTRACT

As intermolecular interactions such as the hydrogen bond are electrostatic in origin, rigorous treatment of this term within force field methodologies should be mandatory. We present a method capable of accurately reproducing such interactions for seven van der Waals complexes. It uses atomic multipole moments up to the hexadecapole moment, mapped to the positions of the nuclear coordinates by the machine learning method kriging. Models were built at three levels of theory: HF/6-31G**, B3LYP/aug-cc-pVDZ and M06-2X/aug-cc-pVDZ. The quality of the kriging models was measured by their ability to predict the electrostatic interaction energy between atoms in external test examples for which the true energies are known. At all levels of theory, >90% of test cases for small van der Waals complexes were predicted within 1 kJ mol⁻¹, decreasing to 60-70% of test cases for larger base pair complexes. Models built on moments obtained at B3LYP and M06-2X level generally outperformed those at HF level. For all systems the individual interactions were predicted with a mean unsigned error of less than 1 kJ mol⁻¹.


Subject(s)
Hydrogen Bonding , Models, Chemical , Models, Molecular , Static Electricity , Ammonia/chemistry , Artificial Intelligence , Water/chemistry
17.
Phys Chem Chem Phys ; 16(6): 2256-9, 2014 Feb 14.
Article in English | MEDLINE | ID: mdl-24394921

ABSTRACT

A combination of the temperature- and pressure-dependencies of the kinetic isotope effect on the proton coupled electron transfer during ascorbate oxidation by ferricyanide suggests that this reference reaction may exploit vibrationally assisted quantum tunnelling of the transferred proton.


Subject(s)
Ascorbic Acid/chemistry , Ferricyanides/chemistry , Protons , Electron Transport , Kinetics , Oxidation-Reduction , Pressure , Temperature
18.
J Comput Chem ; 34(21): 1850-61, 2013 Aug 05.
Article in English | MEDLINE | ID: mdl-23720381

ABSTRACT

We propose a generic method to model polarization in the context of high-rank multipolar electrostatics. This method involves the machine learning technique kriging, here used to capture the response of an atomic multipole moment of a given atom to a change in the positions of the atoms surrounding this atom. The atoms are malleable boxes with sharp boundaries; they do not overlap, and together they exhaust space. The method is applied to histidine, where it is able to predict atomic multipole moments (up to hexadecapole) for unseen configurations, after training on 600 geometries distorted using normal modes of each of its 24 local energy minima at B3LYP/apc-1 level. The quality of the predictions is assessed by calculating the Coulomb energy between an atom for which the moments have been predicted and the surrounding atoms (having exact moments). Only interactions between atoms separated by three or more bonds ("1, 4 and higher" interactions) are included in this energy error. This energy is compared with that of a central atom with exact multipole moments interacting with the same environment. The resulting energy discrepancies are summed for 328 atom-atom interactions, for each of the 29 atoms of histidine being a central atom in turn. For 80% of the 539 test configurations (outside the training set), this summed energy deviates by less than 1 kcal mol⁻¹.


Subject(s)
Histidine/chemistry , Models, Chemical , Peptides/chemistry , Molecular Conformation , Static Electricity