Results 1 - 20 of 36
1.
Cell ; 149(7): 1607-21, 2012 Jun 22.
Article in English | MEDLINE | ID: mdl-22579045

ABSTRACT

We show that amino acid covariation in proteins, extracted from the evolutionary sequence record, can be used to fold transmembrane proteins. We use this technique to predict previously unknown 3D structures for 11 transmembrane proteins (with up to 14 helices) from their sequences alone. The prediction method (EVfold_membrane) applies a maximum entropy approach to infer evolutionary covariation in pairs of sequence positions within a protein family and then generates all-atom models with the derived pairwise distance constraints. We benchmark the approach with blinded de novo computation of known transmembrane protein structures from 23 families, demonstrating unprecedented accuracy of the method for large transmembrane proteins. We show how the method can predict oligomerization, functional sites, and conformational changes in transmembrane proteins. With the rapid rise in large-scale sequencing, more accurate and more comprehensive information on evolutionary constraints can be decoded from genetic variation, greatly expanding the repertoire of transmembrane proteins amenable to modeling by this method.


Subject(s)
Algorithms , Membrane Proteins/chemistry , Membrane Proteins/genetics , Amino Acid Sequence , Animals , Conserved Sequence , Molecular Evolution , Humans , Molecular Models , Protein Conformation , Secondary Protein Structure , Sequence Alignment , Structural Homology of Proteins
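
As a rough illustration of the covariation signal that the abstract of entry 1 relies on, the sketch below computes the mutual information between two columns of a multiple sequence alignment. This is a simpler, purely local statistic and is not the maximum entropy model used by EVfold_membrane; the function name and the toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in nats) between columns i and j of an alignment."""
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 2 vary together, column 1 is constant.
toy_alignment = ["AKD", "AKD", "GKE", "GKE"]
mi_covarying = column_mutual_information(toy_alignment, 0, 2)
mi_constant = column_mutual_information(toy_alignment, 0, 1)
```

Pairs of columns that change together score high, while a fully conserved column carries no covariation signal. Global models such as the one named in the abstract are designed to separate direct couplings from indirect ones, which this statistic cannot do.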
2.
Nucleic Acids Res ; 51(D1): D753-D759, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36477304

ABSTRACT

The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are derived from a wide range of different environments. Over the past 3 years, MGnify has not only grown in terms of the number of datasets contained but also increased the breadth of analyses provided, such as the analysis of long-read sequences. The MGnify protein database now exceeds 2.4 billion non-redundant sequences predicted from metagenomic assemblies. This collection is now organised into a relational database making it possible to understand the genomic context of the protein through navigation back to the source assembly and sample metadata, marking a major improvement. To extend beyond the functional annotations already provided in MGnify, we have applied deep learning-based annotation methods. The technology underlying MGnify's Application Programming Interface (API) and website has been upgraded, and we have enabled the ability to perform downstream analysis of the MGnify data through the introduction of a coupled Jupyter Lab environment.


Subject(s)
Microbiota , Sequence Analysis , Genomics/methods , Metagenome , Metagenomics/methods , Microbiota/genetics , Software , Sequence Analysis/methods
3.
Biophys J ; 121(16): 3023-3033, 2022 08 16.
Article in English | MEDLINE | ID: mdl-35859421

ABSTRACT

Collagen fibrils are the major constituents of the extracellular matrix, which provides structural support to vertebrate connective tissues. It is widely assumed that the superstructure of collagen fibrils is encoded in the primary sequences of the molecular building blocks. However, the interplay between large-scale architecture and small-scale molecular interactions makes the ab initio prediction of collagen structure challenging. Here, we propose a model that allows us to predict the periodic structure of collagen fibers and the axial offset between the molecules, purely on the basis of simple predictive rules for the interaction between amino acid residues. With our model, we identify the sequence-dependent collagen fiber geometries with the lowest free energy and validate the predicted geometries against the available experimental data. We propose a procedure for searching for optimal staggering distances. Finally, we build a classification algorithm and use it to scan 11 data sets of vertebrate fibrillar collagens, and predict the periodicity of the resulting assemblies. We analyze the experimentally observed variance of the optimal stagger distances across species, and find that these distances, and the resulting fibrillar phenotypes, are evolutionarily well preserved. Moreover, we observe that the energy minimum at the optimal stagger distance is broad in all cases, suggesting a further evolutionary adaptation designed to improve the assembly kinetics. Our periodicity predictions are not only in good agreement with the experimental data on collagen molecular staggering for all collagen types analyzed, but also for synthetic peptides. We argue that, with our model, it becomes possible to design tailor-made, periodic collagen structures, thereby enabling the design of novel biomimetic materials based on collagen-mimetic trimers.


Subject(s)
Biomimetic Materials , Collagen , Biomimetic Materials/chemistry , Collagen/metabolism , Extracellular Matrix/metabolism , Fibrillar Collagens , Peptides/chemistry
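
The search for an optimal stagger distance mentioned in the abstract of entry 3 can be pictured with a toy scan over axial offsets. The sketch below is an assumption-laden illustration: it uses only a hypothetical charge-product rule (opposite charges attract, like charges repel), ignores every other interaction, and does not normalize for overlap length, so it should not be read as the authors' energy model.

```python
# Hypothetical residue charges; all other residues are treated as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def stagger_energy(seq, offset):
    """Toy energy of two parallel copies of seq shifted by `offset` residues."""
    energy = 0
    for i in range(len(seq) - offset):
        # Product of charges: -1 for an attractive pair, +1 for a repulsive pair.
        energy += CHARGE.get(seq[i], 0) * CHARGE.get(seq[i + offset], 0)
    return energy


def best_stagger(seq, min_offset=1):
    """Offset with the lowest toy energy (first one wins on ties)."""
    return min(range(min_offset, len(seq)), key=lambda d: stagger_energy(seq, d))


# Hypothetical peptide with alternating K and D every three residues.
toy_peptide = "KGGDGGKGGDGG"
optimal_offset = best_stagger(toy_peptide)
optimal_energy = stagger_energy(toy_peptide, optimal_offset)
```

For this toy peptide the scan picks the offset that aligns each lysine with an aspartate. A realistic model would need the interaction rules and free-energy terms of the paper itself.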
4.
Brief Bioinform ; 21(5): 1549-1567, 2020 09 25.
Article in English | MEDLINE | ID: mdl-31626279

ABSTRACT

Antibodies are proteins that recognize the molecular surfaces of potentially noxious molecules to mount an adaptive immune response or, in the case of autoimmune diseases, molecules that are part of healthy cells and tissues. Due to their binding versatility, antibodies are currently the largest class of biotherapeutics, with five monoclonal antibodies ranked in the top 10 blockbuster drugs. Computational advances in protein modelling and design can have a tangible impact on antibody-based therapeutic development. Antibody-specific computational protocols currently benefit from an increasing volume of data provided by next generation sequencing and application to related drug modalities based on traditional antibodies, such as nanobodies. Here we present a structured overview of available databases, methods and emerging trends in computational antibody analysis and contextualize them towards the engineering of candidate antibody therapeutics.


Subject(s)
Monoclonal Antibodies/chemistry , Monoclonal Antibodies/immunology , Monoclonal Antibodies/therapeutic use , Computational Biology/methods , Protein Databases , Molecular Docking Simulation , Protein Conformation
5.
Proc Natl Acad Sci U S A ; 116(24): 11624-11629, 2019 06 11.
Article in English | MEDLINE | ID: mdl-31127041

ABSTRACT

Deep neural networks have achieved state-of-the-art accuracy at classifying molecules with respect to whether they bind to specific protein targets. A key breakthrough would occur if these models could reveal the fragment pharmacophores that are causally involved in binding. Extracting chemical details of binding from the networks could enable scientific discoveries about the mechanisms of drug actions. However, doing so requires shining light into the black box that is the trained neural network model, a task that has proved difficult across many domains. Here we show how the binding mechanism learned by deep neural network models can be interrogated, using a recently described attribution method. We first work with carefully constructed synthetic datasets, in which the molecular features responsible for "binding" are fully known. We find that networks that achieve perfect accuracy on held-out test datasets still learn spurious correlations, and we are able to exploit this nonrobustness to construct adversarial examples that fool the model. This makes these models unreliable for accurately revealing information about the mechanisms of protein-ligand binding. In light of our findings, we prescribe a test that checks whether a hypothesized mechanism can be learned. If the test fails, it indicates that the model must be simplified or regularized and/or that the training dataset requires augmentation.


Subject(s)
Protein Binding/physiology , Proteins/chemistry , Algorithms , Ligands , Machine Learning , Chemical Models , Computer Neural Networks
6.
Proc Natl Acad Sci U S A ; 115(4): 690-695, 2018 01 23.
Article in English | MEDLINE | ID: mdl-29311320

ABSTRACT

Covariance analysis of protein sequence alignments uses coevolving pairs of sequence positions to predict features of protein structure and function. However, current methods ignore the phylogenetic relationships between sequences, potentially corrupting the identification of covarying positions. Here, we use random matrix theory to demonstrate the existence of a power law tail that distinguishes the spectrum of covariance caused by phylogeny from that caused by structural interactions. The power law is essentially independent of the phylogenetic tree topology, depending on just two parameters: the sequence length and the average branch length. We demonstrate that these power law tails are ubiquitous in the large protein sequence alignments used to predict contacts in 3D structure, as predicted by our theory. This suggests that to decouple phylogenetic effects from the interactions between sequence-distal sites that control biological function, it is necessary to remove or down-weight the eigenvectors of the covariance matrix with the largest eigenvalues. We confirm that truncating these eigenvectors improves contact prediction.


Subject(s)
Sequence Alignment/methods , Protein Sequence Analysis/methods , Algorithms , Molecular Evolution , Theoretical Models , Multivariate Analysis , Phylogeny , Proteins/chemistry , Sequence Alignment/statistics & numerical data
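
The abstract of entry 6 recommends removing the eigenvectors of the covariance matrix with the largest eigenvalues. A minimal sketch of that truncation step follows, assuming a small symmetric positive semi-definite matrix and using plain power iteration with deflation. The helper names and the toy matrix are hypothetical, and the abstract does not say how many modes to remove, so `k` is left to the caller.

```python
import math


def mat_vec(matrix, vec):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def top_eigenpair(matrix, iterations=500):
    """Power iteration for the dominant eigenpair of a symmetric PSD matrix."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n  # assumes this start overlaps the top mode
    for _ in range(iterations):
        nxt = mat_vec(matrix, vec)
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return eigenvalue, vec


def remove_top_modes(matrix, k):
    """Subtract the k largest eigen-components: C <- C - lambda * v v^T."""
    cleaned = [row[:] for row in matrix]
    n = len(cleaned)
    for _ in range(k):
        lam, v = top_eigenpair(cleaned)
        cleaned = [[cleaned[a][b] - lam * v[a] * v[b] for b in range(n)]
                   for a in range(n)]
    return cleaned


# Hypothetical 3x3 covariance matrix with eigenvalues 3, 1 and 1.
toy_cov = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
cleaned_cov = remove_top_modes(toy_cov, k=1)
```

In practice a library eigendecomposition would replace the power iteration; the pure-Python version is used here only to keep the sketch dependency-free.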
7.
Biophys J ; 117(3): 520-532, 2019 08 06.
Article in English | MEDLINE | ID: mdl-31353036

ABSTRACT

The accurate prediction of RNA secondary structure from primary sequence has had enormous impact on research over the past 40 years. Although many algorithms are available to make these predictions, the inclusion of non-nested loops, termed pseudoknots, still poses challenges arising from two main factors: 1) no physical model exists to estimate the loop entropies of complex intramolecular pseudoknots, and 2) their NP-complete enumeration has impeded their study. Here, we address both challenges. First, we develop a polymer physics model that can address arbitrarily complex pseudoknots using only two parameters corresponding to concrete physical quantities, over an order of magnitude fewer than the sparsest state-of-the-art phenomenological methods. Second, by coupling this model to exhaustive enumeration of the set of possible structures, we compute the entire free energy landscape of secondary structures resulting from a primary RNA sequence. We demonstrate that for RNA structures of ∼80 nucleotides, with minimal heuristics, the complete enumeration of possible secondary structures can be accomplished quickly despite the NP-complete nature of the problem. We further show that despite our loop entropy model's parametric sparsity, it performs better than or on par with previously published methods in predicting both pseudoknotted and non-pseudoknotted structures on a benchmark data set of RNA structures of ≤80 nucleotides. We suggest ways in which the accuracy of the model can be further improved.


Subject(s)
Entropy , Nucleic Acid Conformation , Polymers/chemistry , RNA , Algorithms , RNA/chemistry , Thermodynamics
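
To make the idea of exhaustive enumeration in entry 7 concrete, the sketch below lists every nested secondary structure of a short RNA in dot-bracket notation. It is deliberately narrower than the method in the abstract: pseudoknots are excluded, no free energies or loop entropies are computed, and the minimum hairpin loop length of 3 is an assumption.

```python
# Watson-Crick and wobble pairs allowed in this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def enumerate_structures(seq, min_loop=3):
    """Return every nested secondary structure of seq in dot-bracket notation."""

    def fold(i, j):
        # All sets of base pairs for seq[i:j], each a list of (left, right) indices.
        if i >= j:
            return [[]]
        results = list(fold(i + 1, j))  # base i left unpaired
        for k in range(i + min_loop + 1, j):  # base i paired with base k
            if (seq[i], seq[k]) in PAIRS:
                for inner in fold(i + 1, k):
                    for outer in fold(k + 1, j):
                        results.append([(i, k)] + inner + outer)
        return results

    structures = []
    for pairs in fold(0, len(seq)):
        chars = ["."] * len(seq)
        for left, right in pairs:
            chars[left], chars[right] = "(", ")"
        structures.append("".join(chars))
    return structures


# Hypothetical five-nucleotide hairpin.
toy_structures = enumerate_structures("GAAAC")
```

The number of structures grows exponentially with sequence length, which is why the abstract stresses that complete enumeration at around 80 nucleotides is notable.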
8.
Phys Rev Lett ; 123(23): 238102, 2019 Dec 06.
Article in English | MEDLINE | ID: mdl-31868483

ABSTRACT

Collagen consists of three peptides twisted together through a periodic array of hydrogen bonds. Here we use this as inspiration to find design rules for programmed specific interactions for self-assembling synthetic collagenlike triple helices, starting from disordered configurations. The assembly generically nucleates defects in the triple helix, the characteristics of which can be manipulated by spatially varying the enthalpy of helix formation. Defect formation slows assembly, evoking kinetic pathologies that have been linked to mutations in the primary collagen amino acid sequence. The controlled formation and interaction between defects gives a route for hierarchical self-assembly of bundles of twisted filaments.


Subject(s)
Collagen/chemistry , Chemical Models , Amino Acid Sequence , Molecular Models , Nanostructures/chemistry , Peptides/chemistry , alpha-Helical Protein Conformation
9.
Proc Natl Acad Sci U S A ; 113(48): 13564-13569, 2016 11 29.
Article in English | MEDLINE | ID: mdl-27856761

ABSTRACT

Rapid determination of whether a candidate compound will bind to a particular target receptor remains a stumbling block in drug discovery. We use an approach inspired by random matrix theory to decompose the known ligand set of a target in terms of orthogonal "signals" of salient chemical features, and distinguish these from the much larger set of ligand chemical features that are not relevant for binding to that particular target receptor. After removing the noise caused by finite sampling, we show that the similarity of an unknown ligand to the remaining, cleaned chemical features is a robust predictor of ligand-target affinity, performing as well as or better than any algorithm in the published literature. We interpret our algorithm as deriving a model for the binding energy between a target receptor and the set of known ligands, where the underlying binding energy model is related to the classic Ising model in statistical physics.


Subject(s)
Computational Biology/methods , Drug Discovery/methods , Algorithms , Ligands , Theoretical Models , Protein Binding , Proteins/chemistry
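
One common way to separate signal from finite-sampling noise in a random-matrix setting, as entry 9 describes in general terms, is to compare eigenvalues of a correlation matrix with the Marchenko-Pastur upper edge. The sketch below assumes that choice; the abstract does not state the exact threshold the authors use, and the function names are hypothetical.

```python
import math


def marchenko_pastur_upper_edge(n_samples, n_features):
    """Largest eigenvalue expected from pure noise for the correlation matrix
    of n_features standardized variables estimated from n_samples samples."""
    return (1.0 + math.sqrt(n_features / n_samples)) ** 2


def significant_eigenvalues(eigenvalues, n_samples, n_features):
    """Keep only eigenvalues above the noise edge."""
    edge = marchenko_pastur_upper_edge(n_samples, n_features)
    return [lam for lam in eigenvalues if lam > edge]


# Hypothetical spectrum for 100 chemical features measured on 400 ligands.
noise_edge = marchenko_pastur_upper_edge(400, 100)
signals = significant_eigenvalues([5.0, 2.3, 2.0, 0.4], 400, 100)
```

Eigenvectors whose eigenvalues exceed the edge would then serve as the retained "signals"; everything below it is treated as indistinguishable from sampling noise.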
10.
Proc Natl Acad Sci U S A ; 113(43): 12180-12185, 2016 10 25.
Article in English | MEDLINE | ID: mdl-27663738

ABSTRACT

Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multiprotein complexes and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners, causing their sequences to be correlated. Here we exploit these correlations to accurately identify, from sequence data alone, which proteins are specific interaction partners. Our general approach, which employs a pairwise maximum entropy model to infer couplings between residues, has been successfully used to predict the 3D structures of proteins from sequences. Thus inspired, we introduce an iterative algorithm to predict specific interaction partners from two protein families whose members are known to interact. We first assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. We obtain a striking 0.93 true positive fraction on our complete dataset without any a priori knowledge of interaction partners, and we uncover the origin of this success. We then apply the algorithm to proteins from ATP-binding cassette (ABC) transporter complexes, and obtain accurate predictions in these systems as well. Finally, we present two metrics that accurately distinguish interacting protein families from noninteracting ones, using only sequence data.


Subject(s)
ATP-Binding Cassette Transporters/chemistry , Histidine Kinase/chemistry , Protein Binding/genetics , Protein Interaction Maps/genetics , ATP-Binding Cassette Transporters/genetics , Algorithms , Bacteria/chemistry , Bacteria/genetics , Entropy , Histidine Kinase/genetics , Protein Sequence Analysis , Signal Transduction
11.
Proteins ; 86(7): 697-706, 2018 07.
Article in English | MEDLINE | ID: mdl-29569425

ABSTRACT

Nanobodies are a class of antigen-binding protein derived from camelids that achieve comparable binding affinities and specificities to classical antibodies, despite comprising only a single 15 kDa variable domain. Their reduced size makes them an exciting target molecule with which we can explore the molecular code that underpins binding specificity: how is such high specificity achieved? Here, we use a novel dataset of 90 nonredundant, protein-binding nanobodies with antigen-bound crystal structures to address this question. To provide a baseline for comparison we construct an analogous set of classical antibodies, allowing us to probe how nanobodies achieve high specificity binding with a dramatically reduced sequence space. Our analysis reveals that nanobodies do not diversify their framework region to compensate for the loss of the VL domain. In addition to the previously reported increase in H3 loop length, we find that nanobodies create diversity by drawing their paratope regions from a significantly larger set of aligned sequence positions, and by exhibiting greater structural variation in their H1 and H2 loops.


Subject(s)
Antibodies , Single-Domain Antibodies , Antibodies/chemistry , Antibodies/genetics , Antibodies/immunology , Antibody Specificity , Antibody Binding Sites , Molecular Models , Protein Conformation , Sequence Alignment , Single-Domain Antibodies/chemistry , Single-Domain Antibodies/genetics , Structure-Activity Relationship
12.
Phys Rev Lett ; 119(20): 208101, 2017 Nov 17.
Article in English | MEDLINE | ID: mdl-29219382

ABSTRACT

In many contexts, it is extremely costly to perform enough high-quality experimental measurements to accurately parametrize a predictive quantitative model. However, it is often much easier to carry out large numbers of experiments that indicate whether each sample is above or below a given threshold. Can many such categorical or "coarse" measurements be combined with a much smaller number of high-resolution or "fine" measurements to yield accurate models? Here, we demonstrate an intuitive strategy, inspired by statistical physics, wherein the coarse measurements are used to identify the salient features of the data, while the fine measurements determine the relative importance of these features. A linear model is inferred from the fine measurements, augmented by a quadratic term that captures the correlation structure of the coarse data. We illustrate our strategy by considering the problems of predicting the antimalarial potency and aqueous solubility of small organic molecules from their 2D molecular structure.

13.
J Biol Chem ; 290(36): 22225-35, 2015 Sep 04.
Article in English | MEDLINE | ID: mdl-26187469

ABSTRACT

Allostery is a fundamental process by which ligand binding to a protein alters its activity at a distant site. Both experimental and theoretical evidence demonstrate that allostery can be communicated through altered slow relaxation protein dynamics without conformational change. The catabolite activator protein (CAP) of Escherichia coli is an exemplar for the analysis of such entropically driven allostery. Negative allostery in CAP occurs between identical cAMP binding sites. Changes to the cAMP-binding pocket can therefore impact the allosteric properties of CAP. Here we demonstrate, through a combination of coarse-grained modeling, isothermal calorimetry, and structural analysis, that decreasing the affinity of CAP for cAMP enhances negative cooperativity through an entropic penalty for ligand binding. The use of variant cAMP ligands indicates the data are not explained by structural heterogeneity between protein mutants. We observe computationally that altered interaction strength between CAP and cAMP variously modifies the change in allosteric cooperativity due to second site CAP mutations. As the degree of correlated motion between the cAMP-contacting site and a second site on CAP increases, there is a tendency for computed double mutations at these sites to drive CAP toward noncooperativity. Naturally occurring pairs of covarying residues in CAP do not display this tendency, suggesting a selection pressure to fine tune allostery on changes to the CAP ligand-binding pocket without a drive to a noncooperative state. In general, we hypothesize an evolutionary selection pressure to retain slow relaxation dynamics-induced allostery in proteins in which evolution of the ligand-binding site is occurring.


Subject(s)
Cyclic AMP Receptor Protein/chemistry , Cyclic AMP/chemistry , Escherichia coli Proteins/chemistry , Secondary Protein Structure , Tertiary Protein Structure , Allosteric Regulation , Binding Sites , X-Ray Crystallography , Cyclic AMP/metabolism , Cyclic AMP Receptor Protein/metabolism , Entropy , Escherichia coli/metabolism , Escherichia coli Proteins/metabolism , Ligands , Molecular Conformation , Molecular Dynamics Simulation , Molecular Structure , Protein Binding
14.
PLoS Comput Biol ; 11(2): e1004091, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25723535

ABSTRACT

Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed "sectors". The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation.


Subject(s)
Computational Biology/methods , Protein Interaction Domains and Motifs/physiology , Proteins/chemistry , Sequence Alignment/methods , Protein Sequence Analysis/methods , Algorithms , Amino Acid Sequence , Conserved Sequence , PDZ Domains , Tetrahydrofolate Dehydrogenase
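
Entry 14 argues that sequence conservation alone can reproduce single-sector predictions. A minimal per-column conservation score is sketched below as the relative entropy of the observed residue frequencies against a background. The uniform 20-letter background is a simplifying assumption (SCA itself works with empirical background frequencies), and the function name and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment, column, alphabet_size=20):
    """Relative entropy (nats) of a column against a uniform background."""
    n = len(alignment)
    counts = Counter(seq[column] for seq in alignment)
    background = 1.0 / alphabet_size
    return sum((c / n) * math.log((c / n) / background) for c in counts.values())


# Hypothetical toy alignment: column 1 is fully conserved, column 0 is split evenly.
toy_alignment = ["AKD", "AKD", "GKE", "GKE"]
conserved_score = column_conservation(toy_alignment, 1)
variable_score = column_conservation(toy_alignment, 0)
```

Ranking positions by this score gives a conservation-only baseline against which correlation-based sector predictions can be compared.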
15.
Proc Natl Acad Sci U S A ; 109(18): E1063-71, 2012 May 01.
Article in English | MEDLINE | ID: mdl-22517748

ABSTRACT

Polycomb Group (PcG) proteins mediate heritable gene silencing by modifying chromatin structure. An essential PcG complex, PRC1, compacts chromatin and inhibits chromatin remodeling. In Drosophila melanogaster, the intrinsically disordered C-terminal region of PSC (PSC-CTR) mediates these noncovalent effects on chromatin, and is essential for viability. Because the PSC-CTR sequence is poorly conserved, the significance of its effects on chromatin outside of Drosophila was unclear. The absence of folded domains also made it difficult to understand how the sequence of PSC-CTR encodes its function. To determine the mechanistic basis and extent of conservation of PSC-CTR activity, we identified 17 metazoan PSC-CTRs spanning chordates to arthropods, and examined their sequence features and biochemical properties. PSC-CTR sequences are poorly conserved, but are all highly charged and structurally disordered. We show that active PSC-CTRs (those that bind DNA tightly and inhibit chromatin remodeling efficiently) are distinguished from less active ones by the absence of extended negatively charged stretches. PSC-CTR activity can be increased by dispersing its contiguous negative charge, confirming the importance of this property. Using the sequence properties defined as important for PSC-CTR activity, we predicted the presence of active PSC-CTRs in additional diverse genomes. Our analysis reveals broad conservation of PSC-CTR activity across metazoans. This conclusion could not have been determined from sequence alignments. We further find that plants that lack active PSC-CTRs instead possess a functionally analogous PcG protein, EMF1. Thus, our study suggests that a disordered domain with dispersed negative charges underlies PRC1 activity, and is conserved across metazoans and plants.


Subject(s)
Repressor Proteins/chemistry , Repressor Proteins/genetics , Animals , Chromatin Assembly and Disassembly , Conserved Sequence , Drosophila Proteins/chemistry , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Molecular Evolution , Insect Proteins/chemistry , Insect Proteins/genetics , Insect Proteins/metabolism , Phylogeny , Plant Proteins/chemistry , Plant Proteins/genetics , Plant Proteins/metabolism , Polycomb-Group Proteins , Protein Subunits , Repressor Proteins/metabolism
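
The sequence property highlighted in entry 15, the absence of extended negatively charged stretches, can be approximated with a simple run-length measure. The sketch below is only one possible reading: it counts consecutive acidic residues (D or E), and the abstract gives no cutoff for what counts as "extended", so none is assumed. Both toy sequences are hypothetical.

```python
def longest_negative_stretch(seq):
    """Length of the longest run of consecutive acidic residues (D or E)."""
    longest = current = 0
    for residue in seq:
        current = current + 1 if residue in "DE" else 0
        longest = max(longest, current)
    return longest


# Two hypothetical sequences with identical composition:
clustered = "KKDEEEDKRKE"   # negative charge concentrated in one stretch
dispersed = "EKDKEREKDKE"   # same residues, negative charge spread out
clustered_run = longest_negative_stretch(clustered)
dispersed_run = longest_negative_stretch(dispersed)
```

Because the two toy sequences share the same composition, any difference in the score reflects only how the negative charge is distributed, which is the property the abstract reports as important.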
16.
Nat Chem ; 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755312

ABSTRACT

Several peptide dual agonists of the human glucagon receptor (GCGR) and the glucagon-like peptide-1 receptor (GLP-1R) are in development for the treatment of type 2 diabetes, obesity and their associated complications. Candidates must have high potency at both receptors, but it is unclear whether the limited experimental data available can be used to train models that accurately predict the activity at both receptors of new peptide variants. Here we use peptide sequence data labelled with in vitro potency at human GCGR and GLP-1R to train several models, including a deep multi-task neural-network model using multiple loss optimization. Model-guided sequence optimization was used to design three groups of peptide variants, with distinct ranges of predicted dual activity. We found that three of the model-designed sequences are potent dual agonists with superior biological activity. With our designs we were able to achieve up to sevenfold potency improvement at both receptors simultaneously compared to the best dual-agonist in the training set.

17.
Methods Mol Biol ; 2586: 49-77, 2023.
Article in English | MEDLINE | ID: mdl-36705898

ABSTRACT

Here we detail the LandscapeFold secondary structure prediction algorithm and how it is used. The algorithm was previously described and tested in (Kimchi O et al., Biophys J 117(3):520-532, 2019), though it was not named there. The algorithm directly enumerates all possible secondary structures into which up to two RNA or single-stranded DNA sequences can fold. It uses a polymer physics model to estimate the configurational entropy of structures including complex pseudoknots. We detail each of these steps and ways in which the user can adjust the algorithm as desired. The code is available on the GitHub repository https://github.com/ofer-kimchi/LandscapeFold .


Subject(s)
Algorithms , RNA , Nucleic Acid Conformation , RNA/genetics , Entropy , Single-Stranded DNA
18.
Elife ; 12, 2023 02 27.
Article in English | MEDLINE | ID: mdl-36847334

ABSTRACT

Predicting the function of a protein from its amino acid sequence is a long-standing challenge in bioinformatics. Traditional approaches use sequence alignment to compare a query sequence either to thousands of models of protein families or to large databases of individual protein sequences. Here we introduce ProteInfer, which instead employs deep convolutional neural networks to predict a variety of protein functions (Enzyme Commission (EC) numbers and Gene Ontology (GO) terms) directly from an unaligned amino acid sequence. This approach provides precise predictions which complement alignment-based methods, and the computational efficiency of a single neural network permits novel and lightweight software interfaces, which we demonstrate with an in-browser graphical interface for protein function prediction in which all computation is performed on the user's personal computer with no data uploaded to remote servers. Moreover, these models place full-length amino acid sequences into a generalised functional space, facilitating downstream analysis and interpretation. To read the interactive version of this paper, please visit https://google-research.github.io/proteinfer/.


Subject(s)
Algorithms , Computer Neural Networks , Proteins/genetics , Proteins/chemistry , Amino Acid Sequence , Software , Computational Biology/methods
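
Convolutional networks of the kind described in entry 18 need a numeric representation of an unaligned sequence. A generic one-hot encoding is sketched below; the abstract does not specify ProteInfer's actual input format, so the 20-letter alphabet, the all-zero row for unknown residues, and the names used are assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: k for k, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a sequence as one 20-dimensional row per residue.

    Residues outside the standard alphabet become all-zero rows.
    """
    rows = []
    for residue in seq:
        row = [0] * len(AMINO_ACIDS)
        if residue in INDEX:
            row[INDEX[residue]] = 1
        rows.append(row)
    return rows


# Hypothetical three-residue input; "X" stands for an unknown residue.
encoded = one_hot("ACX")
```

The result is a length-by-20 matrix whose number of rows varies with the sequence, which is why no alignment is required before the convolutional layers are applied.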
19.
Biochem Soc Trans ; 40(3): 475-91, 2012 Jun 01.
Article in English | MEDLINE | ID: mdl-22616857

ABSTRACT

All proteins require physical interactions with other proteins in order to perform their functions. Most of them oligomerize into homomers, and a vast majority of these homomers interact with other proteins, at least part of the time, forming transient or obligate heteromers. In the present paper, we review the structural, biophysical and evolutionary aspects of these protein interactions. We discuss how protein function and stability benefit from oligomerization, as well as evolutionary pathways by which oligomers emerge, mostly from the perspective of homomers. Finally, we emphasize the specificities of heteromeric complexes and their structure and evolution. We also discuss two analytical approaches increasingly being used to study protein structures as well as their interactions. First, we review the use of biological networks and graph theory for analysis of protein interactions and structure. Secondly, we discuss recent advances in techniques for detecting correlated mutations, with the emphasis on their role in identifying pathways of allosteric communication.


Subject(s)
Awards and Prizes , Multiprotein Complexes/metabolism , Proteins/chemistry , Proteins/metabolism , Allosteric Regulation , Animals , Molecular Evolution , Humans , Numismatics , Quaternary Protein Structure , Proteins/genetics
20.
Nat Biotechnol ; 40(6): 932-937, 2022 06.
Article in English | MEDLINE | ID: mdl-35190689

ABSTRACT

Understanding the relationship between amino acid sequence and protein function is a long-standing challenge with far-reaching scientific and translational implications. State-of-the-art alignment-based techniques cannot predict function for one-third of microbial protein sequences, hampering our ability to exploit data from diverse organisms. Here, we train deep learning models to accurately predict functional annotations for unaligned amino acid sequences across rigorous benchmark assessments built from the 17,929 families of the protein families database Pfam. The models infer known patterns of evolutionary substitutions and learn representations that accurately cluster sequences from unseen families. Combining deep models with existing methods significantly improves remote homology detection, suggesting that the deep models learn complementary information. This approach extends the coverage of Pfam by >9.5%, exceeding additions made over the last decade, and predicts function for 360 human reference proteome proteins with no previous Pfam annotation. These results suggest that deep learning models will be a core component of future protein annotation tools.


Subject(s)
Deep Learning , Amino Acid Sequence , Protein Databases , Humans , Molecular Sequence Annotation , Proteome/metabolism , Proteomics