ABSTRACT
Protein structures are essential to understanding cellular processes in molecular detail. While advances in artificial intelligence revealed the tertiary structure of proteins at scale, their quaternary structure remains mostly unknown. We devise a scalable strategy based on AlphaFold2 to predict homo-oligomeric assemblies across four proteomes spanning the tree of life. Our results suggest that approximately 45% of an archaeal proteome and a bacterial proteome and 20% of two eukaryotic proteomes form homomers. Our predictions accurately capture protein homo-oligomerization, recapitulate megadalton complexes, and unveil hundreds of homo-oligomer types, including three confirmed experimentally by structure determination. Integrating these datasets with omics information suggests that a majority of known protein complexes are symmetric. Finally, these datasets provide a structural context for interpreting disease mutations and reveal coiled-coil regions as major enablers of quaternary structure evolution in human. Our strategy is applicable to any organism and provides a comprehensive view of homo-oligomerization in proteomes.
Subjects
Artificial Intelligence, Proteins, Proteome, Humans, Proteins/chemistry, Proteins/genetics, Archaea/chemistry, Archaea/genetics, Eukaryota/chemistry, Eukaryota/genetics, Bacteria/chemistry, Bacteria/genetics
ABSTRACT
Mutations in transporters can impact an individual's response to drugs and cause many diseases. Few variants in transporters have been evaluated for their functional impact. Here, we combine saturation mutagenesis and multi-phenotypic screening to dissect the impact of 11,213 missense, single-amino-acid deletion, and synonymous variants across the 554 residues of OCT1, a key liver xenobiotic transporter. By quantifying expression and substrate uptake in parallel, we find that most variants exert their primary effect on protein abundance, a phenotype not commonly measured alongside function. Using our mutagenesis results combined with structure prediction and molecular dynamics simulations, we develop accurate structure-function models of the entire transport cycle, providing biophysical characterization of all known and possible human OCT1 polymorphisms. This work provides a complete functional map of OCT1 variants along with a framework for integrating functional genomics, biophysical modeling, and human genetics to predict variant effects on disease and drug efficacy.
Subjects
Molecular Dynamics Simulation, Organic Cation Transporter 1, Protein Conformation, Humans, Biological Transport, HEK293 Cells, Mutation, Missense Mutation, Octamer Transcription Factor-1, Organic Cation Transporter 1/genetics, Organic Cation Transporter 1/metabolism, Pharmacogenetics, Phenotype, Structure-Activity Relationship
ABSTRACT
Protein flexibility ranges from simple hinge movements to functional disorder. Around half of all human proteins contain apparently disordered regions with little 3D or functional information, and many of these proteins are associated with disease. Building on the evolutionary couplings approach previously successful in predicting 3D states of ordered proteins and RNA, we developed a method to predict the potential for ordered states for all apparently disordered proteins with sufficiently rich evolutionary information. The approach is highly accurate (79%) for residue interactions as tested in more than 60 known disordered regions captured in a bound or specific condition. Assessing the potential for structure of more than 1,000 apparently disordered regions of human proteins reveals a continuum of structural order with at least 50% with clear propensity for three- or two-dimensional states. Co-evolutionary constraints reveal hitherto unseen structures of functional importance in apparently disordered proteins.
Subjects
Intrinsically Disordered Proteins/chemistry, Directed Molecular Evolution/methods, Genomics, Humans, Intrinsically Disordered Proteins/genetics, Protein Secondary Structure, Protein Tertiary Structure, Proteome/chemistry, Proteome/genetics
ABSTRACT
tRNA function is based on unique structures that enable mRNA decoding using anticodon trinucleotides. These structures interact with specific aminoacyl-tRNA synthetases and ribosomes using 3D shape and sequence signatures. Beyond translation, tRNAs serve as versatile signaling molecules interacting with other RNAs and proteins. Through evolutionary processes, tRNA fragmentation emerges as not merely random degradation but an act of recreation, generating specific shorter molecules called tRNA-derived small RNAs (tsRNAs). These tsRNAs exploit their linear sequences and newly arranged 3D structures for unexpected biological functions, epitomizing the tRNA "renovatio" (from Latin, meaning renewal, renovation, and rebirth). Emerging methods to uncover full tRNA/tsRNA sequences and modifications, combined with techniques to study RNA structures and to integrate AI-powered predictions, will enable comprehensive investigations of tRNA fragmentation products and new interaction potentials in relation to their biological functions. We anticipate that these directions will herald a new era for understanding biological complexity and advancing pharmaceutical engineering.
Subjects
Amino Acyl-tRNA Synthetases, Transfer RNA, Transfer RNA/metabolism, Anticodon, Amino Acyl-tRNA Synthetases/metabolism, Ribosomes/metabolism, Messenger RNA/genetics
ABSTRACT
ATG9A and ATG2A are essential core members of the autophagy machinery. ATG9A is a lipid scramblase that allows equilibration of lipids across a membrane bilayer, whereas ATG2A facilitates lipid flow between tethered membranes. Although both have been functionally linked during the formation of autophagosomes, the molecular details and consequences of their interaction remain unclear. By combining data from peptide arrays, crosslinking, and hydrogen-deuterium exchange mass spectrometry together with cryoelectron microscopy, we propose a molecular model of the ATG9A-2A complex. Using this integrative structure modeling approach, we identify several interfaces mediating ATG9A-2A interaction that would allow a direct transfer of lipids from ATG2A into the lipid-binding perpendicular branch of ATG9A. Mutational analyses combined with functional activity assays demonstrate their importance for autophagy, thereby shedding light on this protein complex at the heart of autophagy.
Subjects
Autophagosomes, Autophagy, Cryoelectron Microscopy, Biological Assay, Lipids
ABSTRACT
Solute carrier (SLC) transporters mediate the transport of a broad range of solutes across biological membranes. Dysregulation of SLCs has been associated with various pathologies, including metabolic and neurological disorders, as well as cancer and rare diseases. SLCs are therefore emerging as key targets for therapeutic intervention, with several recently approved drugs targeting these proteins. Unlocking this large and complex group of proteins is essential to identifying unknown SLC targets and developing next-generation SLC therapeutics. Recent progress in experimental and computational techniques has significantly advanced SLC research, including drug discovery. Here, we review emerging topics in therapeutic discovery of SLCs, focusing on state-of-the-art approaches in structural, chemical, and computational biology, and discuss current challenges in transporter drug discovery.
Subjects
Neoplasms, Solute Carrier Proteins, Humans, Solute Carrier Proteins/chemistry, Solute Carrier Proteins/metabolism, Membrane Transport Proteins/chemistry, Biological Transport/physiology, Drug Discovery/methods, Neoplasms/metabolism
ABSTRACT
Breakthrough methods in machine learning (ML), protein structure prediction, and novel ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models of proteins and annotating their functions on a large scale is no longer limited by time and resources. The most recent method to be top ranked by the Critical Assessment of Structure Prediction (CASP) assessment, AlphaFold 2 (AF2), is capable of building structural models with an accuracy comparable to that of experimental structures. Annotations of 3D models are keeping pace with the deposition of the structures due to advancements in protein language models (pLMs) and structural aligners that help validate these transferred annotations. In this review we describe how recent developments in ML for protein science are making large-scale structural bioinformatics available to the general scientific community.
Subjects
Machine Learning, Proteins, Proteins/chemistry, Computational Biology/methods, Protein Conformation
ABSTRACT
The last five years have seen impressive progress in deep learning models applied to protein research. Most notably, sequence-based structure predictions have seen transformative gains in the form of AlphaFold2 and related approaches. Millions of missense protein variants in the human population lack annotations, and these computational methods are a valuable means to prioritize variants for further analysis. Here, we review the recent progress in deep learning models applied to the prediction of protein structure and protein variants, with particular emphasis on their implications for human genetics and health. Improved prediction of protein structures facilitates annotations of the impact of variants on protein stability, protein-protein interaction interfaces, and small-molecule binding pockets. Moreover, it contributes to the study of host-pathogen interactions and the characterization of protein function. As genome sequencing in large cohorts becomes increasingly prevalent, we believe that better integration of state-of-the-art protein informatics technologies into human genetics research is of paramount importance.
Subjects
Proteins, Humans, Proteins/genetics, Proteins/chemistry, Proteins/metabolism, Deep Learning, Human Genetics, Protein Conformation, Computational Biology/methods
ABSTRACT
Secreted signaling peptides are central regulators of growth, development, and stress responses, but specific steps in the evolution of these peptides and their receptors are not well understood. Also, the molecular mechanisms of peptide-receptor binding are only known for a few examples, primarily because protein structure determination capabilities are available to only a few laboratories worldwide. Plants have evolved a multitude of secreted signaling peptides and corresponding transmembrane receptors. Stress-responsive SERINE RICH ENDOGENOUS PEPTIDES (SCOOPs) were recently identified. Bioactive SCOOPs are proteolytically processed by subtilases and are perceived by the leucine-rich repeat receptor kinase MALE DISCOVERER 1-INTERACTING RECEPTOR-LIKE KINASE 2 (MIK2) in the model plant Arabidopsis thaliana. How SCOOPs and MIK2 have (co)evolved, and how SCOOPs bind to MIK2, are unknown. Using in silico analysis of 350 plant genomes and subsequent functional testing, we revealed the conservation of MIK2 as the SCOOP receptor within the plant order Brassicales. We then leveraged AI-based structural modeling and comparative genomics to identify two conserved putative SCOOP-MIK2 binding pockets across Brassicales MIK2 homologues predicted to interact with the "SxS" motif of otherwise sequence-divergent SCOOPs. Mutagenesis of both predicted binding pockets compromised SCOOP binding to MIK2, SCOOP-induced complex formation between MIK2 and its coreceptor BRASSINOSTEROID INSENSITIVE 1-ASSOCIATED KINASE 1, and SCOOP-induced reactive oxygen species production, thus confirming our in silico predictions. Collectively, in addition to revealing the elusive SCOOP-MIK2 binding mechanism, our analytic pipeline combining phylogenomics, AI-based structural predictions, and experimental biochemical and physiological validation provides a blueprint for the elucidation of peptide ligand-receptor perception mechanisms.
Subjects
Arabidopsis Proteins, Arabidopsis, Arabidopsis/metabolism, Arabidopsis/genetics, Arabidopsis Proteins/metabolism, Arabidopsis Proteins/chemistry, Arabidopsis Proteins/genetics, Ligands, Protein Binding, Protein Serine-Threonine Kinases/metabolism, Protein Serine-Threonine Kinases/chemistry, Protein Serine-Threonine Kinases/genetics, Peptides/metabolism, Peptides/chemistry, Molecular Evolution, Molecular Models, Signal Transduction, Phosphotransferases
ABSTRACT
Two years on from the initial release of AlphaFold, we have seen its widespread adoption as a structure prediction tool. Here, we discuss some of the latest work based on AlphaFold, with a particular focus on its use within the structural biology community. This encompasses use cases like speeding up structure determination itself, enabling new computational studies, and building new tools and workflows. We also look at the ongoing validation of AlphaFold, as its predictions continue to be compared against large numbers of experimental structures to further delineate the model's capabilities and limitations.
ABSTRACT
Protein language models (pLMs) have emerged as potent tools for predicting and designing protein structure and function, and the degree to which these models fundamentally understand the inherent biophysics of protein structure stands as an open question. Motivated by a finding that pLM-based structure predictors erroneously predict nonphysical structures for protein isoforms, we investigated the nature of sequence context needed for contact predictions in the pLM Evolutionary Scale Modeling (ESM-2). We demonstrate by use of a "categorical Jacobian" calculation that ESM-2 stores statistics of coevolving residues, analogously to simpler modeling approaches like Markov Random Fields and Multivariate Gaussian models. We further investigated how ESM-2 "stores" information needed to predict contacts by comparing sequence masking strategies, and found that providing local windows of sequence information allowed ESM-2 to best recover predicted contacts. This suggests that pLMs predict contacts by storing motifs of pairwise contacts. Our investigation highlights the limitations of current pLMs and underscores the importance of understanding the underlying mechanisms of these models.
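The "categorical Jacobian" idea mentioned above can be illustrated with a minimal sketch: substitute every state at every position, record how the model's output logits change at all other positions, and reduce the resulting four-dimensional tensor to one coupling score per position pair. Everything below is an assumption made for illustration: `toy_logits` is a hypothetical stand-in for a language model (it is not ESM-2), and the reduction shown (Frobenius norm, symmetrization, zeroed diagonal) is one common choice rather than the authors' exact procedure.

```python
import numpy as np

def categorical_jacobian(logits_fn, seq, n_states):
    """Return J[i, a, j, b]: change in the logit for state b at position j
    when position i of the input is substituted with state a."""
    seq = np.asarray(seq)
    base = logits_fn(seq)                      # shape (length, n_states)
    length = len(seq)
    jac = np.zeros((length, n_states, length, n_states))
    for i in range(length):
        for a in range(n_states):
            mutated = seq.copy()
            mutated[i] = a
            jac[i, a] = logits_fn(mutated) - base
    return jac

def coupling_scores(jac):
    """Collapse the state axes with a Frobenius norm, then symmetrize."""
    scores = np.sqrt((jac ** 2).sum(axis=(1, 3)))   # shape (length, length)
    scores = 0.5 * (scores + scores.T)
    np.fill_diagonal(scores, 0.0)                   # ignore self-coupling
    return scores

# Hypothetical toy model: 5 positions, 4 states, with a single coupled
# pair of positions (0 and 3). A real pLM would replace this function.
L, A = 5, 4
rng = np.random.default_rng(0)
W = np.zeros((L, A, L, A))
W[0, :, 3, :] = rng.normal(size=(A, A))
W[3, :, 0, :] = rng.normal(size=(A, A))

def toy_logits(seq):
    out = np.zeros((L, A))
    for i, a in enumerate(seq):
        out += W[i, a]          # contribution of state a at position i
    return out

seq = np.array([0, 1, 2, 3, 0])
scores = coupling_scores(categorical_jacobian(toy_logits, seq, A))
top_pair = tuple(int(x) for x in np.unravel_index(np.argmax(scores), scores.shape))
```

In this toy setting the only nonzero scores fall on the coupled pair, which mirrors how such a calculation can expose stored coevolution statistics without access to the model's weights.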
Subjects
Proteins, Proteins/chemistry, Molecular Evolution, Amino Acid Motifs, Molecular Models, Protein Conformation, Markov Chains
ABSTRACT
Protein structure prediction has been greatly improved by deep learning in the past few years. However, the most successful methods rely on multiple sequence alignment (MSA) of the sequence homologs of the protein under prediction. In nature, a protein folds in the absence of its sequence homologs, and thus an MSA-free structure prediction method is desired. Here, we develop a single-sequence-based protein structure prediction method, RaptorX-Single, by integrating several protein language models and a structure generation module, and then study its advantage over MSA-based methods. Our experimental results indicate that in addition to running much faster than MSA-based methods such as AlphaFold2, RaptorX-Single outperforms AlphaFold2 and other MSA-free methods in predicting the structure of antibodies (after fine-tuning on antibody data), proteins with very few sequence homologs, and single mutation effects. By comparing different protein language models, our results show that not only the scale but also the training data of protein language models will impact the performance. RaptorX-Single also compares favorably to MSA-based AlphaFold2 when the protein under prediction has a large number of sequence homologs.
Subjects
Antibodies, Proteins, Proteins/genetics, Proteins/chemistry, Antibodies/genetics, Sequence Alignment, Algorithms
ABSTRACT
Identifying antibodies that neutralize specific antigens is crucial for developing effective immunotherapies, but this task remains challenging for many target antigens. The rise of deep learning-based computational approaches presents a promising avenue to address this challenge. Here, we assess the performance of a deep learning approach through two benchmark tests aimed at predicting antibodies for the receptor-binding domain of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein. Three different strategies for constructing input sequence alignments are employed for predicting structural models of antigen-antibody complexes. In our initial testing set, which comprises known experimental structures, these strategies collectively yield a significant top-ranked prediction for 61% of cases and a success rate of 47%. Notably, one strategy that utilizes the sequences of known antigen binders outperforms the other two, achieving a precision of 90% in a subsequent test set of ~1,000 antibodies, balanced between true and control antibodies for the antigen, albeit with a lower recall of 25%. Our results underscore the potential of integrating deep learning methods with single B cell sequencing techniques to enhance the prediction accuracy of antigen-antibody interactions.
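To make the reported trade-off between 90% precision and 25% recall concrete, the sketch below back-calculates approximate confusion-matrix counts. It assumes an exactly balanced set of 500 true binders and 500 controls, which is only an approximation of the "~1,000 antibodies" stated above; the function name and the split are illustrative, not taken from the study.

```python
def confusion_counts(n_pos, n_neg, precision, recall):
    """Back out expected confusion-matrix counts from precision and recall.

    recall    = tp / (tp + fn)  with tp + fn = n_pos
    precision = tp / (tp + fp)
    """
    tp = recall * n_pos
    fn = n_pos - tp
    fp = tp * (1.0 - precision) / precision
    tn = n_neg - fp
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}

# Assumed split: 500 true binders and 500 control antibodies.
counts = confusion_counts(n_pos=500, n_neg=500, precision=0.90, recall=0.25)
for name, value in counts.items():
    print(f"{name}: {value:.1f}")
```

Under these assumptions, about 125 true binders are recovered at the cost of roughly 14 false positives, while about 375 true binders are missed. Such a profile suits screening settings where follow-up experiments are expensive and false leads are costly.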
Subjects
Antigen-Antibody Complex, Deep Learning, SARS-CoV-2, Coronavirus Spike Glycoprotein, Coronavirus Spike Glycoprotein/immunology, Coronavirus Spike Glycoprotein/chemistry, Humans, SARS-CoV-2/immunology, Antigen-Antibody Complex/immunology, Antigen-Antibody Complex/chemistry, COVID-19/immunology, COVID-19/virology, Viral Antibodies/immunology, Neutralizing Antibodies/immunology, Computational Biology/methods
ABSTRACT
Proteins perform their biological functions through motion. Although high-throughput prediction of the three-dimensional static structures of proteins has proved feasible using deep-learning-based methods, predicting conformational motions remains a challenge. Purely data-driven machine learning methods have difficulty addressing such motions because available laboratory data on conformational motions are still limited. In this work, we develop a method for generating protein allosteric motions by integrating physical energy landscape information into deep-learning-based methods. We show that local energetic frustration, which quantifies the local features of the energy landscape governing protein allosteric dynamics, can be utilized to empower AlphaFold2 (AF2) to predict protein conformational motions. Starting from ground-state static structures, this integrative method generates alternative structures as well as pathways of protein conformational motions, using a progressive enhancement of the energetic frustration features in the input multiple sequence alignment sequences. For a model protein, adenylate kinase, we show that the generated conformational motions are consistent with available experimental and molecular dynamics simulation data. Applying the method to two further proteins, KaiB and ribose-binding protein, which involve large-amplitude conformational changes, also successfully generates the alternative conformations. We also show how to extract overall features of the AF2 energy landscape topography, which has been considered by many to be a black box. Incorporating physical knowledge into deep-learning-based structure prediction algorithms provides a useful strategy to address the challenges of dynamic structure prediction of allosteric proteins.
Subjects
Molecular Dynamics Simulation, Protein Conformation, Proteins/chemistry, Adenylate Kinase/chemistry, Adenylate Kinase/metabolism, Allosteric Regulation, Deep Learning
ABSTRACT
In recent years, cyclic peptides have emerged as a promising therapeutic modality due to their diverse biological activities. Understanding the structures of these cyclic peptides and their complexes is crucial for gaining insight into the interaction between protein targets and cyclic peptides, which can facilitate the development of novel related drugs. However, conducting experimental observations is time-consuming and expensive, and computer-aided drug design methods are not practical enough in real-world applications. To tackle this challenge, we introduce HighFold, an AlphaFold-derived model. By integrating specific details about the head-to-tail circle and disulfide bridge structures, the HighFold model can accurately predict the structures of cyclic peptides and their complexes. Our model demonstrates superior predictive performance compared to other existing approaches, representing a significant advancement in structure-activity research. The HighFold model is openly accessible at https://github.com/hongliangduan/HighFold.
Subjects
Disulfides, Cyclic Peptides, Cyclic Peptides/chemistry, Disulfides/chemistry, Software, Molecular Models, Protein Conformation, Algorithms, Computational Biology/methods
ABSTRACT
In recent decades, antibodies have emerged as indispensable therapeutics for combating diseases, particularly viral infections. However, their development has been hindered by limited structural information and labor-intensive engineering processes. Fortunately, significant advancements in deep learning methods have facilitated the precise prediction of protein structure and function by leveraging co-evolution information from homologous proteins. Despite these advances, predicting the conformation of antibodies remains challenging due to their unique evolution and the high flexibility of their antigen-binding regions. Here, to address this challenge, we present the Bio-inspired Antibody Language Model (BALM). This model is trained on a vast dataset comprising 336 million 40% nonredundant unlabeled antibody sequences, capturing both unique and conserved properties specific to antibodies. Notably, BALM showcases exceptional performance across four antigen-binding prediction tasks. Moreover, we introduce BALMFold, an end-to-end method derived from BALM, capable of swiftly predicting full atomic antibody structures from individual sequences. Remarkably, BALMFold outperforms well-established methods such as AlphaFold2, IgFold, ESMFold and OmegaFold in the antibody benchmark, demonstrating significant potential to advance innovative engineering and streamline therapeutic antibody development by reducing the need for unnecessary trials. The BALMFold structure prediction server is freely available at https://beamlab-sh.com/models/BALMFold.
Subjects
Antibodies, Antibodies/chemistry, Antibodies/immunology, Computational Biology/methods, Protein Conformation, Humans, Molecular Models, Deep Learning
ABSTRACT
Accurate prediction of antibody-antigen complex structures is pivotal in drug discovery, vaccine design and disease treatment and can facilitate the development of more effective therapies and diagnostics. In this work, we first review the antibody-antigen docking (ABAG-docking) datasets. Then, we present the creation and characterization of a comprehensive benchmark dataset of antibody-antigen complexes. We categorize the dataset based on docking difficulty, interface properties and structural characteristics, to provide a diverse set of cases for rigorous evaluation. Compared with Docking Benchmark 5.5, we have added 112 cases, including 14 single-domain antibody (sdAb) cases and 98 monoclonal antibody (mAb) cases, and also increased the proportion of Difficult cases. Our dataset contains diverse cases, including human/humanized antibodies, sdAbs, rodent antibodies and other types, opening the door to better algorithm development. Furthermore, we provide details on the process of building the benchmark dataset and introduce a pipeline for periodic updates to keep it up to date. We also utilize multiple complex prediction methods including ZDOCK, ClusPro, HDOCK and AlphaFold-Multimer for testing and analyzing this dataset. This benchmark serves as a valuable resource for evaluating and advancing docking computational methods in the analysis of antibody-antigen interaction, enabling researchers to develop more accurate and effective tools for predicting and designing antibody-antigen complexes. The non-redundant ABAG-docking structure benchmark dataset is available at https://github.com/Zhaonan99/Antibody-antigen-complex-structure-benchmark-dataset.
Subjects
Algorithms, Benchmarking, Humans, Monoclonal Antibodies, Humanized Monoclonal Antibodies, Antigen-Antibody Complex
ABSTRACT
Advancements in peptidomics have revealed numerous small open reading frames with coding potential and revealed that some of these micropeptides are closely related to human cancer. However, the systematic analysis and integration from sequence to structure and function remains largely undeveloped. Here, as a solution, we built a workflow for the collection and analysis of proteomic data, transcriptomic data, and clinical outcomes for cancer-associated micropeptides using publicly available datasets from large cohorts. We initially identified 19 586 novel micropeptides by reanalyzing proteomic profile data from 3753 samples across 8 cancer types. Further quantitative analysis of these micropeptides, along with associated clinical data, identified 3065 that were dysregulated in cancer, with 370 of them showing a strong association with prognosis. Moreover, we employed a deep learning framework to construct a micropeptide-protein interaction network for further bioinformatics analysis, revealing that micropeptides are involved in multiple biological processes as bioactive molecules. Taken together, our atlas provides a benchmark for high-throughput prediction and functional exploration of micropeptides, providing new insights into their biological mechanisms in cancer. The HMPA is freely available at http://hmpa.zju.edu.cn.
Subjects
Computational Biology, Neoplasms, Peptides, Proteomics, Humans, Proteomics/methods, Peptides/metabolism, Peptides/genetics, Peptides/chemistry, Neoplasms/metabolism, Neoplasms/genetics, Computational Biology/methods, Proteome/metabolism, Protein Interaction Maps, Deep Learning
ABSTRACT
The breakthrough in cryo-electron microscopy (cryo-EM) technology has led to an increasing number of density maps of biological macromolecules. However, constructing accurate protein complex atomic structures from cryo-EM maps remains a challenge. In this study, we extend our previously developed DEMO-EM to present DEMO-EM2, an automated method for constructing protein complex models from cryo-EM maps through an iterative assembly procedure intertwining chain- and domain-level matching and fitting for predicted chain models. The method was carefully evaluated on 27 cryo-electron tomography (cryo-ET) maps and 16 single-particle EM maps, where DEMO-EM2 models achieved an average TM-score of 0.92, outperforming those of state-of-the-art methods. The results demonstrate an efficient method that enables the rapid and reliable solution of challenging cryo-EM structure modeling problems.
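For readers unfamiliar with the TM-score quoted above, the sketch below evaluates the standard single-chain formula, TM = (1/L) Σ 1/(1 + (d_i/d_0)²) with d_0 = 1.24·(L − 15)^(1/3) − 1.8, for one fixed superposition. It is illustrative only: the full TM-score also maximizes over superpositions, which is omitted here, the floor applied to d_0 for short chains is an assumption of this sketch, and the abstract does not state how DEMO-EM2 computes the score for multi-chain complexes.

```python
def tm_score_from_distances(distances, target_length):
    """TM-score for one fixed superposition.

    distances: per-residue distances (in angstroms) between aligned
               residue pairs of the model and the reference.
    target_length: number of residues in the reference structure, used
                   for normalization (unaligned residues contribute 0).
    """
    if target_length <= 21:
        # Assumption of this sketch: floor d0 for short chains so the
        # cube-root expression never becomes tiny or negative.
        d0 = 0.5
    else:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length

# A perfect model of a 100-residue chain scores 1.0; covering only half
# of the residues perfectly halves the score.
perfect = tm_score_from_distances([0.0] * 100, 100)
half_covered = tm_score_from_distances([0.0] * 50, 100)
shifted = tm_score_from_distances([2.0] * 100, 100)
```

Because each residue's contribution is bounded and the scale d_0 grows with chain length, the score is less sensitive to a few badly placed residues than RMSD is, which is why averages such as 0.92 are read as near-native agreement.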
Subjects
Cryoelectron Microscopy, Cryoelectron Microscopy/methods, Molecular Models, Protein Conformation
ABSTRACT
Protein structure prediction is a longstanding issue crucial for identifying new drug targets and providing a mechanistic understanding of protein functions. To enhance the progress in this field, a spectrum of computational methodologies has been cultivated. AlphaFold2 has exhibited exceptional precision in predicting wild-type protein structures, with performance exceeding that of other methods. However, predicting the structures of missense mutant proteins using AlphaFold2 remains challenging due to the intricate and substantial structural alterations caused by minor sequence variations in the mutant proteins. Molecular dynamics (MD) has been validated for precisely capturing changes in amino acid interactions attributed to protein mutations. Therefore, for the first time, a strategy entitled 'MoDAFold' was proposed to improve the accuracy and reliability of missense mutant protein structure prediction by combining AlphaFold2 with MD. Multiple case studies have confirmed the superior performance of MoDAFold compared to other methods, particularly AlphaFold2.