ABSTRACT
Deep learning-based methods for generating functional proteins address the growing need for novel biocatalysts, allowing for precise tailoring of functionalities to meet specific requirements. This advancement leads to the development of highly efficient and specialized proteins with diverse applications across scientific, technological, and biomedical fields. This study establishes a pipeline for protein sequence generation with a conditional protein diffusion model, namely CPDiffusion, to create diverse sequences of proteins with enhanced functions. CPDiffusion accommodates protein-specific conditions, such as secondary structures and highly conserved amino acids. Without relying on extensive training data, CPDiffusion effectively captures highly conserved residues and sequence features for specific protein families. We applied CPDiffusion to generate artificial sequences of Argonaute (Ago) proteins based on the backbone structures of wild-type (WT) Kurthia massiliensis Ago (KmAgo) and Pyrococcus furiosus Ago (PfAgo), which are complex multi-domain programmable endonucleases. The generated sequences deviate by up to nearly 400 amino acids from their WT templates. Experimental tests demonstrated that the majority of the generated proteins for both KmAgo and PfAgo show unambiguous activity in DNA cleavage, with many of them exhibiting superior activity as compared to the WT. These findings underscore CPDiffusion's remarkable success rate in generating novel sequences for proteins with complex structures and functions in a single step, leading to enhanced activity. This approach facilitates the design of enzymes with multi-domain molecular structures and intricate functions through in silico generation and screening, all accomplished without the need for supervision from labeled data.
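One simple way to quantify how far a generated sequence deviates from its wild-type template is to count mismatched positions. The sketch below assumes the designed sequence has the same length as the template (plausible for generation on a fixed backbone, but an assumption here); the function names and the toy sequences are hypothetical and are not taken from CPDiffusion.

```python
def count_substitutions(wild_type: str, designed: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(wild_type) != len(designed):
        raise ValueError("sequences must have equal length for position-wise comparison")
    return sum(a != b for a, b in zip(wild_type, designed))


def sequence_identity(wild_type: str, designed: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if not wild_type:
        raise ValueError("sequences must be non-empty")
    return 1.0 - count_substitutions(wild_type, designed) / len(wild_type)


# Toy 10-residue fragments for illustration only (not real Ago sequences)
toy_wt = "MKTAYIAKQR"
toy_designed = "MRTAYLAKQK"
print(count_substitutions(toy_wt, toy_designed), sequence_identity(toy_wt, toy_designed))
```

Sequences of different lengths would first need an alignment, which this sketch deliberately leaves out.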
ABSTRACT
Increasing the binding affinity of an antibody to its target antigen is a crucial task in antibody therapeutics development. This paper presents a pretrainable geometric graph neural network, GearBind, and explores its potential in in silico affinity maturation. Leveraging multi-relational graph construction, multi-level geometric message passing, and contrastive pretraining on mass-scale, unlabeled protein structural data, GearBind outperforms previous state-of-the-art approaches on SKEMPI and an independent test set. A powerful ensemble model based on GearBind is then derived and used to successfully enhance the binding of two antibodies with distinct formats and target antigens. ELISA EC50 values of the designed antibody mutants are decreased by up to 17-fold, and KD values by up to 6.1-fold. These promising results underscore the utility of geometric deep learning and effective pretraining in macromolecule interaction modeling tasks.
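To place the reported fold changes on the free-energy scale used by binding benchmarks such as SKEMPI, the standard thermodynamic relation ΔΔG = RT ln(KD_mut / KD_wt) can be applied. The sketch below assumes a temperature of 298.15 K, which the abstract does not state, and uses a hypothetical function name; it illustrates the arithmetic only and is not part of GearBind.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_kd_fold(fold_decrease: float, temperature_k: float = 298.15) -> float:
    """Binding free-energy change (kcal/mol) when KD drops by `fold_decrease`.

    ddG = RT * ln(KD_mut / KD_wt) = -RT * ln(fold_decrease);
    a negative value means tighter binding.
    """
    if fold_decrease <= 0:
        raise ValueError("fold_decrease must be positive")
    return -R_KCAL * temperature_k * math.log(fold_decrease)


# A 6.1-fold decrease in KD at the assumed temperature
print(round(ddg_from_kd_fold(6.1), 2))  # -1.07
```

Under these assumptions, a 6.1-fold decrease in KD corresponds to roughly 1 kcal/mol of additional binding free energy.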
Subjects
Antibody Affinity, Computer Neural Networks, Humans, Antibodies/immunology, Antibodies/chemistry, Computer Simulation, Deep Learning, Antigens/immunology, Protein Binding, Enzyme-Linked Immunosorbent Assay, Molecular Models
ABSTRACT
Fine-tuning pretrained protein language models (PLMs) has emerged as a prominent strategy for enhancing downstream prediction tasks, often outperforming traditional supervised learning approaches. Parameter-efficient fine-tuning, a powerful and widely applied technique in natural language processing, could potentially enhance the performance of PLMs. However, its direct transfer to life science tasks is nontrivial due to the different training strategies and data forms. To address this gap, we introduce SES-Adapter, a simple, efficient, and scalable adapter method for enhancing the representation learning of PLMs. SES-Adapter incorporates PLM embeddings with structural sequence embeddings to create structure-aware representations. We show that the proposed method is compatible with different PLM architectures and diverse tasks. Extensive evaluations are conducted on 2 types of folding structures with notable quality differences, 9 state-of-the-art baselines, and 9 benchmark data sets across distinct downstream tasks. Results show that, compared to vanilla PLMs, SES-Adapter improves downstream task performance by a maximum of 11% and an average of 3%, and significantly accelerates convergence, by a maximum of 1034% and an average of 362%; training efficiency is also improved approximately 2-fold. Moreover, positive optimization is observed even with low-quality predicted structures. The source code for SES-Adapter is available at https://github.com/tyang816/SES-Adapter.
Subjects
Molecular Models, Proteins, Proteins/chemistry, Protein Conformation, Natural Language Processing
ABSTRACT
Prokaryotic Argonaute (pAgo) proteins, a class of DNA/RNA-guided programmable endonucleases, have been extensively utilized in nucleic acid-based biosensors. The specific binding and cleavage of nucleic acids by pAgo proteins, which are crucial processes for their applications, depend on the presence of Mn2+ bound in the pockets, as verified through X-ray crystallography. However, how dissociated Mn2+ in the solvent affects the catalytic cycle, and its underlying regulatory role in this structure-function relationship, remains undetermined. By combining experimental and computational methods, this study reveals that unbound Mn2+ in solution enhances the flexibility of diverse pAgo proteins. This Mn2+-induced increase in flexibility, which arises from a decrease in the number of hydrogen bonds, leads to higher affinity for substrates, thus facilitating cleavage. More importantly, Mn2+-induced structural flexibility increases the mismatch tolerance between guide-target pairs by increasing the number of conformational states, thereby enhancing the cleavage of mismatched targets. Further simulations indicate that the enhanced flexibility in linkers triggers conformational changes in the PAZ domain for recognizing nucleic acids of various lengths. Additionally, Mn2+-induced dynamic alterations of the protein cause a conformational shift in the N domain and catalytic sites towards their functional form, resulting in a decreased energy penalty for target release and cleavage. These findings demonstrate that the dynamic conformations of pAgo proteins, resulting from the presence of unbound Mn2+ in solution, significantly promote the catalytic cycle of the endonucleases and the tolerance of cleavage to mismatches. This flexibility enhancement mechanism serves as a general strategy employed by Ago proteins from diverse prokaryotes to accomplish their catalytic functions and provides useful information for Ago-based precise molecular diagnostics.
ABSTRACT
Phosphorylation of proteins plays an important regulatory role at almost all levels of cellular organization. Molecular dynamics (MD) simulation is a promising tool for revealing, at the atomistic level, how phosphorylation regulates many key biological processes. MD simulation accuracy depends on force field precision, but current force fields for phospho-amino acids show notable inconsistencies with experimental data. Here, a new force field parameter set (named FB18CMAP) is generated by fitting against quantum mechanics (QM) energies in aqueous solution, with the φ/ψ dihedral potential-energy surfaces optimized using CMAP parameters. MD simulations of phosphorylated dipeptides, intrinsically disordered proteins (IDPs), and ordered (folded) proteins show that FB18CMAP reproduces NMR observables and structural characteristics of phosphorylated dipeptides and proteins more accurately than the FB18 force field. These findings suggest that FB18CMAP performs well in simulating both ordered and disordered states of phosphorylated proteins.
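A CMAP term adds a correction energy that is tabulated on a periodic grid of backbone φ/ψ angles and looked up by interpolation. The sketch below illustrates that lookup with bilinear interpolation on an assumed 24 × 24 grid at 15° spacing. MD packages typically use a smoother bicubic scheme, and the grid values here are a made-up placeholder surface, not FB18CMAP parameters; all names are hypothetical.

```python
import math

GRID_STEP = 15.0  # degrees between grid points (assumed)
N = 24            # grid points per angle, covering -180..165 degrees


def cmap_energy(grid, phi, psi):
    """Bilinearly interpolate a periodic phi/psi correction grid.

    grid[i][j] holds the correction at phi = -180 + 15*i, psi = -180 + 15*j.
    """
    x = ((phi + 180.0) % 360.0) / GRID_STEP
    y = ((psi + 180.0) % 360.0) / GRID_STEP
    fx, fy = x - math.floor(x), y - math.floor(y)
    # Wrap indices so the surface is periodic in both angles
    i0, j0 = int(math.floor(x)) % N, int(math.floor(y)) % N
    i1, j1 = (i0 + 1) % N, (j0 + 1) % N
    return (
        grid[i0][j0] * (1 - fx) * (1 - fy)
        + grid[i1][j0] * fx * (1 - fy)
        + grid[i0][j1] * (1 - fx) * fy
        + grid[i1][j1] * fx * fy
    )


# Placeholder surface for illustration only
angles = [-180.0 + GRID_STEP * k for k in range(N)]
toy_grid = [
    [math.cos(math.radians(p)) + math.sin(math.radians(s)) for s in angles]
    for p in angles
]
print(cmap_energy(toy_grid, -52.5, -45.0))
```

At a grid point the function returns the tabulated value unchanged, and between grid points it blends the four surrounding entries, which is the essential idea behind a grid-based dihedral correction.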
Subjects
Intrinsically Disordered Proteins, Phosphoproteins, Protein Conformation, Phosphorylation, Molecular Dynamics Simulation, Intrinsically Disordered Proteins/chemistry, Dipeptides/chemistry
ABSTRACT
Ancestral metabolism has remained controversial due to a lack of evidence beyond sequence-based reconstructions. Although prebiotic chemists have provided hints that metabolism might originate from non-enzymatic protometabolic pathways, gaps between ancestral reconstruction and prebiotic processes mean that much is still unknown. Here, we apply proteome-wide 3D structure predictions and comparisons to investigate the ancestral metabolism of ancient bacteria and archaea, providing information beyond sequence as a bridge to the prebiotic processes. We compare representative bacterial and archaeal strains, which reveal surprisingly similar physiological and metabolic characteristics in microbiological and biophysical experiments. Pairwise comparison of protein structures identifies the conserved metabolic modules in bacteria and archaea, despite interference from overly variable sequences. The conserved modules (for example, middle of glycolysis, partial TCA, proton/sulfur respiration, building block biosynthesis) constitute the basic functions that possibly existed in the archaeal-bacterial common ancestor, and they are remarkably consistent with the experimentally confirmed protometabolic pathways. These structure-based findings provide a new perspective on reconstructing the ancestral metabolism and understanding its origin, and suggest that high-throughput protein 3D structure prediction is a promising approach that deserves broader application in future ancestral exploration.
Subjects
Archaea, Proteome, Archaea/genetics, Archaea/metabolism, Proteome/metabolism, Phylogeny, Molecular Evolution, Bacteria/genetics, Bacteria/metabolism
ABSTRACT
Phosphorylation plays a key role in plant biology, such as the accumulation of plant cells to form the observed proteome. Statistical analysis found that many phosphorylation sites are located in disordered regions. However, current force fields are mainly trained for structural proteins and might not be able to accurately capture the dynamic conformations of phosphorylated proteins. Therefore, we evaluated the performance of ff03CMAP, a force field balanced between structural and disordered proteins, for the sampling of phosphorylated proteins. The test results for 11 different phosphorylated systems, including dipeptides, disordered proteins, folded proteins, and their complexes, indicate that the ff03CMAP force field samples the conformations of phosphorylation sites in disordered proteins and disordered regions better than ff03. For the solvent model, the results strongly suggest that the ff03CMAP force field with the TIP4PD water model is the best combination for conformer sampling. Additional tests of the CHARMM36m and FB18 force fields on two phosphorylated systems suggest that the overall performance of ff03CMAP is similar to that of FB18 and better than that of CHARMM36m. These results can help other researchers choose a suitable force field and solvent model to investigate the dynamic properties of phosphorylated proteins.
Subjects
Intrinsically Disordered Proteins, Dipeptides, Intrinsically Disordered Proteins/chemistry, Molecular Dynamics Simulation, Phosphorylation, Protein Conformation, Proteome, Solvents/chemistry, Water
ABSTRACT
BACKGROUND: Prokaryotic Argonaute (pAgo) proteins are well-known oligonucleotide-directed endonucleases, which contain a conserved PIWI domain required for endonuclease activity. The PIWI-RE family, which is distantly related to pAgos and is defined as PIWI with conserved R and E residues, has been suggested to exhibit divergent activities. The distinctive biochemical properties and physiological functions of PIWI-RE family members need to be elucidated to explore their applications in gene editing. RESULTS: Here, we describe the catalytic performance and cellular functions of a PIWI-RE family protein from Pseudomonas stutzeri (PsPIWI-RE). Structural modelling suggests that the protein possesses a PIWI structure similar to that of pAgo, but with different PAZ-like and N-terminal domains. Unlike previously reported pAgos, recombinant PsPIWI-RE acts as an RNA-guided DNA nuclease, as well as a DNA-guided RNA nuclease. It cleaves single-stranded DNA at temperatures ranging from 20 to 65 °C, with an optimum temperature of 45 °C. Mutation at D525 or D610 significantly reduced its endonuclease activity, confirming that both residues are key for catalysis. Compared with the wild type, a PIWI-RE knockout mutant is more sensitive to ciprofloxacin, a DNA replication inhibitor, suggesting that PIWI-RE may be involved in DNA replication. CONCLUSION: Our study provides the first insights into the programmable nuclease activity and biological function of the unknown PIWI-RE family of proteins, emphasizing their important role in vivo and potential application in genomic DNA modification.
ABSTRACT
Hepatitis C virus (HCV) is a notorious member of the Flaviviridae family of enveloped, positive-strand RNA viruses. Non-structural protein 5A (NS5A) plays a key role in HCV replication and assembly. NS5A is a multi-domain protein that includes an N-terminal amphipathic membrane-anchoring alpha helix, a highly structured domain 1, and two intrinsically disordered domains (2 and 3). The highly structured domain 1 contains a zinc finger (Zf) site; binding of zinc stabilizes the overall structure, while ejection of this zinc from the Zf-site destabilizes it. Therefore, NS5A is an attractive target for anti-HCV therapy by disulfiram, through ejection of zinc from the Zf-site. However, the zinc ejection mechanism is poorly understood. To elucidate this mechanism, we performed molecular dynamics (MD) simulations in tandem with density functional theory (DFT) calculations on three different states: A-state (NS5A protein), B-state (NS5A + Zn), and C-state (NS5A + Zn + disulfiram). The MD results indicate that disulfiram triggers Zn ejection from the Zf-site predominantly by altering the overall conformational ensemble. The DFT assessment, in turn, demonstrates that Zn adopts a tetrahedral configuration at the Zf-site with four Cys residues, which indicates a stable protein structure. Disulfiram binding induces major conformational changes at the Zf-site, introduces new interactions of Cys39 with disulfiram, and further weakens the interaction of this residue with Zn, causing ejection of zinc from the Zf-site. The proposed mechanism elucidates the therapeutic potential of disulfiram and offers theoretical guidance for the advancement of drug candidates.