Results 1 - 20 of 41,513
1.
Proc Natl Acad Sci U S A ; 121(28): e2400151121, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38954548

ABSTRACT

Protein folding and evolution are intimately linked phenomena. Here, we revisit the concept of exons as potential protein folding modules across a set of 38 abundant and conserved protein families. Taking advantage of genomic exon-intron organization and extensive protein sequence data, we explore exon boundary conservation and assess the foldon-like behavior of exons using energy landscape theoretic measurements. We found deviations in the exon size distribution from exponential decay indicating selection in evolution. We show that when taken together there is a pronounced tendency to independent foldability for segments corresponding to the more conserved exons, supporting the idea of exon-foldon correspondence. While 45% of the families follow this general trend when analyzed individually, there are some families for which other stronger functional determinants, such as preserving frustrated active sites, may be acting. We further develop a systematic partitioning of protein domains using exon boundary hotspots, showing that minimal common exons correspond with uninterrupted alpha and/or beta elements for the majority of the families but not for all of them.


Subject(s)
Exons , Protein Folding , Exons/genetics , Humans , Proteins/genetics , Proteins/chemistry , Evolution, Molecular , Introns/genetics
2.
Zhonghua Yi Xue Yi Chuan Xue Za Zhi ; 41(7): 872-880, 2024 Jul 10.
Article in Chinese | MEDLINE | ID: mdl-38946376

ABSTRACT

As research has advanced, some non-coding RNAs have been found to go beyond their traditional definition and directly encode functional proteins through coding sequence elements and ribosome binding. Among the non-coding RNAs, the function of circRNA-encoded proteins has been most extensively studied. This study used "circRNA", "encoded", and "translation" as keywords to search the PubMed and Web of Science databases. The retrieved literature was screened and traced, and the translation mechanisms of circRNA, related research methods, and associations with disease were reviewed. CircRNA can be translated into proteins through a cap-independent pathway. Multiple molecular techniques, in particular mass spectrometry analysis, are valuable for identifying unique peptide segments of circRNA-encoded proteins and thereby confirming their existence. The proteins encoded by circRNA are involved in the pathogenesis of diseases of the digestive, neurological, and urinary systems and the breast, and have the potential to serve as novel targets for disease diagnosis and treatment. This article provides a comprehensive review of the basic theory, experimental methods, and disease-related research in the field of circRNA translation, which may provide clues for the identification of new diagnostic and therapeutic targets.


Subject(s)
RNA, Circular , RNA, Circular/genetics , Humans , RNA/genetics , Proteins/genetics , Animals , Protein Biosynthesis , Disease/genetics
3.
Nat Commun ; 15(1): 5566, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956442

ABSTRACT

Accurately modeling the protein fitness landscapes holds great importance for protein engineering. Pre-trained protein language models have achieved state-of-the-art performance in predicting protein fitness without wet-lab experimental data, but their accuracy and interpretability remain limited. On the other hand, traditional supervised deep learning models require abundant labeled training examples for performance improvements, posing a practical barrier. In this work, we introduce FSFP, a training strategy that can effectively optimize protein language models under extreme data scarcity for fitness prediction. By combining meta-transfer learning, learning to rank, and parameter-efficient fine-tuning, FSFP can significantly boost the performance of various protein language models using merely tens of labeled single-site mutants from the target protein. In silico benchmarks across 87 deep mutational scanning datasets demonstrate FSFP's superiority over both unsupervised and supervised baselines. Furthermore, we successfully apply FSFP to engineer the Phi29 DNA polymerase through wet-lab experiments, achieving a 25% increase in the positive rate. These results underscore the potential of our approach in aiding AI-guided protein engineering.


Subject(s)
Protein Engineering , Protein Engineering/methods , Deep Learning , Proteins/genetics , Proteins/metabolism , Mutation , DNA-Directed DNA Polymerase/metabolism , Computer Simulation , Models, Molecular , Algorithms
4.
Mol Genet Genomic Med ; 12(6): e2475, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38938072

ABSTRACT

BACKGROUND: Spastic paraplegia 11 (SPG11) is the most prevalent form of autosomal recessive hereditary spastic paraplegia, resulting from biallelic pathogenic variants in the SPG11 gene (MIM *610844). METHODS: The proband is a 36-year-old female referred for genetic evaluation due to cognitive dysfunction, gait impairment, and corpus callosum atrophy (brain MRI was normal at 25-years-old). Diagnostic approaches included CGH array, next-generation sequencing, and whole transcriptome sequencing. RESULTS: CGH array revealed a 180 kb deletion located upstream of SPG11. Sequencing of SPG11 uncovered two rare single nucleotide variants: the novel variant c.3143C>T in exon 17 (in cis with the deletion), and the previously reported pathogenic variant c.6409C>T in exon 34 (in trans). Whole transcriptome sequencing revealed that the variant c.3143C>T caused exon 17 skipping. CONCLUSION: We report a novel sequence variant in the SPG11 gene resulting in exon 17 skipping, which, along with a nonsense variant, causes Spastic Paraplegia 11 in our proband. In addition, a deletion upstream of SPG11 was identified in the patient, whose implication in the phenotype remains uncertain. Nonetheless, the deletion apparently affects cis-regulatory elements of the gene, suggesting a potential new pathogenic mechanism underlying the disease in a subset of undiagnosed patients. Our findings further support the hypothesis that the origin of thin corpus callosum in patients with SPG11 is of progressive nature.


Subject(s)
Spastic Paraplegia, Hereditary , Humans , Female , Adult , Spastic Paraplegia, Hereditary/genetics , Spastic Paraplegia, Hereditary/diagnosis , Spastic Paraplegia, Hereditary/pathology , Exons , Proteins/genetics , Codon, Nonsense , Corpus Callosum/pathology , Corpus Callosum/diagnostic imaging , Sequence Deletion , Phenotype
5.
Stat Appl Genet Mol Biol ; 23(1), 2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38943434

ABSTRACT

Understanding a protein's function based solely on its amino acid sequence is a crucial but intricate task in bioinformatics. Traditionally, this challenge has proven difficult. However, recent years have witnessed the rise of deep learning as a powerful tool, achieving significant success in protein function prediction. Their strength lies in their ability to automatically learn informative features from protein sequences, which can then be used to predict the protein's function. This study builds upon these advancements by proposing a novel model: CNN-CBAM+BiGRU. It incorporates a Convolutional Block Attention Module (CBAM) alongside BiGRUs. CBAM acts as a spotlight, guiding the CNN to focus on the most informative parts of the protein data, leading to more accurate feature extraction. BiGRUs, a type of Recurrent Neural Network (RNN), excel at capturing long-range dependencies within the protein sequence, which are essential for accurate function prediction. The proposed model integrates the strengths of both CNN-CBAM and BiGRU. This study's findings, validated through experimentation, showcase the effectiveness of this combined approach. For the human dataset, the suggested method outperforms the CNN-BIGRU+ATT model by +1.0 % for cellular components, +1.1 % for molecular functions, and +0.5 % for biological processes. For the yeast dataset, the suggested method outperforms the CNN-BIGRU+ATT model by +2.4 % for the cellular component, +1.2 % for molecular functions, and +0.6 % for biological processes.


Subject(s)
Computational Biology , Neural Networks, Computer , Proteins , Computational Biology/methods , Humans , Proteins/genetics , Proteins/metabolism , Deep Learning , Databases, Protein , Algorithms , Amino Acid Sequence
6.
Proc Natl Acad Sci U S A ; 121(26): e2312335121, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38889151

ABSTRACT

Predicting the effects of one or more mutations on the in vivo or in vitro properties of a wild-type protein is a major computational challenge, due to the presence of epistasis, that is, of interactions between amino acids in the sequence. We introduce a computationally efficient procedure to build minimal epistatic models to predict mutational effects by combining evolutionary (homologous sequence) data with limited mutational-scan data. Mutagenesis measurements guide the selection of links in a sparse graphical model, while the parameters on the nodes and the edges are inferred from sequence data. We show, on 10 mutational scans, that our pipeline performs comparably to state-of-the-art deep networks trained on much more data, while requiring far fewer parameters and hence being more interpretable. In particular, the identified interactions adapt to the wild-type protein and to the fitness or biochemical property experimentally measured, mostly focus on key functional sites, and are not necessarily related to structural contacts. Therefore, our method is able to extract information relevant for one mutational experiment from homologous sequence data reflecting the multitude of structural and functional constraints acting on proteins throughout evolution.


Subject(s)
Mutation , Proteins , Proteins/genetics , Proteins/metabolism , Proteins/chemistry , Epistasis, Genetic , Evolution, Molecular , Computational Biology/methods
7.
BMC Genomics ; 25(1): 630, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38914936

ABSTRACT

Deep Mutational Scanning (DMS) assays are powerful tools to study sequence-function relationships by measuring the effects of thousands of sequence variants on protein function. During a DMS experiment, several technical artefacts might non-linearly distort the resulting functional score, potentially biasing the interpretation of the results. We therefore tested several technical parameters in the deepPCA workflow, a DMS assay for protein-protein interactions, in order to identify technical sources of non-linearities. We found that parameters common to many DMS assays, such as the amount of transformed DNA, the timepoint of harvest and library composition, can cause non-linearities in the data. Designing experiments to minimize these non-linear effects will improve the quantification and interpretation of mutation effects.


Subject(s)
Mutation , Workflow , Proteins/metabolism , Proteins/genetics , High-Throughput Nucleotide Sequencing , Protein Interaction Mapping/methods , DNA Mutational Analysis/methods , Protein Binding
8.
Open Biol ; 14(6): 230439, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38862022

ABSTRACT

Volatile low complexity regions (LCRs) are a novel source of adaptive variation, functional diversification and evolutionary novelty. An interplay of selection and mutation governs the composition and length of low complexity regions. High %GC and mutations provide length variability because of mechanisms like replication slippage. Owing to the complex dynamics between selection and mutation, we need a better understanding of their coexistence. Our findings underscore that positively selected sites (PSS) and low complexity regions prefer the terminal regions of genes, co-occurring in most Tetrapoda clades. We observed that positively selected sites within a gene have position-specific roles. Central-positively selected site genes primarily participate in defence responses, whereas terminal-positively selected site genes exhibit non-specific functions. Low complexity region-containing genes in the Tetrapoda clade exhibit a significantly higher %GC and lower ω (dN/dS: non-synonymous substitution rate/synonymous substitution rate) compared with genes without low complexity regions. This lower ω implies that despite providing rapid functional diversity, low complexity region-containing genes are subjected to intense purifying selection. Furthermore, we observe that low complexity regions consistently display ubiquitous prevalence at lower purity levels, but exhibit a preference for specific positions within a gene as the purity of the low complexity region stretch increases, implying a composition-dependent evolutionary role. Our findings collectively contribute to the understanding of how genetic diversity and adaptation are shaped by the interplay of selection and low complexity regions in the Tetrapoda clade.
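The ω ratio quoted in this abstract is the non-synonymous substitution rate divided by the synonymous rate. A minimal sketch of that arithmetic follows; the function names, the example rates, and the tolerance around ω = 1 are illustrative assumptions, not values or code from the study.

```python
def omega(dn: float, ds: float) -> float:
    """Return omega = dN / dS for one gene or alignment."""
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio")
    return dn / ds


def selection_regime(w: float, tol: float = 0.05) -> str:
    """Conventional reading of omega; `tol` is an arbitrary illustrative band."""
    if w < 1 - tol:
        return "purifying"
    if w > 1 + tol:
        return "positive"
    return "nearly neutral"


# Hypothetical rates, chosen only to show the calculation.
w = omega(dn=0.02, ds=0.10)
print(w, selection_regime(w))
```

Under this conventional reading, a lower ω in genes containing low complexity regions corresponds to stronger purifying selection, which is the interpretation the abstract gives.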


Subject(s)
Evolution, Molecular , Selection, Genetic , Animals , Mutation , Phylogeny , Proteins/genetics , Proteins/chemistry , Base Composition
9.
Protein Sci ; 33(7): e4998, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38888487

ABSTRACT

Knotted proteins, although scarce, are crucial structural components of certain protein families, and their roles continue to be a topic of intense research. Capitalizing on the vast collection of protein structure predictions offered by AlphaFold (AF), this study computationally examines the entire UniProt database to create a robust dataset of knotted and unknotted proteins. Utilizing this dataset, we develop a machine learning (ML) model capable of accurately predicting the presence of knots in protein structures solely from their amino acid sequences. We tested the model's capabilities on 100 proteins whose structures had not yet been predicted by AF and found agreement with our local prediction in 92% of cases. From the point of view of structural biology, we found that all potentially knotted proteins predicted by AF can be classified into only 17 families. This allows us to discover the presence of unknotted proteins in families with a highly conserved knot. We found only three new protein families, UCH, DUF4253, and DUF2254, that contain both knotted and unknotted proteins, and demonstrate that deletions within the knot core could potentially account for the observed unknotted (trivial) topology. Finally, we have shown that in the majority of knotted families (11 out of 15), the knotted topology is strictly conserved in functional proteins with very low sequence similarity. We have conclusively demonstrated that proteins AF predicts as unknotted are structurally accurate in their unknotted configurations. However, these proteins often represent nonfunctional fragments, lacking significant portions of the knot core (amino acid sequence).


Subject(s)
Databases, Protein , Machine Learning , Models, Molecular , Proteins , Proteins/chemistry , Proteins/genetics , Protein Conformation , Amino Acid Sequence
10.
Oncol Res ; 32(6): 1119-1128, 2024.
Article in English | MEDLINE | ID: mdl-38827327

ABSTRACT

It has been shown that the high expression of human epididymis protein 4 (HE4) in most lung cancers is related to the poor prognosis of patients, but the mechanism of pathological transformation of HE4 in lung cancer is still unclear. The current study aims to clarify the function and mechanism of HE4 in the occurrence and metastasis of lung adenocarcinoma (LUAD). HE4 expression was evaluated by immunoblotting in lung cancer cell lines and biopsies, and through analysis of The Cancer Genome Atlas (TCGA) dataset. Frequent HE4 overexpression was demonstrated in LUAD, but not in lung squamous cell carcinoma (LUSC), indicating that HE4 can serve as a biomarker to distinguish between LUAD and LUSC. HE4 knockdown significantly inhibited cell growth, colony formation, wound healing, and invasion, and blocked the G1-phase of the cell cycle in LUAD cell lines through inactivation of downstream EGFR signaling, including the PI3K/AKT/mTOR and RAF/MAPK pathways. The first-line EGFR inhibitor gefitinib and HE4 shRNA had no synergistic inhibitory effect on the growth of lung adenocarcinoma cells, while the third-line EGFR inhibitor osimertinib showed additive anti-proliferative effects. Moreover, we provided evidence that HE4 regulated EGFR expression through transcriptional regulation and protein interaction in LUAD. Our findings suggest that HE4 positively modulates the EGFR signaling pathway to promote growth and invasiveness in LUAD and highlight that targeting HE4 could be a novel strategy for LUAD treatment.


Subject(s)
Adenocarcinoma of Lung , Cell Proliferation , ErbB Receptors , Lung Neoplasms , Neoplasm Invasiveness , Signal Transduction , WAP Four-Disulfide Core Domain Protein 2 , Humans , ErbB Receptors/metabolism , ErbB Receptors/genetics , Adenocarcinoma of Lung/pathology , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/metabolism , WAP Four-Disulfide Core Domain Protein 2/metabolism , Lung Neoplasms/pathology , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Cell Line, Tumor , Gene Knockdown Techniques , Animals , Mice , Gene Expression Regulation, Neoplastic , Cell Movement/genetics , Proteins/metabolism , Proteins/genetics
11.
Commun Biol ; 7(1): 679, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38830995

ABSTRACT

Proteins and nucleic acids are essential components of living organisms that interact in critical cellular processes. Accurate prediction of nucleic acid-binding residues in proteins can contribute to a better understanding of protein function. However, the discrepancy between protein sequence information and obtained structural and functional data renders most current computational models ineffective. Therefore, it is vital to design computational models based on protein sequence information to identify nucleic acid binding sites in proteins. Here, we implement SOFB, an ensemble deep learning-based method for identifying nucleic acid-binding residues in proteins, which characterizes protein sequences by learning the semantics of biological dynamics contexts. We then develop an ensemble deep learning-based sequence network to learn feature representation and classification by explicitly modeling dynamic semantic information. Within this framework, the language learning model, which is constructed from natural language to biological language, captures the underlying relationships of protein sequences, and the ensemble deep learning-based sequence network, consisting of different convolutional layers together with Bi-LSTM, refines various features for optimal performance. Meanwhile, to address the data imbalance issue, we adopt ensemble learning to train multiple models and then combine them. Our experimental results on several DNA/RNA nucleic acid-binding residue datasets demonstrate that our proposed model outperforms other state-of-the-art methods. In addition, we conduct an interpretability analysis of the identified nucleic acid binding residue sequences based on the attention weights of the language learning model, revealing novel insights into the dynamic semantic information that supports the identified nucleic acid binding residues.
SOFB is available at https://github.com/Encryptional/SOFB and https://figshare.com/articles/online_resource/SOFB_figshare_rar/25499452 .


Subject(s)
Deep Learning , Binding Sites , Nucleic Acids/metabolism , Nucleic Acids/chemistry , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Protein Binding , Computational Biology/methods
12.
Article in English | MEDLINE | ID: mdl-38894604

ABSTRACT

The release of AlphaFold2 has sparked a rapid expansion in protein model databases. Efficient protein structure retrieval is crucial for the analysis of structure models, while measuring the similarity between structures is the key challenge in structural retrieval. Although existing structure alignment algorithms can address this challenge, they are often time-consuming. Currently, the state-of-the-art approach involves converting protein structures into three-dimensional (3D) Zernike descriptors and assessing similarity using Euclidean distance. However, the methods for computing 3D Zernike descriptors mainly rely on structural surfaces and are predominantly web-based, thus limiting their application in studying custom datasets. To overcome this limitation, we developed FP-Zernike, a user-friendly toolkit for computing different types of Zernike descriptors based on feature points. Users simply need to enter a single line of command to calculate the Zernike descriptors of all structures in customized datasets. FP-Zernike outperforms the leading method in terms of retrieval accuracy and binary classification accuracy across diverse benchmark datasets. In addition, we showed the application of FP-Zernike in the construction of the descriptor database and the protocol used for the Protein Data Bank (PDB) dataset to facilitate the local deployment of this tool for interested readers. Our demonstration contained 590,685 structures, and at this scale, our system required only 4-9 s to complete a retrieval. The experiments confirmed that it achieved the state-of-the-art accuracy level. FP-Zernike is an open-source toolkit, with the source code and related data accessible at https://ngdc.cncb.ac.cn/biocode/tools/BT007365/releases/0.1, as well as through a webserver at http://www.structbioinfo.cn/.
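The retrieval step described in this abstract, ranking structures by the Euclidean distance between their descriptor vectors, can be sketched as follows. This is not FP-Zernike's code or interface: the function names, the three-component toy vectors, and the structure labels are all hypothetical, and real 3D Zernike descriptors are considerably longer.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, database, top_k=3):
    """Return the top_k (name, distance) pairs closest to the query."""
    scored = [(name, euclidean(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:top_k]


# Hypothetical toy descriptors; values are illustrative only.
database = {
    "struct_A": [0.10, 0.80, 0.30],
    "struct_B": [0.90, 0.10, 0.50],
    "struct_C": [0.12, 0.79, 0.33],
}
query = [0.11, 0.80, 0.31]

for name, dist in retrieve(query, database, top_k=2):
    print(f"{name}\t{dist:.4f}")
```

A linear scan like this is only meant to show the distance-based ranking; the abstract does not say how the reported retrieval over 590,685 structures is indexed.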


Subject(s)
Databases, Protein , Software , Algorithms , Protein Conformation , Proteins/chemistry , Proteins/genetics , Computational Biology/methods
13.
Protein Sci ; 33(7): e5086, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38923241

ABSTRACT

Variation in mutation rates at sites in proteins can largely be understood by the constraint that proteins must fold into stable structures. Models that calculate site-specific rates based on protein structure and a thermodynamic stability model have shown a significant but modest ability to predict empirical site-specific rates calculated from sequence. Models that use detailed atomistic models of protein energetics do not outperform simpler approaches using packing density. We demonstrate that a fundamental reason for this is that empirical site-specific rates are the result of the average effect of many different microenvironments in a phylogeny. By analyzing the results of evolutionary dynamics simulations, we show how averaging site-specific rates across many extant protein structures can lead to correct recovery of site-rate prediction. This result is also demonstrated in natural protein sequences and experimental structures. Using predicted structures, we demonstrate that atomistic models can improve upon contact density metrics in predicting site-specific rates from a structure. The results give fundamental insights into the factors governing the distribution of site-specific rates in protein families.


Subject(s)
Proteins , Proteins/chemistry , Proteins/genetics , Protein Conformation , Thermodynamics , Evolution, Molecular , Mutation , Models, Molecular , Molecular Dynamics Simulation
14.
PLoS Comput Biol ; 20(6): e1012123, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38935611

ABSTRACT

AlphaFold2 is an Artificial Intelligence-based program developed to predict the 3D structure of proteins given only their amino acid sequence at atomic resolution. Due to the accuracy and efficiency at which AlphaFold2 can generate 3D structure predictions and its widespread adoption into various aspects of biochemical research, the technique of protein structure prediction should be considered for incorporation into the undergraduate biochemistry curriculum. A module for introducing AlphaFold2 into a senior-level biochemistry laboratory classroom was developed. The module's focus was to have students predict the structures of proteins from the MPOX 22 global outbreak virus isolate genome, which had no structures elucidated at that time. The goal of this study was to both determine the impact the module had on students and to develop a framework for introducing AlphaFold2 into the undergraduate curriculum so that instructors for biochemistry courses, regardless of their background in bioinformatics, could adapt the module into their classrooms.


Subject(s)
Artificial Intelligence , Biochemistry , Curriculum , Humans , Biochemistry/education , Computational Biology/education , Computational Biology/methods , Protein Conformation , Students , Software , Universities , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Amino Acid Sequence
15.
Sci Signal ; 17(842): eadp5354, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38917220

ABSTRACT

WWC1 is a scaffolding protein in the evolutionarily conserved Hippo signaling network and is genetically linked to human memory and synaptic plasticity. In the archives of Science Signaling, Stepan et al. demonstrate the translational potential of modulating WWC1 through pharmacological inhibition of Hippo-pathway kinases to enhance cognition.


Subject(s)
Memory , Neuronal Plasticity , Signal Transduction , Humans , Neuronal Plasticity/physiology , Memory/physiology , Intracellular Signaling Peptides and Proteins/metabolism , Intracellular Signaling Peptides and Proteins/genetics , Animals , Protein Serine-Threonine Kinases/metabolism , Protein Serine-Threonine Kinases/genetics , Proteins/metabolism , Proteins/genetics
16.
Genome Biol Evol ; 16(5), 2024 May 02.
Article in English | MEDLINE | ID: mdl-38735759

ABSTRACT

A fundamental goal in evolutionary biology and population genetics is to understand how selection shapes the fate of new mutations. Here, we test the null hypothesis that insertion-deletion (indel) events in protein-coding regions occur randomly with respect to secondary structures. We identified indels across 11,444 sequence alignments in mouse, rat, human, chimp, and dog genomes and then quantified their overlap with four different types of secondary structure (alpha helices, beta strands, protein bends, and protein turns) predicted by deep-learning methods of AlphaFold2. Indels overlapped secondary structures 54% as much as expected and were especially underrepresented over beta strands, which tend to form internal, stable regions of proteins. In contrast, indels were enriched by 155% over regions without any predicted secondary structures. These skews were stronger in the rodent lineages compared to the primate lineages, consistent with population genetic theory predicting that natural selection will be more efficient in species with larger effective population sizes. Nonsynonymous substitutions were also less common in regions of protein secondary structure, although not as strongly reduced as in indels. In a complementary analysis of thousands of human genomes, we showed that indels overlapping secondary structure segregated at significantly lower frequency than indels outside of secondary structure. Taken together, our study shows that indels are selected against if they overlap secondary structure, presumably because they disrupt the tertiary structure and function of a protein.


Subject(s)
INDEL Mutation , Protein Structure, Secondary , Humans , Animals , Mice , Rats , Evolution, Molecular , Proteins/genetics , Proteins/chemistry , Dogs , Selection, Genetic , Genome
17.
Nucleic Acids Res ; 52(W1): W182-W186, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38747341

ABSTRACT

AlphaFind is a web-based search engine that provides fast structure-based retrieval in the entire set of AlphaFold DB structures. Unlike other protein processing tools, AlphaFind is focused entirely on tertiary structure, automatically extracting the main 3D features of each protein chain and using a machine learning model to find the most similar structures. This indexing approach and the 3D feature extraction method used by AlphaFind have both demonstrated remarkable scalability to large datasets as well as to large protein structures. The web application itself has been designed with a focus on clarity and ease of use. The searcher accepts any valid UniProt ID, Protein Data Bank ID or gene symbol as input, and returns a set of similar protein chains from AlphaFold DB, including various similarity metrics between the query and each of the retrieved results. In addition to the main search functionality, the application provides 3D visualizations of protein structure superpositions in order to allow researchers to instantly analyze the structural similarity of the retrieved results. The AlphaFind web application is available online for free and without any registration at https://alphafind.fi.muni.cz.


Subject(s)
Databases, Protein , Proteome , Software , Proteome/chemistry , Proteome/genetics , Internet , Search Engine , Machine Learning , Protein Conformation , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Protein Folding , Models, Molecular , Structural Homology, Protein
18.
Nucleic Acids Res ; 52(W1): W287-W293, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38747351

ABSTRACT

The PSIPRED Workbench is a long-established and popular bioinformatics web service offering a wide range of machine learning-based analyses for characterizing protein structure and function. In this paper we provide an update on the recent additions and developments to the webserver, with a focus on new deep learning-based methods. We briefly discuss some trends in server usage since the publication of AlphaFold2 and we give an overview of some upcoming developments for the service. The PSIPRED Workbench is available at http://bioinf.cs.ucl.ac.uk/psipred.


Subject(s)
Deep Learning , Proteins , Software , Proteins/chemistry , Proteins/genetics , Internet , Protein Conformation , Computational Biology/methods , Sequence Analysis, Protein/methods
19.
Nucleic Acids Res ; 52(W1): W140-W147, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38769064

ABSTRACT

Genomic variation can impact normal biological function in complex ways, so understanding variant effects requires a broad range of data to be coherently assimilated. Whilst the volume of human variant data and relevant annotations has increased, the corresponding increase in the breadth of participating fields, standards and versioning means that moving between genomic, coding, protein and structure positions is increasingly complex. In turn, this makes investigating variants in diverse formats and assimilating annotations from different resources challenging. ProtVar addresses these issues to facilitate the contextualization and interpretation of human missense variation with unparalleled flexibility and ease of accessibility for use by the broadest range of researchers. By precalculating all possible variants in the human proteome, it offers near-instantaneous mapping between all relevant data types. It also combines data and analyses from a plethora of resources to bring together genomic, protein sequence and function annotations as well as structural insights and predictions to better understand the likely effect of missense variation in humans. It is offered as an intuitive web server at https://www.ebi.ac.uk/protvar, where data can be explored and downloaded, and can be accessed programmatically via an API.


Subject(s)
Mutation, Missense , Software , Humans , Databases, Protein , Molecular Sequence Annotation , Proteome/genetics , Proteins/genetics , Proteins/chemistry , Internet , Genomics/methods
20.
Nucleic Acids Res ; 52(W1): W207-W214, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38783112

ABSTRACT

Protein-protein interactions (PPIs) play a vital role in cellular functions and are essential for therapeutic development and understanding diseases. However, current predictive tools often struggle to balance efficiency and precision in predicting the effects of mutations on these complex interactions. To address this, we present DDMut-PPI, a deep learning model that efficiently and accurately predicts changes in PPI binding free energy upon single and multiple point mutations. Building on the robust Siamese network architecture with graph-based signatures from our prior work, DDMut, the DDMut-PPI model was enhanced with a graph convolutional network operated on the protein interaction interface. We used residue-specific embeddings from ProtT5 protein language model as node features, and a variety of molecular interactions as edge features. By integrating evolutionary context with spatial information, this framework enables DDMut-PPI to achieve a robust Pearson correlation of up to 0.75 (root mean squared error: 1.33 kcal/mol) in our evaluations, outperforming most existing methods. Importantly, the model demonstrated consistent performance across mutations that increase or decrease binding affinity. DDMut-PPI offers a significant advancement in the field and will serve as a valuable tool for researchers probing the complexities of protein interactions. DDMut-PPI is freely available as a web server and an application programming interface at https://biosig.lab.uq.edu.au/ddmut_ppi.
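The two evaluation figures quoted in this abstract, a Pearson correlation and a root mean squared error between predicted and measured changes in binding free energy, are computed as in the sketch below. The function names and the toy values are illustrative assumptions; nothing here comes from the DDMut-PPI code or its benchmark data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def rmse(xs, ys):
    """Root mean squared error, in the same units as the inputs."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Hypothetical measured and predicted values in kcal/mol.
measured = [1.0, -0.5, 2.0, 0.0]
predicted = [0.8, -0.2, 1.5, 0.3]

print(f"Pearson r = {pearson(measured, predicted):.3f}")
print(f"RMSE = {rmse(measured, predicted):.3f} kcal/mol")
```

The two metrics are complementary: correlation measures whether predictions rank and scale with the measurements, while RMSE measures the typical absolute error.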


Subject(s)
Deep Learning , Protein Interaction Mapping , Protein Interaction Mapping/methods , Protein Binding , Mutation , Software , Protein Interaction Maps/genetics , Humans , Proteins/genetics , Proteins/metabolism , Proteins/chemistry , Point Mutation