Results 1 - 20 of 41,513
1.
Proc Natl Acad Sci U S A ; 121(28): e2400151121, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38954548

ABSTRACT

Protein folding and evolution are intimately linked phenomena. Here, we revisit the concept of exons as potential protein folding modules across a set of 38 abundant and conserved protein families. Taking advantage of genomic exon-intron organization and extensive protein sequence data, we explore exon boundary conservation and assess the foldon-like behavior of exons using energy-landscape-theoretic measurements. We found that the exon size distribution deviates from exponential decay, indicating selection during evolution. We show that, when all families are taken together, segments corresponding to the more conserved exons show a pronounced tendency toward independent foldability, supporting the idea of exon-foldon correspondence. While 45% of the families follow this general trend when analyzed individually, in some families other, stronger functional determinants, such as the preservation of frustrated active sites, may be acting. We further develop a systematic partitioning of protein domains using exon boundary hotspots, showing that minimal common exons correspond to uninterrupted alpha and/or beta elements for the majority of the families, but not for all of them.
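
The abstract reports that exon sizes deviate from exponential decay but does not say which test was used. As a minimal sketch, assuming exon lengths are available as a plain list of numbers, one simple check is to fit an exponential by maximum likelihood (rate = 1/mean) and compare the observed fraction of exons in a size window with the fraction the fitted exponential predicts. The function names and toy lengths are hypothetical, not taken from the study.

```python
import math
from typing import Sequence, Tuple


def exponential_rate(sizes: Sequence[float]) -> float:
    """Maximum-likelihood rate of an exponential distribution: 1 / mean."""
    return len(sizes) / sum(sizes)


def window_observed_vs_expected(sizes: Sequence[float], lo: float, hi: float) -> Tuple[float, float]:
    """Fraction of exons with lo <= size < hi: observed vs. predicted by the fitted exponential."""
    lam = exponential_rate(sizes)
    # For X ~ Exp(lam), P(lo <= X < hi) = exp(-lam * lo) - exp(-lam * hi)
    expected = math.exp(-lam * lo) - math.exp(-lam * hi)
    observed = sum(lo <= s < hi for s in sizes) / len(sizes)
    return observed, expected


# Toy, made-up exon lengths (not data from the study)
sizes = [10, 20, 30, 40]
obs, exp_ = window_observed_vs_expected(sizes, 0, 25)
print(f"observed={obs:.3f} expected={exp_:.3f}")  # observed=0.500 expected=0.632
```

A consistent excess or deficit across many windows would hint at a departure from exponential decay; claiming one would still require a formal goodness-of-fit test.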


Subject(s)
Exons , Protein Folding , Exons/genetics , Humans , Proteins/genetics , Proteins/chemistry , Evolution, Molecular , Introns/genetics
2.
Zhonghua Yi Xue Yi Chuan Xue Za Zhi ; 41(7): 872-880, 2024 Jul 10.
Article in Chinese | MEDLINE | ID: mdl-38946376

ABSTRACT

As research has advanced, some non-coding RNAs have been found to go beyond their traditional definition and directly encode functional proteins, through coding sequence elements and binding to ribosomes. Among non-coding RNAs, the function of circRNA-encoded proteins has been the most extensively studied. This review used "circRNA", "encoded", and "translation" as keywords to search the PubMed and Web of Science databases. The retrieved literature was screened and traced, and the translation mechanisms of circRNAs, the related research methods, and their associations with disease are reviewed. CircRNAs can be translated into proteins through a cap-independent pathway. Several molecular techniques, in particular mass spectrometry, are valuable for identifying peptide segments unique to circRNA-encoded proteins and thereby confirming their existence. Proteins encoded by circRNAs are involved in the pathogenesis of diseases of the digestive, nervous, and urinary systems and of the breast, and have the potential to serve as novel targets for disease diagnosis and treatment. This article provides a comprehensive review of the basic theory, experimental methods, and disease-related research in the field of circRNA translation, which may provide clues for the identification of new diagnostic and therapeutic targets.


Subject(s)
RNA, Circular , RNA, Circular/genetics , Humans , RNA/genetics , Proteins/genetics , Animals , Protein Biosynthesis , Disease/genetics
3.
Nat Commun ; 15(1): 5566, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38956442

ABSTRACT

Accurately modeling protein fitness landscapes is of great importance for protein engineering. Pre-trained protein language models have achieved state-of-the-art performance in predicting protein fitness without wet-lab experimental data, but their accuracy and interpretability remain limited. On the other hand, traditional supervised deep learning models require abundant labeled training examples to improve performance, which poses a practical barrier. In this work, we introduce FSFP, a training strategy that can effectively optimize protein language models for fitness prediction under extreme data scarcity. By combining meta-transfer learning, learning to rank, and parameter-efficient fine-tuning, FSFP can significantly boost the performance of various protein language models using merely tens of labeled single-site mutants from the target protein. In silico benchmarks across 87 deep mutational scanning datasets demonstrate FSFP's superiority over both unsupervised and supervised baselines. Furthermore, we successfully applied FSFP to engineer the Phi29 DNA polymerase through wet-lab experiments, achieving a 25% increase in the positive rate. These results underscore the potential of our approach in aiding AI-guided protein engineering.
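
FSFP uses learning to rank, but the abstract does not give its loss function. To illustrate the learning-to-rank idea only, here is a generic pairwise logistic (RankNet-style) loss; this is an assumption and not necessarily the objective FSFP optimizes. It depends only on whether the model orders mutants correctly by measured fitness, not on the scale of the fitness values, which is one reason ranking objectives suit settings with only tens of labels.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(pred: Sequence[float], fitness: Sequence[float]) -> float:
    """Mean RankNet-style logistic loss over all ordered pairs of mutants.

    For each pair where mutant i is measured as fitter than mutant j, the term
    log(1 + exp(-(s_i - s_j))) is small when the model scores i well above j.
    Pairs with equal measured fitness are ignored.
    """
    total, n_pairs = 0.0, 0
    for i in range(len(pred)):
        for j in range(len(pred)):
            if fitness[i] > fitness[j]:
                total += softplus(-(pred[i] - pred[j]))
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Hypothetical model scores for three labeled single-site mutants
fitness = [0.1, 0.5, 0.9]
print(pairwise_ranking_loss([1.0, 2.0, 3.0], fitness))  # correct ordering: lower loss
print(pairwise_ranking_loss([3.0, 2.0, 1.0], fitness))  # reversed ordering: higher loss
```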


Subject(s)
Protein Engineering , Protein Engineering/methods , Deep Learning , Proteins/genetics , Proteins/metabolism , Mutation , DNA-Directed DNA Polymerase/metabolism , Computer Simulation , Models, Molecular , Algorithms
4.
Mol Genet Genomic Med ; 12(6): e2475, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38938072

ABSTRACT

BACKGROUND: Spastic paraplegia 11 (SPG11) is the most prevalent form of autosomal recessive hereditary spastic paraplegia, resulting from biallelic pathogenic variants in the SPG11 gene (MIM *610844). METHODS: The proband is a 36-year-old woman referred for genetic evaluation because of cognitive dysfunction, gait impairment, and corpus callosum atrophy (brain MRI had been normal at 25 years of age). Diagnostic approaches included CGH array, next-generation sequencing, and whole transcriptome sequencing. RESULTS: CGH array revealed a 180 kb deletion located upstream of SPG11. Sequencing of SPG11 uncovered two rare single nucleotide variants: the novel variant c.3143C>T in exon 17 (in cis with the deletion), and the previously reported pathogenic variant c.6409C>T in exon 34 (in trans). Whole transcriptome sequencing revealed that the variant c.3143C>T caused exon 17 skipping. CONCLUSION: We report a novel sequence variant in the SPG11 gene resulting in exon 17 skipping, which, together with a nonsense variant, causes spastic paraplegia 11 in our proband. In addition, a deletion upstream of SPG11 was identified in the patient; its contribution to the phenotype remains uncertain. Nonetheless, the deletion apparently affects cis-regulatory elements of the gene, suggesting a potential new pathogenic mechanism underlying the disease in a subset of undiagnosed patients. Our findings further support the hypothesis that the thin corpus callosum in patients with SPG11 arises progressively.


Subject(s)
Spastic Paraplegia, Hereditary , Humans , Female , Adult , Spastic Paraplegia, Hereditary/genetics , Spastic Paraplegia, Hereditary/diagnosis , Spastic Paraplegia, Hereditary/pathology , Exons , Proteins/genetics , Codon, Nonsense , Corpus Callosum/pathology , Corpus Callosum/diagnostic imaging , Sequence Deletion , Phenotype
5.
Stat Appl Genet Mol Biol ; 23(1)2024 Jan 01.
Article in English | MEDLINE | ID: mdl-38943434

ABSTRACT

Understanding a protein's function based solely on its amino acid sequence is a crucial but intricate task in bioinformatics, and one that has traditionally proven difficult. However, recent years have witnessed the rise of deep learning as a powerful tool, achieving significant success in protein function prediction. The strength of deep learning models lies in their ability to automatically learn informative features from protein sequences, which can then be used to predict the protein's function. This study builds upon these advances by proposing a novel model, CNN-CBAM+BiGRU, which incorporates a Convolutional Block Attention Module (CBAM) alongside BiGRUs. CBAM acts as a spotlight, guiding the CNN to focus on the most informative parts of the protein data, leading to more accurate feature extraction. BiGRUs, a type of recurrent neural network (RNN), excel at capturing long-range dependencies within the protein sequence, which are essential for accurate function prediction. The proposed model integrates the strengths of both CNN-CBAM and BiGRU. Experimental validation demonstrates the effectiveness of this combined approach. On the human dataset, the proposed method outperforms the CNN-BIGRU+ATT model by +1.0% for cellular components, +1.1% for molecular functions, and +0.5% for biological processes. On the yeast dataset, it outperforms the CNN-BIGRU+ATT model by +2.4% for cellular components, +1.2% for molecular functions, and +0.6% for biological processes.


Subject(s)
Computational Biology , Proteins , Computational Biology/methods , Humans , Proteins/genetics , Proteins/metabolism , Deep Learning , Databases, Protein , Algorithms , Amino Acid Sequence
6.
Proc Natl Acad Sci U S A ; 121(26): e2312335121, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38889151

ABSTRACT

Predicting the effects of one or more mutations on the in vivo or in vitro properties of a wild-type protein is a major computational challenge, due to the presence of epistasis, that is, of interactions between amino acids in the sequence. We introduce a computationally efficient procedure to build minimal epistatic models that predict mutational effects by combining evolutionary (homologous sequence) data with a small amount of mutational-scan data. Mutagenesis measurements guide the selection of links in a sparse graphical model, while the parameters on the nodes and the edges are inferred from sequence data. We show, on 10 mutational scans, that our pipeline achieves performance comparable to that of state-of-the-art deep networks trained on much more data, while requiring far fewer parameters and hence being more interpretable. In particular, the identified interactions adapt to the wild-type protein and to the fitness or biochemical property experimentally measured, mostly focus on key functional sites, and are not necessarily related to structural contacts. Therefore, our method is able to extract information relevant to one mutational experiment from homologous sequence data that reflect the multitude of structural and functional constraints acting on proteins throughout evolution.
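
The abstract describes a sparse graphical model with parameters on nodes (sites) and on a small set of edges selected using mutagenesis data. A minimal sketch of how such a model could score mutations, assuming a Potts-style additive score in which higher means fitter; the toy parameter values are made up, and inferring them from homologous sequences is not shown. With a single coupling edge, the double mutant's predicted effect differs from the sum of the two single-mutant effects, which is what epistasis means here.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical parameter containers (toy values, not inferred from any data)
Fields = Dict[Tuple[int, str], float]                            # (position, residue) -> h_i(a)
Couplings = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]  # only the selected edges (i, j)


def score(seq: str, h: Fields, J: Couplings) -> float:
    """Sum of site fields plus couplings on the sparse set of selected edges."""
    s = sum(h.get((i, a), 0.0) for i, a in enumerate(seq))
    for (i, j), table in J.items():
        s += table.get((seq[i], seq[j]), 0.0)
    return s


def mutational_effect(wt: str, mutations: Sequence[Tuple[int, str]], h: Fields, J: Couplings) -> float:
    """Predicted effect of one or more substitutions: score(mutant) - score(wild type)."""
    mut = list(wt)
    for pos, aa in mutations:
        mut[pos] = aa
    return score("".join(mut), h, J) - score(wt, h, J)


h = {(0, "A"): 1.0, (0, "G"): 0.5, (1, "C"): 0.2, (1, "D"): 0.2}
J = {(0, 1): {("G", "D"): 1.0}}  # one coupling edge between positions 0 and 1

single_0 = mutational_effect("AC", [(0, "G")], h, J)
single_1 = mutational_effect("AC", [(1, "D")], h, J)
double = mutational_effect("AC", [(0, "G"), (1, "D")], h, J)
print(single_0, single_1, double)  # the double-mutant effect is not the sum of the singles
```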


Subject(s)
Mutation , Proteins , Proteins/genetics , Proteins/metabolism , Proteins/chemistry , Epistasis, Genetic , Evolution, Molecular , Computational Biology/methods
7.
BMC Genomics ; 25(1): 630, 2024 Jun 24.
Article in English | MEDLINE | ID: mdl-38914936

ABSTRACT

Deep Mutational Scanning (DMS) assays are powerful tools to study sequence-function relationships by measuring the effects of thousands of sequence variants on protein function. During a DMS experiment, several technical artefacts might non-linearly distort the functional scores obtained, potentially biasing the interpretation of the results. We therefore tested several technical parameters in the deepPCA workflow, a DMS assay for protein-protein interactions, in order to identify technical sources of non-linearities. We found that parameters common to many DMS assays, such as the amount of transformed DNA, the timepoint of harvest and the library composition, can cause non-linearities in the data. Designing experiments to minimize these non-linear effects will improve the quantification and interpretation of mutation effects.


Subject(s)
Mutation , Workflow , Proteins/metabolism , Proteins/genetics , High-Throughput Nucleotide Sequencing , Protein Interaction Mapping/methods , DNA Mutational Analysis/methods , Protein Binding
8.
Open Biol ; 14(6): 230439, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38862022

ABSTRACT

Volatile low complexity regions (LCRs) are a novel source of adaptive variation, functional diversification and evolutionary novelty. An interplay of selection and mutation governs the composition and length of low complexity regions. High %GC and mutations provide length variability through mechanisms such as replication slippage. Owing to the complex dynamics between selection and mutation, a better understanding of their coexistence is needed. Our findings underscore that positively selected sites (PSS) and low complexity regions prefer the terminal regions of genes, co-occurring in most Tetrapoda clades. We observed that positively selected sites within a gene have position-specific roles: genes with centrally located positively selected sites primarily participate in defence responses, whereas genes with terminally located positively selected sites exhibit non-specific functions. Low complexity region-containing genes in the Tetrapoda clade exhibit a significantly higher %GC and lower ω (dN/dS: non-synonymous substitution rate/synonymous substitution rate) compared with genes without low complexity regions. This lower ω implies that, despite providing rapid functional diversity, low complexity region-containing genes are subjected to intense purifying selection. Furthermore, we observe that low complexity regions are ubiquitous at lower purity levels, but show a preference for specific positions within a gene as the purity of the low complexity region stretch increases, implying a composition-dependent evolutionary role. Our findings collectively contribute to the understanding of how genetic diversity and adaptation are shaped by the interplay of selection and low complexity regions in the Tetrapoda clade.
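
The abstract's argument rests on ω = dN/dS: values below 1 indicate purifying selection, values near 1 neutral evolution, and values above 1 positive selection. A minimal sketch of computing ω per gene and comparing two gene groups by their median, assuming per-gene dN and dS estimates are already available (in practice they come from codon-model software, and a group difference needs a proper statistical test). The gene values below are hypothetical, not from the study.

```python
from statistics import median
from typing import Iterable, Tuple


def omega(dn: float, ds: float) -> float:
    """omega = dN/dS: non-synonymous substitution rate over synonymous substitution rate."""
    if ds <= 0:
        raise ValueError("dS must be positive for omega to be defined")
    return dn / ds


def median_omega(genes: Iterable[Tuple[float, float]]) -> float:
    """Median omega over (dN, dS) pairs for one group of genes."""
    return median(omega(dn, ds) for dn, ds in genes)


# Hypothetical (dN, dS) estimates for genes with and without low complexity regions
with_lcr = [(0.01, 0.10), (0.02, 0.10), (0.03, 0.12)]
without_lcr = [(0.04, 0.10), (0.05, 0.11), (0.06, 0.12)]
print(median_omega(with_lcr), median_omega(without_lcr))  # lower omega: stronger purifying selection
```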


Subject(s)
Evolution, Molecular , Selection, Genetic , Animals , Mutation , Phylogeny , Proteins/genetics , Proteins/chemistry , Base Composition
9.
Protein Sci ; 33(7): e4998, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38888487

ABSTRACT

Knotted proteins, although scarce, are crucial structural components of certain protein families, and their roles continue to be a topic of intense research. Capitalizing on the vast collection of protein structure predictions offered by AlphaFold (AF), this study computationally examines the entire UniProt database to create a robust dataset of knotted and unknotted proteins. Using this dataset, we develop a machine learning (ML) model capable of accurately predicting the presence of knots in protein structures solely from their amino acid sequences. We tested the model on 100 proteins whose structures had not yet been predicted by AF and found agreement with our local prediction in 92% of cases. From the point of view of structural biology, we found that all potentially knotted proteins predicted by AF can be classified into only 17 families. This allowed us to discover unknotted proteins in families with a highly conserved knot. We found only three new protein families (UCH, DUF4253, and DUF2254) that contain both knotted and unknotted proteins, and demonstrate that deletions within the knot core could potentially account for the observed unknotted (trivial) topology. Finally, we show that in the majority of knotted families (11 out of 15), the knotted topology is strictly conserved in functional proteins with very low sequence similarity. We have conclusively demonstrated that proteins AF predicts as unknotted are structurally accurate in their unknotted configurations. However, these proteins often represent nonfunctional fragments, lacking significant portions of the knot core (amino acid sequence).


Subject(s)
Databases, Protein , Machine Learning , Models, Molecular , Proteins , Proteins/chemistry , Proteins/genetics , Protein Conformation , Amino Acid Sequence
10.
Oncol Res ; 32(6): 1119-1128, 2024.
Article in English | MEDLINE | ID: mdl-38827327

ABSTRACT

High expression of human epididymis protein 4 (HE4) in most lung cancers has been shown to be associated with poor patient prognosis, but the mechanism by which HE4 contributes to pathological transformation in lung cancer remains unclear. This study aimed to clarify the function and mechanism of HE4 in the occurrence and metastasis of lung adenocarcinoma (LUAD). HE4 expression was evaluated by immunoblotting of lung cancer cell lines and biopsies, and by analysis of The Cancer Genome Atlas (TCGA) dataset. Frequent HE4 overexpression was demonstrated in LUAD, but not in lung squamous cell carcinoma (LUSC), indicating that HE4 can serve as a biomarker to distinguish between LUAD and LUSC. HE4 knockdown significantly inhibited cell growth, colony formation, wound healing, and invasion, and arrested LUAD cell lines in the G1 phase of the cell cycle, through inactivation of downstream EGFR signaling, including the PI3K/AKT/mTOR and RAF/MAPK pathways. The first-generation EGFR inhibitor gefitinib and HE4 shRNA had no synergistic inhibitory effect on the growth of lung adenocarcinoma cells, whereas the third-generation EGFR inhibitor osimertinib showed additive anti-proliferative effects. Moreover, we provide evidence that HE4 regulates EGFR expression through transcriptional regulation and protein interaction in LUAD. Our findings suggest that HE4 positively modulates the EGFR signaling pathway to promote growth and invasiveness in LUAD, and highlight that targeting HE4 could be a novel strategy for LUAD treatment.


Subject(s)
Adenocarcinoma of Lung , Cell Proliferation , ErbB Receptors , Lung Neoplasms , Neoplasm Invasiveness , Signal Transduction , WAP Four-Disulfide Core Domain Protein 2 , Humans , ErbB Receptors/metabolism , ErbB Receptors/genetics , Adenocarcinoma of Lung/pathology , Adenocarcinoma of Lung/genetics , Adenocarcinoma of Lung/metabolism , WAP Four-Disulfide Core Domain Protein 2/metabolism , Lung Neoplasms/pathology , Lung Neoplasms/genetics , Lung Neoplasms/metabolism , Cell Line, Tumor , Gene Knockdown Techniques , Animals , Mice , Gene Expression Regulation, Neoplastic , Cell Movement/genetics , Proteins/metabolism , Proteins/genetics
11.
Commun Biol ; 7(1): 679, 2024 Jun 03.
Article in English | MEDLINE | ID: mdl-38830995

ABSTRACT

Proteins and nucleic acids are essential components of living organisms that interact in critical cellular processes. Accurate prediction of nucleic-acid-binding residues in proteins can contribute to a better understanding of protein function. However, the gap between the available protein sequence information and the structural and functional data obtained so far renders most current computational models ineffective. It is therefore vital to design computational models based on protein sequence information to identify nucleic-acid-binding sites in proteins. Here, we present SOFB, an ensemble deep learning method for identifying nucleic-acid-binding residues in proteins, which characterizes protein sequences by learning the semantics of biological dynamics contexts, and then uses an ensemble deep learning-based sequence network to learn feature representation and classification by explicitly modeling dynamic semantic information. Within this framework, the language learning model, which is carried over from natural language to biological language, captures the underlying relationships in protein sequences, and the ensemble deep learning-based sequence network, consisting of different convolutional layers together with Bi-LSTM, refines various features for optimal performance. Meanwhile, to address class imbalance, we adopt ensemble learning to train multiple models and then combine them. Our experimental results on several DNA/RNA nucleic-acid-binding residue datasets demonstrate that our proposed model outperforms other state-of-the-art methods. In addition, we conduct an interpretability analysis of the identified nucleic-acid-binding residue sequences based on the attention weights of the language learning model, revealing novel insights into the dynamic semantic information that supports the identified nucleic-acid-binding residues.
SOFB is available at https://github.com/Encryptional/SOFB and https://figshare.com/articles/online_resource/SOFB_figshare_rar/25499452 .


Subject(s)
Deep Learning , Binding Sites , Nucleic Acids/metabolism , Nucleic Acids/chemistry , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Protein Binding , Computational Biology/methods
12.
Article in English | MEDLINE | ID: mdl-38894604

ABSTRACT

The release of AlphaFold2 has sparked a rapid expansion in protein model databases. Efficient protein structure retrieval is crucial for the analysis of structure models, and measuring the similarity between structures is the key challenge in structural retrieval. Although existing structure alignment algorithms can address this challenge, they are often time-consuming. Currently, the state-of-the-art approach involves converting protein structures into three-dimensional (3D) Zernike descriptors and assessing similarity using Euclidean distance. However, the methods for computing 3D Zernike descriptors mainly rely on structural surfaces and are predominantly web-based, thus limiting their application to custom datasets. To overcome this limitation, we developed FP-Zernike, a user-friendly toolkit for computing different types of Zernike descriptors based on feature points. Users need only enter a single command to calculate the Zernike descriptors of all structures in a custom dataset. FP-Zernike outperforms the leading method in terms of retrieval accuracy and binary classification accuracy across diverse benchmark datasets. In addition, we demonstrate how FP-Zernike can be used to construct a descriptor database, and describe the protocol used for the Protein Data Bank (PDB) dataset, to facilitate local deployment of this tool by interested readers. Our demonstration contained 590,685 structures, and at this scale our system required only 4-9 s to complete a retrieval. The experiments confirmed that it achieves state-of-the-art accuracy. FP-Zernike is an open-source toolkit, with the source code and related data accessible at https://ngdc.cncb.ac.cn/biocode/tools/BT007365/releases/0.1, as well as through a webserver at http://www.structbioinfo.cn/.
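
The abstract states that structural similarity is assessed as the Euclidean distance between 3D Zernike descriptors. Assuming the descriptors have already been computed as fixed-length numeric vectors (how FP-Zernike derives them from feature points is not shown here), retrieval reduces to a nearest-neighbor search. This brute-force scan is an illustration only; the abstract does not say how FP-Zernike indexes its 590,685-structure database. The structure names and three-dimensional vectors are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def retrieve(query: Sequence[float],
             database: Dict[str, Sequence[float]],
             k: int = 5) -> List[Tuple[str, float]]:
    """Return the k entries whose descriptor vectors are nearest to the query (Euclidean distance)."""
    scored = ((name, math.dist(query, vec)) for name, vec in database.items())
    return heapq.nsmallest(k, scored, key=lambda item: item[1])


# Toy 3-dimensional "descriptors"; real Zernike descriptors are much longer vectors
db = {"struct_a": [0.0, 0.0, 0.0], "struct_b": [1.0, 0.0, 0.0], "struct_c": [3.0, 4.0, 0.0]}
print(retrieve([0.0, 0.0, 0.0], db, k=2))  # [('struct_a', 0.0), ('struct_b', 1.0)]
```

`heapq.nsmallest` keeps only k candidates in memory while scanning, which matters when the database is large even without an index.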


Subject(s)
Databases, Protein , Software , Algorithms , Protein Conformation , Proteins/chemistry , Proteins/genetics , Computational Biology/methods
13.
Protein Sci ; 33(7): e5086, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38923241

ABSTRACT

Variation in mutation rates at sites in proteins can largely be understood by the constraint that proteins must fold into stable structures. Models that calculate site-specific rates based on protein structure and a thermodynamic stability model have shown a significant but modest ability to predict empirical site-specific rates calculated from sequence. Models that use detailed atomistic models of protein energetics do not outperform simpler approaches using packing density. We demonstrate that a fundamental reason for this is that empirical site-specific rates are the result of the average effect of many different microenvironments in a phylogeny. By analyzing the results of evolutionary dynamics simulations, we show how averaging site-specific rates across many extant protein structures can lead to correct recovery of site-rate prediction. This result is also demonstrated in natural protein sequences and experimental structures. Using predicted structures, we demonstrate that atomistic models can improve upon contact density metrics in predicting site-specific rates from a structure. The results give fundamental insights into the factors governing the distribution of site-specific rates in protein families.


Subject(s)
Proteins , Proteins/chemistry , Proteins/genetics , Protein Conformation , Thermodynamics , Evolution, Molecular , Mutation , Models, Molecular , Molecular Dynamics Simulation
14.
PLoS Comput Biol ; 20(6): e1012123, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38935611

ABSTRACT

AlphaFold2 is an artificial intelligence-based program developed to predict, at atomic resolution, the 3D structure of proteins given only their amino acid sequence. Given the accuracy and efficiency with which AlphaFold2 can generate 3D structure predictions, and its widespread adoption across biochemical research, protein structure prediction should be considered for incorporation into the undergraduate biochemistry curriculum. A module for introducing AlphaFold2 into a senior-level biochemistry laboratory classroom was developed. The module's focus was to have students predict the structures of proteins from the genome of the MPOX 22 global outbreak virus isolate, for which no structures had been elucidated at that time. The goals of this study were both to determine the impact of the module on students and to develop a framework for introducing AlphaFold2 into the undergraduate curriculum, so that instructors of biochemistry courses, regardless of their background in bioinformatics, could adapt the module for their classrooms.


Subject(s)
Artificial Intelligence , Biochemistry , Curriculum , Humans , Biochemistry/education , Computational Biology/education , Computational Biology/methods , Protein Conformation , Students , Software , Universities , Proteins/chemistry , Proteins/metabolism , Proteins/genetics , Amino Acid Sequence
15.
Sci Signal ; 17(842): eadp5354, 2024 Jun 25.
Article in English | MEDLINE | ID: mdl-38917220

ABSTRACT

WWC1 is a scaffolding protein in the evolutionarily conserved Hippo signaling network and is genetically linked to human memory and synaptic plasticity. In the archives of Science Signaling, Stepan et al. demonstrate the translational potential of modulating WWC1 through pharmacological inhibition of Hippo-pathway kinases to enhance cognition.


Subject(s)
Memory , Neuronal Plasticity , Signal Transduction , Humans , Neuronal Plasticity/physiology , Memory/physiology , Intracellular Signaling Peptides and Proteins/metabolism , Intracellular Signaling Peptides and Proteins/genetics , Animals , Protein Serine-Threonine Kinases/metabolism , Protein Serine-Threonine Kinases/genetics , Proteins/metabolism , Proteins/genetics
16.
Genome Biol Evol ; 16(5)2024 05 02.
Article in English | MEDLINE | ID: mdl-38735759

ABSTRACT

A fundamental goal in evolutionary biology and population genetics is to understand how selection shapes the fate of new mutations. Here, we test the null hypothesis that insertion-deletion (indel) events in protein-coding regions occur randomly with respect to secondary structures. We identified indels across 11,444 sequence alignments in mouse, rat, human, chimp, and dog genomes and then quantified their overlap with four different types of secondary structure (alpha helices, beta strands, protein bends, and protein turns) predicted by the deep-learning methods of AlphaFold2. Indels overlapped secondary structures 54% as much as expected and were especially underrepresented over beta strands, which tend to form internal, stable regions of proteins. In contrast, indels were enriched by 155% over regions without any predicted secondary structure. These skews were stronger in the rodent lineages than in the primate lineages, consistent with population genetic theory predicting that natural selection will be more efficient in species with larger effective population sizes. Nonsynonymous substitutions were also less common in regions of protein secondary structure, although not as strongly reduced as indels. In a complementary analysis of thousands of human genomes, we showed that indels overlapping secondary structure segregated at significantly lower frequency than indels outside of secondary structure. Taken together, our study shows that indels are selected against if they overlap secondary structure, presumably because they disrupt the tertiary structure and function of a protein.
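
The "54% as much as expected" figure is an observed-to-expected ratio under a null model in which indels fall randomly with respect to secondary structure. A minimal sketch of that ratio, assuming each indel is reduced to a single alignment position and each position is flagged as inside or outside predicted secondary structure; the study's exact null model and its handling of multi-position indels may differ. The toy mask and indel positions are made up.

```python
from typing import Sequence


def overlap_ratio(indel_sites: Sequence[int], in_structure: Sequence[bool]) -> float:
    """Observed / expected number of indels falling in the flagged positions.

    Null model: each indel lands on a uniformly random position, so the expected
    count is (number of indels) x (fraction of flagged positions).
    Simplification: each indel is represented by a single position.
    """
    frac = sum(in_structure) / len(in_structure)
    expected = len(indel_sites) * frac
    if expected == 0:
        raise ValueError("expected count is zero; ratio undefined")
    observed = sum(1 for p in indel_sites if in_structure[p])
    return observed / expected


# Toy protein: first 5 positions in a helix or strand, last 5 unstructured
mask = [True] * 5 + [False] * 5
print(overlap_ratio([0, 6, 7, 8], mask))  # 0.5: half as many overlaps as expected
```

Passing the inverted mask gives the ratio for unstructured regions; in this toy case it is 1.5, an excess over the null expectation.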


Subject(s)
INDEL Mutation , Protein Structure, Secondary , Humans , Animals , Mice , Rats , Evolution, Molecular , Proteins/genetics , Proteins/chemistry , Dogs , Selection, Genetic , Genome
17.
Nucleic Acids Res ; 52(W1): W182-W186, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38747341

ABSTRACT

AlphaFind is a web-based search engine that provides fast structure-based retrieval in the entire set of AlphaFold DB structures. Unlike other protein processing tools, AlphaFind is focused entirely on tertiary structure, automatically extracting the main 3D features of each protein chain and using a machine learning model to find the most similar structures. This indexing approach and the 3D feature extraction method used by AlphaFind have both demonstrated remarkable scalability to large datasets as well as to large protein structures. The web application itself has been designed with a focus on clarity and ease of use. The searcher accepts any valid UniProt ID, Protein Data Bank ID or gene symbol as input, and returns a set of similar protein chains from AlphaFold DB, including various similarity metrics between the query and each of the retrieved results. In addition to the main search functionality, the application provides 3D visualizations of protein structure superpositions in order to allow researchers to instantly analyze the structural similarity of the retrieved results. The AlphaFind web application is available online for free and without any registration at https://alphafind.fi.muni.cz.


Subject(s)
Databases, Protein , Proteome , Software , Proteome/chemistry , Proteome/genetics , Internet , Search Engine , Machine Learning , Protein Conformation , Proteins/chemistry , Proteins/genetics , Proteins/metabolism , Protein Folding , Models, Molecular , Structural Homology, Protein
18.
Nucleic Acids Res ; 52(W1): W287-W293, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38747351

ABSTRACT

The PSIPRED Workbench is a long-established and popular bioinformatics web service offering a wide range of machine-learning-based analyses for characterizing protein structure and function. In this paper we provide an update on the recent additions and developments to the webserver, with a focus on new deep-learning-based methods. We briefly discuss some trends in server usage since the publication of AlphaFold2 and give an overview of some upcoming developments for the service. The PSIPRED Workbench is available at http://bioinf.cs.ucl.ac.uk/psipred.


Subject(s)
Deep Learning , Proteins , Software , Proteins/chemistry , Proteins/genetics , Internet , Protein Conformation , Computational Biology/methods , Sequence Analysis, Protein/methods
19.
Nucleic Acids Res ; 52(W1): W140-W147, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38769064

ABSTRACT

Genomic variation can impact normal biological function in complex ways, so understanding variant effects requires a broad range of data to be coherently assimilated. Whilst the volume of human variant data and relevant annotations has increased, the corresponding increase in the breadth of participating fields, standards and versioning means that moving between genomic, coding, protein and structure positions is increasingly complex. In turn, this makes investigating variants in diverse formats and assimilating annotations from different resources challenging. ProtVar addresses these issues to facilitate the contextualization and interpretation of human missense variation with unparalleled flexibility and ease of access for the broadest range of researchers. By precalculating all possible variants in the human proteome, it offers near-instantaneous mapping between all relevant data types. It also combines data and analyses from a plethora of resources, bringing together genomic, protein sequence and function annotations as well as structural insights and predictions, to better understand the likely effect of missense variation in humans. It is offered as an intuitive web server at https://www.ebi.ac.uk/protvar, where data can be explored and downloaded, and can be accessed programmatically via an API.


Subject(s)
Mutation, Missense , Software , Humans , Databases, Protein , Molecular Sequence Annotation , Proteome/genetics , Proteins/genetics , Proteins/chemistry , Internet , Genomics/methods
20.
Nucleic Acids Res ; 52(W1): W207-W214, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38783112

ABSTRACT

Protein-protein interactions (PPIs) play a vital role in cellular functions and are essential for therapeutic development and understanding diseases. However, current predictive tools often struggle to balance efficiency and precision in predicting the effects of mutations on these complex interactions. To address this, we present DDMut-PPI, a deep learning model that efficiently and accurately predicts changes in PPI binding free energy upon single and multiple point mutations. Building on the robust Siamese network architecture with graph-based signatures from our prior work, DDMut, the DDMut-PPI model was enhanced with a graph convolutional network operating on the protein interaction interface. We used residue-specific embeddings from the ProtT5 protein language model as node features, and a variety of molecular interactions as edge features. By integrating evolutionary context with spatial information, this framework enables DDMut-PPI to achieve a Pearson correlation of up to 0.75 (root mean squared error: 1.33 kcal/mol) in our evaluations, outperforming most existing methods. Importantly, the model demonstrated consistent performance across mutations that increase or decrease binding affinity. DDMut-PPI offers a significant advancement in the field and will serve as a valuable tool for researchers probing the complexities of protein interactions. DDMut-PPI is freely available as a web server and an application programming interface at https://biosig.lab.uq.edu.au/ddmut_ppi.
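
DDMut-PPI is evaluated with Pearson's correlation and root mean squared error (RMSE) between predicted and experimental changes in binding free energy. A plain-Python sketch of the two metrics, using made-up values rather than any data from the paper: Pearson's r measures how well predictions track the trend of the measurements, while RMSE measures the typical size of the errors in kcal/mol, so the two can disagree and are usually reported together.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between predictions and measurements."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(pred: Sequence[float], obs: Sequence[float]) -> float:
    """Root mean squared error, in the units of the inputs (here kcal/mol)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


# Hypothetical predicted vs. experimental binding free energy changes (kcal/mol)
predicted = [0.5, -1.0, 2.0, 0.0]
measured = [1.0, -0.5, 2.5, -1.0]
print(f"r = {pearson_r(predicted, measured):.2f}, RMSE = {rmse(predicted, measured):.2f} kcal/mol")
```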


Subject(s)
Deep Learning , Protein Interaction Mapping , Protein Interaction Mapping/methods , Protein Binding , Mutation , Software , Protein Interaction Maps/genetics , Humans , Proteins/genetics , Proteins/metabolism , Proteins/chemistry , Point Mutation