Results 1 - 20 of 341
1.
Cell ; 177(3): 737-750.e15, 2019 04 18.
Article in English | MEDLINE | ID: mdl-31002798

ABSTRACT

The proteasome mediates selective protein degradation and is dynamically regulated in response to proteotoxic challenges. SKN-1A/Nrf1, an endoplasmic reticulum (ER)-associated transcription factor that undergoes N-linked glycosylation, serves as a sensor of proteasome dysfunction and triggers compensatory upregulation of proteasome subunit genes. Here, we show that the PNG-1/NGLY1 peptide:N-glycanase edits the sequence of SKN-1A protein by converting particular N-glycosylated asparagine residues to aspartic acid. Genetically introducing aspartates at these N-glycosylation sites bypasses the requirement for PNG-1/NGLY1, showing that protein sequence editing rather than deglycosylation is key to SKN-1A function. This pathway is required to maintain sufficient proteasome expression and activity, and SKN-1A hyperactivation confers resistance to the proteotoxicity of human amyloid beta peptide. Deglycosylation-dependent protein sequence editing explains how ER-associated and cytosolic isoforms of SKN-1 perform distinct cytoprotective functions corresponding to those of mammalian Nrf1 and Nrf2. Thus, we uncover an unexpected mechanism by which N-linked glycosylation regulates protein function and proteostasis.


Subject(s)
Caenorhabditis elegans Proteins/metabolism , DNA-Binding Proteins/metabolism , Proteasome Endopeptidase Complex/metabolism , Transcription Factors/metabolism , Amino Acid Sequence , Animals , Asparagine/metabolism , Bortezomib/pharmacology , CRISPR-Cas Systems/genetics , Caenorhabditis elegans/metabolism , Caenorhabditis elegans Proteins/chemistry , Caenorhabditis elegans Proteins/genetics , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/genetics , Endoplasmic Reticulum/metabolism , Gene Editing , Gene Expression Regulation/drug effects , Oxidative Stress , Proteasome Endopeptidase Complex/genetics , Protein Subunits/chemistry , Protein Subunits/genetics , Protein Subunits/metabolism , Sequence Alignment , Transcription Factors/chemistry , Transcription Factors/genetics
2.
Immunity ; 56(7): 1681-1698.e13, 2023 07 11.
Article in English | MEDLINE | ID: mdl-37301199

ABSTRACT

CD4+ T cell responses are exquisitely antigen specific and directed toward peptide epitopes displayed by human leukocyte antigen class II (HLA-II) on antigen-presenting cells. Underrepresentation of diverse alleles in ligand databases and an incomplete understanding of factors affecting antigen presentation in vivo have limited progress in defining principles of peptide immunogenicity. Here, we employed monoallelic immunopeptidomics to identify 358,024 HLA-II binders, with a particular focus on HLA-DQ and HLA-DP. We uncovered peptide-binding patterns across a spectrum of binding affinities and enrichment of structural antigen features. These aspects underpinned the development of the context-aware predictor of T cell antigens (CAPTAn), a deep learning model that predicts peptide antigens based on their affinity to HLA-II and the full sequence of their source proteins. CAPTAn was instrumental in discovering prevalent T cell epitopes from bacteria in the human microbiome and a pan-variant epitope from SARS-CoV-2. Together, CAPTAn and associated datasets present a resource for antigen discovery and for unraveling the genetic associations of HLA alleles with immunopathologies.


Subject(s)
COVID-19 , Deep Learning , Humans , Captan , SARS-CoV-2 , HLA Antigens , Epitopes, T-Lymphocyte , Peptides
3.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38557677

ABSTRACT

Protein design is central to nearly all protein engineering problems, as it can enable the creation of proteins with new biological functions, such as improving the catalytic efficiency of enzymes. One key facet of protein design, fixed-backbone protein sequence design, seeks to design new sequences that will conform to a prescribed protein backbone structure. Nonetheless, existing sequence design methods have limitations, such as low sequence diversity and shortcomings in experimental validation of the designed functional proteins. These inadequacies obstruct the goal of functional protein design. To address these limitations, we first developed the Graphormer-based Protein Design (GPD) model. This model applies the Transformer to a graph-based representation of three-dimensional protein structures and adds Gaussian noise and random sequence masks to node features, thereby enhancing sequence recovery and diversity. The performance of the GPD model was significantly better than that of the state-of-the-art ProteinMPNN model on multiple independent tests, especially for sequence diversity. We employed GPD to design CalB hydrolase and generated nine artificially designed CalB proteins. The results show a 1.7-fold increase in catalytic activity compared to that of the wild-type CalB and strong substrate selectivity on p-nitrophenyl acetate with different carbon chain lengths (C2-C16). Thus, the GPD method could be used for the de novo design of industrial enzymes and protein drugs. The code was released at https://github.com/decodermu/GPD.
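
Editorial illustration (not taken from the GPD repository): the abstract mentions adding Gaussian noise and random masks to node features but does not say how. The sketch below shows one plausible way such a perturbation could look. The function name `perturb_node_features` and the default values of `noise_std` and `mask_prob` are assumptions made for this example only.

```python
import random


def perturb_node_features(features, noise_std=0.1, mask_prob=0.15, seed=0):
    """Return a perturbed copy of per-node feature vectors.

    Hypothetical sketch: each node is either masked (all features set to
    zero) with probability `mask_prob`, or receives independent Gaussian
    noise with standard deviation `noise_std` on every feature.
    """
    rng = random.Random(seed)
    perturbed = []
    for row in features:
        if rng.random() < mask_prob:
            # Masked node: hide its features entirely.
            perturbed.append([0.0] * len(row))
        else:
            # Unmasked node: jitter each feature with Gaussian noise.
            perturbed.append([x + rng.gauss(0.0, noise_std) for x in row])
    return perturbed


if __name__ == "__main__":
    nodes = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(perturb_node_features(nodes))
```

Perturbing inputs in this way is a common regularization device; whether GPD masks whole nodes or individual residue identities is not stated in the abstract.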


Subject(s)
Protein Engineering , Proteins , Proteins/chemistry , Amino Acid Sequence , Protein Engineering/methods
4.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38851299

ABSTRACT

Protein-protein interactions (PPIs) are the basis of many important biological processes, with protein complexes being the key forms implementing these interactions. Understanding protein complexes and their functions is critical for elucidating mechanisms of life processes, disease diagnosis and treatment and drug development. However, experimental methods for identifying protein complexes have many limitations. Therefore, it is necessary to use computational methods to predict protein complexes. Protein sequences can indicate the structure and biological functions of proteins, while also determining their binding abilities with other proteins, influencing the formation of protein complexes. Integrating these characteristics to predict protein complexes is very promising, but currently there is no effective framework that can utilize both protein sequence and PPI network topology for complex prediction. To address this challenge, we have developed HyperGraphComplex, a method based on a hypergraph variational autoencoder that can capture expressive features from protein sequences without feature engineering, while also considering topological properties in PPI networks, to predict protein complexes. Experimental results demonstrated that HyperGraphComplex achieves satisfactory predictive performance when compared with state-of-the-art methods. Further bioinformatics analysis shows that the predicted protein complexes have similar attributes to known ones. Moreover, case studies corroborated the remarkable predictive capability of our model in identifying protein complexes, including 3 that were not only experimentally validated by recent studies but also exhibited high-confidence structural predictions from AlphaFold-Multimer.
We believe that the HyperGraphComplex algorithm and our provided proteome-wide high-confidence protein complex prediction dataset will help elucidate how proteins regulate cellular processes in the form of complexes, and facilitate disease diagnosis and treatment and drug development. Source codes are available at https://github.com/LiDlab/HyperGraphComplex.


Subject(s)
Computational Biology , Protein Interaction Mapping , Computational Biology/methods , Protein Interaction Mapping/methods , Proteins/metabolism , Proteins/chemistry , Algorithms , Protein Interaction Maps , Databases, Protein , Humans , Sequence Analysis, Protein/methods , Amino Acid Sequence
5.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38600663

ABSTRACT

Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most protein sequence design methods based on deep learning focus on network architecture optimization, while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in protein structure prediction, we explored whether a structural sequence profile representation can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profile using ultrafast shape recognition. Given an input backbone structure, SPDesign utilizes ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database and then extracts the sequence profile through structure alignment. The profile, combined with structural pre-trained knowledge and geometric features, is then fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms the state-of-the-art methods, such as ProteinMPNN, Pifold and LM-Design, leading to 21.89%, 15.54% and 11.4% accuracy gains in sequence recovery rate on the CATH 4.2 benchmark, respectively. Encouraging results have also been achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis conducted by the PDBench tool suggests that SPDesign performs well in subdivided structures. More interestingly, we found that SPDesign can accurately reconstruct the sequences of some proteins that have similar structures but different sequences. Finally, the structural modeling verification experiment indicates that the sequences designed by SPDesign can fold into the native structures more accurately.
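
Editorial illustration: the abstract relies on ultrafast shape recognition (USR) vectors to speed up structure search. The sketch below follows one common formulation of USR, in which a shape is summarized by three moments of the distance distributions to four reference points, giving 12 numbers. It is not SPDesign's code. The choice of moments (mean, standard deviation, signed cube root of the third central moment) and the function names are assumptions, since implementations vary.

```python
import math


def _moments(dists):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(dists)
    mean = sum(dists) / n
    var = sum((d - mean) ** 2 for d in dists) / n
    third = sum((d - mean) ** 3 for d in dists) / n
    skew = math.copysign(abs(third) ** (1.0 / 3.0), third)
    return [mean, math.sqrt(var), skew]


def usr_descriptor(coords):
    """12-dimensional shape descriptor for a list of (x, y, z) points."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [math.dist(c, centroid) for c in coords]
    closest = coords[d_ctd.index(min(d_ctd))]    # point closest to the centroid
    farthest = coords[d_ctd.index(max(d_ctd))]   # point farthest from the centroid
    d_fct = [math.dist(c, farthest) for c in coords]
    far_from_far = coords[d_fct.index(max(d_fct))]  # point farthest from `farthest`
    descriptor = []
    for ref in (centroid, closest, farthest, far_from_far):
        descriptor.extend(_moments([math.dist(c, ref) for c in coords]))
    return descriptor


def usr_similarity(a, b):
    """Similarity in (0, 1]; 1.0 means identical descriptors."""
    return 1.0 / (1.0 + sum(abs(x - y) for x, y in zip(a, b)) / len(a))


if __name__ == "__main__":
    points = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 2.3, 0.0),
              (0.2, 0.1, 3.1), (1.0, 1.0, 1.0)]
    print(usr_similarity(usr_descriptor(points), usr_descriptor(points)))
```

Because the descriptor depends only on inter-point distances, it is unchanged by translation and rotation, which lets structures be compared without superposition. That property is what makes a fast pre-filter before full structure alignment possible.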


Subject(s)
Neural Networks, Computer , Proteins , Sequence Alignment , Amino Acid Sequence , Proteins/chemistry , Sequence Analysis, Protein/methods
6.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36403092

ABSTRACT

MOTIVATION: Biological experimental approaches to protein-protein interaction (PPI) site prediction are critical for understanding the mechanisms of biochemical processes but are time-consuming and laborious. With the development of Deep Learning (DL) techniques, popular Convolutional Neural Network (CNN)-based methods have been proposed to address these problems. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in protein sequences. Current methods cannot efficiently explore the nature of the Position Specific Scoring Matrix (PSSM), secondary structure and raw protein sequences by processing them all together. For PPI site prediction, how to effectively model the PPI context with attention to prediction remains an open problem. In addition, long-distance dependencies of PPI features are important but very challenging for many CNN-based methods to capture, because CNNs by their nature struggle to outperform auto-regressive models like Transformers. RESULTS: To effectively mine the properties of PPI features, a novel hybrid neural network named HN-PPISP is proposed, which integrates a Multi-layer Perceptron Mixer (MLP-Mixer) module for local feature extraction and a two-stage multi-branch module for global feature capture. The model combines the merits of Transformer, TextCNN and Bi-LSTM as a powerful alternative for PPI site prediction. On the one hand, this is the first application of an advanced Transformer (i.e. MLP-Mixer) with a hybrid network for sequence-based PPI prediction. On the other hand, unlike existing methods that treat global features altogether, the proposed two-stage multi-branch hybrid module first assigns different attention scores to the input features and then encodes the features through different branch modules.
In the first stage, different improved attention modules are hybridized to extract features from the raw protein sequences, secondary structure and PSSM, respectively. In the second stage, a multi-branch network is designed to aggregate information from both branches in parallel. The two branches encode the features and extract dependencies through several operations such as TextCNN, Bi-LSTM and different activation functions. Experimental results on real-world public datasets show that our model consistently achieves state-of-the-art performance over seven strong baselines. AVAILABILITY: The source code of the HN-PPISP model is available at https://github.com/ylxu05/HN-PPISP.


Subject(s)
Neural Networks, Computer , Software , Amino Acid Sequence , Amino Acids , Protein Structure, Secondary
7.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37429578

ABSTRACT

Computational protein design has proven in the last few years to be the most powerful tool for protein design and repacking tasks. In practice, these two tasks are strongly related but often treated separately. In addition, state-of-the-art deep-learning-based methods cannot provide interpretability from an energy perspective, which affects the accuracy of the design. Here we propose a new systematic approach, including both a posterior probability part and a joint probability part, to solve the two essential questions once and for all. This approach takes the physicochemical properties of amino acids into consideration and uses the joint probability model to ensure the convergence between structure and amino acid type. Our results demonstrated that this method could generate feasible, high-confidence sequences with low-energy side-chain conformations. The designed sequences can fold into target structures with high confidence and maintain relatively stable biochemical properties. The side chain conformation has a significantly lower energy landscape without relying on a rotamer library or performing expensive conformational searches. Overall, we propose an end-to-end method that combines the advantages of both deep learning and energy-based methods. The design results of this model demonstrate high efficiency and precision, as well as a low energy state and good interpretability.


Subject(s)
Deep Learning , Models, Molecular , Proteins/chemistry , Amino Acid Sequence , Amino Acids/chemistry , Protein Conformation
8.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37649385

ABSTRACT

Protein crystallization is crucial for biology, but the steps involved are complex and demanding in terms of external factors and internal structure. To save on experimental costs and time, the tendency of proteins to crystallize can be initially determined and screened by modeling. Accordingly, this study created a new pipeline that uses protein sequence to predict protein crystallization propensity at the protein material production stage, the purification stage and the crystal production stage. The pipeline introduces a new feature selection method that combines Chi-square ($\chi^2$) and recursive feature elimination, together with the 12 selected features, followed by linear discriminant analysis for dimensionality reduction; finally, a support vector machine algorithm with hyperparameter tuning and 10-fold cross-validation is used to train the model and test the results. This new pipeline has been tested on three different datasets, and its accuracy rates are higher than those of existing pipelines. In conclusion, our model provides a new solution for predicting multistage protein crystallization propensity, which is a major challenge in computational biology.


Subject(s)
Algorithms , Machine Learning , Crystallization , Amino Acid Sequence , Computational Biology
9.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-37020337

ABSTRACT

Identification of potent peptides through model prediction can reduce benchwork in wet experiments. However, the conventional process of model building can be complex and time consuming due to challenges such as peptide representation, feature selection, model selection and hyperparameter tuning. Recently, advanced pretrained deep learning-based language models (LMs) have been released for protein sequence embedding and applied to structure and function prediction. Based on these developments, we have developed UniDL4BioPep, a universal deep-learning model architecture for transfer learning in bioactive peptide binary classification modeling. It can directly assist users in training a high-performance deep-learning model with a fixed architecture and achieve cutting-edge performance to meet the demands of efficient discovery of novel bioactive peptides. To the best of our knowledge, this is the first time that a pretrained biological language model has been utilized for peptide embeddings and has successfully predicted peptide bioactivities through large-scale evaluations of those peptide embeddings. The model was also validated through uniform manifold approximation and projection analysis. By combining the LM with a convolutional neural network, UniDL4BioPep achieved better performance than the respective state-of-the-art models for 15 out of 20 different bioactivity dataset prediction tasks. The accuracy, Matthews correlation coefficient and area under the curve were 0.7-7, 1.23-26.7 and 0.3-25.6% higher, respectively. A user-friendly web server of UniDL4BioPep for the tested bioactivities has been established and is freely accessible at https://nepc2pvmzy.us-east-1.awsapprunner.com. The source codes, datasets and templates of UniDL4BioPep for other bioactivity fitting and prediction tasks are available at https://github.com/dzjxzyd/UniDL4BioPep.
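
Editorial illustration: the abstract reports gains in the Matthews correlation coefficient (MCC) for binary classification. The sketch below computes MCC from its standard definition using counts of true and false positives and negatives. It is not taken from the UniDL4BioPep code. The function name and the convention of returning 0.0 when the denominator is zero are choices made for this example.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels encoded as 0 (negative) and 1 (positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: report 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    print(matthews_corrcoef([1, 1, 0, 0], [1, 0, 0, 0]))
```

Unlike accuracy, MCC stays informative on imbalanced datasets, which are common in bioactive peptide benchmarks. That may be why the abstract reports both.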


Subject(s)
Deep Learning , Neural Networks, Computer , Peptides/chemistry , Software , Amino Acid Sequence
10.
Brief Bioinform ; 24(3)2023 05 19.
Article in English | MEDLINE | ID: mdl-36946414

ABSTRACT

In the era of constantly increasing amounts of available protein data, relevant and interpretable visualization becomes crucial, especially for tasks requiring human expertise. Poincaré disk projection has previously proven effective for visualization of biological data such as single-cell RNAseq data. Here, we develop a new method, PoincaréMSA, for visual representation of complex relationships between protein sequences based on Poincaré maps embedding. We demonstrate its efficiency and potential for visualization of protein family topology as well as evolutionary and functional annotation of uncharacterized sequences. PoincaréMSA is implemented in open-source Python code with interactive Google Colab notebooks available, as described at https://www.dsimb.inserm.fr/POINCARE_MSA.
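
Editorial illustration: the abstract places sequences in the Poincaré disk. The sketch below evaluates the standard hyperbolic distance between two points inside the unit disk, d(u, v) = arcosh(1 + 2|u - v|^2 / ((1 - |u|^2)(1 - |v|^2))). It is not PoincaréMSA's code, and the function name is an assumption for this example.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit disk."""
    sq_norm_u = sum(a * a for a in u)
    sq_norm_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


if __name__ == "__main__":
    # The same Euclidean offset costs more hyperbolic distance near the rim.
    print(poincare_distance((0.0, 0.1), (0.1, 0.0)))
    print(poincare_distance((0.0, 0.9), (0.9, 0.0)))
```

Distances grow rapidly toward the rim of the disk, so tree-like hierarchies can be drawn with the root near the center and many leaves near the boundary. This is the usual argument for hyperbolic embeddings of hierarchical data.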


Subject(s)
Proteins , Software , Humans , Amino Acid Sequence , Biological Evolution
11.
Mol Cell Proteomics ; 22(8): 100591, 2023 08.
Article in English | MEDLINE | ID: mdl-37301379

ABSTRACT

The human proteome comprises all of the proteins produced by the sequences translated from the human genome, with additional modifications in both sequence and function caused by nonsynonymous variants and posttranslational modifications including cleavage of the initial transcript into smaller peptides and polypeptides. The UniProtKB database (www.uniprot.org) is the world's leading high-quality, comprehensive and freely accessible resource of protein sequence and functional information and presents a summary of experimentally verified, or computationally predicted, functional information added by our expert biocuration team for each protein in the proteome. Researchers in the field of mass spectrometry-based proteomics both consume and add to the body of data available in UniProtKB, and this review highlights the information we provide to this community and the knowledge we in turn obtain from groups via deposition of large-scale datasets in public domain databases.


Subject(s)
Proteome , Proteomics , Humans , Proteome/genetics , Databases, Protein , Amino Acid Sequence , Peptides
12.
Proc Natl Acad Sci U S A ; 119(24): e2203176119, 2022 06 14.
Article in English | MEDLINE | ID: mdl-35648808

ABSTRACT

Bacterial signal transduction systems sense changes in the environment and transmit these signals to control cellular responses. The simplest one-component signal transduction systems include an input sensor domain and an output response domain encoded in a single protein chain. Alternatively, two-component signal transduction systems transmit signals by phosphorelay between input and output domains from separate proteins. The membrane-tethered periplasmic bile acid sensor that activates the Vibrio parahaemolyticus type III secretion system adopts an obligate heterodimer of two proteins encoded by partially overlapping VtrA and VtrC genes. This co-component signal transduction system binds bile acid using a lipocalin-like domain in VtrC and transmits the signal through the membrane to a cytoplasmic DNA-binding transcription factor in VtrA. Using the domain and operon organization of VtrA/VtrC, we identify a fast-evolving superfamily of co-component systems in enteric bacteria. Accurate machine learning-based fold predictions for the candidate co-components support their homology in the twilight zone of rapidly evolving sequences and provide mechanistic hypotheses about previously unrecognized lipid-sensing functions.


Subject(s)
Bacterial Proteins , Gene Expression Regulation, Bacterial , Genomic Islands , Membrane Proteins , Type III Secretion Systems , Vibrio parahaemolyticus , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Bile Acids and Salts/metabolism , DNA-Binding Proteins/metabolism , Membrane Proteins/genetics , Membrane Proteins/metabolism , Protein Multimerization , Signal Transduction , Transcription Factors/metabolism , Type III Secretion Systems/genetics , Vibrio parahaemolyticus/genetics , Vibrio parahaemolyticus/pathogenicity , Virulence/genetics
13.
Proteomics ; : e2400044, 2024 Jun 02.
Article in French | MEDLINE | ID: mdl-38824664

ABSTRACT

RNA-dependent liquid-liquid phase separation (LLPS) proteins play critical roles in cellular processes such as stress granule formation, DNA repair, RNA metabolism, germ cell development, and protein translation regulation. The abnormal behavior of these proteins is associated with various diseases, particularly neurodegenerative disorders like amyotrophic lateral sclerosis and frontotemporal dementia, making their identification crucial. However, conventional biochemistry-based methods for identifying these proteins are time-consuming and costly. Addressing this challenge, our study developed a robust computational model for their identification. We constructed a comprehensive dataset containing 137 RNA-dependent and 606 non-RNA-dependent LLPS protein sequences, which were then encoded using amino acid composition, composition of K-spaced amino acid pairs, Geary autocorrelation, and conjoined triad methods. Through a combination of correlation analysis, mutual information scoring, and incremental feature selection, we identified an optimal feature subset. This subset was used to train a random forest model, which achieved an accuracy of 90% when tested against an independent dataset. This study demonstrates the potential of computational methods as efficient alternatives for the identification of RNA-dependent LLPS proteins. To enhance the accessibility of the model, a user-centric web server has been established and can be accessed via the link: http://rpp.lin-group.cn.
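
Editorial illustration: two of the sequence encodings named in the abstract, amino acid composition (AAC) and composition of k-spaced amino acid pairs (CKSAAP), can be sketched in a few lines. This is a generic formulation, not the authors' code. The function names and the normalization by the number of pair windows are assumptions for this example. The Geary autocorrelation and conjoined triad encodings are not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Fraction of each of the 20 standard residues (20 features)."""
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def k_spaced_pair_composition(seq, k):
    """Fraction of residue pairs separated by exactly k residues (400 features)."""
    counts = dict.fromkeys(PAIRS, 0)
    n_windows = len(seq) - k - 1
    for i in range(n_windows):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    if n_windows <= 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / n_windows for p in PAIRS]


if __name__ == "__main__":
    features = amino_acid_composition("ACAC") + k_spaced_pair_composition("ACAC", 0)
    print(len(features))
```

Concatenating such vectors for several values of k yields a fixed-length representation regardless of sequence length, which is what a random forest classifier requires.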

14.
BMC Bioinformatics ; 25(1): 85, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38413857

ABSTRACT

PURPOSE: Despite much progress in alignment algorithms, aligning divergent protein sequences with less than 20-35% pairwise identity (the so-called "twilight zone") remains a difficult problem. Many alignment algorithms have used substitution matrices since their creation in the 1970s to generate alignments; however, these matrices do not work well for scoring alignments within the twilight zone. We developed Protein Embedding based Alignments, or PEbA, to better align sequences with low pairwise identity. Similar to the traditional Smith-Waterman algorithm, PEbA uses a dynamic programming algorithm, but the matching score of amino acids is based on the similarity of their embeddings from a protein language model. METHODS: We tested PEbA on over twelve thousand benchmark pairwise alignments from BAliBASE, each one extracted from one of their multiple sequence alignments. Five different BAliBASE references were used, each with different sequence identities, motifs, and lengths, allowing PEbA to showcase how well it aligns under different circumstances. RESULTS: PEbA greatly outperformed BLOSUM substitution matrix-based pairwise alignments, achieving different levels of improvement in alignment quality for pairs of sequences with different levels of similarity (over four times as well for pairs of sequences with <10% identity). We also compared PEbA with embeddings generated by different protein language models (ProtT5 and ESM-2) and found that ProtT5-XL-U50 produced the most useful embeddings for aligning protein sequences. PEbA also outperformed DEDAL and vcMSA, two recently developed protein language model embedding-based alignment methods. CONCLUSION: Our results suggest that general purpose protein language models provide useful contextual information for generating more accurate protein alignments than typically used methods.
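
Editorial illustration: the idea described in the abstract is Smith-Waterman-style dynamic programming in which the match score comes from embedding similarity rather than a substitution matrix. The sketch below shows that idea with toy vectors. It is not PEbA's implementation. The cosine similarity scoring, the linear gap penalty, and the `gap` and `scale` parameters are assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def local_alignment_score(emb_a, emb_b, gap=1.0, scale=1.0):
    """Best local alignment score between two lists of per-residue embeddings."""
    rows, cols = len(emb_a) + 1, len(emb_b) + 1
    table = [[0.0] * cols for _ in range(rows)]
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            match = table[i - 1][j - 1] + scale * cosine(emb_a[i - 1], emb_b[j - 1])
            # Local alignment: scores never drop below zero.
            table[i][j] = max(0.0, match, table[i - 1][j] - gap, table[i][j - 1] - gap)
            best = max(best, table[i][j])
    return best


if __name__ == "__main__":
    # Toy embeddings standing in for protein language model output.
    seq_a = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    print(local_alignment_score(seq_a, seq_a))
```

Because a language model embedding reflects each residue's sequence context, two different amino acids in equivalent structural roles can still score as a good match. This is the advantage over fixed substitution matrices that the abstract points to.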


Subject(s)
Boronic Acids , Proteins , Proteins/chemistry , Amino Acid Sequence , Sequence Alignment , Algorithms
15.
J Biol Chem ; 299(1): 102801, 2023 01.
Article in English | MEDLINE | ID: mdl-36528065

ABSTRACT

Protein phase separation is thought to be a primary driving force for the formation of membrane-less organelles, which control a wide range of biological functions from stress response to ribosome biogenesis. Among phase-separating (PS) proteins, many have intrinsically disordered regions (IDRs) that are needed for phase separation to occur. Accurate identification of IDRs that drive phase separation is important for testing the underlying mechanisms of phase separation, identifying biological processes that rely on phase separation, and designing sequences that modulate phase separation. To identify IDRs that drive phase separation, we first curated datasets of folded, ID, and PS ID sequences. We then used these sequence sets to examine how broadly existing amino acid property scales can be used to distinguish between the three classes of protein regions. We found that there are robust property differences between the classes and, consequently, that numerous combinations of amino acid property scales can be used to make robust predictions of protein phase separation. This result indicates that multiple, redundant mechanisms contribute to the formation of phase-separated droplets from IDRs. The top-performing scales were used to further optimize our previously developed predictor of PS IDRs, ParSe. We then modified ParSe to account for interactions between amino acids and obtained reasonable predictive power for mutations that have been designed to test the role of amino acid interactions in driving protein phase separation. Collectively, our findings provide further insight into the classification of IDRs and the elements involved in protein phase separation.


Subject(s)
Intrinsically Disordered Proteins , Intrinsically Disordered Proteins/chemistry , Protein Domains , Amino Acids
16.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35914952

ABSTRACT

Low complexity regions are fragments of protein sequences composed of only a few types of amino acids. These regions frequently occur in proteins and can play an important role in their functions. However, scientists are mainly focused on regions characterized by high diversity of amino acid composition. Similarity between regions of protein sequences frequently reflects functional similarity between them. In this article, we discuss strengths and weaknesses of the similarity analysis of low complexity regions using BLAST, HHblits and CD-HIT. These methods are considered to be the gold standard in protein similarity analysis and were designed for comparison of high complexity regions. However, we lack specialized methods that could be used to compare the similarity of low complexity regions. Therefore, we investigated the existing methods in order to understand how they can be applied to compare such regions. Our results are supported by an exploratory study and a discussion of the amino acid composition and biological roles of selected examples. We show that existing methods need improvements to efficiently search for similar low complexity regions. We suggest features that have to be re-designed specifically for comparing low complexity regions: scoring matrix, multiple sequence alignment, e-value, local alignment and clustering based on a set of representative sequences. Results of this analysis can either be used to improve existing methods or to create new methods for the similarity analysis of low complexity regions.


Subject(s)
Amino Acids , Proteins , Algorithms , Amino Acid Sequence , Amino Acids/genetics , Cluster Analysis , Proteins/chemistry , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein/methods
17.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35348602

ABSTRACT

Proteins with desired functions and properties are important in fields like nanotechnology and biomedicine. De novo protein design enables the production of previously unseen proteins from the ground up and is regarded as key to addressing real societal challenges. The recent introduction of deep learning into design methods has had a transformative influence and is expected to represent a promising and exciting future direction. In this review, we survey the major aspects of current advances in deep-learning-based design procedures and illustrate their novelty in comparison with conventional knowledge-based approaches through notable cases. We not only describe deep learning developments in structure-based protein design and direct sequence design, but also highlight recent applications of deep reinforcement learning in protein design. The future perspectives on design goals, challenges and opportunities are also comprehensively discussed.


Subject(s)
Deep Learning , Knowledge Bases , Proteins
18.
Brief Bioinform ; 23(3)2022 05 13.
Article in English | MEDLINE | ID: mdl-35438149

ABSTRACT

Therapeutic peptides act on the skeletal system, digestive system and blood system, have antibacterial properties and help relieve inflammation. In order to reduce the resource consumption of wet experiments for the identification of therapeutic peptides, many computational methods have been developed. Because traditional machine learning methods are insufficient in dealing with feature noise, we propose a novel therapeutic peptide identification method called Structured Sparse Regularized Takagi-Sugeno-Kang Fuzzy System on Within-Class Scatter (SSR-TSK-FS-WCS). Our method achieves good performance on multiple therapeutic peptide and UCI datasets.


Subject(s)
Algorithms , Fuzzy Logic , Machine Learning , Peptides/therapeutic use
19.
Expert Rev Proteomics ; : 1-10, 2024 Aug 21.
Article in English | MEDLINE | ID: mdl-39152734

ABSTRACT

INTRODUCTION: Metaproteomics offers insights into the function of complex microbial communities and can also reveal microbe-microbe and host-microbe interactions. Data-independent acquisition (DIA) mass spectrometry is an emerging technology that holds great potential to achieve deep and accurate metaproteomics with higher reproducibility, yet it still faces a series of challenges due to the inherent complexity of metaproteomics and DIA data. AREAS COVERED: This review offers an overview of the DIA metaproteomics approaches, covering aspects such as database construction, search strategy, and data analysis tools. Several cases of current DIA metaproteomics studies are presented to illustrate the procedures. Important ongoing challenges are also highlighted. Future perspectives of DIA methods for metaproteomics analysis are further discussed. Cited references were searched for and collected from Google Scholar and PubMed. EXPERT OPINION: Considering the inherent complexity of DIA metaproteomics data, data analysis strategies specifically designed for interpretation are imperative. From this point of view, we anticipate that deep learning methods and de novo sequencing methods will become more prevalent in the future, potentially improving protein coverage in metaproteomics. Moreover, the advancement of metaproteomics also depends on the development of sample preparation methods, data analysis strategies, etc. These factors are key to unlocking the full potential of metaproteomics.

20.
Stat Appl Genet Mol Biol ; 22(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-37658681

ABSTRACT

Proteins are the building blocks of all living things. Protein function must be ascertained if the molecular mechanism of life is to be understood. While CNN is good at capturing short-term relationships, GRU and LSTM can capture long-term dependencies. A hybrid approach that combines the complementary benefits of these deep-learning models motivates our work. Protein language models, which use attention networks to gather meaningful data and build representations for proteins, have seen tremendous success in recent years in processing protein sequences. In this paper, we propose a hybrid CNN + BiGRU-Attention based model with protein language model embedding that effectively combines the output of CNN with the output of BiGRU-Attention for predicting protein functions. We evaluated the performance of our proposed hybrid model on human and yeast datasets. The proposed hybrid model improves the Fmax value over the state-of-the-art model SDN2GO by 1.9% for the cellular component prediction task, 3.8% for the molecular function prediction task and 0.6% for the biological process prediction task on the human dataset, and by 2.4%, 5.2% and 1.2% for the same three tasks, respectively, on the yeast dataset.


Subject(s)
Deep Learning , Humans , Saccharomyces cerevisiae/genetics , Amino Acid Sequence , Language , Virion