Results 1 - 20 of 14,220
1.
Cell ; 172(1-2): 358-372.e23, 2018 01 11.
Article in English | MEDLINE | ID: mdl-29307493

ABSTRACT

Metabolite-protein interactions control a variety of cellular processes, thereby playing a major role in maintaining cellular homeostasis. Metabolites comprise the largest fraction of molecules in cells, but our knowledge of the metabolite-protein interactome lags behind our understanding of protein-protein or protein-DNA interactomes. Here, we present a chemoproteomic workflow for the systematic identification of metabolite-protein interactions directly in their native environment. The approach identified a network of known and novel interactions and binding sites in Escherichia coli, and we demonstrated the functional relevance of a number of newly identified interactions. Our data enabled identification of new enzyme-substrate relationships and cases of metabolite-induced remodeling of protein complexes. Our metabolite-protein interactome consists of 1,678 interactions and 7,345 putative binding sites. Our data reveal functional and structural principles of chemical communication, shed light on the prevalence and mechanisms of enzyme promiscuity, and enable extraction of quantitative parameters of metabolite binding on a proteome-wide scale.


Subjects
Metabolome; Proteome/metabolism; Proteomics/methods; Signal Transduction; Software; Allosteric Regulation; Binding Sites; Escherichia coli; Metabolomics/methods; Protein Binding; Protein Interaction Maps; Proteome/chemistry; Saccharomyces cerevisiae; Sequence Analysis, Protein/methods
2.
Cell ; 168(4): 600-612, 2017 02 09.
Article in English | MEDLINE | ID: mdl-28187283

ABSTRACT

Cancer immunogenomics originally was framed by research supporting the hypothesis that cancer mutations generated novel peptides seen as "non-self" by the immune system. The search for these "neoantigens" has been facilitated by the combination of new sequencing technologies, specialized computational analyses, and HLA binding predictions that evaluate somatic alterations in a cancer genome and interpret their ability to produce an immune-stimulatory peptide. The resulting information can characterize a tumor's neoantigen load, its cadre of infiltrating immune cell types, the T or B cell receptor repertoire, and direct the design of a personalized therapeutic.


Subjects
Antigens, Neoplasm/immunology; Neoplasms/genetics; Neoplasms/immunology; Animals; Cancer Vaccines/immunology; Genome, Human; HLA Antigens/immunology; Humans; Immunogenetics; Lymphocytes, Tumor-Infiltrating/immunology; Mutation; Sequence Analysis, Protein
3.
Nat Rev Mol Cell Biol ; 20(11): 681-697, 2019 11.
Article in English | MEDLINE | ID: mdl-31417196

ABSTRACT

The prediction of protein three-dimensional structure from amino acid sequence has been a grand challenge problem in computational biophysics for decades, owing to its intrinsic scientific interest and also to the many potential applications for robust protein structure prediction algorithms, from genome interpretation to protein function prediction. More recently, the inverse problem - designing an amino acid sequence that will fold into a specified three-dimensional structure - has attracted growing attention as a potential route to the rational engineering of proteins with functions useful in biotechnology and medicine. Methods for the prediction and design of protein structures have advanced dramatically in the past decade. Increases in computing power and the rapid growth in protein sequence and structure databases have fuelled the development of new data-intensive and computationally demanding approaches for structure prediction. New algorithms for designing protein folds and protein-protein interfaces have been used to engineer novel high-order assemblies and to design from scratch fluorescent proteins with novel or enhanced properties, as well as signalling proteins with therapeutic potential. In this Review, we describe current approaches for protein structure prediction and design and highlight a selection of the successful applications they have enabled.


Subjects
Algorithms; Databases, Protein; Models, Molecular; Proteins/chemistry; Sequence Analysis, Protein; Animals; Humans; Protein Conformation; Proteins/genetics; Proteins/metabolism
4.
Nature ; 633(8030): 662-669, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39261738

ABSTRACT

The ability to sequence single protein molecules in their native, full-length form would enable a more comprehensive understanding of proteomic diversity. Current technologies, however, are limited in achieving this goal [1,2]. Here, we establish a method for the long-range, single-molecule reading of intact protein strands on a commercial nanopore sensor array. By using the ClpX unfoldase to ratchet proteins through a CsgG nanopore [3,4], we provide single-molecule evidence that ClpX translocates substrates in two-residue steps. This mechanism achieves sensitivity to single amino acids on synthetic protein strands hundreds of amino acids in length, enabling the sequencing of combinations of single-amino-acid substitutions and the mapping of post-translational modifications, such as phosphorylation. To enhance classification accuracy further, we demonstrate the ability to reread individual protein molecules multiple times, and we explore the potential for highly accurate protein barcode sequencing. Furthermore, we develop a biophysical model that can simulate raw nanopore signals a priori on the basis of residue volume and charge, enhancing the interpretation of raw signal data. Finally, we apply these methods to examine full-length, folded protein domains for complete end-to-end analysis. These results provide proof of concept for a platform that has the potential to identify and characterize full-length proteoforms at single-molecule resolution.


Subjects
Nanopores; Proteins; Sequence Analysis, Protein; Single Molecule Imaging; Amino Acid Substitution; Endopeptidase Clp/chemistry; Endopeptidase Clp/metabolism; Phosphorylation; Protein Domains; Protein Processing, Post-Translational; Proteins/chemistry; Proteins/metabolism; Sequence Analysis, Protein/methods; Single Molecule Imaging/methods
5.
Proc Natl Acad Sci U S A ; 121(27): e2311887121, 2024 Jul 02.
Article in English | MEDLINE | ID: mdl-38913900

ABSTRACT

Predicting which proteins interact together from amino acid sequences is an important task. We develop a method to pair interacting protein sequences which leverages the power of protein language models trained on multiple sequence alignments (MSAs), such as MSA Transformer and the EvoFormer module of AlphaFold. We formulate the problem of pairing interacting partners among the paralogs of two protein families in a differentiable way. We introduce a method called Differentiable Pairing using Alignment-based Language Models (DiffPALM) that solves it by exploiting the ability of MSA Transformer to fill in masked amino acids in multiple sequence alignments using the surrounding context. MSA Transformer encodes coevolution between functionally or structurally coupled amino acids within protein chains. It also captures inter-chain coevolution, despite being trained on single-chain data. Relying on MSA Transformer without fine-tuning, DiffPALM outperforms existing coevolution-based pairing methods on difficult benchmarks of shallow multiple sequence alignments extracted from ubiquitous prokaryotic protein datasets. It also outperforms an alternative method based on a state-of-the-art protein language model trained on single sequences. Paired alignments of interacting protein sequences are a crucial ingredient of supervised deep learning methods to predict the three-dimensional structure of protein complexes. Starting from sequences paired by DiffPALM substantially improves the structure prediction of some eukaryotic protein complexes by AlphaFold-Multimer. It also achieves performance competitive with orthology-based pairing.


Subjects
Proteins; Sequence Alignment; Sequence Alignment/methods; Proteins/chemistry; Proteins/metabolism; Amino Acid Sequence; Algorithms; Sequence Analysis, Protein/methods; Computational Biology/methods; Databases, Protein
6.
Brief Bioinform ; 25(2)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38340092

ABSTRACT

De novo peptide sequencing is a promising approach for novel peptide discovery, underscoring the value of performance improvements in state-of-the-art models. The quality of mass spectra often varies because certain ions are unexpectedly missing, presenting a significant challenge in de novo peptide sequencing. Here, we use a novel concept of complementary spectra to enhance ion information of the experimental spectrum and demonstrate it through conceptual and practical analyses. Afterward, we design suitable encoders to encode the experimental spectrum and the corresponding complementary spectrum and propose a de novo sequencing model, π-HelixNovo, based on the Transformer architecture. We first demonstrated that π-HelixNovo outperforms other state-of-the-art models using a series of comparative experiments. Then, we utilized π-HelixNovo to de novo sequence gut metaproteome peptides for the first time. The results show that π-HelixNovo increases the identification coverage and accuracy of the gut metaproteome and enhances its taxonomic resolution. We finally trained a powerful π-HelixNovo utilizing a larger training dataset, and as expected, π-HelixNovo achieves unprecedented performance, even for peptide-spectrum matches with never-before-seen peptide sequences. We also use the powerful π-HelixNovo to identify antibody peptides and multi-enzyme cleavage peptides, and π-HelixNovo is highly robust in these applications. Our results demonstrate the effectiveness of the complementary spectrum and take a significant step forward in de novo peptide sequencing.


Subjects
Sequence Analysis, Protein; Tandem Mass Spectrometry; Tandem Mass Spectrometry/methods; Sequence Analysis, Protein/methods; Peptides; Amino Acid Sequence; Antibodies; Algorithms
7.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38600663

ABSTRACT

Protein sequence design can provide valuable insights into biopharmaceuticals and disease treatments. Currently, most protein sequence design methods based on deep learning focus on network architecture optimization, while ignoring protein-specific physicochemical features. Inspired by the successful application of structure templates and pre-trained models in protein structure prediction, we explored whether the representation of structural sequence profile can be used for protein sequence design. In this work, we propose SPDesign, a method for protein sequence design based on structural sequence profile using ultrafast shape recognition. Given an input backbone structure, SPDesign utilizes ultrafast shape recognition vectors to accelerate the search for similar protein structures in our in-house PAcluster80 structure database and then extracts the sequence profile through structure alignment. Combined with structural pre-trained knowledge and geometric features, the sequence profile is then fed into an enhanced graph neural network for sequence prediction. The results show that SPDesign significantly outperforms the state-of-the-art methods, such as ProteinMPNN, Pifold and LM-Design, leading to 21.89%, 15.54% and 11.4% accuracy gains in sequence recovery rate on the CATH 4.2 benchmark, respectively. Encouraging results have also been achieved on orphan and de novo (designed) benchmarks with few homologous sequences. Furthermore, analysis conducted with the PDBench tool suggests that SPDesign performs well in subdivided structures. More interestingly, we found that SPDesign can accurately reconstruct the sequences of some proteins that have similar structures but different sequences. Finally, the structural modeling verification experiment indicates that the sequences designed by SPDesign can fold into the native structures more accurately.


Subjects
Neural Networks, Computer; Proteins; Sequence Alignment; Amino Acid Sequence; Proteins/chemistry; Sequence Analysis, Protein/methods
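
The abstract of entry 7 reports gains in "sequence recovery rate". As a rough illustration only, that metric is commonly understood as the fraction of positions at which a designed sequence reproduces the native residue. The sketch below assumes two equal-length, already aligned sequences; the function name `sequence_recovery` is hypothetical and is not taken from SPDesign.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Assumes both sequences have the same length and are position-aligned
    (an assumption of this sketch, not a statement about SPDesign's code).
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    # Count identical residues position by position.
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)
```

For example, a designed sequence that differs from a four-residue native sequence at one position has a recovery of 0.75.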
8.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39003530

ABSTRACT

Protein function prediction is critical for understanding the cellular physiological and biochemical processes, and it opens up new possibilities for advancements in fields such as disease research and drug discovery. During the past decades, with the exponential growth of protein sequence data, many computational methods for predicting protein function have been proposed. Therefore, a systematic review and comparison of these methods are necessary. In this study, we divide these methods into four different categories, including sequence-based methods, 3D structure-based methods, PPI network-based methods and hybrid information-based methods. Furthermore, their advantages and disadvantages are discussed, and then their performance is comprehensively evaluated and compared. Finally, we discuss the challenges and opportunities present in this field.


Subjects
Computational Biology; Proteins; Proteins/chemistry; Proteins/metabolism; Computational Biology/methods; Humans; Sequence Analysis, Protein/methods; Algorithms
9.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39038936

ABSTRACT

Sequence database searches followed by homology-based function transfer form one of the oldest and most popular approaches for predicting protein functions, such as Gene Ontology (GO) terms. These searches are also a critical component in most state-of-the-art machine learning and deep learning-based protein function predictors. Although sequence search tools are the basis of homology-based protein function prediction, previous studies have scarcely explored how to select the optimal sequence search tools and configure their parameters to achieve the best function prediction. In this paper, we evaluate the effect of using different options from among popular search tools, as well as the impacts of search parameters, on protein function prediction. When predicting GO terms on a large benchmark dataset, we found that BLASTp and MMseqs2 consistently exceed the performance of other tools, including DIAMOND (one of the most popular tools for function prediction), under default search parameters. However, with the correct parameter settings, DIAMOND can perform comparably to BLASTp and MMseqs2 in function prediction. Additionally, we developed a new scoring function to derive GO prediction from homologous hits that consistently outperforms previously proposed scoring functions. These findings enable the improvement of almost all protein function prediction algorithms with a few easily implementable changes in their sequence homolog-based component. This study emphasizes the critical role of search parameter settings in homology-based function transfer and should make an important contribution to the development of future protein function prediction algorithms.


Subjects
Databases, Protein; Proteins; Proteins/chemistry; Proteins/metabolism; Proteins/genetics; Computational Biology/methods; Gene Ontology; Algorithms; Sequence Analysis, Protein/methods; Software; Machine Learning
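
Entry 9 discusses deriving GO predictions from homologous hits. The abstract does not state the new scoring function, so the sketch below shows only a simple, generic baseline under stated assumptions: each GO term is scored by the share of the total bit score contributed by hits that carry the term. The function name `score_go_terms`, the input layout and the example identifiers are hypothetical.

```python
from collections import defaultdict


def score_go_terms(hits, annotations):
    """Score GO terms from search hits with a bit-score-weighted vote.

    hits: list of (hit_id, bitscore) pairs from a sequence search.
    annotations: dict mapping hit_id to a set of GO term identifiers.
    Illustrative baseline only; not the scoring function of the paper.
    """
    total = sum(bits for _, bits in hits)
    if total <= 0:
        return {}
    scores = defaultdict(float)
    for hit_id, bits in hits:
        # Hits without annotations add to the total but support no term.
        for term in annotations.get(hit_id, ()):
            scores[term] += bits
    return {term: s / total for term, s in scores.items()}


example_hits = [("hitA", 100.0), ("hitB", 50.0), ("hitC", 50.0)]
example_annotations = {"hitA": {"GO:1", "GO:2"}, "hitB": {"GO:1"}}
example_scores = score_go_terms(example_hits, example_annotations)
```

In this toy input the identifiers are placeholders rather than real GO accessions; a term supported by every high-scoring hit approaches a score of 1, while unannotated hits lower all scores.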
10.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38695119

ABSTRACT

Sequence similarity is of paramount importance in biology, as similar sequences tend to have similar function and share common ancestry. Scoring matrices, such as PAM or BLOSUM, play a crucial role in all bioinformatics algorithms for identifying similarities, but have the drawback that they are fixed, independent of context. We propose a new scoring method for amino acid similarity that remedies this weakness, being contextually dependent. It relies on recent advances in deep learning architectures that employ self-supervised learning in order to leverage the power of enormous amounts of unlabelled data to generate contextual embeddings, which are vector representations for words. These ideas have been applied to protein sequences, producing embedding vectors for protein residues. We propose the E-score between two residues as the cosine similarity between their embedding vector representations. Thorough testing on a wide variety of reference multiple sequence alignments indicates that the alignments produced using the new E-score method, especially ProtT5-score, are significantly better than those obtained using BLOSUM matrices. The new method proposes to change the way alignments are computed, with far-reaching implications in all areas of textual data that use sequence similarity. The program to compute alignments based on various E-scores is available as a web server at e-score.csd.uwo.ca. The source code is freely available for download from github.com/lucian-ilie/E-score.


Subjects
Algorithms; Computational Biology; Sequence Alignment; Sequence Alignment/methods; Computational Biology/methods; Software; Sequence Analysis, Protein/methods; Amino Acid Sequence; Proteins/chemistry; Proteins/genetics; Deep Learning; Databases, Protein
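
Entry 10 defines the E-score between two residues as the cosine similarity between their embedding vectors. A minimal sketch of that definition follows; it assumes the per-residue embeddings have already been produced by some protein language model and are passed in as plain lists of floats. The function name `e_score` is chosen here for illustration and does not come from the authors' code.

```python
import math


def e_score(u, v):
    """Cosine similarity between two residue embedding vectors.

    u, v: equal-length sequences of floats (hypothetical embeddings).
    Returns a value in [-1, 1].
    """
    if len(u) != len(v):
        raise ValueError("embedding vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Because the embeddings are contextual, the same pair of amino acid types can receive different scores at different positions, which is the contrast with a fixed matrix such as BLOSUM that the abstract draws.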
11.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-38851299

ABSTRACT

Protein-protein interactions (PPIs) are the basis of many important biological processes, with protein complexes being the key forms implementing these interactions. Understanding protein complexes and their functions is critical for elucidating mechanisms of life processes, disease diagnosis and treatment and drug development. However, experimental methods for identifying protein complexes have many limitations. Therefore, it is necessary to use computational methods to predict protein complexes. Protein sequences can indicate the structure and biological functions of proteins, while also determining their binding abilities with other proteins, influencing the formation of protein complexes. Integrating these characteristics to predict protein complexes is very promising, but currently there is no effective framework that can utilize both protein sequence and PPI network topology for complex prediction. To address this challenge, we have developed HyperGraphComplex, a method based on hypergraph variational autoencoder that can capture expressive features from protein sequences without feature engineering, while also considering topological properties in PPI networks, to predict protein complexes. Experimental results demonstrated that HyperGraphComplex achieves satisfactory predictive performance when compared with state-of-the-art methods. Further bioinformatics analysis shows that the predicted protein complexes have similar attributes to known ones. Moreover, case studies corroborated the remarkable predictive capability of our model in identifying protein complexes, including 3 that were not only experimentally validated by recent studies but also exhibited high-confidence structural predictions from AlphaFold-Multimer. We believe that the HyperGraphComplex algorithm and our provided proteome-wide high-confidence protein complex prediction dataset will help elucidate how proteins regulate cellular processes in the form of complexes, and facilitate disease diagnosis and treatment and drug development. Source codes are available at https://github.com/LiDlab/HyperGraphComplex.


Subjects
Computational Biology; Protein Interaction Mapping; Computational Biology/methods; Protein Interaction Mapping/methods; Proteins/metabolism; Proteins/chemistry; Algorithms; Protein Interaction Maps; Databases, Protein; Humans; Sequence Analysis, Protein/methods; Amino Acid Sequence
12.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701416

ABSTRACT

Predicting protein function is crucial for understanding biological life processes, preventing diseases and developing new drug targets. In recent years, methods based on sequence, structure and biological networks for protein function annotation have been extensively researched. Although obtaining a protein in three-dimensional structure through experimental or computational methods enhances the accuracy of function prediction, the sheer volume of proteins sequenced by high-throughput technologies presents a significant challenge. To address this issue, we introduce a deep neural network model DeepSS2GO (Secondary Structure to Gene Ontology). It is a predictor incorporating secondary structure features along with primary sequence and homology information. The algorithm expertly combines the speed of sequence-based information with the accuracy of structure-based features while streamlining the redundant data in primary sequences and bypassing the time-consuming challenges of tertiary structure analysis. The results show that the prediction performance surpasses state-of-the-art algorithms. It has the ability to predict key functions by effectively utilizing secondary structure information, rather than broadly predicting general Gene Ontology terms. Additionally, DeepSS2GO predicts five times faster than advanced algorithms, making it highly applicable to massive sequencing data. The source code and trained models are available at https://github.com/orca233/DeepSS2GO.


Subjects
Algorithms; Computational Biology; Neural Networks, Computer; Protein Structure, Secondary; Proteins; Proteins/chemistry; Proteins/metabolism; Proteins/genetics; Computational Biology/methods; Databases, Protein; Gene Ontology; Sequence Analysis, Protein/methods; Software
13.
Nature ; 577(7790): 399-404, 2020 01.
Article in English | MEDLINE | ID: mdl-31915375

ABSTRACT

Alzheimer's disease is an incurable neurodegenerative disorder in which neuroinflammation has a critical function [1]. However, little is known about the contribution of the adaptive immune response in Alzheimer's disease [2]. Here, using integrated analyses of multiple cohorts, we identify peripheral and central adaptive immune changes in Alzheimer's disease. First, we performed mass cytometry of peripheral blood mononuclear cells and discovered an immune signature of Alzheimer's disease that consists of increased numbers of CD8+ T effector memory CD45RA+ (TEMRA) cells. In a second cohort, we found that CD8+ TEMRA cells were negatively associated with cognition. Furthermore, single-cell RNA sequencing revealed that T cell receptor (TCR) signalling was enhanced in these cells. Notably, by using several strategies of single-cell TCR sequencing in a third cohort, we discovered clonally expanded CD8+ TEMRA cells in the cerebrospinal fluid of patients with Alzheimer's disease. Finally, we used machine learning, cloning and peptide screens to demonstrate the specificity of clonally expanded TCRs in the cerebrospinal fluid of patients with Alzheimer's disease to two separate Epstein-Barr virus antigens. These results reveal an adaptive immune response in the blood and cerebrospinal fluid in Alzheimer's disease and provide evidence of clonal, antigen-experienced T cells patrolling the intrathecal space of brains affected by age-related neurodegeneration.


Subjects
Alzheimer Disease/immunology; CD8-Positive T-Lymphocytes/immunology; Cerebrospinal Fluid/immunology; Aged; Amino Acid Sequence; Cohort Studies; Humans; Immunologic Memory; Middle Aged; Receptors, Antigen, T-Cell/chemistry; Receptors, Antigen, T-Cell/immunology; Sequence Analysis, Protein
14.
Nucleic Acids Res ; 52(W1): W248-W255, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38738636

ABSTRACT

Knowledge of protein function is essential for elucidating disease mechanisms and discovering new drug targets. However, there is a widening gap between the exponential growth of protein sequences and their limited function annotations. In our prior studies, we have developed a series of methods including GraphPPIS, GraphSite, LMetalSite and SPROF-GO for protein function annotations at residue or protein level. To further enhance their applicability and performance, we now present GPSFun, a versatile web server for Geometry-aware Protein Sequence Function annotations, which equips our previous tools with language models and geometric deep learning. Specifically, GPSFun employs large language models to efficiently predict 3D conformations of the input protein sequences and extract informative sequence embeddings. Subsequently, geometric graph neural networks are utilized to capture the sequence and structure patterns in the protein graphs, facilitating various downstream predictions including protein-ligand binding sites, gene ontologies, subcellular locations and protein solubility. Notably, GPSFun achieves superior performance to state-of-the-art methods across diverse tasks without requiring multiple sequence alignments or experimental protein structures. GPSFun is freely available to all users at https://bio-web1.nscc-gz.cn/app/GPSFun with user-friendly interfaces and rich visualizations.


Subjects
Proteins; Software; Proteins/chemistry; Proteins/metabolism; Protein Conformation; Sequence Analysis, Protein; Deep Learning; Binding Sites; Molecular Sequence Annotation; Neural Networks, Computer; Amino Acid Sequence; Humans; Internet
15.
Nucleic Acids Res ; 52(W1): W215-W220, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38587188

ABSTRACT

DeepLoc 2.0 is a popular web server for the prediction of protein subcellular localization and sorting signals. Here, we introduce DeepLoc 2.1, which additionally classifies the input proteins into the membrane protein types Transmembrane, Peripheral, Lipid-anchored and Soluble. Leveraging pre-trained transformer-based protein language models, the server utilizes a three-stage architecture for sequence-based, multi-label predictions. Comparative evaluations with other established tools on a test set of 4933 eukaryotic protein sequences, constructed following stringent homology partitioning, demonstrate state-of-the-art performance. Notably, DeepLoc 2.1 outperforms existing models, with the larger ProtT5 model exhibiting a marginal advantage over the ESM-1B model. The web server is available at https://services.healthtech.dtu.dk/services/DeepLoc-2.1.


Subjects
Membrane Proteins; Software; Membrane Proteins/chemistry; Membrane Proteins/metabolism; Internet; Protein Sorting Signals; Sequence Analysis, Protein
16.
Nucleic Acids Res ; 52(W1): W287-W293, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38747351

ABSTRACT

The PSIPRED Workbench is a long-established and popular bioinformatics web service offering a wide range of machine-learning-based analyses for characterizing protein structure and function. In this paper we provide an update of the recent additions and developments to the webserver, with a focus on new deep-learning-based methods. We briefly discuss some trends in server usage since the publication of AlphaFold2 and we give an overview of some upcoming developments for the service. The PSIPRED Workbench is available at http://bioinf.cs.ucl.ac.uk/psipred.


Subjects
Deep Learning; Proteins; Software; Proteins/chemistry; Proteins/genetics; Internet; Protein Conformation; Computational Biology/methods; Sequence Analysis, Protein/methods
17.
Nucleic Acids Res ; 52(10): 5624-5642, 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38554111

ABSTRACT

Gametocyte development of the Plasmodium parasite is a key step for transmission of the parasite. Male and female gametocytes are produced from a subpopulation of asexual blood-stage parasites, but the mechanisms that regulate the differentiation of sexual stages are still under investigation. In this study, we investigated the role of PbARID, a putative subunit of a SWI/SNF chromatin remodeling complex, in transcriptional regulation during the gametocyte development of P. berghei. PbARID expression starts in early gametocytes before the manifestation of male and female-specific features, and disruption of its gene results in the complete loss of gametocytes with detectable male features and the production of abnormal female gametocytes. ChIP-seq analysis of PbARID showed that it forms a complex with gSNF2, an ATPase subunit of the SWI/SNF chromatin remodeling complex, associating with the male cis-regulatory element, TGTCT. Further ChIP-seq of PbARID in gsnf2-knockout parasites revealed an association of PbARID with another cis-regulatory element, TGCACA. RIME and DNA-binding assays suggested that HDP1 is the transcription factor that recruits PbARID to the TGCACA motif. Our results indicated that PbARID could function in two chromatin remodeling events and play essential roles in both male and female gametocyte development.


Subjects
Chromatin Assembly and Disassembly; Plasmodium berghei; Protozoan Proteins; Transcription Factors; Animals; Female; Male; Mice; Chromatin Assembly and Disassembly/genetics; Plasmodium berghei/genetics; Plasmodium berghei/growth & development; Protozoan Proteins/genetics; Protozoan Proteins/metabolism; Transcription Factors/genetics; Transcription Factors/metabolism; Genotype; Sequence Analysis, RNA; Chromatin/genetics; Chromatin/metabolism; Amino Acid Sequence; Sequence Analysis, Protein; Phylogeny; Transcriptome; Genome, Protozoan
18.
Brief Bioinform ; 24(4)2023 07 20.
Article in English | MEDLINE | ID: mdl-37258453

ABSTRACT

Protein is the most important component in organisms and plays an indispensable role in life activities. In recent years, a large number of intelligent methods have been proposed to predict protein function. These methods obtain different types of protein information, including sequence, structure and interaction network. Among them, protein sequences have gained significant attention where methods are investigated to extract the information from different views of features. However, how to fully exploit the views for effective protein sequence analysis remains a challenge. In this regard, we propose a multi-view, multi-scale and multi-attention deep neural model (MMSMA) for protein function prediction. First, MMSMA extracts multi-view features from protein sequences, including one-hot encoding features, evolutionary information features, deep semantic features and overlapping property features based on physiochemistry. Second, a specific multi-scale multi-attention deep network model (MSMA) is built for each view to realize the deep feature learning and preliminary classification. In MSMA, both multi-scale local patterns and long-range dependence from protein sequences can be captured. Third, a multi-view adaptive decision mechanism is developed to make a comprehensive decision based on the classification results of all the views. To further improve the prediction performance, an extended version of MMSMA, MMSMAPlus, is proposed to integrate homology-based protein prediction under the framework of multi-view deep neural model. Experimental results show that the MMSMAPlus has promising performance and is significantly superior to the state-of-the-art methods. The source code can be found at https://github.com/wzy-2020/MMSMAPlus.


Subjects
Neural Networks, Computer; Proteins; Amino Acid Sequence; Software; Sequence Analysis, Protein
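
Among the views listed in entry 18 are one-hot encoding features. A minimal sketch of one-hot encoding a protein sequence over the 20 standard amino acids is given below. The alphabet order and the choice to map unknown symbols such as X to an all-zero row are assumptions of this sketch, not details taken from MMSMA.

```python
# Alphabet order is an arbitrary choice made for this illustration.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Return an L x 20 list of 0/1 rows, one row per residue.

    Symbols outside the 20 standard amino acids get an all-zero row
    (an assumption of this sketch).
    """
    rows = []
    for aa in sequence.upper():
        row = [0] * len(AMINO_ACIDS)
        if aa in INDEX:
            row[INDEX[aa]] = 1
        rows.append(row)
    return rows
```

Each residue becomes a 20-dimensional indicator vector, so a sequence of length L yields an L x 20 matrix that a neural network can take as input.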
19.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37833837

ABSTRACT

Protein remote homology detection is essential for structure prediction, function prediction, disease mechanism understanding, etc. The remote homology relationship depends on multiple protein properties, such as structural information and local sequence patterns. Previous studies have shown the challenges for predicting remote homology relationship by protein features at sequence level (e.g. position-specific score matrix). Protein motifs have been used in structure and function analysis due to their unique sequence patterns and implied structural information. Therefore, designing a usable architecture to fuse multiple protein properties based on motifs is urgently needed to improve protein remote homology detection performance. To make full use of the characteristics of motifs, we employed the language model called the protein cubic language model (PCLM). It combines multiple properties by constructing a motif-based neural network. Based on the PCLM, we proposed a predictor called PreHom-PCLM by extracting and fusing multiple motif features for protein remote homology detection. PreHom-PCLM outperforms the other state-of-the-art methods on the test set and independent test set. Experimental results further prove the effectiveness of multiple features fused by PreHom-PCLM for remote homology detection. Furthermore, the protein features derived from the PreHom-PCLM show strong discriminative power for proteins from different structural classes in the high-dimensional space. Availability and Implementation: http://bliulab.net/PreHom-PCLM.


Subjects
Algorithms; Proteins; Proteins/chemistry; Neural Networks, Computer; Amino Acid Motifs; Language; Sequence Analysis, Protein/methods
20.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36545804

ABSTRACT

Monoclonal antibodies are biotechnologically produced proteins with various applications in research, therapeutics and diagnostics. Their ability to recognize and bind to specific molecule structures makes them essential research tools and therapeutic agents. Sequence information of antibodies is helpful for understanding antibody-antigen interactions and ensuring their affinity and specificity. De novo protein sequencing based on mass spectrometry is a valuable method to obtain the amino acid sequence of peptides and proteins without a priori knowledge. In this study, we evaluated six recently developed de novo peptide sequencing algorithms (Novor, pNovo 3, DeepNovo, SMSNet, PointNovo and Casanovo), which were not specifically designed for antibody data. We validated their ability to identify and assemble antibody sequences on three multi-enzymatic data sets. The deep learning-based tools Casanovo and PointNovo showed an increased peptide recall across different enzymes and data sets compared with spectrum-graph-based approaches. We evaluated different error types of de novo peptide sequencing tools and their performance for different numbers of missing cleavage sites, noisy spectra and peptides of various lengths. We achieved a sequence coverage of 97.69-99.53% on the light chains of three different antibody data sets using the de Bruijn assembler ALPS and the predictions from Casanovo. However, low sequence coverage and accuracy on the heavy chains demonstrate that complete de novo protein sequencing remains a challenging issue in proteomics that requires improved de novo error correction, alternative digestion strategies and hybrid approaches such as homology search to achieve high accuracy on long protein sequences.


Subjects
Antibodies, Monoclonal; Peptides; Amino Acid Sequence; Antibodies, Monoclonal/genetics; Peptides/genetics; Peptides/chemistry; Algorithms; Sequence Analysis, Protein/methods
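
Entry 20 reports sequence coverage of 97.69-99.53% on antibody light chains. One plausible way to compute such a figure is the fraction of residues in a reference chain that are covered by at least one predicted peptide. The sketch below assumes exact substring matching against a known reference sequence, which ignores complications such as the isobaric I/L pair; the function name `sequence_coverage` and the toy sequences are hypothetical and do not describe the evaluation code used in the study.

```python
def sequence_coverage(protein, peptides):
    """Fraction of protein residues covered by at least one peptide.

    Peptides are located by exact substring matching (an assumption of
    this sketch); all occurrences of each peptide are marked as covered.
    """
    if not protein:
        raise ValueError("protein sequence must be non-empty")
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            # Continue searching for further (possibly overlapping) matches.
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)
```

With a ten-residue toy reference and two overlapping peptides that together span its first five residues, the coverage is 0.5; peptides that do not occur in the reference contribute nothing.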