Results 1 - 20 of 520
1.
J Biol Chem ; : 107850, 2024 Oct 01.
Article in English | MEDLINE | ID: mdl-39362471

ABSTRACT

Numerous small proteins have been discovered across all domains of life, among which many are hydrophobic and predicted to localize to the cell membrane. Based on a few that are well-studied, small membrane proteins are regulators involved in various biological processes, such as cell signaling, nutrient transport, drug resistance, and stress response. However, the function of most identified small membrane proteins remains elusive. Their small size and hydrophobicity make protein production challenging, hindering function discovery. Here, we combined a cell-free system with lipid sponge droplets and synthesized small membrane proteins in vitro. Lipid sponge droplets contain a dense network of lipid bilayers, which accommodates and extracts newly synthesized small membrane proteins from the aqueous surroundings. Using small bacterial membrane proteins MgrB, SafA, and AcrZ as proof of principle, we showed that the in vitro produced membrane proteins were functionally active, for example, modulating the activity of their target kinase as expected. The cell-free system produced small membrane proteins, including one from human, up to micromolar concentrations, indicating its high level of versatility and productivity. Furthermore, AcrZ produced in this system was used successfully for in vitro co-immunoprecipitations to identify interaction partners. This work presents a robust alternative approach for producing small membrane proteins, which opens a door to their function discovery in different domains of life.

3.
ACS Synth Biol ; 2024 Sep 23.
Article in English | MEDLINE | ID: mdl-39313930

ABSTRACT

Regulation of gene expression is essential for all life. Tools to manipulate the gene expression level have therefore proven to be very valuable in efforts to engineer biological systems. However, there are few well-characterized genetic parts that reduce gene expression in plants, commonly known as transcriptional repressors. We characterized the repression activity of a library consisting of repression motifs from approximately 25% of the members of the largest known family of repressors. Combining sequence information with our trans-regulatory function data, we next generated a library of synthetic transcriptional repression motifs with function predicted in advance. After characterizing our synthetic library, we demonstrated not only that many of our synthetic constructs were functional as repressors but also that our advance predictions of repression strength were better than random guesses. Finally, we assessed the functionality of known transcriptional repression motifs from a wide range of eukaryotes. Our study represents the largest plant repressor motif library experimentally characterized to date, providing unique opportunities for tuning transcription in plants.

4.
J Comput Biol ; 2024 Sep 09.
Article in English | MEDLINE | ID: mdl-39246251

ABSTRACT

The identification of intrinsically disordered proteins and their functional roles is largely dependent on the performance of computational predictors, necessitating a high standard of accuracy in these tools. In this context, we introduce a novel series of computational predictors, termed PDFll (Predictors of Disorder and Function of proteins from the Language of Life), which are designed to offer precise predictions of protein disorder and associated functional roles based on protein sequences. PDFll is developed through a two-step process. Initially, it leverages large-scale protein language models (pLMs), trained on an extensive dataset comprising billions of protein sequences. Subsequently, the embeddings derived from pLMs are integrated into streamlined, yet sophisticated, deep-learning models to generate predictions. These predictions notably surpass the performance of existing state-of-the-art predictors, particularly those that forecast disorder and function without utilizing evolutionary information.

5.
Genome Biol Evol ; 16(8)2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39212966

ABSTRACT

During de novo emergence, new protein coding genes emerge from previously nongenic sequences. The de novo proteins they encode are dissimilar in composition and predicted biochemical properties to conserved proteins. However, functional de novo proteins indeed exist. Both identification of functional de novo proteins and their structural characterization are experimentally laborious. To identify functional and structured de novo proteins in silico, we applied recently developed machine learning based tools and found that most de novo proteins are indeed different from conserved proteins both in their structure and sequence. However, some de novo proteins are predicted to adopt known protein folds, participate in cellular reactions, and to form biomolecular condensates. Apart from broadening our understanding of de novo protein evolution, our study also provides a large set of testable hypotheses for focused experimental studies on structure and function of de novo proteins in Drosophila.


Subjects
Drosophila Proteins, Animals, Drosophila Proteins/genetics, Drosophila Proteins/chemistry, Drosophila Proteins/metabolism, Molecular Evolution, Machine Learning, Drosophila/genetics, Drosophila melanogaster/genetics, Protein Folding, Biomolecular Condensates/metabolism, Biomolecular Condensates/chemistry
6.
Front Immunol ; 15: 1452609, 2024.
Article in English | MEDLINE | ID: mdl-39091499

ABSTRACT

Galectins (Gals) are S-type lectins that are widespread and evolutionarily conserved among metazoans and can act as pattern recognition receptors (PRRs) that recognize pathogen-associated molecular patterns (PAMPs). In this study, 10 Gals (ToGals) were identified in the golden pompano (Trachinotus ovatus), and their conserved domains, motifs, and collinearity relationships were analyzed. The expression of ToGals was regulated following infection with Cryptocaryon irritans and Streptococcus agalactiae, indicating that ToGals participate in immune responses against microbial pathogens. Further analysis was conducted on one important member, Galectin-3: subcellular localization showed that the ToGal-3like protein is expressed in both the nucleus and the cytoplasm. Recombinant protein obtained through prokaryotic expression showed that rToGal-3like can agglutinate red blood cells of rabbit, carp, and golden pompano and can also agglutinate and kill Staphylococcus aureus, Bacillus subtilis, Vibrio vulnificus, S. agalactiae, Pseudomonas aeruginosa, and Aeromonas hydrophila. This study lays the foundation for further research on the immune roles of Gals in teleosts.


Subjects
Galectins, Phylogeny, Animals, Galectins/genetics, Galectins/immunology, Galectins/metabolism, Fish Proteins/genetics, Fish Proteins/immunology, Fish Proteins/metabolism, Multigene Family, Streptococcus agalactiae/immunology, Fish Diseases/immunology, Fish Diseases/microbiology, Fishes/immunology, Fishes/genetics, Perciformes/immunology, Perciformes/genetics, Gene Expression Profiling
7.
Proc Natl Acad Sci U S A ; 121(34): e2314999121, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39133844

ABSTRACT

Mutations in protein active sites can dramatically improve function. The active site, however, is densely packed and extremely sensitive to mutations. Therefore, some mutations may only be tolerated in combination with others in a phenomenon known as epistasis. Epistasis reduces the likelihood of obtaining improved functional variants and dramatically slows natural and lab evolutionary processes. Research has shed light on the molecular origins of epistasis and its role in shaping evolutionary trajectories and outcomes. In addition, sequence- and AI-based strategies that infer epistatic relationships from mutational patterns in natural or experimental evolution data have been used to design functional protein variants. In recent years, combinations of such approaches and atomistic design calculations have successfully predicted highly functional combinatorial mutations in active sites. These were used to design thousands of functional active-site variants, demonstrating that, while our understanding of epistasis remains incomplete, some of the determinants that are critical for accurate design are now sufficiently understood. We conclude that the space of active-site variants that has been explored by evolution may be expanded dramatically to enhance natural activities or discover new ones. Furthermore, design opens the way to systematically exploring sequence and structure space and mutational impacts on function, deepening our understanding and control over protein activity.


Subjects
Genetic Epistasis, Mutation, Molecular Evolution, Proteins/genetics, Proteins/chemistry, Proteins/metabolism, Catalytic Domain, Protein Engineering/methods
8.
Sheng Wu Gong Cheng Xue Bao ; 40(7): 2087-2099, 2024 Jul 25.
Article in Chinese | MEDLINE | ID: mdl-39044577

ABSTRACT

With increasing computing power and the rapid expansion of biological data, the application of bioinformatics tools has become the mainstream approach to addressing biological problems. The accurate identification of protein function by bioinformatics tools is crucial for both biomedical research and drug discovery, making it a hot topic of research. In this paper, we categorize bioinformatics-based protein function prediction methods into three categories: protein sequence-based methods, protein structure-based methods, and protein interaction network-based methods. We further analyze the specific algorithms in each category, highlighting the latest research advancements and providing valuable references for the application of bioinformatics-based protein function prediction in biomedical research and drug discovery.


Subjects
Algorithms, Computational Biology, Proteins, Computational Biology/methods, Proteins/genetics, Proteins/metabolism, Proteins/chemistry, Protein Conformation, Protein Interaction Maps, Protein Sequence Analysis, Amino Acid Sequence, Drug Discovery
9.
Heliyon ; 10(12): e32951, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38988537

ABSTRACT

The use of anti-inflammatory peptides (AIPs) as an alternative therapeutic approach for inflammatory diseases holds great research significance. Due to the high cost and difficulty in identifying AIPs with experimental methods, the discovery and design of peptides by computational methods before the experimental stage have become promising technology. In this study, we present BertAIP, a bidirectional encoder representation from transformers (BERT)-based method for predicting AIPs directly from their amino acid sequence without using any other information. BertAIP implements a BERT model to extract features of a protein, and uses a fully connected feed-forward network for AIP classification. It was constructed and evaluated using the AIP datasets that were reconstructed from the latest Immune Epitope Database. The experimental results showed that BertAIP achieved an accuracy of 0.751 and a Matthews correlation coefficient of 0.451, which were higher than other commonly used methods. The results of the independent test suggested that BertAIP outperformed the existing AIP predictors. In addition, to enhance the interpretability of BertAIP, we explored and visualized the amino acids that the model considered important for AIP prediction. We believe that the BertAIP proposed herein will be a useful tool for large-scale screening and identifying novel AIPs for drug development and therapeutic research related to inflammatory diseases.
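
For reference, the two metrics quoted in the abstract above (accuracy and the Matthews correlation coefficient) can both be computed from a binary confusion matrix. The following is a minimal Python sketch, not BertAIP's evaluation code; the function names and the toy labels are hypothetical and chosen only for illustration.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 marks the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical toy labels (1 = anti-inflammatory peptide, 0 = other).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print("accuracy:", accuracy(*counts))
    print("MCC:", matthews_corrcoef(*counts))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the classes are imbalanced.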

10.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39038936

ABSTRACT

Sequence database searches followed by homology-based function transfer form one of the oldest and most popular approaches for predicting protein functions, such as Gene Ontology (GO) terms. These searches are also a critical component in most state-of-the-art machine learning and deep learning-based protein function predictors. Although sequence search tools are the basis of homology-based protein function prediction, previous studies have scarcely explored how to select the optimal sequence search tools and configure their parameters to achieve the best function prediction. In this paper, we evaluate the effect of using different options from among popular search tools, as well as the impacts of search parameters, on protein function prediction. When predicting GO terms on a large benchmark dataset, we found that BLASTp and MMseqs2 consistently exceed the performance of other tools, including DIAMOND, one of the most popular tools for function prediction, under default search parameters. However, with the correct parameter settings, DIAMOND can perform comparably to BLASTp and MMseqs2 in function prediction. Additionally, we developed a new scoring function to derive GO predictions from homologous hits that consistently outperforms previously proposed scoring functions. These findings enable the improvement of almost all protein function prediction algorithms with a few easily implementable changes in their sequence homolog-based component. This study emphasizes the critical role of search parameter settings in homology-based function transfer and should make an important contribution to the development of future protein function prediction algorithms.
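
To make the idea of homology-based function transfer concrete, here is a minimal Python sketch of one simple scoring scheme: each GO term is scored by the share of the total bit score contributed by hits that carry it. This is an illustrative baseline only and is not the new scoring function proposed in the paper; the function name, the hit identifiers (`P1` to `P3`), and the bit scores are hypothetical.

```python
from collections import defaultdict


def transfer_go_terms(hits, annotations):
    """Score GO terms for a query protein from its sequence-search hits.

    hits: list of (subject_id, bit_score) pairs from a search tool.
    annotations: dict mapping subject_id to a set of GO term IDs.
    Returns a dict mapping each GO term to a confidence score in [0, 1]:
    the summed bit score of hits annotated with the term, divided by the
    summed bit score of all hits.
    """
    total = sum(score for _, score in hits)
    if total <= 0:
        return {}
    term_scores = defaultdict(float)
    for subject_id, score in hits:
        # Hits without annotations still count toward the total,
        # which lowers the confidence of every transferred term.
        for term in annotations.get(subject_id, ()):
            term_scores[term] += score
    return {term: s / total for term, s in term_scores.items()}


if __name__ == "__main__":
    # Hypothetical hits and annotations, for illustration only.
    hits = [("P1", 200.0), ("P2", 100.0), ("P3", 100.0)]
    annotations = {
        "P1": {"GO:0005524", "GO:0016301"},
        "P2": {"GO:0005524"},
        "P3": {"GO:0003677"},
    }
    for term, score in sorted(transfer_go_terms(hits, annotations).items()):
        print(term, score)
```

Whatever scoring scheme is used, its input is the hit list, so the choice of search tool and search parameters studied in the abstract directly changes which terms are transferred and with what confidence.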


Subjects
Protein Databases, Proteins, Proteins/chemistry, Proteins/metabolism, Proteins/genetics, Computational Biology/methods, Gene Ontology, Algorithms, Protein Sequence Analysis/methods, Software, Machine Learning
11.
Proteomics ; : e2300471, 2024 Jul 12.
Article in English | MEDLINE | ID: mdl-38996351

ABSTRACT

Predicting protein function from protein sequence, structure, interaction, and other relevant information is important for generating hypotheses for biological experiments and studying biological systems, and has therefore been a major challenge in protein bioinformatics. Numerous computational methods have been developed over the last two decades to gradually advance protein function prediction. In recent years in particular, leveraging the revolutionary advances in artificial intelligence (AI), a growing number of deep learning methods have been developed to improve protein function prediction at a faster pace. Here, we provide an in-depth review of the recent developments of deep learning methods for protein function prediction. We summarize the significant advances in the field, identify several remaining major challenges to be tackled, and suggest some potential directions to explore. The data sources and evaluation metrics widely used in protein function prediction are also discussed to help the machine learning, AI, and bioinformatics communities develop more cutting-edge methods to advance protein function prediction.

12.
Brief Bioinform ; 25(4)2024 May 23.
Article in English | MEDLINE | ID: mdl-39003530

ABSTRACT

Protein function prediction is critical for understanding the cellular physiological and biochemical processes, and it opens up new possibilities for advancements in fields such as disease research and drug discovery. During the past decades, with the exponential growth of protein sequence data, many computational methods for predicting protein function have been proposed. Therefore, a systematic review and comparison of these methods are necessary. In this study, we divide these methods into four different categories, including sequence-based methods, 3D structure-based methods, PPI network-based methods and hybrid information-based methods. Furthermore, their advantages and disadvantages are discussed, and then their performance is comprehensively evaluated and compared. Finally, we discuss the challenges and opportunities present in this field.


Subjects
Computational Biology, Proteins, Proteins/chemistry, Proteins/metabolism, Computational Biology/methods, Humans, Protein Sequence Analysis/methods, Algorithms
13.
Biochem Soc Trans ; 52(3): 1539-1548, 2024 Jun 26.
Article in English | MEDLINE | ID: mdl-38864432

ABSTRACT

Mitochondria are essential organelles of eukaryotic cells, and thus the mitochondrial proteome is under constant quality control and remodelling. Yme1 is a multi-functional protein and a subunit of the homo-hexameric i-AAA proteinase complex. Yme1 plays vital roles in the regulation of mitochondrial protein homeostasis and mitochondrial plasticity, ranging from substrate degradation to the regulation of protein functions involved in mitochondrial protein biosynthesis, energy production, mitochondrial dynamics, and lipid biosynthesis and signalling. In this mini review, we discuss the current understanding of the roles of Yme1 in mitochondrial protein import via the TIM22 and TIM23 pathways, oxidative phosphorylation complex function, and mitochondrial lipid biosynthesis and signalling, followed by a brief discussion of the role of Yme1 in modulating mitochondrial dynamics.


Subjects
Mitochondria, Mitochondrial Dynamics, Mitochondrial Proteins, Oxidative Phosphorylation, Protein Transport, Proteostasis, Humans, Mitochondrial Proteins/metabolism, Mitochondria/metabolism, Animals, ATPases Associated with Diverse Cellular Activities/metabolism, Lipids/biosynthesis, Lipids/chemistry, Lipid Metabolism, Homeostasis, Signal Transduction, ATP-Dependent Proteases/metabolism
14.
Int J Mol Sci ; 25(12)2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38928495

ABSTRACT

Polyglutamine (polyQ) disorders are a group of neurodegenerative diseases characterized by the excessive expansion of CAG (cytosine, adenine, guanine) repeats within the genes encoding the host proteins. The quest to unravel the complex disease mechanisms has led researchers to adopt both theoretical and experimental methods, each offering unique insights into the underlying pathogenesis. This review emphasizes the significance of combining multiple approaches in the study of polyQ disorders, focusing on the structure-function correlations and the relevance of polyQ-related protein dynamics in neurodegeneration. By integrating computational/theoretical predictions with experimental observations, one can establish robust structure-function correlations, aiding in the identification of key molecular targets for therapeutic interventions. The dynamics of polyQ proteins, influenced by their length and interactions with other molecular partners, play a pivotal role in the polyQ-related pathogenic cascade. Moreover, conformational dynamics of polyQ proteins can trigger aggregation, leading to toxic assemblies that hinder proper cellular homeostasis. Understanding these intricacies offers new avenues for therapeutic strategies that fine-tune polyQ kinetics in order to prevent and control disease progression. Finally, this review highlights the importance of integrating multidisciplinary efforts to advance research in this field, bringing us closer to the ultimate goal of finding effective treatments against polyQ disorders.


Subjects
Neurodegenerative Diseases, Peptides, Humans, Peptides/chemistry, Peptides/metabolism, Neurodegenerative Diseases/metabolism, Neurodegenerative Diseases/genetics, Structure-Activity Relationship, Animals
15.
Sci Rep ; 14(1): 13566, 2024 Jun 12.
Article in English | MEDLINE | ID: mdl-38866950

ABSTRACT

The identification of protein binding residues helps to understand their biological processes as protein function is often defined through ligand binding, such as to other proteins, small molecules, ions, or nucleotides. Methods predicting binding residues often err for intrinsically disordered proteins or regions (IDPs/IDPRs), often also referred to as molecular recognition features (MoRFs). Here, we presented a novel machine learning (ML) model trained to specifically predict binding regions in IDPRs. The proposed model, IDBindT5, leveraged embeddings from the protein language model (pLM) ProtT5 to reach a balanced accuracy of 57.2 ± 3.6% (95% confidence interval). Assessed on the same data set, this did not differ at the 95% CI from the state-of-the-art (SOTA) methods ANCHOR2 and DeepDISOBind that rely on expert-crafted features and evolutionary information from multiple sequence alignments (MSAs). Assessed on other data, methods such as SPOT-MoRF reached higher MCCs. IDBindT5's SOTA predictions are much faster than other methods, easily enabling full-proteome analyses. Our findings emphasize the potential of pLMs as a promising approach for exploring and predicting features of disordered proteins. The model and a comprehensive manual are publicly available at https://github.com/jahnl/binding_in_disorder .


Subjects
Intrinsically Disordered Proteins, Machine Learning, Protein Binding, Intrinsically Disordered Proteins/chemistry, Intrinsically Disordered Proteins/metabolism, Binding Sites, Computational Biology/methods, Protein Databases, Humans
16.
Biophys Rev ; 16(2): 189-218, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38737201

ABSTRACT

The formation of a heterogeneous set of advanced glycation end products (AGEs) is the final outcome of a non-enzymatic process that occurs in vivo on long-lived biomolecules. This process, known as glycation, starts with the reaction between reducing sugars, or their autoxidation products, and the amino groups of proteins, DNA, or lipids, thus gaining relevance under hyperglycemic conditions. Once AGEs are formed, they might affect the biological function of the biomacromolecule and, therefore, induce the development of pathophysiological events. In fact, the accumulation of AGEs has been identified as a triggering factor of obesity, diabetes-related diseases, coronary artery disease, neurological disorders, and chronic renal failure, among others. Given the deleterious consequences of glycation, evolution has designed endogenous mechanisms to undo glycation or to prevent it. In addition, many exogenous molecules have also emerged as powerful glycation inhibitors. This review aims to provide an overview of what glycation is. It starts by explaining the similarities and differences between glycation and glycosylation. Then, it describes in detail the molecular mechanism underlying glycation reactions and the biomolecular targets with a higher propensity to be glycated. Next, it discusses the precise effects of glycation on protein structure, function, and aggregation, and how computational chemistry has provided insights into these aspects. Finally, it reports the most prevalent diseases induced by glycation, the endogenous mechanisms that counter it, and the current therapeutic interventions against it.

17.
BMC Bioinformatics ; 25(1): 174, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698340

ABSTRACT

BACKGROUND: In the last two decades, the use of high-throughput sequencing technologies has accelerated the pace of discovery of proteins. However, due to the time and resource limitations of rigorous experimental functional characterization, the functions of the vast majority of them remain unknown. As a result, computational methods offering accurate, fast, and large-scale assignment of functions to new and previously unannotated proteins are sought after. Leveraging the underlying associations between the multiplicity of features that describe proteins could reveal functional insights into the diverse roles of proteins and improve performance on the automatic function prediction task. RESULTS: We present GO-LTR, a multi-view multi-label prediction model that relies on a high-order tensor approximation of model weights combined with non-linear activation functions. The model is capable of learning high-order relationships between multiple input views representing the proteins and predicting high-dimensional multi-label output consisting of protein functional categories. We demonstrate the competitiveness of our method on various performance measures. Experiments show that GO-LTR learns polynomial combinations between different protein features, resulting in improved performance. Additional investigations establish GO-LTR's practical potential in assigning functions to proteins under diverse challenging scenarios: very low sequence similarity to previously observed sequences, and rarely observed and highly specific terms in the gene ontology. IMPLEMENTATION: The code and data used for training GO-LTR are available at https://github.com/aalto-ics-kepaco/GO-LTR-prediction .


Subjects
Computational Biology, Proteins, Proteins/chemistry, Proteins/metabolism, Computational Biology/methods, Protein Databases, Algorithms
18.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701416

ABSTRACT

Predicting protein function is crucial for understanding biological life processes, preventing diseases, and developing new drug targets. In recent years, methods based on sequence, structure, and biological networks for protein function annotation have been extensively researched. Although obtaining a protein's three-dimensional structure through experimental or computational methods enhances the accuracy of function prediction, the sheer volume of proteins sequenced by high-throughput technologies presents a significant challenge. To address this issue, we introduce a deep neural network model, DeepSS2GO (Secondary Structure to Gene Ontology). It is a predictor incorporating secondary structure features along with primary sequence and homology information. The algorithm combines the speed of sequence-based information with the accuracy of structure-based features while streamlining the redundant data in primary sequences and bypassing the time-consuming challenges of tertiary structure analysis. The results show that the prediction performance surpasses state-of-the-art algorithms. It has the ability to predict key functions by effectively utilizing secondary structure information, rather than broadly predicting general Gene Ontology terms. Additionally, DeepSS2GO predicts five times faster than advanced algorithms, making it highly applicable to massive sequencing data. The source code and trained models are available at https://github.com/orca233/DeepSS2GO.


Subjects
Algorithms, Computational Biology, Computer Neural Networks, Protein Secondary Structure, Proteins, Proteins/chemistry, Proteins/metabolism, Proteins/genetics, Computational Biology/methods, Protein Databases, Gene Ontology, Protein Sequence Analysis/methods, Software
19.
Int J Mol Sci ; 25(10)2024 May 18.
Article in English | MEDLINE | ID: mdl-38791544

ABSTRACT

Antimicrobial peptides (AMPs) are promising candidates for new antibiotics due to their broad-spectrum activity against pathogens and reduced susceptibility to resistance development. Deep-learning techniques, such as deep generative models, offer a promising avenue to expedite the discovery and optimization of AMPs. A remarkable example is the Feedback Generative Adversarial Network (FBGAN), a deep generative model that incorporates a classifier during its training phase. Our study aims to explore the impact of enhanced classifiers on the generative capabilities of FBGAN. To this end, we introduce two alternative classifiers for the FBGAN framework, both surpassing the accuracy of the original classifier. The first classifier utilizes the k-mers technique, while the second applies transfer learning from the large protein language model Evolutionary Scale Modeling 2 (ESM2). Integrating these classifiers into FBGAN not only yields notable performance enhancements compared to the original FBGAN but also enables the proposed generative models to achieve comparable or even superior performance to established methods such as AMPGAN and HydrAMP. This achievement underscores the effectiveness of leveraging advanced classifiers within the FBGAN framework, enhancing its computational robustness for AMP de novo design and making it comparable to existing literature.
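
The k-mers technique mentioned in the abstract above represents a sequence by its overlapping substrings of length k. The following is a minimal Python sketch of that featurization step, assuming plain counting with a sliding window; it is not the classifier from the paper, and the function names and the example sequence are hypothetical.

```python
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers in a sequence (case-insensitive)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    seq = sequence.upper()
    # A sequence shorter than k yields an empty range and an empty Counter.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_frequencies(sequence, k=3):
    """Normalize k-mer counts so that the values sum to 1."""
    counts = kmer_counts(sequence, k)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical short peptide, for illustration only.
    print(kmer_counts("AKAKA", k=2))
    print(kmer_frequencies("AKAKA", k=2))
```

Counts or frequencies like these can be arranged into a fixed-length vector (one position per possible k-mer) and passed to any standard classifier.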


Subjects
Antimicrobial Peptides, Antimicrobial Peptides/chemistry, Antimicrobial Peptides/pharmacology, Drug Design/methods, Computer Neural Networks, Deep Learning, Algorithms
20.
Interdiscip Sci ; 16(3): 1-12, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38568406

ABSTRACT

With the rapid development of NGS technology, the number of protein sequences has increased exponentially. Computational methods have been introduced in protein functional studies because the analysis of large numbers of proteins through biological experiments is costly and time-consuming. In recent years, new approaches based on deep learning have been proposed to overcome the limitations of conventional methods. Although deep learning-based methods effectively utilize features of protein function, they are limited to sequences of fixed length and consider information from adjacent amino acids. Therefore, new protein analysis tools that extract functional features from proteins of flexible length and train models are required. We introduce DeepPI, a deep learning-based tool for analyzing proteins in large-scale databases. The proposed model, which utilizes Global Average Pooling, is applied to proteins of flexible length and leads to reduced information loss compared to existing algorithms that use fixed sizes. The image generator converts a one-dimensional sequence into a distinct two-dimensional structure, which can extract common parts of various shapes. Finally, filtering techniques automatically detect representative data from the entire database and ensure coverage of large protein databases. We demonstrate that DeepPI has been successfully applied to large databases such as the Pfam-A database. Comparative experiments on four types of image generators illustrated the impact of structure on feature extraction. The filtering performance was verified by varying the parameter values and proved to be applicable to large databases. Compared to existing methods, DeepPI achieves higher family classification accuracy for protein function inference.
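
The reason Global Average Pooling allows inputs of flexible length can be shown in a few lines: averaging over the length dimension always yields one value per feature channel, whatever the input length. The following is a simplified one-dimensional Python sketch under that assumption; it is not DeepPI's implementation (which operates on generated two-dimensional images), and the function name and the toy feature maps are hypothetical.

```python
def global_average_pool(feature_map):
    """Average a variable-length feature map over its length dimension.

    feature_map: non-empty list of per-position feature vectors, all with
    the same number of channels. Returns one averaged value per channel,
    so the output size does not depend on the input length.
    """
    if not feature_map:
        raise ValueError("feature_map must contain at least one position")
    n_channels = len(feature_map[0])
    sums = [0.0] * n_channels
    for row in feature_map:
        if len(row) != n_channels:
            raise ValueError("all positions must have the same number of channels")
        for j, value in enumerate(row):
            sums[j] += value
    return [s / len(feature_map) for s in sums]


if __name__ == "__main__":
    # Two hypothetical feature maps of different lengths, two channels each.
    short_map = [[1.0, 2.0], [3.0, 4.0]]
    long_map = [[1.0, 0.0], [2.0, 0.0], [3.0, 6.0]]
    print(global_average_pool(short_map))
    print(global_average_pool(long_map))
```

Both calls return a vector of length two, so a downstream classification layer can have a fixed input size without padding or truncating the sequences.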


Subjects
Deep Learning, Proteins, Proteins/chemistry, Algorithms, Protein Databases, Computational Biology/methods