Results 1-20 of 50
1.
Int J Mol Sci ; 25(10)2024 May 16.
Article in English | MEDLINE | ID: mdl-38791479

ABSTRACT

The subcellular location of a protein provides bioinformaticians with valuable insights for drug design and discovery, genomics, and various other areas of medical research. Experimental methods for determining protein subcellular localization are time-consuming and expensive, whereas computational methods, if accurate, would be a much more efficient alternative. This article introduces an ab initio protein subcellular localization predictor based on an ensemble of Deep N-to-1 Convolutional Neural Networks. Our predictor is trained and tested on strictly redundancy-reduced datasets and achieves 63% accuracy across its diverse set of classes. This predictor is a step towards bridging the gap between a protein's sequence and its function. It could also provide information about protein-protein interactions to facilitate drug design and processes, such as vaccine production, that are essential to disease prevention.


Subjects
Computational Biology; Neural Networks, Computer; Computational Biology/methods; Proteins/metabolism; Proteins/analysis; Software; Databases, Protein; Humans
2.
Proteins ; 89(10): 1233-1239, 2021 10.
Article in English | MEDLINE | ID: mdl-33983651

ABSTRACT

Knowledge of the subcellular location of a protein is a valuable source of information in genomics, drug design, and various other theoretical and analytical areas of bioinformatics. Because experimental methods for determining protein subcellular location are expensive and time-consuming, various computational methods have been developed for subcellular localization prediction. We introduce "SCLpred-MEM," an ab initio protein subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks (N1-NN) trained and tested on strictly redundancy-reduced datasets. SCLpred-MEM is available as a web server that classifies query proteins into two classes, membrane and non-membrane proteins. SCLpred-MEM achieves a Matthews correlation coefficient of 0.52 on a strictly homology-reduced independent test set and 0.62 on a less strictly homology-reduced independent test set, surpassing or matching other state-of-the-art subcellular localization predictors.


Subjects
Computational Biology/methods; Membrane Proteins; Animals; Databases, Protein; Deep Learning; Fungi/metabolism; Humans; Membrane Proteins/chemistry; Membrane Proteins/metabolism; Membranes/metabolism; Neural Networks, Computer; Plants/metabolism
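The Matthews correlation coefficient (MCC) used to score binary predictors such as SCLpred-MEM summarizes a whole confusion matrix in one value between -1 and 1: MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The minimal Python sketch below implements that standard formula. Returning 0 when a marginal is empty is a common convention and an assumption here; the abstract does not say how such cases were handled.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Assumption: returns 0.0 when any row or column total is zero,
    since the coefficient is undefined in that case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, treating membrane proteins as the positive class, `matthews_corrcoef(tp=40, tn=50, fp=10, fn=0)` scores a hypothetical test set. Unlike plain accuracy, MCC stays near 0 for a predictor that labels every protein with the majority class.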
3.
Bioinformatics ; 36(12): 3897-3898, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32207516

ABSTRACT

MOTIVATION: Protein structural annotations (PSAs) are essential abstractions to deal with the prediction of protein structures. Many increasingly sophisticated PSAs have been devised in the last few decades. However, the need for annotations that are easy to compute, process and predict has not diminished. This is especially true for protein structures that are hardest to predict, such as novel folds. RESULTS: We propose Brewery, a suite of ab initio predictors of 1D PSAs. Brewery uses multiple sources of evolutionary information to achieve state-of-the-art predictions of secondary structure, structural motifs, relative solvent accessibility and contact density. AVAILABILITY AND IMPLEMENTATION: The web server, standalone program, Docker image and training sets of Brewery are available at http://distilldeep.ucd.ie/brewery/. CONTACT: gianluca.pollastri@ucd.ie.


Subjects
Deep Learning; Computational Biology; Protein Structure, Secondary; Proteins; Solvents
4.
Bioinformatics ; 36(11): 3343-3349, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32142105

ABSTRACT

MOTIVATION: The subcellular location of a protein can provide useful information for protein function prediction and drug design. Experimentally determining the subcellular location of a protein is an expensive and time-consuming task. Therefore, various computer-based tools, mostly using machine learning algorithms, have been developed to predict the subcellular location of proteins. RESULTS: Here, we present a neural network-based algorithm for protein subcellular location prediction. We introduce SCLpred-EMS, a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks. SCLpred-EMS classifies the subcellular location of a protein into two classes, the endomembrane system and secretory pathway versus all others, with a Matthews correlation coefficient of 0.75-0.86, outperforming the other state-of-the-art web servers we tested. AVAILABILITY AND IMPLEMENTATION: SCLpred-EMS is freely available for academic users at http://distilldeep.ucd.ie/SCLpred2/. CONTACT: catherine.mooney@ucd.ie.


Subjects
Computational Biology; Secretory Pathway; Algorithms; Machine Learning; Neural Networks, Computer; Proteins/metabolism
7.
Amino Acids ; 51(9): 1289-1296, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31388850

ABSTRACT

Predicting the three-dimensional structure of proteins is a long-standing challenge of computational biology, as the structure (or lack of a rigid structure) is well known to determine a protein's function. Predicting the relative solvent accessibility (RSA) of amino acids within a protein is a significant step towards resolving the protein structure prediction challenge, especially when structural information about a protein is not available by homology transfer. Today, the most powerful methods for predicting RSA and other structural features of proteins are arguably built around some form of deep learning, and all state-of-the-art protein structure prediction tools rely on some machine learning algorithm. In this article we present a deep neural network architecture composed of stacks of bidirectional recurrent neural networks and convolutional layers, which is capable of mining information from long-range interactions within a protein sequence, and we apply it to the prediction of protein RSA using a novel encoding method that we call "clipped". The final system we present, PaleAle 5.0, which is available as a public server, predicts RSA in two, three and four classes with an accuracy exceeding 80% in two classes, surpassing the performance of all the other predictors we benchmarked.


Subjects
Amino Acids/chemistry; Deep Learning; Proteins/chemistry; Algorithms; Computational Biology/methods; Entropy; Evolution, Chemical; Protein Structure, Secondary; Software; Solvents/chemistry
8.
Brief Bioinform ; 17(5): 831-40, 2016 09.
Article in English | MEDLINE | ID: mdl-26411473

ABSTRACT

Machine learning methods are becoming increasingly popular for predicting protein features from sequences. Machine learning in bioinformatics can be powerful but also carries the risk of introducing unexpected biases, which may lead to an overestimation of performance. This article proposes a set of guidelines to help both peer reviewers and authors avoid common machine learning pitfalls. Understanding the biology is necessary to produce useful data sets, which have to be large and diverse. Separating the training and test processes is imperative to avoid overselling method performance, which also depends on several hidden parameters. A novel predictor must always be compared with several existing methods, including simple baseline strategies. The presented guidelines will help nonspecialists appreciate the critical issues in machine learning.


Subjects
Machine Learning; Algorithms; Amino Acid Sequence; Computational Biology; Humans; Proteins
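The guideline that a new predictor be compared with simple baseline strategies can be made concrete with the most trivial baseline of all: always predicting the most frequent class seen in training. The sketch below is a minimal illustration; the function name and the localization labels are hypothetical.

```python
from collections import Counter


def majority_class_baseline(train_labels, test_labels):
    """Accuracy on the test set of always predicting the most frequent training label."""
    if not train_labels or not test_labels:
        raise ValueError("need non-empty training and test labels")
    majority, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for y in test_labels if y == majority)
    return majority, correct / len(test_labels)


# Hypothetical subcellular-localization labels.
train = ["cytoplasm"] * 6 + ["nucleus"] * 3 + ["secreted"]
test = ["cytoplasm"] * 5 + ["nucleus"] * 4 + ["secreted"]
label, acc = majority_class_baseline(train, test)
print(label, acc)  # cytoplasm 0.5
```

On an unbalanced data set this baseline can already score well, so a reported accuracy means little until it is set against such a figure (or reported with a class-balanced metric such as MCC).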
9.
Int J Mol Sci ; 16(8): 19868-85, 2015 Aug 21.
Article in English | MEDLINE | ID: mdl-26307973

ABSTRACT

Intrinsically disordered regions lack a well-defined 3D structure, but play key roles in determining the function of many proteins. Although disorder predictors have been shown to achieve relatively high rates of correct classification of these segments, improvements over the years have been slow, and accurate methods are needed that can accommodate the ever-increasing number of structurally determined protein sequences in order to boost predictive performance. In this paper, we propose a predictor for short disordered regions based on bidirectional recurrent neural networks, tested by rigorous five-fold cross-validation on a large, non-redundant dataset collected from MobiDB, a new comprehensive source of protein disorder annotations. The system exploits sequence and structural information in the form of frequency profiles, predicted secondary structure and solvent accessibility, and direct disorder annotations from homologous protein structures (templates) deposited in the Protein Data Bank. The combined contributions of sequence, structure and homology information result in large improvements in predictive accuracy. Additionally, the large scale of the training set leads to low false positive rates, making our systems a robust and efficient way to address high-throughput disorder prediction.


Subjects
Computational Biology/methods; Proteins/chemistry; Algorithms; Databases, Protein; Neural Networks, Computer; Protein Conformation; ROC Curve
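Five-fold cross-validation, as used above, partitions the data so that each protein is tested exactly once, by a model that never saw it during training. The minimal sketch below generates such splits over item indices. It assumes the items are already redundancy-reduced; with raw protein data, homologs would first have to be grouped so that no cluster straddles two folds. The function name and seed are illustrative.

```python
import random


def k_fold_splits(n_items: int, k: int = 5, seed: int = 0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then dealt round-robin
    into k folds, so every index lands in exactly one test fold.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)
```

A model is then trained on each `train` list and scored on the matching `test` list, and the k scores (or the pooled predictions) are reported.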
10.
BMC Bioinformatics ; 15: 6, 2014 Jan 10.
Article in English | MEDLINE | ID: mdl-24410833

ABSTRACT

BACKGROUND: Protein inter-residue contact maps provide a translation- and rotation-invariant topological representation of a protein. They can be used as an intermediary step in protein structure prediction. However, the prediction of contact maps is an unbalanced problem, as far fewer examples of contacts than non-contacts exist in a protein structure. In this study we explore the possibility of completely eliminating the unbalanced nature of the contact map prediction problem by predicting real-valued distances between residues. Predicting full inter-residue distance maps and applying them in protein structure prediction has been relatively unexplored in the past. RESULTS: We first demonstrate that native-like distance maps can reproduce 3D structures almost identical to the targets, with an average RMSD of 0.5 Å. In addition, physical maps corrupted with an introduced random error of ±6 Å can reconstruct the targets within an average RMSD of 2 Å. After demonstrating the reconstruction potential of distance maps, we develop two classes of predictors using two-dimensional recursive neural networks: an ab initio predictor that relies only on the protein sequence and evolutionary information, and a template-based predictor to which additional structural homology information is provided. We find that the ab initio predictor is able to reproduce distances with an RMSD of 6 Å, regardless of the evolutionary content provided. Furthermore, we show that the template-based predictor exploits both sequence and structure information, even in cases of dubious homology, and outperforms the best template hit by a clear margin of up to 3.7 Å. Lastly, we demonstrate the ability of the two predictors to reconstruct CASP9 targets shorter than 200 residues, producing results similar to those of the state-of-the-art machine learning approach implemented in the Distill server.
CONCLUSIONS: The methodology presented here, if complemented by more complex reconstruction protocols, can represent a possible path to improving machine learning algorithms for 3D protein structure prediction. Moreover, it can be used as an intermediary step in protein structure prediction, either on its own or complemented by NMR restraints.


Subjects
Computational Biology/methods; Models, Molecular; Neural Networks, Computer; Proteins/chemistry; Algorithms; Databases, Protein; Protein Conformation; Sequence Analysis, Protein
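The contrast drawn above between contact maps and distance maps can be illustrated from residue coordinates: a distance map stores every pairwise distance, a contact map only whether a pair falls under a cutoff, and the fraction of contacting pairs shows the class imbalance. The sketch below is a minimal illustration under stated assumptions: the 8 Å cutoff is a common convention rather than a value from the abstract, the coordinates are hypothetical, and computing the RMSD over residue pairs i < j is one plausible reading of "reproduce distances with an RMSD of 6 Å".

```python
import math


def distance_map(coords):
    """Pairwise Euclidean distance matrix for a list of (x, y, z) residue coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_fraction(dmap, threshold=8.0):
    """Fraction of residue pairs (i < j) closer than `threshold` angstroms (assumed cutoff)."""
    n = len(dmap)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    contacts = sum(1 for i, j in pairs if dmap[i][j] < threshold)
    return contacts / len(pairs)


def distance_rmsd(dmap_a, dmap_b):
    """Root-mean-square difference between two distance maps over pairs i < j."""
    n = len(dmap_a)
    sq = [(dmap_a[i][j] - dmap_b[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(sq) / len(sq))


# Hypothetical extended chain: 10 residues spaced 3.8 angstroms apart on a line.
coords = [(3.8 * i, 0.0, 0.0) for i in range(10)]
d = distance_map(coords)
```

Here only neighbours one or two positions apart fall under the cutoff, so 17 of the 45 pairs are contacts; in real proteins of realistic length, the non-contacts dominate far more strongly, which is the imbalance that predicting real-valued distances sidesteps.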
11.
Bioinformatics ; 29(16): 2056-8, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23772049

ABSTRACT

SUMMARY: Protein secondary structure and solvent accessibility predictions are a fundamental intermediate step towards protein structure and function prediction. We present new systems for the ab initio prediction of protein secondary structure and solvent accessibility, Porter 4.0 and PaleAle 4.0. Porter 4.0 predicts secondary structure correctly for 82.2% of residues. PaleAle 4.0's accuracy is 80.0% for prediction in two classes with a 25% accessibility threshold. We show that the increasing training set sizes that come with the continuing growth of the Protein Data Bank keep yielding improvements in prediction quality, and we examine the impact of protein resolution on prediction performance. AVAILABILITY: Porter 4.0 and PaleAle 4.0 are freely available for academic users at http://distillf.ucd.ie/porterpaleale/. Up to 64 kb of input in FASTA format can be processed in a single submission, with predictions now being returned to the user within a single web page and, optionally, a single email.


Subjects
Protein Structure, Secondary; Software; Solvents/chemistry; Internet; Proteins/chemistry
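Two-class solvent accessibility prediction with a 25% threshold, as reported for PaleAle 4.0, rests on relative solvent accessibility: a residue's accessible surface area (ASA) divided by the maximum ASA for that residue type, expressed as a percentage. The sketch below derives the two-class target labels from given ASA values. Assumptions: the per-residue maximum ASA values are supplied by the caller (published reference tables differ), a residue exactly at the threshold counts as exposed, and the labels `b`/`e` and function names are hypothetical.

```python
def relative_solvent_accessibility(asa: float, max_asa: float) -> float:
    """RSA as a percentage of the residue type's maximum accessible surface area."""
    if max_asa <= 0:
        raise ValueError("max_asa must be positive")
    return 100.0 * asa / max_asa


def two_class_label(rsa_percent: float, threshold: float = 25.0) -> str:
    """'b' (buried) below the threshold, 'e' (exposed) at or above it (assumed tie rule)."""
    return "e" if rsa_percent >= threshold else "b"


def classify_sequence(asas, max_asas, threshold=25.0):
    """Per-residue two-class string for parallel lists of ASA and maximum-ASA values."""
    if len(asas) != len(max_asas):
        raise ValueError("asas and max_asas must have the same length")
    return "".join(
        two_class_label(relative_solvent_accessibility(a, m), threshold)
        for a, m in zip(asas, max_asas)
    )
```

For instance, `classify_sequence([10.0, 150.0], [100.0, 200.0])` gives RSA values of 10% and 75%, and hence the label string `"be"`.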
12.
Bioinformatics ; 29(9): 1120-6, 2013 May 01.
Article in English | MEDLINE | ID: mdl-23505299

ABSTRACT

MOTIVATION: Peptides play important roles in signalling, regulation and immunity within an organism. Many have been used successfully as therapeutic products, often mimicking naturally occurring peptides. Here we present PeptideLocator for the automated prediction of functional peptides in a protein sequence. RESULTS: We have trained a machine learning algorithm to predict bioactive peptides within protein sequences. PeptideLocator performs well on training data, achieving an area under the curve of 0.92 when tested in 5-fold cross-validation on a set of 2202 redundancy-reduced, peptide-containing protein sequences. It has predictive power when applied to antimicrobial peptides, cytokines, growth factors, peptide hormones, toxins, venoms and other peptides. It can be applied to refine the choice of experimental investigations in functional studies of proteins. AVAILABILITY AND IMPLEMENTATION: PeptideLocator is freely available for academic users at http://bioware.ucd.ie/.


Subjects
Algorithms; Peptides/chemistry; Sequence Analysis, Protein/methods; Antimicrobial Cationic Peptides/chemistry; Artificial Intelligence; Peptides/classification; Proteins/chemistry
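The area under the (ROC) curve of 0.92 reported for PeptideLocator has a direct probabilistic reading: it is the chance that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The minimal sketch below computes it in exactly that pairwise (Mann-Whitney) form, counting ties as one half. It is quadratic in the data size, so it suits small illustrative sets rather than genome-scale scoring; the scores in the tests are made up.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise statistic.

    `labels` holds 1 for positives and 0 for negatives. Every
    positive/negative pair contributes 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, independently of any decision threshold.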
13.
Bioinformatics ; 29(23): 3094-6, 2013 Dec 01.
Article in English | MEDLINE | ID: mdl-24064418

ABSTRACT

Cell-penetrating peptides (CPPs) are attracting much attention as a means of overcoming the inherently poor cellular uptake of various bioactive molecules. Here, we introduce CPPpred, a web server for the prediction of CPPs using an N-to-1 neural network. The server takes one or more peptide sequences, between 5 and 30 amino acids in length, as input and returns a prediction of how likely each peptide is to be cell penetrating. CPPpred was developed with redundancy-reduced training and test sets, offering an advantage over the only other currently available CPP prediction method.


Subjects
Cell-Penetrating Peptides/chemistry; Computational Biology; Neural Networks, Computer; Sequence Analysis, Protein; Software; Cell-Penetrating Peptides/metabolism; Databases, Protein; Humans; Internet
14.
Comput Struct Biotechnol J ; 23: 1796-1807, 2024 Dec.
Article in English | MEDLINE | ID: mdl-38707539

ABSTRACT

Protein subcellular localization prediction is of great significance in bioinformatics and biological research. Because most proteins lack experimentally determined localization information, computational prediction methods and tools have been an active research area for more than two decades. Knowledge of the subcellular location of a protein provides valuable information about its functions, the functioning of the cell, and possible interactions with other proteins. Fast, reliable, and accurate predictors provide platforms for harnessing the abundance of sequence data to predict subcellular locations. During the last decade, a considerable research effort has been aimed at developing subcellular localization predictors. This paper reviews recent subcellular localization prediction tools in the eukaryotic, prokaryotic, and virus-based categories, followed by a detailed analysis. Each predictor is discussed in terms of its main features, strengths, weaknesses, algorithms, prediction techniques, and analysis. The review is supported by taxonomies of prediction tools that highlight their relevant areas, with examples, for straightforward categorization and ease of understanding. These taxonomies help users find suitable tools for their needs. Furthermore, current research gaps and challenges are discussed to identify the areas that most need attention. This survey provides an in-depth analysis of the most recent prediction tools and can serve as a quick guide for researchers to identify and explore recent advances in the literature.

15.
BMC Bioinformatics ; 14 Suppl 1: S11, 2013.
Article in English | MEDLINE | ID: mdl-23368876

ABSTRACT

We present a novel ab initio predictor of protein enzymatic class. The predictor can classify proteins, solely on the basis of their sequences, into one of six classes extracted from the Enzyme Commission (EC) classification scheme, and is trained on a large, curated database of over 6,000 non-redundant proteins that we assembled in this work. The predictor is powered by an ensemble of N-to-1 Neural Networks, a novel architecture that we have recently developed. N-to-1 Neural Networks operate on the full sequence and not on predefined features. All motifs of a predefined length (31 residues in this work) are considered and are compressed by an N-to-1 Neural Network into a feature vector that is automatically determined during training. We test our predictor in 10-fold cross-validation and obtain state-of-the-art results, with 96% correct classification and 86% generalized correlation. All six classes are predicted with a specificity of at least 80%, and false positive rates never exceed 7%. We are currently investigating enhanced input encoding schemes that include structural information, and are analyzing trained networks to mine the motifs that are most informative for the prediction and hence likely functionally relevant.


Subjects
Enzymes/classification; Neural Networks, Computer; Proteins/classification; Algorithms; Amino Acid Motifs; Animals; Databases, Protein; Proteins/chemistry; Sequence Alignment; Sequence Analysis, Protein
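The N-to-1 idea described above (every 31-residue motif passed through the same network, the results compressed into one fixed-size feature vector, and that vector mapped to a class) can be sketched as an untrained forward pass in plain Python. Several details are assumptions rather than statements from the abstract: one-hot window encoding with zero padding past the termini, a single tanh hidden layer shared across all positions, summation of the per-window vectors into the sequence-level feature vector, and a softmax over six output classes. The class name `NTo1Sketch`, the feature size and the random weights are hypothetical, and training is omitted.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_window(seq, center, width):
    """One-hot codes for `width` residues centered on `center`; off-sequence positions are zero."""
    half = width // 2
    vec = []
    for pos in range(center - half, center + half + 1):
        code = [0.0] * len(AMINO_ACIDS)
        if 0 <= pos < len(seq) and seq[pos] in AMINO_ACIDS:
            code[AMINO_ACIDS.index(seq[pos])] = 1.0
        vec.extend(code)
    return vec


def dense(weights, bias, x, activation):
    """Fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, bias)]


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class NTo1Sketch:
    """Untrained N-to-1 forward pass: shared window network, summed features, class output."""

    def __init__(self, width=31, n_features=8, n_classes=6, seed=0):
        if width % 2 == 0:
            raise ValueError("window width must be odd")
        rng = random.Random(seed)
        n_in = width * len(AMINO_ACIDS)
        self.width = width
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_features)]
        self.b1 = [0.0] * n_features
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(n_features)] for _ in range(n_classes)]
        self.b2 = [0.0] * n_classes

    def feature_vector(self, seq):
        """Sum of the shared hidden layer's outputs over every window in the sequence."""
        total = [0.0] * len(self.b1)
        for i in range(len(seq)):
            h = dense(self.w1, self.b1, one_hot_window(seq, i, self.width), math.tanh)
            total = [t + v for t, v in zip(total, h)]
        return total

    def predict_proba(self, seq):
        """Class probabilities for the whole sequence."""
        return softmax(dense(self.w2, self.b2, self.feature_vector(seq), lambda v: v))
```

Because the per-window vectors are pooled, sequences of any length yield a feature vector of the same size. That property is what lets one network map a whole sequence to a single property; in a trained model, the hidden layer's weights determine which motifs contribute most to that vector.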
16.
Amino Acids ; 45(2): 291-9, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23568340

ABSTRACT

Knowledge of the subcellular location of a protein provides valuable information about its function, possible interactions with other proteins and drug targetability, among other things. The experimental determination of a protein's location in the cell is expensive, time-consuming and open to human error. Fast and accurate predictors of subcellular location have an important role to play if the abundance of sequence data now available is to be fully exploited. In the post-genomic era, the genomes of many diverse organisms are available. Many of these organisms are important in human and veterinary disease and fall outside the well-studied plant, animal and fungi groups. We have developed a general eukaryotic subcellular localisation predictor (SCL-Epred) that predicts the location of eukaryotic proteins into three classes that are important, in particular, for determining the drug targetability of a protein: secreted proteins, membrane proteins, and proteins that are neither secreted nor membrane. The algorithm powering SCL-Epred is an N-to-1 neural network trained on very large non-redundant sets of protein sequences. SCL-Epred performs well on training data, achieving a Q of 86% and a generalised correlation of 0.75 when tested in tenfold cross-validation on a set of 15,202 redundancy-reduced protein sequences. The three-class accuracy of SCL-Epred and LocTree2, and in particular of a consensus predictor comprising both methods, surpasses that of other widely used predictors when benchmarked on a large redundancy-reduced independent test set of 562 proteins. SCL-Epred is publicly available at http://distillf.ucd.ie/distill/ .


Subjects
Computational Biology/methods; Neural Networks, Computer; Proteins/metabolism; Subcellular Fractions/metabolism; Algorithms; Amino Acid Sequence; Eukaryotic Cells/cytology; Eukaryotic Cells/metabolism; Humans; Membrane Proteins/metabolism; Proteins/genetics; Proteome/metabolism
17.
J Chem Inf Model ; 53(7): 1563-75, 2013 Jul 22.
Article in English | MEDLINE | ID: mdl-23795551

ABSTRACT

Shallow machine learning methods have been applied to chemoinformatics problems with some success. As more data becomes available and more complex problems are tackled, deep machine learning methods may also become useful. Here, we present a brief overview of deep learning methods and show in particular how recursive neural network approaches can be applied to the problem of predicting molecular properties. However, molecules are typically described by undirected cyclic graphs, while recursive approaches typically use directed acyclic graphs. Thus, we develop methods to address this discrepancy, essentially by considering an ensemble of recursive neural networks associated with all possible vertex-centered acyclic orientations of the molecular graph. One advantage of this approach is that it relies only minimally on the identification of suitable molecular descriptors because suitable representations are learned automatically from the data. Several variants of this approach are applied to the problem of predicting aqueous solubility and tested on four benchmark data sets. Experimental results show that the performance of the deep learning methods matches or exceeds the performance of other state-of-the-art methods according to several evaluation metrics and expose the fundamental limitations arising from training sets that are too small or too noisy. A Web-based predictor, AquaSol, is available online through the ChemDB portal ( cdb.ics.uci.edu ) together with additional material.


Subjects
Artificial Intelligence; Informatics/methods; Pharmaceutical Preparations/chemistry; Water/chemistry; Acetic Acid/chemistry; Computer Graphics; Databases, Pharmaceutical; Internet; Neural Networks, Computer; Solubility
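The abstract above resolves the mismatch between undirected, possibly cyclic molecular graphs and the directed acyclic graphs that recursive networks need by building one acyclic orientation per vertex. The sketch below shows one simple way to produce such vertex-centered orientations; it is not necessarily the authors' exact construction. For each root atom, vertices are ranked by breadth-first-search visit order and every bond is directed from the later-visited atom to the earlier one. Since every edge points strictly down in rank, each orientation is acyclic and all edges ultimately flow towards the root. Breaking ties between atoms at equal graph distance by visit order is an assumption, and the function name is hypothetical.

```python
from collections import deque


def vertex_centered_dags(n_vertices, edges):
    """Return {root: directed edge list}, one acyclic orientation per vertex.

    Each undirected edge (u, v) is directed from the endpoint visited later
    in a BFS from `root` to the one visited earlier, so the root is a sink.
    Assumes a connected graph.
    """
    adjacency = {v: [] for v in range(n_vertices)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    dags = {}
    for root in range(n_vertices):
        rank = {root: 0}
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in rank:
                    rank[v] = len(rank)
                    queue.append(v)
        if len(rank) != n_vertices:
            raise ValueError("graph must be connected")
        dags[root] = [(u, v) if rank[u] > rank[v] else (v, u) for u, v in edges]
    return dags


# Hypothetical six-membered ring (e.g. the carbon skeleton of benzene).
ring = [(i, (i + 1) % 6) for i in range(6)]
dags = vertex_centered_dags(6, ring)
```

A recursive network would then be unrolled over each of the n orientations, and the n resulting root outputs combined into one molecular prediction, which is the ensemble the abstract describes.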
18.
Nucleic Acids Res ; 39(Web Server issue): W190-6, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21646342

ABSTRACT

CSpritz is a web server for the prediction of intrinsic protein disorder. It combines the earlier Spritz with two novel orthogonal systems developed by our group (Punch and ESpritz). Punch is based on sequence and structural templates trained with support vector machines. ESpritz is an efficient single-sequence method based on bidirectional recursive neural networks. Spritz was extended to filter predictions based on structural homologues. After extensive testing, predictions are combined by averaging their probabilities. The CSpritz website can process single or multiple predictions for either short or long disorder. The server provides a global output page for downloading all predictions and viewing their statistics together. Links are provided to each individual protein, where the amino acid sequence and disorder prediction are displayed along with statistics for that protein. As a novel feature, CSpritz provides information about structural homologues as well as secondary structure and short functional linear motifs in each disordered segment. Benchmarking was performed on the very recent CASP9 data, where CSpritz would have ranked consistently well, with an Sw measure of 49.27 and an AUC of 0.828. The server, together with help and methods pages including examples, is freely available at URL: http://protein.bio.unipd.it/cspritz/.


Subjects
Protein Conformation; Software; Amino Acid Motifs; Internet; Molecular Sequence Annotation; Protein Structure, Secondary; Structural Homology, Protein
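CSpritz combines its component predictors by averaging their probabilities. The minimal sketch below shows per-residue probability averaging followed by a binary call. The 0.5 decision threshold and the `D`/`O` (disordered/ordered) labels are hypothetical; the abstract does not state the threshold CSpritz applies.

```python
def average_predictions(per_method_probs, threshold=0.5):
    """Average per-residue disorder probabilities from several methods, then threshold.

    `per_method_probs` is a list of equal-length probability lists, one per method.
    Returns the averaged probabilities and a 'D' (disordered) / 'O' (ordered)
    label per residue.
    """
    if not per_method_probs:
        raise ValueError("need at least one method")
    length = len(per_method_probs[0])
    if any(len(p) != length for p in per_method_probs):
        raise ValueError("all methods must score the same residues")
    averaged = [sum(col) / len(per_method_probs) for col in zip(*per_method_probs)]
    labels = ["D" if p >= threshold else "O" for p in averaged]
    return averaged, labels
```

Averaging calibrated probabilities lets methods that are each confident in different regions (for example, template-based versus single-sequence predictors) moderate one another, which is one common motivation for combining orthogonal systems.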
19.
Comput Struct Biotechnol J ; 21: 3024-3031, 2023.
Article in English | MEDLINE | ID: mdl-37266407

ABSTRACT

Motivation: One of the most relevant mechanisms involved in determining chromatin structure is the formation of structural loops, which are also related to the conservation of chromatin states. Many of these loops are stabilized at their base by CCCTC-binding factor (CTCF) proteins. Despite the relevance of chromatin structure and the key role of CTCF, the role of the epigenetic factors involved in regulating CTCF binding, and thus in the formation of structural loops in chromatin, is not thoroughly understood. Results: Here we describe a CTCF binding predictor based on Random Forest that employs various epigenetic data and genomic features. Importantly, given the ability of Random Forests to determine the relevance of features for the prediction, our approach also shows how the different types of descriptors affect the binding of CTCF, confirming previous knowledge on the relevance of chromatin accessibility and DNA methylation, while also demonstrating the effect of epigenetic modifications on the activity of CTCF. We compared our approach against other predictors and found improved performance in terms of the areas under the PR and ROC curves (PRAUC-ROCAUC), outperforming current state-of-the-art methods.

20.
Bioinformatics ; 27(20): 2812-9, 2011 Oct 15.
Article in English | MEDLINE | ID: mdl-21873639

ABSTRACT

SUMMARY: Knowledge of the subcellular location of a protein provides valuable information about its function and possible interactions with other proteins. In the post-genomic era, fast and accurate predictors of subcellular location are required if this abundance of sequence data is to be fully exploited. We have developed a subcellular localization predictor (SCLpred), which assigns a protein to one of four classes for animals and fungi and five classes for plants (secreted, cytoplasm, nucleus, mitochondrion and chloroplast), using machine learning models trained on large non-redundant sets of protein sequences. The algorithm powering SCLpred is a novel neural network we have developed (N-to-1 Neural Network, or N1-NN), which is capable of mapping whole sequences into single properties (a functional class, in this work) without resorting to predefined transformations, instead adaptively compressing the sequence into a hidden feature vector. We benchmark SCLpred against other publicly available predictors using two benchmarks, including a new subset of Swiss-Prot Release 2010_06. We show that SCLpred surpasses the state of the art. The N1-NN algorithm is fully general and may be applied to a host of problems of similar shape, that is, problems in which a whole sequence needs to be mapped into a fixed-size array of properties, and the adaptive compression it performs may shed light on the space of protein sequences. AVAILABILITY: The predictive systems described in this article are publicly available as a web server at http://distill.ucd.ie/distill/. CONTACT: gianluca.pollastri@ucd.ie.


Subjects
Neural Networks, Computer; Proteins/analysis; Sequence Analysis, Protein; Algorithms; Animals; Artificial Intelligence; Chloroplast Proteins/analysis; Cytoplasm/chemistry; Fungal Proteins/analysis; Mitochondrial Proteins/analysis; Nuclear Proteins/analysis; Plant Proteins/analysis; Proteins/chemistry