Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 62
Filtrar
Mais filtros

Bases de dados
Tipo de documento
Intervalo de ano de publicação
1.
J Biomed Inform ; 113: 103627, 2021 01.
Artigo em Inglês | MEDLINE | ID: mdl-33259944

RESUMO

In the last few years, the application of Machine Learning approaches like Deep Neural Network (DNN) models have become more attractive in the healthcare system given the rising complexity of the healthcare data. Machine Learning (ML) algorithms provide efficient and effective data analysis models to uncover hidden patterns and other meaningful information from the considerable amount of health data that conventional analytics are not able to discover in a reasonable time. In particular, Deep Learning (DL) techniques have been shown as promising methods in pattern recognition in the healthcare systems. Motivated by this consideration, the contribution of this paper is to investigate the deep learning approaches applied to healthcare systems by reviewing the cutting-edge network architectures, applications, and industrial trends. The goal is first to provide extensive insight into the application of deep learning models in healthcare solutions to bridge deep learning techniques and human healthcare interpretability. And then, to present the existing open challenges and future directions.


Assuntos
Aprendizado Profundo , Algoritmos , Atenção à Saúde , Humanos , Aprendizado de Máquina , Redes Neurais de Computação
2.
Bioinformatics ; 35(20): 4140-4146, 2019 10 15.
Artigo em Inglês | MEDLINE | ID: mdl-30903686

RESUMO

MOTIVATION: Protein glycosylation is one of the most abundant post-translational modifications that plays an important role in immune responses, intercellular signaling, inflammation and host-pathogen interactions. However, due to the poor ionization efficiency and microheterogeneity of glycopeptides identifying glycosylation sites is a challenging task, and there is a demand for computational methods. Here, we constructed the largest dataset of human and mouse glycosylation sites to train deep learning neural networks and support vector machine classifiers to predict N-/O-linked glycosylation sites, respectively. RESULTS: The method, called SPRINT-Gly, achieved consistent results between ten-fold cross validation and independent test for predicting human and mouse glycosylation sites. For N-glycosylation, a mouse-trained model performs equally well in human glycoproteins and vice versa, however, due to significant differences in O-linked sites separate models were generated. Overall, SPRINT-Gly is 18% and 50% higher in Matthews correlation coefficient than the next best method compared in N-linked and O-linked sites, respectively. This improved performance is due to the inclusion of novel structure and sequence-based features. AVAILABILITY AND IMPLEMENTATION: http://sparks-lab.org/server/SPRINT-Gly/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Máquina de Vetores de Suporte , Animais , Glicoproteínas , Glicosilação , Humanos , Camundongos , Processamento de Proteína Pós-Traducional
3.
Bioinformatics ; 35(19): 3831-3833, 2019 10 01.
Artigo em Inglês | MEDLINE | ID: mdl-30850831

RESUMO

MOTIVATION: Extracting useful feature set which contains significant discriminatory information is a critical step in effectively presenting sequence data to predict structural, functional, interaction and expression of proteins, DNAs and RNAs. Also, being able to filter features with significant information and avoid sparsity in the extracted features require the employment of efficient feature selection techniques. Here we present PyFeat as a practical and easy to use toolkit implemented in Python for extracting various features from proteins, DNAs and RNAs. To build PyFeat we mainly focused on extracting features that capture information about the interaction of neighboring residues to be able to provide more local information. We then employ AdaBoost technique to select features with maximum discriminatory information. In this way, we can significantly reduce the number of extracted features and enable PyFeat to represent the combination of effective features from large neighboring residues. As a result, PyFeat is able to extract features from 13 different techniques and represent context free combination of effective features. The source code for PyFeat standalone toolkit and employed benchmarks with a comprehensive user manual explaining its system and workflow in a step by step manner are publicly available. RESULTS: https://github.com/mrzResearchArena/PyFeat/blob/master/RESULTS.md. AVAILABILITY AND IMPLEMENTATION: Toolkit, source code and manual to use PyFeat: https://github.com/mrzResearchArena/PyFeat/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Software , Sequência de Aminoácidos , DNA , Proteínas , RNA
4.
J Theor Biol ; 496: 110278, 2020 07 07.
Artigo em Inglês | MEDLINE | ID: mdl-32298689

RESUMO

MOTIVATION: Interactions between proteins and peptides influence biological functions. Predicting such bio-molecular interactions can lead to faster disease prevention and help in drug discovery. Experimental methods for determining protein-peptide binding sites are costly and time-consuming. Therefore, computational methods have become prevalent. However, existing models show extremely low detection rates of actual peptide binding sites in proteins. To address this problem, we employed a two-stage technique - first, we extracted the relevant features from protein sequences and transformed them into images applying a novel method and then, we applied a convolutional neural network to identify the peptide binding sites in proteins. RESULTS: We found that our approach achieves 67% sensitivity or recall (true positive rate) surpassing existing methods by over 35%.


Assuntos
Redes Neurais de Computação , Proteínas , Sítios de Ligação , Peptídeos/metabolismo , Ligação Proteica
5.
BMC Bioinformatics ; 19(Suppl 13): 547, 2019 Feb 04.
Artigo em Inglês | MEDLINE | ID: mdl-30717650

RESUMO

BACKGROUND: Glycation is a one of the post-translational modifications (PTM) where sugar molecules and residues in protein sequences are covalently bonded. It has become one of the clinically important PTM in recent times attributed to many chronic and age related complications. Being a non-enzymatic reaction, it is a great challenge when it comes to its prediction due to the lack of significant bias in the sequence motifs. RESULTS: We developed a classifier, GlyStruct based on support vector machine, to predict glycated and non-glycated lysine residues using structural properties of amino acid residues. The features used were secondary structure, accessible surface area and the local backbone torsion angles. For this work, a benchmark dataset was extracted containing 235 glycated and 303 non-glycated lysine residues. GlyStruct demonstrated improved performance of approximately 10% in comparison to benchmark method of Gly-PseAAC. The performance for GlyStruct on the metrics, sensitivity, specificity, accuracy and Mathew's correlation coefficient were 0.7013, 0.7989, 0.7562, and 0.5065, respectively for 10-fold cross-validation. CONCLUSION: Glycation has emerged to be one of the clinically important PTM of proteins in recent times. Therefore, the development of computational tools become necessary to predict glycation, which could help medical professionals administer drugs and manage patients more effectively. The proposed predictor manages to classify glycated and non-glycated lysine residues with promising results consistently on various cross-validation schemes and outperforms other state of the art methods.


Assuntos
Algoritmos , Aminoácidos/química , Biologia Computacional/métodos , Sequência de Aminoácidos , Área Sob a Curva , Benchmarking , Glicosilação , Humanos , Peptídeos/química , Máquina de Vetores de Suporte
6.
BMC Genomics ; 19(Suppl 9): 984, 2019 Apr 18.
Artigo em Inglês | MEDLINE | ID: mdl-30999859

RESUMO

BACKGROUND: Post-translational modification (PTM), which is a biological process, tends to modify proteome that leads to changes in normal cell biology and pathogenesis. In the recent times, there has been many reported PTMs. Out of the many modifications, phosphoglycerylation has become particularly the subject of interest. The experimental procedure for identification of phosphoglycerylated residues continues to be an expensive, inefficient and time-consuming effort, even with a large number of proteins that are sequenced in the post-genomic period. Computational methods are therefore being anticipated in order to effectively predict phosphoglycerylated lysines. Even though there are predictors available, the ability to detect phosphoglycerylated lysine residues still remains inadequate. RESULTS: We have introduced a new predictor in this paper named EvolStruct-Phogly that uses structural and evolutionary information relating to amino acids to predict phosphoglycerylated lysine residues. Benchmarked data is employed containing experimentally identified phosphoglycerylated and non-phosphoglycerylated lysines. We have then extracted the three structural information which are accessible surface area of amino acids, backbone torsion angles, amino acid's local structure conformations and profile bigrams of position-specific scoring matrices. CONCLUSION: EvolStruct-Phogly showed a noteworthy improvement in regards to the performance when compared with the previous predictors. The performance metrics obtained are as follows: sensitivity 0.7744, specificity 0.8533, precision 0.7368, accuracy 0.8275, and Mathews correlation coefficient of 0.6242. The software package and data of this work can be obtained from https://github.com/abelavit/EvolStruct-Phogly or www.alok-ai-lab.com.


Assuntos
Biologia Computacional/métodos , Ácidos Difosfoglicéricos/química , Evolução Molecular , Processamento de Proteína Pós-Traducional , Proteínas/química , Algoritmos , Sítios de Ligação , Ácidos Difosfoglicéricos/metabolismo , Humanos , Lisina/química , Lisina/metabolismo , Proteínas/metabolismo , Análise de Sequência de Proteína , Software , Máquina de Vetores de Suporte
7.
BMC Genomics ; 19(Suppl 9): 982, 2019 Apr 18.
Artigo em Inglês | MEDLINE | ID: mdl-30999862

RESUMO

BACKGROUND: Post-translational modifications are viewed as an important mechanism for controlling protein function and are believed to be involved in multiple important diseases. However, their profiling using laboratory-based techniques remain challenging. Therefore, making the development of accurate computational methods to predict post-translational modifications is particularly important for making progress in this area of research. RESULTS: This work explores the use of four half-sphere exposure-based features for computational prediction of sumoylation sites. Unlike most of the previously proposed approaches, which focused on patterns of amino acid co-occurrence, we were able to demonstrate that protein structural based features could be sufficiently informative to achieve good predictive performance. The evaluation of our method has demonstrated high sensitivity (0.9), accuracy (0.89) and Matthew's correlation coefficient (0.78-0.79). We have compared these results to the recently released pSumo-CD method and were able to demonstrate better performance of our method on the same evaluation dataset. CONCLUSIONS: The proposed predictor HseSUMO uses half-sphere exposures of amino acids to predict sumoylation sites. It has shown promising results on a benchmark dataset when compared with the state-of-the-art method. The extracted data of this study can be accessed at https://github.com/YosvanyLopez/HseSUMO .


Assuntos
Algoritmos , Aminoácidos/química , Biologia Computacional/métodos , Proteínas/química , Proteínas/metabolismo , Sumoilação , Sítios de Ligação , Humanos , Máquina de Vetores de Suporte
8.
J Theor Biol ; 464: 1-8, 2019 03 07.
Artigo em Inglês | MEDLINE | ID: mdl-30578798

RESUMO

Drug target interaction prediction is a very labor-intensive and expensive experimental process which has motivated researchers to focus on in silico prediction to provide information on potential interaction. In recent years, researchers have proposed several computational approaches for predicting new drug target interactions. In this paper, we present CFSBoost, a simple and computationally cheap ensemble boosting classification model for identification and prediction of drug-target interactions using evolutionary and structural features. CFSBoost uses a simple yet novel feature group selection procedure which allows the model to be computationally very cheap while being able to achieve state of the art performance. The ensemble model uses extra tree as weak learners inside a boosting scheme while holding on to the best model per iteration. We tested our method of four benchmark datasets, which are also referred as gold standard datasets. Our method was able to achieve better score in terms of area under receiver operating characteristic (auROC) curve on 2 out of the 4 datasets. It was also able to achieve higher area under precision recall (auPR) curve on 3 out of the 4 datasets. It has been argued by researchers that auPR metric is more suitable than auROC for comparison of performance on imbalanced datasets such our benchmark datasets. Our reported result shows that, despite of its simplicity in design, CFSBoost's performance is very satisfactory comparing to other literatures. We also provide 5 new possible interactions for each dataset based on CFSBoost's prediction score.


Assuntos
Algoritmos , Biologia Computacional , Simulação por Computador , Descoberta de Drogas , Modelos Químicos , Humanos
9.
Proteins ; 86(7): 777-789, 2018 07.
Artigo em Inglês | MEDLINE | ID: mdl-29675975

RESUMO

Glycation is chemical reaction by which sugar molecule bonds with a protein without the help of enzymes. This is often cause to many diseases and therefore the knowledge about glycation is very important. In this paper, we present iProtGly-SS, a protein lysine glycation site identification method based on features extracted from sequence and secondary structural information. In the experiments, we found the best feature groups combination: Amino Acid Composition, Secondary Structure Motifs, and Polarity. We used support vector machine classifier to train our model and used an optimal set of features using a group based forward feature selection technique. On standard benchmark datasets, our method is able to significantly outperform existing methods for glycation prediction. A web server for iProtGly-SS is implemented and publicly available to use: http://brl.uiu.ac.bd/iprotgly-ss/.


Assuntos
Lisina/química , Bases de Dados de Proteínas , Estrutura Secundária de Proteína , Análise de Sequência de Proteína , Máquina de Vetores de Suporte
10.
Proteins ; 86(6): 629-633, 2018 06.
Artigo em Inglês | MEDLINE | ID: mdl-29508448

RESUMO

Designing protein sequences that can fold into a given structure is a well-known inverse protein-folding problem. One important characteristic to attain for a protein design program is the ability to recover wild-type sequences given their native backbone structures. The highest average sequence identity accuracy achieved by current protein-design programs in this problem is around 30%, achieved by our previous system, SPIN. SPIN is a program that predicts sequences compatible with a provided structure using a neural network with fragment-based local and energy-based nonlocal profiles. Our new model, SPIN2, uses a deep neural network and additional structural features to improve on SPIN. SPIN2 achieves over 34% in sequence recovery in 10-fold cross-validation and independent tests, a 4% improvement over the previous version. The sequence profiles generated from SPIN2 are expected to be useful for improving existing fold recognition and protein design techniques. SPIN2 is available at http://sparks-lab.org.


Assuntos
Redes Neurais de Computação , Proteínas/química , Software , Sequência de Aminoácidos , Bases de Dados de Proteínas , Modelos Moleculares , Dobramento de Proteína , Estrutura Secundária de Proteína
11.
BMC Genomics ; 19(Suppl 1): 923, 2018 01 19.
Artigo em Inglês | MEDLINE | ID: mdl-29363424

RESUMO

BACKGROUND: Post-translational modification is considered an important biological mechanism with critical impact on the diversification of the proteome. Although a long list of such modifications has been studied, succinylation of lysine residues has recently attracted the interest of the scientific community. The experimental detection of succinylation sites is an expensive process, which consumes a lot of time and resources. Therefore, computational predictors of this covalent modification have emerged as a last resort to tackling lysine succinylation. RESULTS: In this paper, we propose a novel computational predictor called 'Success', which efficiently uses the structural and evolutionary information of amino acids for predicting succinylation sites. To do this, each lysine was described as a vector that combined the above information of surrounding amino acids. We then designed a support vector machine with a radial basis function kernel for discriminating between succinylated and non-succinylated residues. We finally compared the Success predictor with three state-of-the-art predictors in the literature. As a result, our proposed predictor showed a significant improvement over the compared predictors in statistical metrics, such as sensitivity (0.866), accuracy (0.838) and Matthews correlation coefficient (0.677) on a benchmark dataset. CONCLUSIONS: The proposed predictor effectively uses the structural and evolutionary information of the amino acids surrounding a lysine. The bigram feature extraction approach, while retaining the same number of features, facilitates a better description of lysines. A support vector machine with a radial basis function kernel was used to discriminate between modified and unmodified lysines. The aforementioned aspects make the Success predictor outperform three state-of-the-art predictors in succinylation detection.


Assuntos
Algoritmos , Aminoácidos/química , Evolução Molecular , Lisina/química , Processamento de Proteína Pós-Traducional , Ácido Succínico/metabolismo , Sequência de Aminoácidos , Aminoácidos/metabolismo , Biologia Computacional/métodos , Lisina/metabolismo
12.
J Theor Biol ; 443: 138-146, 2018 04 14.
Artigo em Inglês | MEDLINE | ID: mdl-29421211

RESUMO

Determining subcellular localization of proteins is considered as an important step towards understanding their functions. Previous studies have mainly focused solely on Gene Ontology (GO) as the main feature to tackle this problem. However, it was shown that features extracted based on GO is hard to be used for new proteins with unknown GO. At the same time, evolutionary information extracted from Position Specific Scoring Matrix (PSSM) have been shown as another effective features to tackle this problem. Despite tremendous advancement using these sources for feature extraction, this problem still remains unsolved. In this study we propose EvoStruct-Sub which employs predicted structural information in conjunction with evolutionary information extracted directly from the protein sequence to tackle this problem. To do this we use several different feature extraction method that have been shown promising in subcellular localization as well as similar studies to extract effective local and global discriminatory information. We then use Support Vector Machine (SVM) as our classification technique to build EvoStruct-Sub. As a result, we are able to enhance Gram-positive subcellular localization prediction accuracies by up to 5.6% better than previous studies including the studies that used GO for feature extraction.


Assuntos
Proteínas de Bactérias/genética , Biologia Computacional , Bases de Dados de Proteínas , Ontologia Genética , Bactérias Gram-Positivas/genética , Máquina de Vetores de Suporte
13.
Molecules ; 23(12)2018 Dec 10.
Artigo em Inglês | MEDLINE | ID: mdl-30544729

RESUMO

Post Translational Modification (PTM) is defined as the modification of amino acids along the protein sequences after the translation process. These modifications significantly impact on the functioning of proteins. Therefore, having a comprehensive understanding of the underlying mechanism of PTMs turns out to be critical in studying the biological roles of proteins. Among a wide range of PTMs, sumoylation is one of the most important modifications due to its known cellular functions which include transcriptional regulation, protein stability, and protein subcellular localization. Despite its importance, determining sumoylation sites via experimental methods is time-consuming and costly. This has led to a great demand for the development of fast computational methods able to accurately determine sumoylation sites in proteins. In this study, we present a new machine learning-based method for predicting sumoylation sites called SumSec. To do this, we employed the predicted secondary structure of amino acids to extract two types of structural features from neighboring amino acids along the protein sequence which has never been used for this task. As a result, our proposed method is able to enhance the sumoylation site prediction task, outperforming previously proposed methods in the literature. SumSec demonstrated high sensitivity (0.91), accuracy (0.94) and MCC (0.88). The prediction accuracy achieved in this study is 21% better than those reported in previous studies. The script and extracted features are publicly available at: https://github.com/YosvanyLopez/SumSec.


Assuntos
Biologia Computacional/métodos , Proteínas/química , Sequência de Aminoácidos , Sítios de Ligação , Internet , Aprendizado de Máquina , Modelos Moleculares , Estrutura Secundária de Proteína , Proteínas/genética , Sumoilação
14.
Bioinformatics ; 32(6): 843-9, 2016 03 15.
Artigo em Inglês | MEDLINE | ID: mdl-26568622

RESUMO

MOTIVATION: Solvent exposure of amino acid residues of proteins plays an important role in understanding and predicting protein structure, function and interactions. Solvent exposure can be characterized by several measures including solvent accessible surface area (ASA), residue depth (RD) and contact numbers (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced by separating the contacts within upper and down half spheres defined according to the Cα-Cß (HSEß) vector or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to better describe the solvent exposure over ASA, CN and RD in many applications. Thus, a sequence-based prediction is desirable, as most proteins do not have experimentally determined structures. To our best knowledge, there is no method to predict HSEα and only one method to predict HSEß. RESULTS: This study developed a novel method for predicting both HSEα and HSEß (SPIDER-HSE) that achieved a consistent performance for 10-fold cross validation and two independent tests. The correlation coefficients between predicted and measured HSEß (0.73 for upper sphere, 0.69 for down sphere and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) to the stability change by residue mutants than predicted HSEß (0.37) and ASA (0.43). The results, together with its easy Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction. AVAILABILITY AND IMPLEMENTATION: The method is available at http://sparks-lab.org CONTACT: yuedong.yang@griffith.edu.au or yaoqi.zhou@griffith.edu.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Assuntos
Proteínas/química , Sequência de Aminoácidos , Aminoácidos , Modelos Moleculares , Solventes
15.
Anal Biochem ; 527: 24-32, 2017 06 15.
Artigo em Inglês | MEDLINE | ID: mdl-28363440

RESUMO

Post-Translational Modification (PTM) is a biological reaction which contributes to diversify the proteome. Despite many modifications with important roles in cellular activity, lysine succinylation has recently emerged as an important PTM mark. It alters the chemical structure of lysines, leading to remarkable changes in the structure and function of proteins. In contrast to the huge amount of proteins being sequenced in the post-genome era, the experimental detection of succinylated residues remains expensive, inefficient and time-consuming. Therefore, the development of computational tools for accurately predicting succinylated lysines is an urgent necessity. To date, several approaches have been proposed but their sensitivity has been reportedly poor. In this paper, we propose an approach that utilizes structural features of amino acids to improve lysine succinylation prediction. Succinylated and non-succinylated lysines were first retrieved from 670 proteins and characteristics such as accessible surface area, backbone torsion angles and local structure conformations were incorporated. We used the k-nearest neighbors cleaning treatment for dealing with class imbalance and designed a pruned decision tree for classification. Our predictor, referred to as SucStruct (Succinylation using Structural features), proved to significantly improve performance when compared to previous predictors, with sensitivity, accuracy and Mathew's correlation coefficient equal to 0.7334-0.7946, 0.7444-0.7608 and 0.4884-0.5240, respectively.


Assuntos
Aminoácidos/metabolismo , Lisina/metabolismo , Modelos Estatísticos , Processamento de Proteína Pós-Traducional , Proteoma/metabolismo , Ácido Succínico/metabolismo , Algoritmos , Sequência de Aminoácidos , Animais , Humanos , Proteoma/genética , Roedores/genética , Roedores/metabolismo
16.
J Theor Biol ; 435: 229-237, 2017 12 21.
Artigo em Inglês | MEDLINE | ID: mdl-28943403

RESUMO

Bacteriophage proteins are viruses that can significantly impact on the functioning of bacteria and can be used in phage based therapy. The functioning of Bacteriophage in the host bacteria depends on its location in those host cells. It is very important to know the subcellular location of the phage proteins in a host cell in order to understand their working mechanism. In this paper, we propose iPHLoc-ES, a prediction method for subcellular localization of bacteriophage proteins. We aim to solve two problems: discriminating between host located and non-host located phage proteins and discriminating between the locations of host located protein in a host cell (membrane or cytoplasm). To do this, we extract sets of evolutionary and structural features of phage protein and employ Support Vector Machine (SVM) as our classifier. We also use recursive feature elimination (RFE) to reduce the number of features for effective prediction. On standard dataset using standard evaluation criteria, our method significantly outperforms the state-of-the-art predictor. iPHLoc-ES is readily available to use as a standalone tool from: https://github.com/swakkhar/iPHLoc-ES/ and as a web application from: http://brl.uiu.ac.bd/iPHLoc-ES/.


Assuntos
Bacteriófagos/química , Compartimento Celular , Máquina de Vetores de Suporte/normas , Proteínas Virais/metabolismo , Evolução Molecular , Interações Hospedeiro-Patógeno , Espaço Intracelular/virologia , Modelos Biológicos , Proteínas Virais/genética
17.
J Theor Biol ; 425: 97-102, 2017 07 21.
Artigo em Inglês | MEDLINE | ID: mdl-28483566

RESUMO

Post-translational modification (PTM) is a covalent and enzymatic modification of proteins, which contributes to diversify the proteome. Despite many reported PTMs with essential roles in cellular functioning, lysine succinylation has emerged as a subject of particular interest. Because its experimental identification remains a costly and time-consuming process, computational predictors have been recently proposed for tackling this important issue. However, the performance of current predictors is still very limited. In this paper, we propose a new predictor called PSSM-Suc which employs evolutionary information of amino acids for predicting succinylated lysine residues. Here we described each lysine residue in terms of profile bigrams extracted from position specific scoring matrices. We compared the performance of PSSM-Suc to that of existing predictors using a widely used benchmark dataset. PSSM-Suc showed a significant improvement in performance over state-of-the-art predictors. Its sensitivity, accuracy and Matthews correlation coefficient were 0.8159, 0.8199 and 0.6396, respectively.


Assuntos
Biologia Computacional/métodos , Lisina/metabolismo , Matrizes de Pontuação de Posição Específica , Processamento de Proteína Pós-Traducional , Algoritmos , Sequência de Aminoácidos , Aminoácidos/química , Animais , Evolução Molecular , Sensibilidade e Especificidade
18.
J Theor Biol ; 402: 117-28, 2016 08 07.
Artigo em Inglês | MEDLINE | ID: mdl-27164998

RESUMO

Predicting the three-dimensional (3-D) structure of a protein is an important task in the field of bioinformatics and biological sciences. However, directly predicting the 3-D structure from the primary structure is hard to achieve. Therefore, predicting the fold or structural class of a protein sequence is generally used as an intermediate step in determining the protein's 3-D structure. For protein fold recognition (PFR) and structural class prediction (SCP), two steps are required - feature extraction step and classification step. Feature extraction techniques generally utilize syntactical-based information, evolutionary-based information and physicochemical-based information to extract features. In this study, we explore the importance of utilizing the physicochemical properties of amino acids for improving PFR and SCP accuracies. For this, we propose a Forward Consecutive Search (FCS) scheme which aims to strategically select physicochemical attributes that will supplement the existing feature extraction techniques for PFR and SCP. An exhaustive search is conducted on all the existing 544 physicochemical attributes using the proposed FCS scheme and a subset of physicochemical attributes is identified. Features extracted from these selected attributes are then combined with existing syntactical-based and evolutionary-based features, to show an improvement in the recognition and prediction performance on benchmark datasets.


Assuntos
Aminoácidos/química , Fenômenos Químicos , Reconhecimento Automatizado de Padrão/métodos , Dobramento de Proteína , Proteínas/química , Bases de Dados de Proteínas , Matrizes de Pontuação de Posição Específica , Reprodutibilidade dos Testes
19.
J Theor Biol ; 393: 67-74, 2016 Mar 21.
Artigo em Inglês | MEDLINE | ID: mdl-26801876

RESUMO

Detecting three dimensional structures of protein sequences is a challenging task in biological sciences. For this purpose, protein fold recognition has been utilized as an intermediate step which helps in classifying a novel protein sequence into one of its folds. The process of protein fold recognition encompasses feature extraction of protein sequences and feature identification through suitable classifiers. Several feature extractors are developed to retrieve useful information from protein sequences. These features are generally extracted by constituting protein's sequential, physicochemical and evolutionary properties. The performance in terms of recognition accuracy has also been gradually improved over the last decade. However, it is yet to reach a well reasonable and accepted level. In this work, we first applied HMM-HMM alignment of protein sequence from HHblits to extract profile HMM (PHMM) matrix. Then we computed the distance between respective PHMM matrices using kernalized dynamic programming. We have recorded significant improvement in fold recognition over the state-of-the-art feature extractors. The improvement of recognition accuracy is in the range of 2.7-11.6% when experimented on three benchmark datasets from Structural Classification of Proteins.


Assuntos
Cadeias de Markov , Proteínas/química , Alinhamento de Sequência/métodos , Bases de Dados de Proteínas , Estrutura Secundária de Proteína , Reprodutibilidade dos Testes , Máquina de Vetores de Suporte
20.
BMC Bioinformatics ; 16 Suppl 4: S1, 2015.
Artigo em Inglês | MEDLINE | ID: mdl-25734546

RESUMO

BACKGROUND: The functioning of a protein relies on its location in the cell. Therefore, predicting protein subcellular localization is an important step towards protein function prediction. Recent studies have shown that relying on Gene Ontology (GO) for feature extraction can improve the prediction performance. However, for newly sequenced proteins, the GO is not available. Therefore, for these cases, the prediction performance of GO based methods degrade significantly. RESULTS: In this study, we develop a method to effectively employ physicochemical and evolutionary-based information in the protein sequence. To do this, we propose segmentation based feature extraction method to explore potential discriminatory information based on physicochemical properties of the amino acids to tackle Gram-positive and Gram-negative subcellular localization. We explore our proposed feature extraction techniques using 10 attributes that have been experimentally selected among a wide range of physicochemical attributes. Finally by applying the Rotation Forest classification technique to our extracted features, we enhance Gram-positive and Gram-negative subcellular localization accuracies up to 3.4% better than previous studies which used GO for feature extraction. CONCLUSION: By proposing segmentation based feature extraction method to explore potential discriminatory information based on physicochemical properties of the amino acids as well as using Rotation Forest classification technique, we are able to enhance the Gram-positive and Gram-negative subcellular localization prediction accuracies, significantly.


Assuntos
Proteínas de Bactérias/metabolismo , Biologia Computacional/métodos , Bactérias Gram-Negativas/isolamento & purificação , Bactérias Gram-Positivas/isolamento & purificação , Análise de Sequência de Proteína/métodos , Algoritmos , Análise por Conglomerados , Bases de Dados de Proteínas , Bactérias Gram-Negativas/metabolismo , Bactérias Gram-Positivas/metabolismo , Modelos Estatísticos , Valor Preditivo dos Testes , Frações Subcelulares
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA