1.
Math Biosci Eng ; 19(3): 2381-2402, 2022 01 04.
Article in English | MEDLINE | ID: mdl-35240789

ABSTRACT

Myocarditis is an inflammation of the middle layer of the heart wall, caused by a viral infection, that can affect the heart muscle and its electrical system. It remains one of the most challenging diagnoses in cardiology. Myocarditis is the prime cause of unexpected death in approximately 20% of adults under 40 years of age. Cardiac MRI (CMR) is considered a noninvasive, gold-standard diagnostic tool for suspected myocarditis and plays an indispensable role in diagnosing various cardiac diseases. However, the performance of CMR depends heavily on the clinical presentation and on features such as chest pain, arrhythmia, and heart failure. In addition, imaging factors such as artifacts, technical errors, pulse sequence, acquisition parameters, contrast agent dose, and, more importantly, qualitative visual interpretation can affect the diagnosis. This paper introduces a new deep learning-based model called Convolutional Neural Network-Clustering (CNN-KCL) to diagnose myocarditis. In this study, we used 47 subjects with a total of 98,898 images. Our results demonstrate that the proposed method achieves an accuracy of 97.41% under 10-fold cross-validation with 4 clusters for the diagnosis of myocarditis. To the best of our knowledge, this research is the first to use deep learning algorithms for the diagnosis of myocarditis.


Subjects
Myocarditis , Adult , Algorithms , Cluster Analysis , Humans , Magnetic Resonance Imaging , Myocarditis/diagnostic imaging , Computer Neural Networks
2.
Sci Rep ; 11(1): 23676, 2021 12 08.
Article in English | MEDLINE | ID: mdl-34880291

ABSTRACT

Although advancing therapeutic alternatives for treating deadly cancers has gained much attention globally, primary methods such as chemotherapy still have significant downsides and low specificity. Recently, anticancer peptides (ACPs) have emerged as a potential alternative therapy with far fewer negative side effects. However, the identification of ACPs through wet-lab experiments is expensive and time-consuming. Hence, computational methods have emerged as viable alternatives. During the past few years, several computational ACP identification techniques using hand-engineered features have been proposed to solve this problem. In this study, we propose a new multi-headed deep convolutional neural network model, called ACP-MHCNN, for extracting and combining discriminative features from different information sources in an interactive way. Our model extracts sequence, physicochemical, and evolutionary-based features for ACP identification using different numerical peptide representations while restraining parameter overhead. Rigorous experiments using cross-validation and an independent dataset show that ACP-MHCNN outperforms other models for anticancer peptide identification by a substantial margin on our employed benchmarks. ACP-MHCNN outperforms the state-of-the-art model by 6.3%, 8.6%, 3.7%, 4.0%, and 0.20 in terms of accuracy, sensitivity, specificity, precision, and MCC, respectively. ACP-MHCNN and its relevant code and datasets are publicly available at: https://github.com/mrzResearchArena/Anticancer-Peptides-CNN . ACP-MHCNN is also publicly available as an online predictor at: https://anticancer.pythonanywhere.com/ .


Subjects
Antineoplastic Agents/chemistry , Antineoplastic Agents/pharmacology , Computational Biology/methods , Deep Learning , Drug Discovery/methods , Computer Neural Networks , Peptides/chemistry , Peptides/pharmacology , Algorithms , Amino Acid Sequence , Chemical Phenomena , Humans , ROC Curve , Reproducibility of Results
3.
J Neurosci Methods ; 364: 109373, 2021 12 01.
Article in English | MEDLINE | ID: mdl-34606773

ABSTRACT

BACKGROUND: The classification of motor imagery electroencephalogram (MI-EEG) is a pivotal task in the biosignal classification process in the brain-computer interface (BCI) applications. Currently, this bio-engineering-based technology is being employed by researchers in various fields to develop cutting-edge applications. The classification of real-time MI-EEG signals is the most challenging task in these applications. The prediction performance of the existing classification methods is still limited due to the high dimensionality and dynamic behaviors of the real-time EEG data. PROPOSED METHOD: To enhance the classification performance of real-time BCI applications, this paper presents a new clustering-based ensemble technique called CluSem to mitigate this problem. We also develop a new brain game called CluGame using this method to evaluate the classification performance of real-time motor imagery movements. In this game, real-time EEG signal classification and prediction tabulation through animated balls are controlled via threads. By playing this game, users can control the movements of the balls via the brain signals of motor imagery movements without using any traditional input devices. RESULTS: Our results demonstrate that CluSem is able to improve the classification accuracy between 5% and 15% compared to the existing methods on our collected as well as the publicly available EEG datasets. The source codes used to implement CluSem and CluGame are publicly available at https://github.com/MdOchiuddinMiah/MI-BCI_ML.


Subjects
Brain-Computer Interfaces , Algorithms , Cluster Analysis , Electroencephalography , Imagination , Movement
5.
Cancers (Basel) ; 13(17)2021 Aug 30.
Article in English | MEDLINE | ID: mdl-34503185

ABSTRACT

It is now known that at least 10% of pancreatic cancer (PC) samples contain a causative mutation in the known susceptibility genes, suggesting the importance of identifying cancer-associated genes that carry the causative mutations in high-risk individuals for early detection of PC. In this study, we develop a statistical pipeline using a new concept, called gene-motif, that utilizes both mutated genes and mutational processes to identify 4211 3-nucleotide PC-associated gene-motifs within 203 significantly mutated genes in PC. Using these gene-motifs as distinguishing features for pancreatic cancer subtyping identifies five PC subtypes with distinguishable phenotypes and genotypes. Our comprehensive biological characterization reveals that these PC subtypes are associated with different molecular mechanisms, including unique cancer-related signaling pathways, and that targeted treatment options are currently available for most of the subtypes. Some of the pathways we identified in all five PC subtypes, including the cell cycle and axon guidance pathways, are frequently seen and mutated in cancer. We also identified the protein kinase C, EGFR (epidermal growth factor receptor), and P53 signaling pathways as potential targets for treatment of the PC subtypes. Altogether, our results uncover the importance of considering both the mutation type and the mutated genes in the identification of cancer subtypes and biomarkers.

6.
ACS Omega ; 6(18): 12306-12317, 2021 May 11.
Article in English | MEDLINE | ID: mdl-34056383

ABSTRACT

Toxicity prediction using quantitative structure-activity relationship has achieved significant progress in recent years. However, most existing machine learning methods in toxicity prediction utilize only one type of feature representation and one type of neural network, which essentially restricts their performance. Moreover, methods that use more than one type of feature representation struggle with the aggregation of information captured within the features since they use predetermined aggregation formulas. In this paper, we propose a deep learning framework for quantitative toxicity prediction using five individual base deep learning models and their own base feature representations. We then propose to adopt a meta ensemble approach using another separate deep learning model to perform aggregation of the outputs of the individual base deep learning models. We train our deep learning models in a weighted multitask fashion combining four quantitative toxicity data sets of LD50, IGC50, LC50, and LC50-DM and minimizing the root-mean-square errors. Compared to the current state-of-the-art toxicity prediction method TopTox on LD50, IGC50, and LC50-DM, that is, three out of four data sets, our method, respectively, obtains 5.46, 16.67, and 6.34% better root-mean-square errors, 6.41, 11.80, and 12.16% better mean absolute errors, and 5.21, 7.36, and 2.54% better coefficients of determination. We named our method QuantitativeTox, and our implementation is available from the GitHub repository https://github.com/Abdulk084/QuantitativeTox.
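
The three error measures reported in the abstract above (root-mean-square error, mean absolute error, and coefficient of determination) can be computed as in the following minimal sketch. The function name and the toy values are illustrative assumptions and are not taken from the QuantitativeTox repository.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)                  # root-mean-square error
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when all true values are identical.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return rmse, mae, r2


# Toy values invented for illustration only.
rmse, mae, r2 = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
```

Lower RMSE and MAE indicate a better fit, while a higher coefficient of determination does.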

7.
Genes (Basel) ; 12(2)2021 01 27.
Article in English | MEDLINE | ID: mdl-33514039

ABSTRACT

Bioinformatics and computational biology have significantly contributed to the generation of vast and important knowledge that can lead to great improvements and advancements in biology and its related fields. Over the past three decades, a wide range of tools and methods have been developed and proposed to enhance performance, diagnosis, and throughput while maintaining feasibility and convenience for users. Here, we propose a new user-friendly comprehensive tool called VIRMOTIF to analyze DNA sequences. VIRMOTIF brings different tools together as one package so that users can perform their analysis as a whole and in one place. VIRMOTIF is able to complete different tasks, including computing the number or probability of motifs appearing in DNA sequences, visualizing data using the matplotlib and heatmap libraries, and clustering data using four different methods, namely K-means, PCA, Mean Shift, and ClusterMap. VIRMOTIF is the only tool with the ability to analyze genomic motifs based on their frequency and representation (D-ratio) in a virus genome.


Subjects
Computational Biology/methods , Viral Genome , DNA Sequence Analysis , Software , Algorithms , Cluster Analysis , Genetic Databases , Genetic Variation , Nucleotide Motifs , DNA Sequence Analysis/methods , User-Computer Interface
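
The first task listed in the VIRMOTIF abstract above, computing the number or probability of a motif in a DNA sequence, can be sketched as follows. This is only an illustration of the general idea: the function names are hypothetical, overlapping matches are counted by assumption, and the D-ratio is not implemented because the abstract does not define it.

```python
def count_motif(sequence, motif):
    """Count occurrences of motif in sequence, allowing overlapping matches."""
    sequence, motif = sequence.upper(), motif.upper()
    if not motif:
        raise ValueError("motif must be non-empty")
    width = len(motif)
    return sum(
        1
        for start in range(len(sequence) - width + 1)
        if sequence[start:start + width] == motif
    )


def motif_frequency(sequence, motif):
    """Fraction of possible start positions at which the motif occurs."""
    windows = len(sequence) - len(motif) + 1
    return count_motif(sequence, motif) / windows if windows > 0 else 0.0
```

For example, `count_motif("ATATAT", "ATA")` counts the matches starting at positions 0 and 2.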
8.
J Biomed Inform ; 113: 103627, 2021 01.
Article in English | MEDLINE | ID: mdl-33259944

ABSTRACT

In the last few years, the application of Machine Learning approaches such as Deep Neural Network (DNN) models has become more attractive in the healthcare system, given the rising complexity of healthcare data. Machine Learning (ML) algorithms provide efficient and effective data analysis models to uncover hidden patterns and other meaningful information from the considerable amount of health data that conventional analytics cannot discover in a reasonable time. In particular, Deep Learning (DL) techniques have shown promise for pattern recognition in healthcare systems. Motivated by this consideration, this paper investigates the deep learning approaches applied to healthcare systems by reviewing cutting-edge network architectures, applications, and industrial trends. The goal is, first, to provide extensive insight into the application of deep learning models in healthcare solutions in order to bridge deep learning techniques and human healthcare interpretability, and then to present the existing open challenges and future directions.


Subjects
Deep Learning , Algorithms , Delivery of Health Care , Humans , Machine Learning , Computer Neural Networks
9.
IEEE Access ; 8: 77888-77902, 2020.
Article in English | MEDLINE | ID: mdl-33354488

ABSTRACT

Post-Translational Modification (PTM) is considered an important biological process with a tremendous impact on the function of proteins in both eukaryotic and prokaryotic cells. During the past decades, a wide range of PTMs has been identified. Among them, malonylation is a recently identified PTM that plays a vital role in a wide range of biological interactions. This modification also plays a potential role in energy metabolism in different species, including Homo sapiens. The identification of PTM sites using experimental methods is time-consuming and costly. Hence, there is a demand for fast and cost-effective computational methods. In this study, we propose a new machine learning method, called Mal-Light, to address this problem. To build this model, we extract local evolutionary-based information according to the interaction of neighboring amino acids using a bi-peptide-based method. We then use Light Gradient Boosting (LightGBM) as our classifier to predict malonylation sites. Our results demonstrate that Mal-Light is able to significantly improve malonylation site prediction performance compared to previous studies found in the literature. Using Mal-Light, we achieve a Matthews correlation coefficient (MCC) of 0.74 and 0.60, accuracy of 86.66% and 79.51%, sensitivity of 78.26% and 67.27%, and specificity of 95.05% and 91.75% for Homo sapiens and Mus musculus proteins, respectively. Mal-Light is implemented as an online predictor which is publicly available at: (http://brl.uiu.ac.bd/MalLight/).

10.
Comput Struct Biotechnol J ; 18: 3528-3538, 2020.
Article in English | MEDLINE | ID: mdl-33304452

ABSTRACT

RNA modification is an essential step towards the generation of new RNA structures. Such modification can potentially alter RNA function or stability. Among different modifications, 5-hydroxymethylcytosine (5hmC) modification of RNA exhibits significant potential for a series of biological processes. Understanding the distribution of 5hmC in RNA is essential to determine its biological functionality. Although conventional sequencing techniques allow broad identification of 5hmC, they are both time-consuming and resource-intensive. In this study, we propose a new computational tool called iRNA5hmC-PS to tackle this problem. To build iRNA5hmC-PS, we extract a set of novel sequence-based features called Position-Specific Gapped k-mer (PSG k-mer) to obtain maximum sequential information. Our feature analysis shows that our proposed PSG k-mer features contain vital information for the identification of 5hmC sites. We also use a group-wise feature importance calculation strategy to select a small subset of features containing maximum discriminative information. Our experimental results demonstrate that iRNA5hmC-PS is able to enhance the prediction performance dramatically. iRNA5hmC-PS achieves 78.3% prediction performance, which is 12.8% better than that reported in previous studies. iRNA5hmC-PS is publicly available as an online tool at http://103.109.52.8:81/iRNA5hmC-PS. Its benchmark dataset, source code, and documentation are available at https://github.com/zahid6454/iRNA5hmC-PS.

11.
Genes (Basel) ; 11(12)2020 11 28.
Article in English | MEDLINE | ID: mdl-33260770

ABSTRACT

Post-translational modification (PTM) is a critical biological reaction that adds to the diversification of the proteome. Among the numerous known modifications being studied, pupylation has gained attention in the scientific community due to its significant role in regulating biological processes. The traditional experimental practice for detecting pupylation sites has proved expensive and requires a lot of time and resources. Thus, many computational predictors have been developed to address this issue. However, their performance is still limited. In this study, we propose another computational method, named PupStruct, which uses the structural information of amino acids with a radial basis function kernel Support Vector Machine (SVM) to predict pupylated lysine residues. We compared PupStruct with three state-of-the-art predictors from the literature; PupStruct showed a significant improvement in performance over them in statistical metrics such as sensitivity (0.9234), specificity (0.9359), accuracy (0.9296), precision (0.9349), and Matthews correlation coefficient (0.8616) on a benchmark dataset.


Subjects
Computational Biology , Protein Databases , Lysine , Post-Translational Protein Processing , Proteome , Support Vector Machine , Proteome/chemistry , Proteome/genetics
12.
Sci Rep ; 10(1): 19430, 2020 11 10.
Article in English | MEDLINE | ID: mdl-33173130

ABSTRACT

Protein structure prediction is a grand challenge. Prediction of protein structures via representations using backbone dihedral angles has recently achieved significant progress, along with the ongoing surge of deep neural network (DNN) research in general. However, we observe that in protein backbone angle prediction research there is an overall trend to employ more and more complex neural networks and then to feed them more and more features. While more features might add predictive power to the neural network, we argue that redundant features could instead clutter the picture, so that the more complex neural networks merely counterbalance the noise. From artificial intelligence and machine learning perspectives, problem representations and solution approaches interact and thus affect performance. We also argue that comparatively simpler predictors can be reconstructed more easily than more complex ones. With these arguments in mind, we present a deep learning method named Simpler Angle Predictor (SAP) to train simpler DNN models that enhance protein backbone angle prediction. We then empirically show that SAP can significantly outperform existing state-of-the-art methods on well-known benchmark datasets: for some types of angles, the differences are 6-8 in terms of mean absolute error (MAE). The SAP program, along with its data, is available from the website https://gitlab.com/mahnewton/sap .


Subjects
Liver/drug effects , Liver/metabolism , Animals , Apoptosis/drug effects , High-Fat Diet/adverse effects , Dipeptidyl-Peptidase IV Inhibitors/therapeutic use , Hep G2 Cells , Hepatocytes/drug effects , Hepatocytes/metabolism , Humans , In Situ Nick-End Labeling , Male , Mice , Inbred C57BL Mice , Computer Neural Networks , TNF-Related Apoptosis-Inducing Ligand Receptors/metabolism
13.
Comput Biol Med ; 125: 104022, 2020 10.
Article in English | MEDLINE | ID: mdl-33022522

ABSTRACT

Post-Translational Modification (PTM) is a vital process which plays an important role in a wide range of biological interactions. One of the most recently identified PTMs is malonylation. It has been shown that malonylation has an important impact on different biological pathways, including glucose and fatty acid metabolism. Malonylation can be detected experimentally using mass spectrometry. However, this process is both costly and time-consuming, which has inspired research into faster and more efficient computational methods to solve this problem. This paper proposes a novel approach, called SEMal, to identify malonylation sites in protein sequences. It uses both structural and evolutionary-based features to solve this problem. It also uses Rotation Forest (RoF) as its classification technique to predict malonylation sites. To the best of our knowledge, neither our extracted features nor our employed classifier has previously been used for this problem. SEMal outperforms the previously proposed methods in all metrics, such as sensitivity (0.94 and 0.89), accuracy (0.94 and 0.91), and Matthews correlation coefficient (0.88 and 0.82), for Homo sapiens and Mus musculus species, respectively. SEMal is publicly available as an online predictor at: http://brl.uiu.ac.bd/SEMal/.


Subjects
Lysine , Post-Translational Protein Processing , Amino Acid Sequence , Animals , Biological Evolution , Humans , Lysine/metabolism , Mice
14.
Genes (Basel) ; 11(9)2020 08 31.
Article in English | MEDLINE | ID: mdl-32878321

ABSTRACT

Post-Translational Modification (PTM) is defined as the alteration of a protein sequence upon interaction with different macromolecules after the translation process. Glutarylation is considered one of the most important PTMs and is associated with a wide range of cellular functions, including metabolism, translation, and specified separate subcellular localizations. During the past few years, a wide range of computational approaches has been proposed to predict glutarylation sites. However, despite all the efforts made so far, the prediction performance for glutarylation sites has remained limited. One of the main challenges in tackling this problem is to extract features with significant discriminatory information. To address this issue, we propose a new machine learning method called BiPepGlut, using the concept of a bi-peptide-based evolutionary method for feature extraction. To build this model, we also use the Extra-Trees (ET) classifier, which, to the best of our knowledge, has never been used for this task. Our results demonstrate that BiPepGlut is able to significantly outperform previously proposed models for this problem. BiPepGlut achieves 92.0%, 84.8%, 95.6%, 0.82, and 0.88 in accuracy, sensitivity, specificity, Matthews correlation coefficient, and F1-score, respectively. BiPepGlut is implemented as a publicly available online predictor.


Subjects
Molecular Evolution , Glutarates/chemistry , Lysine/chemistry , Mycobacterium tuberculosis/metabolism , Peptide Fragments/chemistry , Post-Translational Protein Processing , Proteins/chemistry , Algorithms , Amino Acid Sequence , Animals , Computational Biology , Glutarates/metabolism , Lysine/metabolism , Machine Learning , Mice , Mycobacterium tuberculosis/growth & development , Peptide Fragments/metabolism , Proteins/metabolism , Support Vector Machine
15.
Comput Biol Chem ; 87: 107235, 2020 Feb 19.
Article in English | MEDLINE | ID: mdl-32604027

ABSTRACT

Post-translational modifications are considered important molecular interactions in protein science. One of these modifications is "sumoylation", whose computational detection has recently become a challenge. In this paper, we propose a new computational predictor which makes use of the sine and cosine of backbone torsion angles and the accessible surface area for predicting sumoylation sites. These features were computed for all the proteins in our benchmark dataset, and a training matrix consisting of sumoylation and non-sumoylation sites was ultimately created. This training matrix was balanced by undersampling the majority class (non-sumoylation sites) using the NearMiss method. Finally, an AdaBoost classifier was used for discriminating between sumoylation and non-sumoylation sites. Our predictor was called "C-iSumo" because of its effective use of circular functions. C-iSumo was compared with another predictor, which it outperformed in statistical metrics such as sensitivity (0.734), accuracy (0.746), and Matthews correlation coefficient (0.494).
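
The use of the sine and cosine of torsion angles mentioned in the abstract above can be illustrated with a short sketch. Angles are periodic, so two angles such as 179° and -179° are numerically far apart but geometrically close; mapping each angle to a (sin, cos) pair removes that discontinuity. The function names below are hypothetical, and this is not the C-iSumo feature pipeline itself.

```python
import math


def encode_torsion(angle_degrees):
    """Map a torsion angle in degrees to its (sin, cos) pair."""
    radians = math.radians(angle_degrees)
    return math.sin(radians), math.cos(radians)


def feature_distance(a_degrees, b_degrees):
    """Euclidean distance between two angles in (sin, cos) space."""
    sin_a, cos_a = encode_torsion(a_degrees)
    sin_b, cos_b = encode_torsion(b_degrees)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# 179 and -179 degrees differ by 358 as raw numbers,
# yet they are only 2 degrees apart on the circle.
near = feature_distance(179.0, -179.0)
far = feature_distance(179.0, 0.0)
```

In this encoding, `near` is much smaller than `far`, which matches the geometry of the angles.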

16.
Math Biosci Eng ; 17(3): 2193-2217, 2020 01 13.
Article in English | MEDLINE | ID: mdl-32233531

ABSTRACT

Modern next-generation sequencing technologies produce huge amounts of genome-wide data that allow researchers to gain a deeper understanding of the genomics of organisms. Despite these huge amounts of data, our understanding of transcriptional regulatory networks is still incomplete. Conformation-dependent chromosome interaction mapping technologies (Hi-C) have enabled us to detect elements in the genome which interact with each other and regulate genes. Summarizing these interactions as a data network leads to the investigation of the most important properties of the 3D genome structure, such as gene co-expression networks. In this work, a Pareto-based multi-objective optimization algorithm is proposed to detect the co-expressed genomic regions in Hi-C interactions. The proposed method uses fixed-size genomic regions as the vertices of the graph. The number of reads between two interacting genomic regions gives the weight of each edge. The performance of our proposed algorithm was compared to the Multi-Objective PSO algorithm on five networks derived from cis genomic interactions in three Hi-C datasets (GM12878, CD34+ and ESCs). The experimental results show that our proposed algorithm outperforms the Multi-Objective PSO technique in the identification of co-interacting genomic regions.


Subjects
Gene Regulatory Networks , Genomics , Algorithms , Chromosomes
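
The graph construction described in the abstract above (fixed-size genomic regions as vertices, read counts as edge weights) can be sketched as follows. The function name, the input format of position pairs on one chromosome, and the choice to skip reads that fall within a single bin are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import defaultdict


def build_interaction_graph(read_pairs, bin_size):
    """Build a weighted graph from same-chromosome read-pair positions.

    read_pairs: iterable of (position_a, position_b) in base pairs.
    Returns {(bin_i, bin_j): read_count} with bin_i < bin_j.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    weights = defaultdict(int)
    for pos_a, pos_b in read_pairs:
        bin_a, bin_b = pos_a // bin_size, pos_b // bin_size
        if bin_a == bin_b:
            continue  # assumption: ignore reads within a single region
        edge = (min(bin_a, bin_b), max(bin_a, bin_b))
        weights[edge] += 1  # each read adds one unit of edge weight
    return dict(weights)


# Toy positions invented for illustration only.
graph = build_interaction_graph(
    [(100, 2500), (2600, 150), (100, 200), (5000, 100)], bin_size=1000
)
```

Edges are stored with the smaller bin index first so that a read pair contributes to the same undirected edge regardless of the order of its two ends.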
17.
J Theor Biol ; 496: 110278, 2020 07 07.
Article in English | MEDLINE | ID: mdl-32298689

ABSTRACT

MOTIVATION: Interactions between proteins and peptides influence biological functions. Predicting such bio-molecular interactions can lead to faster disease prevention and help in drug discovery. Experimental methods for determining protein-peptide binding sites are costly and time-consuming. Therefore, computational methods have become prevalent. However, existing models show extremely low detection rates of actual peptide binding sites in proteins. To address this problem, we employed a two-stage technique - first, we extracted the relevant features from protein sequences and transformed them into images applying a novel method and then, we applied a convolutional neural network to identify the peptide binding sites in proteins. RESULTS: We found that our approach achieves 67% sensitivity or recall (true positive rate) surpassing existing methods by over 35%.


Subjects
Computer Neural Networks , Proteins , Binding Sites , Peptides/metabolism , Protein Binding
18.
Genes (Basel) ; 11(12)2020 12 20.
Article in English | MEDLINE | ID: mdl-33419274

ABSTRACT

BACKGROUND: Post-translational modification (PTM) is a biological process associated with the modification of the proteome, which results in the alteration of normal cell biology and pathogenesis. There have been numerous PTM reports in recent years, among which lysine phosphoglycerylation has emerged as one of the recent developments. The traditional methods of identifying phosphoglycerylated residues, which are experimental procedures such as mass spectrometry, have been shown to be time-consuming and cost-inefficient, despite the abundance of proteins being sequenced in this post-genomic era. Due to these drawbacks, computational techniques are being sought to establish an effective identification system for phosphoglycerylated lysine residues. This is not the first predictor developed for phosphoglycerylation, but it is necessary because the latest predictor falls short in adequately detecting phosphoglycerylated and non-phosphoglycerylated lysine residues. RESULTS: In this work, we introduce a new predictor named RAM-PGK, which uses sequence-based information relating to amino acid residues to predict phosphoglycerylated and non-phosphoglycerylated sites. A benchmark dataset containing experimentally identified phosphoglycerylated and non-phosphoglycerylated lysine residues was employed for this purpose. From the dataset, we extracted the residue adjacency matrix pertaining to each lysine residue in the protein sequences and converted these matrices into feature vectors, which are used to build the phosphoglycerylation predictor. CONCLUSION: RAM-PGK, which is based on sequential features and support vector machine classifiers, has shown a noteworthy improvement in performance in comparison to some of the recent prediction methods. The performance metrics of the RAM-PGK predictor are: 0.5741 sensitivity, 0.6436 specificity, 0.0531 precision, 0.6414 accuracy, and 0.0824 Matthews correlation coefficient.


Subjects
Datasets as Topic , Glyceric Acids/metabolism , Lysine/metabolism , Post-Translational Protein Processing , Support Vector Machine , Algorithms , Amino Acid Sequence , Lysine/chemistry , ROC Curve , Software
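
The five metrics reported in the abstract above follow the standard confusion-matrix definitions, sketched below. Precision far below sensitivity and specificity, as reported, is a pattern that typically arises when negative sites greatly outnumber positive ones; the abstract does not state the class ratio, so the counts used here are hypothetical and chosen only to reproduce that pattern.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard binary metrics from true/false positive/negative counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Matthews correlation coefficient; defined as 0 if a margin is empty.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical imbalanced counts: 10 positive sites, 990 negative sites.
metrics = classification_metrics(tp=6, tn=640, fp=350, fn=4)
```

With these invented counts, sensitivity and specificity are both around 0.6, yet precision is below 0.02 because the false positives drawn from the large negative class swamp the few true positives.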
19.
PLoS One ; 14(12): e0226115, 2019.
Article in English | MEDLINE | ID: mdl-31825992

ABSTRACT

Identifying disease-causing genes is considered an important step towards drug design and drug discovery. In disease gene identification and classification, the main aim is to identify disease genes, while identifying non-disease genes is of little or no significance. Hence, this task can be defined as a one-class classification problem. Existing machine learning methods typically take known disease genes as the positive training set and unknown genes as negative samples to build a binary classification model. Here we propose a new One-class Classification Support Vector Machines (OCSVM) method to precisely classify candidate disease genes. Our aim is to build a model that concentrates on detecting known disease-causing genes to increase sensitivity and precision. We investigate the impact of our proposed model using a benchmark consisting of the gene expression dataset for Acute Myeloid Leukemia (AML) cancer. Compared with traditional methods, our experimental results show the superiority of our proposed method in terms of precision, recall, and F-measure in detecting disease-causing genes for AML. OCSVM code and our extracted AML benchmark are publicly available at: https://github.com/imandehzangi/OCSVM.


Subjects
Genetic Predisposition to Disease/genetics , Acute Myeloid Leukemia/genetics , Support Vector Machine , Genetic Databases , Humans , Acute Myeloid Leukemia/pathology , User-Computer Interface
20.
BMC Mol Cell Biol ; 20(Suppl 2): 57, 2019 Dec 20.
Article in English | MEDLINE | ID: mdl-31856704

ABSTRACT

BACKGROUND: The biological process known as post-translational modification (PTM) is a process whereby proteomes are modified, which affects normal cell biology and hence pathogenesis. A number of PTMs have been discovered in recent years, and lysine phosphoglycerylation is one of the fairly recent developments. Even with a large number of proteins being sequenced in the post-genomic era, the identification of phosphoglycerylation remains a big challenge due to factors such as the cost, time consumption, and inefficiency involved in the experimental efforts. To overcome this issue, computational techniques have emerged to accurately identify phosphoglycerylated lysine residues. However, the computational techniques proposed so far are limited in their ability to correctly predict this covalent modification. RESULTS: In this paper, we propose a new predictor called Bigram-PGK, which uses evolutionary information of amino acids to predict phosphoglycerylated sites. A benchmark dataset containing experimentally labelled sites is employed for this purpose, and profile bigram occurrences are calculated from position-specific scoring matrices of amino acids in the protein sequences. The statistical measures of this work, namely sensitivity, specificity, precision, accuracy, Matthews correlation coefficient, and area under the ROC curve, are 0.9642, 0.8973, 0.8253, 0.9193, 0.8330, and 0.9306, respectively. CONCLUSIONS: The proposed predictor, based on evolutionary information features and a support vector machine classifier, has shown great potential to effectively predict phosphoglycerylated and non-phosphoglycerylated lysine residues when compared against the existing predictors. The data and software of this work can be acquired from https://github.com/abelavit/Bigram-PGK.


Subjects
Computational Biology/methods , Lysine/metabolism , Post-Translational Protein Processing , Algorithms , Amino Acid Sequence , Glycolysis , Lysine/chemistry , Position-Specific Scoring Matrices , Reproducibility of Results , Software , Support Vector Machine
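
The profile bigram computation mentioned in the abstract above is commonly formulated as B[m][n] = sum over i of P[i][m] * P[i+1][n], where P is a position-specific scoring matrix with one row per residue position. The sketch below follows that common formulation; it is an assumption that Bigram-PGK uses exactly this form, and the abstract does not say how the matrix is normalized or windowed around each lysine. The function name is hypothetical.

```python
def profile_bigram(pssm):
    """Return B with B[m][n] = sum_i pssm[i][m] * pssm[i + 1][n].

    pssm: list of rows, one per sequence position, each row holding one
    score per amino acid type (20 columns for a real PSSM).
    """
    if not pssm:
        return []
    width = len(pssm[0])
    bigram = [[0.0] * width for _ in range(width)]
    # Walk over each pair of consecutive positions in the profile.
    for row, next_row in zip(pssm, pssm[1:]):
        for m in range(width):
            for n in range(width):
                bigram[m][n] += row[m] * next_row[n]
    return bigram


# Toy 3-position, 2-column profile invented for illustration only.
toy = profile_bigram([[1, 0], [0, 1], [1, 0]])
```

For a real 20-column profile the result is a 20 x 20 matrix, which can be flattened into a 400-dimensional feature vector whose length does not depend on the sequence length.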