1.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35929355

ABSTRACT

A newly discovered post-translational modification (PTM), phosphoglycerylation, plays an essential role in the construction and functional properties of proteins and in dangerous human diseases. It is therefore urgent to understand the molecular mechanism behind the phosphoglycerylation process in order to develop drugs for related diseases. However, accurately identifying phosphoglycerylation sites from a protein sequence in the laboratory is a difficult and challenging task, so an efficient computational model is greatly sought for this purpose. Only a small number of computational models are currently available for identifying phosphoglycerylation sites, and their prediction capability has not reached a satisfactory level. Therefore, an effective predictor named PLP_FS has been designed and constructed in this study to identify phosphoglycerylation sites. For training, an optimal feature set was obtained by fusing multiple F_Score feature selection techniques applied to the features generated by three types of sequence-based feature extraction methods, and a support vector machine classifier was fitted to it to build the prediction model. In addition, the k-neighbor near cleaning and SMOTE methods were implemented to balance the benchmark dataset. In 10-fold cross-validation, the suggested model obtained an accuracy of 99.22%, a sensitivity of 98.17% and a specificity of 99.75%, which are better than those of other currently available predictors of phosphoglycerylation sites.


Subject(s)
Lysine , Support Vector Machine , Algorithms , Amino Acid Sequence , Computational Biology/methods , Humans , Lysine/metabolism , Protein Processing, Post-Translational , Proteins/metabolism
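
Illustrative note for record 1: the abstract above does not say which F_Score variants were fused, so the following is only a minimal sketch of the commonly used F-score for a single feature (between-class separation divided by the sum of within-class sample variances). The function names `f_score` and `rank_features` and the toy data are assumptions made for illustration, not taken from PLP_FS.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature, given its values in the positive and negative class.

    Assumes each class has at least two samples (needed for the sample variance).
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    if within == 0:
        # Constant feature inside both classes: perfectly separating or useless.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(x_pos, x_neg):
    """Return (feature indices sorted by decreasing F-score, list of scores)."""
    n_features = len(x_pos[0])
    scores = [
        f_score([row[j] for row in x_pos], [row[j] for row in x_neg])
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
x_pos = [[1, 1], [2, 2], [3, 3]]
x_neg = [[7, 1], [8, 2], [9, 3]]
order, scores = rank_features(x_pos, x_neg)
print(order, scores)
```

A ranking like this is typically followed by keeping the top-scoring features before training a classifier; how the study combined several such rankings is not stated in the abstract.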
2.
Anal Biochem ; 650: 114707, 2022 08 01.
Article in English | MEDLINE | ID: mdl-35568159

ABSTRACT

Cancer is one of the most dangerous diseases in the world and often leads to misery and death. Current treatments include different kinds of anticancer therapy, which exhibit different types of side effects. Because of certain physicochemical properties, anticancer peptides (ACPs) have opened a new path of treatment for this deadly disease. That is why a well-performing methodology for identifying novel anticancer peptides is of great importance in the fight against cancer. In addition to laboratory techniques, various machine learning and deep learning methodologies have been developed in recent years for this task. Although these models have shown reasonable predictive ability, there is still room for improvement in terms of performance and for exploring new types of algorithms. In this work, we have proposed a novel multi-channel convolutional neural network (CNN) for identifying anticancer peptides from protein sequences. We have collected data from the existing state-of-the-art methodologies and applied binary encoding for data preprocessing. We have also employed k-fold cross-validation to train our models on benchmark datasets and compared our models' performance on the independent datasets. The comparison has indicated our models' superiority on various evaluation metrics. We think our work can be a valuable asset in finding novel anticancer peptides. We have provided a user-friendly web server for academic purposes, and it is publicly available at: http://103.99.176.239/iacp-cnn/.


Subject(s)
Antineoplastic Agents , Neoplasms , Amino Acid Sequence , Antineoplastic Agents/chemistry , Humans , Neoplasms/drug therapy , Neural Networks, Computer , Peptides/chemistry
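
Illustrative note for record 2: the binary encoding step mentioned in the abstract above is usually a one-hot encoding of each residue. The sketch below assumes the 20 standard amino acids, zero-padding to a fixed length, and all-zero rows for unknown residues; the abstract does not give these details, and `binary_encode` and `max_len` are hypothetical names.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def binary_encode(peptide, max_len):
    """One-hot encode a peptide into a max_len x 20 matrix of 0/1 values.

    Sequences longer than max_len are truncated; shorter ones are zero-padded.
    Residues outside the assumed alphabet are encoded as all-zero rows.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for aa in peptide[:max_len]:
        row = [0] * len(AMINO_ACIDS)
        if aa in index:
            row[index[aa]] = 1
        rows.append(row)
    while len(rows) < max_len:
        rows.append([0] * len(AMINO_ACIDS))
    return rows


matrix = binary_encode("ACX", max_len=5)
print(len(matrix), len(matrix[0]))  # rows x columns of the encoded peptide
```

A matrix of this shape is the kind of input a convolutional network can consume, with one channel or several depending on the architecture.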
3.
Sensors (Basel) ; 22(15)2022 Jul 29.
Article in English | MEDLINE | ID: mdl-35957257

ABSTRACT

Fitness is important in people's lives. Good fitness habits can improve cardiopulmonary capacity, increase concentration, prevent obesity, and effectively reduce the risk of death. Home fitness does not require large equipment; it relies on dumbbells, yoga mats, and horizontal bars, and it avoids contact with other people, which makes it very popular. People who work out at home use social media to obtain fitness knowledge, but what they can learn this way is limited. Incompletely performed exercises are likely to lead to injury, and a cheap, timely, and accurate fitness detection system can reduce the risk of fitness injuries and effectively improve people's fitness awareness. Many past studies have addressed the detection of fitness movements; among them, methods based on wearable devices, body nodes, and image deep learning have achieved better performance. However, a wearable device cannot detect a variety of fitness movements, may hinder the user's exercise, and has a high cost. Both body-node-based and image-deep-learning-based methods have lower costs, but each has some drawbacks. Therefore, this paper used a method based on deep transfer learning to establish a fitness database. After that, a deep neural network was trained to detect the type and completeness of fitness movements. We used Yolov4 and Mediapipe to instantly detect fitness movements and stored the 1D fitness signal of each movement to build a database. Finally, an MLP was used to classify the 1D signal waveform of fitness. For the classification of fitness movement types, the mAP was 99.71%, accuracy was 98.56%, precision was 97.9%, recall was 98.56%, and the F1-score was 98.23%. For the classification of fitness movement completeness, accuracy was 92.84%, precision was 92.85%, recall was 92.84%, and the F1-score was 92.83%. The average FPS in detection was 17.5. Experimental results show that our method achieves higher accuracy compared to other methods.


Subject(s)
Machine Learning , Neural Networks, Computer , Databases, Factual , Humans , Movement
4.
Sensors (Basel) ; 21(17)2021 Aug 31.
Article in English | MEDLINE | ID: mdl-34502747

ABSTRACT

Sign language is designed to help the deaf and hard of hearing community convey messages and connect with society. Sign language recognition has been an important domain of research for a long time. Previously, sensor-based approaches have obtained higher accuracy than vision-based approaches. Because vision-based approaches are cost-effective, research has also been conducted on them despite the accuracy drop. The purpose of this research is to recognize American sign characters using hand images obtained from a web camera. In this work, the media-pipe hands algorithm was used to estimate hand joints from RGB images of hands obtained from a web camera, and two types of features were generated for classification from the estimated joint coordinates: the distances between the joint points, and the angles between vectors and the 3D axes. The classifiers used to classify the characters were support vector machine (SVM) and light gradient boosting machine (GBM). Three character datasets were used for recognition: the ASL Alphabet dataset, the Massey dataset, and the Finger Spelling A dataset. The results obtained were 99.39% for the Massey dataset, 87.60% for the ASL Alphabet dataset, and 98.45% for the Finger Spelling A dataset. The proposed design for automatic American sign language recognition is cost-effective, computationally inexpensive, does not require any special sensors or devices, and has outperformed previous studies.


Subject(s)
Hand , Sign Language , Algorithms , Fingers , Humans , Recognition, Psychology , United States
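
Illustrative note for record 4: the two feature types named in the abstract above (distances between joint points, and angles between vectors and the 3D axes) can be sketched as follows. Which joint pairs define the vectors is not stated in the abstract, so this sketch assumes a vector between two given joints; `pairwise_distances` and `axis_angles` are hypothetical helper names, and the coordinates are toy values rather than real hand landmarks.

```python
import math
from itertools import combinations


def pairwise_distances(joints):
    """Euclidean distance between every pair of 3D joint coordinates."""
    return [math.dist(a, b) for a, b in combinations(joints, 2)]


def axis_angles(a, b):
    """Angles in degrees between the vector a -> b and the x, y and z axes."""
    v = [q - p for p, q in zip(a, b)]
    norm = math.hypot(*v)
    if norm == 0:
        return [0.0, 0.0, 0.0]  # degenerate vector: angle is undefined, use 0
    # cos(angle to axis k) = v[k] / |v|; clamp to guard against rounding error.
    return [math.degrees(math.acos(max(-1.0, min(1.0, c / norm)))) for c in v]


joints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]  # toy coordinates
features = pairwise_distances(joints) + axis_angles(joints[0], joints[1])
print(features)
```

Concatenating both feature groups into one vector per image gives an input that classifiers such as an SVM can work with directly.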
5.
Sensors (Basel) ; 21(24)2021 Dec 16.
Article in English | MEDLINE | ID: mdl-34960499

ABSTRACT

The act of writing letters or words in free space with body movements is known as air-writing. Air-writing recognition is a special case of gesture recognition in which gestures correspond to characters and digits written in the air. Air-writing, unlike general gestures, does not require the memorization of predefined special gesture patterns. Rather, it is sensitive to the subject and language of interest. Traditional air-writing requires an extra device containing sensor(s), while the wide adoption of smart-bands eliminates the requirement of the extra device. Therefore, air-writing recognition systems are becoming more flexible day by day. However, the variability of signal duration is a key problem in developing an air-writing recognition model. Inconsistent signal duration is obvious due to the nature of the writing and data-recording process. To make the signals consistent in length, researchers attempted various strategies including padding and truncating, but these procedures result in significant data loss. Interpolation is a statistical technique that can be employed for time-series signals to ensure minimum data loss. In this paper, we extensively investigated different interpolation techniques on seven publicly available air-writing datasets and developed a method to recognize air-written characters using a 2D-CNN model. In both user-dependent and user-independent principles, our method outperformed all the state-of-the-art methods by a clear margin for all datasets.


Subject(s)
Deep Learning , Neural Networks, Computer , Gestures , Recognition, Psychology , Writing
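
Illustrative note for record 5: the abstract above reports that several interpolation techniques were compared for bringing signals to a common length. The sketch below shows only the simplest option, linear interpolation of a single channel; it is an assumed example, and `resample_linear` is a hypothetical name.

```python
def resample_linear(signal, target_len):
    """Resample a 1D signal to target_len samples by linear interpolation."""
    n = len(signal)
    if n == 0 or target_len <= 0:
        raise ValueError("signal must be non-empty and target_len positive")
    if n == 1 or target_len == 1:
        return [float(signal[0])] * target_len
    step = (n - 1) / (target_len - 1)  # spacing of new samples on the old index axis
    out = []
    for k in range(target_len):
        pos = k * step
        i = min(int(pos), n - 2)  # left neighbour; keep i + 1 inside the signal
        frac = pos - i
        out.append(signal[i] * (1 - frac) + signal[i + 1] * frac)
    return out


print(resample_linear([0, 10], 3))         # upsampling a short signal
print(resample_linear([0, 1, 2, 3, 4], 3)) # downsampling a longer one
```

Unlike padding or truncation, every output sample here is computed from the original signal, so neither end of the recording is discarded.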
6.
Anal Biochem ; 525: 107-113, 2017 05 15.
Article in English | MEDLINE | ID: mdl-28286168

ABSTRACT

Carbonylation is an irreversible post-translational modification and is considered a biomarker of oxidative stress. It not only plays a major role in orchestrating various biological processes but is also associated with diseases such as Alzheimer's disease, diabetes, and Parkinson's disease. However, since the experimental technologies for detecting carbonylation sites in proteins are costly and time-consuming, an accurate computational method for predicting carbonylation sites is urgently needed and can be useful for drug development. In this study, a novel computational tool termed predCar-Site has been developed to predict protein carbonylation sites by (1) incorporating the sequence-coupled information into the general pseudo amino acid composition, (2) balancing the effect of the skewed training dataset with the Different Error Costs method, and (3) constructing a predictor using a support vector machine as the classifier. The predCar-Site predictor achieves an average AUC (area under curve) score of 0.9959, 0.9999, 1, and 0.9997 in predicting the carbonylation sites of K, P, R, and T, respectively. All of the experimental results, including the AUC, are averages over 5 complete runs of 10-fold cross-validation, and they indicate significantly better performance than existing predictors. A user-friendly web server for predCar-Site is available at http://research.ru.ac.bd/predCar-Site/.


Subject(s)
Computational Biology/methods , Protein Carbonylation , Protein Processing, Post-Translational , Proteins/chemistry , Sequence Analysis, Protein/methods , Software , Support Vector Machine , Algorithms , Humans , Models, Biological
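
Illustrative note for record 6: the AUC reported in the abstract above can be read as the probability that a randomly chosen positive site receives a higher classifier score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; the scores are made-up toy values, not outputs of predCar-Site, and `auc` is a hypothetical helper.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


print(auc([0.9, 0.4], [0.5, 0.1]))  # one positive is ranked below one negative
```

The pairwise form is quadratic in the number of samples, which is fine for a small illustration; rank-based formulas give the same value more efficiently on large datasets.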
7.
Sci Rep ; 13(1): 3771, 2023 03 07.
Article in English | MEDLINE | ID: mdl-36882493

ABSTRACT

Hepatocellular carcinoma (HCC) is the most common lethal malignancy of the liver worldwide. Thus, it is important to identify the key genes in order to uncover the molecular mechanisms and to improve diagnostic and therapeutic options for HCC. This study aimed to combine a set of statistical and machine learning computational approaches for identifying the key candidate genes for HCC. Three microarray datasets, downloaded from the Gene Expression Omnibus database, were used in this work. First, normalization and identification of differentially expressed genes (DEGs) were performed using limma for each dataset. Then, a support vector machine (SVM) was used to determine the differentially expressed discriminative genes (DEDGs) from the DEGs of each dataset, and the DEDGs overlapping among the three identified sets were selected. Enrichment analysis was performed on the common DEDGs using DAVID. A protein-protein interaction (PPI) network was constructed using STRING, and the central hub genes were identified with CytoHubba according to the degree, maximum neighborhood component (MNC), maximal clique centrality (MCC), closeness centrality, and betweenness centrality criteria. Simultaneously, significant modules were selected using MCODE scores, and their associated genes were identified from the PPI networks. Moreover, metadata were created by listing all hub genes from previous studies, and significant meta-hub genes were identified as those whose occurrence frequency among previous studies was greater than 3. Finally, six key candidate genes (TOP2A, CDC20, ASPM, PRC1, NUSAP1, and UBE2C) were determined by intersecting the genes shared among central hub genes, hub module genes, and significant meta-hub genes. Two independent test datasets (GSE76427 and TCGA-LIHC) were used to validate these key candidate genes using the area under the curve. Moreover, the prognostic potential of these six key candidate genes was also evaluated on the TCGA-LIHC cohort using survival analysis.


Subject(s)
Carcinoma, Hepatocellular , Liver Neoplasms , Humans , Carcinoma, Hepatocellular/genetics , Liver Neoplasms/genetics , Genes, cdc , Machine Learning
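
Illustrative note for record 7: the last two selection steps in the abstract above (counting how often a hub gene appears across previous studies, then intersecting several gene lists) amount to simple set operations. The sketch below uses placeholder gene names G1 to G6 rather than real data; `meta_hub_genes` and `key_candidates` are hypothetical helper names.

```python
from collections import Counter


def meta_hub_genes(previous_studies, min_count=3):
    """Genes reported as hubs in more than min_count previous studies."""
    counts = Counter(gene for study in previous_studies for gene in set(study))
    return {gene for gene, c in counts.items() if c > min_count}


def key_candidates(*gene_sets):
    """Genes present in every one of the given gene collections."""
    return set.intersection(*(set(s) for s in gene_sets))


# Placeholder inputs (not real results).
studies = [["G1", "G2"], ["G1", "G3"], ["G1", "G2"], ["G1", "G2"], ["G2", "G4"]]
central_hubs = {"G1", "G2", "G5"}
module_genes = {"G2", "G5", "G6"}

meta = meta_hub_genes(studies, min_count=3)
print(sorted(meta))
print(sorted(key_candidates(central_hubs, module_genes, meta)))
```

The strict "greater than" comparison mirrors the wording of the abstract, where the occurrence frequency had to exceed 3.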
8.
IEEE/ACM Trans Comput Biol Bioinform ; 20(6): 3786-3799, 2023.
Article in English | MEDLINE | ID: mdl-37812547

ABSTRACT

Biomarkers associated with hepatocellular carcinoma (HCC) are of great importance for better understanding biological response mechanisms to internal or external intervention. The study aimed to identify key candidate genes for HCC using machine learning (ML) and statistics-based bioinformatics models. Differentially expressed genes (DEGs) were identified using limma, and the genes common to the DEGs identified from four datasets were then selected. After that, protein-protein interaction networks were constructed using STRING, and Cytoscape was then used to determine hub genes, significant modules, and their associated genes. Simultaneously, three ML-based techniques, namely support vector machine (SVM), least absolute shrinkage and selection operator-logistic regression (LASSO-LR), and partial least squares-discriminant analysis (PLS-DA), were implemented to determine the discriminative genes of HCC from the common DEGs. Moreover, metadata of hub genes were formed by listing all hub genes from existing studies to incorporate other findings in our analysis. Finally, seven key candidate genes (ASPM, CCNB1, CDK1, DLGAP5, KIF20A, MT1X, and TOP2A) were identified by intersecting the common genes among hub genes, significant module genes, discriminative genes from SVM, LASSO-LR, and PLS-DA, and meta-hub genes from existing studies. Another three independent test datasets were also used to validate these seven key candidate genes using the AUC computed from the ROC curve.


Subject(s)
Carcinoma, Hepatocellular , Liver Neoplasms , Humans , Carcinoma, Hepatocellular/genetics , Liver Neoplasms/genetics , Metadata , Gene Regulatory Networks/genetics , Computational Biology/methods , Models, Statistical , Gene Expression Regulation, Neoplastic , Gene Expression , Gene Expression Profiling
9.
Comput Biol Chem ; 104: 107834, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36863243

ABSTRACT

Protein Structure Prediction (PSP) has achieved significant progress lately. The prediction of inter-residue distances by machine learning and their exploitation during the conformational search are among the critical factors behind this progress. Real values, rather than bin probabilities, could more naturally represent inter-residue distances, while bin probabilities, via spline curves, more naturally help obtain differentiable objective functions. Consequently, PSP methods that exploit predicted binned distances perform better than those that exploit predicted real-valued distances. To leverage the advantage of bin probabilities in obtaining differentiable objective functions, in this work we propose techniques to convert real-valued distances into distance bin probabilities. Using standard benchmark proteins, we then show that our real-to-bin converted distances help PSP methods obtain three-dimensional structures with 4%-16% better root mean squared deviation (RMSD), template modeling score (TM-Score), and global distance test (GDT) values than existing similar PSP methods. Our proposed PSP method is named real to bin (R2B) inter-residue distance predictor, and its code is available from https://gitlab.com/mahnewton/r2b.


Subject(s)
Machine Learning , Proteins , Models, Molecular , Databases, Protein , Proteins/chemistry , Protein Conformation , Computational Biology/methods , Algorithms
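
Illustrative note for record 9: the abstract above does not describe how R2B converts a real-valued distance into bin probabilities. Purely as an assumed stand-in, the sketch below places a Gaussian around the predicted distance and integrates it over each bin; the bin edges, the spread `sigma`, and the function names are all hypothetical and should not be read as the published method.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def real_to_bin_probs(distance, edges, sigma=1.0):
    """Spread one predicted distance over the bins defined by consecutive edges.

    Each bin receives the Gaussian probability mass that falls inside it, and
    the masses are renormalised so that they sum to 1 over the given bins.
    """
    masses = [
        normal_cdf(hi, distance, sigma) - normal_cdf(lo, distance, sigma)
        for lo, hi in zip(edges, edges[1:])
    ]
    total = sum(masses)
    if total == 0:
        raise ValueError("distance lies too far outside the bin range")
    return [m / total for m in masses]


edges = [2.0, 4.0, 6.0, 8.0, 10.0]  # hypothetical bin edges in angstroms
probs = real_to_bin_probs(5.0, edges, sigma=1.0)
print(probs)
```

Whatever conversion is used, the point made in the abstract is that a probability vector over bins can be turned into a smooth, differentiable objective (for example through spline fitting), which a single real value does not offer as directly.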
10.
IEEE/ACM Trans Comput Biol Bioinform ; 19(6): 3624-3634, 2022.
Article in English | MEDLINE | ID: mdl-34546927

ABSTRACT

Identification of post-translational modifications (PTM) is crucial in the study of computational proteomics, cell biology, pathogenesis, and drug development due to its role in many bio-molecular mechanisms. Computational methods for predicting multiple PTMs at the same lysine residue, often referred to as K-PTM, are still evolving. This paper presents a novel computational tool, abbreviated as predML-Site, for predicting K-PTMs such as acetylation, crotonylation, methylation, and succinylation from an uncategorized peptide sample involving single, multiple, or no modification. For informative feature representation, multiple sequence encoding schemes, such as sequence-coupling, binary encoding, k-spaced amino acid pairs, and amino acid factor, have been used with ANOVA and incremental feature selection. As the core predictor, a cost-sensitive SVM classifier has been adopted, which effectively mitigates the effect of class-label imbalance in the dataset. predML-Site predicts multi-label PTM sites with 84.18% accuracy using the top 91 features. It has also achieved an 85.34% aiming rate and an 86.58% coverage rate, which are much better than those of the existing state-of-the-art predictors on the same rigorous validation test. This performance indicates that predML-Site can be used as a supportive tool for further K-PTM study. For the convenience of experimental scientists, predML-Site has been deployed as a user-friendly web server at http://103.99.176.239/predML-Site.


Subject(s)
Algorithms , Lysine , Lysine/chemistry , Computational Biology/methods , Amino Acids/chemistry , Peptides
11.
Mol Omics ; 18(7): 652-661, 2022 08 15.
Article in English | MEDLINE | ID: mdl-35616228

ABSTRACT

RNA-Seq has made significant contributions to various fields, particularly cancer research. Recent studies on differential gene expression analysis and the discovery of novel cancer biomarkers have extensively used RNA-Seq data. New biomarker identification is essential for moving cancer research forward, and early cancer diagnosis improves patients' chances of recovery and increases life expectancy. There is both urgency and scope for improvement in these two areas. In this paper, we developed an autoencoder-based biomarker identification method by reversing the learning mechanism of the trained encoders. We devised an explainable post hoc methodology for identifying influential genes with a high likelihood of becoming biomarkers. We applied recursive feature elimination to shorten the list further and presented a list of 17 potential biomarkers that are 99.93% accurate in identifying cancer types using a support vector machine on the UCI gene expression cancer RNA-Seq dataset, which consists of five cancerous tumor types. Our methodology outperforms all of the state-of-the-art methods, confirming the potential of the newly identified biomarkers as well as the efficacy of the biomarker identification procedure. Moreover, we have evaluated the performance of our methodology using six independent RNA-Seq gene expression datasets for several tasks, i.e., classification of tumors from non-tumors, detecting the origin of circulating tumor cells (CTCs), and predicting whether metastasis occurs. Our methodology achieved encouraging results for these tasks as well. The source code of this project is available at https://github.com/fuad021/biomarker-identification.


Subject(s)
Neoplasms , Support Vector Machine , Biomarkers, Tumor/genetics , Humans , Neoplasms/diagnosis , Neoplasms/genetics , RNA-Seq , Software
12.
Comput Biol Med ; 148: 105824, 2022 09.
Article in English | MEDLINE | ID: mdl-35863250

ABSTRACT

Predicted inter-residue distances are a key factor behind recent success in high-quality protein structure prediction (PSP). However, predicting both short and long distance values together is challenging. Consequently, existing PSP methods mostly use predicted short distances. In this paper, we use a stacked meta-ensemble method to combine deep learning models trained for different ranges of real-valued distances. On five benchmark sets of proteins, our proposed inter-residue distance prediction method improves mean Local Distance Difference Test (LDDT) scores by at least 5% over existing such methods. Moreover, using a real-valued distance based conformational search algorithm, we also show that predicted long distances help obtain significantly better protein conformations than when only predicted short distances are used. Our method is named meta-ensemble for distance prediction (MDP), and its program is available from https://gitlab.com/mahnewton/mdp.


Subject(s)
Algorithms , Proteins , Protein Conformation
13.
PLoS One ; 16(2): e0247511, 2021.
Article in English | MEDLINE | ID: mdl-33621235

ABSTRACT

Pseudouridine (Ψ) is one of the most prevalent RNA modifications and has been confirmed to occur in rRNA, mRNA, tRNA, and nuclear/nucleolar RNA. Hence, identifying pseudouridine sites has vital significance in academic research, drug development and gene therapies. Several laboratory techniques for Ψ identification have been introduced over the years. Although these techniques produce satisfactory results, they are costly and time-consuming and require skilled expertise. As the lengths of RNA sequences are getting longer day by day, an efficient method for identifying pseudouridine sites using computational approaches is very important. In this paper, we proposed a multi-channel convolutional neural network using binary encoding. We employed k-fold cross-validation and grid search to tune the hyperparameters. We evaluated its performance on the independent datasets and found promising results. The results indicate that our method can be used to identify pseudouridine sites for associated purposes. We have also implemented an easily accessible web server at http://103.99.176.239/ipseumulticnn/.


Subject(s)
Computational Biology/methods , Deep Learning , Pseudouridine/metabolism , RNA/metabolism , Animals , Humans , Mice , RNA, Ribosomal , Saccharomyces cerevisiae
14.
Comput Biol Chem ; 94: 107553, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34384997

ABSTRACT

Formylation is one of the newly discovered post-translational modifications of lysine residues and is responsible for different kinds of diseases. In this work, a novel predictor named predForm-Site has been developed to predict formylation sites with higher accuracy. We have integrated multiple sequence features to develop a more informative representation of formylation sites. Moreover, the decision function of the underlying classifier has been optimized on the skewed formylation dataset during prediction model training to improve prediction quality. On the dataset used by the LFPred and Formator predictors, predForm-Site achieved 99.5% sensitivity, 99.8% specificity and 99.8% overall accuracy with an AUC of 0.999 in the jackknife test. In the independent test, it also achieved more than 97% sensitivity and 99% specificity. Similarly, in benchmarking against the recent method CKSAAP_FormSite, the proposed predictor performed significantly better on all measures, particularly sensitivity (by around 20%), specificity (by nearly 30%) and overall accuracy (by more than 22%). These experimental results show that the proposed predForm-Site can be used as a complementary tool for the fast exploration of formylation sites. For the convenience of the scientific community, predForm-Site has been deployed as an online tool, accessible at http://103.99.176.239:8080/predForm-Site.

15.
Sci Rep ; 11(1): 18882, 2021 09 23.
Article in English | MEDLINE | ID: mdl-34556767

ABSTRACT

Identification of post-translational modifications (PTM) is significant in the study of computational proteomics, cell biology, pathogenesis, and drug development due to its role in many bio-molecular mechanisms. Though there are several computational tools to identify individual PTMs, only three predictors have been established to predict multiple PTMs at the same lysine residue. Furthermore, detailed analysis and assessment of dataset balancing and of the significance of different feature encoding techniques for a suitable multi-PTM prediction model are still lacking. This study introduces a computational method named 'iMul-kSite' for predicting acetylation, crotonylation, methylation, succinylation, and glutarylation from an unrecognized peptide sample with one, multiple, or no modifications. After the redundant data samples were eliminated from the majority class by analyzing the hardness of the sequence-coupling information, the feature representation was optimized by combining the ANOVA F-test with an incremental feature selection approach. The proposed predictor predicts multi-label PTM sites with 92.83% accuracy using the top 100 features. It has also achieved a 93.36% aiming rate and a 96.23% coverage rate, which are much better than those of the existing state-of-the-art predictors on the validation test. This performance indicates that 'iMul-kSite' can be used as a supportive tool for further K-PTM study. For the convenience of experimental scientists, 'iMul-kSite' has been deployed as a user-friendly web server at http://103.99.176.239/iMul-kSite.


Subject(s)
Algorithms , Lysine/metabolism , Computational Biology/methods , Datasets as Topic , Humans , Protein Processing, Post-Translational
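
Illustrative note for record 15: the abstract above reports accuracy, aiming rate and coverage rate without defining them. The sketch below uses the definitions commonly given for multi-label predictors: per sample, the overlap between true and predicted label sets is divided by the predicted set size (aiming), the true set size (coverage), or the size of their union (accuracy), then averaged. That these are the exact formulas used in the study, and the handling of samples with no modification, are assumptions; `multilabel_rates` is a hypothetical name.

```python
def multilabel_rates(true_sets, pred_sets):
    """Return (aiming, coverage, accuracy) averaged over all samples.

    Assumption for empty label sets: a ratio whose denominator is empty counts
    as 1.0 when the other set is empty too (correct "no modification" call)
    and as 0.0 otherwise.
    """
    if not true_sets or len(true_sets) != len(pred_sets):
        raise ValueError("need equally sized, non-empty lists of label sets")
    aiming = coverage = accuracy = 0.0
    for t, p in zip(true_sets, pred_sets):
        t, p = set(t), set(p)
        hit = len(t & p)
        union = len(t | p)
        aiming += hit / len(p) if p else float(not t)
        coverage += hit / len(t) if t else float(not p)
        accuracy += hit / union if union else 1.0
    n = len(true_sets)
    return aiming / n, coverage / n, accuracy / n


# Toy labels (hypothetical): the second site misses one of its two modifications.
true_labels = [{"acetylation"}, {"acetylation", "methylation"}]
pred_labels = [{"acetylation"}, {"acetylation"}]
print(multilabel_rates(true_labels, pred_labels))
```

Read this way, a high aiming rate means few spurious labels are predicted, and a high coverage rate means few true labels are missed.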
16.
Diabetes Metab Syndr ; 15(5): 102263, 2021.
Article in English | MEDLINE | ID: mdl-34482122

ABSTRACT

AIMS: This research work presents a comparative study of machine learning (ML) techniques with two objectives: (i) determination of the risk factors of diabetic nephropathy (DN) based on principal component analysis (PCA) at different cutoffs; (ii) prediction of DN patients using ML-based techniques. METHODS: PCA was combined with ML-based techniques to select the best features at different PCA cutoff values and to choose the optimal PCA cutoff at which the ML-based techniques give the highest accuracy. These optimum features were fed into six ML-based techniques: linear discriminant analysis, support vector machine (SVM), logistic regression, K-nearest neighbors, naïve Bayes, and artificial neural network. The leave-one-out cross-validation protocol was executed, and the performance of the ML-based techniques was compared using accuracy and area under the curve (AUC). RESULTS: The data used in this work cover 133 respondents, including 73 DN patients with an average age of 69.6±10.2 years; 54.2% of the DN patients are female. Our findings illustrate that PCA combined with the SVM-RBF classifier yields 88.7% accuracy and 0.91 AUC at a PCA cutoff of 0.96. CONCLUSIONS: This study also suggests that PCA combined with the SVM-RBF classifier may correctly classify DN patients with the highest accuracy when compared to the models published in the existing research. Prospective studies are warranted to further validate the applicability of our model in clinical settings.


Subject(s)
Bayes Theorem , Diabetes Mellitus, Type 2/complications , Diabetic Nephropathies/diagnosis , Machine Learning , Principal Component Analysis , Risk Assessment/methods , Support Vector Machine , Case-Control Studies , Diabetic Nephropathies/etiology , Female , Follow-Up Studies , Humans , Male , Middle Aged , Pilot Projects , Prognosis , Reproducibility of Results
17.
PLoS One ; 16(4): e0249396, 2021.
Article in English | MEDLINE | ID: mdl-33793659

ABSTRACT

Post-translational modification (PTM) involves covalent modification after the biosynthesis process and plays an essential role in the study of cell biology. Lysine phosphoglycerylation is a newly discovered reversible type of PTM that affects glycolytic enzyme activities and is responsible for a wide variety of diseases, such as heart failure, arthritis, and degeneration of the nervous system. Our goal is to computationally characterize potential phosphoglycerylation sites to understand their functionality and causality more accurately. In this study, a novel computational tool, referred to as predPhogly-Site, has been developed to predict phosphoglycerylation sites in proteins. It effectively uses the probabilistic sequence-coupling information among the amino acid residues near phosphoglycerylation sites, along with a variable cost adjustment for the skewed training dataset, to enhance the prediction characteristics. It has achieved around 99% accuracy with more than 0.96 MCC and 0.97 AUC in both 10-fold cross-validation and an independent test. Moreover, the standard deviation in 10-fold cross-validation is almost negligible. This performance indicates that predPhogly-Site remarkably outperformed the existing prediction tools and can be used as a promising predictor, preferably through its web interface at http://103.99.176.239/predPhogly-Site.


Subject(s)
User-Computer Interface , Algorithms , Area Under Curve , Computational Biology/methods , Protein Processing, Post-Translational , Proteins/metabolism , ROC Curve
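
Illustrative note for record 17: accuracy and the Matthews correlation coefficient (MCC) quoted in the abstract above are both derived from the four confusion-matrix counts. The sketch below shows the standard formulas on made-up counts; the numbers are not from predPhogly-Site, and `binary_metrics` is a hypothetical helper.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # MCC is conventionally set to 0 when any marginal count is zero.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


print(binary_metrics(tp=4, tn=3, fp=1, fn=2))
```

On imbalanced data such as PTM site datasets, MCC is often reported next to accuracy because it stays low when a model simply predicts the majority class.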
18.
Genes (Basel) ; 11(9)2020 08 31.
Article in English | MEDLINE | ID: mdl-32878321

ABSTRACT

Post-translational modification (PTM) is defined as the alteration of a protein sequence upon interaction with different macromolecules after the translation process. Glutarylation is considered one of the most important PTMs and is associated with a wide range of cellular functions, including metabolism, translation, and specified separate subcellular localizations. During the past few years, a wide range of computational approaches has been proposed to predict glutarylation sites. However, despite all the efforts that have been made so far, the prediction performance for glutarylation sites has remained limited. One of the main challenges in tackling this problem is to extract features with significant discriminatory information. To address this issue, we propose a new machine learning method called BiPepGlut, which uses the concept of a bi-peptide-based evolutionary method for feature extraction. To build this model, we also use the Extra-Trees (ET) classifier, which, to the best of our knowledge, has never been used for this task. Our results demonstrate that BiPepGlut is able to significantly outperform previously proposed models for this problem. BiPepGlut achieves 92.0%, 84.8%, 95.6%, 0.82, and 0.88 in accuracy, sensitivity, specificity, Matthews correlation coefficient, and F1-score, respectively. BiPepGlut is implemented as a publicly available online predictor.


Subject(s)
Evolution, Molecular , Glutarates/chemistry , Lysine/chemistry , Mycobacterium tuberculosis/metabolism , Peptide Fragments/chemistry , Protein Processing, Post-Translational , Proteins/chemistry , Algorithms , Amino Acid Sequence , Animals , Computational Biology , Glutarates/metabolism , Lysine/metabolism , Machine Learning , Mice , Mycobacterium tuberculosis/growth & development , Peptide Fragments/metabolism , Proteins/metabolism , Support Vector Machine
19.
Comput Biol Med ; 113: 103385, 2019 10.
Article in English | MEDLINE | ID: mdl-31437626

ABSTRACT

Identification of genes whose regulation of expression is functionally similar in both brain tissue and blood cells could in principle enable monitoring of significant neurological traits and disorders by analysis of blood samples. We thus employed transcriptional analysis of pathologically affected tissues, using agnostic approaches to identify overlapping gene functions and integrating this transcriptomic information with expression quantitative trait loci (eQTL) data. Here, we estimate the correlation of gene expression in the top-associated cis-eQTLs of brain tissue and blood cells in Parkinson's disease (PD). We introduced quantitative frameworks to reveal the complex relationship of various biasing genetic factors in PD, a neurodegenerative disease. We examined gene expression microarray and RNA-Seq datasets from human brain and blood tissues from PD-affected and control individuals. Differentially expressed genes (DEGs) were identified for both brain and blood cells to determine common DEG overlaps. Based on neighborhood-based benchmarking and multilayer network topology approaches, we then developed genetic associations of factors with PD. Overlapping DEG sets underwent gene enrichment using pathway analysis and gene ontology methods, which identified candidate common genes and pathways. We identified 12 significantly dysregulated genes shared by brain and blood cells, which were validated using the dbGaP (gene SNP-disease linkage) database for gold-standard benchmarking of their significance in disease processes. Ontological and pathway analyses identified significant gene ontology terms and molecular pathways that indicate PD progression. In sum, we found possible novel links between pathological processes in brain tissue and blood cells by examining cell pathway commonalities, corroborating these associations using well-validated datasets.
This demonstrates that, for brain-related pathologies, combining gene expression analysis with blood cell cis-eQTL data is a potentially powerful analytical approach. Thus, our methodologies facilitate data-driven approaches that can advance knowledge of disease mechanisms and may, with clinical validation, enable prediction of neurological dysfunction using blood cell transcript profiling.
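The step of finding DEGs shared by brain and blood can be sketched as a threshold filter followed by a set intersection, with a hypergeometric tail probability as one common way to judge whether the overlap is larger than chance. The cutoffs, the function names, and the placeholder gene identifiers below are assumptions for illustration; the abstract does not report the thresholds or the test the authors used.

```python
from math import comb
from typing import Dict, Set, Tuple

# Per-gene statistics: gene -> (log2 fold change, p-value).
GeneStats = Dict[str, Tuple[float, float]]


def select_degs(stats: GeneStats,
                min_abs_log2fc: float = 1.0,
                max_p: float = 0.05) -> Set[str]:
    """Return genes passing both cutoffs (cutoff values are assumed)."""
    return {gene for gene, (log2fc, p) in stats.items()
            if abs(log2fc) >= min_abs_log2fc and p <= max_p}


def overlap_p_value(n_universe: int, n_a: int, n_b: int, n_shared: int) -> float:
    """Hypergeometric P(overlap >= n_shared) for two gene sets of size
    n_a and n_b drawn from n_universe measured genes."""
    total = comb(n_universe, n_b)
    tail = sum(comb(n_a, k) * comb(n_universe - n_a, n_b - k)
               for k in range(n_shared, min(n_a, n_b) + 1))
    return tail / total


# Invented placeholder data, not results from the study.
brain = {"GENE_A": (1.8, 0.001), "GENE_B": (-1.2, 0.03),
         "GENE_C": (0.4, 0.2), "GENE_D": (2.1, 0.01)}
blood = {"GENE_A": (1.1, 0.02), "GENE_B": (-0.3, 0.5),
         "GENE_C": (0.2, 0.6), "GENE_D": (1.5, 0.04)}

brain_degs = select_degs(brain)
blood_degs = select_degs(blood)
shared = brain_degs & blood_degs
p_overlap = overlap_p_value(len(brain), len(brain_degs),
                            len(blood_degs), len(shared))
```

This simple version ignores the direction of change; a stricter variant would also require the fold changes to have the same sign in both tissues.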


Subject(s)
Blood Cells/metabolism , Brain/metabolism , Computer Simulation , Nucleic Acid Databases , Gene Expression Regulation , Parkinson Disease/metabolism , Biomarkers/metabolism , Blood Cells/pathology , Brain/pathology , Genome-Wide Association Study , Humans , Parkinson Disease/pathology
20.
Mol Biosyst ; 13(4): 785-795, 2017 Mar 28.
Article in English | MEDLINE | ID: mdl-28247893

ABSTRACT

Predicting the subcellular locations of proteins can provide useful hints that reveal their functions, increase our understanding of the mechanisms of some diseases, and ultimately aid in the development of novel drugs. The number of newly discovered proteins has been growing exponentially, which in turn makes subcellular localization prediction by purely laboratory tests prohibitively laborious and expensive. To tackle these challenges, computational methods are being developed as an alternative to aid biologists in selecting target proteins and designing related experiments. However, protein subcellular localization prediction remains a complicated and challenging problem, particularly when query proteins have multi-label characteristics, i.e., if they exist simultaneously in more than one subcellular location or if they move between two or more different subcellular locations. To date, several types of subcellular localization prediction methods with different levels of accuracy have been proposed to address this problem. The support vector machine (SVM) has been employed to provide potential solutions to the protein subcellular localization prediction problem. However, the practicability of an SVM is affected by the challenges of selecting an appropriate kernel and selecting the parameters of that kernel. To address this difficulty, in this study, we aimed to develop an efficient multi-label protein subcellular localization prediction system, named MKLoc, by introducing a multiple kernel learning (MKL)-based SVM. We evaluated MKLoc using a combined dataset containing 5447 single-localized proteins (originally published as part of the Höglund dataset) and 3056 multi-localized proteins (originally published as part of the DBMLoc set). Note that this dataset was used by Briesemeister et al. in their extensive comparison of multi-localization prediction systems.
Finally, our experimental results indicate that MKLoc not only achieves higher accuracy than a single-kernel-based SVM system but also shows significantly better results than those obtained from other top systems (MDLoc, BNCs, YLoc+). Moreover, MKLoc requires less computation time to tune and train than BNCs and the single-kernel-based SVM.
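The core idea of multiple kernel learning can be sketched as replacing a single hand-picked kernel with a non-negative weighted sum of several base kernels. The minimal example below builds such a combined Gram matrix with fixed weights; in an actual MKL method the weights are learned during training. The base kernels, weights, and function names are assumptions for illustration, since the abstract does not say which kernels or optimization scheme MKLoc uses.

```python
from math import exp
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(gamma: float) -> Kernel:
    """Return a Gaussian (RBF) kernel with the given width parameter."""
    def k(x: Vector, y: Vector) -> float:
        return exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k


def combined_gram(samples: List[Vector],
                  kernels: List[Kernel],
                  weights: List[float]) -> List[List[float]]:
    """Gram matrix of K = sum_m weights[m] * kernels[m].

    Weights must be non-negative and sum to 1, which keeps the
    combination a valid (positive semi-definite) kernel.
    """
    if len(kernels) != len(weights):
        raise ValueError("one weight per kernel is required")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return [[sum(w * k(x, y) for w, k in zip(weights, kernels))
             for y in samples]
            for x in samples]


# Toy feature vectors and hand-picked weights (illustrative only).
samples = [[0.0, 1.0], [1.0, 0.0]]
gram = combined_gram(samples, [linear_kernel, rbf_kernel(0.5)], [0.25, 0.75])
```

A matrix built this way can be handed to any SVM implementation that accepts a precomputed kernel; for multi-label prediction, one such classifier per subcellular location is a common arrangement.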


Subject(s)
Computational Biology/methods , Proteins/metabolism , Support Vector Machine , Algorithms , Datasets as Topic , Intracellular Space/metabolism , Protein Transport , Reproducibility of Results