Results 1 - 20 of 109
1.
Methods ; 223: 56-64, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38237792

ABSTRACT

DNA-binding proteins are a class of proteins that can interact with DNA molecules through physical and chemical interactions. Their main functions include regulating gene expression, maintaining chromosome structure and stability, and more. DNA-binding proteins play a crucial role in cellular and molecular biology, as they are essential for maintaining normal cellular physiological functions and adapting to environmental changes. The prediction of DNA-binding proteins has been a hot topic in the field of bioinformatics. The key to accurately classifying DNA-binding proteins is to find suitable feature sources and explore the information they contain. Although there are already many models for predicting DNA-binding proteins, there is still room for improvement in mining feature source information and calculation methods. In this study, we created a model called DBPboost to better identify DNA-binding proteins. The innovation of this study lies in the use of eight feature extraction methods, the improvement of the feature selection step, which involves selecting some features first and then performing feature selection again after feature fusion, and the optimization of the differential evolution algorithm in feature fusion, which improves the performance of feature fusion. The experimental results show that the prediction accuracy of the model on the UniSwiss dataset is 89.32%, and the sensitivity is 89.01%, which is better than most existing models.


Subject(s)
DNA-Binding Proteins , Support Vector Machine , DNA-Binding Proteins/chemistry , Algorithms , DNA/chemistry , Computational Biology/methods
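The abstract above mentions a differential evolution algorithm used during feature fusion but gives no details of the authors' optimized variant. As a rough illustration only, the following is a minimal sketch of the classic DE/rand/1/bin scheme searching for fusion weights in [0, 1]. The objective `toy_error`, the target weights, and all parameter values are hypothetical stand-ins, not taken from DBPboost.

```python
import random

def differential_evolution(fitness, dim, pop_size=12, generations=40,
                           f_scale=0.5, cross_rate=0.9, seed=0):
    """Minimise `fitness` over [0, 1]^dim with the classic DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    initial_best = min(scores)
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < cross_rate:
                    gene = pop[a][d] + f_scale * (pop[b][d] - pop[c][d])
                    gene = min(1.0, max(0.0, gene))  # keep weights in [0, 1]
                else:
                    gene = pop[i][d]
                trial.append(gene)
            trial_score = fitness(trial)
            if trial_score <= scores[i]:  # greedy selection: never get worse
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best], initial_best

# Hypothetical objective standing in for "validation error of the fused features".
TARGET = [0.2, 0.7, 0.5]

def toy_error(weights):
    return sum((w - t) ** 2 for w, t in zip(weights, TARGET))

best_weights, best_error, start_error = differential_evolution(toy_error, dim=3)
```

In a real pipeline the objective would be a cross-validated error of the classifier trained on the weighted, concatenated feature groups, which is far more expensive to evaluate than this toy function.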
2.
Proteomics ; 24(17): e2300184, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38643383

ABSTRACT

Unconventional secretory proteins (USPs) are vital for cell-to-cell communication and are necessary for proper physiological processes. Unlike classical proteins that follow the conventional secretory pathway via the Golgi apparatus, these proteins are released using unconventional pathways. The primary modes of secretion for USPs are exosomes and ectosomes, which originate from the endoplasmic reticulum. Accurate and rapid identification of exosome-mediated secretory proteins is crucial for gaining valuable insights into the regulation of non-classical protein secretion and intercellular communication, as well as for the advancement of novel therapeutic approaches. Although computational methods based on amino acid sequence prediction exist for predicting unconventional proteins secreted by exosomes (UPSEs), they suffer from significant limitations in terms of algorithmic accuracy. In this study, we propose a novel approach to predict UPSEs by combining multiple deep learning models that incorporate both protein sequences and evolutionary information. Our approach utilizes a convolutional neural network (CNN) to extract protein sequence information, while various densely connected neural networks (DNNs) are employed to capture evolutionary conservation patterns. By combining six distinct deep learning models, we have created a superior framework that surpasses previous approaches, achieving an ACC score of 77.46% and an MCC score of 0.5406 on an independent test dataset.


Subject(s)
Deep Learning , Exosomes , Exosomes/metabolism , Exosomes/chemistry , Neural Networks, Computer , Humans , Computational Biology/methods , Algorithms , Amino Acid Sequence , Proteins/metabolism , Proteins/analysis , Proteins/chemistry
3.
J Cell Biochem ; : e30642, 2024 Aug 20.
Article in English | MEDLINE | ID: mdl-39164870

ABSTRACT

The Type III secretion effectors (T3SEs) are bacterial proteins synthesized by Gram-negative pathogens and delivered into host cells via the Type III secretion system (T3SS). These effectors usually play a pivotal role in the interactions between bacteria and hosts. Hence, the precise identification of T3SEs aids researchers in exploring the pathogenic mechanisms of bacterial infections. Since the diversity and complexity of T3SE sequences often make traditional experimental methods time-consuming, it is imperative to explore more efficient and convenient computational approaches for T3SE prediction. Inspired by the promising potential exhibited by pre-trained language models in protein recognition tasks, we proposed a method called PLM-T3SE that utilizes protein language models (PLMs) for effective recognition of T3SEs. First, we utilized PLM embeddings and evolutionary features from the position-specific scoring matrix (PSSM) profiles to transform protein sequences into fixed-length vectors for model training. Second, we employed the extreme gradient boosting (XGBoost) algorithm to rank these features based on their importance. Finally, an MLP neural network model was used to predict T3SEs based on the selected optimal feature set. Experimental results from the cross-validation and independent test demonstrated that our model exhibited superior performance compared to the existing models. Specifically, our model achieved an accuracy of 98.1%, which is 1.8%-42.4% higher than the state-of-the-art predictors on the same independent test data set. These findings highlight the superiority of PLM-T3SE and the remarkable characterization ability of PLM embeddings for T3SE prediction.

4.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34472594

ABSTRACT

In the past decade, convolutional neural networks (CNNs) have been used as powerful tools by scientists to solve visual data tasks. However, many efforts of convolutional neural networks in solving protein function prediction and extracting useful information from protein sequences have certain limitations. In this research, we propose a new method to improve the weaknesses of the previous method. mCNN-ETC is a deep learning model which can transform the protein evolutionary information into image-like data composed of 20 channels, which correspond to the 20 amino acids in the protein sequence. We constructed CNN layers with different scanning windows in parallel to enhance the useful pattern detection ability of the proposed model. Then we filtered specific patterns through the 1-max pooling layer before inputting them into the prediction layer. This research attempts to solve a basic problem in biology in terms of application: predicting electron transporters and classifying their corresponding complexes. The performance result reached an accuracy of 97.41%, which was nearly 6% higher than its predecessor. We have also published a web server on http://bio219.bioinfo.yzu.edu.tw, which can be used for research purposes free of charge.


Subject(s)
Electrons , Neural Networks, Computer , Amino Acid Sequence , Biological Evolution , Humans , Proteins/chemistry
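The parallel scanning windows and 1-max pooling described in the abstract above can be illustrated with a small sketch. It assumes a toy PSSM with 3 channels instead of 20 and hand-picked kernel values rather than learned ones; the function names are hypothetical and this is not the mCNN-ETC implementation.

```python
def conv1d_valid(pssm, kernel):
    """Slide a (window x channels) kernel along the sequence axis of a PSSM."""
    window = len(kernel)
    feature_map = []
    for start in range(len(pssm) - window + 1):
        total = 0.0
        for k in range(window):
            total += sum(p * w for p, w in zip(pssm[start + k], kernel[k]))
        feature_map.append(max(0.0, total))  # ReLU activation
    return feature_map

def one_max_pool(feature_map):
    """Keep only the strongest response of a filter over the whole sequence."""
    return max(feature_map)

# Toy PSSM: 5 residues x 3 channels (a real PSSM has 20 channels).
pssm = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [0.0, 0.0, 3.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

# One hand-picked filter per window size; a trained CNN learns many of each.
kernels = {
    2: [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    3: [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]],
}

# Each filter contributes exactly one number, whatever the protein length.
features = [one_max_pool(conv1d_valid(pssm, kernels[w])) for w in sorted(kernels)]
```

Because each filter is reduced to a single maximum, proteins of different lengths map to feature vectors of the same size, which is what lets the pooled outputs feed a fixed-size prediction layer.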
5.
Anal Biochem ; 694: 115603, 2024 Nov.
Article in English | MEDLINE | ID: mdl-38986796

ABSTRACT

The recognition of DNA-binding proteins (DBPs) is the crucial step to understanding their roles in various biological processes such as genetic regulation, gene expression, cell cycle control, DNA repair, and replication within cells. However, conventional experimental methods for identifying DBPs are usually time-consuming and expensive. Therefore, there is an urgent need to develop rapid and efficient computational methods for the prediction of DBPs. In this study, we proposed a novel predictor named PreDBP-PLMs to further improve the identification accuracy of DBPs by fusing the pre-trained protein language model (PLM) ProtT5 embedding with evolutionary features as input to the classic convolutional neural network (CNN) model. Firstly, the ProtT5 embedding was combined with different evolutionary features derived from the position-specific scoring matrix (PSSM) to represent protein sequences. Then, the optimal feature combination was selected and input to the CNN classifier for the prediction of DBPs. Finally, the 5-fold cross-validation (CV), the leave-one-out CV (LOOCV), and the independent set test were adopted to examine the performance of PreDBP-PLMs on the benchmark datasets. Compared to the existing state-of-the-art predictors, PreDBP-PLMs exhibits an accuracy improvement of 0.5 % and 5.2 % on the PDB186 and PDB2272 datasets, respectively. It demonstrated that the proposed method could serve as a useful tool for the recognition of DBPs.


Subject(s)
DNA-Binding Proteins , Neural Networks, Computer , DNA-Binding Proteins/metabolism , DNA-Binding Proteins/chemistry , Computational Biology/methods , Databases, Protein , Humans
6.
Protein Expr Purif ; 219: 106485, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38642863

ABSTRACT

BACKGROUND: Rational design of synthetic phage-displayed libraries requires the identification of the most appropriate positions for randomization using defined amino acid sets to recapitulate the natural occurrence. The present study uses position-specific scoring matrixes (PSSMs) for identifying and randomizing Camelidae nanobody (VHH) CDR3. The functionality of a synthetic VHH repertoire designed by this method was tested for discovering new VHH binders to recombinant coagulation factor VII (rfVII). METHODS: Based on PSSM analysis, the CDR3 of cAbBCII10 VHH framework was identified, and a set of amino acids for the substitution of each PSSM-CDR3 position was defined. Using the Rosetta design SwiftLib tool, the final repertoire was back-translated to a degenerate nucleotide sequence. A synthetic phage-displayed library was constructed based on this repertoire and screened for anti-rfVII binders. RESULTS: A synthetic phage-displayed VHH library with 1 × 10⁸ variants was constructed. Three VHH binders to rfVII were isolated from this library with estimated dissociation constants (KD) of 1 × 10⁻⁸ M, 5.8 × 10⁻⁸ M and 2.6 × 10⁻⁷ M. CONCLUSION: PSSM analysis is a simple and efficient way to design synthetic phage-displayed libraries.


Subject(s)
Computational Biology , Peptide Library , Single-Domain Antibodies , Single-Domain Antibodies/genetics , Single-Domain Antibodies/chemistry , Single-Domain Antibodies/immunology , Animals , Camelidae/genetics , Camelidae/immunology , Factor VII/genetics , Factor VII/chemistry , Factor VII/immunology , Recombinant Proteins/genetics , Recombinant Proteins/chemistry , Recombinant Proteins/immunology , Amino Acid Sequence
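The idea of defining an amino acid set for each CDR3 position from a position-specific profile can be sketched as follows. This is only an illustration under stated assumptions: it uses raw column frequencies (the ingredient from which a log-odds PSSM is built) rather than a full PSSM, and the toy sequences and the 0.25 frequency threshold are hypothetical, not values from the study.

```python
from collections import Counter

def position_frequencies(aligned):
    """Per-column amino acid frequencies for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned)
        freqs.append({aa: n / len(aligned) for aa, n in counts.items()})
    return freqs

def randomization_sets(freqs, min_freq=0.25):
    """Residues frequent enough at each position to include in the library."""
    return [sorted(aa for aa, f in column.items() if f >= min_freq)
            for column in freqs]

def theoretical_diversity(sets):
    """Number of distinct protein variants the per-position sets encode."""
    size = 1
    for allowed in sets:
        size *= len(allowed)
    return size

# Hypothetical aligned CDR3 fragments, for illustration only.
toy_cdr3 = ["ARDY", "ARGY", "AKDY", "ARGF", "TRDY", "AKGY"]
sets = randomization_sets(position_frequencies(toy_cdr3))
diversity = theoretical_diversity(sets)
```

In practice the per-position sets would then be back-translated to degenerate codons, a step the abstract attributes to the SwiftLib tool and which is not attempted here.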
7.
Brief Bioinform ; 22(6)2021 11 05.
Article in English | MEDLINE | ID: mdl-34131696

ABSTRACT

Major histocompatibility complex (MHC) possesses important research value in the treatment of complex human diseases. A plethora of computational tools has been developed to predict MHC class I binders. Here, we comprehensively reviewed 27 up-to-date MHC I binding prediction tools developed over the last decade, thoroughly evaluating feature representation methods, prediction algorithms and model training strategies on a benchmark dataset from Immune Epitope Database. A common limitation was identified during the review that all existing tools can only handle a fixed peptide sequence length. To overcome this limitation, we developed a bilateral and variable long short-term memory (BVLSTM)-based approach, named BVLSTM-MHC. It is the first variable-length MHC class I binding predictor. In comparison to the 10 mainstream prediction tools on an independent validation dataset, BVLSTM-MHC achieved the best performance in six out of eight evaluated metrics. A web server based on the BVLSTM-MHC model was developed to enable accurate and efficient MHC class I binder prediction in human, mouse, macaque and chimpanzee.


Subject(s)
Binding Sites , Carrier Proteins/chemistry , Computational Biology/methods , Histocompatibility Antigens Class I/chemistry , Neural Networks, Computer , Software , Amino Acid Sequence , Carrier Proteins/metabolism , Databases, Factual , Deep Learning , Epitopes/chemistry , Epitopes/immunology , Epitopes/metabolism , Histocompatibility Antigens Class I/immunology , Histocompatibility Antigens Class I/metabolism , Machine Learning , Protein Binding , ROC Curve , Reproducibility of Results , Web Browser
8.
Proteins ; 90(7): 1486-1492, 2022 07.
Article in English | MEDLINE | ID: mdl-35246878

ABSTRACT

Protein multiple sequence alignment information has long been an important feature for inferring protein function from related sequences with known functions. It is therefore one of the underlying ideas of AlphaFold 2, a breakthrough study and model for the prediction of three-dimensional structures of proteins from their primary sequence. Our study used protein multiple sequence alignment information in the form of position-specific scoring matrices as input. We also refined the use of a convolutional neural network, a well-known deep-learning architecture with impressive achievements on image and image-like data. Specifically, we revisited the prediction of adenosine triphosphate (ATP)-binding sites with more efficient convolutional neural networks. We applied multiple convolutional window scanning filters of a convolutional neural network to position-specific scoring matrices to capture as much useful information as possible. Furthermore, only the most specific motifs are retained at each feature map output through the one-max pooling layer before going to the next layer. We assumed that this approach could help us retain the most conserved motifs, which are discriminative information for prediction. Our experimental results show that a convolutional neural network with relatively few convolutional layers can be enough to extract the conserved information of proteins, which leads to higher performance. Our best prediction models were obtained after examining them with different hyper-parameters. Our experimental results showed that our models were superior to the traditional use of convolutional neural networks on the same datasets as well as to other machine-learning classification algorithms.


Subject(s)
Adenosine Triphosphate , Carrier Proteins , Algorithms , Binding Sites , Machine Learning , Neural Networks, Computer , Proteins/chemistry
9.
J Mol Recognit ; 34(6): e2887, 2021 06.
Article in English | MEDLINE | ID: mdl-33442949

ABSTRACT

Protein-RNA interactions play essential roles in a wide variety of biological processes. Recognition of RNA-binding residues on proteins has been a challenging problem. Most methods utilize the position-specific scoring matrix (PSSM). It has been found that considering the evolutionary information of sequence-neighboring residues can improve the prediction. In this work, we introduce a novel method, SNB-PSSM (spatial neighbor-based PSSM), combined with the structure window scheme, in which the evolutionary information of spatially neighboring residues is considered. The results show that our method consistently outperforms the standard and smoothed PSSM methods. Tested on multiple datasets, this approach shows encouraging performance compared with RNABindRPlus, BindN+, PPRInt, xypan, Predict_RBP, SpaPF, PRNA, and KYG, although it is inferior to RNAProSite, RBscore, and aaRNA. In addition, since our method is not sensitive to protein structure changes, it can be applied well to binding site prediction for modeled structures. The result also suggests that the evolution of binding sites is spatially cooperative. As an effective tool for incorporating evolutionary information, the proposed method can be widely used for nucleic acid-/protein-binding site prediction and functional motif finding.


Subject(s)
Binding Sites/physiology , Protein Binding/physiology , RNA-Binding Proteins/metabolism , RNA/metabolism , Algorithms , Computational Biology/methods , Databases, Protein , Position-Specific Scoring Matrices
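To make the contrast between sequence neighbors and spatial neighbors concrete, here is a minimal sketch that averages PSSM rows over residues lying within a distance cutoff in 3D. The unweighted mean, the 5 Å cutoff, the inclusion of the residue itself, and the toy coordinates are all assumptions for illustration; the abstract does not specify how SNB-PSSM combines neighbors.

```python
import math

def spatial_neighbor_pssm(pssm, coords, cutoff=5.0):
    """Average each residue's PSSM row with the rows of its 3D neighbours."""
    n_cols = len(pssm[0])
    smoothed = []
    for centre in coords:
        # A residue is its own neighbour (distance 0), so the list is never empty.
        neighbours = [j for j, other in enumerate(coords)
                      if math.dist(centre, other) <= cutoff]
        smoothed.append([
            sum(pssm[j][c] for j in neighbours) / len(neighbours)
            for c in range(n_cols)
        ])
    return smoothed

# Toy data: three residues, two PSSM columns, one representative atom each.
toy_pssm = [[2.0, 0.0], [4.0, 2.0], [10.0, 10.0]]
toy_coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]

snb = spatial_neighbor_pssm(toy_pssm, toy_coords)
```

A sequence-window smoother would instead average rows `i-w` to `i+w`; the spatial version can pull in residues that are far apart in sequence but close in the folded structure.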
10.
Biopolymers ; 112(2): e23419, 2021 Feb.
Article in English | MEDLINE | ID: mdl-33476047

ABSTRACT

DNA-binding proteins perform an indispensable function in the maintenance and processing of genetic information and are inefficiently identified by traditional experimental methods due to their huge quantities. By contrast, machine learning methods, as an emerging technique, demonstrate satisfactory speed and accuracy when used to study these molecules. This work focuses on extracting four different features from primary and secondary sequence features: Reduced sequence and index-vectors (RS), Pseudo-amino acid components (PseAACS), Position-specific scoring matrix-Auto Cross Covariance Transform (PSSM-ACCT), and Position-specific scoring matrix-Discrete Wavelet Transform (PSSM-DWT). Using the LASSO dimension reduction method, we experiment with combinations of feature submodels to obtain the optimal number of top-ranked features. These features are then input into three classifiers, Ensemble subspace discriminant, Ensemble bagged tree, and KNN, to predict DNA-binding proteins. Three different datasets, PDB594, PDB1075, and PDB186, are adopted to evaluate the performance of the proposed approach. The PDB1075 and PDB594 datasets are used for five-fold cross-validation, and PDB186 is used for the independent experiment. In the five-fold cross-validation, both PDB1075 and PDB594 show high accuracy, reaching 86.98% and 88.9% with Ensemble subspace discriminant, respectively. The accuracy of the independent experiment with multi-classifier voting is 83.33%, which suggests that the methodology proposed in this work is capable of predicting DNA-binding proteins effectively.


Subject(s)
Algorithms , DNA-Binding Proteins/chemistry , Proteomics/methods , Databases, Protein , Position-Specific Scoring Matrices
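The covariance-based PSSM descriptor named in the abstract above turns a variable-length matrix into a fixed-length vector. The sketch below shows only the auto-covariance half (each column correlated with itself at a given lag); the cross-covariance terms between different columns are omitted, and the function name, toy matrix, and `max_lag` value are assumptions for illustration.

```python
def auto_covariance(pssm, max_lag=2):
    """Auto-covariance of each PSSM column for lags 1..max_lag.

    Returns max_lag * n_columns features regardless of sequence length.
    """
    length, width = len(pssm), len(pssm[0])
    means = [sum(row[j] for row in pssm) / length for j in range(width)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(width):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features

# Toy PSSM: 4 residues x 2 columns (a real PSSM has 20 columns).
toy_pssm = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
ac_features = auto_covariance(toy_pssm, max_lag=2)
```

With a 20-column PSSM and `max_lag=2` this would yield 40 features per protein, which could then be passed to a selection step such as the LASSO mentioned in the abstract.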
11.
Molecules ; 26(17)2021 Sep 03.
Article in English | MEDLINE | ID: mdl-34500792

ABSTRACT

Identification of drug-target interactions (DTIs) is vital for drug discovery. However, traditional biological approaches have some unavoidable shortcomings, such as being time-consuming and expensive. Therefore, there is an urgent need to develop novel and effective computational methods to predict DTIs in order to shorten the development cycles of new drugs. In this study, we present a novel computational approach to identify DTIs, which uses protein sequence information and the dual-tree complex wavelet transform (DTCWT). More specifically, a position-specific scoring matrix (PSSM) was computed for the target protein sequence to obtain its evolutionary information. Then, DTCWT was used to extract representative features from the PSSM, which were then combined with the drug fingerprint features to form the feature descriptors. Finally, these descriptors were sent to the Rotation Forest (RoF) model for classification. A 5-fold cross validation (CV) was adopted on four datasets (Enzyme, Ion Channel, GPCRs (G-protein-coupled receptors), and NRs (Nuclear Receptors)) to validate the proposed model; our method yielded high average accuracies of 89.21%, 85.49%, 81.02%, and 74.44%, respectively. To further verify the performance of our model, we compared the RoF classifier with two state-of-the-art algorithms: the support vector machine (SVM) and the k-nearest neighbor (KNN) classifier. We also compared it with some other published methods. Moreover, the prediction results for the independent dataset further indicated that our method is effective for predicting potential DTIs. Thus, we believe that our method is suitable for facilitating drug discovery and development.


Subject(s)
Drug Development , Support Vector Machine , Wavelet Analysis , Databases, Protein , Enzymes/chemistry , Ion Channels/chemistry , Receptors, Cytoplasmic and Nuclear/chemistry , Receptors, G-Protein-Coupled/chemistry
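The 5-fold cross-validation protocol used in the abstract above can be sketched as an index-splitting routine. This is a simplified illustration: folds are interleaved deterministically, whereas real experiments usually shuffle and often stratify by class, and the evaluator passed in below is a hypothetical placeholder rather than a trained classifier.

```python
def k_fold_splits(n_samples, k=5):
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    splits = []
    for held_out in range(k):
        train = sorted(i for f in range(k) if f != held_out for i in folds[f])
        splits.append((train, folds[held_out]))
    return splits

def mean_cv_score(n_samples, evaluate, k=5):
    """Average a user-supplied evaluate(train, test) score over the k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)

splits = k_fold_splits(10, k=5)

# Placeholder evaluator (hypothetical): a real one would fit a classifier on
# `train` and return its accuracy on `test`.
toy_score = mean_cv_score(10, lambda train, test: len(train) / 10)
```

Each sample is tested exactly once and never appears in the training set of its own fold, which is what makes the averaged accuracy an estimate of performance on unseen data.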
12.
Molecules ; 26(9)2021 Apr 24.
Article in English | MEDLINE | ID: mdl-33923273

ABSTRACT

Many gram-negative bacteria use type IV secretion systems to deliver effector molecules to a wide range of target cells. These substrate proteins, which are called type IV secreted effectors (T4SE), manipulate host cell processes during infection, often resulting in severe diseases or even death of the host. Therefore, identification of putative T4SEs has become a very active research topic in bioinformatics because of its vital role in understanding host-pathogen interactions. PSI-BLAST profiles have been experimentally validated to provide important and discriminatory evolutionary information for various protein classification tasks. In the present study, an accurate computational predictor termed iT4SE-EP was developed for identifying T4SEs by extracting evolutionary features from the position-specific scoring matrix and the position-specific frequency matrix profiles. First, four types of encoding strategies were designed to transform protein sequences into fixed-length feature vectors based on the two profiles. Then, the feature selection technique based on the random forest algorithm was utilized to reduce redundant or irrelevant features without much loss of information. Finally, the optimal features were input into a support vector machine classifier to carry out the prediction of T4SEs. Our experimental results demonstrated that iT4SE-EP outperformed most existing methods on the independent dataset test.


Subject(s)
Evolution, Molecular , Gram-Negative Bacteria/genetics , Host-Pathogen Interactions/genetics , Type IV Secretion Systems/genetics , Amino Acid Sequence/genetics , Bacterial Infections/drug therapy , Bacterial Infections/genetics , Bacterial Infections/microbiology , Computational Biology , Gram-Negative Bacteria/pathogenicity , Humans , Type IV Secretion Systems/chemistry
13.
BMC Bioinformatics ; 21(1): 212, 2020 May 24.
Article in English | MEDLINE | ID: mdl-32448129

ABSTRACT

BACKGROUND: Apoptosis, also called programmed cell death, refers to the spontaneous and orderly death of cells controlled by genes in order to maintain a stable internal environment. Identifying the subcellular location of apoptosis proteins is very helpful in understanding the mechanism of apoptosis and designing drugs. Therefore, the subcellular localization of apoptosis proteins has attracted increased attention in computational biology. Effective feature extraction methods play a critical role in predicting the subcellular location of proteins. RESULTS: In this paper, we proposed two novel feature extraction methods based on evolutionary information. One feature captured the evolutionary information via the transition matrix of the consensus sequence (CTM); the other utilized the evolutionary information from the PSSM based on absolute entropy correlation analysis (AECA-PSSM). After fusing the two kinds of features, linear discriminant analysis (LDA) was used to reduce the dimension of the proposed features. Finally, the support vector machine (SVM) was adopted to predict the protein subcellular locations. The proposed CTM-AECA-PSSM-LDA subcellular location prediction method was evaluated using the CL317 dataset and ZW225 dataset. By jackknife test, the overall accuracy was 99.7% (CL317) and 95.6% (ZW225), respectively. CONCLUSIONS: The experimental results show that the proposed method, which we hope will be a complementary tool to existing subcellular localization methods, can effectively extract richer features from protein sequences and is feasible for predicting the subcellular location of apoptosis proteins.


Subject(s)
Algorithms , Apoptosis Regulatory Proteins/metabolism , Discriminant Analysis , Evolution, Molecular , Amino Acid Sequence , Apoptosis Regulatory Proteins/chemistry , Consensus Sequence , Databases, Protein , Entropy , Position-Specific Scoring Matrices , ROC Curve , Subcellular Fractions/metabolism , Support Vector Machine
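The jackknife test reported in the abstract above holds out each protein in turn, trains on all the others, and counts how often the held-out protein is classified correctly. The sketch below illustrates the procedure with a 1-nearest-neighbour classifier as a deliberately simple stand-in for the SVM used in the paper; the two-dimensional feature vectors and location labels are hypothetical toy data.

```python
def nearest_neighbour_label(query, train_x, train_y):
    """Label of the training point closest to `query` (squared Euclidean)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(query, train_x[i])),
    )
    return train_y[best]

def jackknife_accuracy(features, labels):
    """Leave-one-out accuracy: each sample is predicted from all the others."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbour_label(features[i], train_x, train_y) == labels[i]:
            correct += 1
    return correct / len(features)

# Hypothetical toy features and subcellular location labels.
features = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
            [1.0, 1.1], [0.9, 1.0], [0.5, 0.45]]
labels = ["cyto", "cyto", "cyto", "mito", "mito", "mito"]

accuracy = jackknife_accuracy(features, labels)
```

The last toy point sits closer to the "cyto" cluster than to its own class, so it is the one misclassified sample. One caveat worth keeping in mind: for the jackknife estimate to be unbiased, any supervised preprocessing such as LDA has to be refitted inside each leave-one-out round rather than once on the full dataset.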
14.
Genomics ; 111(6): 1839-1852, 2019 12.
Article in English | MEDLINE | ID: mdl-30550813

ABSTRACT

The identification of drug-target interactions has great significance for pharmaceutical scientific research. Since identifying drug-target interactions with traditional experimental methods is costly and time-consuming, the use of machine learning methods to predict potential drug-target interactions has attracted widespread attention. This paper presents a novel drug-target interaction prediction method called LRF-DTIs. Firstly, the pseudo-position specific scoring matrix (PsePSSM) and FP2 molecular fingerprinting were used to extract the features of drug-target pairs. Secondly, Lasso was used to reduce the dimension of the extracted feature information, and then the Synthetic Minority Oversampling Technique (SMOTE) was used to deal with unbalanced data. Finally, the processed feature vectors were input into a random forest (RF) classifier to predict drug-target interactions. Through 10 trials of 5-fold cross-validation, the overall prediction accuracies on the enzyme, ion channel (IC), G-protein-coupled receptor (GPCR) and nuclear receptor (NR) datasets reached 98.09%, 97.32%, 95.69%, and 94.88%, respectively, and were compared with those of other prediction methods. In addition, we have tested and verified that our method not only can be applied to predict new interactions but also obtains a satisfactory result on a new dataset. All the experimental results indicate that our method can significantly improve the prediction accuracy of drug-target interactions and play a vital role in new drug research and target protein development. The source code and all datasets are available at https://github.com/QUST-AIBBDRC/LRF-DTIs/ for academic use.


Subject(s)
Databases, Protein , Ion Channels/genetics , Machine Learning , Receptors, Cytoplasmic and Nuclear/genetics , Receptors, G-Protein-Coupled/genetics , Software , Drug Development , Position-Specific Scoring Matrices , Protein Conformation
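The SMOTE step mentioned in the abstract above balances classes by creating synthetic minority samples on the line segment between a real minority sample and one of its nearest minority neighbours. The following is a simplified sketch of that interpolation idea, not the reference SMOTE implementation or the code from the linked repository; the function name, `k`, the seed, and the toy points are assumptions.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: squared_distance(base, p),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return synthetic

# Hypothetical minority-class feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like(minority, n_new=6)
```

Oversampling of this kind should be applied only to the training portion of each cross-validation fold; generating synthetic points before splitting lets near-copies of test samples leak into training and inflates the reported accuracy.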
15.
BMC Bioinformatics ; 20(1): 377, 2019 Jul 06.
Article in English | MEDLINE | ID: mdl-31277574

ABSTRACT

BACKGROUND: The electron transport chain is a series of protein complexes embedded in the process of cellular respiration, which is an important process to transfer electrons and other macromolecules throughout the cell. It is also the major process to extract energy via redox reactions in the case of oxidation of sugars. Many studies have determined that the electron transport protein has been implicated in a variety of human diseases, e.g., diabetes, Parkinson's disease, and Alzheimer's disease. Few bioinformatics studies have been conducted to identify electron transport proteins with high accuracy, and their performance still requires considerable improvement. Here, we present a novel deep neural network architecture to address this problem. RESULTS: Most of the previous studies could not feed the original position-specific scoring matrix (PSSM) profiles into neural networks, leading to a loss of information, and the neural networks consequently could not achieve the best results. In this paper, we present a novel approach that uses deep gated recurrent units (GRU) on full PSSMs to resolve this problem. Our approach can precisely predict electron transporters with cross-validation and independent test accuracies of 93.5% and 92.3%, respectively. Our approach demonstrates superior performance to all of the state-of-the-art predictors on electron transport proteins. CONCLUSIONS: Through the proposed study, we provide ET-GRU, a web server for discriminating electron transport proteins in particular and other protein functions in general. Also, our achievement could promote the use of GRU in computational biology, especially in protein function prediction.


Subject(s)
Electron Transport Chain Complex Proteins/chemistry , Neural Networks, Computer , Software , Electron Transport , Humans , Position-Specific Scoring Matrices
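Feeding a full PSSM into a gated recurrent unit means reading the matrix one residue (row) at a time and carrying a hidden state forward. The sketch below spells out a single GRU cell in plain Python to show the gating arithmetic. It is not the ET-GRU architecture: the parameter names, the tiny sizes, and the mostly zero weights are assumptions, and the state update follows the convention `h = (1 - z) * candidate + z * h_prev` (some references swap the roles of `z` and `1 - z`).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def gru_step(x, h, p):
    """One GRU update for input row x and previous hidden state h."""
    z = [sigmoid(a + b + c)  # update gate
         for a, b, c in zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]
    r = [sigmoid(a + b + c)  # reset gate
         for a, b, c in zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b + c)  # candidate state
            for a, b, c in zip(matvec(p["Wh"], x), matvec(p["Uh"], reset_h), p["bh"])]
    return [(1.0 - zi) * ci + zi * hi for zi, ci, hi in zip(z, cand, h)]

def run_gru(pssm_rows, params, hidden_size):
    """Read the PSSM residue by residue and return the final hidden state."""
    h = [0.0] * hidden_size
    for row in pssm_rows:
        h = gru_step(row, h, params)
    return h

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

# Hypothetical parameters: 3 input channels (20 for a real PSSM), 2 hidden units.
params = {
    "Wz": zeros(2, 3), "Uz": zeros(2, 2), "bz": [0.0, 0.0],
    "Wr": zeros(2, 3), "Ur": zeros(2, 2), "br": [0.0, 0.0],
    "Wh": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], "Uh": zeros(2, 2), "bh": [0.0, 0.0],
}

one_step = run_gru([[1.0, 0.0, 0.0]], params, hidden_size=2)
final_state = run_gru([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.5, 0.5, 0.0]], params, 2)
```

The final hidden state has a fixed size whatever the protein length, so it can be passed to a dense classification layer; in a trained model the weights would of course be learned rather than set by hand.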
16.
J Proteome Res ; 18(9): 3503-3511, 2019 09 06.
Article in English | MEDLINE | ID: mdl-31362508

ABSTRACT

Protein function prediction is one of the well-known problems in proteome research, attracting the attention of numerous researchers. However, the implementation of deep neural networks, which helps to increase the protein function prediction, still poses a big challenge. This study proposes a deep learning approach namely Fertility-GRU that incorporates gated recurrent units and position-specific scoring matrix profiles to predict the function of fertility-related protein, which is a highly crucial biological function. Fertility-related proteins also have been proven to be important in many biological entities (i.e., bone marrow and peripheral blood, postnatal mammalian ovary) and parameters (i.e., daily sperm production). As a result, our model can achieve a cross-validation accuracy of 85.8% and an independent accuracy of 91.1%. We also solve the problem of overfitting in the data set by adding dropout layers in the deep learning model. The independent testing results showed sensitivity, specificity, and Matthews correlation coefficient (MCC) values of 90.5%, 91.7%, and 0.82, respectively. Fertility-GRU demonstrates superiority in performance against the state-of-the-art predictor on the same data set. In our proposed study, we provided a method that enables more proteins to be discovered, especially proteins associated with fertility. Moreover, our achievement could promote the use of recurrent networks and gated recurrent units in proteome research. The source code and data set are freely accessible via https://github.com/khanhlee/fertility-gru .


Subject(s)
Fertility/genetics , Proteins/genetics , Proteomics/methods , Software , Algorithms , Databases, Genetic , Deep Learning , Embryonic Development/genetics , Female , Humans , Male , Neural Networks, Computer , Oogenesis/genetics , Position-Specific Scoring Matrices , Proteins/classification , Proteins/isolation & purification , Proteomics/statistics & numerical data , Spermatogenesis/genetics
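The evaluation measures quoted in the abstract above (sensitivity, specificity, accuracy, and Matthews correlation coefficient) all derive from the four cells of a binary confusion matrix. The following sketch computes them; the counts used are hypothetical and are not the confusion matrix of Fertility-GRU.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }

# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when the positive and negative classes differ greatly in size, which is why it is commonly reported alongside sensitivity and specificity.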
17.
J Comput Chem ; 40(15): 1521-1529, 2019 06 05.
Article in English | MEDLINE | ID: mdl-30883833

ABSTRACT

The movement of ions across the cell membrane is essential for many biological processes. This study focuses on ion channels and ion transporters (pumps), which act as border guards controlling the incessant traffic of ions across cell membranes. Ion channels and ion transporters function to regulate membrane potential and electrical signaling and play important roles in cell proliferation, migration, apoptosis, and differentiation. Ion channels have been found to differ significantly from ion transporters in their behavior. Therefore, a method for automatically classifying ion transporters and ion channels from membrane proteins is proposed by training deep neural networks and using the position-specific scoring matrix profile as an input. The key novelty is the three-stage approach, in which five techniques for data normalization are used; next, three imbalanced data techniques are applied to the minority classes; and then six classifiers are compared with the proposed method. © 2019 Wiley Periodicals, Inc.


Subject(s)
Deep Learning , Ion Channels/chemistry , Ion Channels/classification , Automation , Humans , Ion Transport
18.
Anal Biochem ; 564-565: 123-132, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30393088

ABSTRACT

Membrane proteins are pivotal constituents of a cell that exert a crucial influence on diverse biological processes. The accurate identification of membrane protein types is essential for revealing molecular mechanisms and for drug development. Initially, several traditional methods were used to classify these types. However, experimental methods are laborious, time-consuming, and costly given the rapid growth of uncharacterized protein sequences generated in the postgenomic era. Hence, machine learning-based methods are indispensable for reliable and fast identification of membrane protein types. A variety of state-of-the-art investigations have been carried out to improve prediction performance, but predictive validity is still insufficient. Motivated by this, we designed a promising sequential support vector machine based predictor called TargetHMP to predict types of membrane proteins. We captured the local informative features by exploring evolutionary profiles through a novel method called the segmentation-based pseudo position-specific scoring matrix (Seg-PsePSSM). TargetHMP attained high accuracy of 94.99%, 93.48%, and 90.36% on the S1, S2, and S3 datasets, respectively, using a rigorous leave-one-out cross-validation test. The results indicate that the proposed method outperformed prior predictors. We expect that the proposed approach will help academic research in general and pharmaceutical drug discovery in particular.


Subject(s)
Membrane Proteins/analysis , Algorithms , Computational Biology/methods , Databases, Protein , Membrane Proteins/classification , Membrane Proteins/genetics , Support Vector Machine
19.
Anal Biochem ; 575: 17-26, 2019 06 15.
Article in English | MEDLINE | ID: mdl-30930199

ABSTRACT

Motor proteins are the driving force behind muscle contraction and are responsible for the active transportation of most proteins and vesicles in the cytoplasm. There are three superfamilies of cytoskeletal motor proteins with various molecular functions and structures: dynein, kinesin, and myosin. The loss of a specific motor protein's molecular function has been linked to a variety of human diseases, e.g., Charcot-Marie-Tooth disease and kidney disease. Therefore, creating a precise model to classify motor proteins is essential for helping biologists understand their molecular functions and design drug targets according to their impact on human diseases. Here we attempt to classify cytoskeletal motor proteins using deep learning, which has been increasingly and widely used to address numerous problems in a variety of fields, resulting in state-of-the-art results. Our deep convolutional neural network achieves an independent test accuracy of 97.5%, 96.4%, and 96.1% for each superfamily, respectively. Compared to other state-of-the-art methods, our approach showed a significant improvement in performance across a range of evaluation metrics. Through the proposed study, we provide an effective model for classifying motor proteins and a basis for further research that can enhance the performance of protein function classification using deep learning.


Subject(s)
Cytoskeletal Proteins/physiology , Molecular Motor Proteins/physiology , Neural Networks, Computer , Algorithms , Humans , Machine Learning
20.
J Theor Biol ; 461: 230-238, 2019 01 14.
Article in English | MEDLINE | ID: mdl-30321541

ABSTRACT

RNA-protein interaction (RPI) plays an important role in the basic cellular processes of organisms. Unfortunately, due to time and cost constraints, it is difficult to determine the relationship between RNA and protein on a large scale through biological experiments. There is therefore an urgent need for reliable computational methods to quickly and accurately predict RNA-protein interaction. In this study, we propose a novel computational method, RPIFSE (predicting RPI with Feature Selection Ensemble method), based on RNA and protein sequence information to predict RPI. First, RPIFSE perturbs the features extracted by the convolutional neural network (CNN) and generates multiple data sets according to the feature weights, and then uses an extreme learning machine (ELM) classifier to classify these data sets. Finally, the results of each classifier are combined, and the highest score is chosen as the final prediction result by a weighted voting method. In 5-fold cross-validation experiments, RPIFSE achieved 91.87%, 89.74%, 97.76% and 98.98% accuracy on the RPI369, RPI2241, RPI488 and RPI1807 data sets, respectively. To further evaluate the performance of RPIFSE, we compare it with the state-of-the-art support vector machine (SVM) classifier and other existing methods on those data sets. Furthermore, we also predicted the RPI on the independent data set NPInter2.0 and drew the network graph based on the prediction results. These promising comparison results demonstrate the effectiveness of RPIFSE and indicate that RPIFSE could be a useful tool for predicting RPI.
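The final combination step can be sketched as a weighted vote over per-classifier class scores. This is a minimal illustration, assuming each classifier outputs one score per class and that the weights are supplied by the caller; the abstract does not say how RPIFSE derives its weights, and the function name and numbers below are hypothetical.

```python
def weighted_vote(score_lists, weights):
    """Combine per-classifier class scores with a weighted average.

    score_lists: one list of class scores per classifier.
    weights: one non-negative weight per classifier (assumed given).
    Returns (index of the highest combined score, combined scores).
    """
    if len(score_lists) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(score_lists[0])
    combined = [
        sum(w * scores[c] for scores, w in zip(score_lists, weights)) / total_weight
        for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: combined[c])
    return best, combined


if __name__ == "__main__":
    # Three classifiers scoring two classes: [non-interacting, interacting].
    scores = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
    print(weighted_vote(scores, weights=[3, 1, 1]))
    print(weighted_vote(scores, weights=[8, 1, 1]))
```

With weights `[3, 1, 1]` the two lower-weighted classifiers still outvote the first one, whereas weights `[8, 1, 1]` let the first classifier dominate, which shows how strongly the outcome depends on the weighting scheme.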


Subject(s)
Neural Networks, Computer , RNA/metabolism , Computational Biology/methods , Datasets as Topic , Protein Binding , Sequence Analysis , Support Vector Machine