Search | BVS (Virtual Health Library) Search Portal

1.

MLm5C: A high-precision human RNA 5-methylcytosine sites predictor based on a combination of hybrid machine learning models.

Kurata, Hiroyuki; Harun-Or-Roshid, Md; Mehedi Hasan, Md; Tsukiyama, Sho; Maeda, Kazuhiro; Manavalan, Balachandran.

Methods ; 227: 37-47, 2024 Jul.

Article in English | MEDLINE | ID: mdl-38729455

ABSTRACT

RNA modification serves as a pivotal component in numerous biological processes. Among the prevalent modifications, 5-methylcytosine (m5C) significantly influences mRNA export, translation efficiency and cell differentiation, and is also associated with human diseases, including Alzheimer's disease, autoimmune disease, cancer, and cardiovascular diseases. Identification of m5C is critical for understanding RNA modification mechanisms and the epigenetic regulation of associated diseases. However, large-scale experimental identification of m5C presents significant challenges because it is labor-intensive and time-consuming. Several computational tools based on machine learning have been developed to supplement experimental methods, but they still lack accuracy and efficiency in identifying these sites. In this study, we introduce a new predictor, MLm5C, for precise prediction of m5C sites using sequence data. Briefly, we evaluated eleven RNA sequence-derived features with four basic machine learning algorithms to generate 44 baseline models. We ranked these models by performance and stacked the top 20 baseline models into the best model, named MLm5C. MLm5C outperformed the state-of-the-art predictors. Notably, optimizing the sequence length surrounding the modification sites significantly improved the prediction performance. MLm5C is an invaluable tool for accelerating the detection of m5C sites within the human genome, thereby facilitating the characterization of their roles in post-transcriptional regulation.

Subjects

5-Methylcytosine, Machine Learning, RNA, Humans, 5-Methylcytosine/metabolism, 5-Methylcytosine/chemistry, RNA/genetics, RNA/chemistry, RNA/metabolism, Computational Biology/methods, Post-Transcriptional RNA Processing, Algorithms
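
The model-selection step described in the MLm5C abstract (11 features × 4 algorithms = 44 baseline models, ranked, top 20 kept for stacking) can be sketched roughly as follows. The feature and algorithm names and the scores are hypothetical placeholders, since the abstract does not list them; this is an illustration of the bookkeeping only, not the authors' implementation.

```python
from itertools import product

# Hypothetical placeholder names: the abstract does not name the actual
# eleven sequence-derived features or the four learning algorithms.
FEATURES = [f"feature_{i}" for i in range(1, 12)]      # 11 encodings
ALGORITHMS = [f"algorithm_{j}" for j in range(1, 5)]   # 4 learners


def build_baseline_grid(features, algorithms):
    """Return every (feature, algorithm) pair; each pair is one baseline model."""
    return list(product(features, algorithms))


def select_top_models(scores, top_n=20):
    """Rank baseline models by score (higher is better) and keep the best top_n."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


grid = build_baseline_grid(FEATURES, ALGORITHMS)

# Placeholder scores for illustration only. In practice each score would
# come from cross-validating the corresponding baseline model.
placeholder_scores = {pair: 0.5 + 0.01 * rank for rank, pair in enumerate(grid)}

top_models = select_top_models(placeholder_scores, top_n=20)
print(len(grid), len(top_models))
```

The outputs of the selected models would then serve as inputs to a second-level (stacked) learner; that step is omitted here because the abstract does not specify it.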

2.

iACVP: markedly enhanced identification of anti-coronavirus peptides using a dataset-specific word2vec model.

Kurata, Hiroyuki; Tsukiyama, Sho; Manavalan, Balachandran.

Brief Bioinform ; 23(4)2022 07 18.

Article in English | MEDLINE | ID: mdl-35772910

ABSTRACT

The COVID-19 pandemic caused several million deaths worldwide. Development of anti-coronavirus drugs is thus urgent. Unlike conventional non-peptide drugs, antiviral peptide drugs are highly specific, easy to synthesize and modify, and not highly susceptible to drug resistance. To reduce the time and expense involved in screening thousands of peptides and assaying their antiviral activity, computational predictors for identifying anti-coronavirus peptides (ACVPs) are needed. However, few experimentally verified ACVP samples are available, even though a relatively large number of antiviral peptides (AVPs) have been discovered. In this study, we attempted to predict ACVPs using an AVP dataset and a small collection of ACVPs. Using conventional features, a binary profile and a word-embedding word2vec (W2V), we systematically explored five different machine learning methods: Transformer, Convolutional Neural Network, bidirectional Long Short-Term Memory, Random Forest (RF) and Support Vector Machine. Via exhaustive searches, we found that the RF classifier with W2V consistently achieved better performance on different datasets. The two main controlling factors were: (i) the dataset-specific W2V dictionary was generated from the training and independent test datasets instead of the widely used general UniProt proteome and (ii) a systematic search was conducted and determined the optimal k-mer value in W2V, which provides greater discrimination between positive and negative samples. Therefore, our proposed method, named iACVP, consistently provides better prediction performance compared with existing state-of-the-art methods. To assist experimentalists in identifying putative ACVPs, we implemented our model as a web server accessible via the following link: http://kurata35.bio.kyutech.ac.jp/iACVP.

Subjects

COVID-19 Drug Treatment, Pandemics, Antiviral Agents/pharmacology, Humans, Machine Learning, Peptides
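
The iACVP abstract mentions searching for the optimal k-mer value used to build the word2vec dictionary. A minimal sketch of the tokenization such a search would iterate over is given below. The overlapping, stride-1 splitting and the example peptides are assumptions for illustration; the abstract does not state how the k-mers are formed.

```python
def kmer_tokens(sequence, k):
    """Split a sequence into overlapping k-mers (stride 1, an assumption)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # A sequence shorter than k yields an empty token list.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_corpus(sequences, k):
    """Turn each sequence into a 'sentence' of k-mer 'words'."""
    return [kmer_tokens(seq, k) for seq in sequences]


# Made-up peptides, used only to show the shape of the corpus.
peptides = ["ACDEFG", "KLMNP"]
for k in (1, 2, 3):
    print(k, build_corpus(peptides, k)[0])
```

Each corpus produced this way could be fed to a word-embedding trainer, and the k giving the best downstream classifier would be retained.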

3.

BERT6mA: prediction of DNA N6-methyladenine site using deep learning-based approaches.

Tsukiyama, Sho; Hasan, Md Mehedi; Deng, Hong-Wen; Kurata, Hiroyuki.

Brief Bioinform ; 23(2)2022 03 10.

Article in English | MEDLINE | ID: mdl-35225328

ABSTRACT

N6-methyladenine (6mA) is associated with important roles in DNA replication, DNA repair, transcription, and regulation of gene expression. Several experimental methods have been used to identify DNA modifications. However, these experimental methods are costly and time-consuming. To detect 6mA and compensate for these shortcomings of experimental methods, we proposed a novel deep learning approach called BERT6mA. To compare BERT6mA with other deep learning approaches, we used benchmark datasets covering 11 species. BERT6mA presented the highest AUCs in eight species in independent tests. Furthermore, BERT6mA showed higher or comparable performance relative to the state-of-the-art models, although it performed poorly in a few species with a small sample size. To overcome this issue, pretraining and fine-tuning between two species were applied to BERT6mA. The models pretrained and fine-tuned on specific species presented higher performance than other models, even for species with a small sample size. In addition to the prediction, we analyzed the attention weights generated by BERT6mA to reveal how the model extracts critical features responsible for 6mA prediction. To facilitate biological research, the BERT6mA online web server and its source code are freely accessible; the source code is available at https://github.com/kuratahiroyuki/BERT6mA.git.

Subjects

Deep Learning, DNA/genetics, DNA Methylation, Software

4.

Elucidating Key Characteristics of PFAS Binding to Human Peroxisome Proliferator-Activated Receptor Alpha: An Explainable Machine Learning Approach.

Maeda, Kazuhiro; Hirano, Masashi; Hayashi, Taka; Iida, Midori; Kurata, Hiroyuki; Ishibashi, Hiroshi.

Environ Sci Technol ; 58(1): 488-497, 2024 Jan 09.

Article in English | MEDLINE | ID: mdl-38134352

ABSTRACT

Per- and polyfluoroalkyl substances (PFAS) are widely employed anthropogenic fluorinated chemicals known to disrupt hepatic lipid metabolism by binding to human peroxisome proliferator-activated receptor alpha (PPARα). Therefore, screening for PFAS that bind to PPARα is of critical importance. Machine learning approaches are promising techniques for rapid screening of PFAS. However, traditional machine learning approaches lack interpretability, posing challenges in investigating the relationship between molecular descriptors and PPARα binding. In this study, we aimed to develop a novel, explainable machine learning approach to rapidly screen for PFAS that bind to PPARα. We calculated the PPARα-PFAS binding score and 206 molecular descriptors for PFAS. Through systematic and objective selection of important molecular descriptors, we developed a machine learning model with good predictive performance using only three descriptors. The molecular size (b_single) and electrostatic properties (BCUT_PEOE_3 and PEOE_VSA_PPOS) are important for PPARα-PFAS binding. Alternative PFAS are considered safer than their legacy predecessors. However, we found that alternative PFAS with many carbon atoms and ether groups exhibited a higher affinity for PPARα. Therefore, confirming the toxicity of these alternative PFAS compounds with such characteristics through biological experiments is important.

Subjects

Fluorocarbons, PPAR alpha, Humans, PPAR alpha/metabolism, Liver/metabolism

5.

LSTM-PHV: prediction of human-virus protein-protein interactions by LSTM with word2vec.

Tsukiyama, Sho; Hasan, Md Mehedi; Fujii, Satoshi; Kurata, Hiroyuki.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-34160596

ABSTRACT

Viral infection involves a large number of protein-protein interactions (PPIs) between human and virus. The PPIs range from the initial binding of viral coat proteins to host membrane receptors to the hijacking of host transcription machinery. However, few interspecies PPIs have been identified, because experimental methods including mass spectrometry are time-consuming and expensive, and molecular dynamics simulation is limited to proteins whose 3D structures are solved. Sequence-based machine learning methods are expected to overcome these problems. We developed, for the first time, an LSTM model with word2vec to predict PPIs between human and virus, named LSTM-PHV, using amino acid sequences alone. LSTM-PHV effectively learned the training data with a highly imbalanced ratio of positive to negative samples and achieved AUCs of 0.976 and 0.973 and accuracies of 0.984 and 0.985 on the training and independent datasets, respectively. In predicting PPIs between human and unknown or new viruses, LSTM-PHV greatly outperformed the existing state-of-the-art PPI predictors. Interestingly, learning only sequence contexts as words is sufficient for PPI prediction. Uniform manifold approximation and projection demonstrated that LSTM-PHV clearly distinguished the positive PPI samples from the negative ones. We present the LSTM-PHV online web server and supporting data, which are freely available at http://kurata35.bio.kyutech.ac.jp/LSTM-PHV.

Subjects

Computational Biology/methods, Host-Pathogen Interactions, Protein Interaction Mapping/methods, Software, Viral Proteins/metabolism, Virus Diseases/metabolism, Virus Diseases/virology, Algorithms, Amino Acid Sequence, Benchmarking, Protein Databases, Deep Learning, Humans, Protein Interaction Domains and Motifs, Protein Interaction Maps, Reproducibility of Results, Web Browser

6.

Meta-i6mA: an interspecies predictor for identifying DNA N6-methyladenine sites of plant genomes by exploiting informative features in an integrative machine-learning framework.

Hasan, Md Mehedi; Basith, Shaherin; Khatun, Mst Shamima; Lee, Gwang; Manavalan, Balachandran; Kurata, Hiroyuki.

Brief Bioinform ; 22(3)2021 05 20.

Article in English | MEDLINE | ID: mdl-32910169

ABSTRACT

DNA N6-methyladenine (6mA) is an important epigenetic modification that is responsible for various cellular processes. Accurate identification of 6mA sites is one of the challenging tasks in genome analysis and leads to an understanding of their biological functions. To date, several species-specific machine learning (ML)-based models have been proposed, but the majority of them were not tested on other species. Hence, their practical application to other plant species is quite limited. In this study, we explored 10 different feature encoding schemes, with the goal of capturing key characteristics around 6mA sites. We selected five feature encoding schemes, based on physicochemical and position-specific information, that possess high discriminative capability. The resultant feature sets were input to six commonly used ML methods (random forest, support vector machine, extremely randomized tree, logistic regression, naïve Bayes and AdaBoost). The Rosaceae genome was employed to train the above classifiers, which generated 30 baseline models. To integrate their individual strengths, we proposed Meta-i6mA, which combines the baseline models using a meta-predictor approach. In extensive independent tests, Meta-i6mA showed high Matthews correlation coefficient values of 0.918, 0.827 and 0.635 on Rosaceae, rice and Arabidopsis thaliana, respectively, and outperformed the existing predictors. We anticipate that Meta-i6mA can be applied across different plant species. Furthermore, we developed a user-friendly online web server, which is available at http://kurata14.bio.kyutech.ac.jp/Meta-i6mA/.

Subjects

Adenosine/analogs & derivatives, Computational Biology/methods, Plant DNA/genetics, Genetic Epigenesis/genetics, Plant Genome/genetics, Machine Learning, Adenosine/metabolism, Algorithms, Arabidopsis/genetics, Arabidopsis/metabolism, Base Sequence, Plant DNA/metabolism, Internet, Genetic Models, Oryza/genetics, Oryza/metabolism, Rosaceae/genetics, Rosaceae/metabolism, Species Specificity, Support Vector Machine
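
Several entries in this list, including Meta-i6mA, report the Matthews correlation coefficient (MCC). For reference, the standard definition from a binary confusion matrix can be computed as in the short sketch below. The example counts are invented and do not correspond to any result reported in these abstracts.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC from confusion-matrix counts; ranges from -1 to 1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is taken as 0 when a margin is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Invented counts, for illustration only.
print(round(matthews_corrcoef(tp=90, tn=85, fp=15, fn=10), 3))
```

Unlike accuracy, MCC stays informative on imbalanced datasets because it uses all four cells of the confusion matrix.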

7.

NeuroPred-FRL: an interpretable prediction model for identifying neuropeptide using feature representation learning.

Hasan, Md Mehedi; Alam, Md Ashad; Shoombuatong, Watshara; Deng, Hong-Wen; Manavalan, Balachandran; Kurata, Hiroyuki.

Brief Bioinform ; 22(6)2021 11 05.

Article in English | MEDLINE | ID: mdl-33975333

ABSTRACT

Neuropeptides (NPs) are the most versatile neurotransmitters in the immune systems that regulate various central anxious hormones. An efficient and effective bioinformatics tool for rapid and accurate large-scale identification of NPs is critical in immunoinformatics, which is indispensable for basic research and drug development. Although a few NP prediction tools have been developed, their prediction performance needs to be improved. In this study, we have developed a machine learning-based meta-predictor called NeuroPred-FRL by employing the feature representation learning approach. First, we generated 66 optimal baseline models by employing 11 different encodings, six different classifiers and a two-step feature selection approach. The predicted probability scores of NPs from the 66 baseline models were combined into the input feature vector. Second, to enhance the feature representation ability, we applied the two-step feature selection approach to optimize the 66-D probability feature vector and then input the optimal vector into a random forest classifier to construct the final meta-model (NeuroPred-FRL). Benchmarking experiments based on both cross-validation and independent tests indicate that NeuroPred-FRL achieves superior prediction performance for NPs compared with the other state-of-the-art predictors. We believe that the proposed NeuroPred-FRL can serve as a powerful tool for large-scale identification of NPs, facilitating the characterization of their functional mechanisms and expediting their applications in clinical therapy. Moreover, we interpreted some model mechanisms of NeuroPred-FRL by leveraging the robust SHapley Additive exPlanation algorithm.

Subjects

Computational Biology/methods, Machine Learning, Neuropeptides/chemistry, Software, Algorithms, Consensus Sequence, Genetic Databases, Internet-Based Intervention, Neuropeptides/metabolism, Position-Specific Scoring Matrices, Reproducibility of Results, Workflow

8.

Decomposition of Unpolarized Fluorescence Spectrum of Uniaxially Oriented 1,3,5-Triphenylbenzene Microcrystals Into Polarized Fluorescence Spectra.

Hara, Michihiro; Takeshita, Tatsuya; Kurata, Hiroyuki; Kimura, Tsunehisa.

J Fluoresc ; 33(4): 1559-1563, 2023 Jul.

Article in English | MEDLINE | ID: mdl-36787039

ABSTRACT

Luminescence from solids such as crystals and aggregates is of growing academic and industrial interest. In this study, we report decomposition of the unpolarized fluorescence spectrum of uniaxially oriented 1,3,5-triphenylbenzene (TPB) microcrystals into four polarized spectra measured with a polarizer (V: vertical and H: horizontal) and an analyser (V: vertical and H: horizontal), where V and H indicate perpendicular and parallel to the layer of TPB molecules in the crystal, respectively. The resolved spectra were interpreted in terms of molecular and excimer-like (J- and H-dimer) emissions. The origin of the excimer-like emissions is discussed in relation to the molecular packing in the crystal. Polarized crystal fluorescence was shown to provide insight into the excitation/emission process in the crystal. Although preliminary, this study demonstrates the potential of polarized fluorescence to elucidate the luminescence mechanism.

9.

Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy.

Hasan, Md Mehedi; Tsukiyama, Sho; Cho, Jae Youl; Kurata, Hiroyuki; Alam, Md Ashad; Liu, Xiaowen; Manavalan, Balachandran; Deng, Hong-Wen.

Mol Ther ; 30(8): 2856-2867, 2022 08 03.

Article in English | MEDLINE | ID: mdl-35526094

ABSTRACT

As one of the most prevalent post-transcriptional epigenetic modifications, N5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. Therefore, it is important to accurately identify m5C modifications in order to gain a deeper understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their respective models have been developed using small training datasets. Hence, their practical application is quite limited in genome-wide detection. To overcome the existing limitations, we propose Deepm5C, a bioinformatics method for identifying RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature-encoding algorithms and a feature derived from word-embedding approaches. Afterward, four variants of deep-learning classifiers and four commonly used conventional classifiers were trained with the four encodings, ultimately yielding 32 baseline models. A stacking strategy was then applied: the predicted outputs of the optimal baseline models were integrated and used to train a one-dimensional (1D) convolutional neural network. As a result, the Deepm5C predictor achieved excellent performance during cross-validation, with a Matthews correlation coefficient and an accuracy of 0.697 and 0.855, respectively. The corresponding metrics during the independent test were 0.691 and 0.852, respectively. Overall, Deepm5C achieved more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, Deepm5C is expected to assist community-wide efforts in identifying putative m5Cs and in formulating novel testable biological hypotheses.

Subjects

Deep Learning, RNA, Algorithms, Computational Biology/methods, Humans, Machine Learning, RNA/genetics

10.

Automatic Generation of SBML Kinetic Models from Natural Language Texts Using GPT.

Maeda, Kazuhiro; Kurata, Hiroyuki.

Int J Mol Sci ; 24(8)2023 Apr 14.

Article in English | MEDLINE | ID: mdl-37108453

ABSTRACT

Kinetic modeling is an essential tool in systems biology research, enabling the quantitative analysis of biological systems and predicting their behavior. However, the development of kinetic models is a complex and time-consuming process. In this article, we propose a novel approach called KinModGPT, which generates kinetic models directly from natural language text. KinModGPT employs GPT as a natural language interpreter and Tellurium as an SBML generator. We demonstrate the effectiveness of KinModGPT in creating SBML kinetic models from complex natural language descriptions of biochemical reactions. KinModGPT successfully generates valid SBML models from a range of natural language model descriptions of metabolic pathways, protein-protein interaction networks, and heat shock response. This article demonstrates the potential of KinModGPT in kinetic modeling automation.

Subjects

Biological Models, Programming Languages, Computer Simulation, Language, Cell Physiological Phenomena, Software

11.

MLAGO: machine learning-aided global optimization for Michaelis constant estimation of kinetic modeling.

Maeda, Kazuhiro; Hatae, Aoi; Sakai, Yukie; Boogerd, Fred C; Kurata, Hiroyuki.

BMC Bioinformatics ; 23(1): 455, 2022 Nov 01.

Article in English | MEDLINE | ID: mdl-36319952

ABSTRACT

BACKGROUND: Kinetic modeling is a powerful tool for understanding the dynamic behavior of biochemical systems. For kinetic modeling, determination of a number of kinetic parameters, such as the Michaelis constant (Km), is necessary, and global optimization algorithms have long been used for parameter estimation. However, the conventional global optimization approach has three problems: (i) It is computationally demanding. (ii) It often yields unrealistic parameter values because it simply seeks a better model fitting to experimentally observed behaviors. (iii) It has difficulty in identifying a unique solution because multiple parameter sets can allow a kinetic model to fit experimental data equally well (the non-identifiability problem). RESULTS: To solve these problems, we propose the Machine Learning-Aided Global Optimization (MLAGO) method for Km estimation of kinetic modeling. First, we use a machine learning-based Km predictor based only on three factors: EC number, KEGG Compound ID, and Organism ID, then conduct a constrained global optimization-based parameter estimation by using the machine learning-predicted Km values as the reference values. The machine learning model achieved relatively good prediction scores: RMSE = 0.795 and R2 = 0.536, making the subsequent global optimization easy and practical. The MLAGO approach reduced the error between simulation and experimental data while keeping Km values close to the machine learning-predicted values. As a result, the MLAGO approach successfully estimated Km values with less computational cost than the conventional method. Moreover, the MLAGO approach uniquely estimated Km values, which were close to the measured values. CONCLUSIONS: MLAGO overcomes the major problems in parameter estimation, accelerates kinetic modeling, and thus ultimately leads to better understanding of complex cellular systems. 
The web application for our machine learning-based Km predictor, which helps modelers perform MLAGO on their own parameter estimation tasks, is accessible at https://sites.google.com/view/kazuhiro-maeda/software-tools-web-apps

Subjects

Algorithms, Biological Models, Kinetics, Computer Simulation, Machine Learning
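
The idea stated in the MLAGO abstract, reducing the simulation error while keeping Km close to a machine learning-predicted reference value, can be illustrated with a toy single-enzyme example. Everything below is an assumption made for illustration: the Michaelis-Menten rate law as the model, the penalized (rather than constrained) objective, the log-scale distance, the weight, the grid search, and the synthetic data. It is not the authors' formulation or code.

```python
import math


def mm_rate(s, vmax, km):
    """Michaelis-Menten rate v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def fit_error(km, data, vmax):
    """Sum of squared differences between simulated and observed rates."""
    return sum((mm_rate(s, vmax, km) - v) ** 2 for s, v in data)


def penalized_objective(km, data, vmax, km_ref, weight):
    """Fit error plus a penalty for drifting from the reference Km (log scale)."""
    deviation = math.log10(km) - math.log10(km_ref)
    return fit_error(km, data, vmax) + weight * deviation ** 2


def estimate_km(data, vmax, km_ref, weight=0.01, lo=-3.0, hi=3.0, steps=6001):
    """Grid search over log10(Km) in [lo, hi]; a stand-in for a global optimizer."""
    candidates = (10 ** (lo + (hi - lo) * i / (steps - 1)) for i in range(steps))
    return min(candidates,
               key=lambda km: penalized_objective(km, data, vmax, km_ref, weight))


# Synthetic, noise-free data generated with Km = 2.0 and Vmax = 1.0.
TRUE_KM, VMAX = 2.0, 1.0
data = [(s, mm_rate(s, VMAX, TRUE_KM)) for s in (0.5, 1.0, 2.0, 5.0, 10.0)]

# Hypothetical machine learning prediction that is somewhat off.
km_hat = estimate_km(data, VMAX, km_ref=3.0)
print(km_hat)
```

With a small weight the estimate stays close to the value supported by the data while being pulled slightly toward the reference. When the data alone cannot pin down a parameter, a reference term of this kind is what keeps the estimate unique and realistic.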

12.

IRC-Fuse: improved and robust prediction of redox-sensitive cysteine by fusing of multiple feature representations.

Hasan, Md Mehedi; Alam, Md Ashad; Shoombuatong, Watshara; Kurata, Hiroyuki.

J Comput Aided Mol Des ; 35(3): 315-323, 2021 03.

Article in English | MEDLINE | ID: mdl-33392948

ABSTRACT

Redox-sensitive cysteine (RSC) thiol contributes to many biological processes. The identification of RSCs plays an important role in clarifying some mechanisms of redox-sensitive factors; nonetheless, experimental investigation of RSCs is expensive and time-consuming. Computational approaches that quickly and accurately identify candidate RSCs using sequence information are urgently needed. Herein, an improved and robust computational predictor named IRC-Fuse was developed to identify RSCs by fusing multiple feature representations. To enhance the performance of our model, we integrated the probability scores evaluated by random forest models implementing different encoding schemes. Cross-validation results showed that IRC-Fuse achieved an accuracy and AUC of 0.741 and 0.807, respectively. IRC-Fuse outperformed existing methods, with improvements of 10% and 13% in accuracy and MCC, respectively, on independent test data. Comparative analysis suggested that IRC-Fuse was more effective and promising than the existing predictors. For the convenience of experimental scientists, the IRC-Fuse online web server was implemented and is publicly accessible at http://kurata14.bio.kyutech.ac.jp/IRC-Fuse/

Subjects

Benchmarking/methods, Cysteine/chemistry, Proteins/chemistry, Amino Acid Sequence, Computational Biology, Factual Databases, Machine Learning, Molecular Models, Oxidation-Reduction, Sulfhydryl Compounds/chemistry

13.

Fabrication and Characterisation of Organic EL Devices in the Presence of Cyclodextrin as an Interlayer.

Hara, Michihiro; Umeda, Takao; Kurata, Hiroyuki.

Sensors (Basel) ; 21(11)2021 May 25.

Article in English | MEDLINE | ID: mdl-34070319

ABSTRACT

This study examined glass-based organic electroluminescence (OEL) in the presence of a cyclodextrin polymer as an interlayer. Glass-based organic electroluminescence was achieved by the deposition of five layers of N,N'-bis(3-methylphenyl)-N,N'-bis(phenyl)-benzidine, cyclodextrin polymer (CDP), tris-(8-hydroxyquinolinato) aluminium, LiF and Al on an indium tin oxide-coated glass substrate. The glass-based OEL device exhibited green emission owing to the fluorescence of tris-(8-hydroxyquinolinato) aluminium. The highest luminance was 19,620 cd m⁻². Moreover, the glass-based organic electroluminescence device showed green emission at 6 V in the curved state because of the inhibited aggregation of the cyclodextrin polymer. All of the organic molecules are insulating but, except for CDP, they are standard molecules in conventional organic electroluminescence devices. In this device, the CDP layer contained pores that conventional organic molecules could enter, affecting the organic electroluminescence interface. In particular, self-association was suppressed, efficiency was improved, and light emission was observed without the need for a high voltage. Overall, the glass-based organic electroluminescence device using CDP is an environmentally friendly device with a range of potential energy-saving applications.

14.

PredNTS: Improved and Robust Prediction of Nitrotyrosine Sites by Integrating Multiple Sequence Features.

Nilamyani, Andi Nur; Auliah, Firda Nurul; Moni, Mohammad Ali; Shoombuatong, Watshara; Hasan, Md Mehedi; Kurata, Hiroyuki.

Int J Mol Sci ; 22(5)2021 Mar 08.

Article in English | MEDLINE | ID: mdl-33800121

ABSTRACT

Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identification of site-specific nitration modification on tyrosine is a prerequisite to understanding the molecular function of nitrated proteins. Thanks to the progress of machine learning, computational prediction can play a vital role ahead of biological experimentation. Herein, we developed a computational predictor, PredNTS, by integrating multiple sequence features including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. The important features were selected by the recursive feature elimination approach using a random forest classifier. Finally, we linearly combined the successive random forest (RF) probability scores generated by the different single-encoding RF models. The resultant PredNTS predictor achieved an area under the curve (AUC) of 0.910 using five-fold cross-validation. It outperformed the existing predictors on a comprehensive and independent dataset. Furthermore, we investigated several machine learning algorithms to demonstrate the superiority of the employed RF algorithm. PredNTS is a useful computational resource for the prediction of nitrotyrosine sites. The PredNTS web application, together with the curated datasets, is publicly available.

Subjects

Computational Biology, Machine Learning, Post-Translational Protein Processing, Proteins/genetics, Protein Sequence Analysis, Support Vector Machine, Tyrosine/analogs & derivatives, Tyrosine/genetics
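
Among the sequence features named in the PredNTS abstract, the composition of k-spaced amino acid pairs (CKSAAP) is easy to show in a few lines: for a gap k, it records how often each ordered residue pair occurs with exactly k residues in between. The sketch below follows that common definition; the normalization by the number of pair positions and the handling of non-standard residues are assumptions, and the example sequence is made up.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cksaap(sequence, k):
    """Frequencies of ordered residue pairs separated by exactly k residues."""
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    n_pairs = len(sequence) - k - 1  # number of (i, i + k + 1) positions
    if n_pairs <= 0:
        return {pair: 0.0 for pair in counts}
    for i in range(n_pairs):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # assumption: skip pairs with non-standard residues
            counts[pair] += 1
    return {pair: c / n_pairs for pair, c in counts.items()}


# Made-up window, for illustration only.
features = cksaap("ACAC", k=1)
print(len(features), features["AA"], features["CC"])
```

A full encoding typically concatenates the 400-dimensional vectors for several gap values (k = 0, 1, 2, and so on) into one feature vector per sequence window.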

15.

PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations.

Auliah, Firda Nurul; Nilamyani, Andi Nur; Shoombuatong, Watshara; Alam, Md Ashad; Hasan, Md Mehedi; Kurata, Hiroyuki.

Int J Mol Sci ; 22(4)2021 Feb 20.

Article in English | MEDLINE | ID: mdl-33672741

ABSTRACT

Pupylation is a type of reversible post-translational modification of proteins, which plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites. However, the traditional experimental methods are laborious and time-consuming. Hence, computational algorithms that can predict potential pupylation sites from sequence features are highly needed. In this research, a new prediction model, PUP-Fuse, has been developed for pupylation site prediction by integrating multiple sequence representations. We explored five types of feature encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. PUP-Fuse achieved a Matthews correlation coefficient of 0.768 in a 10-fold cross-validation test. It also outperformed existing predictors in an independent test. The PUP-Fuse web server with curated datasets is freely available.

Subjects

Algorithms, Computational Biology/methods, Post-Translational Protein Processing, Proteins/chemistry, Proteins/metabolism, Amino Acid Sequence, Protein Databases

16.

i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation.

Hasan, Md Mehedi; Manavalan, Balachandran; Shoombuatong, Watshara; Khatun, Mst Shamima; Kurata, Hiroyuki.

Plant Mol Biol ; 103(1-2): 225-234, 2020 May.

Article in English | MEDLINE | ID: mdl-32140819

ABSTRACT

DNA N6-methyladenine (6 mA) is one of the most vital epigenetic modifications and is involved in controlling gene expression levels. With the avalanche of DNA sequences generated in numerous databases, the accurate identification of 6 mA plays an essential role in understanding molecular mechanisms. Because experimental approaches are time-consuming and costly, it is desirable to develop a computational model for rapidly and accurately identifying 6 mA. To the best of our knowledge, we are the first to propose a computational model, named i6mA-Fuse, to predict 6 mA sites in Rosaceae genomes, especially in Rosa chinensis and Fragaria vesca. We implemented five encoding schemes, i.e., mononucleotide binary, dinucleotide binary, k-space spectral nucleotide, k-mer, and electron-ion interaction pseudo potential compositions, to build five single-encoding random forest (RF) models. i6mA-Fuse uses a linear regression model to combine the predicted probability scores of the five single-encoding RF models. The resultant species-specific i6mA-Fuse achieved remarkably high performance, with AUCs of 0.982 and 0.978 and MCCs of 0.869 and 0.858 on the independent datasets of Rosa chinensis and Fragaria vesca, respectively. In the F. vesca-specific i6mA-Fuse, MBE and EIIP contributed 75% and 25% of the total prediction; in the R. chinensis-specific i6mA-Fuse, Kmer, MBE, and EIIP contributed 15%, 65%, and 20% of the total prediction. To assist high-throughput prediction for DNA 6 mA identification, i6mA-Fuse is publicly accessible at https://kurata14.bio.kyutech.ac.jp/i6mA-Fuse/.

Subjects

Adenine/analogs & derivatives, Plant DNA/metabolism, Rosaceae/metabolism, Adenine/metabolism, Algorithms, Binding Sites, Computational Biology, Datasets as Topic, Machine Learning, Genetic Models, Rosaceae/genetics
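
The score fusion described in the i6mA-Fuse abstract can be pictured as a weighted sum of the single-encoding RF probability scores. The sketch below reuses the contribution percentages quoted in the abstract as linear weights, which is an interpretation rather than something the abstract states outright; the input scores are invented, and the function is illustrative, not the published model.

```python
# Contribution percentages quoted in the abstract, read here as linear
# weights (an assumption). Encodings not listed are treated as weight 0.
FUSION_WEIGHTS = {
    "F. vesca": {"MBE": 0.75, "EIIP": 0.25},
    "R. chinensis": {"Kmer": 0.15, "MBE": 0.65, "EIIP": 0.20},
}


def fuse_scores(rf_scores, weights):
    """Weighted sum of per-encoding RF probability scores."""
    return sum(weights[name] * rf_scores[name] for name in weights)


# Invented RF probabilities for one candidate site.
site_scores = {"Kmer": 0.60, "MBE": 0.80, "EIIP": 0.40}

for species, weights in FUSION_WEIGHTS.items():
    print(species, round(fuse_scores(site_scores, weights), 3))
```

Because each weight set sums to one, the fused value stays in the range from 0 to 1 and can be thresholded like an ordinary probability score.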

17.

ProIn-Fuse: improved and robust prediction of proinflammatory peptides by fusing of multiple feature representations.

Khatun, Mst Shamima; Hasan, Md Mehedi; Shoombuatong, Watshara; Kurata, Hiroyuki.

J Comput Aided Mol Des ; 34(12): 1229-1236, 2020 12.

Article in English | MEDLINE | ID: mdl-32964284

ABSTRACT

A proinflammatory peptide (PIP) is a type of signaling molecule secreted from immune cells that contributes to the first line of defense against invading pathogens. Numerous experiments have shown that PIPs play an important role in human physiology and in applications such as vaccines and immunotherapeutic drugs. Because high-throughput laboratory methods are time-consuming and costly, effective computational methods are in great demand for the timely and accurate identification of PIPs. Thus, in this study, we proposed a computational model based on multiple feature representations, called ProIn-Fuse, to improve the performance of PIP identification. Specifically, a feature representation learning model was used to generate probabilistic scores from random forest models employing eight sequence encoding schemes. Finally, ProIn-Fuse was constructed by linearly combining the resultant eight probabilistic scores. Evaluated on an independent test, ProIn-Fuse yielded an accuracy of 0.746, which was 10% higher than that obtained by the state-of-the-art PIP predictors. The proposed ProIn-Fuse can facilitate faster and broader applications of PIPs in drug design and development. The web server, datasets, and online instructions are freely accessible at http://kurata14.bio.kyutech.ac.jp/ProIn-Fuse.

Subjects

Algorithms; Computational Biology/methods; Computer Simulation; Inflammation Mediators/metabolism; Machine Learning; Peptide Fragments/metabolism; Humans; Inflammation Mediators/immunology; Peptide Fragments/immunology

18.

Recent Development of Machine Learning Methods in Microbial Phosphorylation Sites.

Rashid, Md Mamunur; Shatabda, Swakkhar; Hasan, Md Mehedi; Kurata, Hiroyuki.

Curr Genomics ; 21(3): 194-203, 2020 Apr.

Article in English | MEDLINE | ID: mdl-33071613

ABSTRACT

A variety of protein post-translational modifications that control many cellular functions have been identified. Studies in mycobacterial organisms have shown that phosphorylation is critically important in diverse biological processes, such as intercellular communication and cell division. Recent technical advances in high-precision mass spectrometry have identified a large number of microbial phosphorylated proteins and phosphorylation sites through proteome analysis. Experimental identification of phosphorylated proteins with specific modified residues is often labor-intensive, costly, and time-consuming. These limitations could be overcome through the application of machine learning (ML) approaches. However, only a limited number of computational phosphorylation site prediction tools have been developed so far. This work aims to present a complete survey of the existing ML predictors for microbial phosphorylation. We cover a variety of important aspects of developing a successful predictor, including the ML algorithms used, feature selection methods, window size, and software utility. First, we review the currently available microbial phosphorylation site databases, the state-of-the-art ML approaches, their working principles, and their performance. Lastly, we discuss the limitations and future directions of computational ML methods for the prediction of phosphorylation.

19.

Evolution of Sequence-based Bioinformatics Tools for Protein-protein Interaction Prediction.

Khatun, Mst Shamima; Shoombuatong, Watshara; Hasan, Md Mehedi; Kurata, Hiroyuki.

Curr Genomics ; 21(6): 454-463, 2020 Sep.

Article in English | MEDLINE | ID: mdl-33093807

ABSTRACT

Protein-protein interactions (PPIs) are the physical connections between two or more proteins via electrostatic forces or hydrophobic effects. Identification of PPIs is pivotal, as PPIs underlie many biological processes and bear on protein function, disease incidence, and therapy design. The experimental identification of PPIs via high-throughput technology is time-consuming and expensive. Bioinformatics approaches are expected to overcome such limitations. In this review, our main goal is to provide an inclusive view of the existing sequence-based computational methods for PPI prediction. First, we briefly introduce the currently available PPI databases and then review the state-of-the-art bioinformatics approaches, their working principles, and their performance. Finally, we discuss the caveats and future perspectives of next-generation algorithms for the prediction of PPIs.

20.

A prediction model of functional outcome at 6 months using clinical findings of a person with traumatic spinal cord injury at 1 month after injury.

Ariji, Yuto; Hayashi, Tetsuo; Ideta, Ryosuke; Koga, Ryuichiro; Murai, Satoshi; Towatari, Fumihiro; Terashi, Yoshiteru; Sakai, Hiroaki; Kurata, Hiroyuki; Maeda, Takeshi.

Spinal Cord ; 58(11): 1158-1165, 2020 Nov.

Article in English | MEDLINE | ID: mdl-32444638

ABSTRACT

STUDY DESIGN: Retrospective statistical analysis of a database. OBJECTIVES: Prediction of the Spinal Cord Independence Measure version III Total Score (SCIM-TS) at 6 months after injury, based on physical findings at 1 month after injury, provides an important index for the rehabilitation approach in the recovery phase. SETTING: Spinal Injuries Center, Fukuoka, Japan. METHODS: The study participants were selected from patients with traumatic spinal cord injuries who were registered in the Japan Single Center Study for Spinal Cord Injury Data Base (JSSCI-DB) of the Japan Spinal Injuries Center, which specializes in spine and spinal cord injuries. Of the 534 participants registered with the JSSCI-DB between January 2012 and October 2018, we retrospectively extracted 137 participants for 6 months after injury, and these participants were included in this study. RESULTS: According to multiple regression analysis, SCIM-TS at 6 months after injury could be predicted from only six variables, i.e., age at injury, three key muscles (C6 wrist extensors, C8 finger flexors, and L3 knee extensors), and two mobility assessments (WISCI and SCIM-item13) (adjusted R-squared: 0.83). These six independent variables were significant factors reflecting SCIM-TS at 6 months. CONCLUSIONS: In rehabilitation after traumatic spinal cord injury, a simple and reliable prognostic model can help accurately predict the achievable competency in activities of daily living and thereby help set a goal. In addition, if the procedure is simple, the evaluation can be completed in a short period of time, and the physical burden on both treating staff and patients can be reduced.

Subjects

Spinal Cord Injuries; Activities of Daily Living; Disability Evaluation; Humans; Japan; Prognosis; Recovery of Function; Retrospective Studies
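
For readers less familiar with the adjusted R-squared reported above, the following sketch shows the standard formula and what it implies for the unadjusted value. It assumes that all 137 participants entered the regression and that the model has exactly six predictors plus an intercept; neither detail is confirmed by the abstract, and the reported 0.83 is itself a rounded figure. The function names are illustrative only.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def r2_from_adjusted(adj_r2, n, p):
    """Invert the formula to recover the unadjusted R-squared."""
    return 1.0 - (1.0 - adj_r2) * (n - p - 1) / (n - 1)


N_PARTICIPANTS = 137     # from the abstract
N_PREDICTORS = 6         # from the abstract
REPORTED_ADJ_R2 = 0.83   # from the abstract (rounded)

# Assumption: every participant contributes one observation to the fit.
implied_r2 = r2_from_adjusted(REPORTED_ADJ_R2, N_PARTICIPANTS, N_PREDICTORS)
print(f"implied unadjusted R^2: {implied_r2:.4f}")
```

Under these assumptions the penalty for six predictors is small: the implied unadjusted value is roughly 0.84, so the adjustment changes the fit statistic by less than 0.01.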
