1.
Methods ; 227: 37-47, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38729455

ABSTRACT

RNA modification serves as a pivotal component in numerous biological processes. Among the prevalent modifications, 5-methylcytosine (m5C) significantly influences mRNA export, translation efficiency and cell differentiation and is also associated with human diseases, including Alzheimer's disease, autoimmune disease, cancer, and cardiovascular diseases. Identification of m5C is critical for understanding the RNA modification mechanisms and the epigenetic regulation of associated diseases. However, the large-scale experimental identification of m5C presents significant challenges due to labor intensity and time requirements. Several computational tools using machine learning have been developed to supplement experimental methods, but they lack accuracy and efficiency in identifying these sites. In this study, we introduce a new predictor, MLm5C, for precise prediction of m5C sites using sequence data. Briefly, we evaluated eleven RNA sequence-derived features with four basic machine learning algorithms to generate baseline models. We ranked these 44 models by performance and stacked the top 20 baseline models to form the best model, named MLm5C. MLm5C outperformed the state-of-the-art predictors. Notably, the optimization of the sequence length surrounding the modification sites significantly improved the prediction performance. MLm5C is an invaluable tool for accelerating the detection of m5C sites within the human genome, thereby facilitating the characterization of their roles in post-transcriptional regulation.
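The selection step described in this abstract (11 features crossed with 4 algorithms, then the 20 best models kept for stacking) can be illustrated with a minimal sketch. The feature names, algorithm names, and scores below are hypothetical placeholders, not values from the paper, and the subsequent stacking step (training a meta-learner on the selected models' outputs) is not shown.

```python
from itertools import product


def top_k_baselines(scores, k):
    """Return the k (feature, algorithm) pairs with the highest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Hypothetical names: 11 feature encodings and 4 learning algorithms.
features = [f"feature_{i}" for i in range(1, 12)]
algorithms = ["algo_A", "algo_B", "algo_C", "algo_D"]

# Placeholder validation scores; real values would come from cross-validation.
scores = {
    pair: 0.5 + 0.01 * idx
    for idx, pair in enumerate(product(features, algorithms))
}

# Keep the 20 best baseline models as candidates for stacking.
selected = top_k_baselines(scores, k=20)
```

In practice the ranking metric and the meta-learner are design choices that the abstract does not specify.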

2.
Environ Sci Technol ; 58(1): 488-497, 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38134352

ABSTRACT

Per- and polyfluoroalkyl substances (PFAS) are widely employed anthropogenic fluorinated chemicals known to disrupt hepatic lipid metabolism by binding to human peroxisome proliferator-activated receptor alpha (PPARα). Therefore, screening for PFAS that bind to PPARα is of critical importance. Machine learning approaches are promising techniques for rapid screening of PFAS. However, traditional machine learning approaches lack interpretability, posing challenges in investigating the relationship between molecular descriptors and PPARα binding. In this study, we aimed to develop a novel, explainable machine learning approach to rapidly screen for PFAS that bind to PPARα. We calculated the PPARα-PFAS binding score and 206 molecular descriptors for PFAS. Through systematic and objective selection of important molecular descriptors, we developed a machine learning model with good predictive performance using only three descriptors. The molecular size (b_single) and electrostatic properties (BCUT_PEOE_3 and PEOE_VSA_PPOS) are important for PPARα-PFAS binding. Alternative PFAS are considered safer than their legacy predecessors. However, we found that alternative PFAS with many carbon atoms and ether groups exhibited a higher affinity for PPARα. Therefore, confirming the toxicity of these alternative PFAS compounds with such characteristics through biological experiments is important.


Subjects
Fluorocarbons; PPAR alpha; Humans; PPAR alpha/metabolism; Liver/metabolism
3.
Comput Biol Med ; 169: 107848, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38145601

ABSTRACT

Dihydrouridine (DHU, D) is one of the most abundant post-transcriptional uridine modifications found in tRNA, mRNA, and snoRNA, closely associated with disease pathogenesis and various biological processes in eukaryotes. Identifying D sites is important for understanding the modification mechanisms and/or epigenetic regulation. However, biological experiments for detecting D sites are time-consuming and expensive. Given these challenges, computational methods have been developed for accurately identifying the D sites in genome-wide datasets. However, existing methods have some limitations, and their prediction performance needs to be improved. In this work, we have developed a new computational predictor for accurately identifying D sites called Stack-DHUpred. Briefly, we trained 66 baseline models or single-feature models by connecting six machine learning classifiers with eleven different feature encoding methods and stacked different baseline models to build stacked ensemble learning models. Subsequently, the optimal combination of the baseline models was identified for the construction of the final stacked model. Remarkably, the Stack-DHUpred outperformed the existing predictors on our new independent dataset, indicating that the stacking approach significantly improved the prediction performance. We have made Stack-DHUpred available to the public through a web server (http://kurata35.bio.kyutech.ac.jp/Stack-DHUpred) and a standalone program (https://github.com/kuratahiroyuki/Stack-DHUpred). We believe that Stack-DHUpred will be a valuable tool for accelerating the discovery of D modifications and understanding their role in post-transcriptional regulation.


Subjects
Epigenesis, Genetic; Genome; RNA, Messenger; Computational Biology
4.
Int J Mol Sci ; 24(8), 2023 Apr 14.
Article in English | MEDLINE | ID: mdl-37108453

ABSTRACT

Kinetic modeling is an essential tool in systems biology research, enabling the quantitative analysis of biological systems and predicting their behavior. However, the development of kinetic models is a complex and time-consuming process. In this article, we propose a novel approach called KinModGPT, which generates kinetic models directly from natural language text. KinModGPT employs GPT as a natural language interpreter and Tellurium as an SBML generator. We demonstrate the effectiveness of KinModGPT in creating SBML kinetic models from complex natural language descriptions of biochemical reactions. KinModGPT successfully generates valid SBML models from a range of natural language model descriptions of metabolic pathways, protein-protein interaction networks, and heat shock response. This article demonstrates the potential of KinModGPT in kinetic modeling automation.


Subjects
Models, Biological; Programming Languages; Computer Simulation; Language; Cell Physiological Phenomena; Software
5.
J Fluoresc ; 33(4): 1559-1563, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36787039

ABSTRACT

Luminescence from solids such as crystals and aggregates is of growing academic and industrial interest. In this study, we report the decomposition of the unpolarized fluorescence spectrum of uniaxially oriented 1,3,5-triphenylbenzene (TPB) microcrystals into four polarized spectra measured with a polarizer (V: vertical and H: horizontal) and an analyser (V: vertical and H: horizontal), where V and H indicate perpendicular and parallel to the layer of TPB molecules in the crystal, respectively. The resolved spectra were interpreted in terms of the molecular and excimer-like (J- and H-dimer) emissions. The origin of the excimer-like emissions was discussed in relation to the molecular packing in the crystal. It was shown that polarized crystal fluorescence can provide insight into the excitation/emission process in the crystal. Although preliminary, this study demonstrates the potential of polarized fluorescence to elucidate the luminescent mechanism.

6.
Comput Struct Biotechnol J ; 21: 644-654, 2023.
Article in English | MEDLINE | ID: mdl-36659917

ABSTRACT

N6-methyladenine (6mA) plays a critical role in various epigenetic processes including DNA replication, DNA repair, silencing, transcription, and diseases such as cancer. To understand such epigenetic mechanisms, 6mA has been detected by high-throughput technologies on a genome-wide scale at single-base resolution, together with conventional methods such as immunoprecipitation, mass spectrometry and capillary electrophoresis, but these experimental approaches are time-consuming and laborious. To complement these approaches, we have developed a CNN-based 6mA site predictor, named CNN6mA, which introduces two new architectures: a position-specific 1-D convolutional layer and a cross-interactive network. In the position-specific 1-D convolutional layer, position-specific filters with different window sizes were applied to an inquiry sequence, instead of sharing the same filters over all positions, in order to extract the position-specific features at different levels. The cross-interactive network explored the relationships between all the nucleotide patterns within the inquiry sequence. Consequently, CNN6mA outperformed the existing state-of-the-art models in many species and created a contribution score vector that intelligibly interprets the prediction mechanism. The source code and web application of CNN6mA are freely accessible at https://github.com/kuratahiroyuki/CNN6mA.git and http://kurata35.bio.kyutech.ac.jp/CNN6mA/, respectively.

7.
BMC Bioinformatics ; 23(1): 455, 2022 Nov 01.
Article in English | MEDLINE | ID: mdl-36319952

ABSTRACT

BACKGROUND: Kinetic modeling is a powerful tool for understanding the dynamic behavior of biochemical systems. For kinetic modeling, determination of a number of kinetic parameters, such as the Michaelis constant (Km), is necessary, and global optimization algorithms have long been used for parameter estimation. However, the conventional global optimization approach has three problems: (i) It is computationally demanding. (ii) It often yields unrealistic parameter values because it simply seeks a better fit of the model to experimentally observed behaviors. (iii) It has difficulty in identifying a unique solution because multiple parameter sets can allow a kinetic model to fit experimental data equally well (the non-identifiability problem). RESULTS: To solve these problems, we propose the Machine Learning-Aided Global Optimization (MLAGO) method for Km estimation in kinetic modeling. First, we use a machine learning-based Km predictor based only on three factors: EC number, KEGG Compound ID, and Organism ID, then conduct a constrained global optimization-based parameter estimation by using the machine learning-predicted Km values as the reference values. The machine learning model achieved relatively good prediction scores: RMSE = 0.795 and R² = 0.536, making the subsequent global optimization easy and practical. The MLAGO approach reduced the error between simulation and experimental data while keeping Km values close to the machine learning-predicted values. As a result, the MLAGO approach successfully estimated Km values with less computational cost than the conventional method. Moreover, the MLAGO approach uniquely estimated Km values, which were close to the measured values. CONCLUSIONS: MLAGO overcomes the major problems in parameter estimation, accelerates kinetic modeling, and thus ultimately leads to better understanding of complex cellular systems. The web application for our machine learning-based Km predictor is accessible at https://sites.google.com/view/kazuhiro-maeda/software-tools-web-apps, which helps modelers perform MLAGO on their own parameter estimation tasks.
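The idea of reducing the model-fit error while keeping Km values close to reference values can be sketched as a penalized objective. This is only one possible way to express that trade-off: the penalty form, the `weight` parameter, and the function names are assumptions made for illustration and are not taken from the MLAGO paper, which describes a constrained optimization.

```python
import math


def log_rmse(km_est, km_ref):
    """RMSE between log10-transformed estimated and reference Km values."""
    sq = [(math.log10(e) - math.log10(r)) ** 2 for e, r in zip(km_est, km_ref)]
    return math.sqrt(sum(sq) / len(sq))


def penalized_objective(fit_error, km_est, km_ref, weight=1.0):
    """Model-fit error plus a penalty for drifting from the reference Km values.

    fit_error: error between simulation and experimental data (computed elsewhere).
    weight: hypothetical trade-off factor, not a value from the paper.
    """
    return fit_error + weight * log_rmse(km_est, km_ref)
```

A global optimizer would then minimize `penalized_objective` over candidate Km sets; the log scale is used here because Km values typically span several orders of magnitude.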


Subjects
Algorithms; Models, Biological; Kinetics; Computer Simulation; Machine Learning
8.
PLoS One ; 17(10): e0276609, 2022.
Article in English | MEDLINE | ID: mdl-36279284

ABSTRACT

Drug-target protein interaction (DTI) identification is fundamental for drug discovery and drug repositioning, because therapeutic drugs act on disease-causing proteins. However, the DTI identification process often requires expensive and time-consuming tasks, including biological experiments involving large numbers of candidate compounds. Thus, a variety of computational approaches have been developed. Of the many approaches available, chemo-genomics feature-based methods have attracted considerable attention. These methods compute the feature descriptors of drugs and proteins as the input data to train machine and deep learning models to enable accurate prediction of unknown DTIs. In addition, attention-based learning methods have been proposed to identify and interpret DTI mechanisms. However, improvements are needed for enhancing prediction performance and DTI mechanism elucidation. To address these problems, we developed an attention-based method designated the interpretable cross-attention network (ICAN), which predicts DTIs using the Simplified Molecular Input Line Entry System of drugs and amino acid sequences of target proteins. We optimized the attention mechanism architecture by exploring the cross-attention or self-attention, attention layer depth, and selection of the context matrices from the attention mechanism. We found that a plain attention mechanism that decodes drug-related protein context features without any protein-related drug context features effectively achieved high performance. The ICAN outperformed state-of-the-art methods in several metrics on the DAVIS dataset and first revealed with statistical significance that some weighted sites in the cross-attention weight matrix represent experimental binding sites, thus demonstrating the high interpretability of the results. The program is freely available at https://github.com/kuratahiroyuki/ICAN.


Subjects
Drug Discovery; Proteins; Computer Simulation; Proteins/metabolism; Drug Discovery/methods; Amino Acid Sequence; Drug Repositioning; Drug Interactions
9.
Comput Struct Biotechnol J ; 20: 5564-5573, 2022.
Article in English | MEDLINE | ID: mdl-36249566

ABSTRACT

Viral infections represent a major health concern worldwide. The alarming rate at which SARS-CoV-2 spreads, for example, led to a worldwide pandemic. Viruses incorporate genetic material into the host genome to hijack host cell functions such as the cell cycle and apoptosis. In these viral processes, protein-protein interactions (PPIs) play critical roles. Therefore, the identification of PPIs between humans and viruses is crucial for understanding the infection mechanism and host immune responses to viral infections and for discovering effective drugs. Experimental methods including mass spectrometry-based proteomics and yeast two-hybrid assays are widely used to identify human-virus PPIs, but these experimental methods are time-consuming, expensive, and laborious. To overcome this problem, we developed a novel computational predictor, named cross-attention PHV, by implementing two key technologies of the cross-attention mechanism and a one-dimensional convolutional neural network (1D-CNN). The cross-attention mechanisms were very effective in enhancing prediction and generalization abilities. Application of 1D-CNN to the word2vec-generated feature matrices reduced computational costs, thus extending the allowable length of protein sequences to 9000 amino acid residues. Cross-attention PHV outperformed existing state-of-the-art models using a benchmark dataset and accurately predicted PPIs for unknown viruses. Cross-attention PHV also predicted human-SARS-CoV-2 PPIs with area under the curve values >0.95. The Cross-attention PHV web server and source codes are freely available at https://kurata35.bio.kyutech.ac.jp/Cross-attention_PHV/ and https://github.com/kuratahiroyuki/Cross-Attention_PHV, respectively.

10.
Brief Bioinform ; 23(4), 2022 Jul 18.
Article in English | MEDLINE | ID: mdl-35772910

ABSTRACT

The COVID-19 pandemic caused several million deaths worldwide. Development of anti-coronavirus drugs is thus urgent. Unlike conventional non-peptide drugs, antiviral peptide drugs are highly specific, easy to synthesize and modify, and not highly susceptible to drug resistance. To reduce the time and expense involved in screening thousands of peptides and assaying their antiviral activity, computational predictors for identifying anti-coronavirus peptides (ACVPs) are needed. However, few experimentally verified ACVP samples are available, even though a relatively large number of antiviral peptides (AVPs) have been discovered. In this study, we attempted to predict ACVPs using an AVP dataset and a small collection of ACVPs. Using conventional features, a binary profile and a word-embedding word2vec (W2V), we systematically explored five different machine learning methods: Transformer, Convolutional Neural Network, bidirectional Long Short-Term Memory, Random Forest (RF) and Support Vector Machine. Via exhaustive searches, we found that the RF classifier with W2V consistently achieved better performance on different datasets. The two main controlling factors were: (i) the dataset-specific W2V dictionary was generated from the training and independent test datasets instead of the widely used general UniProt proteome and (ii) a systematic search was conducted and determined the optimal k-mer value in W2V, which provides greater discrimination between positive and negative samples. Therefore, our proposed method, named iACVP, consistently provides better prediction performance compared with existing state-of-the-art methods. To assist experimentalists in identifying putative ACVPs, we implemented our model as a web server accessible via the following link: http://kurata35.bio.kyutech.ac.jp/iACVP.
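The k-mer value mentioned for W2V refers to how a peptide sequence is split into "words" before embedding. A minimal sketch of that tokenization follows; the overlapping stride-1 scheme and the function name are assumptions for illustration, since the abstract does not state how the k-mers were cut.

```python
def kmer_tokens(sequence, k):
    """Split a sequence into overlapping k-mers (stride 1).

    Returns an empty list when k is not usable for this sequence.
    """
    if k < 1 or k > len(sequence):
        return []
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Example: a hypothetical 5-residue peptide tokenized with k = 3.
tokens = kmer_tokens("ACDEF", 3)
```

Each token list would then serve as one "sentence" for training a word-embedding model, and different values of k can be compared in a systematic search.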


Subjects
COVID-19 Drug Treatment; Pandemics; Antiviral Agents/pharmacology; Humans; Machine Learning; Peptides
11.
Mol Ther ; 30(8): 2856-2867, 2022 Aug 03.
Article in English | MEDLINE | ID: mdl-35526094

ABSTRACT

As one of the most prevalent post-transcriptional epigenetic modifications, 5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. Therefore, it is important to accurately identify m5C modifications in order to gain a deeper understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their respective models have been developed using small training datasets. Hence, their practical application is quite limited in genome-wide detection. To overcome the existing limitations, we propose Deepm5C, a bioinformatics method for identifying RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature-encoding algorithms and a feature derived from word-embedding approaches. Afterward, four variants of deep-learning classifiers and four commonly used conventional classifiers were employed and trained with the four encodings, ultimately obtaining 32 baseline models. A stacking strategy was then applied by integrating the predicted outputs of the optimal baseline models and training a one-dimensional (1D) convolutional neural network on them. As a result, the Deepm5C predictor achieved excellent performance during cross-validation with a Matthews correlation coefficient and an accuracy of 0.697 and 0.855, respectively. The corresponding metrics during the independent test were 0.691 and 0.852, respectively. Overall, Deepm5C achieved a more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, Deepm5C is expected to assist community-wide efforts in identifying putative m5Cs and to help formulate novel testable biological hypotheses.
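For readers unfamiliar with the two metrics reported here, the following sketch computes accuracy and the Matthews correlation coefficient (MCC) from the four cells of a binary confusion matrix, using their standard definitions. The function names are illustrative, and returning 0.0 for a zero denominator is a common convention rather than something stated in the abstract.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention: undefined MCC is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, MCC stays informative when positive and negative samples are imbalanced, which is why site predictors often report both.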


Subjects
Deep Learning; RNA; Algorithms; Computational Biology/methods; Humans; Machine Learning; RNA/genetics
12.
J Spinal Cord Med ; : 1-7, 2022 Mar 30.
Article in English | MEDLINE | ID: mdl-35352975

ABSTRACT

OBJECTIVES: We evaluated the time course of the American Spinal Cord Injury Association (ASIA) impairment scale (AIS) for up to three months in participants within 72 h after traumatic spinal cord injury (TSCI) with complete paralysis. We aimed to determine the most useful sacral-sparing examination (deep anal pressure [DAP], voluntary anal contraction [VAC], S4-5 light touch [LT], or pin prick [PP] sensation) in determining AIS grades. DESIGN: Retrospective cohort study. SETTING: Spinal Injuries Center, Fukuoka, Japan. PARTICIPANTS: Among 668 TSCI participants registered in the Japan Single Center study for Spinal Cord Injury Database (JSSCI-DB) between January 2012 and May 2020, we extracted the data of 80 patients with AIS grade A within 72 h after injury and neurological level of injury (NLI) at T12 or higher. INTERVENTIONS: None. OUTCOME MEASURES: The sacral-sparing examination at the time of the change to incomplete paralysis was compared to the AIS determination using a standard algorithm and with each assessment including the VAC, DAP, S4-5LT, and S4-5PP examinations at the time of AIS functional change. Agreement among assessments was evaluated using weighted kappa coefficients. The relationship was evaluated using Spearman's rank correlation coefficients. RESULTS: Fifteen participants (18.8%) improved to incomplete paralysis (AIS B to D) within three months after injury. The single assessment among the sacral-sparing examinations with the highest agreement and strongest correlation with AIS determination was the S4-5LT examination (k = 0.89, P < 0.01, r = 0.84, P < 0.01). CONCLUSIONS: The S4-5LT examination is key in determining complete or incomplete paralysis due to its high discriminatory power.

13.
Brief Bioinform ; 23(2), 2022 Mar 10.
Article in English | MEDLINE | ID: mdl-35225328

ABSTRACT

N6-methyladenine (6mA) is associated with important roles in DNA replication, DNA repair, transcription, and regulation of gene expression. Several experimental methods have been used to identify DNA modifications. However, these experimental methods are costly and time-consuming. To detect 6mA and complement these shortcomings of experimental methods, we proposed a novel deep learning approach called BERT6mA. To compare BERT6mA with other deep learning approaches, we used benchmark datasets including 11 species. BERT6mA presented the highest AUCs in eight species in independent tests. Furthermore, BERT6mA showed higher or comparable performance relative to the state-of-the-art models, although it performed poorly in a few species with a small sample size. To overcome this issue, pretraining and fine-tuning between two species were applied to BERT6mA. The pretrained and fine-tuned models on specific species presented higher performance than other models even for the species with a small sample size. In addition to the prediction, we analyzed the attention weights generated by BERT6mA to reveal how the BERT6mA model extracts critical features responsible for the 6mA prediction. To facilitate biological sciences, the BERT6mA online web server and its source codes are freely accessible at https://github.com/kuratahiroyuki/BERT6mA.git, respectively.


Subjects
Deep Learning; DNA/genetics; DNA Methylation; Software
14.
J Cardiol Cases ; 25(2): 83-86, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35079304

ABSTRACT

Surgical aortic valve replacement (SAVR) in patients with anomalous origination of a coronary artery from the opposite sinus is associated with risk for myocardial ischemia during the perioperative period. [1] However, iatrogenic coronary ostial stenosis (ICOS) generally occurs within the first 6 months after SAVR. We present an unusual case of a 74-year-old man with anomalous origination of the right coronary artery from the left coronary sinus, who developed effort angina due to ICOS 19 months following SAVR and ascending aorta replacement. Angiography and computed tomography were used to compare findings before and after the procedure. The results showed that the flattened mild stenosis seen preoperatively was caused by anomalous origination of a coronary artery from the opposite sinus and progressed to severe stenosis through ICOS after the procedure. The patient was successfully treated with percutaneous coronary intervention.

15.
Curr Med Chem ; 29(5): 865-880, 2022.
Article in English | MEDLINE | ID: mdl-34348604

ABSTRACT

MicroRNAs (miRNAs) are central players that regulate the post-transcriptional processes of gene expression. Binding of miRNAs to target mRNAs can repress their translation by inducing the degradation or by inhibiting the translation of the target mRNAs. High-throughput experimental approaches for miRNA target identification are costly and time-consuming, depending on various factors. It is vitally important to develop bioinformatics methods for accurately predicting miRNA targets. With the increase of RNA sequences in the post-genomic era, bioinformatics methods are being developed for miRNA studies, especially for miRNA target prediction. This review summarizes the current development of state-of-the-art bioinformatics tools for miRNA target prediction and points out the progress and limitations of the available miRNA databases and their working principles. Finally, we discuss the caveats and perspectives of the next-generation algorithms for the prediction of miRNA targets.


Subjects
MicroRNAs; Algorithms; Computational Biology/methods; Humans; MicroRNAs/genetics; MicroRNAs/metabolism; RNA, Messenger/genetics
16.
Brief Bioinform ; 22(6), 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34160596

ABSTRACT

Viral infection involves a large number of protein-protein interactions (PPIs) between human and virus. The PPIs range from the initial binding of viral coat proteins to host membrane receptors to the hijacking of host transcription machinery. However, few interspecies PPIs have been identified, because experimental methods including mass spectrometry are time-consuming and expensive, and molecular dynamics simulation is limited to the proteins whose 3D structures are solved. Sequence-based machine learning methods are expected to overcome these problems. We have first developed the LSTM model with word2vec to predict PPIs between human and virus, named LSTM-PHV, by using amino acid sequences alone. The LSTM-PHV effectively learnt the training data with a highly imbalanced ratio of positive to negative samples and achieved AUCs of 0.976 and 0.973 and accuracies of 0.984 and 0.985 on the training and independent datasets, respectively. In predicting PPIs between human and unknown or new viruses, the trained LSTM-PHV greatly outperformed the existing state-of-the-art PPI predictors. Interestingly, learning of only sequence contexts as words is sufficient for PPI prediction. Use of uniform manifold approximation and projection demonstrated that the LSTM-PHV clearly distinguished the positive PPI samples from the negative ones. We presented the LSTM-PHV online web server and supporting data, which are freely available at http://kurata35.bio.kyutech.ac.jp/LSTM-PHV.


Subjects
Computational Biology/methods; Host-Pathogen Interactions; Protein Interaction Mapping/methods; Software; Viral Proteins/metabolism; Virus Diseases/metabolism; Virus Diseases/virology; Algorithms; Amino Acid Sequence; Benchmarking; Databases, Protein; Deep Learning; Humans; Protein Interaction Domains and Motifs; Protein Interaction Maps; Reproducibility of Results; Web Browser
17.
Sensors (Basel) ; 21(11), 2021 May 25.
Article in English | MEDLINE | ID: mdl-34070319

ABSTRACT

This study examined glass-based organic electroluminescence in the presence of a cyclodextrin polymer as an interlayer. Glass-based organic electroluminescence was achieved by the deposition of five layers of N,N'-Bis(3-methylphenyl)-N,N'-bis(phenyl)-benzidine, cyclodextrin polymer (CDP), tris-(8-hydroxyquinolinato) aluminium, LiF and Al on an indium tin oxide-coated glass substrate. The glass-based OEL exhibited green emission owing to the fluorescence of tris-(8-hydroxyquinolinato) aluminium. The highest luminance was 19,620 cd m⁻². Moreover, the glass-based organic electroluminescence device showed green emission at 6 V in the curved state because of the inhibited aggregation of the cyclodextrin polymer. All the organic molecules are insulating but, except for CDP, they are standard molecules in conventional organic electroluminescence devices. In this device, the CDP layer contained pores that could allow conventional organic molecules to enter and affect the organic electroluminescence interface. In particular, self-association was suppressed, efficiency was improved, and light emission was observed without the need for a high voltage. Overall, the glass-based organic electroluminescence device using CDP is an environmentally friendly device with a range of potential energy-saving applications.

18.
Brief Bioinform ; 22(6), 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-33975333

ABSTRACT

Neuropeptides (NPs) are the most versatile neurotransmitters in the immune systems that regulate various central anxious hormones. An efficient and effective bioinformatics tool for rapid and accurate large-scale identification of NPs is critical in immunoinformatics, which is indispensable for basic research and drug development. Although a few NP prediction tools have been developed, their prediction performance still needs to be improved. In this study, we have developed a machine learning-based meta-predictor called NeuroPred-FRL by employing the feature representation learning approach. First, we generated 66 optimal baseline models by employing 11 different encodings, six different classifiers and a two-step feature selection approach. The predicted probability scores of NPs from the 66 baseline models were combined to form the input feature vector. Second, in order to enhance the feature representation ability, we applied the two-step feature selection approach to optimize the 66-D probability feature vector and then fed the optimal one into a random forest classifier to construct the final meta-model (NeuroPred-FRL). Benchmarking experiments based on both cross-validation and independent tests indicate that NeuroPred-FRL achieves a superior prediction performance for NPs compared with the other state-of-the-art predictors. We believe that the proposed NeuroPred-FRL can serve as a powerful tool for large-scale identification of NPs, facilitating the characterization of their functional mechanisms and expediting their applications in clinical therapy. Moreover, we interpreted some model mechanisms of NeuroPred-FRL by leveraging the robust SHapley Additive exPlanation algorithm.


Subjects
Computational Biology/methods; Machine Learning; Neuropeptides/chemistry; Software; Algorithms; Consensus Sequence; Databases, Genetic; Internet-Based Intervention; Neuropeptides/metabolism; Position-Specific Scoring Matrices; Reproducibility of Results; Workflow
19.
Int J Mol Sci ; 22(5), 2021 Mar 08.
Article in English | MEDLINE | ID: mdl-33800121

ABSTRACT

Nitrotyrosine, which is generated by numerous reactive nitrogen species, is a type of protein post-translational modification. Identification of site-specific nitration modification on tyrosine is a prerequisite to understanding the molecular function of nitrated proteins. Thanks to the progress of machine learning, computational prediction can play a vital role before biological experimentation. Herein, we developed a computational predictor, PredNTS, by integrating multiple sequence features including K-mer, composition of k-spaced amino acid pairs (CKSAAP), AAindex, and binary encoding schemes. The important features were selected by the recursive feature elimination approach using a random forest classifier. Finally, we linearly combined the successive random forest (RF) probability scores generated by the different, single encoding-employing RF models. The resultant PredNTS predictor achieved an area under the curve (AUC) of 0.910 using five-fold cross-validation. It outperformed the existing predictors on a comprehensive and independent dataset. Furthermore, we investigated several machine learning algorithms to demonstrate the superiority of the employed RF algorithm. PredNTS is a useful computational resource for the prediction of nitrotyrosine sites. The web application with the curated datasets of PredNTS is publicly available.
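The linear combination of per-encoding probability scores can be sketched as a weighted average. The weights, the normalization by their sum, and the function name are assumptions made for illustration; the abstract does not report how the combination coefficients were chosen.

```python
def combine_scores(scores, weights):
    """Linearly combine per-model probability scores into one score.

    scores: probabilities from the single-encoding models for one site.
    weights: hypothetical non-negative coefficients, one per model.
    The result is normalized by the weight sum so it stays within [0, 1].
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * s for w, s in zip(weights, scores)) / total
```

With equal weights this reduces to a plain average of the model outputs; unequal weights let better-performing encodings contribute more.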


Subjects
Computational Biology; Machine Learning; Protein Processing, Post-Translational; Proteins/genetics; Sequence Analysis, Protein; Support Vector Machine; Tyrosine/analogs & derivatives; Tyrosine/genetics
20.
Int J Mol Sci ; 22(4), 2021 Feb 20.
Article in English | MEDLINE | ID: mdl-33672741

ABSTRACT

Pupylation is a type of reversible post-translational modification of proteins, which plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites. However, the traditional experimental methods are laborious and time-consuming. Hence, computational algorithms that can predict potential pupylation sites using sequence features are highly needed. In this research, a new prediction model, PUP-Fuse, has been developed for pupylation site prediction by integrating multiple sequence representations. We explored five types of feature encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. PUP-Fuse achieved a Matthews correlation coefficient of 0.768 in a 10-fold cross-validation test. It also outperformed existing predictors in an independent test. The web server of PUP-Fuse with curated datasets is freely available.


Subjects
Algorithms; Computational Biology/methods; Protein Processing, Post-Translational; Proteins/chemistry; Proteins/metabolism; Amino Acid Sequence; Databases, Protein