Search | Biblioteca Virtual em Saúde Fiocruz

iRice-MS: An integrated XGBoost model for detecting multitype post-translational modification sites in rice.

Lv, Hao; Zhang, Yang; Wang, Jia-Shu; Yuan, Shi-Shi; Sun, Zi-Jie; Dao, Fu-Ying; Guan, Zheng-Xing; Lin, Hao; Deng, Ke-Jun.

Brief Bioinform ; 23(1), 2022 Jan 17.

Article in English | MEDLINE | ID: mdl-34864888

ABSTRACT

Post-translational modification (PTM) refers to the covalent and enzymatic modification of proteins after protein biosynthesis, which orchestrates a variety of biological processes. Detecting PTM sites at the proteome scale is a key step toward an in-depth understanding of their regulatory mechanisms. In this study, we present an integrated method based on eXtreme Gradient Boosting (XGBoost), called iRice-MS, to identify 2-hydroxyisobutyrylation, crotonylation, malonylation, ubiquitination, succinylation and acetylation sites in rice. For each PTM-specific model, we adopted eight feature encoding schemes, including sequence-based features, physicochemical property-based features and spatial mapping information-based features. The optimal feature set was identified for each encoding, and the respective models were established. Extensive experimental results show that iRice-MS consistently displays excellent performance in 5-fold cross-validation and independent dataset tests. In addition, our approach outperforms existing tools in terms of AUC. Based on the proposed model, a web server named iRice-MS was established and is freely accessible at http://lin-group.cn/server/iRice-MS.
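
The abstract compares tools by AUC without spelling the metric out. As a minimal sketch, AUC can be read as the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores below are assumptions made for illustration; they are not taken from iRice-MS or its datasets.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; tied pairs count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration only (not results from the paper).
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of samples, which is fine for a small illustration; a rank-based formulation would be the usual choice for proteome-scale data.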

Subjects

Oryza; Protein Processing, Post-Translational; Acetylation; Computational Biology; Models, Biological; Oryza/metabolism; Protein Processing, Post-Translational/physiology; Proteome/metabolism; Ubiquitination

Deep-Kcr: accurate detection of lysine crotonylation sites using deep learning method.

Lv, Hao; Dao, Fu-Ying; Guan, Zheng-Xing; Yang, Hui; Li, Yan-Wen; Lin, Hao.

Brief Bioinform ; 22(4), 2021 Jul 20.

Article in English | MEDLINE | ID: mdl-33099604

ABSTRACT

As a newly discovered protein post-translational modification, histone lysine crotonylation (Kcr) is involved in cellular regulation and human diseases. Various proteomics technologies have been developed to detect Kcr sites. However, experimental approaches for identifying Kcr sites are often time-consuming and labor-intensive, which makes them difficult to apply widely across species. Computational approaches are cost-effective and can be used in a high-throughput manner to generate relatively precise identifications. In this study, we develop a deep learning-based method termed Deep-Kcr for Kcr site prediction by combining sequence-based features, physicochemical property-based features and numerical space-derived information with information gain feature selection. We investigate the performance of a convolutional neural network (CNN) and five commonly used classifiers (long short-term memory network, random forest, LogitBoost, naive Bayes and logistic regression) using 10-fold cross-validation and an independent test set. Results show that the CNN consistently displays the best performance, with high computational efficiency on large datasets. We also compare Deep-Kcr with other existing tools to demonstrate the excellent predictive power and robustness of our method. Based on the proposed model, a web server called Deep-Kcr was established and is freely accessible at http://lin-group.cn/server/Deep-Kcr.
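
Information gain feature selection, mentioned above, ranks each feature by how much knowing its value reduces the entropy of the class labels. The following is a minimal sketch for a single discrete feature. It assumes the feature has already been discretized (the abstract does not say how continuous features were handled), and the names `entropy` and `information_gain` are illustrative rather than part of Deep-Kcr.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a non-empty label sequence."""
    total = len(labels)
    if total == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """Entropy of the labels minus their entropy conditioned on the feature."""
    if len(feature_values) != len(labels):
        raise ValueError("feature_values and labels must have equal length")
    total = len(labels)

    # Group the labels by the value the feature takes for each sample.
    groups: dict = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)

    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data, invented for illustration: one feature separates the classes
# perfectly, the other carries no information about them.
toy_labels = [1, 1, 0, 0]
gain_informative = information_gain(["a", "a", "b", "b"], toy_labels)
gain_uninformative = information_gain(["a", "b", "a", "b"], toy_labels)
```

Features would then be sorted by gain and the top-ranked ones kept; the cut-off is a modelling choice that the abstract does not specify.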

Subjects

Crotonates/metabolism; Databases, Protein; Deep Learning; Protein Processing, Post-Translational; Sequence Analysis, Protein; Acylation; Humans; Lysine/metabolism

A Brief Survey for MicroRNA Precursor Identification Using Machine Learning Methods.

Guan, Zheng-Xing; Li, Shi-Hao; Zhang, Zi-Mei; Zhang, Dan; Yang, Hui; Ding, Hui.

Curr Genomics ; 21(1): 11-25, 2020 Jan.

Article in English | MEDLINE | ID: mdl-32655294

ABSTRACT

MicroRNAs (miRNAs), a group of short non-coding RNA molecules, regulate gene expression. Many diseases are associated with abnormal expression of miRNAs. Therefore, accurate identification of miRNA precursors (pre-miRNAs) is necessary. In the past 10 years, experimental methods, comparative genomics methods, and artificial intelligence methods have been used to identify pre-miRNAs. However, experimental and comparative genomics methods have disadvantages, such as being time-consuming. In contrast, machine learning-based methods are a better choice. This review therefore summarizes the current advances in pre-miRNA recognition based on computational methods, including the construction of benchmark datasets, feature extraction methods, prediction algorithms, and the results of the models. We also provide information about the currently available predictors. Finally, we offer future perspectives on the identification of pre-miRNAs. The review gives scholars a comprehensive background on pre-miRNA identification using machine learning methods, which can help researchers gain a clear understanding of research progress in this field.

Application of Machine Learning Methods in Predicting Nuclear Receptors and their Families.

Zhang, Zi-Mei; Guan, Zheng-Xing; Wang, Fang; Zhang, Dan; Ding, Hui.

Med Chem ; 16(5): 594-604, 2020.

Article in English | MEDLINE | ID: mdl-31584374

ABSTRACT

Nuclear receptors (NRs) are a superfamily of ligand-dependent transcription factors that are closely related to cell development, differentiation, reproduction, homeostasis, and metabolism. According to alignments of their conserved domains, NRs are classified into seven or eight subfamilies: (1) NR1: thyroid hormone-like (thyroid hormone, retinoic acid, RAR-related orphan receptor, peroxisome proliferator-activated, vitamin D3-like), (2) NR2: HNF4-like (hepatocyte nuclear factor 4, retinoic acid X, tailless-like, COUP-TF-like, USP), (3) NR3: estrogen-like (estrogen, estrogen-related, glucocorticoid-like), (4) NR4: nerve growth factor IB-like (NGFI-B-like), (5) NR5: fushi tarazu-F1-like (fushi tarazu-F1-like), (6) NR6: germ cell nuclear factor-like (germ cell nuclear factor), and (7) NR0: knirps-like (knirps, knirps-related, embryonic gonad protein, ODR7, trithorax) and DAX-like (DAX, SHP); alternatively, NR0 is divided into (7) NR7: knirps-like and (8) NR8: DAX-like. Different NR families have different structural features and functions. Since the function of an NR is closely correlated with the subfamily it belongs to, it is highly desirable to identify NRs and their subfamilies rapidly and effectively. The knowledge acquired is essential for a proper understanding of normal and abnormal cellular mechanisms. With the advent of the post-genomics era, the number of proteins with known sequences has increased explosively. Conventional methods for accurately classifying NR families are experimental, with high cost and low efficiency. This has created a greater need for bioinformatics tools that effectively recognize NRs and their subfamilies for the purpose of understanding their biological function. In this review, we summarize the application of machine learning methods in the prediction of NRs from different aspects. We hope that this review will provide a reference for further research on the classification of NRs and their families.

Subjects

Machine Learning; Receptors, Cytoplasmic and Nuclear/genetics; Animals; Computational Biology; Humans; Receptors, Cytoplasmic and Nuclear/metabolism

Recent Advancement in Predicting Subcellular Localization of Mycobacterial Protein with Machine Learning Methods.

Li, Shi-Hao; Guan, Zheng-Xing; Zhang, Dan; Zhang, Zi-Mei; Huang, Jian; Yang, Wuritu; Lin, Hao.

Med Chem ; 16(5): 605-619, 2020.

Article in English | MEDLINE | ID: mdl-31584379

ABSTRACT

Mycobacterium tuberculosis (MTB) causes tuberculosis (TB), which is reported as one of the most dreadful epidemics. Although many biochemical molecular drugs have been developed to cope with this disease, drug resistance, especially multidrug resistance (MDR) and extensive drug resistance (XDR), poses a huge threat to treatment. Moreover, traditional biochemical experimental methods for tackling TB are time-consuming and costly. Thanks to the enormous volume of genomic and proteomic sequence data now available, TB can be addressed via sequence-based computational approaches, that is, bioinformatics. Studies on predicting the subcellular localization of mycobacterial proteins (MBPs) with high precision and efficiency may help determine the biological function of these proteins and provide useful insights for protein function annotation as well as drug design. In this review, we report the progress that has been made in the computational prediction of subcellular localization of MBPs, covering the following aspects: 1) Construction of benchmark datasets. 2) Methods of feature extraction. 3) Techniques of feature selection. 4) Application of several published prediction algorithms. 5) The published results. 6) Further study on the prediction of subcellular localization of MBPs.

Subjects

Bacterial Proteins/genetics; Machine Learning; Mycobacterium tuberculosis/genetics; Bacterial Proteins/metabolism; Computational Biology; Mycobacterium tuberculosis/metabolism

An Overview on Predicting Protein Subchloroplast Localization by using Machine Learning Methods.

Liu, Meng-Lu; Su, Wei; Guan, Zheng-Xing; Zhang, Dan; Chen, Wei; Liu, Li; Ding, Hui.

Curr Protein Pept Sci ; 21(12): 1229-1241, 2020.

Article in English | MEDLINE | ID: mdl-31957607

ABSTRACT

The chloroplast is a subcellular organelle of green plants and eukaryotic algae that plays an important role in photosynthesis. Since the function of a protein correlates with its location, knowing its subchloroplast localization helps elucidate its function. However, owing to the large number of chloroplast proteins, it is costly and time-consuming to design biological experiments to determine the subchloroplast localization of these proteins. To address this problem, twelve computational methods have been developed during the past ten years to predict protein subchloroplast localization. This review summarizes the research progress in this area. We hope the review provides useful guidance for further computational study of protein subchloroplast localization.

Subjects

Chloroplast Proteins/genetics; Chloroplasts/genetics; Gene Expression Regulation, Plant; Machine Learning; Models, Statistical; Proteome/genetics; Amino Acid Sequence; Chloroplast Proteins/classification; Chloroplast Proteins/metabolism; Chloroplasts/metabolism; Computational Biology/methods; Computational Biology/statistics & numerical data; Datasets as Topic; Plants/genetics; Plants/metabolism; Protein Transport; Proteome/classification; Proteome/metabolism

iDNA-MS: An Integrated Computational Tool for Detecting DNA Modification Sites in Multiple Genomes.

Lv, Hao; Dao, Fu-Ying; Zhang, Dan; Guan, Zheng-Xing; Yang, Hui; Su, Wei; Liu, Meng-Lu; Ding, Hui; Chen, Wei; Lin, Hao.

iScience ; 23(4): 100991, 2020 Apr 24.

Article in English | MEDLINE | ID: mdl-32240948

ABSTRACT

5hmC, 6mA, and 4mC are three common DNA modifications and are involved in various biological processes. Accurate genome-wide identification of these sites is invaluable for better understanding their biological functions. Owing to the labor-intensive and expensive nature of experimental methods, there is an urgent need for computational methods for the genome-wide detection of these sites. With this in mind, the current study was devoted to constructing a computational method to identify 5hmC, 6mA, and 4mC sites. We initially used K-tuple nucleotide component, nucleotide chemical property and nucleotide frequency, and mono-nucleotide binary encoding schemes to formulate samples. Subsequently, random forest was used to identify 5hmC, 6mA, and 4mC sites. Cross-validated results showed that the proposed method has excellent generalization ability in the identification of the three modification sites. Based on the proposed model, a web server called iDNA-MS was established and is freely accessible at http://lin-group.cn/server/iDNA-MS.
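
Of the encodings listed above, the K-tuple nucleotide component is the simplest to illustrate: a sequence is represented by the relative frequency of each possible k-mer. The sketch below is one common way to compute it, not necessarily the exact variant used in iDNA-MS; the function name `ktuple_composition`, the default `k=2`, and the choice to skip windows containing non-ACGT characters are all assumptions.

```python
from itertools import product
from typing import List


def ktuple_composition(sequence: str, k: int = 2,
                       alphabet: str = "ACGT") -> List[float]:
    """Return the frequency of every k-mer, in lexicographic alphabet order.

    Windows containing characters outside the alphabet (for example 'N')
    are skipped, so the frequencies sum to 1 whenever at least one valid
    window exists.
    """
    seq = sequence.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    valid_windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            valid_windows += 1

    if valid_windows == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / valid_windows for kmer in kmers]


# "ACGT" contains the dinucleotides AC, CG and GT exactly once each.
toy_vector = ktuple_composition("ACGT", k=2)
```

For k = 2 this gives a 16-dimensional vector regardless of sequence length, which is why such encodings are convenient inputs for classifiers like random forest.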

iDNA6mA-Rice: A Computational Tool for Detecting N6-Methyladenine Sites in Rice.

Lv, Hao; Dao, Fu-Ying; Guan, Zheng-Xing; Zhang, Dan; Tan, Jiu-Xin; Zhang, Yong; Chen, Wei; Lin, Hao.

Front Genet ; 10: 793, 2019.

Article in English | MEDLINE | ID: mdl-31552096

ABSTRACT

DNA N6-methyladenine (6mA) is a dominant form of DNA modification and is involved in many biological functions. Accurate genome-wide identification of 6mA sites may increase understanding of its biological functions. Experimental methods for 6mA detection in eukaryotic genomes are laborious and expensive. Therefore, it is necessary to develop computational methods to identify 6mA sites on a genomic scale, especially for plant genomes. Based on this consideration, the study aims to develop a machine learning-based method for predicting 6mA sites in the rice genome. We initially used mono-nucleotide binary encoding to formulate positive and negative samples. Subsequently, the Random Forest machine learning algorithm was used to classify 6mA sites. Our proposed method achieved an area under the receiver operating characteristic curve of 0.964 with an overall accuracy of 0.917 in the fivefold cross-validation test. Furthermore, an independent dataset was established to assess the generalization ability of our method. On this dataset, an area under the receiver operating characteristic curve of 0.981 was obtained, suggesting that the proposed method performs well in predicting 6mA sites in the rice genome. For the convenience of retrieving 6mA sites, we built a freely accessible web server based on the computational method, named iDNA6mA-Rice, at http://lin-group.cn/server/iDNA6mA-Rice.
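
Mono-nucleotide binary encoding, as described above, turns each nucleotide into a short binary vector so that a fixed-length sequence window becomes a fixed-length numeric input. A minimal sketch follows. The particular bit assignment in `ONE_HOT`, the all-zero vector for unknown characters, and the function name `binary_encode` are assumptions for illustration; the abstract does not specify them.

```python
from typing import Dict, List, Tuple

# Assumed mapping: one position per nucleotide. The paper's actual bit
# assignment may differ; any consistent one-hot mapping works the same way.
ONE_HOT: Dict[str, Tuple[int, int, int, int]] = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def binary_encode(sequence: str) -> List[int]:
    """Concatenate the 4-bit code of each nucleotide in the sequence.

    Characters outside A/C/G/T (for example 'N') are encoded as four zeros,
    which is an assumption made here to keep the output length fixed.
    """
    encoded: List[int] = []
    for base in sequence.upper():
        encoded.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return encoded


# A sequence of length L always yields a vector of length 4 * L.
toy_encoding = binary_encode("AC")
```

Because the output length depends only on the window length, every sample cut to the same window size produces a vector of identical dimension, ready to be passed to a classifier.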

Recent Development of Computational Predicting Bioluminescent Proteins.

Zhang, Dan; Guan, Zheng-Xing; Zhang, Zi-Mei; Li, Shi-Hao; Dao, Fu-Ying; Tang, Hua; Lin, Hao.

Curr Pharm Des ; 25(40): 4264-4273, 2019.

Article in English | MEDLINE | ID: mdl-31696804

ABSTRACT

Bioluminescent proteins (BLPs) are widely distributed in many living organisms and play a key role in light emission in bioluminescence. Bioluminescence serves various functions, such as finding food and protecting organisms from predators. With the routine biotechnological application of bioluminescence, it is recognized as essential for many medical, commercial and other general technological advances. Therefore, the prediction and characterization of BLPs are significant and can help reveal more about bioluminescence and promote the development of its applications. Because experimental methods for BLP identification are costly and time-consuming, bioinformatics tools have played an important role in the fast and accurate prediction of BLPs by combining their sequence information with machine learning methods. In this review, we summarize and compare the application of machine learning methods in the prediction of BLPs from different aspects. We hope that this review will provide insights and inspiration for research on BLPs.

Subjects

Computational Biology; Luminescent Proteins/chemistry; Machine Learning
