Search | Biblioteca Virtual em Saúde (Virtual Health Library) Fiocruz

Evaluation of different computational methods on 5-methylcytosine sites identification.

Lv, Hao; Zhang, Zi-Mei; Li, Shi-Hao; Tan, Jiu-Xin; Chen, Wei; Lin, Hao.

Brief Bioinform ; 21(3): 982-995, 2020 05 21.

Article in English | MEDLINE | ID: mdl-31157855

ABSTRACT

5-Methylcytosine (m5C) plays an extremely important role in basic biochemical processes. Despite the great increase in identified m5C sites across a wide variety of organisms, their epigenetic roles remain largely unknown. Hence, accurate identification of m5C sites is a key step in understanding their biological functions. Over the past several years, increasing attention has been paid to the identification of m5C sites in multiple species. In this work, we first summarized the current progress in computational prediction of m5C sites and then constructed a more powerful and reliable model for identifying m5C sites. To train the model, we collected experimentally confirmed m5C data from Homo sapiens, Mus musculus, Saccharomyces cerevisiae and Arabidopsis thaliana, and compared the performance of different feature extraction methods and classification algorithms to optimize the prediction model. Based on the optimal model, a novel predictor called iRNA-m5C was developed for the recognition of m5C sites. Finally, we critically evaluated the performance of iRNA-m5C and compared it with existing methods. The results showed that iRNA-m5C produced the best prediction performance. We hope that this paper can provide a guide to the computational identification of m5C sites, and we anticipate that iRNA-m5C will become a powerful tool for large-scale identification of m5C sites.

Subjects

5-Methylcytosine/metabolism , Computational Biology/methods , Algorithms , Animals , Arabidopsis/metabolism , Datasets as Topic , Humans , Mice , Saccharomyces cerevisiae/metabolism

DNA physical properties outperform sequence compositional information in classifying nucleosome-enriched and -depleted regions.

Liu, Guoqing; Liu, Guo-Jun; Tan, Jiu-Xin; Lin, Hao.

Genomics ; 111(5): 1167-1175, 2019 09.

Article in English | MEDLINE | ID: mdl-30055231

ABSTRACT

The nucleosome is the fundamental structural unit of eukaryotic chromatin and plays an essential role in the epigenetic regulation of cellular processes, such as DNA replication, recombination, and transcription. Hence, it is important to identify nucleosome positions in the genome. Our previous model based on DNA deformation energy, in which a set of DNA physical descriptors was used, performed well in predicting nucleosome dyad positions and occupancy. In this study, we established a machine-learning model for predicting nucleosome occupancy in order to further verify the physical descriptors. Results showed that (1) our model outperformed several other sequence compositional information-based models, indicating a stronger dependence of nucleosome positioning on DNA physical properties; (2) nucleosome-enriched and -depleted regions have distinct features in terms of DNA physical descriptors like sequence-dependent flexibility and equilibrium structure parameters; (3) gene transcription start sites and termination sites can be well characterized with the distribution patterns of the physical descriptors, indicating the regulatory role of DNA physical properties in gene transcription. In addition, we developed a web server for the model, which is freely accessible at http://lin-group.cn/server/iNuc-force/.

Subjects

DNA/chemistry , Nucleosomes/genetics , Software , Animals , Chromatin Assembly and Disassembly , DNA/genetics , Humans , Machine Learning , Nucleosomes/chemistry , Sequence Analysis, DNA/methods

Identifying Phage Virion Proteins by Using Two-Step Feature Selection Methods.

Tan, Jiu-Xin; Dao, Fu-Ying; Lv, Hao; Feng, Peng-Mian; Ding, Hui.

Molecules ; 23(8), 2018 Aug 10.

Article in English | MEDLINE | ID: mdl-30103458

ABSTRACT

Accurate identification of phage virion proteins is not only a key step in understanding their function but also helps in further understanding the lysis mechanism of the bacterial cell. Since traditional experimental methods for identifying phage virion proteins are time-consuming and costly, machine learning methods are urgently needed to identify them accurately and efficiently. In this work, a support vector machine (SVM) based method was proposed by mixing multiple sets of optimal g-gap dipeptide compositions. Analysis of variance (ANOVA) and minimal-redundancy-maximal-relevance (mRMR) with incremental feature selection (IFS) were applied to single out the optimal feature set. In the five-fold cross-validation test, the proposed method achieved an overall accuracy of 87.95%. We believe that the proposed method will become an efficient and powerful tool for scientists studying phage virion proteins.

Subjects

Bacteriophages , Computational Biology/methods , Support Vector Machine , Viral Proteins/chemistry , Virion , Algorithms , Analysis of Variance , Databases, Protein , ROC Curve , Reproducibility of Results
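
The abstract of this record names g-gap dipeptide composition as the sequence encoding but does not define it. The following is a minimal sketch, assuming the usual definition: the frequency of each of the 400 ordered residue pairs separated by g intervening residues. The function name, the skipping of non-standard residues, and the normalization by the number of valid pairs are illustrative assumptions, not details taken from the paper.

```python
from itertools import product

# The 20 standard amino acids, in a fixed (alphabetical) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 ordered residue pairs; this fixes the order of the feature vector.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_dipeptide_composition(sequence, gap):
    """Return 400 frequencies of residue pairs separated by `gap` residues.

    gap=0 gives the ordinary (adjacent) dipeptide composition.
    Pairs containing a non-standard residue are skipped (an assumption).
    """
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(sequence) - gap - 1):
        pair = sequence[i] + sequence[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    # Normalize by the number of valid pairs; all zeros if none were found.
    return [counts[p] / total if total else 0.0 for p in PAIRS]


features = g_gap_dipeptide_composition("ACAC", gap=1)
print(len(features))   # 400
print(features[0])     # frequency of the pair "AA" at gap 1: 0.5
```

Vectors computed for several values of `gap` could then be concatenated and passed to a feature-selection step before training a classifier; which gaps and which selection thresholds the authors used is not stated in the abstract.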

Early Diagnosis of Hepatocellular Carcinoma Using Machine Learning Method.

Zhang, Zi-Mei; Tan, Jiu-Xin; Wang, Fang; Dao, Fu-Ying; Zhang, Zhao-Yue; Lin, Hao.

Front Bioeng Biotechnol ; 8: 254, 2020.

Article in English | MEDLINE | ID: mdl-32292778

ABSTRACT

Hepatocellular carcinoma (HCC) is a serious cancer that ranks fourth in cancer-related deaths worldwide. Hence, more accurate diagnostic models are urgently needed to aid early HCC diagnosis in clinical settings and thus improve HCC treatment and survival. Several conventional methods have been used to discriminate HCC from cirrhosis tissues in patients without HCC (CwoHCC). However, their recognition success rates are still far from satisfactory. In this study, we applied a machine learning-based computational approach to a set of microarray data generated from 1091 HCC samples and 242 CwoHCC samples. The within-sample relative expression orderings (REOs) method was used to extract numerical descriptors from the gene expression profile datasets. After removing irrelevant features by using minimum redundancy maximum relevance (mRMR) with incremental feature selection, we obtained an "11-gene-pair" signature that produced outstanding results. We further investigated the discriminative capability of the "11-gene-pair" signature for HCC recognition on several independent datasets. Good results were obtained, demonstrating that the selected gene pairs can serve as a signature for HCC. The proposed computational model can discriminate HCC and adjacent non-cancerous tissues from CwoHCC even for minimum biopsy specimens and inaccurately sampled specimens, which makes it practical and effective for aiding early HCC diagnosis at the individual level.
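
The abstract of this record relies on within-sample relative expression orderings (REOs) without showing what such a descriptor looks like. The following is a minimal sketch, assuming the common formulation in which each gene pair contributes one binary feature: 1 if the first gene is expressed above the second within the same sample, 0 otherwise. The gene names, expression values and function name are hypothetical; the actual 11 gene pairs are not listed in the abstract.

```python
def reo_features(expression, gene_pairs):
    """Encode one sample as binary within-sample expression orderings.

    expression: dict mapping gene name -> expression value for one sample.
    gene_pairs: list of (gene_a, gene_b) tuples.
    Returns a list with 1 where gene_a > gene_b in this sample, else 0.
    """
    return [1 if expression[a] > expression[b] else 0 for a, b in gene_pairs]


# Hypothetical sample and gene pairs, for illustration only.
sample = {"GENE_A": 5.0, "GENE_B": 3.0, "GENE_C": 7.0}
pairs = [("GENE_A", "GENE_B"), ("GENE_A", "GENE_C")]
print(reo_features(sample, pairs))  # [1, 0]
```

Because only the ordering inside a single sample is used, descriptors of this kind do not change under any monotonic rescaling of that sample's measurements, which is one reason they are often preferred when data come from different batches or platforms.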

A Survey for Predicting Enzyme Family Classes Using Machine Learning Methods.

Tan, Jiu-Xin; Lv, Hao; Wang, Fang; Dao, Fu-Ying; Chen, Wei; Ding, Hui.

Curr Drug Targets ; 20(5): 540-550, 2019.

Article in English | MEDLINE | ID: mdl-30277150

ABSTRACT

Enzymes are proteins that act as biological catalysts to speed up cellular biochemical processes. According to their main Enzyme Commission (EC) numbers, enzymes are divided into six categories: EC-1: oxidoreductase; EC-2: transferase; EC-3: hydrolase; EC-4: lyase; EC-5: isomerase and EC-6: ligase (synthetase). Different enzymes have different biological functions and substrates. Therefore, knowing which family an enzyme belongs to can help infer its catalytic mechanism and provide information about the relevant biological function. With the large number of protein sequences flowing into databanks in the post-genomic age, annotating the family of an enzyme is very important. Since experimental methods are costly, bioinformatics tools can be a great help in accurately classifying enzyme families. In this review, we summarized the application of machine learning methods to the prediction of enzyme families from different aspects. We hope that this review will provide insights and inspiration for research on enzyme family classification.

Subjects

Computational Biology/methods , Enzymes/classification , Algorithms , Animals , Computational Biology/economics , Enzymes/genetics , Enzymes/metabolism , Humans , Machine Learning , Molecular Sequence Annotation , Multigene Family

iDNA6mA-Rice: A Computational Tool for Detecting N6-Methyladenine Sites in Rice.

Lv, Hao; Dao, Fu-Ying; Guan, Zheng-Xing; Zhang, Dan; Tan, Jiu-Xin; Zhang, Yong; Chen, Wei; Lin, Hao.

Front Genet ; 10: 793, 2019.

Article in English | MEDLINE | ID: mdl-31552096

ABSTRACT

DNA N6-methyladenine (6mA) is a dominant form of DNA modification and is involved in many biological functions. Accurate genome-wide identification of 6mA sites may increase understanding of its biological functions. Experimental methods for 6mA detection in eukaryotic genomes are laborious and expensive. Therefore, it is necessary to develop computational methods to identify 6mA sites on a genomic scale, especially for plant genomes. Based on this consideration, the study aims to develop a machine learning-based method for predicting 6mA sites in the rice genome. We initially used mono-nucleotide binary encoding to formulate positive and negative samples. Subsequently, the Random Forest machine learning algorithm was used to perform the classification for identifying 6mA sites. Our proposed method produced an area under the receiver operating characteristic curve of 0.964 with an overall accuracy of 0.917 in the fivefold cross-validation test. Furthermore, an independent dataset was established to assess the generalization ability of our method. On this dataset, an area under the receiver operating characteristic curve of 0.981 was obtained, suggesting that the proposed method performs well in predicting 6mA sites in the rice genome. To make 6mA site prediction convenient, we built a freely accessible web server based on this computational method, named iDNA6mA-Rice, at http://lin-group.cn/server/iDNA6mA-Rice.
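
The abstract of this record mentions mono-nucleotide binary encoding without spelling it out. The following is a minimal sketch, assuming the usual one-hot scheme in which each nucleotide becomes a 4-bit vector. The specific bit assignment, the all-zero vector for unknown bases and the function name are assumptions made for illustration; the abstract does not specify them.

```python
# Assumed bit assignment; the paper may use a different order.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def binary_encode(sequence):
    """Flatten a DNA sequence into a 4 * len(sequence) binary feature vector.

    Bases outside A/C/G/T (for example N) map to (0, 0, 0, 0), an assumption.
    """
    features = []
    for base in sequence.upper():
        features.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return features


print(binary_encode("ACGT"))
# [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1]
```

Fixed-length windows centred on a candidate adenine would yield vectors of equal length, which is what a classifier such as Random Forest expects; the window length used by the authors is not given in the abstract.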

Identification of hormone binding proteins based on machine learning methods.

Tan, Jiu Xin; Li, Shi Hao; Zhang, Zi Mei; Chen, Cui Xia; Chen, Wei; Tang, Hua; Lin, Hao.

Math Biosci Eng ; 16(4): 2466-2480, 2019 03 22.

Article in English | MEDLINE | ID: mdl-31137222

ABSTRACT

The soluble carrier hormone binding protein (HBP) plays an important role in the growth of humans and other animals. HBP can also selectively and non-covalently interact with hormones. Therefore, accurate identification of HBP is an important prerequisite for understanding its biological functions and molecular mechanisms. Since experimental methods for identifying HBP are still labor-intensive and costly, it is necessary to develop computational methods to identify HBP accurately and efficiently. In this paper, a machine learning-based method was proposed to identify HBP, in which the samples were encoded by using the optimal tripeptide composition obtained with the binomial distribution method. In the 5-fold cross-validation test, the proposed method yielded an overall accuracy of 97.15%. For the convenience of the scientific community, a user-friendly web server called HBPred2.0 was built, which is freely accessible at http://lin-group.cn/server/HBPred2.0/.

Subjects

Carrier Proteins/chemistry , Computational Biology/methods , Hormones/chemistry , Machine Learning , Algorithms , Amino Acids/chemistry , Analysis of Variance , Animals , Computer Simulation , Databases, Protein , Humans , Peptides/chemistry , Reproducibility of Results , Software , Support Vector Machine
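
The abstract of this record says that the optimal tripeptides were chosen with a binomial distribution method but gives no formula. The following is a minimal sketch of one common formulation, assumed here rather than taken from the paper: for a tripeptide seen N_i times in total and n_ij times in class j, where class j accounts for a fraction q_j of all tripeptide occurrences, the probability of at least n_ij occurrences arising by chance is a binomial tail, and one minus that probability serves as a confidence level for ranking features. The function and argument names are hypothetical.

```python
from math import comb


def binomial_confidence(n_ij, n_i_total, q_j):
    """Confidence that a tripeptide is enriched in class j.

    n_ij:      occurrences of the tripeptide in class j.
    n_i_total: occurrences of the tripeptide across all classes.
    q_j:       fraction of all tripeptide occurrences that fall in class j.
    Returns 1 - P(X >= n_ij) for X ~ Binomial(n_i_total, q_j).
    """
    tail = sum(
        comb(n_i_total, m) * q_j ** m * (1 - q_j) ** (n_i_total - m)
        for m in range(n_ij, n_i_total + 1)
    )
    return 1.0 - tail


# A tripeptide seen 10 times, all in a class that holds half of the data:
print(binomial_confidence(10, 10, 0.5))  # 0.9990234375
```

Under this assumed scheme, tripeptides would be sorted by their confidence values and added to the feature set in that order until cross-validated accuracy stops improving.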
