pDHS-ELM: computational predictor for plant DNase I hypersensitive sites based on extreme learning machines.
Zhang, Shanxin; Chang, Minjun; Zhou, Zhiping; Dai, Xiaofeng; Xu, Zhenghong.
Affiliation
  • Zhang S; Engineering Research Center of Internet of Things Technology Applications (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, Wuxi, 214122, Jiangsu, China. shanxinzhang@jiangnan.edu.cn.
  • Chang M; School of Medicine and Pharmaceuticals, Jiangnan University, Wuxi, 214122, Jiangsu, China. shanxinzhang@jiangnan.edu.cn.
  • Zhou Z; College of Letters and Science, University of California, Berkeley, Berkeley, CA, 94720, USA.
  • Dai X; Engineering Research Center of Internet of Things Technology Applications (Ministry of Education), School of Internet of Things Engineering, Jiangnan University, Wuxi, 214122, Jiangsu, China.
  • Xu Z; School of Biotechnology, Jiangnan University, Wuxi, 214122, Jiangsu, China.
Mol Genet Genomics; 293(4): 1035-1049, 2018 Aug.
Article in English | MEDLINE | ID: mdl-29594496
ABSTRACT
DNase I hypersensitive sites (DHSs) are hallmarks of chromatin zones containing transcriptional regulatory elements, making them critical to understanding the regulatory mechanisms of gene expression. Although large numbers of DHSs in plant genomes have been identified by high-throughput techniques, the DHSs obtained experimentally so far cover only a fraction of plant species and cell processes. Furthermore, these experimental methods are both time-consuming and expensive. Hence, there is an urgent need for automated computational means to predict DHSs in plant genomes efficiently and accurately. Recently, several methods have been proposed to predict DHSs. However, all of these methods take a long time to build a model, making them unsuitable for very large datasets. In the present work, a new ensemble extreme learning machine (ELM)-based model called pDHS-ELM is proposed to predict DHSs in plant genomes by fusing two different modes of pseudo-nucleotide composition. Here, two kinds of features, reverse complement kmer and pseudo-nucleotide composition, were used to represent the DHSs. The ELM model was used to build the base classifiers. Then, an ensemble framework was employed to combine the outputs of these base classifiers. When applied to DHSs in the Arabidopsis thaliana and rice (Oryza sativa) genomes, the proposed method obtained accuracies of up to 88.48% and 87.58%, respectively. Compared with state-of-the-art techniques, pDHS-ELM achieved higher sensitivity, specificity, and Matthews correlation coefficient with much less training and test time. By employing pDHS-ELM, we identified 42,370 and 103,979 DHSs in the A. thaliana and rice genomes, respectively. The predicted DHSs were depleted of bulk nucleosomes and were tightly associated with transcription factors. Approximately 90% of the predicted DHSs overlapped with transcription factors.
Meanwhile, we demonstrated that the predicted DHSs were also associated with DNA methylation, nucleosome positioning/occupancy, and histone modification. These results suggest that pDHS-ELM is a promising and powerful new tool for the analysis of transcriptional regulatory elements. The pDHS-ELM tool is available at https://github.com/shanxinzhang/pDHS-ELM/ .
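The pipeline the abstract describes (sequence features, ELM base classifiers, an ensemble over their outputs) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the k-mer length, the hidden-layer size, the activation, the regularization, or the ensemble rule, so the values below (k=2, 50 tanh units, ridge-regularized least squares, score averaging) are assumptions. The names `revc_kmer_features`, `ELM`, and `ensemble_predict` are hypothetical. The pseudo-nucleotide composition features are omitted because the abstract does not define them.

```python
from itertools import product

import numpy as np

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def revc_kmer_features(seq, k=2):
    """Normalized k-mer counts, with each k-mer merged with its reverse complement."""
    # Canonical key: the lexicographically smaller of a k-mer and its reverse complement.
    keys = sorted({min("".join(p), revcomp("".join(p)))
                   for p in product("ACGT", repeat=k)})
    index = {key: i for i, key in enumerate(keys)}
    counts = np.zeros(len(keys))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other symbols
            counts[index[min(kmer, revcomp(kmer))]] += 1
    total = counts.sum()
    return counts / total if total else counts


class ELM:
    """Single-hidden-layer ELM: random fixed hidden weights, least-squares output weights."""

    def __init__(self, n_hidden=50, ridge=1e-3, seed=0):
        # All three defaults are assumed values, not taken from the paper.
        self.n_hidden, self.ridge, self.seed = n_hidden, ridge, seed

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # y is expected to hold labels in {-1, +1}.
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))  # never trained
        self.b = rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Closed-form ridge solution for the output weights: no iterative training.
        A = H.T @ H + self.ridge * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ y)
        return self

    def decision(self, X):
        return self._hidden(X) @ self.beta


def ensemble_predict(models, feature_sets):
    """Average the base classifiers' scores (assumed rule) and threshold at zero."""
    scores = np.mean([m.decision(X) for m, X in zip(models, feature_sets)], axis=0)
    return np.where(scores >= 0, 1, -1)
```

With k=2, merging reverse complements reduces the 16 dinucleotides to 10 features. Because only the output weights are solved for, and in closed form, fitting an ELM costs one linear solve, which is consistent with the short training times the abstract reports.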

Full text: 1 Database: MEDLINE Main subject: Transcription, Genetic / RNA, Antisense / RNA, Plant / Gene Expression Regulation, Plant / Populus / Genetic Loci Type of study: Prognostic_studies / Risk_factors_studies Language: En Year of publication: 2018 Document type: Article