Assessing deep learning methods in cis-regulatory motif finding based on genomic sequencing data.

Zhang, Shuangquan; Ma, Anjun; Zhao, Jing; Xu, Dong; Ma, Qin; Wang, Yan

Affiliation

Zhang S; Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China.
Ma A; Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA.
Zhao J; Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA.
Xu D; Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Science Center, University of Missouri, MO, 65211, USA.
Ma Q; Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH, 43210, USA.
Wang Y; Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, 130012, China.

Brief Bioinform; 23(1), 2022 Jan 17.

Article in English | MEDLINE | ID: mdl-34607350

ABSTRACT

Identifying cis-regulatory motifs from genomic sequencing data (e.g. ChIP-seq and CLIP-seq) is crucial in identifying transcription factor (TF) binding sites and inferring gene regulatory mechanisms for any organism. Since 2015, deep learning (DL) methods have been widely applied to identify TF binding sites and predict motif patterns, with the strengths of offering a scalable, flexible and unified computational approach for highly accurate predictions. As far as we know, 20 DL methods have been developed. However, without a clear and systematic assessment, users will struggle to choose the most appropriate tool for their specific studies. In this manuscript, we evaluated 20 DL methods for cis-regulatory motif prediction using 690 ENCODE ChIP-seq, 126 cancer ChIP-seq and 55 RNA CLIP-seq data. Four metrics were investigated, including the accuracy of motif finding, the performance of DNA/RNA sequence classification, algorithm scalability and tool usability. The assessment results demonstrated the high complementarity of the existing DL methods. It was determined that the most suitable model should primarily depend on the data size and type and the method's outputs.

Subjects

Deep Learning; Algorithms; Base Sequence; Binding Sites/genetics; Chromatin Immunoprecipitation; Transcription Factors/genetics; Transcription Factors/metabolism

Keywords

CLIP-seq; ChIP-seq; TF binding sites identification; deep learning method assessment; motif prediction

Full text: 1 | Databases: MEDLINE | Main subject: Deep Learning | Type of study: Diagnostic_studies / Prognostic_studies | Language: En | Journal: Brief Bioinform | Journal subject: BIOLOGY / MEDICAL INFORMATICS | Year of publication: 2022 | Document type: Article | Country of affiliation: China