Prediction accuracy of regulatory elements from sequence varies by functional sequencing technique.
Nowling, Ronald J; Njoya, Kimani; Peters, John G; Riehle, Michelle M.
Affiliation
  • Nowling RJ; Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, WI, United States.
  • Njoya K; Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States.
  • Peters JG; Electrical Engineering and Computer Science, Milwaukee School of Engineering, Milwaukee, WI, United States.
  • Riehle MM; Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States.
Front Cell Infect Microbiol ; 13: 1182567, 2023.
Article in English | MEDLINE | ID: mdl-37600946
ABSTRACT

Introduction:

Various sequencing-based approaches are used to identify and characterize the activities of cis-regulatory elements in a genome-wide fashion. Some of these techniques rely on indirect markers such as histone modifications (ChIP-seq with histone antibodies) or chromatin accessibility (ATAC-seq, DNase-seq, FAIRE-seq), while other techniques use direct measures such as episomal assays measuring the enhancer properties of DNA sequences (STARR-seq) and direct measurement of the binding of transcription factors (ChIP-seq with transcription factor-specific antibodies). The activities of cis-regulatory elements such as enhancers, promoters, and repressors are determined by their sequence and by secondary processes such as chromatin accessibility, DNA methylation, and bound histone markers.

Methods:

Here, machine learning models are employed to evaluate how accurately cis-regulatory elements identified by various commonly used sequencing techniques can be predicted from their underlying sequence alone, in order to distinguish cis-regulatory activity that reflects sequence content from activity that reflects secondary processes.

Results and discussion:

Models trained and evaluated on D. melanogaster sequences identified through DNase-seq and STARR-seq are significantly more accurate than models trained on sequences identified by H3K4me1, H3K4me3, and H3K27ac ChIP-seq, FAIRE-seq, and ATAC-seq. These results suggest that the activity detected by DNase-seq and STARR-seq can be largely explained by underlying DNA sequence, independent of secondary processes. Experimentally, a subset of DNase-seq and H3K4me1 ChIP-seq sequences was tested for enhancer activity using luciferase assays and compared with previous tests performed on STARR-seq sequences. The experimental data indicated that STARR-seq sequences are substantially enriched for enhancer-specific activity, while the DNase-seq and H3K4me1 ChIP-seq sequences are not. Taken together, these results indicate that the DNase-seq approach identifies a broad class of regulatory elements, of which enhancers are a subset, and that the associated data are appropriate for training models that detect regulatory activity from sequence alone; that STARR-seq data are best for training enhancer-specific sequence models; and that H3K4me1 ChIP-seq data are not well suited for training and evaluating sequence-based models for cis-regulatory element prediction.
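
To make "predicted by their underlying sequence alone" concrete, the following is a minimal sketch of one common way to turn a DNA sequence into numeric features that a classifier could be trained on. The abstract does not state which features or models the authors used, so the k-mer frequency representation, the function name `kmer_features`, the default `k`, and the toy sequence are all assumptions made purely for illustration.

```python
from collections import Counter
from itertools import product


def kmer_features(sequence, k=3):
    """Return normalized k-mer frequencies for a DNA sequence.

    Hypothetical helper for illustration only; not the authors' method.
    The output has one entry per possible k-mer over A, C, G, T,
    in a fixed lexicographic order, so vectors are comparable
    across sequences.
    """
    sequence = sequence.upper()
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(
        sequence[i:i + k] for i in range(len(sequence) - k + 1)
    )
    # Windows containing N or other ambiguity codes are not in the
    # vocabulary, so they are excluded from the total and ignored.
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


# Toy sequence, not data from the study.
features = kmer_features("ACGTACGTNACGT", k=2)
```

Feature vectors of this kind, computed for sequences from each assay and for matched background sequences, could then be passed to any standard classifier; comparing held-out accuracy across assays is the sort of comparison the abstract describes.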

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Histones / Drosophila melanogaster Type of study: Prognostic_studies / Risk_factors_studies Limit: Animals Language: En Year of publication: 2023 Document type: Article
