excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies.
Ogata, Jonathan D; Mu, Wancen; Davis, Eric S; Xue, Bingjie; Harrell, J Chuck; Sheffield, Nathan C; Phanstiel, Douglas H; Love, Michael I; Dozmorov, Mikhail G.
Affiliation
  • Ogata JD; Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, United States.
  • Mu W; Department of Biostatistics, University of North Carolina-Chapel Hill, Chapel Hill, NC 27514, United States.
  • Davis ES; Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  • Xue B; Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, United States.
  • Harrell JC; Department of Pathology, Virginia Commonwealth University, Richmond, VA 23284, United States.
  • Sheffield NC; Massey Cancer Center, Virginia Commonwealth University, Richmond, VA 23220, United States.
  • Phanstiel DH; Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, United States.
  • Love MI; Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
  • Dozmorov MG; Thurston Arthritis Research Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, United States.
Bioinformatics; 39(4), 2023-04-03.
Article in En | MEDLINE | ID: mdl-37067481
ABSTRACT

SUMMARY:

Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads that overlap them improves biological signal, and the benefit is most pronounced in differential analysis settings. Several labs have created exclusion region sets, available primarily through ENCODE and GitHub. However, the variety of exclusion sets creates uncertainty about which sets to use. Furthermore, gap regions (e.g. centromeres, telomeres, short arms) create additional considerations in generating exclusion sets. We generated exclusion sets for the latest human T2T-CHM13 and mouse GRCm39 genomes and systematically assembled and annotated these and other sets in the excluderanges R/Bioconductor data package, also accessible via the BEDbase.org API. The package provides unified access to 82 GenomicRanges objects covering six organisms, multiple genome assemblies, and types of exclusion regions. For the human hg38 genome assembly, we recommend hg38.Kundaje.GRCh38_unified_blacklist as the most well-curated and annotated set; for other organisms, we recommend sets generated by the Blacklist tool.

AVAILABILITY AND IMPLEMENTATION:

https://bioconductor.org/packages/excluderanges/. Package website: https://dozmorovlab.github.io/excluderanges/.
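The read-removal step mentioned in the summary can be illustrated with a minimal sketch. It is not the excluderanges package or its API: the function names `build_index` and `overlaps_excluded`, the tuple layout, and the coordinates below are hypothetical, and intervals are assumed to be BED-style half-open `[start, end)`.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(regions):
    """Merge exclusion regions per chromosome and return sorted start/end lists.

    `regions` is an iterable of (chrom, start, end) tuples, half-open.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = merged[-1]
            if start <= last_end:
                # Overlapping or adjacent: extend the previous merged region.
                merged[-1] = (last_start, max(last_end, end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps_excluded(index, chrom, start, end):
    """Return True if the half-open interval [start, end) hits any exclusion region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # Rightmost merged region that begins before the read ends.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


# Hypothetical exclusion regions and reads, for illustration only.
exclusion_regions = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)]
reads = [
    ("chr1", 50, 100),
    ("chr1", 90, 110),
    ("chr1", 299, 350),
    ("chr1", 300, 400),
    ("chr2", 10, 20),
    ("chr3", 0, 10),
]

index = build_index(exclusion_regions)
kept = [r for r in reads if not overlaps_excluded(index, *r)]
```

Merging the regions first keeps each lookup to a single binary search per read. In practice, the GenomicRanges objects the package provides would typically be used with existing overlap tooling instead of hand-written code like this.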
Subject(s)

Full text: 1 Collection: 01-internacional Database: MEDLINE Main subject: Software / Genome, Human Limits: Animals / Humans Language: En Journal: Bioinformatics Journal subject: INFORMATICA MEDICA Year: 2023 Type: Article Affiliation country: United States