XSI-a genotype compression tool for compressive genomics in large biobanks.
Wertenbroek, Rick; Rubinacci, Simone; Xenarios, Ioannis; Thoma, Yann; Delaneau, Olivier.
Affiliation
  • Wertenbroek R; School of Management and Engineering Vaud (HEIG-VD), HES-SO University of Applied Sciences and Arts Western Switzerland, Yverdon-les-Bains 1401, Switzerland.
  • Rubinacci S; Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland.
  • Xenarios I; Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland.
  • Thoma Y; Department of Computational Biology, University of Lausanne, Lausanne 1015, Switzerland.
  • Delaneau O; School of Management and Engineering Vaud (HEIG-VD), HES-SO University of Applied Sciences and Arts Western Switzerland, Yverdon-les-Bains 1401, Switzerland.
Bioinformatics; 38(15): 3778-3784, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35748697
MOTIVATION: Generation of genotype data has been growing exponentially over the last decade. With the large size of recent datasets comes a storage and computational burden with ever-increasing costs. To reduce this burden, we propose XSI, a file format with a reduced storage footprint that also allows computation on the compressed data, and we show how this can improve future analyses. RESULTS: We show that xSqueezeIt (XSI) allows for a file size reduction of 4-20× compared with compressed BCF and demonstrate its potential for 'compressive genomics' on the UK Biobank whole-genome sequencing genotypes, with 8× faster loading times, 5× faster run of homozygosity computation, 30× faster dot product computation and 280× faster allele counts. AVAILABILITY AND IMPLEMENTATION: The XSI file format specifications, API and command line tool are released under an open-source (MIT) license and are available at https://github.com/rwk-unil/xSqueezeIt. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Software / Data Compression Language: En Journal: Bioinformatics Journal subject: MEDICAL INFORMATICS Year: 2022 Document type: Article Country of affiliation: Switzerland
