Sparse Project VCF: efficient encoding of population genotype matrices.
Lin, Michael F; Bai, Xiaodong; Salerno, William J; Reid, Jeffrey G.
Affiliation
  • Lin MF; mlin.net LLC, San Jose, CA 95113, USA.
  • Bai X; Department of Regeneron Pharmaceuticals, Inc., Regeneron Genetics Center, Tarrytown, NY 10591, USA.
  • Salerno WJ; Department of Regeneron Pharmaceuticals, Inc., Regeneron Genetics Center, Tarrytown, NY 10591, USA.
  • Reid JG; Department of Regeneron Pharmaceuticals, Inc., Regeneron Genetics Center, Tarrytown, NY 10591, USA.
Bioinformatics; 36(22-23): 5537-5538, 2021 04 01.
Article in English | MEDLINE | ID: mdl-33300997
SUMMARY: Variant Call Format (VCF), the prevailing representation for germline genotypes in population sequencing, suffers rapid size growth as larger cohorts are sequenced and more rare variants are discovered. We present Sparse Project VCF (spVCF), an evolution of VCF with judicious entropy reduction and run-length encoding, delivering >10× size reduction for modern studies with practically minimal information loss. spVCF interoperates with VCF efficiently, including tabix-based random access. We demonstrate its effectiveness with the DiscovEHR and UK Biobank whole-exome sequencing cohorts.

AVAILABILITY AND IMPLEMENTATION: Apache-licensed reference implementation: github.com/mlin/spVCF.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
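
The summary names run-length encoding as one of the two ingredients behind the size reduction. As a rough illustration of why that helps on a genotype matrix dominated by repeated cells, here is a minimal sketch of generic run-length encoding over one row of genotype strings. It is not spVCF's actual encoding, which the abstract does not specify; the function names `rle_encode` and `rle_decode` and the sample row are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def rle_encode(cells: List[str]) -> List[Tuple[str, int]]:
    """Collapse each run of consecutive identical cells into (value, run_length)."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(cells)]


def rle_decode(runs: List[Tuple[str, int]]) -> List[str]:
    """Expand (value, run_length) pairs back into the original list of cells."""
    cells: List[str] = []
    for value, count in runs:
        cells.extend([value] * count)
    return cells


# Hypothetical row for one rare variant: most samples are homozygous reference.
row = ["0/0"] * 6 + ["0/1"] + ["0/0"] * 3
encoded = rle_encode(row)

# The encoding is lossless: decoding restores the row exactly.
assert rle_decode(encoded) == row
```

In this toy row, ten cells collapse to three runs. The rarer the variant and the larger the cohort, the longer the runs of identical cells, which is the regime the summary describes.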
Full text: 1 | Database: MEDLINE | Main subject: Software / Genomics | Language: En | Publication year: 2021 | Document type: Article
