Statistical phasing of 150,119 sequenced genomes in the UK Biobank.
Am J Hum Genet; 110(1): 161-165, 2023 01 05.
Article in English | MEDLINE | ID: mdl-36450278
ABSTRACT
The first release of UK Biobank whole-genome sequence data contains 150,119 genomes. We present an open-source pipeline for filtering, phasing, and indexing these genomes on the cloud-based UK Biobank Research Analysis Platform. This pipeline makes it possible to apply haplotype-based methods to UK Biobank whole-genome sequence data. The pipeline uses BCFtools for marker filtering, Beagle for genotype phasing, and Tabix for VCF indexing. We used the pipeline to phase 406 million single-nucleotide variants on chromosomes 1-22 and X at a cost of £2,309. The maximum time required to process a chromosome was 2.6 days. In order to assess phase accuracy, we modified the pipeline to exclude trio parents. We observed a switch error rate of 0.0016 on chromosome 20 in the White British trio offspring. If we exclude markers with nonmajor allele frequency < 0.1% after phasing, this switch error rate decreases by 80% to 0.00032.
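The switch error rate quoted above can be illustrated with a minimal Python sketch. This is not the authors' code, and the abstract does not spell out their exact computation. The sketch assumes the common definition: among consecutive heterozygous markers, the fraction of adjacent pairs whose relative phase disagrees with a reference phasing (here, one that would come from trio parents). The names `switch_error_rate` and `relative_decrease`, and the toy genotypes, are hypothetical.

```python
from typing import List, Sequence, Tuple

Hap = Tuple[int, int]  # (allele on haplotype 1, allele on haplotype 2)


def switch_error_rate(inferred: Sequence[Hap], truth: Sequence[Hap]) -> float:
    """Fraction of adjacent heterozygous-marker pairs with wrong relative phase.

    Assumed definition (not taken from the paper): a switch is counted whenever
    the orientation of the inferred haplotypes relative to the reference
    changes between two consecutive heterozygous markers.
    """
    if len(inferred) != len(truth):
        raise ValueError("inferred and truth must cover the same markers")

    orientations: List[bool] = []
    for inf, tru in zip(inferred, truth):
        inf, tru = tuple(inf), tuple(tru)
        if tru[0] == tru[1]:
            continue  # homozygous marker: carries no phase information
        if sorted(inf) != sorted(tru):
            raise ValueError("unphased genotypes disagree at a marker")
        # True if inferred matches the reference as-is, False if swapped.
        orientations.append(inf == tru)

    opportunities = len(orientations) - 1
    if opportunities < 1:
        return 0.0
    switches = sum(a != b for a, b in zip(orientations, orientations[1:]))
    return switches / opportunities


def relative_decrease(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a fraction."""
    return (before - after) / before


# Toy data: four heterozygous markers, one switch between the 2nd and 3rd.
truth = [(0, 1), (1, 0), (0, 0), (0, 1), (1, 0)]
inferred = [(0, 1), (1, 0), (0, 0), (1, 0), (0, 1)]
print(switch_error_rate(inferred, truth))  # 1 switch out of 3 opportunities

# Arithmetic behind the abstract's "decreases by 80%" statement.
print(relative_decrease(0.0016, 0.00032))
```

The second call checks the reported figures: going from 0.0016 to 0.00032 is a relative reduction of (0.0016 - 0.00032) / 0.0016, which is 80%, as stated in the abstract.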
Full text: 1
Databases: MEDLINE
Main subject: Genome / Biological Specimen Banks
Limits: Animals / Humans
Country/Region as subject: Europe
Language: English
Journal: Am J Hum Genet
Publication year: 2023
Document type: Article