Analysis and benchmarking of small and large genomic variants across tandem repeats.
English, Adam C; Dolzhenko, Egor; Ziaei Jam, Helyaneh; McKenzie, Sean K; Olson, Nathan D; De Coster, Wouter; Park, Jonghun; Gu, Bida; Wagner, Justin; Eberle, Michael A; Gymrek, Melissa; Chaisson, Mark J P; Zook, Justin M; Sedlazeck, Fritz J.
Affiliations
  • English AC; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA. adam.english@bcm.edu.
  • Dolzhenko E; Pacific Biosciences of California, Menlo Park, CA, USA.
  • Ziaei Jam H; Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.
  • McKenzie SK; Oxford Nanopore Technologies, Inc., New York, NY, USA.
  • Olson ND; Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
  • De Coster W; Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, VIB, Antwerp, Belgium.
  • Park J; Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium.
  • Gu B; Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.
  • Wagner J; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
  • Eberle MA; Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
  • Gymrek M; Pacific Biosciences of California, Menlo Park, CA, USA.
  • Chaisson MJP; Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA, USA.
  • Zook JM; Department of Medicine, University of California, San Diego, La Jolla, CA, USA.
  • Sedlazeck FJ; Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA.
Nat Biotechnol; 2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38671154
ABSTRACT
Tandem repeats (TRs) are highly polymorphic in the human genome, have thousands of associated molecular traits and are linked to over 60 disease phenotypes. However, they are often excluded from at-scale studies because of challenges with variant calling and representation, as well as a lack of a genome-wide standard. Here, to promote the development of TR methods, we created a catalog of TR regions and explored TR properties across 86 haplotype-resolved long-read human assemblies. We curated variants from the Genome in a Bottle (GIAB) HG002 individual to create a TR dataset to benchmark existing and future TR analysis methods. We also present an improved variant comparison method that handles variants greater than 4 bp in length and varying allelic representation. The 8.1% of the genome covered by the TR catalog holds ~24.9% of variants per individual, including 124,728 small and 17,988 large variants for the GIAB HG002 'truth-set' TR benchmark. We demonstrate the utility of this pipeline across short-read and long-read technologies.
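One reason variant comparison inside TRs needs a representation-aware method is that the same repeat expansion can be written as different VCF records, for example an insertion of two CA units placed at either end of a CA repeat. The sketch below is illustrative only and is not the authors' implementation. It assumes variants are simple non-overlapping `(pos, ref_allele, alt_allele)` tuples on a single haplotype, that `ref` is the reference sequence starting at 0-based coordinate `start`, and it uses Python's `difflib.SequenceMatcher` as a stand-in similarity score. The helper names `apply_variants` and `haplotype_similarity` are hypothetical. Rebuilding each caller's haplotype sequence and comparing the sequences, rather than the records, shows why two differently placed calls can describe the same allele.

```python
from difflib import SequenceMatcher


def apply_variants(ref: str, start: int, variants) -> str:
    """Build a haplotype by applying (pos, ref_allele, alt_allele) edits to ref.

    Hypothetical helper: assumes `ref` begins at 0-based coordinate `start`
    and that the variants do not overlap. Edits are applied left to right.
    """
    pieces = []
    cursor = 0  # offset in ref up to which sequence has been emitted
    for pos, ref_allele, alt_allele in sorted(variants):
        offset = pos - start
        if ref[offset:offset + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF allele does not match reference at {pos}")
        pieces.append(ref[cursor:offset])  # unchanged sequence before the edit
        pieces.append(alt_allele)          # replacement allele
        cursor = offset + len(ref_allele)
    pieces.append(ref[cursor:])            # unchanged sequence after the last edit
    return "".join(pieces)


def haplotype_similarity(ref: str, start: int, calls_a, calls_b) -> float:
    """Compare two call sets by the haplotype sequences they imply (0.0 to 1.0)."""
    hap_a = apply_variants(ref, start, calls_a)
    hap_b = apply_variants(ref, start, calls_b)
    return SequenceMatcher(None, hap_a, hap_b).ratio()


# A (CA)n repeat at hypothetical 0-based coordinate 100.
REF = "TCACACACAG"
# The same 4 bp CACA expansion, written left-aligned and right-aligned.
CALL_LEFT = [(100, "T", "TCACA")]
CALL_RIGHT = [(108, "A", "ACACA")]
# A genuinely different allele: only one CA unit inserted.
CALL_SHORT = [(100, "T", "TCA")]

if __name__ == "__main__":
    print(CALL_LEFT == CALL_RIGHT)                            # False: records differ
    print(haplotype_similarity(REF, 100, CALL_LEFT, CALL_RIGHT))  # 1.0: same haplotype
    print(haplotype_similarity(REF, 100, CALL_LEFT, CALL_SHORT))  # below 1.0
```

Exact record matching treats the two CACA insertions as a mismatch, while the haplotype comparison scores them as identical and still separates them from the shorter allele. A real benchmarking tool would also need to handle genotypes, overlapping variants and similarity thresholds, which this sketch deliberately omits.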

Database: MEDLINE | Language: English | Publication year: 2024 | Document type: Article