Assessing Assembly Errors in Immunoglobulin Loci: A Comprehensive Evaluation of Long-read Genome Assemblies Across Vertebrates.
Zhu, Yixin; Watson, Corey; Safonova, Yana; Pennell, Matt; Bankevich, Anton.
Affiliation
  • Zhu Y; Department of Quantitative and Computational Biology and Biological Sciences, University of Southern California, Los Angeles, CA, United States.
  • Watson C; Department of Biochemistry and Molecular Biology, University of Louisville School of Medicine, Louisville, KY, United States.
  • Safonova Y; Department of Computer Science and Engineering, Pennsylvania State University, PA, United States.
  • Pennell M; Department of Quantitative and Computational Biology and Biological Sciences, University of Southern California, Los Angeles, CA, United States.
  • Bankevich A; Department of Computer Science and Engineering, Pennsylvania State University, PA, United States.
bioRxiv; 2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39091785
ABSTRACT
Long-read sequencing technologies have revolutionized genome assembly, producing near-complete chromosome assemblies for numerous organisms that are invaluable to research in many fields. However, regions with complex repetitive structure remain a challenge for genome assembly algorithms, particularly in areas of high heterozygosity. Robust and comprehensive solutions for assessing assembly accuracy and completeness in these regions do not exist. In this study, we focus on the assembly of biomedically important antibody-encoding immunoglobulin (IG) loci, which are characterized by complex duplications and repeat structures. High-quality full-length assemblies of these loci are critical for resolving haplotype-level annotations of IG genes, without which functional and evolutionary studies of antibody immunity across vertebrates are not tractable. To address these challenges, we developed a pipeline, "CloseRead", that generates multiple assembly verification metrics for analysis and visualization. These metrics expand upon those of existing quality assessment tools and specifically target complex and highly heterozygous regions. Using CloseRead, we systematically assessed the accuracy and completeness of IG loci in publicly available assemblies of 74 vertebrate species, identifying problematic regions. We also demonstrated that inspecting assembly graphs for problematic regions can both identify the root cause of assembly errors and illuminate solutions for improving erroneous assemblies. For a subset of species, we were able to correct assembly errors through targeted reassembly. Together, our analysis demonstrated the utility of assembly assessment in improving the completeness and accuracy of IG loci across species.

Full text: 1 Collection: 01-international Database: MEDLINE Language: En Journal: BioRxiv Year: 2024 Document type: Article Country of affiliation: United States