Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast.
Chen, Jingxuan; Basting, Preston J; Han, Shunhua; Garfinkel, David J; Bergman, Casey M.
Affiliation
  • Chen J; Institute of Bioinformatics, University of Georgia, Athens, GA, USA.
  • Basting PJ; Institute of Bioinformatics, University of Georgia, Athens, GA, USA.
  • Han S; Institute of Bioinformatics, University of Georgia, Athens, GA, USA.
  • Garfinkel DJ; Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA, USA.
  • Bergman CM; Institute of Bioinformatics, University of Georgia, Athens, GA, USA. cbergman@uga.edu.
Mob DNA ; 14(1): 8, 2023 Jul 14.
Article in English | MEDLINE | ID: mdl-37452430
BACKGROUND: Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors.
RESULTS: We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide consistent estimates of ~50 non-reference TE insertions per strain and that Ty2 has the highest number of non-reference TE insertions in a species-wide panel of ~1000 yeast genomes. Finally, we show that best-in-class predictors for yeast applied to resequencing data have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences revealed previously for experimentally-induced Ty1 insertions to spontaneous insertions for other copia-superfamily retrotransposons in yeast.
CONCLUSION: McClintock ( https://github.com/bergmanlab/mcclintock/ ) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors in other species.
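The recall and precision comparison mentioned in the results can be illustrated with a minimal Python sketch. This is not McClintock's own evaluation code: the function name `score_predictions`, the tuple layout, and the 5 bp matching window are all assumptions chosen for illustration. It counts a predicted insertion as correct when it matches a simulated insertion on chromosome and TE family and lies within the window.

```python
from typing import List, Tuple

# Hypothetical record layout: (chromosome, position, TE family)
Insertion = Tuple[str, int, str]


def score_predictions(
    truth: List[Insertion],
    predicted: List[Insertion],
    window: int = 5,  # assumed tolerance in bp, not a value from the paper
) -> Tuple[float, float]:
    """Return (precision, recall) for predicted vs. simulated insertions."""
    matched_truth = set()  # indices of simulated insertions already matched
    true_positives = 0
    for chrom, pos, family in predicted:
        for i, (t_chrom, t_pos, t_family) in enumerate(truth):
            if i in matched_truth:
                continue  # each simulated insertion can be matched only once
            if chrom == t_chrom and family == t_family and abs(pos - t_pos) <= window:
                matched_truth.add(i)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Toy data, invented for illustration only
truth = [("chrI", 1000, "Ty1"), ("chrII", 5000, "Ty2")]
predicted = [("chrI", 1002, "Ty1"), ("chrII", 9000, "Ty2"), ("chrIII", 300, "Ty4")]
precision, recall = score_predictions(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case one of three predictions matches one of two simulated insertions. A narrower window rewards detectors that place insertions precisely, which is the property the abstract highlights.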
Full text: 1 Collections: 01-international Database: MEDLINE Study type: Prognostic_studies Language: En Journal: Mob DNA Publication year: 2023 Document type: Article
