1.
Mob DNA ; 14(1): 8, 2023 Jul 14.
Article in English | MEDLINE | ID: mdl-37452430

ABSTRACT

BACKGROUND: Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors. RESULTS: We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide consistent estimates of ∼50 non-reference TE insertions per strain and that Ty2 has the highest number of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes. Finally, we show that best-in-class predictors for yeast applied to resequencing data have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences revealed previously for experimentally-induced Ty1 insertions to spontaneous insertions for other copia-superfamily retrotransposons in yeast.
CONCLUSION: McClintock (https://github.com/bergmanlab/mcclintock/) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors in other species.

2.
bioRxiv ; 2023 Mar 21.
Article in English | MEDLINE | ID: mdl-36824955

ABSTRACT

BACKGROUND: Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors. RESULTS: We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide a consistent and biologically meaningful view of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes, as evaluated by coverage-based abundance estimates and expected patterns of tRNA promoter targeting. Finally, we show that best-in-class predictors for yeast have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences first revealed experimentally for Ty1 to natural insertions and related copia-superfamily retrotransposons in yeast.
CONCLUSION: McClintock (https://github.com/bergmanlab/mcclintock/) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors for other species.

3.
Nucleic Acids Res ; 50(21): e124, 2022 Nov 28.
Article in English | MEDLINE | ID: mdl-36156149

ABSTRACT

Animal cell lines often undergo extreme genome restructuring events, including polyploidy and segmental aneuploidy that can impede de novo whole-genome assembly (WGA). In some species like Drosophila, cell lines also exhibit massive proliferation of transposable elements (TEs). To better understand the role of transposition during animal cell culture, we sequenced the genome of the tetraploid Drosophila S2R+ cell line using long-read and linked-read technologies. WGAs for S2R+ were highly fragmented and generated variable estimates of TE content across sequencing and assembly technologies. We therefore developed a novel WGA-independent bioinformatics method called TELR that identifies, locally assembles, and estimates allele frequency of TEs from long-read sequence data (https://github.com/bergmanlab/telr). Application of TELR to a ∼130x PacBio dataset for S2R+ revealed many haplotype-specific TE insertions that arose by transposition after initial cell line establishment and subsequent tetraploidization. Local assemblies from TELR also allowed phylogenetic analysis of paralogous TEs, which revealed that proliferation of TE families in vitro can be driven by single or multiple source lineages. Our work provides a model for the analysis of TEs in complex heterozygous or polyploid genomes that are recalcitrant to WGA and yields new insights into the mechanisms of genome evolution in animal cell culture.


Subject(s)
DNA Transposable Elements , Polyploidy , Animals , DNA Transposable Elements/genetics , Phylogeny , Drosophila/genetics , Cell Line
4.
Genetics ; 221(3), 2022 Jul 04.
Article in English | MEDLINE | ID: mdl-35536183

ABSTRACT

Cultured cells are widely used in molecular biology despite poor understanding of how cell line genomes change in vitro over time. Previous work has shown that Drosophila cultured cells have a higher transposable element content than whole flies, but whether this increase in transposable element content resulted from an initial burst of transposition during cell line establishment or ongoing transposition in cell culture remains unclear. Here, we sequenced the genomes of 25 sublines of Drosophila S2 cells and show that transposable element insertions provide abundant markers for the phylogenetic reconstruction of diverse sublines in a model animal cell culture system. DNA copy number evolution across S2 sublines revealed dramatically different patterns of genome organization that support the overall evolutionary history reconstructed using transposable element insertions. Analysis of transposable element insertion site occupancy and ancestral states support a model of ongoing transposition dominated by episodic activity of a small number of retrotransposon families. Our work demonstrates that substantial genome evolution occurs during long-term Drosophila cell culture, which may impact the reproducibility of experiments that do not control for subline identity.


Subject(s)
Drosophila , Genome, Insect , Animals , Cell Culture Techniques , DNA Transposable Elements/genetics , Drosophila/genetics , Drosophila melanogaster/genetics , Evolution, Molecular , Phylogeny , Reproducibility of Results
5.
G3 (Bethesda) ; 12(2), 2022 Feb 04.
Article in English | MEDLINE | ID: mdl-34849844

ABSTRACT

Drosophila cell lines are used by researchers to investigate various cell biological phenomena. It is crucial to exercise good cell culture practice. Poor handling can lead to both inter- and intra-species cross-contamination. Prolonged culturing can lead to introduction of large- and small-scale genomic changes. These factors, therefore, make it imperative that methods to authenticate Drosophila cell lines are developed to ensure reproducibility. Mammalian cell line authentication is reliant on short tandem repeat (STR) profiling; however, the relatively low STR mutation rate in Drosophila melanogaster at the individual level is likely to preclude the value of this technique. In contrast, transposable elements (TEs) are highly polymorphic among individual flies and abundant in Drosophila cell lines. Therefore, we investigated the utility of TE insertions as markers to discriminate Drosophila cell lines derived from the same or different donor genotypes, divergent sub-lines of the same cell line, and from other insect cell lines. We developed a PCR-based next-generation sequencing protocol to cluster cell lines based on the genome-wide distribution of a limited number of diagnostic TE families. We determined the distribution of five TE families in S2R+, S2-DRSC, S2-DGRC, Kc167, ML-DmBG3-c2, mbn2, CME W1 Cl.8+, and ovarian somatic sheath Drosophila cell lines. Two independent downstream analyses of the next-generation sequencing data yielded similar clustering of these cell lines. Double-blind testing of the protocol reliably identified various Drosophila cell lines. In addition, our data indicate minimal changes with respect to the genome-wide distribution of these five TE families when cells are passaged for at least 50 times. The protocol developed can accurately identify and distinguish the numerous Drosophila cell lines available to the research community, thereby aiding reproducible Drosophila cell culture research.


Subject(s)
Cell Line , DNA Transposable Elements , Drosophila , Animals , DNA Transposable Elements/genetics , Drosophila/genetics , Drosophila melanogaster/genetics , Genome, Insect , Reproducibility of Results
6.
Genetics ; 219(2), 2021 Oct 02.
Article in English | MEDLINE | ID: mdl-34849875

ABSTRACT

Cell culture systems allow key insights into biological mechanisms yet suffer from irreproducible outcomes in part because of cross-contamination or mislabeling of cell lines. Cell line misidentification can be mitigated by the use of genotyping protocols, which have been developed for human cell lines but are lacking for many important model species. Here, we leverage the classical observation that transposable elements (TEs) proliferate in cultured Drosophila cells to demonstrate that genome-wide TE insertion profiles can reveal the identity and provenance of Drosophila cell lines. We identify multiple cases where TE profiles clarify the origin of Drosophila cell lines (Sg4, mbn2, and OSS_E) relative to published reports, and also provide evidence that insertions from only a subset of long-terminal repeat retrotransposon families are necessary to mark Drosophila cell line identity. We also develop a new bioinformatics approach to detect TE insertions and estimate intra-sample allele frequencies in legacy whole-genome sequencing data (called ngs_te_mapper2), which revealed loss of heterozygosity as a mechanism shaping the unique TE profiles that identify Drosophila cell lines. Our work contributes to the general understanding of the forces impacting metazoan genomes as they evolve in cell culture and paves the way for high-throughput protocols that use TE insertions to authenticate cell lines in Drosophila and other organisms.


Subject(s)
Cell Line Authentication/methods , DNA Transposable Elements , Drosophila melanogaster/genetics , Loss of Heterozygosity , Animals , Cell Line , Cells, Cultured , Drosophila melanogaster/cytology , Whole Genome Sequencing/methods
7.
EBioMedicine ; 69: 103446, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34157485

ABSTRACT

BACKGROUND: Breast cancers can be divided into HER2-negative and HER2-positive subtypes according to the status of the HER2 gene. Despite extensive studies connecting germline mutations with possible risk of HER2-negative breast cancer, the main category of breast cancer, it remains challenging to obtain accurate risk assessment and to understand the potential underlying mechanisms. METHODS: We developed a novel framework named Damage Assessment of Genomic Mutations (DAGM), which projects rare coding mutations and gene expressions into Activity Profiles of Signalling Pathways (APSPs). FINDINGS: We characterized and validated the DAGM framework at multiple levels. Based on an input of germline rare coding mutations, we obtained the corresponding APSP spectrum to calculate the APSP risk score, which was capable of distinguishing HER2-negative from HER2-positive cases. These findings were validated using breast cancer data from TCGA (AUC = 0.7). DAGM revealed that the HER2 signalling pathway was up-regulated in the germline of HER2-negative patients, and those with high APSP risk scores exhibited immune suppression. These findings were validated using RNA sequencing, phosphoproteome analysis, and CyTOF. Moreover, using germline mutations, DAGM could evaluate the risk for HER2-negative breast cancer, not only in women carrying BRCA1/2 mutations, but also in those without known disease-associated mutations. INTERPRETATION: The DAGM can facilitate the screening of subjects at high risk of HER2-negative breast cancer for primary prevention. This study also provides new insights into the potential mechanisms of developing HER2-negative breast cancer. The DAGM has the potential to be applied in the prevention, diagnosis, and treatment of HER2-negative breast cancer. FUNDING: This work was supported by the National Key Research and Development Program of China (grant no. 2018YFC0910406 and 2018AAA0103302 to CZ); the National Natural Science Foundation of China (grant no. 81202076 and 82072939 to MY, 81871513 to KW); the Guangzhou Science and Technology Program key projects (grant no. 2014J2200007 to MY, 202002030236 to KW); the National Key R&D Program of China (grant no. 2017YFC1309100 to CL); Shenzhen Science and Technology Planning Project (grant no. JCYJ20170817095211560 574 to YN); and the Natural Science Foundation of Guangdong Province (grant no. 2017A030313882 to KW and S2013010012048 to MY); Hefei National Laboratory for Physical Sciences at the Microscale (grant no. KF2020009 to GN); and RGC General Research Fund (grant no. 17114519 to YQS).


Subject(s)
Breast Neoplasms/genetics , Genetic Predisposition to Disease , Genetic Testing/methods , Germ-Line Mutation , Receptor, ErbB-2/genetics , Adult , Aged , Aged, 80 and over , Algorithms , Breast Neoplasms/pathology , Female , Humans , Middle Aged , Signal Transduction , Transcriptome
8.
PeerJ ; 5: e3824, 2017.
Article in English | MEDLINE | ID: mdl-28929030

ABSTRACT

The Drosophila melanogaster P transposable element provides one of the best cases of horizontal transfer of a mobile DNA sequence in eukaryotes. Invasion of natural populations by the P element has led to a syndrome of phenotypes known as P-M hybrid dysgenesis that emerges when strains differing in their P element composition mate and produce offspring. Despite extensive research on many aspects of P element biology, many questions remain about the genomic basis of variation in P-M dysgenesis phenotypes across populations. Here we compare estimates of genomic P element content with gonadal dysgenesis phenotypes for isofemale strains obtained from three worldwide populations of D. melanogaster to illuminate the molecular basis of natural variation in cytotype status. We show that P element abundance estimated from genome sequences of isofemale strains is highly correlated across different bioinformatics approaches, but that abundance estimates are sensitive to method and filtering strategies as well as incomplete inbreeding of isofemale strains. We find that P element content varies significantly across populations, with strains from a North American population having fewer P elements but a higher proportion of full-length elements than strains from populations sampled in Europe or Africa. Despite these geographic differences in P element abundance and structure, neither the number of P elements nor the ratio of full-length to internally-truncated copies is strongly correlated with the degree of gonadal dysgenesis exhibited by an isofemale strain. Thus, variation in P element abundance and structure across different populations does not necessarily lead to corresponding geographic differences in gonadal dysgenesis phenotypes. Finally, we confirm that population differences in the abundance and structure of P elements that are observed from isofemale lines can also be observed in pool-seq samples from the same populations. 
Our work supports the view that genomic P element content alone is not sufficient to explain variation in gonadal dysgenesis across strains of D. melanogaster, and informs future efforts to decode the genomic basis of geographic and temporal differences in P element induced phenotypes.
