Results 1 - 20 of 53
1.
bioRxiv ; 2024 Mar 16.
Article in English | MEDLINE | ID: mdl-38559060

ABSTRACT

Bruton's tyrosine kinase (BTK) inhibitors are effective for the treatment of chronic lymphocytic leukemia (CLL) due to BTK's role in B cell survival and proliferation. Treatment resistance is most commonly caused by the emergence of the hallmark BTK C481S mutation that inhibits drug binding. In this study, we aimed to investigate whether the presence of additional CLL driver mutations in cancer subclones harboring a BTK C481S mutation accelerates subclone expansion. In addition, we sought to determine whether BTK-mutated subclones exhibit distinct transcriptomic behavior when compared to other cancer subclones. To achieve these goals, we employed our recently published method (Qiao et al. 2024) that combines bulk DNA sequencing and single-cell RNA sequencing (scRNA-seq) data to genotype individual cells for the presence or absence of subclone-defining mutations. While the most common approach for scRNA-seq uses short-read sequencing, transcript coverage is limited because the vast majority of the reads are concentrated at the priming end of the transcript. Here, we utilized MAS-seq, a long-read scRNA-seq technology, to substantially increase coverage across the entire length of the transcripts and expand the set of informative mutations to link cells to cancer subclones in six CLL patients who acquired BTK C481S mutations during BTK inhibitor treatment. We found that BTK-mutated subclones often acquire additional mutations in CLL driver genes, leading to faster subclone proliferation. When examining subclone-specific gene expression, we found that in one patient, BTK-mutated subclones are transcriptionally distinct from the rest of the malignant B cell population with an overexpression of CLL-relevant genes.

2.
Genome Res ; 34(2): 179-188, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38355308

ABSTRACT

A mechanistic understanding of the biological and technical factors that impact transcript measurements is essential to designing and analyzing single-cell and single-nucleus RNA sequencing experiments. Nuclei contain the same pre-mRNA population as cells, but they contain a small subset of the mRNAs. Nonetheless, early studies argued that single-nucleus analysis yielded results comparable to cellular samples if pre-mRNA measurements were included. However, typical workflows do not distinguish between pre-mRNA and mRNA when estimating gene expression, and variation in their relative abundances across cell types has received limited attention. These gaps are especially important given that incorporating pre-mRNA has become commonplace for both assays, despite known gene length bias in pre-mRNA capture. Here, we reanalyze public data sets from mouse and human to describe the mechanisms and contrasting effects of mRNA and pre-mRNA sampling on gene expression and marker gene selection in single-cell and single-nucleus RNA-seq. We show that pre-mRNA levels vary considerably among cell types, which mediates the degree of gene length bias and limits the generalizability of a recently published normalization method intended to correct for this bias. As an alternative, we repurpose an existing post hoc gene length-based correction method from conventional RNA-seq gene set enrichment analysis. Finally, we show that inclusion of pre-mRNA in bioinformatic processing can impart a larger effect than assay choice itself, which is pivotal to the effective reuse of existing data. These analyses advance our understanding of the sources of variation in single-cell and single-nucleus RNA-seq experiments and provide useful guidance for future studies.


Subject(s)
Cell Nucleus , RNA Precursors , Humans , Animals , Mice , RNA-Seq , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Cell Nucleus/genetics , Gene Expression Profiling/methods , Single-Cell Analysis
3.
Genome Res ; 34(1): 94-105, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38195207

ABSTRACT

Genetic and gene expression heterogeneity is an essential hallmark of many tumors, allowing the cancer to evolve and to develop resistance to treatment. Currently, the most commonly used data types for studying such heterogeneity are bulk tumor/normal whole-genome or whole-exome sequencing (WGS, WES); and single-cell RNA sequencing (scRNA-seq), respectively. However, tools are currently lacking to link genomic tumor subclonality with transcriptomic heterogeneity by integrating genomic and single-cell transcriptomic data collected from the same tumor. To address this gap, we developed scBayes, a Bayesian probabilistic framework that uses tumor subclonal structure inferred from bulk DNA sequencing data to determine the subclonal identity of cells from single-cell gene expression (scRNA-seq) measurements. Grouping together cells representing the same genetically defined tumor subclones allows comparison of gene expression across different subclones, or investigation of gene expression changes within the same subclone across time (i.e., progression, treatment response, or relapse) or space (i.e., at multiple metastatic sites and organs). We used simulated data sets, in silico synthetic data sets, as well as biological data sets generated from cancer samples to extensively characterize and validate the performance of our method, as well as to show improvements over existing methods. We show the validity and utility of our approach by applying it to published data sets and recapitulating the findings, as well as arriving at novel insights into cancer subclonal expression behavior in our own data sets. We further show that our method is applicable to a wide range of single-cell sequencing technologies including single-cell DNA sequencing as well as Smart-seq and 10x Genomics scRNA-seq protocols.


Subject(s)
Neoplasms , Humans , Exome Sequencing , Bayes Theorem , Neoplasms/genetics , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
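
The core idea in the abstract above (record 3), assigning each cell to a subclone defined from bulk DNA sequencing, can be illustrated with a toy Bayesian calculation. The sketch below is not the scBayes implementation. It assumes a deliberately simple model: each subclone either carries or lacks each variant, a read covering a carried variant shows the alternate allele with probability `het_rate`, and a read covering an absent variant shows it only through error at rate `err`. The function name, parameters, and default values are all hypothetical.

```python
import math

def assign_subclone(reads, genotypes, prior=None, het_rate=0.5, err=0.01):
    """Posterior probability of each subclone for one cell (toy model).

    reads:     dict variant -> (alt_read_count, ref_read_count) in this cell
    genotypes: dict subclone -> set of variants that subclone carries
    prior:     optional dict subclone -> prior probability (uniform if None)
    """
    names = sorted(genotypes)
    if prior is None:
        prior = {s: 1.0 / len(names) for s in names}

    log_post = {}
    for s in names:
        lp = math.log(prior[s])
        for var, (alt, ref) in reads.items():
            # Probability that a single read shows the alternate allele.
            p_alt = het_rate if var in genotypes[s] else err
            lp += alt * math.log(p_alt) + ref * math.log(1.0 - p_alt)
        log_post[s] = lp

    # Normalize in log space to avoid underflow with many reads.
    m = max(log_post.values())
    unnorm = {s: math.exp(lp - m) for s, lp in log_post.items()}
    total = sum(unnorm.values())
    return {s: v / total for s, v in unnorm.items()}

# Hypothetical nested subclone structure and read counts for one cell.
genotypes = {"normal": set(), "C1": {"v1"}, "C2": {"v1", "v2"}}
reads = {"v1": (3, 1), "v2": (2, 2)}
posterior = assign_subclone(reads, genotypes)
best = max(posterior, key=posterior.get)
```

In this example the cell shows alternate reads at both variants, so the subclone that carries both receives the highest posterior. A real method would also have to handle allelic dropout, copy number, and sparse coverage, which this sketch ignores.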
4.
Bioinformatics ; 39(8), 2023 08 01.
Article in English | MEDLINE | ID: mdl-37498562

ABSTRACT

MOTIVATION: In time-critical clinical settings, such as precision medicine, genomic data needs to be processed as fast as possible to arrive at data-informed treatment decisions in a timely fashion. While sequencing throughput has dramatically increased over the past decade, bioinformatics analysis throughput has not been able to keep up with the pace of computer hardware improvement, and consequently has now turned into the primary bottleneck. Modern computer hardware is capable of much higher performance than current genomic informatics algorithms can typically utilize, presenting opportunities for significant performance improvement. Accessing the raw sequencing data from BAM files, for example, is a necessary and time-consuming step in nearly all sequence analysis tools; however, existing programming libraries for BAM access do not take full advantage of the parallel input/output capabilities of storage devices. RESULTS: In an effort to stimulate the development of a new generation of faster sequence analysis tools, we developed quickBAM, a software library to accelerate sequencing data access by exploiting the parallelism in commodity storage hardware currently widely available. We demonstrate that analysis software ported to quickBAM consistently outperforms its original version, in some cases finishing an analysis in under 3 min while the original version took 1.5 h, using the same storage solution. AVAILABILITY AND IMPLEMENTATION: quickBAM is open source and freely available at https://gitlab.com/yiq/quickbam/. We envision that quickBAM will enable a new generation of high-performance informatics tools, either directly boosting their performance if they are currently data-access bottlenecked, or allowing data access to keep up with further optimizations in algorithms and compute techniques.


Subject(s)
Algorithms , Software , High-Throughput Nucleotide Sequencing/methods , Genomics , Informatics , Sequence Analysis, DNA/methods
6.
Nat Cancer ; 3(2): 232-250, 2022 02.
Article in English | MEDLINE | ID: mdl-35221336

ABSTRACT

Models that recapitulate the complexity of human tumors are urgently needed to develop more effective cancer therapies. We report a bank of human patient-derived xenografts (PDXs) and matched organoid cultures from tumors that represent the greatest unmet need: endocrine-resistant, treatment-refractory and metastatic breast cancers. We leverage matched PDXs and PDX-derived organoids (PDxO) for drug screening that is feasible and cost-effective with in vivo validation. Moreover, we demonstrate the feasibility of using these models for precision oncology in real time with clinical care in a case of triple-negative breast cancer (TNBC) with early metastatic recurrence. Our results uncovered a Food and Drug Administration (FDA)-approved drug with high efficacy against the models. Treatment with this therapy resulted in a complete response for the individual and a progression-free survival (PFS) period more than three times longer than their previous therapies. This work provides valuable methods and resources for functional precision medicine and drug development for human breast cancer.


Subject(s)
Organoids , Triple Negative Breast Neoplasms , Drug Discovery , Heterografts , Humans , Precision Medicine/methods , Triple Negative Breast Neoplasms/drug therapy , United States , Xenograft Model Antitumor Assays
7.
mSystems ; 6(6): e0119621, 2021 Dec 21.
Article in English | MEDLINE | ID: mdl-34874774

ABSTRACT

Evolve and resequencing (E&R) was applied to lab adaptation of Toxoplasma gondii for over 1,500 generations with the goal of mapping host-independent in vitro virulence traits. Phenotypic assessments of steps across the lytic cycle revealed that only traits needed in the extracellular milieu evolved. Nonsynonymous single-nucleotide polymorphisms (SNPs) in only one gene, a P4 flippase, fixated across two different evolving populations, whereas dramatic changes in the transcriptional signature of extracellular parasites were identified. Newly developed computational tools correlated phenotypes evolving at different rates with specific transcriptomic changes. A set of 300 phenotype-associated genes was mapped, of which nearly 50% are annotated as hypothetical. Validation of a select number of genes by knockouts confirmed their role in lab adaptation and highlights novel mechanisms underlying in vitro virulence traits. Further analyses of differentially expressed genes revealed the development of a "pro-tachyzoite" profile as well as the upregulation of the fatty acid biosynthesis (FASII) pathway. The latter aligned with the P4 flippase SNP and with a low abundance of medium-chain fatty acids at low passage, indicating this is a limiting factor in extracellular parasites. In addition, partial overlap with the bradyzoite differentiation transcriptome in extracellular parasites indicated that stress pathways are involved in both situations. This was reflected in the partial overlap between the assembled ApiAP2 and Myb transcription factor network underlying the adapting extracellular state with the bradyzoite differentiation program. Overall, E&R is a new genomic tool successfully applied to map the development of polygenic traits underlying in vitro virulence of T. gondii.
IMPORTANCE: It has been well established that prolonged in vitro cultivation of Toxoplasma gondii augments progression of the lytic cycle. This lab adaptation results in increased capacities to divide, migrate, and survive outside a host cell, all of which are considered host-independent virulence factors. However, the mechanistic basis underlying these enhanced virulence features is unknown. Here, E&R was utilized to empirically characterize the phenotypic, genomic, and transcriptomic changes in the non-lab-adapted strain, GT1, during 2.5 years of lab adaptation. This identified the shutdown of stage differentiation and upregulation of lipid biosynthetic pathways as the key processes being modulated. Furthermore, lab adaptation was primarily driven by transcriptional reprogramming, which rejected the starting hypothesis that genetic mutations would drive lab adaptation. Overall, the work empirically shows that lab adaptation augments T. gondii's in vitro virulence by transcriptional reprogramming and that E&R is a powerful new tool to map multigenic traits.

8.
Genome Med ; 13(1): 170, 2021 10 28.
Article in English | MEDLINE | ID: mdl-34711268

ABSTRACT

BACKGROUND: Metastatic breast cancer is a deadly disease with a low 5-year survival rate. Tracking metastatic spread in living patients is difficult and thus poorly understood. METHODS: Via rapid autopsy, we have collected 30 tumor samples over 3 timepoints and across 8 organs from a triple-negative metastatic breast cancer patient. The large number of sites sampled, together with deep whole-genome sequencing and advanced computational analysis, allowed us to comprehensively reconstruct the tumor's evolution at subclonal resolution. RESULTS: The most unique, previously unreported aspect of the tumor's evolution that we observed in this patient was the presence of "subclone incubators," defined as metastatic sites where substantial tumor evolution occurs before colonization of additional sites and organs by subclones that initially evolved at the incubator site. Overall, we identified four discrete waves of metastatic expansions, each of which resulted in a number of new, genetically similar metastasis sites that also enriched for particular organs (e.g., abdominal vs bone and brain). The lung played a critical role in facilitating metastatic spread in this patient: the lung was the first site of metastatic escape from the primary breast lesion, subclones at this site were likely the source of all four subsequent metastatic waves, and multiple sites in the lung acted as subclone incubators. Finally, functional annotation revealed that many known drivers or metastasis-promoting tumor mutations in this patient were shared by some, but not all metastatic sites, highlighting the need for more comprehensive surveys of a patient's metastases for effective clinical intervention. CONCLUSIONS: Our analysis revealed the presence of substantial tumor evolution at metastatic incubator sites in a patient, with potentially important clinical implications. 
Our study demonstrated that sampling of a large number of metastatic sites affords unprecedented detail for studying metastatic evolution.


Subject(s)
Autopsy , Breast Neoplasms/classification , Breast Neoplasms/genetics , Neoplasm Metastasis , Biopsy , Evolution, Molecular , Female , Humans , Middle Aged , Mutation , Phylogeny
9.
Genome Med ; 13(1): 46, 2021 03 26.
Article in English | MEDLINE | ID: mdl-33771218

ABSTRACT

BACKGROUND: DNA sequencing has unveiled extensive tumor heterogeneity in several different cancer types, with many exhibiting diverse subclonal populations. Identifying and tracing mutations throughout the expansion and progression of a tumor represents a significant challenge. Furthermore, prioritizing the subset of such mutations most likely to contribute to tumor evolution or that could serve as potential therapeutic targets represents an ongoing problem. RESULTS: Here, we describe OncoGEMINI, a new tool designed for exploring the complex patterns and trajectory of somatic and inherited variation observed in heterogeneous tumors biopsied over the course of treatment. This is accomplished by creating a searchable database of variants that includes tumor sampling time points and allows for filtering methods that reflect specific changes in variant allele frequencies over time. Additionally, by incorporating existing annotations and resources that facilitate the interpretation of cancer mutations (e.g., CIViC, DGIdb), OncoGEMINI enables rapid searches for, and potential identification of, mutations that may be driving subclonal evolution. CONCLUSIONS: By combining relevant genomic annotations alongside specific filtering tools, OncoGEMINI provides powerful and customizable approaches that enable the quick identification of individual tumor variants that meet specified criteria. It can be applied to a wide range of tumor-derived sequence data, but is especially designed for studies with multiple samples, including longitudinal datasets. It is available under an MIT license at github.com/fakedrtom/oncogemini .


Subject(s)
Breast Neoplasms/genetics , Breast Neoplasms/pathology , Genetic Variation , Software , Biopsy , Databases, Genetic , Female , Humans , Longitudinal Studies , Neoplasm Metastasis
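
The filtering described in the abstract above (record 9), selecting variants by how their allele frequency changes across sampling time points, can be sketched in a few lines. This is not the OncoGEMINI query interface; it is a minimal stand-in that assumes each variant comes with a list of variant allele frequencies (VAFs) ordered by time point. The function name, thresholds, and variant identifiers are hypothetical.

```python
def rising_variants(variants, min_increase=0.2, max_start=0.05):
    """Return IDs of variants whose VAF starts low and rises over time.

    variants: dict variant_id -> list of VAFs ordered by sampling time
    A variant is kept when its VAF never decreases between consecutive
    time points, starts at or below max_start, and gains at least
    min_increase between the first and last time point.
    """
    hits = []
    for var_id, vafs in variants.items():
        non_decreasing = all(b >= a for a, b in zip(vafs, vafs[1:]))
        starts_low = vafs[0] <= max_start
        gained = vafs[-1] - vafs[0] >= min_increase
        if non_decreasing and starts_low and gained:
            hits.append(var_id)
    return hits

# Hypothetical VAFs at three time points.
example = {
    "chr1:100A>T": [0.00, 0.10, 0.40],  # emerges and expands
    "chr2:5G>C":   [0.50, 0.50, 0.50],  # stable, likely germline or clonal
    "chr3:7C>T":   [0.02, 0.30, 0.10],  # rises, then falls
}
selected = rising_variants(example)
```

A pattern like the first variant is the kind of signal one might associate with an expanding subclone; the thresholds would need tuning to sequencing depth and tumor purity.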
10.
PLoS One ; 15(2): e0229063, 2020.
Article in English | MEDLINE | ID: mdl-32084206

ABSTRACT

Challenges with distinguishing circulating tumor DNA (ctDNA) from next-generation sequencing (NGS) artifacts limit variant searches to established solid tumor mutations. Here we show early and random PCR errors are a principal source of NGS noise that persist despite duplex molecular barcoding, removal of artifacts due to clonal hematopoiesis of indeterminate potential, and suppression of patterned errors. We also demonstrate sample duplicates are necessary to eliminate the stochastic noise associated with NGS. Integration of sample duplicates into NGS analytics may broaden ctDNA applications by removing NGS-related errors that confound identification of true very low frequency variants during searches for ctDNA without a priori knowledge of specific mutations to target.


Subject(s)
Circulating Tumor DNA/genetics , High-Throughput Nucleotide Sequencing/methods , Adult , DNA Barcoding, Taxonomic , Female , Hematopoiesis/genetics , Humans , Male , Middle Aged
11.
Science ; 362(6420), 2018 12 14.
Article in English | MEDLINE | ID: mdl-30545852

ABSTRACT

Whole-genome sequencing (WGS) has facilitated the first genome-wide evaluations of the contribution of de novo noncoding mutations to complex disorders. Using WGS, we identified 255,106 de novo mutations among sample genomes from members of 1902 quartet families in which one child, but not a sibling or their parents, was affected by autism spectrum disorder (ASD). In contrast to coding mutations, no noncoding functional annotation category, analyzed in isolation, was significantly associated with ASD. Casting noncoding variation in the context of a de novo risk score across multiple annotation categories, however, did demonstrate association with mutations localized to promoter regions. We found that the strongest driver of this promoter signal emanates from evolutionarily conserved transcription factor binding sites distal to the transcription start site. These data suggest that de novo mutations in promoter regions, characterized by evolutionary and functional signatures, contribute to ASD.


Subject(s)
Autism Spectrum Disorder/genetics , Mutation , Promoter Regions, Genetic/genetics , Binding Sites/genetics , Conserved Sequence , DNA Mutational Analysis , Genetic Loci , Genetic Variation , Humans , Pedigree , Risk , Transcription Factors/metabolism
12.
NPJ Genom Med ; 3: 22, 2018.
Article in English | MEDLINE | ID: mdl-30109124

ABSTRACT

Early infantile epileptic encephalopathy (EIEE) is a devastating epilepsy syndrome with onset in the first months of life. Although mutations in more than 50 different genes are known to cause EIEE, current diagnostic yields with gene panel tests or whole-exome sequencing are below 60%. We applied whole-genome analysis (WGA) consisting of whole-genome sequencing and comprehensive variant discovery approaches to a cohort of 14 EIEE subjects for whom prior genetic tests had not yielded a diagnosis. We identified both de novo point and INDEL mutations and de novo structural rearrangements in known EIEE genes, as well as mutations in genes not previously associated with EIEE. The detection of a pathogenic or likely pathogenic mutation in all 14 subjects demonstrates the utility of WGA to reduce the time and costs of clinical diagnosis of EIEE. While exome sequencing may have detected 12 of the 14 causal mutations, 3 of the 12 patients received non-diagnostic exome panel tests prior to genome sequencing. Thus, given the continued decline of sequencing costs, our results support the use of WGA with comprehensive variant discovery as an efficient strategy for the clinical diagnosis of EIEE and other genetic conditions.

13.
PLoS One ; 13(7): e0197333, 2018.
Article in English | MEDLINE | ID: mdl-30044795

ABSTRACT

Circulating tumor-derived cell-free DNA (ctDNA) enables non-invasive diagnosis, monitoring, and treatment susceptibility testing in human cancers. However, accurate detection of variant alleles, particularly during untargeted searches, remains a principal obstacle to widespread application of cell-free DNA in clinical oncology. In this study, isolation of short cell-free DNA fragments is shown to enrich for tumor variants and improve correction of PCR- and sequencing-associated errors. Subfractions of the mononucleosome of circulating cell-free DNA (ccfDNA) were isolated from patients with melanoma, pancreatic ductal adenocarcinoma, and colorectal adenocarcinoma using a high-throughput-capable automated gel-extraction platform. Using a 128-gene (128 kb) custom next-generation sequencing panel, variant alleles were on average 2-fold enriched in the short fraction (median insert size: ~142 bp) compared to the original ccfDNA sample, while 0.7-fold reduced in the fraction corresponding to the principal peak of the mononucleosome (median insert size: ~167 bp). Size-selected short fractions compared to the original ccfDNA yielded significantly larger family sizes (i.e., PCR duplicates) during in silico consensus sequence interpretation via unique molecular identifiers. Increments in family size were associated with a progressive reduction of PCR and sequencing errors. Although consensus read depth also decreased at larger family sizes, the variant allele frequency in the short ccfDNA fraction remained consistent, while variant detection in the original ccfDNA was commonly lost at family sizes necessary to minimize errors. These collective findings support the automated extraction of short ccfDNA fragments to enrich for ctDNA while concomitantly reducing false positives through in silico error correction.


Subject(s)
Cell-Free Nucleic Acids/blood , Circulating Tumor DNA/blood , High-Throughput Nucleotide Sequencing , Neoplasms/blood , Alleles , Cell-Free Nucleic Acids/genetics , Circulating Tumor DNA/genetics , Consensus Sequence , DNA Fragmentation , Humans , Neoplasms/genetics , Neoplasms/pathology
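
The abstract above (record 13) relies on consensus calling within unique molecular identifier (UMI) families, where larger families give more reliable error correction. The sketch below illustrates that general technique only; it is not the pipeline used in the study. It assumes reads in a family are already aligned to the same start and have equal length, and the function name, the `min_family_size` and `min_agreement` parameters, and their defaults are hypothetical.

```python
from collections import Counter, defaultdict

def consensus_by_umi(reads, min_family_size=3, min_agreement=0.9):
    """Collapse reads sharing a UMI into one consensus sequence.

    reads: iterable of (umi, sequence) pairs; sequences in a family are
           assumed to be the same length and already aligned.
    Families smaller than min_family_size are dropped. At each position
    the majority base is kept if its fraction reaches min_agreement;
    otherwise the position is masked with "N".
    """
    families = defaultdict(list)
    for umi, seq in reads:
        families[umi].append(seq)

    consensus = {}
    for umi, seqs in families.items():
        if len(seqs) < min_family_size:
            continue
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_agreement else "N")
        consensus[umi] = "".join(bases)
    return consensus

# Hypothetical reads: one family of three (with a single PCR error) and
# one singleton that is too small to correct.
reads = [("AAT", "ACGT"), ("AAT", "ACGT"), ("AAT", "ACTT"), ("GGC", "ACGT")]
strict = consensus_by_umi(reads)                      # masks the disputed base
lenient = consensus_by_umi(reads, min_agreement=0.6)  # accepts the majority base
```

Raising `min_family_size` trades consensus read depth for lower error, which mirrors the trade-off the abstract describes.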
14.
Nat Genet ; 50(5): 727-736, 2018 04 26.
Article in English | MEDLINE | ID: mdl-29700473

ABSTRACT

Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.


Subject(s)
Autism Spectrum Disorder/genetics , Genetic Predisposition to Disease/genetics , INDEL Mutation/genetics , Polymorphism, Single Nucleotide/genetics , Protein Isoforms/genetics , Female , Genome/genetics , Genome-Wide Association Study/methods , Humans , Male
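
To make the multiple-testing burden in the abstract above (record 14) concrete, the sketch below computes a per-test significance cutoff for the 4,123 effective tests it reports. It assumes a Bonferroni-style correction at a family-wise alpha of 0.05; the abstract does not state which correction or alpha the authors used, so both are assumptions, and the function names and example p-values are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that controls the family-wise error rate."""
    return alpha / n_tests

def significant(p_values, alpha=0.05, n_tests=None):
    """Keep categories whose p-value passes the corrected cutoff.

    p_values: dict category -> p-value
    n_tests:  number of (effective) tests; defaults to len(p_values)
    """
    n = n_tests if n_tests is not None else len(p_values)
    cutoff = bonferroni_threshold(alpha, n)
    return {name: p for name, p in p_values.items() if p < cutoff}

# 4,123 effective tests is taken from the abstract; alpha = 0.05 is assumed.
cutoff = bonferroni_threshold(0.05, 4123)

# Hypothetical category p-values: nominally significant, but not after correction.
observed = {"promoter_conserved": 3e-4, "enhancer_brain": 0.02}
passing = significant(observed, n_tests=4123)
```

Under these assumptions the cutoff is roughly 1.2e-5, so a nominal p-value of 3e-4 would not survive correction, consistent with the abstract's point that uncorrected analyses can surface plausible but spurious associations.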
16.
Nat Methods ; 15(2): 123-126, 2018 02.
Article in English | MEDLINE | ID: mdl-29309061

ABSTRACT

GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.


Subject(s)
Breast Neoplasms/genetics , Genome, Human , Genomics/methods , Search Engine/methods , Sequence Analysis, DNA/methods , Software , Databases, Genetic , Female , Humans , Internet
17.
J Clin Transl Sci ; 1(6): 381-386, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29707261

ABSTRACT

INTRODUCTION: Computational analysis of genome or exome sequences may improve inherited disease diagnosis, but is costly and time-consuming. METHODS: We describe the use of iobio, a web-based tool suite for intuitive, real-time genome diagnostic analyses. RESULTS: We used iobio to identify the disease-causing variant in a patient with early infantile epileptic encephalopathy with prior nondiagnostic genetic testing. CONCLUSIONS: Iobio tools can be used by clinicians to rapidly identify disease-causing variants from genomic patient sequencing data.

18.
Nat Methods ; 12(10): 966-8, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26258291

ABSTRACT

SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.


Subject(s)
Genome, Human , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation/methods , Software , Genetic Variation , Humans , Neoplasms/genetics , Polymorphism, Single Nucleotide , Precision Medicine/methods , Workflow
19.
Genome Biol Evol ; 7(9): 2608-22, 2015 Aug 29.
Article in English | MEDLINE | ID: mdl-26319576

ABSTRACT

The goal of the 1000 Genomes Consortium is to characterize human genome structural variation (SV), including forms of copy number variations such as deletions, duplications, and insertions. Mobile element insertions, particularly Alu elements, are major contributors to genomic SV among humans. During the pilot phase of the project we experimentally validated 645 (611 intergenic and 34 exon targeted) polymorphic "young" Alu insertion events, absent from the human reference genome. Here, we report high resolution sequencing of 343 (322 unique) recent Alu insertion events, along with their respective target site duplications, precise genomic breakpoint coordinates, subfamily assignment, percent divergence, and estimated A-rich tail lengths. All the sequenced Alu loci were derived from the AluY lineage with no evidence of retrotransposition activity involving older Alu families (e.g., AluJ and AluS). AluYa5 is currently the most active Alu subfamily in the human lineage, followed by AluYb8, and many others including three newly identified subfamilies we have termed AluYb7a3, AluYb8b1, and AluYa4a1. This report provides the structural details of 322 unique Alu variants from individual human genomes collectively adding about 100 kb of genomic variation. Many Alu subfamilies are currently active in human populations, including a surprising level of AluY retrotransposition. Human Alu subfamilies exhibit continuous evolution with potential drivers sprouting new Alu lineages.


Subject(s)
Alu Elements , Evolution, Molecular , Genetic Variation , Genome, Human , Human Genome Project , Humans , Sequence Analysis, DNA , Sequence Deletion
20.
Cancer Inform ; 14(Suppl 1): 37-44, 2015.
Article in English | MEDLINE | ID: mdl-25931804

ABSTRACT

Mobile elements constitute greater than 45% of the human genome as a result of repeated insertion events during human genome evolution. Although most mobile elements are fixed within the human population, some elements (including ALU, long interspersed elements (LINE) 1 (L1), and SVA) are still actively duplicating and may result in life-threatening human diseases such as cancer, motivating the need for accurate mobile-element insertion (MEI) detection tools. We developed a software package, TANGRAM, for MEI detection in next-generation sequencing data, currently serving as the primary MEI detection tool in the 1000 Genomes Project. TANGRAM takes advantage of valuable mapping information provided by our own MOSAIK mapper, and until recently required MOSAIK mappings as its input. In this study, we report a new feature that enables TANGRAM to be used on alignments generated by any mainstream short-read mapper, making it accessible to many genomic users. To demonstrate its utility for cancer genome analysis, we have applied TANGRAM to the TCGA (The Cancer Genome Atlas) mutation calling benchmark 4 dataset. TANGRAM is fast, accurate, easy to use, and open source at https://github.com/jiantao/Tangram.
