1.
bioRxiv ; 2024 Mar 16.
Article in English | MEDLINE | ID: mdl-38559060

ABSTRACT

Bruton's tyrosine kinase (BTK) inhibitors are effective for the treatment of chronic lymphocytic leukemia (CLL) due to BTK's role in B cell survival and proliferation. Treatment resistance is most commonly caused by the emergence of the hallmark BTK C481S mutation that inhibits drug binding. In this study, we aimed to investigate whether the presence of additional CLL driver mutations in cancer subclones harboring a BTK C481S mutation accelerates subclone expansion. In addition, we sought to determine whether BTK-mutated subclones exhibit distinct transcriptomic behavior when compared to other cancer subclones. To achieve these goals, we employed our recently published method (Qiao et al. 2024) that combines bulk DNA sequencing and single-cell RNA sequencing (scRNA-seq) data to genotype individual cells for the presence or absence of subclone-defining mutations. While the most common approach for scRNA-seq uses short-read sequencing, transcript coverage is limited because the vast majority of the reads are concentrated at the priming end of the transcript. Here, we utilized MAS-seq, a long-read scRNA-seq technology, to substantially increase transcript coverage across the entire length of the transcripts and expand the set of informative mutations to link cells to cancer subclones in six CLL patients who acquired BTK C481S mutations during BTK inhibitor treatment. We found that BTK-mutated subclones often acquire additional mutations in CLL driver genes, leading to faster subclone proliferation. When examining subclone-specific gene expression, we found that in one patient, BTK-mutated subclones are transcriptionally distinct from the rest of the malignant B cell population with an overexpression of CLL-relevant genes.

2.
Genome Res ; 34(2): 179-188, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38355308

ABSTRACT

A mechanistic understanding of the biological and technical factors that impact transcript measurements is essential to designing and analyzing single-cell and single-nucleus RNA sequencing experiments. Nuclei contain the same pre-mRNA population as cells, but they contain a small subset of the mRNAs. Nonetheless, early studies argued that single-nucleus analysis yielded results comparable to cellular samples if pre-mRNA measurements were included. However, typical workflows do not distinguish between pre-mRNA and mRNA when estimating gene expression, and variation in their relative abundances across cell types has received limited attention. These gaps are especially important given that incorporating pre-mRNA has become commonplace for both assays, despite known gene length bias in pre-mRNA capture. Here, we reanalyze public data sets from mouse and human to describe the mechanisms and contrasting effects of mRNA and pre-mRNA sampling on gene expression and marker gene selection in single-cell and single-nucleus RNA-seq. We show that pre-mRNA levels vary considerably among cell types, which mediates the degree of gene length bias and limits the generalizability of a recently published normalization method intended to correct for this bias. As an alternative, we repurpose an existing post hoc gene length-based correction method from conventional RNA-seq gene set enrichment analysis. Finally, we show that inclusion of pre-mRNA in bioinformatic processing can impart a larger effect than assay choice itself, which is pivotal to the effective reuse of existing data. These analyses advance our understanding of the sources of variation in single-cell and single-nucleus RNA-seq experiments and provide useful guidance for future studies.


Subject(s)
Cell Nucleus; RNA Precursors; Humans; Animals; Mice; RNA-Seq; RNA, Messenger/genetics; Sequence Analysis, RNA/methods; Cell Nucleus/genetics; Gene Expression Profiling/methods; Single-Cell Analysis
3.
Genome Res ; 34(1): 94-105, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38195207

ABSTRACT

Genetic and gene expression heterogeneity is an essential hallmark of many tumors, allowing the cancer to evolve and to develop resistance to treatment. Currently, the most commonly used data types for studying such heterogeneity are bulk tumor/normal whole-genome or whole-exome sequencing (WGS, WES); and single-cell RNA sequencing (scRNA-seq), respectively. However, tools are currently lacking to link genomic tumor subclonality with transcriptomic heterogeneity by integrating genomic and single-cell transcriptomic data collected from the same tumor. To address this gap, we developed scBayes, a Bayesian probabilistic framework that uses tumor subclonal structure inferred from bulk DNA sequencing data to determine the subclonal identity of cells from single-cell gene expression (scRNA-seq) measurements. Grouping together cells representing the same genetically defined tumor subclones allows comparison of gene expression across different subclones, or investigation of gene expression changes within the same subclone across time (i.e., progression, treatment response, or relapse) or space (i.e., at multiple metastatic sites and organs). We used simulated data sets, in silico synthetic data sets, as well as biological data sets generated from cancer samples to extensively characterize and validate the performance of our method, as well as to show improvements over existing methods. We show the validity and utility of our approach by applying it to published data sets and recapitulating the findings, as well as arriving at novel insights into cancer subclonal expression behavior in our own data sets. We further show that our method is applicable to a wide range of single-cell sequencing technologies including single-cell DNA sequencing as well as Smart-seq and 10x Genomics scRNA-seq protocols.


Subject(s)
Neoplasms; Humans; Exome Sequencing; Bayes Theorem; Neoplasms/genetics; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Single-Cell Analysis/methods
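
The Bayesian assignment idea described in this abstract can be illustrated with a minimal sketch. This is not the scBayes implementation: the function name, the `alt_rate` and `error_rate` parameters, and the toy genotypes are all assumptions made for illustration. Each cell's variant and reference read counts are scored under every subclone's mutation set, and the scores are normalized into posterior probabilities.

```python
import math

def subclone_posteriors(cell_reads, subclone_genotypes, priors,
                        alt_rate=0.5, error_rate=0.01):
    """Posterior probability of each subclone for a single cell.

    cell_reads: {mutation_id: (ref_count, alt_count)} observed in the cell.
    subclone_genotypes: {subclone: set of mutation_ids carried by it}.
    priors: {subclone: prior probability}.
    alt_rate and error_rate are illustrative assumptions, not published values.
    """
    log_post = {}
    for clone, muts in subclone_genotypes.items():
        ll = math.log(priors[clone])
        for mut, (ref, alt) in cell_reads.items():
            # Probability that a read shows the variant allele under this clone.
            p_alt = alt_rate if mut in muts else error_rate
            ll += alt * math.log(p_alt) + ref * math.log(1.0 - p_alt)
        log_post[clone] = ll
    # Normalize in log space to avoid underflow.
    top = max(log_post.values())
    total = sum(math.exp(v - top) for v in log_post.values())
    return {c: math.exp(v - top) / total for c, v in log_post.items()}

# Toy data (hypothetical): three clones, one cell with reads at two sites.
genotypes = {"normal": set(), "C1": {"m1"}, "C2": {"m1", "m2"}}
priors = {"normal": 1 / 3, "C1": 1 / 3, "C2": 1 / 3}
cell = {"m1": (1, 3), "m2": (0, 2)}
post = subclone_posteriors(cell, genotypes, priors)
best = max(post, key=post.get)
```

In this toy case the cell carries variant reads at both sites, so the clone holding both mutations receives the highest posterior. A real method would also need to model allelic dropout and copy number, which this sketch ignores.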
4.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37498562

ABSTRACT

MOTIVATION: In time-critical clinical settings, such as precision medicine, genomic data needs to be processed as fast as possible to arrive at data-informed treatment decisions in a timely fashion. While sequencing throughput has dramatically increased over the past decade, bioinformatics analysis throughput has not been able to keep up with the pace of computer hardware improvement, and consequently has now turned into the primary bottleneck. Modern computer hardware is capable of much higher performance than current genomic informatics algorithms can typically utilize, which presents opportunities for significant performance improvement. Accessing the raw sequencing data from BAM files, for example, is a necessary and time-consuming step in nearly all sequence analysis tools; however, existing programming libraries for BAM access do not take full advantage of the parallel input/output capabilities of storage devices. RESULTS: In an effort to stimulate the development of a new generation of faster sequence analysis tools, we developed quickBAM, a software library that accelerates sequencing data access by exploiting the parallelism in widely available commodity storage hardware. We demonstrate that analysis software ported to quickBAM consistently outperforms its original version, in some cases finishing an analysis in under 3 min while the original version took 1.5 h, using the same storage solution. AVAILABILITY AND IMPLEMENTATION: quickBAM is open source and freely available at https://gitlab.com/yiq/quickbam/. We envision that quickBAM will enable a new generation of high-performance informatics tools, either by directly boosting their performance if they are currently bottlenecked on data access, or by allowing data access to keep up with further optimizations in algorithms and compute techniques.


Subject(s)
Algorithms; Software; High-Throughput Nucleotide Sequencing/methods; Genomics; Informatics; Sequence Analysis, DNA/methods
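
The general idea of parallel file access mentioned in this abstract can be sketched as follows. This is a generic illustration and does not use or reproduce the quickBAM API; `read_range` and `parallel_read` are hypothetical helper names. The sketch splits a file into byte ranges and reads them concurrently. Real BAM files are BGZF-compressed, so a practical reader would have to split on compressed block boundaries (for example, with the help of an index) rather than at arbitrary offsets.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_range(path, offset, length):
    # Each worker opens its own handle so that seeks do not interfere.
    with open(path, "rb") as fh:
        fh.seek(offset)
        return fh.read(length)

def parallel_read(path, n_workers=4):
    """Read a whole file as n_workers byte ranges fetched concurrently."""
    size = os.path.getsize(path)
    chunk = max(1, -(-size // n_workers))  # ceiling division
    offsets = range(0, size, chunk)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves input order, so the chunks join correctly.
        parts = pool.map(lambda off: read_range(path, off, chunk), offsets)
        return b"".join(parts)

# Demonstration on a temporary file with known contents.
expected = bytes(range(256)) * 100
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(expected)
    demo_path = tmp.name
data = parallel_read(demo_path, n_workers=4)
os.remove(demo_path)
```

Whether concurrent reads actually help depends on the storage device; the benefit is expected on hardware that serves multiple outstanding requests in parallel.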
6.
Nat Cancer ; 3(2): 232-250, 2022 02.
Article in English | MEDLINE | ID: mdl-35221336

ABSTRACT

Models that recapitulate the complexity of human tumors are urgently needed to develop more effective cancer therapies. We report a bank of human patient-derived xenografts (PDXs) and matched organoid cultures from tumors that represent the greatest unmet need: endocrine-resistant, treatment-refractory and metastatic breast cancers. We leverage matched PDXs and PDX-derived organoids (PDxO) for drug screening that is feasible and cost-effective with in vivo validation. Moreover, we demonstrate the feasibility of using these models for precision oncology in real time with clinical care in a case of triple-negative breast cancer (TNBC) with early metastatic recurrence. Our results uncovered a Food and Drug Administration (FDA)-approved drug with high efficacy against the models. Treatment with this therapy resulted in a complete response for the individual and a progression-free survival (PFS) period more than three times longer than their previous therapies. This work provides valuable methods and resources for functional precision medicine and drug development for human breast cancer.


Subject(s)
Organoids; Triple Negative Breast Neoplasms; Drug Discovery; Heterografts; Humans; Precision Medicine/methods; Triple Negative Breast Neoplasms/drug therapy; United States; Xenograft Model Antitumor Assays
7.
mSystems ; 6(6): e0119621, 2021 Dec 21.
Article in English | MEDLINE | ID: mdl-34874774

ABSTRACT

Evolve and resequencing (E&R) was applied to lab adaptation of Toxoplasma gondii for over 1,500 generations with the goal of mapping host-independent in vitro virulence traits. Phenotypic assessments of steps across the lytic cycle revealed that only traits needed in the extracellular milieu evolved. Nonsynonymous single-nucleotide polymorphisms (SNPs) in only one gene, a P4 flippase, became fixed across two different evolving populations, whereas dramatic changes in the transcriptional signature of extracellular parasites were identified. Newly developed computational tools correlated phenotypes evolving at different rates with specific transcriptomic changes. A set of 300 phenotype-associated genes was mapped, of which nearly 50% are annotated as hypothetical. Validation of a select number of genes by knockouts confirmed their role in lab adaptation and highlights novel mechanisms underlying in vitro virulence traits. Further analyses of differentially expressed genes revealed the development of a "pro-tachyzoite" profile as well as the upregulation of the fatty acid biosynthesis (FASII) pathway. The latter aligned with the P4 flippase SNP and with a low abundance of medium-chain fatty acids at low passage, indicating this is a limiting factor in extracellular parasites. In addition, partial overlap with the bradyzoite differentiation transcriptome in extracellular parasites indicated that stress pathways are involved in both situations. This was reflected in the partial overlap between the assembled ApiAP2 and Myb transcription factor network underlying the adapting extracellular state with the bradyzoite differentiation program. Overall, E&R is a new genomic tool successfully applied to map the development of polygenic traits underlying in vitro virulence of T. gondii. IMPORTANCE: It has been well established that prolonged in vitro cultivation of Toxoplasma gondii augments progression of the lytic cycle. This lab adaptation results in increased capacities to divide, migrate, and survive outside a host cell, all of which are considered host-independent virulence factors. However, the mechanistic basis underlying these enhanced virulence features is unknown. Here, E&R was utilized to empirically characterize the phenotypic, genomic, and transcriptomic changes in the non-lab-adapted strain, GT1, during 2.5 years of lab adaptation. This identified the shutdown of stage differentiation and upregulation of lipid biosynthetic pathways as the key processes being modulated. Furthermore, lab adaptation was primarily driven by transcriptional reprogramming, which rejected the starting hypothesis that genetic mutations would drive lab adaptation. Overall, the work empirically shows that lab adaptation augments T. gondii's in vitro virulence by transcriptional reprogramming and that E&R is a powerful new tool to map multigenic traits.

8.
Genome Med ; 13(1): 170, 2021 10 28.
Article in English | MEDLINE | ID: mdl-34711268

ABSTRACT

BACKGROUND: Metastatic breast cancer is a deadly disease with a low 5-year survival rate. Tracking metastatic spread in living patients is difficult and thus poorly understood. METHODS: Via rapid autopsy, we have collected 30 tumor samples over 3 timepoints and across 8 organs from a triple-negative metastatic breast cancer patient. The large number of sites sampled, together with deep whole-genome sequencing and advanced computational analysis, allowed us to comprehensively reconstruct the tumor's evolution at subclonal resolution. RESULTS: The most distinctive, previously unreported aspect of the tumor's evolution that we observed in this patient was the presence of "subclone incubators," defined as metastatic sites where substantial tumor evolution occurs before colonization of additional sites and organs by subclones that initially evolved at the incubator site. Overall, we identified four discrete waves of metastatic expansions, each of which resulted in a number of new, genetically similar metastasis sites that also enriched for particular organs (e.g., abdominal vs bone and brain). The lung played a critical role in facilitating metastatic spread in this patient: the lung was the first site of metastatic escape from the primary breast lesion, subclones at this site were likely the source of all four subsequent metastatic waves, and multiple sites in the lung acted as subclone incubators. Finally, functional annotation revealed that many known drivers or metastasis-promoting tumor mutations in this patient were shared by some, but not all metastatic sites, highlighting the need for more comprehensive surveys of a patient's metastases for effective clinical intervention. CONCLUSIONS: Our analysis revealed the presence of substantial tumor evolution at metastatic incubator sites in a patient, with potentially important clinical implications. Our study demonstrated that sampling of a large number of metastatic sites affords unprecedented detail for studying metastatic evolution.


Subject(s)
Autopsy; Breast Neoplasms/classification; Breast Neoplasms/genetics; Neoplasm Metastasis; Biopsy; Evolution, Molecular; Female; Humans; Middle Aged; Mutation; Phylogeny
9.
Genome Med ; 13(1): 46, 2021 03 26.
Article in English | MEDLINE | ID: mdl-33771218

ABSTRACT

BACKGROUND: DNA sequencing has unveiled extensive tumor heterogeneity in several different cancer types, with many exhibiting diverse subclonal populations. Identifying and tracing mutations throughout the expansion and progression of a tumor represents a significant challenge. Furthermore, prioritizing the subset of such mutations most likely to contribute to tumor evolution or that could serve as potential therapeutic targets represents an ongoing problem. RESULTS: Here, we describe OncoGEMINI, a new tool designed for exploring the complex patterns and trajectory of somatic and inherited variation observed in heterogeneous tumors biopsied over the course of treatment. This is accomplished by creating a searchable database of variants that includes tumor sampling time points and allows for filtering methods that reflect specific changes in variant allele frequencies over time. Additionally, by incorporating existing annotations and resources that facilitate the interpretation of cancer mutations (e.g., CIViC, DGIdb), OncoGEMINI enables rapid searches for, and potential identification of, mutations that may be driving subclonal evolution. CONCLUSIONS: By combining relevant genomic annotations alongside specific filtering tools, OncoGEMINI provides powerful and customizable approaches that enable the quick identification of individual tumor variants that meet specified criteria. It can be applied to a wide range of tumor-derived sequence data, but is especially designed for studies with multiple samples, including longitudinal datasets. It is available under an MIT license at github.com/fakedrtom/oncogemini.


Subject(s)
Breast Neoplasms/genetics; Breast Neoplasms/pathology; Genetic Variation; Software; Biopsy; Databases, Genetic; Female; Humans; Longitudinal Studies; Neoplasm Metastasis
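
Filtering variants by how their allele frequency changes across sampling time points, as this abstract describes, could look roughly like the sketch below. It does not use OncoGEMINI's actual commands or database schema; the function `rising_variants`, the `min_increase` threshold, and the variant records are made up for illustration.

```python
def rising_variants(variants, timepoints, min_increase=0.2):
    """Return IDs of variants whose allele frequency never decreases across
    the ordered timepoints and rises by at least min_increase overall."""
    hits = []
    for var in variants:
        vafs = [var["vaf"][tp] for tp in timepoints]
        never_drops = all(b >= a for a, b in zip(vafs, vafs[1:]))
        if never_drops and vafs[-1] - vafs[0] >= min_increase:
            hits.append(var["id"])
    return hits

# Hypothetical variant allele frequencies at three biopsies.
variants = [
    {"id": "chr1:100A>T", "vaf": {"t0": 0.02, "t1": 0.15, "t2": 0.40}},
    {"id": "chr2:200G>C", "vaf": {"t0": 0.30, "t1": 0.28, "t2": 0.31}},
    {"id": "chr3:300C>T", "vaf": {"t0": 0.45, "t1": 0.20, "t2": 0.05}},
]
selected = rising_variants(variants, ["t0", "t1", "t2"])
```

Only the first variant is selected here: the second dips between the first two biopsies and the third declines throughout. The mirror-image filter (steadily falling frequency) would follow the same pattern with the comparisons reversed.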
10.
PLoS One ; 15(2): e0229063, 2020.
Article in English | MEDLINE | ID: mdl-32084206

ABSTRACT

Challenges with distinguishing circulating tumor DNA (ctDNA) from next-generation sequencing (NGS) artifacts limit variant searches to established solid tumor mutations. Here we show that early and random PCR errors are a principal source of NGS noise and persist despite duplex molecular barcoding, removal of artifacts due to clonal hematopoiesis of indeterminate potential, and suppression of patterned errors. We also demonstrate that sample duplicates are necessary to eliminate the stochastic noise associated with NGS. Integration of sample duplicates into NGS analytics may broaden ctDNA applications by removing NGS-related errors that confound identification of true very low frequency variants during searches for ctDNA without a priori knowledge of specific mutations to target.


Subject(s)
Circulating Tumor DNA/genetics; High-Throughput Nucleotide Sequencing/methods; Adult; DNA Barcoding, Taxonomic; Female; Hematopoiesis/genetics; Humans; Male; Middle Aged
11.
Science ; 362(6420)2018 12 14.
Article in English | MEDLINE | ID: mdl-30545852

ABSTRACT

Whole-genome sequencing (WGS) has facilitated the first genome-wide evaluations of the contribution of de novo noncoding mutations to complex disorders. Using WGS, we identified 255,106 de novo mutations among sample genomes from members of 1902 quartet families in which one child, but not a sibling or their parents, was affected by autism spectrum disorder (ASD). In contrast to coding mutations, no noncoding functional annotation category, analyzed in isolation, was significantly associated with ASD. Casting noncoding variation in the context of a de novo risk score across multiple annotation categories, however, did demonstrate association with mutations localized to promoter regions. We found that the strongest driver of this promoter signal emanates from evolutionarily conserved transcription factor binding sites distal to the transcription start site. These data suggest that de novo mutations in promoter regions, characterized by evolutionary and functional signatures, contribute to ASD.


Subject(s)
Autism Spectrum Disorder/genetics; Mutation; Promoter Regions, Genetic/genetics; Binding Sites/genetics; Conserved Sequence; DNA Mutational Analysis; Genetic Loci; Genetic Variation; Humans; Pedigree; Risk; Transcription Factors/metabolism
12.
NPJ Genom Med ; 3: 22, 2018.
Article in English | MEDLINE | ID: mdl-30109124

ABSTRACT

Early infantile epileptic encephalopathy (EIEE) is a devastating epilepsy syndrome with onset in the first months of life. Although mutations in more than 50 different genes are known to cause EIEE, current diagnostic yields with gene panel tests or whole-exome sequencing are below 60%. We applied whole-genome analysis (WGA) consisting of whole-genome sequencing and comprehensive variant discovery approaches to a cohort of 14 EIEE subjects for whom prior genetic tests had not yielded a diagnosis. We identified both de novo point and INDEL mutations and de novo structural rearrangements in known EIEE genes, as well as mutations in genes not previously associated with EIEE. The detection of a pathogenic or likely pathogenic mutation in all 14 subjects demonstrates the utility of WGA to reduce the time and costs of clinical diagnosis of EIEE. While exome sequencing may have detected 12 of the 14 causal mutations, 3 of the 12 patients received non-diagnostic exome panel tests prior to genome sequencing. Thus, given the continued decline of sequencing costs, our results support the use of WGA with comprehensive variant discovery as an efficient strategy for the clinical diagnosis of EIEE and other genetic conditions.

13.
PLoS One ; 13(7): e0197333, 2018.
Article in English | MEDLINE | ID: mdl-30044795

ABSTRACT

Circulating tumor-derived cell-free DNA (ctDNA) enables non-invasive diagnosis, monitoring, and treatment susceptibility testing in human cancers. However, accurate detection of variant alleles, particularly during untargeted searches, remains a principal obstacle to widespread application of cell-free DNA in clinical oncology. In this study, isolation of short cell-free DNA fragments is shown to enrich for tumor variants and improve correction of PCR- and sequencing-associated errors. Subfractions of the mononucleosome of circulating cell-free DNA (ccfDNA) were isolated from patients with melanoma, pancreatic ductal adenocarcinoma, and colorectal adenocarcinoma using a high-throughput-capable automated gel-extraction platform. Using a 128-gene (128 kb) custom next-generation sequencing panel, variant alleles were on average 2-fold enriched in the short fraction (median insert size: ~142 bp) compared to the original ccfDNA sample, while 0.7-fold reduced in the fraction corresponding to the principal peak of the mononucleosome (median insert size: ~167 bp). Size-selected short fractions compared to the original ccfDNA yielded significantly larger family sizes (i.e., PCR duplicates) during in silico consensus sequence interpretation via unique molecular identifiers. Increments in family size were associated with a progressive reduction of PCR and sequencing errors. Although consensus read depth also decreased at larger family sizes, the variant allele frequency in the short ccfDNA fraction remained consistent, while variant detection in the original ccfDNA was commonly lost at family sizes necessary to minimize errors. These collective findings support the automated extraction of short ccfDNA fragments to enrich for ctDNA while concomitantly reducing false positives through in silico error correction.


Subject(s)
Cell-Free Nucleic Acids/blood; Circulating Tumor DNA/blood; High-Throughput Nucleotide Sequencing; Neoplasms/blood; Alleles; Cell-Free Nucleic Acids/genetics; Circulating Tumor DNA/genetics; Consensus Sequence; DNA Fragmentation; Humans; Neoplasms/genetics; Neoplasms/pathology
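
The consensus step referred to in this abstract, in which reads sharing a unique molecular identifier (UMI) form a "family" and larger families give more reliable base calls, can be sketched as below. This is a simplified illustration under stated assumptions, not the pipeline used in the study: the function name, the `min_family_size` default, and the strict-majority rule are hypothetical choices.

```python
from collections import Counter, defaultdict

def consensus_calls(reads, min_family_size=3):
    """Collapse reads at one genomic position into per-UMI consensus bases.

    reads: iterable of (umi, base) pairs.
    Families smaller than min_family_size are discarded; a base is called
    only if it is supported by a strict majority of the family.
    """
    families = defaultdict(list)
    for umi, base in reads:
        families[umi].append(base)
    calls = {}
    for umi, bases in families.items():
        if len(bases) < min_family_size:
            continue  # too few PCR duplicates to correct errors reliably
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) > 0.5:
            calls[umi] = base
    return calls

# Toy reads (hypothetical): three UMI families of sizes 3, 2, and 4.
reads = [("AAC", "T"), ("AAC", "T"), ("AAC", "C"),
         ("GGT", "C"), ("GGT", "C"),
         ("TTA", "C"), ("TTA", "C"), ("TTA", "C"), ("TTA", "C")]
calls = consensus_calls(reads)
```

Raising `min_family_size` suppresses more PCR and sequencing errors but discards more families, which mirrors the trade-off between error reduction and consensus read depth noted in the abstract.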
14.
Nat Genet ; 50(5): 727-736, 2018 04 26.
Article in English | MEDLINE | ID: mdl-29700473

ABSTRACT

Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.


Subject(s)
Autism Spectrum Disorder/genetics; Genetic Predisposition to Disease/genetics; INDEL Mutation/genetics; Polymorphism, Single Nucleotide/genetics; Protein Isoforms/genetics; Female; Genome/genetics; Genome-Wide Association Study/methods; Humans; Male
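
To give a sense of the multiple-testing burden this abstract refers to, the sketch below computes a per-test significance threshold for 4,123 effective tests. It assumes a Bonferroni-style correction at a family-wise error rate of 0.05; the abstract does not state which correction procedure or error rate the authors used, so both are assumptions here.

```python
ALPHA = 0.05            # assumed family-wise error rate (not from the abstract)
EFFECTIVE_TESTS = 4123  # number of effective tests reported in the abstract

# Bonferroni-style per-test threshold: alpha divided by the number of tests.
threshold = ALPHA / EFFECTIVE_TESTS

def passes_correction(p_value):
    """True if a single category's p-value survives the correction."""
    return p_value < threshold

nominal_hit = passes_correction(1e-4)  # nominally significant, fails correction
strong_hit = passes_correction(1e-6)   # survives correction
```

Under these assumptions the threshold is on the order of 1e-5, so an association with a nominal p-value of 1e-4 would not be reported as significant.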
16.
Nat Methods ; 15(2): 123-126, 2018 02.
Article in English | MEDLINE | ID: mdl-29309061

ABSTRACT

GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.


Subject(s)
Breast Neoplasms/genetics; Genome, Human; Genomics/methods; Search Engine/methods; Sequence Analysis, DNA/methods; Software; Databases, Genetic; Female; Humans; Internet
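
The search task described in this abstract, finding which interval files share loci with a set of query features, can be illustrated with a deliberately naive sketch. It is not GIGGLE's algorithm or interface: GIGGLE relies on an index to reach its reported speed and ranks by statistical significance, whereas this example compares every pair of intervals and ranks by raw overlap count. The function names and the file names in the toy database are hypothetical.

```python
def count_overlaps(query, intervals):
    """Count overlapping (query, database) pairs of half-open [start, end) intervals."""
    return sum(1 for qs, qe in query for s, e in intervals
               if qs < e and s < qe)

def rank_files(query, database):
    """Rank interval files by how many overlaps they share with the query."""
    scores = {name: count_overlaps(query, ivs) for name, ivs in database.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy single-chromosome example with made-up coordinates.
query = [(100, 200), (500, 600)]
database = {
    "a.bed": [(150, 160), (550, 700), (900, 950)],
    "b.bed": [(10, 20), (190, 210)],
    "c.bed": [(200, 300)],
}
ranking = rank_files(query, database)
```

The all-pairs comparison costs time proportional to the product of query and database sizes, which is exactly what an indexed search engine is designed to avoid when the database holds billions of intervals.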
17.
J Clin Transl Sci ; 1(6): 381-386, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29707261

ABSTRACT

INTRODUCTION: Computational analysis of genome or exome sequences may improve inherited disease diagnosis, but is costly and time-consuming. METHODS: We describe the use of iobio, a web-based tool suite for intuitive, real-time genome diagnostic analyses. RESULTS: We used iobio to identify the disease-causing variant in a patient with early infantile epileptic encephalopathy with prior nondiagnostic genetic testing. CONCLUSIONS: Iobio tools can be used by clinicians to rapidly identify disease-causing variants from genomic patient sequencing data.

18.
Nat Methods ; 12(10): 966-8, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26258291

ABSTRACT

SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.


Subject(s)
Genome, Human; High-Throughput Nucleotide Sequencing/methods; Molecular Sequence Annotation/methods; Software; Genetic Variation; Humans; Neoplasms/genetics; Polymorphism, Single Nucleotide; Precision Medicine/methods; Workflow
19.
Genome Biol Evol ; 7(9): 2608-22, 2015 Aug 29.
Article in English | MEDLINE | ID: mdl-26319576

ABSTRACT

The goal of the 1000 Genomes Consortium is to characterize human genome structural variation (SV), including forms of copy number variations such as deletions, duplications, and insertions. Mobile element insertions, particularly Alu elements, are major contributors to genomic SV among humans. During the pilot phase of the project we experimentally validated 645 (611 intergenic and 34 exon targeted) polymorphic "young" Alu insertion events, absent from the human reference genome. Here, we report high resolution sequencing of 343 (322 unique) recent Alu insertion events, along with their respective target site duplications, precise genomic breakpoint coordinates, subfamily assignment, percent divergence, and estimated A-rich tail lengths. All the sequenced Alu loci were derived from the AluY lineage with no evidence of retrotransposition activity involving older Alu families (e.g., AluJ and AluS). AluYa5 is currently the most active Alu subfamily in the human lineage, followed by AluYb8, and many others including three newly identified subfamilies we have termed AluYb7a3, AluYb8b1, and AluYa4a1. This report provides the structural details of 322 unique Alu variants from individual human genomes collectively adding about 100 kb of genomic variation. Many Alu subfamilies are currently active in human populations, including a surprising level of AluY retrotransposition. Human Alu subfamilies exhibit continuous evolution with potential drivers sprouting new Alu lineages.


Subject(s)
Alu Elements; Evolution, Molecular; Genetic Variation; Genome, Human; Human Genome Project; Humans; Sequence Analysis, DNA; Sequence Deletion
20.
Cancer Inform ; 14(Suppl 1): 37-44, 2015.
Article in English | MEDLINE | ID: mdl-25931804

ABSTRACT

Mobile elements constitute more than 45% of the human genome as a result of repeated insertion events during human genome evolution. Although most mobile elements are fixed within the human population, some elements (including ALU, long interspersed elements (LINE) 1 (L1), and SVA) are still actively duplicating and may result in life-threatening human diseases such as cancer, motivating the need for accurate mobile-element insertion (MEI) detection tools. We developed a software package, TANGRAM, for MEI detection in next-generation sequencing data, currently serving as the primary MEI detection tool in the 1000 Genomes Project. TANGRAM takes advantage of valuable mapping information provided by our own MOSAIK mapper, and until recently required MOSAIK mappings as its input. In this study, we report a new feature that enables TANGRAM to be used on alignments generated by any mainstream short-read mapper, making it accessible to many genomics users. To demonstrate its utility for cancer genome analysis, we have applied TANGRAM to the TCGA (The Cancer Genome Atlas) mutation calling benchmark 4 dataset. TANGRAM is fast, accurate, easy to use, and open source at https://github.com/jiantao/Tangram.
