1.
Bioinformatics ; 33(2): 280-282, 2017 01 15.
Article in English | MEDLINE | ID: mdl-27605106

ABSTRACT

MOTIVATION: Large-scale rearrangements and copy number changes combined with different modes of clonal evolution create extensive somatic genome diversity, making it difficult to develop versatile and scalable variant calling tools and create well-calibrated benchmarks. RESULTS: We developed a new simulation framework tHapMix that enables the creation of tumour samples with different ploidy, purity and polyclonality features. It easily scales to simulation of hundreds of somatic genomes, while re-use of real read data preserves noise and biases present in sequencing platforms. We further demonstrate tHapMix utility by creating a simulated set of 140 somatic genomes and showing how it can be used in training and testing of somatic copy number variant calling tools. AVAILABILITY AND IMPLEMENTATION: tHapMix is distributed under an open source license and can be downloaded from https://github.com/Illumina/tHapMix. CONTACT: sivakhno@illumina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
DNA Copy Number Variations , Genomics/methods , Haplotypes , Neoplasms/genetics , Ploidies , Software , Computer Simulation , DNA, Neoplasm , Genome , Humans
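
The purity and ploidy features mentioned in the tHapMix abstract can be illustrated with a little arithmetic. The sketch below is not taken from tHapMix; it only shows the standard mixing relationship one would expect when tumour and normal reads are combined, assuming a diploid normal genome by default. The function names `tumour_read_fraction` and `expected_depth_ratio` are hypothetical.

```python
def tumour_read_fraction(purity: float, tumour_cn: int, normal_cn: int = 2) -> float:
    """Expected fraction of reads at a locus that originate from tumour cells.

    purity     fraction of cells in the sample that are tumour cells (0..1)
    tumour_cn  copy number of the locus in tumour cells
    normal_cn  copy number of the locus in normal cells (assumed 2)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    tumour_dna = purity * tumour_cn
    normal_dna = (1.0 - purity) * normal_cn
    total = tumour_dna + normal_dna
    if total == 0:
        raise ValueError("no DNA expected at this locus")
    return tumour_dna / total


def expected_depth_ratio(purity: float, tumour_cn: int, normal_cn: int = 2) -> float:
    """Expected depth at a locus relative to a pure normal sample.

    This is the per-cell DNA ratio, before any normalisation for the
    sample's overall ploidy.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")
    mixed = purity * tumour_cn + (1.0 - purity) * normal_cn
    return mixed / normal_cn


if __name__ == "__main__":
    # A 50% pure sample with a four-copy gain in the tumour cells.
    print(tumour_read_fraction(0.5, 4))
    print(expected_depth_ratio(0.5, 4))
```

A simulator that re-uses real reads would sample tumour-haplotype and normal reads in roughly these proportions; how tHapMix itself does this is documented in its repository rather than in the abstract.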
2.
Bioinformatics ; 32(8): 1220-2, 2016 04 15.
Article in English | MEDLINE | ID: mdl-26647377

ABSTRACT

We describe Manta, a method to discover structural variants and indels from next generation sequencing data. Manta is optimized for rapid germline and somatic analysis, calling structural variants, medium-sized indels and large insertions on standard compute hardware in less than a tenth of the time that comparable methods require to identify only subsets of these variant types: for example NA12878 at 50× genomic coverage is analyzed in less than 20 min. Manta can discover and score variants based on supporting paired and split-read evidence, with scoring models optimized for germline analysis of diploid individuals and somatic analysis of tumor-normal sample pairs. Call quality is similar to or better than comparable methods, as determined by pedigree consistency of germline calls and comparison of somatic calls to COSMIC database variants. Manta consistently assembles a higher fraction of its calls to base-pair resolution, allowing for improved downstream annotation and analysis of clinical significance. We provide Manta as a community resource to facilitate practical and routine structural variant analysis in clinical and research sequencing scenarios. AVAILABILITY AND IMPLEMENTATION: Manta is released under the open-source GPLv3 license. Source code, documentation and Linux binaries are available from https://github.com/Illumina/manta. CONTACT: csaunders@illumina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing , INDEL Mutation , Neoplasms/genetics , DNA, Neoplasm , Genome , Genomics , Humans , Software
3.
Bioinformatics ; 30(19): 2796-801, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24950811

ABSTRACT

MOTIVATION: FASTQ is a standard file format for DNA sequencing data, which stores both nucleotides and quality scores. A typical sequencing study can easily generate hundreds of gigabytes of FASTQ files, while public archives such as ENA and NCBI and large international collaborations such as The Cancer Genome Atlas can accumulate many terabytes of data in this format. Compression tools such as gzip are often used to reduce the storage burden but have the disadvantage that the data must be decompressed before they can be used. Here, we present BEETL-fastq, a tool that not only compresses FASTQ-formatted DNA reads more compactly than gzip but also permits rapid search for k-mer queries within the archived sequences. Importantly, the full FASTQ record of each matching read or read pair is returned, allowing the search results to be piped directly to any of the many standard tools that accept FASTQ data as input. RESULTS: We show that 6.6 terabytes of human reads in FASTQ format can be transformed into 1.7 terabytes of indexed files, from which we can search for 1, 10, 100, 1000 and one million 30-mers in 3, 8, 14, 45 and 567 s, respectively, plus 20 ms per output read. Useful applications of the search capability are highlighted, including the genotyping of structural variant breakpoints and 'in silico pull-down' experiments in which only the reads that cover a region of interest are selectively extracted for the purposes of variant calling or visualization. AVAILABILITY AND IMPLEMENTATION: BEETL-fastq is part of the BEETL library, available as a GitHub repository at github.com/BEETL/BEETL.


Subject(s)
Data Compression/methods , Neoplasms/genetics , Sequence Analysis, DNA/methods , Algorithms , Computer Simulation , DNA , Genome , Genome, Human , Genotype , High-Throughput Nucleotide Sequencing , Humans , Software
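
To make the 'in silico pull-down' idea from the BEETL-fastq abstract concrete, here is a minimal sketch of what such a query returns: the full FASTQ record of every read containing one of the query k-mers. It is a naive linear scan over uncompressed input, not the indexed search that BEETL-fastq performs, and it assumes plain four-line FASTQ records with no line wrapping and no reverse-complement matching. The helper names `read_fastq` and `pull_down` are hypothetical.

```python
from typing import Iterable, Iterator, List, Set


def read_fastq(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines:
    header, sequence, separator ('+') and quality string."""
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            yield record
            record = []


def pull_down(lines: Iterable[str], kmers: Set[str]) -> Iterator[List[str]]:
    """Yield every complete record whose sequence contains at least one k-mer.

    Only the forward strand is checked; a real tool would usually also
    consider the reverse complement.
    """
    for record in read_fastq(lines):
        sequence = record[1]
        if any(kmer in sequence for kmer in kmers):
            yield record


if __name__ == "__main__":
    example = [
        "@r1", "ACGTACGT", "+", "IIIIIIII",
        "@r2", "TTTTGGGG", "+", "IIIIIIII",
    ]
    for rec in pull_down(example, {"TTGG"}):
        print("\n".join(rec))
```

Because whole records are emitted unchanged, the output is itself valid FASTQ and can be piped into any downstream tool, which is the property the abstract emphasises.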
4.
Cell ; 148(4): 780-91, 2012 Feb 17.
Article in English | MEDLINE | ID: mdl-22341448

ABSTRACT

The Tasmanian devil (Sarcophilus harrisii), the largest marsupial carnivore, is endangered due to a transmissible facial cancer spread by direct transfer of living cancer cells through biting. Here we describe the sequencing, assembly, and annotation of the Tasmanian devil genome and whole-genome sequences for two geographically distant subclones of the cancer. Genomic analysis suggests that the cancer first arose from a female Tasmanian devil and that the clone has subsequently genetically diverged during its spread across Tasmania. The devil cancer genome contains more than 17,000 somatic base substitution mutations and bears the imprint of a distinct mutational process. Genotyping of somatic mutations in 104 geographically and temporally distributed Tasmanian devil tumors reveals the pattern of evolution and spread of this parasitic clonal lineage, with evidence of a selective sweep in one geographical area and persistence of parallel lineages in other populations.


Subject(s)
Facial Neoplasms/veterinary , Genomic Instability , Marsupialia/genetics , Mutation , Animals , Clonal Evolution , Endangered Species , Facial Neoplasms/epidemiology , Facial Neoplasms/genetics , Facial Neoplasms/pathology , Female , Genome-Wide Association Study , Male , Molecular Sequence Data , Tasmania/epidemiology
5.
Bioinformatics ; 26(24): 3051-8, 2010 Dec 15.
Article in English | MEDLINE | ID: mdl-20966003

ABSTRACT

MOTIVATION: Copy number abnormalities (CNAs) represent an important type of genetic mutation that can lead to abnormal cell growth and proliferation. New high-throughput sequencing technologies promise comprehensive characterization of CNAs. In contrast to microarrays, where probe design follows a carefully developed protocol, reads represent a random sample from a library and may be prone to representation biases due to GC content and other factors. The discrimination between true and false positive CNAs becomes an important issue. RESULTS: We present a novel approach, called CNAseg, to identify CNAs from second-generation sequencing data. It uses depth of coverage to estimate copy number states and flowcell-to-flowcell variability in cancer and normal samples to control the false positive rate. We tested the method using the COLO-829 melanoma cell line sequenced to 40-fold coverage. An extensive simulation scheme was developed to recreate different scenarios of copy number changes and depth of coverage by altering a real dataset with spiked-in CNAs. Comparison to alternative approaches using both real and simulated datasets showed that CNAseg achieves superior precision and improved sensitivity estimates. AVAILABILITY: The CNAseg package and test data are available at http://www.compbio.group.cam.ac.uk/software.html.


Subject(s)
Algorithms , DNA Copy Number Variations , Neoplasms/genetics , Base Composition , Cell Line, Tumor , Genome, Human , Humans , Mutation , Sequence Analysis, DNA
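
The CNAseg abstract describes using depth of coverage in cancer and normal samples to estimate copy number. The sketch below shows only the generic first step of such an analysis, namely counting reads in fixed windows and taking a library-size-normalised log2 ratio of tumour to normal. It is not the CNAseg algorithm: it does no segmentation, no GC correction, and does not model flowcell-to-flowcell variability. The function names, the window layout and the pseudocount are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def window_counts(positions: Sequence[int], window: int, length: int) -> List[int]:
    """Count read start positions (0-based) in consecutive fixed-size windows."""
    if window <= 0 or length <= 0:
        raise ValueError("window and length must be positive")
    n_windows = (length + window - 1) // window
    counts = [0] * n_windows
    for pos in positions:
        if 0 <= pos < length:
            counts[pos // window] += 1
    return counts


def log2_ratios(tumour: Sequence[int], normal: Sequence[int],
                pseudocount: float = 0.5) -> List[float]:
    """Per-window log2(tumour / normal), normalised by total read count.

    The pseudocount (an assumed value) avoids taking the log of zero
    in windows with no reads.
    """
    if len(tumour) != len(normal):
        raise ValueError("tumour and normal must have the same number of windows")
    t_total, n_total = sum(tumour), sum(normal)
    if t_total == 0 or n_total == 0:
        raise ValueError("both samples need at least one read")
    ratios = []
    for t, n in zip(tumour, normal):
        t_frac = (t + pseudocount) / t_total
        n_frac = (n + pseudocount) / n_total
        ratios.append(math.log2(t_frac / n_frac))
    return ratios


if __name__ == "__main__":
    tumour = [10, 20, 10]   # middle window has twice the relative coverage
    normal = [10, 10, 10]
    print(log2_ratios(tumour, normal))
```

Windows with clearly positive ratios are candidate gains and clearly negative ones candidate losses; distinguishing true from false positives is the part the paper addresses with its variability model.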
6.
Nature ; 463(7278): 191-6, 2010 Jan 14.
Article in English | MEDLINE | ID: mdl-20016485

ABSTRACT

All cancers carry somatic mutations. A subset of these somatic alterations, termed driver mutations, confer selective growth advantage and are implicated in cancer development, whereas the remainder are passengers. Here we have sequenced the genomes of a malignant melanoma and a lymphoblastoid cell line from the same person, providing the first comprehensive catalogue of somatic mutations from an individual cancer. The catalogue provides remarkable insights into the forces that have shaped this cancer genome. The dominant mutational signature reflects DNA damage due to ultraviolet light exposure, a known risk factor for malignant melanoma, whereas the uneven distribution of mutations across the genome, with a lower prevalence in gene footprints, indicates that DNA repair has been preferentially deployed towards transcribed regions. The results illustrate the power of a cancer genome sequence to reveal traces of the DNA damage, repair, mutation and selection processes that were operative years before the cancer became symptomatic.


Subject(s)
Genes, Neoplasm/genetics , Genome, Human/genetics , Mutation/genetics , Neoplasms/genetics , Adult , Cell Line, Tumor , DNA Damage/genetics , DNA Mutational Analysis , DNA Repair/genetics , Gene Dosage/genetics , Humans , Loss of Heterozygosity/genetics , Male , Melanoma/etiology , Melanoma/genetics , MicroRNAs/genetics , Mutagenesis, Insertional/genetics , Neoplasms/etiology , Polymorphism, Single Nucleotide/genetics , Precision Medicine , Sequence Deletion/genetics , Ultraviolet Rays