Results 1 - 20 of 53
1.
Genome Res ; 34(2): 179-188, 2024 Mar 20.
Article in English | MEDLINE | ID: mdl-38355308

ABSTRACT

A mechanistic understanding of the biological and technical factors that impact transcript measurements is essential to designing and analyzing single-cell and single-nucleus RNA sequencing experiments. Nuclei contain the same pre-mRNA population as cells, but they contain a small subset of the mRNAs. Nonetheless, early studies argued that single-nucleus analysis yielded results comparable to cellular samples if pre-mRNA measurements were included. However, typical workflows do not distinguish between pre-mRNA and mRNA when estimating gene expression, and variation in their relative abundances across cell types has received limited attention. These gaps are especially important given that incorporating pre-mRNA has become commonplace for both assays, despite known gene length bias in pre-mRNA capture. Here, we reanalyze public data sets from mouse and human to describe the mechanisms and contrasting effects of mRNA and pre-mRNA sampling on gene expression and marker gene selection in single-cell and single-nucleus RNA-seq. We show that pre-mRNA levels vary considerably among cell types, which mediates the degree of gene length bias and limits the generalizability of a recently published normalization method intended to correct for this bias. As an alternative, we repurpose an existing post hoc gene length-based correction method from conventional RNA-seq gene set enrichment analysis. Finally, we show that inclusion of pre-mRNA in bioinformatic processing can impart a larger effect than assay choice itself, which is pivotal to the effective reuse of existing data. These analyses advance our understanding of the sources of variation in single-cell and single-nucleus RNA-seq experiments and provide useful guidance for future studies.


Subject(s)
Cell Nucleus , RNA Precursors , Humans , Animals , Mice , RNA-Seq , RNA, Messenger/genetics , Sequence Analysis, RNA/methods , Cell Nucleus/genetics , Gene Expression Profiling/methods , Single-Cell Analysis
2.
Genome Res ; 34(1): 94-105, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38195207

ABSTRACT

Genetic and gene expression heterogeneity is an essential hallmark of many tumors, allowing the cancer to evolve and to develop resistance to treatment. Currently, the most commonly used data types for studying such heterogeneity are bulk tumor/normal whole-genome or whole-exome sequencing (WGS, WES); and single-cell RNA sequencing (scRNA-seq), respectively. However, tools are currently lacking to link genomic tumor subclonality with transcriptomic heterogeneity by integrating genomic and single-cell transcriptomic data collected from the same tumor. To address this gap, we developed scBayes, a Bayesian probabilistic framework that uses tumor subclonal structure inferred from bulk DNA sequencing data to determine the subclonal identity of cells from single-cell gene expression (scRNA-seq) measurements. Grouping together cells representing the same genetically defined tumor subclones allows comparison of gene expression across different subclones, or investigation of gene expression changes within the same subclone across time (i.e., progression, treatment response, or relapse) or space (i.e., at multiple metastatic sites and organs). We used simulated data sets, in silico synthetic data sets, as well as biological data sets generated from cancer samples to extensively characterize and validate the performance of our method, as well as to show improvements over existing methods. We show the validity and utility of our approach by applying it to published data sets and recapitulating the findings, as well as arriving at novel insights into cancer subclonal expression behavior in our own data sets. We further show that our method is applicable to a wide range of single-cell sequencing technologies including single-cell DNA sequencing as well as Smart-seq and 10x Genomics scRNA-seq protocols.


Subject(s)
Neoplasms , Humans , Exome Sequencing , Bayes Theorem , Neoplasms/genetics , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
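The subclone assignment described in the scBayes abstract above can be illustrated with a small Bayesian calculation. The sketch below is not the scBayes model or its API. It assumes a deliberately simple binomial read model in which a cell carrying a heterozygous variant shows the alternate allele in about half of its reads, and a cell lacking it shows alternate reads only at a small error rate. The function name `subclone_posterior`, the parameters `error_rate` and `alt_fraction`, and the toy variant identifiers are all hypothetical.

```python
import math


def subclone_posterior(cell_counts, subclone_variants, prior=None,
                       error_rate=0.01, alt_fraction=0.5):
    """Posterior probability that one cell belongs to each subclone.

    cell_counts: dict mapping variant id -> (ref_reads, alt_reads) in this cell.
    subclone_variants: dict mapping subclone name -> set of variant ids it carries.
    The read model (binomial with fixed alt_fraction / error_rate) is an
    illustrative assumption, not the model used by any published tool.
    """
    names = sorted(subclone_variants)
    if prior is None:
        # Uniform prior over subclones when none is supplied.
        prior = {name: 1.0 / len(names) for name in names}

    log_post = {}
    for name in names:
        total = math.log(prior[name])
        for variant, (ref_reads, alt_reads) in cell_counts.items():
            carries = variant in subclone_variants[name]
            p_alt = alt_fraction if carries else error_rate
            # Binomial coefficients are identical across subclones and cancel.
            total += alt_reads * math.log(p_alt) + ref_reads * math.log(1.0 - p_alt)
        log_post[name] = total

    # Normalise in log space to avoid underflow.
    peak = max(log_post.values())
    weights = {name: math.exp(value - peak) for name, value in log_post.items()}
    norm = sum(weights.values())
    return {name: weight / norm for name, weight in weights.items()}


if __name__ == "__main__":
    subclones = {"normal": set(), "A": {"v1"}, "B": {"v1", "v2"}}
    cell = {"v1": (1, 3), "v2": (0, 2)}  # (ref_reads, alt_reads) per site
    for name, prob in subclone_posterior(cell, subclones).items():
        print(name, round(prob, 4))
```

Cells with the same most probable subclone could then be grouped for expression comparisons. A real analysis would also need to handle allelic dropout and copy-number changes, which this sketch ignores.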
3.
Bioinformatics ; 39(8), 2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37498562

ABSTRACT

MOTIVATION: In time-critical clinical settings, such as precision medicine, genomic data needs to be processed as fast as possible to arrive at data-informed treatment decisions in a timely fashion. While sequencing throughput has dramatically increased over the past decade, bioinformatics analysis throughput has not been able to keep up with the pace of computer hardware improvement, and consequently has now turned into the primary bottleneck. Modern computer hardware today is capable of much higher performance than current genomic informatics algorithms can typically utilize, therefore presenting opportunities for significant improvement of performance. Accessing the raw sequencing data from BAM files, for example, is a necessary and time-consuming step in nearly all sequence analysis tools; however, existing programming libraries for BAM access do not take full advantage of the parallel input/output capabilities of storage devices. RESULTS: In an effort to stimulate the development of a new generation of faster sequence analysis tools, we developed quickBAM, a software library to accelerate sequencing data access by exploiting the parallelism in commodity storage hardware currently widely available. We demonstrate that analysis software ported to quickBAM consistently outperforms its current version, in some cases finishing an analysis in under 3 min while the original version took 1.5 h, using the same storage solution. AVAILABILITY AND IMPLEMENTATION: quickBAM is open source and freely available at https://gitlab.com/yiq/quickbam/. We envision that quickBAM will enable a new generation of high-performance informatics tools, either directly boosting their performance if they are currently data-access bottlenecked, or allowing data access to keep up with further optimizations in algorithms and compute techniques.


Subject(s)
Algorithms , Software , High-Throughput Nucleotide Sequencing/methods , Genomics , Informatics , Sequence Analysis, DNA/methods
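The general idea behind the quickBAM abstract above, issuing several reads against the same file concurrently instead of streaming it from one handle, can be sketched in a few lines. This is not quickBAM's API and it does not parse BAM. It only splits a file into byte ranges and reads them in parallel, with each worker opening its own handle. The helper names `byte_ranges`, `read_range`, and `parallel_read` are hypothetical. A real BAM reader would additionally have to align ranges to BGZF block boundaries, typically with the help of an index.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def byte_ranges(size, n_chunks):
    """Split [0, size) into at most n_chunks contiguous half-open ranges."""
    step = max(1, -(-size // n_chunks))  # ceiling division, at least 1 byte
    return [(start, min(start + step, size)) for start in range(0, size, step)]


def read_range(path, start, end):
    """Read bytes [start, end) using a private file handle."""
    with open(path, "rb") as handle:
        handle.seek(start)
        return handle.read(end - start)


def parallel_read(path, n_chunks=4):
    """Read a whole file by fetching its byte ranges concurrently."""
    ranges = byte_ranges(os.path.getsize(path), n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # pool.map preserves input order, so the parts join back correctly.
        parts = pool.map(lambda bounds: read_range(path, *bounds), ranges)
        return b"".join(parts)
```

Threads are used here because file reads release the interpreter lock in CPython. Whether this yields a speed-up depends on the storage device and is not something the sketch demonstrates.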
4.
Nat Methods ; 15(2): 123-126, 2018 Feb.
Article in English | MEDLINE | ID: mdl-29309061

ABSTRACT

GIGGLE is a genomics search engine that identifies and ranks the significance of genomic loci shared between query features and thousands of genome interval files. GIGGLE (https://github.com/ryanlayer/giggle) scales to billions of intervals and is over three orders of magnitude faster than existing methods. Its speed extends the accessibility and utility of resources such as ENCODE, Roadmap Epigenomics, and GTEx by facilitating data integration and hypothesis generation.


Subject(s)
Breast Neoplasms/genetics , Genome, Human , Genomics/methods , Search Engine/methods , Sequence Analysis, DNA/methods , Software , Databases, Genetic , Female , Humans , Internet
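The kind of query the GIGGLE abstract above describes, counting how many intervals in each of many files overlap a set of query features, can be illustrated with a much simpler structure than GIGGLE itself uses. The sketch below is not GIGGLE's index or scoring. It assumes half-open intervals on a single chromosome, counts overlaps with two sorted arrays and binary search, and ranks files by raw overlap count rather than by statistical significance. The class `IntervalFile` and function `rank_files` are hypothetical names.

```python
from bisect import bisect_left, bisect_right


class IntervalFile:
    """Overlap counter for one set of half-open intervals [start, end)."""

    def __init__(self, name, intervals):
        self.name = name
        self.starts = sorted(start for start, _ in intervals)
        self.ends = sorted(end for _, end in intervals)

    def count_overlaps(self, query_start, query_end):
        # Intervals that begin before the query ends ...
        began = bisect_left(self.starts, query_end)
        # ... minus those that already finished before the query begins.
        finished = bisect_right(self.ends, query_start)
        return began - finished


def rank_files(files, queries):
    """Rank files by total number of overlaps with all query intervals."""
    totals = {
        f.name: sum(f.count_overlaps(start, end) for start, end in queries)
        for f in files
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    files = [
        IntervalFile("a", [(0, 10), (20, 30), (25, 40)]),
        IntervalFile("b", [(100, 200)]),
    ]
    print(rank_files(files, [(5, 26)]))
```

Each query costs two binary searches per file, which is why counting can stay fast even when the interval lists are large. Returning the overlapping intervals themselves would need a different structure, such as an interval tree.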
6.
Nat Methods ; 12(10): 966-8, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26258291

ABSTRACT

SpeedSeq is an open-source genome analysis platform that accomplishes alignment, variant detection and functional annotation of a 50× human genome in 13 h on a low-cost server and alleviates a bioinformatics bottleneck that typically demands weeks of computation with extensive hands-on expert involvement. SpeedSeq offers performance competitive with or superior to current methods for detecting germline and somatic single-nucleotide variants, structural variants, insertions and deletions, and it includes novel functionality for streamlined interpretation.


Subject(s)
Genome, Human , High-Throughput Nucleotide Sequencing/methods , Molecular Sequence Annotation/methods , Software , Genetic Variation , Humans , Neoplasms/genetics , Polymorphism, Single Nucleotide , Precision Medicine/methods , Workflow
7.
Nature ; 491(7422): 56-65, 2012 Nov 01.
Article in English | MEDLINE | ID: mdl-23128226

ABSTRACT

By characterizing the geographic and functional spectrum of human genetic variation, the 1000 Genomes Project aims to build a resource to help to understand the genetic contribution to disease. Here we describe the genomes of 1,092 individuals from 14 populations, constructed using a combination of low-coverage whole-genome and exome sequencing. By developing methods to integrate information across several algorithms and diverse data sources, we provide a validated haplotype map of 38 million single nucleotide polymorphisms, 1.4 million short insertions and deletions, and more than 14,000 larger deletions. We show that individuals from different populations carry different profiles of rare and common variants, and that low-frequency variants show substantial geographic differentiation, which is further increased by the action of purifying selection. We show that evolutionary conservation and coding consequence are key determinants of the strength of purifying selection, that rare-variant load varies substantially across biological pathways, and that each individual contains hundreds of rare non-coding variants at conserved sites, such as motif-disrupting changes in transcription-factor-binding sites. This resource, which captures up to 98% of accessible single nucleotide polymorphisms at a frequency of 1% in related populations, enables analysis of common and low-frequency variants in individuals from diverse, including admixed, populations.


Subject(s)
Genetic Variation/genetics , Genetics, Population , Genome, Human/genetics , Genomics , Alleles , Binding Sites/genetics , Conserved Sequence/genetics , Evolution, Molecular , Genetics, Medical , Genome-Wide Association Study , Haplotypes/genetics , Humans , Nucleotide Motifs , Polymorphism, Single Nucleotide/genetics , Racial Groups/genetics , Sequence Deletion/genetics , Transcription Factors/metabolism
8.
Nature ; 470(7332): 59-65, 2011 Feb 03.
Article in English | MEDLINE | ID: mdl-21293372

ABSTRACT

Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.


Subject(s)
DNA Copy Number Variations/genetics , Genetics, Population , Genome, Human/genetics , Genomics , Gene Duplication/genetics , Genetic Predisposition to Disease/genetics , Genotype , Humans , Mutagenesis, Insertional/genetics , Reproducibility of Results , Sequence Analysis, DNA , Sequence Deletion/genetics
9.
BMC Genomics ; 15: 795, 2014 Sep 16.
Article in English | MEDLINE | ID: mdl-25228379

ABSTRACT

BACKGROUND: Mobile elements (MEs) constitute greater than 50% of the human genome as a result of repeated insertion events during human genome evolution. Although most of these elements are now fixed in the population, some MEs, including ALU, L1, SVA and HERV-K elements, are still actively duplicating. Mobile element insertions (MEIs) have been associated with human genetic disorders, including Crohn's disease, hemophilia, and various types of cancer, motivating the need for accurate MEI detection methods. To comprehensively identify and accurately characterize these variants in whole genome next-generation sequencing (NGS) data, a computationally efficient detection and genotyping method is required. Current computational tools are unable to call MEI polymorphisms with sufficiently high sensitivity and specificity, or call individual genotypes with sufficiently high accuracy. RESULTS: Here we report Tangram, a computationally efficient MEI detection program that integrates read-pair (RP) and split-read (SR) mapping signals to detect MEI events. By utilizing SR mapping in its primary detection module, a feature unique to this software, Tangram is able to pinpoint MEI breakpoints with single-nucleotide precision. To understand the role of MEI events in disease, it is essential to produce accurate individual genotypes in clinical samples. Tangram is able to determine sample genotypes with very high accuracy. Using simulations and experimental datasets, we demonstrate that Tangram has superior sensitivity, specificity, breakpoint resolution and genotyping accuracy, when compared to other, recently developed MEI detection methods. CONCLUSIONS: Tangram serves as the primary MEI detection tool in the 1000 Genomes Project, and is implemented as a highly portable, memory-efficient, easy-to-use C++ computer program, built under an open-source development model.


Subject(s)
Algorithms , Alu Elements , Chromosomes, Human, Pair 22/genetics , Computational Biology/methods , Genome, Human , Genotype , Humans , Models, Genetic , Sensitivity and Specificity
10.
BMC Genomics ; 15: 354, 2014 May 10.
Article in English | MEDLINE | ID: mdl-24885922

ABSTRACT

BACKGROUND: Next generation sequencing is helping to overcome limitations in organisms less accessible to classical or reverse genetic methods by facilitating whole genome mutational analysis studies. One traditionally intractable group, the Apicomplexa, contains several important pathogenic protozoan parasites, including the Plasmodium species that cause malaria. Here we apply whole genome analysis methods to the relatively accessible model apicomplexan, Toxoplasma gondii, to optimize forward genetic methods for chemical mutagenesis using N-ethyl-N-nitrosourea (ENU) and ethylmethane sulfonate (EMS) at varying dosages. RESULTS: By comparing three different lab-strains we show that spontaneously generated mutations reflect genome composition, without nucleotide bias. However, the single nucleotide variations (SNVs) are not distributed randomly over the genome; most of these mutations reside either in non-coding sequence or are silent with respect to protein coding. This is in contrast to the random genomic distribution of mutations induced by chemical mutagenesis. Additionally, we report a genome wide transition vs transversion ratio (ti/tv) of 0.91 for spontaneous mutations in Toxoplasma, with a slightly higher rate of 1.20 and 1.06 for variants induced by ENU and EMS respectively. We also show that in the Toxoplasma system, surprisingly, both ENU and EMS have a proclivity for inducing mutations at A/T base pairs (78.6% and 69.6%, respectively). CONCLUSIONS: The number of SNVs between related laboratory strains is relatively low and managed by purifying selection away from changes to amino acid sequence. From an experimental mutagenesis point of view, both ENU (24.7%) and EMS (29.1%) are more likely to generate variation within exons than would naturally accumulate over time in culture (19.1%), demonstrating the utility of these approaches for yielding proportionally greater changes to the amino acid sequence. These results will not only direct the methods of future chemical mutagenesis in Toxoplasma, but also aid in designing forward genetic approaches in less accessible pathogenic protozoa as well.


Subject(s)
Genome , Toxoplasma/genetics , Adenosine/genetics , Adenosine/metabolism , Amino Acid Sequence , Base Pairing , Cell Line , Ethyl Methanesulfonate/toxicity , Ethylnitrosourea/toxicity , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Data , Pentosyltransferases/genetics , Pentosyltransferases/metabolism , Phenotype , Point Mutation , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , Thymidine/genetics , Thymidine/metabolism , Toxoplasma/drug effects
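The transition/transversion ratio (ti/tv) and the share of mutations at A/T base pairs reported in the abstract above are simple counts over a list of single nucleotide variants. The following sketch shows one way to compute them, assuming each SNV is given as a (reference base, alternate base) pair. The function names and the toy input are hypothetical and the numbers are not taken from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("reference and alternate base must differ")
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ti_tv_ratio(snvs):
    """Ratio of transitions to transversions for a list of (ref, alt) pairs."""
    transitions = sum(1 for ref, alt in snvs if is_transition(ref, alt))
    transversions = len(snvs) - transitions
    if transversions == 0:
        raise ValueError("no transversions observed; ratio is undefined")
    return transitions / transversions


def fraction_at_sites(snvs):
    """Fraction of SNVs whose reference base is A or T (an A/T base pair)."""
    return sum(1 for ref, _ in snvs if ref.upper() in "AT") / len(snvs)


if __name__ == "__main__":
    toy = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
    print(ti_tv_ratio(toy), fraction_at_sites(toy))
```

Under a model in which every substitution is equally likely, there are twice as many possible transversions as transitions, so the expected ratio is 0.5. Ratios near or above 1, such as those quoted in the abstract, therefore indicate an excess of transitions relative to that baseline.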
11.
Bioinformatics ; 29(5): 656-7, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23314327

ABSTRACT

MOTIVATION: A common question arises at the beginning of every experiment where RNA-Seq is used to detect differential gene expression between two conditions: How many reads should we sequence? RESULTS: Scotty is an interactive web-based application that assists biologists to design an experiment with an appropriate sample size and read depth to satisfy the user-defined experimental objectives. This design can be based on data available from either pilot samples or publicly available datasets. AVAILABILITY: Scotty can be freely accessed on the web at http://euler.bc.edu/marthlab/scotty/scotty.php


Subject(s)
Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Software , Gene Expression , Humans , Internet
12.
Proc Natl Acad Sci U S A ; 108(29): 11983-8, 2011 Jul 19.
Article in English | MEDLINE | ID: mdl-21730125

ABSTRACT

High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ∼1,000 sequenced chromosomes per population, whereas ∼2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.


Subject(s)
Demography , Evolution, Molecular , Genes/genetics , Genetic Variation , Genetics, Population , Models, Genetic , Racial Groups/genetics , Gene Frequency , Gene Flow , Genomics/methods , Humans , Sequence Analysis, DNA
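The site frequency spectrum that the abstract above uses for demographic inference is a histogram: for a sample of n chromosomes, entry i counts the variable sites at which the allele of interest is seen exactly i times. A minimal sketch, assuming per-site allele counts are already available, is shown below. The function names are hypothetical, and the demographic model fitting and jackknife projection described in the abstract are not attempted.

```python
def site_frequency_spectrum(allele_counts, n_chromosomes):
    """Unfolded SFS: spectrum[i - 1] = number of sites with allele count i.

    allele_counts: iterable of per-site counts of the derived (or alternate)
    allele among n_chromosomes sampled chromosomes. Sites that are fixed or
    absent in the sample (count 0 or n) are not segregating and are skipped.
    """
    spectrum = [0] * (n_chromosomes - 1)
    for count in allele_counts:
        if 0 < count < n_chromosomes:
            spectrum[count - 1] += 1
    return spectrum


def singleton_fraction(spectrum):
    """Share of segregating sites seen on exactly one chromosome."""
    total = sum(spectrum)
    return spectrum[0] / total if total else 0.0


if __name__ == "__main__":
    counts = [1, 1, 2, 5, 0, 6]
    sfs = site_frequency_spectrum(counts, n_chromosomes=6)
    print(sfs, singleton_fraction(sfs))
```

A spectrum dominated by its first few entries corresponds to the abstract's observation that most variable sites are rare.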
13.
PLoS Genet ; 7(8): e1002236, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21876680

ABSTRACT

As a consequence of the accumulation of insertion events over evolutionary time, mobile elements now comprise nearly half of the human genome. The Alu, L1, and SVA mobile element families are still duplicating, generating variation between individual genomes. Mobile element insertions (MEI) have been identified as causes for genetic diseases, including hemophilia, neurofibromatosis, and various cancers. Here we present a comprehensive map of 7,380 MEI polymorphisms from the 1000 Genomes Project whole-genome sequencing data of 185 samples in three major populations detected with two detection methods. This catalog enables us to systematically study mutation rates, population segregation, genomic distribution, and functional properties of MEI polymorphisms and to compare MEI to SNP variation from the same individuals. Population allele frequencies of MEI and SNPs are described, broadly, by the same neutral ancestral processes despite vastly different mutation mechanisms and rates, except in coding regions where MEI are virtually absent, presumably due to strong negative selection. A direct comparison of MEI and SNP diversity levels suggests a differential mobile element insertion rate among populations.


Subject(s)
DNA Transposable Elements , Genome, Human , Polymorphism, Single Nucleotide , Gene Frequency , Genotype , Heterozygote , Humans , Mutagenesis, Insertional , Mutation Rate
14.
bioRxiv ; 2024 Mar 16.
Article in English | MEDLINE | ID: mdl-38559060

ABSTRACT

Bruton's tyrosine kinase (BTK) inhibitors are effective for the treatment of chronic lymphocytic leukemia (CLL) due to BTK's role in B cell survival and proliferation. Treatment resistance is most commonly caused by the emergence of the hallmark BTK C481S mutation that inhibits drug binding. In this study, we aimed to investigate whether the presence of additional CLL driver mutations in cancer subclones harboring a BTK C481S mutation accelerates subclone expansion. In addition, we sought to determine whether BTK-mutated subclones exhibit distinct transcriptomic behavior when compared to other cancer subclones. To achieve these goals, we employ our recently published method (Qiao et al. 2024) that combines bulk DNA sequencing and single-cell RNA sequencing (scRNA-seq) data to genotype individual cells for the presence or absence of subclone-defining mutations. While the most common approach for scRNA-seq includes short-read sequencing, transcript coverage is limited due to the vast majority of the reads being concentrated at the priming end of the transcript. Here, we utilized MAS-seq, a long-read scRNA-seq technology, to substantially increase transcript coverage across the entire length of the transcripts and expand the set of informative mutations to link cells to cancer subclones in six CLL patients who acquired BTK C481S mutations during BTK inhibitor treatment. We found that BTK-mutated subclones often acquire additional mutations in CLL driver genes, leading to faster subclone proliferation. When examining subclone-specific gene expression, we found that in one patient, BTK-mutated subclones are transcriptionally distinct from the rest of the malignant B cell population with an overexpression of CLL-relevant genes.

15.
BMC Genomics ; 14: 468, 2013 Jul 10.
Article in English | MEDLINE | ID: mdl-23837845

ABSTRACT

BACKGROUND: Next generation sequencing and advances in genomic enrichment technologies have enabled the discovery of the full spectrum of variants from common to rare alleles in the human population. The application of such technologies can be limited by the amount of DNA available. Whole genome amplification (WGA) can overcome such limitations. Here we investigate the applicability of WGA by comparing SNP and INDEL variant calls from a single genomic/WGA sample pair in two separate capture experiments: a 50 Mbp whole exome capture and a custom capture array of a 4 Mbp region on chr12. RESULTS: Our results comparing variant calls derived from genomic and WGA DNA show that the majority of variant SNP and INDEL calls are common to both callsets, both at the site and genotype level, and suggest that allele bias plays a minimal role when using WGA DNA in re-sequencing studies. CONCLUSIONS: Although the results of this study are based on a limited sample size, they suggest that using WGA DNA allows the discovery of the vast majority of variants, and achieves high concordance metrics, when compared to genomic DNA calls.


Subject(s)
DNA/genetics , Genetic Variation/genetics , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Nucleic Acid Amplification Techniques , Sequence Analysis, DNA/methods , Alleles , Computer Graphics , Genotype , Humans
16.
BMC Genomics ; 14: 467, 2013 Jul 10.
Article in English | MEDLINE | ID: mdl-23837824

ABSTRACT

BACKGROUND: Toxoplasma gondii has a largely clonal population in North America and Europe, with types I, II and III clonal lineages accounting for the majority of strains isolated from patients. RH, a particular type I strain, is most frequently used to characterize Toxoplasma biology. However, compared to other type I strains, RH has unique characteristics such as faster growth, increased extracellular survival rate and inability to form orally infectious cysts. Thus, to identify candidate genes that could account for these parasite phenotypic differences, we determined genetic differences and differential parasite gene expression between RH and another type I strain, GT1. Moreover, as differences in host cell modulation could affect Toxoplasma replication in the host, we determined differentially modulated host processes among the type I strains through host transcriptional profiling. RESULTS: Through whole genome sequencing, we identified 1,394 single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) between RH and GT1. These SNPs/indels together with parasite gene expression differences between RH and GT1 were used to identify candidate genes that could account for type I phenotypic differences. A polymorphism in dense granule protein, GRA2, determined RH and GT1 differences in the evasion of the interferon gamma response. In addition, host transcriptional profiling identified that genes regulated by NF-κB, such as interleukin (IL)-12p40, were differentially modulated by the different type I strains. We subsequently showed that this difference in NF-κB activation was due to polymorphisms in GRA15. Furthermore, we observed that RH, but not other type I strains, recruited phosphorylated IκBα (a component of the NF-κB complex) to the parasitophorous vacuole membrane and this recruitment of p-IκBα was partially dependent on GRA2. CONCLUSIONS: We identified candidate parasite genes that could be responsible for phenotypic variation among the type I strains through comparative genomics and transcriptomics. We also identified differentially modulated host pathways among the type I strains, and these can serve as a guideline for future studies in examining the phenotypic differences among type I strains.


Subject(s)
Phenotype , Toxoplasma/genetics , Toxoplasma/physiology , Animals , Fibroblasts/parasitology , Gene Expression Regulation , Genes, Protozoan/genetics , HEK293 Cells , Humans , Interleukin-12 Subunit p40/metabolism , Intracellular Membranes/metabolism , Intracellular Membranes/parasitology , Macrophages/metabolism , Macrophages/parasitology , Mice , NF-kappa B/metabolism , Polymorphism, Single Nucleotide , Protein Transport , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , Species Specificity , Toxoplasma/metabolism , Vacuoles/metabolism
17.
Bioinformatics ; 28(4): 593-4, 2012 Feb 15.
Article in English | MEDLINE | ID: mdl-22199392

ABSTRACT

ART is a set of simulation tools that generate synthetic next-generation sequencing reads. This functionality is essential for testing and benchmarking tools for next-generation sequencing data analysis including read alignment, de novo assembly and genetic variation discovery. ART generates simulated sequencing reads by emulating the sequencing process with built-in, technology-specific read error models and base quality value profiles parameterized empirically in large sequencing datasets. We currently support all three major commercial next-generation sequencing platforms: Roche's 454, Illumina's Solexa and Applied Biosystems' SOLiD. ART also allows the flexibility to use customized read error model parameters and quality profiles. AVAILABILITY: Both source and binary software packages are available at http://www.niehs.nih.gov/research/resources/software/art.


Subject(s)
High-Throughput Nucleotide Sequencing , Sequence Analysis, DNA , Chromosomes, Human, Pair 17/genetics , Genetic Variation , Humans , Software
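The principle behind read simulation as described in the ART abstract above, sampling a fragment of a reference and introducing errors according to a per-position quality profile, can be shown with a toy substitution-only simulator. This is not ART's error model or command-line interface. It assumes the standard Phred relation between a quality score Q and the error probability 10^(-Q/10), ignores insertions and deletions, and applies one fixed quality profile to every read. All function names are hypothetical.

```python
import random


def phred_to_error_prob(quality):
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-quality / 10)


def simulate_read(reference, read_length, quality_profile, rng):
    """Sample one read with substitution errors driven by quality_profile.

    Returns (start, sequence, quality_string); the quality string uses the
    Phred+33 ASCII encoding common in FASTQ files.
    """
    start = rng.randrange(len(reference) - read_length + 1)
    bases = []
    for offset in range(read_length):
        true_base = reference[start + offset]
        if rng.random() < phred_to_error_prob(quality_profile[offset]):
            # Substitute with one of the three other bases, chosen uniformly.
            bases.append(rng.choice([b for b in "ACGT" if b != true_base]))
        else:
            bases.append(true_base)
    qualities = "".join(chr(q + 33) for q in quality_profile)
    return start, "".join(bases), qualities


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    reference = "ACGTTGCAAGCTTAGGCTAACGTTAGC"
    profile = [38, 38, 36, 34, 30, 25, 20, 15]  # quality drops along the read
    print(simulate_read(reference, len(profile), profile, rng))
```

Because the true origin of each read is known, output like this can be used to score an aligner or variant caller, which is the benchmarking use the abstract describes.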
18.
BMC Bioinformatics ; 13: 305, 2012 Nov 17.
Article in English | MEDLINE | ID: mdl-23157288

ABSTRACT

BACKGROUND: DNA capture technologies combined with high-throughput sequencing now enable cost-effective, deep-coverage, targeted sequencing of complete exomes. This is well suited for SNP discovery and genotyping. However there has been little attention devoted to Copy Number Variation (CNV) detection from exome capture datasets despite the potentially high impact of CNVs in exonic regions on protein function. RESULTS: As members of the 1000 Genomes Project analysis effort, we investigated 697 samples in which 931 genes were targeted and sampled with 454 or Illumina paired-end sequencing. We developed a rigorous Bayesian method to detect CNVs in the genes, based on read depth within target regions. Despite substantial variability in read coverage across samples and targeted exons, we were able to identify 107 heterozygous deletions in the dataset. The experimentally determined false discovery rate (FDR) of the cleanest dataset from the Wellcome Trust Sanger Institute is 12.5%. We were able to substantially improve the FDR in a subset of gene deletion candidates that were adjacent to another gene deletion call (17 calls). The estimated sensitivity of our call-set was 45%. CONCLUSIONS: This study demonstrates that exonic sequencing datasets, collected both in population based and medical sequencing projects, will be a useful substrate for detecting genic CNV events, particularly deletions. Based on the number of events we found and the sensitivity of the methods in the present dataset, we estimate on average 16 genic heterozygous deletions per individual genome. Our power analysis informs ongoing and future projects about sequencing depth and uniformity of read coverage required for efficient detection.


Subject(s)
DNA Copy Number Variations , Genome/genetics , High-Throughput Nucleotide Sequencing/statistics & numerical data , Bayes Theorem , Exome , Exons , Genotype , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
20.
Bioinformatics ; 27(12): 1691-2, 2011 Jun 15.
Article in English | MEDLINE | ID: mdl-21493652

ABSTRACT

MOTIVATION: Analysis of genomic sequencing data requires efficient, easy-to-use access to alignment results and flexible data management tools (e.g. filtering, merging, sorting, etc.). However, the enormous amount of data produced by current sequencing technologies is typically stored in compressed, binary formats that are not easily handled by the text-based parsers commonly used in bioinformatics research. RESULTS: We introduce a software suite for programmers and end users that facilitates research analysis and data management using BAM files. BamTools provides both the first C++ API publicly available for BAM file support as well as a command-line toolkit. AVAILABILITY: BamTools was written in C++, and is supported on Linux, Mac OSX and MS Windows. Source code and documentation are freely available at http://github.com/pezmaster31/bamtools.


Subject(s)
Genomics/methods , Sequence Analysis, DNA , Software , Sequence Alignment