Results 1 - 20 of 20
1.
Cancer Med ; 12(17): 17679-17691, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37602814

ABSTRACT

BACKGROUND: Despite recent advances, many cancers are still detected too late for curative treatment. There is, therefore, a need for the development of new diagnostic methods and biomarkers. One approach may arise from the detection of extrachromosomal circular DNA (eccDNA), which is part of cell-free DNA in human plasma. AIMS: First, we assessed and compared two methods for the purification of eccDNA from plasma. Second, we tested for an easy diagnostic application of eccDNA liquid biopsy-based assays. MATERIALS & METHODS: For the comparison we tested a solid-phase silica purification method and a phenol/chloroform method with salt precipitation. For the diagnostic application of eccDNA we developed and tested a qPCR primer-based SNP detection system for the detection of two well-established cancer-causing KRAS mutations (G12V and G12R) on circular DNA. This investigation was supported by purifying, sequencing, and analysing clinical plasma samples for eccDNAs containing KRAS mutant alleles in 0.5 mL plasma from 16 pancreatic ductal adenocarcinoma patients and 19 healthy controls. RESULTS: In our method comparison we observed that, following exonuclease treatment, a lower eccDNA yield was found for the phenol/chloroform method (15.7%-26.7%) compared with the solid-phase purification approach (47.8%-65.9%). For the diagnostic application of eccDNA tests, the sensitivity of the tested qPCR assay only reached ~10^-3 in a background of 10^5 wild type (wt) KRAS circular entities, which was not improved by general amplification or primer-based inhibition of wt KRAS amplification. Furthermore, we did not detect eccDNA containing KRAS in any of the clinical samples. DISCUSSION: A potential explanation for our inability to detect any KRAS mutations in the clinical samples may be related to the general low abundance of eccDNA in plasma. CONCLUSION: Taken together, our results provide a benchmark for eccDNA purification methods while raising the question of what is required for the optimal fast and sensitive detection of SNP mutations on eccDNA with greater sensitivity than primer-based qPCR detection.

2.
Comput Struct Biotechnol J ; 20: 3059-3067, 2022.
Article in English | MEDLINE | ID: mdl-35782732

ABSTRACT

Extrachromosomal circular DNA (eccDNA) of chromosomal origin is common in eukaryotic cells. Amplification of oncogenes on large eccDNA (ecDNA) can drive biological processes such as tumorigenesis, and identification of eccDNA by sequencing after removal of chromosomal DNA is therefore important for understanding their impact on the expressed phenotype. However, the circular mitochondrial DNA (mtDNA) might challenge the detection of eccDNA because the average somatic cell has hundreds of copies of mtDNA. Here we show that 61.2-99.5% of reads from eccDNA-enriched samples correspond to mtDNA in mouse tissues. We have developed a method to selectively remove mtDNA from total circular DNA by CRISPR/Cas9 guided cleavage of mtDNA with one single-guide RNA (sgRNA) or two sgRNAs followed by exonuclease degradation of the linearized mtDNA. Sequencing revealed that mtDNA reads were 85.9% ± 12.6% removed from eccDNA of 9 investigated mouse tissues. CRISPR/Cas9 cleavage also efficiently removed mtDNA from a human HeLa cell line and colorectal cancer samples. We identified up to 14 times more, and also larger eccDNA in CRISPR/Cas9 treated colorectal cancer samples than in untreated samples. We foresee that the method can be applied to effectively remove mtDNA from any eukaryotic species to obtain higher eccDNA yields.

3.
Genome Biol ; 23(1): 12, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34996510

ABSTRACT

BACKGROUND: Accurate detection of somatic mutations is challenging but critical in understanding cancer formation, progression, and treatment. We recently proposed NeuSomatic, the first deep convolutional neural network-based somatic mutation detection approach, and demonstrated performance advantages on in silico data. RESULTS: In this study, we use the first comprehensive and well-characterized somatic reference data sets from the SEQC2 consortium to investigate best practices for using a deep learning framework in cancer mutation detection. Using the high-confidence somatic mutations established for a cancer cell line by the consortium, we identify the best strategy for building robust models on multiple data sets derived from samples representing real scenarios; for example, a model trained on a combination of real and spike-in mutations had the highest average performance. CONCLUSIONS: The strategy identified in our study achieved high robustness across multiple sequencing technologies for fresh and FFPE DNA input, varying tumor/normal purities, and different coverages, with significant superiority over conventional detection approaches in general, as well as in challenging situations such as low coverage, low variant allele frequency, DNA damage, and difficult genomic regions.


Subject(s)
Deep Learning , Neoplasms , Genomics , Humans , Mutation , Neoplasms/genetics , Neural Networks, Computer
4.
Genome Biol ; 23(1): 2, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34980216

ABSTRACT

BACKGROUND: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. RESULTS: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. CONCLUSIONS: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.


Subject(s)
Genome, Human , Polymorphism, Single Nucleotide , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Reproducibility of Results , Whole Genome Sequencing
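
The reproducibility comparison described in entry 4 above can be illustrated with a small sketch. The abstract does not state which concordance metric the authors used, so the Jaccard index below is an assumption chosen for illustration, and the replicate call sets (`rep1`, `rep2`, `rep3`) are hypothetical toy data rather than anything from the study.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two variant call sets; defined as 1.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(call_sets):
    """Average Jaccard index over every pair of call sets (e.g. replicates)."""
    pairs = list(combinations(call_sets, 2))
    if not pairs:
        raise ValueError("need at least two call sets to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy data: each variant is keyed as (chrom, pos, ref, alt).
rep1 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "GA")}
rep2 = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
rep3 = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "GA")}

if __name__ == "__main__":
    print(round(mean_pairwise_jaccard([rep1, rep2, rep3]), 4))
```

Keying variants by (chrom, pos, ref, alt) treats differently represented but equivalent indels as distinct, so a real comparison would probably normalize variants first.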
5.
Genome Biol ; 22(1): 347, 2021 12 20.
Article in English | MEDLINE | ID: mdl-34930391

ABSTRACT

BACKGROUND: Genomic structural variations (SV) are important determinants of genotypic and phenotypic changes in many organisms. However, the detection of SV from next-generation sequencing data remains challenging. RESULTS: In this study, DNA from a Chinese family quartet is sequenced at three different sequencing centers in triplicate. A total of 288 derivative data sets are generated utilizing different analysis pipelines and compared to identify sources of analytical variability. Mapping methods provide the major contribution to variability, followed by sequencing centers and replicates. Interestingly, SV supported by only one center or replicate often represent true positives with 47.02% and 45.44% overlapping the long-read SV call set, respectively. This is consistent with an overall higher false negative rate for SV calling in centers and replicates compared to mappers (15.72%). Finally, we observe that the SV calling variability also persists in a genotyping approach, indicating the impact of the underlying sequencing and preparation approaches. CONCLUSIONS: This study provides the first detailed insights into the sources of variability in SV identification from next-generation sequencing and highlights remaining challenges in SV calling for large cohorts. We further give recommendations on how to reduce SV calling variability and the choice of alignment methodology.


Subject(s)
Genomic Structural Variation , Genomics/methods , Germ Cells , High-Throughput Nucleotide Sequencing/methods , Base Sequence , Bias , Chromosome Mapping , Sequence Analysis, DNA
6.
Nat Biotechnol ; 39(9): 1151-1160, 2021 09.
Article in English | MEDLINE | ID: mdl-34504347

ABSTRACT

The lack of samples for generating standardized DNA datasets for setting up a sequencing pipeline or benchmarking the performance of different algorithms limits the implementation and uptake of cancer genomics. Here, we describe reference call sets obtained from paired tumor-normal genomic DNA (gDNA) samples derived from a breast cancer cell line (which is highly heterogeneous, with an aneuploid genome, and enriched in somatic alterations) and a matched lymphoblastoid cell line. We partially validated both somatic mutations and germline variants in these call sets via whole-exome sequencing (WES) with different sequencing platforms and targeted sequencing with >2,000-fold coverage, spanning 82% of genomic regions with high confidence. Although the gDNA reference samples are not representative of primary cancer cells from a clinical sample, when setting up a sequencing pipeline, they not only minimize potential biases from technologies, assays and informatics but also provide a unique resource for benchmarking 'tumor-only' or 'matched tumor-normal' analyses.


Subject(s)
Benchmarking , Breast Neoplasms/genetics , DNA Mutational Analysis/standards , High-Throughput Nucleotide Sequencing/standards , Whole Genome Sequencing/standards , Cell Line, Tumor , Datasets as Topic , Germ Cells , Humans , Mutation , Reference Standards , Reproducibility of Results
7.
Sci Rep ; 10(1): 4983, 2020 03 18.
Article in English | MEDLINE | ID: mdl-32188929

ABSTRACT

Tumor Mutational Burden (TMB) is a measure of the abundance of somatic mutations in a tumor, which has been shown to be an emerging biomarker for both anti-PD-(L)1 treatment and prognosis; however, multiple challenges still hinder the adoption of TMB as a biomarker. The key challenges are the inconsistency of tumor mutational burden measurement among assays and the lack of a meaningful threshold for TMB classification. Here we describe a new method, ecTMB (Estimation and Classification of TMB), which uses an explicit background mutation model to predict TMB robustly and to classify samples into biologically meaningful subtypes defined by tumor mutational burden.


Subject(s)
Biomarkers, Tumor/genetics , DNA, Neoplasm/genetics , Genome, Human , Mutation , Neoplasms/classification , Neoplasms/genetics , Tumor Burden , DNA Mutational Analysis , DNA, Neoplasm/analysis , Exome , Humans , Immunotherapy/methods , Models, Statistical , Neoplasms/drug therapy , Neoplasms/pathology , Prognosis , Treatment Outcome
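
For context on entry 7 above, the simplest count-based TMB estimate divides the number of somatic mutations by the size of the sequenced region in megabases. The sketch below shows only that baseline; it is not ecTMB, which according to the abstract uses an explicit background mutation model. The function names and the cutoff of 10 mutations/Mb are hypothetical placeholders, and the abstract itself notes that a meaningful threshold is an open problem.

```python
def tmb_per_mb(n_somatic_mutations, callable_bases):
    """Naive count-based TMB: somatic mutations per megabase of callable sequence."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    if n_somatic_mutations < 0:
        raise ValueError("n_somatic_mutations must be non-negative")
    return n_somatic_mutations / (callable_bases / 1_000_000)


def classify_tmb(tmb, threshold=10.0):
    """Label a sample 'high' or 'low'; the default threshold is an arbitrary placeholder."""
    return "high" if tmb >= threshold else "low"


if __name__ == "__main__":
    # Hypothetical sample: 300 somatic mutations over a 30 Mb callable region.
    tmb = tmb_per_mb(300, 30_000_000)
    print(tmb, classify_tmb(tmb))
```

Because the result depends directly on panel size and on which mutations are counted, the same tumor can yield different values on different assays, which is the inconsistency the abstract describes.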
8.
Genome Res ; 29(5): 870-880, 2019 05.
Article in English | MEDLINE | ID: mdl-30992303

ABSTRACT

Investigation of large structural variants (SVs) is a challenging yet important task in understanding trait differences in highly repetitive genomes. Combining different bioinformatic approaches for SV detection, we analyzed whole-genome sequencing data from 3000 rice genomes and identified 63 million individual SV calls that grouped into 1.5 million allelic variants. We found enrichment of long SVs in promoters and an excess of shorter variants in 5' UTRs. Across the rice genomes, we identified regions of high SV frequency enriched in stress response genes. We demonstrated how SVs may help in finding causative variants in genome-wide association analysis. These new insights into rice genome biology are valuable for understanding the effects SVs have on gene function, with the prospect of identifying novel agronomically important alleles that can be utilized to improve cultivated rice.


Subject(s)
Genetic Variation , Genome, Plant , Genomic Structural Variation , Genomics/methods , Oryza/genetics , Alleles , Chromosome Mapping , DNA Transposable Elements , Genome-Wide Association Study/methods , Phenotype , Sequence Analysis, DNA/methods , Stress, Physiological/genetics
9.
Nat Commun ; 10(1): 1041, 2019 03 04.
Article in English | MEDLINE | ID: mdl-30833567

ABSTRACT

Accurate detection of somatic mutations is still a challenge in cancer analysis. Here we present NeuSomatic, the first convolutional neural network approach for somatic mutation detection, which significantly outperforms previous methods on different sequencing platforms, sequencing strategies, and tumor purities. NeuSomatic summarizes sequence alignments into small matrices and incorporates more than a hundred features to capture mutation signals effectively. It can be used universally as a stand-alone somatic mutation detection method or with an ensemble of existing methods to achieve the highest accuracy.


Subject(s)
Computational Biology/methods , DNA Mutational Analysis/methods , Machine Learning , Mutation , Neural Networks, Computer , Computational Biology/instrumentation , DNA Mutational Analysis/instrumentation , Databases, Genetic , Diploidy , Exome , Genes, Neoplasm , Humans , Neoplasms/genetics , Sequence Alignment , Sequence Analysis, DNA/instrumentation , Sequence Analysis, DNA/methods
10.
Nat Commun ; 9(1): 1069, 2018 03 14.
Article in English | MEDLINE | ID: mdl-29540679

ABSTRACT

The human genome is generally organized into stable chromosomes, and only tumor cells are known to accumulate kilobase (kb)-sized extrachromosomal circular DNA elements (eccDNAs). However, it must be expected that kb eccDNAs exist in normal cells as a result of mutations. Here, we purify and sequence eccDNAs from muscle and blood samples from 16 healthy men, detecting ~100,000 unique eccDNA types from 16 million nuclei. Half of these structures carry genes or gene fragments and the majority are smaller than 25 kb. Transcription from eccDNAs suggests that eccDNAs reside in nuclei and recurrence of certain eccDNAs in several individuals implies DNA circularization hotspots. Gene-rich chromosomes contribute to more eccDNAs per megabase and the most transcribed protein-coding gene in muscle, TTN (titin), provides the most eccDNAs per gene. Thus, somatic genomes are rich in chromosome-derived eccDNAs that may influence phenotypes through altered gene copy numbers and transcription of full-length or truncated genes.


Subject(s)
Chromosomes, Human/genetics , DNA, Circular/genetics , Humans , Mutation/genetics , Transcription, Genetic/genetics
11.
Nat Commun ; 8(1): 59, 2017 07 05.
Article in English | MEDLINE | ID: mdl-28680106

ABSTRACT

RNA-sequencing (RNA-seq) is an essential technique for transcriptome studies, and hundreds of analysis tools have been developed since its debut. Although recent efforts have attempted to assess the latest available tools, they have not evaluated the analysis workflows comprehensively to unleash the power within RNA-seq. Here we conduct an extensive study analysing a broad spectrum of RNA-seq workflows. Surpassing the expression analysis scope, our work also includes assessment of RNA variant-calling, RNA editing and RNA fusion detection techniques. Specifically, we examine both short- and long-read RNA-seq technologies, 39 analysis tools resulting in ~120 combinations, and ~490 analyses involving 15 samples with a variety of germline, cancer and stem cell data sets. We report the performance and propose a comprehensive RNA-seq analysis protocol, named RNACocktail, along with a computational pipeline achieving high accuracy. Validation on different samples reveals that our proposed protocol could help researchers extract more biologically relevant predictions by broad analysis of the transcriptome. RNA-seq is widely used for transcriptome analysis. Here, the authors analyse a wide spectrum of RNA-seq workflows and present a comprehensive analysis protocol named RNACocktail as well as a computational pipeline leveraging the widely used tools for accurate RNA-seq analysis.


Subject(s)
Embryonic Stem Cells , Transcriptome , Base Sequence , Cell Line , Humans
12.
Bioinformatics ; 32(24): 3829-3832, 2016 12 15.
Article in English | MEDLINE | ID: mdl-27667791

ABSTRACT

LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. AVAILABILITY AND IMPLEMENTATION: LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd CONTACT: hugo.lam@roche.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Software , Computer Simulation , Sequence Alignment
13.
BMC Genomics ; 17: 64, 2016 Jan 16.
Article in English | MEDLINE | ID: mdl-26772178

ABSTRACT

BACKGROUND: The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned BAM files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. RESULTS: We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz. CONCLUSIONS: We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies.


Subject(s)
Genome, Human , Genomic Structural Variation , Software , Benchmarking , Genomics , High-Throughput Nucleotide Sequencing , Humans , Molecular Sequence Annotation , Pedigree , Polymorphism, Single Nucleotide/genetics
15.
Genome Biol ; 16: 197, 2015 Sep 17.
Article in English | MEDLINE | ID: mdl-26381235

ABSTRACT

SomaticSeq is an accurate somatic mutation detection pipeline implementing a stochastic boosting algorithm to produce highly accurate somatic mutation calls for both single nucleotide variants and small insertions and deletions. The workflow currently incorporates five state-of-the-art somatic mutation callers, and extracts over 70 individual genomic and sequencing features for each candidate site. A training set is provided to an adaptively boosted decision tree learner to create a classifier for predicting mutation statuses. We validate our results with both synthetic and real data. We report that SomaticSeq is able to achieve better overall accuracy than any individual tool incorporated.


Subject(s)
DNA Mutational Analysis/methods , Machine Learning , Neoplasms/genetics , Humans , INDEL Mutation
16.
Sci Rep ; 5: 14493, 2015 Sep 28.
Article in English | MEDLINE | ID: mdl-26412485

ABSTRACT

A high-confidence, comprehensive human variant set is critical in assessing accuracy of sequencing algorithms, which are crucial in precision medicine based on high-throughput sequencing. Although recent works have attempted to provide such a resource, they still do not encompass all major types of variants including structural variants (SVs). Thus, we leveraged the massive high-quality Sanger sequences from the HuRef genome to construct by far the most comprehensive gold set of a single individual, which was cross validated with deep Illumina sequencing, population datasets, and well-established algorithms. It was a necessary effort to completely reanalyze the HuRef genome as its previously published variants were mostly reported five years ago, suffering from compatibility, organization, and accuracy issues that prevent their direct use in benchmarking. Our extensive analysis and validation resulted in a gold set with high specificity and sensitivity. In contrast to the current gold sets of the NA12878 or HS1011 genomes, our gold set is the first that includes small variants, deletion SVs and insertion SVs up to a hundred thousand base-pairs. We demonstrate the utility of our HuRef gold set to benchmark several published SV detection tools.


Subject(s)
Benchmarking , High-Throughput Nucleotide Sequencing/methods , Genetic Variation , Genome, Human , Genomics/methods , High-Throughput Nucleotide Sequencing/standards , Humans
17.
Nat Commun ; 6: 7256, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-26028266

ABSTRACT

Investigating genomic structural variants at basepair resolution is crucial for understanding their formation mechanisms. We identify and analyse 8,943 deletion breakpoints in 1,092 samples from the 1000 Genomes Project. We find breakpoints have more nearby SNPs and indels than the genomic average, likely a consequence of relaxed selection. By investigating the correlation of breakpoints with DNA methylation, Hi-C interactions, and histone marks and the substitution patterns of nucleotides near them, we find that breakpoints with the signature of non-allelic homologous recombination (NAHR) are associated with open chromatin. We hypothesize that some NAHR deletions occur without DNA replication and cell division, in embryonic and germline cells. In contrast, breakpoints associated with non-homologous (NH) mechanisms often have sequence microinsertions, templated from later replicating genomic sites, spaced at two characteristic distances from the breakpoint. These microinsertions are consistent with template-switching events and suggest a particular spatiotemporal configuration for DNA during the events.


Subject(s)
Chromosome Breakpoints , DNA/metabolism , Gene Deletion , Genome, Human/genetics , Chromatin , DNA Replication , Homologous Recombination , Humans , Mutation , Nucleotides , Sequence Deletion
18.
Bioinformatics ; 31(16): 2741-4, 2015 Aug 15.
Article in English | MEDLINE | ID: mdl-25861968

ABSTRACT

Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous works have attempted to combine different methods, but they still suffer from poor accuracy particularly for insertions. We propose MetaSV, an integrated SV caller which leverages multiple orthogonal SV signals for high accuracy and resolution. MetaSV proceeds by merging SVs from multiple tools for all types of SVs. It also analyzes soft-clipped reads from alignment to detect insertions accurately since existing tools underestimate insertion SVs. Local assembly in combination with dynamic programming is used to improve breakpoint resolution. Paired-end and coverage information is used to predict SV genotypes. Using simulation and experimental data, we demonstrate the effectiveness of MetaSV across various SV types and sizes. AVAILABILITY AND IMPLEMENTATION: Code in Python is at http://bioinform.github.io/metasv/. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Mutagenesis, Insertional , Sequence Deletion
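
The abstract of entry 18 above says MetaSV merges SV calls from multiple tools but does not describe the merging rule. One common way to decide whether two interval calls describe the same event is reciprocal overlap, and the sketch below uses that as a stand-in. It should not be read as MetaSV's actual algorithm: the `SV` record, the `merge_calls` function, the greedy clustering, and the 50% threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record; coordinates are half-open [start, end)."""
    chrom: str
    start: int
    end: int
    svtype: str
    tool: str


def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions; 0.0 if chromosome or type differ."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return 0.0
    if a.end <= a.start or b.end <= b.start:
        return 0.0  # zero-length calls (e.g. insertions) are not handled here
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def merge_calls(calls, min_ro=0.5):
    """Greedily cluster calls against each cluster's first member, then summarize."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start, c.end)):
        for cluster in clusters:
            if reciprocal_overlap(cluster[0], call) >= min_ro:
                cluster.append(call)
                break
        else:
            clusters.append([call])
    return [
        {
            "chrom": cluster[0].chrom,
            "svtype": cluster[0].svtype,
            "start": min(c.start for c in cluster),
            "end": max(c.end for c in cluster),
            "tools": sorted({c.tool for c in cluster}),
        }
        for cluster in clusters
    ]


# Hypothetical calls from three made-up tools.
calls = [
    SV("chr1", 1000, 2000, "DEL", "toolA"),
    SV("chr1", 1050, 1950, "DEL", "toolB"),
    SV("chr1", 5000, 5600, "DEL", "toolA"),
    SV("chr1", 1000, 2000, "DUP", "toolC"),
]

if __name__ == "__main__":
    for record in merge_calls(calls):
        print(record)
```

Interval overlap is a poor fit for insertions, which occupy a single reference position; the abstract notes that MetaSV handles them separately through soft-clipped reads, which this sketch does not attempt.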
19.
Bioinformatics ; 31(9): 1469-71, 2015 May 01.
Article in English | MEDLINE | ID: mdl-25524895

ABSTRACT

SUMMARY: VarSim is a framework for assessing alignment and variant calling accuracy in high-throughput genome sequencing through simulation or real data. In contrast to simulating a random mutation spectrum, it synthesizes diploid genomes with germline and somatic mutations based on a realistic model. This model leverages information such as previously reported mutations to make the synthetic genomes biologically relevant. VarSim simulates and validates a wide range of variants, including single nucleotide variants, small indels and large structural variants. It is an automated, comprehensive compute framework supporting parallel computation and multiple read simulators. Furthermore, we developed a novel map data structure to validate read alignments, a strategy to compare variants binned in size ranges and a lightweight, interactive, graphical report to visualize validation results with detailed statistics. Thus far, it is the most comprehensive validation tool for secondary analysis in next generation sequencing. AVAILABILITY AND IMPLEMENTATION: Code in Java and Python along with instructions to download the reads and variants is at http://bioinform.github.io/varsim. CONTACT: rd@bina.com SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genetic Variation , High-Throughput Nucleotide Sequencing/methods , Software , Computer Simulation , Genomics , Humans , Mutation , Neoplasms/genetics , Sequence Alignment
20.
Bioinformatics ; 28(18): 2366-73, 2012 Sep 15.
Article in English | MEDLINE | ID: mdl-22811546

ABSTRACT

MOTIVATION: Next-generation sequence analysis has become an important task both in laboratory and clinical settings. A key stage in the majority of sequence analysis workflows, such as resequencing, is the alignment of genomic reads to a reference genome. The accurate alignment of reads with large indels is a computationally challenging task for researchers. RESULTS: We introduce SeqAlto as a new algorithm for read alignment. For reads longer than or equal to 100 bp, SeqAlto is up to 10× faster than existing algorithms, while retaining high accuracy and the ability to align reads with large (up to 50 bp) indels. This improvement in efficiency is particularly important in the analysis of future sequencing data where the number of reads approaches many billions. Furthermore, SeqAlto uses less than 8 GB of memory to align against the human genome. SeqAlto is benchmarked against several existing tools with both real and simulated data. AVAILABILITY: Linux and Mac OS X binaries free for academic use are available at http://www.stanford.edu/group/wonglab/seqalto CONTACT: whwong@stanford.edu.


Subject(s)
High-Throughput Nucleotide Sequencing , Sequence Alignment/methods , Sequence Analysis, DNA , Software , Algorithms , Genome, Human , Genomics , Humans , INDEL Mutation