Results 1 - 20 of 58
1.
Mol Cell ; 77(6): 1307-1321.e10, 2020 03 19.
Article in English | MEDLINE | ID: mdl-31954095

ABSTRACT

A comprehensive catalog of cancer driver mutations is essential for understanding tumorigenesis and developing therapies. Exome-sequencing studies have mapped many protein-coding drivers, yet few non-coding drivers are known because genome-wide discovery is challenging. We developed a driver discovery method, ActiveDriverWGS, and analyzed 120,788 cis-regulatory modules (CRMs) across 1,844 whole tumor genomes from the ICGC-TCGA PCAWG project. We found 30 CRMs with enriched SNVs and indels (FDR < 0.05). These frequently mutated regulatory elements (FMREs) were ubiquitously active in human tissues, showed long-range chromatin interactions and mRNA abundance associations with target genes, and were enriched in motif-rewiring mutations and structural variants. Genomic deletion of one FMRE in human cells caused proliferative deficiencies and transcriptional deregulation of cancer genes CCNB1IP1, CDH1, and CDKN2B, validating observations in FMRE-mutated tumors. Pathway analysis revealed further sub-significant FMREs at cancer genes and processes, indicating an unexplored landscape of infrequent driver mutations in the non-coding genome.


Subjects
Biomarkers, Tumor/genetics; Chromatin/metabolism; Gene Regulatory Networks; Mutation; Neoplasms/genetics; Neoplasms/pathology; Regulatory Sequences, Nucleic Acid; Cell Proliferation; Chromatin/genetics; Computational Biology/methods; DNA Mutational Analysis; Genome, Human; HEK293 Cells; Humans
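
Note on the FDR < 0.05 threshold in the abstract above: the abstract does not say which multiple-testing correction was used. The sketch below assumes the Benjamini-Hochberg procedure, a common choice, and applies it to hypothetical per-element p-values that are not taken from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level alpha.

    Assumption: Benjamini-Hochberg is used here for illustration only;
    the abstract does not name the correction method.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff])


# Hypothetical p-values for six regulatory elements (illustrative only).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
significant = benjamini_hochberg(toy_p)
print(significant)
```

With thousands of tested elements, a procedure of this kind controls the expected share of false positives among the reported hits rather than the chance of any single false positive.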
2.
Hum Genet ; 142(2): 181-192, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36331656

ABSTRACT

Rapid advancements of genome sequencing (GS) technologies have enhanced our understanding of the relationship between genes and human disease. To incorporate genomic information into the practice of medicine, new processes for the analysis, reporting, and communication of GS data are needed. Blood samples were collected from adults with a PCR-confirmed SARS-CoV-2 (COVID-19) diagnosis (target N = 1500). GS was performed. Data were filtered and analyzed using custom pipelines and gene panels. We developed unique patient-facing materials, including an online intake survey, group counseling presentation, and consultation letters in addition to a comprehensive GS report. The final report includes results generated from GS data: (1) monogenic disease risks; (2) carrier status; (3) pharmacogenomic variants; (4) polygenic risk scores for common conditions; (5) HLA genotype; (6) genetic ancestry; (7) blood group; and, (8) COVID-19 viral lineage. Participants complete pre-test genetic counseling and confirm preferences for secondary findings before receiving results. Counseling and referrals are initiated for clinically significant findings. We developed a genetic counseling, reporting, and return of results framework that integrates GS information across multiple areas of human health, presenting possibilities for the clinical application of comprehensive GS data in healthy individuals.


Subjects
COVID-19; Genetic Counseling; Adult; Humans; COVID-19/epidemiology; COVID-19/genetics; SARS-CoV-2/genetics; Genomics/methods; Genotype
3.
Nat Methods ; 17(12): 1191-1199, 2020 12.
Article in English | MEDLINE | ID: mdl-33230324

ABSTRACT

Probing epigenetic features on DNA has tremendous potential to advance our understanding of the phased epigenome. In this study, we use nanopore sequencing to evaluate CpG methylation and chromatin accessibility simultaneously on long strands of DNA by applying GpC methyltransferase to exogenously label open chromatin. We performed nanopore sequencing of nucleosome occupancy and methylome (nanoNOMe) on four human cell lines (GM12878, MCF-10A, MCF-7 and MDA-MB-231). The single-molecule resolution allows footprinting of protein and nucleosome binding, and determination of the combinatorial promoter epigenetic signature on individual molecules. Long-read sequencing makes it possible to robustly assign reads to haplotypes, allowing us to generate a fully phased human epigenome, consisting of chromosome-level allele-specific profiles of CpG methylation and chromatin accessibility. We further apply this to a breast cancer model to evaluate differential methylation and accessibility between cancerous and noncancerous cells.


Subjects
Breast Neoplasms/genetics; Chromatin/genetics; DNA Methylation/genetics; Nanopore Sequencing/methods; Cell Line, Tumor; CpG Islands/genetics; DNA/metabolism; Epigenome/genetics; Female; Genome, Human/genetics; Humans; MCF-7 Cells; Methyltransferases/metabolism; Promoter Regions, Genetic/genetics; Sequence Analysis, DNA
5.
Nat Methods ; 16(5): 429-436, 2019 05.
Article in English | MEDLINE | ID: mdl-31011185

ABSTRACT

Replication of eukaryotic genomes is highly stochastic, making it difficult to determine the replication dynamics of individual molecules with existing methods. We report a sequencing method for the measurement of replication fork movement on single molecules by detecting nucleotide analog signal currents on extremely long nanopore traces (D-NAscent). Using this method, we detect 5-bromodeoxyuridine (BrdU) incorporated by Saccharomyces cerevisiae to reveal, at a genomic scale and on single molecules, the DNA sequences replicated during a pulse-labeling period. Under conditions of limiting BrdU concentration, D-NAscent detects the differences in BrdU incorporation frequency across individual molecules to reveal the location of active replication origins, fork direction, termination sites, and fork pausing/stalling events. We used sequencing reads of 20-160 kilobases to generate a whole-genome single-molecule map of DNA replication dynamics and discover a class of low-frequency stochastic origins in budding yeast. The D-NAscent software is available at https://github.com/MBoemo/DNAscent.git .


Subjects
DNA Replication; Genome, Fungal; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Nanopores; Saccharomyces cerevisiae/genetics; Bromodeoxyuridine/metabolism; DNA, Fungal/genetics; Genome; Software
6.
Nat Methods ; 16(12): 1297-1305, 2019 12.
Article in English | MEDLINE | ID: mdl-31740818

ABSTRACT

High-throughput complementary DNA sequencing technologies have advanced our understanding of transcriptome complexity and regulation. However, these methods lose information contained in biological RNA because the copied reads are often short and modifications are not retained. We address these limitations using a native poly(A) RNA sequencing strategy developed by Oxford Nanopore Technologies. Our study generated 9.9 million aligned sequence reads for the human cell line GM12878, using thirty MinION flow cells at six institutions. These native RNA reads had a median length of 771 bases, and a maximum aligned length of over 21,000 bases. Mitochondrial poly(A) reads provided an internal measure of read-length quality. We combined these long nanopore reads with higher accuracy short-reads and annotated GM12878 promoter regions to identify 33,984 plausible RNA isoforms. We describe strategies for assessing 3' poly(A) tail length, base modifications and transcript haplotypes.


Subjects
Nanopore Sequencing/methods; Poly A/genetics; Sequence Analysis, RNA/methods; Transcriptome; Cells, Cultured; Humans
7.
Nature ; 538(7625): 378-382, 2016 Oct 20.
Article in English | MEDLINE | ID: mdl-27732578

ABSTRACT

Pancreatic cancer, a highly aggressive tumour type with uniformly poor prognosis, exemplifies the classically held view of stepwise cancer development. The current model of tumorigenesis, based on analyses of precursor lesions, termed pancreatic intraepithelial neoplasm (PanINs) lesions, makes two predictions: first, that pancreatic cancer develops through a particular sequence of genetic alterations (KRAS, followed by CDKN2A, then TP53 and SMAD4); and second, that the evolutionary trajectory of pancreatic cancer progression is gradual because each alteration is acquired independently. A shortcoming of this model is that clonally expanded precursor lesions do not always belong to the tumour lineage, indicating that the evolutionary trajectory of the tumour lineage and precursor lesions can be divergent. This prevailing model of tumorigenesis has contributed to the clinical notion that pancreatic cancer evolves slowly and presents at a late stage. However, the propensity for this disease to rapidly metastasize and the inability to improve patient outcomes, despite efforts aimed at early detection, suggest that pancreatic cancer progression is not gradual. Here, using newly developed informatics tools, we tracked changes in DNA copy number and their associated rearrangements in tumour-enriched genomes and found that pancreatic cancer tumorigenesis is neither gradual nor follows the accepted mutation order. Two-thirds of tumours harbour complex rearrangement patterns associated with mitotic errors, consistent with punctuated equilibrium as the principal evolutionary trajectory. In a subset of cases, the consequence of such errors is the simultaneous, rather than sequential, knockout of canonical preneoplastic genetic drivers that are likely to set-off invasive cancer growth. These findings challenge the current progression model of pancreatic cancer and provide insights into the mutational processes that give rise to these aggressive tumours.


Subjects
Carcinogenesis/genetics; Carcinogenesis/pathology; Gene Rearrangement/genetics; Genome, Human/genetics; Models, Biological; Mutagenesis/genetics; Pancreatic Neoplasms/genetics; Pancreatic Neoplasms/pathology; Carcinoma in Situ/genetics; Chromothripsis; DNA Copy Number Variations/genetics; Disease Progression; Evolution, Molecular; Female; Genes, Neoplasm/genetics; Humans; Male; Mitosis/genetics; Mutation/genetics; Neoplasm Invasiveness/genetics; Neoplasm Invasiveness/pathology; Neoplasm Metastasis/genetics; Neoplasm Metastasis/pathology; Polyploidy; Precancerous Conditions/genetics
8.
Nature ; 530(7589): 228-232, 2016 Feb 11.
Article in English | MEDLINE | ID: mdl-26840485

ABSTRACT

The Ebola virus disease epidemic in West Africa is the largest on record, responsible for over 28,599 cases and more than 11,299 deaths. Genome sequencing in viral outbreaks is desirable to characterize the infectious agent and determine its evolutionary rate. Genome sequencing also allows the identification of signatures of host adaptation, identification and monitoring of diagnostic targets, and characterization of responses to vaccines and treatments. The Ebola virus (EBOV) genome substitution rate in the Makona strain has been estimated at between 0.87 × 10⁻³ and 1.42 × 10⁻³ mutations per site per year. This is equivalent to 16-27 mutations in each genome, meaning that sequences diverge rapidly enough to identify distinct sub-lineages during a prolonged epidemic. Genome sequencing provides a high-resolution view of pathogen evolution and is increasingly sought after for outbreak surveillance. Sequence data may be used to guide control measures, but only if the results are generated quickly enough to inform interventions. Genomic surveillance during the epidemic has been sporadic owing to a lack of local sequencing capacity coupled with practical difficulties transporting samples to remote sequencing facilities. To address this problem, here we devise a genomic surveillance system that utilizes a novel nanopore DNA sequencing instrument. In April 2015 this system was transported in standard airline luggage to Guinea and used for real-time genomic surveillance of the ongoing epidemic. We present sequence data and analysis of 142 EBOV samples collected during the period March to October 2015. We were able to generate results less than 24 h after receiving an Ebola-positive sample, with the sequencing process taking as little as 15-60 min. We show that real-time genomic surveillance is possible in resource-limited settings and can be established rapidly to monitor outbreaks.


Subjects
Ebolavirus/genetics; Epidemiological Monitoring; Genome, Viral/genetics; Hemorrhagic Fever, Ebola/epidemiology; Hemorrhagic Fever, Ebola/virology; Sequence Analysis, DNA/instrumentation; Sequence Analysis, DNA/methods; Aircraft; Disease Outbreaks/statistics & numerical data; Ebolavirus/classification; Ebolavirus/pathogenicity; Guinea/epidemiology; Humans; Mutagenesis/genetics; Mutation Rate; Time Factors
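
Note on the arithmetic in the abstract above: the step from a per-site substitution rate to "16-27 mutations in each genome" can be checked by multiplying the rate by the genome length. The abstract does not state the genome length; the sketch assumes roughly 19 kb (18,959 nt) and a one-year interval.

```python
# Assumption: EBOV genome length of 18,959 nt (about 19 kb); the abstract
# gives only the per-site rates, not the length used.
GENOME_LENGTH_NT = 18_959

# Substitution-rate bounds quoted in the abstract (per site per year).
RATE_LOW = 0.87e-3
RATE_HIGH = 1.42e-3


def expected_mutations_per_year(rate, genome_length=GENOME_LENGTH_NT):
    """Expected substitutions per genome per year for a per-site rate."""
    return rate * genome_length


low = expected_mutations_per_year(RATE_LOW)
high = expected_mutations_per_year(RATE_HIGH)
print(f"{low:.1f} to {high:.1f} substitutions per genome per year")
```

Under these assumptions the bounds round to 16 and 27, matching the range quoted in the abstract.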
9.
BMC Bioinformatics ; 21(1): 343, 2020 Aug 05.
Article in English | MEDLINE | ID: mdl-32758139

ABSTRACT

BACKGROUND: Nanopore sequencing enables portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these outcomes requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. However, comparing raw nanopore signals to a biological reference sequence is a computationally complex task. The dynamic programming algorithm called Adaptive Banded Event Alignment (ABEA) is a crucial step in polishing sequencing data and identifying non-standard nucleotides, such as measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures. RESULTS: By optimising memory, computations and load balancing between CPU and GPU, we demonstrate how f5c can perform ∼3-5 × faster than an optimised version of the original CPU-only implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs. CONCLUSIONS: Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at https://github.com/hasindu2008/f5c .


Subjects
Computer Graphics; Nanopores; Signal Processing, Computer-Assisted; Algorithms; Computational Biology; Databases as Topic; Genome, Human; Humans; Sequence Analysis
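
Note on banded dynamic programming, the idea behind the ABEA algorithm named in the abstract above: restricting the alignment matrix to a band around the diagonal cuts the work from O(n·m) to roughly O(n·band). The sketch below is a simplified stand-in, a fixed-band edit distance on plain strings. It is not ABEA itself, which aligns signal events to k-mers and moves the band adaptively, and it is not code from f5c or Nanopolish.

```python
def banded_edit_distance(a, b, band):
    """Edit distance restricted to cells with |i - j| <= band.

    Returns None when the final cell lies outside the band.
    Simplified illustration with a fixed band, not the adaptive
    banding used by ABEA.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None
    inf = float("inf")
    # Row 0: turning an empty prefix of a into b[:j] costs j insertions.
    prev = {j: j for j in range(0, min(m, band) + 1)}
    for i in range(1, n + 1):
        cur = {}
        # Only visit columns that fall inside the band for this row.
        for j in range(max(0, i - band), min(m, i + band) + 1):
            if j == 0:
                cur[j] = i  # i deletions
                continue
            substitute = prev.get(j - 1, inf) + (a[i - 1] != b[j - 1])
            delete = prev.get(j, inf) + 1
            insert = cur.get(j - 1, inf) + 1
            cur[j] = min(substitute, delete, insert)
        prev = cur
    return prev[m]


print(banded_edit_distance("kitten", "sitting", band=2))
```

A band that is too narrow can exclude the optimal path and overestimate the distance, which is one reason an adaptive band is attractive for noisy signal data.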
10.
Genome Res ; 27(2): 300-309, 2017 02.
Article in English | MEDLINE | ID: mdl-27986821

ABSTRACT

We are rapidly approaching the point where we have sequenced millions of human genomes. There is a pressing need for new data structures to store raw sequencing data and efficient algorithms for population scale analysis. Current reference-based data formats do not fully exploit the redundancy in population sequencing nor take advantage of shared genetic variation. In recent years, the Burrows-Wheeler transform (BWT) and FM-index have been widely employed as a full-text searchable index for read alignment and de novo assembly. We introduce the concept of a population BWT and use it to store and index the sequencing reads of 2705 samples from the 1000 Genomes Project. A key feature is that, as more genomes are added, identical read sequences are increasingly observed, and compression becomes more efficient. We assess the support in the 1000 Genomes read data for every base position of two human reference assembly versions, identifying that 3.2 Mbp with population support was lost in the transition from GRCh37 with 13.7 Mbp added to GRCh38. We show that the vast majority of variant alleles can be uniquely described by overlapping 31-mers and show how rapid and accurate SNP and indel genotyping can be carried out across the genomes in the population BWT. We use the population BWT to carry out nonreference queries to search for the presence of all known viral genomes and discover human T-lymphotropic virus 1 integrations in six samples in a recognized epidemiological distribution.


Subjects
Genome, Human/genetics; Genomics; Sequence Alignment/methods; Whole Genome Sequencing/methods; Alleles; Data Compression; Genotype; Humans; INDEL Mutation/genetics; Sequence Analysis, DNA; Software
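
Note on the BWT and FM-index mentioned in the abstract above: the core query is a backward search that counts how often a pattern occurs in the indexed text. The sketch below is a toy version for a single short string, built with sorted rotations and naive counting. It is not the population BWT implementation, which indexes very large read sets with compressed data structures, and the example sequence is invented.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    text += "$"  # sentinel that sorts before A, C, G, T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count occurrences of pattern using FM-index backward search."""
    # first[c] = number of characters in the text strictly smaller than c.
    first, total = {}, 0
    for c in sorted(set(bwt_str)):
        first[c] = total
        total += bwt_str.count(c)

    def occ(c, i):
        # Occurrences of c in bwt_str[:i]; a real index precomputes this.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # half-open interval of matching suffixes
    for c in reversed(pattern):
        if c not in first:
            return 0
        lo = first[c] + occ(c, lo)
        hi = first[c] + occ(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


index = bwt("GATTACAGATTACA")  # invented example sequence
print(count_occurrences(index, "ATTA"))
```

The same counting query, run with 31-mers that span a variant site, is the kind of lookup that genotyping over an index of this type relies on.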
11.
Genome Res ; 27(5): 849-864, 2017 05.
Article in English | MEDLINE | ID: mdl-28396521

ABSTRACT

The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.


Subjects
Contig Mapping/methods; Genome, Human; Genomics/methods; Sequence Analysis, DNA/methods; Software; Contig Mapping/standards; Genomics/standards; Haploidy; Haplotypes; Humans; Polymorphism, Genetic; Reference Standards; Sequence Analysis, DNA/standards
12.
Nat Methods ; 14(4): 407-410, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28218898

ABSTRACT

In nanopore sequencing devices, electrolytic current signals are sensitive to base modifications, such as 5-methylcytosine (5-mC). Here we quantified the strength of this effect for the Oxford Nanopore Technologies MinION sequencer. By using synthetically methylated DNA, we were able to train a hidden Markov model to distinguish 5-mC from unmethylated cytosine. We applied our method to sequence the methylome of human DNA, without requiring special steps for library preparation.


Subjects
5-Methylcytosine/analysis; Cytosine/metabolism; DNA Methylation; Genome, Human; Cell Line, Tumor; CpG Islands; Cytosine/analysis; Escherichia coli/genetics; Humans; Markov Chains; Nanopores
13.
Annu Rev Genomics Hum Genet ; 16: 153-72, 2015.
Article in English | MEDLINE | ID: mdl-25939056

ABSTRACT

The current genomic revolution was made possible by joint advances in genome sequencing technologies and computational approaches for analyzing sequence data. The close interaction between biologists and computational scientists is perhaps most apparent in the development of approaches for sequencing entire genomes, a feat that would not be possible without sophisticated computational tools called genome assemblers (short for genome sequence assemblers). Here, we survey the key developments in algorithms for assembling genome sequences since the development of the first DNA sequencing methods more than 35 years ago.


Subjects
Algorithms; Genomics/methods; Sequence Analysis, DNA/methods; Chromosomes, Artificial, Bacterial; Cloning, Molecular; Computer Graphics; Genome; Humans
15.
Nat Methods ; 12(8): 733-5, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26076426

ABSTRACT

We have assembled de novo the Escherichia coli K-12 MG1655 chromosome in a single 4.6-Mb contig using only nanopore data. Our method has three stages: (i) overlaps are detected between reads and then corrected by a multiple-alignment process; (ii) corrected reads are assembled using the Celera Assembler; and (iii) the assembly is polished using a probabilistic model of the signal-level data. The assembly reconstructs gene order and has 99.5% nucleotide identity.


Subjects
Computational Biology/methods; Escherichia coli K12/genetics; Genome, Bacterial; Nanopores; Nanotechnology/methods; Sequence Analysis, DNA/methods; Algorithms; Contig Mapping/methods; High-Throughput Nucleotide Sequencing/methods; Reproducibility of Results; Software
16.
Bioinformatics ; 33(1): 49-55, 2017 01 01.
Article in English | MEDLINE | ID: mdl-27614348

ABSTRACT

MOTIVATION: The highly portable Oxford Nanopore MinION sequencer has enabled new applications of genome sequencing directly in the field. However, the MinION currently relies on a cloud computing platform, Metrichor (metrichor.com), for translating locally generated sequencing data into basecalls. RESULTS: To allow offline and private analysis of MinION data, we created Nanocall. Nanocall is the first freely available, open-source basecaller for Oxford Nanopore sequencing data and does not require an internet connection. Using R7.3 chemistry, on two E. coli and two human samples, with natural as well as PCR-amplified DNA, Nanocall reads have ∼68% identity, directly comparable to Metrichor '1D' data. Further, Nanocall is efficient, processing ∼2500 Kbp of sequence per core hour using the fastest settings, and fully parallelized. Using a 4 core desktop computer, Nanocall could basecall a MinION sequencing run in real time. Metrichor provides the ability to integrate the '1D' sequencing of template and complement strands of a single DNA molecule, and create a '2D' read. Nanocall does not currently integrate this technology, and addition of this capability will be an important future development. In summary, Nanocall is the first open-source, freely available, off-line basecaller for Oxford Nanopore sequencing data. AVAILABILITY AND IMPLEMENTATION: Nanocall is available at github.com/mateidavid/nanocall, released under the MIT license. CONTACT: matei.david@oicr.on.ca. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
DNA/analysis; Sequence Analysis, DNA/methods; Software; Escherichia coli/genetics; Humans; Polymerase Chain Reaction
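
Note on the throughput figures in the abstract above: about 2500 Kbp per core hour on a 4-core desktop implies about 10 Mbp per hour, provided throughput scales linearly with cores. The abstract says Nanocall is fully parallelized but does not give a scaling curve, so linear scaling is an assumption here, and the sequencer output rate in the example is hypothetical.

```python
# Figure quoted in the abstract (fastest settings).
KBP_PER_CORE_HOUR = 2500


def basecall_capacity_kbp_per_hour(cores, per_core=KBP_PER_CORE_HOUR):
    """Aggregate basecalling capacity, assuming linear scaling with cores."""
    return cores * per_core


def keeps_up(sequencer_kbp_per_hour, cores):
    """True if basecalling capacity meets or exceeds sequencer output."""
    return basecall_capacity_kbp_per_hour(cores) >= sequencer_kbp_per_hour


capacity = basecall_capacity_kbp_per_hour(4)
print(f"4 cores: about {capacity / 1000:.0f} Mbp per hour")

# Hypothetical sequencer output rate, chosen only to exercise the check.
print(keeps_up(sequencer_kbp_per_hour=8000, cores=4))
```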
17.
Nature ; 483(7388): 169-75, 2012 Mar 07.
Article in English | MEDLINE | ID: mdl-22398555

ABSTRACT

Gorillas are humans' closest living relatives after chimpanzees, and are of comparable importance for the study of human origins and evolution. Here we present the assembly and analysis of a genome sequence for the western lowland gorilla, and compare the whole genomes of all extant great ape genera. We propose a synthesis of genetic and fossil evidence consistent with placing the human-chimpanzee and human-chimpanzee-gorilla speciation events at approximately 6 and 10 million years ago. In 30% of the genome, gorilla is closer to human or chimpanzee than the latter are to each other; this is rarer around coding genes, indicating pervasive selection throughout great ape evolution, and has functional consequences in gene expression. A comparison of protein coding genes reveals approximately 500 genes showing accelerated evolution on each of the gorilla, human and chimpanzee lineages, and evidence for parallel acceleration, particularly of genes involved in hearing. We also compare the western and eastern gorilla species, estimating an average sequence divergence time 1.75 million years ago, but with evidence for more recent genetic exchange and a population bottleneck in the eastern species. The use of the genome sequence in these and future analyses will promote a deeper understanding of great ape biology and evolution.


Subjects
Evolution, Molecular; Genetic Speciation; Genome/genetics; Gorilla gorilla/genetics; Animals; Female; Gene Expression Regulation; Genetic Variation/genetics; Genomics; Humans; Macaca mulatta/genetics; Molecular Sequence Data; Pan troglodytes/genetics; Phylogeny; Pongo/genetics; Proteins/genetics; Sequence Alignment; Species Specificity; Transcription, Genetic
19.
Mol Biol Evol ; 31(4): 872-88, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24425782

ABSTRACT

The question of how genetic variation in a population influences phenotypic variation and evolution is of major importance in modern biology. Yet much is still unknown about the relative functional importance of different forms of genome variation and how they are shaped by evolutionary processes. Here we address these questions by population level sequencing of 42 strains from the budding yeast Saccharomyces cerevisiae and its closest relative S. paradoxus. We find that genome content variation, in the form of presence or absence as well as copy number of genetic material, is higher within S. cerevisiae than within S. paradoxus, despite genetic distances as measured in single-nucleotide polymorphisms being vastly smaller within the former species. This genome content variation, as well as loss-of-function variation in the form of premature stop codons and frameshifting indels, is heavily enriched in the subtelomeres, strongly reinforcing the relevance of these regions to functional evolution. Genes affected by these likely functional forms of variation are enriched for functions mediating interaction with the external environment (sugar transport and metabolism, flocculation, metal transport, and metabolism). Our results and analyses provide a comprehensive view of genomic diversity in budding yeast and expose surprising and pronounced differences between the variation within S. cerevisiae and that within S. paradoxus. We also believe that the sequence data and de novo assemblies will constitute a useful resource for further evolutionary and population genomics studies.


Subjects
Genes, Fungal; Saccharomyces cerevisiae/genetics; Arsenites/pharmacology; DNA Copy Number Variations; Drug Resistance, Fungal/genetics; Evolution, Molecular; Genetic Linkage; Genetic Speciation; Genome, Fungal; Molecular Sequence Annotation; Multigene Family; Phylogeny; Polymorphism, Single Nucleotide; Saccharomyces cerevisiae/drug effects; Saccharomyces cerevisiae/growth & development; Sequence Analysis, DNA; Sodium Compounds/pharmacology
20.
Genome Res ; 22(3): 549-56, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22156294

ABSTRACT

De novo genome sequence assembly is important both to generate new sequence assemblies for previously uncharacterized genomes and to identify the genome sequence of individuals in a reference-unbiased way. We present memory efficient data structures and algorithms for assembly using the FM-index derived from the compressed Burrows-Wheeler transform, and a new assembler based on these called SGA (String Graph Assembler). We describe algorithms to error-correct, assemble, and scaffold large sets of sequence data. SGA uses the overlap-based string graph model of assembly, unlike most de novo assemblers that rely on de Bruijn graphs, and is simply parallelizable. We demonstrate the error correction and assembly performance of SGA on 1.2 billion sequence reads from a human genome, which we are able to assemble using 54 GB of memory. The resulting contigs are highly accurate and contiguous, while covering 95% of the reference genome (excluding contigs <200 bp in length). Because of the low memory requirements and parallelization without requiring inter-process communication, SGA provides the first practical assembler to our knowledge for a mammalian-sized genome on a low-end computing cluster.


Subjects
Genomics/methods; Sequence Analysis, DNA/methods; Software; Algorithms; Animals; Computational Biology/methods; Data Compression; Humans; Internet; Reproducibility of Results
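
Note on the overlap-based string graph model named in the abstract above: reads become nodes, and an edge joins two reads when a suffix of one matches a prefix of the other. The sketch below finds those overlaps by naive all-against-all comparison on three invented reads. SGA itself computes overlaps with the FM-index, so this shows the graph model only, not SGA's algorithm or performance.

```python
def longest_overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def overlap_graph(reads, min_len=3):
    """Map (i, j) to overlap length for every ordered pair that overlaps.

    Naive O(n^2) comparison for illustration; an FM-index avoids
    comparing every pair explicitly.
    """
    edges = {}
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                k = longest_overlap(a, b, min_len)
                if k:
                    edges[(i, j)] = k
    return edges


toy_reads = ["ACGTAC", "GTACGA", "ACGATT"]  # invented reads
edges = overlap_graph(toy_reads)
print(edges)
```

Following the edges 0 → 1 → 2 and merging each overlap spells out the contig ACGTACGATT for these toy reads.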