1.
Nature ; 630(8016): 401-411, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38811727

ABSTRACT

Apes possess two sex chromosomes: the male-specific Y chromosome and the X chromosome, which is present in both males and females. The Y chromosome is crucial for male reproduction, with deletions being linked to infertility [1]. The X chromosome is vital for reproduction and cognition [2]. Variation in mating patterns and brain function among apes suggests corresponding differences in their sex chromosomes. However, owing to their repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the methodology developed for the telomere-to-telomere (T2T) human genome, we produced gapless assemblies of the X and Y chromosomes for five great apes (bonobo (Pan paniscus), chimpanzee (Pan troglodytes), western lowland gorilla (Gorilla gorilla gorilla), Bornean orangutan (Pongo pygmaeus) and Sumatran orangutan (Pongo abelii)) and a lesser ape (the siamang gibbon (Symphalangus syndactylus)), and untangled the intricacies of their evolution. Compared with the X chromosomes, the ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements, owing to the accumulation of lineage-specific ampliconic regions, palindromes, transposable elements and satellites. Many Y chromosome genes expand in multi-copy families and some evolve under purifying selection. Thus, the Y chromosome exhibits dynamic evolution, whereas the X chromosome is more stable. Mapping short-read sequencing data to these assemblies revealed diversity and selection patterns on sex chromosomes of more than 100 individual great apes. These reference assemblies are expected to inform human evolution and conservation genetics of non-human apes, all of which are endangered species.


Subject(s)
Hominidae , X Chromosome , Y Chromosome , Animals , Female , Male , Gorilla gorilla/genetics , Hominidae/genetics , Hominidae/classification , Hylobatidae/genetics , Pan paniscus/genetics , Pan troglodytes/genetics , Phylogeny , Pongo abelii/genetics , Pongo pygmaeus/genetics , Telomere/genetics , X Chromosome/genetics , Y Chromosome/genetics , Evolution, Molecular , DNA Copy Number Variations/genetics , Humans , Endangered Species , Reference Standards
2.
Nature ; 621(7978): 344-354, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.


Subject(s)
Chromosomes, Human, Y , Genomics , Sequence Analysis, DNA , Humans , Base Sequence , Chromosomes, Human, Y/genetics , DNA, Satellite/genetics , Genetic Variation/genetics , Genetics, Population , Genomics/methods , Genomics/standards , Heterochromatin/genetics , Multigene Family/genetics , Reference Standards , Segmental Duplications, Genomic/genetics , Sequence Analysis, DNA/standards , Tandem Repeat Sequences/genetics , Telomere/genetics
3.
Nature ; 611(7936): 519-531, 2022 Nov.
Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.


Subject(s)
Chromosome Mapping , Diploidy , Genome, Human , Genomics , Humans , Chromosome Mapping/standards , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Reference Standards , Genomics/methods , Genomics/standards , Chromosomes, Human/genetics , Genetic Variation/genetics
4.
Nat Methods ; 20(8): 1213-1221, 2023 08.
Article in English | MEDLINE | ID: mdl-37365340

ABSTRACT

Advancements in sequencing technologies and assembly methods enable the regular production of high-quality genome assemblies characterizing complex regions. However, challenges remain in efficiently interpreting variation at various scales, from smaller tandem repeats to megabase rearrangements, across many human genomes. We present a PanGenome Research Tool Kit (PGR-TK) enabling analyses of complex pangenome structural and haplotype variation at multiple scales. We apply the graph decomposition methods in PGR-TK to the class II major histocompatibility complex demonstrating the importance of the human pangenome for analyzing complicated regions. Moreover, we investigate the Y-chromosome genes, DAZ1/DAZ2/DAZ3/DAZ4, of which structural variants have been linked to male infertility, and X-chromosome genes OPN1LW and OPN1MW linked to eye disorders. We further showcase PGR-TK across 395 complex repetitive medically important genes. This highlights the power of PGR-TK to resolve complex variation in regions of the genome that were previously too complex to analyze.


Subject(s)
Genome, Human , Genomics , Male , Humans , Major Histocompatibility Complex
5.
Nature ; 546(7659): 524-527, 2017 06 22.
Article in English | MEDLINE | ID: mdl-28605751

ABSTRACT

Complete and accurate reference genomes and annotations provide fundamental tools for characterization of genetic and functional variation. These resources facilitate the determination of biological processes and support translation of research findings into improved and sustainable agricultural technologies. Many reference genomes for crop plants have been generated over the past decade, but these genomes are often fragmented and missing complex repeat regions. Here we report the assembly and annotation of a reference genome of maize, a genetic and agricultural model species, using single-molecule real-time sequencing and high-resolution optical mapping. Relative to the previous reference genome, our assembly features a 52-fold increase in contig length and notable improvements in the assembly of intergenic spaces and centromeres. Characterization of the repetitive portion of the genome revealed more than 130,000 intact transposable elements, allowing us to identify transposable element lineage expansions that are unique to maize. Gene annotations were updated using 111,000 full-length transcripts obtained by single-molecule real-time sequencing. In addition, comparative optical mapping of two other inbred maize lines revealed a prevalence of deletions in regions of low gene density and maize lineage-specific genes.


Subject(s)
Genome, Plant/genetics , High-Throughput Nucleotide Sequencing/methods , Single Molecule Imaging/methods , Zea mays/genetics , Centromere/genetics , Chromosomes, Plant/genetics , Contig Mapping , Crops, Agricultural/genetics , DNA Transposable Elements/genetics , DNA, Intergenic/genetics , Genes, Plant/genetics , Molecular Sequence Annotation , Optics and Photonics , Phylogeny , RNA, Messenger/analysis , RNA, Messenger/genetics , Reference Standards , Sorghum/genetics
6.
Bioinformatics ; 37(3): 413-415, 2021 04 20.
Article in English | MEDLINE | ID: mdl-32766814

ABSTRACT

SUMMARY: Ribbon is an alignment visualization tool that shows how alignments are positioned within both the reference and read contexts, giving an intuitive view that enables a better understanding of structural variants and the read evidence supporting them. Ribbon was born out of a need to curate complex structural variant calls and determine whether each was well supported by long-read evidence, and it uses the same intuitive visualization method to shed light on contig alignments from genome-to-genome comparisons. AVAILABILITY AND IMPLEMENTATION: Ribbon is freely available online at http://genomeribbon.com/ and is open-source at https://github.com/marianattestad/ribbon. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genomics , Software , Genome
7.
Kidney Int ; 99(1): 186-197, 2021 01.
Article in English | MEDLINE | ID: mdl-32781106

ABSTRACT

Although the gold standard of monitoring kidney transplant function relies on glomerular filtration rate (GFR), little is known about GFR trajectories after transplantation, their determinants, and their association with outcomes. To evaluate these parameters we examined kidney transplant recipients receiving care at 15 academic centers. Patients underwent prospective monitoring of estimated GFR (eGFR) measurements, with assessment of clinical, functional, histological and immunological parameters. Additional validation took place in seven randomized controlled trials that included a total of 14,132 patients with 403,497 eGFR measurements. After a median follow-up of 6.5 years, 1,688 patients developed end-stage kidney disease. Using unsupervised latent class mixed models, we identified eight distinct eGFR trajectories. Multinomial regression models identified seven significant determinants of eGFR trajectories including donor age, eGFR, proteinuria, and several significant histological features: graft scarring, graft interstitial inflammation and tubulitis, microcirculation inflammation, and circulating anti-HLA donor specific antibodies. The eGFR trajectories were associated with progression to end-stage kidney disease. These trajectories, their determinants and respective associations with end-stage kidney disease were similar across cohorts, as well as in diverse clinical scenarios, therapeutic eras and in the seven randomized controlled trials. Thus, our results provide the basis for a trajectory-based assessment of kidney transplant patients for risk stratification and monitoring.


Subject(s)
Kidney Failure, Chronic , Kidney Transplantation , Glomerular Filtration Rate , Humans , Kidney Failure, Chronic/diagnosis , Kidney Failure, Chronic/surgery , Kidney Transplantation/adverse effects , Prospective Studies
8.
Genome Res ; 28(8): 1126-1135, 2018 08.
Article in English | MEDLINE | ID: mdl-29954844

ABSTRACT

The SK-BR-3 cell line is one of the most important models for HER2+ breast cancers, which affect one in five breast cancer patients. SK-BR-3 is known to be highly rearranged, although much of the variation is in complex and repetitive regions that may be underreported. Addressing this, we sequenced SK-BR-3 using long-read single molecule sequencing from Pacific Biosciences and developed one of the most detailed maps of structural variations (SVs) in a cancer genome available, with nearly 20,000 variants present, most of which were missed by short-read sequencing. Surrounding the important ERBB2 oncogene (also known as HER2), we discover a complex sequence of nested duplications and translocations, suggesting a punctuated progression. Full-length transcriptome sequencing further revealed several novel gene fusions within the nested genomic variants. Combining long-read genome and transcriptome sequencing enables an in-depth analysis of how SVs disrupt the genome and sheds new light on the complex mechanisms involved in cancer genome evolution.


Subject(s)
Breast Neoplasms/genetics , Gene Amplification/genetics , Gene Rearrangement/genetics , Oncogenes/genetics , Breast Neoplasms/pathology , Female , Genome, Human , Genomic Structural Variation , High-Throughput Nucleotide Sequencing , Humans , MCF-7 Cells , Receptor, ErbB-2/genetics , Repetitive Sequences, Nucleic Acid/genetics , Transcriptome/genetics
9.
Genome Res ; 27(5): 849-864, 2017 05.
Article in English | MEDLINE | ID: mdl-28396521

ABSTRACT

The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.


Subject(s)
Contig Mapping/methods , Genome, Human , Genomics/methods , Sequence Analysis, DNA/methods , Software , Contig Mapping/standards , Genomics/standards , Haploidy , Haplotypes , Humans , Polymorphism, Genetic , Reference Standards , Sequence Analysis, DNA/standards
10.
Nat Methods ; 13(12): 1050-1054, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27749838

ABSTRACT

While genome assembly projects have been successful in many haploid and inbred species, the assembly of noninbred or rearranged heterozygous genomes remains a major challenge. To address this challenge, we introduce the open-source FALCON and FALCON-Unzip algorithms (https://github.com/PacificBiosciences/FALCON/) to assemble long-read sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. We generate new reference sequences for heterozygous samples including an F1 hybrid of Arabidopsis thaliana, the widely cultivated Vitis vinifera cv. Cabernet Sauvignon, and the coral fungus Clavicorona pyxidata, samples that have challenged short-read assembly approaches. The FALCON-based assemblies are substantially more contiguous and complete than alternate short- or long-read approaches. The phased diploid assembly enabled the study of haplotype structure and heterozygosities between homologous chromosomes, including the identification of widespread heterozygous structural variation within coding sequences.


Subject(s)
Diploidy , Genome, Fungal/genetics , Genome, Plant/genetics , Genomics/methods , Polymorphism, Single Nucleotide/genetics , Algorithms , Arabidopsis/genetics , Basidiomycota/genetics , DNA, Fungal/genetics , DNA, Plant/genetics , Haplotypes , Heterozygote , Humans , Sequence Analysis, DNA , Vitis/genetics
11.
Blood ; 130(1): 48-58, 2017 07 06.
Article in English | MEDLINE | ID: mdl-28490572

ABSTRACT

Genomic studies have revealed significant branching heterogeneity in cancer. Studies of resistance to tyrosine kinase inhibitor therapy have not fully reflected this heterogeneity because resistance in individual patients has been ascribed to largely mutually exclusive on-target or off-target mechanisms in which tumors either retain dependency on the target oncogene or subvert it through a parallel pathway. Using targeted sequencing from single cells and colonies from patient samples, we demonstrate tremendous clonal diversity in the majority of acute myeloid leukemia (AML) patients with activating FLT3 internal tandem duplication mutations at the time of acquired resistance to the FLT3 inhibitor quizartinib. These findings establish that clinical resistance to quizartinib is highly complex and reflects the underlying clonal heterogeneity of AML.


Subject(s)
Benzothiazoles/administration & dosage , Drug Resistance, Neoplasm , High-Throughput Nucleotide Sequencing , INDEL Mutation , Leukemia, Myeloid, Acute , Phenylurea Compounds/administration & dosage , fms-Like Tyrosine Kinase 3/genetics , Drug Resistance, Neoplasm/drug effects , Drug Resistance, Neoplasm/genetics , Female , Humans , Leukemia, Myeloid, Acute/drug therapy , Leukemia, Myeloid, Acute/genetics , Male
12.
PLoS Genet ; 12(4): e1005954, 2016 Apr.
Article in English | MEDLINE | ID: mdl-27082250

ABSTRACT

We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.


Subject(s)
Bass/genetics , Chromosome Mapping , Animals , Bass/classification , Genome , In Situ Hybridization, Fluorescence , Phylogeny
14.
Nat Methods ; 12(8): 780-6, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26121404

ABSTRACT

We present the first comprehensive analysis of a diploid human genome that combines single-molecule sequencing with single-molecule genome maps. Our hybrid assembly markedly improves upon the contiguity observed from traditional shotgun sequencing approaches, with scaffold N50 values approaching 30 Mb, and we identified complex structural variants (SVs) missed by other high-throughput approaches. Furthermore, by combining Illumina short-read data with long reads, we phased both single-nucleotide variants and SVs, generating haplotypes with over 99% consistency with previous trio-based studies. Our work shows that it is now possible to integrate single-molecule and high-throughput sequence data to generate de novo assembled genomes that approach reference quality.


Subject(s)
Computational Biology/methods , Genome, Human , High-Throughput Nucleotide Sequencing/methods , Polymorphism, Single Nucleotide , Algorithms , Chromosome Mapping , Diploidy , Gene Library , Genetic Variation , Genome , Haplotypes , Humans , Nucleotides/genetics , Reproducibility of Results , Sequence Analysis, DNA , Tandem Repeat Sequences
16.
Nature ; 485(7397): 260-3, 2012 Apr 15.
Article in English | MEDLINE | ID: mdl-22504184

ABSTRACT

Effective targeted cancer therapeutic development depends upon distinguishing disease-associated 'driver' mutations, which have causative roles in malignancy pathogenesis, from 'passenger' mutations, which are dispensable for cancer initiation and maintenance. Translational studies of clinically active targeted therapeutics can definitively discriminate driver from passenger lesions and provide valuable insights into human cancer biology. Activating internal tandem duplication (ITD) mutations in FLT3 (FLT3-ITD) are detected in approximately 20% of acute myeloid leukaemia (AML) patients and are associated with a poor prognosis. Abundant scientific and clinical evidence, including the lack of convincing clinical activity of early FLT3 inhibitors, suggests that FLT3-ITD probably represents a passenger lesion. Here we report point mutations at three residues within the kinase domain of FLT3-ITD that confer substantial in vitro resistance to AC220 (quizartinib), an active investigational inhibitor of FLT3, KIT, PDGFRA, PDGFRB and RET; evolution of AC220-resistant substitutions at two of these amino acid positions was observed in eight of eight FLT3-ITD-positive AML patients with acquired resistance to AC220. Our findings demonstrate that FLT3-ITD can represent a driver lesion and valid therapeutic target in human AML. AC220-resistant FLT3 kinase domain mutants represent high-value targets for future FLT3 inhibitor development efforts.


Subject(s)
Benzothiazoles/therapeutic use , Leukemia, Myeloid, Acute/drug therapy , Leukemia, Myeloid, Acute/genetics , Molecular Targeted Therapy , Mutation/genetics , Phenylurea Compounds/therapeutic use , fms-Like Tyrosine Kinase 3/antagonists & inhibitors , fms-Like Tyrosine Kinase 3/genetics , Benzothiazoles/pharmacology , Cell Line, Tumor , DNA Mutational Analysis , Drug Resistance, Neoplasm/genetics , Humans , Leukemia, Myeloid, Acute/metabolism , Models, Molecular , Molecular Structure , Phenylurea Compounds/pharmacology , Protein Binding , Protein Structure, Tertiary/genetics , Recurrence , Reproducibility of Results , fms-Like Tyrosine Kinase 3/metabolism
17.
BMC Genomics ; 18(1): 527, 2017 07 12.
Article in English | MEDLINE | ID: mdl-28701198

ABSTRACT

BACKGROUND: Long read technologies have revolutionized de novo genome assembly by generating contigs orders of magnitude longer than those of short read assemblies. Although assembly contiguity has increased, it usually does not reconstruct a full chromosome or an arm of the chromosome, resulting in an unfinished chromosome level assembly. To increase the contiguity of the assembly to the chromosome level, different strategies are used which exploit long range contact information between chromosomes in the genome. METHODS: We develop a scalable and computationally efficient scaffolding method that can boost the assembly contiguity to a large extent using genome-wide chromatin interaction data such as Hi-C. RESULTS: We demonstrate an algorithm that uses Hi-C data for longer-range scaffolding of de novo long read genome assemblies. We tested our methods on the human and goat genome assemblies. We compare our scaffolds with the scaffolds generated by LACHESIS based on various metrics. CONCLUSION: Our new algorithm SALSA produces more accurate scaffolds compared to the existing state of the art method LACHESIS.


Subject(s)
Contig Mapping/methods , Algorithms , Animals , Genomics , Goats/genetics , Humans
18.
Bioinformatics ; 32(13): 1921-1924, 2016 07 01.
Article in English | MEDLINE | ID: mdl-27153570

ABSTRACT

MOTIVATION: Long arrays of near-identical tandem repeats are a common feature of centromeric and subtelomeric regions in complex genomes. These sequences present a source of repeat structure diversity that is commonly ignored by standard genomic tools. Unlike reads shorter than the underlying repeat structure that rely on indirect inference methods, e.g. assembly, long reads allow direct inference of satellite higher order repeat structure. To automate characterization of local centromeric tandem repeat sequence variation we have designed Alpha-CENTAURI (ALPHA satellite CENTromeric AUtomated Repeat Identification), that takes advantage of Pacific Bioscience long-reads from whole-genome sequencing datasets. By operating on reads prior to assembly, our approach provides a more comprehensive set of repeat-structure variants and is not impacted by rearrangements or sequence underrepresentation due to misassembly. RESULTS: We demonstrate the utility of Alpha-CENTAURI in characterizing repeat structure for alpha satellite containing reads in the hydatidiform mole (CHM1, haploid-like) genome. The pipeline is designed to report local repeat organization summaries for each read, thereby monitoring rearrangements in repeat units, shifts in repeat orientation and sites of array transition into non-satellite DNA, typically defined by transposable element insertion. We validate the method by showing consistency with existing centromere high order repeat references. Alpha-CENTAURI can, in principle, run on any sequence data, offering a method to generate a sequence repeat resolution that could be readily performed using consensus sequences available for other satellite families in genomes without high-quality reference assemblies. 
AVAILABILITY AND IMPLEMENTATION: Documentation and source code for Alpha-CENTAURI are freely available at http://github.com/volkansevim/alpha-CENTAURI CONTACT: ali.bashir@mssm.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Centromere/genetics , Computational Biology/methods , Genomics , Sequence Analysis, DNA/methods , Tandem Repeat Sequences , Algorithms , Consensus Sequence , Female , Humans , Hydatidiform Mole/genetics , Pregnancy
19.
Nat Methods ; 10(6): 563-9, 2013 Jun.
Article in English | MEDLINE | ID: mdl-23644548

ABSTRACT

We present a hierarchical genome-assembly process (HGAP) for high-quality de novo microbial genome assemblies using only a single, long-insert shotgun DNA library in conjunction with Single Molecule, Real-Time (SMRT) DNA sequencing. Our method uses the longest reads as seeds to recruit all other reads for construction of highly accurate preassembled reads through a directed acyclic graph-based consensus procedure, which we follow with assembly using off-the-shelf long-read assemblers. In contrast to hybrid approaches, HGAP does not require highly accurate raw reads for error correction. We demonstrate efficient genome assembly for several microorganisms using as few as three SMRT Cell zero-mode waveguide arrays of sequencing and for BACs using just one SMRT Cell. Long repeat regions can be successfully resolved with this workflow. We also describe a consensus algorithm that incorporates SMRT sequencing primary quality values to produce de novo genome sequence exceeding 99.999% accuracy.


Subject(s)
Genome, Bacterial , Sequence Analysis, DNA/methods , Chromosomes, Artificial, Bacterial , Escherichia coli/genetics , Gene Library , Humans , Repetitive Sequences, Nucleic Acid
20.
N Engl J Med ; 364(1): 33-42, 2011 Jan 06.
Article in English | MEDLINE | ID: mdl-21142692

ABSTRACT

BACKGROUND: Although cholera has been present in Latin America since 1991, it had not been epidemic in Haiti for at least 100 years. Recently, however, there has been a severe outbreak of cholera in Haiti. METHODS: We used third-generation single-molecule real-time DNA sequencing to determine the genome sequences of 2 clinical Vibrio cholerae isolates from the current outbreak in Haiti, 1 strain that caused cholera in Latin America in 1991, and 2 strains isolated in South Asia in 2002 and 2008. Using primary sequence data, we compared the genomes of these 5 strains and a set of previously obtained partial genomic sequences of 23 diverse strains of V. cholerae to assess the likely origin of the cholera outbreak in Haiti. RESULTS: Both single-nucleotide variations and the presence and structure of hypervariable chromosomal elements indicate that there is a close relationship between the Haitian isolates and variant V. cholerae El Tor O1 strains isolated in Bangladesh in 2002 and 2008. In contrast, analysis of genomic variation of the Haitian isolates reveals a more distant relationship with circulating South American isolates. CONCLUSIONS: The Haitian epidemic is probably the result of the introduction, through human activity, of a V. cholerae strain from a distant geographic source. (Funded by the National Institute of Allergy and Infectious Diseases and the Howard Hughes Medical Institute.).


Subject(s)
Cholera/microbiology , Genes, Bacterial , Vibrio cholerae/classification , Vibrio cholerae/genetics , Cholera/epidemiology , Chromosome Mapping , Disease Outbreaks , Feces/microbiology , Genetic Variation , Genome, Bacterial , Haiti/epidemiology , History, 18th Century , Humans , Phylogeny , Sequence Analysis, DNA , Serotyping , Vibrio cholerae/isolation & purification , Vibrio cholerae O1/genetics