Results 1 - 20 of 23
1.
Am J Clin Pathol ; 155(5): 748-754, 2021 Apr 26.
Article in English | MEDLINE | ID: mdl-33258912

ABSTRACT

OBJECTIVES: Diffuse large B-cell lymphoma (DLBCL) is an aggressive non-Hodgkin lymphoma with a heterogeneous genetic landscape that can require multiple assays to characterize. We reviewed a 1-step RNA-based assay that determines cell of origin (COO), detects translocations, and identifies mutations, and we assessed the role of the assay in diagnosis. METHODS: Using a single custom Archer FusionPlex Lymphoma panel, we performed anchored multiplex polymerase chain reaction-based RNA sequencing on 41 cases of de novo DLBCL. Each case was subclassified by COO, and gene fusions and hotspot mutations were identified. The findings were then compared with COO classification by the Hans immunohistochemical algorithm and NanoString technology, cytogenetics, and fluorescence in situ hybridization results. RESULTS: Concordant COO classification by the FusionPlex panel and NanoString was observed in 35 of 41 cases (85.3%), with NanoString and Hans concordant in 33 of 41 cases (80.5%) and FusionPlex and Hans concordant in 33 of 41 cases (80.5%). The FusionPlex assay also detected 6 of 11 BCL6 translocations (4 cryptic), 2 of 3 BCL2 translocations, and 2 of 4 MYC translocations. Mutations were detected in lymphoma-related genes in 24 of 41 cases. CONCLUSION: This FusionPlex assay offers a single method for COO classification, mutation detection, and identification of important translocations in DLBCL. Although not replacing traditional testing, it could offer useful data when limited tissue is available.


Subject(s)
Lymphoma, Large B-Cell, Diffuse/genetics , Lymphoma, Large B-Cell, Diffuse/pathology , Mutation/genetics , Translocation, Genetic/genetics , Adult , Aged , Aged, 80 and over , Female , Humans , Lymphoma, Large B-Cell, Diffuse/diagnosis , Male , Middle Aged , Proto-Oncogene Proteins c-bcl-2/genetics , Proto-Oncogene Proteins c-bcl-2/metabolism , Proto-Oncogene Proteins c-bcl-6/genetics , Proto-Oncogene Proteins c-bcl-6/metabolism , Exome Sequencing/methods
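
The concordance and detection rates in the abstract follow directly from the case counts. A minimal Python sketch recomputes them (the function and variable names are illustrative, not taken from the study); computed directly, 35/41 is 85.37% and 33/41 is 80.49%, which the abstract gives as 85.3% and 80.5%.

```python
def percent(numerator: int, denominator: int, digits: int = 2) -> float:
    """Return numerator/denominator as a percentage, rounded to `digits` places."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator, digits)

# COO concordance counts as reported in the abstract (41 de novo DLBCL cases).
TOTAL_CASES = 41
concordant = {
    "FusionPlex vs NanoString": 35,
    "NanoString vs Hans": 33,
    "FusionPlex vs Hans": 33,
}
# Translocations detected by the FusionPlex assay, as "n of N" pairs from the abstract.
translocations = {"BCL6": (6, 11), "BCL2": (2, 3), "MYC": (2, 4)}

for pair, agree in concordant.items():
    print(f"COO {pair}: {agree}/{TOTAL_CASES} = {percent(agree, TOTAL_CASES)}%")
for gene, (found, total) in translocations.items():
    print(f"{gene} translocations detected: {found}/{total} = {percent(found, total)}%")
```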
3.
Histopathology ; 69(4): 551-9, 2016 Oct.
Article in English | MEDLINE | ID: mdl-26990025

ABSTRACT

AIMS: Endometrial stromal sarcomas (ESSs) are divided into low-grade and high-grade subtypes, with the latter showing more aggressive clinical behaviour. Although histology and immunophenotype can aid in the diagnosis of these tumours, genetic studies can provide additional diagnostic insights, as low-grade ESSs frequently harbour fusions involving JAZF1/SUZ12 and/or JAZF1/PHF1, whereas high-grade ESSs are defined by YWHAE-NUTM2A/B fusions. The aim of this study was to evaluate the utility of a next-generation sequencing (NGS)-based assay in identifying ESS fusions in archival formalin-fixed paraffin-embedded tumour samples. METHODS AND RESULTS: We applied an NGS-based fusion transcript detection assay (Archer FusionPlex Sarcoma Panel) that targets YWHAE and JAZF1 fusions in a series of low-grade ESSs (n = 11) and high-grade ESSs (n = 5) that were previously confirmed to harbour genetic rearrangements by fluorescence in-situ hybridization (FISH) and/or reverse transcription polymerase chain reaction (RT-PCR) analyses. The fusion assay identified junctional fusion transcript sequences that corresponded to the known FISH/RT-PCR results in all cases. Four low-grade ESSs harboured JAZF1-PHF1 fusions with different junctional sequences, and all were correctly identified because of the open-ended nature of the assay design, using anchored multiplex polymerase chain reaction. Seven non-ESS sarcomas were also included as negative controls, and no strong ESS fusion candidates were identified in these cases. CONCLUSIONS: Our findings demonstrate good sensitivity and specificity of an NGS-based gene fusion assay in the detection of ESS fusion transcripts.


Subject(s)
Endometrial Neoplasms/diagnosis , Endometrial Stromal Tumors/diagnosis , High-Throughput Nucleotide Sequencing/methods , Oncogene Proteins, Fusion/analysis , Sarcoma, Endometrial Stromal/diagnosis , Adult , Aged , Endometrial Neoplasms/genetics , Endometrial Stromal Tumors/genetics , Female , Humans , Middle Aged , Pathology, Molecular , Sarcoma, Endometrial Stromal/genetics , Sensitivity and Specificity , Young Adult
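
The abstract reports good sensitivity and specificity without giving figures. As an illustration only, treating the 16 ESSs with previously confirmed rearrangements (11 low-grade, 5 high-grade) as true positives and the 7 non-ESS sarcomas as true negatives (an assumption about how the counts map onto a confusion matrix), the standard definitions give the point estimates below. The `Confusion` class is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Confusion:
    tp: int  # fusion-positive cases in which the assay found the expected fusion
    fn: int  # fusion-positive cases the assay missed
    tn: int  # negative controls with no strong fusion candidate
    fp: int  # negative controls with a spurious fusion call

    def sensitivity(self) -> float:
        """True-positive rate: TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """True-negative rate: TN / (TN + FP)."""
        return self.tn / (self.tn + self.fp)

# Assumed mapping of the abstract's counts: 11 + 5 confirmed ESSs, 7 controls.
ess = Confusion(tp=11 + 5, fn=0, tn=7, fp=0)
print(ess.sensitivity(), ess.specificity())  # 1.0 1.0
```

With only 16 positive and 7 negative cases, such point estimates would carry wide confidence intervals.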
4.
Nat Genet ; 48(4): 427-37, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26950095

ABSTRACT

To connect human biology to fish biomedical models, we sequenced the genome of spotted gar (Lepisosteus oculatus), whose lineage diverged from teleosts before teleost genome duplication (TGD). The slowly evolving gar genome has conserved in content and size many entire chromosomes from bony vertebrate ancestors. Gar bridges teleosts to tetrapods by illuminating the evolution of immunity, mineralization and development (mediated, for example, by Hox, ParaHox and microRNA genes). Numerous conserved noncoding elements (CNEs; often cis regulatory) undetectable in direct human-teleost comparisons become apparent using gar: functional studies uncovered conserved roles for such cryptic CNEs, facilitating annotation of sequences identified in human genome-wide association studies. Transcriptomic analyses showed that the sums of expression domains and expression levels for duplicated teleost genes often approximate the patterns and levels of expression for gar genes, consistent with subfunctionalization. The gar genome provides a resource for understanding evolution after genome duplication, the origin of vertebrate genomes and the function of human regulatory sequences.


Subject(s)
Fishes/genetics , Animals , Evolution, Molecular , Female , Fishes/metabolism , Genome , Humans , Karyotype , Models, Genetic , Organ Specificity , Sequence Analysis, DNA , Transcriptome
5.
Cell ; 162(4): 738-50, 2015 Aug 13.
Article in English | MEDLINE | ID: mdl-26276630

ABSTRACT

The 2013-2015 West African epidemic of Ebola virus disease (EVD) reminds us of how little is known about biosafety level 4 viruses. Like Ebola virus, Lassa virus (LASV) can cause hemorrhagic fever with high case fatality rates. We generated a genomic catalog of almost 200 LASV sequences from clinical and rodent reservoir samples. We show that whereas the 2013-2015 EVD epidemic is fueled by human-to-human transmissions, LASV infections mainly result from reservoir-to-human infections. We elucidated the spread of LASV across West Africa and show that this migration was accompanied by changes in LASV genome abundance, fatality rates, codon adaptation, and translational efficiency. By investigating intrahost evolution, we found that mutations accumulate in epitopes of viral surface proteins, suggesting selection for immune escape. This catalog will serve as a foundation for the development of vaccines and diagnostics.


Subject(s)
Genome, Viral , Lassa Fever/virology , Lassa virus/genetics , RNA, Viral/genetics , Africa, Western/epidemiology , Animals , Biological Evolution , Disease Reservoirs , Ebolavirus/genetics , Genetic Variation , Glycoproteins/genetics , Hemorrhagic Fever, Ebola/virology , Humans , Lassa Fever/epidemiology , Lassa Fever/transmission , Lassa virus/classification , Lassa virus/physiology , Murinae/genetics , Mutation , Nigeria/epidemiology , Viral Proteins/genetics , Zoonoses/epidemiology , Zoonoses/virology
6.
Nat Biotechnol ; 32(12): 1250-5, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25402615

ABSTRACT

The domestic ferret (Mustela putorius furo) is an important animal model for multiple human respiratory diseases. It is considered the 'gold standard' for modeling human influenza virus infection and transmission. Here we describe the 2.41 Gb draft genome assembly of the domestic ferret, constituting 2.28 Gb of sequence plus gaps. We annotated 19,910 protein-coding genes on this assembly using RNA-seq data from 21 ferret tissues. We characterized the ferret host response to two influenza virus infections by RNA-seq analysis of 42 ferret samples from influenza time-course data and showed distinct signatures in ferret trachea and lung tissues specific to 1918 or 2009 human pandemic influenza virus infections. Using microarray data from 16 ferret samples reflecting cystic fibrosis disease progression, we showed that transcriptional changes in the CFTR-knockout ferret lung reflect pathways of early disease that cannot be readily studied in human infants with cystic fibrosis disease.


Subject(s)
Ferrets/genetics , Genome , Influenza, Human/genetics , Sequence Analysis, DNA , Animals , Base Sequence , Chromosome Mapping , Disease Models, Animal , High-Throughput Nucleotide Sequencing , Humans , Influenza, Human/transmission , Influenza, Human/virology , Molecular Sequence Annotation , Molecular Sequence Data , Orthomyxoviridae/genetics , Orthomyxoviridae/pathogenicity
7.
Bioinformatics ; 30(24): 3558-60, 2014 Dec 15.
Article in English | MEDLINE | ID: mdl-25172923

ABSTRACT

MOTIVATION: The naked mole rat (Heterocephalus glaber) is an exceptionally long-lived and cancer-resistant rodent native to East Africa. Although its genome was previously sequenced, here we report a new assembly sequenced by us with substantially higher N50 values for scaffolds and contigs. RESULTS: We analyzed the annotation of this new improved assembly and identified candidate genomic adaptations which may have contributed to the evolution of the naked mole rat's extraordinary traits, including in regions of p53, and the hyaluronan receptors CD44 and HMMR (RHAMM). Furthermore, we developed a freely available web portal, the Naked Mole Rat Genome Resource (http://www.naked-mole-rat.org), featuring the data and results of our analysis, to assist researchers interested in the genome and genes of the naked mole rat, and also to facilitate further studies on this fascinating species.


Subject(s)
Genome , Longevity/genetics , Mole Rats/genetics , Neoplasms/genetics , Animals , Evolution, Molecular , Female , Genes , Genomics , Guinea Pigs , Humans , Mice , Sequence Alignment
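
The N50 statistic cited here (and in the Fosill and ALLPATHS-LG abstracts below) is the length L such that contigs or scaffolds of length L or longer together contain at least half of the assembled bases. A minimal Python sketch of that common definition follows; the lengths are made-up toy values, and tools can differ slightly in how they treat a cumulative sum landing exactly on 50%.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50: the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the loop always returns")

# Hypothetical contig lengths: total 300, and 80 + 70 = 150 reaches half.
print(n50([80, 70, 50, 40, 30, 20, 10]))  # 70
```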
8.
Proc Natl Acad Sci U S A ; 111(3): 1102-7, 2014 Jan 21.
Article in English | MEDLINE | ID: mdl-24385586

ABSTRACT

High-grade serous ovarian cancers are characterized by widespread recurrent copy number alterations. Although some regions of copy number change harbor known oncogenes and tumor suppressor genes, the genes targeted by the majority of amplified or deleted regions in ovarian cancer remain undefined. Here we systematically tested amplified genes for their ability to promote tumor formation using an in vivo multiplexed transformation assay. We identified the GRB2-associated binding protein 2 (GAB2) as a recurrently amplified gene that potently transforms immortalized ovarian and fallopian tube secretory epithelial cells. Cancer cell lines overexpressing GAB2 require GAB2 for survival and show evidence of phosphatidylinositol 3-kinase (PI3K) pathway activation, which was required for GAB2-induced transformation. Cell lines overexpressing GAB2 were as sensitive to PI3K inhibition as cell lines harboring mutant PIK3CA. Together, these observations nominate GAB2 as an ovarian cancer oncogene, identify an alternative mechanism to activate PI3K signaling, and underscore the importance of PI3K signaling in this cancer.


Subject(s)
Adaptor Proteins, Signal Transducing/metabolism , Gene Amplification , Gene Expression Regulation, Neoplastic , Ovarian Neoplasms/genetics , Adaptor Proteins, Signal Transducing/genetics , Animals , Cell Line, Tumor , Cell Proliferation , Cell Transformation, Neoplastic , Female , Genomics , Humans , Mice , Mice, Nude , Neoplasm Transplantation , Oligonucleotide Array Sequence Analysis , Open Reading Frames , Ovarian Neoplasms/metabolism , Phosphatidylinositol 3-Kinases/metabolism , Signal Transduction
9.
Nat Methods ; 10(7): 623-9, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23685885

ABSTRACT

RNA-seq is an effective method for studying the transcriptome, but it can be difficult to apply to scarce or degraded RNA from fixed clinical samples, rare cell populations or cadavers. Recent studies have proposed several methods for RNA-seq of low-quality and/or low-quantity samples, but the relative merits of these methods have not been systematically analyzed. Here we compare five such methods using metrics relevant to transcriptome annotation, transcript discovery and gene expression. Using a single human RNA sample, we constructed and sequenced ten libraries with these methods and compared them against two control libraries. We found that the RNase H method performed best for chemically fragmented, low-quality RNA, and we confirmed this through analysis of actual degraded samples. RNase H can even effectively replace oligo(dT)-based methods for standard RNA-seq. SMART and NuGEN had distinct strengths for measuring low-quantity RNA. Our analysis allows biologists to select the most suitable methods and provides a benchmark for future method development.


Subject(s)
Algorithms , Artifacts , Gene Expression Profiling/methods , RNA/genetics , Sample Size , Sequence Analysis, RNA/methods , Software , Transcriptome/genetics
10.
Nature ; 496(7445): 311-6, 2013 Apr 18.
Article in English | MEDLINE | ID: mdl-23598338

ABSTRACT

The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.


Subject(s)
Biological Evolution , Fishes/classification , Fishes/genetics , Genome/genetics , Animals , Animals, Genetically Modified , Chick Embryo , Conserved Sequence/genetics , Enhancer Elements, Genetic/genetics , Evolution, Molecular , Extremities/anatomy & histology , Extremities/growth & development , Fishes/anatomy & histology , Fishes/physiology , Genes, Homeobox/genetics , Genomics , Immunoglobulin M/genetics , Mice , Molecular Sequence Annotation , Molecular Sequence Data , Phylogeny , Sequence Alignment , Sequence Analysis, DNA , Vertebrates/anatomy & histology , Vertebrates/genetics , Vertebrates/physiology
11.
G3 (Bethesda) ; 3(1): 41-63, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23316438

ABSTRACT

Pyrenophora tritici-repentis is a necrotrophic fungus that causes tan spot of wheat, a disease whose contribution to crop loss has increased significantly during the last few decades. Pathogenicity by this fungus is attributed to the production of host-selective toxins (HST), which are recognized by their host in a genotype-specific manner. To better understand the mechanisms that have led to the increase in disease incidence related to this pathogen, we sequenced the genomes of three P. tritici-repentis isolates. A pathogenic isolate that produces two known HSTs was used to assemble a reference nuclear genome of approximately 40 Mb composed of 11 chromosomes that encode 12,141 predicted genes. Comparison of the reference genome with those of a pathogenic isolate that produces a third HST, and a nonpathogenic isolate, showed the nonpathogen genome to be more diverged than those of the two pathogens. Examination of gene-coding regions has provided candidate pathogen-specific proteins and revealed gene families that may play a role in a necrotrophic lifestyle. Analysis of transposable elements suggests that their presence in the genome of pathogenic isolates contributes to the creation of novel genes, effector diversification, possible horizontal gene transfer events, identified copy number variation, and the first example of transduplication by DNA transposable elements in fungi. Overall, comparative analysis of these genomes provides evidence that pathogenicity in this species arose through an influx of transposable elements, which created a genetically flexible landscape that can easily respond to environmental changes.


Subject(s)
Ascomycota/genetics , Ascomycota/pathogenicity , Evolution, Molecular , Genetic Variation , Genome, Fungal/genetics , Mycotoxins/genetics , Triticum/microbiology , Base Sequence , Chromosome Mapping , Cytogenetic Analysis , DNA Primers/genetics , DNA Transposable Elements/genetics , Gene Duplication/genetics , Genomics , Likelihood Functions , Models, Genetic , Molecular Sequence Annotation , Molecular Sequence Data , Phylogeny , Sequence Analysis, DNA
12.
Nucleic Acids Res ; 41(1): e13, 2013 Jan 07.
Article in English | MEDLINE | ID: mdl-22962364

ABSTRACT

RNA viruses are the causative agents for AIDS, influenza, SARS, and other serious health threats. Development of rapid and broadly applicable methods for complete viral genome sequencing is highly desirable to fully understand all aspects of these infectious agents as well as for surveillance of viral pandemic threats and emerging pathogens. However, traditional viral detection methods rely on prior sequence or antigen knowledge. In this study, we describe sequence-independent amplification for samples containing ultra-low amounts of viral RNA coupled with Illumina sequencing and de novo assembly optimized for viral genomes. With 5 million reads, we capture 96 to 100% of the viral protein coding region of HIV, respiratory syncytial and West Nile viral samples from as little as 100 copies of viral RNA. The methods presented here are scalable to large numbers of samples and capable of generating full or near full length viral genomes from clone and clinical samples with low amounts of viral RNA, without prior sequence information and in the presence of substantial host contamination.


Subject(s)
Genome, Viral , Nucleic Acid Amplification Techniques , RNA, Viral/chemistry , Sequence Analysis, RNA , Base Sequence , HIV/genetics , Humans , Molecular Sequence Data , Respiratory Syncytial Viruses/genetics , Reverse Transcriptase Polymerase Chain Reaction , West Nile virus/genetics
13.
Genome Res ; 22(11): 2270-7, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22829535

ABSTRACT

Exceptionally accurate genome reference sequences have proven to be of great value to microbial researchers. Thus, to date, about 1800 bacterial genome assemblies have been "finished" at great expense with the aid of manual laboratory and computational processes that typically iterate over a period of months or even years. By applying a new laboratory design and new assembly algorithm to 16 samples, we demonstrate that assemblies exceeding finished quality can be obtained from whole-genome shotgun data and automated computation. Cost and time requirements are thus dramatically reduced.


Subject(s)
Bacteria/genetics , Genome, Bacterial , Genomic Library , Sequence Analysis, DNA/methods , Algorithms
14.
Genome Res ; 22(11): 2241-9, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22800726

ABSTRACT

Eliminating the bacterial cloning step has been a major factor in the vastly improved efficiency of massively parallel sequencing approaches. However, this also has made it a technical challenge to produce the modern equivalent of the Fosmid- or BAC-end sequences that were crucial for assembling and analyzing complex genomes during the Sanger-based sequencing era. To close this technology gap, we developed Fosill, a method for converting Fosmids to Illumina-compatible jumping libraries. We constructed Fosmid libraries in vectors with Illumina primer sequences and specific nicking sites flanking the cloning site. Our family of pFosill vectors allows multiplex Fosmid cloning of end-tagged genomic fragments without physical size selection and is compatible with standard and multiplex paired-end Illumina sequencing. To excise the bulk of each cloned insert, we introduced two nicks in the vector, translated them into the inserts, and cleaved them. Recircularization of the vector via coligation of insert termini followed by inverse PCR generates a jumping library for paired-end sequencing with 101-base reads. The yield of unique Fosmid-sized jumps is sufficiently high, and the background of short, incorrectly spaced and chimeric artifacts sufficiently low, to enable applications such as mapping of structural variation and scaffolding of de novo assemblies. We demonstrate the power of Fosill to map genome rearrangements in a cancer cell line and identified three fusion genes that were corroborated by RNA-seq data. Our Fosill-powered assembly of the mouse genome has an N50 scaffold length of 17.0 Mb, rivaling the connectivity (16.9 Mb) of the Sanger-sequencing based draft assembly.


Subject(s)
Escherichia coli/genetics , Genetic Vectors/genetics , Genome, Bacterial , Genome, Fungal , Genomic Library , Schizosaccharomyces/genetics , Sequence Analysis, DNA/methods , Animals , Gene Rearrangement , Mice , Mice, Inbred C57BL
15.
PLoS Pathog ; 8(3): e1002529, 2012.
Article in English | MEDLINE | ID: mdl-22412369

ABSTRACT

Deep sequencing technologies have the potential to transform the study of highly variable viral pathogens by providing a rapid and cost-effective approach to sensitively characterize rapidly evolving viral quasispecies. Here, we report on a high-throughput whole HIV-1 genome deep sequencing platform that combines 454 pyrosequencing with novel assembly and variant detection algorithms. In one subject we combined these genetic data with detailed immunological analyses to comprehensively evaluate viral evolution and immune escape during the acute phase of HIV-1 infection. The majority of early, low frequency mutations represented viral adaptation to host CD8+ T cell responses, evidence of strong immune selection pressure occurring during the early decline from peak viremia. CD8+ T cell responses capable of recognizing these low frequency escape variants coincided with the selection and evolution of more effective secondary HLA-anchor escape mutations. Frequent, and in some cases rapid, reversion of transmitted mutations was also observed across the viral genome. When located within restricted CD8 epitopes these low frequency reverting mutations were sufficient to prime de novo responses to these epitopes, again illustrating the capacity of the immune response to recognize and respond to low frequency variants. More importantly, rapid viral escape from the most immunodominant CD8+ T cell responses coincided with plateauing of the initial viral load decline in this subject, suggestive of a potential link between maintenance of effective, dominant CD8 responses and the degree of early viremia reduction. We conclude that the early control of HIV-1 replication by immunodominant CD8+ T cell responses may be substantially influenced by rapid, low frequency viral adaptations not detected by conventional sequencing approaches, which warrants further investigation. 
These data support the critical need for vaccine-induced CD8+ T cell responses to target more highly constrained regions of the virus in order to ensure the maintenance of immunodominant CD8 responses and the sustained decline of early viremia.


Subject(s)
Genome, Viral/genetics , Genome-Wide Association Study , HIV Infections/virology , HIV-1/genetics , Immune Evasion/immunology , CD8-Positive T-Lymphocytes/immunology , Genetic Variation , Genomic Structural Variation , HIV Infections/immunology , HIV Infections/prevention & control , HIV-1/immunology , HIV-1/pathogenicity , Humans , Immune Evasion/genetics , Oligonucleotide Array Sequence Analysis , RNA, Viral/analysis , Sequence Analysis, RNA , Viral Vaccines/immunology
16.
Science ; 332(6032): 930-6, 2011 May 20.
Article in English | MEDLINE | ID: mdl-21511999

ABSTRACT

The fission yeast clade--comprising Schizosaccharomyces pombe, S. octosporus, S. cryophilus, and S. japonicus--occupies the basal branch of Ascomycete fungi and is an important model of eukaryote biology. A comparative annotation of these genomes identified a near extinction of transposons and the associated innovation of transposon-free centromeres. Expression analysis established that meiotic genes are subject to antisense transcription during vegetative growth, which suggests a mechanism for their tight regulation. In addition, trans-acting regulators control new genes within the context of expanded functional modules for meiosis and stress response. Differences in gene content and regulation also explain why, unlike the budding yeast of Saccharomycotina, fission yeasts cannot use ethanol as a primary carbon source. These analyses elucidate the genome structure and gene regulation of fission yeast and provide tools for investigation across the Schizosaccharomyces clade.


Subject(s)
Genome, Fungal , Schizosaccharomyces/genetics , Centromere/genetics , Centromere/physiology , Centromere/ultrastructure , DNA Transposable Elements , Evolution, Molecular , Gene Expression Profiling , Gene Expression Regulation, Fungal , Genes, Mating Type, Fungal , Genomics , Glucose/metabolism , Meiosis , Molecular Sequence Annotation , Molecular Sequence Data , Phylogeny , RNA, Antisense/genetics , RNA, Fungal/genetics , RNA, Small Interfering/genetics , RNA, Untranslated/genetics , Regulatory Elements, Transcriptional , Schizosaccharomyces/growth & development , Schizosaccharomyces/metabolism , Schizosaccharomyces pombe Proteins/genetics , Schizosaccharomyces pombe Proteins/metabolism , Sequence Analysis, DNA , Species Specificity , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription, Genetic
17.
Genome Biol ; 12(1): R1, 2011.
Article in English | MEDLINE | ID: mdl-21205303

ABSTRACT

Genome targeting methods enable cost-effective capture of specific subsets of the genome for sequencing. We present here an automated, highly scalable method for carrying out the Solution Hybrid Selection capture approach that provides a dramatic increase in scale and throughput of sequence-ready libraries produced. Significant process improvements and a series of in-process quality control checkpoints are also added. These process improvements can also be used in a manual version of the protocol.


Subject(s)
Automation, Laboratory , Exome , Gene Library , Nucleic Acid Hybridization/methods , Oligonucleotide Array Sequence Analysis/methods , Genome, Human , High-Throughput Nucleotide Sequencing , Humans , Quality Control
18.
Proc Natl Acad Sci U S A ; 108(4): 1513-8, 2011 Jan 25.
Article in English | MEDLINE | ID: mdl-21187386

ABSTRACT

Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.


Subject(s)
Algorithms , Genomics/methods , Sequence Analysis, DNA/methods , Software , Animals , Genome/genetics , Humans , Internet , Mice , Reproducibility of Results
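
Base accuracy is often expressed on the Phred scale, Q = -10 log10(p_error). A small Python sketch (the helper names are hypothetical) converts the reported lower bound of 99.95% into that scale: an error rate of 0.0005 corresponds to roughly Q33, or about 500 errors per megabase.

```python
import math

def phred_from_accuracy(accuracy: float) -> float:
    """Phred quality Q = -10 * log10(p_error) for a per-base accuracy in (0, 1)."""
    if not 0.0 < accuracy < 1.0:
        raise ValueError("accuracy must lie strictly between 0 and 1")
    return -10.0 * math.log10(1.0 - accuracy)

def errors_per_megabase(accuracy: float) -> float:
    """Expected number of base errors in one million bases."""
    return (1.0 - accuracy) * 1_000_000

print(f"Q = {phred_from_accuracy(0.9995):.1f}")            # Q = 33.0
print(f"{errors_per_megabase(0.9995):.0f} errors per Mb")  # 500 errors per Mb
```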
19.
Environ Microbiol ; 12(11): 3035-56, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20662890

ABSTRACT

T4-like myoviruses are ubiquitous, and their genes are among the most abundant documented in ocean systems. Here we compare 26 T4-like genomes, including 10 from non-cyanobacterial myoviruses, and 16 from marine cyanobacterial myoviruses (cyanophages) isolated on diverse Prochlorococcus or Synechococcus hosts. A core genome of 38 virion construction and DNA replication genes was observed in all 26 genomes, with 32 and 25 additional genes shared among the non-cyanophage and cyanophage subsets, respectively. These hierarchical cores are highly syntenic across the genomes, and sampled to saturation. The 25 cyanophage core genes include six previously described genes with putative functions (psbA, mazG, phoH, hsp20, hli03, cobS), a hypothetical protein with a potential phytanoyl-CoA dioxygenase domain, two virion structural genes, and 16 hypothetical genes. Beyond previously described cyanophage-encoded photosynthesis and phosphate stress genes, we observed core genes that may play a role in nitrogen metabolism during infection through modulation of 2-oxoglutarate. Patterns among non-core genes that may drive niche diversification revealed that phosphorus-related gene content reflects source waters rather than host strain used for isolation, and that carbon metabolism genes appear associated with putative mobile elements. In addition, phages isolated on Synechococcus had higher genome-wide %G+C and often contained different gene subsets (e.g. petE, zwf, gnd, prnA, cpeT) than those isolated on Prochlorococcus. However, no clear diagnostic genes emerged to distinguish these phage groups, suggesting blurred boundaries possibly due to cross-infection. Finally, genome-wide comparisons of both diverse and closely related, co-isolated genomes provide a locus-to-locus variability metric that will prove valuable for interpreting metagenomic data sets.


Subject(s)
Bacteriophage T4/genetics , Cyanobacteria/virology , Ketoglutaric Acids/metabolism , Myoviridae/genetics , Quaternary Ammonium Compounds/metabolism , Seawater/virology , Bacteriophage T4/classification , Base Composition , Evolution, Molecular , Genetic Variation , Genome, Viral , Metagenomics , Molecular Sequence Data , Myoviridae/classification , Nitrogen/metabolism , Oceans and Seas , Prochlorococcus/virology , Seawater/microbiology , Sequence Analysis, DNA , Synechococcus/virology , Viral Core Proteins/genetics , Viral Tail Proteins/genetics , Water Microbiology
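
Genome-wide %G+C, used in the abstract to contrast Synechococcus and Prochlorococcus phages, is the share of G and C among all bases. A minimal Python sketch follows; leaving ambiguity codes such as N out of the denominator is an assumption of this sketch, not necessarily how the authors computed it, and the input is a toy sequence.

```python
def gc_percent(seq: str) -> float:
    """Percent G+C among unambiguous A/C/G/T bases (case-insensitive).

    Ambiguity codes such as N are left out of the denominator (an assumption).
    """
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (s.count("G") + s.count("C")) / acgt

print(gc_percent("ATGCGCNNAT"))  # 50.0
```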
20.
Science ; 328(5981): 994-9, 2010 May 21.
Article in English | MEDLINE | ID: mdl-20489017

ABSTRACT

The human microbiome refers to the community of microorganisms, including prokaryotes, viruses, and microbial eukaryotes, that populate the human body. The National Institutes of Health launched an initiative that focuses on describing the diversity of microbial species that are associated with health and disease. The first phase of this initiative includes the sequencing of hundreds of microbial reference genomes, coupled to metagenomic sequencing from multiple body sites. Here we present results from an initial reference genome sequencing of 178 microbial genomes. From 547,968 predicted polypeptides that correspond to the gene complement of these strains, previously unidentified ("novel") polypeptides that had both unmasked sequence length greater than 100 amino acids and no BLASTP match to any nonreference entry in the nonredundant subset were defined. This analysis resulted in a set of 30,867 polypeptides, of which 29,987 (approximately 97%) were unique. In addition, this set of microbial genomes allows for approximately 40% of random sequences from the microbiome of the gastrointestinal tract to be associated with organisms based on the match criteria used. Insights into pan-genome analysis suggest that we are still far from saturating microbial species genetic data sets. In addition, the associated metrics and standards used by our group for quality assurance are presented.


Subject(s)
Genome, Bacterial , Metagenome/genetics , Sequence Analysis, DNA , Bacteria/classification , Bacteria/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Biodiversity , Computational Biology , Databases, Genetic , Gastrointestinal Tract/microbiology , Genes, Bacterial , Genetic Variation , Genome, Archaeal , Humans , Metagenomics/methods , Metagenomics/standards , Mouth/microbiology , Peptides/chemistry , Peptides/genetics , Phylogeny , Respiratory System/microbiology , Sequence Analysis, DNA/standards , Skin/microbiology , Urogenital System/microbiology
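
The novelty rule in this abstract combines a length threshold with the absence of a BLASTP match to a non-reference entry. The Python sketch below encodes that rule as a predicate over hypothetical records; the field names are illustrative, and reading "unmasked sequence length" as the length remaining after low-complexity masking is an assumption. It also confirms that 29,987 of 30,867 is about 97.1%, consistent with the reported "approximately 97%".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Polypeptide:
    name: str
    unmasked_length: int        # assumed: residues left after low-complexity masking
    has_nonreference_hit: bool  # any BLASTP match to a non-reference entry in the nr subset

def is_novel(p: Polypeptide, min_length: int = 100) -> bool:
    """Novel if unmasked length exceeds min_length and there is no non-reference hit."""
    return p.unmasked_length > min_length and not p.has_nonreference_hit

def percent(part: int, whole: int) -> float:
    """part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Hypothetical toy records, not data from the study.
records = [
    Polypeptide("p1", 250, False),  # long, no hit: novel
    Polypeptide("p2", 100, False),  # not strictly longer than 100: excluded
    Polypeptide("p3", 400, True),   # has a non-reference hit: excluded
]
print([p.name for p in records if is_novel(p)])  # ['p1']
print(percent(29_987, 30_867))  # 97.1 (counts from the abstract)
```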