Search | VHL Regional Portal

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species.

Bradnam, Keith R; Fass, Joseph N; Alexandrov, Anton; Baranay, Paul; Bechner, Michael; Birol, Inanç; Boisvert, Sébastien; Chapman, Jarrod A; Chapuis, Guillaume; Chikhi, Rayan; Chitsaz, Hamidreza; Chou, Wen-Chi; Corbeil, Jacques; Del Fabbro, Cristian; Docking, T Roderick; Durbin, Richard; Earl, Dent; Emrich, Scott; Fedotov, Pavel; Fonseca, Nuno A; Ganapathy, Ganeshkumar; Gibbs, Richard A; Gnerre, Sante; Godzaridis, Elénie; Goldstein, Steve; Haimel, Matthias; Hall, Giles; Haussler, David; Hiatt, Joseph B; Ho, Isaac Y; Howard, Jason; Hunt, Martin; Jackman, Shaun D; Jaffe, David B; Jarvis, Erich D; Jiang, Huaiyang; Kazakov, Sergey; Kersey, Paul J; Kitzman, Jacob O; Knight, James R; Koren, Sergey; Lam, Tak-Wah; Lavenier, Dominique; Laviolette, François; Li, Yingrui; Li, Zhenyu; Liu, Binghang; Liu, Yue; Luo, Ruibang; Maccallum, Iain.

Gigascience ; 2(1): 10, 2013 Jul 22.

Article in English | MEDLINE | ID: mdl-23870653

ABSTRACT

BACKGROUND: The process of generating raw genome sequence data continues to become cheaper, faster, and more accurate. However, assembly of such data into high-quality, finished genome sequences remains challenging. Many genome assembly tools are available, but they differ greatly in terms of their performance (speed, scalability, hardware requirements, acceptance of newer read technologies) and in their final output (composition of assembled sequence). More importantly, it remains largely unclear how to best assess the quality of assembled genome sequences. The Assemblathon competitions are intended to assess current state-of-the-art methods in genome assembly. RESULTS: In Assemblathon 2, we provided a variety of sequence data to be assembled for three vertebrate species (a bird, a fish, and a snake). This resulted in a total of 43 submitted assemblies from 21 participating teams. We evaluated these assemblies using a combination of optical map data, Fosmid sequences, and several statistical methods. From over 100 different metrics, we chose ten key measures by which to assess the overall quality of the assemblies. CONCLUSIONS: Many current genome assemblers produced useful assemblies, containing a significant representation of their genes and overall genome structure. However, the high degree of variability between the entries suggests that there is still much room for improvement in the field of genome assembly and that approaches which work well in assembling the genome of one species may not necessarily work well for another.

High-quality draft assemblies of mammalian genomes from massively parallel sequence data.

Gnerre, Sante; Maccallum, Iain; Przybylski, Dariusz; Ribeiro, Filipe J; Burton, Joshua N; Walker, Bruce J; Sharpe, Ted; Hall, Giles; Shea, Terrance P; Sykes, Sean; Berlin, Aaron M; Aird, Daniel; Costello, Maura; Daza, Riza; Williams, Louise; Nicol, Robert; Gnirke, Andreas; Nusbaum, Chad; Lander, Eric S; Jaffe, David B.

Proc Natl Acad Sci U S A ; 108(4): 1513-8, 2011 Jan 25.

Article in English | MEDLINE | ID: mdl-21187386

ABSTRACT

Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.

Subject(s)

Algorithms, Genomics/methods, Sequence Analysis, DNA/methods, Software, Animals, Genome/genetics, Humans, Internet, Mice, Reproducibility of Results
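
The ALLPATHS-LG abstract above reports scaffold contiguity as an N50 size. As a reader aid, here is a minimal sketch of the usual N50 definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled length. The function name `n50` and the example lengths are illustrative assumptions, not taken from the paper or from the ALLPATHS-LG software.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Illustrative helper (not from ALLPATHS-LG): sort lengths from longest
    to shortest and return the length at which the running sum first
    reaches half of the total assembled length.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

In this example the total length is 290, so the two longest scaffolds (80 + 70 = 150) already cover half of it and the N50 is 70. A larger N50 indicates that more of the assembly lies in long scaffolds, but it says nothing on its own about correctness.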

Genome-wide analysis of estrogen receptor binding sites.

Carroll, Jason S; Meyer, Clifford A; Song, Jun; Li, Wei; Geistlinger, Timothy R; Eeckhoute, Jérôme; Brodsky, Alexander S; Keeton, Erika Krasnickas; Fertuck, Kirsten C; Hall, Giles F; Wang, Qianben; Bekiranov, Stefan; Sementchenko, Victor; Fox, Edward A; Silver, Pamela A; Gingeras, Thomas R; Liu, X Shirley; Brown, Myles.

Nat Genet ; 38(11): 1289-97, 2006 Nov.

Article in English | MEDLINE | ID: mdl-17013392

ABSTRACT

The estrogen receptor is the master transcriptional regulator of breast cancer phenotype and the archetype of a molecular therapeutic target. We mapped all estrogen receptor and RNA polymerase II binding sites on a genome-wide scale, identifying the authentic cis binding sites and target genes, in breast cancer cells. Combining this unique resource with gene expression data demonstrates distinct temporal mechanisms of estrogen-mediated gene regulation, particularly in the case of estrogen-suppressed genes. Furthermore, this resource has allowed the identification of cis-regulatory sites in previously unexplored regions of the genome and the cooperating transcription factors underlying estrogen signaling in breast cancer.

Subject(s)

Genome, Human, Receptors, Estrogen/metabolism, Response Elements, Adaptor Proteins, Signal Transducing/metabolism, Adenocarcinoma/genetics, Breast Neoplasms/genetics, Cells, Cultured, Chromosome Mapping/methods, Conserved Sequence, DNA-Binding Proteins/metabolism, Down-Regulation, Gene Expression, Gene Expression Regulation, Humans, Microarray Analysis/methods, Nuclear Proteins/metabolism, Nuclear Receptor Interacting Protein 1, Response Elements/physiology, Transcription Factors/physiology, Transcription Initiation Site

XRCC4 suppresses medulloblastomas with recurrent translocations in p53-deficient mice.

Yan, Catherine T; Kaushal, Dhruv; Murphy, Michael; Zhang, Yu; Datta, Abhishek; Chen, Changzhong; Monroe, Brianna; Mostoslavsky, Gustavo; Coakley, Kristen; Gao, Yijie; Mills, Kevin D; Fazeli, Alex P; Tepsuporn, Suprawee; Hall, Giles; Mulligan, Richard; Fox, Edward; Bronson, Roderick; De Girolami, Umberto; Lee, Charles; Alt, Frederick W.

Proc Natl Acad Sci U S A ; 103(19): 7378-83, 2006 May 09.

Article in English | MEDLINE | ID: mdl-16670198

ABSTRACT

Inactivation of the XRCC4 nonhomologous end-joining factor in the mouse germ line leads to embryonic lethality, in association with apoptosis of newly generated, postmitotic neurons. We now show that conditional inactivation of the XRCC4 in nestin-expressing neuronal progenitor cells, although leading to no obvious phenotype in a WT background, leads to early onset of neuronally differentiated medulloblastomas (MBs) in a p53-deficient background. A substantial proportion of the XRCC4/p53-deficient MBs have high-level N-myc gene amplification, often intrachromosomally in the context of complex translocations or other alterations of chromosome 12, on which N-myc resides, or extrachromosomally within double minutes. In addition, most XRCC4/p53-deficient MBs harbor clonal translocations of chromosome 13, which frequently involve chromosome 6 as a partner. One copy of the patched gene (Ptc), which lies on chromosome 13, was deleted in all tested XRCC4/p53-deficient MBs in the context of translocations or interstitial deletions. In addition, Cyclin D2, a chromosome 6 gene, was amplified in a subset of tumors. Notably, amplification of Myc-family or Cyclin D2 genes and deletion of Ptc also have been observed in human MBs. We therefore conclude that, in neuronal cells of mice, the nonhomologous end-joining pathway plays a critical role in suppressing genomic instability that, in a p53-deficient background, routinely contributes to genesis of MBs with recurrent chromosomal alterations.

Subject(s)

DNA-Binding Proteins/metabolism, Medulloblastoma/metabolism, Translocation, Genetic/genetics, Tumor Suppressor Protein p53/deficiency, Tumor Suppressor Protein p53/metabolism, Alleles, Animals, DNA-Binding Proteins/deficiency, DNA-Binding Proteins/genetics, Down-Regulation/genetics, Gene Amplification, Intermediate Filament Proteins/metabolism, Medulloblastoma/genetics, Medulloblastoma/pathology, Mice, Mice, Knockout, Nerve Tissue Proteins/metabolism, Nestin, Survival Rate, Time Factors, Tumor Cells, Cultured, Tumor Suppressor Protein p53/genetics

Genomic mapping of RNA polymerase II reveals sites of co-transcriptional regulation in human cells.

Brodsky, Alexander S; Meyer, Clifford A; Swinburne, Ian A; Hall, Giles; Keenan, Benjamin J; Liu, Xiaole S; Fox, Edward A; Silver, Pamela A.

Genome Biol ; 6(8): R64, 2005.

Article in English | MEDLINE | ID: mdl-16086846

ABSTRACT

BACKGROUND: Transcription by RNA polymerase II is regulated at many steps including initiation, promoter release, elongation and termination. Accumulation of RNA polymerase II at particular locations across genes can be indicative of sites of regulation. RNA polymerase II is thought to accumulate at the promoter and at sites of co-transcriptional alternative splicing where the rate of RNA synthesis slows. RESULTS: To further understand transcriptional regulation at a global level, we determined the distribution of RNA polymerase II within regions of the human genome designated by the ENCODE project. Hypophosphorylated RNA polymerase II localizes almost exclusively to 5' ends of genes. On the other hand, localization of total RNA polymerase II reveals a variety of distinct landscapes across many genes with 74% of the observed enriched locations at exons. RNA polymerase II accumulates at many annotated constitutively spliced exons, but is biased for alternatively spliced exons. Finally, RNA polymerase II is also observed at locations not in gene regions. CONCLUSION: Localizing RNA polymerase II across many millions of base pairs in the human genome identifies novel sites of transcription and provides insights into the regulation of transcription elongation. These data indicate that RNA polymerase II accumulates most often at exons during transcription. Thus, a major factor of transcription elongation control in mammalian cells is the coordination of transcription and pre-mRNA processing to define exons.

Subject(s)

Chromosome Mapping, Gene Expression Regulation, Genome, Human/genetics, RNA Polymerase II/genetics, Transcription, Genetic, Chromatin Immunoprecipitation, Chromosomes, Human, Exons/genetics, HeLa Cells, Humans, RNA Polymerase III/genetics, RNA, Messenger/genetics, RNA, Messenger/metabolism
