ABSTRACT
Stroma is a poorly defined non-parenchymal component of virtually every organ with key roles in organ development, homeostasis, and repair. Studies of the bone marrow stroma have defined individual populations in the stem cell niche regulating hematopoietic regeneration and capable of initiating leukemia. Here, we use single-cell RNA sequencing (scRNA-seq) to define a cellular taxonomy of the mouse bone marrow stroma and its perturbation by malignancy. We identified seventeen stromal subsets expressing distinct hematopoietic regulatory genes spanning new fibroblastic and osteoblastic subpopulations including distinct osteoblast differentiation trajectories. Emerging acute myeloid leukemia impaired mesenchymal osteogenic differentiation and reduced regulatory molecules necessary for normal hematopoiesis. These data suggest that tissue stroma responds to malignant cells by disadvantaging normal parenchymal cells. Our taxonomy of the stromal compartment provides a comprehensive bone marrow cell census and experimental support for cancer cell crosstalk with specific stromal elements to impair normal tissue function and thereby enable emergent cancer.
Subject(s)
Bone Marrow Cells/metabolism , Cell Differentiation , Homeostasis , Leukemia, Myeloid, Acute/metabolism , Osteoblasts/metabolism , Osteogenesis , Tumor Microenvironment , Animals , Bone Marrow Cells/pathology , Humans , Leukemia, Myeloid, Acute/pathology , Mice , Osteoblasts/pathology , Stromal Cells/metabolism , Stromal Cells/pathology
ABSTRACT
Finding the components of cellular circuits and determining their functions systematically remains a major challenge in mammalian cells. Here, we introduced genome-wide pooled CRISPR-Cas9 libraries into dendritic cells (DCs) to identify genes that control the induction of tumor necrosis factor (Tnf) by bacterial lipopolysaccharide (LPS), a key process in the host response to pathogens, mediated by the Tlr4 pathway. We found many of the known regulators of Tlr4 signaling, as well as dozens of previously unknown candidates that we validated. By measuring protein markers and mRNA profiles in DCs that are deficient in known or candidate genes, we classified the genes into three functional modules with distinct effects on the canonical responses to LPS and highlighted functions for the PAF complex and oligosaccharyltransferase (OST) complex. Our findings uncover new facets of innate immune circuits in primary cells and provide a genetic approach for dissection of mammalian cell circuits.
Subject(s)
CRISPR-Cas Systems , Genetic Techniques , Immunity, Innate , Animals , Bone Marrow Cells/immunology , Cell Differentiation , Cell Survival , Dendritic Cells/cytology , Dendritic Cells/immunology , Gene Knockout Techniques , Gene Regulatory Networks , Hexosyltransferases/metabolism , Membrane Proteins/metabolism , Mice , Mice, Transgenic , Toll-Like Receptor 4/immunology , Tumor Necrosis Factor-alpha/immunology
ABSTRACT
The death receptor Fas removes activated lymphocytes through apoptosis. Previous transcriptional profiling predicted that Fas positively regulates interleukin-17 (IL-17)-producing T helper 17 (Th17) cells. Here, we demonstrate that Fas promoted the generation and stability of Th17 cells and prevented their differentiation into Th1 cells. Mice with T-cell- and Th17-cell-specific deletion of Fas were protected from induced autoimmunity, and Th17 cell differentiation and stability were impaired. Fas-deficient Th17 cells instead developed a Th1-cell-like transcriptional profile, which a new algorithm predicted to depend on STAT1. Experimentally, Fas indeed bound and sequestered STAT1, and Fas deficiency enhanced IL-6-induced STAT1 activation and nuclear translocation, whereas deficiency of STAT1 reversed the transcriptional changes induced by Fas deficiency. Thus, our computational and experimental approach identified Fas as a regulator of the Th17-to-Th1 cell balance by controlling the availability of opposing STAT1 and STAT3 to have a direct impact on autoimmunity.
Subject(s)
Cell Differentiation/immunology , STAT1 Transcription Factor/metabolism , Th1 Cells/immunology , Th1 Cells/metabolism , Th17 Cells/immunology , Th17 Cells/metabolism , fas Receptor/metabolism , Animals , Apoptosis/immunology , Biomarkers , Caspases/metabolism , Gene Expression Profiling , Gene Knockout Techniques , Lymphocyte Activation , Mice , Phenotype , Phosphorylation , Protein Binding , Protein Transport , STAT3 Transcription Factor/metabolism , Th17 Cells/cytology , Transcriptome , fas Receptor/genetics
ABSTRACT
The avascular nature of cartilage makes it a unique tissue [1-4], but whether and how the absence of nutrient supply regulates chondrogenesis remain unknown. Here we show that obstruction of vascular invasion during bone healing favours chondrogenic over osteogenic differentiation of skeletal progenitor cells. Unexpectedly, this process is driven by a decreased availability of extracellular lipids. When lipids are scarce, skeletal progenitors activate forkhead box O (FOXO) transcription factors, which bind to the Sox9 promoter and increase its expression. Besides initiating chondrogenesis, SOX9 acts as a regulator of cellular metabolism by suppressing oxidation of fatty acids, and thus adapts the cells to an avascular life. Our results define lipid scarcity as an important determinant of chondrogenic commitment, reveal a role for FOXO transcription factors during lipid starvation, and identify SOX9 as a critical metabolic mediator. These data highlight the importance of the nutritional microenvironment in the specification of skeletal cell fate.
Subject(s)
Bone and Bones/cytology , Cellular Microenvironment , Chondrogenesis , Lipid Metabolism , SOX9 Transcription Factor/metabolism , Stem Cells/cytology , Stem Cells/metabolism , Animals , Bone and Bones/blood supply , Chondrocytes/cytology , Chondrocytes/metabolism , Fatty Acids/metabolism , Female , Food Deprivation , Forkhead Transcription Factors/metabolism , Male , Mice , Mice, Inbred C57BL , Osteogenesis , Oxidation-Reduction , SOX9 Transcription Factor/genetics , Signal Transduction , Wound Healing
ABSTRACT
Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.
Subject(s)
Cichlids/classification , Cichlids/genetics , Evolution, Molecular , Genetic Speciation , Genome/genetics , Africa, Eastern , Animals , DNA Transposable Elements/genetics , Gene Duplication/genetics , Gene Expression Regulation/genetics , Genomics , Lakes , MicroRNAs/genetics , Phylogeny , Polymorphism, Genetic/genetics
ABSTRACT
Exceptionally accurate genome reference sequences have proven to be of great value to microbial researchers. Thus, to date, about 1800 bacterial genome assemblies have been "finished" at great expense with the aid of manual laboratory and computational processes that typically iterate over a period of months or even years. By applying a new laboratory design and new assembly algorithm to 16 samples, we demonstrate that assemblies exceeding finished quality can be obtained from whole-genome shotgun data and automated computation. Cost and time requirements are thus dramatically reduced.
Subject(s)
Bacteria/genetics , Genome, Bacterial , Genomic Library , Sequence Analysis, DNA/methods , Algorithms
ABSTRACT
Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd.
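The N50 scaffold sizes quoted above follow the usual assembly-statistics definition: the length L such that scaffolds of length L or longer contain at least half of all assembled bases. A minimal sketch of that calculation is given here for orientation. It is not taken from ALLPATHS-LG, and the function name `n50` and the example lengths are illustrative assumptions.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths (arbitrary units), not data from the paper.
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
```

Because N50 depends only on the length distribution, it says nothing about base accuracy or misassemblies, which is presumably why the abstract reports accuracy and coverage separately.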
Subject(s)
Algorithms , Genomics/methods , Sequence Analysis, DNA/methods , Software , Animals , Genome/genetics , Humans , Internet , Mice , Reproducibility of Results
ABSTRACT
There are many applications in which quantitative information about DNA mixtures with different molecular lengths is important. Gene therapy vectors are much longer than can be sequenced individually via short-read NGS. However, vector preparations may contain smaller DNAs that behave differently during sequencing. We have used two library preparations each for Pacific Biosciences (PacBio) and Oxford Nanopore Technologies NGS to determine their suitability for quantitative assessment of DNAs of varying sizes. Equimolar length standards were generated from E. coli genomic DNA. Both PacBio library preparations provided a consistent length dependence, though with a complex pattern. This method is sufficiently sensitive that differences in genomic copy number between DNA from E. coli grown in exponential and stationary phase conditions could be detected. The transposase-based Oxford Nanopore library preparation provided a predictable length dependence, but the random sequence starts caused the loss of original length information. The ligation-based approach retained length information, but read frequency was more variable. Modeling of E. coli versus lambda read frequency via cubic spline smoothing showed that the shorter genome could be used as a suitable internal spike-in for DNAs in the 200 bp to 10 kb range, allowing meaningful QC to be carried out with AAV preparations.
Subject(s)
Escherichia coli , High-Throughput Nucleotide Sequencing , High-Throughput Nucleotide Sequencing/methods , Calibration , Sequence Analysis, DNA/methods , DNA
ABSTRACT
Many different functions are regulated by circadian rhythms, including those orchestrated by discrete clock neurons within animal brains. To comprehensively characterize and assign cell identity to the 75 pairs of Drosophila circadian neurons, we optimized a single-cell RNA sequencing method and assayed clock neuron gene expression at different times of day. The data identify at least 17 clock neuron categories with striking spatial regulation of gene expression. Transcription factor regulation is prominent and likely contributes to the robust circadian oscillation of many transcripts, including those that encode cell-surface proteins previously shown to be important for cell recognition and synapse formation during development. The many other clock-regulated genes also constitute an important resource for future mechanistic and functional studies between clock neurons and/or for temporal signaling to circuits elsewhere in the fly brain.
Subject(s)
Biological Clocks , Circadian Rhythm , Drosophila melanogaster/physiology , Gene Expression Regulation , Neurons/physiology , Transcriptome , Animals , Drosophila melanogaster/genetics , Female , Male , Time Factors
ABSTRACT
MOTIVATION: A typical PSI-BLAST search consists of iterative scanning and alignment of a large sequence database during which a scoring profile is progressively built and refined. Such a profile can also be stored and used to search against a different database of sequences. Using it to search against a database of consensus rather than native sequences is a simple add-on that boosts performance surprisingly well. The improvement comes at a price: we hypothesized that random alignment score statistics would differ between native and consensus sequences. Thus PSI-BLAST-based profile searches against consensus sequences might incorrectly estimate statistical significance of alignment scores. In addition, iterative searches against consensus databases may fail. Here, we addressed these challenges in an attempt to harness the full power of the combination of PSI-BLAST and consensus sequences. RESULTS: We studied alignment score statistics for various types of consensus sequences. In general, the score distribution parameters of profile-based consensus sequence alignments differed significantly from those derived for the native sequences. PSI-BLAST partially compensated for the parameter variation. We have identified a protocol for building specialized consensus sequences that significantly improved search sensitivity and preserved score distribution parameters. As a result, PSI-BLAST profiles can be used to search specialized consensus sequences without sacrificing estimates of statistical significance. We also provided results indicating that iterative PSI-BLAST searches against consensus sequences could work very well. Overall, we showed how a very popular and effective method could be used to identify significantly more relevant similarities among protein sequences. AVAILABILITY: http://www.rostlab.org/services/consensus/.
Subject(s)
Consensus Sequence , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Algorithms , Amino Acid Sequence , Proteins/chemistry
ABSTRACT
Sequence alignments may be the most fundamental computational resource for molecular biology. The best methods that identify sequence relatedness through profile-profile comparisons are much slower and more complex than sequence-sequence and sequence-profile comparisons such as, respectively, BLAST and PSI-BLAST. Families of related genes and gene products (proteins) can be represented by consensus sequences that list the nucleic/amino acid most frequent at each sequence position in that family. Here, we propose a novel approach for consensus-sequence-based comparisons. This approach improved searches and alignments as a standard add-on to PSI-BLAST without any changes of code. Improvements were particularly significant for more difficult tasks such as the identification of distant structural relations between proteins and their corresponding alignments. Despite the fact that the improvements were higher for more divergent relations, they were consistent even at high accuracy/low error rates for non-trivially related proteins. The improvements were very easy to achieve; no parameter used by PSI-BLAST was altered and no single line of code changed. Furthermore, the consensus sequence add-on required relatively little additional CPU time. We discuss how advanced users of PSI-BLAST can immediately benefit from using consensus sequences on their local computers. We have also made the method available through the Internet (http://www.rostlab.org/services/consensus/).
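The definition of a consensus sequence given above (the most frequent residue at each position of a family) can be illustrated with a short majority-rule sketch. This shows only the plain definition and is not the authors' protocol for building consensus databases. The function name `consensus`, the gap character, and the first-seen tie-breaking rule are assumptions made for the example.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Majority-rule consensus of equal-length aligned sequences.

    Gaps are ignored when counting. A column made up only of gaps
    yields a gap. Ties go to the residue seen first in the column
    (an arbitrary choice made for this sketch).
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    result = []
    for column in zip(*aligned):  # iterate over alignment columns
        counts = Counter(res for res in column if res != gap)
        result.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(result)


# Hypothetical toy alignment, not data from the paper.
family = ["ACDE", "ACDF", "GCDE"]
family_consensus = consensus(family)
```

In the approach the abstract describes, sequences of this kind replace the native sequences in the searched database, so the search program itself needs no modification.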
Subject(s)
Sequence Alignment/methods , Amino Acid Sequence , Amino Acid Substitution , Base Sequence , Consensus Sequence , Sequence Analysis, Protein , Software
ABSTRACT
Head and neck squamous cell carcinomas (HNSCC) are an ideal immunotherapy target due to their high mutation burden and frequent infiltration with lymphocytes. Preclinical models to investigate targeted and combination therapies, as well as biomarkers to guide treatment, represent an important need in the field. Immunogenomics approaches have illuminated the role of mutation-derived tumor neoantigens as potential biomarkers of response to checkpoint blockade and as therapeutic vaccines. Here, we aimed to define a platform for checkpoint and other immunotherapy studies using syngeneic HNSCC cell line models (MOC2 and MOC22), and evaluated the association between mutation burden, predicted neoantigen landscape, infiltrating T cell populations and responsiveness of tumors to anti-PD1 therapy. We defined dramatic hematopoietic cell transcriptomic alterations in the MOC22 anti-PD1 responsive model in both tumor and draining lymph nodes. Using a cancer immunogenomics pipeline and validation with ELISPOT and tetramer analysis, we identified the H-2Kb-restricted ICAM1 P315L (mICAM1) as a neoantigen in MOC22. Finally, we demonstrated that mICAM1 vaccination was able to protect against MOC22 tumor development, defining mICAM1 as a bona fide neoantigen. Together, these data define a pre-clinical HNSCC model system that provides a foundation for future investigations into combination and novel therapeutics.
ABSTRACT
Building an integrated view of cellular responses to environmental cues remains a fundamental challenge due to the complexity of intracellular networks in mammalian cells. Here, we introduce an integrative biochemical and genetic framework to dissect signal transduction events using multiple data types and, in particular, to unify signaling and transcriptional networks. Using the Toll-like receptor (TLR) system as a model cellular response, we generate multifaceted datasets on physical, enzymatic, and functional interactions and integrate these data to reveal biochemical paths that connect TLR4 signaling to transcription. We define the roles of proximal TLR4 kinases, identify and functionally test two dozen candidate regulators, and demonstrate a role for Ap1ar (encoding the Gadkin protein) and its binding partner, Picalm, potentially linking vesicle transport with pro-inflammatory responses. Our study thus demonstrates how deciphering dynamic cellular responses by integrating datasets on various regulatory layers defines key components and higher-order logic underlying signaling-to-transcription pathways.
Subject(s)
Dendritic Cells/metabolism , Toll-Like Receptors/metabolism , Humans , Phosphorylation , Signal Transduction
ABSTRACT
Very few methods address the problem of predicting beta-barrel membrane proteins directly from sequence. One reason is that only very few high-resolution structures for transmembrane beta-barrel (TMB) proteins have been determined thus far. Here we introduced the design, statistics and results of a novel profile-based hidden Markov model for the prediction and discrimination of TMBs. The method carefully attempts to avoid over-fitting the sparse experimental data. While our model training and scoring procedures were very similar to a recently published work, the architecture and structure-based labelling were significantly different. In particular, we introduced a new definition of beta-hairpin motifs, explicit state modelling of transmembrane strands, and a log-odds whole-protein discrimination score. The resulting method reached an overall four-state (up-, down-strand, periplasmic-, outer-loop) accuracy as high as 86%. Furthermore, it accurately discriminated TMB from non-TMB proteins (45% coverage at 100% accuracy). This high precision enabled the application to 72 entirely sequenced Gram-negative bacteria. We found over 164 previously uncharacterized TMB proteins at high confidence. Database searches did not associate any of these proteins with membranes. We contend that the vast majority of our 164 predictions will eventually be verified experimentally. All proteome predictions and the PROFtmb prediction method are available at http://www.rostlab.org/services/PROFtmb/.
Subject(s)
Membrane Proteins/chemistry , Proteome/chemistry , Proteomics/methods , Sequence Analysis, Protein/methods , Markov Chains , Membrane Proteins/physiology , Protein Structure, Secondary , Reproducibility of Results , Sequence Alignment
ABSTRACT
EVA (http://cubic.bioc.columbia.edu/eva/) is a web server for evaluation of the accuracy of automated protein structure prediction methods. The evaluation is updated automatically each week, to cope with the large number of existing prediction servers and the constant changes in the prediction methods. EVA currently assesses servers for secondary structure prediction, contact prediction, comparative protein structure modelling and threading/fold recognition. Every day, sequences of newly available protein structures in the Protein Data Bank (PDB) are sent to the servers and their predictions are collected. The predictions are then compared to the experimental structures once a week; the results are published on the EVA web pages. Over time, EVA has accumulated prediction results for a large number of proteins, ranging from hundreds to thousands, depending on the prediction method. This large sample assures that methods are compared reliably. As a result, EVA provides useful information to developers as well as users of prediction methods.
Subject(s)
Protein Conformation , Sequence Analysis, Protein , Automation , Databases, Protein , Internet , Protein Folding , Protein Structure, Secondary , Proteins/chemistry , Reproducibility of Results , Structural Homology, Protein
ABSTRACT
The most reliable way to align two proteins of unknown structure is through sequence-profile and profile-profile alignment methods. If the structure for one of the two is known, fold recognition methods outperform purely sequence-based alignments. Here, we introduced a novel method that aligns generalised sequence and predicted structure profiles. Using predicted 1D structure (secondary structure and solvent accessibility) significantly improved over sequence-only methods, both in terms of correctly recognising pairs of proteins with different sequences and similar structures and in terms of correctly aligning the pairs. The scores obtained by our generalised scoring matrix followed an extreme value distribution; this yielded accurate estimates of the statistical significance of our alignments. We found that mistakes in 1D structure predictions correlated between proteins from different sequence-structure families. The impact of this surprising result was that our method succeeded in significantly out-performing sequence-only methods even without explicitly using structural information from any of the two. Since AGAPE also outperformed established methods that rely on 3D information, we made it available through. If we solved the problem of CPU-time required to apply AGAPE on millions of proteins, our results could also impact everyday database searches.
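The remark that alignment scores followed an extreme value distribution is what makes significance estimates possible: once location and scale parameters have been fitted to scores of unrelated pairs, the tail probability of any observed score has a closed form. The sketch below uses the standard Gumbel (type I extreme value) tail formula. The parameter names `mu` and `lam` and the numeric values are placeholders, since the abstract does not report the fitted parameters.

```python
import math


def evd_p_value(score, mu, lam):
    """Probability that a random alignment scores at least `score`.

    Assumes scores follow a Gumbel distribution with location `mu`
    and scale parameter `lam`:
        P(S >= x) = 1 - exp(-exp(-lam * (x - mu)))
    expm1 keeps the result accurate when the probability is tiny.
    """
    return -math.expm1(-math.exp(-lam * (score - mu)))


# Placeholder parameters for illustration only (not fitted values).
p_at_location = evd_p_value(10.0, mu=10.0, lam=0.5)
p_high_score = evd_p_value(40.0, mu=10.0, lam=0.5)
```

A score equal to the location parameter has a tail probability of 1 - 1/e, about 0.63, and the probability falls off roughly exponentially as the score rises above it.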
Subject(s)
Protein Folding , Proteins/metabolism , Sequence Analysis, Protein , Databases, Protein , Models, Molecular , Sequence Alignment
ABSTRACT
Protein expression is regulated by the production and degradation of messenger RNAs (mRNAs) and proteins, but their specific relationships remain unknown. We combine measurements of protein production and degradation and mRNA dynamics so as to build a quantitative genomic model of the differential regulation of gene expression in lipopolysaccharide-stimulated mouse dendritic cells. Changes in mRNA abundance play a dominant role in determining most dynamic fold changes in protein levels. Conversely, the preexisting proteome of proteins performing basic cellular functions is remodeled primarily through changes in protein production or degradation, accounting for more than half of the absolute change in protein molecules in the cell. Thus, the proteome is regulated by transcriptional induction for newly activated cellular functions and by protein life-cycle changes for remodeling of preexisting functions.
Subject(s)
Bone Marrow Cells/immunology , Dendritic Cells/immunology , Host-Pathogen Interactions/immunology , Molecular Dynamics Simulation , Protein Biosynthesis , Proteolysis , Amino Acids/chemistry , Amino Acids/metabolism , Animals , Cell Culture Techniques , Isotope Labeling/methods , Lipopolysaccharides/immunology , Mice , Mitochondrial Proteins/metabolism , RNA, Messenger/biosynthesis , RNA, Messenger/genetics , Sequence Analysis, RNA
ABSTRACT
We have analysed fold recognition, secondary structure and contact prediction servers from CAFASP3. This assessment was carried out in the framework of the fully automated, web-based evaluation server EVA. Detailed results are available at http://cubic.bioc.columbia.edu/eva/cafasp3/. We observed that the sequence-unique targets from CAFASP3/CASP5 were not fully representative for evaluating performance. For all three categories, we showed how careless ranking might be misleading. We compared methods from all categories to experts in secondary structure and contact prediction and homology modellers to fold recognisers. While the secondary structure experts clearly outperformed all others, the contact experts appeared to outperform only novel fold methods. Automatic evaluation servers are good at getting statistics right and at using these to discard misleading ranking schemes. We contend that letting machines rule where they are best might be the best way for the community to enjoy the tremendous benefit of CASP as a unique opportunity for brainstorming.
Subject(s)
Computational Biology/methods , Proteins/chemistry , Algorithms , Protein Folding , Protein Structure, Secondary , Sensitivity and Specificity
ABSTRACT
Aging is accompanied by physiological impairments, which, in insulin-responsive tissues, including the liver, predispose individuals to metabolic disease. However, the molecular mechanisms underlying these changes remain largely unknown. Here, we analyze genome-wide profiles of RNA and chromatin organization in the liver of young (3 months) and old (21 months) mice. Transcriptional changes suggest that derepression of the nuclear receptors PPARα, PPARγ, and LXRα in aged mouse liver leads to activation of targets regulating lipid synthesis and storage, whereas age-dependent changes in nucleosome occupancy are associated with binding sites for both known regulators (forkhead factors and nuclear receptors) and candidates associated with the nuclear lamina (Hdac3 and Srf) that are implicated in governing the metabolic function of the aging liver. The winged-helix transcription factor Foxa2 and the nuclear receptor corepressor Hdac3 exhibit a reciprocal binding pattern at PPARα targets, contributing to gene expression changes that lead to steatosis in aged liver.
Subject(s)
Aging/metabolism , Liver/growth & development , Liver/metabolism , Mammals/metabolism , Nucleosomes/metabolism , Animals , Base Sequence , DNA-Binding Proteins/metabolism , Fatty Liver/pathology , Gene Expression Regulation, Developmental , Hepatocyte Nuclear Factor 3-beta/metabolism , Histone Deacetylases/metabolism , Inflammation/pathology , Liver/pathology , Male , Mice, Inbred C57BL , Models, Biological , Molecular Sequence Data , Nuclear Lamina/metabolism , PPAR alpha/metabolism , Protein Binding , Transcription Factors/metabolism
ABSTRACT
We demonstrate that genome sequences approaching finished quality can be generated from short paired reads. Using 36 base (fragment) and 26 base (jumping) reads from five microbial genomes of varied GC composition and sizes up to 40 Mb, ALLPATHS2 generated assemblies with long, accurate contigs and scaffolds. Velvet and EULER-SR were less accurate. For example, for Escherichia coli, the fraction of 10-kb stretches that were perfect was 99.8% (ALLPATHS2), 68.7% (Velvet), and 42.1% (EULER-SR).