1.
Genome Res ; 29(5): 798-808, 2019 05.
Article in English | MEDLINE | ID: mdl-30940689

ABSTRACT

Here, we describe single-tube long fragment read (stLFR), a technology that enables sequencing of data from long DNA molecules using economical second-generation sequencing technology. It is based on adding the same barcode sequence to subfragments of the original long DNA molecule (DNA cobarcoding). To achieve this efficiently, stLFR uses the surface of microbeads to create millions of miniaturized barcoding reactions in a single tube. Using a combinatorial process, up to 3.6 billion unique barcode sequences were generated on beads, enabling practically nonredundant cobarcoding with 50 million barcodes per sample. Using stLFR, we demonstrate efficient unique cobarcoding of more than 8 million 20- to 300-kb genomic DNA fragments. Analysis of the human genome NA12878 with stLFR demonstrated high-quality variant calling and phase block lengths up to N50 34 Mb. We also demonstrate detection of complex structural variants and complete diploid de novo assembly of NA12878. These analyses were all performed using single stLFR libraries, and their construction did not significantly add to the time or cost of whole-genome sequencing (WGS) library preparation. stLFR represents an easily automatable solution that enables high-quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.
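The claim of "practically nonredundant" cobarcoding can be checked with a rough back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each of the 50 million barcoded beads in a sample carries a barcode drawn uniformly and independently from the pool of 3.6 billion sequences; the abstract does not state this sampling model, and the function name is hypothetical.

```python
import math


def shared_barcode_fraction(pool_size: int, beads_per_sample: int) -> float:
    """Expected fraction of beads whose barcode also appears on another bead.

    Assumption (not from the abstract): every bead's barcode is an
    independent, uniform draw from `pool_size` possible sequences.
    """
    # P(no other bead has the same barcode) = (1 - 1/pool_size)^(n - 1)
    log_p_unique = (beads_per_sample - 1) * math.log1p(-1.0 / pool_size)
    return 1.0 - math.exp(log_p_unique)


POOL_SIZE = 3_600_000_000  # "up to 3.6 billion unique barcode sequences"
BEADS_PER_SAMPLE = 50_000_000  # "50 million barcodes per sample"

fraction = shared_barcode_fraction(POOL_SIZE, BEADS_PER_SAMPLE)
print(f"Expected fraction of beads sharing a barcode: {fraction:.4f}")
```

Under this simplified model, roughly 1.4% of beads would share a barcode with another bead, which is in line with the abstract's description of the barcoding as practically nonredundant.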


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Whole Genome Sequencing/methods , Cost-Benefit Analysis , Diploidy , Gene Library , Human Genome , Genomics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/economics , Humans , Whole Genome Sequencing/economics
2.
Bioinformatics ; 30(12): 1660-6, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24532719

ABSTRACT

MOTIVATION: Transcriptome sequencing has long been the favored method for quickly and inexpensively obtaining a large number of gene sequences from an organism with no reference genome. Owing to the rapid increase in throughput and decrease in cost of next-generation sequencing, RNA-Seq in particular has become the method of choice. However, the very short reads (e.g. 2 × 90 bp paired ends) from next-generation sequencing make de novo assembly to recover complete or full-length transcript sequences an algorithmic challenge. RESULTS: Here, we present SOAPdenovo-Trans, a de novo transcriptome assembler designed specifically for RNA-Seq. We evaluated its performance on transcriptome datasets from rice and mouse. Using as our benchmarks the known transcripts from these well-annotated genomes (sequenced a decade ago), we assessed how SOAPdenovo-Trans and two other popular transcriptome assemblers handled such practical issues as alternative splicing and variable expression levels. Our conclusion is that SOAPdenovo-Trans provides higher contiguity, lower redundancy and faster execution. AVAILABILITY AND IMPLEMENTATION: Source code and user manual are available at http://sourceforge.net/projects/soapdenovotrans/.


Subject(s)
Algorithms , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , RNA Sequence Analysis/methods , Alternative Splicing , Animals , Genomics/methods , Mice , Oryza/genetics
3.
Appl Microbiol Biotechnol ; 99(6): 2763-72, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25687447

ABSTRACT

A marine-derived actinobacterium, Streptomyces sp. M10, was identified as a prolific producer of antifungal compounds and shared 99.02% 16S ribosomal RNA (rRNA) sequence similarity with Streptomyces marokkonensis Ap1(T), which can produce polyene macrolides. To further evaluate its biosynthetic potential, the 7,207,169 bp Streptomyces sp. M10 linear chromosome was sequenced and mined for identifiable secondary metabolite-associated gene clusters. A total of 20 secondary metabolite-associated gene clusters were deduced, including three polyketide synthases (PKSs), four non-ribosomal peptide synthetases (NRPSs), four hybrid NRPS-PKSs, three NRPS-independent siderophores, and two lantibiotic and four terpene biosynthetic gene clusters. One of the type I PKS gene clusters, pks1, shared 85% nucleotide similarity with the candicidin/FR008 gene cluster, indicating the capacity of this organism to produce polyene macrolides. This assumption was verified by scale-up culturing of Streptomyces sp. M10 on A1 agar plates, which led to the isolation of two polyene families, PF1 and PF2, with characteristic UV absorption at 269, 278, and 290 nm (PF1) and 363, 386, and 408 nm (PF2), respectively. Compound 9-04 was further purified from PF1, and its chemical structure was partially elucidated as that of a typical polyene macrolide by NMR and UV spectroscopy. This study confirmed Streptomyces sp. M10 as a source of polyene metabolites and highlighted genome mining of organisms of interest as a powerful tool for natural product discovery.

4.
Mol Ecol Resour ; 19(4): 944-956, 2019 Jul.
Article in English | MEDLINE | ID: mdl-30735609

ABSTRACT

Marine mammals are important models for studying convergent evolution and aquatic adaptation, and thus reference genomes of marine mammals can provide evolutionary insights. Here, we present the first chromosome-level marine mammal genome assembly based on data generated by the BGISEQ-500 platform, for a stranded female sperm whale (Physeter macrocephalus). Using this reference genome, we performed chromosome evolution analysis of the sperm whale, including constructing ancestral chromosomes, identifying chromosome rearrangement events and comparing with cattle chromosomes, which provides a resource for exploring marine mammal adaptation and speciation. We detected a high proportion of long interspersed nuclear elements and expanded gene families, and a contraction of major histocompatibility complex region genes, which were specific to the sperm whale. Using comparisons with sheep and cattle, we analysed positively selected genes to identify gene pathways that may be related to adaptation to the marine environment. Further, we identified possible convergent evolution in aquatic mammals by testing for positively selected genes across three orders of marine mammals. In addition, we used publicly available resequencing data to confirm a rapid decline in global population size at the Pliocene to Pleistocene transition. This study sheds light on the chromosome evolution and genetic mechanisms underpinning sperm whale adaptations, providing valuable resources for future comparative genomics.


Subject(s)
Aquatic Organisms/genetics , Ecosystem , Molecular Evolution , Genome , Sperm Whale/genetics , Biological Adaptation , Animals , Cattle , Female , Sheep
6.
Gigascience ; 1(1): 18, 2012 Dec 27.
Article in English | MEDLINE | ID: mdl-23587118

ABSTRACT

BACKGROUND: There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions. FINDINGS: To overcome these challenges, we have developed its successor, SOAPdenovo2, which has the advantage of a new algorithm design that reduces memory consumption in graph construction, resolves more repeat regions in contig assembly, increases coverage and length in scaffold construction, improves gap closing, and is optimized for large genomes. CONCLUSIONS: Benchmarks using the Assemblathon1 and GAGE datasets showed that SOAPdenovo2 greatly surpasses its predecessor SOAPdenovo and is competitive with other assemblers on both assembly length and accuracy. We also provide an updated assembly version of the 2008 Asian (YH) genome using SOAPdenovo2. Here, the contig and scaffold N50 of the YH genome were ~20.9 kbp and ~22 Mbp, respectively, which are 3-fold and 50-fold longer than in the first published version. The genome coverage increased from 81.16% to 93.91%, and memory consumption was ~2/3 lower at the point of peak memory consumption.
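For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch shows one common way to compute it: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the total assembled length. The function name and the toy lengths are illustrative only and are not taken from the paper or from SOAPdenovo2.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the total assembled length.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() requires at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running total always reaches half")


# Toy example (hypothetical lengths, in kbp): total is 290, half is 145.
# 80 + 70 = 150 >= 145, so the N50 is 70.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))
```

A higher N50 means that more of the assembly sits in long sequences, which is why the abstract reports it for both contigs and scaffolds.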
