A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms.

Sze, Sing-Hoi; Parrott, Jonathan J; Tarone, Aaron M

Affiliation

Sze SH; Department of Computer Science and Engineering, Texas A&M University, College Station, TX 77843, USA. shsze@cse.tamu.edu.
Parrott JJ; Department of Biochemistry & Biophysics, Texas A&M University, College Station, TX 77843, USA. shsze@cse.tamu.edu.
Tarone AM; Department of Entomology, Texas A&M University, College Station, TX 77843, USA.

BMC Genomics; 18(Suppl 10): 895, 2017 Dec 06.

Article in English | MEDLINE | ID: mdl-29244008

ABSTRACT

BACKGROUND:

While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing number of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies.

RESULTS:

We develop a divide-and-conquer strategy that allows these algorithms to be used by subdividing a large RNA-Seq data set into small libraries. Each library is assembled independently by an existing algorithm, and a merging algorithm combines these assemblies by picking a subset of high-quality transcripts to form a large transcriptome. Compared with existing algorithms that return a single assembly directly, this strategy achieves accuracy comparable to or higher than that of memory-efficient algorithms that can process a large amount of RNA-Seq data, and accuracy comparable to or lower than that of memory-intensive algorithms that can only be used to construct small assemblies.
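The overall shape of the strategy (subdivide, assemble each library independently, merge) can be illustrated with a minimal Python sketch. The abstract does not specify how the merging algorithm selects high-quality transcripts, so the selection rule here is only an assumed stand-in: it drops transcripts shorter than a threshold and transcripts fully contained in a longer retained one. The function names, the `library_size` and `min_length` parameters, and the `assemble` callback (a placeholder for any existing assembler) are all hypothetical.

```python
from typing import Callable, Iterable, List, Sequence


def split_into_libraries(reads: Sequence[str], library_size: int) -> List[List[str]]:
    """Partition reads into consecutive libraries of at most library_size reads."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return [list(reads[i:i + library_size])
            for i in range(0, len(reads), library_size)]


def merge_assemblies(assemblies: Iterable[Iterable[str]],
                     min_length: int = 0) -> List[str]:
    """Combine per-library transcripts into one non-redundant set.

    Assumed stand-in for the paper's merging step: keep a transcript only if
    it is at least min_length long and is not a substring of a longer
    transcript that was already kept. Reverse complements are not handled.
    """
    candidates = {t for assembly in assemblies for t in assembly
                  if len(t) >= min_length}
    kept: List[str] = []
    # Longest first, so shorter contained transcripts are filtered out.
    for transcript in sorted(candidates, key=lambda s: (-len(s), s)):
        if not any(transcript in longer for longer in kept):
            kept.append(transcript)
    return kept


def divide_and_conquer_assemble(reads: Sequence[str],
                                library_size: int,
                                assemble: Callable[[List[str]], List[str]],
                                min_length: int = 0) -> List[str]:
    """Assemble each small library independently, then merge the results."""
    libraries = split_into_libraries(reads, library_size)
    assemblies = [assemble(library) for library in libraries]
    return merge_assemblies(assemblies, min_length)


if __name__ == "__main__":
    # Toy "assembler" that returns its reads unchanged, for illustration only.
    toy_reads = ["ACGTACGT", "ACGT", "TTTT", "GGGCCC", "GGC"]
    print(divide_and_conquer_assemble(toy_reads, 2, lambda lib: list(lib)))
```

Because each library is assembled on its own, peak memory depends on the library size rather than on the full data set, which is what allows a memory-intensive assembler to be applied to a large collection of RNA-Seq data.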

CONCLUSIONS:

Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.

Subject(s)

Algorithms; Gene Expression Profiling/methods; Animals; Arabidopsis/genetics; Drosophila melanogaster/genetics; Schizosaccharomyces/genetics; Sequence Analysis, RNA

Keywords

Divide-and-conquer; RNA-Seq; de novo transcriptome assembly

Full text: 1 Collection: 01-international Database: MEDLINE Main subject: Algorithms / Gene Expression Profiling Type of study: Prognostic_studies Limits: Animals Language: En Journal: BMC Genomics Journal subject: GENETICS Year: 2017 Document type: Article Affiliation country: United States