Discovery and assembly of repeat family pseudomolecules from sparse genomic sequence data using the Assisted Automated Assembler of Repeat Families (AAARF) algorithm.

DeBarry, Jeremy D; Liu, Renyi; Bennetzen, Jeffrey L

Affiliation

DeBarry JD; Department of Genetics, University of Georgia, Athens, GA 30602-7223, USA. jdebarry@uga.edu

BMC Bioinformatics ; 9: 235, 2008 May 13.

Article in English | MEDLINE | ID: mdl-18474116

ABSTRACT

BACKGROUND:

Higher eukaryotic genomes are typically large, complex and filled with both genes and multiple classes of repetitive DNA. The repetitive DNAs, primarily transposable elements, are a rapidly evolving genome component that can provide the raw material for novel selected functions and also indicate the mechanisms and history of genome evolution in any ancestral lineage. Despite their abundance, universality and significance, studies of genomic repeat content have been largely limited to analyses of the repeats in fully sequenced genomes.

RESULTS:

To facilitate a broader range of repeat analyses, the Assisted Automated Assembler of Repeat Families (AAARF) algorithm has been developed. This program, written in Perl and with numerous adjustable parameters, identifies sequence overlaps in small shotgun sequence datasets and walks them out to create long pseudomolecules representing the most abundant repeats in any genome. Testing of this program in maize indicated that it found and assembled all of the major repeats in one or more pseudomolecules, including coverage of the major Long Terminal Repeat retrotransposon families. Both Sanger and 454 sequence datasets were suitable as input.
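
The "identify overlaps and walk them out" idea can be illustrated with a toy greedy extension. This is only a sketch under strong assumptions, not the AAARF implementation: it is written in Python rather than Perl, it uses exact suffix/prefix string matching as a stand-in for real overlap detection, it extends in one direction only, and the names `suffix_prefix_overlap`, `walk_out` and `min_overlap` are hypothetical.

```python
def suffix_prefix_overlap(left, right, min_overlap):
    """Return the length of the longest suffix of `left` that equals a
    prefix of `right`, or 0 if no such overlap reaches `min_overlap`."""
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def walk_out(seed, reads, min_overlap=4):
    """Greedily extend `seed` to the right, one read at a time.

    At each step the unused read with the longest exact overlap is
    appended; the walk stops when no remaining read overlaps the end.
    """
    contig = seed
    unused = list(reads)
    while True:
        best_k, best_i = 0, None
        for i, read in enumerate(unused):
            k = suffix_prefix_overlap(contig, read, min_overlap)
            # Skip reads that overlap completely and would add no new bases.
            if k > best_k and k < len(read):
                best_k, best_i = k, i
        if best_i is None:
            return contig
        # Append only the part of the read that lies beyond the overlap.
        contig += unused.pop(best_i)[best_k:]


# Made-up reads; "CCCCCCCC" shares no overlap and is never used.
reads = ["TGCCGATAG", "CCCCCCCC", "TGCATGCC", "ATAGG"]
pseudomolecule = walk_out("ACGTTGCA", reads)
print(pseudomolecule)
```

With real shotgun reads, sequencing errors and divergence between repeat copies mean overlaps are rarely exact, so a practical tool would need alignment-based overlap detection with tunable identity and length thresholds, in the spirit of the adjustable parameters mentioned above.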

CONCLUSION:

These results now indicate that hundreds of higher eukaryotic genomes can be efficiently characterized for the nature, abundance and evolution of their major repetitive DNA components.

Subject(s)

DNA, Plant/analysis; Genome; Repetitive Sequences, Nucleic Acid; Software; Zea mays/genetics; Algorithms; Databases, Genetic; Gene Dosage; Multigene Family; Sequence Alignment; Sequence Analysis, DNA/methods

Full text: 1 | Database: MEDLINE | Main subject: Software / Repetitive Sequences, Nucleic Acid / Genome / DNA, Plant / Zea mays | Type of study: Prognostic_studies | Language: En | Journal: BMC Bioinformatics | Journal subject: MEDICAL INFORMATICS | Year: 2008 | Document type: Article | Affiliation country: United States
