ABSTRACT
A correction to this article has been published and is linked from the HTML and PDF versions of this paper. The error has been fixed in the paper.
ABSTRACT
Large, repeat-rich genomes present challenges for assembly using short-read technologies. The 32 Gb axolotl genome is estimated to contain ~19 Gb of repetitive DNA, making an assembly from short reads alone effectively impossible. Indeed, this model species has been sequenced to 20× coverage, but the reads could not be conventionally assembled. Using an alternative strategy, we have assembled subsets of these reads into scaffolds describing over 19,000 gene models. We call this method Virtual Genome Walking, as it locally assembles whole-genome reads based on a reference transcriptome, identifying exons and iteratively extending them into surrounding genomic sequence. These assemblies are then linked and refined to generate gene models that include upstream, downstream, and intronic genomic sequence. Our assemblies are validated by comparison with previously published axolotl bacterial artificial chromosome (BAC) sequences. Our analyses of axolotl intron length, intron-exon structure, repeat content and synteny provide novel insights into the genic structure of this model species. This resource will enable new experimental approaches in axolotl, such as ChIP-Seq and CRISPR, and aid future whole-genome sequencing efforts. The assembled sequences and annotations presented here are freely available for download from https://tinyurl.com/y8gydc6n . The software pipeline is available from https://github.com/LooseLab/iterassemble .
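The idea of iteratively extending an exon into flanking genomic sequence can be illustrated with a toy example. The sketch below is not the iterassemble pipeline; it is a minimal greedy overlap extension written under strong assumptions: reads are error-free, all come from one strand, and no sequence of `min_overlap` bases is repeated. The function names (`extend_right`, `walk`) and the parameter values are hypothetical.

```python
def extend_right(contig, reads, min_overlap=5, max_rounds=100):
    """Greedily extend `contig` to the right using overlapping reads.

    In each round, every read whose prefix matches the contig's suffix
    (by at least `min_overlap` bases) proposes an extension; the longest
    proposal is appended. Stops when no read extends the contig.
    """
    for _ in range(max_rounds):
        best = ""
        for read in reads:
            # Try the longest possible overlap first; len(read) - 1
            # guarantees the read contributes at least one new base.
            longest = min(len(contig), len(read) - 1)
            for k in range(longest, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    if len(read) - k > len(best):
                        best = read[k:]
                    break
        if not best:
            break
        contig += best
    return contig


def walk(seed, reads, min_overlap=5):
    """Extend a seed (e.g. an exon) in both directions.

    Leftward extension reuses extend_right on reversed strings.
    """
    right = extend_right(seed, reads, min_overlap)
    reversed_reads = [r[::-1] for r in reads]
    return extend_right(right[::-1], reversed_reads, min_overlap)[::-1]


if __name__ == "__main__":
    # Toy "genome" and simulated 10-base reads sampled every 2 bases.
    genome = "ATGGCGTACCTGAAGTTCGACTAG"
    reads = [genome[i:i + 10] for i in range(0, 15, 2)]
    seed = genome[4:12]  # stands in for an exon located via a transcript
    print(walk(seed, reads))
```

On real data, repeats, sequencing errors and reverse-complement reads all break these assumptions, which is why a local assembler rather than a simple string extension would be needed in practice.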
Subjects
Ambystoma mexicanum/genetics , Chromosome Walking/methods , Computational Biology/methods , Animals , Gene Expression Profiling/methods , Introns , Molecular Sequence Annotation , Software
ABSTRACT
BACKGROUND: Identifying protein-coding genes from species without a reference genome sequence can be complicated by the presence of sequencing errors, particularly insertions and deletions. A number of tools capable of correcting erroneous frame-shifts within assembled transcripts are available but often do not report back DNA sequences required for subsequent phylogenetic analysis. Amongst those that do, the Genewise algorithm is the most effective. However, it requires a homology wrapper to be used in this way, and here we demonstrate it perfectly corrects frame-shifts only 60% of the time. RESULTS: We therefore created AlignWise, a tool that combines Genewise with our own homology-based method, AlignFS, to identify protein-coding regions and correct erroneous frame-shifts, suitable for subsequent phylogenetic analysis. We compared AlignWise against other open reading frame finding software and demonstrate that the AlignFS algorithm is more accurate than Genewise at correcting frame-shifts within an order. We show that AlignWise provides the greatest accuracy at higher evolutionary distances, out-performing both AlignFS and Genewise individually. CONCLUSIONS: AlignWise produces a single ORF per transcript and identifies and corrects frame-shifts with high accuracy. It is therefore well suited for analysing novel transcriptome assemblies and EST sequences in the absence of a reference genome.
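To see why a single insertion or deletion is a problem for ORF finding, consider a naive scan for the longest stop-free stretch in the three forward reading frames. The sketch below is not AlignFS, Genewise or AlignWise; it is a minimal baseline under simplifying assumptions (forward strand only, standard stop codons, no start-codon requirement), and the name `longest_orf` and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (frame, start, end) of the longest stop-free stretch.

    Scans the three forward frames codon by codon. `seq[start:end]`
    contains only complete, non-stop codons. Ties keep the first hit.
    """
    best = (0, 0, 0)
    for frame in range(3):
        start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if pos - start > best[2] - best[1]:
                    best = (frame, start, pos)
                start = pos + 3  # next stretch begins after the stop
        # Close the final stretch at the last complete codon.
        end = frame + ((len(seq) - frame) // 3) * 3
        if end - start > best[2] - best[1]:
            best = (frame, start, end)
    return best


if __name__ == "__main__":
    cds = "ATGGCCAAAGGTGAACTGCGTTCTGGCTAA"
    # Simulate a sequencing error: one extra base after the start codon.
    shifted = cds[:3] + "C" + cds[3:]
    print(longest_orf(cds))
    print(longest_orf(shifted))
```

In this toy case the inserted base moves the longest stop-free stretch from frame 0 to frame 1 and cuts it off from the start codon, so the frame-0 reading hits a premature stop. A frame-aware method has to detect the switch point and restore a single reading frame, which is the task the abstract describes.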
Subjects
Algorithms , Frameshift Mutation/genetics , Human Genome , Open Reading Frames/genetics , Phylogeny , Software , Base Sequence , Gene Expression Profiling , Humans , Molecular Sequence Data , Sequence Alignment , Nucleic Acid Sequence Homology
ABSTRACT
Primordial germ cell (PGC) specification occurs either by induction from pluripotent cells (epigenesis) or by a cell-autonomous mechanism mediated by germ plasm (preformation). Among vertebrates, epigenesis is basal, whereas germ plasm has evolved convergently across lineages and is associated with greater speciation. We compared protein-coding sequences of vertebrate species that employ preformation with their sister taxa that use epigenesis and demonstrate that genes evolve more rapidly in species containing germ plasm. Furthermore, differences in rates of evolution appear to cause phylogenetic incongruence in protein-coding sequence comparisons between vertebrate taxa. Our results support the hypothesis that germ plasm liberates constraints on somatic development and that enhanced evolvability drives the evolution of germ plasm.