ABSTRACT
BACKGROUND: Second-generation sequencing has permitted detailed sequence characterisation at the whole genome level of a growing number of non-model organisms, but the data produced have short read-lengths and biased genome coverage, leading to fragmented genome assemblies. The PacBio RS long-read sequencing platform offers the promise of increased read length and unbiased genome coverage, and thus the potential to produce genome sequence data of a finished quality containing fewer gaps and longer contigs. However, these advantages come at a much greater cost per nucleotide and with a perceived increase in error-rate. In this investigation, we evaluated the performance of the PacBio RS sequencing platform through the sequencing and de novo assembly of the Potentilla micrantha chloroplast genome. RESULTS: Following error-correction, a total of 28,638 PacBio RS reads were recovered with a mean read length of 1,902 bp, totalling 54,492,250 nucleotides and representing an average depth of coverage of 320× the chloroplast genome. The dataset covered the entire 154,959 bp of the chloroplast genome in a single contig (100% coverage), compared to seven contigs (90.59% coverage) recovered from an Illumina dataset, and revealed no bias in coverage of GC-rich regions. Post-assembly, the data were largely concordant with the Illumina data generated and allowed 187 ambiguities in the Illumina data to be resolved. The additional read length also permitted small differences in the two inverted repeat regions to be assigned unambiguously. CONCLUSIONS: This is the first report to our knowledge of a chloroplast genome assembled de novo using PacBio sequence data. The PacBio RS data generated here were assembled into a single large contig spanning the P. micrantha chloroplast genome, with a higher degree of accuracy than an Illumina dataset generated at a much greater depth of coverage, due to longer read lengths and lower GC bias in the data. The results we present suggest PacBio data will be of immense utility for the development of genome sequence assemblies containing fewer unresolved gaps and ambiguities and a significantly smaller number of contigs than could be produced using short-read sequence data alone.
Subject(s)
Genome, Chloroplast/genetics; Potentilla/genetics; Sequence Analysis, DNA/methods; Software; Base Composition/genetics; Base Sequence; Databases, Genetic
ABSTRACT
BACKGROUND: Rapid development of highly saturated genetic maps aids molecular breeding, which can accelerate gain per breeding cycle in woody perennial plants such as Rubus idaeus (red raspberry). Recently, robust genotyping methods based on high-throughput sequencing were developed, which provide high marker density but result in some genotype errors and a large number of missing genotype values. Imputation can reduce the number of missing values and can correct genotyping errors, but current methods of imputation require a reference genome and thus are not an option for most species. RESULTS: Genotyping by Sequencing (GBS) was used to produce highly saturated maps for an R. idaeus pseudo-testcross progeny. While low coverage and high variance in sequencing resulted in a large number of missing values for some individuals, a novel method of imputation based on maximum likelihood marker ordering from initial marker segregation overcame the challenge of missing values and made map construction computationally tractable. The two resulting parental maps contained 4521 and 2391 molecular markers spanning 462.7 and 376.6 cM respectively over seven linkage groups. Detection of precise genomic regions with segregation distortion was possible because of map saturation. Microsatellites (SSRs) linked these results to published maps for cross-validation and map comparison. CONCLUSIONS: GBS together with genome-independent imputation provides a rapid method for genetic map construction in any pseudo-testcross progeny. Our method of imputation estimates the correct genotype call of missing values and corrects genotyping errors that lead to inflated map size and reduced precision in marker placement. Comparison of SSRs to published R. idaeus maps showed that the linkage maps constructed with GBS and our method of imputation were robust, and marker positioning reliable. The high marker density allowed identification of genomic regions with segregation distortion in R. idaeus, which may help to identify deleterious alleles that are the basis of inbreeding depression in the species.
Subject(s)
Chromosome Mapping/methods; Genome, Plant/genetics; Genotyping Techniques/methods; High-Throughput Nucleotide Sequencing/methods; Rosaceae/genetics; Genetic Markers/genetics; Microsatellite Repeats/genetics; Polymorphism, Single Nucleotide/genetics
ABSTRACT
Even with recent reductions in sequencing costs, most plants lack the genomic resources required for successful short-read transcriptome analyses as performed routinely in model species. Several approaches for the analysis of short-read transcriptome data are reviewed for nonmodel species for which the genome of a close relative is used as the reference genome. Two approaches using a data set from Phytophthora-challenged Rubus idaeus (red raspberry) are compared. Over 70 million 86-nt Illumina reads derived from R. idaeus roots were aligned to the Fragaria vesca genome using publicly available informatics tools (Bowtie/TopHat and Cufflinks). Alignment identified 16,956 putatively expressed genes. De novo assembly was performed with the same data set and a publicly available transcriptome assembler (Trinity). A BLAST search with a maximum e-value threshold of 1.0 × 10⁻³ revealed that over 36,000 transcripts had matches to plants and over 500 to Phytophthora. Gene expression estimates from alignment to F. vesca and de novo assembly were compared for raspberry (Pearson's correlation = 0.730). Together, alignment to the genome of a close relative and de novo assembly constitute a powerful method of transcriptome analysis in nonmodel organisms. Alignment to the genome of a close relative provides a framework for differential expression testing if alignments are made to the predefined gene-space of that relative, while de novo assembly provides a more robust method of identifying unique sequences and sequences from other organisms in a system. These methods are considered experimental in nonmodel systems, but can be used to generate resources and specific testable hypotheses.
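The comparison of expression estimates from the two approaches is summarised by a Pearson correlation coefficient. As a minimal sketch of how such a coefficient is computed for paired per-gene estimates, the function below implements the standard formula in plain Python. The function name `pearson` and the input values are hypothetical illustrations, not the authors' data or pipeline, and the abstract does not state whether the estimates were transformed before comparison.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical paired expression estimates for the same four genes,
# one list per method (reference alignment vs. de novo assembly).
alignment_estimates = [12.0, 3.5, 40.2, 7.1]
assembly_estimates = [10.4, 5.0, 35.9, 9.3]
r = pearson(alignment_estimates, assembly_estimates)
```

A value near 1 indicates that the two methods rank and scale genes similarly; the reported 0.730 indicates substantial but incomplete agreement.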
Subject(s)
Gene Expression Profiling/methods; Genome, Plant; Rosaceae/genetics; Software; Databases, Genetic; Disease Resistance/genetics; Expressed Sequence Tags; Gene Expression Regulation, Plant; Phytophthora/genetics; Phytophthora/immunology; Phytophthora/pathogenicity; Plant Diseases/genetics; Plant Diseases/immunology; Plant Proteins/genetics; Plant Roots/genetics; Plant Roots/immunology; Plant Roots/microbiology; RNA, Plant/genetics; Rosaceae/immunology; Rosaceae/microbiology; Sequence Alignment/methods
ABSTRACT
Background: The genus Potentilla is closely related to Fragaria, the economically important strawberry genus. Potentilla micrantha is a species that does not develop berries but shares numerous morphological and ecological characteristics with Fragaria vesca. These similarities make P. micrantha an attractive choice for comparative genomics studies with F. vesca. Findings: In this study, the P. micrantha genome was sequenced and annotated, and RNA-Seq data from the different developmental stages of flowering and fruiting were used to develop a set of gene predictions. We developed a 327 Mbp sequence and annotation of the P. micrantha genome, spanning 2674 sequence contigs with an N50 of 335,712 bp, estimated to cover 80% of the total genome size of the species. The genus Potentilla has a characteristically larger genome size than Fragaria, but the recovered sequence scaffolds were remarkably collinear at the micro-syntenic level with the genome of F. vesca, its closest sequenced relative. A total of 33,602 genes were predicted, and 95.1% of bench-marking universal single-copy orthologous genes were complete within the presented sequence. Thus, we argue that the majority of the gene-rich regions of the genome have been sequenced. Conclusions: Comparisons of RNA-Seq data from the stages of floral and fruit development revealed genes differentially expressed between P. micrantha and F. vesca. The data presented are a valuable resource for future studies of berry development in Fragaria and the Rosaceae, and they also shed light on the evolution of genome size and organization in this family.
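The N50 statistic quoted above is the contig length at which contigs of that length or longer account for at least half of the total assembled sequence. The following is a minimal sketch of that calculation under the common definition; the function name `n50` and the example lengths are hypothetical and are not taken from the P. micrantha assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical toy assembly: total 20 bp, so the halfway point is 10 bp.
# 6 + 5 = 11 >= 10, so the N50 is 5.
example_n50 = n50([2, 3, 4, 5, 6])
```

Because N50 depends only on the length distribution, it says nothing about assembly correctness, which is why the abstract also reports completeness of single-copy orthologues.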
Subject(s)
Flowers/genetics; Fragaria/genetics; Fruit/genetics; Genome, Plant; Potentilla/genetics; Flowers/growth & development; Fragaria/growth & development; Fruit/growth & development; Gene Expression Regulation, Plant; Phylogeny; Potentilla/growth & development; Sequence Analysis, RNA; Transcriptome; Whole Genome Sequencing
ABSTRACT
Fragaria iinumae Makino is recognized as an ancestor of the octoploid strawberry species, which includes the cultivated strawberry, Fragaria × ananassa Duchesne ex Rozier. Here we report the construction of the first high-density linkage map for F. iinumae. The F. iinumae linkage map (Fii map) is based on two high-throughput techniques of single nucleotide polymorphism (SNP) genotyping: the IStraw90 Array (hereafter "Array"), and genotyping by sequencing (GBS). The F2 generation mapping population was derived by selfing F. iinumae hybrid F1D, the product of a cross between two divergent F. iinumae accessions collected from Hokkaido, Japan. The Fii map consists of seven linkage groups (LGs) and has an overall length of 451.7 cM as defined by 496 loci populated by 4173 markers: 3280 from the Array and 893 from GBS. Comparisons with two versions of the Fragaria vesca ssp. vesca L. 'Hawaii 4' pseudo-chromosome (PC) assemblies reveal substantial conservation of synteny and colinearity, yet identify differences that point to possible genomic divergences between F. iinumae and F. vesca, and/or to F. vesca genomic assembly errors. The Fii map provides a basis for anchoring an F. iinumae genome assembly as a prerequisite for constructing a second diploid reference genome for Fragaria.
Subject(s)
Fragaria/genetics; Genetic Linkage; Genetic Markers/genetics; Genome, Plant/genetics; Polymorphism, Single Nucleotide; Chromosome Mapping; Diploidy; Genotype; Genotyping Techniques; Hawaii; High-Throughput Nucleotide Sequencing; Japan
ABSTRACT
The Rosoideae is a subfamily of the Rosaceae that contains a number of species of economic importance, including the soft fruit species strawberry (Fragaria × ananassa), red (Rubus idaeus) and black (Rubus occidentalis) raspberries, blackberries (Rubus spp.) and one of the most economically important cut flower genera, the roses (Rosa spp.). Molecular genetics and genomics resources for the Rosoideae have developed rapidly over the past two decades, beginning with the development and application of a number of molecular marker types including restriction fragment length polymorphisms, amplified fragment length polymorphisms and microsatellites, and culminating in the recent publication of the genome sequence of the woodland strawberry, Fragaria vesca, and the development of high throughput single nucleotide polymorphism (SNP)-genotyping resources for Fragaria, Rosa and Rubus. These tools have been used to identify genes and other functional elements that control traits of economic importance, to study the evolution of plant genome structure within the subfamily, and are beginning to facilitate genomic-assisted breeding through the development and deployment of markers linked to traits such as aspects of fruit quality, disease resistance and the timing of flowering. In this review, we report on the developments that have been made over the last 20 years in the field of molecular genetics and structural genomics within the Rosoideae, comment on how the knowledge gained will improve the efficiency of cultivar development and discuss how these advances will enhance our understanding of the biological processes determining agronomically important traits in all Rosoideae species.