1.
Front Plant Sci ; 10: 323, 2019.
Article En | MEDLINE | ID: mdl-30930928

Whole genome profiling (WGP) is a sequence-based physical mapping technology that uses sequence tags generated by next-generation sequencing to construct bacterial artificial chromosome (BAC) contigs of complex genomes. The physical map provides a framework for assembly of genome sequence and information for localization of genes that are difficult to find through positional cloning. To address the challenges of accurate assembly of the pea genome (∼4.2 Gb, of which approximately 85% is repetitive sequence), we have adopted the WGP technology for assembly of a pea BAC library. Multi-dimensional pooling of 295,680 BAC clones and sequencing the ends of restriction fragments of pooled DNA generated 1,814 million high-quality reads, of which 825 million were deconvolutable to 1.11 million unique WGP sequence tags. These WGP tags were used to assemble 220,013 BACs into contigs. Assembly of the BAC clones using the modified Fingerprinted Contigs (FPC) program resulted in 13,040 contigs, consisting of 213,719 BACs, and 6,294 singleton BACs. The average contig size is 0.33 Mbp and the N50 contig size is 0.62 Mbp. WGP™ technology has provided a robust physical map of the pea genome, which would have been difficult to assemble using traditional restriction digestion-based methods. This sequence-based physical map will be useful to assemble the genome sequence of pea. Additionally, the 1.1 million WGP tags will support efficient assignment of sequence scaffolds to the BAC clones, and thus efficient sequencing of BAC pools targeting genome regions of interest.
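The contig summary statistics quoted above (average contig size and N50) can be reproduced from a list of contig lengths. The following is a minimal Python sketch; the function names and toy lengths are hypothetical and are not data from the pea map. N50 is found by sorting contigs from longest to shortest and reporting the length at which the running total first reaches half of the total assembled length.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the total."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # half of the assembled length reached
            return length


def mean_length(lengths):
    """Arithmetic mean contig length."""
    return sum(lengths) / len(lengths)


# Hypothetical contig lengths in kbp (toy values, not study data)
contigs = [900, 620, 400, 300, 150, 80]
print(n50(contigs), round(mean_length(contigs), 1))  # prints: 620 408.3
```

Because a few long contigs dominate the cumulative total, N50 is usually larger than the mean, which is consistent with the 0.62 Mbp N50 versus 0.33 Mbp mean reported for pea.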

3.
Plant Methods ; 12: 44, 2016.
Article En | MEDLINE | ID: mdl-27843484

BACKGROUND: Plant phenotypic data shrouds a wealth of information which, when accurately analysed and linked to other data types, brings to light knowledge about the mechanisms of life. As phenotyping is a field of research comprising manifold, diverse and time-consuming experiments, the findings can be fostered by reusing and combining existing datasets. Their correct interpretation, and thus replicability, comparability and interoperability, is possible provided that the collected observations are equipped with an adequate set of metadata. So far there have been no common standards governing phenotypic data description, which has hampered data exchange and reuse. RESULTS: In this paper we propose guidelines for proper handling of information about plant phenotyping experiments, in terms of both the recommended content of the description and its formatting. We provide a document called "Minimum Information About a Plant Phenotyping Experiment", which specifies what information about each experiment should be given, and a Phenotyping Configuration for the ISA-Tab format, which allows this information to be organised practically within a dataset. We provide examples of ISA-Tab-formatted phenotypic data, and a general description of a few systems where the recommendations have been implemented. CONCLUSIONS: Acceptance of the rules described in this paper by the plant phenotyping community will help to achieve findable, accessible, interoperable and reusable data.

4.
BMC Bioinformatics ; 17: 115, 2016 Mar 03.
Article En | MEDLINE | ID: mdl-26936254

BACKGROUND: Scaffolding is an essential step in the genome assembly process. Current methods based on large fragment paired-end reads or long reads allow an increase in contiguity but often lack consistency in repetitive regions, resulting in fragmented assemblies. Here, we describe a novel tool to link assemblies to a genome map to aid complex genome reconstruction by detecting assembly errors and allowing scaffold ordering and anchoring. RESULTS: We present MaGuS (map-guided scaffolding), a modular tool that uses a draft genome assembly, a Whole Genome Profiling™ (WGP) map, and high-throughput paired-end sequencing data to estimate the quality and to enhance the contiguity of an assembly. We generated several assemblies of the Arabidopsis genome using different scaffolding programs and applied MaGuS to select the best assembly using quality metrics. Then, we used MaGuS to perform map-guided scaffolding to increase contiguity by creating new scaffold links in low-covered and highly repetitive regions where other commonly used scaffolding methods lack consistency. CONCLUSIONS: MaGuS is a powerful reference-free evaluator of assembly quality and a WGP map-guided scaffolder that is freely available at https://github.com/institut-de-genomique/MaGuS. Its use can be extended to other high-throughput sequencing data (e.g., long-read data) and also to other map data (e.g., genetic maps) to improve the quality and the contiguity of large and complex genome assemblies.


Arabidopsis/genetics , Chromosomes, Plant/genetics , Genome, Plant , High-Throughput Nucleotide Sequencing/methods , Physical Chromosome Mapping , Sequence Analysis, DNA/methods , Chromosomes, Artificial, Bacterial , Contig Mapping , Repetitive Sequences, Nucleic Acid , Sequence Alignment
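The idea of using a WGP map to order and check assembly scaffolds can be sketched as follows. This is a hypothetical, simplified illustration, not MaGuS's actual algorithm: it assumes each map tag has a known position on a map contig, places each scaffold at the median position of the tags it contains, and flags scaffolds whose tags fall on more than one map contig as candidate misassemblies. Paired-end evidence, gap sizes and scaffold orientation are not modelled.

```python
from statistics import median


def order_scaffolds(tag_pos, scaffold_tags):
    """Order scaffolds along map contigs by the median position of their tags.

    tag_pos: dict tag -> (map_contig, position), e.g. from a WGP-style map
    scaffold_tags: dict scaffold -> set of tags found in the scaffold sequence
    Returns (ordered, conflicts):
      ordered   -> dict map_contig -> list of scaffolds in map order
      conflicts -> scaffolds whose tags hit more than one map contig
                   (candidate misassemblies to inspect, not proven errors)
    """
    placements = {}
    conflicts = []
    for scaffold, tags in scaffold_tags.items():
        hits = [tag_pos[t] for t in tags if t in tag_pos]
        if not hits:
            continue  # no shared tags: scaffold stays unanchored
        contigs = {contig for contig, _ in hits}
        if len(contigs) > 1:
            conflicts.append(scaffold)
            continue
        contig = contigs.pop()
        centre = median(pos for _, pos in hits)
        placements.setdefault(contig, []).append((centre, scaffold))
    ordered = {c: [s for _, s in sorted(items)] for c, items in placements.items()}
    return ordered, sorted(conflicts)


# Toy map and scaffolds (hypothetical identifiers)
tag_pos = {"t1": ("ctgA", 10), "t2": ("ctgA", 50), "t3": ("ctgA", 90), "t4": ("ctgB", 5)}
scaffold_tags = {"s1": {"t3"}, "s2": {"t1", "t2"}, "s3": {"t2", "t4"}, "s4": {"tX"}}
print(order_scaffolds(tag_pos, scaffold_tags))
```

Here `s3` is flagged because its tags fall on two map contigs, and `s4` is left unanchored because none of its tags are on the map.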
5.
J Exp Bot ; 66(18): 5417-27, 2015 Sep.
Article En | MEDLINE | ID: mdl-26044092

Recent methodological developments in plant phenotyping, as well as the growing importance of its applications in plant science and breeding, are resulting in a fast accumulation of multidimensional data. There is great potential for expediting both discovery and application if these data are made publicly available for analysis. However, collection and storage of phenotypic observations are not yet sufficiently governed by standards that would ensure interoperability among data providers and precisely link specific phenotypes and associated genomic sequence information. This lack of standards is mainly a result of a large variability of phenotyping protocols, the multitude of phenotypic traits that are measured, and the dependence of these traits on the environment. This paper discusses the current state of standardization in the area of phenomics, points out problems and shortcomings, and presents the areas that would benefit from improvement in this field. In addition, foundations for work that could improve the situation are proposed, and practical solutions developed by the authors are introduced.


Crops, Agricultural/genetics , Genome, Plant , Genomics/methods , Phenotype , Statistics as Topic/methods
6.
Plant J ; 79(2): 334-47, 2014 Jul.
Article En | MEDLINE | ID: mdl-24813060

Bread wheat (Triticum aestivum L.) is the most important staple food crop for 35% of the world's population. International efforts are underway to facilitate an increase in wheat production, in which the International Wheat Genome Sequencing Consortium (IWGSC) plays an important role. As part of this effort, we have developed a sequence-based physical map of wheat chromosome 6A using whole-genome profiling (WGP™). The bacterial artificial chromosome (BAC) contig assembly tools fingerprinted contig (fpc) and linear topological contig (ltc) were used and their contig assemblies were compared. A detailed investigation of the contig structure revealed that ltc created a highly robust assembly compared with those formed by fpc. The ltc assemblies contained 1217 contigs for the short arm and 1113 contigs for the long arm, with an L50 of 1 Mb. To facilitate in silico anchoring, WGP™ tags underlying BAC contigs were extended by wheat and wheat progenitor genome sequence information. Sequence data were used for in silico anchoring against genetic markers with known sequences, allowing almost 79% of the physical map to be anchored. Moreover, the assigned sequence information led to the 'decoration' of the respective physical map with 3359 anchored genes. Thus, this robust and genetically anchored physical map will serve as a framework for the sequencing of wheat chromosome 6A, and is of immediate use for map-based isolation of agronomically important genes/quantitative trait loci located on this chromosome.


Chromosomes, Plant/genetics , Physical Chromosome Mapping , Triticum/genetics , Chromosomes, Artificial, Bacterial/genetics
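The anchored share of a physical map can be illustrated with a toy, length-weighted calculation. In this hypothetical sketch a contig counts as anchored when any of its tags occurs as an exact substring of a genetic-marker sequence; the study extended tags with wheat and progenitor genome sequence before matching, which is not modelled here. All names and sequences are illustrative.

```python
def anchored_fraction(contig_lengths, contig_tags, marker_seqs):
    """Length-weighted fraction of the map in contigs hit by a marker sequence.

    contig_lengths: dict contig -> length (e.g. kbp)
    contig_tags: dict contig -> set of tag sequences
    marker_seqs: list of genetic-marker sequences
    A contig is 'anchored' if any tag occurs exactly inside any marker sequence
    (a hypothetical simplification of sequence-based anchoring).
    """
    anchored = {
        contig
        for contig, tags in contig_tags.items()
        if any(tag in marker for tag in tags for marker in marker_seqs)
    }
    total = sum(contig_lengths.values())
    return sum(contig_lengths[c] for c in anchored) / total


contig_lengths = {"c1": 600, "c2": 300, "c3": 100}
contig_tags = {"c1": {"ACGTAC"}, "c2": {"TTGACA"}, "c3": {"GGGCCC"}}
markers = ["AAACGTACTT", "CCGGGCCCAA"]
print(anchored_fraction(contig_lengths, contig_tags, markers))  # prints 0.7
```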
7.
Plant J ; 75(5): 880-9, 2013 Sep.
Article En | MEDLINE | ID: mdl-23672264

Genomics-based breeding of economically important crops such as banana, coffee, cotton, potato, tobacco and wheat is often hampered by genome size, polyploidy and high repeat content. We adapted sequence-based whole-genome profiling (WGP™) technology to obtain insight into the polyploidy of the model plant Nicotiana tabacum (tobacco). N. tabacum is assumed to originate from a hybridization event between ancestors of Nicotiana sylvestris and Nicotiana tomentosiformis approximately 200,000 years ago. This resulted in tobacco having a haploid genome size of 4500 million base pairs, approximately four times larger than the related tomato (Solanum lycopersicum) and potato (Solanum tuberosum) genomes. In this study, a physical map containing 9750 contigs of bacterial artificial chromosomes (BACs) was constructed. The mean contig size was 462 kbp, and the calculated genome coverage equaled the estimated tobacco genome size. We used a method for determination of the ancestral origin of the genome by annotation of WGP sequence tags. This assignment agreed with the ancestral annotation available from the tobacco genetic map, and may be used to investigate the evolution of homoeologous genome segments after polyploidization. The map generated is an essential scaffold for the tobacco genome. We propose the combination of WGP physical mapping technology and tag profiling of ancestral lines as a generally applicable method to elucidate the ancestral origin of genome segments of polyploid species. The physical mapping of genes and their origins will enable application of biotechnology to polyploid plants aimed at accelerating and increasing the precision of breeding for abiotic and biotic stress resistance.


Chromosome Mapping , Genome, Plant , Nicotiana/genetics , Physical Chromosome Mapping , Breeding , Genetic Linkage , Hybridization, Genetic , Molecular Sequence Annotation , Polyploidy
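Assigning ancestral origin by tag profiling can be sketched as a simple vote. The sketch assumes that tag sets have been obtained for each ancestral line (labelled here N. sylvestris and N. tomentosiformis), counts only tags unique to one ancestor, and assigns a contig to an ancestor when that ancestor supplies at least a chosen fraction of its diagnostic tags. The `min_fraction` threshold and all identifiers are hypothetical, not parameters from the study.

```python
from collections import Counter


def assign_origin(contig_tags, ancestor_tags, min_fraction=0.8):
    """Label each contig with the ancestor supplying most of its diagnostic tags.

    contig_tags: dict contig -> set of tags on that contig
    ancestor_tags: dict ancestor -> set of tags observed in that ancestral line
    Only tags found in exactly one ancestor are diagnostic; shared tags are ignored.
    """
    labels = {}
    for contig, tags in contig_tags.items():
        votes = Counter()
        for tag in tags:
            owners = [name for name, seen in ancestor_tags.items() if tag in seen]
            if len(owners) == 1:
                votes[owners[0]] += 1
        total = sum(votes.values())
        if total == 0:
            labels[contig] = "unassigned"  # no diagnostic tags at all
            continue
        best, count = votes.most_common(1)[0]
        labels[contig] = best if count / total >= min_fraction else "ambiguous"
    return labels


ancestors = {
    "N. sylvestris": {"a", "b", "c", "shared"},
    "N. tomentosiformis": {"x", "y", "shared"},
}
contigs = {
    "ctg1": {"a", "b", "shared"},
    "ctg2": {"x", "y", "z"},
    "ctg3": {"a", "x"},
    "ctg4": {"shared", "z"},
}
print(assign_origin(contigs, ancestors))
```

A contig with mixed diagnostic tags (`ctg3`) is reported as ambiguous rather than forced into one class, which is where homoeologous or chimeric segments would show up.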
8.
BMC Genomics ; 13: 47, 2012 Jan 30.
Article En | MEDLINE | ID: mdl-22289472

BACKGROUND: Sequencing projects using a clone-by-clone approach require the availability of a robust physical map. The SNaPshot technology, based on pair-wise comparisons of restriction fragment sizes, has been used recently to build the first physical map of a wheat chromosome and to complete the maize physical map. However, restriction fragment sizes shared randomly between two non-overlapping BACs often lead to chimeric contigs and mis-assembled BACs in such large and repetitive genomes. Whole Genome Profiling (WGP™) was developed recently as a new sequence-based physical mapping technology and has the potential to limit this problem. RESULTS: A subset of the wheat 3B chromosome BAC library covering 230 Mb was used to establish a WGP physical map and to compare it to a map obtained with the SNaPshot technology. We first adapted the WGP-based assembly methodology to cope with the complexity of the wheat genome. Then, the results showed that the WGP map covers the same length as the SNaPshot map but with 30% fewer contigs and, more importantly, with 3.5 times fewer mis-assembled BACs. Finally, we evaluated the benefit of integrating WGP tags in different sequence assemblies obtained after Roche/454 sequencing of BAC pools. We showed that while WGP tag integration improves assemblies performed with unpaired reads and with paired-end reads at low coverage, it does not significantly improve sequence assemblies performed at high coverage (25x) with paired-end reads. CONCLUSIONS: Our results demonstrate that, with a suitable assembly methodology, WGP builds more robust physical maps than the SNaPshot technology in wheat and that WGP can be adapted to any genome. Moreover, WGP tag integration in sequence assemblies improves low-quality assemblies. However, to achieve a high-quality draft sequence assembly, a sequencing depth of 25x paired-end reads is required, at which point WGP tag integration does not provide additional scaffolding value. Finally, we suggest that WGP tags can support the efficient sequencing of BAC pools by enabling reliable assignment of sequence scaffolds to their BAC of origin, a feature that is of great interest when using BAC pooling strategies to reduce the cost of sequencing large genomes.


Genome, Plant , Physical Chromosome Mapping , Sequence Analysis, DNA/methods , Triticum/genetics , Chromosomes, Artificial, Bacterial , Chromosomes, Plant , Contig Mapping , DNA Transposable Elements , Sequence Alignment
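Identical sequence tags give fewer chimeric contigs than shared fragment sizes because a coincidental match of a tag sequence is far less likely than a coincidental match of a fragment length. A minimal sketch of tag-based contig building follows, assuming two BACs are linked when they share at least `min_shared` tags and that contigs are the connected components of those links (computed with union-find). Real assemblers such as FPC use probabilistic overlap scores and avoid comparing every pair; this toy version is quadratic in the number of BACs, and the threshold is hypothetical.

```python
from itertools import combinations


def build_contigs(bac_tags, min_shared=2):
    """Group BACs into contigs (connected components of tag-sharing links)."""
    parent = {bac: bac for bac in bac_tags}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(bac_tags, 2):
        if len(bac_tags[a] & bac_tags[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for bac in bac_tags:
        groups.setdefault(find(bac), []).append(bac)
    return sorted(sorted(g) for g in groups.values())


bacs = {
    "B1": {"t1", "t2", "t3"},
    "B2": {"t2", "t3", "t4"},
    "B3": {"t4", "t5", "t6"},
    "B4": {"t5", "t6", "t7"},
    "B5": {"t9"},
}
print(build_contigs(bacs))
```

With `min_shared=2`, the single shared tag `t4` between `B2` and `B3` is not enough to join them; lowering the threshold to 1 merges them, illustrating how the stringency of the overlap rule trades contig length against the risk of false joins.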
9.
Genome Res ; 21(4): 618-25, 2011 Apr.
Article En | MEDLINE | ID: mdl-21324881

We present whole genome profiling (WGP), a novel next-generation sequencing-based physical mapping technology for construction of bacterial artificial chromosome (BAC) contigs of complex genomes, using Arabidopsis thaliana as an example. WGP leverages short read sequences derived from restriction fragments of two-dimensionally pooled BAC clones to generate sequence tags. These sequence tags are assigned to individual BAC clones, followed by assembly of BAC contigs based on shared regions containing identical sequence tags. Following in silico analysis of WGP sequence tags and simulation of a map of Arabidopsis chromosome 4 and maize, a WGP map of Arabidopsis thaliana ecotype Columbia was constructed de novo using a six-genome equivalent BAC library. Validation of the WGP map using the Columbia reference sequence confirmed that 350 BAC contigs (98%) were assembled correctly, spanning 97% of the 102-Mb calculated genome coverage. We demonstrate that WGP maps can also be generated for more complex plant genomes and will serve as excellent scaffolds to anchor genetic linkage maps and integrate whole genome sequence data.


Arabidopsis/genetics , Chromosome Mapping/methods , Genome, Plant/genetics , High-Throughput Nucleotide Sequencing , Chromosomes, Artificial, Bacterial/genetics , Computational Biology , Contig Mapping , Genomic Library
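Assigning tags to individual BACs from two-dimensionally pooled clones can be sketched as follows. The sketch assumes a plate-like grid in which every BAC belongs to exactly one row pool and one column pool; a tag is deconvolutable only when it is seen in exactly one row pool and one column pool, which places it at their intersection. Tags seen in several pools, for example repeats present in more than one BAC, are left unassigned. Function and pool names are illustrative.

```python
def deconvolute(row_pools, col_pools):
    """Assign tags to BAC grid addresses from a two-dimensional pooling design.

    row_pools / col_pools: dict pool index -> set of tags sequenced in that pool.
    Returns dict (row, col) -> set of tags uniquely placed at that BAC.
    """
    def locate(pools):
        # tag -> list of pool indices in which the tag was observed
        where = {}
        for idx, tags in pools.items():
            for tag in tags:
                where.setdefault(tag, []).append(idx)
        return where

    rows, cols = locate(row_pools), locate(col_pools)
    bacs = {}
    for tag in rows.keys() & cols.keys():
        if len(rows[tag]) == 1 and len(cols[tag]) == 1:
            bacs.setdefault((rows[tag][0], cols[tag][0]), set()).add(tag)
    return bacs


# 2 x 2 toy grid: BAC(0,0) has t1,t2; BAC(0,1) has t3; BAC(1,1) has t4,t2
row_pools = {0: {"t1", "t2", "t3"}, 1: {"t4", "t2"}}
col_pools = {0: {"t1", "t2"}, 1: {"t3", "t4", "t2"}}
print(deconvolute(row_pools, col_pools))
```

Tag `t2` occurs in two BACs, so it lands in two row pools and cannot be placed; adding more pooling dimensions, as in the multi-dimensional design used for pea, helps resolve such cases.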
10.
Methods Mol Biol ; 578: 73-91, 2009.
Article En | MEDLINE | ID: mdl-19768587

Single nucleotide polymorphisms (SNPs) are the most abundant form of genetic variation and are the basis for most molecular markers. Before these SNPs can be used for direct sequence-based SNP detection or in a derived SNP assay, they need to be identified. For those regions or species where no validated SNPs are available in the public databases, a good alternative is to mine them from DNA sequences. The alignment of multiple sequence fragments originating from different genotypes representing the same region on the genome will allow for the discovery of sequence variants. The corresponding nucleotide mismatches are likely to be SNPs or insertions/deletions. A large amount of sequence data to be mined is present in the public databases (both expressed sequence tags and genomic sequences) and is free to use without having to do large-scale sequencing oneself. However, with the appearance of the next-generation sequencing machines (Roche GS/454, Illumina GA/Solexa, SOLiD), high-throughput sequencing is becoming widely available. This will allow for the sequencing of polymorphic genotypes on specific target areas and consequent SNP identification. In this paper we discuss the bioinformatics tools required to analyze DNA sequence data for SNP mining. A general approach for the consecutive steps in the mining process is described and commonly used SNP discovery pipelines are presented.


Computational Biology/methods , Polymorphism, Single Nucleotide/genetics , Sequence Analysis, DNA/methods , Base Sequence , Humans , Molecular Sequence Data , Sequence Alignment
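The core mining step, scanning an alignment of sequences from different genotypes for columns where they disagree, can be sketched in a few lines. This minimal version assumes the sequences are already aligned to equal length, with '-' for gaps and 'N' for ambiguous bases; it ignores base quality, read depth and paralogous alignments, which real SNP-mining pipelines must handle.

```python
def find_variants(aligned):
    """Report alignment columns where sequences disagree.

    aligned: list of equal-length strings from a multiple alignment ('-' = gap).
    Returns a list of (column, kind, alleles): kind is 'SNP' when only
    nucleotides differ and 'indel' when a gap is involved.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    variants = []
    for col, bases in enumerate(zip(*aligned)):
        alleles = set(bases) - {"N"}  # ambiguous calls carry no allele information
        if len(alleles) > 1:
            kind = "indel" if "-" in alleles else "SNP"
            variants.append((col, kind, "".join(sorted(alleles))))
    return variants


aligned = ["ACGTAC-GA", "ACGTTC-GA", "ACGTACTGA", "ANGTTC-GA"]
print(find_variants(aligned))
```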
11.
PLoS One ; 4(3): e4761, 2009.
Article En | MEDLINE | ID: mdl-19283079

Reverse genetics approaches rely on the detection of sequence alterations in target genes to identify allelic variants among mutant or natural populations. Current (pre-) screening methods such as TILLING and EcoTILLING are based on the detection of single base mismatches in heteroduplexes using endonucleases such as CEL 1. However, there are drawbacks in the use of endonucleases due to their relatively poor cleavage efficiency and exonuclease activity. Moreover, pre-screening methods do not reveal information about the nature of sequence changes and their possible impact on gene function. We present KeyPoint technology, a high-throughput mutation/polymorphism discovery technique based on massively parallel sequencing of target genes amplified from mutant or natural populations. KeyPoint combines multi-dimensional pooling of large numbers of individual DNA samples and the use of sample identification tags ("sample barcoding") with next-generation sequencing technology. We show the power of KeyPoint by identifying two mutants in the tomato eIF4E gene based on screening more than 3000 M2 families in a single GS FLX sequencing run, and by discovering six haplotypes of the tomato eIF4E gene by re-sequencing three amplicons in a subset of 92 tomato lines from the EU-SOL core collection. We propose KeyPoint technology as a broadly applicable amplicon sequencing approach to screen mutant populations or germplasm collections for identification of (novel) allelic variation in a high-throughput fashion.


Mutation , Nucleic Acid Amplification Techniques/methods , Polymorphism, Genetic , Sequence Analysis, DNA/methods , Solanum lycopersicum/genetics , Alleles , Base Sequence , Eukaryotic Initiation Factor-4E/genetics , Haplotypes , Polymorphism, Single Nucleotide
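The combination of sample barcoding and multi-dimensional pooling can be illustrated with a small sketch. It assumes each read begins with an exact, fixed-length pool barcode, that a mutation is recognised by the presence of a known mutant sequence in a read, and that pools form a two-dimensional grid of rows and columns; the barcodes, allele sequences and the 5% threshold are hypothetical. Read alignment and sequencing-error filtering, which a real pipeline needs, are not modelled.

```python
def demultiplex(reads, barcodes):
    """Bin reads by an exact 5' barcode and strip it; unmatched reads go under None.

    barcodes: dict barcode -> pool name; all barcodes are assumed equal in length.
    """
    size = len(next(iter(barcodes)))
    bins = {}
    for read in reads:
        pool = barcodes.get(read[:size])
        bins.setdefault(pool, []).append(read[size:] if pool is not None else read)
    return bins


def positive_pools(bins, mutant_allele, min_fraction=0.05):
    """Call a pool positive if enough of its reads carry the mutant allele."""
    calls = set()
    for pool, seqs in bins.items():
        if pool is None or not seqs:
            continue
        carriers = sum(mutant_allele in s for s in seqs)
        if carriers / len(seqs) >= min_fraction:
            calls.add(pool)
    return calls


def decode(positive, rows, cols):
    """Candidate individuals sit where a positive row pool meets a positive column pool."""
    return [(r, c) for r in rows if r in positive for c in cols if c in positive]


barcodes = {"AAC": "row1", "AGT": "row2", "CCA": "col1", "CTG": "col2"}
reads = ["AACGGATTC", "AACGGATTC", "AACGGCTTC", "AGTGGATTC",
         "CCAGGCTTC", "CTGGGATTC", "TTTGGATTC"]
bins = demultiplex(reads, barcodes)
hits = positive_pools(bins, "GGCT")  # hypothetical mutant allele
print(decode(hits, ["row1", "row2"], ["col1", "col2"]))
```

Only the intersection of positive pools needs individual follow-up, which is what lets pooling screen thousands of families in one sequencing run.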
12.
PLoS One ; 2(11): e1172, 2007 Nov 14.
Article En | MEDLINE | ID: mdl-18000544

Application of single nucleotide polymorphisms (SNPs) is revolutionizing human bio-medical research. However, discovery of polymorphisms in low polymorphic species is still a challenging and costly endeavor, despite widespread availability of Sanger sequencing technology. We present CRoPS as a novel approach for polymorphism discovery by combining the power of reproducible genome complexity reduction of AFLP with Genome Sequencer (GS) 20/GS FLX next-generation sequencing technology. With CRoPS, hundreds of thousands of sequence reads derived from complexity-reduced genome sequences of two or more samples are processed and mined for SNPs using a fully-automated bioinformatics pipeline. We show that over 75% of putative maize SNPs discovered using CRoPS are successfully converted to SNPWave assays, confirming them to be true SNPs derived from unique (single-copy) genome sequences. By using CRoPS, polymorphism discovery will become affordable in organisms with high levels of repetitive DNA in the genome and/or low levels of polymorphism in the (breeding) germplasm without the need for prior sequence information.


Polymorphism, Single Nucleotide , Base Sequence , Genome, Plant , Molecular Sequence Data , Sequence Homology, Nucleic Acid , Zea mays/genetics
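AFLP-style complexity reduction keeps only a reproducible subset of restriction fragments. The sketch below performs an in silico double digest and keeps fragments with one end from each enzyme inside a size window. The classic AFLP enzyme pair EcoRI/MseI and the 50 to 500 bp window are assumptions made for illustration, not the settings used for CRoPS, and the selective primer nucleotides, which shrink the fragment set further, are not modelled.

```python
import re

# Assumed enzyme pair: EcoRI cuts G^AATTC and MseI cuts T^TAA. Both sites are
# palindromic, so scanning the top strand finds every site.
SITES = {"E": ("GAATTC", 1), "M": ("TTAA", 1)}


def cut_positions(seq, site, offset):
    """Top-strand cut coordinates; the lookahead also catches overlapping sites."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def ecori_msei_fragments(seq, min_len=50, max_len=500):
    """Return (start, fragment) for EcoRI-MseI fragments within a size window."""
    cuts = sorted(
        (pos, enzyme)
        for enzyme, (site, offset) in SITES.items()
        for pos in cut_positions(seq.upper(), site, offset)
    )
    kept = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        # Keep only fragments flanked by one EcoRI end and one MseI end
        if {left, right} == {"E", "M"} and min_len <= end - start <= max_len:
            kept.append((start, seq[start:end]))
    return kept


toy = "CCGAATTCAAAATTAAGGGGTTAACC"  # hypothetical sequence
print(ecori_msei_fragments(toy, min_len=5, max_len=100))
```

In the toy sequence the MseI-MseI fragment is discarded, so only one of the two internal fragments survives; on a whole genome the same rule retains a small, reproducible fraction of fragments, which is what makes repeated sampling of the same loci across genotypes possible.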
13.
Genetics ; 171(3): 1341-52, 2005 Nov.
Article En | MEDLINE | ID: mdl-16085696

In the quest for fine mapping quantitative trait loci (QTL) at a subcentimorgan scale, several methods that involve the construction of inbred lines and the generation of large progenies of such inbred lines have been developed (Complex Trait Consortium 2003). Here we present an alternative method that significantly speeds up QTL fine mapping by using one segregating population. As a first step, a rough mapping analysis is performed on a small part of the population. Once the QTL have been mapped to a chromosomal interval by standard procedures, a large population of 1000 plants or more is analyzed with markers flanking the defined QTL to select QTL isogenic recombinants (QIRs). QIRs bear a recombination event in the QTL interval of interest, while other QTL have the same homozygous genotype. Only these QIRs are subsequently phenotyped to fine map the QTL. By focusing at an early stage on the informative individuals in the population only, the efforts in population genotyping and phenotyping are significantly reduced as compared to prior methods. The principles of this approach are demonstrated by fine mapping an erucic acid QTL of rapeseed at a subcentimorgan scale.


Brassica rapa/genetics , Chromosome Mapping/statistics & numerical data , Quantitative Trait Loci , Brassica rapa/metabolism , Erucic Acids/metabolism , Genetic Markers , Genetics, Population/statistics & numerical data , Recombination, Genetic , Sample Size
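The QIR selection rule can be written as a simple filter over marker genotypes. The sketch assumes genotypes coded 'A' and 'B' for the two homozygous classes and 'H' for heterozygotes, treats a plant as recombinant in the target interval when its two flanking markers have different genotypes, and requires every marker at the other QTL to carry one chosen homozygous genotype. Marker names and data are hypothetical; a double crossover inside the interval leaves the flanking genotypes unchanged and is not detected by this rule.

```python
def select_qirs(plants, left, right, other_qtl, fixed="A"):
    """Return IDs of QTL isogenic recombinants (QIRs).

    plants: dict plant id -> dict marker -> genotype ('A', 'B' or 'H')
    left, right: markers flanking the target QTL interval
    other_qtl: markers at the other QTL, which must all equal `fixed`
    """
    qirs = []
    for pid, geno in plants.items():
        recombinant = geno[left] != geno[right]  # crossover within the interval
        fixed_background = all(geno[m] == fixed for m in other_qtl)
        if recombinant and fixed_background:
            qirs.append(pid)
    return qirs


plants = {
    "p1": {"L": "A", "R": "B", "Q2": "A"},  # recombinant, fixed background
    "p2": {"L": "A", "R": "A", "Q2": "A"},  # not recombinant in the interval
    "p3": {"L": "H", "R": "B", "Q2": "A"},  # recombinant, fixed background
    "p4": {"L": "A", "R": "B", "Q2": "H"},  # other QTL still segregating
}
print(select_qirs(plants, "L", "R", ["Q2"]))
```

Only the selected plants would then be phenotyped, which is where the savings described above come from.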
14.
Genomics ; 82(6): 606-18, 2003 Dec.
Article En | MEDLINE | ID: mdl-14611802

cDNA-AFLP is a genome-wide expression analysis technology that does not require any prior knowledge of gene sequences. This PCR-based technique combines a high sensitivity with a high specificity, allowing detection of rarely expressed genes and distinguishing between homologous genes. In this report, we validated quantitative expression data of 110 cDNA-AFLP fragments in yeast with DNA microarrays and GeneChip data. The best correlation was found between cDNA-AFLP and GeneChip data. The cDNA-AFLP data revealed a low number of inconsistent profiles that could be explained by gel artifacts, overexposure, or mismatch amplification. In addition, 18 cDNA-AFLP fragments displayed homology to genomic yeast DNA, but could not be linked unambiguously to any known ORF. These fragments were most probably derived from 5' or 3' noncoding sequences or might represent previously unidentified ORFs. Genes liable to cross-hybridization showed identical results in cDNA-AFLP and GeneChip analysis. Three genes, which were readily detected with cDNA-AFLP, showed no significant expression in GeneChip experiments. We show that cDNA-AFLP is a very good alternative to microarrays and, since no preexisting biological or sequence information is required, it is applicable to any species.


DNA, Complementary/genetics , Gene Expression , Oligonucleotide Array Sequence Analysis , Polymorphism, Restriction Fragment Length , Saccharomyces cerevisiae/genetics , DNA Primers , Fluorescent Dyes , Sensitivity and Specificity , Sequence Analysis, DNA
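Comparing expression platforms comes down to correlating profiles measured for the same gene. A minimal sketch, assuming each profile is expressed as log2 ratios relative to its first sample (for example, the first time point), computes the Pearson correlation between a cDNA-AFLP profile and a GeneChip profile. The values are toy numbers, not data from the study. On Python 3.10 or later, `statistics.correlation` returns the same coefficient.

```python
from math import log2, sqrt


def log_ratios(values):
    """log2 of each measurement relative to the first one."""
    return [log2(v / values[0]) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical intensities for one gene over four samples
aflp = [100, 200, 400, 800]
chip = [50, 100, 200, 400]
print(round(pearson(log_ratios(aflp), log_ratios(chip)), 6))
```

Working on log ratios removes the difference in absolute signal scale between platforms, so the correlation reflects agreement in the shape of the profile rather than in raw intensities.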