Search | BVS CLAP/SMR-OPS/OMS

Genome architecture and genetic diversity of allopolyploid okra (Abelmoschus esculentus).

Nieuwenhuis, Ronald; Hesselink, Thamara; van den Broeck, Hetty C; Cordewener, Jan; Schijlen, Elio; Bakker, Linda; Diaz Trivino, Sara; Struss, Darush; de Hoop, Simon-Jan; de Jong, Hans; Peters, Sander A.

Plant J ; 118(1): 225-241, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38133904

ABSTRACT

The genome of allopolyploid okra (Abelmoschus esculentus) revealed telomeric repeats flanking distal gene-rich regions and short interstitial TTTAGGG telomeric repeats, possibly representing hallmarks of chromosomal speciation. Ribosomal RNA (rRNA) genes organize into 5S clusters, distinct from the 18S-5.8S-28S units, indicating an S-type rRNA gene arrangement. The assembly, in line with cytogenetic and cytometry observations, identifies 65 chromosomes and a 1.45 Gb genome size estimate in a haploid sibling. The lack of aberrant meiotic configurations implies limited to no recombination among sub-genomes. k-mer distribution analysis indicates that 75% of the genome has a diploid nature, with 15% heterozygosity. The configurations of Benchmarking Universal Single-Copy Ortholog (BUSCO), k-mer, and repeat clustering point to the presence of at least two sub-genomes, one with 30 and the other with 35 chromosomes, indicating the allopolyploid nature of the okra genome. Over 130 000 putative genes, derived from mapped IsoSeq data and transcriptome data from public okra accessions, exhibit a low genetic diversity of one single nucleotide polymorphism per 2.1 kbp. The genes are predominantly located at the distal chromosome ends, declining toward central scaffold domains. Long terminal repeat retrotransposons prevail in central domains, consistent with the observed pericentromeric heterochromatin and distal euchromatin. Disparities in paralogous gene counts suggest potential sub-genome differentiation, implying possible sub-genome dominance. Amino acid query sequences of putative genes facilitated phenol biosynthesis pathway annotation. Comparison with manually curated reference KEGG pathways from related Malvaceae species reveals the genetic basis for putative enzyme coding genes that likely enable metabolic reactions involved in the biosynthesis of dietary and therapeutic compounds in okra.
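The abstract above relies on k-mer distribution analysis without describing the procedure. As a rough illustration only, and not the authors' pipeline, the sketch below counts k-mers in a set of reads and builds the multiplicity histogram (the "k-mer spectrum") whose peaks are typically inspected to judge ploidy and heterozygosity. The function names `kmer_counts` and `kmer_spectrum` are hypothetical, and the sketch assumes plain uppercase or lowercase DNA strings and ignores reverse complements, which dedicated tools normally merge.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def kmer_counts(seqs, k):
    """Count every k-mer of length k across the given sequences.

    Assumption: k-mers containing any character other than A, C, G, T
    (for example N) are skipped. Reverse complements are NOT merged.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= VALID_BASES:
                counts[kmer] += 1
    return counts


def kmer_spectrum(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return Counter(counts.values())


if __name__ == "__main__":
    reads = ["ACGTACGT", "ACGTNCGT"]
    spectrum = kmer_spectrum(kmer_counts(reads, k=4))
    for multiplicity in sorted(spectrum):
        print(multiplicity, spectrum[multiplicity])
```

In real data, a heterozygous diploid region tends to produce two peaks in this histogram, one at roughly half the coverage of the other; interpreting such peaks for a genome of this size requires dedicated counting and modelling software rather than this toy.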

Subject(s)

Abelmoschus, Abelmoschus/genetics, Abelmoschus/metabolism, Genome, Telomere, Diploidy, Genetic Variation

Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing.

Aflitos, Saulo; Schijlen, Elio; de Jong, Hans; de Ridder, Dick; Smit, Sandra; Finkers, Richard; Wang, Jun; Zhang, Gengyun; Li, Ning; Mao, Likai; Bakker, Freek; Dirks, Rob; Breit, Timo; Gravendeel, Barbara; Huits, Henk; Struss, Darush; Swanson-Wagner, Ruth; van Leeuwen, Hans; van Ham, Roeland C H J; Fito, Laia; Guignier, Laëtitia; Sevilla, Myrna; Ellul, Philippe; Ganko, Eric; Kapur, Arvind; Reclus, Emannuel; de Geus, Bernard; van de Geest, Henri; Te Lintel Hekkert, Bas; van Haarst, Jan; Smits, Lars; Koops, Andries; Sanchez-Perez, Gabino; van Heusden, Adriaan W; Visser, Richard; Quan, Zhiwu; Min, Jiumeng; Liao, Li; Wang, Xiaoli; Wang, Guangbiao; Yue, Zhen; Yang, Xinhua; Xu, Na; Schranz, Eric; Smets, Erik; Vos, Rutger; Rauwerda, Johan; Ursem, Remco; Schuit, Cees; Kerns, Mike.

Plant J ; 80(1): 136-48, 2014 Oct.

Article in English | MEDLINE | ID: mdl-25039268

ABSTRACT

We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio between non-synonymous and synonymous SNPs (dN/dS) in fruit diversification and plant growth genes compared to a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of single-nucleotide polymorphisms (SNPs) exceeds 10 million, i.e. 20-fold higher than found in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found for allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display a lower heterozygosity level. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies.

Subject(s)

Genetic Variation, Plant Genome/genetics, Solanum lycopersicum/genetics, Breeding, Chromosome Mapping, Plant DNA/chemistry, Plant DNA/genetics, Fruit/genetics, High-Throughput Nucleotide Sequencing, Molecular Sequence Data, Phenotype, Phylogeny, Single Nucleotide Polymorphism, Sequence Alignment, DNA Sequence Analysis, Species Specificity

Recalibration of mapping quality scores in Illumina short-read alignments improves SNP detection results in low-coverage sequencing data.

Cline, Eliot; Wisittipanit, Nuttachat; Boongoen, Tossapon; Chukeatirote, Ekachai; Struss, Darush; Eungwanichayapant, Anant.

PeerJ ; 8: e10501, 2020.

Article in English | MEDLINE | ID: mdl-33354434

ABSTRACT

BACKGROUND: Low-coverage sequencing is a cost-effective way to obtain reads spanning an entire genome. However, read depth at each locus is low, making sequencing error difficult to separate from actual variation. Prior to variant calling, sequencer reads are aligned to a reference genome, with alignments stored in Sequence Alignment/Map (SAM) files. Each alignment has a mapping quality (MAPQ) score indicating the probability a read is incorrectly aligned. This study investigated the recalibration of probability estimates used to compute MAPQ scores for improving variant calling performance in single-sample, low-coverage settings. MATERIALS AND METHODS: Simulated tomato, hot pepper and rice genomes were implanted with known variants. From these, simulated paired-end reads were generated at low coverage and aligned to the original reference genomes. Features extracted from the SAM formatted alignment files for tomato were used to train machine learning models to detect incorrectly aligned reads and output estimates of the probability of misalignment for each read in all three data sets. MAPQ scores were then re-computed from these estimates. Next, the SAM files were updated with new MAPQ scores. Finally, variant calling was performed on the original and recalibrated alignments and the results compared. RESULTS: Incorrectly aligned reads comprised only 0.16% of the reads in the training set. This severe class imbalance required special consideration for model training. The F1 score for detecting misaligned reads ranged from 0.76 to 0.82. The best performing model was used to compute new MAPQ scores. Single Nucleotide Polymorphism (SNP) detection was improved after mapping score recalibration. In rice, recall for called SNPs increased by 5.2%, while for tomato and pepper it increased by 3.1% and 1.5%, respectively. For all three data sets the precision of SNP calls ranged from 0.91 to 0.95, and was largely unchanged both before and after mapping score recalibration. CONCLUSION: Recalibrating MAPQ scores delivers modest improvements in single-sample variant calling results. Some variant callers operate on multiple samples simultaneously. They exploit every sample's reads to compensate for the low read-depth of individual samples. This improves polymorphism detection and genotype inference. It may be that small improvements in single-sample settings translate to larger gains in a multi-sample experiment. A study to investigate this is ongoing.
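The abstract above says MAPQ scores were re-computed from estimated misalignment probabilities but does not give the conversion. A minimal sketch, assuming the standard Phred-scaled definition from the SAM format (MAPQ = -10 log10 of the probability that the mapping position is wrong, rounded to the nearest integer), is shown below. The function names and the cap of 60 are illustrative assumptions, not details taken from the study.

```python
import math


def prob_to_mapq(p_wrong: float, cap: int = 60) -> int:
    """Convert a misalignment probability to a Phred-scaled MAPQ.

    Assumption: scores are capped at `cap` (60 here, a value several
    aligners use as their maximum); the study's own cap is not stated.
    """
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_wrong == 0.0:
        # log10(0) is undefined, so a zero probability maps to the cap.
        return cap
    return min(cap, int(round(-10.0 * math.log10(p_wrong))))


def mapq_to_prob(mapq: int) -> float:
    """Invert the Phred scaling: probability that the alignment is wrong."""
    return 10.0 ** (-mapq / 10.0)


if __name__ == "__main__":
    for p in (0.5, 0.01, 0.001, 1e-9):
        print(p, prob_to_mapq(p))
```

Under this scaling, a model estimate of 0.01 for a read's misalignment probability corresponds to MAPQ 20, and each tenfold drop in that probability adds 10 to the score until the cap is reached.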
