ABSTRACT
PREMISE: Vigna includes economically vital crops and wild species. Molecular systematic studies of Vigna species resulted in generic segregates of many New World (NW) species. However, limited Old World (OW) sampling left questions regarding inter- and intraspecific relationships in Vigna s.s. METHODS: African species, including the putative sister genus Physostigma, were comprehensively sampled within the context of NW relatives. Maximum likelihood and Bayesian inference analyses of the chloroplast matK-trnK and nuclear ribosomal ITS/5.8S (ITS) DNA regions were undertaken to resolve OW Vigna taxonomic questions. Divergence dates were estimated using BEAST to date key nodes in the phylogeny. RESULTS: Analyses of matK and ITS data supported five clades of Vigna s.s.: subg. Lasiospron, a reduced subg. Vigna, subg. Haydonia, subg. Ceratotropis, an enlarged subg. Plectrotropis, and a clade including V. kirkii and V. stenophylla. Genome size estimates of 601 Mb for V. kirkii are near the overall mean of the genus, whereas V. stenophylla had a larger genome (810 Mb), similar to some Vigna subg. Ceratotropis or Plectrotropis species. CONCLUSIONS: Former subg. Vigna is reduced to yellow- and blue-flowered species and subg. Plectrotropis is enlarged to include nearly all white-, pink-, and purple-flowered species. The age of the split between NW and OW Vigna lineages is ~6-7 Myr. Genome size estimates cannot rule out a polyploid or hybrid origin for V. stenophylla, potentially involving extinct lineage ancestors of Vigna subg. Ceratotropis or Plectrotropis, as indicated by network and phylogenetic analyses. Taxonomic revisions are suggested based on these results.
ABSTRACT
Extensive research has explored the range of genome sizes in eukaryotes, with a particular emphasis on land plants, where significant variability has been observed. Accurate estimation of genome size is essential for various research purposes, but existing sequence-based methods have limitations, particularly for low-coverage datasets. In this study, we introduce LocoGSE, a novel genome size estimator designed specifically for low-coverage datasets generated by genome skimming approaches. LocoGSE relies on mapping reads to single-copy consensus proteins, without the need for a reference genome assembly. We calibrated LocoGSE using 430 low-coverage angiosperm genome skimming datasets and compared its performance against other estimators. Our results demonstrate that LocoGSE accurately predicts monoploid genome size even at very low depth of coverage (<1X) and on highly heterozygous samples. Additionally, LocoGSE provides stable estimates across individuals with varying ploidy levels. LocoGSE fills a gap in sequence-based plant genome size estimation by offering a user-friendly and reliable tool that does not rely on high coverage or reference assemblies. We anticipate that LocoGSE will facilitate plant genome size analysis and contribute to evolutionary and ecological studies in the field. Furthermore, at the cost of an initial calibration, LocoGSE can be used in other lineages.
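The depth-based idea behind this kind of estimator can be illustrated with a minimal sketch. It is not LocoGSE's implementation: the function name, its parameters, and the single multiplicative `calibration_slope` are assumptions made for illustration. The sketch only shows the general relation that genome size is approximately the total number of sequenced bases divided by the sequencing depth, with depth measured on single-copy targets.

```python
def estimate_genome_size(total_bases, mapped_bases, target_length,
                         calibration_slope=1.0):
    """Illustrative depth-based genome size estimate (hypothetical helper).

    total_bases: total number of sequenced bases in the dataset
    mapped_bases: bases from reads that mapped to the single-copy targets
    target_length: combined length of the single-copy targets, in nucleotides
    calibration_slope: assumed lineage-specific correction applied to the
        observed depth (1.0 means no correction)
    """
    if total_bases <= 0 or mapped_bases <= 0 or target_length <= 0:
        raise ValueError("all inputs must be positive")
    # Depth observed on regions assumed to occur once per monoploid genome.
    depth = calibration_slope * mapped_bases / target_length
    # Genome size is the sequencing effort divided by the per-base depth.
    return total_bases / depth


# Example: 0.5X skim of a 1 Gb genome, with 1 Mb of single-copy targets.
size = estimate_genome_size(total_bases=500_000_000,
                            mapped_bases=500_000,
                            target_length=1_000_000)
```

In this example the observed depth is 0.5, so the estimate is 1,000,000,000 bases. In practice the calibration step described in the abstract would supply the correction that is simply passed in here as a constant.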
ABSTRACT
[This corrects the article DOI: 10.3389/fpls.2024.1328966.].
ABSTRACT
Introduction: In the realm of next-generation sequencing datasets, various characteristics can be extracted through k-mer based analysis. Among these characteristics, genome size (GS) is one that can be estimated with relative ease, yet achieving satisfactory accuracy, especially in the context of heterozygosity, remains a challenge. Methods: In this study, we introduce a high-precision genome size estimator, GSET (Genome Size Estimation Tool), which is based on k-mer histogram correction. Results: We have evaluated GSET on both simulated and real datasets. The experimental results demonstrate that this tool can estimate genome size with greater precision, even surpassing the accuracy of state-of-the-art tools. Notably, GSET also performs satisfactorily on heterozygous datasets, where other tools struggle to produce useable results. Discussion: The processing model of GSET diverges from the popular data fitting models used by similar tools. Instead, it is derived from empirical data and incorporates a correction term to mitigate the impact of sequencing errors on genome size estimation. GSET is freely available for use and can be accessed at the following URL: https://github.com/Xingyu-Liao/GSET.
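For context, the textbook k-mer approach that tools of this kind refine can be sketched as follows. This is not GSET's correction model, which the abstract does not specify. It is a minimal illustration, under stated assumptions, of dividing the total number of k-mer occurrences by the depth of the main histogram peak after discarding the low-depth k-mers that mostly arise from sequencing errors. The function name and the valley-finding heuristic are hypothetical.

```python
def genome_size_from_kmer_histogram(histogram, min_depth=None):
    """Illustrative genome size estimate from a k-mer depth histogram.

    histogram: dict mapping depth -> number of distinct k-mers seen at
        that depth
    min_depth: depths below this are treated as sequencing errors; if
        None, the first local minimum of the histogram is used
    """
    if not histogram:
        raise ValueError("histogram is empty")
    depths = sorted(histogram)
    if min_depth is None:
        # Walk up from the lowest depth until the counts start rising:
        # that valley separates the error peak from the main peak.
        min_depth = depths[0]
        for prev, cur in zip(depths, depths[1:]):
            if histogram[cur] > histogram[prev]:
                min_depth = prev
                break
    kept = {d: c for d, c in histogram.items() if d >= min_depth}
    # Depth of the main (homozygous) peak among the retained k-mers.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences divided by the per-copy depth.
    total_kmers = sum(d * c for d, c in kept.items())
    return total_kmers / peak_depth


toy_histogram = {1: 1000, 2: 200, 3: 50, 4: 10,
                 18: 100, 19: 300, 20: 500, 21: 300, 22: 100}
estimate = genome_size_from_kmer_histogram(toy_histogram)
```

On heterozygous samples the histogram has a second peak at roughly half the main depth, and this simple peak-picking rule becomes unreliable. That is the situation in which the abstract reports that corrected estimators are needed.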
ABSTRACT
A genome sequence and the identification of specific genes involved in the biosynthesis of the targeted secondary metabolite are two essential requirements for the improvement of any medicinal plant. Commiphora wightii (Arnott) Bhandari (family: Burseraceae), a medicinal plant native to Western India, produces the phytosterol guggulsterone, which is useful for treating atherosclerosis, arthritis, high cholesterol, acne, and obesity. To enhance guggulsterone yield, key genes involved in its biosynthesis pathway need to be predicted, for which the genome sequence of the species is a prerequisite. Therefore, we assembled the first-ever hybrid draft genome of C. wightii with a genome size of 1.03 Gb and 107,221 contigs using Illumina and PacBio platforms. The N50 and L50 values in this assembled genome were ~74 Kb and 3486 bp, respectively, with a guanine-cytosine (GC) content of 35.6% and 98.7%. The Benchmarking Universal Single Copy Ortholog (BUSCO) value indicated good integrity of assembly. Analysis predicted the presence of 31,187 genes and 342.35 Mb repeat elements in the genome. The comparative genome analysis of C. wightii with relevant orthogroups predicted a few key genes associated with phytosterol biosynthesis and secondary metabolism pathways. The assembled draft genome and the predicted genes should help future variety development programs aimed at improved guggulsterone content in C. wightii.
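The N50 and L50 statistics quoted above can be computed from a list of contig lengths. The sketch below uses the conventional definitions and a hypothetical function name. N50 is the length of the contig at which the running total of lengths, taken from longest to shortest, first reaches half of the assembly size. L50 is the number of contigs needed to reach that point, so under this convention it is a count rather than a length.

```python
def n50_l50(contig_lengths):
    """Return (N50, L50) for a list of contig lengths (hypothetical helper)."""
    if not contig_lengths:
        raise ValueError("no contigs given")
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        # The first contig that takes the running sum to at least half
        # of the assembly defines both statistics.
        if running >= half_total:
            return length, count


# Toy assembly of five contigs totalling 30 bases.
n50, l50 = n50_l50([2, 4, 6, 8, 10])
```

For the toy input, the two longest contigs (10 and 8) already cover more than half of the 30 bases, giving an N50 of 8 and an L50 of 2.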
ABSTRACT
Understanding the processes and consequences of the morphological diversity of organisms is one of the major goals of evolutionary biology. Studies on the evolution of developmental mechanisms of morphologies, or evo-devo, have been extensively conducted in many taxa and have revealed many interesting phenomena at the molecular level. However, many other taxa exhibiting intriguing morphological diversity remain unexplored in the field of evo-devo. Although the annelid family Syllidae shows spectacular diversity in morphological development associated with reproduction, its evo-devo study, especially on molecular development, has progressed slowly. In this study, we focused on Megasyllis nipponica as a new model species for evo-devo in syllids and performed transcriptome sequencing to develop a massive genetic resource, which will be useful for future molecular studies. From the transcriptome data, we identified candidate genes that are likely involved in morphogenesis, including genes involved in hormone regulation, sex determination and appendage development. Furthermore, a computational analysis of the transcriptome sequence data indicated the occurrence of DNA methylation in coding regions of the M. nipponica genome. In addition, flow cytometry analysis showed that the genome size of M. nipponica was approximately 524 megabases. These results facilitate the study of morphogenesis in molecular terms and contribute to our understanding of the morphological diversity in syllids.
Subjects
Annelida, Developmental Biology, Animals, Transcriptome, Annelida/genetics, Genome, Hormones, Biological Evolution
ABSTRACT
The first step in any genome research project after obtaining the read data is to perform due quality control of the sequenced reads. In a de novo genome assembly project, the second step is to estimate two important features, the genome size and the 'best k-mer', before starting assembly tests with different de novo assembly software and parameters. However, quality control of the sequenced genome libraries as a whole, rather than of the reads only, is frequently overlooked and is recognized as important only when the assembly tests fail to produce the expected results. We have developed GSER, a Genome Size Estimator using R, a pipeline to evaluate the relationship between k-mers and genome size as a means of quality assessment of the sequenced genome libraries. GSER generates a set of charts that allow the analyst to evaluate the library datasets before starting the assembly. The script that runs the pipeline can be downloaded from http://www.mobilomics.org/GSER/downloads or http://github.com/mobilomics/GSER.
ABSTRACT
Zygnematophyceae green algae (ZGA) have been shown to be the closest relatives of land plants. Three nuclear genomes (Spirogloea muscicola, Mesotaenium endlicherianum, and Penium margaritaceum) of ZGA have been recently published, and more genomes are underway. Here we analyzed two Zygnema circumcarinatum strains SAG 698-1a (mating +) and SAG 698-1b (mating -) and found distinct cell sizes and other morphological differences. The molecular identities of the two strains were further investigated by sequencing their 18S rRNA, psaA and rbcL genes. These marker genes of SAG 698-1a were surprisingly much more similar to Z. cylindricum (SAG 698-2) than to SAG 698-1b. Phylogenies of these marker genes also showed that SAG 698-1a and SAG 698-1b were well separated into two different Zygnema clades, where SAG 698-1a was clustered with Z. cylindricum, while SAG 698-1b was clustered with Z. tunetanum. Additionally, physiological parameters like ETRmax values differed between SAG 698-1a and SAG 698-1b after 2 months of cultivation. The de-epoxidation state (DEPS) of the xanthophyll cycle pigments also showed significant differences. Surprisingly, the two strains could not conjugate, and significantly differed in the thickness of the mucilage layer. Additionally, ZGA cell walls are highly enriched with sticky and acidic polysaccharides, and therefore the widely used plant nuclear extraction protocols do not work well in ZGA. Here, we also report a fast and simple method, by mechanical chopping, for efficient nuclear extraction in the two SAG strains. More importantly, the extracted nuclei were further used for nuclear genome size estimation of the two SAG strains by flow cytometry (FC). To confirm the FC result, we have also used other experimental methods for nuclear genome size estimation of the two strains. Interestingly, the two strains were found to have very distinct nuclear genome sizes (313.2 ± 2.0 Mb in SAG 698-1a vs. 63.5 ± 0.5 Mb in SAG 698-1b). 
Our multiple lines of evidence strongly indicate that SAG 698-1a possibly had been confused with SAG 698-2 prior to 2005, and most likely represents Z. cylindricum or a closely related species.
ABSTRACT
BACKGROUND: Global warming and other ecological changes have facilitated the expansion of Ixodes ricinus tick populations. Ixodes ricinus is the most important carrier of vector-borne pathogens in Europe, transmitting viruses, protozoa and bacteria, in particular Borrelia burgdorferi (sensu lato), the causative agent of Lyme borreliosis, the most prevalent vector-borne disease in humans in the Northern hemisphere. To control this disease vector more rapidly, a better understanding of the I. ricinus tick is necessary. To facilitate such studies, we recently published the first reference genome of this highly prevalent pathogen vector. Here, we further extend these studies by scaffolding and annotating the first reference genome using ultra-long sequencing reads from third-generation single-molecule sequencing. In addition, we present the first genome size estimation for I. ricinus ticks and the embryo-derived cell line IRE/CTVM19. RESULTS: 235,953 contigs were integrated into 204,904 scaffolds, extending the currently known genome lengths by more than 30% from 393 to 516 Mb and the N50 contig value by 87% from 1643 bp to an N50 scaffold value of 3067 bp. In addition, 25,263 sequences were annotated by comparison to the tick's North American relative Ixodes scapularis. After (conserved) hypothetical proteins, zinc finger proteins, secreted proteins and P450 coding proteins were the most prevalent protein categories annotated. Interestingly, more than 50% of the amino acid sequences matching the homology threshold had 95-100% identity to the corresponding I. scapularis gene models. The sequence information was complemented by the first genome size estimation for this species. Flow cytometry-based genome size analysis revealed a haploid genome size of 2.65 Gb for I. ricinus ticks and 3.80 Gb for the cell line. CONCLUSIONS: We present a first draft sequence map of the I. ricinus genome based on a PacBio-Illumina assembly. The I. ricinus genome was shown to be 26% (500 Mb) larger than the genome of its American relative I. scapularis. Based on the genome size of 2.65 Gb, we estimated that we covered about 67% of the non-repetitive sequences. Genome annotation will facilitate screening for specific molecular pathways in I. ricinus cells and provides an overview of characteristics and functions.