ABSTRACT
Introduction: Sorghum (Sorghum bicolor (L.) Moench) is an agriculturally and economically important staple crop that has immense potential as a bioenergy feedstock due to its relatively high productivity on marginal lands. To capitalize on and further improve sorghum as a potential source of sustainable biofuel, it is essential to understand the genomic mechanisms underlying complex traits related to yield, composition, and environmental adaptations. Methods: Expanding on a recently developed mapping population, we generated de novo genome assemblies for 10 parental genotypes from this population and identified a comprehensive set of over 24 thousand large structural variants (SVs) and over 10.5 million single nucleotide polymorphisms (SNPs). Results: We show that SVs and nonsynonymous SNPs are enriched in different gene categories, emphasizing the need for long-read sequencing in crop species to identify novel variation. Furthermore, we highlight SVs and SNPs occurring in genes and pathways with known associations to critical bioenergy-related phenotypes and characterize the landscape of genetic differences between sweet and cellulosic genotypes. Discussion: These resources can be integrated into both ongoing and future mapping and trait discovery for sorghum and its myriad uses including food, feed, bioenergy, and increasingly as a carbon dioxide removal mechanism.
ABSTRACT
In this dataset, we report the genome assembly and data analysis of Mycobacterium tuberculosis strain SIT745/EAI1-MYS. This strain was previously isolated from a Malaysian patient with extra-pulmonary tuberculosis and was identified from spoligotype patterns with fifteen known Shared International Types (SITs). Further analysis showed that this strain has a remarkable phylogeographical specificity for Malaysia. According to the National Center for Biotechnology Information (NCBI) nucleotide database, the genome consists of 150 contigs of various sequence lengths and had not been assembled. In this assembly, these contigs were assembled into a single circular chromosome of approximately 4.42 megabases (Mb) with an average GC content of 65.6%, using reference sequences from Mycobacterium tuberculosis strain H37Rv and Mycobacterium bovis strain AF2122/97 for gap closure. The chromosome was shown to contain 4,009 protein-coding sequences, 3 ribosomal RNAs, 45 transfer RNAs, and 12 superclasses comprising 277 subsystems that together account for nearly 1,900 genes. The genome information will provide fundamental knowledge of this organism as well as insight into its genomic and proteomic profiles and phylogenetic relationships.