ABSTRACT
This study evaluated the genetic diversity and population structure of a novel cotton germplasm collection comprising 132 accessions. The collection includes the diploid Gossypium klotzschianum and the allotetraploids Gossypium barbadense, Gossypium darwinii, Gossypium tomentosum, Gossypium ekmanianum, and Gossypium stephensii, collected from Santa Cruz, Isabela, San Cristobal, the Hawaiian Islands, the Dominican Republic, and Wake Atoll. A total of 111 expressed sequence tag (EST) and genomic simple sequence repeat (gSSR) markers produced 382 polymorphic loci, an average of 3.44 polymorphic alleles per SSR marker. Polymorphism information content (PIC) values ranged from 0.08 to 0.82, with an average of 0.56. Pairwise genetic distances across the wild cotton collection ranged from 0.003 to 0.53, with an average of 0.33. Phylogenetic analysis supported the subgroups identified by STRUCTURE and agreed well with the results of principal coordinate analysis, which explained a cumulative 45.65% of the variation. A total of 123 unique alleles were observed across all accessions, 31 of which were found only in G. ekmanianum. Analysis of molecular variance (AMOVA) revealed highly significant differentiation among the six groups identified by STRUCTURE. Differences among groups accounted for 49% of the total variation, and the remaining 51% was due to diversity within groups. Among tetraploid populations, the highest genetic differentiation was observed between accessions from the Hawaiian and Santa Cruz regions (pairwise FST = 0.752, p < 0.001). An uncharacterized DUF819-containing gene, yjcL, linked to the genomic markers was found to be closely related to the tryptophan-aspartic acid (W-D) repeat superfamily of genes. RNA sequencing (RNA-seq) expression data showed that the yjcL-linked gene Gh_A09G2500 was upregulated under drought and salt stress.
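The abstract reports PIC values per marker but does not state which PIC formula was used. The sketch below assumes the widely used definition of Botstein et al. (1980), PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at one SSR marker. It is an illustration only, not the study's actual pipeline. The function names `pic` and `mean_alleles_per_marker` are hypothetical.

```python
import math
from itertools import combinations


def pic(freqs):
    """Polymorphism information content for one marker (Botstein et al., 1980).

    freqs: allele frequencies observed at a single SSR marker (must sum to 1).
    """
    if not math.isclose(sum(freqs), 1.0):
        raise ValueError("allele frequencies must sum to 1")
    # Expected heterozygosity (gene diversity): 1 - sum of squared frequencies.
    heterozygosity = 1.0 - sum(p * p for p in freqs)
    # Botstein correction: subtract 2 * p_i^2 * p_j^2 for every allele pair i < j.
    correction = sum(2 * (pi ** 2) * (pj ** 2) for pi, pj in combinations(freqs, 2))
    return heterozygosity - correction


def mean_alleles_per_marker(total_alleles, n_markers):
    """Average number of polymorphic alleles per marker."""
    return total_alleles / n_markers


# A biallelic marker with equal frequencies gives PIC = 0.375.
print(pic([0.5, 0.5]))
# The abstract's 382 polymorphic loci over 111 markers gives about 3.44 per marker.
print(round(mean_alleles_per_marker(382, 111), 2))
```

With the Botstein correction, a marker's PIC is always at or below its gene diversity. Studies that report PIC as 1 - Σ p_i² alone will therefore show somewhat higher values for the same data.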
The genetic diversity, gene characterization, and variation documented in this novel germplasm collection will be a landmark addition to the genetic study of cotton germplasm.