ABSTRACT
We report reference-quality genome assemblies and annotations for two accessions of soybean (Glycine max) and for one accession of Glycine soja, the closest wild relative of G. max. The G. max assemblies are for two widely used US cultivars: the northern line Williams 82 (Wm82) and the southern line Lee. The Wm82 assembly improves on the previously published assembly, and the Lee and G. soja assemblies are new for these accessions. Comparisons among the three accessions show generally high structural conservation, but nucleotide differences of 1.7 single-nucleotide polymorphisms (SNPs) per kb between Wm82 and Lee, and 4.7 SNPs per kb between these lines and G. soja. SNP distributions, together with comparisons against genotypes of the Lee and Wm82 parents, highlight patterns of introgression and haplotype structure. Comparisons against the US germplasm collection place the sequenced accessions within global soybean diversity. Analysis of a pan-gene collection shows generally high conservation, with variation occurring primarily in genomically clustered gene families. We found approximately 40-42 inversions per chromosome between either Lee or Wm82v4 and G. soja, and approximately 32 inversions per chromosome between Wm82 and Lee. We also investigated five domestication loci; for each locus, we found two different alleles with functional differences between G. soja and the two domesticated accessions. Together, the genome assemblies for multiple cultivated accessions and for the closest wild ancestor of soybean provide a valuable set of resources for identifying causal variants that underlie domestication and improvement traits in soybean, and a basis for future research and crop improvement efforts in this important crop species.
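To make the SNP-density figures concrete, the following is a minimal Python sketch of how SNPs per kb can be computed from a pairwise alignment. It assumes two already-aligned sequences of equal length and counts only positions where both bases are A, C, G, or T (gaps and ambiguous bases such as N are skipped). The helper names `per_kb` and `snp_density_per_kb` are hypothetical, and this is not the variant-calling pipeline used in the study.

```python
VALID_BASES = set("ACGT")


def per_kb(snp_count: int, compared_bp: int) -> float:
    """Convert a raw SNP count into SNPs per kilobase of compared sequence."""
    if compared_bp <= 0:
        raise ValueError("compared_bp must be positive")
    # Multiply first so the single division is the only rounding step.
    return snp_count * 1000 / compared_bp


def snp_density_per_kb(seq_a: str, seq_b: str) -> float:
    """SNPs per kb between two rows of the same pairwise alignment (hypothetical helper).

    Positions with a gap ('-'), an ambiguous base (e.g. 'N'), or any other
    non-ACGT symbol in either sequence are excluded from both the SNP count
    and the denominator.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = 0
    snps = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                snps += 1
    return per_kb(snps, compared)


if __name__ == "__main__":
    # 1,700 SNPs over 1 Mb of comparable sequence corresponds to 1.7 SNPs per kb.
    print(per_kb(1_700, 1_000_000))  # 1.7
    # One mismatch in 10 compared positions: 100 SNPs per kb.
    print(snp_density_per_kb("ACGTACGTAC", "ACGTTCGTAC"))  # 100.0
```

Excluding gapped and ambiguous positions from the denominator keeps the density from being deflated by alignment gaps or unresolved bases.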
Subjects
Fabaceae/genetics , Genetic Variation , Genome, Plant , Alleles , Centromere/genetics , Disease Resistance/genetics , Genetics, Population , Genotype , Haplotypes , Hardness , Multigene Family , Phylogeny , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Repetitive Sequences, Nucleic Acid , Seed Bank/classification , Sequence Inversion , Telomere/genetics
ABSTRACT
RNA transcripts circulating in peripheral blood are an important source of non-invasive biomarkers. To quantify circulating transcripts accurately, the data must be normalized to internal control reference genes, which are detected at relatively constant levels across blood samples. A few reference gene candidates must first be selected from transcriptome data before their stable expression is validated by reverse-transcription quantitative polymerase chain reaction. However, transcriptome data from maternal blood are lacking, let alone whole-transcriptome data. To address this gap, we performed RNA sequencing on blood samples from women presenting with preterm labor and calculated the coefficient of variation (CV) of expression levels. Of 11,215 exons detected in the maternal blood whole transcriptome, we identified a panel of 395 genes, including PPP1R15B, EXOC8, ACTB, and TPT1, containing exons with considerably less variable expression levels (CV, 7.75-17.7%) than any GAPDH exon (minimum CV, 27.3%). Upon validation, the selected genes from this panel remained more stably expressed than GAPDH in maternal blood. This panel is enriched for genes involved in the actin cytoskeleton, macromolecular complexes, and integrin signaling. This groundwork provides a starting point for systematically selecting reference gene candidates for normalizing the levels of circulating RNA transcripts in maternal blood.
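The CV-based screen described above can be sketched in Python as follows: compute each exon's CV (standard deviation divided by mean, as a percentage), take the lowest CV among the reference gene's exons as the threshold, and keep exons from other genes that fall below it. The abstract does not specify whether the sample or population standard deviation was used, or how expression was normalized; this sketch assumes the sample standard deviation and already-normalized per-sample expression values. The function names and the toy values are invented for illustration and do not reproduce the study's data or software.

```python
import statistics
from typing import Dict, List, Tuple

ExonKey = Tuple[str, str]  # (gene symbol, exon identifier); hypothetical layout


def cv_percent(values: List[float]) -> float:
    """Coefficient of variation in percent, using the sample standard deviation.

    statistics.stdev needs at least two values and raises StatisticsError otherwise.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100


def exons_more_stable_than(
    expr: Dict[ExonKey, List[float]], reference_gene: str = "GAPDH"
) -> List[Tuple[ExonKey, float]]:
    """Return exons of other genes whose CV is below every exon of reference_gene, most stable first."""
    cvs = {key: cv_percent(levels) for key, levels in expr.items()}
    ref_cvs = [cv for (gene, _), cv in cvs.items() if gene == reference_gene]
    if not ref_cvs:
        raise KeyError(f"no exons found for {reference_gene}")
    threshold = min(ref_cvs)  # the most stable reference exon sets the bar
    stable = [
        (key, cv)
        for key, cv in cvs.items()
        if key[0] != reference_gene and cv < threshold
    ]
    return sorted(stable, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy expression levels across four samples (invented numbers).
    toy = {
        ("GAPDH", "e1"): [10, 14, 6, 12],
        ("GAPDH", "e2"): [20, 30, 15, 25],
        ("GENE_A", "e1"): [100, 102, 98, 101],
        ("GENE_B", "e3"): [50, 51, 49, 50],
        ("GENE_C", "e1"): [5, 20, 1, 12],
    }
    for (gene, exon), cv in exons_more_stable_than(toy):
        print(f"{gene} {exon}: CV = {cv:.2f}%")
```

Using the minimum CV across all reference-gene exons as the cutoff mirrors the comparison in the abstract ("than any GAPDH exon"), so a candidate must beat the most stable GAPDH exon, not just the average one.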
Subjects
RNA/blood , RNA/genetics , Sequence Analysis, RNA/methods , Algorithms , Exons/genetics , Female , Gene Expression Regulation , Humans , Molecular Sequence Annotation , Pregnancy , Reference Standards , Software , Transcriptome/genetics , Tumor Protein, Translationally-Controlled 1
ABSTRACT
Large structural variants (SVs) in the human genome are difficult to detect and study with conventional sequencing technologies. With long-range genome analysis platforms such as optical mapping, large SVs (>2 kb) can be identified across the genome in a single experiment. Analyzing optical genome maps of 154 individuals from the 26 populations sequenced in the 1000 Genomes Project, we find that the phylogenetic population patterns of large SVs are similar to those of single-nucleotide variants in 86% of the human genome, while ~2% of the genome has high structural complexity. We are able to characterize SVs in many otherwise intractable regions of the genome, including segmental duplications and subtelomeric, pericentromeric, and acrocentric regions. In addition, we discover ~60 Mb of non-redundant genome content missing from the reference genome sequence assembly. Our results highlight the need for a comprehensive set of alternate haplotypes from different populations to represent SV patterns in the genome.