Abstract
Gloriosa superba is an economical source of pharmaceutical colchicine, a mitotic poison used to treat gout, cancer and inflammatory diseases. Studying genetic variation in this plant is important, but progress has been impeded by the limited number of available molecular markers. In this study, we developed expressed sequence tag-derived simple sequence repeat (EST-SSR) markers from the leaf transcriptome sequences of three different ecotypes of G. superba. De novo assembly of the sequencing data generated a total of 65,579 unigenes and 38,200 coding sequences (CDSs). These CDSs were annotated using the NCBI Nr protein database, gene ontology terms and KEGG pathways. Differential gene expression was analyzed to identify differences among the ecotypes at the molecular level. Finally, a total of 14,672 potential EST-SSRs were identified from these unigenes, among which dinucleotide (5754, 39.22%) and trinucleotide (5421, 36.95%) repeats were the most abundant types, followed by mononucleotide repeats (3213, 21.83%). The most frequent motifs were CT/GA (1392, 9.48%), AG/TC (1219, 8.31%), and GA/CT (1146, 7.82%) among the dinucleotide repeats and CCG/CGG (1487, 10.13%), AGG/CCT (1421, 9.68%), AGC/CTG (697, 4.75%) and AAG/CTT (621, 4.23%) among the trinucleotide repeats. A polymorphism study using a random set of 20 newly developed EST-SSRs revealed polymorphic information content values ranging from 0 to 0.5926, with an average of 0.4021. The large-scale ESTs developed in the current study will be useful as a genomic resource for further investigation of the genetic variations in this species.
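Polymorphic information content (PIC) is commonly computed from allele frequencies with the formula of Botstein et al. (1980). The abstract does not state which formula was used, so the following is only a minimal Python sketch under that assumption; the function name `pic` and the example frequencies are illustrative and are not taken from the study.

```python
from itertools import combinations

def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2 for allele frequencies p_i.

    Assumption: this is the Botstein et al. (1980) definition; the source
    abstract does not say which definition it applied.
    """
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity term
    homozygosity = sum(p * p for p in freqs)
    # Correction term summed over all unordered allele pairs
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term

# Illustrative inputs only: a monomorphic locus and three equally frequent alleles
print(round(pic([1.0]), 4))
print(round(pic([1/3, 1/3, 1/3]), 4))
```

Under this formula a monomorphic locus scores 0 and a locus with three equally frequent alleles scores about 0.5926, which coincides with the two ends of the reported range. That agreement is consistent with the assumed formula but does not confirm it.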
Abstract
Northern snakehead, Ophiocephalus argus Cantor, is an endemic freshwater fish in China. However, wild stocks of O. argus are dwindling sharply. Water conservancy projects, environmental pollution and human activities have caused this decline, which has attracted much attention. Here, we investigated the genomic information of O. argus by sequencing its transcriptome on the Illumina HiSeq 4000 platform. A total of 67,564 sequences were generated from 79,500,964 paired-end reads, and 33,710 unigenes were annotated against protein databases (NCBI nonredundant (NR) database). In total, 7182 unigenes had clusters of orthologous groups (COG) classifications, and 33,710 unigenes were assigned to 59 gene ontology (GO) terms. Further, a transcriptome-wide search identified a total of 21,464 simple sequence repeats (SSRs) from the 67,564 unigenes and 113,518 single nucleotide polymorphism (SNP) sites among 335 M clean reads. The new transcriptome data presented in this study for O. argus will provide valuable information for gene discovery and downstream applications, such as phylogenetic analysis, gene-expression profiling and identification of genetic markers (SSRs and SNPs).
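A transcriptome-wide SSR search of the kind mentioned above amounts to scanning each unigene for short motifs repeated in tandem. The sketch below illustrates the idea in Python with regular expressions. The abstract does not name the tool or the thresholds the authors used, so the `MIN_REPEATS` values, the function name `find_ssrs` and the example sequence are all hypothetical.

```python
import re

# Hypothetical minimum repeat counts per motif length (not taken from the source)
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq):
    """Return (motif, repeat_count, start) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif of motif_len bases followed by at least (min_rep - 1) more copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "CTCT"),
            # so that each repeat is reported under its shortest motif only
            if any(motif == motif[:k] * (motif_len // k)
                   for k in range(1, motif_len) if motif_len % k == 0):
                continue
            hits.append((motif, len(m.group(0)) // motif_len, m.start()))
    return hits

# Illustrative sequence: seven CT repeats flanked by unrelated bases
print(find_ssrs("GG" + "CT" * 7 + "AAGT"))
```

Dedicated SSR tools also handle compound and imperfect repeats, which this sketch ignores.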
Abstract
Unraveling the genetic mechanisms of disease and physiological traits requires comprehensive sequencing analysis of large samples from Chinese populations. Here, we report the primary results of the Chinese Academy of Sciences Precision Medicine Initiative (CASPMI) project launched by the Chinese Academy of Sciences, including the de novo assembly of a northern Han reference genome (NH1.0) and whole-genome analyses of 597 healthy people from most areas of China. Given that the two existing reference genomes for Han Chinese (YH and HX1) were both from the south, we constructed NH1.0, a new reference genome from a northern individual, by combining PacBio sequencing, 10× Genomics, and Bionano mapping. Using this integrated approach, we obtained an N50 scaffold size of 46.63 Mb for the NH1.0 genome and performed a comparative genome analysis of NH1.0 with YH and HX1. To generate a genomic variation map of Chinese populations, we performed whole-genome sequencing of the 597 participants and identified 24.85 million (M) single nucleotide variants (SNVs), 3.85 M small indels, and 106,382 structural variations. In the association analysis with collected phenotypes, we found that the T allele of rs1549293 in KAT8 correlated significantly with waist circumference in northern Han males. Moreover, significant genetic diversity in MTHFR, TCN2, FADS1, and FADS2, which are associated with circulating folate, vitamin B12, or lipid metabolism, was observed between northerners and southerners. In particular, for the homocysteine-increasing allele of rs1801133 (MTHFR 677T), we hypothesize that there exists a "comfort" zone for a high frequency of 677T between latitudes of 35 and 45 degrees north. Taken together, our results provide a high-quality northern Han reference genome and novel population-specific data sets of genetic variants for use in personalized and precision medicine.
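The N50 scaffold size quoted above is a standard contiguity statistic: the length of the scaffold at which the running total, taken from the longest scaffold downward, first reaches half of the assembly size. A minimal Python sketch of that definition follows; the function name `n50` and the toy lengths are illustrative and have nothing to do with the NH1.0 data.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half of the total is covered
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length

# Toy example: total is 290, and 80 + 70 = 150 is the first running sum >= 145
print(n50([80, 70, 50, 40, 30, 20]))
```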
Abstract
The red swamp crayfish is an important model organism for research on invertebrate innate immunity mechanisms. Its excellent disease resistance against bacteria, fungi, and viruses is well known. However, the antiviral mechanisms of crayfish remain unclear. In this study, we obtained high-quality sequence reads from normal and white spot syndrome virus (WSSV)-challenged crayfish gills. For the normal group (GN), 39,390,280 high-quality clean reads were randomly assembled to produce 172,591 contigs, whereas for the WSSV-challenged group (GW), 34,011,488 high-quality clean reads were randomly assembled to produce 182,176 contigs. After GO annotation analysis, a total of 35,539 (90.01%), 14,931 (37.82%), 28,221 (71.48%), 25,290 (64.05%), 15,595 (39.50%), and 13,848 (35.07%) unigenes had significant matches with sequences in the Nr, Nt, Swiss-Prot, KEGG, COG and GO databases, respectively. Through comparative analysis between GN and GW, 12,868 genes were identified as up-regulated and 9,194 genes as down-regulated differentially expressed genes (DEGs). Ultimately, these DEGs were mapped to different signaling pathways, including three important signaling pathways related to innate immune responses. These results could provide new insights into the antiviral immunity mechanism of crayfish.
Abstract
Objective To produce a comprehensive transcript dataset of Oncomelania hupensis before and after Schistosoma japonicum infection, so as to provide experimental data for refining genetic structural information and mining related molecular markers of O. hupensis infected by S. japonicum. Methods O. hupensis snails were divided into the following 3 groups: one week after S. japonicum miracidium infection, 4 weeks after S. japonicum miracidium infection, and normal condition. Millions of high-quality reads were obtained from the normalized cDNA of the pooled samples and assembled into transcripts. Results A total of 63,686 unigenes were identified and classified into 4 main categories, including general function prediction (15.36%), signal transduction mechanisms (11.75%), posttranslational modification (8.89%), and function unknown (12.20%). Conclusions The transcriptome information of O. hupensis after invasion by S. japonicum shows that several genes are significantly up-regulated or down-regulated, and the availability of this transcriptome information might provide a strong foundation for further understanding the schistosome-snail interaction at the molecular level.
Abstract
Recently, technologies for DNA sequence variation discovery and gene expression profiling have been widely used in genome biology and genetics. Their application to genome studies has advanced particularly with the introduction of the next-generation DNA sequencing (NGS) systems Roche/454 and Illumina/Solexa, along with bioinformatics analysis technologies for whole-genome de novo assembly, expression profiling, DNA variation discovery, and genotyping. Both massive whole-genome shotgun paired-end sequencing data and mate paired-end sequencing data are important for constructing a de novo assembly from novel genome sequencing data. For effective de novo assembly, it is necessary to have DNA sequence information from multiplatform NGS, with at least 2x and 30x depth of genome coverage using Roche/454 and Illumina/Solexa, respectively. Massive short-read data from the Illumina/Solexa system are sufficient to discover DNA variation, which reduces the cost of DNA sequencing. Whole-genome expression profile data are useful for approaching genome systems biology through quantification of expressed RNAs from a whole-genome transcriptome, depending on the tissue samples. Hybrid mRNA sequences from Roche/454 and Illumina/Solexa are more powerful for finding novel genes through de novo assembly in any whole-genome sequenced species. Coverage of 20x and 50x of the estimated transcriptome sequences using Roche/454 and Illumina/Solexa, respectively, is effective for creating novel expressed reference sequences. However, an average of only 30x coverage of a transcriptome with Illumina/Solexa short reads is enough to check expression quantification against reference expressed sequence tag sequences.
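Coverage figures such as those above are usually read through the relation depth = (number of reads × read length) / target size, where the target is a genome or an estimated transcriptome. The Python sketch below applies that relation in both directions. The function names and the example numbers are illustrative assumptions, not values from the text, and the relation ignores unmapped or low-quality reads.

```python
import math

def coverage_depth(read_count, read_length, target_size):
    """Average depth = total sequenced bases / size of the target in bases."""
    return read_count * read_length / target_size

def reads_needed(depth, read_length, target_size):
    """Smallest number of reads of a given length that reaches the requested depth."""
    return math.ceil(depth * target_size / read_length)

# Illustrative numbers only: one million 100-base reads over a 10 Mb target
print(coverage_depth(1_000_000, 100, 10_000_000))
# Reads of 150 bases required for 30x depth over a hypothetical 3 Mb target
print(reads_needed(30, 150, 3_000_000))
```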