ABSTRACT
Antarctic krill (Euphausia superba) is Earth's most abundant wild animal, and its enormous biomass is vital to the Southern Ocean ecosystem. Here, we report a 48.01-Gb chromosome-level Antarctic krill genome, whose large size appears to have resulted from intergenic transposable element expansions. Our assembly reveals the molecular architecture of the Antarctic krill circadian clock and uncovers expanded gene families associated with molting and energy metabolism, providing insights into adaptations to the cold and highly seasonal Antarctic environment. Population-level genome re-sequencing from four geographical sites around the Antarctic continent reveals no clear population structure but highlights natural selection associated with environmental variables. An apparent drastic reduction in krill population size 10 mya and a subsequent rebound 100 thousand years ago coincide with climate change events. Our findings uncover the genomic basis of Antarctic krill adaptations to the Southern Ocean and provide valuable resources for future Antarctic research.
Subject(s)
Euphausiacea , Genome , Animals , Circadian Clocks/genetics , Ecosystem , Euphausiacea/genetics , Euphausiacea/physiology , Genomics , Sequence Analysis, DNA , DNA Transposable Elements , Biological Evolution , Adaptation, Physiological
ABSTRACT
Lungfishes are the closest extant relatives of tetrapods and preserve ancestral traits linked with the water-to-land transition. However, their huge genome sizes have hindered understanding of this key transition in evolution. Here, we report a 40-Gb chromosome-level assembly of the African lungfish (Protopterus annectens) genome, which is the largest genome assembly ever reported and has a contig and chromosome N50 of 1.60 Mb and 2.81 Gb, respectively. The large size of the lungfish genome is due mainly to retrotransposons. Genes with ultra-long length show similar expression levels to other genes, indicating that lungfishes have evolved high transcription efficacy to keep gene expression balanced. Together with transcriptome and experimental data, we identified potential genes and regulatory elements related to such terrestrial adaptation traits as pulmonary surfactant, anxiolytic ability, pentadactyl limbs, and pharyngeal remodeling. Our results provide insights and key resources for understanding the evolutionary pathway leading from fishes to humans.
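The contig and chromosome N50 values quoted above follow the standard N50 definition. N50 is the length L such that sequences of length L or longer together contain at least half of the assembled bases. Below is a minimal Python sketch of that definition. The function name `n50` is illustrative, and assembly-statistics tools may differ in details such as how gaps or ties are handled.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of sequence lengths (contigs, scaffolds, ...).

    N50 is the largest length L such that sequences of length >= L
    together cover at least half of the total assembled length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    covered = 0
    # Accumulate lengths from longest to shortest until half the total is reached.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8, because 10 + 9 + 8 = 27 is half of the 54 total bases.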
Subject(s)
Adaptation, Biological , Biological Evolution , Fishes/genetics , Whole Genome Sequencing , Animal Fins/anatomy & histology , Animal Fins/physiology , Animals , Extremities/anatomy & histology , Extremities/physiology , Fishes/anatomy & histology , Fishes/classification , Fishes/physiology , Phylogeny , Respiratory Physiological Phenomena , Respiratory System/anatomy & histology , Vertebrates/genetics
ABSTRACT
Polyploidization is important to the evolution of plants. Subgenome dominance is a distinct phenomenon associated with most allopolyploids: a gene on the dominant subgenome tends to be expressed at higher RNA levels in all organs than its syntenic paralogue (homoeolog). The mechanism that underlies the formation of subgenome dominance remains unknown, but there is evidence that differences in transposon/DNA methylation density near the genes of the parents are causal. The subgenome with a lower density of transposons and methylation near genes is positively associated with subgenome dominance. Here, we generated eight generations of allotetraploid progenies from the merging of the parental genomes of Brassica rapa and Brassica oleracea. We found that the parental (rapa:oleracea) differences in transposon/methylation density near genes existed in the wide hybrid and persisted in the neotetraploids (the synthetic Brassica napus). However, these neotetraploids did not express the expected subgenome dominance. This absence of B. rapa vs. B. oleracea subgenome dominance is particularly significant. There is no negative relationship between transposon/methylation level and subgenome dominance in the neotetraploids. Yet the more ancient parental subgenomes for all Brassica did show differences in transposon/methylation densities near genes and did express, in the same samples of cells, the biased gene expression diagnostic of subgenome dominance. We conclude that subgenome differences in methylated transposons near genes are not sufficient to initiate the biased gene expression that defines subgenome dominance. Our result was unexpected, and we suggest a "nuclear chimera" model to explain our data.
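Subgenome dominance is typically assessed by comparing the expression of each homoeolog pair. The sketch below illustrates one common style of such a comparison; it is not the authors' pipeline. It calls a pair biased when the log2 expression ratio exceeds a 2-fold cutoff, an assumed threshold applied with an assumed pseudocount of 1. It then uses an exact two-sided binomial test to ask whether biased pairs favor one subgenome more often than chance would predict. All function names are hypothetical.

```python
import math


def count_biased_pairs(pairs, log2_cutoff=1.0, pseudocount=1.0):
    """Count homoeolog pairs biased toward subgenome A or subgenome B.

    pairs: iterable of (expr_a, expr_b) expression values, e.g. TPM.
    A pair is biased when |log2((a + pc) / (b + pc))| exceeds log2_cutoff.
    """
    a_biased = b_biased = 0
    for expr_a, expr_b in pairs:
        lfc = math.log2(expr_a + pseudocount) - math.log2(expr_b + pseudocount)
        if lfc > log2_cutoff:
            a_biased += 1
        elif lfc < -log2_cutoff:
            b_biased += 1
    return a_biased, b_biased


def binomial_two_sided_p(k, n):
    """Exact two-sided binomial test of k A-biased pairs out of n biased pairs, null p = 0.5."""
    if n == 0:
        return 1.0
    probs = [math.comb(n, i) / 2**n for i in range(n + 1)]
    observed = probs[k]
    # Sum the probabilities of all outcomes no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))
```

A strong, genome-wide excess of A-biased over B-biased pairs (a small p-value) is the pattern expected under subgenome dominance. A balanced count is what the abstract reports for the neotetraploids.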
Subject(s)
Brassica napus , Brassica rapa , Brassica , Brassica/genetics , Genome, Plant/genetics , Brassica rapa/genetics , Brassica napus/genetics , DNA Methylation/genetics , Polyploidy
ABSTRACT
MOTIVATION: Seeding is a rate-limiting stage in sequence alignment for next-generation sequencing reads. Existing optimization algorithms typically use hardware and machine-learning techniques to accelerate seeding. However, an efficient solution offered by dedicated next-generation sequencing compressors has so far been largely overlooked. In addition to achieving remarkable compression ratios by reordering reads, these compressors provide valuable insights for downstream alignment. They reveal repetitive computations that account for more than 50% of the seeding procedure in the commonly used short-read aligner BWA-MEM at typical sequencing coverage. Nevertheless, this redundancy information has not been fully exploited. RESULTS: In this study, we present a compressive seeding algorithm, named CompSeed, to fill the gap. CompSeed works in collaboration with existing reordering-based compression tools and finishes the BWA-MEM seeding process in about half the time. It does so by caching all intermediate seeding results in compact trie structures, so that repetitive queries, which would otherwise cause frequent random memory accesses, are answered directly. Furthermore, CompSeed performs better as sequencing coverage increases, because it focuses solely on the small informative portion of sequencing reads after compression. This strategy highlights the promising potential of integrating sequence compression and alignment to tackle the ever-growing volume of sequencing data. AVAILABILITY AND IMPLEMENTATION: CompSeed is available at https://github.com/i-xiaohu/CompSeed.
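The core idea, answering repeated seeding queries from memory instead of the index, can be illustrated with simple memoization. In the Python sketch below, a dictionary stands in for CompSeed's compact tries and a naive string search stands in for the FM-index lookup. Fixed-length, non-overlapping k-mer seeds are used instead of BWA-MEM's maximal exact matches. All names are hypothetical; this shows the caching principle only, not CompSeed's implementation.

```python
from collections.abc import Callable


def naive_occurrences(reference: str, seed: str) -> tuple[int, ...]:
    """Stand-in for an FM-index lookup: every start position of seed in reference."""
    hits = []
    start = reference.find(seed)
    while start != -1:
        hits.append(start)
        start = reference.find(seed, start + 1)
    return tuple(hits)


class SeedCache:
    """Memoize seed lookups so that repeated seeds are answered from memory."""

    def __init__(self, lookup: Callable[[str], tuple[int, ...]]):
        self._lookup = lookup
        self._results: dict[str, tuple[int, ...]] = {}
        self.hits = 0    # queries answered from the cache
        self.misses = 0  # queries that needed a real index lookup

    def query(self, seed: str) -> tuple[int, ...]:
        if seed in self._results:
            self.hits += 1
        else:
            self.misses += 1
            self._results[seed] = self._lookup(seed)
        return self._results[seed]


def seed_reads(reads: list[str], k: int, cache: SeedCache) -> list[list[tuple[int, ...]]]:
    """Cut each read into non-overlapping k-mer seeds and look each one up via the cache."""
    return [
        [cache.query(read[i:i + k]) for i in range(0, len(read) - k + 1, k)]
        for read in reads
    ]


reference = "ACGTACGTTTACGT"
cache = SeedCache(lambda seed: naive_occurrences(reference, seed))
# After reordering, near-identical reads sit next to each other, so seeds repeat.
reads = ["ACGTACGT", "ACGTACGT", "TTACGTAA"]
print(seed_reads(reads, 4, cache), cache.hits, cache.misses)
```

Here half of the six seed queries are cache hits. Higher coverage means more redundant reads and therefore a larger share of hits, which matches the trend the abstract describes.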
Subject(s)
Data Compression , Software , Sequence Analysis, DNA/methods , Algorithms , Data Compression/methods , Computers , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
Detailed knowledge of the genetic variations in diverse crop populations forms the basis for genetic crop improvement and gene functional studies. In the present study, we analyzed a large rice population with a total of 10 548 accessions to construct a rice super-population variation map (RSPVM), consisting of 54 378 986 single nucleotide polymorphisms, 11 119 947 insertion/deletion mutations and 184 736 presence/absence variations. Assessment of variation detection efficiency for different population sizes revealed a sharp increase in all types of variation as population size increased, followed by gradual saturation after the population size reached 10 000. Variant frequency analysis indicated that ~90% of the obtained variants were rare and would therefore likely be difficult to detect in a relatively small population. Among the rare variants, only 2.7% were predicted to be deleterious. Population structure, genetic diversity and gene functional polymorphism of this large population were evaluated based on different subsets of RSPVM, demonstrating the great potential of RSPVM for downstream applications. Our study provides both a rich genetic basis for understanding natural rice variation and a powerful tool for exploiting the great potential of rare variants in future rice research, including population genetics and functional genomics.
Subject(s)
Genetic Variation , Oryza , Genetics, Population , Genomics , Oryza/genetics , Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: In cold and temperate zones, seasonal reproduction plays a crucial role in the survival and reproductive success of species. The photoperiod influences reproductive processes in seasonal breeders through the hypothalamic-pituitary-gonadal (HPG) axis, in which the mediobasal hypothalamus (MBH) serves as the central region responsible for transmitting light information to the endocrine system. However, the cis-regulatory elements and the transcriptional activation mechanisms related to seasonal activation of the reproductive axis in the MBH remain largely unclear. In this study, an artificial photoperiod program was used to induce HPG axis activation in male quails, and we compared changes in chromatin accessibility during the seasonal activation of the HPG axis. RESULTS: Alterations in chromatin accessibility occurred in the MBH and stabilized at LD7 during the activation of the HPG axis. Most open chromatin regions (OCRs) are enriched mainly in introns and distal intergenic regions. In the differentially accessible regions (DARs), binding motifs of the RFX, NKX, and MEF families of transcription factors lost accessibility under long-day conditions, while binding motifs of the nuclear receptor (NR) superfamily and the BZIP family gained accessibility. Retinoic acid signaling and GTPase-mediated signal transduction are involved in adaptation to long days and maintenance of HPG axis activation. According to our footprint analysis, three clock-output genes (TEF, DBP, and HLF) and THRA were the first responders to long days, at LD3. THRB, NR3C2, AR, and NR3C1 are the key players associated with the initiation and maintenance of HPG axis activation; they appeared at LD7 and tended to remain stable under long-day conditions.
By integrating chromatin accessibility and transcriptome data, we found that three genes involved in thyroid hormone signaling (DIO2, SLC16A2, and PDE6H) showed differential chromatin accessibility and expression levels during the seasonal activation of the HPG axis. TRPA1, a target of THRB identified by DAP-seq, was sensitive to photoactivation and exhibited differential expression between short- and long-day conditions. CONCLUSION: Our data suggest that trans effects were the main factors affecting gene expression during the seasonal activation of the HPG axis. This study could lead to further research on the seasonal reproductive behavior of birds, particularly the role of the MBH in controlling seasonal reproductive behavior.
Subject(s)
Chromatin , Quail , Animals , Male , Seasons , Quail/genetics , Chromatin/genetics , Chromatin/metabolism , Hypothalamus/metabolism , Reproduction/genetics , Photoperiod
ABSTRACT
Pangolins are among nature's most fascinating species, with their covering of scales and their myrmecophagous diet, yet relatively little is known about the molecular basis of these traits. Here, we combine multi-omics, evolutionary, and fundamental protein feature analyses of both Chinese and Malayan pangolins. These analyses highlight the molecular mechanisms of both the myrmecophagous diet and scale formation, which represent a fascinating evolutionary strategy to occupy unique ecological niches. In contrast to the conserved organization of the epidermal differentiation complex, the pangolin epidermal differentiation complex has undergone large-scale variation and gene loss events. These events have caused changes in expression pattern and function that contribute to cornified epithelium structures in the stomach, an adaptation to the myrmecophagous diet. Our assemblies also enabled us to discover a large number of copies of high glycine-tyrosine keratin-associated proteins (HGT-KRTAPs). In addition, their highly homogenized tandem arrays, amino acid content, and specific expression pattern further support a strong connection between HGT-KRTAPs and the molecular mechanism of scale hardness.
Subject(s)
Genome , Pangolins , Animals , Diet
ABSTRACT
The spotted hyena (Crocuta crocuta) is a large and unique terrestrial carnivore. It is a particularly fascinating species due to its distinct phenotypic traits, especially its complex social structure and scavenging lifestyle, with associated high dietary exposure to microbial pathogens. However, the underlying molecular mechanisms related to these phenotypes remain elusive. Here, we sequenced and assembled a high-quality long-read genome of the spotted hyena, with a contig N50 length of ~13.75 Mb. Based on comparative genomics, immunoglobulin family members (e.g., IGKV4-1) showed significant adaptive duplications in the spotted hyena and striped hyena. Furthermore, immune-related genes (e.g., CD8A, LAG3, and TLR3) experienced species-specific positive selection in the spotted hyena lineage. These results suggest that immune tolerance has undergone adaptive divergence between the spotted hyena and the closely related striped hyena to cope with prolonged dietary exposure to microbial pathogens from scavenging. We also provide potential genetic insights into social complexity, including social behavior and cognition. Specifically, RECNE-associated genes (e.g., UGP2 and ACTR2) in the spotted hyena genome are involved in the regulation of social communication. Taken together, our genomic analyses provide molecular insights into the scavenging lifestyle and societal complexity of spotted hyenas.
Subject(s)
Hyaenidae , Animals , Base Sequence , Genome , Hyaenidae/genetics , Social Behavior
ABSTRACT
Existing long-read assemblers require thousands of central processing unit hours to assemble a human genome and are being outpaced by sequencing technologies in terms of both throughput and cost. We developed a long-read assembler wtdbg2 (https://github.com/ruanjue/wtdbg2) that is 2-17 times as fast as published tools while achieving comparable contiguity and accuracy. It paves the way for population-scale long-read assembly in future.
Subject(s)
High-Throughput Nucleotide Sequencing/methods , Algorithms , Datasets as Topic , Genome, Human , Humans
ABSTRACT
Diterpenoid alkaloids (DAs) have often been used in clinical practice due to their analgesic and anti-inflammatory properties. Natural DAs are prevalent in the family Ranunculaceae, notably in the genus Aconitum. Nevertheless, the evolutionary origin of the biosynthetic pathway responsible for DA production remains unknown. In this study, we assembled a high-quality, pseudochromosome-level genome (5.76 Gb) of the DA-rich species Aconitum vilmorinianum (A. vilmorinianum). Comparative genomic analysis revealed an A. vilmorinianum-specific whole-genome duplication event, which may have contributed to the evolution of the DA biosynthesis pathway. We identified several genes involved in DA biosynthesis through integrated genomic, transcriptomic, and metabolomic analyses. These included genes encoding ent-kaurene oxidases and aminotransferases, which facilitate the activation of diterpenes and the insertion of nitrogen atoms into diterpene skeletons, thereby mediating the transformation of diterpenes into DAs. The divergence periods of these genes in A. vilmorinianum were further assessed, showing that two major types of genes were involved in the establishment of the DA biosynthesis pathway. Our integrated analysis offers fresh insights into the evolutionary origin of DAs in A. vilmorinianum as well as suggestions for engineering biosynthetic pathways to obtain desired DAs.
Subject(s)
Aconitum , Alkaloids , Diterpenes , Aconitum/genetics , Aconitum/metabolism , Multiomics , Diterpenes/metabolism , Alkaloids/metabolism , Transcriptome/genetics , Plant Roots
ABSTRACT
Whole-genome genotyping methods are important for breeding. However, it has been challenging to develop a robust method for simultaneous foreground and background genotyping that can easily be adapted to different genes and species. In our study, we accidentally discovered that in adapter ligation-mediated PCR, amplification by primer-template mismatched annealing (PTMA) along the genome could generate thousands of stable PCR products. Based on this observation, we developed a novel method for simultaneous foreground and background integrated genotyping by sequencing (FBI-seq) using one specific primer. In FBI-seq, foreground genotyping is performed by primer-template perfect annealing (PTPA), while background genotyping employs PTMA. Unlike DNA arrays, multiplex PCR, or genome target enrichment, FBI-seq requires little preliminary work for primer design and synthesis, and it is easily adaptable to different foreground genes and species. FBI-seq therefore provides a prolific, robust, and accurate method for simultaneous foreground and background genotyping to facilitate breeding in the post-genomics era.
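The distinction between perfect (PTPA) and mismatched (PTMA) annealing can be illustrated by scanning a template for sites that match a primer exactly or with a few mismatches. This Python sketch uses Hamming distance on one strand as a crude proxy for annealing. The mismatch limit, the requirement for an exactly matched 3' end, and all names are assumptions. Real priming also depends on annealing temperature, duplex stability and the reverse strand, none of which are modeled here.

```python
def annealing_sites(template: str, primer: str, max_mismatches: int = 3,
                    three_prime_exact: int = 3) -> tuple[list[int], list[int]]:
    """Classify candidate primer binding sites on one template strand.

    Returns (perfect, mismatched): start positions where the primer matches
    exactly, or with 1..max_mismatches mismatches while its last
    three_prime_exact bases still match exactly.
    """
    perfect, mismatched = [], []
    n = len(primer)
    for start in range(len(template) - n + 1):
        window = template[start:start + n]
        # Polymerase extension needs a paired 3' end, so require one here.
        if three_prime_exact and window[-three_prime_exact:] != primer[-three_prime_exact:]:
            continue
        mismatches = sum(1 for a, b in zip(window, primer) if a != b)
        if mismatches == 0:
            perfect.append(start)       # PTPA-like site (foreground)
        elif mismatches <= max_mismatches:
            mismatched.append(start)    # PTMA-like site (background)
    return perfect, mismatched
```

For example, `annealing_sites("GGACGTACGTTTACCTACGTAA", "ACGTACGT")` returns `([2], [12])`: one perfect site and one site with a single internal mismatch. Across a whole genome, the mismatched class is what would yield many scattered background loci from a single primer.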
Subject(s)
Genome , Genotype , DNA Primers/genetics , Polymerase Chain Reaction/methods
ABSTRACT
In this study, six candidate female-specific DNA sequences of the octaploid Amur sturgeon (Acipenser schrenckii) were identified using comparative genomic approaches with high-throughput sequencing data. Their specificity was confirmed by traditional PCR. Two of these sex-specific sequences were also validated as female-specific in eight other sturgeon species and two hybrid sturgeons. The identified female-specific DNA fragments suggest that the family Acipenseridae has a ZZ/ZW sex-determining system. However, one of the two DNA sequences has been deleted in some sturgeons, such as Sterlet sturgeon (Acipenser ruthenus), Beluga (Huso huso) and Kaluga (H. dauricus). These differences in sex-specific sequences indicate that sex-specific regions differ among sturgeon species. This study not only provided sex-specific DNA sequences for the management, conservation and study of sex-determination mechanisms in sturgeons, but also confirmed the capability of the workflow to identify sex-specific DNA sequences in polyploid species with complex genomes.
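Under a ZZ/ZW system, W-linked sequence is present in every female and absent from every male. One simple comparative approach is therefore to keep k-mers that are shared by all female samples and seen in no male sample. The Python sketch below shows that set logic; it is not the authors' workflow. It ignores reverse complements, sequencing errors and k-mer count thresholds, all of which a real analysis must handle before candidate sequences are validated by PCR. Function names are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All k-mers of a sequence (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def individual_kmers(reads: list[str], k: int) -> set[str]:
    """Union of k-mers over all reads from one individual."""
    found: set[str] = set()
    for read in reads:
        found |= kmers(read, k)
    return found


def female_specific_kmers(females: list[list[str]], males: list[list[str]], k: int) -> set[str]:
    """k-mers present in every female sample and absent from every male sample."""
    if not females:
        raise ValueError("need at least one female sample")
    shared = set.intersection(*(individual_kmers(reads, k) for reads in females))
    in_males = set().union(*(individual_kmers(reads, k) for reads in males))
    return shared - in_males
```

Requiring presence in all females guards against individual-specific polymorphisms. Requiring absence in all males removes sequence shared by Z and W or by autosomes.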
Subject(s)
Fishes , Genome , Animals , Base Sequence , Female , Fishes/genetics , Genomics , High-Throughput Nucleotide Sequencing
ABSTRACT
BACKGROUND: Copy number variations (CNVs) are an important type of structural variation in the genome that usually affect gene expression levels through a gene dosage effect. Understanding CNVs as part of genome evolution may provide insights into the genetic basis of important agricultural traits and contribute to crop breeding in the future. Available methods that detect CNVs with next-generation sequencing technology have helped shed light on the prevalence and effects of CNVs. However, the complexity of crop genomes poses a major challenge and requires the development of additional tools. RESULTS: Here, we generated genomic and transcriptomic data of 93 rice (Oryza sativa L.) accessions and developed a comprehensive pipeline to call CNVs in this large-scale dataset. We analyzed the correlation between CNVs and gene expression levels and found that approximately 13% of the identified genes showed a significant correlation between their expression levels and copy numbers. Further analysis showed that about 36% of duplicate pairs were involved in pseudogenization events, while only 5% of them showed functional differentiation. Moreover, the offspring copy mainly contributed to the expression levels and seemed more likely to become a pseudogene, whereas the parent copy tended to maintain the function of the ancestral gene. CONCLUSION: We provide a high-accuracy CNV dataset that will contribute to functional genomics studies and molecular breeding in rice. We also showed that the gene dosage effect of CNVs in rice is neither exponential nor linear. Our work demonstrates that the evolution of duplicated genes is asymmetric in both expression levels and gene fates, shedding new light on the evolution of duplicated genes.
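Testing whether expression tracks copy number amounts to correlating the two across accessions, gene by gene. Below is a minimal Python sketch using Pearson's r. The fixed `min_r` cutoff is a hypothetical stand-in for the significance test the authors applied. The input layout, one value per accession in the same order in both dictionaries, is also assumed.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; NaN when either input is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # no copy-number (or expression) variation: undefined
    return sxy / math.sqrt(sxx * syy)


def dosage_responsive_genes(copy_numbers: dict[str, list[float]],
                            expression: dict[str, list[float]],
                            min_r: float = 0.5) -> dict[str, float]:
    """Genes whose expression rises with copy number across accessions (r >= min_r)."""
    hits = {}
    for gene, cn in copy_numbers.items():
        r = pearson(cn, expression[gene])
        if not math.isnan(r) and r >= min_r:
            hits[gene] = r
    return hits
```

A correlation test only detects monotonic, roughly linear dosage responses. Characterizing the shape of the response, which the abstract reports is neither exponential nor linear, requires fitting models beyond r.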
Subject(s)
DNA Copy Number Variations , Evolution, Molecular , Gene Duplication , Genes, Plant , Oryza/genetics , Genome, Plant , Transcriptome
ABSTRACT
Large genomes with elevated mutation rates are prone to accumulating deleterious mutations more rapidly than natural selection can purge them (Muller's ratchet), which may lead to the extinction of small populations. Relative to most unicellular organisms, cancer cells, with large, nonrecombining genomes and high mutation rates, could be particularly susceptible to such "mutational meltdown." However, the most common type of mutation in organismal evolution, namely, deleterious mutation, has received relatively little attention in the cancer biology literature. Here, by monitoring single-cell clones from HeLa cell lines, we characterize deleterious mutations that retard the rate of cell proliferation. The main mutation events are copy number variations (CNVs), which, estimated from fitness data, occur at a rate of 0.29 events per cell division on average. The mean fitness reduction, estimated at 18% per mutation, is very high. HeLa cell populations therefore carry a very substantial genetic load, and at this level a natural population would likely face mutational meltdown. We suspect that HeLa cell populations may avoid extinction only after the population size becomes large enough. Because CNVs are common in most cell lines and tumor tissues, the observations hint at a vulnerability of cancer cells that could be exploited by therapeutic strategies.
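The two estimates above can be turned into a rough sense of scale. For illustration only, assume that CNVs arrive as a Poisson process with mean U = 0.29 per division and that each multiplies fitness by (1 - s), with s = 0.18. The expected relative fitness of a lineage after t divisions without selection is then E[(1 - s)^K] = exp(-U t s) for K ~ Poisson(U t). The classical Haldane-Muller result applies to an asexual population at mutation-selection balance and gives an equilibrium load of 1 - exp(-U), independent of s. Constant effect sizes and multiplicative fitness are simplifying assumptions of this sketch, not results of the study.

```python
import math


def expected_fitness_no_selection(u: float, s: float, divisions: int) -> float:
    """Mean relative fitness after `divisions` divisions with no selection.

    Assumes the number of mutations K ~ Poisson(u * divisions) and that each
    multiplies fitness by (1 - s), so E[(1 - s) ** K] = exp(-u * divisions * s).
    """
    return math.exp(-u * divisions * s)


def equilibrium_load(u: float) -> float:
    """Haldane-Muller load of an asexual population at mutation-selection balance."""
    return 1.0 - math.exp(-u)


U, S = 0.29, 0.18  # per-division CNV rate and mean fitness cost from the abstract
for t in (1, 10, 50):
    print(t, round(expected_fitness_no_selection(U, S, t), 3))
print("equilibrium load:", round(equilibrium_load(U), 3))
```

With these values, expected fitness falls to about 0.59 after 10 divisions without selection, and the equilibrium load is about 0.25.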
Subject(s)
Cell Proliferation/genetics , DNA Copy Number Variations , Genetic Load , HeLa Cells/physiology , Mutation Accumulation , Humans , Models, Biological , Mutation , PC-3 Cells
ABSTRACT
We present novoBreak, a genome-wide local assembly algorithm that discovers somatic and germline structural variation breakpoints in whole-genome sequencing data. novoBreak consistently outperformed existing algorithms on real cancer genome data and on synthetic tumors in the ICGC-TCGA DREAM 8.5 Somatic Mutation Calling Challenge primarily because it more effectively utilized reads spanning breakpoints. novoBreak also demonstrated great sensitivity in identifying short insertions and deletions.
Subject(s)
High-Throughput Nucleotide Sequencing/methods , Mutation/genetics , Neoplasms/genetics , Sequence Analysis, DNA/methods , Algorithms , Chromosome Breakpoints , Computational Biology , Genome, Human , Humans , Neoplasms/pathology , Software , Tumor Cells, Cultured
ABSTRACT
BACKGROUND: Buffalo milk is considered a highly nutritious food owing to its higher fatty acid (FA) content and rich nutrient profile. The higher fat content of buffalo milk makes it suitable for processing into various healthy and nutritious products. Moreover, buffalo milk contains more unsaturated FAs (UFA), such as oleic and linolenic acid, which are important for human health owing to their desirable physiological effects. However, limited information is available about the chemical composition and mechanism of FA synthesis in buffalo milk. In this study, we hypothesized that expression of the SCD1 gene could alter FA biosynthesis in mammary gland epithelial cells and subsequently affect the FA content of buffalo milk. We investigated the transcriptional and biological role of Stearoyl-CoA Desaturase 1 (SCD1) in buffalo mammary epithelial cells (BMECs) during FA and triacylglycerol (TAG) synthesis. RESULTS: Unsaturated fatty acid contents were much higher in buffalo milk than in Holstein cow milk. A significant increase in the expression levels of the FAS, ACACA, SREBP1, PPARG, GPAT, and AGPAT genes was observed in response to altered expression of SCD1 in buffalo milk. Moreover, changes in SCD1 expression in BMECs also mediated the expression of genes related to FA biosynthesis, subsequently altering FA composition. Overexpression of SCD1 significantly increased the expression of genes associated with FA and TAG synthesis, leading to higher FA and unsaturated FA contents in BMECs, whereas down-regulation of SCD1 had the opposite effect. CONCLUSION: Our study provides mechanistic insights into the transcriptional regulation of SCD1 to alter FA and TAG synthesis by directly or indirectly mediating biosynthesis and metabolic pathways in BMECs. We provide preliminary findings regarding the engineering of FA content in buffalo milk through SCD1 signaling.
Subject(s)
Fatty Acids/biosynthesis , Stearoyl-CoA Desaturase/genetics , Transcription, Genetic , Animals , Buffaloes/genetics , Cattle , Female , Gene Expression Regulation/genetics , Humans , Lactation/genetics , Mammary Glands, Animal/metabolism , Milk/enzymology
ABSTRACT
BACKGROUND: The advent of third-generation sequencing (TGS) technologies opens the door to improved genome assembly. Long reads are promising for enhancing the quality of fragmented draft assemblies constructed from next-generation sequencing (NGS) data. To date, a few algorithms capable of improving draft assemblies have been released, including SSPACE-LongRead, OPERA-LG, SMIS, npScarf, DBG2OLC, Unicycler, and LINKS. Hybrid assembly of large genomes remains challenging, however. RESULTS: We developed a scalable and computationally efficient scaffolder, Long Reads Scaffolder (LRScaf, https://github.com/shingocat/lrscaf), that is capable of significantly boosting assembly contiguity using long reads. In this study, we present a comprehensive performance assessment of state-of-the-art scaffolders and LRScaf on seven organisms, i.e., E. coli, S. cerevisiae, A. thaliana, O. sativa, S. pennellii, Z. mays, and H. sapiens. LRScaf significantly improves the contiguity of draft assemblies. For example, it increases the NGA50 value of CHM1 from 127.1 kbp to 9.4 Mbp using a 20-fold coverage PacBio dataset, and that of NA12878 from 115.3 kbp to 12.9 Mbp using a 35-fold coverage Nanopore dataset. In addition, LRScaf achieves the best NGA50 on A. thaliana, S. pennellii, Z. mays, and H. sapiens. Moreover, LRScaf has the shortest run time among the scaffolders compared, and its peak RAM remains practical for large genomes (e.g., 20.3 and 62.6 GB on CHM1 and NA12878, respectively). CONCLUSIONS: The new algorithm, LRScaf, yields the best or, at least, moderate scaffold contiguity and accuracy in the shortest run time compared with other scaffolding algorithms. Furthermore, LRScaf provides a cost-effective way to improve the contiguity of draft assemblies of large genomes.
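Long-read scaffolders generally start from the same evidence. A long read that aligns to two contigs in succession implies that those contigs are adjacent, with a definite order and relative orientation. The Python sketch below counts such links and keeps those supported by a minimum number of observations. It is a generic illustration, not LRScaf's algorithm. The input format (contig hits listed in read order, with strands) and `min_support` are assumptions. A real scaffolder must also resolve conflicting links and repeats and estimate gap sizes.

```python
from collections import Counter

Hit = tuple[str, str]  # (contig_id, strand), strand is "+" or "-"


def flip(strand: str) -> str:
    return "-" if strand == "+" else "+"


def canonical_link(left: Hit, right: Hit) -> tuple[Hit, Hit]:
    """A read from the opposite strand sees the same join reversed with flipped
    strands; pick one representation so both views are counted together."""
    reverse_view = ((right[0], flip(right[1])), (left[0], flip(left[1])))
    return min((left, right), reverse_view)


def contig_links(reads: list[list[Hit]], min_support: int = 2) -> dict[tuple[Hit, Hit], int]:
    """Count adjacent contig pairs along each read and keep links observed
    at least min_support times."""
    support: Counter = Counter()
    for hits in reads:
        for left, right in zip(hits, hits[1:]):
            if left[0] == right[0]:
                continue  # consecutive hits to the same contig do not join anything
            support[canonical_link(left, right)] += 1
    return {link: n for link, n in support.items() if n >= min_support}
```

For example, a read giving c1(+) then c2(+) and another giving c2(-) then c1(-) describe the same join, so they are merged into one link with support 2.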
Subject(s)
Algorithms , Computational Biology/methods , Genome/genetics , Genomics/methods , Benchmarking , High-Throughput Nucleotide Sequencing , Nanopore Sequencing , Sequence Analysis, DNA
ABSTRACT
Applications that use Bacterial Artificial Chromosome (BAC) libraries often require paired-end sequences and knowledge of the physical location of each clone in plates. To obtain this information in a high-throughput manner, we generated pBACode vectors: a pool of BAC cloning vectors, each with a pair of random barcodes flanking its cloning site. In a pBACode BAC library, the BAC ends and their linked barcodes can be sequenced in bulk. Barcode pairs are determined by sequencing the empty pBACode vectors, which allows BAC ends to be paired according to their barcodes. For physical clone mapping, the barcodes are used as unique markers for their linked genomic sequence. After multi-dimensional pooling of BAC clones, the barcodes are sequenced and deconvoluted to locate each clone. We generated a pBACode library of 94,464 clones for the flounder Paralichthys olivaceus and obtained paired-end sequences from 95.4% of the clones. Incorporating BAC paired-ends into the genome preassembly improved its contiguity by over 10-fold. Furthermore, we were able to use the barcodes to map the physical location of each clone in just 50 pools, with up to 11,808 clones per pool. Our physical clone mapping located 90.2% of BAC clones, enabling targeted characterization of chromosomal rearrangements.
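Deconvolution of multi-dimensional pools rests on the fact that each clone appears in exactly one pool per dimension. The set of pools in which its barcode is sequenced therefore spells out its coordinates. The Python sketch below assumes a hypothetical three-dimensional design with plate, row and column pools; the abstract does not specify how the 50 pools were arranged. Barcodes that are missing from a dimension, or seen in more than one pool of the same dimension, are reported as unresolved. That is one way clones can remain unmapped.

```python
DIMENSIONS = ("plate", "row", "col")  # hypothetical pooling axes


def locate_clones(detections: dict[str, set[tuple[str, object]]]):
    """Decode clone coordinates from the pools in which each barcode was sequenced.

    detections maps barcode -> {(dimension, pool_index), ...}.
    Returns (located, unresolved); located maps barcode -> (plate, row, col).
    """
    located: dict[str, tuple] = {}
    unresolved: list[str] = []
    for barcode, pools in detections.items():
        by_dim: dict[str, set] = {}
        for dim, index in pools:
            by_dim.setdefault(dim, set()).add(index)
        # A clone is placed only if it was seen in exactly one pool of every dimension.
        if all(len(by_dim.get(dim, ())) == 1 for dim in DIMENSIONS):
            located[barcode] = tuple(next(iter(by_dim[dim])) for dim in DIMENSIONS)
        else:
            unresolved.append(barcode)
    return located, unresolved
```

Because each barcode is unique, many clones can share a pool without ambiguity. This is what allows thousands of clones per pool, in contrast with classic pooling schemes that rely on shared genomic markers.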
Subject(s)
Chromosomes, Artificial, Bacterial , Cloning, Molecular , High-Throughput Nucleotide Sequencing/methods , Physical Chromosome Mapping/methods , Sequence Analysis, DNA/methods , Animals , Flounder/genetics , Gene Library , Genome , Saccharomyces cerevisiae/genetics
ABSTRACT
BACKGROUND: NGS (next-generation sequencing) has been widely used in studies of biological processes, ranging from microbial evolution to cancer genomics. However, the error rate of NGS (0.1% to 1%) remains a great challenge for comprehensively investigating low-frequency variations, and current solutions suffer from severe amplification bias or low efficiency. RESULTS: We developed Droplet-CirSeq for relatively efficient, low-bias and ultra-sensitive identification of variations by combining millions of picoliter uniform-sized droplets with Cir-seq. Droplet-CirSeq has an extremely low error rate of 3 to 5 × 10^-6. To systematically evaluate the amplification uniformity and mutation identification capability of Droplet-CirSeq, we used mixtures of two E. coli strains to simulate mutations at different frequencies. Compared with Cir-seq, the coefficient of variation of read depth for Droplet-CirSeq was 10 times lower (p = 2.6 × 10^-3). The identified allele frequencies were also more tightly concentrated around the true frequencies of the mixtures (p = 4.8 × 10^-3). Together, these results show significantly reduced amplification bias and improved accuracy in allele frequency determination. Additionally, with a 3 pg DNA input, Droplet-CirSeq detected 2.5 times as many genuine SNPs (p < 0.001). It also achieved a 2.8 times lower false positive rate (p < 0.05) and a 1.5 times lower false negative rate (p < 0.001). Intriguingly, the false positive sites predominantly represented two types of base substitution (G>A and C>T). Our findings indicated that a 30 pg DNA input distributed across 5 to 10 million droplets resulted in maximal detection of authentic mutations compared with 3 pg (p = 1.2 × 10^-8) and 300 pg inputs (p = 2.2 × 10^-3).
CONCLUSIONS: We developed a method, Droplet-CirSeq, that significantly reduces amplification bias and clearly outperforms currently prevalent methods in detecting ultra-low-frequency mutations. Droplet-CirSeq is promising for identifying low-frequency mutations from extremely low input DNA, such as DNA of uncultured microorganisms, captured target-region DNA, and circulating plasma DNA. Its concept of rolling circle amplification in droplets could also be applied in other low-input DNA amplification fields.
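Two quantities in the Droplet-CirSeq abstract are easy to make concrete. The coefficient of variation (CV) of read depth, the standard deviation divided by the mean, summarizes amplification uniformity. Cir-seq-style error suppression comes from a consensus over tandem copies of the same circularized template produced by rolling circle amplification. The Python sketch below computes both. It assumes the tandem copies are already delimited and free of indels. The minimum of three copies and the strict-majority rule are illustrative choices, not the published pipeline's parameters.

```python
import statistics
from collections import Counter


def coefficient_of_variation(depths: list[float]) -> float:
    """Population CV of per-position read depth; lower means more uniform amplification."""
    mean = statistics.fmean(depths)
    if mean == 0:
        raise ValueError("mean depth is zero")
    return statistics.pstdev(depths) / mean


def tandem_consensus(read: str, unit_length: int, min_copies: int = 3):
    """Majority-vote consensus over tandem copies of a rolling-circle read.

    Assumes copies are aligned end to end without indels; a trailing partial
    copy is ignored. Returns None when there are too few copies.
    """
    copies = [read[i:i + unit_length]
              for i in range(0, len(read) - unit_length + 1, unit_length)]
    if len(copies) < min_copies:
        return None
    consensus = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        # Require a strict majority; otherwise mark the position as unresolved.
        consensus.append(base if 2 * count > len(column) else "N")
    return "".join(consensus)
```

A random sequencing error in one copy, such as the T in "ACGTACTTACGT", is outvoted by the other copies and yields "ACGT". A true variant, by contrast, is present in every copy of its template and survives the vote.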
Subject(s)
DNA Mutational Analysis/methods , DNA, Circular/genetics , Nucleic Acid Amplification Techniques/methods , DNA, Bacterial/genetics , Escherichia coli , Gene Frequency , Polymorphism, Single Nucleotide
ABSTRACT
Using next-generation sequencing technology alone, we have successfully generated and assembled a draft sequence of the giant panda genome. The assembled contigs (2.25 gigabases (Gb)) cover approximately 94% of the whole genome, and the remaining gaps (0.05 Gb) seem to contain carnivore-specific repeats and tandem repeats. Comparisons with the dog and human showed that the panda genome has a lower divergence rate. The assessment of panda genes potentially underlying some of its unique traits indicated that its bamboo diet might be more dependent on its gut microbiome than its own genetic composition. We also identified more than 2.7 million heterozygous single nucleotide polymorphisms in the diploid genome. Our data and analyses provide a foundation for promoting mammalian genetic research, and demonstrate the feasibility of using next-generation sequencing technologies for accurate, cost-effective and rapid de novo assembly of large eukaryotic genomes.