1.
PLoS Comput Biol ; 15(4): e1006445, 2019 Apr.
Article in English | MEDLINE | ID: mdl-31002665

ABSTRACT

Genetic spaces are often described in terms of fitness landscapes or genotype-to-phenotype maps, where each genetic sequence is associated with phenotypic properties and linked to other genotypes that are a single mutational step away. The positions close to a genotype make up its "mutational landscape" and, in aggregate, determine the short-term evolutionary potential of a population. Populations with wider ranges of phenotypes in their mutational neighborhood are known to be more evolvable. Likewise, those with fewer phenotypic changes available in their local neighborhoods are more mutationally robust. Here, we examine whether forces that change the distribution of phenotypes available by mutation profoundly alter subsequent evolutionary dynamics. We compare evolved populations of digital organisms that were subject to either static or cyclically-changing environments. For each of these, we examine the diversity of the phenotypes that are produced through mutations in order to characterize the local genotype-phenotype map. We demonstrate that environmental change can push populations toward more evolvable mutational landscapes where many alternate phenotypes are available, though purely deleterious mutations remain suppressed. Further, we show that populations in environments with harsh changes switch phenotypes more readily than those in environments with more benign changes. We trace this effect to repeated population bottlenecks in the harsh environments, which result in shorter coalescence times and keep populations in regions of the mutational landscape where the phenotypic shifts in question are more likely to occur. Typically, static environments select solely for immediate optimization, at the expense of long-term evolvability. In contrast, we show that with changing environments, short-term pressures to deal with immediate challenges can align with long-term pressures to explore a more productive portion of the mutational landscape.


Subject(s)
Biological Variation, Population , Gene-Environment Interaction , Models, Genetic , Computational Biology , Computer Simulation , Environment , Evolution, Molecular , Genetic Fitness , Genetics, Population , Mutation , Phylogeny , Software
2.
Proc Natl Acad Sci U S A ; 109(33): 13272-7, 2012 Aug 14.
Article in English | MEDLINE | ID: mdl-22847406

ABSTRACT

Deep sequencing has enabled the investigation of a wide range of environmental microbial ecosystems, but the high memory requirements for de novo assembly of short-read shotgun sequencing data from these complex populations are an increasingly large practical barrier. Here we introduce a memory-efficient graph representation with which we can analyze the k-mer connectivity of metagenomic samples. The graph representation is based on a probabilistic data structure, a Bloom filter, that allows us to efficiently store assembly graphs in as little as 4 bits per k-mer, albeit inexactly. We show that this data structure accurately represents DNA assembly graphs in low memory. We apply this data structure to the problem of partitioning assembly graphs into components as a prelude to assembly, and show that this reduces the overall memory requirements for de novo assembly of metagenomes. On one soil metagenome assembly, this approach achieves a nearly 40-fold decrease in the maximum memory requirements for assembly. This probabilistic graph representation is a significant theoretical advance in storing assembly graphs and also yields immediate leverage on metagenomic assembly.
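
As a rough illustration of the idea in this abstract, the following Python sketch stores k-mers in a Bloom filter and recovers graph edges implicitly, by asking the filter which of the eight possible overlapping k-mers are present. The names `KmerBloomFilter`, `kmers`, and `neighbors`, the BLAKE2-based hashing, and the filter dimensions are all assumptions made for this example; it is not the authors' implementation, and it ignores reverse complements for brevity.

```python
import hashlib


class KmerBloomFilter:
    """Minimal Bloom filter over k-mer strings (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, kmer):
        # Derive one bit position per hash function by salting BLAKE2b.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, kmer):
        for pos in self._positions(kmer):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, kmer):
        # True for every added k-mer; may also be True for some k-mers
        # that were never added (false positives), never the reverse.
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(kmer)
        )


def kmers(seq, k):
    """Yield every length-k substring of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def neighbors(bf, kmer):
    """Return k-mers in the filter that overlap kmer by k-1 bases."""
    found = []
    for base in "ACGT":
        for cand in (kmer[1:] + base, base + kmer[:-1]):
            if cand != kmer and cand in bf and cand not in found:
                found.append(cand)
    return found
```

Because edges are never stored, memory depends only on the size of the bit array. The cost is the inexactness the abstract mentions: a false-positive membership answer shows up as a spurious node or edge in the graph.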


Subject(s)
Computational Biology , Genome, Bacterial/genetics , Metagenome/genetics , Sequence Analysis, DNA/methods , Base Pairing/genetics , Chromosomes, Bacterial/genetics , DNA, Circular/genetics , Escherichia coli/genetics , Information Theory , Nonlinear Dynamics , Soil Microbiology
3.
PLoS One ; 9(7): e101271, 2014.
Article in English | MEDLINE | ID: mdl-25062443

ABSTRACT

K-mer abundance analysis is widely used for many purposes in nucleotide sequence analysis, including data preprocessing for de novo assembly, repeat detection, and sequencing coverage estimation. We present the khmer software package for fast and memory-efficient online counting of k-mers in sequencing data sets. Unlike previous methods based on data structures such as hash tables, suffix arrays, and trie structures, khmer relies entirely on a simple probabilistic data structure, a Count-Min Sketch. The Count-Min Sketch permits online updating and retrieval of k-mer counts in memory, which is necessary to support online k-mer analysis algorithms. On sparse data sets this data structure is considerably more memory efficient than any exact data structure. In exchange, the use of a Count-Min Sketch introduces a systematic overcount for k-mers; moreover, only the counts, and not the k-mers, are stored. Here we analyze the speed, the memory usage, and the miscount rate of khmer for generating k-mer frequency distributions and retrieving k-mer counts for individual k-mers. We also compare the performance of khmer to several other k-mer counting packages, including Tallymer, Jellyfish, BFCounter, DSK, KMC, Turtle and KAnalyze. Finally, we examine the effectiveness of profiling sequencing error, k-mer abundance trimming, and digital normalization of reads in the context of high khmer false positive rates. khmer is implemented in C++ wrapped in a Python interface, offers a tested and robust API, and is freely available under the BSD license at github.com/ged-lab/khmer.
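
To make the overcount property concrete, here is a minimal Count-Min Sketch for k-mers in plain Python. It is a sketch under stated assumptions, not khmer's API: the class `KmerCountMinSketch`, the helper `count_kmers`, the salted BLAKE2b hashing, and the table dimensions are all hypothetical choices for illustration.

```python
import hashlib


class KmerCountMinSketch:
    """Minimal Count-Min Sketch over k-mer strings (illustrative only)."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        # One row of counters per hash function.
        self.tables = [[0] * width for _ in range(depth)]

    def _columns(self, kmer):
        # One (row, column) pair per row, using a row-specific salt.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=row.to_bytes(8, "little")
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, kmer):
        for row, col in self._columns(kmer):
            self.tables[row][col] += 1

    def count(self, kmer):
        # Collisions can only inflate a counter, so the minimum across
        # rows is an upper bound on the true count, never an undercount.
        return min(self.tables[row][col] for row, col in self._columns(kmer))


def count_kmers(sketch, seq, k):
    """Add every length-k substring of seq to the sketch."""
    for i in range(len(seq) - k + 1):
        sketch.add(seq[i:i + k])
```

Only the counters are kept, so the k-mers themselves cannot be listed back out, which matches the trade-off described in the abstract. Updates and queries touch one counter per row, so counts can be read at any point while reads are still streaming in.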


Subject(s)
Computational Biology , Nucleotides , Sequence Analysis, DNA , Software , Algorithms , Humans