ABSTRACT
BACKGROUND: Pairwise comparison of time series data for both local and time-lagged relationships is a computationally challenging problem relevant to many fields of inquiry. The Local Similarity Analysis (LSA) statistic identifies the existence of local and lagged relationships, but determining significance through a p-value has been algorithmically cumbersome because it relies on an intensive permutation test that shuffles rows and columns and repeatedly recalculates the statistic. Furthermore, this p-value is calculated under the assumption of normality, a statistical luxury dissociated from most real-world datasets. RESULTS: To improve the performance of LSA on large datasets, an asymptotic upper bound on the p-value calculation was derived without the assumption of normality. This change in the bound calculation reduced the computational complexity from O(pm²n) to O(m²n), where p is the number of permutations in a permutation test, m is the number of time series, and n is the length of each time series. The bounding process is implemented as a computationally efficient software package, FASTLSA, written in C and optimized for threading on multi-core computers, which improves its practical computation time. We computationally compare our approach to previous implementations of LSA, demonstrate broad applicability by analyzing time series data from public health, microbial ecology, and social media, and visualize the resulting networks using the Cytoscape software. CONCLUSIONS: The FASTLSA software package expands the boundaries of LSA, allowing analysis of datasets with millions of co-varying time series. Mapping metadata onto force-directed graphs derived from FASTLSA allows investigators to view correlated cliques and explore previously unrecognized network relationships. The software is freely available for download at: http://www.cmde.science.ubc.ca/hallam/fastLSA/.
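To make the cost of the permutation test concrete, the following is a minimal Python sketch of a local similarity score and a permutation-based p-value for one pair of series. It is not the FASTLSA implementation and does not reproduce the asymptotic bound described above. It assumes the commonly described dynamic-programming form of the LS score, assumes the inputs are already normalized, and uses hypothetical names (`local_similarity`, `permutation_p_value`, `max_lag`, `permutations`, `seed`) chosen only for illustration.

```python
import random


def local_similarity(x, y, max_lag):
    """Local similarity score of two equal-length, pre-normalized series.

    Assumption: the score is the largest positive or negative running sum
    of products x[i] * y[j] along any diagonal with |j - i| <= max_lag,
    divided by the series length.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("series must have the same length")
    best = 0.0
    # Each offset d = j - i is one diagonal of the dynamic-programming table.
    for d in range(-max_lag, max_lag + 1):
        pos = 0.0  # running sum for positively associated segments
        neg = 0.0  # running sum for negatively associated segments
        for i in range(max(0, -d), min(n, n - d)):
            prod = x[i] * y[i + d]
            pos = max(0.0, pos + prod)
            neg = max(0.0, neg - prod)
            best = max(best, pos, neg)
    return best / n


def permutation_p_value(x, y, max_lag, permutations, seed=0):
    """Fraction of shuffles of x whose score reaches the observed score.

    Every permutation recomputes the score, which is the factor p in the
    O(pm^2n) cost when this is repeated over all pairs of m series.
    """
    rng = random.Random(seed)
    observed = local_similarity(x, y, max_lag)
    shuffled = list(x)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if local_similarity(shuffled, y, max_lag) >= observed:
            hits += 1
    return hits / permutations
```

Calling `permutation_p_value(x, y, max_lag=3, permutations=1000)` on one pair runs 1000 score evaluations. Repeating that for every pair among m series is the step that a closed-form bound on the p-value avoids.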
Subjects
Software, Algorithms, Computational Biology, Female, Humans, Internet, Intestines/microbiology, Male, Metagenome, Mouth/microbiology, Saccharomyces cerevisiae/genetics, Skin/microbiology, User-Computer Interface
ABSTRACT
Microbial communities drive biogeochemical cycles through networks of metabolite exchange that are structured along energetic gradients. As energy yields become limiting, these networks favor co-metabolic interactions to maximize energy disequilibria. Here we apply single-cell genomics, metagenomics, and metatranscriptomics to study bacterial populations of the abundant "microbial dark matter" phylum Marinimicrobia along defined energy gradients. We show that evolutionary diversification of major Marinimicrobia clades appears to be closely related to energy yields, with increased co-metabolic interactions in more deeply branching clades. Several of these clades appear to participate in the biogeochemical cycling of sulfur and nitrogen, filling previously unassigned niches in the ocean. Notably, two Marinimicrobia clades, occupying different energetic niches, express nitrous oxide reductase, potentially acting as a global sink for the greenhouse gas nitrous oxide.