ABSTRACT
In the study of stochastic systems, the committor function describes the probability that a system starting from an initial configuration x will reach a set B before a set A. This paper introduces an efficient and interpretable algorithm for approximating the committor, called the "fast committor machine" (FCM). The FCM uses simulated trajectory data to build a kernel-based model of the committor. The kernel function is constructed to emphasize low-dimensional subspaces that optimally describe the A-to-B transitions. The coefficients in the kernel model are determined using randomized linear algebra, leading to a runtime that scales linearly with the number of data points. In numerical experiments involving a triple-well potential and alanine dipeptide, the FCM yields higher accuracy and trains more quickly than a neural network with the same number of parameters. The FCM is also more interpretable than the neural network.
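To make the committor definition concrete, the sketch below computes it exactly for a toy one-dimensional random walk. This is not the FCM and uses neither kernels nor trajectory data; the function name `committor_1d`, the hop probabilities, and the chain length are illustrative assumptions. State 0 stands in for set A and the last state for set B.

```python
def committor_1d(p_right, tol=1e-12, max_sweeps=100_000):
    """Committor q[i] = P(reach the last state before state 0 | start at i).

    Toy model (an assumption for illustration): a walker on states
    0..n-1 hops i -> i+1 with probability p_right[i] and i -> i-1
    otherwise. State 0 plays the role of A, state n-1 the role of B.
    """
    n = len(p_right)
    q = [0.0] * n
    q[-1] = 1.0  # boundary conditions: q = 0 on A, q = 1 on B
    for _ in range(max_sweeps):
        change = 0.0
        # Gauss-Seidel sweep over interior states: the committor is the
        # average of its neighbours' values, weighted by hop probability.
        for i in range(1, n - 1):
            new = p_right[i] * q[i + 1] + (1.0 - p_right[i]) * q[i - 1]
            change = max(change, abs(new - q[i]))
            q[i] = new
        if change < tol:
            break
    return q


symmetric = committor_1d([0.5] * 11)  # unbiased walk on states 0..10
biased = committor_1d([0.6] * 11)     # walk biased toward B
print([round(q, 3) for q in symmetric])
print([round(q, 3) for q in biased])
```

For the unbiased walk the committor grows linearly from A to B, while a bias toward B raises it at every interior state. In high-dimensional systems no such direct solve is available, which is the setting the abstract addresses.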
ABSTRACT
Probability currents are fundamental in characterizing the kinetics of nonequilibrium processes. Notably, the steady-state current Jss for a source-sink system can provide the exact mean first-passage time (MFPT) for the transition from the source to the sink. Because transient nonequilibrium behavior is quantified in some modern path sampling approaches, such as the "weighted ensemble" strategy, there is strong motivation to determine bounds on Jss (and hence on the MFPT) as the system evolves in time. Here, we show that Jss is bounded from above and below by the maximum and minimum, respectively, of the current as a function of the spatial coordinate at any time t for one-dimensional systems undergoing overdamped Langevin (i.e., Smoluchowski) dynamics and for higher-dimensional Smoluchowski systems satisfying certain assumptions when projected onto a single dimension. These bounds become tighter with time, making them of potential practical utility in a scheme for estimating Jss and the long-time-scale kinetics of complex systems. Conceptually, the bounds result from the fact that extrema of the transient currents relax toward the steady-state current.
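The link between the steady-state current and the MFPT can be checked numerically on a discrete-time analog. The sketch below is an illustration under stated assumptions, not the Smoluchowski setting of the paper: a lazy random walk on a short chain with a reflecting source at state 0 and a sink just past the last state, whose arrivals are recycled to the source. All function names and parameter values are hypothetical. It compares 1/Jss with the MFPT obtained from a direct first-passage recursion.

```python
def step(prob, p_right, p_left):
    """Advance the recycled chain one step; return (new_prob, flux_into_sink)."""
    n = len(prob)
    new = [0.0] * n
    p_stay = 1.0 - p_right - p_left
    flux = prob[-1] * p_right          # probability entering the sink this step
    for i, w in enumerate(prob):
        new[i] += w * p_stay
        if i > 0:
            new[i - 1] += w * p_left
        else:
            new[i] += w * p_left       # reflecting wall at the source
        if i < n - 1:
            new[i + 1] += w * p_right
    new[0] += flux                     # recycle sink arrivals to the source
    return new, flux


def steady_state_flux(n, p_right, p_left, steps=20_000):
    """Relax from a delta function at the source and return the final flux."""
    prob = [1.0] + [0.0] * (n - 1)
    flux = 0.0
    for _ in range(steps):
        prob, flux = step(prob, p_right, p_left)
    return flux


def mfpt_from_source(n, p_right, p_left):
    """MFPT (in steps) from state 0 to the sink via the first-passage recursion.

    With d_i = m_i - m_{i+1}, the equations for the mean times m_i reduce to
    p_right * d_i = 1 + p_left * d_{i-1}, starting from d_0 = 1 / p_right.
    """
    d, total = 0.0, 0.0
    for _ in range(n):
        d = (1.0 + p_left * d) / p_right
        total += d
    return total


flux = steady_state_flux(5, 0.25, 0.25)
mfpt = mfpt_from_source(5, 0.25, 0.25)
print(1.0 / flux, mfpt)
```

The two printed numbers agree because, at steady state, each arrival at the sink starts a fresh source-to-sink passage, so the arrival rate is the reciprocal of the mean passage time. Before the steady state is reached, the flux into the sink differs from Jss, which is why bounds on Jss from transient currents are of interest.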
ABSTRACT
Analysis of microbiome data involves identifying co-occurring groups of taxa associated with sample features of interest (e.g., disease state). Elucidating such relations is often difficult because microbiome data are compositional, sparse, and high-dimensional. Also, the configuration of co-occurring taxa may represent overlapping subcommunities that contribute to sample characteristics such as host status. Preserving the configuration of co-occurring microbes rather than detecting specific indicator species is more likely to facilitate biologically meaningful interpretations. Additionally, analyses that use taxonomic relative abundances to predict the abundances of different gene functions aggregate predicted functional profiles across taxa. This precludes straightforward identification of predicted functional components associated with subsets of co-occurring taxa. We provide an approach to explore co-occurring taxa using "topics" generated via a topic model and link these topics to specific sample features (e.g., disease state). Rather than inferring predicted functional content based on overall taxonomic relative abundances, we instead focus on inference of functional content within topics, which we parse by estimating interactions between topics and pathways through a multilevel, fully Bayesian regression model. We apply our methods to three publicly available 16S amplicon sequencing datasets: an inflammatory bowel disease dataset, an oral cancer dataset, and a time-series dataset. Using our topic model approach to uncover latent structure in 16S rRNA amplicon surveys, investigators can (1) capture groups of co-occurring taxa termed topics; (2) uncover within-topic functional potential; (3) link taxa co-occurrence, gene function, and environmental/host features; and (4) explore the way in which sets of co-occurring taxa behave and evolve over time.
These methods have been implemented in a freely available R package: https://cran.r-project.org/package=themetagenomics, https://github.com/EESI/themetagenomics.