1.
PLoS Comput Biol ; 19(11): e1011655, 2023 Nov.
Article in English | MEDLINE | ID: mdl-38011273

ABSTRACT

Generative models of protein sequence families are an important tool in the repertoire of protein scientists and engineers alike. However, state-of-the-art generative approaches face inference, accuracy, and overfitting-related obstacles when modeling moderately sized to large proteins and/or protein families with low sequence coverage. Here, we present a simple-to-learn, tunable, and accurate generative model, GENERALIST: GENERAtive nonLInear tenSor-factorizaTion for protein sequences. GENERALIST accurately captures several high-order summary statistics of amino acid covariation. GENERALIST also predicts conservative, locally optimal sequences that are likely to fold into stable 3D structures. Importantly, unlike current methods, the density of sequences in GENERALIST-modeled sequence ensembles closely resembles that of the corresponding natural ensembles. Finally, GENERALIST embeds protein sequences in an informative latent space. GENERALIST will be an important tool to study protein sequence variability.


Subjects
Amino Acids , Proteins , Proteins/chemistry , Amino Acid Sequence
2.
Nat Methods ; 16(8): 731-736, 2019 08.
Article in English | MEDLINE | ID: mdl-31308552

ABSTRACT

Metagenomic sequencing has enabled detailed investigation of diverse microbial communities, but understanding their spatiotemporal variability remains an important challenge. Here, we present decomposition of variance using replicate sampling (DIVERS), a method based on replicate sampling and spike-in sequencing. The method quantifies the contributions of temporal dynamics, spatial sampling variability, and technical noise to the variances and covariances of absolute bacterial abundances. We applied DIVERS to investigate a high-resolution time series of the human gut microbiome and a spatial survey of a soil bacterial community in Manhattan's Central Park. Our analysis showed that in the gut, technical noise dominated the abundance variability for nearly half of the detected taxa. DIVERS also revealed substantial spatial heterogeneity of gut microbiota, and high temporal covariances of taxa within the Bacteroidetes phylum. In the soil community, spatial variability primarily contributed to abundance fluctuations at short time scales (weeks), while temporal variability dominated at longer time scales (several months).


Subjects
Algorithms , Bacteria/genetics , Feces/microbiology , Gastrointestinal Microbiome , Metagenomics/methods , Soil Microbiology , Spatio-Temporal Analysis , Bacteria/classification , Humans , RNA, Ribosomal, 16S , Sequence Analysis, DNA , Specimen Handling
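
The replicate logic of such a variance decomposition can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DIVERS implementation: each measured abundance is modeled as an additive sum of a temporal term shared by all samples taken at a time point, a spatial term shared only by technical replicates of the same sample, and independent technical noise. Spike-in normalization to absolute abundance is not modeled, and the function names and simulated variances are hypothetical.

```python
import random

def cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

def decompose(x_a, x_a_rep, x_b):
    """Split the variance of x_a into temporal, spatial and technical parts.

    x_a     : abundance in spatial sample A at each time point
    x_a_rep : technical replicate of sample A (same sample, sequenced again)
    x_b     : abundance in a second spatial sample B at the same time points
    """
    total = cov(x_a, x_a)
    temporal = cov(x_a, x_b)      # A and B share only the temporal term
    shared_a = cov(x_a, x_a_rep)  # A and A' share temporal + spatial terms
    return {
        "temporal": temporal,
        "spatial": shared_a - temporal,
        "technical": total - shared_a,
    }

# Simulated (hypothetical) series with known variance components.
rng = random.Random(0)
T = 20000
x_a, x_a_rep, x_b = [], [], []
for _ in range(T):
    temporal = rng.gauss(0, 1.0)           # temporal variance 1.0
    spatial_a = rng.gauss(0, 0.5 ** 0.5)   # spatial variance 0.5
    spatial_b = rng.gauss(0, 0.5 ** 0.5)
    x_a.append(temporal + spatial_a + rng.gauss(0, 0.5))      # technical variance 0.25
    x_a_rep.append(temporal + spatial_a + rng.gauss(0, 0.5))
    x_b.append(temporal + spatial_b + rng.gauss(0, 0.5))

parts = decompose(x_a, x_a_rep, x_b)
print({k: round(v, 2) for k, v in parts.items()})
```

Because the three terms are assumed independent, the covariance between different spatial samples isolates the temporal part, and the covariance between technical replicates adds the spatial part; what remains of the total variance is technical noise. Analogous cross-covariances between pairs of taxa would decompose covariances in the same way.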
3.
PLoS Comput Biol ; 17(8): e1009275, 2021 08.
Article in English | MEDLINE | ID: mdl-34358223

ABSTRACT

In modern computational biology, there is great interest in building probabilistic models to describe collections of a large number of co-varying binary variables. However, current approaches to build generative models rely on modelers' identification of constraints and are computationally expensive to infer when the number of variables is large (N~100). Here, we address both these issues with Super-statistical Generative Model for binary Data (SiGMoiD). SiGMoiD is a maximum entropy-based framework where we imagine the data as arising from a super-statistical system; individual binary variables in a given sample are coupled to the same 'bath' whose intensive variables vary from sample to sample. Importantly, unlike standard maximum entropy approaches where the modeler specifies the constraints, the SiGMoiD algorithm infers them directly from the data. Due to this optimal choice of constraints, SiGMoiD allows us to model collections of a very large number (N>1000) of binary variables. Finally, SiGMoiD offers a reduced dimensional description of the data, allowing us to identify clusters of similar data points as well as binary variables. We illustrate the versatility of SiGMoiD using multiple datasets spanning several time- and length-scales.


Subjects
Computational Biology/methods , Models, Statistical , Algorithms , Entropy
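
How a shared, sample-specific 'bath' makes otherwise independent binary variables co-vary can be illustrated with a toy model. This sketch is an assumption, not the SiGMoiD algorithm: two binary variables are conditionally independent given a scalar bath variable λ, each is switched on with logistic probability 1/(1 + e^(-a_i λ)), and λ takes a few hypothetical values. The covariance is computed exactly by summing over bath states; the function names and couplings are illustrative.

```python
import math

def logistic(u):
    return 1.0 / (1.0 + math.exp(-u))

def bath_induced_covariance(couplings, bath_states, bath_weights):
    """Exact Cov(s1, s2) for two binary variables independent given the bath.

    couplings    : (a1, a2), how strongly each variable responds to the bath
    bath_states  : possible values of the intensive bath variable lambda
    bath_weights : probability of each bath state (should sum to 1)
    """
    a1, a2 = couplings
    e1 = e2 = e12 = 0.0
    for lam, w in zip(bath_states, bath_weights):
        p1 = logistic(a1 * lam)  # P(s1 = 1 | lambda)
        p2 = logistic(a2 * lam)  # P(s2 = 1 | lambda)
        e1 += w * p1
        e2 += w * p2
        e12 += w * p1 * p2       # conditional independence given lambda
    return e12 - e1 * e2

# Same-sign couplings give positive covariance, opposite signs negative.
print(bath_induced_covariance((1.0, 1.0), [-2.0, 2.0], [0.5, 0.5]))
print(bath_induced_covariance((1.0, -1.0), [-2.0, 2.0], [0.5, 0.5]))
# A single, fixed bath state leaves the variables uncorrelated.
print(bath_induced_covariance((1.0, 1.0), [2.0], [1.0]))
```

All covariation here comes from the bath varying between samples; with one fixed bath state the covariance vanishes.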
4.
Annu Rev Phys Chem ; 71: 213-238, 2020 04 20.
Article in English | MEDLINE | ID: mdl-32075515

ABSTRACT

Ever since Clausius in 1865 and Boltzmann in 1877, the concepts of entropy and of its maximization have been the foundations for predicting how material equilibria derive from microscopic properties. But, despite much work, there has been no equally satisfactory general variational principle for nonequilibrium situations. However, in 1980, a new avenue was opened by E.T. Jaynes and by Shore and Johnson. We review here maximum caliber, which is a maximum-entropy-like principle that can infer distributions of flows over pathways, given dynamical constraints. This approach is providing new insights, particularly into few-particle complex systems, such as gene circuits, protein conformational reaction coordinates, network traffic, bird flocking, cell motility, and neuronal firing.


Subjects
DNA/chemistry , Gene Regulatory Networks , Models, Theoretical , Proteins/chemistry , DNA/genetics , Entropy , Kinetics , Models, Chemical , Models, Genetic , Molecular Dynamics Simulation , Nucleic Acid Conformation , Protein Conformation , Proteins/genetics
5.
Neural Comput ; 31(5): 980-997, 2019 05.
Article in English | MEDLINE | ID: mdl-30883279

ABSTRACT

Stochastic kernel-based dimensionality-reduction approaches have become popular in the past decade. The central component of many of these methods is a symmetric kernel that quantifies the vicinity between pairs of data points and a kernel-induced Markov chain on the data. Typically, the Markov chain is fully specified by the kernel through row normalization. However, in many cases, it is desirable to impose user-specified stationary-state and dynamical constraints on the Markov chain. Unfortunately, no systematic framework exists to impose such user-defined constraints. Here, based on our previous work on inference of Markov models, we introduce a path entropy maximization based approach to derive the transition probabilities of Markov chains using a kernel and additional user-specified constraints. We illustrate the usefulness of these Markov chains with examples.
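
A sketch of how a stationary-state constraint can reshape a kernel-induced chain. It assumes the only constraints are a prescribed stationary distribution π, detailed balance, and a kernel-weighted average cost; under these assumptions path entropy maximization yields transition probabilities of the form p_ab = λ_a K_ab λ_b / π_a, with multipliers λ fixed by normalization. Whether this matches the paper's exact formulation is an assumption, and the function name, Gaussian kernel, and data points are hypothetical.

```python
import math

def constrained_markov_chain(kernel, pi, tol=1e-12, max_iter=10000):
    """Transition matrix p_ab = lam_a * K_ab * lam_b / pi_a with stationary distribution pi.

    kernel : symmetric matrix of positive similarities K_ab (list of lists)
    pi     : target stationary distribution (positive entries, sums to 1)
    """
    n = len(pi)
    lam = [1.0] * n
    for _ in range(max_iter):
        k_lam = [sum(kernel[a][b] * lam[b] for b in range(n)) for a in range(n)]
        # Damped (geometric-mean) update toward lam_a * (K lam)_a = pi_a,
        # which makes every row of p sum to 1.
        new_lam = [math.sqrt(lam[a] * pi[a] / k_lam[a]) for a in range(n)]
        converged = max(abs(u - v) for u, v in zip(new_lam, lam)) < tol
        lam = new_lam
        if converged:
            break
    return [[lam[a] * kernel[a][b] * lam[b] / pi[a] for b in range(n)] for a in range(n)]

# Hypothetical 1-D data points with a Gaussian kernel.
points = [0.0, 0.5, 1.0, 2.0]
eps = 0.5
kernel = [[math.exp(-(x - y) ** 2 / eps) for y in points] for x in points]
pi = [0.1, 0.2, 0.3, 0.4]

P = constrained_markov_chain(kernel, pi)
row_sums = [sum(row) for row in P]
stationary = [sum(pi[a] * P[a][b] for a in range(len(pi))) for b in range(len(pi))]
print([round(s, 6) for s in row_sums])    # should all be 1
print([round(s, 6) for s in stationary])  # should reproduce pi
```

Because λ_a K_ab λ_b is symmetric, π_a p_ab = π_b p_ba holds by construction, so π is stationary once the rows are normalized. The damped update is just one convenient solver for the scaling equations λ_a (Kλ)_a = π_a.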

6.
J Chem Phys ; 150(5): 054105, 2019 Feb 07.
Article in English | MEDLINE | ID: mdl-30736685

ABSTRACT

Markov State Models (MSMs) describe the rates and routes in conformational dynamics of biomolecules. Computational estimation of MSMs can be expensive because molecular simulations are slow to find and sample the rare transient events. We describe here an efficient approximate way to determine MSM rate matrices by combining maximum caliber (maximizing path entropies) with optimal transport theory (minimizing some path cost function, as when routing trucks on transportation networks) to patch together transient dynamical information from multiple non-equilibrium simulations. We give toy examples.

7.
J Chem Phys ; 148(1): 010901, 2018 Jan 07.
Article in English | MEDLINE | ID: mdl-29306272

ABSTRACT

We review here Maximum Caliber (Max Cal), a general variational principle for inferring distributions of paths in dynamical processes and networks. Max Cal is to dynamical trajectories what the principle of maximum entropy is to equilibrium states or stationary populations. In Max Cal, you maximize a path entropy over all possible pathways, subject to dynamical constraints, in order to predict relative path weights. Many well-known relationships of non-equilibrium statistical physics, such as the Green-Kubo fluctuation-dissipation relations, Onsager's reciprocal relations, and Prigogine's minimum entropy production, are limited to near-equilibrium processes. Max Cal is more general. While it can readily derive these results under those limits, Max Cal is also applicable far from equilibrium. We give examples of Max Cal as a method of inference about trajectory distributions from limited data, finding reaction coordinates in biomolecular simulations, and modeling the complex dynamics of non-thermal systems such as gene regulatory networks or the collective firing of neurons. We also survey its basis in principle and some limitations.
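
A minimal worked example of the principle, under assumptions chosen here for illustration: a two-state system observed over T discrete steps, with a free initial state and the mean number of state switches ⟨N⟩ as the only dynamical constraint. Maximizing the path entropy over all paths Γ gives exponential path weights, and the path partition function factorizes over steps.

```latex
\begin{aligned}
\max_{\{P_\Gamma\}}\;\mathcal{C} &= -\sum_\Gamma P_\Gamma \ln P_\Gamma
  - \mu\Big(\sum_\Gamma P_\Gamma - 1\Big)
  - \gamma\Big(\sum_\Gamma P_\Gamma N_\Gamma - \langle N\rangle\Big)\\
\Rightarrow\; P_\Gamma &= \frac{e^{-\gamma N_\Gamma}}{Q},\qquad
Q = \sum_\Gamma e^{-\gamma N_\Gamma} = 2\,\bigl(1 + e^{-\gamma}\bigr)^{T},\\
p_{\mathrm{switch}} &= \frac{e^{-\gamma}}{1 + e^{-\gamma}},\qquad
\langle N\rangle = T\,p_{\mathrm{switch}}.
\end{aligned}
```

Because each step contributes an independent factor (switch with weight e^(-γ), stay with weight 1), the maximum-caliber path distribution in this toy case is a Markov chain whose switching probability is fixed by the measured ⟨N⟩.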

8.
Proc Natl Acad Sci U S A ; 112(29): 9070-5, 2015 Jul 21.
Article in English | MEDLINE | ID: mdl-26153419

ABSTRACT

An approximation to the ∼4-Mbp basic genome shared by 32 strains of Escherichia coli representing six evolutionary groups has been derived and analyzed computationally. A multiple alignment of the 32 complete genome sequences was filtered to remove mobile elements and identify the most reliable ∼90% of the aligned length of each of the resulting 496 basic-genome pairs. Patterns of single base-pair mutations (SNPs) in aligned pairs distinguish clonally inherited regions from regions where either genome has acquired DNA fragments from diverged genomes by homologous recombination since their last common ancestor. Such recombinant transfer is pervasive across the basic genome, mostly between genomes in the same evolutionary group, and generates many unique mosaic patterns. The six least-diverged genome pairs have one or two recombinant transfers of length ∼40-115 kbp (and few if any other transfers), each containing one or more gene clusters known to confer strong selective advantage in some environments. Moderately diverged genome pairs (0.4-1% SNPs) show mosaic patterns of interspersed clonal and recombinant regions of varying lengths throughout the basic genome, whereas more highly diverged pairs within an evolutionary group or pairs between evolutionary groups having >1.3% SNPs have few clonal matches longer than a few kilobase pairs. Many recombinant transfers appear to incorporate fragments of the entering DNA produced by restriction systems of the recipient cell. A simple computational model can closely fit the data. Most recombinant transfers seem likely to be due to generalized transduction by coevolving populations of phages, which could efficiently distribute variability throughout bacterial genomes.


Subjects
Escherichia coli/genetics , Genome, Bacterial , Recombination, Genetic/genetics , Transformation, Genetic , Bacteriophages/genetics , Base Pairing/genetics , Biological Evolution , Clone Cells , Escherichia coli/virology , Genetic Vectors , Models, Genetic , Molecular Sequence Annotation , Mosaicism , Phylogeny , Polymorphism, Single Nucleotide/genetics , Restriction Mapping , Transduction, Genetic
9.
J Chem Phys ; 147(16): 164901, 2017 Oct 28.
Article in English | MEDLINE | ID: mdl-29096517

ABSTRACT

Quantifying the statistics of occupancy of solvent molecules in the vicinity of solutes is central to our understanding of solvation phenomena. Number fluctuations in small solvation shells around solutes cannot be described within the macroscopic grand canonical framework using a single chemical potential that represents the solvent bath. In this communication, we hypothesize that molecular-sized observation volumes such as solvation shells are best described by coupling the solvation shell with a mixture of particle baths each with its own chemical potential. We confirm our hypotheses by studying the enhanced fluctuations in the occupancy statistics of hard sphere solvent particles around a distinguished hard sphere solute particle. Connections with established theories of solvation are also discussed.

10.
Proc Natl Acad Sci U S A ; 110(51): 20380-5, 2013 Dec 17.
Article in English | MEDLINE | ID: mdl-24297895

ABSTRACT

Probability distributions having power-law tails are observed in a broad range of social, economic, and biological systems. We describe here a potentially useful common framework. We derive distribution functions for situations in which a "joiner particle" k pays some form of price to enter a community of size k − 1, where costs are subject to economies of scale. Maximizing the Boltzmann-Gibbs-Shannon entropy subject to this energy-like constraint predicts a distribution having a power-law tail; it reduces to the Boltzmann distribution in the absence of economies of scale. We show that the predicted function gives excellent fits to 13 different distribution functions, ranging from friendship links in social networks, to protein-protein interactions, to the severity of terrorist attacks. This approach may give useful insights into when to expect power-law distributions in the natural and social sciences.
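
The generic maximum-entropy step behind this argument can be written out explicitly. The two cost functions below are assumptions for illustration: a cost linear in k (every joiner pays the same price, i.e., no economies of scale) and a logarithmic cost standing in for strong economies of scale; the abstract does not state the paper's specific cost function.

```latex
\begin{aligned}
&\max_{\{p_k\}}\; -\sum_k p_k \ln p_k
  \quad\text{s.t.}\quad \sum_k p_k = 1,\qquad \sum_k p_k E_k = \langle E\rangle
  \;\;\Rightarrow\;\; p_k = \frac{e^{-\beta E_k}}{Z},\quad Z = \sum_k e^{-\beta E_k},\\
&E_k = c\,k \;\Rightarrow\; p_k \propto e^{-\beta c k}\quad\text{(Boltzmann, exponential tail)},\\
&E_k = \ln k \;\Rightarrow\; p_k \propto k^{-\beta}\quad\text{(power-law tail)}.
\end{aligned}
```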

11.
Phys Chem Chem Phys ; 17(19): 13000-5, 2015 May 21.
Article in English | MEDLINE | ID: mdl-25912565

ABSTRACT

The Gibbs and the Boltzmann definition of temperature agree only in the macroscopic limit. The ambiguity in identifying the equilibrium temperature of a finite-sized 'small' system exchanging energy with a bath is usually understood as a limitation of conventional statistical mechanics. We interpret this ambiguity as resulting from a stochastically fluctuating temperature coupled with the phase space variables giving rise to a broad temperature distribution. With this ansatz, we develop the equilibrium statistics and dynamics of small systems. Numerical evidence using an analytically tractable model shows that the effects of temperature fluctuations can be detected in the equilibrium and dynamical properties of the phase space of the small system. Our theory generalizes statistical mechanics to small systems relevant in biophysics and nanotechnology.


Subjects
Models, Theoretical , Temperature , Entropy , Quantum Theory , Stochastic Processes
12.
J Chem Phys ; 143(5): 051104, 2015 Aug 07.
Article in English | MEDLINE | ID: mdl-26254635

ABSTRACT

There has been interest in finding a general variational principle for non-equilibrium statistical mechanics. We give evidence that Maximum Caliber (Max Cal) is such a principle. Max Cal, a variant of maximum entropy, predicts dynamical distribution functions by maximizing a path entropy subject to dynamical constraints, such as average fluxes. We first show that Max Cal leads to standard near-equilibrium results (including the Green-Kubo relations, Onsager's reciprocal relations of coupled flows, and Prigogine's principle of minimum entropy production) in a way that is particularly simple. We develop some generalizations of the Onsager and Prigogine results that apply arbitrarily far from equilibrium. Because Max Cal does not require any notion of "local equilibrium," or any notion of entropy dissipation, or temperature, or even any restriction to material physics, it is more general than many traditional approaches. It is also applicable to flows and traffic on networks, for example.


Subjects
Entropy , Models, Theoretical , Probability
13.
PLoS Comput Biol ; 9(4): e1003023, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23592969

ABSTRACT

In addition to their biological function, protein complexes reduce the exposure of the constituent proteins to the risk of undesired oligomerization by reducing the concentration of the free monomeric state. We interpret this reduced risk as a stabilization of the functional state of the protein. We estimate that protein-protein interactions can account for ~2-4 k(B)T of additional stabilization, a substantial contribution to intrinsic stability. We hypothesize that proteins in the interaction network act as evolutionary capacitors that allow their binding partners to explore regions of the sequence space that correspond to less stable proteins. In the interaction network of baker's yeast, we find that, statistically, proteins that receive higher energetic benefits from the interaction network are more likely to misfold. A simplified fitness landscape, in which the fitness of an organism is inversely proportional to the total concentration of unfolded proteins, provides an evolutionary justification for the proposed trends. We conclude by outlining clear biophysical experiments to test our predictions.


Subjects
Computational Biology/methods , Evolution, Molecular , Fungal Proteins/chemistry , Protein Interaction Mapping/methods , Protein Interaction Maps , Proteins/chemistry , Cytoplasm/chemistry , HSP90 Heat-Shock Proteins/chemistry , Protein Binding , Protein Conformation , Protein Denaturation , Protein Folding , Saccharomyces cerevisiae/chemistry , Thermodynamics
14.
Biophys J ; 104(12): 2743-50, 2013 Jun 18.
Article in English | MEDLINE | ID: mdl-23790383

ABSTRACT

We present a maximum entropy framework to separate intrinsic and extrinsic contributions to noisy gene expression solely from the profile of expression. We express the experimentally accessible probability distribution of the copy number of the gene product (mRNA or protein) by accounting for possible variations in extrinsic factors. The distribution of extrinsic factors is estimated using the maximum entropy principle. Our results show that extrinsic factors qualitatively and quantitatively affect the probability distribution of the gene product. We work out, in detail, the transcription of mRNA from a constitutively expressed promoter in Escherichia coli. We suggest that the variation in extrinsic factors may account for the observed wider-than-Poisson distribution of mRNA copy numbers. We successfully test our framework on a numerical simulation of a simple gene expression scheme that accounts for the variation in extrinsic factors. We also make falsifiable predictions, some of which are tested on previous experiments in E. coli whereas others need verification. Application of the presented framework to more complex situations is also discussed.


Subjects
Entropy , Models, Biological , Transcription, Genetic , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Promoter Regions, Genetic , RNA, Bacterial/genetics , RNA, Bacterial/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism
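
Why variation in extrinsic factors widens the copy-number distribution can be checked numerically. In this sketch (an assumption, not the paper's maximum-entropy estimate of the extrinsic distribution), the mean copy number λ takes a few hypothetical values with given probabilities, the copy number is Poisson given λ, and the Fano factor of the resulting mixture is compared with the law-of-total-variance prediction 1 + Var(λ)/E[λ]. The function names and rate values are illustrative.

```python
import math

def poisson_pmf(n, lam):
    # Evaluated in log space to avoid overflow of lam**n and n!.
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def mixture_fano(rates, weights, n_max=400):
    """Fano factor (variance / mean) of a Poisson mixture over extrinsic rates."""
    pmf = [sum(w * poisson_pmf(n, lam) for lam, w in zip(rates, weights))
           for n in range(n_max + 1)]
    mean = sum(n * p for n, p in enumerate(pmf))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(pmf))
    return var / mean

# Hypothetical extrinsic variability: mean mRNA number of 5, 10 or 20.
rates = [5.0, 10.0, 20.0]
weights = [0.25, 0.5, 0.25]

m = sum(w * lam for lam, w in zip(rates, weights))
v = sum(w * (lam - m) ** 2 for lam, w in zip(rates, weights))
print(mixture_fano(rates, weights))  # numerically summed mixture
print(1 + v / m)                     # analytic prediction, wider than Poisson
print(mixture_fano([10.0], [1.0]))   # no extrinsic variation: Fano close to 1
```

Any spread in the extrinsic factor adds Var(λ) to the Poisson variance E[λ], so the mixture is always at least as wide as a Poisson distribution with the same mean.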
15.
J Chem Phys ; 138(18): 184111, 2013 May 14.
Article in English | MEDLINE | ID: mdl-23676033

ABSTRACT

We present a maximum entropy approach to analyze the state space of a small system in contact with a large bath, e.g., a solvated macromolecular system. For the solute, the fluctuations around the mean values of observables are not negligible and the probability distribution P(r) of the state space depends on the intricate details of the interaction of the solute with the solvent. Here, we employ a superstatistical approach: P(r) is expressed as a marginal distribution summed over the variation in β, the inverse temperature of the solute. The joint distribution P(β, r) is estimated by maximizing its entropy. We also calculate the first-order system-size corrections to the canonical ensemble description of the state space. We test the development on a simple harmonic oscillator interacting with two baths with very different chemical identities, viz., (a) Lennard-Jones particles and (b) water molecules. In both cases, our method captures the state space of the oscillator sufficiently well. Future directions and connections with traditional statistical mechanics are discussed.


Subjects
Thermodynamics , Molecular Dynamics Simulation
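
The superstatistical marginalization, P(x) = Σ_β P(β) P(x | β), can be illustrated for a one-dimensional harmonic oscillator with energy E(x) = kx²/2, for which P(x | β) is Gaussian with variance 1/(βk). The discrete distribution over β below is hypothetical (in the paper, P(β, r) is estimated by maximum entropy), and the function names are illustrative. The sketch only shows that averaging over β produces heavier-than-canonical tails, measured by a kurtosis above the Gaussian value of 3.

```python
def oscillator_moments(betas, weights, k=1.0):
    """Second and fourth moments of x for a harmonic oscillator with fluctuating beta.

    Given beta, x is Gaussian with variance s2 = 1 / (beta * k), so
    E[x^2 | beta] = s2 and E[x^4 | beta] = 3 * s2**2. Averaging these over
    P(beta) gives the moments of the marginal distribution P(x).
    """
    m2 = sum(w / (b * k) for b, w in zip(betas, weights))
    m4 = sum(w * 3.0 / (b * k) ** 2 for b, w in zip(betas, weights))
    return m2, m4

def kurtosis(betas, weights, k=1.0):
    m2, m4 = oscillator_moments(betas, weights, k)
    return m4 / m2 ** 2

print(kurtosis([1.0], [1.0]))                        # canonical ensemble: 3.0
print(kurtosis([0.5, 1.0, 2.0], [0.25, 0.5, 0.25]))  # superstatistical: above 3
```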
16.
NPJ Syst Biol Appl ; 9(1): 26, 2023 06 20.
Article in English | MEDLINE | ID: mdl-37339950

ABSTRACT

Dimensionality reduction offers unique insights into high-dimensional microbiome dynamics by leveraging collective abundance fluctuations of multiple bacteria driven by similar ecological perturbations. However, methods providing lower-dimensional representations of microbiome dynamics both at the community and individual taxa levels are not currently available. To that end, we present EMBED: Essential MicroBiomE Dynamics, a probabilistic nonlinear tensor factorization approach. Like normal mode analysis in structural biophysics, EMBED infers ecological normal modes (ECNs), which represent the unique orthogonal modes capturing the collective behavior of microbial communities. Using multiple real and synthetic datasets, we show that a very small number of ECNs can accurately approximate microbiome dynamics. Inferred ECNs reflect specific ecological behaviors, providing natural templates along which the dynamics of individual bacteria may be partitioned. Moreover, the multi-subject treatment in EMBED systematically identifies subject-specific and universal abundance dynamics that are not detected by traditional approaches. Collectively, these results highlight the utility of EMBED as a versatile dimensionality reduction tool for studies of microbiome dynamics.


Subjects
Microbiota , Microbiota/genetics , Bacteria/genetics
17.
Nat Metab ; 4(6): 711-723, 2022 06.
Article in English | MEDLINE | ID: mdl-35739397

ABSTRACT

Production of oxidized biomass, which requires regeneration of the cofactor NAD+, can be a proliferation bottleneck that is influenced by environmental conditions. However, a comprehensive quantitative understanding of metabolic processes that may be affected by NAD+ deficiency is currently missing. Here, we show that de novo lipid biosynthesis can impose a substantial NAD+ consumption cost in proliferating cancer cells. When electron acceptors are limited, environmental lipids become crucial for proliferation because NAD+ is required to generate precursors for fatty acid biosynthesis. We find that both oxidative and even net reductive pathways for lipogenic citrate synthesis are gated by reactions that depend on NAD+ availability. We also show that access to acetate can relieve lipid auxotrophy by bypassing the NAD+ consuming reactions. Gene expression analysis demonstrates that lipid biosynthesis strongly anti-correlates with expression of hypoxia markers across tumor types. Overall, our results define a requirement for oxidative metabolism to support biosynthetic reactions and provide a mechanistic explanation for cancer cell dependence on lipid uptake in electron acceptor-limited conditions, such as hypoxia.


Subjects
NAD , Neoplasms , Cell Proliferation , Electrons , Humans , Hypoxia , Lipids , NAD/metabolism
18.
Biophys J ; 101(6): 1459-66, 2011 Sep 21.
Article in English | MEDLINE | ID: mdl-21943427

ABSTRACT

We express the effective Hamiltonian of an ion-binding site in a protein as a combination of the Hamiltonian of the ion-bound site in vacuum and the restraints of the protein on the site. The protein restraints are described by the quadratic elastic network model. The Hamiltonian of the ion-bound site in vacuum is approximated as a generalized Hessian around the minimum energy configuration. The resultant of the two quadratic Hamiltonians is cast into a pure quadratic form. In the canonical ensemble, the quadratic nature of the resultant Hamiltonian allows us to express analytically the excess free energy, enthalpy, and entropy of ion binding to the protein. The analytical expressions allow us to separate the roles of the dynamic restraints imposed by the protein on the binding site and the temperature-independent chemical effects in metal-ligand coordination. For the consensus zinc-finger peptide, relative to the aqueous phase, the calculated free energies of exchanging Zn(2+) with Fe(2+), Co(2+), Ni(2+), and Cd(2+) are in agreement with experiments. The predicted excess enthalpy of ion exchange between Zn(2+) and Co(2+) also agrees with the available experimental estimate. The free energy of applying the protein restraints reveals that, relative to the Zn(2+)-site cluster, the Co(2+)- and Cd(2+)-site clusters are more destabilized by the protein restraints. This leads to an experimentally testable hypothesis that a tetrahedral metal binding site with minimal protein restraints will be less selective for Zn(2+) over Co(2+) and Cd(2+) compared to a zinc finger peptide. No appreciable change is expected for Fe(2+) and Ni(2+). The framework presented here may prove useful in protein engineering to tune metal selectivity.


Subjects
Metals , Proteins/chemistry , Proteins/metabolism , Zinc Fingers , Binding Sites , Elasticity , Entropy , Ion Exchange , Protein Binding , Quantum Theory
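
For two quadratic Hamiltonians, the combination step and the resulting classical free energy follow from completing the square and a Gaussian integral. The notation is assumed for illustration: x is the n-dimensional vector of site coordinates, A is the generalized Hessian of the ion-bound site about its vacuum minimum x_0 with energy E_0, and B is the elastic-network restraint matrix centred at y_0 (A and B symmetric, A positive definite, B positive semidefinite).

```latex
\begin{aligned}
H(x) &= E_0 + \tfrac12 (x-x_0)^\top A\,(x-x_0) + \tfrac12 (x-y_0)^\top B\,(x-y_0)\\
     &= E^\ast + \tfrac12 (x-x^\ast)^\top (A+B)\,(x-x^\ast),\\
x^\ast &= (A+B)^{-1}(A x_0 + B y_0),\qquad
E^\ast = E_0 + \tfrac12 (x_0-y_0)^\top A\,(A+B)^{-1} B\,(x_0-y_0),\\
F &= -k_BT \ln \int e^{-H(x)/k_BT}\,dx
   = E^\ast + \frac{k_BT}{2}\,\ln\det\!\left(\frac{A+B}{2\pi k_BT}\right),\\
\langle H\rangle &= E^\ast + \frac{n\,k_BT}{2},\qquad S = \frac{\langle H\rangle - F}{T}.
\end{aligned}
```

In this sketch, when the restraints are centred on the vacuum minimum (y_0 = x_0), the free energy of applying them reduces to (k_BT/2) ln[det(A + B)/det A].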
19.
Biophys J ; 100(6): 1542-9, 2011 Mar 16.
Article in English | MEDLINE | ID: mdl-21402037

ABSTRACT

In studying ion-selectivity in biomaterials, it is common to study ion-protein interactions within a local neighborhood around the ion. This local system analysis for the S(2) site of KcsA, its semisynthetic analog, and valinomycin yields the free energy change in exchanging K(+) with Na(+) in quantitative agreement with the value obtained by considering ion-interactions with the entire system. But the energetics of ion binding in the local system and in the entire system differ significantly and lead to different conclusions regarding the physical basis of ion selectivity. For configurations sampled from an all-atom simulation, we show that the selectivity free energy can be decomposed into a contribution arising from interactions of the ion with its local neighborhood, ΔW(local), and a term arising from the field imposed on the ion and the binding site by the rest of the medium, ΔW(ϕ). The local contribution ΔW(local) is numerically close to the actual free energy difference because the field contribution is small. The field contribution is small because of cancellation of inversely related ion-medium and site-medium interactions. Our analysis presents a rigorous foundation for the numerical success of the local system analysis and shows that its implications do not always hold for the entire protein.


Subjects
Molecular Dynamics Simulation , Potassium Channels/chemistry , Potassium Channels/metabolism , Valinomycin/metabolism , Binding Sites , Mutation , Potassium Channels/genetics , Protein Binding , Protein Conformation , Thermodynamics , Valinomycin/chemistry
20.
J Chem Phys ; 135(5): 054505, 2011 Aug 07.
Article in English | MEDLINE | ID: mdl-21823710

ABSTRACT

Thermochemistry of gas-phase ion-water clusters together with estimates of the hydration free energy of the clusters and the water ligands are used to calculate the hydration free energy of the ion. Often the hydration calculations use a continuum model of the solvent. The primitive quasichemical approximation to the quasichemical theory provides a transparent framework to anchor such efforts. Here we evaluate the approximations inherent in the primitive quasichemical approach and elucidate the different roles of the bulk medium. We find that the bulk medium can stabilize configurations of the cluster that are usually not observed in the gas phase, while also simultaneously lowering the excess chemical potential of the ion. This effect is more pronounced for soft ions. Since the coordination number that minimizes the excess chemical potential of the ion is identified as the optimal or most probable coordination number, for such soft ions the optimum cluster size and the hydration thermodynamics obtained with and without account of the bulk medium on the ion-water clustering reaction can be different. The ideas presented in this work are expected to be relevant to experimental studies that translate thermochemistry of ion-water clusters to the thermodynamics of the hydrated ion and to evolving theoretical approaches that combine high-level calculations on clusters with coarse-grained models of the medium.


Subjects
Water/chemistry , Ions/chemistry , Thermodynamics