Results 1 - 15 of 15
1.
Biophys J ; 123(3): 317-333, 2024 Feb 06.
Article in English | MEDLINE | ID: mdl-38158653

ABSTRACT

Helix-coil models are routinely used to interpret circular dichroism data of helical peptides or predict the helicity of naturally occurring and designed polypeptides. However, a helix-coil model contains significantly more information than mean helicity alone, as it defines the entire ensemble (the equilibrium population of every possible helix-coil configuration) for a given sequence. Many desirable quantities of this ensemble are either not obtained as ensemble averages or are not available using standard helicity-averaging calculations. Enumeration of the entire ensemble can allow calculation of a wider set of ensemble properties, but the exponential size of the configuration space typically renders this intractable. We present an algorithm that efficiently approximates the helix-coil ensemble to arbitrary accuracy by sequentially generating a list of the M highest populated configurations in descending order of population. Truncating this list of (configuration, population) pairs at a desired accuracy provides an approximating sub-ensemble. We demonstrate several uses of this approach for providing insight into helix-coil ensembles and folding mechanisms, including landscape visualization.


Subjects
Peptides , Peptides/chemistry , Circular Dichroism
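
The idea of truncating a ranked list of (configuration, population) pairs can be illustrated with a brute-force sketch. This is not the algorithm from the article, which avoids full enumeration; it assumes a simple homopolymer model in which every helical residue carries a weight `s` and every helical segment carries a nucleation weight `sigma`. The function names and parameter values are hypothetical, and the enumeration is only feasible for short chains.

```python
from itertools import product


def config_weight(config, s=1.5, sigma=0.01):
    """Statistical weight of one helix/coil string such as 'cchhhc'.

    Assumed toy model: each helical residue contributes s, and each
    helical segment contributes one nucleation factor sigma.
    """
    weight = 1.0
    prev = "c"
    for state in config:
        if state == "h":
            weight *= s
            if prev == "c":  # start of a new helical segment
                weight *= sigma
        prev = state
    return weight


def truncated_ensemble(n, coverage=0.99, s=1.5, sigma=0.01):
    """Enumerate all 2**n configurations, rank them by population, and
    keep the smallest prefix whose total population reaches `coverage`."""
    configs = ["".join(c) for c in product("ch", repeat=n)]
    weights = [config_weight(c, s, sigma) for c in configs]
    z = sum(weights)  # partition function
    ranked = sorted(
        zip(configs, (w / z for w in weights)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    kept, total = [], 0.0
    for config, population in ranked:
        kept.append((config, population))
        total += population
        if total >= coverage:
            break
    return kept, total


if __name__ == "__main__":
    kept, total = truncated_ensemble(8, coverage=0.9)
    print(len(kept), "configurations cover", round(total, 3))
```

Any ensemble average can then be approximated by summing over `kept` only, with an error bounded by the discarded population `1 - total`.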
2.
PLoS Comput Biol ; 16(10): e1008366, 2020 Oct.
Article in English | MEDLINE | ID: mdl-33104703

ABSTRACT

Substantive changes in gene expression, metabolism, and the proteome are manifested in overall changes in microbial population growth. Quantifying how microbes grow is therefore fundamental to areas such as genetics, bioengineering, and food safety. Traditional parametric growth curve models capture the population growth behavior through a set of summarizing parameters. However, estimation of these parameters from data is confounded by random effects such as experimental variability, batch effects or differences in experimental material. A systematic statistical method to identify and correct for such confounding effects in population growth data is not currently available. Further, our previous work has demonstrated that parametric models are insufficient to explain and predict microbial response under non-standard growth conditions. Here we develop a hierarchical Bayesian non-parametric model of population growth that identifies the latent growth behavior and response to perturbation, while simultaneously correcting for random effects in the data. This model enables more accurate estimates of the biological effect of interest, while better accounting for the uncertainty due to technical variation. Additionally, modeling hierarchical variation provides estimates of the relative impact of various confounding effects on measured population growth.


Subjects
Bacteria/growth & development , Biological Models , Systems Biology/methods , Bacteria/metabolism , Bayes Theorem , Nonparametric Statistics
3.
Mol Biol Evol ; 31(9): 2251-66, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24899668

ABSTRACT

For sequences that are highly divergent, there is often insufficient information to infer accurate alignments, and phylogenetic uncertainty may be high. One way to address this issue is to make use of protein structural information, since structures generally diverge more slowly than sequences. In this work, we extend a recently developed stochastic model of pairwise structural evolution to multiple structures on a tree, analytically integrating over ancestral structures to permit efficient likelihood computations under the resulting joint sequence-structure model. We observe that the inclusion of structural information significantly reduces alignment and topology uncertainty, and reduces the number of topology and alignment errors in cases where the true trees and alignments are known. In some cases, the inclusion of structure results in changes to the consensus topology, indicating that structure may contain additional information beyond that which can be obtained from sequences. We use the model to investigate the order of divergence of cytoglobins, myoglobins, and hemoglobins and observe a stabilization of phylogenetic inference: although a sequence-based inference assigns significant posterior probability to several different topologies, the structural model strongly favors one of these over the others and is more robust to the choice of data set.


Subjects
Bayes Theorem , Computational Biology/methods , Globins/chemistry , Hemoglobins/chemistry , Myoglobin/chemistry , Animals , Cytoglobin , Globins/genetics , Hemoglobins/genetics , Humans , Markov Chains , Molecular Models , Mutation , Myoglobin/genetics , Phylogeny , Protein Conformation , Sequence Alignment , Protein Sequence Analysis
4.
J Am Chem Soc ; 136(3): 822-5, 2014 Jan 22.
Article in English | MEDLINE | ID: mdl-24364358

ABSTRACT

Coupled ligand binding and conformational change plays a central role in biological regulation. Ligands often regulate protein function by modulating conformational dynamics, yet the order in which binding and conformational change occur is often hotly debated. Here we show that the "conformational selection versus induced fit" distinction on which this debate is based is a false dichotomy because the mechanism depends on ligand concentration. Using the binding of pyrophosphate (PPi) to Bacillus subtilis RNase P protein as a model, we show that coupled reactions are best understood as a change in flux between competing pathways with distinct orders of binding and conformational change. The degree of partitioning through each pathway depends strongly on PPi concentration, with ligand binding redistributing the conformational ensemble toward the folded state by both increasing folding rates and decreasing unfolding rates. These results indicate that ligand binding induces marked and varied changes in protein conformational dynamics, and that the order of binding and conformational change is ligand concentration dependent.


Subjects
Diphosphates/metabolism , Protein Folding , Ribonuclease P/chemistry , Ribonuclease P/metabolism , Amino Acid Substitution , Bacillus subtilis/enzymology , Ligands , Molecular Models , Protein Binding , Protein Conformation , Ribonuclease P/genetics
5.
Mol Biol Evol ; 29(11): 3575-87, 2012 Nov.
Article in English | MEDLINE | ID: mdl-22723302

ABSTRACT

We present a stochastic process model for the joint evolution of protein primary and tertiary structure, suitable for use in alignment and estimation of phylogeny. Indels arise from a classic Links model, and mutations follow a standard substitution matrix, whereas backbone atoms diffuse in three-dimensional space according to an Ornstein-Uhlenbeck process. The model allows for simultaneous estimation of evolutionary distances, indel rates, structural drift rates, and alignments, while fully accounting for uncertainty. The inclusion of structural information enables phylogenetic inference on time scales not previously attainable with sequence evolution models. The model also provides a tool for testing evolutionary hypotheses and improving our understanding of protein structural evolution.


Subjects
Molecular Evolution , Genetic Models , Phylogeny , Proteins/chemistry , Proteins/genetics , Amino Acid Sequence , Animals , Computer Simulation , Genetic Variation , Hemoglobins/chemistry , Hemoglobins/genetics , Humans , Phycocyanin/chemistry , Phycocyanin/genetics , Rhodophyta/chemistry , Sequence Alignment , Stochastic Processes
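
For readers unfamiliar with the Ornstein-Uhlenbeck process mentioned in the abstract above, the following is a minimal sketch of simulating one coordinate with the exact transition distribution. It is a generic illustration, not the article's structural model: the parameter names `theta` (mean-reversion rate), `mu` (long-run mean), and `sigma` (noise scale) and their default values are assumptions chosen here.

```python
import math
import random


def ou_step(x, dt, theta, mu, sigma, rng):
    """Advance one coordinate by time dt using the exact OU transition:
    Gaussian with mean mu + (x - mu) * exp(-theta * dt) and variance
    sigma**2 * (1 - exp(-2 * theta * dt)) / (2 * theta)."""
    decay = math.exp(-theta * dt)
    mean = mu + (x - mu) * decay
    std = sigma * math.sqrt((1.0 - decay * decay) / (2.0 * theta))
    return mean + std * rng.gauss(0.0, 1.0)


def ou_path(x0, n_steps, dt, theta=0.5, mu=0.0, sigma=1.0, seed=0):
    """Simulate a path of n_steps transitions starting from x0."""
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        path.append(ou_step(path[-1], dt, theta, mu, sigma, rng))
    return path


if __name__ == "__main__":
    path = ou_path(x0=3.0, n_steps=10, dt=0.5)
    print([round(x, 3) for x in path])
```

Unlike pure Brownian motion, the process has a stationary variance of `sigma**2 / (2 * theta)`, so simulated coordinates stay bounded in distribution over long times.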
6.
bioRxiv ; 2023 Sep 17.
Article in English | MEDLINE | ID: mdl-37745350

ABSTRACT

Helix-coil models are routinely used to interpret CD data of helical peptides or predict the helicity of naturally-occurring and designed polypeptides. However, a helix-coil model contains significantly more information than mean helicity alone, as it defines the entire ensemble - the equilibrium population of every possible helix-coil configuration - for a given sequence. Many desirable quantities of this ensemble are either not obtained as ensemble averages, or are not available using standard helicity-averaging calculations. Enumeration of the entire ensemble can allow calculation of a wider set of ensemble properties, but the exponential size of the configuration space typically renders this intractable. We present an algorithm that efficiently approximates the helix-coil ensemble to arbitrary accuracy, by sequentially generating a list of the M highest populated configurations in descending order of population. Truncating this list of (configuration, population) pairs at a desired accuracy provides an approximating sub-ensemble. We demonstrate several uses of this approach for providing insight into helix-coil ensembles and folding mechanisms, including landscape visualization.

7.
bioRxiv ; 2023 Oct 17.
Article in English | MEDLINE | ID: mdl-37905016

ABSTRACT

A key challenge in B cell lineage-based vaccine design is understanding the inducibility of target neutralizing antibodies. We approach this problem through the use of detailed stochastic modeling of the somatic hypermutation process that occurs during affinity maturation. Under such a model, sequence mutation rates are context-dependent, rendering standard probability calculations for sequence evolution intractable. We develop an algorithmic approach to rapid, accurate approximation of key marginal sequence likelihoods required to inform modern sequential vaccine design strategies. These calculated probabilities are used to define an inducibility index for selecting among potential targets for immunogen design. We apply this approach to the problem of choosing targets for the design of boosting immunogens aimed at elicitation of the HIV broadly-neutralizing antibody DH270min11.

8.
Biophys J ; 95(10): 4497-511, 2008 Nov 15.
Article in English | MEDLINE | ID: mdl-18676654

ABSTRACT

We describe a statistical approach to the validation and improvement of molecular dynamics simulations of macromolecules. We emphasize the use of molecular dynamics simulations to calculate thermodynamic quantities that may be compared to experimental measurements, and the use of a common set of energetic parameters across multiple distinct molecules. We briefly review relevant results from the theory of stochastic processes and discuss the monitoring of convergence to equilibrium, the obtaining of confidence intervals for summary statistics corresponding to measured quantities, and an approach to validation and improvement of simulations based on out-of-sample prediction. We apply these methods to replica exchange molecular dynamics simulations of a set of eight helical peptides under the AMBER potential using implicit solvent. We evaluate the ability of these simulations to quantitatively reproduce experimental helicity measurements obtained by circular dichroism. In addition, we introduce notions of statistical predictive estimation for force-field parameter refinement. We perform a sensitivity analysis to identify key parameters of the potential, and introduce Bayesian updating of these parameters. We demonstrate the effect of parameter updating applied to the internal dielectric constant parameter on the out-of-sample prediction accuracy as measured by cross-validation.


Subjects
Chemical Models , Molecular Models , Proteins/chemistry , Proteins/ultrastructure , Computer Simulation , Statistical Models , Protein Conformation , Protein Folding
9.
J Chem Phys ; 129(16): 164112, 2008 Oct 28.
Article in English | MEDLINE | ID: mdl-19045252

ABSTRACT

We consider the convergence behavior of replica-exchange molecular dynamics (REMD) [Sugita and Okamoto, Chem. Phys. Lett. 314, 141 (1999)] based on properties of the numerical integrators in the underlying isothermal molecular dynamics (MD) simulations. We show that a variety of deterministic algorithms favored by molecular dynamics practitioners for constant-temperature simulation of biomolecules fail either to be measure invariant or irreducible, and are therefore not ergodic. We then show that REMD using these algorithms also fails to be ergodic. As a result, the entire configuration space may not be explored even in an infinitely long simulation, and the simulation may not converge to the desired equilibrium Boltzmann ensemble. Moreover, our analysis shows that for initial configurations with unfavorable energy, it may be impossible for the system to reach a region surrounding the minimum energy configuration. We demonstrate these failures of REMD algorithms for three small systems: a Gaussian distribution (simple harmonic oscillator dynamics), a bimodal mixture of Gaussians distribution, and the alanine dipeptide. Examination of the resulting phase plots and equilibrium configuration densities indicates significant errors in the ensemble generated by REMD simulation. We describe a simple modification to address these failures based on a stochastic hybrid Monte Carlo correction, and prove that this is ergodic.


Subjects
Algorithms , Alanine/chemistry , Dipeptides/chemistry , Markov Chains , Molecular Models , Monte Carlo Method , Protein Conformation , Temperature , Thermodynamics
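
A hybrid Monte Carlo step of the kind the abstract above refers to can be sketched for the simplest of its test systems, a one-dimensional Gaussian target (harmonic oscillator dynamics). This is a generic textbook sampler, not the authors' modified REMD scheme: momentum is redrawn at random before each trajectory, the trajectory is integrated with leapfrog, and a Metropolis test accepts or rejects the end point. The step size, trajectory length, and function names are assumptions made for illustration.

```python
import math
import random


def potential(x):
    """Potential energy of a standard Gaussian target: U(x) = x**2 / 2."""
    return 0.5 * x * x


def grad_potential(x):
    return x


def leapfrog(x, p, step_size, n_steps):
    """Integrate Hamiltonian dynamics with the leapfrog scheme."""
    p -= 0.5 * step_size * grad_potential(x)  # initial half kick
    for step in range(n_steps):
        x += step_size * p  # full drift
        if step < n_steps - 1:
            p -= step_size * grad_potential(x)  # full kick
    p -= 0.5 * step_size * grad_potential(x)  # final half kick
    return x, p


def hmc_step(x, rng, step_size=0.2, n_steps=10):
    """One hybrid Monte Carlo update: fresh momentum, leapfrog, accept/reject."""
    p = rng.gauss(0.0, 1.0)  # stochastic momentum refresh
    h_old = potential(x) + 0.5 * p * p
    x_new, p_new = leapfrog(x, p, step_size, n_steps)
    h_new = potential(x_new) + 0.5 * p_new * p_new
    # Metropolis test corrects the integrator's energy error.
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new
    return x


def sample(n_samples, x0=5.0, burn_in=1000, seed=0):
    """Draw samples, starting from a deliberately unfavorable position x0."""
    rng = random.Random(seed)
    x = x0
    out = []
    for i in range(burn_in + n_samples):
        x = hmc_step(x, rng)
        if i >= burn_in:
            out.append(x)
    return out


if __name__ == "__main__":
    xs = sample(5000)
    mean = sum(xs) / len(xs)
    print("mean", round(mean, 3))
```

The random momentum draw is what distinguishes this from a purely deterministic thermostat: it lets the chain change energy level, while the accept/reject step keeps the target distribution invariant.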
10.
J Comput Biol ; 14(10): 1287-310, 2007 Dec.
Article in English | MEDLINE | ID: mdl-18047425

ABSTRACT

Analysis of biopolymer sequences and structures generally adopts one of two approaches: use of detailed biophysical theoretical models of the system with experimentally determined parameters, or largely empirical statistical models obtained by extracting parameters from large datasets. In this work, we demonstrate a merger of these two approaches using Bayesian statistics. We adopt a common biophysical model for local protein folding and peptide configuration, the helix-coil model. The parameters of this model are estimated by statistical fitting to a large dataset, using prior distributions based on experimental data. L1-norm shrinkage priors are applied to induce sparsity among the estimated parameters, resulting in a significantly simplified model. Formal statistical procedures for evaluating support in the data for previously proposed model extensions are presented. We demonstrate the advantages of this approach, including improved prediction accuracy and quantification of prediction uncertainty, and discuss opportunities for statistical design of experiments. Our approach yields a 39% improvement in mean-squared predictive error over the current best algorithm for this problem. In the process we also provide an efficient recursive algorithm for exact calculation of ensemble helicity including sidechain interactions, and derive an explicit relation between homo- and heteropolymer helix-coil theories and Markov chains and (non-standard) hidden Markov models, respectively, which has not appeared in the literature previously.


Subjects
Statistical Models , Peptides/chemistry , Protein Secondary Structure , Hydrogen-Ion Concentration , Hydrophobic and Hydrophilic Interactions , Molecular Models , Reproducibility of Results , Protein Sequence Analysis , Temperature
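
The link between helix-coil theory and Markov-chain-style recursions mentioned in the abstract above can be illustrated with a forward-backward calculation of per-residue helix probability. This sketch covers only a homopolymer with two assumed parameters, a propagation weight `s` and a nucleation weight `sigma`; it omits the sidechain interactions and sequence dependence handled in the article, and the function names are hypothetical.

```python
def helix_probabilities(n, s, sigma):
    """Return (per-residue helix probabilities, partition function) for a
    homopolymer of length n.

    Assumed toy model: a helical residue following a coil residue (or at
    the chain start) has weight sigma * s, a helical residue following a
    helical residue has weight s, and a coil residue has weight 1.
    """
    # Forward pass: total weight of residues 0..i with residue i in each state.
    fwd_c = [0.0] * n
    fwd_h = [0.0] * n
    fwd_c[0] = 1.0
    fwd_h[0] = sigma * s
    for i in range(1, n):
        fwd_c[i] = fwd_c[i - 1] + fwd_h[i - 1]
        fwd_h[i] = sigma * s * fwd_c[i - 1] + s * fwd_h[i - 1]

    # Backward pass: total weight of residues i+1..n-1 given the state at i.
    bwd_c = [1.0] * n
    bwd_h = [1.0] * n
    for i in range(n - 2, -1, -1):
        bwd_c[i] = bwd_c[i + 1] + sigma * s * bwd_h[i + 1]
        bwd_h[i] = bwd_c[i + 1] + s * bwd_h[i + 1]

    z = fwd_c[-1] + fwd_h[-1]
    probs = [fwd_h[i] * bwd_h[i] / z for i in range(n)]
    return probs, z


def mean_helicity(n, s, sigma):
    """Ensemble-average fraction of helical residues."""
    probs, _ = helix_probabilities(n, s, sigma)
    return sum(probs) / n


if __name__ == "__main__":
    print(round(mean_helicity(20, s=1.5, sigma=0.01), 4))
```

The recursion costs time linear in chain length, in contrast to the 2**n cost of enumerating every configuration.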
11.
mSystems ; 2(5)2017.
Article in English | MEDLINE | ID: mdl-28951888

ABSTRACT

Gene regulatory networks (GRNs) are critical for dynamic transcriptional responses to environmental stress. However, the mechanisms by which GRN regulation adjusts physiology to enable stress survival remain unclear. Here we investigate the functions of transcription factors (TFs) within the global GRN of the stress-tolerant archaeal microorganism Halobacterium salinarum. We measured growth phenotypes of a panel of TF deletion mutants in high temporal resolution under heat shock, oxidative stress, and low-salinity conditions. To quantitate the noncanonical functional forms of the growth trajectories observed for these mutants, we developed a novel modeling framework based on Gaussian process regression and functional analysis of variance (FANOVA). We employ unique statistical tests to determine the significance of differential growth relative to the growth of the control strain. This analysis recapitulated known TF functions, revealed novel functions, and identified surprising secondary functions for characterized TFs. Strikingly, we observed that the majority of the TFs studied were required for growth under multiple stress conditions, pinpointing regulatory connections between the conditions tested. Correlations between quantitative phenotype trajectories of mutants are predictive of TF-TF connections within the GRN. These phenotypes are strongly concordant with predictions from statistical GRN models inferred from gene expression data alone. With genome-wide and targeted data sets, we provide detailed functional validation of novel TFs required for extreme oxidative stress and heat shock survival. Together, results presented in this study suggest that many TFs function under multiple conditions, thereby revealing high interconnectivity within the GRN and identifying the specific TFs required for communication between networks responding to disparate stressors. 
IMPORTANCE To ensure survival in the face of stress, microorganisms employ inducible damage repair pathways regulated by extensive and complex gene networks. Many archaea, microorganisms of the third domain of life, persist under extremes of temperature, salinity, and pH and under other conditions. In order to understand the cause-effect relationships between the dynamic function of the stress network and ultimate physiological consequences, this study characterized the physiological role of nearly one-third of all regulatory proteins known as transcription factors (TFs) in an archaeal organism. Using a unique quantitative phenotyping approach, we discovered functions for many novel TFs and revealed important secondary functions for known TFs. Surprisingly, many TFs are required for resisting multiple stressors, suggesting cross-regulation of stress responses. Through extensive validation experiments, we map the physiological roles of these novel TFs in stress response back to their position in the regulatory network wiring. This study advances understanding of the mechanisms underlying how microorganisms resist extreme stress. Given the generality of the methods employed, we expect that this study will enable future studies on how regulatory networks adjust cellular physiology in a diversity of organisms.

12.
IEEE Trans Pattern Anal Mach Intell ; 37(8): 1688-701, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26353004

ABSTRACT

Tree-like structures are fundamental in nature, and it is often useful to reconstruct the topology of a tree - what connects to what - from a two-dimensional image of it. However, the projected branches often cross in the image: the tree projects to a planar graph, and the inverse problem of reconstructing the topology of the tree from that of the graph is ill-posed. We regularize this problem with a generative, parametric tree-growth model. Under this model, reconstruction is possible in linear time if one knows the direction of each edge in the graph - which edge endpoint is closer to the root of the tree - but becomes NP-hard if the directions are not known. For the latter case, we present a heuristic search algorithm to estimate the most likely topology of a rooted, three-dimensional tree from a single two-dimensional image. Experimental results on retinal vessel, plant root, and synthetic tree data sets show that our methodology is both accurate and efficient.


Subjects
Artificial Intelligence , Three-Dimensional Imaging/methods , Algorithms , Factual Databases , Humans , Lightning , Retinal Vessels/anatomy & histology , Stochastic Processes , Trees
13.
Ann Appl Stat ; 8(4): 2068-2095, 2014.
Article in English | MEDLINE | ID: mdl-26925188

ABSTRACT

The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures; however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.

14.
Ann Appl Stat ; 4(3): 1342-1364, 2010 Sep 01.
Article in English | MEDLINE | ID: mdl-21179394

ABSTRACT

Technological advances in genotyping have given rise to hypothesis-based association studies of increasing scope. As a result, the scientific hypotheses addressed by these studies have become more complex and more difficult to address using existing analytic methodologies. Obstacles to analysis include inference in the face of multiple comparisons, complications arising from correlations among the SNPs (single nucleotide polymorphisms), choice of their genetic parametrization and missing data. In this paper we present an efficient Bayesian model search strategy that searches over the space of genetic markers and their genetic parametrization. The resulting method for Multilevel Inference of SNP Associations, MISA, allows computation of multilevel posterior probabilities and Bayes factors at the global, gene and SNP level, with the prior distribution on SNP inclusion in the model providing an intrinsic multiplicity correction. We use simulated data sets to characterize MISA's statistical power, and show that MISA has higher power to detect association than standard procedures. Using data from the North Carolina Ovarian Cancer Study (NCOCS), MISA identifies variants that were not identified by standard methods and have been externally "validated" in independent studies. We examine sensitivity of the NCOCS results to prior choice and method for imputing missing data. MISA is available in an R package on CRAN.

15.
PLoS One ; 3(11): e3670, 2008.
Article in English | MEDLINE | ID: mdl-18989364

ABSTRACT

Eukaryotic genomes are mostly composed of noncoding DNA whose role is still poorly understood. Studies in several organisms have shown correlations between the length of the intergenic and genic sequences of a gene and the expression of its corresponding mRNA transcript. Some studies have found a positive relationship between intergenic sequence length and expression diversity between tissues, and concluded that genes under greater regulatory control require more regulatory information in their intergenic sequences. Other reports found a negative relationship between expression level and gene length and the interpretation was that there is selection pressure for highly expressed genes to remain small. However, a correlation between gene sequence length and expression diversity, opposite to that observed for intergenic sequences, has also been reported, and to date there is no testable explanation for this observation. To shed light on these varied and sometimes conflicting results, we performed a thorough study of the relationships between sequence length and gene expression using cell-type (tissue) specific microarray data in Arabidopsis thaliana. We measured median gene expression across tissues (expression level), expression variability between tissues (expression pattern uniformity), and expression variability between replicates (expression noise). We found that intergenic (upstream and downstream) and genic (coding and noncoding) sequences have generally opposite relationships with respect to expression, whether it is tissue variability, median, or expression noise. To explain these results we propose a model, in which the lengths of the intergenic and genic sequences have opposite effects on the ability of the transcribed region of the gene to be epigenetically regulated for differential expression. These findings could shed light on the role and influence of noncoding sequences on gene expression.


Subjects
Arabidopsis/genetics , Intergenic DNA/genetics , Plant Gene Expression Regulation , Genetic Epigenesis , Gene Expression Profiling , Genetic Variation , Plant Genome , Messenger RNA/genetics