Results 1 - 20 of 22
1.
BMC Bioinformatics ; 21(1): 119, 2020 Mar 20.
Article in English | MEDLINE | ID: mdl-32197580

ABSTRACT

BACKGROUND: The ability to confidently predict health outcomes from gene expression would catalyze a revolution in molecular diagnostics. Yet, the goal of developing actionable, robust, and reproducible predictive signatures of phenotypes such as clinical outcome has not been attained in almost any disease area. Here, we report a comprehensive analysis spanning prediction tasks ranging from ulcerative colitis, atopic dermatitis, and diabetes to many cancer subtypes, for a total of 24 binary and multiclass prediction problems and 26 survival analysis tasks. We systematically investigate the influence of gene subsets, normalization methods and prediction algorithms. Crucially, we also explore the novel use of deep representation learning methods on large transcriptomics compendia, such as GTEx and TCGA, to boost the performance of state-of-the-art methods. The resources and findings in this work should serve as both an up-to-date reference on attainable performance, and as a benchmarking resource for further research. RESULTS: Approaches that combine large numbers of genes outperformed single gene methods consistently and with a significant margin, but neither unsupervised nor semi-supervised representation learning techniques yielded consistent improvements in out-of-sample performance across datasets. Our findings suggest that using l2-regularized regression methods applied to centered log-ratio transformed transcript abundances provides the best predictive analyses overall. CONCLUSIONS: Transcriptomics-based phenotype prediction benefits from proper normalization techniques and state-of-the-art regularized regression approaches. In our view, breakthrough performance is likely contingent on factors which are independent of normalization and general modeling techniques; these factors might include reduction of systematic errors in sequencing data, incorporation of other data types such as single-cell sequencing and proteomics, and improved use of prior knowledge.


Subjects
Deep Learning; Gene Expression Profiling; Machine Learning; Phenotype; Disease/genetics; Humans; Supervised Machine Learning
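The abstract of entry 1 recommends l2-regularized regression applied to centered log-ratio (CLR) transformed transcript abundances. The following is a minimal sketch of that combination, assuming NumPy. The pseudocount, the penalty strength `alpha`, the synthetic Poisson counts, and the continuous toy phenotype are all illustrative assumptions; none of them comes from the paper, which addresses classification and survival tasks.

```python
import numpy as np

def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform applied to each sample (row).

    The pseudocount is an assumption made here to avoid log(0).
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)

def ridge_fit(X, y, alpha=1.0):
    """Closed-form l2-regularized least squares with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # alpha > 0 keeps the system solvable even though CLR features are collinear
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

rng = np.random.default_rng(0)
counts = rng.poisson(lam=50.0, size=(40, 10))        # hypothetical samples x genes
Z = clr(counts)
y = Z[:, 0] - Z[:, 1] + 0.1 * rng.normal(size=40)    # toy phenotype, not real data
w, b = ridge_fit(Z, y, alpha=1.0)
pred = Z @ w + b
```

Each row of `Z` sums to zero by construction, which is why an unregularized fit on CLR features would be ill-posed and a penalty of some kind is needed.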
2.
Phys Rep ; 810: 1-124, 2019 May 30.
Article in English | MEDLINE | ID: mdl-31404441

ABSTRACT

Machine Learning (ML) is one of the most exciting and dynamic areas of modern research and application. The purpose of this review is to provide an introduction to the core concepts and tools of machine learning in a manner easily understood and intuitive to physicists. The review begins by covering fundamental concepts in ML and modern statistics such as the bias-variance tradeoff, overfitting, regularization, generalization, and gradient descent before moving on to more advanced topics in both supervised and unsupervised learning. Topics covered in the review include ensemble models, deep learning and neural networks, clustering and data visualization, energy-based models (including MaxEnt models and Restricted Boltzmann Machines), and variational methods. Throughout, we emphasize the many natural connections between ML and statistical physics. A notable aspect of the review is the use of Python Jupyter notebooks to introduce modern ML/statistical packages to readers using physics-inspired datasets (the Ising Model and Monte-Carlo simulations of supersymmetric decays of proton-proton collisions). We conclude with an extended outlook discussing possible uses of machine learning for furthering our understanding of the physical world as well as open problems in ML where physicists may be able to contribute.

3.
PLoS Comput Biol ; 13(4): e1005435, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28448493

ABSTRACT

Two species with similar resource requirements respond in a characteristic way to variations in their habitat: their abundances rise and fall in concert. We use this idea to learn how bacterial populations in the microbiota respond to habitat conditions that vary from person to person across the human population. Our mathematical framework shows that habitat fluctuations are sufficient for explaining intra-bodysite correlations in relative species abundances from the Human Microbiome Project. We explicitly show that the relative abundances of closely related species are positively correlated and can be predicted from taxonomic relationships. We identify a small set of functional pathways related to metabolism and maintenance of the cell wall that form the basis of a common resource sharing niche space of the human microbiota.


Subjects
Ecosystem; Microbiota/genetics; Microbiota/physiology; Computational Biology; Humans; Models, Biological; Species Specificity
4.
Proc Natl Acad Sci U S A ; 111(36): 13111-6, 2014 Sep 09.
Article in English | MEDLINE | ID: mdl-25157131

ABSTRACT

An ongoing debate in ecology concerns the impacts of ecological drift and selection on community assembly. Here, we show that there is a transition in diverse ecological communities between a selection-dominated regime (the niche phase) and a drift-dominated regime (the neutral phase). Simulations and analytic arguments show that the niche phase is favored in communities with large population sizes and relatively constant environments, whereas the neutral phase is favored in communities with small population sizes and fluctuating environments. Our results demonstrate how apparently neutral populations may arise even in communities inhabited by species with varying traits.


Subjects
Ecosystem; Models, Biological; Biota; Time Factors
5.
Bioinformatics ; 31(11): 1754-61, 2015 Jun 01.
Article in English | MEDLINE | ID: mdl-25619995

ABSTRACT

MOTIVATION: Feature selection, identifying a subset of variables that are relevant for predicting a response, is an important and challenging component of many methods in statistics and machine learning. Feature selection is especially difficult and computationally intensive when the number of variables approaches or exceeds the number of samples, as is often the case for many genomic datasets. RESULTS: Here, we introduce a new approach, the Bayesian Ising Approximation (BIA), to rapidly calculate posterior probabilities for feature relevance in L2 penalized linear regression. In the regime where the regression problem is strongly regularized by the prior, we show that computing the marginal posterior probabilities for features is equivalent to computing the magnetizations of an Ising model with weak couplings. Using a mean field approximation, we show it is possible to rapidly compute the feature selection path described by the posterior probabilities as a function of the L2 penalty. We present simulations and analytical results illustrating the accuracy of the BIA on some simple regression problems. Finally, we demonstrate the applicability of the BIA to high-dimensional regression by analyzing a gene expression dataset with nearly 30,000 features. These results also highlight the impact of correlations between features on Bayesian feature selection. AVAILABILITY AND IMPLEMENTATION: An implementation of the BIA in C++, along with data for reproducing our gene expression analyses, is freely available at http://physics.bu.edu/~pankajm/BIACode.


Subjects
Gene Expression Profiling/methods; Genomics/methods; Adipose Tissue; Algorithms; Artificial Intelligence; Bayes Theorem; Humans; Linear Models; Male; Probability; Glycine max/genetics
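The abstract of entry 5 states that marginal posterior probabilities for features correspond to the magnetizations of a weakly coupled Ising model, computed with a mean field approximation. As a rough illustration of that last step only, the sketch below iterates the standard naive mean-field equations m_i = tanh(h_i + sum_j J_ij m_j) and maps each magnetization to a probability via p_i = (1 + m_i) / 2, which assumes spins s_i = +1 for "relevant" and -1 for "irrelevant". The fields `h` and couplings `J` are hypothetical numbers; the paper's expressions for them in terms of the data and the L2 penalty are not reproduced here.

```python
import numpy as np

def mean_field_magnetizations(h, J, n_iter=200, damping=0.5):
    """Damped fixed-point iteration of m = tanh(h + J @ m)."""
    h = np.asarray(h, dtype=float)
    J = np.asarray(J, dtype=float)
    m = np.tanh(h)  # start from the uncoupled solution
    for _ in range(n_iter):
        m_new = np.tanh(h + J @ m)
        m = damping * m + (1.0 - damping) * m_new
    return m

# Hypothetical fields and weak symmetric couplings for three features
h = np.array([1.0, -0.5, 0.0])
J = np.array([[0.00, 0.10, 0.00],
              [0.10, 0.00, -0.05],
              [0.00, -0.05, 0.00]])

m = mean_field_magnetizations(h, J)
p = 0.5 * (1.0 + m)  # probability that each spin is +1 (feature relevant)
```

Damping is a design choice made here for numerical stability; with couplings this weak, the undamped iteration would be expected to converge as well.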
6.
Neural Comput ; 27(11): 2411-22, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26378876

ABSTRACT

Identifying small subsets of features that are relevant for prediction and classification tasks is a central problem in machine learning and statistics. The feature selection task is especially important, and computationally difficult, for modern data sets where the number of features can be comparable to or even exceed the number of samples. Here, we show that feature selection with Bayesian inference takes a universal form and reduces to calculating the magnetizations of an Ising model under some mild conditions. Our results exploit the observation that the evidence takes a universal form for strongly regularizing priors, that is, priors that have a large effect on the posterior probability even in the infinite data limit. We derive explicit expressions for feature selection for generalized linear models, a large class of statistical techniques that includes linear and logistic regression. We illustrate the power of our approach by analyzing feature selection in a logistic regression-based classifier trained to distinguish between the letters B and D in the notMNIST data set.

7.
Phys Rev Lett ; 113(14): 148103, 2014 Oct 03.
Article in English | MEDLINE | ID: mdl-25325665

ABSTRACT

The deep connection between thermodynamics, computation, and information is now well established both theoretically and experimentally. Here, we extend these ideas to show that thermodynamics also places fundamental constraints on statistical estimation and learning. To do so, we investigate the constraints placed by (nonequilibrium) thermodynamics on the ability of biochemical signaling networks to estimate the concentration of an external signal. We show that accuracy is limited by energy consumption, suggesting that there are fundamental thermodynamic constraints on statistical inference.


Subjects
Cells/chemistry; Cells/metabolism; Models, Biological; Receptors, Cell Surface/chemistry; Receptors, Cell Surface/metabolism; Biophysics; Signal Transduction; Thermodynamics
8.
Clin Transl Sci ; 17(7): e13897, 2024 Jul.
Article in English | MEDLINE | ID: mdl-39039704

ABSTRACT

Today's approach to medicine requires extensive trial and error to determine the proper treatment path for each patient. While many fields have benefited from technological breakthroughs in computer science, such as artificial intelligence (AI), the task of developing effective treatments is actually getting slower and more costly. With the increased availability of rich historical datasets from previous clinical trials and real-world data sources, one can leverage AI models to create holistic forecasts of future health outcomes for an individual patient in the form of an AI-generated digital twin. This could support the rapid evaluation of intervention strategies in silico and could eventually be implemented in clinical practice to make personalized medicine a reality. In this work, we focus on uses for AI-generated digital twins of clinical trial participants and contend that the regulatory outlook for this technology within drug development makes it an ideal setting for the safe application of AI-generated digital twins in healthcare. With continued research and growing regulatory acceptance, this path will serve to increase trust in this technology and provide momentum for the widespread adoption of AI-generated digital twins in clinical practice.


Subjects
Artificial Intelligence; Clinical Trials as Topic; Precision Medicine; Humans; Artificial Intelligence/trends; Precision Medicine/methods; Drug Development/methods
9.
Biophys J ; 104(7): 1546-55, 2013 Apr 02.
Article in English | MEDLINE | ID: mdl-23561531

ABSTRACT

Quantitative comparisons of intrinsically disordered proteins (IDPs) with similar sequences, such as mutant forms of the same protein, may provide insights into IDP aggregation, a process that plays a role in several neurodegenerative disorders. Here we describe an approach for modeling IDPs with similar sequences that simplifies the comparison of the ensembles by utilizing a single library of structures. The relative population weights of the structures are estimated using a Bayesian formalism, which provides measures of uncertainty in the resulting ensembles. We applied this approach to the comparison of ensembles for Aβ40 and Aβ42. Bayesian hypothesis testing finds that although both Aβ species sample β-rich conformations in solution that may represent prefibrillar intermediates, the probability that Aβ42 samples these prefibrillar states is roughly an order of magnitude larger than the frequency with which Aβ40 samples such structures. Moreover, the structure of the soluble prefibrillar state in our ensembles is similar to the experimentally determined structure of Aβ that has been implicated as an intermediate in the aggregation pathway. Overall, our approach for comparative studies of IDPs with similar sequences provides a platform for future studies on the effect of mutations on the structure and function of disordered proteins.


Subjects
Amyloid beta-Peptides/chemistry; Peptide Fragments/chemistry; Amino Acid Sequence; Models, Molecular; Protein Multimerization; Protein Structure, Secondary; Protein Unfolding
10.
J Am Chem Soc ; 135(10): 3865-72, 2013 Mar 13.
Article in English | MEDLINE | ID: mdl-23398399

ABSTRACT

α-Synuclein, a protein that forms ordered aggregates in the brains of patients with Parkinson's disease, is intrinsically disordered in the monomeric state. Several studies, however, suggest that it can form soluble multimers in vivo that have significant secondary structure content. A number of studies demonstrate that α-synuclein can form β-strand-rich oligomers that are neurotoxic, and recent observations argue for the existence of soluble helical tetrameric species in cellulo that do not form toxic aggregates. To gain further insight into the different types of multimeric states that this protein can adopt, we generated an ensemble for an α-synuclein construct that contains a 10-residue N-terminal extension, which forms multimers when isolated from Escherichia coli. Data from NMR chemical shifts and residual dipolar couplings were used to guide the construction of the ensemble. Our data suggest that the dominant state of this ensemble is a disordered monomer, complemented by a small fraction of helical trimers and tetramers. Interestingly, the ensemble also contains trimeric and tetrameric oligomers that are rich in β-strand content. These data help to reconcile seemingly contradictory observations that indicate the presence of a helical tetramer in cellulo on the one hand, and a disordered monomer on the other. Furthermore, our findings are consistent with the notion that the helical tetrameric state provides a mechanism for storing α-synuclein when the protein concentration is high, thereby preventing non-membrane-bound monomers from aggregating.


Subjects
Thermodynamics; alpha-Synuclein/chemistry; Dimerization; Escherichia coli/chemistry; Models, Molecular; Nuclear Magnetic Resonance, Biomolecular; Protein Conformation
11.
Nature ; 450(7173): 1263-7, 2007 Dec 20.
Article in English | MEDLINE | ID: mdl-18097416

ABSTRACT

RNAs fold into three-dimensional (3D) structures that subsequently undergo large, functionally important, conformational transitions in response to a variety of cellular signals. RNA structures are believed to encode spatially tuned flexibility that can direct transitions along specific conformational pathways. However, this hypothesis has proved difficult to examine directly because atomic movements in complex biomolecules cannot be visualized in 3D by using current experimental methods. Here we report the successful implementation of a strategy using NMR that has allowed us to visualize, with complete 3D rotational sensitivity, the dynamics between two RNA helices that are linked by a functionally important trinucleotide bulge over timescales extending up to milliseconds. The key to our approach is to anchor NMR frames of reference onto each helix and thereby directly measure their dynamics, one relative to the other, using 'relativistic' sets of residual dipolar couplings (RDCs). Using this approach, we uncovered super-large amplitude helix motions that trace out a surprisingly structured and spatially correlated 3D dynamic trajectory. The two helices twist around their individual axes by approximately 53 degrees and 110 degrees in a highly correlated manner (R = 0.97) while simultaneously (R = 0.81-0.92) bending by about 94 degrees. Remarkably, the 3D dynamic trajectory is dotted at various positions by seven distinct ligand-bound conformations of the RNA. Thus even partly unstructured RNAs can undergo structured dynamics that directs ligand-induced transitions along specific predefined conformational pathways.


Subjects
Nucleic Acid Conformation; RNA, Viral/chemistry; RNA, Viral/metabolism; HIV Long Terminal Repeat/genetics; HIV-1/genetics; Models, Molecular; Movement; Nuclear Magnetic Resonance, Biomolecular; RNA, Viral/genetics; Rotation
12.
J Am Chem Soc ; 133(48): 19536-46, 2011 Dec 07.
Article in English | MEDLINE | ID: mdl-22029383

ABSTRACT

Given that α-synuclein has been implicated in the pathogenesis of several neurodegenerative disorders, deciphering the structure of this protein is of particular importance. While monomeric α-synuclein is disordered in solution, it can form aggregates rich in cross-β structure, relatively long helical segments when bound to micelles or lipid vesicles, and a relatively ordered helical tetramer within the native cell environment. To understand the physical basis underlying this structural plasticity, we generated an ensemble for monomeric α-synuclein using a Bayesian formalism that combines data from NMR chemical shifts, RDCs, and SAXS with molecular simulations. An analysis of the resulting ensemble suggests that a non-negligible fraction of the ensemble (0.08, 95% confidence interval 0.03-0.12) places the minimal toxic aggregation-prone segment in α-synuclein, NAC(8-18), in a solvent exposed and extended conformation that can form cross-β structure. Our data also suggest that a sizable fraction of structures in the ensemble (0.14, 95% confidence interval 0.04-0.23) contains long-range contacts between the N- and C-termini. Moreover, a significant fraction of structures that contain these long-range contacts also place the NAC(8-18) segment in a solvent exposed orientation, a finding in contrast to the theory that such long-range contacts help to prevent aggregation. Lastly, our data suggest that α-synuclein samples structures with amphipathic helices that can self-associate via hydrophobic contacts to form tetrameric structures. Overall, these observations represent a comprehensive view of the unfolded ensemble of monomeric α-synuclein and explain how different conformations can arise from the monomeric protein.


Subjects
alpha-Synuclein/chemistry; Bayes Theorem; Circular Dichroism; Humans; Models, Molecular; Nuclear Magnetic Resonance, Biomolecular; Protein Multimerization; Protein Structure, Secondary; Scattering, Small Angle; X-Ray Diffraction
13.
J Am Chem Soc ; 133(26): 10022-5, 2011 Jul 06.
Article in English | MEDLINE | ID: mdl-21650183

ABSTRACT

Thermal fluctuations cause proteins to adopt an ensemble of conformations wherein the relative stability of the different ensemble members is determined by the topography of the underlying energy landscape. "Folded" proteins have relatively homogeneous ensembles, while "unfolded" proteins have heterogeneous ensembles. Hence, the labels "folded" and "unfolded" represent attempts to provide a qualitative characterization of the extent of structural heterogeneity within the underlying ensemble. In this work, we introduce an information-theoretic order parameter to quantify this conformational heterogeneity. We demonstrate that this order parameter can be estimated in a straightforward manner from an ensemble and is applicable to both unfolded and folded proteins. In addition, a simple formula for approximating the order parameter directly from crystallographic B factors is presented. By applying these metrics to a large sample of proteins, we show that proteins span the full range of the order-disorder axis.


Subjects
Computational Biology; Proteins/chemistry; Humans; Molecular Dynamics Simulation; Protein Conformation; Temperature
14.
J Am Chem Soc ; 132(42): 14919-27, 2010 Oct 27.
Article in English | MEDLINE | ID: mdl-20925316

ABSTRACT

The characterization of intrinsically disordered proteins is challenging because accurate models of these systems require a description of both their thermally accessible conformers and the associated relative stabilities or weights. These structures and weights are typically chosen such that calculated ensemble averages agree with some set of prespecified experimental measurements; however, the large number of degrees of freedom in these systems typically leads to multiple conformational ensembles that are degenerate with respect to any given set of experimental observables. In this work we demonstrate that estimates of the relative stabilities of conformers within an ensemble are often incorrect when one does not account for the underlying uncertainty in the estimates themselves. Therefore, we present a method for modeling the conformational properties of disordered proteins that estimates the uncertainty in the weights of each conformer. The Bayesian weighting (BW) formalism incorporates information from both experimental data and theoretical predictions to calculate a probability density over all possible ways of weighting the conformers in the ensemble. This probability density is then used to estimate the values of the weights. A unique and powerful feature of the approach is that it provides a built-in error measure that allows one to assess the accuracy of the ensemble. We validate the approach using reference ensembles constructed from the five-residue peptide met-enkephalin and then apply the BW method to construct an ensemble of the K18 isoform of the tau protein. Using this ensemble, we identify a specific pattern of long-range contacts in K18 that correlates with the known aggregation properties of the sequence.


Subjects
Bayes Theorem; Models, Molecular; Proteins/chemistry; Likelihood Functions
15.
J Phys Chem B ; 113(18): 6173-6, 2009 May 07.
Article in English | MEDLINE | ID: mdl-19358547

ABSTRACT

NMR spectroscopy is one of the most powerful techniques for studying the internal dynamics of biomolecules. Current formalisms approximate the dynamics using simple continuous motional models or models involving discrete jumps between a small number of states. However, no approach currently exists for interpreting NMR data in terms of continuous spatially complex motional paths that may feature more than one distinct maneuver. Here, we present an approach for approximately reconstructing spatially complex continuous motions of chiral domains using NMR anisotropic interactions. The key is to express Wigner matrix elements, which can be determined experimentally using residual dipolar couplings, as a line integral over a curve in configuration space containing an ensemble of conformations and to approximate the curve using a series of geodesic segments. Using this approach and five sets of synthetic residual dipolar couplings computed for five linearly independent alignment conditions, we show that it is theoretically possible to reconstruct salient features of a multisegment interhelical motional trajectory obtained from a 65 ns molecular dynamics simulation of a stem-loop RNA. Our study shows that the 3-D atomic reconstruction of complex motions in biomolecules is within experimental reach.


Subjects
Nuclear Magnetic Resonance, Biomolecular/methods; Nucleic Acid Conformation; RNA/chemistry
16.
Sci Rep ; 9(1): 13622, 2019 Sep 20.
Article in English | MEDLINE | ID: mdl-31541187

ABSTRACT

Most approaches to machine learning from electronic health data can only predict a single endpoint. The ability to simultaneously simulate dozens of patient characteristics is a crucial step towards personalized medicine for Alzheimer's Disease. Here, we use an unsupervised machine learning model called a Conditional Restricted Boltzmann Machine (CRBM) to simulate detailed patient trajectories. We use data comprising 18-month trajectories of 44 clinical variables from 1909 patients with Mild Cognitive Impairment or Alzheimer's Disease to train a model for personalized forecasting of disease progression. We simulate synthetic patient data including the evolution of each sub-component of cognitive exams, laboratory tests, and their associations with baseline clinical characteristics. Synthetic patient data generated by the CRBM accurately reflect the means, standard deviations, and correlations of each variable over time to the extent that synthetic data cannot be distinguished from actual data by a logistic regression. Moreover, our unsupervised model predicts changes in total ADAS-Cog scores with the same accuracy as specifically trained supervised models, additionally capturing the correlation structure in the components of ADAS-Cog, and identifies sub-components associated with word recall as predictive of progression.


Subjects
Alzheimer Disease/physiopathology; Alzheimer Disease/psychology; Forecasting/methods; Aged; Aged, 80 and over; Cognition; Cognitive Dysfunction; Disease Progression; Female; Humans; Machine Learning; Male; Middle Aged; Models, Statistical; Models, Theoretical; Neuropsychological Tests
17.
J Phys Chem B ; 112(51): 16815-22, 2008 Dec 25.
Article in English | MEDLINE | ID: mdl-19367865

ABSTRACT

Nuclear magnetic resonance (NMR) residual dipolar couplings (RDCs) provide a unique opportunity for spatially characterizing complex motions in biomolecules with time scale sensitivity extending up to milliseconds. Up to five motionally averaged Wigner rotation elements, $\langle D^{2}_{0k}(\alpha\alpha) \rangle$, can be determined experimentally using RDCs measured in five linearly independent alignment conditions and applied to define motions of axially symmetric bond vectors. Here, we show that up to 25 motionally averaged Wigner rotation elements, $\langle D^{2}_{mk}(\alpha\beta\gamma) \rangle$, can be determined experimentally from multialignment RDCs and used to characterize rigid-body motions of chiral domains. The 25 $\langle D^{2}_{mk}(\alpha\beta\gamma) \rangle$ elements form a basis set that allows one to measure motions of a domain relative to an isotropic distribution of reference frames anchored on a second domain (and vice versa), thus expanding the 3D spatial resolution with which motions can be characterized. The 25 $\langle D^{2}_{mk}(\alpha\beta\gamma) \rangle$ elements can also be used to fit an ensemble consisting of up to eight equally or six unequally populated states. For more than two domains, changing the identity of the domain governing alignment allows access to new information regarding the correlated nature of the domain fluctuations. Example simulations are provided that validate the theoretical derivation and illustrate the high spatial resolution with which rigid-body domain motions can be characterized using multialignment and multireference RDCs. Our results further motivate the development of experimental approaches for both modulating alignment and anchoring it on specifically targeted domains.


Subjects
Nuclear Magnetic Resonance, Biomolecular/methods
18.
Phys Rev E ; 94(2-1): 022423, 2016 Aug.
Article in English | MEDLINE | ID: mdl-27627348

ABSTRACT

A fundamental problem in community ecology is understanding how ecological processes such as selection, drift, and immigration give rise to observed patterns in species composition and diversity. Here, we analyze a recently introduced, analytically tractable, presence-absence (PA) model for community assembly, and we use it to ask how ecological traits such as the strength of competition, the amount of diversity, and demographic and environmental stochasticity affect species composition in a community. In the PA model, species are treated as stochastic binary variables that can either be present or absent in a community: species can immigrate into the community from a regional species pool and can go extinct due to competition and stochasticity. Building upon previous work, we show that, despite its simplicity, the PA model reproduces the qualitative features of more complicated models of community assembly. In agreement with recent studies of large, competitive Lotka-Volterra systems, the PA model exhibits distinct ecological behaviors organized around a special ("critical") point corresponding to Hubbell's neutral theory of biodiversity. These results suggest that the concepts of ecological "phases" and phase diagrams can provide a powerful framework for thinking about community ecology, and that the PA model captures the essential ecological dynamics of community assembly.


Subjects
Biodiversity; Ecology/methods; Models, Biological; Ecosystem; Population Dynamics
19.
Methods Mol Biol ; 1345: 269-80, 2016.
Article in English | MEDLINE | ID: mdl-26453218

ABSTRACT

Intrinsically disordered proteins (IDPs) are notoriously difficult to study experimentally because they rapidly interconvert between many dissimilar conformations during their biological lifetime, and therefore cannot be described by a single structure. The importance of studying these systems, however, is underscored by the fact that they form toxic aggregates that play a role in the pathogenesis of many disorders. The first step towards a comprehensive understanding of the aggregation mechanism of these proteins involves a description of their thermally accessible states under physiologic conditions. The resulting conformational ensembles correspond to coarse-grained descriptions of their energy landscapes, where the number of structures in the ensemble is related to the resolution in which one views the free energy surface. Here, we provide step-by-step instructions on how to use experimental data to construct a conformational ensemble for an IDP using a Variational Bayesian Weighting (VBW) algorithm. We further discuss how to leverage this Bayesian approach to identify statistically significant ensemble-wide observations that can form the basis of further experimental studies.


Subjects
Amyloidogenic Proteins/chemistry; Intrinsically Disordered Proteins/chemistry; Molecular Biology/methods; Protein Aggregation, Pathological/genetics; Amyloidogenic Proteins/genetics; Bayes Theorem; Humans; Intrinsically Disordered Proteins/genetics; Models, Molecular; Protein Conformation
20.
PLoS One ; 9(7): e102451, 2014.
Article in English | MEDLINE | ID: mdl-25054627

ABSTRACT

Human-associated microbial communities exert tremendous influence over human health and disease. With modern metagenomic sequencing methods it is now possible to follow the relative abundance of microbes in a community over time. These microbial communities exhibit rich ecological dynamics and an important goal of microbial ecology is to infer the ecological interactions between species directly from sequence data. Any algorithm for inferring ecological interactions must overcome three major obstacles: 1) a correlation between the abundances of two species does not imply that those species are interacting, 2) the sum constraint on the relative abundances obtained from metagenomic studies makes it difficult to infer the parameters in time-series models, and 3) errors due to experimental uncertainty, or mis-assignment of sequencing reads into operational taxonomic units, bias inferences of species interactions due to a statistical problem called "errors-in-variables". Here we introduce an approach, Learning Interactions from MIcrobial Time Series (LIMITS), that overcomes these obstacles. LIMITS uses sparse linear regression with bootstrap aggregation to infer a discrete-time Lotka-Volterra model for microbial dynamics. We tested LIMITS on synthetic data and showed that it could reliably infer the topology of the inter-species ecological interactions. We then used LIMITS to characterize the species interactions in the gut microbiomes of two individuals and found that the interaction networks varied significantly between individuals. Furthermore, we found that the interaction networks of the two individuals are dominated by distinct "keystone species", Bacteroides fragilis and Bacteroides stercoris, that have a disproportionate influence on the structure of the gut microbiome even though they are only found in moderate abundance. Based on our results, we hypothesize that the abundances of certain keystone species may be responsible for individuality in the human gut microbiome.


Subjects
Bacteroides fragilis/genetics; Bacteroides/genetics; Gastrointestinal Tract/microbiology; Linear Models; Metagenomics/methods; Microbiota/genetics; Algorithms; Humans; Microbial Interactions; Time Factors
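The abstract of entry 20 describes inferring a discrete-time Lotka-Volterra model by linear regression. The sketch below shows only the core regression idea under a model form assumed here, x_i(t+1) = x_i(t) exp(sum_j C_ij (x_j(t) - mu_j)), so that the log growth ratio is linear in the current abundances. It uses ordinary least squares on noise-free, synthetic one-step transitions of absolute abundances. The sparse variable selection and bootstrap aggregation that LIMITS relies on are deliberately omitted, and the function name, interaction matrix, and equilibrium values are hypothetical.

```python
import numpy as np

def fit_discrete_lv(x_now, x_next):
    """Least-squares fit of log(x_next / x_now) = C @ x_now + b.

    x_now, x_next: arrays of shape (T, S) holding paired abundances.
    Returns the interaction matrix C (S, S) and the offset b (S,).
    """
    growth = np.log(x_next) - np.log(x_now)
    design = np.hstack([x_now, np.ones((x_now.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(design, growth, rcond=None)
    return coef[:-1].T, coef[-1]

# Hypothetical three-species system (not data from the paper)
rng = np.random.default_rng(1)
C_true = np.array([[-1.0, 0.3, 0.0],
                   [0.0, -1.0, -0.4],
                   [0.2, 0.0, -1.0]])
mu = np.array([0.5, 0.3, 0.2])                     # assumed equilibrium abundances

x_now = rng.uniform(0.1, 1.0, size=(60, 3))        # synthetic current states
x_next = x_now * np.exp((x_now - mu) @ C_true.T)   # noise-free one-step update

C_hat, b_hat = fit_discrete_lv(x_now, x_next)
```

With noisy relative abundances, the errors-in-variables and sum-constraint issues listed in the abstract would bias a plain fit like this one, which is the motivation the abstract gives for the more careful procedure.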