Results 1 - 20 of 37
1.
Nucleic Acids Res ; 51(D1): D753-D759, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36477304

ABSTRACT

The MGnify platform (https://www.ebi.ac.uk/metagenomics) facilitates the assembly, analysis and archiving of microbiome-derived nucleic acid sequences. The platform provides access to taxonomic assignments and functional annotations for nearly half a million analyses covering metabarcoding, metatranscriptomic, and metagenomic datasets, which are derived from a wide range of different environments. Over the past 3 years, MGnify has not only grown in terms of the number of datasets contained but also increased the breadth of analyses provided, such as the analysis of long-read sequences. The MGnify protein database now exceeds 2.4 billion non-redundant sequences predicted from metagenomic assemblies. This collection is now organised into a relational database making it possible to understand the genomic context of the protein through navigation back to the source assembly and sample metadata, marking a major improvement. To extend beyond the functional annotations already provided in MGnify, we have applied deep learning-based annotation methods. The technology underlying MGnify's Application Programming Interface (API) and website has been upgraded, and we have enabled the ability to perform downstream analysis of the MGnify data through the introduction of a coupled Jupyter Lab environment.


Subject(s)
Microbiota , Sequence Analysis , Genomics/methods , Metagenome , Metagenomics/methods , Microbiota/genetics , Software , Sequence Analysis/methods
2.
Mol Syst Biol ; 16(8): e9110, 2020 08.
Article in English | MEDLINE | ID: mdl-32845085

ABSTRACT

Systems biology has experienced dramatic growth in the number, size, and complexity of computational models. To reproduce simulation results and reuse models, researchers must exchange unambiguous model descriptions. We review the latest edition of the Systems Biology Markup Language (SBML), a format designed for this purpose. A community of modelers and software authors developed SBML Level 3 over the past decade. Its modular form consists of a core suited to representing reaction-based models and packages that extend the core with features suited to other model types including constraint-based models, reaction-diffusion models, logical network models, and rule-based models. The format leverages two decades of SBML and a rich software ecosystem that transformed how systems biologists build and interact with models. More recently, the rise of multiscale models of whole cells and organs, and new data sources such as single-cell measurements and live imaging, has precipitated new ways of integrating data with models. We provide our perspectives on the challenges presented by these developments and how SBML Level 3 provides the foundation needed to support this evolution.


Subject(s)
Systems Biology/methods , Animals , Humans , Logistic Models , Models, Biological , Software
3.
Nucleic Acids Res ; 46(D1): D726-D735, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29069476

ABSTRACT

EBI metagenomics (http://www.ebi.ac.uk/metagenomics) provides a free to use platform for the analysis and archiving of sequence data derived from the microbial populations found in a particular environment. Over the past two years, EBI metagenomics has increased the number of datasets analysed 10-fold. In addition to increased throughput, the underlying analysis pipeline has been overhauled to include both new or updated tools and reference databases. Of particular note is a new workflow for taxonomic assignments that has been extended to include assignments based on both the large and small subunit RNA marker genes and to encompass all cellular micro-organisms. We also describe the addition of metagenomic assembly as a new analysis service. Our pilot studies have produced over 2400 assemblies from datasets in the public domain. From these assemblies, we have produced a searchable, non-redundant protein database of over 50 million sequences. To provide improved access to the data stored within the resource, we have developed a programmatic interface that provides access to the analysis results and associated sample metadata. Finally, we have integrated the results of a series of statistical analyses that provide estimations of diversity and sample comparisons.


Subject(s)
Databases, Genetic , Metagenomics , Microbiota , Algorithms , Base Sequence , Classification/methods , Datasets as Topic , Metagenomics/methods , RNA, Archaeal/genetics , RNA, Bacterial/genetics , RNA, Viral/genetics , Ribotyping , Software , Transcriptome , User-Computer Interface , Web Browser , Workflow
4.
Stat Appl Genet Mol Biol ; 14(2): 169-88, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25720091

ABSTRACT

In this paper we consider the problem of parameter inference for Markov jump process (MJP) representations of stochastic kinetic models. Since transition probabilities are intractable for most processes of interest yet forward simulation is straightforward, Bayesian inference typically proceeds through computationally intensive methods such as (particle) MCMC. Such methods ostensibly require the ability to simulate trajectories from the conditioned jump process. When observations are highly informative, use of the forward simulator is likely to be inefficient and may even preclude an exact (simulation based) analysis. We therefore propose three methods for improving the efficiency of simulating conditioned jump processes. A conditioned hazard is derived based on an approximation to the jump process, and used to generate end-point conditioned trajectories for use inside an importance sampling algorithm. We also adapt a recently proposed sequential Monte Carlo scheme to our problem. Essentially, trajectories are reweighted at a set of intermediate time points, with more weight assigned to trajectories that are consistent with the next observation. We consider two implementations of this approach, based on two continuous approximations of the MJP. We compare these constructs for a simple tractable jump process before using them to perform inference for a Lotka-Volterra system. The best performing construct is used to infer the parameters governing a simple model of motility regulation in Bacillus subtilis.


Subject(s)
Bayes Theorem , Markov Chains , Algorithms , Computer Simulation , Kinetics , Models, Biological , Monte Carlo Method , Probability
5.
Stat Appl Genet Mol Biol ; 14(2): 189-209, 2015 Apr.
Article in English | MEDLINE | ID: mdl-25720092

ABSTRACT

Approaches to Bayesian inference for problems with intractable likelihoods have become increasingly important in recent years. Approximate Bayesian computation (ABC) and "likelihood free" Markov chain Monte Carlo techniques are popular methods for tackling inference in these scenarios but such techniques are computationally expensive. In this paper we compare the two approaches to inference, with a particular focus on parameter inference for stochastic kinetic models, widely used in systems biology. Discrete time transition kernels for models of this type are intractable for all but the most trivial systems yet forward simulation is usually straightforward. We discuss the relative merits and drawbacks of each approach whilst considering the computational cost implications and efficiency of these techniques. In order to explore the properties of each approach we examine a range of observation regimes using two example models. We use a Lotka-Volterra predator-prey model to explore the impact of full or partial species observations using various time course observations under the assumption of known and unknown measurement error. Further investigation into the impact of observation error is then made using a Schlögl system, a test case which exhibits bi-modal state stability in some regions of parameter space.


Subject(s)
Bayes Theorem , Likelihood Functions , Markov Chains , Models, Biological , Monte Carlo Method , Algorithms , Computer Simulation , Kinetics , Systems Biology
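The ABC idea summarised in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a pure death process rather than the Lotka-Volterra or Schlögl models, a Uniform(0.01, 1) prior, and an absolute-difference distance. The function names, prior bounds and tolerance are all hypothetical.

```python
import random


def simulate_death_process(x0, rate, t_end, rng):
    """Gillespie simulation of a pure death process (hazard = rate * x).

    Returns the population size at time t_end.
    """
    x, t = x0, 0.0
    while x > 0:
        t += rng.expovariate(rate * x)  # waiting time to the next death
        if t > t_end:
            break
        x -= 1
    return x


def abc_rejection(observed, x0, t_end, n_draws, tolerance, rng):
    """Keep prior draws whose simulated output is within `tolerance` of the data."""
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.01, 1.0)  # assumed prior, chosen for illustration
        simulated = simulate_death_process(x0, rate, t_end, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(rate)
    return accepted


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical observation: 37 survivors out of 100 at time 10.
    posterior = abc_rejection(37, 100, 10.0, 5000, 3, rng)
    print(len(posterior), sum(posterior) / len(posterior))
```

Only forward simulation is needed, which is why the approach suits models with intractable transition kernels. The cost is that most prior draws are rejected, so a tighter tolerance trades accuracy against run time.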
6.
Stat Appl Genet Mol Biol ; 13(5): 531-51, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25153608

ABSTRACT

In this paper we develop a Bayesian statistical inference approach to the unified analysis of isobaric labelled MS/MS proteomic data across multiple experiments. An explicit probabilistic model of the log-intensity of the isobaric labels' reporter ions across multiple pre-defined groups and experiments is developed. This is then used to develop a full Bayesian statistical methodology for the identification of differentially expressed proteins, with respect to a control group, across multiple groups and experiments. This methodology is implemented and then evaluated on simulated data and on two model experimental datasets (for which the differentially expressed proteins are known) that use a TMT labelling protocol.


Subject(s)
Bayes Theorem , Proteins/chemistry , Tandem Mass Spectrometry/methods , Models, Theoretical , Proteomics
7.
Nat Rev Genet ; 10(2): 122-33, 2009 02.
Article in English | MEDLINE | ID: mdl-19139763

ABSTRACT

Two related developments are currently changing traditional approaches to computational systems biology modelling. First, stochastic models are being used increasingly in preference to deterministic models to describe biochemical network dynamics at the single-cell level. Second, sophisticated statistical methods and algorithms are being used to fit both deterministic and stochastic models to time course and other experimental data. Both frameworks are needed to adequately describe observed noise, variability and heterogeneity of biological systems over a range of scales of biological organization.


Subject(s)
Computational Biology/methods , Data Interpretation, Statistical , Models, Biological , Stochastic Processes , Systems Biology , Gene Expression Regulation , Proto-Oncogene Proteins c-mdm2/metabolism , Tumor Suppressor Protein p53/metabolism
8.
Bioinformatics ; 28(11): 1495-500, 2012 Jun 01.
Article in English | MEDLINE | ID: mdl-22492647

ABSTRACT

MOTIVATION: Biological experiments give insight into networks of processes inside a cell, but are subject to error and uncertainty. However, due to the overlap between the large number of experiments reported in public databases it is possible to assess the chances of individual observations being correct. In order to do so, existing methods rely on high-quality 'gold standard' reference networks, but such reference networks are not always available. RESULTS: We present a novel algorithm for computing the probability of network interactions that operates without gold standard reference data. We show that our algorithm outperforms existing gold standard-based methods. Finally, we apply the new algorithm to a large collection of genetic interaction and protein-protein interaction experiments. AVAILABILITY: The integrated dataset and a reference implementation of the algorithm as a plug-in for the Ondex data integration framework are available for download at http://bio-nexus.ncl.ac.uk/projects/nogold/


Subject(s)
Algorithms , Bayes Theorem , Epistasis, Genetic , Protein Interaction Mapping/standards , Likelihood Functions , Protein Interaction Mapping/methods , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism
9.
Brief Bioinform ; 11(3): 278-89, 2010 May.
Article in English | MEDLINE | ID: mdl-20056731

ABSTRACT

Dynamic simulation modelling of complex biological processes forms the backbone of systems biology. Discrete stochastic models are particularly appropriate for describing sub-cellular molecular interactions, especially when critical molecular species are thought to be present at low copy-numbers. For example, these stochastic effects play an important role in models of human ageing, where ageing results from the long-term accumulation of random damage at various biological scales. Unfortunately, realistic stochastic simulation of discrete biological processes is highly computationally intensive, requiring specialist hardware, and can benefit greatly from parallel and distributed approaches to computation and analysis. For these reasons, we have developed the BASIS system for the simulation and storage of stochastic SBML models together with associated simulation results. This system is exposed as a set of web services to allow users to incorporate its simulation tools into their workflows. Parameter inference for stochastic models is also difficult and computationally expensive. The CaliBayes system provides a set of web services (together with an R package for consuming these and formatting data) which addresses this problem for SBML models. It uses a sequential Bayesian MCMC method, which is powerful and flexible, providing very rich information. However this approach is exceptionally computationally intensive and requires the use of a carefully designed architecture. Again, these tools are exposed as web services to allow users to take advantage of this system. In this article, we describe these two systems and demonstrate their integrated use with an example workflow to estimate the parameters of a simple model of Saccharomyces cerevisiae growth on agar plates.


Subject(s)
Algorithms , Computer Simulation , Models, Biological , Programming Languages , Software , Biology/methods , Software Design , Systems Integration
10.
Mol Syst Biol ; 7: 543, 2011 Oct 25.
Article in English | MEDLINE | ID: mdl-22027554

ABSTRACT

The use of computational modeling to describe and analyze biological systems is at the heart of systems biology. Model structures, simulation descriptions and numerical results can be encoded in structured formats, but there is an increasing need to provide an additional semantic layer. Semantic information adds meaning to components of structured descriptions to help identify and interpret them unambiguously. Ontologies are one of the tools frequently used for this purpose. We describe here three ontologies created specifically to address the needs of the systems biology community. The Systems Biology Ontology (SBO) provides semantic information about the model components. The Kinetic Simulation Algorithm Ontology (KiSAO) supplies information about existing algorithms available for the simulation of systems biology models, their characterization and interrelationships. The Terminology for the Description of Dynamics (TEDDY) categorizes dynamical features of the simulation results and general systems behavior. The provision of semantic information extends a model's longevity and facilitates its reuse. It provides useful insight into the biology of modeled processes, and may be used to make informed decisions on subsequent simulation experiments.


Subject(s)
Computational Biology , Semantics , Systems Biology , Vocabulary, Controlled , Algorithms , Computer Simulation , Information Storage and Retrieval , Models, Biological
11.
BMC Bioinformatics ; 11: 287, 2010 May 28.
Article in English | MEDLINE | ID: mdl-20509870

ABSTRACT

BACKGROUND: High-throughput screens comparing growth rates of arrays of distinct micro-organism cultures on solid agar are useful, rapid methods of quantifying genetic interactions. Growth rate is an informative phenotype which can be estimated by measuring cell densities at one or more times after inoculation. Precise estimates can be made by inoculating cultures onto agar and capturing cell density frequently by plate-scanning or photography, especially throughout the exponential growth phase, and summarising growth with a simple dynamic model (e.g. the logistic growth model). In order to parametrize such a model, a robust image analysis tool capable of capturing a wide range of cell densities from plate photographs is required. RESULTS: Colonyzer is a collection of image analysis algorithms for automatic quantification of the size, granularity, colour and location of micro-organism cultures grown on solid agar. Colonyzer is uniquely sensitive to extremely low cell densities photographed after dilute liquid culture inoculation (spotting) due to image segmentation using a mixed Gaussian model for plate-wide thresholding based on pixel intensity. Colonyzer is robust to slight experimental imperfections and corrects for lighting gradients which would otherwise introduce spatial bias to cell density estimates without the need for imaging dummy plates. Colonyzer is general enough to quantify cultures growing in any rectangular array format, either growing after pinning with a dense inoculum or growing with the irregular morphology characteristic of spotted cultures. Colonyzer was developed using the open source packages: Python, RPy and the Python Imaging Library and its source code and documentation are available on SourceForge under GNU General Public License. Colonyzer is adaptable to suit specific requirements: e.g. automatic detection of cultures at irregular locations on streaked plates for robotic picking, or decreasing analysis time by disabling components such as lighting correction or colour measures. CONCLUSION: Colonyzer can automatically quantify culture growth from large batches of captured images of microbial cultures grown during genome-wide scans over the wide range of cell densities observable after highly dilute liquid spot inoculation, as well as after more concentrated pinning inoculation. Colonyzer is open-source, allowing users to assess it, adapt it to particular research requirements and to contribute to its development.


Subject(s)
Culture Media , Image Processing, Computer-Assisted/methods , Saccharomyces cerevisiae/growth & development , Software , Agar/chemistry , Algorithms , Cell Count
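The logistic growth model mentioned in the abstract above has a closed-form solution that is easy to evaluate. The sketch below is illustrative only and is not taken from Colonyzer; the function name and the parameter names `x0` (inoculum density), `r` (growth rate) and `K` (carrying capacity) are assumptions.

```python
import math


def logistic_growth(t, x0, r, K):
    """Closed-form solution of dx/dt = r * x * (1 - x / K) with x(0) = x0."""
    growth = math.exp(r * t)
    return K * x0 * growth / (K + x0 * (growth - 1.0))


if __name__ == "__main__":
    # Hypothetical culture: starts at 1% of capacity and grows at rate 2 per day.
    for day in (0.0, 1.0, 2.0, 3.0, 5.0):
        print(day, round(logistic_growth(day, 0.01, 2.0, 1.0), 4))
```

Fitting `r` and `K` to a time series of measured cell densities then summarises each culture with two numbers, which is the role the abstract assigns to the dynamic model.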
12.
J Integr Bioinform ; 17(2-3)2020 Jul 20.
Article in English | MEDLINE | ID: mdl-32750035

ABSTRACT

Biological models often contain elements that have inexact numerical values, since they are based on values that are stochastic in nature or data that contains uncertainty. The Systems Biology Markup Language (SBML) Level 3 Core specification does not include an explicit mechanism to include inexact or stochastic values in a model, but it does provide a mechanism for SBML packages to extend the Core specification and add additional syntactic constructs. The SBML Distributions package for SBML Level 3 adds the necessary features to allow models to encode information about the distribution and uncertainty of values underlying a quantity.


Subject(s)
Programming Languages , Systems Biology , Documentation , Language , Models, Biological , Software
13.
Bioinformatics ; 24(2): 285-6, 2008 Jan 15.
Article in English | MEDLINE | ID: mdl-18025005

ABSTRACT

MOTIVATION: Stochastic simulation is a very important tool for mathematical modelling. However, it is difficult to check the correctness of a stochastic simulator, since any two realizations from a single model will typically be different. RESULTS: We have developed a test suite of stochastic models that have been solved either analytically or using numerical methods. This allows the accuracy of stochastic simulators to be tested against known results. The test suite is already being used by a number of stochastic simulator developers. AVAILABILITY: The latest version of the test suite can be obtained from http://www.calibayes.ncl.ac.uk/Resources/dsmts/ and is licensed under GNU Lesser General Public License.


Subject(s)
Algorithms , Data Interpretation, Statistical , Models, Biological , Models, Statistical , Signal Processing, Computer-Assisted , Software , Stochastic Processes , Computer Simulation
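Testing a stochastic simulator against a known result, as the abstract above describes, can be sketched as follows. This is a hedged illustration rather than a case from the test suite itself: the immigration-death model, the parameter values and the function names are assumptions chosen because the mean is available analytically, E[X(t)] = (lam/mu)(1 - exp(-mu t)) when X(0) = 0.

```python
import math
import random


def simulate_immigration_death(x0, lam, mu, t_end, rng):
    """Gillespie simulation: immigration at rate lam, death at rate mu * x."""
    x, t = x0, 0.0
    while True:
        total = lam + mu * x                # combined hazard of both reactions
        t += rng.expovariate(total)         # time to the next event
        if t > t_end:
            return x
        if rng.random() * total < lam:
            x += 1                          # immigration event
        else:
            x -= 1                          # death event (only possible if x > 0)


def compare_with_analytic_mean(n_runs, lam, mu, t_end, seed=0):
    """Return (sample mean over n_runs, exact mean) for X(0) = 0."""
    rng = random.Random(seed)
    samples = [simulate_immigration_death(0, lam, mu, t_end, rng)
               for _ in range(n_runs)]
    exact = (lam / mu) * (1.0 - math.exp(-mu * t_end))
    return sum(samples) / n_runs, exact


if __name__ == "__main__":
    print(compare_with_analytic_mean(2000, 1.0, 0.1, 10.0))
```

Because individual realizations differ, the comparison is made on a summary statistic over many runs, with a tolerance based on the expected sampling error.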
14.
BMC Neurosci ; 10: 26, 2009 Mar 25.
Article in English | MEDLINE | ID: mdl-19320982

ABSTRACT

BACKGROUND: The hippocampus is essential for declarative memory synthesis and is a core pathological substrate for Alzheimer's disease (AD), the most common aging-related dementing disease. Acute increases in plasma cortisol are associated with transient hippocampal inhibition and retrograde amnesia, while chronic cortisol elevation is associated with hippocampal atrophy. Thus, cortisol levels could be monitored and managed in older people, to decrease their risk of AD type hippocampal dysfunction. We generated an in silico model of the chronic effects of elevated plasma cortisol on hippocampal activity and atrophy, using the systems biology mark-up language (SBML). We further challenged the model with biologically based interventions to ascertain if cortisol associated hippocampal dysfunction could be abrogated. RESULTS: The in silico SBML model reflected the in vivo aging of the hippocampus and increased plasma cortisol and negative feedback to the hypothalamic pituitary axis. Aging induced a 12% decrease in hippocampus activity (HA), increased to 30% by acute and 40% by chronic elevations in cortisol. The biological intervention attenuated the cortisol associated decrease in HA by 2% in the acute cortisol simulation and by 8% in the chronic simulation. CONCLUSION: Both acute and chronic elevations in cortisol secretion increased aging-associated hippocampal atrophy and a loss of HA in the model. We suggest that this first SBML model, in tandem with in vitro and in vivo studies, may provide a backbone to further frame computational cortisol and brain aging models, which may help predict aging-related brain changes in vulnerable older people.


Subject(s)
Aging/pathology , Computer Simulation , Hippocampus/pathology , Hydrocortisone/adverse effects , Neurodegenerative Diseases/pathology , Atrophy , Computational Biology , Hippocampus/physiopathology , Humans , Hydrocortisone/blood , Hypothalamo-Hypophyseal System/physiopathology , Models, Theoretical , Neurodegenerative Diseases/physiopathology , Pituitary-Adrenal System/physiopathology
15.
J Integr Bioinform ; 16(2)2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31219795

ABSTRACT

Computational models can help researchers to interpret data, understand biological functions, and make quantitative predictions. The Systems Biology Markup Language (SBML) is a file format for representing computational models in a declarative form that different software systems can exchange. SBML is oriented towards describing biological processes of the sort common in research on a number of topics, including metabolic pathways, cell signaling pathways, and many others. By supporting SBML as an input/output format, different tools can all operate on an identical representation of a model, removing opportunities for translation errors and assuring a common starting point for analyses and simulations. This document provides the specification for Release 2 of Version 2 of SBML Level 3 Core. The specification defines the data structures prescribed by SBML as well as their encoding in XML, the eXtensible Markup Language. Release 2 corrects some errors and clarifies some ambiguities discovered in Release 1. This specification also defines validation rules that determine the validity of an SBML document, and provides many examples of models in SBML form. Other materials and software are available from the SBML project website at http://sbml.org/.


Subject(s)
Computer Simulation , Models, Biological , Programming Languages , Systems Biology
16.
Stat Comput ; 28(4): 891-904, 2018.
Article in English | MEDLINE | ID: mdl-31983814

ABSTRACT

A statistical model assuming a preferential attachment network, which is generated by adding nodes sequentially according to a few simple rules, usually describes real-life networks better than a model assuming, for example, a Bernoulli random graph, in which any two nodes have the same probability of being connected, does. Therefore, to study the propagation of "infection" across a social network, we propose a network epidemic model by combining a stochastic epidemic model and a preferential attachment model. A simulation study based on the subsequent Markov Chain Monte Carlo algorithm reveals an identifiability issue with the model parameters. Finally, the network epidemic model is applied to a set of online commissioning data.

17.
PLoS One ; 13(4): e0195484, 2018.
Article in English | MEDLINE | ID: mdl-29649240

ABSTRACT

We investigate the feasibility of using a surrogate-based method to emulate the deformation and detachment behaviour of a biofilm in response to hydrodynamic shear stress. The influence of shear force, growth rate and viscoelastic parameters on the patterns of growth, structure and resulting shape of microbial biofilms was examined. We develop a statistical modelling approach to this problem, using combination of Bayesian Poisson regression and dynamic linear models for the emulation. We observe that the hydrodynamic shear force affects biofilm deformation in line with some literature. Sensitivity results also showed that the expected number of shear events, shear flow, yield coefficient for heterotrophic bacteria and extracellular polymeric substance (EPS) stiffness per unit EPS mass are the four principal mechanisms governing the bacteria detachment in this study. The sensitivity of the model parameters is temporally dynamic, emphasising the significance of conducting the sensitivity analysis across multiple time points. The surrogate models are shown to perform well, and produced ≈ 480 fold increase in computational efficiency. We conclude that a surrogate-based approach is effective, and resulting biofilm structure is determined primarily by a balance between bacteria growth, viscoelastic parameters and applied shear stress.


Subject(s)
Biofilms , Hydrodynamics , Models, Statistical , Shear Strength , Stress, Mechanical , Bayes Theorem , Poisson Distribution , Wastewater/microbiology
18.
J Integr Bioinform ; 15(1)2018 Mar 09.
Article in English | MEDLINE | ID: mdl-29522418

ABSTRACT

Computational models can help researchers to interpret data, understand biological functions, and make quantitative predictions. The Systems Biology Markup Language (SBML) is a file format for representing computational models in a declarative form that different software systems can exchange. SBML is oriented towards describing biological processes of the sort common in research on a number of topics, including metabolic pathways, cell signaling pathways, and many others. By supporting SBML as an input/output format, different tools can all operate on an identical representation of a model, removing opportunities for translation errors and assuring a common starting point for analyses and simulations. This document provides the specification for Version 2 of SBML Level 3 Core. The specification defines the data structures prescribed by SBML, their encoding in XML (the eXtensible Markup Language), validation rules that determine the validity of an SBML document, and examples of models in SBML form. The design of Version 2 differs from Version 1 principally in allowing new MathML constructs, making more child elements optional, and adding identifiers to all SBML elements instead of only selected elements. Other materials and software are available from the SBML project website at http://sbml.org/.


Subject(s)
Documentation/standards , Information Storage and Retrieval/standards , Models, Biological , Programming Languages , Software , Systems Biology/standards , Animals , Computer Simulation , Guidelines as Topic , Humans , Signal Transduction
19.
J Comput Biol ; 13(3): 838-51, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16706729

ABSTRACT

As postgenomic biology becomes more predictive, the ability to infer rate parameters of genetic and biochemical networks will become increasingly important. In this paper, we explore the Bayesian estimation of stochastic kinetic rate constants governing dynamic models of intracellular processes. The underlying model is replaced by a diffusion approximation where a noise term represents intrinsic stochastic behavior and the model is identified using discrete-time (and often incomplete) data that is subject to measurement error. Sequential MCMC methods are then used to sample the model parameters on-line in several data-poor contexts. The methodology is illustrated by applying it to the estimation of parameters in a simple prokaryotic auto-regulatory gene network.


Subject(s)
Metabolism , Models, Biological , Stochastic Processes , Kinetics
20.
J R Stat Soc Ser C Appl Stat ; 65(3): 367-393, 2016 04.
Article in English | MEDLINE | ID: mdl-27134314

ABSTRACT

Quantitative fitness analysis (QFA) is a high throughput experimental and computational methodology for measuring the growth of microbial populations. QFA screens can be used to compare the health of cell populations with and without a mutation in a query gene to infer genetic interaction strengths genomewide, examining thousands of separate genotypes. We introduce Bayesian hierarchical models of population growth rates and genetic interactions that better reflect QFA experimental design than current approaches. Our new approach models population dynamics and genetic interaction simultaneously, thereby avoiding passing information between models via a univariate fitness summary. Matching experimental structure more closely, Bayesian hierarchical approaches use data more efficiently and find new evidence for genes which interact with yeast telomeres within a published data set.
