Results 1 - 20 of 59
1.
Proc Natl Acad Sci U S A ; 120(7): e2216415120, 2023 Feb 14.
Article in English | MEDLINE | ID: mdl-36763529

ABSTRACT

Computational models have become a powerful tool in the quantitative sciences to understand the behavior of complex systems that evolve in time. However, they often contain a potentially large number of free parameters whose values cannot be obtained from theory but need to be inferred from data. This is especially the case for models in the social sciences, economics, or computational epidemiology. Yet, many current parameter estimation methods are mathematically involved and computationally slow to run. In this paper, we present a computationally simple and fast method to retrieve accurate probability densities for model parameters using neural differential equations. We present a pipeline comprising multiagent models acting as forward solvers for systems of ordinary or stochastic differential equations and a neural network to then extract parameters from the data generated by the model. The two combined create a powerful tool that can quickly estimate densities on model parameters, even for very large systems. We demonstrate the method on synthetic time series data of the SIR model of the spread of infection and perform an in-depth analysis of the Harris-Wilson model of economic activity on a network, representing a nonconvex problem. For the latter, we apply our method both to synthetic data and to data of economic activity across Greater London. We find that our method calibrates the model orders of magnitude more accurately than a previous study of the same dataset using classical techniques, while running between 195 and 390 times faster.
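
As a rough illustration of the forward-solver step mentioned in the abstract above, the following sketch integrates the deterministic SIR equations with a forward-Euler scheme. It is not the authors' pipeline: the function name `simulate_sir`, the step size, and the parameter values are assumptions chosen for illustration, and the neural-network calibration stage is not shown.

```python
def simulate_sir(beta, gamma, s0, i0, r0, dt=0.1, steps=1000):
    """Forward-Euler integration of the deterministic SIR model.

    dS/dt = -beta*S*I,  dI/dt = beta*S*I - gamma*I,  dR/dt = gamma*I
    S, I and R are population fractions (they sum to 1).
    All argument names and defaults are illustrative assumptions.
    """
    s, i, r = s0, i0, r0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt   # mass leaving S, entering I
        new_recoveries = gamma * i * dt      # mass leaving I, entering R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameter values, used only to produce synthetic data.
trajectory = simulate_sir(beta=0.5, gamma=0.1, s0=0.99, i0=0.01, r0=0.0)
```

A calibration method of the kind described would treat a trajectory like this as synthetic data and try to recover `beta` and `gamma` from it.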

2.
Proc Natl Acad Sci U S A ; 118(2), 2021 Jan 12.
Article in English | MEDLINE | ID: mdl-33372139

ABSTRACT

We present a statistical finite element method for nonlinear, time-dependent phenomena, illustrated in the context of nonlinear internal waves (solitons). We take a Bayesian approach and leverage the finite element method to cast the statistical problem as a nonlinear Gaussian state-space model, updating the solution, in receipt of data, in a filtering framework. The method is applicable to problems across science and engineering for which finite element methods are appropriate. The Korteweg-de Vries equation for solitons is presented because it reflects the necessary complexity while being suitably familiar and succinct for pedagogical purposes. We present two algorithms to implement this method, based on the extended and ensemble Kalman filters, and demonstrate effectiveness with a simulation study and a case study with experimental data. The generality of our approach is demonstrated in SI Appendix, where we present examples from additional nonlinear, time-dependent partial differential equations (Burgers equation, Kuramoto-Sivashinsky equation).

3.
PLoS Comput Biol ; 14(3): e1005995, 2018 Mar.
Article in English | MEDLINE | ID: mdl-29518076

ABSTRACT

Passive acoustic sensing has emerged as a powerful tool for quantifying anthropogenic impacts on biodiversity, especially for echolocating bat species. To better assess bat population trends there is a critical need for accurate, reliable, and open source tools that allow the detection and classification of bat calls in large collections of audio recordings. The majority of existing tools are commercial or have focused on the species classification task, neglecting the important problem of first localizing echolocation calls in audio which is particularly problematic in noisy recordings. We developed a convolutional neural network based open-source pipeline for detecting ultrasonic, full-spectrum, search-phase calls produced by echolocating bats. Our deep learning algorithms were trained on full-spectrum ultrasonic audio collected along road-transects across Europe and labelled by citizen scientists from www.batdetective.org. When compared to other existing algorithms and commercial systems, we show significantly higher detection performance of search-phase echolocation calls with our test sets. As an example application, we ran our detection pipeline on bat monitoring data collected over five years from Jersey (UK), and compared results to a widely-used commercial system. Our detection pipeline can be used for the automatic detection and monitoring of bat populations, and further facilitates their use as indicator species on a large scale. Our proposed pipeline makes only a small number of bat specific design decisions, and with appropriate training data it could be applied to detecting other species in audio. A crucial novelty of our work is showing that with careful, non-trivial, design and implementation considerations, state-of-the-art deep learning methods can be used for accurate and efficient monitoring in audio.


Subjects
Chiroptera/physiology , Echolocation/physiology , Environmental Monitoring/methods , Machine Learning , Signal Processing, Computer-Assisted , Algorithms , Animals , Chiroptera/classification , Computational Biology , Echolocation/classification , Endangered Species , Neural Networks, Computer , Zoology
4.
Nucleic Acids Res ; 43(5): 2780-9, 2015 Mar 11.
Article in English | MEDLINE | ID: mdl-25712098

ABSTRACT

Cell cycle progression is orchestrated by E2F factors. We previously reported that in ETS-driven cancers of the bone and prostate, activating E2F3 cooperates with ETS on target promoters. The mechanism of target co-regulation remained unknown. Using RNAi and time-resolved chromatin-immunoprecipitation in Ewing sarcoma we report replacement of E2F3/pRB by constitutively expressed repressive E2F4/p130 complexes on target genes upon EWS-FLI1 modulation. Using mathematical modeling we interrogated four alternative explanatory models for the observed EWS-FLI1/E2F3 cooperation based on longitudinal E2F target and regulating transcription factor expression analysis. Bayesian model selection revealed the formation of a synergistic complex between EWS-FLI1 and E2F3 as the by far most likely mechanism explaining the observed kinetics of E2F target induction. Consequently we propose that aberrant cell cycle activation in Ewing sarcoma is due to the de-repression of E2F targets as a consequence of transcriptional induction and physical recruitment of E2F3 by EWS-FLI1 replacing E2F4 on their target promoters.


Subjects
E2F3 Transcription Factor/metabolism , E2F4 Transcription Factor/metabolism , Gene Expression Regulation, Neoplastic , Oncogene Proteins, Fusion/metabolism , Proto-Oncogene Protein c-fli-1/metabolism , RNA-Binding Protein EWS/metabolism , Algorithms , Bayes Theorem , Cell Cycle/genetics , Cell Line, Tumor , Chromatin Immunoprecipitation , E2F3 Transcription Factor/genetics , E2F4 Transcription Factor/genetics , Humans , Immunoblotting , Models, Genetic , Oncogene Proteins, Fusion/genetics , Promoter Regions, Genetic/genetics , Protein Binding , Proto-Oncogene Protein c-fli-1/genetics , RNA Interference , RNA-Binding Protein EWS/genetics , Reverse Transcriptase Polymerase Chain Reaction , Sarcoma, Ewing/genetics , Sarcoma, Ewing/metabolism , Sarcoma, Ewing/pathology
5.
Biophys J ; 111(2): 333-348, 2016 Jul 26.
Article in English | MEDLINE | ID: mdl-27463136

ABSTRACT

The stochastic behavior of single ion channels is most often described as an aggregated continuous-time Markov process with discrete states. For ligand-gated channels each state can represent a different conformation of the channel protein or a different number of bound ligands. Single-channel recordings show only whether the channel is open or shut: states of equal conductance are aggregated, so transitions between them have to be inferred indirectly. The requirement to filter noise from the raw signal further complicates the modeling process, as it limits the time resolution of the data. The consequence of the reduced bandwidth is that openings or shuttings that are shorter than the resolution cannot be observed; these are known as missed events. Postulated models fitted using filtered data must therefore explicitly account for missed events to avoid bias in the estimation of rate parameters and therefore assess parameter identifiability accurately. In this article, we present the first, to our knowledge, Bayesian modeling of ion-channels with exact missed events correction. Bayesian analysis represents uncertain knowledge of the true value of model parameters by considering these parameters as random variables. This allows us to gain a full appreciation of parameter identifiability and uncertainty when estimating values for model parameters. However, Bayesian inference is particularly challenging in this context as the correction for missed events increases the computational complexity of the model likelihood. Nonetheless, we successfully implemented a two-step Markov chain Monte Carlo method that we called "BICME", which performs Bayesian inference in models of realistic complexity. The method is demonstrated on synthetic and real single-channel data from muscle nicotinic acetylcholine channels. We show that parameter uncertainty can be characterized more accurately than with maximum-likelihood methods. 
Our code for performing inference in these ion channel models is publicly available.


Subjects
Ion Channels/metabolism , Models, Biological , Bayes Theorem , Markov Chains , Monte Carlo Method
6.
Anal Chem ; 88(2): 1147-53, 2016 Jan 19.
Article in English | MEDLINE | ID: mdl-26698880

ABSTRACT

A significant advantage of using surface enhanced Raman scattering (SERS) for DNA detection is the capability to detect multiple analytes simultaneously within the one sample. However, as the analytes approach the metallic surface required for SERS, they become more concentrated and previous studies have suggested that different dye labels will have different affinities for the metal surface. Here, the interaction of single stranded DNA labeled with either fluorescein (FAM) or tetramethylrhodamine (TAMRA) with a metal surface, using spermine induced aggregated silver nanoparticles as the SERS substrate, is investigated by analyzing the labels separately and in mixtures. Comparison studies were also undertaken using the dyes in their free isothiocyanate forms, fluorescein isothiocyanate (F-ITC) and tetramethylrhodamine isothiocyanate (TR-ITC). When the two dyes are premixed prior to the addition of nanoparticles, TAMRA exerts a strong masking effect over FAM due to a stronger affinity for the metal surface. When parameters such as order of analyte addition, analysis time, and analyte concentration are investigated, the masking effect of TAMRA is still observed but the extent changes depending on the experimental parameters. By using bootstrap estimation of changes in SERS peak intensity, a greater insight has been achieved into the surface affinity of the two dyes as well as how they interact with each other. It has been shown that the order of addition of the analytes is important and that specific dye related interactions occur, which could greatly affect the observed SERS spectra. SERS has been used successfully for the simultaneous detection of several analytes; however, this work has highlighted the significant factors that must be taken into consideration when planning a multiple analyte assay.


Subjects
DNA/chemistry , Fluorescein/chemistry , Fluorescent Dyes/chemistry , Rhodamines/chemistry , Spectrum Analysis, Raman/methods , Base Sequence , Molecular Structure , Surface Properties
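
The abstract above refers to bootstrap estimation of changes in SERS peak intensity without giving the procedure. One common form, sketched here under that assumption, is a percentile bootstrap interval for the difference in mean intensity between two conditions. The function name and the intensity values are made up for illustration and are not taken from the paper.

```python
import random
import statistics


def bootstrap_mean_diff(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for mean(a) - mean(b).

    Each group is resampled with replacement, independently of the other.
    Returns the observed difference and a (lower, upper) interval.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(a) - statistics.fmean(b), (lower, upper)


# Hypothetical peak intensities (arbitrary units) for two addition orders.
dye_first = [10.0, 11.0, 12.0, 13.0, 14.0]
dye_second = [1.0, 2.0, 3.0, 4.0, 5.0]
estimate, interval = bootstrap_mean_diff(dye_first, dye_second)
```

An interval that excludes zero would suggest that the change in peak intensity between the two conditions is not explained by resampling noise alone.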
7.
Bioinformatics ; 30(20): 2991-2, 2014 Oct 15.
Article in English | MEDLINE | ID: mdl-25005749

ABSTRACT

SUMMARY: We present a new C implementation of an advanced Markov chain Monte Carlo (MCMC) method for the sampling of ordinary differential equation (ODE) model parameters. The software mcmc_clib uses the simplified manifold Metropolis-adjusted Langevin algorithm (SMMALA), which is locally adaptive; it uses the parameter manifold's geometry (the Fisher information) to make efficient moves. This adaptation does not diminish with MC length, which is highly advantageous compared with adaptive Metropolis techniques when the parameters have large correlations and/or posteriors substantially differ from multivariate Gaussians. The software is standalone (not a toolbox), though dependencies include the GNU Scientific Library and the SUNDIALS libraries for ODE integration and sensitivity analysis. AVAILABILITY AND IMPLEMENTATION: The source code and binary files are freely available for download at http://a-kramer.github.io/mcmc_clib/. This also includes example files and data. Detailed documentation, an example model and a user manual are provided with the software. CONTACT: andrei.kramer@ist.uni-stuttgart.de.


Subjects
Algorithms , Markov Chains , Models, Statistical , Monte Carlo Method , Software
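
To give a feel for the Langevin-type proposal named in the abstract above, here is a minimal sketch of the plain Metropolis-adjusted Langevin algorithm (MALA) on a one-dimensional standard normal target. It is deliberately not SMMALA: the Fisher-information metric is replaced by the identity, and it does not use the mcmc_clib interface. The target, step size, and function names are assumptions made for illustration.

```python
import math
import random


def log_target(x):
    """Log-density of a standard normal, up to a constant (toy target)."""
    return -0.5 * x * x


def grad_log_target(x):
    """Gradient of log_target."""
    return -x


def mala(n_samples, step=1.0, x0=0.0, seed=0):
    """Plain MALA with a fixed step size and identity metric."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        # Langevin proposal: drift along the gradient plus Gaussian noise.
        mean_fwd = x + 0.5 * step**2 * grad_log_target(x)
        prop = mean_fwd + step * rng.gauss(0.0, 1.0)
        # The proposal is asymmetric, so the reverse density is needed too.
        mean_rev = prop + 0.5 * step**2 * grad_log_target(prop)
        log_q_fwd = -((prop - mean_fwd) ** 2) / (2.0 * step**2)
        log_q_rev = -((x - mean_rev) ** 2) / (2.0 * step**2)
        log_alpha = log_target(prop) - log_target(x) + log_q_rev - log_q_fwd
        # Metropolis-Hastings accept/reject step.
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            x = prop
        samples.append(x)
    return samples


samples = mala(5000)
```

In a manifold variant, the scalar `step**2` would be replaced by a position-dependent matrix derived from the Fisher information, which is what makes the moves locally adaptive.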
8.
PNAS Nexus ; 3(4): pgae063, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38560526

ABSTRACT

Network structures underlie the dynamics of many complex phenomena, from gene regulation and foodwebs to power grids and social media. Yet, as they often cannot be observed directly, their connectivities must be inferred from observations of the dynamics to which they give rise. In this work, we present a powerful computational method to infer large network adjacency matrices from time series data using a neural network, in order to provide uncertainty quantification on the prediction in a manner that reflects both the degree to which the inference problem is underdetermined as well as the noise on the data. This is a feature that other approaches have hitherto been lacking. We demonstrate our method's capabilities by inferring line failure locations in the British power grid from its response to a power cut, providing probability densities on each edge and allowing the use of hypothesis testing to make meaningful probabilistic statements about the location of the cut. Our method is significantly more accurate than both Markov-chain Monte Carlo sampling and least squares regression on noisy data and when the problem is underdetermined, while naturally extending to the case of nonlinear dynamics, which we demonstrate by learning an entire cost matrix for a nonlinear model of economic activity in Greater London. Not having been specifically engineered for network inference, this method in fact represents a general parameter estimation scheme that is applicable to any high-dimensional parameter space.

9.
Neural Comput ; 25(3): 567-625, 2013 Mar.
Article in English | MEDLINE | ID: mdl-23272919

ABSTRACT

This review examines kernel methods for online learning, in particular, multiclass classification. We examine margin-based approaches, stemming from Rosenblatt's original perceptron algorithm, as well as nonparametric probabilistic approaches that are based on the popular Gaussian process framework. We also examine approaches to online learning that use combinations of kernels (online multiple kernel learning). We present empirical validation of a wide range of methods on a protein fold recognition data set, where different biological feature types are available, and two object recognition data sets, Caltech101 and Caltech256, where multiple feature spaces are available in terms of different image feature extraction methods.

10.
Neural Comput ; 24(6): 1462-86, 2012 Jun.
Article in English | MEDLINE | ID: mdl-22364499

ABSTRACT

This letter considers how a number of modern Markov chain Monte Carlo (MCMC) methods can be applied for parameter estimation and inference in state-space models with point process observations. We quantified the efficiencies of these MCMC methods on synthetic data, and our results suggest that the Riemannian manifold Hamiltonian Monte Carlo method offers the best performance. We further compared such a method with a previously tested variational Bayes method on two experimental data sets. Results indicate similar performance on the large data sets and superior performance on small ones. The work offers an extensive suite of MCMC algorithms evaluated on an important class of models for physiological signal analysis.


Subjects
Markov Chains , Models, Neurological , Monte Carlo Method , Algorithms , Bayes Theorem , Computer Simulation , Neurons/physiology
11.
Mol Cell Proteomics ; 9(11): 2424-37, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20616184

ABSTRACT

Because of its availability, ease of collection, and correlation with physiology and pathology, urine is an attractive source for clinical proteomics/peptidomics. However, the lack of comparable data sets from large cohorts has greatly hindered the development of clinical proteomics. Here, we report the establishment of a reproducible, high resolution method for peptidome analysis of naturally occurring human urinary peptides and proteins, ranging from 800 to 17,000 Da, using samples from 3,600 individuals analyzed by capillary electrophoresis coupled to MS. All processed data were deposited in a Structured Query Language (SQL) database. This database currently contains 5,010 relevant unique urinary peptides that serve as a pool of potential classifiers for diagnosis and monitoring of various diseases. As an example, by using this source of information, we were able to define urinary peptide biomarkers for chronic kidney diseases, allowing diagnosis of these diseases with high accuracy. Application of the chronic kidney disease-specific biomarker set to an independent test cohort in the subsequent replication phase resulted in 85.5% sensitivity and 100% specificity. These results indicate the potential usefulness of capillary electrophoresis coupled to MS for clinical applications in the analysis of naturally occurring urinary peptides.


Subjects
Biomarkers/urine , Kidney Failure, Chronic , Peptides/urine , Proteomics/methods , Adult , Aged , Databases, Factual , Electrophoresis, Capillary/methods , Female , Humans , Kidney Failure, Chronic/diagnosis , Kidney Failure, Chronic/urine , Male , Mass Spectrometry/methods , Middle Aged , ROC Curve , Young Adult
12.
Nucleic Acids Res ; 38(20): 6831-40, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20571087

ABSTRACT

This article describes and illustrates a novel method of microarray data analysis that couples model-based clustering and binary classification to form clusters of 'response-relevant' genes; that is, genes that are informative when discriminating between the different values of the response. Predictions are subsequently made using an appropriate statistical summary of each gene cluster, which we call the 'meta-covariate' representation of the cluster, in a probit regression model. We first illustrate this method by analysing a leukaemia expression dataset, before focusing closely on the meta-covariate analysis of a renal gene expression dataset in a rat model of salt-sensitive hypertension. We explore the biological insights provided by our analysis of these data. In particular, we identify a highly influential cluster of 13 genes, including three transcription factors (Arntl, Bhlhe41 and Npas2), that is implicated as being protective against hypertension in response to increased dietary sodium. Functional and canonical pathway analysis of this cluster using Ingenuity Pathway Analysis implicated transcriptional activation and circadian rhythm signalling, respectively. Although we illustrate our method using only expression data, the method is applicable to any high-dimensional datasets. Expression data are available at ArrayExpress (accession number E-MEXP-2514) and code is available at http://www.dcs.gla.ac.uk/inference/metacovariateanalysis/.


Subjects
Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Animals , Circadian Rhythm/genetics , Cluster Analysis , Gene Regulatory Networks , Humans , Hypertension/genetics , Hypertension/metabolism , Kidney/metabolism , Leukemia/genetics , Leukemia/metabolism , Rats , Regression Analysis
13.
PLoS Genet ; 4(5): e1000080, 2008 May 23.
Article in English | MEDLINE | ID: mdl-18497862

ABSTRACT

The functional consequences of missense variants in disease genes are difficult to predict. We assessed if gene expression profiles could distinguish between BRCA1 or BRCA2 pathogenic truncating and missense mutation carriers and familial breast cancer cases whose disease was not attributable to BRCA1 or BRCA2 mutations (BRCAX cases). 72 cell lines from affected women in high-risk breast ovarian families were assayed after exposure to ionising irradiation, including 23 BRCA1 carriers, 22 BRCA2 carriers, and 27 BRCAX individuals. A subset of 10 BRCAX individuals carried rare BRCA1/2 sequence variants considered to be of low clinical significance (LCS). BRCA1 and BRCA2 mutation carriers had similar expression profiles, with some subclustering of missense mutation carriers. The majority of BRCAX individuals formed a distinct cluster, but BRCAX individuals with LCS variants had expression profiles similar to BRCA1/2 mutation carriers. Gaussian Process Classifier predicted BRCA1, BRCA2 and BRCAX status, with a maximum of 62% accuracy, and prediction accuracy decreased with inclusion of BRCAX samples carrying an LCS variant, and inclusion of pathogenic missense carriers. Similarly, prediction of mutation status with gene lists derived using Support Vector Machines was good for BRCAX samples without an LCS variant (82-94%), poor for BRCAX with an LCS (40-50%), and improved for pathogenic BRCA1/2 mutation carriers when the gene list used for prediction was appropriate to mutation effect being tested (71-100%). This study indicates that mutation effect, and presence of rare variants possibly associated with a low risk of cancer, must be considered in the development of array-based assays of variant pathogenicity.


Subjects
BRCA1 Protein/genetics , BRCA2 Protein/genetics , Gene Expression/radiation effects , Lymphocytes/physiology , Mutation, Missense , Apoptosis Regulatory Proteins , Cell Line, Tumor , DNA Mutational Analysis , Female , Gene Expression Profiling , Humans , Lymphocytes/radiation effects , Radiation, Ionizing , Tumor Cells, Cultured
14.
Nat Comput Sci ; 1(5): 313-320, 2021 May.
Article in English | MEDLINE | ID: mdl-38217216

ABSTRACT

Mathematical modeling and simulation are moving from being powerful development and analysis tools towards having increased roles in operational monitoring, control and decision support, in which models of specific entities are continually updated in the form of a digital twin. However, current digital twins are largely the result of bespoke technical solutions that are difficult to scale. We discuss two exemplar applications that motivate challenges and opportunities for scaling digital twins, and that underscore potential barriers to wider adoption of this technology.

15.
BMC Bioinformatics ; 11: 594, 2010 Dec 10.
Article in English | MEDLINE | ID: mdl-21208396

ABSTRACT

BACKGROUND: The purpose of this manuscript is to provide, based on an extensive analysis of a proteomic data set, suggestions for proper statistical analysis for the discovery of sets of clinically relevant biomarkers. As a tractable example, we define the measurable proteomic differences between apparently healthy adult males and females. We chose urine as the body fluid of interest and CE-MS, a thoroughly validated platform technology, allowing for routine analysis of a large number of samples. The second urine of the morning was collected from apparently healthy male and female volunteers (aged 21-40) in the course of the routine medical check-up before recruitment at the Hannover Medical School. RESULTS: We found that the Wilcoxon test is best suited for the definition of potential biomarkers. Adjustment for multiple testing is necessary. Sample size estimation can be performed based on a small number of observations via resampling from pilot data. Machine learning algorithms appear ideally suited to generate classifiers. Assessment of any results in an independent test set is essential. CONCLUSIONS: Valid proteomic biomarkers for diagnosis and prognosis can only be defined by applying proper statistical data mining procedures. In particular, a justification of the sample size should be part of the study design.


Subjects
Biomarkers/urine , Proteomics/methods , Adult , Algorithms , Electrophoresis, Capillary , Female , Humans , Male , Mass Spectrometry , Reference Values , Sample Size , Young Adult
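
The abstract above states that adjustment for multiple testing is necessary but does not name a procedure. As one widely used option, and purely as an assumption about what such an adjustment could look like, the sketch below implements the Benjamini-Hochberg step-up adjustment of p-values (for example, one Wilcoxon p-value per candidate peptide). The input values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda k: p_values[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four candidate biomarkers.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
```

Candidates whose adjusted value falls below the chosen false discovery rate would then be carried forward to validation in an independent test set, as the abstract recommends.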
16.
Bioinformatics ; 25(4): 512-8, 2009 Feb 15.
Article in English | MEDLINE | ID: mdl-19095699

ABSTRACT

MOTIVATION: High-accuracy mass spectrometry is a popular technology for high-throughput measurements of cellular metabolites (metabolomics). One of the major challenges is the correct identification of the observed mass peaks, including the assignment of their empirical formula, based on the measured mass. RESULTS: We propose a novel probabilistic method for the assignment of empirical formulas to mass peaks in high-throughput metabolomics mass spectrometry measurements. The method incorporates information about possible biochemical transformations between the empirical formulas to assign higher probability to formulas that could be created from other metabolites in the sample. In a series of experiments, we show that the method performs well and provides greater insight than assignments based on mass alone. In addition, we extend the model to incorporate isotope information to achieve even more reliable formula identification. AVAILABILITY: A supplementary document, Matlab code, data and further information are available from http://www.dcs.gla.ac.uk/inference/metsamp.


Subjects
Mass Spectrometry/methods , Metabolomics/methods , Computer Simulation , Proteome/metabolism
17.
Parasitology ; 137(9): 1333-41, 2010 Aug.
Article in English | MEDLINE | ID: mdl-20444304

ABSTRACT

African trypanosomes have emerged as promising unicellular model organisms for the next generation of systems biology. They offer unique advantages, due to their relative simplicity, the availability of all standard genomics techniques and a long history of quantitative research. Reproducible cultivation methods exist for morphologically and physiologically distinct life-cycle stages. The genome has been sequenced, and microarrays, RNA-interference and high-accuracy metabolomics are available. Furthermore, the availability of extensive kinetic data on all glycolytic enzymes has led to the early development of a complete, experiment-based dynamic model of an important biochemical pathway. Here we describe the achievements of trypanosome systems biology so far and outline the necessary steps towards the ambitious aim of creating a 'Silicon Trypanosome', a comprehensive, experiment-based, multi-scale mathematical model of trypanosome physiology. We expect that, in the long run, the quantitative modelling enabled by the Silicon Trypanosome will play a key role in selecting the most suitable targets for developing new anti-parasite drugs.


Subjects
Parasitology/methods , Parasitology/trends , Systems Biology , Trypanosoma/genetics , Trypanosoma/metabolism , Systems Biology/methods , Systems Biology/trends
18.
Sci Total Environ ; 745: 140846, 2020 Nov 25.
Article in English | MEDLINE | ID: mdl-32717598

ABSTRACT

The increased use of the urban subsurface for competing purposes, such as anthropogenic infrastructures and geothermal energy applications, leads to an urgent need for large-scale sophisticated modelling approaches for coupled mass and heat transfer. However, such models are subject to large uncertainties in model parameters, the physical model itself and in available measured data, which is often rare. Thus, the robustness and reliability of the computer model and its outcomes largely depend on successful parameter estimation and model calibration, which are hampered by the computational burden of large-scale coupled models. To tackle this problem, we develop a novel Bayesian approach for parameter estimation, which allows us to account for different sources of uncertainty, is capable of dealing with sparse field data and makes optimal use of the output data from expensive numerical model runs. This is achieved by combining output data from different models that represent the same physical problem, but at different levels of fidelity, e.g. reflected by different spatial resolution. By applying this new approach to a 1D analytical heat transfer model and a large-scale semi-3D numerical model while using synthetic data, we show that the accuracy and precision of parameter estimation by this multi-fidelity framework by far exceeds the standard single-fidelity results. The consideration of different error terms in the Bayesian framework also allows assessment of the model bias and the discrepancy between the different fidelity levels. These are emulated by Gaussian Process models, which facilitate re-iteration of the parameter estimation without additional model runs.

19.
Bioinformatics ; 24(10): 1264-70, 2008 May 15.
Article in English | MEDLINE | ID: mdl-18378524

ABSTRACT

MOTIVATION: The problems of protein fold recognition and remote homology detection have recently attracted a great deal of interest as they represent challenging multi-feature multi-class problems for which modern pattern recognition methods achieve only modest levels of performance. As with many pattern recognition problems, there are multiple feature spaces or groups of attributes available, such as global characteristics like the amino-acid composition (C), predicted secondary structure (S), hydrophobicity (H), van der Waals volume (V), polarity (P), polarizability (Z), as well as attributes derived from local sequence alignment such as the Smith-Waterman scores. This raises the need for a classification method that is able to assess the contribution of these potentially heterogeneous object descriptors while utilizing such information to improve predictive performance. To that end, we offer a single multi-class kernel machine that informatively combines the available feature groups and, as is demonstrated in this article, is able to provide the state-of-the-art in performance accuracy on the fold recognition problem. Furthermore, the proposed approach provides some insight by assessing the significance of recently introduced protein features and string kernels. The proposed method is well-founded within a Bayesian hierarchical framework and a variational Bayes approximation is derived which allows for efficient CPU processing times. RESULTS: The best performance which we report on the SCOP PDB-40D benchmark data-set is a 70% accuracy by combining all the available feature groups from global protein characteristics but also including sequence-alignment features. We offer an 8% improvement on the best reported performance that combines multi-class k-nn classifiers while at the same time reducing computational costs and assessing the predictive power of the various available features. 
Furthermore, we examine the performance of our methodology on the SCOP 1.53 benchmark data-set that simulates remote homology detection and examine the combination of various state-of-the-art string kernels that have recently been proposed.


Subjects
Artificial Intelligence , Models, Chemical , Pattern Recognition, Automated/methods , Proteins/chemistry , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid , Algorithms , Amino Acid Sequence , Computer Simulation , Data Interpretation, Statistical , Molecular Sequence Data , Protein Folding
20.
Bioinformatics ; 24(6): 833-9, 2008 Mar 15.
Article in English | MEDLINE | ID: mdl-18057018

ABSTRACT

MOTIVATION: There are often many alternative models of a biochemical system. Distinguishing models and finding the most suitable ones is an important challenge in Systems Biology, as such model ranking, by experimental evidence, will help to judge the support of the working hypotheses forming each model. Bayes factors are employed as a measure of evidential preference for one model over another. Marginal likelihood is a key component of Bayes factors; however, computing the marginal likelihood is a difficult problem, as it involves integration of nonlinear functions in multidimensional space. There are a number of methods available to compute the marginal likelihood approximately. A detailed investigation of such methods is required to find ones that perform appropriately for biochemical modelling. RESULTS: We assess four methods for estimation of the marginal likelihoods required for computing Bayes factors. The Prior Arithmetic Mean estimator, the Posterior Harmonic Mean estimator, the Annealed Importance Sampling and the Annealing-Melting Integration methods are investigated and compared on a typical case study in Systems Biology. This allows us to understand the stability of the analysis results and make reliable judgements in an uncertain context. We investigate the variance of Bayes factor estimates, and highlight the stability of the Annealed Importance Sampling and the Annealing-Melting Integration methods for the purposes of comparing nonlinear models. AVAILABILITY: Models used in this study are available in SBML format as the supplementary material to this article.


Subjects
Algorithms , Artificial Intelligence , Models, Biological , Proteome/metabolism , Signal Transduction/physiology , Bayes Theorem , Biochemistry/methods , Computer Simulation , Pattern Recognition, Automated/methods
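
Two of the estimators named in the abstract above, the Prior Arithmetic Mean and the Posterior Harmonic Mean, can be sketched on a toy problem where the marginal likelihood is known exactly. The model here is an assumption chosen for illustration and is not the biochemical case study from the paper: a single observation y ~ N(theta, 1) with prior theta ~ N(0, 1), so the posterior is N(y/2, 1/2) and the exact marginal likelihood is the N(0, 2) density at y.

```python
import math
import random


def likelihood(theta, y):
    """Density of y under N(theta, 1)."""
    return math.exp(-0.5 * (y - theta) ** 2) / math.sqrt(2.0 * math.pi)


def prior_arithmetic_mean(y, n, rng):
    """Average the likelihood over draws from the prior N(0, 1)."""
    return sum(likelihood(rng.gauss(0.0, 1.0), y) for _ in range(n)) / n


def posterior_harmonic_mean(y, n, rng):
    """Harmonic mean of the likelihood over exact posterior draws.

    For this toy model the posterior is N(y/2, 1/2), sampled directly.
    """
    post_mean, post_sd = y / 2.0, math.sqrt(0.5)
    inverse_mean = sum(
        1.0 / likelihood(rng.gauss(post_mean, post_sd), y) for _ in range(n)
    ) / n
    return 1.0 / inverse_mean


def exact_evidence(y):
    """Exact marginal likelihood: density of y under N(0, 2)."""
    return math.exp(-y * y / 4.0) / math.sqrt(4.0 * math.pi)


rng = random.Random(1)
y_obs = 1.0  # hypothetical observation
pam = prior_arithmetic_mean(y_obs, 20000, rng)
phm = posterior_harmonic_mean(y_obs, 20000, rng)
truth = exact_evidence(y_obs)
```

Even in this toy model the harmonic mean estimator has infinite variance, so repeated runs with different seeds can scatter widely around `truth`. That kind of instability is one reason to compare the variance of Bayes factor estimates across methods, as the abstract describes.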