ABSTRACT
Reconstructing biological networks using high-throughput technologies has the potential to produce condition-specific interactomes. But are these reconstructed networks a reliable source of biological interactions? Do some network inference methods offer dramatically improved performance on certain types of networks? To facilitate the use of network inference methods in systems biology, we report a large-scale simulation study comparing the ability of Markov chain Monte Carlo (MCMC) samplers to reverse engineer Bayesian networks. The MCMC samplers we investigated included foundational and state-of-the-art Metropolis-Hastings and Gibbs sampling approaches, as well as novel samplers we have designed. To enable a comprehensive comparison, we simulated gene expression and genetics data from known network structures under a range of biologically plausible scenarios. We examine the overall quality of network inference via different methods, as well as how their performance is affected by network characteristics. Our simulations reveal that network size, edge density, and strength of gene-to-gene signaling are major parameters that differentiate the performance of various samplers. Specifically, more recent samplers including our novel methods outperform traditional samplers for highly interconnected large networks with strong gene-to-gene signaling. Our newly developed samplers show comparable or superior performance to the top existing methods. Moreover, this performance gain is strongest in networks with biologically oriented topology, which indicates that our novel samplers are suitable for inferring biological networks. The performance of MCMC samplers in this simulation framework can guide the choice of methods for network reconstruction using systems genetics data.
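As a rough illustration of the kind of sampler the abstract refers to, the sketch below runs a basic Metropolis-Hastings walk over directed acyclic graphs using single-edge toggles. It is not one of the samplers compared in the study: the names `structure_mcmc` and `toy_log_score`, the reference edge set, and the weight are hypothetical stand-ins, and a real analysis would replace the toy score with a marginal likelihood computed from expression and genetics data.

```python
import math
import random


def has_path(edges, src, dst):
    """Return True if a directed path src -> ... -> dst exists in `edges`."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(child for parent, child in edges if parent == node)
    return False


def toy_log_score(edges, true_edges, weight=3.0):
    """Hypothetical log-score: rewards edges in a reference set, penalizes others."""
    hits = len(edges & true_edges)
    misses = len(edges - true_edges)
    return weight * (hits - misses)


def structure_mcmc(n_nodes, log_score, n_steps, rng):
    """Metropolis-Hastings over DAGs with a symmetric edge-toggle proposal."""
    graph = frozenset()
    current = log_score(graph)
    samples = []
    for _ in range(n_steps):
        i, j = rng.sample(range(n_nodes), 2)  # ordered pair, i != j
        if (i, j) in graph:
            proposal = graph - {(i, j)}       # propose deleting the edge
        elif has_path(graph, j, i):
            proposal = None                   # adding i -> j would close a cycle
        else:
            proposal = graph | {(i, j)}       # propose adding the edge
        if proposal is not None:
            proposed = log_score(proposal)
            # Symmetric proposal, so the acceptance ratio is the score ratio.
            if rng.random() < math.exp(min(0.0, proposed - current)):
                graph, current = proposal, proposed
        samples.append(graph)
    return samples


# Usage with an assumed 4-node chain as the reference structure.
rng = random.Random(0)
true_edges = frozenset({(0, 1), (1, 2), (2, 3)})
samples = structure_mcmc(4, lambda g: toy_log_score(g, true_edges), 5000, rng)

# Estimate how often each edge appears across the sampled graphs.
edge_freq = {}
for graph in samples:
    for edge in graph:
        edge_freq[edge] = edge_freq.get(edge, 0.0) + 1.0 / len(samples)
```

Because the toggle proposal is symmetric and cyclic proposals are simply rejected, the chain targets the distribution proportional to the exponentiated score restricted to acyclic graphs. The edge frequencies serve as a simple summary of the sampled structures.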
Subjects
Algorithms, Gene Regulatory Networks, Genetic Models, Bayes Theorem, Markov Chains, Monte Carlo Method
ABSTRACT
Although molecular prognostics in breast cancer are among the most successful examples of translating genomic analysis to clinical applications, optimal approaches to breast cancer clinical risk prediction remain controversial. The Sage Bionetworks-DREAM Breast Cancer Prognosis Challenge (BCC) is a crowdsourced research study for breast cancer prognostic modeling using genome-scale data. The BCC provided a community of data analysts with a common platform for data access and blinded evaluation of model accuracy in predicting breast cancer survival on the basis of gene expression data, copy number data, and clinical covariates. This approach offered the opportunity to assess whether a crowdsourced community Challenge would generate models of breast cancer prognosis commensurate with or exceeding current best-in-class approaches. The BCC comprised multiple rounds of blinded evaluations on held-out portions of data on 1981 patients, resulting in more than 1400 models submitted as open source code. Participants then retrained their models on the full data set of 1981 samples and submitted up to five models for validation in a newly generated data set of 184 breast cancer patients. Analysis of the BCC results suggests that the best-performing modeling strategy outperformed previously reported methods in blinded evaluations; model performance was consistent across several independent evaluations; and aggregating community-developed models achieved performance on par with the best-performing individual models.
Subjects
Breast Neoplasms/diagnosis, Breast Neoplasms/genetics, Biological Models, Genetic Databases, Female, Humans, Middle Aged, Prognosis, Survival Analysis, Time Factors
ABSTRACT
Kinetic Monte Carlo on coarse-grained systems, such as nucleic acid secondary structure, is advantageous because it can access behavior at long time scales, even minutes or hours. Transition rates between coarse-grained states depend upon intermediate barriers, which are not directly simulated. We propose an Arrhenius rate model and an intermediate energy model that incorporates the effects of the barrier between simulated states without enlarging the state space itself. Applying our Arrhenius rate model to DNA hairpin folding, we demonstrate improved agreement with experiment compared to the usual kinetic Monte Carlo model. Further improvement results from including rigidity of single-stranded stacking.
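The general idea can be sketched with the textbook Arrhenius form, k = A exp(-E_a / (R T)), driving a standard kinetic Monte Carlo (Gillespie-style) step. This is a minimal illustration under stated assumptions, not the authors' parameterization: the two-state "open"/"closed" system, the prefactor, and the barrier heights below are hypothetical values chosen only to make the example run.

```python
import math
import random

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def arrhenius_rate(prefactor, activation_energy, temperature):
    """Arrhenius rate k = A * exp(-Ea / (R*T)); Ea in kcal/mol, T in kelvin."""
    return prefactor * math.exp(-activation_energy / (R_KCAL * temperature))


def kmc_step(rates, rng):
    """Pick the next state in proportion to its rate and draw the waiting time.

    `rates` maps each reachable state to its transition rate (1/s).
    """
    total = sum(rates.values())
    dt = rng.expovariate(total)        # exponential waiting time
    threshold = rng.random() * total
    cumulative = 0.0
    for state, rate in rates.items():
        cumulative += rate
        if threshold < cumulative:
            return state, dt
    return state, dt                   # guard against floating-point round-off


def simulate(transitions, start, t_end, rng):
    """Run kinetic Monte Carlo from `start` until simulated time exceeds `t_end`."""
    state, t = start, 0.0
    trajectory = [(t, state)]
    while True:
        next_state, dt = kmc_step(transitions[state], rng)
        if t + dt > t_end:
            break
        t += dt
        state = next_state
        trajectory.append((t, state))
    return trajectory


# Hypothetical barriers (kcal/mol) and prefactor (1/s) for a two-state hairpin.
T = 298.15
transitions = {
    "open": {"closed": arrhenius_rate(1e6, 5.0, T)},
    "closed": {"open": arrhenius_rate(1e6, 8.0, T)},
}
trajectory = simulate(transitions, "open", 10.0, random.Random(1))
```

The barrier enters only through the rate, so the simulated state space stays at two states while the kinetics still reflect the height of the intermediate barrier.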