Results 1 - 20 of 38
1.
PLoS Genet ; 18(1): e1009975, 2022 01.
Article in English | MEDLINE | ID: mdl-35085229

ABSTRACT

Clustering genetic variants based on their associations with different traits can provide insight into their underlying biological mechanisms. Existing clustering approaches typically group variants based on the similarity of their association estimates for various traits. We present a new procedure for clustering variants based on their proportional associations with different traits, which is more reflective of the underlying mechanisms to which they relate. The method is based on a mixture model approach for directional clustering and includes a noise cluster that provides robustness to outliers. The procedure performs well across a range of simulation scenarios. In an applied setting, clustering genetic variants associated with body mass index generates groups reflective of distinct biological pathways. Mendelian randomization analyses support that the clusters vary in their effect on coronary heart disease, including one cluster that represents elevated body mass index with a favourable metabolic profile and reduced coronary heart disease risk. Analysis of the biological pathways underlying this cluster identifies inflammation as potentially explaining differences in the effects of increased body mass index on coronary heart disease.
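
To illustrate why proportional associations call for a directional notion of similarity, the sketch below compares Euclidean distance with cosine similarity for hypothetical variants. The association values and function names are invented for illustration; this is not the authors' mixture model, only the geometric idea that two variants with the same proportions across traits point in the same direction even when their magnitudes differ.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two association vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean_distance(u, v):
    """Ordinary straight-line distance between two association vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical per-trait association estimates (illustrative values only).
variant_a = [0.10, 0.20, -0.05]
variant_b = [0.30, 0.60, -0.15]   # three times variant_a: same proportions
variant_c = [0.10, -0.20, 0.05]   # similar magnitude, different direction

sim_ab = cosine_similarity(variant_a, variant_b)
sim_ac = cosine_similarity(variant_a, variant_c)
dist_ab = euclidean_distance(variant_a, variant_b)
dist_ac = euclidean_distance(variant_a, variant_c)
```

In this toy example Euclidean distance places variant_a closer to variant_c than to variant_b, whereas cosine similarity treats variant_a and variant_b as pointing in the same direction.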


Subjects
Computational Biology/methods , Genetic Variation , Obesity/genetics , Body Mass Index , Cluster Analysis , Genetic Predisposition to Disease , Genome-Wide Association Study , Humans , Mendelian Randomization Analysis , Models, Genetic
2.
BMC Bioinformatics ; 24(1): 161, 2023 Apr 21.
Article in English | MEDLINE | ID: mdl-37085771

ABSTRACT

In this paper we propose PIICM, a probabilistic framework for dose-response prediction in high-throughput drug combination datasets. PIICM utilizes a permutation invariant version of the intrinsic co-regionalization model for multi-output Gaussian process regression, to predict dose-response surfaces in untested drug combination experiments. Coupled with an observation model that incorporates experimental uncertainty, PIICM is able to learn from noisily observed cell-viability measurements in settings where the underlying dose-response experiments are of varying quality, utilize different experimental designs, and the resulting training dataset is sparsely observed. We show that the model can accurately predict dose-response in held out experiments, and the resulting function captures relevant features indicating synergistic interaction between drugs.


Subjects
Research Design , Uncertainty , Drug Combinations
3.
Biostatistics ; 24(1): 85-107, 2022 12 12.
Article in English | MEDLINE | ID: mdl-34363680

ABSTRACT

Risk prediction models are a crucial tool in healthcare. Risk prediction models with a binary outcome (i.e., binary classification models) are often constructed using methodology which assumes the costs of different classification errors are equal. In many healthcare applications, this assumption is not valid, and the differences between misclassification costs can be quite large. For instance, in a diagnostic setting, the cost of misdiagnosing a person with a life-threatening disease as healthy may be larger than the cost of misdiagnosing a healthy person as a patient. In this article, we present Tailored Bayes (TB), a novel Bayesian inference framework which "tailors" model fitting to optimize predictive performance with respect to unbalanced misclassification costs. We use simulation studies to showcase when TB is expected to outperform standard Bayesian methods in the context of logistic regression. We then apply TB to three real-world applications, a cardiac surgery, a breast cancer prognostication task, and a breast cancer tumor classification task and demonstrate the improvement in predictive performance over standard methods.
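
The effect of unequal misclassification costs can be seen in the textbook decision rule sketched below. This is not the Tailored Bayes procedure, which according to the abstract tailors the model fitting itself; it only shows how the cost-minimising probability threshold moves away from 0.5 when a false negative is costlier than a false positive. The function names and cost values are assumptions made for illustration.

```python
def cost_optimal_threshold(cost_false_positive, cost_false_negative):
    """Probability threshold that minimises expected misclassification cost.

    Predicting "positive" is cheaper in expectation when
    p * cost_false_negative > (1 - p) * cost_false_positive,
    i.e. when p exceeds the value returned here.
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)

def classify(prob_positive, threshold):
    """Return 1 (positive) if the predicted probability exceeds the threshold."""
    return 1 if prob_positive > threshold else 0

# Hypothetical costs: equal costs, then a false negative nine times as costly.
equal_costs = cost_optimal_threshold(1.0, 1.0)
unequal_costs = cost_optimal_threshold(1.0, 9.0)

# A patient with a 20% predicted risk is labelled differently under each rule.
label_equal = classify(0.2, equal_costs)
label_unequal = classify(0.2, unequal_costs)
```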


Subjects
Breast Neoplasms , Models, Statistical , Humans , Female , Bayes Theorem , Logistic Models , Computer Simulation , Breast Neoplasms/diagnosis
4.
Bioinformatics ; 38(9): 2529-2535, 2022 04 28.
Article in English | MEDLINE | ID: mdl-35191485

ABSTRACT

MOTIVATION: Inferring the parameters of models describing biological systems is an important problem in the reverse engineering of the mechanisms underlying these systems. Much work has focused on parameter inference of stochastic and ordinary differential equation models using Approximate Bayesian Computation (ABC). While there is some recent work on inference in spatial models, this remains an open problem. Simultaneously, advances in topological data analysis (TDA), a field of computational mathematics, have enabled spatial patterns in data to be characterized. RESULTS: Here, we focus on recent work using TDA to study different regimes of parameter space for a well-studied model of angiogenesis. We propose a method for combining TDA with ABC to infer parameters in the Anderson-Chaplain model of angiogenesis. We demonstrate that this topological approach outperforms ABC approaches that use simpler statistics based on spatial features of the data. This is a first step toward a general framework of spatial parameter inference for biological systems, for which there may be a variety of filtrations, vectorizations and summary statistics to be considered. AVAILABILITY AND IMPLEMENTATION: All code used to produce our results is available as a Snakemake workflow from github.com/tt104/tabc_angio.
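
For readers unfamiliar with ABC, a minimal rejection-sampling sketch follows. It uses a toy Gaussian simulator and a scalar summary statistic (the sample mean) in place of the angiogenesis model and topological summaries described in the abstract; the simulator, prior, tolerance and all names are hypothetical and are not taken from the authors' Snakemake workflow.

```python
import random
import statistics

def abc_rejection(observed_summary, simulate, prior_sample, tolerance, n_draws, rng):
    """Keep prior draws whose simulated summary is within `tolerance` of the observed one."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if abs(simulate(theta, rng) - observed_summary) <= tolerance:
            accepted.append(theta)
    return accepted

def simulate_mean(theta, rng, n=20):
    # Toy simulator (assumption): sample mean of n Gaussian observations centred on theta.
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n))

rng = random.Random(0)  # fixed seed so the sketch is reproducible
posterior_draws = abc_rejection(
    observed_summary=3.0,
    simulate=simulate_mean,
    prior_sample=lambda r: r.uniform(0.0, 10.0),
    tolerance=0.5,
    n_draws=2000,
    rng=rng,
)
```

The accepted draws approximate the posterior; swapping in a richer summary statistic changes only the `simulate` function and the distance comparison.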


Subjects
Algorithms , Bayes Theorem , Computer Simulation
5.
BMC Bioinformatics ; 23(1): 290, 2022 Jul 21.
Article in English | MEDLINE | ID: mdl-35864476

ABSTRACT

BACKGROUND: Cluster analysis is an integral part of precision medicine and systems biology, used to define groups of patients or biomolecules. Consensus clustering is an ensemble approach that is widely used in these areas, which combines the output from multiple runs of a non-deterministic clustering algorithm. Here we consider the application of consensus clustering to a broad class of heuristic clustering algorithms that can be derived from Bayesian mixture models (and extensions thereof) by adopting an early stopping criterion when performing sampling-based inference for these models. While the resulting approach is non-Bayesian, it inherits the usual benefits of consensus clustering, particularly in terms of computational scalability and providing assessments of clustering stability/robustness. RESULTS: In simulation studies, we show that our approach can successfully uncover the target clustering structure, while also exploring different plausible clusterings of the data. We show that, when a parallel computation environment is available, our approach offers significant reductions in runtime compared to performing sampling-based Bayesian inference for the underlying model, while retaining many of the practical benefits of the Bayesian approach, such as exploring different numbers of clusters. We propose a heuristic to decide upon ensemble size and the early stopping criterion, and then apply consensus clustering to a clustering algorithm derived from a Bayesian integrative clustering method. We use the resulting approach to perform an integrative analysis of three 'omics datasets for budding yeast and find clusters of co-expressed genes with shared regulatory proteins. We validate these clusters using data external to the analysis. 
CONCLUSIONS: Our approach can be used as a wrapper for essentially any existing sampling-based Bayesian clustering implementation, and enables meaningful clustering analyses to be performed using such implementations, even when computational Bayesian inference is not feasible, e.g. due to poor exploration of the target density (often as a result of increasing numbers of features) or a limited computational budget that does not allow sufficient samples to be drawn from a single chain. This enables researchers to straightforwardly extend the applicability of existing software to much larger datasets, including implementations of sophisticated models such as those that jointly model multiple datasets.
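
The core object in consensus clustering is the consensus matrix, which records how often each pair of items is placed in the same cluster across runs. The sketch below computes it from a list of label vectors; the three runs are made-up data, and the early-stopping and ensemble-size heuristics described in the abstract are not reproduced here.

```python
def consensus_matrix(partitions):
    """Fraction of runs in which each pair of items shares a cluster label."""
    n_runs = len(partitions)
    n_items = len(partitions[0])
    counts = [[0] * n_items for _ in range(n_items)]
    for labels in partitions:
        for i in range(n_items):
            for j in range(n_items):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]

# Three hypothetical clustering runs over four items.
# Label values are arbitrary within each run; only co-membership matters.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
consensus = consensus_matrix(runs)
```

Entries near 1 or 0 indicate pairs that are stably clustered together or apart, while intermediate values flag unstable assignments.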


Subjects
Algorithms , Software , Bayes Theorem , Cluster Analysis , Consensus , Humans
6.
Bioinformatics ; 37(4): 531-541, 2021 05 01.
Article in English | MEDLINE | ID: mdl-32915962

ABSTRACT

MOTIVATION: Mendelian randomization is an epidemiological technique that uses genetic variants as instrumental variables to estimate the causal effect of a risk factor on an outcome. We consider a scenario in which causal estimates based on each variant in turn differ more strongly than expected by chance alone, but the variants can be divided into distinct clusters, such that all variants in the cluster have similar causal estimates. This scenario is likely to occur when there are several distinct causal mechanisms by which a risk factor influences an outcome with different magnitudes of causal effect. We have developed an algorithm MR-Clust that finds such clusters of variants, and so can identify variants that reflect distinct causal mechanisms. Two features of our clustering algorithm are that it accounts for differential uncertainty in the causal estimates, and it includes 'null' and 'junk' clusters, to provide protection against the detection of spurious clusters. RESULTS: Our algorithm correctly detected the number of clusters in a simulation analysis, outperforming methods that either do not account for uncertainty or do not include null and junk clusters. In an applied example considering the effect of blood pressure on coronary artery disease risk, the method detected four clusters of genetic variants. A post hoc hypothesis-generating search suggested that variants in the cluster with a negative effect of blood pressure on coronary artery disease risk were more strongly related to trunk fat percentage and other adiposity measures than variants not in this cluster. AVAILABILITY AND IMPLEMENTATION: MR-Clust can be downloaded from https://github.com/cnfoley/mrclust. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
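
The per-variant causal estimates that such a clustering operates on are commonly obtained as ratio (Wald) estimates; the abstract does not state which estimator MR-Clust uses, so treating it as the ratio estimate is an assumption here. The sketch below computes the estimate and its first-order standard error from hypothetical summary statistics, and does not call the mrclust package.

```python
def ratio_estimate(beta_exposure, beta_outcome, se_outcome):
    """Per-variant ratio estimate and its first-order standard error."""
    estimate = beta_outcome / beta_exposure
    standard_error = se_outcome / abs(beta_exposure)
    return estimate, standard_error

# Hypothetical summary statistics: (beta_exposure, beta_outcome, se_outcome).
variants = {
    "variant_1": (0.10, 0.050, 0.010),
    "variant_2": (0.20, 0.100, 0.010),
    "variant_3": (0.10, -0.030, 0.010),
}
estimates = {name: ratio_estimate(*stats) for name, stats in variants.items()}
```

Here variant_1 and variant_2 share the same causal estimate but with different precision, while variant_3 has an estimate of opposite sign; this is the kind of pattern a clustering that accounts for differential uncertainty is meant to separate.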


Subjects
Mendelian Randomization Analysis , Causality , Cluster Analysis , Computer Simulation , Risk Factors
7.
Bioinformatics ; 36(18): 4789-4796, 2020 09 15.
Article in English | MEDLINE | ID: mdl-32592464

ABSTRACT

MOTIVATION: Diverse applications, particularly in tumour subtyping, have demonstrated the importance of integrative clustering techniques for combining information from multiple data sources. Cluster Of Clusters Analysis (COCA) is one such approach that has been widely applied in the context of tumour subtyping. However, the properties of COCA have never been systematically explored, and its robustness to the inclusion of noisy datasets is unclear. RESULTS: We rigorously benchmark COCA, and present Kernel Learning Integrative Clustering (KLIC) as an alternative strategy. KLIC frames the challenge of combining clustering structures as a multiple kernel learning problem, in which different datasets each provide a weighted contribution to the final clustering. This allows the contribution of noisy datasets to be down-weighted relative to more informative datasets. We compare the performances of KLIC and COCA in a variety of situations through simulation studies. We also present the output of KLIC and COCA in real data applications to cancer subtyping and transcriptional module discovery. AVAILABILITY AND IMPLEMENTATION: R packages klic and coca are available on the Comprehensive R Archive Network. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Neoplasms , Algorithms , Cluster Analysis , Consensus , Humans , Information Storage and Retrieval , Neoplasms/genetics
8.
Bioinformatics ; 36(5): 1484-1491, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31608923

ABSTRACT

MOTIVATION: Many methods have been developed to cluster genes on the basis of their changes in mRNA expression over time, using bulk RNA-seq or microarray data. However, single-cell data may present a particular challenge for these algorithms, since the temporal ordering of cells is not directly observed. One way to address this is to first use pseudotime methods to order the cells, and then apply clustering techniques for time course data. However, pseudotime estimates are subject to high levels of uncertainty, and failing to account for this uncertainty is liable to lead to erroneous and/or over-confident gene clusters. RESULTS: The proposed method, GPseudoClust, is a novel approach that jointly infers pseudotemporal ordering and gene clusters, and quantifies the uncertainty in both. GPseudoClust combines a recent method for pseudotime inference with non-parametric Bayesian clustering methods, efficient Markov Chain Monte Carlo sampling and novel subsampling strategies which aid computation. We consider a broad array of simulated and experimental datasets to demonstrate the effectiveness of GPseudoClust in a range of settings. AVAILABILITY AND IMPLEMENTATION: An implementation is available on GitHub: https://github.com/magStra/nonparametricSummaryPSM and https://github.com/magStra/GPseudoClust. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Single-Cell Analysis , Bayes Theorem , Cluster Analysis , Markov Chains
9.
PLoS Comput Biol ; 16(11): e1008288, 2020 11.
Article in English | MEDLINE | ID: mdl-33166281

ABSTRACT

The cell is compartmentalised into complex micro-environments allowing an array of specialised biological processes to be carried out in synchrony. Determining a protein's sub-cellular localisation to one or more of these compartments can therefore be a first step in determining its function. High-throughput and high-accuracy mass spectrometry-based sub-cellular proteomic methods can now shed light on the localisation of thousands of proteins at once. Machine learning algorithms are then typically employed to make protein-organelle assignments. However, these algorithms are limited by insufficient and incomplete annotation. We propose a semi-supervised Bayesian approach to novelty detection, allowing the discovery of additional, previously unannotated sub-cellular niches. Inference in our model is performed in a Bayesian framework, allowing us to quantify uncertainty in the allocation of proteins to new sub-cellular niches, as well as in the number of newly discovered compartments. We apply our approach across 10 mass spectrometry based spatial proteomic datasets, representing a diverse range of experimental protocols. Application of our approach to hyperLOPIT datasets validates its utility by recovering enrichment with chromatin-associated proteins without annotation and uncovers sub-nuclear compartmentalisation which was not identified in the original analysis. Moreover, using sub-cellular proteomics data from Saccharomyces cerevisiae, we uncover a novel group of proteins trafficking from the ER to the early Golgi apparatus. Overall, we demonstrate the potential for novelty detection to yield biologically relevant niches that are missed by current approaches.


Subjects
Bayes Theorem , Saccharomyces cerevisiae Proteins/metabolism , Subcellular Fractions/metabolism , Algorithms , Animals , Datasets as Topic , Humans , Machine Learning , Mass Spectrometry , Mice , Proteomics
10.
Stat Appl Genet Mol Biol ; 18(6)2019 12 12.
Article in English | MEDLINE | ID: mdl-31829970

ABSTRACT

The Dirichlet Process (DP) mixture model has become a popular choice for model-based clustering, largely because it allows the number of clusters to be inferred. The sequential updating and greedy search (SUGS) algorithm (Wang & Dunson, 2011) was proposed as a fast method for performing approximate Bayesian inference in DP mixture models, by posing clustering as a Bayesian model selection (BMS) problem and avoiding the use of computationally costly Markov chain Monte Carlo methods. Here we consider how this approach may be extended to permit variable selection for clustering, and also demonstrate the benefits of Bayesian model averaging (BMA) in place of BMS. Through an array of simulation examples and well-studied examples from cancer transcriptomics, we show that our method performs competitively with the current state-of-the-art, while also offering computational benefits. We apply our approach to reverse-phase protein array (RPPA) data from The Cancer Genome Atlas (TCGA) in order to perform a pan-cancer proteomic characterisation of 5157 tumour samples. We have implemented our approach, together with the original SUGS algorithm, in an open-source R package named sugsvarsel, which accelerates analysis by performing intensive computations in C++ and provides automated parallel processing. The R package is freely available from: https://github.com/ococrook/sugsvarsel.


Subjects
Computational Biology , Models, Statistical , Neoplasms/metabolism , Proteome , Proteomics , Algorithms , Bayes Theorem , Computational Biology/methods , Humans , Proteomics/methods
11.
PLoS Comput Biol ; 14(11): e1006516, 2018 11.
Article in English | MEDLINE | ID: mdl-30481170

ABSTRACT

Analysis of the spatial sub-cellular distribution of proteins is of vital importance to fully understand context specific protein function. Some proteins can be found at a single location within a cell, but up to half of proteins may reside in multiple locations, can dynamically re-localise, or reside within an unknown functional compartment. These considerations lead to uncertainty in associating a protein to a single location. Currently, mass spectrometry (MS) based spatial proteomics relies on supervised machine learning algorithms to assign proteins to sub-cellular locations based on common gradient profiles. However, such methods fail to quantify uncertainty associated with sub-cellular class assignment. Here we reformulate the framework on which we perform statistical analysis. We propose a Bayesian generative classifier based on Gaussian mixture models to assign proteins probabilistically to sub-cellular niches, so that proteins have a probability distribution over sub-cellular locations, with Bayesian computation performed using the expectation-maximisation (EM) algorithm, as well as Markov-chain Monte-Carlo (MCMC). Our methodology allows proteome-wide uncertainty quantification, thus adding a further layer to the analysis of spatial proteomics. Our framework is flexible, allowing many different systems to be analysed and reveals new modelling opportunities for spatial proteomics. We find our methods perform competitively with current state-of-the-art machine learning methods, whilst simultaneously providing more information. We highlight several examples where classification based on the support vector machine is unable to make any conclusions, while uncertainty quantification using our approach provides biologically intriguing results. To our knowledge this is the first Bayesian model of MS-based spatial proteomics data.


Subjects
Bayes Theorem , Models, Theoretical , Proteomics , Algorithms , Animals , Embryonic Stem Cells/metabolism , Machine Learning , Mice , Reproducibility of Results , Subcellular Fractions/metabolism , Uncertainty
12.
Bioinformatics ; 32(18): 2863-5, 2016 09 15.
Article in English | MEDLINE | ID: mdl-27153663

ABSTRACT

MOTIVATION: Many biochemical systems require stochastic descriptions. Unfortunately, these can only be solved for the simplest cases and their direct simulation can become prohibitively expensive, precluding thorough analysis. As an alternative, moment closure approximation methods generate equations for the time-evolution of the system's moments and apply a closure ansatz to obtain a closed set of differential equations that can become the basis for the deterministic analysis of the moments of the outputs of stochastic systems. RESULTS: We present a free, user-friendly tool implementing an efficient moment expansion approximation with parametric closures that integrates well with the IPython interactive environment. Our package enables the analysis of complex stochastic systems without any constraints on the number of species and moments studied and the type of rate laws in the system. In addition to the approximation method our package provides numerous tools to help non-expert users in stochastic analysis. AVAILABILITY AND IMPLEMENTATION: https://github.com/theosysbio/means CONTACTS: m.stumpf@imperial.ac.uk or e.lakatos13@imperial.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Software , Stochastic Processes , Computer Simulation , Gene Expression , Kinetics , Models, Statistical
13.
Stat Appl Genet Mol Biol ; 15(2): 107-22, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26992203

ABSTRACT

The rapid development of high throughput experimental techniques has resulted in a growing diversity of genomic datasets being produced and requiring analysis. Therefore, it is increasingly being recognized that we can gain deeper understanding about underlying biology by combining the insights obtained from multiple, diverse datasets. Thus we propose a novel scalable computational approach to unsupervised data fusion. Our technique exploits network representations of the data to identify similarities among the datasets. We may work within the Bayesian formalism, using Bayesian nonparametric approaches to model each dataset; or (for fast, approximate, and massive scale data fusion) can naturally switch to more heuristic modeling techniques. An advantage of the proposed approach is that each dataset can initially be modeled independently (in parallel), before applying a fast post-processing step to perform data integration. This allows us to incorporate new experimental data in an online fashion, without having to rerun all of the analysis. We first demonstrate the applicability of our tool on artificial data, and then on examples from the literature, which include yeast cell cycle, breast cancer and sporadic inclusion body myositis datasets.


Subjects
Computational Biology , Databases, Genetic , Genomics , Saccharomyces cerevisiae/genetics , Algorithms , Bayes Theorem , Humans , Models, Theoretical
14.
Stat Appl Genet Mol Biol ; 15(1): 83-6, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26910751

ABSTRACT

The integration of multi-dimensional datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct, but often complementary, information. However, the large amount of data adds burden to any inference task. Flexible Bayesian methods may reduce the necessity for strong modelling assumptions, but can also increase the computational burden. We present an improved implementation of a Bayesian correlated clustering algorithm that permits integrated clustering to be routinely performed across multiple datasets, each with tens of thousands of items. By exploiting GPU-based computation, we are able to improve runtime performance of the algorithm by almost four orders of magnitude. This permits analysis across genomic-scale data sets, greatly expanding the range of applications over those originally possible. MDI is available here: http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/.


Subjects
Computational Biology/methods , Genomics/methods , Algorithms , Cluster Analysis , Markov Chains , Monte Carlo Method , Software , Systems Biology/methods
15.
PLoS Comput Biol ; 10(6): e1003650, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24922483

ABSTRACT

Experimental design attempts to maximise the information available for modelling tasks. An optimal experiment allows the inferred models or parameters to be chosen with the highest expected degree of confidence. If the true system is faithfully reproduced by one of the models, the merit of this approach is clear: we simply wish to identify it and the true parameters with the most certainty. However, in the more realistic situation where all models are incorrect or incomplete, the interpretation of model selection outcomes and the role of experimental design need to be examined more carefully. Using a novel experimental design and model selection framework for stochastic state-space models, we perform high-throughput in-silico analyses on families of gene regulatory cascade models, to show that the selected model can depend on the experiment performed. We observe that experimental design thus makes confidence a criterion for model choice, but that this does not necessarily correlate with a model's predictive power or correctness. Finally, in the special case of linear ordinary differential equation (ODE) models, we explore how wrong a model has to be before it influences the conclusions of a model selection analysis.


Subjects
Models, Biological , Systems Biology , Computational Biology , Computer Simulation , Mathematical Concepts , Monte Carlo Method , Signal Transduction
16.
J Chem Phys ; 143(9): 094107, 2015 Sep 07.
Article in English | MEDLINE | ID: mdl-26342359

ABSTRACT

Stochastic effects dominate many chemical and biochemical processes. Their analysis, however, can be prohibitively expensive computationally, and a range of approximation schemes have been proposed to lighten the computational burden. These, notably the increasingly popular linear noise approximation and the more general moment expansion methods, perform well for many dynamical regimes, especially linear systems. At higher levels of nonlinearity, an interplay arises between the nonlinearities and the stochastic dynamics, which is much harder to capture correctly by such approximations to the true stochastic processes. Moment-closure approaches promise to address this problem by capturing higher-order terms of the temporally evolving probability distribution. Here, we develop a set of multivariate moment-closures that allows us to describe the stochastic dynamics of nonlinear systems. Multivariate closure captures the way that correlations between different molecular species, induced by the reaction dynamics, interact with stochastic effects. We use multivariate Gaussian, gamma, and lognormal closure and illustrate their use in the context of two models that have proved challenging to the previous attempts at approximating stochastic dynamics: oscillations in p53 and Hes1. In addition, we consider a larger system, Erk-mediated mitogen-activated protein kinases signalling, where conventional stochastic simulation approaches incur unacceptably high computational costs.
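
As a one-dimensional illustration of what a closure does, the worked equation below shows Gaussian closure for a single species X with mean \mu and variance \sigma^2. This is a simplified univariate sketch, not the multivariate closures developed in the paper. Setting the third central moment to zero expresses the third raw moment through lower-order moments, which closes the moment equations at second order.

```latex
\begin{aligned}
\mathbb{E}\big[(X-\mu)^3\big] &= \mathbb{E}[X^3] - 3\mu\,\mathbb{E}[X^2] + 2\mu^3 = 0 \\
\Rightarrow\quad \mathbb{E}[X^3] &= 3\mu\,\mathbb{E}[X^2] - 2\mu^3 = \mu^3 + 3\mu\sigma^2
\end{aligned}
```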


Subjects
Models, Chemical , Stochastic Processes , Kinetics , Multivariate Analysis
17.
Methodology (Gott) ; 73(2): 314-339, 2024 Mar 11.
Article in English | MEDLINE | ID: mdl-38577633

ABSTRACT

The identification of sets of co-regulated genes that share a common function is a key question of modern genomics. Bayesian profile regression is a semi-supervised mixture modelling approach that makes use of a response to guide inference toward relevant clusterings. Previous applications of profile regression have considered univariate continuous, categorical, and count outcomes. In this work, we extend Bayesian profile regression to cases where the outcome is longitudinal (or multivariate continuous) and provide PReMiuMlongi, an updated version of PReMiuM, the R package for profile regression. We consider multivariate normal and Gaussian process regression response models and provide proof of principle applications to four simulation studies. The model is applied to budding yeast data to identify groups of genes co-regulated during the Saccharomyces cerevisiae cell cycle. We identify 4 distinct groups of genes associated with specific patterns of gene expression trajectories, along with the bound transcription factors, likely involved in their co-regulation process.

18.
Immunol Cell Biol ; 91(1): 60-9, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23165607

ABSTRACT

The recruitment and migration of macrophages and neutrophils is an important process during the early stages of the innate immune system in response to acute injury. Transgenic pu.1:EGFP zebrafish permit the acquisition of leukocyte migration trajectories during inflammation. Currently, these high-quality live-imaging data are mainly analysed using general statistics, for example, cell velocity. Here, we present a spatio-temporal analysis of the cell dynamics using transition matrices, which provide information on the type of cell migration. We find evidence that leukocytes exhibit types of migratory behaviour, which differ from previously described random walk processes. Dimethyl sulfoxide treatment decreased the level of persistence at early time points after wounding and ablated temporal dependencies observed in untreated embryos. We then use pharmacological inhibition of p38 and c-Jun N-terminal kinase mitogen-activated protein kinases to determine their effects on in vivo leukocyte migration patterns and discuss how they modify the characteristics of the cell migration process. In particular, we find that their respective inhibition leads to decreased and increased levels of persistent motion in leukocytes following wounding. This example shows the high level of information content, which can be gained from live-imaging data if appropriate statistical tools are used.
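
A transition matrix of the kind mentioned above can be estimated by counting moves between consecutive states of a discretised trajectory and normalising each row. The abstract does not specify how the states were defined, so the two-state coding and the track below are hypothetical and serve only to show the estimation step.

```python
from collections import Counter

def transition_matrix(states, n_states):
    """Row-normalised counts of transitions between consecutive states."""
    counts = Counter(zip(states[:-1], states[1:]))
    matrix = []
    for i in range(n_states):
        row_total = sum(counts[(i, j)] for j in range(n_states))
        if row_total == 0:
            # State i never occurs as the start of a transition in this track.
            matrix.append([0.0] * n_states)
        else:
            matrix.append([counts[(i, j)] / row_total for j in range(n_states)])
    return matrix

# Hypothetical discretised track: 0 = moving towards the wound, 1 = moving away.
track = [0, 0, 0, 1, 0, 0, 1, 1, 0, 0]
matrix = transition_matrix(track, n_states=2)
```

A large diagonal entry indicates persistence in that state, which is one way such matrices can distinguish persistent motion from a memoryless random walk.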


Subjects
Cell Movement/immunology , JNK Mitogen-Activated Protein Kinases/immunology , Leukocytes/immunology , Zebrafish Proteins/immunology , Zebrafish/immunology , p38 Mitogen-Activated Protein Kinases/immunology , Animals , Animals, Genetically Modified , Cell Movement/drug effects , Cryoprotective Agents/pharmacology , Dimethyl Sulfoxide/pharmacology , JNK Mitogen-Activated Protein Kinases/antagonists & inhibitors , JNK Mitogen-Activated Protein Kinases/genetics , Leukocytes/cytology , Protein Kinase Inhibitors/pharmacology , Wounds and Injuries/genetics , Wounds and Injuries/immunology , Zebrafish/genetics , Zebrafish Proteins/antagonists & inhibitors , Zebrafish Proteins/genetics , p38 Mitogen-Activated Protein Kinases/antagonists & inhibitors , p38 Mitogen-Activated Protein Kinases/genetics
19.
Lancet Public Health ; 8(7): e535-e545, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37393092

ABSTRACT

BACKGROUND: To inform targeted public health strategies, it is crucial to understand how coexisting diseases develop over time and their associated impacts on patient outcomes and health-care resources. This study aimed to examine how psychosis, diabetes, and congestive heart failure, in a cluster of physical-mental health multimorbidity, develop and coexist over time, and to assess the associated effects of different temporal sequences of these diseases on life expectancy in Wales.

METHODS: In this retrospective cohort study, we used population-scale, individual-level, anonymised, linked, demographic, administrative, and electronic health record data from the Wales Multimorbidity e-Cohort. We included data on all individuals aged 25 years and older who were living in Wales on Jan 1, 2000 (the start of follow-up), with follow-up continuing until Dec 31, 2019, first break in Welsh residency, or death. Multistate models were applied to these data to model trajectories of disease in multimorbidity and their associated effect on all-cause mortality, accounting for competing risks. Life expectancy was calculated as the restricted mean survival time (bound by the maximum follow-up of 20 years) for each of the transitions from the health states to death. Cox regression models were used to estimate baseline hazards for transitions between health states, adjusted for sex, age, and area-level deprivation (Welsh Index of Multiple Deprivation [WIMD] quintile).

FINDINGS: Our analyses included data for 1 675 585 individuals (811 393 [48·4%] men and 864 192 [51·6%] women) with a median age of 51·0 years (IQR 37·0-65·0) at cohort entry. The order of disease acquisition in cases of multimorbidity had an important and complex association with patient life expectancy. Individuals who developed diabetes, psychosis, and congestive heart failure, in that order (DPC), had reduced life expectancy compared with people who developed the same three conditions in a different order: for a 50-year-old man in the third quintile of the WIMD (on which we based our main analyses to allow comparability), DPC was associated with a loss in life expectancy of 13·23 years (SD 0·80) compared with the general otherwise healthy or otherwise diseased population. Congestive heart failure as a single condition was associated with a mean loss in life expectancy of 12·38 years (0·00), and with a loss of 12·95 years (0·06) when preceded by psychosis and 13·45 years (0·13) when followed by psychosis. Findings were robust in people of older ages, more deprived populations, and women, except that the trajectory of psychosis, congestive heart failure, and diabetes was associated with higher mortality in women than men. Within 5 years of an initial diagnosis of diabetes, the risk of developing psychosis or congestive heart failure, or both, was increased.

INTERPRETATION: The order in which individuals develop psychosis, diabetes, and congestive heart failure as combinations of conditions can substantially affect life expectancy. Multistate models offer a flexible framework to assess temporal sequences of diseases and allow identification of periods of increased risk of developing subsequent conditions and death.

FUNDING: Health Data Research UK.
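
The abstract defines life expectancy as the restricted mean survival time (RMST), bounded by the 20-year follow-up. As a rough illustration only, the sketch below computes an RMST as the area under a Kaplan-Meier survival curve up to a horizon `tau`. It is not the study's multistate model: the single-transition Kaplan-Meier estimator, the function names `kaplan_meier` and `rmst`, and the toy data are all assumptions made for this example.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event_time, survival) steps.

    times  -- follow-up time for each individual
    events -- 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all individuals sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM curve on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # rectangle under the current step
        prev_t, prev_s = t, s
    area += prev_s * (tau - prev_t)    # final rectangle up to the horizon
    return area


# Toy data (hypothetical): years of follow-up, 1 = died, 0 = censored.
toy_times = [2, 5, 8, 20, 20]
toy_events = [1, 1, 0, 0, 0]
toy_rmst = rmst(toy_times, toy_events, tau=20)
```

Under this reading, a "loss in life expectancy" is the difference between the RMST of a reference group and that of the group of interest, both restricted to the same horizon.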


Subjects
Diabetes Mellitus , Heart Failure , Psychotic Disorders , Male , Humans , Female , Adult , Middle Aged , Aged , Semantic Web , Multimorbidity , Retrospective Studies , Wales/epidemiology , Diabetes Mellitus/epidemiology , Heart Failure/epidemiology , Psychotic Disorders/epidemiology , Life Expectancy
20.
Ann Appl Stat ; 16(4)2022 Dec 01.
Article in English | MEDLINE | ID: mdl-36507469

ABSTRACT

Understanding sub-cellular protein localisation is an essential component in the analysis of context specific protein function. Recent advances in quantitative mass-spectrometry (MS) have led to high resolution mapping of thousands of proteins to sub-cellular locations within the cell. Novel modelling considerations to capture the complex nature of these data are thus necessary. We approach analysis of spatial proteomics data in a non-parametric Bayesian framework, using K-component mixtures of Gaussian process regression models. The Gaussian process regression model accounts for correlation structure within a sub-cellular niche, with each mixture component capturing the distinct correlation structure observed within each niche. The availability of marker proteins (i.e. proteins with a priori known labelled locations) motivates a semi-supervised learning approach to inform the Gaussian process hyperparameters. We moreover provide an efficient Hamiltonian-within-Gibbs sampler for our model. Furthermore, we reduce the computational burden associated with inversion of covariance matrices by exploiting the structure in the covariance matrix. A tensor decomposition of our covariance matrices allows extended Trench and Durbin algorithms to be applied to reduce the computational complexity of inversion and hence accelerate computation. We provide detailed case-studies on Drosophila embryos and mouse pluripotent embryonic stem cells to illustrate the benefit of semi-supervised functional Bayesian modelling of the data.
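
The Trench and Durbin algorithms mentioned above apply to Toeplitz matrices, which are constant along each diagonal. As a hedged illustration of why such structure helps, the sketch below solves a symmetric positive-definite Toeplitz system with the classic Levinson recursion, which builds on Durbin's algorithm and needs O(n^2) operations rather than the O(n^3) of a general solver. It is not the authors' extended algorithm or their tensor decomposition; the function name `levinson_solve` and the example matrix are assumptions for this example.

```python
def levinson_solve(first_col, b):
    """Solve T x = b, where T is symmetric positive-definite Toeplitz.

    first_col -- first column of T, so T[i][j] = first_col[abs(i - j)]
    b         -- right-hand side, same length as first_col
    """
    n = len(b)
    r0 = first_col[0]
    # Normalise so the diagonal is 1; r[k] holds the (k+1)-th off-diagonal.
    r = [c / r0 for c in first_col[1:]]
    rhs = [v / r0 for v in b]

    x = [rhs[0]]
    if n == 1:
        return x
    y = [-r[0]]          # running solution of the Yule-Walker system (Durbin)
    alpha = -r[0]
    beta = 1.0
    for k in range(1, n):
        beta *= 1.0 - alpha * alpha
        # Extend the solution x from order k to order k + 1.
        mu = (rhs[k] - sum(r[i] * x[k - 1 - i] for i in range(k))) / beta
        x = [x[i] + mu * y[k - 1 - i] for i in range(k)] + [mu]
        if k < n - 1:
            # Durbin step: extend the Yule-Walker solution y as well.
            alpha = -(r[k] + sum(r[i] * y[k - 1 - i] for i in range(k))) / beta
            y = [y[i] + alpha * y[k - 1 - i] for i in range(k)] + [alpha]
    return x


# Hypothetical 3x3 example: T = [[4, 2, 1], [2, 4, 2], [1, 2, 4]].
example_col = [4.0, 2.0, 1.0]
example_b = [11.0, 16.0, 17.0]
example_x = levinson_solve(example_col, example_b)
```

A Toeplitz covariance arises, for instance, when a stationary kernel is evaluated on equally spaced inputs; whether and how that holds for the paper's covariance matrices is not stated in the abstract.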
