Results 1 - 20 of 46
1.
Eur Rev Med Pharmacol Sci ; 28(11): 3699, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38884518

ABSTRACT

The article "Correlation between COVID-19 and air pollution: the effects of PM2.5 and PM10 on COVID-19 outcomes", by E. Kalluçi, E. Noka, K. Bani, X. Dhamo, I. Alimehmeti, K. Dhuli, G. Madeo, C. Micheletti, G. Bonetti, C. Zuccato, E. Borghetti, G. Marceddu, M. Bertelli, published in Eur Rev Med Pharmacol Sci 2023; 27 (6 Suppl): 39-47 (DOI: 10.26355/eurrev_202312_34688; PMID: 38112947), has been retracted by the Editor in Chief. Following concerns raised on PubPeer, the Editor in Chief initiated an investigation to evaluate the validity of the results. Despite the authors' prompt responses to the identified issues, the Editor in Chief has decided to withdraw the article due to significant errors in the text and final statements, as well as undisclosed conflicts of interest. The Publisher apologizes that these concerns were not detected during the review process. The authors have been informed about the retraction. The Publisher apologizes for any inconvenience this may cause. https://www.europeanreview.org/article/34688.

2.
Sci Rep ; 14(1): 9516, 2024 04 25.
Article in English | MEDLINE | ID: mdl-38664448

ABSTRACT

Recent technologies such as spatial transcriptomics enable the measurement of gene expression at the single-cell level along with the spatial locations of these cells in the tissue. Spatial clustering of the cells provides valuable insights into the understanding of the functional organization of the tissue. However, most such clustering methods involve some dimension reduction that leads to a loss of the inherent dependency structure among genes at any spatial location in the tissue. This destroys valuable insights into gene co-expression patterns, apart from possibly impacting spatial clustering performance. In spatial transcriptomics, the matrix-variate gene expression data, along with spatial coordinates of the single cells, provides information on both gene expression dependencies and cell spatial dependencies through its row and column covariances. In this work, we propose a joint Bayesian approach to simultaneously estimate these gene and spatial cell correlations. These estimates provide data summaries for downstream analyses. We illustrate our method with simulations and analysis of several real spatial transcriptomic datasets. Our work elucidates gene co-expression networks as well as clear spatial clustering patterns of the cells. Furthermore, our analysis reveals that downstream spatial-differential analysis may aid in the discovery of unknown cell types from known marker genes.


Subject(s)
Bayes Theorem , Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods , Cluster Analysis , Humans , Single-Cell Analysis/methods , Gene Regulatory Networks , Algorithms , Computer Simulation
3.
Biometrics ; 80(1), 2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38364805

ABSTRACT

Survival models are used to analyze time-to-event data in a variety of disciplines. Proportional hazard models provide interpretable parameter estimates, but proportional hazard assumptions are not always appropriate. Non-parametric models are more flexible but often lack a clear inferential framework. We propose a Bayesian treed hazards partition model that is both flexible and inferential. Inference is obtained through the posterior tree structure and flexibility is preserved by modeling the log-hazard function in each partition using a latent Gaussian process. An efficient reversible jump Markov chain Monte Carlo algorithm is accomplished by marginalizing the parameters in each partition element via a Laplace approximation. Consistency properties for the estimator are established. The method can be used to help determine subgroups as well as prognostic and/or predictive biomarkers in time-to-event data. The method is compared with some existing methods on simulated data and a liver cirrhosis dataset.


Subject(s)
Algorithms , Proportional Hazards Models , Bayes Theorem , Markov Chains , Monte Carlo Method
4.
Eur Rev Med Pharmacol Sci ; 27(6 Suppl): 39-47, 2023 12.
Article in English | MEDLINE | ID: mdl-38112947

ABSTRACT

OBJECTIVE: Given its effects on long-term illnesses, like heart problems and diabetes, air pollution may be among the reasons that led COVID-19 to get worse and kill a larger number of people. Experiments have shown that breathing in polluted air weakens the immune system, making it easier for viruses to enter the body and grow. Viruses may be able to survive in the air by interacting in complex ways with particles and gases. These interactions depend on the air's chemical makeup, the particles' electric charges, and environmental conditions like humidity, UV light, and temperature. Moreover, exposure to UV rays and air pollution may reduce the organism's production of antimicrobial molecules, thus supporting viral infections. More epidemiological studies are needed to determine what effects air pollution has on COVID-19. In this review, we will discuss how air pollutants such as PM2.5 and PM10 contribute to the transmission of COVID-19. MATERIALS AND METHODS: We used nine target cities in the Tuscany region to examine this relationship, and in all these cases, the air pollution factors were found to be strongly correlated with COVID-19 cases. For each city, we applied a multivariate analysis and found an appropriate model that better fits the data. RESULTS: This review underlines that both short-term and long-term exposure to air pollution may be crucial exacerbating factors for SARS-CoV-2 transmission and COVID-19 severity and lethality. The statistical analysis concludes that air pollution should be accounted for as a possible risk factor in future COVID-19 investigations, and it should be avoided as much as possible by the general population. CONCLUSIONS: Our research highlighted the correlation between COVID-19 and air pollution. Reducing air pollution exposure should be one of the first measures against COVID-19 spread.


Subject(s)
Air Pollutants , Air Pollution , COVID-19 , Humans , SARS-CoV-2 , Particulate Matter/adverse effects , Particulate Matter/analysis , Air Pollution/adverse effects , Air Pollutants/adverse effects , Air Pollutants/analysis , Environmental Exposure/adverse effects
5.
Clin Ter ; 174(Suppl 2(6)): 263-278, 2023.
Article in English | MEDLINE | ID: mdl-37994774

ABSTRACT

Background: Infectious diseases are disorders caused by microorganisms such as bacteria, viruses, fungi, or parasites. Many organisms live in and on our bodies. They are normally harmless or even helpful. However, under certain conditions, some organisms may cause disease. Infectious diseases are also called contagious diseases because they can be passed from person to person. Some are transmitted by insects or other animals. COVID-19 is an infectious disease that has "pervaded" the whole world during the last three years. The World Health Organization (WHO) has declared COVID-19 a Public Health Emergency of International Concern. Methods: In this paper, we study the outbreak of this pandemic in Albania based on some mathematical models, such as SIR, SIRD, and SEIRD. We present a detailed analysis of these models and also demonstrate how they can be used to predict the spread of infectious diseases. More precisely, we examine the spread of COVID-19 in our country, Albania. Software such as MATLAB and RStudio is used to do this. The data used with these programs are taken from the Institute of Public Health, Tirana, Albania. Results: We have developed an application that uses actual data to estimate SEIRD model parameters. It can compute the basic reproduction number and, more significantly, provides forecasts of the disease's progression. Conclusions: Our aim is to calculate the basic reproduction number, using the Next Generation Matrix, and use it to anticipate the future course of the disease. This is the average number of new infections generated by an infected individual. A large value indicates that the infection is transmitted very quickly. We attempt to calculate the values of the basic reproduction number over different time periods.


Subject(s)
COVID-19 , Communicable Diseases , Humans , COVID-19/epidemiology , Basic Reproduction Number , Disease Outbreaks , Albania
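
Note on record 5: the abstract refers to an SEIRD model and to a basic reproduction number obtained from the Next Generation Matrix. The Python sketch below is only an illustration of that idea under stated assumptions, not the authors' MATLAB or RStudio application. It assumes the simplest SEIRD formulation (no births or natural deaths, frequency-dependent transmission with the initial population size as denominator), and the parameter names `beta`, `sigma`, `gamma` and `mu` as well as the numeric values are hypothetical.

```python
def seird_r0(beta, gamma, mu):
    """Basic reproduction number of the simple SEIRD model below.

    With infected compartments (E, I), the next-generation matrix is
    F V^{-1} with F = [[0, beta], [0, 0]] and
    V = [[sigma, 0], [-sigma, gamma + mu]]. Its spectral radius is
    beta / (gamma + mu), which does not depend on sigma.
    """
    return beta / (gamma + mu)


def simulate_seird(beta, sigma, gamma, mu, s0, e0, i0, days, dt=0.1):
    """Forward-Euler integration; returns one (S, E, I, R, D) tuple per day.

    Assumption: transmission is beta * S * I / N with N fixed at the
    initial population size.
    """
    n = float(s0 + e0 + i0)
    s, e, i, r, d = float(s0), float(e0), float(i0), 0.0, 0.0
    history = [(s, e, i, r, d)]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_recovered = gamma * i * dt        # I -> R
            new_dead = mu * i * dt                # I -> D
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered - new_dead
            r += new_recovered
            d += new_dead
        history.append((s, e, i, r, d))
    return history


# Hypothetical parameter values, chosen only to exercise the functions.
r0 = seird_r0(beta=0.5, gamma=0.1, mu=0.025)
trajectory = simulate_seird(0.5, 0.2, 0.1, 0.025, s0=990, e0=0, i0=10, days=200)
```

In this formulation a value of R0 above 1 means the number of infectious individuals initially grows, and a value below 1 means it declines, which matches the abstract's reading that a large value indicates fast transmission.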
6.
Genet Epidemiol ; 47(1): 95-104, 2023 02.
Article in English | MEDLINE | ID: mdl-36378773

ABSTRACT

The clustering of proteins is of interest in cancer cell biology. This article proposes a hierarchical Bayesian model for protein (variable) clustering hinging on correlation structure. Starting from a multivariate normal likelihood, we enforce the clustering through prior modeling using angle-based unconstrained reparameterization of correlations and assume a truncated Poisson distribution (to penalize a large number of clusters) as prior on the number of clusters. The posterior distributions of the parameters are not in explicit form, and a reversible jump Markov chain Monte Carlo based technique is used to simulate the parameters from the posteriors. The end products of the proposed method are the estimated cluster configuration of the proteins (variables) along with the number of clusters. The Bayesian method is flexible enough to cluster the proteins as well as estimate the number of clusters. The performance of the proposed method has been substantiated with extensive simulation studies and one protein expression dataset with a hereditary disposition in breast cancer, where the proteins come from different pathways.


Subject(s)
Breast Neoplasms , Humans , Female , Bayes Theorem , Breast Neoplasms/genetics , Models, Genetic , Cluster Analysis , Markov Chains , Monte Carlo Method
7.
J Am Stat Assoc ; 116(535): 1075-1087, 2021.
Article in English | MEDLINE | ID: mdl-34898760

ABSTRACT

Estimating the marginal and joint densities of the long-term average intakes of different dietary components is an important problem in nutritional epidemiology. Since these variables cannot be directly measured, data are usually collected in the form of 24-hour recalls of the intakes, which show marked patterns of conditional heteroscedasticity. Significantly compounding the challenges, the recalls for episodically consumed dietary components also include exact zeros. The problem of estimating the density of the latent long-time intakes from their observed measurement error contaminated proxies is then a problem of deconvolution of densities with zero-inflated data. We propose a Bayesian semiparametric solution to the problem, building on a novel hierarchical latent variable framework that translates the problem to one involving continuous surrogates only. Crucial to accommodating important aspects of the problem, we then design a copula based approach to model the involved joint distributions, adopting different modeling strategies for the marginals of the different dietary components. We design efficient Markov chain Monte Carlo algorithms for posterior inference and illustrate the efficacy of the proposed method through simulation experiments. Applied to our motivating nutritional epidemiology problems, compared to other approaches, our method provides more realistic estimates of the consumption patterns of episodically consumed dietary components.

8.
Bernoulli (Andover) ; 27(1): 637-672, 2021 Feb.
Article in English | MEDLINE | ID: mdl-34305432

ABSTRACT

Gaussian graphical models are a popular tool to learn the dependence structure in the form of a graph among variables of interest. Bayesian methods have gained in popularity in the last two decades due to their ability to simultaneously learn the covariance and the graph. There is a wide variety of model-based methods to learn the underlying graph assuming various forms of the graphical structure. Although decomposability is commonly imposed on the graph space for scalability of the Markov chain Monte Carlo algorithms, its possible implication on the posterior distribution of the graph is not clear. An open problem in Bayesian decomposable structure learning is whether the posterior distribution is able to select a meaningful decomposable graph that is "close" to the true non-decomposable graph, when the dimension of the variables increases with the sample size. In this article, we explore specific conditions on the true precision matrix and the graph, which result in an affirmative answer to this question with a commonly used hyper-inverse Wishart prior on the covariance matrix and a suitable complexity prior on the graph space. In the absence of structural sparsity assumptions, our strong selection consistency holds in a high-dimensional setting where p = O(n^α) for α < 1/3. We show that when the true graph is non-decomposable, the posterior distribution concentrates on a set of graphs that are minimal triangulations of the true graph.
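
Note on record 8: in a Gaussian graphical model, an edge between two variables is absent exactly when the corresponding off-diagonal entry of the precision matrix is zero, and the partial correlation is the negated, rescaled precision entry. The Python sketch below illustrates only that standard relationship; it is not the paper's hyper-inverse Wishart procedure. The 4-variable precision matrix is a hypothetical example whose graph is a chordless 4-cycle, the smallest non-decomposable graph.

```python
import math


def partial_correlations(precision):
    """Partial correlations rho_jk = -omega_jk / sqrt(omega_jj * omega_kk)."""
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            if j != k:
                scale = math.sqrt(precision[j][j] * precision[k][k])
                rho[j][k] = -precision[j][k] / scale
    return rho


def graph_edges(precision, tol=1e-12):
    """Edges (j, k), j < k, where the precision entry is non-zero."""
    p = len(precision)
    return [(j, k) for j in range(p) for k in range(j + 1, p)
            if abs(precision[j][k]) > tol]


# Hypothetical precision matrix (diagonally dominant, hence positive
# definite) whose graph is the 4-cycle 0-1-2-3-0 with no chord.
four_cycle = [
    [2.0, 0.5, 0.0, 0.5],
    [0.5, 2.0, 0.5, 0.0],
    [0.0, 0.5, 2.0, 0.5],
    [0.5, 0.0, 0.5, 2.0],
]

edges = graph_edges(four_cycle)
rho = partial_correlations(four_cycle)
```

Adding either chord, (0, 2) or (1, 3), to this cycle yields a decomposable graph; these two graphs are the minimal triangulations of the 4-cycle, which is the kind of object the abstract says the posterior concentrates on.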

9.
Adv Exp Med Biol ; 1332: 211-227, 2021.
Article in English | MEDLINE | ID: mdl-34251646

ABSTRACT

Measuring usual dietary intake in freely living humans is difficult to accomplish. As a part of our recent study, a food frequency questionnaire was completed by healthy adult men and women at days 0 and 90 of the study. Data from the food questionnaire were analyzed with a nutrient analysis program ( www.Harvardsffq.date ). Healthy men and women consumed protein as 19-20% and 17-19% of their total energy intakes, respectively, with animal protein representing about 75 and 70% of their total protein intakes, respectively. The intake of each nutritionally essential amino acid (EAA) exceeded that recommended for healthy adults with minimal physical activity. In all individuals, the dietary intake of leucine was the highest, followed by lysine, valine, and isoleucine in descending order, and the ingestion of amino acids that are synthesizable de novo in animal cells (AASAs) was about 20% greater than that of total EAAs. The intake of each AASA met that recommended for healthy adults with minimal physical activity. Intakes of some AASAs (alanine, arginine, aspartate, glutamate, and glycine) from a typical diet providing 90-110 g food protein/day do not meet the requirements of adults with intensive physical activity. Within the male or female group, there were no significant differences in the dietary intakes of all amino acids between days 0 and 90 of the study, and this was also true for nearly all other essential nutrients. Our findings will help to improve amino acid nutrition and health in both the general population and exercising individuals.


Subject(s)
Amino Acids , Diet , Adult , Eating , Energy Intake , Female , Humans , Male , Nutrients
10.
Chemometr Intell Lab Syst ; 212, 2021 May 15.
Article in English | MEDLINE | ID: mdl-35068632

ABSTRACT

BACKGROUND: The endogenous circadian clock, which controls daily rhythms in the expression of at least half of the mammalian genome, has a major influence on cell physiology. Consequently, disruption of the circadian system is associated with a wide range of diseases including cancer. While several circadian clock genes have been associated with cancer progression, little is known about survival when two or more platforms are considered together. Our goal was to determine if survival outcomes are associated with circadian clock function. To accomplish this goal, we developed a Bayesian hierarchical survival model coupled with the global local shrinkage prior and applied this model to available RNASeq and Copy Number Variation data to select significant circadian genes associated with cancer progression. RESULTS: Using a Bayesian shrinkage approach with the Bayesian accelerated failure time (AFT) model, we showed that the circadian clock associated gene DEC1 is positively correlated with survival outcome in breast cancer patients. The R package circgene implementing the methodology is available at https://github.com/MAITYA02/circgene. CONCLUSIONS: The proposed Bayesian hierarchical model is the first shrinkage prior based model of its kind that integrates two omics platforms to identify the significant circadian gene for cancer survival.

11.
FEBS J ; 288(4): 1305-1324, 2021 02.
Article in English | MEDLINE | ID: mdl-32649051

ABSTRACT

Ribosome hibernation is a prominent cellular strategy to modulate protein synthesis during starvation and the stationary phase of bacterial cell growth. Translational suppression involves the formation of either factor-bound inactive 70S monomers or dimeric 100S hibernating ribosomal complexes, the biological significance of which is poorly understood. Here, we demonstrate that the Escherichia coli 70S ribosome associated with stationary phase factors hibernation promoting factor or protein Y or ribosome-associated inhibitor A and the 100S ribosome isolated from both Gram-negative and Gram-positive bacteria are resistant to unfolded protein-mediated subunit dissociation and subsequent degradation by cellular ribonucleases. Considering that the increase in cellular stress is accompanied by accumulation of unfolded proteins, such resistance of hibernating ribosomes towards dissociation might contribute to their maintenance during the stationary phase. Analysis of existing structures provided clues on the mechanism of inhibition of the unfolded protein-mediated disassembly in case of hibernating factor-bound ribosome. Further, the factor-bound 70S and 100S ribosomes can suppress protein aggregation and assist in protein folding. The chaperoning activity of these ribosomes is the first evidence of a potential biological activity of the hibernating ribosome that might be crucial for cell survival under stress conditions.


Subject(s)
Bacterial Proteins/metabolism , Protein Biosynthesis , Ribosomal Proteins/metabolism , Ribosomes/metabolism , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Binding Sites , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Guanosine Triphosphate/metabolism , Models, Molecular , Protein Binding , Protein Domains , Protein Folding , Protein Subunits/genetics , Protein Subunits/metabolism , Ribosomal Proteins/chemistry , Ribosomal Proteins/genetics , Ribosomes/chemistry , Staphylococcus aureus/genetics , Staphylococcus aureus/metabolism
12.
Biometrika ; 107(1): 205-221, 2020 Mar.
Article in English | MEDLINE | ID: mdl-33100350

ABSTRACT

We develop a Bayesian methodology aimed at simultaneously estimating low-rank and row-sparse matrices in a high-dimensional multiple-response linear regression model. We consider a carefully devised shrinkage prior on the matrix of regression coefficients which obviates the need to specify a prior on the rank, and shrinks the regression matrix towards low-rank and row-sparse structures. We provide theoretical support to the proposed methodology by proving minimax optimality of the posterior mean under the prediction risk in ultra-high dimensional settings where the number of predictors can grow sub-exponentially relative to the sample size. A one-step post-processing scheme induced by group lasso penalties on the rows of the estimated coefficient matrix is proposed for variable selection, with default choices of tuning parameters. We additionally provide an estimate of the rank using a novel optimization function achieving dimension reduction in the covariate space. We exhibit the performance of the proposed methodology in an extensive simulation study and a real data example.

13.
PLoS One ; 15(10): e0238996, 2020.
Article in English | MEDLINE | ID: mdl-33095785

ABSTRACT

Recent developments in high-throughput methods have resulted in the collection of high-dimensional data types from multiple sources and technologies that measure distinct yet complementary information. Integrated clustering of such multiple data types or multi-view clustering is critical for revealing pathological insights. However, multi-view clustering is challenging due to the complex dependence structure between multiple data types, including directional dependency. Specifically, genomics data types have pre-specified directional dependencies known as the central dogma that describes the process of information flow from DNA to messenger RNA (mRNA) and then from mRNA to protein. Most of the existing multi-view clustering approaches assume an independent structure or pair-wise (non-directional) dependence between data types, thereby ignoring their directional relationship. Motivated by this, we propose a biology-inspired Bayesian integrated multi-view clustering model that uses an asymmetric copula to accommodate the directional dependencies between the data types. Via extensive simulation experiments, we demonstrate the negative impact of ignoring directional dependency on clustering performance. We also present an application of our model to a real-world dataset of breast cancer tumor samples collected from The Cancer Genome Atlas program and provide comparative results.


Subject(s)
Genomics/methods , Models, Statistical , Bayes Theorem , Breast Neoplasms/genetics , Cluster Analysis , Computer Simulation , Data Interpretation, Statistical , Databases, Genetic/statistics & numerical data , Female , Genomics/statistics & numerical data , Humans , Markov Chains , Normal Distribution
14.
ACS Omega ; 5(26): 16128-16138, 2020 Jul 07.
Article in English | MEDLINE | ID: mdl-32656435

ABSTRACT

Pathogenesis of Alzheimer's disease (AD), the most common type of dementia, involves misfolding and aggregation of the extracellular amyloid-β (Aβ) protein where the intermediate oligomers, formed during the aggregation progression cascade, are considered the prime toxic species. Here, we identify an active peptide fragment from a medicinal plant-derived (Aristolochia indica) fibrinolytic enzyme having anti-amyloidogenic effects against Aβ fibrillation and toxicity. Liquid chromatography with tandem mass spectrometry (LC-MS/MS), followed by computational analysis of the peptide pool generated by proteolytic digestion of the enzyme, identifies two peptide sequences with predictive high-propensity binding to Aβ42. Microscopic visualizations in conjunction with biochemical and biophysical assessments suggest that the synthetic version of one of the peptides (termed here Pactive, GFLLHQK) arrests Aβ molecules in off-pathway oligomers that can no longer participate in the cytotoxic fibrillation pathway. In contrast, the other peptide (termed P1) aggravates the fibrillation process. Further investigations confirm the strong binding affinity of Pactive with both Aβ42 monomers and toxic oligomers by biolayer interferometric assays. We have also shown that, mechanistically, Pactive binding induces conformational alterations in the Aβ molecule along with modification of Aβ hydrophobicity, one of the key players in aggregation. Importantly, the biostability of Pactive in human blood serum and its nontoxic nature make it a promising therapeutic candidate against Alzheimer's, for which no disease-modifying treatments are available to date.

15.
Bioinformatics ; 36(13): 3951-3958, 2020 07 01.
Article in English | MEDLINE | ID: mdl-32369552

ABSTRACT

MOTIVATION: It is well known that the integration among different data-sources is reliable because of its potential of unveiling new functionalities of the genomic expressions, which might be dormant in a single-source analysis. Moreover, different studies have justified the more powerful analyses of multi-platform data. Toward this, in this study, we consider the circadian genes' omics profile, such as copy number changes and RNA-sequence data along with their survival response. We develop a Bayesian structural equation model coupled with linear regressions and log normal accelerated failure-time regression to integrate the information between these two platforms to predict the survival of the subjects. We place conjugate priors on the regression parameters and derive the Gibbs sampler using their conditional distributions. RESULTS: Our extensive simulation study shows that the integrative model provides a better fit to the data than its closest competitor. The analyses of glioblastoma cancer data and the breast cancer data from TCGA, the largest genomics and transcriptomics database, support our findings. AVAILABILITY AND IMPLEMENTATION: The developed method is wrapped in an R package available at https://github.com/MAITYA02/semmcmc. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome , Genomics , Bayes Theorem , Computational Biology , Humans , Latent Class Analysis , Software
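
Note on record 15: the abstract names a log normal accelerated failure-time (AFT) regression for survival. As a rough illustration of what such a likelihood looks like with right-censored data, the Python sketch below evaluates the textbook log-normal AFT log-likelihood. It is not the `semmcmc` package's code or API; the function name and argument layout are hypothetical, and the structural-equation and Gibbs-sampling parts of the method are not shown.

```python
import math


def lognormal_aft_loglik(times, events, covariates, beta, sigma):
    """Log-likelihood of log(T) = x'beta + sigma * eps, eps ~ N(0, 1).

    times      : positive follow-up times
    events     : True if the event was observed, False if right-censored
    covariates : one list of covariate values per subject
    """
    total = 0.0
    for t, observed, x in zip(times, events, covariates):
        mu = sum(b * xj for b, xj in zip(beta, x))
        z = (math.log(t) - mu) / sigma
        if observed:
            # Log-normal log-density at the observed event time.
            total += (-math.log(t) - math.log(sigma)
                      - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z)
        else:
            # Log-survival: log P(T > t) = log(1 - Phi(z)).
            total += math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
    return total


# Hypothetical toy data: intercept plus one covariate, third subject censored.
loglik = lognormal_aft_loglik(
    times=[2.0, 5.0, 3.5],
    events=[True, True, False],
    covariates=[[1.0, 0.2], [1.0, 1.1], [1.0, 0.7]],
    beta=[1.0, 0.5],
    sigma=0.8,
)
```

Censored subjects contribute the survival probability rather than the density, which is why censoring can be handled without discarding those observations.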
16.
J Mach Learn Res ; 21(79): 1-47, 2020.
Article in English | MEDLINE | ID: mdl-34305477

ABSTRACT

Graphical models are ubiquitous tools to describe the interdependence between variables measured simultaneously such as large-scale gene or protein expression data. Gaussian graphical models (GGMs) are well-established tools for probabilistic exploration of dependence structures using precision matrices and they are generated under a multivariate normal joint distribution. However, they suffer from several shortcomings since they are based on Gaussian distribution assumptions. In this article, we propose a Bayesian quantile based approach for sparse estimation of graphs. We demonstrate that the resulting graph estimation is robust to outliers and applicable under general distributional assumptions. Furthermore, we develop efficient variational Bayes approximations to scale the methods for large data sets. Our methods are applied to a novel cancer proteomics dataset wherein multiple proteomic antibodies are simultaneously assessed on tumor samples using reverse-phase protein arrays (RPPA) technology.

17.
Biometrics ; 76(1): 316-325, 2020 03.
Article in English | MEDLINE | ID: mdl-31393003

ABSTRACT

Accurate prognostic prediction using molecular information is a challenging area of research, which is essential to develop precision medicine. In this paper, we develop translational models to identify major actionable proteins that are associated with clinical outcomes, like the survival time of patients. There are considerable statistical and computational challenges due to the large dimension of the problems. Furthermore, data are available for different tumor types; hence data integration for various tumors is desirable. Having censored survival outcomes adds one more level of complexity to the inferential procedure. We develop Bayesian hierarchical survival models, which accommodate all the challenges mentioned here. We use the hierarchical Bayesian accelerated failure time model for survival regression. Furthermore, we assume a sparse horseshoe prior distribution for the regression coefficients to identify the major proteomic drivers. We borrow strength across tumor groups by introducing a correlation structure among the prior distributions. The proposed methods have been used to analyze data from the recently curated "The Cancer Proteome Atlas" (TCPA), which contains reverse-phase protein arrays-based high-quality protein expression data as well as detailed clinical annotation, including survival times. Our simulation and the TCPA data analysis illustrate the efficacy of the proposed integrative model, which links different tumors with the correlated prior structures.


Subject(s)
Biometry/methods , Neoplasms/metabolism , Neoplasms/mortality , Proteome/metabolism , Proteomics/statistics & numerical data , Bayes Theorem , Computer Simulation , Data Interpretation, Statistical , Humans , Kidney Neoplasms/metabolism , Kidney Neoplasms/mortality , Markov Chains , Models, Statistical , Monte Carlo Method , Prognosis , Protein Array Analysis/statistics & numerical data , Survival Analysis
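
Note on record 17: the abstract assumes a sparse horseshoe prior on the regression coefficients. The Python sketch below draws from the basic, independent horseshoe prior (a normal scale mixture with half-Cauchy local scales) to show why it suits sparse problems. It is a minimal illustration only: the correlated prior structure across tumor groups described in the abstract is not implemented, and the function name, the global scale `tau` and the seed are hypothetical choices.

```python
import math
import random


def sample_horseshoe(p, tau, rng):
    """Draw p coefficients from an independent horseshoe prior.

    lambda_j ~ half-Cauchy(0, 1), beta_j | lambda_j ~ N(0, (lambda_j * tau)^2).
    Returns (beta_j, lambda_j, kappa_j) tuples, where
    kappa_j = 1 / (1 + (lambda_j * tau)^2) is the shrinkage weight under
    unit noise variance: near 1 means strong shrinkage toward zero,
    near 0 means the coefficient is left almost unshrunk.
    """
    draws = []
    for _ in range(p):
        # Inverse-CDF draw from a standard Cauchy, folded to the half-line.
        lam = abs(math.tan(math.pi * (rng.random() - 0.5)))
        beta = rng.gauss(0.0, lam * tau)
        kappa = 1.0 / (1.0 + (lam * tau) ** 2)
        draws.append((beta, lam, kappa))
    return draws


rng = random.Random(2020)  # fixed seed so the draw is reproducible
draws = sample_horseshoe(p=1000, tau=1.0, rng=rng)
near_zero = sum(1 for _, _, kappa in draws if kappa > 0.9)
near_one = sum(1 for _, _, kappa in draws if kappa < 0.1)
```

With `tau = 1` the shrinkage weights follow a Beta(1/2, 1/2) distribution, so draws pile up near both 0 and 1; this is the behaviour that lets the prior shrink noise coefficients heavily while leaving large signals nearly untouched.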
18.
Cancer Inform ; 18: 1176935119871933, 2019.
Article in English | MEDLINE | ID: mdl-31488946

ABSTRACT

Long non-coding RNAs (lncRNAs) are a large and diverse class of transcribed RNAs, which have been shown to play a significant role in developing cancer. In this study, we apply an integrative modeling framework to integrate the DNA copy number variation (CNV), lncRNA expression, and downstream target protein expression to predict patient survival in breast cancer. We develop a 3-stage model combining a mechanical model (lncRNA regressed on CNV and target proteins regressed on lncRNA) and a clinical model (survival regressed on estimated effects from the mechanical models). Using lncRNAs (such as HOTAIR and MALAT1) along with their CNV, target protein expressions, and survival outcomes from The Cancer Genome Atlas (TCGA) database, we show that predicted mean square error and integrated Brier score (IBS) are both lower for the proposed 3-step integrated model than for the 2-step model. Therefore, the integrative model has better predictive ability than the 2-step model, which does not consider target protein information.

19.
Bayesian Anal ; 14(2): 449-476, 2019 Jun.
Article in English | MEDLINE | ID: mdl-33123305

ABSTRACT

There has been an intense development in the Bayesian graphical model literature over the past decade; however, most of the existing methods are restricted to moderate dimensions. We propose a novel graphical model selection approach for large dimensional settings where the dimension increases with the sample size, by decoupling model fitting and covariance selection. First, a full model based on a complete graph is fit under a novel class of mixtures of inverse-Wishart priors, which induce shrinkage on the precision matrix under an equivalence with Cholesky-based regularization, while enabling conjugate updates. Subsequently, a post-fitting model selection step uses penalized joint credible regions to perform model selection. This allows our methods to be computationally feasible for large dimensional settings using a combination of straightforward Gibbs samplers and efficient post-fitting inferences. Theoretical guarantees in terms of selection consistency are also established. Simulations show that the proposed approach compares favorably with competing methods, both in terms of accuracy metrics and computation times. We apply this approach to a cancer genomics data example.

20.
J R Stat Soc Ser C Appl Stat ; 68(5): 1577-1595, 2019 Nov.
Article in English | MEDLINE | ID: mdl-33311813

ABSTRACT

We consider the problem where the data consist of a survival time and a binary outcome measurement for each individual, as well as corresponding predictors. The goal is to select the common set of predictors that affect both responses, and not just one of them. In addition, we develop a survival prediction model based on data integration. This article is motivated by The Cancer Genome Atlas (TCGA) databank, which is currently the largest genomics and transcriptomics database. The data contain cancer survival information along with cancer stages for each patient. Furthermore, they contain Reverse-phase Protein Array (RPPA) measurements for each individual, which are the predictors associated with these responses. The biological motivation is to identify the major actionable proteins associated with both survival outcomes and cancer stages. We develop a Bayesian hierarchical model to jointly model the survival time and the classification of the cancer stages. Moreover, to deal with the high dimensionality of the RPPA measurements, we use a shrinkage prior to identify significant proteins. Simulations and TCGA data analysis show that the joint integrated modeling approach improves survival prediction.
