Search | BVS (Virtual Health Library) - MINISTRY OF HEALTH

1.

Retraction Note: Correlation between COVID-19 and air pollution: the effects of PM2.5 and PM10 on COVID-19 outcomes.

Kalluçi, E; Noka, E; Bani, K; Dhamo, X; Alimehmeti, I; Dhuli, K; Madeo, G; Micheletti, C; Bonetti, G; Zuccato, C; Borghetti, E; Marceddu, G; Bertelli, M.

Eur Rev Med Pharmacol Sci ; 28(11): 3699, 2024 Jun.

Article in English | MEDLINE | ID: mdl-38884518

ABSTRACT

The article "Correlation between COVID-19 and air pollution: the effects of PM2.5 and PM10 on COVID-19 outcomes", by E. Kalluçi, E. Noka, K. Bani, X. Dhamo, I. Alimehmeti, K. Dhuli, G. Madeo, C. Micheletti, G. Bonetti, C. Zuccato, E. Borghetti, G. Marceddu, M. Bertelli, published in Eur Rev Med Pharmacol Sci 2023; 27 (6 Suppl): 39-47 (DOI: 10.26355/eurrev_202312_34688; PMID: 38112947), has been retracted by the Editor in Chief. Following concerns raised on PubPeer, the Editor in Chief initiated an investigation to evaluate the validity of the results. Despite the authors' prompt responses to the identified issues, the Editor in Chief decided to withdraw the article due to significant errors in the text and final statements, as well as undisclosed conflicts of interest. The Publisher apologizes if these concerns were not detected during the review process. The authors have been informed about the retraction. This article has been retracted. The Publisher apologizes for any inconvenience this may cause. https://www.europeanreview.org/article/34688.

2.

Joint Bayesian estimation of cell dependence and gene associations in spatially resolved transcriptomic data.

Chakrabarti, Arhit; Ni, Yang; Mallick, Bani K.

Sci Rep ; 14(1): 9516, 2024 04 25.

Article in English | MEDLINE | ID: mdl-38664448

ABSTRACT

Recent technologies such as spatial transcriptomics enable the measurement of gene expression at the single-cell level along with the spatial locations of these cells in the tissue. Spatial clustering of the cells provides valuable insights into the functional organization of the tissue. However, most such clustering methods involve some dimension reduction that loses the inherent dependency structure among genes at any spatial location in the tissue. This discards valuable information on gene co-expression patterns, apart from possibly affecting spatial clustering performance. In spatial transcriptomics, the matrix-variate gene expression data, along with the spatial coordinates of the single cells, provide information on both gene expression dependencies and cell spatial dependencies through their row and column covariances. In this work, we propose a joint Bayesian approach to simultaneously estimate these gene and spatial cell correlations. These estimates provide data summaries for downstream analyses. We illustrate our method with simulations and analyses of several real spatial transcriptomic datasets. Our work elucidates gene co-expression networks as well as clear spatial clustering patterns of the cells. Furthermore, our analysis reveals that downstream spatial-differential analysis may aid in the discovery of unknown cell types from known marker genes.

Subjects

Bayes Theorem , Gene Expression Profiling , Transcriptome , Gene Expression Profiling/methods , Cluster Analysis , Humans , Single-Cell Analysis/methods , Gene Regulatory Networks , Algorithms , Computer Simulation

3.

A Bayesian survival treed hazards model using latent Gaussian processes.

Payne, Richard D; Guha, Nilabja; Mallick, Bani K.

Biometrics ; 80(1), 2024 Jan 29.

Article in English | MEDLINE | ID: mdl-38364805

ABSTRACT

Survival models are used to analyze time-to-event data in a variety of disciplines. Proportional hazards models provide interpretable parameter estimates, but the proportional hazards assumption is not always appropriate. Non-parametric models are more flexible but often lack a clear inferential framework. We propose a Bayesian treed hazards partition model that is both flexible and inferential. Inference is obtained through the posterior tree structure, and flexibility is preserved by modeling the log-hazard function in each partition using a latent Gaussian process. An efficient reversible jump Markov chain Monte Carlo algorithm is obtained by marginalizing the parameters in each partition element via a Laplace approximation. Consistency properties for the estimator are established. The method can be used to help determine subgroups as well as prognostic and/or predictive biomarkers in time-to-event data. The method is compared with some existing methods on simulated data and a liver cirrhosis dataset.

Subjects

Algorithms , Proportional Hazards Models , Bayes Theorem , Markov Chains , Monte Carlo Method
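
The Laplace approximation used in this abstract to marginalize partition parameters can be illustrated in one dimension. The sketch below is a generic textbook version, not the authors' implementation: it locates the mode θ̂ of a log-integrand h with Newton's method and replaces exp(h) by a Gaussian with variance -1/h''(θ̂), so that log ∫ exp(h(θ)) dθ ≈ h(θ̂) + ½ log(2π) - ½ log(-h''(θ̂)). The function name and the Gamma-kernel check are illustrative assumptions.

```python
import math

def laplace_log_integral(h, dh, d2h, theta0, iters=50):
    """Approximate log of the integral of exp(h(theta)) over theta (1-D) by Laplace's method.

    h, dh, d2h: the log-integrand and its first and second derivatives.
    theta0: starting point for Newton's method, which searches for the mode of h.
    """
    theta = theta0
    for _ in range(iters):
        step = dh(theta) / d2h(theta)  # Newton step for solving h'(theta) = 0
        theta -= step
        if abs(step) < 1e-12:
            break
    curvature = -d2h(theta)
    if curvature <= 0:
        raise ValueError("h is not locally concave at the located stationary point")
    # Gaussian integral: exp(h(mode)) * sqrt(2 * pi / curvature), on the log scale.
    return h(theta) + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(curvature)

# Check against a Gamma kernel, whose exact log-integral is lgamma(a) - a * log(b).
a, b = 50.0, 2.0
approx = laplace_log_integral(
    h=lambda t: (a - 1) * math.log(t) - b * t,
    dh=lambda t: (a - 1) / t - b,
    d2h=lambda t: -(a - 1) / t ** 2,
    theta0=1.0,
)
exact = math.lgamma(a) - a * math.log(b)
```

Here the approximation error is of order 1/(12(a - 1)), so it shrinks as the integrand becomes more sharply peaked, which is the regime in which such marginalizations are typically relied upon.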

4.

Correlation between COVID-19 and air pollution: the effects of PM2.5 and PM10 on COVID-19 outcomes.

Kalluçi, E; Noka, E; Bani, K; Dhamo, X; Alimehmeti, I; Dhuli, K; Madeo, G; Micheletti, C; Bonetti, G; Zuccato, C; Borghetti, E; Marceddu, G; Bertelli, M.

Eur Rev Med Pharmacol Sci ; 27(6 Suppl): 39-47, 2023 12.

Article in English | MEDLINE | ID: mdl-38112947

ABSTRACT

OBJECTIVE: Given its effects on long-term illnesses such as heart disease and diabetes, air pollution may be among the factors that made COVID-19 more severe and more lethal. Experiments have shown that breathing polluted air weakens the immune system, making it easier for viruses to enter the body and replicate. Viruses may be able to survive in the air by interacting in complex ways with particles and gases. These interactions depend on the air's chemical composition, the particles' electric charges, and environmental conditions such as humidity, UV light, and temperature. Moreover, exposure to UV rays and air pollution may reduce the organism's production of antimicrobial molecules, thus supporting viral infections. More epidemiological studies are needed to determine what effects air pollution has on COVID-19. In this review, we discuss how air pollutants such as PM2.5 and PM10 contribute to the transmission of COVID-19. MATERIALS AND METHODS: We used nine target cities in the Tuscany region to verify this, and in all of them the air pollution factors were found to be strongly correlated with COVID-19 cases. For each city, we applied a multivariate analysis and selected the model that best fits the data. RESULTS: This review underlines that both short-term and long-term exposure to air pollution may be crucial exacerbating factors for SARS-CoV-2 transmission and COVID-19 severity and lethality. The statistical analysis concludes that air pollution should be accounted for as a possible risk factor in future COVID-19 investigations, and that exposure to it should be avoided as much as possible by the general population. CONCLUSIONS: Our research highlighted the correlation between COVID-19 and air pollution. Reducing air pollution exposure should be one of the first measures against COVID-19 spread.

Subjects

Air Pollutants , Air Pollution , COVID-19 , Humans , SARS-CoV-2 , Particulate Matter/adverse effects , Particulate Matter/analysis , Air Pollution/adverse effects , Air Pollutants/adverse effects , Air Pollutants/analysis , Environmental Exposure/adverse effects

5.

The application of next generation matrix in the calculation of basic reproduction number for COVID-19.

Kalluci, E; Noka, E; Gordani, O; Merkaj, X; Macchia, A; Marceddu, G; Bertelli, M; Bani, K.

Clin Ter ; 174(Suppl 2(6)): 263-278, 2023.

Article in English | MEDLINE | ID: mdl-37994774

ABSTRACT

Background: Infectious diseases are disorders caused by microorganisms such as bacteria, viruses, fungi, or parasites. Many organisms live in and on our bodies; they are normally harmless or even helpful, but under certain conditions some of them may cause disease. Infectious diseases that can be passed from person to person are also called contagious diseases, and some infectious diseases are transmitted by insects or other animals. COVID-19 is an infectious disease that has "pervaded" the whole world during the last three years, and the World Health Organization (WHO) declared it a Public Health Emergency of International Concern. Methods: In this paper, we study the outbreak of this pandemic in Albania using mathematical models such as SIR, SIRD, and SEIRD. We present a detailed analysis of these models and demonstrate how they can be used to predict the spread of infectious diseases, specifically the spread of COVID-19 in our country, Albania. We use software such as MATLAB and RStudio for this purpose, with data taken from the Institute of Public Health, Tirana, Albania. Results: We have developed an application that uses actual data to estimate SEIRD model parameters. It computes the basic reproduction number and, more significantly, provides forecasts of the disease's progression. Conclusions: Our aim is to calculate the basic reproduction number using the next generation matrix and to use it to anticipate the future course of the disease. The basic reproduction number is the average number of new infections generated by one infected individual; a large value indicates that the infection is transmitted very quickly. We attempt to calculate the values of the basic reproduction number over different time periods.

Subjects

COVID-19 , Communicable Diseases , Humans , COVID-19/epidemiology , Basic Reproduction Number , Disease Outbreaks , Albania
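
The next generation matrix calculation named in this abstract can be sketched for a textbook SEIRD model; this is an illustrative formulation, not necessarily the authors' exact model or code. It assumes new infections enter E at rate β·S·I/N, E progresses to I at rate σ, and I leaves by recovery (γ) or death (δ). At the disease-free equilibrium (S ≈ N), the new-infection matrix F and transition matrix V for the infected compartments (E, I) give R0 as the spectral radius of F·V⁻¹, which reduces to β/(γ + δ). The function names and parameter values are hypothetical.

```python
import cmath

def mat_mul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_inv(m):
    """Invert a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def spectral_radius(m):
    """Largest eigenvalue modulus of a 2x2 matrix, from its characteristic polynomial."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + disc) / 2), abs((tr - disc) / 2))

def seird_r0(beta, sigma, gamma, delta):
    """R0 of an SEIRD model via the next generation matrix; infected states ordered (E, I)."""
    # F: new infections appearing in E, produced by individuals in I (with S/N ~ 1).
    F = [[0.0, beta],
         [0.0, 0.0]]
    # V: transitions out of the infected states: E -> I at rate sigma, I -> R or D at gamma + delta.
    V = [[sigma, 0.0],
         [-sigma, gamma + delta]]
    return spectral_radius(mat_mul(F, mat_inv(V)))
```

For example, seird_r0(0.3, 0.2, 0.1, 0.01) returns approximately 2.73, matching β/(γ + δ) = 0.3/0.11. Estimating β, σ, γ, and δ separately for different time windows would give the period-by-period values the abstract refers to.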

6.

Adaptive Bayesian variable clustering via structural learning of breast cancer data.

Ghosh, Riddhi Pratim; Maity, Arnab K; Pourahmadi, Mohsen; Mallick, Bani K.

Genet Epidemiol ; 47(1): 95-104, 2023 02.

Article in English | MEDLINE | ID: mdl-36378773

ABSTRACT

The clustering of proteins is of interest in cancer cell biology. This article proposes a hierarchical Bayesian model for protein (variable) clustering based on the correlation structure. Starting from a multivariate normal likelihood, we enforce the clustering through prior modeling, using an angle-based unconstrained reparameterization of correlations, and assume a truncated Poisson distribution (to penalize a large number of clusters) as the prior on the number of clusters. The posterior distributions of the parameters are not available in explicit form, so we use a reversible jump Markov chain Monte Carlo based technique to simulate the parameters from the posteriors. The end products of the proposed method are the estimated cluster configuration of the proteins (variables) along with the number of clusters. The Bayesian method is flexible enough to cluster the proteins as well as estimate the number of clusters. The performance of the proposed method is substantiated with extensive simulation studies and one protein expression dataset with a hereditary predisposition to breast cancer, in which the proteins come from different pathways.

Subjects

Breast Neoplasms , Humans , Female , Bayes Theorem , Breast Neoplasms/genetics , Models, Genetic , Cluster Analysis , Markov Chains , Monte Carlo Method
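
The angle-based unconstrained reparameterization of correlations mentioned in this abstract can be illustrated with the common hyperspherical (Cholesky) construction; whether it matches the paper's exact parameterization is an assumption. Each row of a lower-triangular factor L is built from angles in (0, π) so that it has unit length, which guarantees that R = L·Lᵀ is a valid correlation matrix for any choice of angles. The function name and the angle layout are hypothetical.

```python
import math

def correlation_from_angles(angles):
    """Build a p x p correlation matrix from angles via a hyperspherical Cholesky factor.

    angles[i] holds the i angles (each in (0, pi)) for row i of the lower-triangular
    factor L, so angles[0] is empty. Each row of L has unit Euclidean norm, hence
    R = L L^T has ones on its diagonal and is positive semidefinite by construction.
    """
    p = len(angles)
    L = [[0.0] * p for _ in range(p)]
    L[0][0] = 1.0
    for i in range(1, p):
        sin_prod = 1.0  # running product of sin(theta_ik) over k < j
        for j in range(i):
            theta = angles[i][j]
            L[i][j] = math.cos(theta) * sin_prod
            sin_prod *= math.sin(theta)
        L[i][i] = sin_prod
    # R = L L^T
    return [[sum(L[i][k] * L[j][k] for k in range(p)) for j in range(p)] for i in range(p)]
```

Because any angles in (0, π) yield a valid matrix, a prior can be placed directly on the angles without enforcing positive definiteness separately. Setting an angle to π/2 zeroes the corresponding Cholesky entry, and the first-column correlations are simply R[i][0] = cos θ_i0.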

7.

Bayesian Copula Density Deconvolution for Zero-Inflated Data in Nutritional Epidemiology.

Sarkar, Abhra; Pati, Debdeep; Mallick, Bani K; Carroll, Raymond J.

J Am Stat Assoc ; 116(535): 1075-1087, 2021.

Article in English | MEDLINE | ID: mdl-34898760

ABSTRACT

Estimating the marginal and joint densities of the long-term average intakes of different dietary components is an important problem in nutritional epidemiology. Since these variables cannot be directly measured, data are usually collected in the form of 24-hour recalls of the intakes, which show marked patterns of conditional heteroscedasticity. Significantly compounding the challenges, the recalls for episodically consumed dietary components also include exact zeros. The problem of estimating the density of the latent long-term intakes from their observed, measurement-error-contaminated proxies is then a problem of deconvolution of densities with zero-inflated data. We propose a Bayesian semiparametric solution to the problem, building on a novel hierarchical latent variable framework that translates the problem to one involving continuous surrogates only. Crucial to accommodating important aspects of the problem, we then design a copula based approach to model the involved joint distributions, adopting different modeling strategies for the marginals of the different dietary components. We design efficient Markov chain Monte Carlo algorithms for posterior inference and illustrate the efficacy of the proposed method through simulation experiments. Applied to our motivating nutritional epidemiology problems, compared to other approaches, our method provides more realistic estimates of the consumption patterns of episodically consumed dietary components.

8.

Bayesian graph selection consistency under model misspecification.

Niu, Yabo; Pati, Debdeep; Mallick, Bani K.

Bernoulli (Andover) ; 27(1): 637-672, 2021 Feb.

Article in English | MEDLINE | ID: mdl-34305432

ABSTRACT

Gaussian graphical models are a popular tool to learn the dependence structure, in the form of a graph, among variables of interest. Bayesian methods have gained in popularity in the last two decades due to their ability to simultaneously learn the covariance and the graph. There is a wide variety of model-based methods to learn the underlying graph assuming various forms of the graphical structure. Although decomposability is commonly imposed on the graph space for the scalability of Markov chain Monte Carlo algorithms, its implications for the posterior distribution of the graph are not clear. An open problem in Bayesian decomposable structure learning is whether the posterior distribution is able to select a meaningful decomposable graph that is "close" to the true non-decomposable graph when the dimension of the variables increases with the sample size. In this article, we explore specific conditions on the true precision matrix and the graph that result in an affirmative answer to this question, with a commonly used hyper-inverse Wishart prior on the covariance matrix and a suitable complexity prior on the graph space. In the absence of structural sparsity assumptions, our strong selection consistency holds in a high-dimensional setting where p = O(n^α) for α < 1/3. We show that when the true graph is non-decomposable, the posterior distribution concentrates on a set of graphs that are minimal triangulations of the true graph.
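
An undirected graph is decomposable exactly when it is chordal, meaning every cycle of length four or more has a chord; a triangulation adds edges to make a graph chordal. As a general illustration of the decomposability constraint discussed above (not code from the paper), the sketch below tests chordality with maximum cardinality search: the reverse of the visit order is a perfect elimination ordering if and only if the graph is chordal. The adjacency-dict input format and function name are assumptions for the example.

```python
def is_chordal(adj):
    """Return True if the undirected graph `adj` (dict: vertex -> set of neighbours) is chordal."""
    # Maximum cardinality search: always visit the unvisited vertex with the most visited neighbours.
    weight = {v: 0 for v in adj}
    order = []
    position = {}
    unvisited = set(adj)
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])
        unvisited.remove(v)
        position[v] = len(order)
        order.append(v)
        for w in adj[v]:
            if w in unvisited:
                weight[w] += 1
    # Perfect elimination check: the earlier-visited neighbours of each vertex must form a clique.
    # It suffices that they are all adjacent to the most recently visited one among them.
    for v in order:
        earlier = [w for w in adj[v] if position[w] < position[v]]
        if not earlier:
            continue
        parent = max(earlier, key=lambda w: position[w])
        for w in earlier:
            if w != parent and w not in adj[parent]:
                return False
    return True
```

For example, the 4-cycle {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}} is not decomposable, and adding the single chord 1-3 yields a minimal triangulation of it, which the function accepts.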

9.

Dietary Intakes of Amino Acids and Other Nutrients by Adult Humans.

Sarkar, Tapasree R; McNeal, Catherine J; Meininger, Cynthia J; Niu, Yabo; Mallick, Bani K; Carroll, Raymond J; Wu, Guoyao.

Adv Exp Med Biol ; 1332: 211-227, 2021.

Article in English | MEDLINE | ID: mdl-34251646

ABSTRACT

Measuring usual dietary intake in free-living humans is difficult. As part of our recent study, a food frequency questionnaire was completed by healthy adult men and women on days 0 and 90 of the study. Data from the questionnaire were analyzed with a nutrient analysis program (www.Harvardsffq.date). Healthy men and women consumed protein as 19-20% and 17-19% of their total energy intake, respectively, with animal protein representing about 75% and 70% of their total protein intake, respectively. The intake of each nutritionally essential amino acid (EAA) exceeded that recommended for healthy adults with minimal physical activity. In all individuals, the dietary intake of leucine was the highest, followed by lysine, valine, and isoleucine in descending order, and the intake of amino acids that are synthesizable de novo in animal cells (AASAs) was about 20% greater than that of total EAAs. The intake of each AASA met the level recommended for healthy adults with minimal physical activity. However, intakes of some AASAs (alanine, arginine, aspartate, glutamate, and glycine) from a typical diet providing 90-110 g food protein/day do not meet the requirements of adults with intensive physical activity. Within the male or female group, there were no significant differences in the dietary intake of any amino acid between days 0 and 90 of the study, and this was also true for nearly all other essential nutrients. Our findings will help to improve amino acid nutrition and health in both the general population and exercising individuals.

Subjects

Amino Acids , Diet , Adult , Eating , Energy Intake , Female , Humans , Male , Nutrients

10.

Circadian Gene Selection for Time-to-event Phenotype by Integrating CNV and RNAseq Data.

Maity, Arnab Kumar; Lee, Sang Chan; Hu, Linhan; Bell-Pedersen, Deborah; Mallick, Bani K; Sarkar, Tapasree Roy.

Chemometr Intell Lab Syst ; 212, 2021 May 15.

Article in English | MEDLINE | ID: mdl-35068632

ABSTRACT

BACKGROUND: The endogenous circadian clock, which controls daily rhythms in the expression of at least half of the mammalian genome, has a major influence on cell physiology. Consequently, disruption of the circadian system is associated with a wide range of diseases, including cancer. While several circadian clock genes have been associated with cancer progression, little is known about survival when two or more platforms are considered together. Our goal was to determine whether survival outcomes are associated with circadian clock function. To accomplish this goal, we developed a Bayesian hierarchical survival model coupled with a global-local shrinkage prior and applied this model to available RNA-Seq and copy number variation data to select significant circadian genes associated with cancer progression. RESULTS: Using a Bayesian shrinkage approach with the Bayesian accelerated failure time (AFT) model, we showed that the circadian clock associated gene DEC1 is positively correlated with survival outcome in breast cancer patients. The R package circgene implementing the methodology is available at https://github.com/MAITYA02/circgene. CONCLUSIONS: The proposed Bayesian hierarchical model is the first shrinkage prior based model of its kind that integrates two omics platforms to identify significant circadian genes for cancer survival.

11.

Hibernating ribosomes exhibit chaperoning activity but can resist unfolded protein-mediated subunit dissociation.

Ferdosh, Sehnaz; Banerjee, Senjuti; Pathak, Bani K; Sengupta, Jayati; Barat, Chandana.

FEBS J ; 288(4): 1305-1324, 2021 02.

Article in English | MEDLINE | ID: mdl-32649051

ABSTRACT

Ribosome hibernation is a prominent cellular strategy to modulate protein synthesis during starvation and the stationary phase of bacterial cell growth. Translational suppression involves the formation of either factor-bound inactive 70S monomers or dimeric 100S hibernating ribosomal complexes, the biological significance of which is poorly understood. Here, we demonstrate that the Escherichia coli 70S ribosome associated with the stationary phase factors hibernation promoting factor, protein Y, or ribosome-associated inhibitor A, and the 100S ribosome isolated from both Gram-negative and Gram-positive bacteria, are resistant to unfolded protein-mediated subunit dissociation and subsequent degradation by cellular ribonucleases. Considering that the increase in cellular stress is accompanied by accumulation of unfolded proteins, such resistance of hibernating ribosomes towards dissociation might contribute to their maintenance during the stationary phase. Analysis of existing structures provided clues on the mechanism of inhibition of the unfolded protein-mediated disassembly in the case of the hibernating factor-bound ribosome. Further, the factor-bound 70S and 100S ribosomes can suppress protein aggregation and assist in protein folding. The chaperoning activity of these ribosomes is the first evidence of a potential biological activity of the hibernating ribosome that might be crucial for cell survival under stress conditions.

Subjects

Bacterial Proteins/metabolism , Protein Biosynthesis , Ribosomal Proteins/metabolism , Ribosomes/metabolism , Bacterial Proteins/chemistry , Bacterial Proteins/genetics , Binding Sites , Escherichia coli/genetics , Escherichia coli/metabolism , Escherichia coli Proteins/genetics , Escherichia coli Proteins/metabolism , Guanosine Triphosphate/metabolism , Models, Molecular , Protein Binding , Protein Domains , Protein Folding , Protein Subunits/genetics , Protein Subunits/metabolism , Ribosomal Proteins/chemistry , Ribosomal Proteins/genetics , Ribosomes/chemistry , Staphylococcus aureus/genetics , Staphylococcus aureus/metabolism

12.

Bayesian sparse multiple regression for simultaneous rank reduction and variable selection.

Chakraborty, Antik; Bhattacharya, Anirban; Mallick, Bani K.

Biometrika ; 107(1): 205-221, 2020 Mar.

Article in English | MEDLINE | ID: mdl-33100350

ABSTRACT

We develop a Bayesian methodology aimed at simultaneously estimating low-rank and row-sparse matrices in a high-dimensional multiple-response linear regression model. We consider a carefully devised shrinkage prior on the matrix of regression coefficients which obviates the need to specify a prior on the rank, and shrinks the regression matrix towards low-rank and row-sparse structures. We provide theoretical support for the proposed methodology by proving minimax optimality of the posterior mean under the prediction risk in ultra-high dimensional settings where the number of predictors can grow sub-exponentially relative to the sample size. A one-step post-processing scheme induced by group lasso penalties on the rows of the estimated coefficient matrix is proposed for variable selection, with default choices of tuning parameters. We additionally provide an estimate of the rank using a novel optimization function achieving dimension reduction in the covariate space. We exhibit the performance of the proposed methodology in an extensive simulation study and a real data example.
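
The row-wise group lasso idea behind the post-processing step in this abstract can be illustrated with group soft-thresholding, the proximal operator of λ·Σ_j ||B_j||₂ over the rows B_j of a coefficient matrix. This is a simplified sketch, not the paper's scheme: the actual post-processing solves a penalized problem with its own default tuning parameters, which is not reproduced here. Rows shrunk exactly to zero correspond to predictors dropped from the model. Function names are hypothetical.

```python
import math

def group_soft_threshold_rows(B, lam):
    """Apply row-wise group soft-thresholding to a coefficient matrix B (list of rows).

    Proximal operator of lam * sum_j ||B[j]||_2: each row is shrunk toward zero by lam
    in Euclidean norm, and rows whose norm is at most lam become exactly zero.
    """
    out = []
    for row in B:
        norm = math.sqrt(sum(x * x for x in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * x for x in row])
    return out

def selected_predictors(B, lam):
    """Indices of predictors (rows of B) that survive the thresholding."""
    return [j for j, row in enumerate(group_soft_threshold_rows(B, lam)) if any(x != 0.0 for x in row)]
```

For example, with B = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]] and λ = 1, the first row (norm 5) is scaled by 0.8 and kept, while the second row (norm 0.5) is zeroed, so only predictor 0 is selected. Thresholding whole rows, rather than individual entries, is what makes a predictor enter or leave all response equations together.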

13.

Directionally dependent multi-view clustering using copula model.

Afrin, Kahkashan; Iquebal, Ashif S; Karimi, Mostafa; Souris, Allyson; Lee, Se Yoon; Mallick, Bani K.

PLoS One ; 15(10): e0238996, 2020.

Article in English | MEDLINE | ID: mdl-33095785

ABSTRACT

Recent developments in high-throughput methods have resulted in the collection of high-dimensional data types from multiple sources and technologies that measure distinct yet complementary information. Integrated clustering of such multiple data types, or multi-view clustering, is critical for revealing pathological insights. However, multi-view clustering is challenging due to the complex dependence structure between multiple data types, including directional dependency. Specifically, genomics data types have pre-specified directional dependencies known as the central dogma, which describes the flow of information from DNA to messenger RNA (mRNA) and then from mRNA to protein. Most of the existing multi-view clustering approaches assume an independent structure or pair-wise (non-directional) dependence between data types, thereby ignoring their directional relationship. Motivated by this, we propose a biology-inspired Bayesian integrated multi-view clustering model that uses an asymmetric copula to accommodate the directional dependencies between the data types. Via extensive simulation experiments, we demonstrate the negative impact of ignoring directional dependency on clustering performance. We also present an application of our model to a real-world dataset of breast cancer tumor samples collected from The Cancer Genome Atlas program and provide comparative results.

Subjects

Genomics/methods , Models, Statistical , Bayes Theorem , Breast Neoplasms/genetics , Cluster Analysis , Computer Simulation , Data Interpretation, Statistical , Databases, Genetic/statistics & numerical data , Female , Genomics/statistics & numerical data , Humans , Markov Chains , Normal Distribution

14.

Heptameric Peptide Interferes with Amyloid-β Aggregation by Structural Reorganization of the Toxic Oligomers.

Bhattacharyya, Rajanya; Bhattacharjee, Sayan; Pathak, Bani K; Sengupta, Jayati.

ACS Omega ; 5(26): 16128-16138, 2020 Jul 07.

Article in English | MEDLINE | ID: mdl-32656435

ABSTRACT

Pathogenesis of Alzheimer's disease (AD), the most common type of dementia, involves misfolding and aggregation of the extracellular amyloid-β (Aβ) protein where the intermediate oligomers, formed during the aggregation progression cascade, are considered the prime toxic species. Here, we identify an active peptide fragment from a medicinal plant-derived (Aristolochia indica) fibrinolytic enzyme having anti-amyloidogenic effects against Aβ fibrillation and toxicity. Liquid chromatography with tandem mass spectrometry (LC-MS/MS), followed by computational analysis of the peptide pool generated by proteolytic digestion of the enzyme, identifies two peptide sequences with predictive high-propensity binding to Aβ42. Microscopic visualizations in conjunction with biochemical and biophysical assessments suggest that the synthetic version of one of the peptides (termed here Pactive, GFLLHQK) arrests Aβ molecules in off-pathway oligomers that can no longer participate in the cytotoxic fibrillation pathway. In contrast, the other peptide (termed P1) aggravates the fibrillation process. Further investigations confirm the strong binding affinity of Pactive with both Aβ42 monomers and toxic oligomers by biolayer interferometric assays. We have also shown that, mechanistically, Pactive binding induces conformational alterations in the Aβ molecule along with modification of Aβ hydrophobicity, one of the key players in aggregation. Importantly, the biostability of Pactive in human blood serum and its nontoxic nature make it a promising therapeutic candidate against Alzheimer's, for which no disease-modifying treatments are available to date.

15.

Bayesian structural equation modeling in multiple omics data with application to circadian genes.

Maity, Arnab Kumar; Lee, Sang Chan; Mallick, Bani K; Sarkar, Tapasree Roy.

Bioinformatics ; 36(13): 3951-3958, 2020 07 01.

Article in English | MEDLINE | ID: mdl-32369552

ABSTRACT

MOTIVATION: Integrating different data sources is known to be valuable because it can unveil new functionalities of genomic expression that might remain dormant in a single-source analysis. Moreover, several studies have shown that analyses of multi-platform data are more powerful. Toward this end, in this study we consider the omics profiles of circadian genes, such as copy number changes and RNA-sequence data, along with their survival response. We develop a Bayesian structural equation model coupled with linear regressions and a log-normal accelerated failure time regression to integrate the information between these two platforms and predict the survival of the subjects. We place conjugate priors on the regression parameters and derive the Gibbs sampler from their full conditional distributions. RESULTS: Our extensive simulation study shows that the integrative model provides a better fit to the data than its closest competitor. The analyses of glioblastoma data and breast cancer data from TCGA, the largest genomics and transcriptomics database, support our findings. AVAILABILITY AND IMPLEMENTATION: The developed method is wrapped in an R package available at https://github.com/MAITYA02/semmcmc. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genome , Genomics , Bayes Theorem , Computational Biology , Humans , Latent Class Analysis , Software

16.

Quantile Graphical Models: Bayesian Approaches.

Guha, Nilabja; Baladandayuthapani, Veera; Mallick, Bani K.

J Mach Learn Res ; 21(79): 1-47, 2020.

Article in English | MEDLINE | ID: mdl-34305477

ABSTRACT

Graphical models are ubiquitous tools to describe the interdependence between variables measured simultaneously, such as large-scale gene or protein expression data. Gaussian graphical models (GGMs) are well-established tools for probabilistic exploration of dependence structures using precision matrices, and they are generated under a multivariate normal joint distribution. However, they suffer from several shortcomings because they rely on Gaussian distributional assumptions. In this article, we propose a Bayesian quantile-based approach for sparse estimation of graphs. We demonstrate that the resulting graph estimation is robust to outliers and applicable under general distributional assumptions. Furthermore, we develop efficient variational Bayes approximations to scale the methods to large data sets. Our methods are applied to a novel cancer proteomics dataset wherein multiple proteomic antibodies are simultaneously assessed on tumor samples using reverse-phase protein array (RPPA) technology.
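
Bayesian quantile methods commonly rest on the check (pinball) loss ρ_τ(u) = u(τ - 1{u < 0}), whose minimizer over a constant is a τ-th sample quantile, and on the asymmetric Laplace density, proportional to exp(-ρ_τ((y - μ)/σ)), used as a working likelihood. The sketch below shows only this general background; it does not reproduce the paper's graph estimator, and all function names are hypothetical.

```python
import math

def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0}) for 0 < tau < 1."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def empirical_quantile_by_check_loss(data, tau):
    """Return the data point minimizing the summed check loss; it is a tau-th sample quantile."""
    return min(data, key=lambda q: sum(check_loss(y - q, tau) for y in data))

def asymmetric_laplace_logpdf(y, mu, sigma, tau):
    """Log-density of the asymmetric Laplace distribution with location mu, scale sigma and
    skewness tau: log(tau * (1 - tau) / sigma) - rho_tau((y - mu) / sigma)."""
    return math.log(tau * (1 - tau) / sigma) - check_loss((y - mu) / sigma, tau)
```

For the data 1, 2, ..., 10 and τ = 0.25, the summed check loss is minimized at 3, the lower-quartile value. Because the loss grows only linearly in the residual, a single outlier moves the estimate far less than it would under squared-error (Gaussian) loss, which is the intuition behind the robustness claim.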

17.

Bayesian data integration and variable selection for pan-cancer survival prediction using protein expression data.

Maity, Arnab Kumar; Bhattacharya, Anirban; Mallick, Bani K; Baladandayuthapani, Veerabhadran.

Biometrics ; 76(1): 316-325, 2020 03.

Article in English | MEDLINE | ID: mdl-31393003

ABSTRACT

Accurate prognostic prediction using molecular information is a challenging area of research that is essential for developing precision medicine. In this paper, we develop translational models to identify major actionable proteins that are associated with clinical outcomes, such as the survival time of patients. There are considerable statistical and computational challenges due to the high dimension of the problems. Furthermore, data are available for different tumor types, so integrating data across tumors is desirable. Censored survival outcomes add a further level of complexity to the inferential procedure. We develop Bayesian hierarchical survival models that accommodate all of these challenges. We use the hierarchical Bayesian accelerated failure time model for survival regression. Furthermore, we assume a sparse horseshoe prior distribution for the regression coefficients to identify the major proteomic drivers. We borrow strength across tumor groups by introducing a correlation structure among the prior distributions. The proposed methods have been used to analyze data from the recently curated "The Cancer Proteome Atlas" (TCPA), which contains high-quality protein expression data based on reverse-phase protein arrays as well as detailed clinical annotation, including survival times. Our simulation and the TCPA data analysis illustrate the efficacy of the proposed integrative model, which links different tumors through the correlated prior structures.

Subjects

Biometry/methods , Neoplasms/metabolism , Neoplasms/mortality , Proteome/metabolism , Proteomics/statistics & numerical data , Bayes Theorem , Computer Simulation , Data Interpretation, Statistical , Humans , Kidney Neoplasms/metabolism , Kidney Neoplasms/mortality , Markov Chains , Models, Statistical , Monte Carlo Method , Prognosis , Protein Array Analysis/statistics & numerical data , Survival Analysis
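
The horseshoe prior mentioned in this abstract places β_j | λ_j, τ ~ N(0, λ_j²τ²) with half-Cauchy local scales λ_j ~ C⁺(0, 1), and commonly a half-Cauchy global scale τ as well. The minimal prior-simulation sketch below fixes τ as a simplifying assumption and omits the correlated prior structure across tumor groups described above; it only shows the characteristic shape, with most draws near zero and occasional large ones. Function names are hypothetical.

```python
import math
import random

def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy(0, scale) distribution: scale * tan(pi * U / 2), U ~ Uniform(0, 1)."""
    return scale * math.tan(math.pi * rng.random() / 2.0)

def horseshoe_draws(p, tau, seed=0):
    """Draw p regression coefficients from the horseshoe prior with a fixed global scale tau."""
    rng = random.Random(seed)
    betas = []
    for _ in range(p):
        lam = half_cauchy(rng)                    # local shrinkage scale lambda_j
        betas.append(rng.gauss(0.0, lam * tau))   # beta_j | lambda_j, tau ~ N(0, (lambda_j * tau)^2)
    return betas
```

The heavy-tailed local scales let a few coefficients (candidate proteomic drivers) escape shrinkage while the small global scale pulls the rest toward zero, which is why this prior suits identifying a handful of major signals among many predictors.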

18.

Multiple Omics Data Integration to Identify Long Noncoding RNA Responsible for Breast Cancer-Related Mortality.

Roy Sarkar, Tapasree; Maity, Arnab Kumar; Niu, Yabo; Mallick, Bani K.

Cancer Inform ; 18: 1176935119871933, 2019.

Article in English | MEDLINE | ID: mdl-31488946

ABSTRACT

Long non-coding RNAs (lncRNAs) are a large and diverse class of transcribed RNAs that have been shown to play a significant role in cancer development. In this study, we apply an integrative modeling framework that combines DNA copy number variation (CNV), lncRNA expression, and downstream target protein expression to predict patient survival in breast cancer. We develop a 3-stage model combining a mechanical model (lncRNA regressed on CNV and target proteins regressed on lncRNA) and a clinical model (survival regressed on estimated effects from the mechanical models). Using lncRNAs (such as HOTAIR and MALAT1) along with their CNV, target protein expression, and survival outcomes from The Cancer Genome Atlas (TCGA) database, we show that both the predicted mean square error and the integrated Brier score (IBS) are lower for the proposed 3-step integrated model than for the 2-step model. Therefore, the integrative model has better predictive ability than the 2-step model that does not consider target protein information.
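
To make the integrated Brier score (IBS) in this abstract concrete, here is a minimal sketch under a strong simplifying assumption: every event time is observed. With censored TCGA-style survival data, the standard estimator reweights each term with inverse probability of censoring weights (for example, from a Kaplan-Meier estimate of the censoring distribution), which this sketch omits. The Brier score at time t is the mean squared difference between the survival indicator 1{T > t} and the predicted survival probability, and the IBS averages it over a time grid with the trapezoidal rule. Function names are hypothetical.

```python
def brier_score_at(t, times, surv_probs_at_t):
    """Brier score at time t when every event time is observed (no censoring).

    times[i] is subject i's event time; surv_probs_at_t[i] is the model's predicted
    probability that subject i survives beyond t.
    """
    n = len(times)
    return sum(((1.0 if ti > t else 0.0) - s) ** 2 for ti, s in zip(times, surv_probs_at_t)) / n

def integrated_brier_score(times, surv_fn, grid):
    """Trapezoidal approximation of the time-averaged Brier score over an increasing grid.

    surv_fn(i, t) returns the predicted survival probability of subject i at time t.
    """
    scores = [brier_score_at(t, times, [surv_fn(i, t) for i in range(len(times))]) for t in grid]
    area = sum((grid[k + 1] - grid[k]) * (scores[k] + scores[k + 1]) / 2.0 for k in range(len(grid) - 1))
    return area / (grid[-1] - grid[0])
```

A model that predicts the true survival indicators exactly scores 0, while an uninformative constant prediction of 0.5 scores 0.25, so lower values indicate better calibrated and more discriminating survival predictions.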

19.

Efficient Bayesian Regularization for Graphical Model Selection.

Kundu, Suprateek; Mallick, Bani K; Baladandayuthapan, Veera.

Bayesian Anal ; 14(2): 449-476, 2019 Jun.

Article in English | MEDLINE | ID: mdl-33123305

ABSTRACT

There has been intense development in the Bayesian graphical model literature over the past decade; however, most existing methods are restricted to moderate dimensions. We propose a novel graphical model selection approach for large dimensional settings where the dimension increases with the sample size, by decoupling model fitting and covariance selection. First, a full model based on a complete graph is fit under a novel class of mixtures of inverse-Wishart priors, which induce shrinkage on the precision matrix under an equivalence with Cholesky-based regularization, while enabling conjugate updates. Subsequently, a post-fitting model selection step uses penalized joint credible regions to perform model selection. This allows our methods to be computationally feasible for large dimensional settings using a combination of straightforward Gibbs samplers and efficient post-fitting inferences. Theoretical guarantees in terms of selection consistency are also established. Simulations show that the proposed approach compares favorably with competing methods, both in terms of accuracy metrics and computation times. We apply this approach to a cancer genomics data example.

20.

Integration of Survival and Binary Data for Variable Selection and Prediction: A Bayesian Approach.

Maity, Arnab Kumar; Carroll, Raymond J; Mallick, Bani K.

J R Stat Soc Ser C Appl Stat ; 68(5): 1577-1595, 2019 Nov.

Article in English | MEDLINE | ID: mdl-33311813

ABSTRACT

We consider the problem where the data consist of a survival time and a binary outcome measurement for each individual, as well as corresponding predictors. The goal is to select the common set of predictors that affect both responses, not just one of them. In addition, we develop a survival prediction model based on data integration. This article is motivated by The Cancer Genome Atlas (TCGA) databank, which is currently the largest genomics and transcriptomics database. The data contain cancer survival information along with cancer stages for each patient. Furthermore, they contain reverse-phase protein array (RPPA) measurements for each individual, which are the predictors associated with these responses. The biological motivation is to identify the major actionable proteins associated with both survival outcomes and cancer stages. We develop a Bayesian hierarchical model to jointly model the survival time and the classification of the cancer stages. Moreover, to deal with the high dimensionality of the RPPA measurements, we use a shrinkage prior to identify significant proteins. Simulations and TCGA data analysis show that the joint integrated modeling approach improves survival prediction.
