Results 1 - 10 of 10
1.
Biometrics ; 75(1): 183-192, 2019 03.
Article in English | MEDLINE | ID: mdl-30125947

ABSTRACT

In this article, we develop a Bayesian hierarchical mixture regression model for studying the association between a multivariate response, measured as counts on a set of features, and a set of covariates. We have available RNA-Seq and DNA methylation data measured on breast cancer patients at different stages of the disease. We account for the heterogeneity and over-dispersion of count data (here, RNA-Seq data) by considering a mixture of negative binomial distributions and incorporate the covariates (here, methylation data) into the model via a linear modeling construction on the mean components. Our modeling construction includes several innovative characteristics. First, it employs selection techniques that allow the identification of a small subset of features that best discriminate the samples while simultaneously selecting a set of covariates associated with each feature. Second, it incorporates known dependencies into the feature selection process via the use of Markov random field (MRF) priors. On simulated data, we show how incorporating existing information via the prior model can improve the accuracy of feature selection. In the analysis of RNA-Seq and DNA methylation data on breast cancer, we incorporate knowledge on relationships among genes via a gene-gene network, which we extract from the KEGG database. Our data analysis identifies genes that discriminate between cancer stages and simultaneously selects significant associations between those genes and DNA methylation sites. A biological interpretation of our findings reveals several biomarkers that can help in understanding the effect of DNA methylation on gene expression across cancer stages.


Subject(s)
Bayes Theorem, Binomial Distribution, Breast Neoplasms/genetics, Gene Regulatory Networks, Statistical Models, Regression Analysis, Base Sequence, Tumor Biomarkers, DNA Methylation, Statistical Data Interpretation, Female, Humans
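
The count model in the abstract above can be illustrated with a minimal sketch of a single negative binomial component whose mean depends on covariates. The log link, the mean/dispersion parameterization (variance = mu + mu^2/phi), and all function and parameter names are assumptions made for illustration; the abstract states only that covariates enter through a linear construction on the mean components. The full model would additionally mix over several such components and place selection priors on the coefficients.

```python
import math


def nb_logpmf(y, mu, phi):
    """Log-pmf of a negative binomial with mean mu and dispersion phi.

    Assumed parameterization: variance = mu + mu**2 / phi.
    """
    return (
        math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
        + phi * math.log(phi / (phi + mu))
        + y * math.log(mu / (phi + mu))
    )


def nb_regression_loglik(counts, covariates, intercept, beta, phi):
    """Log-likelihood of counts under one NB component with a log-linear mean.

    counts:     list of non-negative integers (e.g. RNA-Seq counts of one feature)
    covariates: list of covariate vectors (e.g. methylation values), one per sample
    The log link is an assumption; the abstract does not specify it.
    """
    total = 0.0
    for y, x in zip(counts, covariates):
        log_mu = intercept + sum(b * xi for b, xi in zip(beta, x))
        total += nb_logpmf(y, math.exp(log_mu), phi)
    return total
```

Setting a coefficient in `beta` to zero removes that covariate from the mean, which is the quantity a variable selection prior would act on.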
2.
Anal Bioanal Chem ; 410(23): 5969-5980, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29968108

ABSTRACT

Mass spectrometry imaging (MSI) has provided many results with translational character, which still have to be proven robust in large patient cohorts and across different centers. Although formalin-fixed paraffin-embedded (FFPE) specimens are most common in clinical practice, no MSI multicenter study has been reported for FFPE samples. Here, we report the results of the first round robin MSI study on FFPE tissues with the goal of investigating the consequences of inter- and intracenter technical variation on masking biological effects. A total of four centers were involved with similar MSI instrumentation and sample preparation equipment. An FFPE multi-organ tissue microarray containing eight different types of tissue was analyzed on a peptide and metabolite level, which enabled investigating different molecular and biological differences. Statistical analyses revealed that peptide intercenter variation was significantly lower and metabolite intercenter variation was significantly higher than the respective intracenter variations. When looking at relative univariate effects of mass signals with statistical discriminatory power, the metabolite data was more reproducible across centers compared to the peptide data. With respect to absolute effects (cross-center common intensity scale), multivariate classifiers were able to reach on average > 90% accuracy for peptides and > 80% for metabolites if trained with a sufficient amount of cross-center data. Overall, our study showed that MSI data from FFPE samples could be reproduced to a high degree across centers. While metabolite data exhibited more reproducibility with respect to relative effects, peptide data-based classifiers were more directly transferable between centers and therefore more robust than expected.


Subject(s)
Mass Spectrometry, Metabolomics, Paraffin Embedding, Peptides/analysis, Tissue Array Analysis, Tissue Fixation, Animals, Formaldehyde/chemistry, Mass Spectrometry/methods, Metabolomics/methods, Mice, Paraffin Embedding/methods, Proteomics/methods, Reproducibility of Results, Tissue Array Analysis/methods, Tissue Fixation/methods
3.
PLoS Comput Biol ; 12(4): e1004884, 2016 04.
Article in English | MEDLINE | ID: mdl-27124473

ABSTRACT

The advent of functional genomics has enabled the genome-wide characterization of the molecular state of cells and tissues, virtually at every level of biological organization. The difficulty in organizing and mining this unprecedented amount of information has stimulated the development of computational methods designed to infer the underlying structure of regulatory networks from observational data. These important developments had a profound impact on the biological sciences since they triggered the development of a novel data-driven investigative approach. In cancer research, this strategy has been particularly successful. It has contributed to the identification of novel biomarkers, to a better characterization of disease heterogeneity and to a more in-depth understanding of cancer pathophysiology. However, so far these approaches have not explicitly addressed the challenge of identifying networks representing the interaction of different cell types in a complex tissue. Since these interactions represent an essential part of the biology of both diseased and healthy tissues, it is of paramount importance that this challenge is addressed. Here we report the definition of a network reverse engineering strategy designed to infer directional signals linking adjacent cell types within a complex tissue. The application of this inference strategy to prostate cancer genome-wide expression profiling data validated the approach and revealed that normal epithelial cells exert an anti-tumour activity on prostate carcinoma cells. Moreover, by using a Bayesian hierarchical model integrating genetics and gene expression data and combining this with survival analysis, we show that the expression of putative cell communication genes related to focal adhesion and secretion is affected by epistatic gene copy number variation and is predictive of patient survival. Ultimately, this study represents a generalizable approach to the challenge of deciphering cell communication networks in a wide spectrum of biological systems.


Subject(s)
Gene Regulatory Networks, Prostate/cytology, Prostate/metabolism, Prostatic Neoplasms/genetics, Prostatic Neoplasms/pathology, Bayes Theorem, Cell Communication, Cell Line, Tumor Cell Line, Coculture Techniques, Computational Biology, Epithelial Cells/metabolism, Gene Expression Profiling, Humans, Male, Biological Models, Prostatic Neoplasms/metabolism, Signal Transduction/genetics
4.
Neuroimage ; 125: 601-615, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26518632

ABSTRACT

Brain graphs provide a useful way to computationally model the network structure of the connectome, and this has led to increasing interest in the use of graph theory to quantify and investigate the topological characteristics of the healthy brain and brain disorders on the network level. The majority of graph theory investigations of functional connectivity have relied on the assumption of temporal stationarity. However, recent evidence increasingly suggests that functional connectivity fluctuates over the length of the scan. In this study, we investigate the stationarity of brain network topology using a Bayesian hidden Markov model (HMM) approach that estimates the dynamic structure of graph theoretical measures of whole-brain functional connectivity. In addition to extracting the stationary distribution and transition probabilities of commonly employed graph theory measures, we propose two estimators of temporal stationarity: the S-index and N-index. These indexes can be used to quantify different aspects of the temporal stationarity of graph theory measures. We apply the method and proposed estimators to resting-state functional MRI data from healthy controls and patients with temporal lobe epilepsy. Our analysis shows that several graph theory measures, including small-world index, global integration measures, and betweenness centrality, may exhibit greater stationarity over time and therefore be more robust. Additionally, we demonstrate that accounting for subject-level differences in the level of temporal stationarity of network topology may increase the power to discriminate between disease states. Our results confirm and extend findings from other studies regarding the dynamic nature of functional connectivity, and suggest that using statistical models which explicitly account for the dynamic nature of functional connectivity in graph theory analyses may improve the sensitivity of investigations and consistency across investigations.


Subject(s)
Brain/physiology, Connectome/methods, Temporal Lobe Epilepsy/physiopathology, Computer-Assisted Image Processing/methods, Neural Pathways/physiology, Adult, Algorithms, Bayes Theorem, Female, Humans, Magnetic Resonance Imaging/methods, Male, Markov Chains, Middle Aged, Young Adult
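
The abstract above refers to the stationary distribution of the hidden state chain. As a minimal sketch, assuming the transition probabilities have already been estimated and are stored as a row-stochastic matrix, that distribution is the left eigenvector of the matrix with eigenvalue 1. The function name is hypothetical. The S-index and N-index are not defined in the abstract and are not reproduced here.

```python
import numpy as np


def stationary_distribution(transition_matrix):
    """Stationary distribution pi of a row-stochastic matrix P, i.e. pi @ P == pi.

    Computed as the eigenvector of P.T whose eigenvalue is closest to 1,
    rescaled to sum to 1. Assumes the chain has a unique stationary distribution.
    """
    P = np.asarray(transition_matrix, dtype=float)
    eigvals, eigvecs = np.linalg.eig(P.T)
    idx = np.argmin(np.abs(eigvals - 1.0))
    pi = np.real(eigvecs[:, idx])
    return pi / pi.sum()
```

For a two-state chain with `P = [[0.9, 0.1], [0.5, 0.5]]`, the balance condition `0.1 * pi[0] = 0.5 * pi[1]` gives `pi = [5/6, 1/6]`.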
5.
Anal Chem ; 88(11): 5871-8, 2016 06 07.
Article in English | MEDLINE | ID: mdl-27180608

ABSTRACT

Mass spectrometry imaging (MSI) is a powerful molecular imaging technique. In microprobe MSI, images are created through a grid-wise interrogation of individual spots by mass spectrometry across a surface. Classical statistical tests for within-sample comparisons fail because neighboring measurement spots violate the independence assumption of these tests, which can lead to an increased false-discovery rate. For spatial data, this effect is referred to as spatial autocorrelation. In this study, we investigated spatial autocorrelation in three different matrix-assisted laser desorption/ionization MSI data sets. These data sets cover different molecular classes (metabolites/drugs, lipids, and proteins) and different spatial resolutions ranging from 20 to 100 µm. Significant spatial autocorrelation was detected in all three data sets and found to increase with decreasing pixel size. To enable statistical testing for differences in mass signal intensities between regions of interest within MSI data sets, we propose the use of Conditional Autoregressive (CAR) models. We show that, by accounting for spatial autocorrelation, discovery rates (i.e., the ratio between the features identified and the total number of features) could be reduced by between 21% and 69%. The reliability of this approach was validated by control mass signals based on prior knowledge. In light of the advent of larger MSI data sets based on either an increased spatial resolution or 3D data sets, accounting for effects due to spatial autocorrelation becomes even more indispensable. Here, we propose a generic and easily applicable workflow to enable within-sample statistical comparisons.
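
Spatial autocorrelation on a pixel grid can be quantified in several ways. The abstract does not say which statistic the authors used, so the sketch below uses Moran's I with rook (edge-sharing) neighbors purely as an illustration; the function name and the choice of neighborhood are assumptions. Values near +1 indicate that neighboring pixels carry similar intensities, values near -1 indicate alternating intensities, and values near 0 indicate no spatial structure.

```python
import numpy as np


def morans_i(grid):
    """Moran's I of a 2D intensity image with rook adjacency and unit weights.

    grid: 2D array of one mass signal's intensities, one value per pixel.
    Undefined (division by zero) for a constant image.
    """
    x = np.asarray(grid, dtype=float)
    z = x - x.mean()
    # Products of deviations over horizontally and vertically adjacent pixels.
    pair_sum = (z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum()
    n_pairs = z[:, :-1].size + z[:-1, :].size
    # I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2; with symmetric unit
    # weights the factor 2 from counting each pair twice cancels in N / W.
    return (x.size / n_pairs) * pair_sum / (z ** 2).sum()
```

A 4 x 4 checkerboard of zeros and ones gives exactly -1, while a smooth gradient gives a positive value.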

6.
Biometrics ; 71(3): 803-11, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25771699

ABSTRACT

In this article we propose a Bayesian hierarchical model for the identification of differentially expressed genes in Daphnia magna organisms exposed to chemical compounds, specifically munition pollutants in water. The model we propose constitutes one of the very first attempts at a rigorous modeling of the biological effects of water purification. We have data acquired from a purification system that comprises four consecutive purification stages, which we refer to as "ponds," of progressively more contaminated water. We model the expected expression of a gene in a pond as the sum of the mean of the same gene in the previous pond and a gene-pond specific difference. We incorporate a variable selection mechanism for the identification of the differential expressions, with a prior distribution on the probability of a change that accounts for the available information on the concentration of chemical compounds present in the water. We carry out posterior inference via MCMC stochastic search techniques. In the application, we reduce the complexity of the data by grouping genes according to their functional characteristics, based on the KEGG pathway database. This also increases the biological interpretability of the results. Our model successfully identifies a number of pathways that show differential expression between consecutive purification stages. We also find that changes in the transcriptional response are more strongly associated with the presence of certain compounds, with the remaining compounds contributing to a lesser extent. We discuss the sensitivity of these results to the model parameters that measure the influence of the prior information on the posterior inference.


Subject(s)
Daphnia/metabolism, Explosive Agents/poisoning, Gene Expression Profiling/methods, Statistical Models, Proteome/metabolism, Chemical Water Pollutants/toxicity, Animals, Bayes Theorem, Computer Simulation, Environmental Exposure/adverse effects, Gene Expression Regulation/drug effects, Gene Expression Regulation/physiology, Reproducibility of Results, Sensitivity and Specificity
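
The mean structure described in the abstract above can be written compactly. The notation is an assumption introduced here for illustration: \mu_{g,p} is the expected expression of gene g in pond p, \delta_{g,p} is the gene-pond specific difference, and \gamma_{g,p} is a binary indicator of differential expression. Coding variable selection through such an indicator is one common choice; the abstract does not confirm that the authors use exactly this form.

```latex
\mu_{g,p} = \mu_{g,p-1} + \gamma_{g,p}\,\delta_{g,p},
\qquad p = 2, \dots, 4,
\qquad \gamma_{g,p} \in \{0, 1\}
```

Under this reading, \gamma_{g,p} = 0 means gene g is not differentially expressed between ponds p-1 and p, and the prior on P(\gamma_{g,p} = 1) is where information on chemical concentrations would enter.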
7.
Stat Methods Med Res ; 33(3): 532-553, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38320802

ABSTRACT

Reliability of measurement instruments providing quantitative outcomes is usually assessed by an intraclass correlation coefficient. When participants are repeatedly measured by a single rater or device, or are each rated by a different group of raters, the intraclass correlation coefficient is based on a one-way analysis of variance model. When planning a reliability study, it is essential to determine the number of participants and measurements per participant (i.e. number of raters or number of repeated measurements). Three different sample size determination approaches under the one-way analysis of variance model were identified in the literature, all based on a confidence interval for the intraclass correlation coefficient. Although eight different confidence interval methods can be identified, the Wald confidence interval with Fisher's large sample variance approximation remains most commonly used despite its well-known poor statistical properties. Therefore, a first objective of this work is to compare the statistical properties of all identified confidence interval methods, including those overlooked in previous studies. A second objective is to develop a general procedure to determine the sample size using all approaches, since a closed-form formula is not always available. This procedure is implemented in an R Shiny app. Finally, we provide advice for choosing an appropriate sample size determination method when planning a reliability study.


Subject(s)
Sample Size, Humans, Reproducibility of Results, Observer Variation, Analysis of Variance
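
For orientation, the point estimate of the intraclass correlation coefficient under the one-way analysis of variance model can be computed from the between-participant and within-participant mean squares. The sketch below assumes a balanced design (the same number of measurements k for every participant) and uses a hypothetical function name; it does not reproduce any of the confidence interval or sample size methods compared in the article.

```python
import numpy as np


def icc_oneway(ratings):
    """One-way ANOVA intraclass correlation for a balanced n x k table.

    ratings: array with one row per participant and one column per measurement.
    Returns (MSB - MSW) / (MSB + (k - 1) * MSW).
    """
    y = np.asarray(ratings, dtype=float)
    n, k = y.shape
    row_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Between-participant and within-participant mean squares.
    msb = k * ((row_means - grand_mean) ** 2).sum() / (n - 1)
    msw = ((y - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)
```

Perfect agreement within every participant gives an estimate of 1; the estimate can be negative when within-participant variation exceeds between-participant variation.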
8.
BMC Bioinformatics ; 11: 270, 2010 May 20.
Article in English | MEDLINE | ID: mdl-20487547

ABSTRACT

BACKGROUND: In microarray studies researchers are often interested in the comparison of relevant quantities between two or more similar experiments, involving different treatments, tissues, or species. Typically each experiment reports measures of significance (e.g. p-values) or other measures that rank its features (e.g. genes). Our objective is to find a list of features that are significant in all experiments, to be further investigated. In this paper we present an R package called sdef that allows the user to quantify the evidence of commonality between the experiments using previously proposed statistical methods based on the ranked lists of p-values. sdef implements two approaches that address this objective: the first is a permutation test of the maximal ratio of observed to expected common features under the hypothesis of independence between the experiments. The second approach, set in a Bayesian framework, is more flexible as it takes into account the uncertainty on the number of genes differentially expressed in each experiment. RESULTS: We used sdef to re-analyze publicly available data i) on Type 2 diabetes susceptibility in mice on liver and skeletal muscle (two experiments); ii) on molecular similarities between mammalian sexes (three experiments). For the first example, we found between 68 and 104 genes commonly perturbed between the two tissues, using the two methods described above, and enrichment of the inflammation pathways, which are related to obesity and diabetes. For the second example, looking at three lists of features, we found 110 genes commonly perturbed between the three tissues, using the same two methods, and enrichment on genes involved in cell development. CONCLUSIONS: sdef is an R package that provides researchers with an easy and powerful methodology to find lists of features commonly perturbed in two or more experiments to be further investigated. The package is provided with plots and tables to help the user visualize and interpret the results. The Windows, Linux and MacOS versions of the package, together with the documentation, are available on the website http://cran.r-project.org/web/packages/sdef/index.html.


Subject(s)
Oligonucleotide Array Sequence Analysis/methods, Software, Genetic Databases, Gene Expression Profiling/methods
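
The first approach named in the abstract above, a permutation test of the maximal ratio of observed to expected common features, can be sketched for two experiments as follows. This is an illustration in Python written from the abstract's description only; it is not the sdef API, and the function names, the threshold grid, and the permutation scheme (shuffling one list of p-values across genes) are assumptions.

```python
import numpy as np


def max_ratio(p1, p2, thresholds):
    """Maximum over thresholds q of observed / expected common features.

    Observed: genes with p-value <= q in both experiments.
    Expected under independence: (#significant in 1) * (#significant in 2) / n.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    n = p1.size
    best = 0.0
    for q in thresholds:
        in1 = p1 <= q
        in2 = p2 <= q
        expected = in1.sum() * in2.sum() / n
        if expected > 0:
            best = max(best, (in1 & in2).sum() / expected)
    return best


def permutation_pvalue(p1, p2, thresholds, n_perm=1000, seed=0):
    """Monte Carlo p-value for the maximal ratio under independence."""
    rng = np.random.default_rng(seed)
    observed = max_ratio(p1, p2, thresholds)
    p2 = np.asarray(p2, dtype=float)
    # Shuffling p2 breaks the gene-wise pairing between the two experiments.
    hits = sum(
        max_ratio(p1, rng.permutation(p2), thresholds) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)
```

A ratio well above 1 at some threshold, together with a small permutation p-value, suggests more shared significant features than independence would produce.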
9.
Cancer Inform ; 13(Suppl 2): 29-37, 2014.
Article in English | MEDLINE | ID: mdl-25288877

ABSTRACT

We consider a Bayesian hierarchical model for the integration of gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. The approach defines a measurement error model that relates the gene expression levels to latent copy number states. In turn, the latent states are related to the observed surrogate CGH measurements via a hidden Markov model. The model further incorporates variable selection with a spatial prior based on a probit link that exploits dependencies across adjacent DNA segments. Posterior inference is carried out via Markov chain Monte Carlo stochastic search techniques. We study the performance of the model in simulations and show better results than those achieved with recently proposed alternative priors. We also show an application to data from a genomic study on lung squamous cell carcinoma, where we identify potential candidates of associations between copy number variants and the transcriptional activity of target genes. Gene ontology (GO) analyses of our findings reveal enrichments in genes that code for proteins involved in cancer. Our model also identifies a number of potential candidate biomarkers for further experimental validation.

10.
Ann Appl Stat ; 8(1): 148-175, 2014 Mar 01.
Article in English | MEDLINE | ID: mdl-24834139

ABSTRACT

A number of statistical models have been successfully developed for the analysis of high-throughput data from a single source, but few methods are available for integrating data from different sources. Here we focus on integrating gene expression levels with comparative genomic hybridization (CGH) array measurements collected on the same subjects. We specify a measurement error model that relates the gene expression levels to latent copy number states which, in turn, are related to the observed surrogate CGH measurements via a hidden Markov model. We employ selection priors that exploit the dependencies across adjacent copy number states and investigate MCMC stochastic search techniques for posterior inference. Our approach results in a unified modeling framework for simultaneously inferring copy number variants (CNV) and identifying their significant associations with mRNA transcript abundance. We show performance on simulated data and illustrate an application to data from a genomic study on human cancer cell lines.
