1.
BMC Med Res Methodol ; 23(1): 225, 2023 10 10.
Article in English | MEDLINE | ID: mdl-37817074

ABSTRACT

BACKGROUND: INTEROCC is a seven-country cohort study of occupational exposures and brain cancer risk, including occupational exposure to electromagnetic fields (EMF). In the absence of data on individual exposures, a Job Exposure Matrix (JEM) may be used to construct likely exposure scenarios in occupational settings. This tool was constructed using statistical summaries of exposure to EMF for various occupational categories for a comparable group of workers. METHODS: In this study, we use the Canadian data from INTEROCC to determine the best EMF exposure surrogate/estimate from three appropriately chosen surrogates from the JEM, along with a fourth surrogate based on Berkson error adjustments obtained via numerical approximation of the likelihood function. In this article, we examine the case in which exposures are gamma-distributed for each occupation in the JEM, as an alternative to the log-normal exposure distribution considered in a previous study conducted by our research team. We also study the use of those surrogates and the Berkson error adjustment in Poisson regression and conditional logistic regression. RESULTS: Simulations show that the introduced methods of Berkson error adjustment for non-stratified analyses provide accurate estimates of the risk of developing tumors under the gamma exposure model. Moreover, under some technical assumptions, the arithmetic mean is the best surrogate when a gamma distribution is used as the exposure model. Simulations also show that none of the present methods could provide an accurate estimate of the risk in stratified analyses. CONCLUSION: While our previous study found the geometric mean to be the best exposure surrogate, the present study suggests that the best surrogate depends on the exposure model: the arithmetic mean in the case of the gamma exposure model and the geometric mean in the case of the log-normal exposure model. However, we were able to develop an improved method of Berkson error adjustment for each of the two exposure models. Our results provide useful guidance on the application of JEMs for occupational exposure assessments, with adjustment for Berkson error.
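To make the surrogate comparison concrete, the sketch below (a hypothetical illustration, not the study's code) draws gamma-distributed exposures for a single occupational category and computes the arithmetic and geometric means that would populate a JEM cell. By the AM-GM inequality the geometric mean is always the smaller of the two, which is one reason the choice of surrogate interacts with the assumed exposure model.

```python
import math
import random

random.seed(1)

def exposure_surrogates(values):
    """Arithmetic and geometric means of a set of exposure measurements,
    as candidate surrogates for one JEM occupational category."""
    am = sum(values) / len(values)
    gm = math.exp(sum(math.log(v) for v in values) / len(values))
    return am, gm

# Hypothetical gamma-distributed exposures (shape 2.0, scale 1.5) for one
# occupational category; the theoretical arithmetic mean is 2.0 * 1.5 = 3.0.
gamma_exposures = [random.gammavariate(2.0, 1.5) for _ in range(10000)]
am, gm = exposure_surrogates(gamma_exposures)
print("arithmetic mean:", round(am, 2))
print("geometric mean: ", round(gm, 2))
```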


Subject(s)
Occupational Exposure , Humans , Logistic Models , Cohort Studies , Canada/epidemiology , Occupational Exposure/adverse effects , Electromagnetic Fields/adverse effects
2.
Comput Toxicol ; 21, 2022 Feb.
Article in English | MEDLINE | ID: mdl-35083394

ABSTRACT

Computational methods for genomic dose-response integrate dose-response modeling with bioinformatics tools to evaluate changes in molecular and cellular functions related to pathogenic processes. These methods use parametric models to describe each gene's dose-response, but such models may not adequately capture expression changes. Additionally, current approaches do not consider gene co-expression networks. When assessing co-expression networks, one typically does not consider the dose-response relationship, resulting in 'co-regulated' gene sets containing genes having different dose-response patterns. To avoid these limitations, we develop an analysis pipeline called Aggregated Local Extrema Splines for High-throughput Analysis (ALOHA), which computes individual genomic dose-response functions using a flexible class of Bayesian shape-constrained splines and clusters gene co-regulation based upon these fits. Using splines, we reduce information loss due to parametric lack-of-fit issues, and because we cluster on dose-response relationships, we better identify co-regulation clusters for genes that have co-expressed dose-response patterns from chemical exposure. The clustered pathways can then be used to estimate a dose associated with a pre-specified biological response, i.e., the benchmark dose (BMD), and approximate a point-of-departure dose corresponding to minimal adverse response in the whole tissue/organism. We compare our approach to current parametric methods, and our biologically enriched gene sets to those obtained by clustering normalized expression data. Using this methodology, we can more effectively extract the underlying structure, leading to more cohesive estimates of gene set potency.
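As a minimal illustration of the benchmark-dose step only (not the ALOHA pipeline itself), the sketch below assumes an already-fitted monotone dose-response curve, here a hypothetical Hill-type function standing in for a fitted spline, and finds the dose at which the response change reaches a pre-specified benchmark by bisection.

```python
def benchmark_dose(f, bmr, lo=0.0, hi=100.0, tol=1e-6):
    """Find the dose at which the (monotone increasing) response change
    f(d) - f(lo) reaches the benchmark response bmr, via bisection."""
    target = f(lo) + bmr
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical Hill-type curve standing in for a fitted dose-response spline:
# response rises from 0 toward a plateau of 2.0 with half-maximum at dose 10.
hill = lambda d: 2.0 * d**2 / (10.0**2 + d**2)
bmd = benchmark_dose(hill, bmr=1.0)   # dose where the response change reaches 1.0
print(round(bmd, 3))
```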

3.
Stat Med ; 41(4): 681-697, 2022 02 20.
Article in English | MEDLINE | ID: mdl-34897771

ABSTRACT

In omics experiments, estimation and variable selection can involve thousands of proteins/genes observed from a relatively small number of subjects. Many regression regularization procedures have been developed for estimation and variable selection in such high-dimensional problems. However, approaches have predominantly focused on linear regression models that ignore correlation arising from long sequences of repeated measurements on the outcome. Our work is motivated by the need to identify proteomic biomarkers that improve the prediction of rapid lung-function decline for individuals with cystic fibrosis (CF) lung disease. We extend four Bayesian penalized regression approaches for a Gaussian linear mixed effects model with nonstationary covariance structure to account for the complicated structure of longitudinal lung function data while simultaneously estimating unknown parameters and selecting important protein isoforms to improve predictive performance. Different types of shrinkage priors are evaluated to induce variable selection in a fully Bayesian framework. The approaches are studied with simulations. We apply the proposed method to real proteomics and lung-function outcome data from our motivating CF study, identifying a set of relevant clinical/demographic predictors and a proteomic biomarker for rapid decline of lung function. We also illustrate the methods on CD4 yeast cell-cycle genomic data, confirming that the proposed method identifies transcription factors that have been highlighted in the literature for their importance as cell cycle transcription factors.
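The effect of different shrinkage priors on selection can be seen in one dimension: under an orthogonal design, the posterior mode under a normal (ridge-type) prior shrinks a coefficient proportionally and never to exactly zero, while a Laplace (Bayesian lasso-type) prior soft-thresholds it and can produce exact zeros. The sketch below is a generic illustration of that contrast, not the paper's longitudinal mixed model.

```python
def map_normal(z, lam):
    """Posterior mode of a coefficient with least-squares estimate z under a
    normal (ridge-type) prior: proportional shrinkage, never exactly zero."""
    return z / (1.0 + lam)

def map_laplace(z, lam):
    """Posterior mode under a Laplace (Bayesian lasso-type) prior:
    soft-thresholding, which can set the coefficient exactly to zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

for z in (0.3, 2.0):
    print(z, map_normal(z, 1.0), map_laplace(z, 1.0))
```

A small estimate (0.3) is zeroed out by the Laplace prior but only halved by the normal prior, which is the mechanism by which Laplace-type shrinkage priors induce variable selection.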


Subject(s)
Genomics , Proteomics , Bayes Theorem , Humans , Linear Models , Normal Distribution
4.
Diabetes Res Clin Pract ; 171: 108520, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33096188

ABSTRACT

AIMS: The aim of this pilot study was to assess the Laboratory Risk Indicator for Necrotizing Fasciitis (LRINEC), a scoring system for Necrotizing Soft Tissue Infections, as a tool to diagnose Necrotizing Soft Tissue Infections of the lower extremity in patients with diabetes. METHODS: Sixty-nine patients with lower extremity infections were prospectively enrolled. The LRINEC was calculated and logistic regression was performed for each laboratory value. RESULTS: The LRINEC was associated with Necrotizing Soft Tissue Infection diagnosis in patients with diabetes (p = 0.01). Sensitivity, specificity, positive predictive value, and negative predictive value were 100%, 69%, 16.6%, and 100%, respectively. Elevated C-reactive protein (OR 1.01, p = 0.02, 95% CI [1.002-1.23]) and white blood cell count (OR 1.34, p < 0.01, 95% CI [1.1-1.7]) were associated with Necrotizing Soft Tissue Infection. CONCLUSIONS: The LRINEC was useful as a negative predictor of Necrotizing Soft Tissue Infection, while C-reactive protein and white blood cell count may have value as individual predictors. We recommend a high clinical suspicion of Necrotizing Soft Tissue Infections in patients with diabetes, as laboratory evaluation may be non-specific.
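The reported operating characteristics follow from a standard 2 × 2 table. The sketch below uses hypothetical counts chosen only to be consistent with the quoted metrics (they are not the study's published table): with no false negatives, sensitivity and negative predictive value are both 100% regardless of the other cells.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 diagnostic table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Hypothetical counts (n = 69) consistent with the quoted metrics;
# fn = 0 forces sensitivity = NPV = 100%.
m = diagnostic_metrics(tp=4, fp=20, fn=0, tn=45)
print({k: round(v, 3) for k, v in m.items()})
```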


Subject(s)
Diabetes Complications/complications , Fasciitis, Necrotizing/diagnosis , Lower Extremity/pathology , Soft Tissue Infections/diagnosis , Fasciitis, Necrotizing/blood , Female , Humans , Laboratories , Male , Middle Aged , Pilot Projects , Prospective Studies , Retrospective Studies , Risk Factors , Soft Tissue Infections/blood
5.
Bioinformatics ; 36(18): 4781-4788, 2020 09 15.
Article in English | MEDLINE | ID: mdl-32653926

ABSTRACT

MOTIVATION: Misregulation of signaling pathway activity is etiologic for many human diseases, and modulating the activity of signaling pathways is often the preferred therapeutic strategy. Understanding the mechanism of action (MOA) of bioactive chemicals in terms of targeted signaling pathways is the essential first step in evaluating their therapeutic potential. Changes in signaling pathway activity are often not reflected in changes in the expression of pathway genes, which makes MOA inference from transcriptional signatures (TSes) a difficult problem. RESULTS: We developed a new computational method for implicating pathway targets of bioactive chemicals and other cellular perturbations by integrated analysis of pathway network topology, the Library of Integrated Network-based Cellular Signatures (LINCS) TSes of genetic perturbations of pathway genes, and the TS of the perturbation. Our methodology accurately predicts signaling pathways targeted by the perturbation when current pathway analysis approaches utilizing only the TS of the perturbation fail. AVAILABILITY AND IMPLEMENTATION: Open-source R package paslincs is available at https://github.com/uc-bd2k/paslincs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Signal Transduction , Software , Humans
6.
Article in English | MEDLINE | ID: mdl-35935278

ABSTRACT

In longitudinal studies in which a medical device is used to monitor outcome repeatedly and frequently on the same patients over a prespecified duration of time, two clustering goals can arise. One goal is to assess the degree of heterogeneity among patient profiles. A second yet equally important goal unique to such studies is to determine the frequency and duration of monitoring sufficient to identify longitudinal changes. Considering these goals jointly would identify clusters of patients who share similar patterns over time and characterize temporal stability within each cluster. We use a biclustering approach, allowing simultaneous clustering of observations at both the patient and time levels and using a nonparametric hierarchical Bayesian model. Because clustering units at the time level (i.e., time points) are ordered and hence not exchangeable, we utilize a multivariate Dirichlet process mixture model by specifying a Dirichlet process prior at the patient level whose base measure employs change points at the time level to achieve the desired joint clustering. We consider structured covariance between consecutive time points and assess model performance through simulation studies. We apply the model to data on 24-hr ambulatory blood pressure monitoring and examine the relationship between diastolic blood pressure and pediatric obstructive sleep apnoea.

7.
J Expo Sci Environ Epidemiol ; 28(3): 251-258, 2018 05.
Article in English | MEDLINE | ID: mdl-28352117

ABSTRACT

Many epidemiological studies assessing the relationship between exposure and disease are carried out without data on individual exposures. When this barrier is encountered in occupational studies, subject exposures are often evaluated with a job-exposure matrix (JEM), which consists of mean exposures for occupational categories, measured on a comparable group of workers. One of the objectives of INTEROCC, a seven-country case-control study of occupational exposure and brain cancer risk, was to investigate the relationship between occupational exposure to electromagnetic fields (EMF) in different frequency ranges and brain cancer risk. In this paper, we use the Canadian data from INTEROCC to estimate the odds of developing brain tumours due to occupational exposure to EMF. The first step was to find the best EMF exposure surrogate among the arithmetic mean, the geometric mean, and the mean of the log-normal exposure distribution for each occupation in the JEM, in comparison to Berkson error adjustments via numerical approximation of the likelihood function. Contrary to previous studies of Berkson errors in JEMs, we found that the geometric mean was the best exposure surrogate. This analysis provided no evidence that cumulative lifetime exposure to extremely low frequency magnetic fields increases brain cancer risk, a finding consistent with other recent epidemiological studies.
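The distinction driving this analysis is between Berkson error, where the true exposure scatters around the assigned JEM value, and classical error, where the measurement scatters around the true exposure. A small simulation, a generic sketch rather than the paper's model, shows why the distinction matters in a linear setting: the regression slope on a Berkson-type surrogate is approximately unbiased, while classical error attenuates it toward zero.

```python
import random

random.seed(7)

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mx) ** 2 for xi in xs)
    return sxy / sxx

n, beta = 50000, 0.5

# Berkson error: the true exposure scatters around the assigned value w.
w = [random.gauss(10, 2) for _ in range(n)]
x_true_b = [wi + random.gauss(0, 1) for wi in w]
y_b = [beta * xi + random.gauss(0, 1) for xi in x_true_b]
slope_berkson = slope(w, y_b)

# Classical error: the measurement scatters around the true exposure.
x_true_c = [random.gauss(10, 2) for _ in range(n)]
w_c = [xi + random.gauss(0, 1) for xi in x_true_c]
y_c = [beta * xi + random.gauss(0, 1) for xi in x_true_c]
slope_classical = slope(w_c, y_c)

print("Berkson slope (approx. unbiased): ", round(slope_berkson, 3))
print("classical slope (attenuated):     ", round(slope_classical, 3))
```

Here the classical-error slope is attenuated by roughly var(X)/(var(X)+var(U)) = 4/5, while the Berkson-error slope stays near the true 0.5.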


Subject(s)
Electromagnetic Fields , Environmental Monitoring/methods , Epidemiologic Methods , Occupational Exposure/analysis , Risk Assessment/methods , Adult , Bias , Brain Neoplasms/epidemiology , Brain Neoplasms/etiology , Canada/epidemiology , Case-Control Studies , Computer Simulation , Electromagnetic Fields/adverse effects , Female , Humans , Likelihood Functions , Male , Middle Aged , Occupational Diseases/epidemiology , Occupational Diseases/etiology , Occupational Exposure/adverse effects , Risk Factors
8.
Cell Syst ; 6(1): 13-24, 2018 01 24.
Article in English | MEDLINE | ID: mdl-29199020

ABSTRACT

The Library of Integrated Network-Based Cellular Signatures (LINCS) is an NIH Common Fund program that catalogs how human cells globally respond to chemical, genetic, and disease perturbations. Resources generated by LINCS include experimental and computational methods, visualization tools, molecular and imaging data, and signatures. By assembling an integrated picture of the range of responses of human cells exposed to many perturbations, the LINCS program aims to better understand human disease and to advance the development of new therapies. Perturbations under study include drugs, genetic perturbations, tissue micro-environments, antibodies, and disease-causing mutations. Responses to perturbations are measured by transcript profiling, mass spectrometry, cell imaging, and biochemical methods, among other assays. The LINCS program focuses on cellular physiology shared among tissues and cell types relevant to an array of diseases, including cancer, heart disease, and neurodegenerative disorders. This Perspective describes LINCS technologies, datasets, tools, and approaches to data accessibility and reusability.


Subject(s)
Cataloging/methods , Systems Biology/methods , Computational Biology/methods , Databases, Chemical/standards , Gene Expression Profiling/methods , Gene Library , Humans , Information Storage and Retrieval/methods , National Health Programs , National Institutes of Health (U.S.)/standards , Transcriptome , United States
9.
Stat Med ; 36(15): 2391-2403, 2017 07 10.
Article in English | MEDLINE | ID: mdl-28276142

ABSTRACT

We provide a Bayesian decision-theoretic approach to finding subgroups that have elevated treatment effects. Our approach separates the modeling of the response variable from the task of subgroup finding and allows flexible modeling of the response variable irrespective of potential subgroups of interest. We use Bayesian additive regression trees to model the response variable and use a utility function defined in terms of a candidate subgroup and the predicted response for that subgroup. Subgroups are identified by maximizing the expected utility, where the expectation is taken with respect to the posterior predictive distribution of the response, and the maximization is carried out over an a priori specified set of candidate subgroups. Our approach allows subgroups based on both quantitative and categorical covariates. We illustrate the approach using a simulated data set and a real data set. Copyright © 2017 John Wiley & Sons, Ltd.
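The subgroup-selection step can be sketched independently of the response model: given posterior draws of the treatment effect (here fabricated draws for four hypothetical covariate cells, standing in for BART posterior output), evaluate a utility for each candidate subgroup and report the maximizer. The penalty term below is an illustrative stand-in for whatever utility the analyst actually specifies.

```python
import random

random.seed(3)

# Hypothetical posterior draws of the treatment effect in four covariate
# cells (e.g. sex x age-group); cells B and C have elevated effects.
draws = {
    "A": [random.gauss(0.1, 0.2) for _ in range(2000)],
    "B": [random.gauss(0.8, 0.2) for _ in range(2000)],
    "C": [random.gauss(0.7, 0.2) for _ in range(2000)],
    "D": [random.gauss(0.0, 0.2) for _ in range(2000)],
}
candidates = [("A", "B", "C", "D"), ("B",), ("C",), ("B", "C")]

def expected_utility(subgroup, penalty=0.1):
    """Posterior-mean treatment effect averaged over the subgroup's cells,
    minus a complexity penalty per cell (a simple stand-in utility)."""
    cell_means = [sum(draws[c]) / len(draws[c]) for c in subgroup]
    return sum(cell_means) / len(cell_means) - penalty * len(subgroup)

best = max(candidates, key=expected_utility)
print(best)
```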


Subject(s)
Bayes Theorem , Randomized Controlled Trials as Topic/statistics & numerical data , Regression Analysis , Adolescent , Arthritis, Juvenile/diet therapy , Arthritis, Juvenile/physiopathology , Biostatistics , Bone Density/drug effects , Calcium, Dietary/administration & dosage , Child , Computer Simulation , Decision Theory , Dietary Supplements , Female , Humans , Male , Models, Statistical , Treatment Outcome
10.
Biom J ; 59(4): 746-766, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28319254

ABSTRACT

We develop a Bayesian approach to subgroup analysis using ANOVA models with multiple covariates, extending an earlier work. We assume a two-arm clinical trial with a normally distributed response variable. We also assume that the covariates for subgroup finding are categorical and a priori specified, and that parsimonious, easy-to-interpret subgroups are preferable. We represent the subgroups of interest by a collection of models and use a model selection approach to finding subgroups with heterogeneous effects. We develop suitable priors for the model space and use an objective Bayesian approach that yields multiplicity-adjusted posterior probabilities for the models. We use a structured algorithm based on the posterior probabilities of the models to determine which subgroup effects to report. Frequentist operating characteristics of the approach are evaluated using simulation. While our approach is applicable in more general cases, we mainly focus on the 2 × 2 case of two covariates, each at two levels, for ease of presentation. The approach is illustrated using a real data example.


Subject(s)
Analysis of Variance , Biometry/methods , Models, Statistical , Algorithms , Bayes Theorem , Humans , Probability
11.
Stat Med ; 35(25): 4509-4527, 2016 11 10.
Article in English | MEDLINE | ID: mdl-27364101

ABSTRACT

Inference about the treatment effect in a crossover design has received much attention over time, owing to the uncertainty in the existence of the carryover effect and its impact on the estimation of the treatment effect. Adding to this uncertainty is that the existence of the carryover effect and its size may depend on the presence of the treatment effect and its size. We consider estimation and hypothesis testing about the treatment effect in a two-period crossover design, assuming a normally distributed response variable, and use an objective Bayesian approach to test the hypothesis about the treatment effect and to estimate its size when it exists, while accounting for the uncertainty about the presence of the carryover effect as well as the treatment and period effects. We evaluate and compare the performance of the proposed approach with a standard frequentist approach using simulated and real data. Copyright © 2016 John Wiley & Sons, Ltd.


Subject(s)
Bayes Theorem , Cross-Over Studies , Research Design , Uncertainty
12.
Math Biosci ; 277: 136-40, 2016 07.
Article in English | MEDLINE | ID: mdl-27140527

ABSTRACT

BACKGROUND: Lateralization of the seizure-onset zone (SOZ) during electroencephalography (EEG) monitoring in people with bilateral potentially epileptogenic lesions is important to facilitate clinical decision making for resective surgery. METHODS: We develop two Bayesian approaches for estimating the number of consecutive ipsilateral seizures required to lateralize the SOZ to a given lower limit of the 95% credible interval (LLI, assuming a continuous prior distribution), or to a given posterior probability (assuming a mixture of discrete and continuous prior probabilities). RESULTS: With the estimation approach, if both cerebral hemispheres are a priori equi-probable to contain the SOZ, then using the Jeffreys prior, a minimum of 9, 18, and 38 consecutive ipsilateral seizures will yield an LLI of 0.81, 0.90, and 0.95, respectively. If one of the hemispheres is a priori more likely to have the SOZ, then prior beta distributions with α=3, β=2, and α=4, β=3 will require a minimum of 18 and 24 consecutive ipsilateral seizures to yield an LLI of 0.80. Contrariwise, the testing approach allows approximation of the number of consecutive ipsilateral seizures required to lateralize the SOZ, depending on an estimate of the prior probability of a lateralized SOZ, to a desired posterior probability. For a prior probability of 0.5, using a uniform prior, the mixture model will require 7, 17, and 37 consecutive ipsilateral seizures to lateralize the SOZ with a posterior probability of 0.8, 0.9, and 0.95, respectively. CONCLUSION: While the reasoning presented here is based on probability theory, it is hoped that it may help clinical decision making and stimulate further validation with actual clinical data.
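The estimation approach admits a compact sketch: with a Beta(a0, b0) prior on the probability that seizures arise from one hemisphere, n consecutive ipsilateral seizures give a Beta(a0 + n, b0) posterior, and the LLI is a lower posterior quantile. The code below approximates it by Monte Carlo using only the standard library; taking the 5% quantile (a one-sided 95% lower bound, which is one reading of "lower limit of the 95% credible interval") yields values close to the 0.81/0.90/0.95 figures quoted above. This reading is an assumption, not taken from the paper.

```python
import random

random.seed(0)

def lli_consecutive(n_seizures, a0=0.5, b0=0.5, draws=100000):
    """Lower 5% posterior quantile (one-sided 95% lower bound) for the
    probability that seizures arise ipsilaterally, after n consecutive
    ipsilateral seizures, under a Beta(a0, b0) prior (a0 = b0 = 0.5 is
    the Jeffreys prior). The posterior is Beta(a0 + n, b0); its quantile
    is approximated here by Monte Carlo."""
    samples = sorted(random.betavariate(a0 + n_seizures, b0)
                     for _ in range(draws))
    return samples[int(0.05 * draws)]

for n in (9, 18, 38):
    print(n, round(lli_consecutive(n), 2))
```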


Subject(s)
Bayes Theorem , Drug Resistant Epilepsy/diagnosis , Electroencephalography/methods , Probability Theory , Humans
13.
PLoS Comput Biol ; 9(9): e1003198, 2013.
Article in English | MEDLINE | ID: mdl-24039560

ABSTRACT

Identifying the transcription factors (TFs) involved in producing a genome-wide transcriptional profile is an essential step in building a mechanistic model that can explain observed gene expression data. We developed a statistical framework for constructing genome-wide signatures of TF activity, and for using such signatures in the analysis of gene expression data produced by complex transcriptional regulatory programs. Our framework integrates ChIP-seq data and appropriately matched gene expression profiles to identify True REGulatory (TREG) TF-gene interactions. It provides genome-wide quantification of the likelihood of regulatory TF-gene interaction that can be used either to identify regulated genes, or as a genome-wide signature of TF activity. To effectively use ChIP-seq data, we introduce a novel statistical model that integrates information from all binding "peaks" within a 2 Mb window around a gene's transcription start site (TSS), and provides gene-level binding scores and probabilities of regulatory interaction. In the second step, we integrate these binding scores and regulatory probabilities with gene expression data to assess the likelihood of True REGulatory (TREG) TF-gene interactions. We demonstrate the advantages of the TREG framework in identifying genes regulated by two TFs with widely different distributions of functional binding events (ERα and E2f1). We also show that TREG signatures of TF activity vastly improve our ability to detect involvement of ERα in producing complex disease-related transcriptional profiles. Through a large study of disease-related transcriptional signatures and transcriptional signatures of drug activity, we demonstrate that the increase in statistical power associated with the use of TREG signatures makes the crucial difference in identifying key targets for treatment, and drugs to use for treatment. All methods are implemented in an open-source R package, treg. The package also contains all data used in the analysis, including 494 TREG binding profiles based on ENCODE ChIP-seq data. The treg package can be downloaded at http://GenomicsPortals.org.


Subject(s)
Genome-Wide Association Study , Transcription Factors/physiology , Chromatin Immunoprecipitation , Disease , Gene Expression Profiling , Humans , Probability , Transcription Factors/genetics
14.
Bioinformatics ; 27(1): 70-7, 2011 Jan 01.
Article in English | MEDLINE | ID: mdl-20971985

ABSTRACT

MOTIVATION: Functional enrichment analysis using primary genomics datasets is an emerging approach to complement established methods for functional enrichment based on predefined lists of functionally related genes. Currently used methods depend on creating lists of 'significant' and 'non-significant' genes based on ad hoc significance cutoffs. This can lead to loss of statistical power and can introduce biases affecting the interpretation of experimental results. RESULTS: We developed and validated a new statistical framework, generalized random set (GRS) analysis, for comparing the genomic signatures in two datasets without the need for gene categorization. In our tests, GRS produced correct measures of statistical significance, and it showed dramatic improvement in the statistical power over other methods currently used in this setting. We also developed a procedure for identifying genes driving the concordance of the genomics profiles and demonstrated a dramatic improvement in functional coherence of genes identified in such analysis. AVAILABILITY: GRS can be downloaded as part of the R package CLEAN from http://ClusterAnalysis.org/. An online implementation is available at http://GenomicsPortals.org/.


Subject(s)
Gene Expression Profiling/methods , Genomics/methods , Animals , Breast Neoplasms/genetics , Data Interpretation, Statistical , Diet , Female , Gene Expression , Humans , Rats
15.
BMC Bioinformatics ; 11: 234, 2010 May 07.
Article in English | MEDLINE | ID: mdl-20459663

ABSTRACT

BACKGROUND: Differential co-expression analysis is an emerging strategy for characterizing disease related dysregulation of gene expression regulatory networks. Given pre-defined sets of biological samples, such analysis aims at identifying genes that are co-expressed in one, but not in the other set of samples. RESULTS: We developed a novel probabilistic framework for jointly uncovering contexts (i.e. groups of samples) with specific co-expression patterns, and groups of genes with different co-expression patterns across such contexts. In contrast to current clustering and bi-clustering procedures, the implicit similarity measure in this model used for grouping biological samples is based on the clustering structure of genes within each sample and not on traditional measures of gene expression level similarities. Within this framework, biological samples with widely discordant expression patterns can be placed in the same context as long as the co-clustering structure of genes is concordant within these samples. To the best of our knowledge, this is the first method to date for unsupervised differential co-expression analysis in this generality. When applied to the problem of identifying molecular subtypes of breast cancer, our method identified reproducible patterns of differential co-expression across several independent expression datasets. Sample groupings induced by these patterns were highly informative of the disease outcome. Expression patterns of differentially co-expressed genes provided new insights into the complex nature of the ERalpha regulatory network. CONCLUSIONS: We demonstrated that the use of the co-clustering structure as the similarity measure in the unsupervised analysis of sample gene expression profiles provides valuable information about expression regulatory networks.


Subject(s)
Bayes Theorem , Gene Expression Profiling/methods , Breast Neoplasms/genetics , Gene Regulatory Networks
16.
BMC Bioinformatics ; 8: 283, 2007 Aug 03.
Article in English | MEDLINE | ID: mdl-17683565

ABSTRACT

BACKGROUND: Transcriptional modules (TMs) consist of groups of co-regulated genes and the transcription factors (TFs) regulating their expression. Two high-throughput (HT) experimental technologies, gene expression microarrays and Chromatin Immunoprecipitation on Chip (ChIP-chip), are capable of producing data informative about expression regulatory mechanisms on a genome scale. The optimal approach to joint modeling of data generated by these two complementary biological assays, with the goal of identifying and characterizing TMs, is an important open problem in computational biomedicine. RESULTS: We developed and validated a novel probabilistic model and related computational procedure for identifying TMs by jointly modeling gene expression and ChIP-chip binding data. We demonstrate an improved functional coherence of the TMs produced by the new method when compared to analyzing either expression or ChIP-chip data separately, or to alternative approaches for joint analysis. We also demonstrate the ability of the new algorithm to identify novel regulatory relationships not revealed by ChIP-chip data alone. The new computational procedure can be used in much the same way as one would use simple hierarchical clustering, without performing any special transformation of data prior to the analysis. The R and C source code for implementing our algorithm is incorporated within the R package gimmR, which is freely available at http://eh3.uc.edu/gimm. CONCLUSION: Our results indicate that, whenever available, ChIP-chip and expression data should be analyzed within the unified probabilistic modeling framework, which will likely result in improved clusters of co-regulated genes and an improved ability to detect meaningful regulatory relationships. Given the good statistical properties and the ease of use, the new computational procedure offers a worthy new tool for reconstructing transcriptional regulatory networks.


Subject(s)
Algorithms , Chromatin Immunoprecipitation/methods , Chromosome Mapping/methods , Gene Expression Profiling/methods , Models, Genetic , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Transcription Factors/genetics , Artificial Intelligence , Bayes Theorem , Databases, Genetic , Information Storage and Retrieval/methods
17.
BMC Bioinformatics ; 7: 538, 2006 Dec 19.
Article in English | MEDLINE | ID: mdl-17177995

ABSTRACT

BACKGROUND: The small sample sizes often used for microarray experiments result in poor estimates of variance if each gene is considered independently. Yet accurately estimating the variability of gene expression measurements in microarray experiments is essential for correctly identifying differentially expressed genes. Several recently developed methods for testing differential expression of genes utilize hierarchical Bayesian models to "pool" information from multiple genes. We have developed a statistical testing procedure that further improves upon current methods by incorporating the well-documented relationship between the absolute gene expression level and the variance of gene expression measurements into the general empirical Bayes framework. RESULTS: We present a novel Bayesian moderated t-statistic, which we show to perform favorably in simulations, in two real dual-channel microarray experiments, and in two controlled single-channel experiments. In simulations, the new method achieved greater power while correctly estimating the true proportion of false positives, and in the analysis of two publicly available "spike-in" experiments, the new method performed favorably compared to all tested alternatives. We also applied our method to two experimental datasets and discuss the additional biological insights revealed by our method in contrast to the others. The R source code for implementing our algorithm is freely available at http://eh3.uc.edu/ibmt. CONCLUSION: We use a Bayesian hierarchical normal model to define a novel Intensity-Based Moderated T-statistic (IBMT). The method is completely data-dependent, using the empirical Bayes philosophy to estimate hyperparameters, and thus does not require specification of any free parameters. IBMT has the strength of balancing two important factors in the analysis of microarray data: the degree of independence of variances relative to the degree of identity (i.e., individual t-tests vs. an equal-variance assumption), and the relationship between variance and signal intensity. When this variance-intensity relationship is weak or does not exist, IBMT reduces to a previously described moderated t-statistic. Furthermore, our method may be directly applied to any array platform and experimental design. Together, these properties show IBMT to be a valuable option in the analysis of virtually any microarray experiment.
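The pooling idea behind IBMT can be sketched with the generic moderated t-statistic it builds on: the gene-level sample variance is shrunk toward a prior value, with a prior degrees-of-freedom parameter controlling the strength of pooling. This is a simplified stand-in; IBMT's distinctive step, modeling the prior variance as a function of expression intensity, is represented here only by the prior_var argument.

```python
import math

def moderated_t(sample_mean, sample_var, n, prior_var, prior_df):
    """Empirical-Bayes moderated t: the gene's sample variance is shrunk
    toward prior_var, with prior_df controlling the strength of pooling.
    (IBMT would make prior_var a smooth function of the gene's intensity.)"""
    df = n - 1
    var_post = (prior_df * prior_var + df * sample_var) / (prior_df + df)
    return sample_mean / math.sqrt(var_post / n)

# A gene whose tiny sample variance (0.01, from only n = 3 arrays) would
# give an inflated ordinary t-statistic; shrinkage toward a hypothetical
# prior variance of 0.25 tempers it.
t_ordinary = 1.0 / math.sqrt(0.01 / 3)
t_shrunk = moderated_t(1.0, 0.01, 3, prior_var=0.25, prior_df=4)
print(round(t_ordinary, 2), round(t_shrunk, 2))
```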


Subject(s)
Bayes Theorem , Computer Simulation , Gene Expression Profiling/methods , Models, Genetic , Oligonucleotide Array Sequence Analysis/methods , Animals , Aquaporin 5/antagonists & inhibitors , Aquaporin 5/biosynthesis , Aquaporin 5/genetics , Basic Helix-Loop-Helix Transcription Factors , Cells, Cultured , Computer Simulation/statistics & numerical data , Female , Fibroblast Growth Factor 2/biosynthesis , Fibroblast Growth Factor 2/genetics , Fibroblast Growth Factor 2/physiology , Gene Expression Profiling/statistics & numerical data , Mice , Mice, Knockout , Nickel/toxicity , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Receptors, Aryl Hydrocarbon/biosynthesis , Receptors, Aryl Hydrocarbon/deficiency , Receptors, Aryl Hydrocarbon/physiology , Signal Transduction/genetics , Transforming Growth Factor beta/physiology
18.
Environ Sci Technol ; 39(11): 4166-71, 2005 Jun 01.
Article in English | MEDLINE | ID: mdl-15984796

ABSTRACT

Numerous studies have demonstrated the efficiency of ultraviolet (UV) radiation for the inactivation of oocysts of Cryptosporidium parvum. In these studies, inactivation is measured as a reduction in oocysts. A primary goal is to estimate the UV radiation required to achieve a high degree of inactivation. Different lots of Cryptosporidium parvum oocysts are used in these studies, and the inactivation rate may vary depending on the lot of oocysts used. The goal of this paper is to account for the error in estimating the amount of inactivation after exposure to UV radiation, and for the effect of lot variability, in determining the required UV radiation. A Bayesian approach is used to model the logistic dose-response model and the UV inactivation kinetic model simultaneously. The lot-to-lot variability of the oocysts is incorporated through a hierarchical Bayesian model. Posterior distributions, estimated by Markov chain Monte Carlo methods, are used to obtain point estimates and Bayesian credible intervals for the UV radiation required to achieve a given inactivation level of Cryptosporidium parvum oocysts.
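The way lot variability propagates into the required UV dose can be illustrated with a Monte Carlo sketch. This is much simpler than the paper's hierarchical Bayesian model: it assumes first-order kinetics (log10 inactivation = k x dose) with a lot-specific rate constant drawn from a normal distribution, and the hyperparameter values in the test below are illustrative, not estimates from the study.

```python
import numpy as np

def required_dose(target_log10, k_mu, k_sigma, n_draws=10_000, seed=0):
    """Dose needed for a target log10 inactivation under lot variability.

    Assumes log10 reduction = k * dose, with the rate constant k
    varying from lot to lot as Normal(k_mu, k_sigma).
    Returns a point estimate and a 95% interval for the dose.
    """
    rng = np.random.default_rng(seed)
    k = rng.normal(k_mu, k_sigma, size=n_draws)  # lot-specific rate draws
    k = k[k > 0]                                  # drop non-physical draws
    dose = target_log10 / k                       # dose each lot would need
    lo, hi = np.percentile(dose, [2.5, 97.5])
    return dose.mean(), (lo, hi)
```

Ignoring lot variability (setting `k_sigma = 0`) collapses the interval to a single dose, which understates the dose needed to protect against low-sensitivity lots.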


Subject(s)
Cryptosporidium parvum/radiation effects , Oocysts/radiation effects , Ultraviolet Rays , Water Purification/methods , Animals , Cryptosporidium parvum/growth & development , Dose-Response Relationship, Radiation , Kinetics , Models, Biological , Monte Carlo Method , Oocysts/growth & development
19.
Stat Med ; 23(17): 2713-28, 2004 Sep 15.
Article in English | MEDLINE | ID: mdl-15316954

ABSTRACT

The concept of mediation has broad applications in medical health studies. Although the statistical assessment of a mediational effect under the normality assumption has been well established in linear structural equation models (SEM), it has not been extended to the general case where normality is not a usual assumption. In this paper, we propose to extend the definition of mediational effects through causal inference. The new definition is consistent with that in linear SEM and does not rely on the assumption of normality. Here, we focus our attention on the logistic mediation model, where all variables involved are binary. Three approaches to the estimation of mediational effects are investigated: the Delta method, the bootstrap, and Bayesian modelling via Monte Carlo simulation. Simulation studies are used to examine the behaviour of the three approaches. Measured by 95 per cent confidence interval (CI) coverage rate and root mean square error (RMSE), the Bayesian method using a non-informative prior outperformed both the bootstrap and the Delta method, particularly for small sample sizes. Case studies are presented to demonstrate the application of the proposed method to public health research using a nationally representative database. Extensions of the proposed method to other types of mediational models and to multiple mediators are also discussed.
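The causal-inference definition of mediation referenced in the abstract can be illustrated for the all-binary case with a nonparametric plug-in estimator of the natural indirect effect on the risk-difference scale. This is a sketch under a no-confounding assumption; the paper works with model-based (logistic) estimates and with Delta-method, bootstrap, or Bayesian inference rather than this raw frequency version, and the function name is invented for the example.

```python
import numpy as np

def natural_indirect_effect(x, m, y):
    """Plug-in natural indirect effect for binary x (exposure),
    m (mediator), y (outcome), assuming no confounding:

        NIE = sum_m' P(Y=1 | X=1, M=m') * [P(M=m' | X=1) - P(M=m' | X=0)]
    """
    x, m, y = map(np.asarray, (x, m, y))
    nie = 0.0
    for mv in (0, 1):
        p_y = y[(x == 1) & (m == mv)].mean()   # P(Y=1 | X=1, M=mv)
        p_m1 = (m[x == 1] == mv).mean()        # P(M=mv | X=1)
        p_m0 = (m[x == 0] == mv).mean()        # P(M=mv | X=0)
        nie += p_y * (p_m1 - p_m0)
    return nie
```

An interval for this quantity could then be obtained by bootstrapping the rows of (x, m, y), which is one of the three approaches the abstract compares.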


Subject(s)
Bayes Theorem , Data Interpretation, Statistical , Logistic Models , Adolescent , Computer Simulation , Depression/psychology , Female , Humans , Monte Carlo Method , Public Health , Social Class
20.
Bioinformatics ; 18(9): 1194-206, 2002 Sep.
Article in English | MEDLINE | ID: mdl-12217911

ABSTRACT

MOTIVATION: The biologic significance of results obtained through cluster analyses of gene expression data generated in microarray experiments has been demonstrated in many studies. In this article we focus on the development of a clustering procedure based on the concept of Bayesian model-averaging and a precise statistical model of expression data. RESULTS: We developed a clustering procedure based on the Bayesian infinite mixture model and applied it to clustering gene expression profiles. Clusters of genes with similar expression patterns are identified from the posterior distribution of clusterings, defined implicitly by the stochastic data-generation model. The posterior distribution of clusterings is estimated by a Gibbs sampler. We summarized the posterior distribution of clusterings by calculating posterior pairwise probabilities of co-expression and used the complete-linkage principle to create clusters. This approach has several advantages over the usual clustering procedures: it incorporates a reasonable probabilistic model for generating the data; it does not require specifying the number of clusters, since the resulting optimal clustering is obtained by averaging over models with all possible numbers of clusters; expression profiles that are not similar to any other profile are automatically detected; the method incorporates experimental replicates; and it can be extended to accommodate missing data. This approach represents a qualitative shift in model-based cluster analysis of expression data because it allows the uncertainties involved in model selection to be incorporated into the final assessment of confidence in similarities of expression profiles. We also demonstrated the importance of incorporating information on experimental variability into the clustering model.
AVAILABILITY: The MS Windows(TM) based program implementing the Gibbs sampler and supplemental material is available at http://homepages.uc.edu/~medvedm/BioinformaticsSupplement.htm CONTACT: medvedm@email.uc.edu
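The posterior summary step described above, turning a set of sampled clusterings into pairwise probabilities of co-expression, can be sketched as follows. This shows only the summary matrix; the paper then applies complete linkage to the distances 1 - p to form the final clusters, and the function name here is invented for the example.

```python
import numpy as np

def coclustering_matrix(draws):
    """Posterior pairwise probabilities of co-expression.

    draws: (n_draws, n_genes) array of cluster labels, one row per
    posterior sample of the clustering (e.g. from a Gibbs sampler).
    Returns the (n_genes, n_genes) matrix whose (i, j) entry is the
    fraction of draws in which genes i and j shared a cluster.
    """
    draws = np.asarray(draws)
    n_draws, n_genes = draws.shape
    p = np.zeros((n_genes, n_genes))
    for labels in draws:
        # Boolean co-membership matrix for this draw, accumulated.
        p += (labels[:, None] == labels[None, :])
    return p / n_draws
```

Because the matrix averages over clusterings with different numbers of clusters, model-selection uncertainty is carried directly into the final confidence assessment, which is the qualitative shift the abstract emphasizes.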


Subject(s)
Bayes Theorem , Cluster Analysis , Gene Expression Profiling/methods , Models, Genetic , Models, Statistical , Oligonucleotide Array Sequence Analysis/methods , Computer Simulation , Gene Expression Regulation, Fungal/genetics , Gene Expression Regulation, Fungal/physiology , Mitosis/genetics , Mitosis/physiology , Normal Distribution , Reproducibility of Results , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/physiology , Sensitivity and Specificity , Sequence Analysis, DNA/methods , Stochastic Processes