ABSTRACT
Dissolution studies are a fundamental component of pharmaceutical drug development, yet many studies rely upon the f1 and f2 model-independent approach, which cannot account for uncertainty in parameter estimation when comparing dissolution profiles. In this paper, we address uncertainty quantification by proposing several model-dependent approaches for assessing the similarity of two dissolution profiles. We take a statistical modeling approach and allow the dissolution data to be modeled using either a Dirichlet distribution, a gamma process model, or a Wiener process model. These parametric forms are shown to be reasonable assumptions that are capable of modeling dissolution data well. Furthermore, based on a given statistical model, we are able to use the f1 difference factor and f2 similarity factor to test the equivalency of two dissolution profiles via bootstrap confidence intervals. Illustrations highlighting the success of our methods are provided for both Monte Carlo simulation studies and real dissolution data sets.
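As a rough illustration of the quantities named above, the sketch below computes the usual difference factor and similarity factor from mean dissolution profiles and wraps the similarity factor in a percentile bootstrap interval. It is only a minimal sketch: the function names are hypothetical, and it resamples whole dosage units nonparametrically, which is an assumption made for brevity and not the model-based (Dirichlet, gamma process, or Wiener process) bootstrap proposed in the paper.

```python
import math
import random

def f1(ref, test):
    """Difference factor: 100 * sum|R_t - T_t| / sum R_t over time points."""
    return 100.0 * sum(abs(r - t) for r, t in zip(ref, test)) / sum(ref)

def f2(ref, test):
    """Similarity factor: 50 * log10(100 / sqrt(1 + mean squared difference))."""
    msd = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    return 50.0 * math.log10(100.0 / math.sqrt(1.0 + msd))

def mean_profile(units):
    """Average percent dissolved at each time point across dosage units."""
    return [sum(u[j] for u in units) / len(units) for j in range(len(units[0]))]

def bootstrap_f2_interval(ref_units, test_units, n_boot=2000, level=0.90, seed=0):
    """Percentile bootstrap interval for f2 (assumption: units resampled with replacement)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        ref_star = [rng.choice(ref_units) for _ in ref_units]
        test_star = [rng.choice(test_units) for _ in test_units]
        stats.append(f2(mean_profile(ref_star), mean_profile(test_star)))
    stats.sort()
    alpha = 1.0 - level
    lo = stats[int(math.floor(alpha / 2 * (n_boot - 1)))]
    hi = stats[int(math.ceil((1 - alpha / 2) * (n_boot - 1)))]
    return lo, hi
```

Under the common regulatory convention, two profiles are judged similar when f2 is at least 50; with an interval, one would typically compare the lower limit against that threshold.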
Subject(s)
Drug Development , Models, Statistical , Humans , Solubility , Computer Simulation , Monte Carlo Method
ABSTRACT
There has been a considerable amount of literature on binomial regression models that utilize well-known link functions, such as logistic, probit, and complementary log-log functions. The conventional binomial model is focused only on a single parameter representing one probability of success. However, we often encounter data for which two different success probabilities are of interest simultaneously. For instance, there are several offensive measures in baseball to predict the future performance of batters. Under these circumstances, it would be meaningful to consider more than one success probability. In this article, we employ a bivariate binomial distribution that possesses two success probabilities to conduct a regression analysis with random effects being incorporated under a Bayesian framework. Major League Baseball data are analyzed to demonstrate our methodologies. Extensive simulation studies are conducted to investigate model performances.
ABSTRACT
In geometry and topology, a family of probability distributions can be analyzed as the points on a manifold, known as a statistical manifold, with intrinsic coordinates corresponding to the parameters of the distribution. Taking the exponential family of distributions with progressive Type-II censoring as the manifold of a statistical model, we use the methods of information geometry to investigate geometric quantities such as the tangent space, the Fisher metric tensors, the affine connection and the α-connection of the manifold. As an application of the geometric quantities, the asymptotic expansions of the posterior density function and the posterior Bayesian predictive density function of the manifold are discussed. The results show that the asymptotic expansions are related to the coefficients of the α-connections and metric tensors, and the predictive density function is the estimated density function in an asymptotic sense. The main results are illustrated by considering the Rayleigh distribution.
ABSTRACT
In this paper, E-Bayesian estimation of the scale parameter, reliability and hazard rate functions of the Chen distribution is considered when a sample is obtained from a type-I censoring scheme. The E-Bayesian estimators are obtained based on the balanced squared error loss function and using the gamma distribution as a conjugate prior for the unknown scale parameter. Also, the E-Bayesian estimators are derived using three different distributions for the hyper-parameters. Some properties of E-Bayesian estimators based on the balanced squared error loss function are discussed. A simulation study is performed to compare the efficiencies of different estimators in terms of minimum mean squared errors. Finally, a real data set is analyzed to illustrate the applicability of the proposed estimators.
ABSTRACT
In this paper, we propose a stochastic gamma process model for assessing the similarity of two dissolution profiles. Based on the proposed stochastic model, we utilize the difference factor and similarity factor to test the similarity of two dissolution profiles based on bootstrap confidence intervals. The performances of the proposed methods are compared with a multivariate test procedure via Monte Carlo simulation studies. The proposed testing methods are shown to be powerful and to control the error rate effectively. The proposed model provides a simple yet useful alternative parametric statistical model for assessing the similarity of two dissolution profiles. All the methods are illustrated with a numerical example.
Subject(s)
Chemistry, Pharmaceutical/methods , Solubility , Algorithms , Computer Simulation , Drug Industry/methods , Monte Carlo Method , Reproducibility of Results , Stochastic Processes
ABSTRACT
Obstructive sleep apnea (OSA) is an underestimated and overlooked comorbidity in head and neck cancer (HNC) care. Refining HNC-OSA management requires an improved grasp of the HNC-OSA relationship. Thus, this paper reviews the current course of HNC therapy, causal and associative relationships before and after treatment, and statistical methods quantifying HNC-OSA interactions. This evaluation serves a dual purpose: to support oncologists and sleep physicians in improving the treatment outcomes of patients undergoing HNC treatment by considering OSA as a comorbidity and to assist researchers in selecting suitable analytical models for investigating the correlation between OSA and HNC. The investigation confirms a positive correlation between the apnea-hypopnea index (AHI) and primary tumor size, consistent with prior findings. Case studies also reported new evidence of lipoma and head-neck tumors triggering OSA, and sleep apnea surgery prompting tumor development. This paper provides an overview of existing statistical models and offers suggestions for model selection and a framework for designing experiments that delve into research questions surrounding the link between OSA and HNC across various stages of cancer treatment. Despite progress, understanding the HNC-OSA interplay remains incomplete due to limited histological, molecular, and clinical data. Future studies with longitudinal data are crucial for comprehensive insights.
ABSTRACT
In genetic association studies, due to the varying underlying genetic models, no single statistical test can be the most powerful test under all situations. Current studies show that if the underlying genetic models are known, trend-based tests, which outperform the classical Pearson χ² test, can be constructed. However, when the underlying genetic models are unknown, the χ² test is usually more robust than trend-based tests. In this paper, we propose a new association test based on a generalized genetic model, namely the generalized order-restricted relative risks model. Through a Monte Carlo simulation study, we show that the proposed association test is generally more powerful than the χ² test, and more robust than those trend-based tests. The proposed methodologies are also illustrated by some real SNP datasets.
Subject(s)
Genome-Wide Association Study , Models, Genetic , Algorithms , Case-Control Studies , Computer Simulation , Humans , Polymorphism, Single Nucleotide
ABSTRACT
The aim of this study was to validate the single nucleotide polymorphisms (SNPs) of four candidate genes (TCF7L2, HHEX, KCNJ11, and ADIPOQ) related to type 2 diabetes (T2D) in an endogamous population of north India, the Aggarwal population, which has 18 clans. This endogamous population model was heavily supported by recent landmark work, and we also verified the homogeneity of this population by clan-based stratification analysis. Two SNPs (rs4506565; rs7903146) in TCF7L2 were found to be significant (p-value = 0.00191; p-value = 0.00179, respectively), and odds ratios of 2.1 (dominant-model) and 2.0 (recessive-model), respectively, were obtained for this population. The TTT haplotype in the TCF7L2 gene was significantly associated with T2D. Waist-Hip ratio (WHR), systolic blood pressure (SBP), and age were significant covariates for increasing risk of T2D. Single-SNP, combined-SNP and haplotype analyses provide clear evidence that the causal mutation is near to or within the significant haplotype (TTT) of the TCF7L2 gene. In spite of a culturally-learned sedentary lifestyle and fat-enriched dietary habits, WHR rather than body-mass-index emerged as a robust predictor of risk for T2D in this population.
Subject(s)
Diabetes Mellitus, Type 2/genetics , Homeodomain Proteins/genetics , Polymorphism, Single Nucleotide , Potassium Channels, Inwardly Rectifying/genetics , TCF Transcription Factors/genetics , Transcription Factors/genetics , Adiponectin/genetics , Adult , Age Factors , Aged , Blood Glucose/genetics , Body Mass Index , Ethnicity/genetics , Female , Humans , India , Male , Middle Aged , Risk , Transcription Factor 7-Like 2 Protein , Waist-Hip Ratio
ABSTRACT
The Cochran-Armitage trend test (CATT) is well suited for testing association between a marker and a disease in case-control studies. When the underlying genetic model for the disease is known, the CATT optimal for the genetic model is used. For complex diseases, however, the genetic models of the true disease loci are unknown. In this situation, robust tests are preferable. We propose a two-phase analysis with model selection for the case-control design. In the first phase, we use the difference of Hardy-Weinberg disequilibrium coefficients between the cases and the controls for model selection. Then, an optimal CATT corresponding to the selected model is used for testing association. The correlation of the statistics used for selection and the test for association is derived to adjust the two-phase analysis with control of the Type-I error rate. The simulation studies show that this new approach has greater efficiency robustness than the existing methods.
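For readers unfamiliar with the trend test that the second phase relies on, the following is a minimal sketch of the Cochran-Armitage statistic for a 2 x 3 case-control genotype table. The function name and the score vectors (the conventional recessive, additive and dominant codings) are assumptions for illustration; the first-phase model selection based on Hardy-Weinberg disequilibrium coefficients and the correlation adjustment described in the abstract are not implemented here.

```python
import math

# Conventional scores for genotypes (aa, Aa, AA); assumed here for illustration.
SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}

def catt(cases, controls, scores):
    """Cochran-Armitage trend test: returns (z, two-sided normal p-value).

    cases, controls: genotype counts in the order (aa, Aa, AA).
    Note: the variance is zero if all subjects share one score value.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    # Score-weighted contrast between case and control counts.
    u = sum(x * (S * r - R * s) for x, r, s in zip(scores, cases, controls))
    sx = sum(x * n for x, n in zip(scores, totals))
    sxx = sum(x * x * n for x, n in zip(scores, totals))
    # Null variance of u, conditional on the table margins.
    var = (R * S / N) * (N * sxx - sx * sx)
    z = u / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

For example, `catt([10, 20, 30], [30, 20, 10], SCORES["additive"])` evaluates the additive-model test; choosing which score vector to use is the job of the selection phase.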
Subject(s)
Case-Control Studies , Genetic Predisposition to Disease/epidemiology , Models, Genetic , Models, Statistical , Polymorphism, Single Nucleotide , Algorithms , Alleles , Computer Simulation , Gene Frequency , Genetic Predisposition to Disease/classification , Genetics, Population/methods , Genotype , Humans , Inheritance Patterns , Linkage Disequilibrium , Polymorphism, Single Nucleotide/physiology , Research Design/statistics & numerical data , Risk Assessment/methods , Selection Bias
ABSTRACT
PURPOSE: In clinical studies, patients may experience several types of events during follow-up under the competing risks (CR) framework. Patients are often classified into low- and high-risk groups based on prognostic factors. We propose a method to determine an optimal cutpoint value for prognostic factors on censored outcomes in the presence of CR. MATERIALS AND METHODS: We applied our method to data collected in a study of lung cancer patients. From September 1, 1991 to December 31, 2005, 758 lung cancer patients received tumor removal surgery at Samsung Medical Center in Korea. The proposed statistic converges in distribution to that of the supremum of a standardized Brownian bridge. To overcome the conservativeness of the test based on an approximation of the asymptotic distribution, we also propose a permutation test based on permuted samples. RESULTS: Most cases considered in our simulation studies showed that the permutation-based test satisfied a significance level of 0.05, while the approximation-based test was very conservative: the powers of the former were larger than those of the latter. The optimal cutpoint value for tumor size (unit: cm) prior to surgery for classifying patients into two groups (low and high risks for relapse) was found to be 1.8, with p values less than 0.001. CONCLUSION: The cutpoint estimator based on the maximally selected linear rank statistic was reasonable in terms of bias and standard deviation in the CR framework. The permutation-based test maintained the nominal type I error probability and provided higher power than the approximation-based test.
Subject(s)
Lung Neoplasms/pathology , Statistics as Topic , Tumor Burden , Computer Simulation , Data Analysis , Humans , Male , Risk Factors
ABSTRACT
In this paper we compare the properties of four different general approaches for testing the ratio of two Poisson rates. Asymptotically normal tests, tests based on approximate p-values, exact conditional tests, and a likelihood ratio test are considered. The properties and power performance of these tests are studied by a Monte Carlo simulation experiment. Sample size calculation formulae are given for each of the test procedures and their validities are studied. Some recommendations favoring the likelihood ratio and certain asymptotic tests are based on these simulation results. Finally, all of the test procedures are illustrated with two real life medical examples.
Subject(s)
Data Interpretation, Statistical , Poisson Distribution , Sample Size , Breast Neoplasms/epidemiology , Computer Simulation , Epidemiologic Methods , Humans , Likelihood Functions , Monte Carlo Method
ABSTRACT
In the past decade, hundreds of genome-wide association studies have been conducted to detect the significant single-nucleotide polymorphisms that are associated with certain diseases. However, most of the data from the X chromosome were not analyzed and only a few significant associated single-nucleotide polymorphisms from the X chromosome have been identified from genome-wide association studies. This is mainly due to the lack of powerful statistical tests. In this paper, we propose a novel statistical approach that combines the information of single-nucleotide polymorphisms on the X chromosome from both males and females in an efficient way. The proposed approach avoids the need to make strong assumptions about the underlying genetic models. Our proposed statistical test is a robust method that only makes the assumption that the risk allele is the same for both females and males if the single-nucleotide polymorphism is associated with the disease for both genders. Through a simulation study and a real data application, we show that the proposed procedure is robust and has excellent performance compared to existing methods. We expect that many more associated single-nucleotide polymorphisms on the X chromosome will be identified if the proposed approach is applied to currently available genome-wide association studies data.
Subject(s)
Chromosomes, Human, X/genetics , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Alleles , Alzheimer Disease/enzymology , Alzheimer Disease/genetics , Biostatistics/methods , Case-Control Studies , Computer Simulation , DNA Methylation/genetics , Female , Genetic Markers , Genome-Wide Association Study/statistics & numerical data , Genotype , Humans , Male , Models, Genetic , Ornithine Carbamoyltransferase/genetics , Promoter Regions, Genetic
ABSTRACT
The statistical analysis of genome-wide association studies (GWASs) with multiple diseases and shared controls (SCs) is discussed. The usual method for analyzing data from these studies is to compare each individual disease with either the SCs or the pooled controls which include other diseases. We observed that applying individual association tests can be problematic because these tests may suffer from power loss in detecting significant associations between diseases and single-nucleotide polymorphisms or copy number variants. We propose here a two-stage procedure wherein we first apply an overall chi-square test for multiple diseases with SCs; if the overall null hypothesis is rejected, then individual tests using the chi-square partition method are applied to each disease against SCs. A real GWAS data set with SCs and a Monte Carlo simulation study are used to demonstrate that the proposed method is more effective than, and preferable to, other existing methods for analyzing data from GWASs with multiple diseases and SCs.
Subject(s)
Genome-Wide Association Study/methods , Breast Neoplasms/genetics , Case-Control Studies , Chi-Square Distribution , DNA Copy Number Variations , Genome-Wide Association Study/standards , Humans , Major Histocompatibility Complex/genetics , Monte Carlo Method , Multiple Sclerosis/genetics , Polymorphism, Single Nucleotide , Spondylitis, Ankylosing/genetics , Thyroiditis, Autoimmune/genetics , United Kingdom
ABSTRACT
Parkinson's disease (PD) is a neurodegenerative disease that lacks diagnostic markers. Several studies on PD have reported element imbalances in biofluids as biomarkers. However, their results remained inconclusive. This study integrates metallomics, multivariate analysis and artificial neural networks (ANN) to understand element variations in CSF and serum of PD patients from the largest cohort of the Indian population and to resolve the inconsistent results of previous studies. This study also aims to (1) ascertain a common element signature between CSF and serum, (2) assess cross-sectional element variation with clinical symptoms, and (3) develop ANN models for rapid diagnosis. A metallomic profile of 110 CSF and 530 serum samples showed significant variations in 10 elements in CSF and six in serum of patients compared to controls. Consistent variations in element patterns were noticed for calcium, magnesium and iron in both fluids of PD patients, which makes diagnosis from serum feasible. Furthermore, multivariate analyses showed clear classification between normal and PD samples in both fluids. Also, ANN provides 99% accuracy in detecting the disease from CSF and serum. Overall, our analyses demonstrate that the element profile in biofluids of PD patients will be useful in the development of diagnostic markers for PD.
Subject(s)
Calcium/blood , Calcium/cerebrospinal fluid , Iron/blood , Iron/cerebrospinal fluid , Magnesium/blood , Magnesium/cerebrospinal fluid , Parkinson Disease/diagnosis , Biomarkers/blood , Biomarkers/cerebrospinal fluid , Cross-Sectional Studies , Female , Humans , India , Male , Middle Aged , Multivariate Analysis , Parkinson Disease/blood , Parkinson Disease/cerebrospinal fluid , Parkinson Disease/drug therapy , Spectrophotometry, Atomic , Trace Elements/blood , Trace Elements/cerebrospinal fluid
ABSTRACT
In genome-wide association studies (GWAS), multiple diseases with shared controls is one of the case-control study designs. If data obtained from these studies are appropriately analyzed, this design can have several advantages such as improving statistical power in detecting associations and reducing the time and cost in the data collection process. In this paper, we propose a study design for GWAS which involves multiple diseases but without controls. We also propose a corresponding statistical data analysis strategy for GWAS with multiple diseases but no controls. Through a simulation study, we show that the statistical association test with the proposed study design is more powerful than the test with a single disease sharing common controls, and it has comparable power to the overall test based on the whole dataset including the controls. We also apply the proposed method to a real GWAS dataset to illustrate the methodologies and the advantages of the proposed design. Some possible limitations of this study design and testing method and their solutions are also discussed. Our findings indicate that the proposed study design and statistical analysis strategy could be more efficient than the usual case-control GWAS as well as those with shared controls.
Subject(s)
Genetic Predisposition to Disease/genetics , Genome, Human/genetics , Genome-Wide Association Study/methods , Polymorphism, Single Nucleotide , Algorithms , Case-Control Studies , Chi-Square Distribution , Computer Simulation , Control Groups , Genome-Wide Association Study/statistics & numerical data , Humans , Monte Carlo Method , Research Design
ABSTRACT
BACKGROUND: For RNA-seq data, the aggregated counts of the short reads from the same gene are used to approximate the gene expression level. The count data can be modelled as samples from Poisson distributions with possibly different parameters. To detect differentially expressed genes under two situations, statistical methods for detecting the difference of two Poisson means are used. When the expression level of a gene is low, i.e., the count is small, it is usually more difficult to detect the mean differences, and therefore statistical methods which are more powerful for low expression levels are particularly desirable. In the statistical literature, several methods have been proposed to compare two Poisson means (rates). In this paper, we compare these methods by using simulated and real RNA-seq data. RESULTS: Through a simulation study and real data analysis, we find that the Wald test with the data being log-transformed is more powerful than other methods, including the likelihood ratio test, which has similar power to the variance stabilizing transformation test; both are more powerful than the conditional exact test and Fisher exact test. CONCLUSIONS: When the count data in RNA-seq can be reasonably modelled as Poisson distributed, the Wald-Log test is more powerful and should be used to detect the differentially expressed genes.
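A minimal sketch of a Wald test on the log scale for two Poisson counts may help make the comparison concrete. It assumes the textbook form of the statistic, with the delta-method variance 1/x1 + 1/x2 for the log rate ratio; the function name and the exposure arguments `n1` and `n2` (for example, library sizes) are hypothetical, and the exact variant studied in the paper may differ, particularly in how zero counts are handled.

```python
import math

def wald_log_test(x1, x2, n1=1.0, n2=1.0):
    """Two-sided Wald test of equal Poisson rates on the log scale.

    x1, x2: observed counts; n1, n2: exposures (assumed known).
    Returns (z, p) using a standard normal reference distribution.
    """
    if x1 <= 0 or x2 <= 0:
        # log(0) is undefined; this sketch does not attempt a correction.
        raise ValueError("log-scale Wald statistic needs positive counts")
    log_ratio = math.log(x1 / n1) - math.log(x2 / n2)
    se = math.sqrt(1.0 / x1 + 1.0 / x2)  # delta-method standard error
    z = log_ratio / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

A call such as `wald_log_test(40, 10)` compares two counts observed under equal exposure; passing different `n1` and `n2` adjusts for unequal sequencing depth.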
Subject(s)
Computational Biology/methods , Gene Expression Regulation , Sequence Analysis, RNA/methods , Models, Statistical , Poisson Distribution
ABSTRACT
Using molecular genetic data on Aggarwals (Vaish/Vysya), an endogamous population group of north India, we provide evidence of its homogeneous unstratified population structure. We found a mean heterozygosity value of 0.33 for 14 single nucleotide polymorphisms belonging to four genes (TCF7L2-, HHEX-, KCNJ11-, and ADIPOQ-) in the Aggarwal population (sample of 184 individuals) and tried to evaluate the genomic efficiency of endogamy in this population with the help of clan-based stratified analysis. We concluded that the sociocultural identity of the endogamous population groups could act as a robust proxy maker for inferring their homogeneity and population structure in India, which is also ideal for population selection for future genome-wide association studies in the country.
Subject(s)
Adiponectin/genetics , Homeodomain Proteins/genetics , Polymorphism, Genetic , Population/genetics , Potassium Channels, Inwardly Rectifying/genetics , Transcription Factor 7-Like 2 Protein/genetics , Transcription Factors/genetics , Female , Genetic Markers , Humans , India/ethnology , Male , Marriage
ABSTRACT
In this paper, we investigate different procedures for testing the equality of two mean survival times in paired lifetime studies. We consider Owen's M-test and Q-test, a likelihood ratio test, the paired t-test, the Wilcoxon signed rank test and a permutation test based on log-transformed survival times in the comparative study. We also consider the paired t-test, the Wilcoxon signed rank test and a permutation test based on original survival times for the sake of comparison. The size and power characteristics of these tests are studied by means of Monte Carlo simulations under a frailty Weibull model. For less skewed marginal distributions, the Wilcoxon signed rank test based on original survival times is found to be desirable. Otherwise, the M-test and the likelihood ratio test are the best choices in terms of power. In general, one can choose a test procedure based on information about the correlation between the two survival times and the skewness of the marginal survival distributions.
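Of the procedures compared above, the permutation test on log-transformed survival times is perhaps the easiest to sketch. The version below assumes complete (uncensored) pairs and uses random sign flips of the within-pair log differences, a standard construction for paired data; the function name, the choice of the mean difference as the test statistic, and the Monte Carlo approximation are assumptions for illustration and need not match the exact procedure in the paper.

```python
import math
import random

def paired_log_permutation_test(t1, t2, n_perm=5000, seed=0):
    """Two-sided sign-flip permutation test on paired log survival times.

    t1, t2: positive survival times for the two members of each pair.
    Returns a Monte Carlo p-value (with the usual +1 correction).
    """
    rng = random.Random(seed)
    d = [math.log(a) - math.log(b) for a, b in zip(t1, t2)]
    observed = abs(sum(d) / len(d))
    hits = 0
    for _ in range(n_perm):
        # Under the null, each within-pair difference is symmetric about zero.
        flipped = sum(x if rng.random() < 0.5 else -x for x in d)
        if abs(flipped / len(d)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Working on the log scale makes the test sensitive to multiplicative differences in survival time, which is one reason the original-scale and log-scale versions can behave differently for skewed marginal distributions.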
Subject(s)
Models, Statistical , Survival Analysis , Biometry , Humans , Life Tables , Likelihood Functions , Monte Carlo Method
ABSTRACT
In this article, we investigate procedures for comparing two independent Poisson variates that are observed over unequal sampling frames (i.e. time intervals, populations, areas or any combination thereof). We consider two statistics (with and without the logarithmic transformation) for testing the equality of two Poisson rates. Two methods for implementing these statistics are reviewed. They are (1) the sample-based method, and (2) the constrained maximum likelihood estimation (CMLE) method. We conduct an empirical study to evaluate the performance of different statistics and methods. Generally, we find that the CMLE method works satisfactorily only for the statistic without the logarithmic transformation (denoted as W(2)) while the sample-based method performs better for the statistic using the logarithmic transformation (denoted as W(3)). It is noteworthy that both statistics perform well for moderate to large Poisson rates (e.g. ≥10). For small Poisson rates (e.g. <10), W(2) can be liberal (e.g. actual type I error rate/nominal level ≥1.2) while W(3) can be conservative (e.g. actual type I error rate/nominal level ≤0.8). The corresponding sample size formulae are provided and valid in the sense that the simulated powers associated with the approximate sample size formulae are generally close to the pre-chosen power level. We illustrate our methodologies with a real example from a breast cancer study.
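To make the distinction between the two implementation methods concrete, the sketch below computes an untransformed rate-difference statistic with two variance estimates: one plugging in the separate sample rates, and one plugging in the pooled rate that maximizes the likelihood under the null hypothesis of equal rates. The function name and arguments are hypothetical, and the formulas are the textbook forms; they are offered as an assumption about the general idea, not as the paper's exact definitions of W(2) or W(3).

```python
import math

def rate_difference_z(x1, t1, x2, t2, method="sample"):
    """Z statistic for H0: equal Poisson rates, with unequal sampling frames.

    x1, x2: observed counts; t1, t2: sampling frame sizes (e.g. person-years).
    method: "sample" uses separate rate estimates in the variance;
            "cmle" uses the pooled rate estimated under H0.
    """
    diff = x1 / t1 - x2 / t2
    if method == "sample":
        var = x1 / t1 ** 2 + x2 / t2 ** 2
    elif method == "cmle":
        pooled_rate = (x1 + x2) / (t1 + t2)  # constrained MLE under equal rates
        var = pooled_rate * (1.0 / t1 + 1.0 / t2)
    else:
        raise ValueError("method must be 'sample' or 'cmle'")
    return diff / math.sqrt(var)

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

When the two sampling frames are equal the two variance estimates coincide, so any difference between the methods shows up only for unequal frames.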