Results 1-20 of 95
1.
Genet Epidemiol; 47(5): 379-393, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37042632

ABSTRACT

Variation in RNA-Seq data creates modeling challenges for differential gene expression (DE) analysis. Statistical approaches developed for conventionally small sample sizes implement empirical Bayes or non-parametric tests, but frequently produce different conclusions. Increasing sample sizes make alternative DE paradigms feasible. Here we develop RoPE, which uses a data-driven adjustment for variation and a robust profile likelihood ratio DE test. Simulation studies show RoPE can outperform existing tools as sample size increases and has the most reliable control of error rates. Application of RoPE demonstrates that an active Pseudomonas aeruginosa infection downregulates the SLC9A3 Cystic Fibrosis modifier gene.
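
The RoPE implementation itself is not reproduced here; as a minimal sketch of the general idea of a profile likelihood ratio DE test, the following compares mean log-expression for one gene between two groups under a normal model, profiling out the nuisance variance. All names and the simulated data are illustrative.

```python
# Sketch only: a profile likelihood ratio test for a two-group mean
# difference with the common variance profiled out (not RoPE itself).
import numpy as np
from scipy.stats import chi2

def profile_lrt(x, y):
    """LRT for equal means under a normal model, variance profiled out."""
    n, m = len(x), len(y)
    # Unrestricted fit: separate group means, variance at its profile MLE.
    s2_full = (np.sum((x - x.mean())**2) + np.sum((y - y.mean())**2)) / (n + m)
    # Null fit: one common mean.
    mu0 = np.concatenate([x, y]).mean()
    s2_null = (np.sum((x - mu0)**2) + np.sum((y - mu0)**2)) / (n + m)
    # After profiling, -2 log LR reduces to (n+m) * log(s2_null / s2_full).
    stat = (n + m) * np.log(s2_null / s2_full)
    return stat, chi2.sf(stat, df=1)

rng = np.random.default_rng(0)
x = rng.normal(5.0, 1.0, 20)   # log-expression, group 1
y = rng.normal(5.8, 1.0, 20)   # log-expression, group 2
print(profile_lrt(x, y))
```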


Subjects
Gene Expression Profiling, Genetic Models, Humans, Likelihood Functions, Gene Expression Profiling/methods, Bayes Theorem, Computer Simulation
2.
Biometrics; 80(2), 2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38768225

ABSTRACT

Conventional supervised learning usually operates under the premise that data are collected from the same underlying population. However, challenges may arise when integrating new data from different populations, resulting in a phenomenon known as dataset shift. This paper focuses on prior probability shift, where the distribution of the outcome varies across datasets but the conditional distribution of features given the outcome remains the same. To tackle the challenges posed by such shifts, we propose an estimation algorithm that can efficiently combine information from multiple sources. Unlike existing methods that are restricted to discrete outcomes, the proposed approach accommodates both discrete and continuous outcomes. It also handles high-dimensional covariate vectors through variable selection using an adaptive least absolute shrinkage and selection operator penalty, producing efficient estimates that possess the oracle property. Moreover, a novel semiparametric likelihood ratio test is proposed to check the validity of prior probability shift assumptions by embedding the null conditional density function into Neyman's smooth alternatives (Neyman, 1937) and testing study-specific parameters. We demonstrate the effectiveness of our proposed method through extensive simulations and a real data example. The proposed methods serve as a useful addition to the repertoire of tools for dealing with dataset shifts.
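
A sketch of the adaptive-lasso component only (not the authors' full multi-source algorithm): weight each coefficient's penalty by an initial pilot fit, which is what yields the oracle property. The data, tuning values, and the ridge pilot are assumptions for illustration.

```python
# Adaptive lasso via the standard column-rescaling trick: a weighted
# L1 penalty is equivalent to a plain lasso on rescaled columns.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(1)
n, p = 200, 30
X = rng.normal(size=(n, p))
beta = np.zeros(p); beta[:3] = [2.0, -1.5, 1.0]   # sparse truth
y = X @ beta + rng.normal(size=n)

init = Ridge(alpha=1.0).fit(X, y).coef_           # pilot estimate
w = 1.0 / (np.abs(init) + 1e-8)                   # adaptive weights
Xw = X / w                                        # rescale each column j by 1/w_j
fit = Lasso(alpha=0.05).fit(Xw, y)
beta_hat = fit.coef_ / w                          # undo the rescaling
print(np.nonzero(np.abs(beta_hat) > 1e-6)[0])     # recovered support
```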


Subjects
Algorithms, Computer Simulation, Statistical Models, Probability, Humans, Likelihood Functions, Biometry/methods, Statistical Data Interpretation, Supervised Machine Learning
3.
Bull Math Biol; 86(7): 80, 2024 May 27.
Article in English | MEDLINE | ID: mdl-38801489

ABSTRACT

Many commonly used mathematical models in the field of mathematical biology involve challenges of parameter non-identifiability. Practical non-identifiability, where the quality and quantity of data do not provide sufficiently precise parameter estimates, is often encountered, even with relatively simple models. In particular, the mixed situation where some parameters are identifiable and others are not is common. In this work we apply a recent likelihood-based workflow, called Profile-Wise Analysis (PWA), to non-identifiable models for the first time. The PWA workflow addresses identifiability, parameter estimation, and prediction in a unified framework that is simple to implement and interpret. Previous implementations of the workflow have dealt with idealised identifiable problems only. In this study we illustrate how the PWA workflow can be applied to both structurally non-identifiable and practically non-identifiable models in the context of simple population growth models. Dealing with simple mathematical models allows us to present the PWA workflow in a didactic, self-contained document that can be studied together with relatively straightforward Julia code provided on GitHub. Working with simple mathematical models also allows the PWA workflow prediction intervals to be compared with gold standard full likelihood prediction intervals. Together, our examples illustrate how the PWA workflow provides us with a systematic way of dealing with non-identifiability, especially compared to other approaches, such as seeking ad hoc parameter combinations, or simply setting parameter values to some arbitrary default value. Importantly, we show that the PWA workflow provides insight into the commonly-encountered situation where some parameters are identifiable and others are not, allowing us to explore how uncertainty in some parameters, and combinations of parameters, regardless of their identifiability status, influences model predictions in a way that is insightful and interpretable.
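
A didactic sketch in the spirit of the profiling step of the PWA workflow: profile the growth rate r of a logistic model fit to noisy data, optimising out the carrying capacity K and initial size c0 at each fixed r. This is not the authors' code (their implementation is in Julia); parameter values and noise level are illustrative.

```python
# Profile likelihood for the growth rate r of a logistic model.
import numpy as np
from scipy.optimize import minimize

def logistic(t, r, K, c0):
    return K * c0 / (c0 + (K - c0) * np.exp(-r * t))

rng = np.random.default_rng(2)
t = np.linspace(0, 10, 15)
data = logistic(t, 0.8, 100.0, 5.0) + rng.normal(0, 4.0, t.size)
sigma = 4.0  # noise sd taken as known, for simplicity

def nll(r, K, c0):
    return 0.5 * np.sum(((data - logistic(t, r, K, c0)) / sigma) ** 2)

def profile_nll(r):
    # optimise out the nuisance parameters (K, c0) at fixed r
    res = minimize(lambda q: nll(r, *q), x0=[90.0, 8.0], method="Nelder-Mead")
    return res.fun

rs = np.linspace(0.4, 1.4, 41)
prof = np.array([profile_nll(r) for r in rs])
# points inside the ~95% profile-likelihood interval (chi2_1/2 = 1.92 cutoff)
inside = rs[prof - prof.min() <= 1.92]
print(inside.min(), inside.max())
```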


Subjects
Mathematical Concepts, Biological Models, Humans, Likelihood Functions, Computer Simulation, Population Dynamics/statistics & numerical data, Workflow, Algorithms
4.
Bull Math Biol; 86(6): 70, 2024 May 8.
Article in English | MEDLINE | ID: mdl-38717656

ABSTRACT

Practical limitations of quality and quantity of data can limit the precision of parameter identification in mathematical models. Model-based experimental design approaches have been developed to minimise parameter uncertainty, but the majority of these approaches have relied on first-order approximations of model sensitivity at a local point in parameter space. Practical identifiability approaches such as profile-likelihood have shown potential for quantifying parameter uncertainty beyond linear approximations. This research presents a genetic algorithm approach to optimise sample timing across various parameterisations of a demonstrative PK-PD model with the goal of aiding experimental design. The optimisation relies on a chosen metric of parameter uncertainty that is based on the profile-likelihood method. Additionally, the approach considers cases where multiple parameter scenarios may require simultaneous optimisation. The genetic algorithm approach was able to locate near-optimal sampling protocols for a wide range of sample numbers (n = 3-20), and it reduced the parameter variance metric by 33-37% on average. The profile-likelihood metric also correlated well with an existing Monte Carlo-based metric (with a worst-case r > 0.89), while reducing computational cost by an order of magnitude. The combination of the new profile-likelihood metric and the genetic algorithm demonstrates the feasibility of considering the nonlinear nature of models in optimal experimental design at a reasonable computational cost. The outputs of such a process could allow experimenters either to improve parameter certainty given a fixed number of samples, or to reduce sample quantity while retaining the same level of parameter certainty.
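
A toy sketch of the genetic-algorithm search over sampling times. As a stand-in fitness we use D-optimality (log-determinant of a finite-difference information matrix) for a one-compartment elimination curve, not the authors' profile-likelihood width metric; the GA structure, model, and all constants are assumptions for illustration.

```python
# Simple GA over candidate sampling times for an exponential PK curve.
import numpy as np

rng = np.random.default_rng(3)
theta = np.array([1.5, 0.3])            # (amplitude, elimination rate)

def model(times, th):
    return th[0] * np.exp(-th[1] * times)

def fitness(times):
    # sensitivity matrix by central differences; maximise log det(J'J)
    eps = 1e-5
    J = np.stack([(model(times, theta + eps * np.eye(2)[i])
                   - model(times, theta - eps * np.eye(2)[i])) / (2 * eps)
                  for i in range(2)], axis=1)
    sign, logdet = np.linalg.slogdet(J.T @ J)
    return logdet if sign > 0 else -np.inf

def ga(n_times=5, pop=40, gens=60):
    P = rng.uniform(0.1, 24.0, size=(pop, n_times))   # times in hours
    for _ in range(gens):
        f = np.array([fitness(np.sort(p)) for p in P])
        parents = P[np.argsort(f)[-pop // 2:]]        # keep the fitter half
        kids = parents[rng.integers(len(parents), size=pop // 2)].copy()
        kids += rng.normal(0, 0.5, kids.shape)        # mutation
        P = np.vstack([parents, np.clip(kids, 0.1, 24.0)])
    return np.sort(P[np.argmax([fitness(np.sort(p)) for p in P])])

print(ga())   # near-optimal sampling schedule under this toy criterion
```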


Assuntos
Algoritmos , Simulação por Computador , Conceitos Matemáticos , Modelos Biológicos , Método de Monte Carlo , Funções Verossimilhança , Humanos , Relação Dose-Resposta a Droga , Projetos de Pesquisa/estatística & dados numéricos , Modelos Genéticos , Incerteza
5.
Bull Math Biol; 86(4): 36, 2024 Mar 2.
Article in English | MEDLINE | ID: mdl-38430382

ABSTRACT

Identifying unique parameters for mathematical models describing biological data can be challenging and often impossible. Parameter identifiability for partial differential equation models in cell biology is especially difficult given that many established in vivo measurements of protein dynamics average out the spatial dimensions. Here, we are motivated by recent experiments on the binding dynamics of the RNA-binding protein PTBP3 in RNP granules of frog oocytes based on fluorescence recovery after photobleaching (FRAP) measurements. FRAP is a widely-used experimental technique for probing protein dynamics in living cells, and is often modeled using simple reaction-diffusion models of the protein dynamics. We show that current methods of structural and practical parameter identifiability provide limited insights into identifiability of kinetic parameters for these PDE models and spatially-averaged FRAP data. We thus propose a pipeline for assessing parameter identifiability and for learning parameter combinations based on re-parametrization and profile likelihood analysis. We show that this method is able to recover parameter combinations for synthetic FRAP datasets and investigate its application to real experimental data.
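
To make the setting concrete, here is a fit of a classical reaction-dominant FRAP recovery curve, f(t) = 1 - Ceq exp(-k_off t) with Ceq = k_on/(k_on + k_off) (a Sprague et al.-style approximation, not this paper's pipeline), to spatially averaged recovery data. This is exactly the kind of model whose kinetic rates are often identifiable only in combination; all numbers are illustrative.

```python
# Fit a reaction-dominant FRAP recovery curve to simulated data.
import numpy as np
from scipy.optimize import curve_fit

def frap(t, k_on, k_off):
    ceq = k_on / (k_on + k_off)      # bound fraction at equilibrium
    return 1.0 - ceq * np.exp(-k_off * t)

rng = np.random.default_rng(4)
t = np.linspace(0, 60, 80)           # seconds post-bleach
data = frap(t, 0.05, 0.10) + rng.normal(0, 0.01, t.size)
(k_on, k_off), cov = curve_fit(frap, t, data, p0=[0.1, 0.1])
print(k_on, k_off, np.sqrt(np.diag(cov)))   # estimates and crude SEs
```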


Assuntos
Conceitos Matemáticos , Modelos Biológicos , Recuperação de Fluorescência Após Fotodegradação , Modelos Teóricos , Difusão
6.
Bull Math Biol; 86(4): 40, 2024 Mar 15.
Article in English | MEDLINE | ID: mdl-38489047

ABSTRACT

Use of nonlinear statistical methods and models is ubiquitous in scientific research. However, these methods may not be fully understood, and as demonstrated here, commonly-reported parameter p-values and confidence intervals may be inaccurate. The gentle introduction to nonlinear regression modelling and the comprehensive illustrations given here provide applied researchers with the overview and tools needed to appreciate the nuances and breadth of these important methods. Since these methods build upon topics covered in first and second courses in applied statistics and predictive modelling, the target audience includes practitioners and students alike. To guide practitioners, we summarize, illustrate, develop, and extend nonlinear modelling methods, and underscore caveats of Wald statistics using basic illustrations and give key reasons for preferring likelihood methods. Parameter profiling in multiparameter models and exact or near-exact versus approximate likelihood methods are discussed and curvature measures are connected with the failure of the Wald approximations regularly used in statistical software. The discussion in the main paper has been kept at an introductory level and can be covered on a first reading; additional details given in the Appendices can be worked through upon further study. The associated online Supplementary Information also provides the data and R computer code, which can be easily adapted to help researchers fit nonlinear models to their data.
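
The paper's central contrast, in a minimal sketch: a Wald interval versus a profile-likelihood interval for the rate parameter of an exponential decay regression (the paper's own examples are in R; this Python version and its data are assumptions). With strong curvature the two intervals can disagree markedly.

```python
# Wald vs profile-likelihood 95% CIs for the decay rate k in y = a*exp(-k*t).
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

rng = np.random.default_rng(5)
t = np.linspace(0, 4, 12)
y = 10 * np.exp(-1.2 * t) + rng.normal(0, 0.4, t.size)

def nll(params):                      # known noise sd = 0.4
    a, k = params
    return 0.5 * np.sum((y - a * np.exp(-k * t)) ** 2) / 0.4**2

fit = minimize(nll, x0=[8.0, 1.0], method="Nelder-Mead")
a_hat, k_hat = fit.x

# Wald CI from a finite-difference Hessian at the MLE
eps = 1e-4
H = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        e_i, e_j = eps * np.eye(2)[i], eps * np.eye(2)[j]
        H[i, j] = (nll(fit.x + e_i + e_j) - nll(fit.x + e_i - e_j)
                   - nll(fit.x - e_i + e_j) + nll(fit.x - e_i - e_j)) / (4 * eps**2)
se_k = np.sqrt(np.linalg.inv(H)[1, 1])
print("Wald 95% CI:", k_hat - 1.96 * se_k, k_hat + 1.96 * se_k)

# Profile-likelihood CI: optimise out a at each fixed k
def prof(k):
    return minimize(lambda a: nll([a[0], k]), x0=[a_hat], method="Nelder-Mead").fun
ks = np.linspace(k_hat - 0.6, k_hat + 0.6, 121)
inside = ks[[prof(k) - fit.fun <= chi2.ppf(0.95, 1) / 2 for k in ks]]
print("Profile 95% CI:", inside.min(), inside.max())
```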


Subjects
Biological Models, Nonlinear Dynamics, Humans, Computer Simulation, Mathematical Concepts, Likelihood Functions, Statistical Models
7.
Hum Hered; 88(1): 38-49, 2023.
Article in English | MEDLINE | ID: mdl-37100044

ABSTRACT

INTRODUCTION: The case-mother-control-mother design allows the study of fetal and maternal genetic factors, together with environmental exposures, on early life outcomes. Mendelian constraints and conditional independence between child genotype and environmental factors have enabled semiparametric likelihood methods to estimate logistic models with greater efficiency than standard logistic regression. Difficulties in collecting child genotypes require methods that handle missing child genotypes. METHODS: We review a stratified retrospective likelihood and two semiparametric likelihood approaches: a prospective one and a modified retrospective one, the latter either modeling the maternal genotype as a function of covariates or leaving their joint distribution unspecified (robust version). We also review software implementing these modeling alternatives, compare their statistical properties in a simulation study, and illustrate their application, focusing on gene-environment interactions and partially missing child genotypes. RESULTS: The robust retrospective likelihood provides generally unbiased estimates, with standard errors only slightly larger than when the maternal genotype is modeled as a function of exposure. The prospective likelihood encounters maximization problems. In the application to the association of small-for-gestational-age babies with CYP2E1 and drinking-water disinfection by-products, the retrospective likelihood accommodated a full array of covariates, while the prospective likelihood was limited to a few covariates. CONCLUSION: We recommend the robust version of the modified retrospective likelihood.


Assuntos
Interação Gene-Ambiente , Genótipo , Mães , Software , Criança , Feminino , Humanos , Estudos de Casos e Controles , Funções Verossimilhança , Estudos Prospectivos , Estudos Retrospectivos
8.
Stat Med; 42(15): 2600-2618, 2023 Jul 10.
Article in English | MEDLINE | ID: mdl-37019798

ABSTRACT

We propose an improved estimation method for the Box-Cox transformation (BCT) cure rate model parameters. Specifically, we propose a generic maximum likelihood estimation algorithm based on a non-linear conjugate gradient (NCG) method with an efficient line search technique. We then apply the proposed NCG algorithm to the BCT cure model. Through a detailed simulation study, we compare the model fitting results of the NCG algorithm with those obtained by the existing expectation maximization (EM) algorithm. First, we show that our proposed NCG algorithm allows simultaneous maximization of all model parameters, unlike the EM algorithm, when the likelihood surface is flat with respect to the BCT index parameter. Then, we show that the NCG algorithm results in smaller bias and noticeably smaller root mean square error of the estimates of the model parameters associated with the cure rate. This results in more accurate and precise inference on the cure rate. In addition, we show that when the sample size is large, the NCG algorithm, which only needs the computation of the gradient and not the Hessian, takes less CPU time to produce the estimates. These advantages allow us to conclude that the NCG method should be the preferred estimation method over the existing EM algorithm in the context of the BCT cure model. Finally, we apply the NCG algorithm to a well-known melanoma dataset and show that it results in a better fit than the EM algorithm.
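
The NCG idea in miniature: maximise a censored-data likelihood with a conjugate gradient routine that needs only gradients, never a Hessian. As a stand-in for the BCT cure model, this fits a plain Weibull survival model; the simulated data and log-scale parameterisation are assumptions.

```python
# Conjugate-gradient ML for a Weibull model with right-censoring.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n = 500
t_true = rng.weibull(1.5, n) * 2.0          # event times (shape 1.5, scale 2)
c = rng.uniform(0, 4, n)                    # censoring times
obs = np.minimum(t_true, c)
delta = (t_true <= c).astype(float)         # 1 = event observed

def negloglik(params):
    log_k, log_lam = params                 # log-scale keeps params positive
    k, lam = np.exp(log_k), np.exp(log_lam)
    h = np.log(k / lam) + (k - 1) * np.log(obs / lam)   # log hazard
    H = (obs / lam) ** k                                # cumulative hazard
    return -(np.sum(delta * h) - np.sum(H))

fit = minimize(negloglik, x0=[0.0, 0.0], method="CG")   # conjugate gradient
print(np.exp(fit.x))                                    # (shape, scale) estimates
```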


Assuntos
Melanoma , Modelos Estatísticos , Humanos , Funções Verossimilhança , Modelos de Riscos Proporcionais , Melanoma/terapia , Simulação por Computador , Algoritmos
9.
Bull Math Biol; 86(1): 8, 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38091169

ABSTRACT

Co-culture tumour spheroid experiments are routinely performed to investigate cancer progression and test anti-cancer therapies. Therefore, methods to quantitatively characterise and interpret co-culture spheroid growth are of great interest. However, co-culture spheroid growth is complex. Multiple biological processes occur on overlapping timescales and different cell types within the spheroid may have different characteristics, such as differing proliferation rates or responses to nutrient availability. At present there is no standard, widely-accepted mathematical model of such complex spatio-temporal growth processes. Typical approaches to analyse these experiments focus on the late-time temporal evolution of spheroid size and overlook early-time spheroid formation, spheroid structure and geometry. Here, using a range of ordinary differential equation-based mathematical models and parameter estimation, we interpret new co-culture experimental data. We provide new biological insights about spheroid formation, growth, and structure. As part of this analysis we connect Greenspan's seminal mathematical model to co-culture data for the first time. Furthermore, we generalise a class of compartment-based spheroid mathematical models that have previously been restricted to one population so they can be applied to multiple populations. As special cases of the general model, we explore multiple natural two population extensions to Greenspan's seminal model and reveal biological mechanisms that can describe the internal dynamics of growing co-culture spheroids and those that cannot. This mathematical and statistical modelling-based framework is well-suited to analyse spheroids grown with multiple different cell types and the new class of mathematical models provide opportunities for further mathematical and biological insights.
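
An illustrative two-population compartment model in the spirit of the generalised models discussed here: two cell types share a carrying capacity but proliferate at different rates. The structure and parameter values are ours, for demonstration only, not one of the paper's fitted models.

```python
# Two coupled logistic-type ODEs for a co-culture population.
import numpy as np
from scipy.integrate import solve_ivp

r1, r2, K = 0.9, 0.4, 1000.0   # per-day growth rates, shared capacity

def rhs(t, u):
    n1, n2 = u
    crowding = 1.0 - (n1 + n2) / K   # both types feel total crowding
    return [r1 * n1 * crowding, r2 * n2 * crowding]

sol = solve_ivp(rhs, (0.0, 15.0), [20.0, 20.0], dense_output=True)
t = np.linspace(0, 15, 6)
print(sol.sol(t).round(1))   # both subpopulations sampled over time
```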


Assuntos
Neoplasias , Esferoides Celulares , Humanos , Técnicas de Cocultura , Esferoides Celulares/patologia , Modelos Biológicos , Conceitos Matemáticos , Neoplasias/patologia , Modelos Teóricos
10.
J Biopharm Stat; 33(3): 371-385, 2023 May 4.
Article in English | MEDLINE | ID: mdl-36533908

ABSTRACT

For ordered categorical data from randomized clinical trials, the relative effect, the probability that observations in one group tend to be larger, has been considered appropriate for a measure of an effect size. Although the Wilcoxon-Mann-Whitney test is widely used to compare two groups, the null hypothesis is not just the relative effect of 50%, but the identical distribution between groups. The null hypothesis of the Brunner-Munzel test, another rank-based method used for arbitrary types of data, is just the relative effect of 50%. In this study, we compared actual type I error rates (or 1 - coverage probability) of the profile-likelihood-based confidence intervals for the relative effect and other rank-based methods in simulation studies at the relative effect of 50%. The profile-likelihood method, like the Brunner-Munzel test, does not require any assumptions on distributions. Actual type I error rates of the profile-likelihood method and the Brunner-Munzel test were close to the nominal level in large or medium samples, even under unequal distributions. Those of the Wilcoxon-Mann-Whitney test differed largely from the nominal level under unequal distributions, especially under unequal sample sizes. In small samples, the actual type I error rates of the Brunner-Munzel test were slightly larger than the nominal level and those of the profile-likelihood method were even larger. We provide a paradoxical numerical example: only the Wilcoxon-Mann-Whitney test was significant under equal sample sizes, but after changing only the allocation ratio, it was no longer significant while the profile-likelihood method and the Brunner-Munzel test were. This phenomenon reflects the behaviour of the Wilcoxon-Mann-Whitney test seen in the simulation study: its actual type I error rates fall above or below the nominal level depending on the allocation ratio.
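
The comparison at the heart of this paper fits in two scipy calls; the simulated scenario (equal relative effect, unequal spreads and sample sizes) is our own illustration, not the paper's simulation design.

```python
# Wilcoxon-Mann-Whitney (null: identical distributions) vs
# Brunner-Munzel (null: relative effect of 50%).
import numpy as np
from scipy.stats import mannwhitneyu, brunnermunzel

rng = np.random.default_rng(7)
x = rng.normal(0.0, 1.0, 120)    # larger group, small spread
y = rng.normal(0.0, 3.0, 30)     # smaller group, large spread

print(mannwhitneyu(x, y))        # can stray from nominal level here
print(brunnermunzel(x, y))       # targets P(X < Y) + 0.5*P(X = Y) = 0.5
```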


Assuntos
Modelos Estatísticos , Humanos , Simulação por Computador , Intervalos de Confiança , Funções Verossimilhança , Estatísticas não Paramétricas
11.
Biom J; 65(8): e2300006, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37394716

ABSTRACT

We study parametric inference on a rich class of hazard regression models in the presence of right-censoring. Previous literature has reported some inferential challenges, such as multimodal or flat likelihood surfaces, in this class of models for some particular data sets. We formalize the study of these inferential problems by linking them to the concepts of near-redundancy and practical nonidentifiability of parameters. We show that the maximum likelihood estimators of the parameters in this class of models are consistent and asymptotically normal. Thus, the inferential problems in this class of models are related to the finite-sample scenario, where it is difficult to distinguish between the fitted model and a nested nonidentifiable (i.e., parameter-redundant) model. We propose a method for detecting near-redundancy, based on distances between probability distributions. We also employ methods used in other areas for detecting practical nonidentifiability and near-redundancy, including the inspection of the profile likelihood function and the Hessian method. For cases where inferential problems are detected, we discuss alternatives such as using model selection tools to identify simpler models that do not exhibit these inferential problems, increasing the sample size, or extending the follow-up time. We illustrate the performance of the proposed methods through a simulation study. Our simulation study reveals a link between the presence of near-redundancy and practical nonidentifiability. Two illustrative applications using real data, with and without inferential problems, are presented.
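
A sketch of the Hessian diagnostic mentioned here: eigenvalues of the observed information at (or near) the MLE that are close to zero flag directions in which the likelihood is flat, i.e. practical nonidentifiability. The toy model, in which only the product of two parameters enters the likelihood, is our own illustration.

```python
# Near-zero Hessian eigenvalue as a nonidentifiability flag.
import numpy as np

rng = np.random.default_rng(8)
x = rng.normal(size=100)
y = 1.2 * x + rng.normal(size=100)           # truth: a*b = 1.2

def nll(a, b):
    return 0.5 * np.sum((y - a * b * x) ** 2)

a_hat, b_hat = 1.0, 1.2                       # one point on the MLE ridge
eps = 1e-4
H = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        d = eps * np.eye(2)                   # d[i] is the i-th perturbation
        H[i, j] = (nll(a_hat + d[i, 0] + d[j, 0], b_hat + d[i, 1] + d[j, 1])
                   - nll(a_hat + d[i, 0] - d[j, 0], b_hat + d[i, 1] - d[j, 1])
                   - nll(a_hat - d[i, 0] + d[j, 0], b_hat - d[i, 1] + d[j, 1])
                   + nll(a_hat - d[i, 0] - d[j, 0], b_hat - d[i, 1] - d[j, 1])
                   ) / (4 * eps**2)
print(np.linalg.eigvalsh(H))   # one eigenvalue ~0: a flat ridge direction
```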


Assuntos
Modelos de Riscos Proporcionais , Funções Verossimilhança , Simulação por Computador
12.
J Theor Biol; 549: 111201, 2022 Sep 21.
Article in English | MEDLINE | ID: mdl-35752285

ABSTRACT

Stochastic individual-based mathematical models are attractive for modelling biological phenomena because they naturally capture the stochasticity and variability that is often evident in biological data. Such models also allow us to track the motion of individuals within the population of interest. Unfortunately, capturing this microscopic detail means that simulation and parameter inference can become computationally expensive. One approach for overcoming this computational limitation is to coarse-grain the stochastic model to provide an approximate continuum model that can be solved using far less computational effort. However, coarse-grained continuum models can be biased or inaccurate, particularly for certain parameter regimes. In this work, we combine stochastic and continuum mathematical models in the context of lattice-based models of two-dimensional cell biology experiments by demonstrating how to simulate two commonly used experiments: cell proliferation assays and barrier assays. Our approach involves building a simple statistical model of the discrepancy between the expensive stochastic model and the associated computationally inexpensive coarse-grained continuum model. We form this statistical model based on a limited number of expensive stochastic model evaluations at design points sampled from a user-chosen distribution, corresponding to a computer experiment design problem. With straightforward design point selection schemes, we show that using the statistical model of the discrepancy in tandem with the computationally inexpensive continuum model allows us to carry out prediction and inference while correcting for biases and inaccuracies due to the continuum approximation. We demonstrate this approach by simulating a proliferation assay, where the continuum limit model is the well-known logistic ordinary differential equation, as well as a barrier assay, where the continuum limit model is closely related to the well-known Fisher-KPP partial differential equation. We construct an approximate likelihood function for parameter inference, both with and without discrepancy correction terms. Using maximum likelihood estimation, we provide point estimates of the unknown parameters, and use the profile likelihood to characterise the uncertainty in these estimates and form approximate confidence intervals. For the range of inference problems considered, working with the continuum limit model alone leads to biased parameter estimation and confidence intervals with poor coverage. In contrast, incorporating correction terms arising from the statistical model of the model discrepancy allows us to recover the parameters accurately with minimal computational overhead. The main tradeoff is that the associated confidence intervals are typically broader, reflecting the additional uncertainty introduced by the approximation process. All algorithms required to replicate the results in this work are written in the open source Julia language and are available on GitHub.


Assuntos
Algoritmos , Modelos Biológicos , Simulação por Computador , Humanos , Funções Verossimilhança , Processos Estocásticos
13.
Stat Med; 41(17): 3336-3348, 2022 Jul 30.
Article in English | MEDLINE | ID: mdl-35527474

ABSTRACT

Outbreaks of an endemic infectious disease can occur when the disease is introduced into a highly susceptible subpopulation or when the disease enters a network of connected individuals. For example, significant HIV outbreaks among people who inject drugs have occurred in at least half a dozen US states in recent years. This motivates the current study: how can limited testing resources be allocated across geographic regions to rapidly detect outbreaks of an endemic infectious disease? We develop an adaptive sampling algorithm that uses profile likelihood to estimate the distribution of the number of positive tests that would occur for each location in a future time period if that location were sampled. Sampling is performed in the location with the highest estimated probability of triggering an outbreak alarm in the next time period. The alarm function is determined by a semiparametric likelihood ratio test. We compare the profile likelihood sampling (PLS) method numerically to uniform random sampling (URS) and Thompson sampling (TS). TS was worse than URS when the outbreak occurred in a location with lower initial prevalence than other locations. PLS had lower time to outbreak detection than TS in some but not all scenarios, but was always better than URS, even when the outbreak occurred in a location with a lower initial prevalence than other locations. PLS provides an effective and reliable method for rapidly detecting endemic disease outbreaks that is robust to uncertainty about where the outbreak will occur.
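
For context, here is a sketch of the Thompson-sampling comparator (TS) described above: allocate each period's tests to the location whose posterior draw of prevalence is highest, with Beta-Bernoulli updating. The paper's PLS method replaces this posterior draw with a profile-likelihood-based alarm probability; the prevalences and test budget below are invented.

```python
# Thompson sampling over testing locations (the TS baseline, not PLS).
import numpy as np

rng = np.random.default_rng(9)
true_prev = np.array([0.01, 0.02, 0.08])      # location 2 is the outbreak
alpha = np.ones(3); beta = np.ones(3)         # Beta(1,1) priors per location

for day in range(200):
    draws = rng.beta(alpha, beta)             # one posterior sample per site
    k = int(np.argmax(draws))                 # test the 'hottest' site today
    pos = rng.binomial(10, true_prev[k])      # 10 tests at that site
    alpha[k] += pos; beta[k] += 10 - pos      # conjugate update

print(alpha / (alpha + beta))   # posterior means concentrate on site 2
```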


Assuntos
Surtos de Doenças , Humanos , Funções Verossimilhança , Prevalência
14.
Stat Med; 41(5): 910-931, 2022 Feb 28.
Article in English | MEDLINE | ID: mdl-35067954

ABSTRACT

In nutritional epidemiology, measurement error in covariates is a well-known problem since dietary intakes are usually assessed through self-reporting. In this article, we consider an additive error model in which error variables are highly correlated, and propose a new method called approximate profile likelihood estimation (APLE) for covariates measured with error in the Cox regression. Asymptotic normality of this estimator is established under regularity conditions, and simulation studies are conducted to examine the finite sample performance of the proposed estimator empirically. Moreover, the popular correction method called regression calibration is shown to be a special case of APLE. We then apply APLE to deal with measurement error in some nutrients of interest in the EPIC-InterAct Study under a sensitivity analysis framework.
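
Since the paper shows regression calibration is a special case of APLE, a sketch of regression calibration in its simplest linear form is a useful reference point: replace the error-prone covariate W by an estimate of E[X | W] before fitting the outcome model, with replicate measurements identifying the error variance. The linear outcome model and all numbers are our illustration, not the Cox/EPIC-InterAct analysis.

```python
# Regression calibration with two replicate error-prone measurements.
import numpy as np

rng = np.random.default_rng(10)
n = 2000
x = rng.normal(0, 1, n)                 # true intake (unobserved)
w1 = x + rng.normal(0, 0.8, n)          # two self-reported measurements
w2 = x + rng.normal(0, 0.8, n)
y = 0.5 * x + rng.normal(0, 1, n)       # outcome; true slope 0.5

wbar = (w1 + w2) / 2
var_u = np.var(w1 - w2) / 2             # per-replicate error variance
var_x = np.var(wbar) - var_u / 2        # variance of the true exposure
lam = var_x / (var_x + var_u / 2)       # reliability of the replicate mean
x_hat = wbar.mean() + lam * (wbar - wbar.mean())   # estimate of E[X | Wbar]

naive = np.polyfit(wbar, y, 1)[0]       # attenuated slope
calibrated = np.polyfit(x_hat, y, 1)[0] # ~0.5 after calibration
print(naive, calibrated)
```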


Assuntos
Projetos de Pesquisa , Calibragem , Simulação por Computador , Humanos , Funções Verossimilhança
15.
Stat Med; 41(29): 5698-5714, 2022 Dec 20.
Article in English | MEDLINE | ID: mdl-36165535

ABSTRACT

In medical research, it is often of great interest to have an accurate estimation of cure rates by different treatment options and for different patient groups. If the follow-up time is sufficiently long and the sample size is large, the proportion of cured patients will make the Kaplan-Meier estimator of the survival function have a flat plateau at its tail, whose value indicates the overall cure rate. However, it may be difficult to estimate and compare the cure rates for all the subsets of interest in this way, due to limited sample sizes and the curse of dimensionality. In the current literature, most regression models for estimating cure rates assume proportional hazards (PH) between different subgroups. It turns out that the estimation of cure rates for subgroups is highly sensitive to this assumption, so more flexible models are needed, especially when the PH assumption is clearly violated. We propose a new cure model to simultaneously incorporate both PH and non-PH scenarios for different covariates. We develop a stable and easily implementable iterative procedure for parameter estimation through maximization of the nonparametric likelihood function. The covariance matrix is estimated by adding perturbation weights to the estimation procedure. In simulation studies, the proposed method provides unbiased estimation for the regression coefficients, survival curves, and cure rates given covariates, while existing models are biased. Our model is applied to a study of stage III soft tissue sarcoma and provides trustworthy estimation of cure rates for different treatment and demographic groups.
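
The motivating estimator in the paper's opening, sketched directly: with long follow-up, the tail plateau of a Kaplan-Meier curve estimates the cure rate. A hand-rolled KM on simulated mixture-cure data (true cure rate 30%); everything here is illustrative.

```python
# Kaplan-Meier tail plateau as a crude overall cure-rate estimate.
import numpy as np

rng = np.random.default_rng(11)
n = 1000
cured = rng.random(n) < 0.3                 # 30% never experience the event
t_event = rng.exponential(1.0, n)
t_event[cured] = np.inf
t_cens = rng.uniform(0, 8, n)               # administrative censoring
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(int)

order = np.argsort(time)
time, event = time[order], event[order]
at_risk = n - np.arange(n)                  # risk set size at each ordered time
factors = np.where(event == 1, 1 - 1 / at_risk, 1.0)
km = np.cumprod(factors)                    # KM survival curve
print(km[-1])                               # tail plateau: roughly 0.3
```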


Assuntos
Sarcoma , Neoplasias de Tecidos Moles , Humanos , Modelos de Riscos Proporcionais , Modelos Estatísticos , Análise de Sobrevida , Funções Verossimilhança , Sarcoma/terapia , Simulação por Computador
16.
Biom J; 64(7): 1340-1360, 2022 Oct.
Article in English | MEDLINE | ID: mdl-35754152

ABSTRACT

The DerSimonian-Laird (DL) weighted average method for aggregated-data meta-analysis has been widely used for the estimation of overall effect sizes. It is criticized for underestimating the standard error of the overall effect size in the presence of heterogeneous effect sizes. Due to this negative property, many alternative estimation approaches have been proposed in the literature. One of the earliest alternative approaches was developed by Hardy and Thompson (HT), who implemented a profile likelihood instead of the moment-based approach of DL. Others have further extended this likelihood approach and proposed higher-order likelihood inferences (e.g., Bartlett-type corrections). In addition, correction factors for the estimated DL standard error, like the Hartung-Knapp-Sidik-Jonkman (HKSJ) adjustment, and restricted maximum likelihood (REML) estimation have been suggested. Although these improvements address the uncertainty in estimating the between-study variance better than the DL method, they all assume that the true within-study standard errors are known and equal to the observed standard errors of the effect sizes. Here, we treat the observed standard errors as estimators of the within-study variability and propose a bivariate likelihood approach that jointly estimates the overall effect size, the between-study variance, and the potentially heteroskedastic within-study variances. We study the performance of the proposed method by means of simulation, and compare it to DL (with and without HKSJ), HT, their higher-order likelihood methods, and REML. Our proposed approach seems to have better or similar coverage compared to the other approaches and appears to be less biased in the case of heteroskedastic within-study variances when this heteroskedasticity is correlated with the effect size.
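
The two baseline estimators compared in this paper, side by side in a short sketch. Both take the within-study variances v_i as known, which is exactly the assumption the paper's bivariate likelihood relaxes; the study effects and variances below are invented.

```python
# DerSimonian-Laird (moment-based) vs Hardy-Thompson-style likelihood
# estimation of the overall effect mu and between-study variance tau^2.
import numpy as np
from scipy.optimize import minimize_scalar

yi = np.array([0.10, 0.30, 0.35, 0.65, 0.45, 0.15])   # study effect sizes
vi = np.array([0.03, 0.07, 0.05, 0.09, 0.04, 0.06])   # within-study variances

# DerSimonian-Laird
w = 1 / vi
q = np.sum(w * (yi - np.sum(w * yi) / w.sum()) ** 2)  # Cochran's Q
c = w.sum() - np.sum(w**2) / w.sum()
tau2_dl = max(0.0, (q - (len(yi) - 1)) / c)
mu_dl = np.sum(yi / (vi + tau2_dl)) / np.sum(1 / (vi + tau2_dl))

# Hardy-Thompson: maximise the marginal normal likelihood over tau^2,
# with mu profiled out (it is the inverse-variance mean at each tau^2).
def negll(tau2):
    var = vi + tau2
    mu = np.sum(yi / var) / np.sum(1 / var)
    return 0.5 * np.sum(np.log(var) + (yi - mu) ** 2 / var)

tau2_ht = minimize_scalar(negll, bounds=(0, 1), method="bounded").x
mu_ht = np.sum(yi / (vi + tau2_ht)) / np.sum(1 / (vi + tau2_ht))
print(tau2_dl, mu_dl, tau2_ht, mu_ht)
```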


Assuntos
Projetos de Pesquisa , Simulação por Computador , Funções Verossimilhança , Incerteza
17.
Stat Med; 40(3): 668-689, 2021 Feb 10.
Article in English | MEDLINE | ID: mdl-33210329

ABSTRACT

In this article, we introduce the recently developed intrinsic estimator method for age-period-cohort (APC) models in examining disease incidence and mortality data, develop a likelihood ratio (L-R) test for testing differences in temporal trends across populations, and apply the methods to examining temporal trends in the age, period (calendar time), and birth cohort of US heart disease mortality across racial and sex groups. The temporal trends are estimated with the intrinsic estimator method to address the model identification problem, in which multiple sets of parameter estimates yield the same fitted values for a given dataset, making it difficult to conduct comparison of and hypothesis testing on the temporal trends in the age, period, and cohort across populations. We employ a penalized profile log-likelihood approach in developing the L-R test to deal with the issues of multiple estimators and the diverging number of model parameters. The identification problem also induces overparametrization of the APC model, which requires a correction of the degrees of freedom of the L-R test. Monte Carlo simulation studies demonstrate that the L-R test performs well in terms of Type I error and is powerful in detecting differences in the age or period trends. The L-R test further reveals disparities in heart disease mortality among US populations and between the US and Japanese populations.


Assuntos
Cardiopatias , Estudos de Coortes , Humanos , Japão/epidemiologia , Funções Verossimilhança , Grupos Raciais
18.
Stat Med; 40(1): 119-132, 2021 Jan 15.
Article in English | MEDLINE | ID: mdl-33015853

ABSTRACT

In this article, we develop a so-called profile likelihood ratio test (PLRT) based on the estimated error density for the multiple linear regression model. Unlike the existing likelihood ratio test (LRT), our proposed PLRT does not require any specification of the error distribution. The asymptotic properties are developed and the Wilks phenomenon is studied. Simulation studies are conducted to examine the performance of the PLRT. It is observed that our proposed PLRT generally outperforms the existing LRT, the empirical likelihood ratio test, and the weighted profile likelihood ratio test in the sense that (i) its type I error rates are closer to the prespecified nominal level; (ii) it generally has higher power; (iii) it performs satisfactorily when moments of the error do not exist (e.g., the Cauchy distribution); and (iv) it has a higher probability of selecting the correct model in the multiple testing problem. A mammalian eye gene expression dataset and a concrete compressive strength dataset are analyzed to illustrate our methodologies.
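
A crude sketch of the mechanics only: estimate the error density nonparametrically from residuals and compare kernel-based log-likelihoods of the full and reduced linear models. The authors' statistic and its asymptotic calibration are more refined than this; the heavy-tailed simulated data and the plain KDE are our assumptions.

```python
# Kernel-based likelihood ratio for dropping one covariate from a
# linear model, without assuming a parametric error distribution.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(12)
n = 300
X = rng.normal(size=(n, 2))
y = 1.0 * X[:, 0] + 0.4 * X[:, 1] + rng.standard_t(3, n)   # heavy-tailed errors

def kde_loglik(Xd):
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # least-squares fit
    resid = y - Xd @ beta
    return np.sum(np.log(gaussian_kde(resid)(resid)))

full = np.column_stack([np.ones(n), X])
reduced = np.column_stack([np.ones(n), X[:, 0]])
lr = 2 * (kde_loglik(full) - kde_loglik(reduced))
print(lr)   # large values indicate the second covariate matters
```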


Assuntos
Funções Verossimilhança , Simulação por Computador , Humanos , Modelos Lineares
19.
J Stat Plan Inference; 213: 16-32, 2021 Jul.
Article in English | MEDLINE | ID: mdl-33281277

ABSTRACT

We introduce an estimation method for covariance matrices in a high-dimensional setting, i.e., when the dimension of the matrix, p, is larger than the sample size n. Specifically, we propose an orthogonally equivariant estimator. The eigenvectors of such an estimator are the same as those of the sample covariance matrix. The eigenvalue estimates are obtained from an adjusted profile likelihood function derived by approximating the integral of the density function of the sample covariance matrix over its eigenvectors, which is a challenging problem in its own right. Exact solutions to the approximate likelihood equations are obtained and employed to construct estimates that involve a tuning parameter. Bootstrap and cross-validation based algorithms are proposed to choose this tuning parameter under various loss functions. Finally, comparisons with two well-known orthogonally equivariant estimators are given, which are based on Monte-Carlo risk estimates for simulated data and misclassification errors in real data analyses.
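
A sketch of an orthogonally equivariant estimator of the kind studied here: keep the sample eigenvectors, replace the eigenvalues. As a simple stand-in for the paper's adjusted-profile-likelihood eigenvalue estimates, this shrinks the eigenvalues linearly toward their mean with a tuning parameter rho (which the paper would choose by bootstrap or cross-validation).

```python
# Orthogonally equivariant covariance estimation in the p > n regime.
import numpy as np

rng = np.random.default_rng(13)
p, n = 50, 30                          # p > n: high-dimensional regime
X = rng.normal(size=(n, p))            # true covariance = identity
S = X.T @ X / n                        # sample covariance (rank <= n)

evals, evecs = np.linalg.eigh(S)
rho = 0.5                              # tuning parameter (illustrative value)
evals_shrunk = (1 - rho) * evals + rho * evals.mean()
Sigma_hat = evecs @ np.diag(evals_shrunk) @ evecs.T

# Shrinkage pulls the spuriously spread sample eigenvalues back together.
print(evals.min(), evals.max(), evals_shrunk.min(), evals_shrunk.max())
```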

20.
Mol Biol Evol; 36(10): 2352-2357, 2019 Oct 1.
Article in English | MEDLINE | ID: mdl-31119293

ABSTRACT

Maximum likelihood estimation in phylogenetics requires a means of handling unknown ancestral states. Classical maximum likelihood averages over these unknown intermediate states, leading to provably consistent estimation of the topology and continuous model parameters. Recently, a computationally efficient approach has been proposed to jointly maximize over these unknown states and phylogenetic parameters. Although this method of joint maximum likelihood estimation can obtain estimates more quickly, its properties as an estimator are not yet clear. In this article, we show that this method of jointly estimating phylogenetic parameters along with ancestral states is not consistent in general. We find a sizeable region of parameter space that generates data on a four-taxon tree for which this joint method estimates the internal branch length to be exactly zero, even in the limit of infinite-length sequences. More generally, we show that this joint method only estimates branch lengths correctly on a set of measure zero. We show empirically that branch length estimates are systematically biased downward, even for short branches.


Subjects
Phylogeny, Likelihood Functions