Results 1 - 20 of 182
1.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38819309

ABSTRACT

Doubly adaptive biased coin design (DBCD), a response-adaptive randomization scheme, aims to skew subject assignment probabilities based on accrued responses for ethical considerations. Recent years have seen substantial advances in understanding DBCD's theoretical properties, assuming correct model specification for the responses. However, concerns have been raised about the impact of model misspecification on its design and analysis. In this paper, we assess the robustness to both design model misspecification and analysis model misspecification under DBCD. On one hand, we confirm that the consistency and asymptotic normality of the allocation proportions can be preserved, even when the responses follow a distribution other than the one imposed by the design model during the implementation of DBCD. On the other hand, we extensively investigate three commonly used linear regression models for estimating and inferring the treatment effect, namely difference-in-means, analysis of covariance (ANCOVA) I, and ANCOVA II. By allowing these regression models to be arbitrarily misspecified, thereby not reflecting the true data generating process, we derive the consistency and asymptotic normality of the treatment effect estimators evaluated from the three models. The asymptotic properties show that the ANCOVA II model, which takes covariate-by-treatment interaction terms into account, yields the most efficient estimator. These results can provide theoretical support for using DBCD in scenarios involving model misspecification, thereby promoting the widespread application of this randomization procedure.
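
As a rough illustration of how DBCD skews assignments toward a target allocation, here is a minimal Python sketch. The Hu-Zhang-type allocation function and the square-root target for binary outcomes are common choices in this literature, but the specific target, gamma, burn-in, and response probabilities below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def hu_zhang_g(x, rho, gamma=2.0):
    """Allocation function: probability of assigning the next subject to
    arm A, given the current proportion x on A and the target rho."""
    x = min(max(x, 1e-6), 1 - 1e-6)
    num = rho * (rho / x) ** gamma
    return num / (num + (1 - rho) * ((1 - rho) / (1 - x)) ** gamma)

def simulate_dbcd(pA=0.7, pB=0.5, n=500, burn_in=20):
    nA = nB = sA = sB = 0
    for i in range(n):
        if i < burn_in:
            to_A = i % 2 == 0                  # start balanced
        else:
            # Estimate response rates and update the target allocation
            # (square-root rule for binary outcomes, one common choice).
            phatA = (sA + 0.5) / (nA + 1)
            phatB = (sB + 0.5) / (nB + 1)
            rho = np.sqrt(phatA) / (np.sqrt(phatA) + np.sqrt(phatB))
            to_A = rng.random() < hu_zhang_g(nA / (nA + nB), rho)
        if to_A:
            nA += 1; sA += rng.random() < pA
        else:
            nB += 1; sB += rng.random() < pB
    return nA / n

print("proportion assigned to the better arm:", simulate_dbcd())
```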


Subjects
Models, Statistical; Random Allocation; Humans; Computer Simulation; Randomized Controlled Trials as Topic/statistics & numerical data; Linear Models; Biometry/methods; Data Interpretation, Statistical; Bias; Analysis of Variance; Research Design
2.
Biometrics ; 80(3)2024 Jul 01.
Article in English | MEDLINE | ID: mdl-39011739

ABSTRACT

Electronic health records and other sources of observational data are increasingly used for drawing causal inferences. Estimating a causal effect from such data, which were not collected for research purposes, is complicated by confounding and by irregularly spaced, covariate-driven observation times, both of which affect the inference. A doubly-weighted estimator accounting for these features has previously been proposed, but it relies on the correct specification of the two nuisance models used to construct the weights. In this work, we propose a novel consistent multiply robust estimator and demonstrate, analytically and in comprehensive simulation studies, that it is more flexible and more efficient than the only alternative estimator proposed for the same setting. We further apply it to data from the Add Health study in the United States to estimate the causal effect of therapy counseling on alcohol consumption in American adolescents.
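
The doubly-weighted idea can be caricatured at a single time point: weight each observed outcome by the inverse of an estimated treatment propensity times an estimated visit probability. The sketch below is a simplified discrete-time stand-in with one confounder and invented coefficients, not the paper's multiply robust estimator or the Add Health data.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 4000
X = rng.normal(size=n)                            # single confounder
A = rng.binomial(1, 1 / (1 + np.exp(-X)))         # treatment
R = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * X - 0.2 * A))))  # visit occurs
Y = 1.0 * A + X + rng.normal(size=n)              # outcome; true effect = 1

ps = LogisticRegression().fit(X.reshape(-1, 1), A).predict_proba(X.reshape(-1, 1))[:, 1]
vis = LogisticRegression().fit(np.c_[X, A], R).predict_proba(np.c_[X, A])[:, 1]

# Doubly weighted: inverse of (treatment propensity) x (visit probability),
# applied to outcomes actually observed at a visit.
w = 1.0 / (np.where(A == 1, ps, 1 - ps) * vis)
m = R == 1
num1, den1 = np.sum(w[m] * A[m] * Y[m]), np.sum(w[m] * A[m])
num0, den0 = np.sum(w[m] * (1 - A[m]) * Y[m]), np.sum(w[m] * (1 - A[m]))
print("weighted effect estimate (truth = 1.0):", round(num1 / den1 - num0 / den0, 3))
```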


Subjects
Computer Simulation; Models, Statistical; Observational Studies as Topic; Humans; Observational Studies as Topic/statistics & numerical data; Adolescent; Causality; United States; Data Interpretation, Statistical; Electronic Health Records/statistics & numerical data; Biometry/methods; Alcohol Drinking
3.
Pharm Stat ; 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38568372

ABSTRACT

In several therapeutic areas, including chronic kidney disease (CKD) and immunoglobulin A nephropathy (IgAN), there is growing interest in how best to analyze estimated glomerular filtration rate (eGFR) data over time in randomized clinical trials, including how to accommodate situations where the rate of change is not anticipated to be linear over time, often due to possible short-term hemodynamic effects of certain classes of interventions. In such situations, regulatory authorities have expressed concern that the common application of single-slope analysis models may induce Type I error inflation. This article offers practical advice and guidance, including SAS code, on the statistical methodology for an eGFR rate-of-change analysis, along with trial design considerations for eGFR endpoints. A two-slope statistical model for eGFR data over time is proposed, allowing an analysis to simultaneously evaluate short-term acute effects and long-term chronic effects. A simulation study was conducted under a range of credible null and alternative hypotheses to evaluate the performance of the two-slope model in comparison with commonly used single-slope random coefficients models, as well as with non-slope-based analyses of change from baseline or time-normalized area under the curve (TAUC). Importantly, and contrary to preexisting concerns, these simulations demonstrate the absence of alpha inflation associated with the use of single- or two-slope random coefficient models, even when such models are misspecified, and highlight that any concern regarding model misspecification relates to power rather than to a lack of Type I error control.
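
The article provides SAS code; the sketch below is a hedged Python analogue of a two-slope analysis, splitting time at an assumed knot into an acute and a chronic segment and fitting a mixed model. The knot location, visit schedule, and effect sizes are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
knot = 0.25   # assumed end of the acute (hemodynamic) phase, in years

rows = []
for i in range(200):
    trt = i % 2
    b = rng.normal(0, 1.0)                        # random chronic slope
    for t in [0, 0.25, 0.5, 1, 1.5, 2, 3]:
        acute, chronic = min(t, knot), max(t - knot, 0.0)
        egfr = (70 - 4 * trt * acute              # acute dip on treatment
                - (3 - 1.5 * trt) * chronic       # slower chronic decline on treatment
                + b * chronic + rng.normal(0, 2))
        rows.append(dict(id=i, trt=trt, t_acute=acute, t_chronic=chronic, egfr=egfr))
df = pd.DataFrame(rows)

# Separate acute and chronic slopes by arm; random intercept and random
# chronic slope per subject. 'trt:t_chronic' is the chronic treatment effect.
m = smf.mixedlm("egfr ~ trt * (t_acute + t_chronic)", df,
                groups=df["id"], re_formula="~t_chronic").fit()
print(m.summary())
```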

4.
Biom J ; 66(1): e2200348, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38240577

ABSTRACT

An inference procedure is proposed to provide consistent estimators of parameters in a modal regression model with a covariate prone to measurement error. A score-based diagnostic tool exploiting parametric bootstrap is developed to assess adequacy of parametric assumptions imposed on the regression model. The proposed estimation method and diagnostic tool are applied to synthetic data generated from simulation experiments and data from real-world applications to demonstrate their implementation and performance. These empirical examples illustrate the importance of adequately accounting for measurement error in the error-prone covariate when inferring the association between a response and covariates based on a modal regression model that is especially suitable for skewed and heavy-tailed response data.
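
For orientation only, the following sketch shows the basic kernel-based modal regression objective (maximize the average kernel density of the residuals); it deliberately omits the paper's measurement-error correction and bootstrap diagnostic, and the data and bandwidth are invented.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)
eps = rng.gamma(shape=2, scale=1, size=n) - 1.0   # skewed errors: mode 0, mean 1
y = 1.0 + 2.0 * x + eps                           # modal line: 1 + 2x

def neg_modal_obj(beta, x, y, h=0.5):
    """Negative kernel-smoothed objective; its minimizer estimates the
    conditional mode rather than the conditional mean."""
    resid = y - beta[0] - beta[1] * x
    return -np.mean(norm.pdf(resid / h)) / h

slope0, intercept0 = np.polyfit(x, y, 1)          # least-squares start
fit = minimize(neg_modal_obj, x0=[intercept0, slope0], args=(x, y))
print("modal fit (intercept, slope):", np.round(fit.x, 2))  # ~ (1.0, 2.0)
print("least-squares intercept:", round(intercept0, 2))     # ~ 2.0 (mean-shifted)
```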


Subjects
Computer Simulation
5.
Entropy (Basel) ; 26(1)2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38275503

ABSTRACT

The paper makes a case that the current discussions on replicability and the abuse of significance testing have overlooked a more general contributor to the untrustworthiness of published empirical evidence, namely the uninformed and recipe-like implementation of statistical modeling and inference. It is argued that this contributes to the untrustworthiness problem in several different ways, including [a] statistical misspecification, [b] unwarranted evidential interpretations of frequentist inference results, and [c] questionable modeling strategies that rely on curve-fitting. What is more, the alternative proposals to replace or modify frequentist testing, including [i] replacing p-values with observed confidence intervals and effect sizes, and [ii] redefining statistical significance, will not address the untrustworthiness-of-evidence problem since they are equally vulnerable to [a]-[c]. The paper calls for distinguishing unduly data-dependent 'statistical results', such as a point estimate, a p-value, or an accept/reject H0 decision, from 'evidence for or against inferential claims'. The post-data severity (SEV) evaluation of accept/reject H0 results converts them into evidence for or against germane inferential claims. These claims can be used to address or elucidate several foundational issues, including (i) statistical vs. substantive significance, (ii) the large-n problem, and (iii) the replicability of evidence. Also, the SEV perspective sheds light on the impertinence of the proposed alternatives [i]-[ii], and oppugns the alleged arbitrariness of framing H0 and H1, which is often exploited to undermine the credibility of frequentist testing.
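
The post-data severity evaluation has a closed form in the simple one-sided z-test; the sketch below (illustrative numbers) shows how SEV separates statistical from substantive significance in a large-n setting.

```python
import numpy as np
from scipy.stats import norm

def severity_reject(xbar, mu1, sigma, n):
    """Severity of the claim 'mu > mu1' after rejecting H0: mu <= mu0 in a
    one-sided z-test with observed mean xbar: the probability of a result
    according less well with the claim, were mu equal to mu1."""
    return norm.cdf(np.sqrt(n) * (xbar - mu1) / sigma)

n, sigma, mu0, xbar = 10_000, 1.0, 10.0, 10.02    # tiny effect, huge n
print("p-value:", round(1 - norm.cdf(np.sqrt(n) * (xbar - mu0) / sigma), 4))
for mu1 in (10.0, 10.01, 10.02, 10.05):
    print(f"SEV(mu > {mu1}):", round(severity_reject(xbar, mu1, sigma, n), 3))
# The rejection is 'significant' (p ~ 0.023), yet SEV is high only for
# discrepancies well below the observed effect -- the large-n problem.
```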

6.
Mol Biol Evol ; 39(12)2022 Dec 05.
Article in English | MEDLINE | ID: mdl-36317198

ABSTRACT

Genomic sequence data provide a rich source of information about the history of species divergence and interspecific hybridization or introgression. Despite recent advances in genomics and statistical methods, it remains challenging to infer gene flow, and as a result, one may have to estimate introgression rates and times under misspecified models. Here we use mathematical analysis and computer simulation to examine estimation bias and issues of interpretation when the model of gene flow is misspecified in analysis of genomic datasets, for example, if introgression is assigned to the wrong lineages. In the case of two species, we establish a correspondence between the migration rate in the continuous migration model and the introgression probability in the introgression model. When gene flow occurs continuously through time but in the analysis is assumed to occur at a fixed time point, common evolutionary parameters such as species divergence times are surprisingly well estimated. However, the time of introgression tends to be estimated towards the recent end of the period of continuous gene flow. When introgression events are assigned incorrectly to the parental or daughter lineages, introgression times tend to collapse onto species divergence times, with introgression probabilities underestimated. Overall, our analyses suggest that the simple introgression model is useful for extracting information concerning between-species gene flow and divergence even when the model may be misspecified. However, for reliable inference of gene flow it is important to include multiple samples per species, in particular from hybridizing species.


Subjects
Gene Flow; Genomics; Computer Simulation
7.
Am J Epidemiol ; 192(11): 1887-1895, 2023 Nov 03.
Article in English | MEDLINE | ID: mdl-37338985

ABSTRACT

The noniterative conditional expectation (NICE) parametric g-formula can be used to estimate the causal effect of sustained treatment strategies. In addition to identifiability conditions, the validity of the NICE parametric g-formula generally requires the correct specification of models for time-varying outcomes, treatments, and confounders at each follow-up time point. An informal approach for evaluating model specification is to compare the observed distributions of the outcome, treatments, and confounders with their parametric g-formula estimates under the "natural course." In the presence of loss to follow-up, however, the observed and natural-course risks can differ even if the identifiability conditions of the parametric g-formula hold and there is no model misspecification. Here, we describe 2 approaches for evaluating model specification when using the parametric g-formula in the presence of censoring: 1) comparing factual risks estimated by the g-formula with nonparametric Kaplan-Meier estimates and 2) comparing natural-course risks estimated by inverse probability weighting with those estimated by the g-formula. We also describe how to correctly compute natural-course estimates of time-varying covariate means when using a computationally efficient g-formula algorithm. We evaluate the proposed methods via simulation and implement them to estimate the effects of dietary interventions in 2 cohort studies.
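
A minimal discrete-time version of the natural-course check looks like this: fit parametric models for the confounder, treatment, and outcome, simulate forward while intervening on nothing, and compare the resulting risk with the observed risk. All coefficients are invented, and censoring, the paper's central complication, is deliberately left out; with loss to follow-up one would compare against Kaplan-Meier or IPW estimates instead, as the abstract proposes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n, K = 5000, 5
expit = lambda z: 1 / (1 + np.exp(-z))

recs, had_event = [], np.zeros(n, dtype=bool)
for i in range(n):                                # observed data
    Lp, Ap = rng.binomial(1, 0.3), 0
    for k in range(K):
        L = Lp if k == 0 else rng.binomial(1, expit(-1 + 2 * Lp - 0.5 * Ap))
        A = rng.binomial(1, expit(-1 + 1.5 * L + 0.5 * Ap))
        Y = rng.binomial(1, expit(-3 + L - 0.7 * A))
        recs.append((k, Lp, Ap, L, A, Y))
        if Y:
            had_event[i] = True
            break
        Lp, Ap = L, A

R = np.array(recs)
later = R[:, 0] > 0
mL = LogisticRegression().fit(R[later][:, [0, 1, 2]], R[later][:, 3])
mA = LogisticRegression().fit(R[:, [0, 3, 2]], R[:, 4])
mY = LogisticRegression().fit(R[:, [0, 3, 4]], R[:, 5])

# Natural course: draw baseline L from its empirical distribution and
# simulate forward from the fitted models, intervening on nothing.
nsim, events = 2000, 0
L0s = R[R[:, 0] == 0][:, 3]
for _ in range(nsim):
    Lp, Ap = int(rng.choice(L0s)), 0
    for k in range(K):
        L = Lp if k == 0 else rng.binomial(1, mL.predict_proba([[k, Lp, Ap]])[0, 1])
        A = rng.binomial(1, mA.predict_proba([[k, L, Ap]])[0, 1])
        if rng.random() < mY.predict_proba([[k, L, A]])[0, 1]:
            events += 1
            break
        Lp, Ap = L, A

print("observed risk:      ", round(had_event.mean(), 3))
print("natural-course risk:", round(events / nsim, 3))
```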


Subjects
Models, Statistical; Humans; Computer Simulation; Probability; Causality; Kaplan-Meier Estimate; Cohort Studies
8.
Biometrics ; 79(4): 3227-3238, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37312587

ABSTRACT

It has been increasingly appealing to evaluate whether expression levels of two genes in a gene coexpression network are still dependent given samples' clinical information, in which the conditional independence test plays an essential role. For enhanced robustness regarding model assumptions, we propose a class of double-robust tests for evaluating the dependence of bivariate outcomes after controlling for known clinical information. Although the proposed test relies on the marginal density functions of bivariate outcomes given clinical information, the test remains valid as long as one of the density functions is correctly specified. Because of the closed-form variance formula, the proposed test procedure enjoys computational efficiency without requiring a resampling procedure or tuning parameters. We acknowledge the need to infer the conditional independence network with high-dimensional gene expressions, and further develop a procedure for multiple testing by controlling the false discovery rate. Numerical results show that our method accurately controls both the type-I error and false discovery rate, and it provides certain levels of robustness regarding model misspecification. We apply the method to a gastric cancer study with gene expression data to understand the associations between genes belonging to the transforming growth factor β signaling pathway given cancer-stage information.


Subjects
Gene Regulatory Networks; Neoplasms; Humans; Neoplasms/genetics
9.
Biometrics ; 79(4): 3050-3065, 2023 Dec.
Article in English | MEDLINE | ID: mdl-36915949

ABSTRACT

The Cox proportional hazards model, commonly used in clinical trials, assumes proportional hazards. However, this assumption fails when, for example, there is a delayed onset of the treatment effect, in which case an acute change in the hazard-ratio function is expected. This paper considers the Cox model with change-points and derives Akaike information criterion (AIC)-type information criteria for detecting those change-points. The change-point model does not admit conventional statistical asymptotics due to its irregularity, so a formal AIC that penalizes twice the number of parameters cannot be derived analytically, and using it would clearly yield overfitted analyses. We therefore construct specific asymptotics using the partial likelihood estimation method in the Cox model with change-points, and propose information criteria based on the original derivation method for AIC. When the partial likelihood is used in estimation, information criteria with penalties much larger than twice the number of parameters can be obtained in explicit form. Numerical experiments confirm that the proposed criteria are clearly superior in terms of the original purpose of AIC, which is to provide an estimate close to the true structure. We also apply the proposed criterion to actual clinical trial data to show that it can easily lead to different results from the formal AIC.
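
The estimation side can be sketched directly: a Cox partial likelihood in which the log hazard ratio is b1 before a candidate change-point tau and b2 after, profiled over a grid of tau values. The data-generating values below are invented, and the paper's contribution, the enlarged AIC-type penalty for this irregular model, is not reproduced here.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
n, tau0, lam = 400, 1.0, 0.2
z = rng.binomial(1, 0.5, n)                       # treatment indicator
# Delayed effect: HR = 1 before tau0, HR = exp(-1) after (treated arm).
t_pre = rng.exponential(1 / lam, n)
t_post = tau0 + rng.exponential(1 / (lam * np.exp(-1.0)), n)
time = np.where(z == 0, rng.exponential(1 / lam, n),
                np.where(t_pre < tau0, t_pre, t_post))
event = np.ones(n, dtype=bool)                    # no censoring, for brevity

def neg_logpl(beta, tau):
    """Negative log partial likelihood with log-HR b1 for t <= tau, b2 after."""
    b1, b2 = beta
    ll = 0.0
    for idx in np.argsort(time):
        if not event[idx]:
            continue
        b = b1 if time[idx] <= tau else b2
        at_risk = time >= time[idx]
        ll += b * z[idx] - np.log(np.sum(np.exp(b * z[at_risk])))
    return -ll

taus = np.linspace(0.5, 2.0, 7)
fits = [minimize(neg_logpl, [0.0, 0.0], args=(tau,)) for tau in taus]
best = int(np.argmin([f.fun for f in fits]))
print("estimated change-point:", taus[best])
print("log HR before / after:", np.round(fits[best].x, 2))   # ~ (0, -1)
```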


Subjects
Proportional Hazards Models; Likelihood Functions
10.
Biometrics ; 79(3): 2023-2035, 2023 Sep.
Article in English | MEDLINE | ID: mdl-35841231

ABSTRACT

We consider analyses of case-control studies assembled from electronic health records (EHRs) where the pool of cases is contaminated by patients who are ineligible for the study. These ineligible patients, referred to as "false cases," should be excluded from the analyses if known. However, the true outcome status of a patient in the case pool is unknown except in a subset whose size may be arbitrarily small compared to the entire pool. To effectively remove the influence of the false cases on estimating odds ratio parameters defined by a working association model of the logistic form, we propose a general strategy to adaptively impute the unknown case status without requiring a correct phenotyping model to help discern the true and false case statuses. Our method estimates the target parameters as the solution to a set of unbiased estimating equations constructed using all available data. It outperforms existing methods by achieving robustness to mismodeling the relationship between the outcome status and covariates of interest, as well as improved estimation efficiency. We further show that our estimator is root-n-consistent and asymptotically normal. Through extensive simulation studies and analysis of real EHR data, we demonstrate that our method has desirable robustness to possible misspecification of both the association and phenotyping models, along with statistical efficiency superior to the competitors.


Subjects
Electronic Health Records; Models, Statistical; Humans; Computer Simulation; Case-Control Studies
11.
Stat Med ; 42(14): 2420-2438, 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37019876

ABSTRACT

Modeling longitudinal trajectories and identifying latent classes of trajectories is of great interest in biomedical research, and software for doing so is readily available for latent class trajectory analysis (LCTA), growth mixture modeling (GMM), and covariance pattern mixture models (CPMM). In biomedical applications, the level of within-person correlation is often non-negligible, which can affect both model choice and interpretation. LCTA does not incorporate this correlation; GMM does so through random effects, while CPMM specifies a model for the within-class marginal covariance matrix. Previous work has investigated the impact of constraining covariance structures, both within and across classes, in GMMs, an approach often used to solve convergence problems. Using simulation, we focused specifically on how misspecification of the temporal correlation structure and strength, with correctly specified variances, affects class enumeration and parameter estimation under LCTA and CPMM. We found that (1) even in the presence of weak correlation, LCTA often does not reproduce the original classes, (2) CPMM performs well in class enumeration when the correct correlation structure is selected, and (3) regardless of misspecification of the correlation structure, both LCTA and CPMM give unbiased estimates of the class trajectory parameters when the within-individual correlation is weak and the number of classes is correctly specified. However, the bias increases markedly when the correlation is moderate for LCTA and when an incorrect correlation structure is used for CPMM. This work highlights the importance of correlation alone in obtaining appropriate model interpretations and provides insight into model choice.


Subjects
Biomedical Research; Software; Humans; Computer Simulation; Latent Class Analysis; Bias
12.
Pharm Res ; 40(9): 2147-2153, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37594592

ABSTRACT

PURPOSE: The one-compartment model with first-order absorption (ka1C) has been extensively used to fit oral data, but when the disposition parameters of the drug are not available, the bias in the parameter estimates remains unclear. In this paper, the effect of potential model misspecification on the estimated area under the curve (AUC) and mean absorption time (MAT) was evaluated for three relatively slowly absorbed drugs/formulations. METHODS: Assuming a three-compartment disposition model with an input (absorption) rate described as a sum of two inverse Gaussian functions (2IG3C) as the true model, the deviations of AUC and MAT estimated with simpler models were analyzed. Simpler models, such as the ka1C model (Bateman function), the one-compartment model with IG input function (IG1C), and the gamma density function, were fitted to the oral data alone and compared with the fits obtained with the 2IG3C model, which also uses the drug's 3C disposition parameters. Data from pharmacokinetic studies of trospium, propiverine and ketamine in healthy volunteers were analyzed using a population approach. RESULTS: The Bateman function (ka1C) allowed robust estimation of the population mean AUC, but the individual estimates were highly biased, and it failed to estimate MAT. The simple alternative models did not improve the situation. CONCLUSIONS: The Bateman function appears useful for estimating the population mean AUC after oral administration. The results reemphasize that insight into the absorption process can only be gained when intravenous reference data are also available.
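
For concreteness, a Bateman-function fit and the AUC and MAT it implies can be sketched in a few lines (simulated concentrations and invented parameters, not the trospium, propiverine, or ketamine data):

```python
import numpy as np
from scipy.optimize import curve_fit

def bateman(t, ka, ke, A):
    """One-compartment, first-order absorption:
    C(t) = A * (exp(-ke*t) - exp(-ka*t)), with A = F*Dose*ka / (V*(ka-ke))."""
    return A * (np.exp(-ke * t) - np.exp(-ka * t))

rng = np.random.default_rng(7)
t = np.array([0.25, 0.5, 1, 2, 3, 4, 6, 8, 12, 24], float)
c = bateman(t, ka=0.8, ke=0.15, A=10.0) * np.exp(rng.normal(0, 0.08, t.size))

(ka, ke, A), _ = curve_fit(bateman, t, c, p0=[1.0, 0.1, 5.0])
auc = A * (1 / ke - 1 / ka)    # integral of the Bateman curve on [0, inf)
mat = 1 / ka                   # mean absorption time under first-order input
print(f"AUC = {auc:.1f}, MAT = {mat:.2f} h")
# The abstract's caution applies: if the true input is more complex (e.g.,
# a sum of inverse Gaussian functions), the population AUC may still be
# recovered reasonably, but MAT estimates can be badly biased.
```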

13.
BMC Med Res Methodol ; 23(1): 274, 2023 Nov 21.
Article in English | MEDLINE | ID: mdl-37990159

ABSTRACT

BACKGROUND: For certain conditions, treatments aim to lessen deterioration over time. A trial outcome could be change in a continuous measure, analysed using a random slopes model with a different slope in each treatment group. A sample size for a trial with a particular schedule of visits (e.g. annually for three years) can be obtained using a two-stage process. First, relevant (co-) variances are estimated from a pre-existing dataset e.g. an observational study conducted in a similar setting. Second, standard formulae are used to calculate sample size. However, the random slopes model assumes linear trajectories with any difference in group means increasing proportionally to follow-up time. The impact of these assumptions failing is unclear. METHODS: We used simulation to assess the impact of a non-linear trajectory and/or non-proportional treatment effect on the proposed trial's power. We used four trajectories, both linear and non-linear, and simulated observational studies to calculate sample sizes. Trials of this size were then simulated, with treatment effects proportional or non-proportional to time. RESULTS: For a proportional treatment effect and a trial visit schedule matching the observational study, powers are close to nominal even for non-linear trajectories. However, if the schedule does not match the observational study, powers can be above or below nominal levels, with the extent of this depending on parameters such as the residual error variance. For a non-proportional treatment effect, using a random slopes model can lead to powers far from nominal levels. CONCLUSIONS: If trajectories are suspected to be non-linear, observational data used to inform power calculations should have the same visit schedule as the proposed trial where possible. Additionally, if the treatment effect is expected to be non-proportional, the random slopes model should not be used. A model allowing trajectories to vary freely over time could be used instead, either as a second line analysis method (bearing in mind that power will be lost) or when powering the trial.
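
The two-stage calculation described here has a simple closed form for a linear random-slopes model, shown below; the variance components and effect size are placeholders standing in for estimates from an observational study. Note that the visit schedule enters only through the number and spread of visit times.

```python
import numpy as np
from scipy.stats import norm

def n_per_arm(delta, sigma_b2, sigma_e2, visits, alpha=0.05, power=0.9):
    """Sample size per arm to detect a between-group difference `delta` in
    mean slope under a linear random-slopes model (two-stage approach).
    sigma_b2: random-slope variance; sigma_e2: residual error variance."""
    t = np.asarray(visits, float)
    sxx = np.sum((t - t.mean()) ** 2)
    var_slope = sigma_b2 + sigma_e2 / sxx    # variance of a subject-level slope
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(2 * var_slope * z ** 2 / delta ** 2))

# Annual visits for three years vs. a denser schedule (illustrative values):
print(n_per_arm(delta=0.5, sigma_b2=1.0, sigma_e2=4.0, visits=[0, 1, 2, 3]))
print(n_per_arm(delta=0.5, sigma_b2=1.0, sigma_e2=4.0,
                visits=[0, 0.5, 1, 1.5, 2, 2.5, 3]))
```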


Subjects
Sample Size; Humans; Computer Simulation
14.
Philos Trans A Math Phys Eng Sci ; 381(2247): 20220149, 2023 May 15.
Article in English | MEDLINE | ID: mdl-36970819

ABSTRACT

Bayesian cluster analysis offers substantial benefits over algorithmic approaches by providing not only point estimates but also uncertainty in the clustering structure and patterns within each cluster. An overview of Bayesian cluster analysis is provided, including both model-based and loss-based approaches, along with a discussion on the importance of the kernel or loss selected and prior specification. Advantages are demonstrated in an application to cluster cells and discover latent cell types in single-cell RNA sequencing data to study embryonic cellular development. Lastly, we focus on the ongoing debate between finite and infinite mixtures in a model-based approach and robustness to model misspecification. While much of the debate and asymptotic theory focuses on the marginal posterior of the number of clusters, we empirically show that quite a different behaviour is obtained when estimating the full clustering structure. This article is part of the theme issue 'Bayesian inference: challenges, perspectives, and prospects'.
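
A minimal model-based illustration of the finite-vs-infinite issue is a truncated Dirichlet-process mixture, in which superfluous components receive near-zero weight. This variational sklearn sketch on invented 2-D data is only a stand-in for full Bayesian cluster analysis, which would examine the posterior over partitions rather than a single fitted solution.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(8)
# Three latent "cell types" in a 2-D summary space (an invented stand-in
# for dimension-reduced single-cell RNA-seq data).
X = np.vstack([rng.normal(m, 0.5, size=(150, 2))
               for m in ([0, 0], [3, 0], [0, 3])])

bgm = BayesianGaussianMixture(
    n_components=10,                               # truncation level
    weight_concentration_prior_type="dirichlet_process",
    covariance_type="full", random_state=0).fit(X)

print("component weights:", np.round(bgm.weights_, 3))
print("effective clusters:", int(np.sum(bgm.weights_ > 0.01)))
```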


Subjects
Bayes Theorem; Cluster Analysis
15.
Health Econ ; 32(2): 395-412, 2023 Feb.
Article in English | MEDLINE | ID: mdl-36314282

ABSTRACT

This paper re-examines a well-established hypothesis postulating that life expectancy augments incentives for human capital accumulation, leading to global income differences. A major distinguishing feature of the current study is to estimate heterogeneous panel data models under a common factor framework, which explicitly accounts for parameter heterogeneity, unobserved common factors (UCFs), and variables' non-stationarity. In sharp contrast to most previous studies, I find that the impact of health improvements on human capital accumulation turns out to be imprecisely estimated at conventionally accepted levels of statistical significance. I demonstrate that conventional estimates of the educational returns to rising longevity are derived from estimating misspecified models at least partially due to parameter heterogeneity and the presence of UCFs.


Subjects
Income; Life Expectancy; Humans; Longevity; Educational Status
16.
Bull Math Biol ; 86(1): 2, 2023 Nov 24.
Article in English | MEDLINE | ID: mdl-37999811

ABSTRACT

When using mathematical models to make quantitative predictions for clinical or industrial use, it is important that predictions come with a reliable estimate of their accuracy (uncertainty quantification). Because models of complex biological systems are always large simplifications, model discrepancy arises: models fail to perfectly recapitulate the true data-generating process. This presents a particular challenge for making accurate predictions, and especially for accurately quantifying uncertainty in these predictions. Experimentalists and modellers must choose which experimental procedures (protocols) are used to produce data used to train models. We propose to characterise uncertainty owing to model discrepancy with an ensemble of parameter sets, each of which results from training to data from a different protocol. The variability in predictions from this ensemble provides an empirical estimate of predictive uncertainty owing to model discrepancy, even for unseen protocols. We use the example of electrophysiology experiments that investigate the properties of hERG potassium channels. Here, 'information-rich' protocols allow mathematical models to be trained using numerous short experiments performed on the same cell. In this case, we simulate data with one model and fit it with a different (discrepant) one. For any individual experimental protocol, parameter estimates vary little under repeated samples from the assumed additive independent Gaussian noise model. Yet parameter sets arising from the same model applied to different experiments conflict, highlighting model discrepancy. Our methods will help select more suitable ion channel models for future studies, and will be widely applicable to a range of biological modelling problems.
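
The protocol-ensemble idea can be mimicked with a toy discrepant model: generate data from a two-exponential 'truth', fit a one-exponential model separately to each of several observation protocols, and read predictive uncertainty off the spread of the ensemble. Everything below (models, protocols, noise level) is an invented illustration, not the hERG setup.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(9)

def truth(t):
    """'True' generator: two-exponential relaxation."""
    return 0.6 * np.exp(-t / 1.0) + 0.4 * np.exp(-t / 5.0)

def model(t, a, tau):
    """Deliberately discrepant model: single exponential."""
    return a * np.exp(-t / tau)

# Three 'protocols' = three observation windows, each giving a training set.
protocols = [np.linspace(0.1, 3, 40), np.linspace(0.1, 10, 40),
             np.linspace(2, 20, 40)]
ensemble = []
for t in protocols:
    y = truth(t) + rng.normal(0, 0.01, t.size)
    popt, _ = curve_fit(model, t, y, p0=[1.0, 2.0])
    ensemble.append(popt)

t_new = np.linspace(0, 15, 6)                      # an unseen protocol
preds = np.array([model(t_new, *p) for p in ensemble])
print("parameter sets (a, tau):", np.round(ensemble, 3))
print("ensemble spread at unseen times:", np.round(preds.std(axis=0), 3))
# Within one protocol, refitting under fresh noise barely moves the fit;
# across protocols the fits conflict -- the spread flags model discrepancy.
```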


Subjects
Mathematical Concepts; Models, Biological; Uncertainty; Models, Theoretical; Ion Channels
17.
Multivariate Behav Res ; 58(1): 195-219, 2023.
Article in English | MEDLINE | ID: mdl-36787523

ABSTRACT

Factor analysis is often used to model scales created to measure latent constructs, and internal structure validity evidence is commonly assessed with indices like RMSEA and CFI. These indices are essentially effect size measures, and definitive benchmarks regarding which values connote reasonable fit have been elusive. Simulations from the 1990s suggesting possible benchmark values are among the most highly cited methodological papers in any discipline. However, simulations have suggested that fixed benchmarks do not generalize well: fit indices are systematically impacted by characteristics like the number of items and the magnitude of the loadings, so fixed benchmarks can confound misfit with model characteristics. Alternative frameworks for creating customized, model-specific benchmarks have recently been proposed to circumvent these issues, but they have not been systematically evaluated. Motivated by two empirical applications where different methods yield inconsistent conclusions, two simulation studies are performed to assess the ability of three different approaches to correctly classify models that are correct or misspecified across different conditions. Results show that dynamic fit indices and equivalence testing both improved upon the traditional Hu & Bentler benchmarks, and dynamic fit indices appeared to be least confounded with model characteristics in the conditions studied.
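
The confounding of fixed benchmarks with model size is visible in the standard RMSEA point estimate itself, RMSEA = sqrt(max(chi2 - df, 0) / (df (N - 1))): the same chi-square excess yields very different index values as df grows. A quick check with made-up numbers:

```python
import numpy as np

def rmsea(chi2, df, n):
    """Point estimate of RMSEA from a model chi-square statistic."""
    return np.sqrt(max(chi2 - df, 0) / (df * (n - 1)))

# Identical misfit (chi-square exceeds df by 60) at three model sizes:
for df in (10, 50, 200):
    print(f"df={df:3d}: RMSEA = {rmsea(chi2=df + 60, df=df, n=500):.3f}")
# 0.110, 0.049, 0.025 -- one fixed cutoff cannot mean the same thing here.
```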


Subjects
Computer Simulation; Factor Analysis
18.
Multivariate Behav Res ; 58(3): 580-597, 2023.
Article in English | MEDLINE | ID: mdl-35507677

ABSTRACT

Diagnostic classification models (DCMs) are psychometric models for evaluating a student's mastery of the essential skills in a content domain based upon their responses to a set of test items. Currently, diagnostic model and/or Q-matrix misspecification is a known problem with limited avenues for remediation. To address this problem, this paper defines a one-sided score statistic that is a computationally efficient method for detecting under-specification at the item level of both the Q-matrix and the model parameters of the particular DCM chosen in an analysis. This method is analogous to the modification indices widely used in structural equation modeling. The results of a simulation study show the Type I error rate of modification indices for DCMs are acceptably close to the nominal significance level when the appropriate mixture χ2 reference distribution is used. The simulation results indicate that modification indices are very powerful in the detection of an under-specified Q-matrix and have ample power to detect the omission of model parameters in large samples or when the items are highly discriminating. An application of modification indices for DCMs to an analysis of response data from a large-scale administration of a diagnostic test demonstrates how they can be useful in diagnostic model refinement.


Subjects
Computer Simulation; Humans; Psychometrics/methods; Latent Class Analysis
19.
Behav Res Methods ; 2023 Nov 28.
Article in English | MEDLINE | ID: mdl-38017204

ABSTRACT

With mixed-effects regression models becoming a mainstream tool for every psycholinguist, there is an increasing need to understand them more fully. In the last decade, most work on mixed-effects models in psycholinguistics has focused on properly specifying the random-effects structure to minimize error in evaluating the statistical significance of fixed-effects predictors. The present study examines a potential misspecification of random effects that has not been discussed in psycholinguistics: violation of the single-subject-population assumption, in the context of logistic regression. Estimated random-effects distributions in real studies often appear to be bi- or multimodal. However, there is no established way to assess whether a random-effects distribution corresponds to more than one underlying population, especially in the more common case of a multivariate distribution of random effects. We show that violations of the single-subject-population assumption can usually be detected by assessing the (multivariate) normality of the inferred random-effects structure, unless the data show quasi-separability, i.e., many subjects or items show near-categorical behavior. In the absence of quasi-separability, several clustering methods are successful in determining which group each participant belongs to. The BIC difference between a two-cluster and a one-cluster solution can be used to determine that subjects (or items) do not come from a single population. This then allows the researcher to define and justify a new post hoc variable specifying the groups to which participants or items belong, which can be incorporated into regression analysis.
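
The BIC comparison the abstract describes can be run directly on inferred random effects with a two-component versus one-component Gaussian mixture. The random effects below are simulated from two subject populations purely for illustration; in practice they would be extracted from a fitted mixed-effects model.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(10)
# Inferred (intercept, slope) random effects for 200 subjects, drawn here
# from two latent populations instead of one.
ranef = np.vstack([
    rng.multivariate_normal([0.0, 0.0], 0.2 * np.eye(2), size=120),
    rng.multivariate_normal([2.0, -1.0], 0.2 * np.eye(2), size=80),
])

bic1 = GaussianMixture(n_components=1, random_state=0).fit(ranef).bic(ranef)
bic2 = GaussianMixture(n_components=2, random_state=0).fit(ranef).bic(ranef)
print("BIC(1 cluster) - BIC(2 clusters):", round(bic1 - bic2, 1))
# A large positive difference favours two clusters -- evidence against the
# single-subject-population assumption; the cluster labels can then define
# a post hoc grouping variable for the regression analysis.
```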

20.
Behav Res Methods ; 55(6): 3281-3296, 2023 Sep.
Article in English | MEDLINE | ID: mdl-36097102

ABSTRACT

Factor mixture modeling (FMM) has been increasingly used in behavioral and social sciences to examine unobserved population heterogeneity. Covariates (e.g., gender, race) are often included in FMM to help understand the formation and characterization of latent subgroups or classes. This Monte Carlo simulation study evaluated the performance of one-step and three-step approaches to covariate inclusion across three scenarios, i.e., correct specification (study 1), model misspecification (study 2), and model overfitting (study 3), in terms of direct covariate effects on factors. Results showed that the performance of these two approaches was comparable when class separation was large and the specification of covariate effect was correct. However, one-step FMM had better class enumeration than the three-step approach when class separation was poor, and was more robust to the misspecification or overfitting concerning direct covariate effects. Recommendations regarding covariate inclusion approaches are provided herein depending on class separation and sample size. Large sample size (1000 or more) and the use of sample size-adjusted BIC (saBIC) in class enumeration are recommended.


Subjects
Computer Simulation; Humans; Sample Size; Monte Carlo Method