Results 1 - 13 of 13
1.
J Intell ; 12(3), 2024 Feb 25.
Article in English | MEDLINE | ID: mdl-38535160

ABSTRACT

Language proficiency assessments are pivotal in educational and professional decision-making. With the integration of AI-driven technologies, these assessments can more frequently use item types, such as dictation tasks, producing response features with a mixture of discrete and continuous distributions. This study evaluates novel measurement models tailored to these unique response features. Specifically, we evaluated the performance of the zero-and-one-inflated extensions of the Beta, Simplex, and Samejima's Continuous item response models and incorporated collateral information into the estimation using latent regression. Our findings highlight that while all models provided highly correlated results regarding item and person parameters, the Beta item response model showcased superior out-of-sample predictive accuracy. However, a significant challenge was the absence of established benchmarks for evaluating model and item fit for these novel item response models. There is a need for further research to establish benchmarks for evaluating the fit of these innovative models to ensure their reliability and validity in real-world applications.
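
The "mixture of discrete and continuous distributions" mentioned in this abstract can be illustrated with a zero-and-one-inflated Beta density. The following is a minimal sketch, not the article's model: the mean/precision parameterization and the names `p0`, `p1`, `mu`, and `phi` are assumptions made here for illustration. In an item response setting, `mu` would additionally be tied to a latent proficiency, which is not shown.

```python
import math


def beta_pdf(y, mu, phi):
    """Beta density on (0, 1) in mean/precision form.

    Assumed parameterization: shape parameters are mu*phi and (1-mu)*phi.
    """
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def zoib_density(y, p0, p1, mu, phi):
    """Zero-and-one-inflated Beta: point masses at 0 and 1, Beta in between."""
    if y == 0.0:
        return p0  # probability of a response of exactly 0
    if y == 1.0:
        return p1  # probability of a response of exactly 1
    return (1.0 - p0 - p1) * beta_pdf(y, mu, phi)
```

The two point masses and the integral of the continuous part sum to one, so the function defines a proper mixed distribution on the closed interval [0, 1].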

2.
Eval Rev ; 43(6): 335-369, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31578089

ABSTRACT

BACKGROUND: Analysis of covariance (ANCOVA) is commonly used to adjust for potential confounders in observational studies of intervention effects. Measurement error in the covariates used in ANCOVA models can lead to inconsistent estimators of intervention effects. While errors-in-variables (EIV) regression can restore consistency, it requires surrogacy assumptions for the error-prone covariates that may be violated in practical settings.

OBJECTIVES: The objectives of this article are (1) to derive asymptotic results for ANCOVA using EIV regression when measurement errors may not satisfy the standard surrogacy assumptions and (2) to demonstrate how these results can be used to explore the potential bias from ANCOVA models that either ignore measurement error by using ordinary least squares (OLS) regression or use EIV regression when its required assumptions do not hold.

RESULTS: The article derives asymptotic results for ANCOVA with error-prone covariates that cover a variety of cases relevant to applications. It then uses the results in a case study of choosing among ANCOVA model specifications for estimating teacher effects using longitudinal data from a large urban school system. It finds evidence that estimates of teacher effects computed using EIV regression may have smaller bias than estimates computed using OLS regression when the data available for adjusting for students' prior achievement are limited.


Subject(s)
Bias , Models, Statistical , Observational Studies as Topic/statistics & numerical data , Analysis of Variance
3.
Psychometrika ; 2017 Mar 29.
Article in English | MEDLINE | ID: mdl-28397085

ABSTRACT

This article considers the application of the simulation-extrapolation (SIMEX) method for measurement error correction when the error variance is a function of the latent variable being measured. Heteroskedasticity of this form arises in educational and psychological applications with ability estimates from item response theory models. We conclude that there is no simple solution for applying SIMEX that generally will yield consistent estimators in this setting. However, we demonstrate that several approximate SIMEX methods can provide useful estimators, leading to recommendations for analysts dealing with this form of error in settings where SIMEX may be the most practical option.
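
For orientation, the basic SIMEX idea can be sketched for the simpler case of constant (homoskedastic) error variance. This is not the heteroskedastic setting the article studies, and it is not one of the approximate methods the article recommends. The function names, the simulated data, and the choice of a quadratic extrapolant through three noise levels are all assumptions made for illustration.

```python
import random


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_error_prone_data(n, slope, error_var, seed=0):
    """Hypothetical data: y depends on x, but only w = x + error is observed."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [slope * xi + rng.gauss(0.0, 1.0) for xi in x]
    w = [xi + rng.gauss(0.0, error_var ** 0.5) for xi in x]
    return w, y


def simex_slope(w, y, error_var, n_reps=40, seed=1):
    """SIMEX with a quadratic extrapolant through lambda = 0, 1, 2."""
    rng = random.Random(seed)

    def mean_slope(lam):
        if lam == 0:
            return ols_slope(w, y)  # naive estimate, no extra noise
        extra_sd = (lam * error_var) ** 0.5
        slopes = []
        for _ in range(n_reps):
            # Simulation step: inflate the error variance to (1 + lam) * error_var.
            w_noisy = [wi + rng.gauss(0.0, extra_sd) for wi in w]
            slopes.append(ols_slope(w_noisy, y))
        return sum(slopes) / n_reps

    f0, f1, f2 = mean_slope(0), mean_slope(1), mean_slope(2)
    # Extrapolation step: the quadratic through (0, f0), (1, f1), (2, f2)
    # evaluated at lambda = -1 equals 3*f0 - 3*f1 + f2.
    return 3.0 * f0 - 3.0 * f1 + f2
```

Even in this simple case the quadratic extrapolant only reduces the attenuation bias rather than removing it, which is consistent with the abstract's emphasis on approximate rather than exactly consistent estimators.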

4.
Educ Psychol Meas ; 77(6): 917-944, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29795939

ABSTRACT

Student Growth Percentiles (SGPs) increasingly are being used in the United States for inferences about student achievement growth and educator effectiveness. Emerging research has indicated that SGPs estimated from observed test scores have large measurement errors. As such, little is known about "true" SGPs, which are defined in terms of nonlinear functions of latent achievement attributes for individual students and their distributions across students. We develop a novel framework using latent regression multidimensional item response theory models to study distributional properties of true SGPs. We apply these methods to several cohorts of longitudinal item response data from more than 330,000 students in a large urban metropolitan area to provide new empirical information about true SGPs. We find that true SGPs are correlated 0.3 to 0.5 across mathematics and English language arts, and that they have nontrivial relationships with individual student characteristics, particularly student race/ethnicity and absenteeism. We evaluate the potential of using these relationships to improve the accuracy of SGPs estimated from observed test scores, finding that accuracy gains even under optimal circumstances are modest. We also consider the properties of SGPs averaged to the teacher level, widely used for teacher evaluations. We find that average true SGPs for individual teachers vary substantially as a function of the characteristics of the students they teach. We discuss implications of our findings for the estimation and interpretation of SGPs at both the individual and aggregate levels.

5.
Educ Psychol Meas ; 75(2): 311-337, 2015 Apr.
Article in English | MEDLINE | ID: mdl-29795823

ABSTRACT

Observations and ratings of classroom teaching and interactions collected over time are susceptible to trends in both the quality of instruction and rater behavior. These trends have potential implications for inferences about teaching and for study design. We use scores on the Classroom Assessment Scoring System-Secondary (CLASS-S) protocol from 458 middle school teachers over a 2-year period to study changes over time in (a) the average quality of teaching for the population of teachers, (b) the average severity of the population of raters, and (c) the severity of individual raters. To obtain these estimates and assess them in the context of other factors that contribute to the variability in scores, we develop an augmented G study model that is broadly applicable for modeling sources of variability in classroom observation ratings data collected over time. In our data, we found that trends in teaching quality were small. Rater drift was very large during raters' initial days of observation and persisted throughout nearly 2 years of scoring. Raters did not converge to a common level of severity; using our model we estimate that variability among raters actually increases over the course of the study. Variance decompositions based on the model find that trends are a modest source of variance relative to overall rater effects, rater errors on specific lessons, and residual error. The discussion provides possible explanations for trends and rater divergence as well as implications for designs collecting ratings over time.

6.
Biometrika ; 100(3): 671-680, 2013.
Article in English | MEDLINE | ID: mdl-24795484

ABSTRACT

Inverse probability-weighted estimators are widely used in applications where data are missing due to nonresponse or censoring and in the estimation of causal effects from observational studies. Current estimators rely on ignorability assumptions for response indicators or treatment assignment and outcomes being conditional on observed covariates which are assumed to be measured without error. However, measurement error is common for the variables collected in many applications. For example, in studies of educational interventions, student achievement as measured by standardized tests is almost always used as the key covariate for removing hidden biases, but standardized test scores may have substantial measurement errors. We provide several expressions for a weighting function that can yield a consistent estimator for population means using incomplete data and covariates measured with error. We propose a method to estimate the weighting function from data. The results of a simulation study show that the estimator is consistent and has no bias and small variance.
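
As background, a standard inverse probability-weighted mean can be sketched as follows. This sketch assumes the response probabilities are known and that the covariate is measured without error, so it shows only the baseline estimator that the article extends, not the article's weighting function. The simulated data and all function names are hypothetical.

```python
import math
import random


def simulate_nonresponse(n=20000, seed=1):
    """Hypothetical data in which response is more likely when x is large."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 1.0 + x + rng.gauss(0.0, 1.0)      # population mean of y is 1.0
        p = 1.0 / (1.0 + math.exp(-x))         # known response probability
        observed = rng.random() < p
        rows.append((y, p, observed))
    return rows


def complete_case_mean(rows):
    """Unweighted mean of the observed outcomes (biased under this design)."""
    ys = [y for y, _, observed in rows if observed]
    return sum(ys) / len(ys)


def ipw_mean(rows):
    """Weighted mean with weights 1/p, normalized by the sum of the weights."""
    numerator = sum(y / p for y, p, observed in rows if observed)
    denominator = sum(1.0 / p for _, p, observed in rows if observed)
    return numerator / denominator
```

In this simulated setup the complete-case mean overstates the population mean because respondents tend to have larger `x`, while the weighted mean should land close to 1.0.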

7.
Rand Health Q ; 2(1): 18, 2012.
Article in English | MEDLINE | ID: mdl-28083240

ABSTRACT

The Modified Kalman Filter approach for pooling information across time and across outcomes is shown to improve accuracy in national estimates of health outcomes, including cancer, diabetes, and hypertension, especially in small racial/ethnic subgroups. The developed SAS macro models true health states in each subgroup assuming a linear time evolution and an autoregressive deviation around that trend. The macro provides multiple options for users.

8.
Stat Med ; 30(5): 584-94, 2011 Feb 28.
Article in English | MEDLINE | ID: mdl-21290400

ABSTRACT

Repeated cross-sectional samples are common in national surveys of health like the National Health Interview Survey (NHIS). Because population health outcomes generally evolve slowly, pooling data across years can improve the precision of current-year annual estimates of disease prevalence and other health outcomes. Pooling over time is particularly valuable in health disparities research, where outcomes for small groups are often of interest and pooling data across groups would bias disparity estimates. State-space modeling and Kalman filtering are appealing choices for smoothing data across time. However, filtering can be problematic when few time points are available, as is common with annual cross-sectional data. Problems arise because filtering relies on estimated variance components, which can be biased and imprecise when estimated with small samples, especially when estimated in tandem with linear trends. We conduct a simulation study showing that even when trends and variance components are estimated poorly, smoothing with these estimates can improve the mean squared error (MSE) of estimated health states for multiple racial/ethnic groups when the variance components are estimated with the pooled sample. We consider frequentist estimators with no trends, one common trend across groups, and separate trends for every group, as well as shrinkage estimators of trends through a Bayesian model. We show that the Bayesian model offers the greatest improvement in MSE, and that Bayesian Information Criterion (BIC)-based model averaging of the frequentist estimators with different trend assumptions performs nearly as well. We present empirical examples using the NHIS data.
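
The filtering step discussed in this abstract can be illustrated with a Kalman filter for a local level model. This is a minimal sketch under simplifying assumptions: a single group, no trend, and variance components treated as known. The article's models include trends, multiple groups, and estimated variance components, none of which are shown. All names are hypothetical.

```python
def local_level_filter(y, obs_var, state_var, init_mean, init_var):
    """Kalman filter for y_t = mu_t + e_t, mu_t = mu_{t-1} + w_t.

    y         -- annual survey estimates
    obs_var   -- sampling variance of each annual estimate
    state_var -- variance of the year-to-year change in the true state
    init_mean, init_var -- prior for the state before the first year
    """
    means, variances = [], []
    m, v = init_mean, init_var
    for y_t, r_t in zip(y, obs_var):
        v_pred = v + state_var            # predict: state uncertainty grows
        gain = v_pred / (v_pred + r_t)    # weight given to the new estimate
        m = m + gain * (y_t - m)          # update the filtered mean
        v = (1.0 - gain) * v_pred         # update the filtered variance
        means.append(m)
        variances.append(v)
    return means, variances
```

Each filtered variance is smaller than the corresponding sampling variance, which is the precision gain from pooling across years that the abstract describes.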


Subject(s)
Cross-Sectional Studies/statistics & numerical data , Health Surveys/statistics & numerical data , Models, Statistical , Algorithms , Bayes Theorem , Body Mass Index , Computer Simulation , Ethnicity/statistics & numerical data , Health Status Disparities , Humans , Likelihood Functions , Prevalence , Racial Groups/statistics & numerical data , Selection Bias , Stroke/epidemiology , Time Factors , United States
9.
J Subst Abuse Treat ; 33(1): 107-10, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17376636

ABSTRACT

Because it is well known that the power to measure differences between two groups is typically best with an even distribution of any given fixed sample size, great emphasis is often placed on exactly equal treatment and control allocations in evaluations of substance abuse interventions. Independent randomization of individuals (e.g., a "coin flip") when study participants are enrolled in an ongoing fashion by multiple recruiters and assigned to treatment conditions does not guarantee exact balance, often prompting the use of schemes that are complex and burdensome to implement. Our results suggest that departures from simple randomization are only warranted for single-site trials involving fewer than 77 total subjects or for multisite trials with substantially fewer than 77 subjects per site. With such a rule, simple randomization will produce samples that are at least 95% as efficient as a fully balanced sample of equal size at least 95% of the time.
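
The "95% as efficient at least 95% of the time" rule can be checked with an exact binomial calculation. The sketch below assumes that efficiency is measured by the variance of a difference in two group means with equal outcome variances, which gives a relative efficiency of 4·n1·n2/N² for an allocation of n1 and n2 subjects. Whether the article uses exactly this definition is an assumption, and the function names are hypothetical.

```python
from math import comb


def relative_efficiency(n_treat, n_total):
    """Efficiency of an (n_treat, n_total - n_treat) split versus an even split."""
    n_ctrl = n_total - n_treat
    return 4.0 * n_treat * n_ctrl / n_total ** 2


def prob_efficiency_at_least(n_total, threshold=0.95):
    """Probability that fair coin-flip allocation reaches the given efficiency."""
    favorable = 0
    for n_treat in range(n_total + 1):
        if relative_efficiency(n_treat, n_total) >= threshold:
            favorable += comb(n_total, n_treat)  # number of such assignments
    return favorable / 2 ** n_total
```

Calling `prob_efficiency_at_least(77)` gives the probability for a single-site trial of 77 subjects, and smaller values of `n_total` show how quickly that probability falls off.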


Subject(s)
Randomized Controlled Trials as Topic/statistics & numerical data , Substance-Related Disorders/rehabilitation , Humans , Multicenter Studies as Topic , Reproducibility of Results , Sample Size
10.
J Educ Behav Stat ; 29(1): 67-101, 2004.
Article in English | MEDLINE | ID: mdl-19756248

ABSTRACT

The use of complex value-added models that attempt to isolate the contributions of teachers or schools to student development is increasing. Several variations on these models are being applied in the research literature, and policy makers have expressed interest in using these models for evaluating teachers and schools. In this article, we present a general multivariate, longitudinal mixed-model that incorporates the complex grouping structures inherent to longitudinal student data linked to teachers. We summarize the principal existing modeling approaches, show how these approaches are special cases of the proposed model, and discuss possible extensions to model more complex data structures. We present simulation and analytical results that clarify the interplay between estimated teacher effects and repeated outcomes on students over time. We also explore the potential impact of model misspecifications, including missing student covariates and assumptions about the accumulation of teacher effects over time, on key inferences made from the models. We conclude that mixed models that account for student correlation over time are reasonably robust to such misspecifications when all the schools in the sample serve similar student populations. However, student characteristics are likely to confound estimated teacher effects when schools serve distinctly different populations.

11.
J Educ Behav Stat ; 27(3): 255-270, 2002 Jan 01.
Article in English | MEDLINE | ID: mdl-19830272

ABSTRACT

Accountability for public education often requires estimating and ranking the quality of individual teachers or schools on the basis of student test scores. Although the properties of estimators of teacher-or-school effects are well established, less is known about the properties of rank estimators. We investigate performance of rank (percentile) estimators in a basic, two-stage hierarchical model capturing the essential features of the more complicated models that are commonly used to estimate effects. We use simulation to study mean squared error (MSE) performance of percentile estimates and to find the operating characteristics of decision rules based on estimated percentiles. Each depends on the signal-to-noise ratio (the ratio of the teacher or school variance component to the variance of the direct, teacher- or school-specific estimator) and only moderately on the number of teachers or schools. Results show that even when using optimal procedures, MSE is large for the commonly encountered variance ratios, with an unrealistically large ratio required for ideal performance. Percentile-specific MSE results reveal interesting interactions between variance ratios and estimators, especially for extreme percentiles, which are of considerable practical import. These interactions are apparent in the performance of decision rules for the identification of extreme percentiles, underscoring the statistical and practical complexity of the multiple-goal inferences faced in value-added modeling. Our results highlight the need to assess whether even optimal percentile estimators perform sufficiently well to be used in evaluating teachers or schools.
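
The dependence of percentile accuracy on the variance ratio can be explored with a small simulation of the two-stage model. This sketch ranks the direct estimates, which is the simplest choice and not the optimal estimators the abstract refers to. It fixes the teacher or school variance component at 1, and the function names and defaults are hypothetical.

```python
import random


def percentiles(values):
    """Percentile of each unit, defined here as rank / (number of units + 1)."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    pct = [0.0] * len(values)
    for rank, j in enumerate(order, start=1):
        pct[j] = rank / (len(values) + 1)
    return pct


def percentile_mse(n_units, variance_ratio, n_reps=200, seed=0):
    """Average squared error of estimated percentiles across simulated data sets.

    variance_ratio -- variance of true effects divided by the sampling
                      variance of the direct estimates
    """
    rng = random.Random(seed)
    noise_sd = (1.0 / variance_ratio) ** 0.5   # true-effect variance fixed at 1
    total = 0.0
    for _ in range(n_reps):
        theta = [rng.gauss(0.0, 1.0) for _ in range(n_units)]    # true effects
        direct = [t + rng.gauss(0.0, noise_sd) for t in theta]   # direct estimates
        true_pct, est_pct = percentiles(theta), percentiles(direct)
        total += sum((a - b) ** 2 for a, b in zip(true_pct, est_pct)) / n_units
    return total / n_reps
```

Comparing, for example, `percentile_mse(50, 0.5)` with `percentile_mse(50, 10.0)` shows how much larger the error is at low variance ratios.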

12.
Genet Epidemiol ; 20(1): 17-33, 2001 Jan.
Article in English | MEDLINE | ID: mdl-11119294

ABSTRACT

Genetic epidemiological methodologies, such as linkage analysis, often require accurate estimates of allele frequencies. When studies involve multiple sub-populations with different evolutionary histories, accurate estimates can be difficult to obtain because the number of subjects per sub-population tends to be limited. Given allele counts for a collection of loci and sub-populations, we propose a Bayesian hierarchical model that extends existing empirical Bayesian approaches by allowing for explicit inclusion of prior information about both allele frequencies and inter-population divergence. We describe how such information can be derived from published data and then incorporated into the model via prior distributions for model parameters. By analysis of simulated data, we highlight how the hierarchical model, as implemented in the publicly available program AllDist, combines prior information with the observed data to refine allele frequency estimates.
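
The way prior information refines allele frequency estimates can be illustrated with the conjugate Dirichlet-multinomial update for a single locus in a single sub-population. This is only a building block and not the hierarchical model implemented in AllDist, which also models inter-population divergence. The function name and the example prior are hypothetical.

```python
def posterior_allele_frequencies(allele_counts, prior_pseudocounts):
    """Posterior mean frequencies under a Dirichlet prior and multinomial counts.

    allele_counts      -- observed count of each allele at one locus
    prior_pseudocounts -- Dirichlet parameters encoding prior information
    """
    if len(allele_counts) != len(prior_pseudocounts):
        raise ValueError("counts and prior must have the same length")
    total = sum(allele_counts) + sum(prior_pseudocounts)
    return [(n + a) / total for n, a in zip(allele_counts, prior_pseudocounts)]
```

With few observed alleles the estimate stays close to the prior, and with many it approaches the raw sample frequencies, which mirrors the refinement the abstract describes for small sub-populations.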


Subject(s)
Bayes Theorem , Gene Frequency , Genetic Linkage , Models, Genetic , Alleles , Computer Simulation , Humans
13.
Environ Sci Technol ; 35(22): 4414-20, 2001 Nov 15.
Article in English | MEDLINE | ID: mdl-11757595

ABSTRACT

The current effort to revise the arsenic drinking water standard is one of the first times that the promulgation of a Maximum Contaminant Level (MCL) for drinking water has been influenced explicitly by benefit-cost considerations. Different stakeholders have developed different estimates of the costs, benefits, and appropriate decision-making criteria for a lower standard. In this study, alternative analyses prepared by the U.S. EPA and by independent researchers are compared. The large discrepancies in the aggregate national cost estimates are shown to result largely from differences in the engineering cost estimates for arsenic treatment processes. Further research is needed to resolve these discrepancies. Alternative regulatory approaches, such as providing point-of-use treatment or exempting water systems with high household compliance costs, yield only modest improvement in the overall cost-effectiveness of lower standards but are effective at addressing serious affordability problems for the small percentage of (primarily small) water systems where these problems are predicted to occur. The U.S. EPA may wish to provide more explicit guidance to state regulators and to water utilities as to the conditions under which these options will be acceptable.


Subject(s)
Arsenic/standards , Public Policy , Water Supply , Arsenic/adverse effects , Arsenic/economics , Cost-Benefit Analysis , Engineering , Guideline Adherence , Humans , Models, Statistical , Policy Making , Public Health , United States , United States Environmental Protection Agency