Búsqueda | Portal Regional de la BVS

1.

A general algorithm for error-in-variables regression modelling using Monte Carlo expectation maximization.

Stoklosa, Jakub; Hwang, Wen-Han; Warton, David I.

PLoS One ; 18(4): e0283798, 2023.

Artículo en Inglés | MEDLINE | ID: mdl-37011065

RESUMEN

In regression modelling, measurement error models are often needed to correct for uncertainty arising from measurements of covariates/predictor variables. The literature on measurement error (or errors-in-variables) modelling is plentiful, however, general algorithms and software for maximum likelihood estimation of models with measurement error are not as readily available, in a form that they can be used by applied researchers without relatively advanced statistical expertise. In this study, we develop a novel algorithm for measurement error modelling, which could in principle take any regression model fitted by maximum likelihood, or penalised likelihood, and extend it to account for uncertainty in covariates. This is achieved by exploiting an interesting property of the Monte Carlo Expectation-Maximization (MCEM) algorithm, namely that it can be expressed as an iteratively reweighted maximisation of complete data likelihoods (formed by imputing the missing values). Thus we can take any regression model for which we have an algorithm for (penalised) likelihood estimation when covariates are error-free, nest it within our proposed iteratively reweighted MCEM algorithm, and thus account for uncertainty in covariates. The approach is demonstrated on examples involving generalized linear models, point process models, generalized additive models and capture-recapture models. Because the proposed method uses maximum (penalised) likelihood, it inherits advantageous optimality and inferential properties, as illustrated by simulation. We also study the model robustness of some violations in predictor distributional assumptions. Software is provided as the refitME package on R, whose key function behaves like a refit() function, taking a fitted regression model object and re-fitting with a pre-specified amount of measurement error.

Asunto(s)

Algoritmos , Motivación , Funciones de Verosimilitud , Modelos Lineales , Simulación por Computador , Método de Montecarlo , Modelos Estadísticos

2.

Integrating Citizen Scientist Data into the Surveillance System for Avian Influenza Virus, Taiwan.

Wu, Hong-Dar Isaac; Lin, Ruey-Shing; Hwang, Wen-Han; Huang, Mei-Liang; Chen, Bo-Jia; Yen, Tseng-Chang; Chao, Day-Yu.

Emerg Infect Dis ; 29(1): 45-53, 2023 01.

Artículo en Inglés | MEDLINE | ID: mdl-36573518

RESUMEN

The continuing circulation and reassortment with low-pathogenicity avian influenza Gs/Gd (goose/Guangdong/1996)-like avian influenza viruses (AIVs) has caused huge economic losses and raised public health concerns over the zoonotic potential. Virologic surveillance of wild birds has been suggested as part of a global AIV surveillance system. However, underreporting and biased selection of sampling sites has rendered gaining information about the transmission and evolution of highly pathogenic AIV problematic. We explored the use of the Citizen Scientist eBird database to elucidate the dynamic distribution of wild birds in Taiwan and their potential for AIV exchange with domestic poultry. Through the 2-stage analytical framework, we associated nonignorable risk with 10 species of wild birds with >100 significant positive results. We generated a risk map, which served as the guide for highly pathogenic AIV surveillance. Our methodologic blueprint has the potential to be incorporated into the global AIV surveillance system of wild birds.

Asunto(s)

Virus de la Influenza A , Gripe Aviar , Animales , Taiwán/epidemiología , Filogenia , Virus de la Influenza A/genética , Aves , Aves de Corral , Animales Salvajes

3.

Population Size Estimation using Zero-truncated Poisson Regression with Measurement Error.

Hwang, Wen-Han; Stoklosa, Jakub; Wang, Ching-Yun.

J Agric Biol Environ Stat ; 27(2): 303-320, 2022 Jun.

Artículo en Inglés | MEDLINE | ID: mdl-35813491

RESUMEN

Population size estimation is an important research field in biological sciences. In practice, covariates are often measured upon capture on individuals sampled from the population. However, some biological measurements, such as body weight may vary over time within a subject's capture history. This can be treated as a population size estimation problem in the presence of covariate measurement error. We show that if the unobserved true covariate and measurement error are both normally distributed, then a naïve estimator without taking into account measurement error will under-estimate the population size. We then develop new methods to correct for the effect of measurement errors. In particular, we present a conditional score and a nonparametric corrected score approach that are both consistent for population size estimation. Importantly, the proposed approaches do not require the distribution assumption on the true covariates, furthermore the latter does not require normality assumptions on the measurement errors. This is highly relevant in biological applications, as the distribution of covariates is often non-normal or unknown. We investigate finite sample performance of the new estimators via extensive simulated studies. The methods are applied to real data from a capture-recapture study.

4.

What can occupancy models gain from time-to-detection data?

Priyadarshani, Dinusha; Altwegg, Res; Lee, Alan T K; Hwang, Wen-Han.

Ecology ; 103(12): e3832, 2022 12.

Artículo en Inglés | MEDLINE | ID: mdl-35876117

RESUMEN

The time taken to detect a species during site occupancy surveys contains information about the observation process. Accounting for the observation process leads to better inference about site occupancy. We explore the gain in efficiency that can be obtained from time-to-detection (TTD) data and show that this model type has a significant benefit for estimating the parameters related to detection intensity. However, for estimating occupancy probability parameters, the efficiency improvement is generally very minor. To explore whether TTD data could add valuable information when detection intensities vary between sites and surveys, we developed a mixed exponential TTD occupancy model. This new model can simultaneously estimate the detection intensity and aggregation parameters when the number of detectable individuals at the site follows a negative binomial distribution. We found that this model provided a much better description of the occupancy patterns than conventional detection/nondetection methods among 63 bird species data from the Karoo region of South Africa. Ignoring the heterogeneity of detection intensity in the TTD model generally yielded a negative bias in the estimated occupancy probability. Using simulations, we briefly explore study design trade offs between numbers of sites and surveys for different occupancy modeling strategies.

Asunto(s)

Aves , Modelos Biológicos , Animales , Probabilidad

5.

A model for analyzing clustered occurrence data.

Hwang, Wen-Han; Huggins, Richard; Stoklosa, Jakub.

Biometrics ; 78(2): 598-611, 2022 06.

Artículo en Inglés | MEDLINE | ID: mdl-33527374

RESUMEN

Spatial or temporal clustering commonly arises in various biological and ecological applications, for example, species or communities may cluster in groups. In this paper, we develop a new clustered occurrence data model where presence-absence data are modeled under a multivariate negative binomial framework. We account for spatial or temporal clustering by introducing a community parameter in the model that controls the strength of dependence between observations thereby enhancing the estimation of the mean and dispersion parameters. We provide conditions to show the existence of maximum likelihood estimates when cluster sizes are homogeneous and equal to 2 or 3 and consider a composite likelihood approach that allows for additional robustness and flexibility in fitting for clustered occurrence data. The proposed method is evaluated in a simulation study and demonstrated using forest plot data from the Center for Tropical Forest Science. Finally, we present several examples using multiple visit occupancy data to illustrate the difference between the proposed model and those of N-mixture models.

Asunto(s)

Funciones de Verosimilitud , Análisis por Conglomerados , Simulación por Computador

6.

A weighted partial likelihood approach for zero-truncated models.

Hwang, Wen-Han; Heinze, Dean; Stoklosa, Jakub.

Biom J ; 61(4): 1073-1087, 2019 07.

Artículo en Inglés | MEDLINE | ID: mdl-31090104

RESUMEN

Zero-truncated data arises in various disciplines where counts are observed but the zero count category cannot be observed during sampling. Maximum likelihood estimation can be used to model these data; however, due to its nonstandard form it cannot be easily implemented using well-known software packages, and additional programming is often required. Motivated by the Rao-Blackwell theorem, we develop a weighted partial likelihood approach to estimate model parameters for zero-truncated binomial and Poisson data. The resulting estimating function is equivalent to a weighted score function for standard count data models, and allows for applying readily available software. We evaluate the efficiency for this new approach and show that it performs almost as well as maximum likelihood estimation. The weighted partial likelihood approach is then extended to regression modelling and variable selection. We examine the performance of the proposed methods through simulation and present two case studies using real data.

Asunto(s)

Biometría/métodos , Modelos Estadísticos , Anciano , Animales , Femenino , Humanos , Funciones de Verosimilitud , Masculino , Marsupiales , Medicare/estadística & datos numéricos , Distribución de Poisson , Densidad de Población , Estados Unidos

7.

Estimating negative binomial parameters from occurrence data with detection times.

Hwang, Wen-Han; Huggins, Richard; Stoklosa, Jakub.

Biom J ; 58(6): 1409-1427, 2016 Nov.

Artículo en Inglés | MEDLINE | ID: mdl-27477340

RESUMEN

The negative binomial distribution is a common model for the analysis of count data in biology and ecology. In many applications, we may not observe the complete frequency count in a quadrat but only that a species occurred in the quadrat. If only occurrence data are available then the two parameters of the negative binomial distribution, the aggregation index and the mean, are not identifiable. This can be overcome by data augmentation or through modeling the dependence between quadrat occupancies. Here, we propose to record the (first) detection time while collecting occurrence data in a quadrat. We show that under what we call proportionate sampling, where the time to survey a region is proportional to the area of the region, that both negative binomial parameters are estimable. When the mean parameter is larger than two, our proposed approach is more efficient than the data augmentation method developed by Solow and Smith (, Am. Nat. 176, 96-98), and in general is cheaper to conduct. We also investigate the effect of misidentification when collecting negative binomially distributed data, and conclude that, in general, the effect can be simply adjusted for provided that the mean and variance of misidentification probabilities are known. The results are demonstrated in a simulation study and illustrated in several real examples.

Asunto(s)

Biometría/métodos , Modelos Estadísticos , Distribución Binomial , Simulación por Computador , Humanos , Probabilidad , Sesgo de Selección , Factores de Tiempo

8.

Improving efficiency using the Rao-Blackwell theorem in corrected and conditional score estimation methods for joint models.

Huang, Yih-Huei; Hwang, Wen-Han; Chen, Fei-Yin.

Biometrics ; 72(4): 1136-1144, 2016 12.

Artículo en Inglés | MEDLINE | ID: mdl-26953722

RESUMEN

Longitudinal covariates in survival models are generally analyzed using random effects models. By framing the estimation of these survival models as a functional measurement error problem, semiparametric approaches such as the conditional score or the corrected score can be applied to find consistent estimators for survival model parameters without distributional assumptions on the random effects. However, in order to satisfy the standard assumptions of a survival model, the semiparametric methods in the literature only use covariate data before each event time. This suggests that these methods may make inefficient use of the longitudinal data. We propose an extension of these approaches that follows a generalization of Rao-Blackwell theorem. A Monte Carlo error augmentation procedure is developed to utilize the entirety of longitudinal information available. The efficiency improvement of the proposed semiparametric approach is confirmed theoretically and demonstrated in a simulation study. A real data set is analyzed as an illustration of a practical application.

Asunto(s)

Estudios Longitudinales , Modelos Estadísticos , Análisis de Supervivencia , Síndrome de Inmunodeficiencia Adquirida/tratamiento farmacológico , Biometría/métodos , Simulación por Computador , Humanos , Método de Montecarlo

9.

Estimation in closed capture-recapture models when covariates are missing at random.

Lee, Shen-Ming; Hwang, Wen-Han; de Dieu Tapsoba, Jean.

Biometrics ; 72(4): 1294-1304, 2016 12.

Artículo en Inglés | MEDLINE | ID: mdl-26909877

RESUMEN

Individual covariates are commonly used in capture-recapture models as they can provide important information for population size estimation. However, in practice, one or more covariates may be missing at random for some individuals, which can lead to unreliable inference if records with missing data are treated as missing completely at random. We show that, in general, such a naive complete-case analysis in closed capture-recapture models with some covariates missing at random underestimates the population size. We develop methods for estimating regression parameters and population size using regression calibration, inverse probability weighting, and multiple imputation without any distributional assumptions about the covariates. We show that the inverse probability weighting and multiple imputation approaches are asymptotically equivalent. We present a simulation study to investigate the effects of missing covariates and to evaluate the performance of the proposed methods. We also illustrate an analysis using data on the bird species yellow-bellied prinia collected in Hong Kong.

Asunto(s)

Exactitud de los Datos , Modelos Estadísticos , Densidad de Población , Análisis de Regresión , Animales , Aves , Simulación por Computador , Hong Kong , Humanos , Probabilidad

10.

Good-Turing frequency estimation in a finite population.

Hwang, Wen-Han; Lin, Chih-Wei; Shen, Tsung-Jen.

Biom J ; 57(2): 321-39, 2015 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-25394337

RESUMEN

Good-Turing frequency estimation (Good, ) is a simple, effective method for predicting detection probabilities of objects of both observed and unobserved classes based on observed frequencies of classes in a sample. The method has been used widely in several disciplines, such as information retrieval, computational linguistics, text recognition, and ecological diversity estimation. Nevertheless, existing studies assume sampling with replacement or sampling from an infinite population, which might be inappropriate for many practical applications. In light of this limitation, this article presents a modification of the Good-Turing estimation method to account for finite population sampling. We provide three practical extensions of the modified method, and we examine performance of the modified method and its extensions in simulation experiments.

Asunto(s)

Estadística como Asunto/métodos , Plantas , Probabilidad , Tamaño de la Muestra

11.

Estimating the richness of a population when the maximum number of classes is fixed: a nonparametric solution to an archaeological problem.

Eren, Metin I; Chao, Anne; Hwang, Wen-Han; Colwell, Robert K.

PLoS One ; 7(5): e34179, 2012.

Artículo en Inglés | MEDLINE | ID: mdl-22666316

RESUMEN

BACKGROUND: Estimating assemblage species or class richness from samples remains a challenging, but essential, goal. Though a variety of statistical tools for estimating species or class richness have been developed, they are all singly-bounded: assuming only a lower bound of species or classes. Nevertheless there are numerous situations, particularly in the cultural realm, where the maximum number of classes is fixed. For this reason, a new method is needed to estimate richness when both upper and lower bounds are known. METHODOLOGY/PRINCIPAL FINDINGS: Here, we introduce a new method for estimating class richness: doubly-bounded confidence intervals (both lower and upper bounds are known). We specifically illustrate our new method using the Chao1 estimator, rarefaction, and extrapolation, although any estimator of asymptotic richness can be used in our method. Using a case study of Clovis stone tools from the North American Lower Great Lakes region, we demonstrate that singly-bounded richness estimators can yield confidence intervals with upper bound estimates larger than the possible maximum number of classes, while our new method provides estimates that make empirical sense. CONCLUSIONS/SIGNIFICANCE: Application of the new method for constructing doubly-bound richness estimates of Clovis stone tools permitted conclusions to be drawn that were not otherwise possible with singly-bounded richness estimates, namely, that Lower Great Lakes Clovis Paleoindians utilized a settlement pattern that was probably more logistical in nature than residential. However, our new method is not limited to archaeological applications. It can be applied to any set of data for which there is a fixed maximum number of classes, whether that be site occupancy models, commercial products (e.g. athletic shoes), or census information (e.g. nationality, religion, age, race).

Asunto(s)

Arqueología/estadística & datos numéricos , Biodiversidad , Censos , Modelos Estadísticos , Zapatos/economía , Estadísticas no Paramétricas

12.

Specimen-based modeling, stopping rules, and the extinction of the Ivory-billed Woodpecker.

Gotelli, Nicholas J; Chao, Anne; Colwell, Robert K; Hwang, Wen-Han; Graves, Gary R.

Conserv Biol ; 26(1): 47-56, 2012 Feb.

Artículo en Inglés | MEDLINE | ID: mdl-21797923

RESUMEN

Assessing species survival status is an essential component of conservation programs. We devised a new statistical method for estimating the probability of species persistence from the temporal sequence of collection dates of museum specimens. To complement this approach, we developed quantitative stopping rules for terminating the search for missing or allegedly extinct species. These stopping rules are based on survey data for counts of co-occurring species that are encountered in the search for a target species. We illustrate both these methods with a case study of the Ivory-billed Woodpecker (Campephilus principalis), long assumed to have become extinct in the United States in the 1950s, but reportedly rediscovered in 2004. We analyzed the temporal pattern of the collection dates of 239 geo-referenced museum specimens collected throughout the southeastern United States from 1853 to 1932 and estimated the probability of persistence in 2011 as <6.4 × 10(-5) , with a probable extinction date no later than 1980. From an analysis of avian census data (counts of individuals) at 4 sites where searches for the woodpecker were conducted since 2004, we estimated that at most 1-3 undetected species may remain in 3 sites (one each in Louisiana, Mississippi, Florida). At a fourth site on the Congaree River (South Carolina), no singletons (species represented by one observation) remained after 15,500 counts of individual birds, indicating that the number of species already recorded (56) is unlikely to increase with additional survey effort. Collectively, these results suggest there is virtually no chance the Ivory-billed Woodpecker is currently extant within its historical range in the southeastern United States. The results also suggest conservation resources devoted to its rediscovery and recovery could be better allocated to other species. The methods we describe for estimating species extinction dates and the probability of persistence are generally applicable to other species for which sufficient museum collections and field census results are available.

Asunto(s)

Aves , Conservación de los Recursos Naturales/métodos , Extinción Biológica , Modelos Estadísticos , Animales , Sudeste de Estados Unidos

13.

Differential measurement errors in zero-truncated regression models for count data.

Huang, Yih-Huei; Hwang, Wen-Han; Chen, Fei-Yin.

Biometrics ; 67(4): 1471-80, 2011 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-21466529

RESUMEN

Measurement errors in covariates may result in biased estimates in regression analysis. Most methods to correct this bias assume nondifferential measurement errors-i.e., that measurement errors are independent of the response variable. However, in regression models for zero-truncated count data, the number of error-prone covariate measurements for a given observational unit can equal its response count, implying a situation of differential measurement errors. To address this challenge, we develop a modified conditional score approach to achieve consistent estimation. The proposed method represents a novel technique, with efficiency gains achieved by augmenting random errors, and performs well in a simulation study. The method is demonstrated in an ecology application.

Asunto(s)

Antropometría/métodos , Artefactos , Biometría/métodos , Peso Corporal/fisiología , Modelos Estadísticos , Animales , Simulación por Computador , Ratones , Análisis de Regresión , Reproducibilidad de los Resultados , Tamaño de la Muestra , Sensibilidad y Especificidad

14.

Heterogeneous capture-recapture models with covariates: a partial likelihood approach for closed populations.

Stoklosa, Jakub; Hwang, Wen-Han; Wu, Sheng-Hai; Huggins, Richard.

Biometrics ; 67(4): 1659-65, 2011 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-21466530

RESUMEN

In practice, when analyzing data from a capture-recapture experiment it is tempting to apply modern advanced statistical methods to the observed capture histories. However, unless the analysis takes into account that the data have only been collected from individuals who have been captured at least once, the results may be biased. Without the development of new software packages, methods such as generalized additive models, generalized linear mixed models, and simulation-extrapolation cannot be readily implemented. In contrast, the partial likelihood approach allows the analysis of a capture-recapture experiment to be conducted using commonly available software. Here we examine the efficiency of this approach and apply it to several data sets.

Asunto(s)

Censos , Interpretación Estadística de Datos , Emigración e Inmigración/estadística & datos numéricos , Modelos Estadísticos , Densidad de Población , Animales , Simulación por Computador , Funciones de Verosimilitud

15.

A varying coefficient model to measure the effectiveness of mass media anti-smoking campaigns in generating calls to a Quitline.

Bui, Quang M; Huggins, Richard M; Hwang, Wen-Han; White, Victoria; Erbas, Bircan.

J Epidemiol ; 20(6): 473-9, 2010.

Artículo en Inglés | MEDLINE | ID: mdl-20827036

RESUMEN

BACKGROUND: Anti-smoking advertisements are an effective population-based smoking reduction strategy. The Quitline telephone service provides a first point of contact for adults considering quitting. Because of data complexity, the relationship between anti-smoking advertising placement, intensity, and time trends in total call volume is poorly understood. In this study we use a recently developed semi-varying coefficient model to elucidate this relationship. METHODS: Semi-varying coefficient models comprise parametric and nonparametric components. The model is fitted to the daily number of calls to Quitline in Victoria, Australia to estimate a nonparametric long-term trend and parametric terms for day-of-the-week effects and to clarify the relationship with target audience rating points (TARPs) for the Quit and nicotine replacement advertising campaigns. RESULTS: The number of calls to Quitline increased with the TARP value of both the Quit and other smoking cessation advertisement; the TARP values associated with the Quit program were almost twice as effective. The varying coefficient term was statistically significant for peak periods with little or no advertising. CONCLUSIONS: Semi-varying coefficient models are useful for modeling public health data when there is little or no information on other factors related to the at-risk population. These models are well suited to modeling call volume to Quitline, because the varying coefficient allowed the underlying time trend to depend on fixed covariates that also vary with time, thereby explaining more of the variation in the call model.

Asunto(s)

Publicidad , Promoción de la Salud , Líneas Directas/estadística & datos numéricos , Cese del Hábito de Fumar/métodos , Prevención del Hábito de Fumar , Humanos , Medios de Comunicación de Masas , Modelos Estadísticos , Evaluación de Programas y Proyectos de Salud , Factores de Tiempo , Victoria

16.

Small-sample estimation of species richness applied to forest communities.

Hwang, Wen-Han; Shen, Tsung-Jen.

Biometrics ; 66(4): 1052-60, 2010 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-20002401

RESUMEN

Many well-known methods are available for estimating the number of species in a forest community. However, most existing methods result in considerable negative bias in applications, where field surveys typically represent only a small fraction of sampled communities. This article develops a new method based on sampling with replacement to estimate species richness via the generalized jackknife procedure. The proposed estimator yields small bias and reasonably accurate interval estimation even with small samples. The performance of the proposed estimator is compared with several typical estimators via simulation study using two complete census datasets from Panama and Malaysia.

Asunto(s)

Biodiversidad , Árboles , Censos , Bases de Datos Factuales , Malasia , Métodos , Modelos Biológicos , Panamá

17.

Estimation in capture-recapture models when covariates are subject to measurement errors.

Hwang, Wen-Han; Huang, Steve Y H.

Biometrics ; 59(4): 1113-22, 2003 Dec.

Artículo en Inglés | MEDLINE | ID: mdl-14969492

RESUMEN

We consider estimation problems in capture-recapture models when the covariates or the auxiliary variables are measured with errors. The naive approach, which ignores measurement errors, is found to be unacceptable in the estimation of both regression parameters and population size: it yields estimators with biases increasing with the magnitude of errors, and flawed confidence intervals. To account for measurement errors, we derive a regression parameter estimator using a regression calibration method. We develop modified estimators of the population size accordingly. A simulation study shows that the resulting estimators are more satisfactory than those from either the naive approach or the simulation extrapolation (SIMEX) method. Data from a bird species Prinia flaviventris in Hong Kong are analyzed with and without the assumption of measurement errors, to demonstrate the effects of errors on estimations.

Asunto(s)

Biometría/métodos , Modelos Estadísticos , Análisis de Varianza , Animales , Aves/clasificación , Simulación por Computador , Hong Kong , Población , Probabilidad , Reproducibilidad de los Resultados , Tamaño de la Muestra , Especificidad de la Especie

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA