Results 1 - 18 of 18
1.
Emerg Infect Dis ; 29(1): 45-53, 2023 01.
Article in English | MEDLINE | ID: mdl-36573518

ABSTRACT

The continuing circulation of Gs/Gd (goose/Guangdong/1996)-like avian influenza viruses (AIVs) and their reassortment with low-pathogenicity AIVs have caused huge economic losses and raised public health concerns over their zoonotic potential. Virologic surveillance of wild birds has been suggested as part of a global AIV surveillance system. However, underreporting and biased selection of sampling sites have made it difficult to gain information about the transmission and evolution of highly pathogenic AIVs. We explored the use of the citizen-science eBird database to elucidate the dynamic distribution of wild birds in Taiwan and their potential for AIV exchange with domestic poultry. Through a 2-stage analytical framework, we associated nonignorable risk with 10 wild bird species that had >100 significant positive results. We generated a risk map, which served as a guide for highly pathogenic AIV surveillance. Our methodologic blueprint has the potential to be incorporated into the global AIV surveillance system for wild birds.


Subject(s)
Influenza A virus , Influenza in Birds , Animals , Taiwan/epidemiology , Phylogeny , Influenza A virus/genetics , Birds , Poultry , Animals, Wild
2.
Biometrics ; 78(2): 598-611, 2022 06.
Article in English | MEDLINE | ID: mdl-33527374

ABSTRACT

Spatial or temporal clustering commonly arises in various biological and ecological applications; for example, species or communities may cluster in groups. In this paper, we develop a new model for clustered occurrence data in which presence-absence data are modeled under a multivariate negative binomial framework. We account for spatial or temporal clustering by introducing a community parameter that controls the strength of dependence between observations, thereby enhancing estimation of the mean and dispersion parameters. We provide conditions for the existence of maximum likelihood estimates when cluster sizes are homogeneous and equal to 2 or 3, and we consider a composite likelihood approach that allows additional robustness and flexibility in fitting clustered occurrence data. The proposed method is evaluated in a simulation study and demonstrated using forest plot data from the Center for Tropical Forest Science. Finally, we present several examples using multiple-visit occupancy data to illustrate the differences between the proposed model and N-mixture models.


Subject(s)
Likelihood Functions , Cluster Analysis , Computer Simulation
3.
Biom J ; 61(4): 1073-1087, 2019 07.
Article in English | MEDLINE | ID: mdl-31090104

ABSTRACT

Zero-truncated data arise in various disciplines where counts are observed but the zero-count category cannot be observed during sampling. Maximum likelihood estimation can be used to model these data; however, because the likelihood has a nonstandard form, it cannot be easily implemented using well-known software packages, and additional programming is often required. Motivated by the Rao-Blackwell theorem, we develop a weighted partial likelihood approach to estimate model parameters for zero-truncated binomial and Poisson data. The resulting estimating function is equivalent to a weighted score function for standard count data models and allows readily available software to be applied. We evaluate the efficiency of this new approach and show that it performs almost as well as maximum likelihood estimation. The weighted partial likelihood approach is then extended to regression modelling and variable selection. We examine the performance of the proposed methods through simulation and present two case studies using real data.
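As context for the abstract above, a minimal sketch of the standard zero-truncated Poisson maximum likelihood fit, the baseline that the weighted partial likelihood approach is compared against (function name and example data are illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import brentq

def zt_poisson_mle(counts):
    """MLE of lambda for a zero-truncated Poisson sample (all counts >= 1).
    Solves the score equation lambda / (1 - exp(-lambda)) = mean(counts)."""
    counts = np.asarray(counts, dtype=float)
    if np.any(counts < 1):
        raise ValueError("zero-truncated data must have counts >= 1")
    ybar = counts.mean()
    # The left-hand side increases from 1 (as lambda -> 0) to infinity,
    # so a unique root exists whenever ybar > 1.
    return brentq(lambda lam: lam / (1 - np.exp(-lam)) - ybar, 1e-8, 1e3)

# Illustrative sample with mean 2.5; the MLE solves
# lambda / (1 - exp(-lambda)) = 2.5
sample = [1, 2, 3, 4, 1, 2, 3, 4]
lam_hat = zt_poisson_mle(sample)
```

The nonstandard normalizing term 1 - exp(-lambda) is exactly what keeps this model out of off-the-shelf count regression software.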


Subject(s)
Biometry/methods , Models, Statistical , Aged , Animals , Female , Humans , Likelihood Functions , Male , Marsupialia , Medicare/statistics & numerical data , Poisson Distribution , Population Density , United States
4.
Biometrics ; 72(4): 1136-1144, 2016 12.
Article in English | MEDLINE | ID: mdl-26953722

ABSTRACT

Longitudinal covariates in survival models are generally analyzed using random effects models. By framing the estimation of these survival models as a functional measurement error problem, semiparametric approaches such as the conditional score or the corrected score can be applied to find consistent estimators for survival model parameters without distributional assumptions on the random effects. However, in order to satisfy the standard assumptions of a survival model, the semiparametric methods in the literature use only covariate data observed before each event time. This suggests that these methods may make inefficient use of the longitudinal data. We propose an extension of these approaches that follows from a generalization of the Rao-Blackwell theorem. A Monte Carlo error augmentation procedure is developed to utilize the entirety of the longitudinal information available. The efficiency improvement of the proposed semiparametric approach is confirmed theoretically and demonstrated in a simulation study. A real data set is analyzed as an illustration of a practical application.


Subject(s)
Longitudinal Studies , Models, Statistical , Survival Analysis , Acquired Immunodeficiency Syndrome/drug therapy , Biometry/methods , Computer Simulation , Humans , Monte Carlo Method
5.
Biometrics ; 72(4): 1294-1304, 2016 12.
Article in English | MEDLINE | ID: mdl-26909877

ABSTRACT

Individual covariates are commonly used in capture-recapture models as they can provide important information for population size estimation. However, in practice, one or more covariates may be missing at random for some individuals, which can lead to unreliable inference if records with missing data are treated as missing completely at random. We show that, in general, such a naive complete-case analysis in closed capture-recapture models with some covariates missing at random underestimates the population size. We develop methods for estimating regression parameters and population size using regression calibration, inverse probability weighting, and multiple imputation without any distributional assumptions about the covariates. We show that the inverse probability weighting and multiple imputation approaches are asymptotically equivalent. We present a simulation study to investigate the effects of missing covariates and to evaluate the performance of the proposed methods. We also illustrate an analysis using data on the bird species yellow-bellied prinia collected in Hong Kong.


Subject(s)
Data Accuracy , Models, Statistical , Population Density , Regression Analysis , Animals , Birds , Computer Simulation , Hong Kong , Humans , Probability
6.
Biom J ; 58(6): 1409-1427, 2016 Nov.
Article in English | MEDLINE | ID: mdl-27477340

ABSTRACT

The negative binomial distribution is a common model for the analysis of count data in biology and ecology. In many applications, we may not observe the complete frequency count in a quadrat but only that a species occurred in the quadrat. If only occurrence data are available, then the two parameters of the negative binomial distribution, the aggregation index and the mean, are not identifiable. This can be overcome by data augmentation or by modeling the dependence between quadrat occupancies. Here, we propose recording the (first) detection time while collecting occurrence data in a quadrat. We show that, under what we call proportionate sampling, where the time to survey a region is proportional to its area, both negative binomial parameters are estimable. When the mean parameter is larger than two, our proposed approach is more efficient than the data augmentation method developed by Solow and Smith (Am. Nat. 176, 96-98), and in general is cheaper to conduct. We also investigate the effect of misidentification when collecting negative binomially distributed data and conclude that, in general, the effect can be adjusted for provided that the mean and variance of the misidentification probabilities are known. The results are demonstrated in a simulation study and illustrated in several real examples.
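The identifiability problem described above can be seen numerically. Assuming the usual parameterization in which a negative binomial with mean mu and aggregation index k has P(Y = 0) = (k / (k + mu))^k, two different parameter pairs can produce the same occurrence probability:

```python
import numpy as np

def occurrence_prob(mu, k):
    """P(Y > 0) for a negative binomial with mean mu and aggregation
    index k, using P(Y = 0) = (k / (k + mu))^k."""
    return 1.0 - (k / (k + mu)) ** k

# Two different (mean, aggregation) pairs give an identical occurrence
# probability, so occurrence data alone cannot identify both parameters:
p1 = occurrence_prob(1.0, 1.0)                        # mu = 1, k = 1
p2 = occurrence_prob(2.0 * np.sqrt(2.0) - 2.0, 2.0)   # mu ~ 0.828, k = 2
# both equal 0.5 (up to floating point)
```

Extra information, such as the first detection time proposed in the abstract, is needed to separate the two parameters.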


Subject(s)
Biometry/methods , Models, Statistical , Binomial Distribution , Computer Simulation , Humans , Probability , Selection Bias , Time Factors
7.
Biom J ; 57(2): 321-39, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25394337

ABSTRACT

Good-Turing frequency estimation (Good) is a simple, effective method for predicting the detection probabilities of objects of both observed and unobserved classes, based on the observed frequencies of classes in a sample. The method has been used widely in several disciplines, such as information retrieval, computational linguistics, text recognition, and ecological diversity estimation. Nevertheless, existing studies assume sampling with replacement or sampling from an infinite population, which may be inappropriate for many practical applications. In light of this limitation, this article presents a modification of the Good-Turing estimation method that accounts for finite-population sampling. We provide three practical extensions of the modified method, and we examine the performance of the modified method and its extensions in simulation experiments.
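For orientation, the classic (infinite-population) Good-Turing estimate of the total probability of unseen classes is f1/n. A minimal sketch of that baseline, not the finite-population modification the article develops:

```python
from collections import Counter

def good_turing_unseen_mass(sample):
    """Classic Good-Turing estimate of the total probability mass of
    unseen classes: f1 / n, where f1 is the number of classes seen exactly
    once and n is the sample size. Assumes sampling with replacement (the
    infinite-population case that the article modifies)."""
    freqs = Counter(sample)                         # class -> count
    f1 = sum(1 for c in freqs.values() if c == 1)   # singleton classes
    return f1 / len(sample)

# 10 observations, of which three classes ("d", "e", "f") are singletons
obs = list("aaabbccdef")
mass = good_turing_unseen_mass(obs)   # 3 / 10 = 0.3
```

Under sampling without replacement from a finite population, f1/n is no longer the right discount, which is the gap the modified method addresses.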


Subject(s)
Statistics as Topic/methods , Plants , Probability , Sample Size
8.
Article in English | MEDLINE | ID: mdl-39113782

ABSTRACT

A biomarker is a measurable indicator of the severity or presence of a disease or medical condition in biomedical or epidemiological research. Biomarkers may help in the early diagnosis and prevention of diseases. Several biomarkers have been identified for many diseases, such as carbohydrate antigen 19-9 for pancreatic cancer. However, biomarkers may be measured with error for many reasons, such as specimen collection or day-to-day within-subject variability of the biomarker, among others. Measurement error in a biomarker leads to bias in the regression parameter estimates for the association of the biomarker with disease in epidemiological studies. In addition, measurement error in biomarkers may affect standard diagnostic measures used to evaluate their performance, such as the receiver operating characteristic (ROC) curve, the area under the ROC curve, sensitivity, and specificity. Measurement error may also affect how multiple cancer biomarkers are combined into a composite predictor for disease diagnosis. In follow-up studies, biomarkers are often collected intermittently at examination times, which may be sparse, and typically biomarkers are not observed at the event times. Joint modeling of longitudinal and time-to-event data is a valid approach to account for measurement error in the analysis of repeatedly measured biomarkers and time-to-event outcomes. In this article, we provide a literature review of existing methods that correct for measurement error in regression analysis, in diagnostic measures, and in joint modeling of longitudinal biomarkers and survival outcomes when the biomarkers are measured with error. This article is categorized under:
Statistical and Graphical Methods of Data Analysis > Robust Methods
Statistical and Graphical Methods of Data Analysis > EM Algorithm
Statistical Models > Survival Models

9.
PLoS One ; 18(4): e0283798, 2023.
Article in English | MEDLINE | ID: mdl-37011065

ABSTRACT

In regression modelling, measurement error models are often needed to correct for uncertainty arising from measurements of covariates/predictor variables. The literature on measurement error (or errors-in-variables) modelling is plentiful; however, general algorithms and software for maximum likelihood estimation of models with measurement error are not as readily available in a form that can be used by applied researchers without relatively advanced statistical expertise. In this study, we develop a novel algorithm for measurement error modelling that could, in principle, take any regression model fitted by maximum likelihood, or penalised likelihood, and extend it to account for uncertainty in covariates. This is achieved by exploiting an interesting property of the Monte Carlo Expectation-Maximization (MCEM) algorithm, namely that it can be expressed as an iteratively reweighted maximisation of complete-data likelihoods (formed by imputing the missing values). Thus we can take any regression model for which we have an algorithm for (penalised) likelihood estimation when covariates are error-free, nest it within our proposed iteratively reweighted MCEM algorithm, and thereby account for uncertainty in covariates. The approach is demonstrated on examples involving generalized linear models, point process models, generalized additive models, and capture-recapture models. Because the proposed method uses maximum (penalised) likelihood, it inherits advantageous optimality and inferential properties, as illustrated by simulation. We also study the robustness of the model to violations of predictor distributional assumptions. Software is provided as the refitME package for R, whose key function behaves like a refit() function, taking a fitted regression model object and re-fitting it with a pre-specified amount of measurement error.
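A toy version of the iteratively reweighted MCEM idea described above, for a normal linear model with classical measurement error. This is a hedged sketch assuming a normal structural model for the true covariate; the function and variable names are illustrative and are not the refitME API:

```python
import numpy as np

rng = np.random.default_rng(0)

def mcem_linreg_me(w, y, sigma_u, n_draws=300, n_iter=60):
    """MCEM sketch for y = b0 + b1*x + eps when x is observed only as
    w = x + u, with u ~ N(0, sigma_u^2) and a structural model
    x ~ N(mu_x, var_x). Illustrative only."""
    n = len(w)
    b1, b0 = np.polyfit(w, y, 1)            # naive start, ignores error
    sigma_e = np.std(y - b0 - b1 * w)
    mu_x, var_x = w.mean(), max(w.var() - sigma_u**2, 1e-6)
    for _ in range(n_iter):
        # E-step: propose x ~ p(x | w), then reweight by p(y | x) so the
        # weighted draws target p(x | w, y) (self-normalised importance
        # sampling)
        v = 1.0 / (1.0 / var_x + 1.0 / sigma_u**2)
        m = v * (mu_x / var_x + w / sigma_u**2)
        x = rng.normal(m[:, None], np.sqrt(v), size=(n, n_draws))
        logw = -0.5 * ((y[:, None] - b0 - b1 * x) / sigma_e) ** 2
        logw -= logw.max(axis=1, keepdims=True)    # guard against underflow
        wt = np.exp(logw)
        wt /= wt.sum(axis=1, keepdims=True)
        # M-step: iteratively reweighted maximisation of the complete-data
        # likelihood = weighted least squares on the imputed data
        xf, wf, yf = x.ravel(), wt.ravel(), np.repeat(y, n_draws)
        sw = np.sqrt(wf)
        X = np.column_stack([np.ones_like(xf), xf])
        beta = np.linalg.lstsq(X * sw[:, None], yf * sw, rcond=None)[0]
        b0, b1 = beta
        sigma_e = np.sqrt(np.average((yf - X @ beta) ** 2, weights=wf))
        mu_x = np.average(xf, weights=wf)
        var_x = np.average((xf - mu_x) ** 2, weights=wf)
    return b0, b1
```

On simulated data with attenuating measurement error, this typically recovers most of the slope attenuation that a naive fit of y on w suffers; the general algorithm in the paper wraps the same reweighted M-step around an arbitrary fitted model.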


Subject(s)
Algorithms , Motivation , Likelihood Functions , Linear Models , Computer Simulation , Monte Carlo Method , Models, Statistical
10.
Conserv Biol ; 26(1): 47-56, 2012 Feb.
Article in English | MEDLINE | ID: mdl-21797923

ABSTRACT

Assessing species survival status is an essential component of conservation programs. We devised a new statistical method for estimating the probability of species persistence from the temporal sequence of collection dates of museum specimens. To complement this approach, we developed quantitative stopping rules for terminating the search for missing or allegedly extinct species. These stopping rules are based on survey data for counts of co-occurring species that are encountered in the search for a target species. We illustrate both these methods with a case study of the Ivory-billed Woodpecker (Campephilus principalis), long assumed to have become extinct in the United States in the 1950s, but reportedly rediscovered in 2004. We analyzed the temporal pattern of the collection dates of 239 geo-referenced museum specimens collected throughout the southeastern United States from 1853 to 1932 and estimated the probability of persistence in 2011 as <6.4 × 10^-5, with a probable extinction date no later than 1980. From an analysis of avian census data (counts of individuals) at 4 sites where searches for the woodpecker have been conducted since 2004, we estimated that at most 1-3 undetected species may remain at 3 of the sites (one each in Louisiana, Mississippi, and Florida). At a fourth site on the Congaree River (South Carolina), no singletons (species represented by one observation) remained after 15,500 counts of individual birds, indicating that the number of species already recorded (56) is unlikely to increase with additional survey effort. Collectively, these results suggest there is virtually no chance the Ivory-billed Woodpecker is currently extant within its historical range in the southeastern United States. They also suggest that conservation resources devoted to its rediscovery and recovery could be better allocated to other species. The methods we describe for estimating species extinction dates and the probability of persistence are generally applicable to other species for which sufficient museum collections and field census results are available.


Subject(s)
Birds , Conservation of Natural Resources/methods , Extinction, Biological , Models, Statistical , Animals , Southeastern United States
11.
J Agric Biol Environ Stat ; 27(2): 303-320, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35813491

ABSTRACT

Population size estimation is an important research field in the biological sciences. In practice, covariates are often measured upon capture for individuals sampled from the population. However, some biological measurements, such as body weight, may vary over time within a subject's capture history. This can be treated as a population size estimation problem in the presence of covariate measurement error. We show that if the unobserved true covariate and the measurement error are both normally distributed, then a naive estimator that ignores measurement error will underestimate the population size. We then develop new methods to correct for the effect of measurement errors. In particular, we present a conditional score and a nonparametric corrected score approach, both of which are consistent for population size estimation. Importantly, the proposed approaches do not require distributional assumptions on the true covariates; furthermore, the latter does not require normality assumptions on the measurement errors. This is highly relevant in biological applications, as the distribution of covariates is often non-normal or unknown. We investigate the finite sample performance of the new estimators via extensive simulation studies. The methods are applied to real data from a capture-recapture study.

12.
Ecology ; 103(12): e3832, 2022 12.
Article in English | MEDLINE | ID: mdl-35876117

ABSTRACT

The time taken to detect a species during site occupancy surveys contains information about the observation process, and accounting for that process leads to better inference about site occupancy. We explore the gain in efficiency that can be obtained from time-to-detection (TTD) data and show that this model type has a significant benefit for estimating the parameters related to detection intensity. However, for estimating occupancy probability parameters, the efficiency improvement is generally very minor. To explore whether TTD data could add valuable information when detection intensities vary between sites and surveys, we developed a mixed exponential TTD occupancy model. This new model can simultaneously estimate the detection intensity and aggregation parameters when the number of detectable individuals at a site follows a negative binomial distribution. We found that this model provided a much better description of occupancy patterns than conventional detection/nondetection methods for data on 63 bird species from the Karoo region of South Africa. Ignoring heterogeneity of detection intensity in the TTD model generally yielded a negative bias in the estimated occupancy probability. Using simulations, we briefly explore study design trade-offs between the numbers of sites and surveys for different occupancy modeling strategies.


Subject(s)
Birds , Models, Biological , Animals , Probability
13.
Biometrics ; 67(4): 1471-80, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21466529

ABSTRACT

Measurement errors in covariates may result in biased estimates in regression analysis. Most methods to correct this bias assume nondifferential measurement errors, that is, that the measurement errors are independent of the response variable. However, in regression models for zero-truncated count data, the number of error-prone covariate measurements for a given observational unit can equal its response count, implying a situation of differential measurement errors. To address this challenge, we develop a modified conditional score approach to achieve consistent estimation. The proposed method represents a novel technique, with efficiency gains achieved by augmenting random errors, and performs well in a simulation study. The method is demonstrated in an ecology application.


Subject(s)
Anthropometry/methods , Artifacts , Biometry/methods , Body Weight/physiology , Models, Statistical , Animals , Computer Simulation , Mice , Regression Analysis , Reproducibility of Results , Sample Size , Sensitivity and Specificity
14.
Biometrics ; 67(4): 1659-65, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21466530

ABSTRACT

In practice, when analyzing data from a capture-recapture experiment it is tempting to apply modern advanced statistical methods to the observed capture histories. However, unless the analysis takes into account that the data have only been collected from individuals who have been captured at least once, the results may be biased. Without the development of new software packages, methods such as generalized additive models, generalized linear mixed models, and simulation-extrapolation cannot be readily implemented. In contrast, the partial likelihood approach allows the analysis of a capture-recapture experiment to be conducted using commonly available software. Here we examine the efficiency of this approach and apply it to several data sets.


Subject(s)
Censuses , Data Interpretation, Statistical , Emigration and Immigration/statistics & numerical data , Models, Statistical , Population Density , Animals , Computer Simulation , Likelihood Functions
15.
Biometrics ; 66(4): 1052-60, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20002401

ABSTRACT

Many well-known methods are available for estimating the number of species in a forest community. However, most existing methods exhibit considerable negative bias in applications, where field surveys typically represent only a small fraction of sampled communities. This article develops a new method, based on sampling with replacement, to estimate species richness via the generalized jackknife procedure. The proposed estimator yields small bias and reasonably accurate interval estimation even with small samples. The performance of the proposed estimator is compared with several typical estimators via a simulation study using two complete census data sets, from Panama and Malaysia.
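For orientation, the simplest member of the jackknife family the abstract builds on is the first-order jackknife richness estimator. A minimal sketch (the article develops a generalized jackknife; names and data below are illustrative):

```python
from collections import Counter

def jackknife1_richness(unit_species):
    """First-order jackknife richness from a list of per-unit species sets:
    S_obs + f1 * (n - 1) / n, where f1 counts species seen in exactly one
    of the n sampling units."""
    n = len(unit_species)
    # incidence: species -> number of sampling units it occurs in
    incidence = Counter(sp for unit in unit_species for sp in set(unit))
    s_obs = len(incidence)
    f1 = sum(1 for c in incidence.values() if c == 1)
    return s_obs + f1 * (n - 1) / n

# Three quadrats; "oak" occurs in all, the other three species are uniques
units = [{"oak", "fig"}, {"oak", "palm"}, {"oak", "teak"}]
est = jackknife1_richness(units)   # 4 + 3 * 2/3 = 6.0
```

The estimator adds a correction driven by the rarest (once-observed) species, which is why small surveys with many uniques get the largest upward adjustment.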


Subject(s)
Biodiversity , Trees , Censuses , Databases, Factual , Malaysia , Methods , Models, Biological , Panama
16.
J Epidemiol ; 20(6): 473-9, 2010.
Article in English | MEDLINE | ID: mdl-20827036

ABSTRACT

BACKGROUND: Anti-smoking advertisements are an effective population-based smoking reduction strategy. The Quitline telephone service provides a first point of contact for adults considering quitting. Because of data complexity, the relationship between anti-smoking advertising placement, intensity, and time trends in total call volume is poorly understood. In this study we use a recently developed semi-varying coefficient model to elucidate this relationship. METHODS: Semi-varying coefficient models comprise parametric and nonparametric components. The model is fitted to the daily number of calls to Quitline in Victoria, Australia, to estimate a nonparametric long-term trend and parametric terms for day-of-the-week effects, and to clarify the relationship with target audience rating points (TARPs) for the Quit and nicotine replacement advertising campaigns. RESULTS: The number of calls to Quitline increased with the TARP value of both the Quit and other smoking cessation advertisements; the TARP values associated with the Quit program were almost twice as effective. The varying coefficient term was statistically significant for peak periods with little or no advertising. CONCLUSIONS: Semi-varying coefficient models are useful for modeling public health data when there is little or no information on other factors related to the at-risk population. These models are well suited to modeling call volume to Quitline, because the varying coefficient allows the underlying time trend to depend on fixed covariates that also vary with time, thereby explaining more of the variation in the call model.


Subject(s)
Advertising , Health Promotion , Hotlines/statistics & numerical data , Smoking Cessation/methods , Smoking Prevention , Humans , Mass Media , Models, Statistical , Program Evaluation , Time Factors , Victoria
17.
PLoS One ; 7(5): e34179, 2012.
Article in English | MEDLINE | ID: mdl-22666316

ABSTRACT

BACKGROUND: Estimating assemblage species or class richness from samples remains a challenging but essential goal. Though a variety of statistical tools for estimating species or class richness have been developed, they are all singly-bounded, assuming only a lower bound on the number of species or classes. Nevertheless, there are numerous situations, particularly in the cultural realm, where the maximum number of classes is fixed. For this reason, a new method is needed to estimate richness when both upper and lower bounds are known. METHODOLOGY/PRINCIPAL FINDINGS: Here, we introduce a new method for estimating class richness: doubly-bounded confidence intervals (where both lower and upper bounds are known). We specifically illustrate our new method using the Chao1 estimator, rarefaction, and extrapolation, although any estimator of asymptotic richness can be used in our method. Using a case study of Clovis stone tools from the North American Lower Great Lakes region, we demonstrate that singly-bounded richness estimators can yield confidence intervals with upper bound estimates larger than the possible maximum number of classes, while our new method provides estimates that make empirical sense. CONCLUSIONS/SIGNIFICANCE: Application of the new method for constructing doubly-bounded richness estimates of Clovis stone tools permitted conclusions to be drawn that were not otherwise possible with singly-bounded richness estimates, namely, that Lower Great Lakes Clovis Paleoindians utilized a settlement pattern that was probably more logistical in nature than residential. However, our new method is not limited to archaeological applications. It can be applied to any set of data for which there is a fixed maximum number of classes, whether that be site occupancy models, commercial products (e.g., athletic shoes), or census information (e.g., nationality, religion, age, race).
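A minimal sketch of the Chao1 estimator named above, with a simple cap at a known maximum number of classes. The cap is only an illustration of the doubly-bounded idea; it is not the authors' interval construction:

```python
from collections import Counter

def chao1(abundances):
    """Chao1 lower-bound estimate of class richness from per-class
    abundances: S_obs + f1^2 / (2 * f2), falling back to the
    bias-corrected form f1 * (f1 - 1) / (2 * (f2 + 1)) when f2 = 0."""
    f = Counter(abundances)          # abundance value -> number of classes
    s_obs = len(abundances)
    f1, f2 = f.get(1, 0), f.get(2, 0)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2

def chao1_doubly_bounded(abundances, s_max):
    """Cap the estimate at a known maximum number of classes (a simple
    truncation, illustrating the doubly-bounded idea only)."""
    return min(chao1(abundances), s_max)

# Six observed classes: three singletons (f1 = 3) and one doubleton (f2 = 1)
abund = [1, 1, 1, 2, 5, 9]
est = chao1(abund)                       # 6 + 3^2 / (2 * 1) = 10.5
capped = chao1_doubly_bounded(abund, 8)  # truncated at the known maximum
```

When the raw Chao1 estimate exceeds the fixed maximum (here 10.5 vs. a maximum of 8), the singly-bounded estimate is empirically impossible, which is precisely the situation motivating the article.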


Subject(s)
Archaeology/statistics & numerical data , Biodiversity , Censuses , Models, Statistical , Shoes/economics , Statistics, Nonparametric
18.
Biometrics ; 59(4): 1113-22, 2003 Dec.
Article in English | MEDLINE | ID: mdl-14969492

ABSTRACT

We consider estimation problems in capture-recapture models when the covariates or the auxiliary variables are measured with errors. The naive approach, which ignores measurement errors, is found to be unacceptable in the estimation of both regression parameters and population size: it yields estimators with biases increasing with the magnitude of errors, and flawed confidence intervals. To account for measurement errors, we derive a regression parameter estimator using a regression calibration method. We develop modified estimators of the population size accordingly. A simulation study shows that the resulting estimators are more satisfactory than those from either the naive approach or the simulation extrapolation (SIMEX) method. Data from a bird species Prinia flaviventris in Hong Kong are analyzed with and without the assumption of measurement errors, to demonstrate the effects of errors on estimations.


Subject(s)
Biometry/methods , Models, Statistical , Analysis of Variance , Animals , Birds/classification , Computer Simulation , Hong Kong , Population , Probability , Reproducibility of Results , Sample Size , Species Specificity