Results 1 - 20 of 49
1.
Stat Med ; 43(14): 2695-2712, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-38606437

ABSTRACT

Our work was motivated by the question of whether, and to what extent, well-established risk factors mediate the racial disparity observed for colorectal cancer (CRC) incidence in the United States. Mediation analysis examines the relationships between an exposure, a mediator and an outcome. All available methods require access to a single complete data set with these three variables. However, because population-based studies usually include few non-White participants, these approaches have limited utility in answering our motivating question. Recently, we developed novel methods to integrate several data sets with incomplete information for mediation analysis. These methods have two limitations: (i) they only consider a single mediator and (ii) they require a data set containing individual-level data on the mediator and exposure (and possibly confounders) obtained by independent and identically distributed sampling from the target population. Here, we propose a new method for mediation analysis with several different data sets that accommodates complex survey and registry data, and allows for multiple mediators. The proposed approach yields unbiased causal effect estimates and confidence intervals with nominal coverage in simulations. We apply our method to data from U.S. cancer registries, a U.S.-population-representative survey and summary-level odds-ratio estimates, to rigorously evaluate what proportion of the difference in CRC risk between non-Hispanic Whites and Blacks is mediated by three potentially modifiable risk factors (CRC screening history, body mass index, and regular aspirin use).


Subjects
Colorectal Neoplasms, Mediation Analysis, Humans, Colorectal Neoplasms/ethnology, Colorectal Neoplasms/epidemiology, United States/epidemiology, Risk Factors, Computer Simulation, Aspirin/therapeutic use, Incidence, Registries, Health Status Disparities, White People/statistics & numerical data, Female, Black or African American/statistics & numerical data, Information Sources
2.
Stat Med ; 42(11): 1641-1668, 2023 05 20.
Article in English | MEDLINE | ID: mdl-37183765

ABSTRACT

Design-based analysis, which accounts for the design features of the study, is commonly used to conduct data analysis in studies with complex survey sampling, such as the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). In this type of longitudinal study, attrition has often been a problem. Although various statistical approaches have been proposed to handle attrition, such as inverse probability weighting (IPW), non-response cell weighting (NRCW), multiple imputation (MI), and the full information maximum likelihood (FIML) approach, there has not been a systematic assessment of these methods to compare their performance in design-based analyses. In this article, we perform extensive simulation studies and compare the performance of different missing data methods in linear and generalized linear population models, and under different missing data mechanisms. We find that the design-based analysis is able to produce valid estimation and statistical inference when the missing data are handled appropriately using the IPW, NRCW, MI, or FIML approach under a missing-completely-at-random or missing-at-random mechanism and when the missingness model is correctly specified or over-specified. We also illustrate the use of these methods using data from HCHS/SOL.


Subjects
Statistical Models, Humans, Longitudinal Studies, Follow-Up Studies, Computer Simulation, Probability, Linear Models
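Illustrative sketch for entry 2. The abstract names inverse probability weighting (IPW) as one way to handle attrition in a design-based analysis. A minimal sketch of the general idea, assuming each respondent's response probability is already known or estimated, is to divide the design weight by that probability before computing a weighted mean. The function name, inputs, and numbers below are hypothetical and are not taken from the article.

```python
def ipw_weighted_mean(values, design_weights, response_probs):
    """Weighted mean among respondents, with design weights inflated by
    the inverse of each respondent's (assumed known) response probability."""
    if not (len(values) == len(design_weights) == len(response_probs)):
        raise ValueError("inputs must have equal length")
    if any(p <= 0.0 or p > 1.0 for p in response_probs):
        raise ValueError("response probabilities must lie in (0, 1]")
    # Attrition-adjusted weight: design weight divided by response probability.
    adjusted = [w / p for w, p in zip(design_weights, response_probs)]
    return sum(a * y for a, y in zip(adjusted, values)) / sum(adjusted)


# Hypothetical data: two respondents with equal design weights; the first
# had only a 50% chance of remaining in the study, so it counts double.
estimate = ipw_weighted_mean([1.0, 3.0], [10.0, 10.0], [0.5, 1.0])
print(round(estimate, 4))
```

In practice the response probabilities would come from a fitted missingness model, which this sketch does not attempt.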
3.
Stat Med ; 42(11): 1822-1867, 2023 05 20.
Article in English | MEDLINE | ID: mdl-36866590

ABSTRACT

There are established methods for estimating disease prevalence with associated confidence intervals for complex surveys with perfect assays, or simple random sample surveys with imperfect assays. We develop and study methods for the complicated case of complex surveys with imperfect assays. The new methods use the melding method to combine gamma intervals for directly standardized rates and established adjustments for imperfect assays by estimating sensitivity and specificity. One of the new methods appears to have at least nominal coverage in all simulated scenarios. We compare our new methods to established methods in special cases (complex surveys with perfect assays or simple surveys with imperfect assays). In some simulations, our methods appear to guarantee coverage, while competing methods have much lower than nominal coverage, especially when overall prevalence is very low. In other settings, our methods are shown to have higher than nominal coverage. We apply our method to a seroprevalence survey of SARS-CoV-2 in undiagnosed adults in the United States between May and July 2020.


Subjects
COVID-19, SARS-CoV-2, Adult, Humans, COVID-19/epidemiology, Prevalence, Seroepidemiologic Studies, Confidence Intervals
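Illustrative sketch for entry 3. The abstract refers to established adjustments for imperfect assays based on sensitivity and specificity. One widely used point adjustment of this kind is the Rogan-Gladen correction; whether the article uses exactly this form is an assumption here, and the sketch covers only the point estimate, not the melded gamma intervals the authors develop. All names and numbers are hypothetical.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Adjust an apparent (test-positive) prevalence for an imperfect assay.

    Uses (p + Sp - 1) / (Se + Sp - 1), truncated to the interval [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("assay must satisfy sensitivity + specificity > 1")
    adjusted = (apparent_prevalence + specificity - 1.0) / denom
    # Truncate because sampling noise can push the raw value outside [0, 1].
    return min(1.0, max(0.0, adjusted))


# Hypothetical inputs: 5% test-positive, 90% sensitivity, 98% specificity.
adjusted_prevalence = rogan_gladen(0.05, 0.90, 0.98)
print(round(adjusted_prevalence, 4))
```

When the apparent prevalence is below the false-positive rate (1 - specificity), the raw adjustment is negative and the sketch truncates it to zero, which is one reason interval estimation is delicate at very low prevalence.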
4.
Biom J ; 65(2): e2200035, 2023 02.
Article in English | MEDLINE | ID: mdl-36136044

ABSTRACT

Web surveys have replaced face-to-face and computer-assisted telephone interviewing (CATI) as the main mode of data collection in most countries. This trend was reinforced as a consequence of COVID-19 pandemic-related restrictions. However, this mode still faces significant limitations in obtaining probability-based samples of the general population. For this reason, most web surveys rely on nonprobability survey designs. Whereas probability-based designs continue to be the gold standard in survey sampling, nonprobability web surveys may still prove useful in some situations. For instance, when small subpopulations are the group under study and probability sampling is unlikely to meet sample size requirements, complementing a small probability sample with a larger nonprobability one may improve the efficiency of the estimates. Nonprobability samples may also be designed as a means of compensating for known biases in probability-based web survey samples by purposely targeting respondent profiles that tend to be underrepresented in these surveys. This is the case in the Survey on the impact of the COVID-19 pandemic in Spain (ESPACOV) that motivates this paper. In this paper, we propose a methodology for combining probability and nonprobability web-based survey samples with the help of machine-learning techniques. We then assess the efficiency of the resulting estimates by comparing them with other strategies that have been used before. Our simulation study and the application of the proposed estimation method to the second wave of the ESPACOV Survey allow us to conclude that this is the best option for reducing the biases observed in our data.


Subjects
COVID-19, Humans, COVID-19/epidemiology, Spain/epidemiology, Pandemics, Surveys and Questionnaires, Probability, Machine Learning
5.
Biometrics ; 78(1): 227-237, 2022 03.
Article in English | MEDLINE | ID: mdl-33247943

ABSTRACT

Imputation is a popular technique for handling item nonresponse. Parametric imputation is based on a parametric model for imputation and is not robust against the failure of the imputation model. Nonparametric imputation is fully robust but is not applicable when the dimension of covariates is large due to the curse of dimensionality. Semiparametric imputation is another robust imputation based on a flexible model where the number of model parameters can increase with the sample size. In this paper, we propose a new semiparametric imputation based on a more flexible model assumption than the Gaussian mixture model. In the proposed mixture model, we assume a conditional Gaussian model for the study variable given the auxiliary variables, but the marginal distribution of the auxiliary variables is not necessarily Gaussian. The proposed mixture model is more flexible and achieves a better approximation than Gaussian mixture models. The proposed method is applicable to high-dimensional covariate problems by including a penalty function in the conditional log-likelihood function. The proposed method is applied to the 2017 Korean Household Income and Expenditure Survey conducted by Statistics Korea.


Subjects
Statistical Models, Computer Simulation, Statistical Data Interpretation, Likelihood Functions, Sample Size
6.
Demography ; 59(3): 995-1022, 2022 06 01.
Article in English | MEDLINE | ID: mdl-35466383

ABSTRACT

We test the effectiveness of a link-tracing sampling approach, network sampling with memory (NSM), to recruit samples of rare immigrant populations, with an application among Chinese immigrants in the Raleigh-Durham area of North Carolina. NSM uses the population network revealed by data from the survey to improve the efficiency of link-tracing sampling and has been shown to substantially reduce design effects in simulated sampling. Our goals are to (1) show that it is possible to recruit a probability sample of a locally rare immigrant group using NSM and achieve high response rates; (2) demonstrate the feasibility of the collection and benefits of new forms of network data that transcend kinship networks in existing surveys and can address unresolved questions about the role of social networks in migration decisions, the maintenance of transnationalism, and the process of social incorporation; and (3) test the accuracy of the NSM approach for recruiting immigrant samples by comparison with the American Community Survey. Our results indicate feasibility, high performance, cost-effectiveness, and accuracy of the NSM approach to sample immigrants for studies of local immigrant communities. This approach can also be extended to recruit multisite samples of immigrants at origin and destination.


Subjects
Emigration and Immigration, Transients and Migrants, Demography, Humans, Population Dynamics, Social Networking
7.
Stat Med ; 40(24): 5237-5250, 2021 10 30.
Article in English | MEDLINE | ID: mdl-34219260

ABSTRACT

Many epidemiologic studies forgo probability sampling and turn to nonprobability volunteer-based samples because of cost, response burden, and invasiveness of biological samples. However, finite population (FP) inference is difficult to make from the nonprobability sample due to the lack of population representativeness. To make inferences at the population level using nonprobability samples, various inverse propensity score weighting methods have been studied with the propensity defined by the participation rate of population units in the nonprobability sample. In this article, we propose an adjusted logistic propensity weighting (ALP) method to estimate the participation rates for nonprobability sample units. The proposed ALP method is easy to implement with ready-to-use software while producing approximately unbiased estimators for population quantities regardless of the nonprobability sample rate. The efficiency of the ALP estimator can be further improved by scaling the survey sample weights in propensity estimation. Taylor linearization variance estimators are proposed for ALP estimators of FP means that account for all sources of variability. The proposed ALP methods are evaluated numerically via simulation studies and empirically using the naïve unweighted National Health and Nutrition Examination Survey III sample, while taking the 1997 National Health Interview Survey as the reference, to estimate the 15-year mortality rates.


Subjects
Research Design, Volunteers, Computer Simulation, Humans, Nutrition Surveys, Propensity Score
8.
BMC Public Health ; 21(1): 1414, 2021 07 17.
Article in English | MEDLINE | ID: mdl-34273940

ABSTRACT

BACKGROUND: Sampling a small number of participants from an entire country is not straightforward. In this case, researchers reluctantly sample from a single setting or few settings, which limits the generalizability of findings. Therefore, there is a need to design an efficient sampling method for small sample size surveys that can produce generalizable results at the country level. METHODS: Data comprised twenty proxy variables to measure health services demands, structures, and outcomes of 413 districts of Iran. We used two data mining methods (hierarchical clustering method (HCM) and model-based clustering method (MCM)) to create homogeneous groups of districts, i.e., strata based on these variables. We compared the internal and stability validity of the methods by statistical indices. An expert group checked the face validity of the methods, particularly regarding the total number of strata and the combination of districts in each stratum. The efficiency of the selected method, which is measured by the inverse of variance, was compared with simple random sampling (SRS) through simulation. The sampling design was tested in a national study in Iran, which aimed to evaluate the quality and costs of medical care for eight selected diseases by only recruiting 300 participants per disease at the country level. RESULTS: MCM and HCM divided the districts into eight and two clusters, respectively. The measures of internal and stability validity showed that clusters created by MCM were more separated, compact, and stable, thus forming our optimum strata. The probability of death from stroke, chronic obstructive pulmonary disease, and in-hospital mortality rate were the most important indicators that distinguished the eight strata. Based on the simulation results, MCM increased the efficiency of the sampling design up to 1.7 times compared to SRS.
CONCLUSIONS: The use of data mining improved the efficiency of sampling up to 1.7 times greater than SRS and markedly reduced the number of strata to eight in the entire country. The proposed sampling design also identified key variables that could be used to classify districts in Iran for sampling from these target populations in future studies.


Subjects
Delivery of Health Care, Cluster Analysis, Humans, Iran, Reproducibility of Results, Sample Size
9.
Entropy (Basel) ; 23(3)2021 Mar 08.
Article in English | MEDLINE | ID: mdl-33800337

ABSTRACT

Statistical methods to produce inferences based on samples from finite populations have been available for at least 70 years. Topics such as Survey Sampling and Sampling Theory have become part of the mainstream of the statistical methodology. A wide variety of sampling schemes as well as estimators are now part of the statistical folklore. On the other hand, while the Bayesian approach is now a well-established paradigm with implications in almost every field of the statistical arena, there does not seem to exist a conventional procedure, able to deal with both continuous and discrete variables, that can be used as a kind of default for Bayesian survey sampling, even in the simple random sampling case. In this paper, the Bayesian analysis of samples from finite populations is discussed, its relationship with the notion of superpopulation is reviewed, and a nonparametric approach is proposed. Our proposal can produce inferences for population quantiles and similar quantities of interest in the same way as for population means and totals. Moreover, it can provide results relatively quickly, which may prove crucial in certain contexts such as the analysis of quick counts in electoral settings.

10.
Biometrics ; 76(1): 98-108, 2020 03.
Article in English | MEDLINE | ID: mdl-31444807

ABSTRACT

Identifiability of statistical models is a fundamental regularity condition that is required for valid statistical inference. Investigation of model identifiability is mathematically challenging for complex models such as latent class models. Jones et al. used Goodman's technique to investigate the identifiability of latent class models with applications to diagnostic tests in the absence of a gold standard test. The tool they used was based on examining the singularity of the Jacobian or the Fisher information matrix, in order to obtain insights into local identifiability (i.e., there exists a neighborhood of a parameter such that no other parameter in the neighborhood leads to the same probability distribution as the parameter). In this paper, we investigate a stronger condition: global identifiability (i.e., no two parameters in the parameter space give rise to the same probability distribution), by introducing a powerful mathematical tool from computational algebra: the Gröbner basis. With several existing well-known examples, we argue that the Gröbner basis method is easy to implement and powerful to study global identifiability of latent class models, and is an attractive alternative to the information matrix analysis by Rothenberg and the Jacobian analysis by Goodman and Jones et al.


Subjects
Biometry/methods, Routine Diagnostic Tests/statistics & numerical data, Latent Class Analysis, Statistical Models, Algorithms, Bias, Computer Simulation, Routine Diagnostic Tests/standards, Humans, Reproducibility of Results
11.
Stat Sin ; 30(3): 1135-1154, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32581492

ABSTRACT

Data from a large number of covariates with known population totals are frequently observed in survey studies. These auxiliary variables contain valuable information that can be incorporated into estimation of the population total of a survey variable to improve the estimation precision. We consider the generalized regression estimator formulated under the model-assisted framework in which a regression model is utilized to make use of the available covariates while the estimator still has basic design-based properties. The generalized regression estimator has been shown to improve the efficiency of the design-based Horvitz-Thompson estimator when the number of covariates is fixed. In this study, we investigate the performance of the generalized regression estimator when the number of covariates p is allowed to diverge as the sample size n increases. We examine two approaches where the model parameter is estimated using the weighted least squares method when p < n and the LASSO method when the model parameter is sparse. We show that under an assisted model and certain conditions on the joint distribution of the covariates as well as the divergence rates of n and p, the generalized regression estimator is asymptotically more efficient than the Horvitz-Thompson estimator, and is robust against model misspecification. We also study the consistency of variance estimation for the generalized regression estimator. Our theoretical results are corroborated by simulation studies and an example.
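
Illustrative sketch for entry 11. The abstract contrasts the Horvitz-Thompson estimator with the generalized regression (GREG) estimator, which uses known population totals of auxiliary variables. A minimal sketch, assuming a single covariate and a no-intercept working model fitted by design-weighted least squares (a simplification; the article studies many covariates and a LASSO variant), might look as follows. All names and numbers are hypothetical.

```python
def horvitz_thompson_total(values, inclusion_probs):
    """Horvitz-Thompson estimate of a population total: sum of y_i / pi_i."""
    return sum(y / pi for y, pi in zip(values, inclusion_probs))


def greg_total(y, x, inclusion_probs, x_population_total):
    """GREG estimate of the total of y with one covariate x (no intercept).

    The slope is fitted by least squares with design weights d_i = 1 / pi_i,
    and the HT estimate is corrected by slope * (known X total - HT X total).
    """
    d = [1.0 / pi for pi in inclusion_probs]
    slope = (sum(di * xi * yi for di, xi, yi in zip(d, x, y))
             / sum(di * xi * xi for di, xi in zip(d, x)))
    ht_y = horvitz_thompson_total(y, inclusion_probs)
    ht_x = horvitz_thompson_total(x, inclusion_probs)
    return ht_y + slope * (x_population_total - ht_x)


# Hypothetical sample in which y is exactly twice x; the known population
# total of x is assumed to be 20.
x_sample = [1.0, 2.0, 3.0]
y_sample = [2.0, 4.0, 6.0]
pi_sample = [0.5, 0.25, 0.5]
ht_estimate = horvitz_thompson_total(y_sample, pi_sample)
greg_estimate = greg_total(y_sample, x_sample, pi_sample, 20.0)
print(ht_estimate, greg_estimate)
```

Because the working model fits this toy sample perfectly, the GREG estimate equals twice the known covariate total whatever sample is drawn, while the Horvitz-Thompson estimate still varies with the sample. This is the intuition behind the efficiency gain the abstract describes.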

12.
Stat Med ; 38(29): 5528-5546, 2019 12 20.
Article in English | MEDLINE | ID: mdl-31657494

ABSTRACT

This paper demonstrates the flexibility of a general approach for the analysis of discrete time competing risks data that can accommodate complex data structures, different time scales for different causes, and nonstandard sampling schemes. The data may involve a single data source where all individuals contribute to analyses of both cause-specific hazard functions, overlapping datasets where some individuals contribute to the analysis of the cause-specific hazard function of only one cause while other individuals contribute to analyses of both cause-specific hazard functions, or separate data sources where each individual contributes to the analysis of the cause-specific hazard function of only a single cause. The approach is modularized into estimation and prediction. For the estimation step, the parameters and the variance-covariance matrix can be estimated using widely available software. The prediction step utilizes a generic program with plug-in estimates from the estimation step. The approach is illustrated with three prognostic models for stage IV male oral cancer using different data structures. The first model uses only men with stage IV oral cancer from population-based registry data. The second model strategically extends the cohort to improve the efficiency of the estimates. The third model improves the accuracy for those with a lower risk of other causes of death, by bringing in an independent data source collected under a complex sampling design with additional other-cause covariates. These analyses represent novel extensions of existing methodology, broadly applicable for the development of prognostic models capturing both the cancer and noncancer aspects of a patient's health.


Subjects
Registries/statistics & numerical data, Risk Assessment/statistics & numerical data, Aged, Aged 80 and Over, Biostatistics, Data Analysis, Humans, Incidence, Information Storage and Retrieval/statistics & numerical data, Male, Statistical Models, Mouth Neoplasms/etiology, Mouth Neoplasms/mortality, Mouth Neoplasms/pathology, Multivariate Analysis, Prognosis, Proportional Hazards Models, Regression Analysis, Survival Analysis
13.
Ecol Appl ; 28(6): 1616-1625, 2018 09.
Article in English | MEDLINE | ID: mdl-29802750

ABSTRACT

Statistical models supporting inferences about species occurrence patterns in relation to environmental gradients are fundamental to ecology and conservation biology. A common implicit assumption is that the sampling design is ignorable and does not need to be formally accounted for in analyses. The analyst assumes data are representative of the desired population and statistical modeling proceeds. However, if data sets from probability and non-probability surveys are combined or unequal selection probabilities are used, the design may be non-ignorable. We outline the use of pseudo-maximum likelihood estimation for site-occupancy models to account for such non-ignorable survey designs. This estimation method accounts for the survey design by properly weighting the pseudo-likelihood equation. In our empirical example, legacy and newer randomly selected locations were surveyed for bats to bridge a historic statewide effort with an ongoing nationwide program. We provide a worked example using bat acoustic detection/non-detection data and show how analysts can diagnose whether their design is ignorable. Using simulations we assessed whether our approach is viable for modeling data sets composed of sites contributed outside of a probability design. Pseudo-maximum likelihood estimates differed from the usual maximum likelihood occupancy estimates for some bat species. Using simulations we show the maximum likelihood estimator of species-environment relationships with non-ignorable sampling designs was biased, whereas the pseudo-likelihood estimator was design unbiased. However, in our simulation study the designs composed of a large proportion of legacy or non-probability sites resulted in estimation issues for standard errors. These issues were likely a result of highly variable weights confounded by small sample sizes (5% or 10% sampling intensity and four revisits). 
Aggregating data sets from multiple sources logically supports larger sample sizes and potentially increases spatial extents for statistical inferences. Our results suggest that ignoring the mechanism for how locations were selected for data collection (e.g., the sampling design) could result in erroneous model-based conclusions. Therefore, in order to ensure robust and defensible recommendations for evidence-based conservation decision-making, the survey design information in addition to the data themselves must be available for analysts. Details for constructing the weights used in estimation and code for implementation are provided.


Subjects
Ecology/methods, Statistical Models, Animals, Chiroptera
15.
Lifetime Data Anal ; 23(1): 113-135, 2017 01.
Article in English | MEDLINE | ID: mdl-27647436

ABSTRACT

The missing response problem is ubiquitous in survey sampling, medical, social science and epidemiology studies. It is well known that non-ignorable missingness, where the missingness of a response depends on its own value, is the most difficult missing data problem. In the statistical literature, unlike the ignorable missing data problem, not many papers on non-ignorable missing data are available except for the fully parametric model-based approach. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen (1988)'s empirical likelihood method we can obtain the constrained maximum empirical likelihood estimators of the parameters in the missing probability and the mean response, which are shown to be asymptotically normal. Moreover, the likelihood ratio statistic can be used to test whether the missingness of the responses is non-ignorable or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of real AIDS trial data shows that the missingness of CD4 counts at around two years is non-ignorable and the sample mean based on observed data only is biased.


Subjects
Likelihood Functions, Statistical Models, Research Design, Acquired Immunodeficiency Syndrome/therapy, Biometry, CD4 Lymphocyte Count, Clinical Trials as Topic, Statistical Data Interpretation, Humans
16.
Stat Med ; 35(18): 3213-28, 2016 08 15.
Article in English | MEDLINE | ID: mdl-26910486

ABSTRACT

Countermatching designs can provide more efficient estimates than simple matching or case-cohort designs in certain situations such as when good surrogate variables for an exposure of interest are available. We extend pseudolikelihood estimation for the Cox model under countermatching designs to models where time-varying covariates are considered. We also implement pseudolikelihood with calibrated weights to improve efficiency in nested case-control designs in the presence of time-varying variables. A simulation study is carried out, which considers four different scenarios including a binary time-dependent variable, a continuous time-dependent variable, and the case including interactions in each. Simulation results show that pseudolikelihood with calibrated weights under countermatching offers large gains in efficiency if compared to case-cohort. Pseudolikelihood with calibrated weights yielded more efficient estimators than pseudolikelihood estimators. Additionally, estimators were more efficient under countermatching than under case-cohort for the situations considered. The methods are illustrated using the Colorado Plateau uranium miners cohort. Furthermore, we present a general method to generate survival times with time-varying covariates. Copyright © 2016 John Wiley & Sons, Ltd.


Subjects
Cohort Studies, Case-Control Studies, Mining, Statistical Models, Occupational Exposure, Uranium
17.
BMC Med Res Methodol ; 16(1): 155, 2016 11 15.
Article in English | MEDLINE | ID: mdl-27842500

ABSTRACT

BACKGROUND: The Behavioral Risk Factor Surveillance System (BRFSS) is a network of health-related telephone surveys, conducted by all 50 states, the District of Columbia, and participating US territories, that receive technical assistance from CDC. Data users often aggregate BRFSS state samples for national estimates without accounting for state-level sampling, a practice that could introduce bias because the weighted distributions of the state samples do not always adhere to national demographic distributions. METHODS: This article examines six methods of reweighting, which are then compared with key health indicator estimates from the National Health Interview Survey (NHIS) based on 2013 data. RESULTS: Compared to the usual stacking approach, all of the six new methods reduce the variance of weights and design effect at the national level, and some also reduce the estimated bias. This article also provides a comparison of the methods based on the variances induced by unequal weighting as well as the bias reduction induced by raking at the national level, and recommends a preferred method. CONCLUSIONS: The new method leads to weighted distributions that more accurately reproduce national demographic characteristics. While the empirical results for key estimates were limited to a few health indicators, they also suggest reduction in potential bias and mean squared error. To the extent that survey outcomes are associated with these demographic characteristics, matching the national distributions will reduce bias in estimates of these outcomes at the national level.


Subjects
Behavioral Risk Factor Surveillance System, Data Mining/methods, Health Behavior, Health Status Indicators, Adult, Data Mining/statistics & numerical data, Female, Humans, Information Dissemination/methods, Male, Reproducibility of Results, Telephone
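Illustrative sketch for entry 17. The abstract compares reweighting methods partly by the variance induced by unequal weighting. A common summary of that variance inflation is Kish's approximate design effect, n * sum(w^2) / (sum(w))^2; whether the article uses exactly this measure is an assumption here. The function name and weights below are hypothetical.

```python
def kish_design_effect(weights):
    """Kish's approximate design effect due to unequal weighting.

    Equals 1.0 when all weights are equal and grows as weights become
    more variable (it is 1 plus the squared coefficient of variation).
    """
    n = len(weights)
    if n == 0:
        raise ValueError("at least one weight is required")
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


# Hypothetical weights: equal weights give no inflation; mixing weights of
# 1 and 3 inflates the variance by a quarter under this approximation.
deff_equal = kish_design_effect([2.0, 2.0, 2.0, 2.0])
deff_unequal = kish_design_effect([1.0, 3.0])
print(deff_equal, deff_unequal)
```

Dividing the nominal sample size by this quantity gives an effective sample size, a convenient way to compare candidate weighting schemes.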
18.
Soc Psychiatry Psychiatr Epidemiol ; 51(11): 1547-1557, 2016 11.
Article in English | MEDLINE | ID: mdl-27803977

ABSTRACT

The China Mental Health Survey (CMHS), which was carried out from July 2013 to March 2015, was the first nationally representative community survey of mental disorders and mental health services in China using computer-assisted personal interviews (CAPI). Face-to-face interviews were completed in the homes of respondents who were selected by a nationally representative multi-stage disproportionate stratified sampling procedure. Sample selection was integrated with the National Chronic Disease and Risk Factor Surveillance Survey administered by the National Centre for Chronic and Non-communicable Disease Control and Prevention in 2013, which made it possible to obtain both physical and mental health information on the Chinese community population. A one-stage design of data collection was used in the CMHS to obtain information on mental disorders, including mood disorders, anxiety disorders, and substance use disorders, while a two-stage design was applied for schizophrenia and other psychotic disorders, and dementia. A total of 28,140 respondents completed the survey, for an overall response rate of 72.9%. This paper describes the survey mode, fieldwork organization, procedures, and the sample design and weighting of the CMHS. Detailed information is presented on the establishment of a new payment scheme for interviewers, results of the quality control in both stages, and evaluations of the weighting.


Subjects
Health Surveys, Mental Disorders/epidemiology, Mental Health Services, Mental Health, Adult, China, Female, Humans, Middle Aged, Research Design
19.
Biometrics ; 71(1): 258-266, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25585794

ABSTRACT

The study of hard-to-reach populations presents significant challenges. Typically, a sampling frame is not available, and population members are difficult to identify or recruit from broader sampling frames. This is especially true of populations at high risk for HIV/AIDS. Respondent-driven sampling (RDS) is often used in such settings with the primary goal of estimating the prevalence of infection. In such populations, the number of people at risk for infection and the number of people infected are of fundamental importance. This article presents a case-study of the estimation of the size of the hard-to-reach population based on data collected through RDS. We study two populations of female sex workers and men-who-have-sex-with-men in El Salvador. The approach is Bayesian and we consider different forms of prior information, including using the UNAIDS population size guidelines for this region. We show that the method is able to quantify the amount of information on population size available in RDS samples. As separate validation, we compare our results to those estimated by extrapolating from a capture-recapture study of El Salvadorian cities. The results of our case-study are largely comparable to those of the capture-recapture study when they differ from the UNAIDS guidelines. Our method is widely applicable to data from RDS studies and we provide a software package to facilitate this.


Subjects
Statistical Data Interpretation, HIV Infections/epidemiology, Male Homosexuality/statistics & numerical data, Statistical Models, Risk Assessment/methods, Urban Population/statistics & numerical data, Computer Simulation, El Salvador/epidemiology, Epidemiologic Methods, Humans, Male, Prevalence, Reproducibility of Results, Sample Size, Sensitivity and Specificity
20.
Stat Med ; 34(8): 1293-303, 2015 Apr 15.
Article in English | MEDLINE | ID: mdl-25546290

ABSTRACT

The receiver operating characteristic (ROC) curve can be utilized to evaluate the performance of diagnostic tests. The area under the ROC curve (AUC) is a widely used summary index for comparing multiple ROC curves. Both parametric and nonparametric methods have been developed to estimate and compare the AUCs. However, these methods are usually only applicable to data collected from simple random samples and not surveys and epidemiologic studies that use complex sample designs such as stratified and/or multistage cluster sampling with sample weighting. Such complex samples can inflate variances from intra-cluster correlation and alter the expectations of test statistics because of the use of sample weights that account for differential sampling rates. In this paper, we modify the nonparametric method to incorporate sampling weights to estimate the AUC and employ leaving-one-out jackknife methods along with the balanced repeated replication method to account for the effects of the complex sampling in the variance estimation of our proposed estimators of the AUC. The finite sample properties of our methods are evaluated using simulations, and our methods are illustrated by comparing the estimated AUC for predicting overweight/obesity using different measures of body weight and adiposity among sampled children and adults in the US Hispanic Health and Nutrition Examination Survey.


Subjects
Adiposity, Area Under Curve, Body Mass Index, Obesity/diagnosis, Adolescent, Adult, Aged, Analysis of Variance, Bias, Child, Preschool Child, Computer Simulation, Female, Hispanic or Latino/statistics & numerical data, Humans, Male, Middle Aged, Statistical Models, Monte Carlo Method, Nutrition Surveys, Predictive Value of Tests, ROC Curve, Sampling Studies, Nonparametric Statistics, Young Adult
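Illustrative sketch for entry 20. The abstract describes modifying the nonparametric AUC estimator to incorporate sampling weights. One natural form, assumed here rather than taken from the article, weights each case-control pair by the product of the two sampling weights in the usual Mann-Whitney comparison. Variance estimation by jackknife or balanced repeated replication is not shown. All names and numbers are hypothetical.

```python
def weighted_auc(case_scores, case_weights, control_scores, control_weights):
    """Weighted Mann-Whitney estimate of the area under the ROC curve.

    Each case-control pair contributes the product of its sampling weights;
    concordant pairs count fully and tied pairs count half.
    """
    numerator = 0.0
    denominator = 0.0
    for s_case, w_case in zip(case_scores, case_weights):
        for s_ctrl, w_ctrl in zip(control_scores, control_weights):
            pair_weight = w_case * w_ctrl
            denominator += pair_weight
            if s_case > s_ctrl:
                numerator += pair_weight
            elif s_case == s_ctrl:
                numerator += 0.5 * pair_weight
    if denominator == 0.0:
        raise ValueError("need at least one case and one control with weight")
    return numerator / denominator


# Hypothetical data: the low-scoring case carries twice the sampling weight,
# which pulls the weighted AUC below the unweighted value.
auc_weighted = weighted_auc([0.9, 0.4], [1.0, 2.0], [0.5, 0.1], [1.0, 1.0])
auc_unweighted = weighted_auc([0.9, 0.4], [1.0, 1.0], [0.5, 0.1], [1.0, 1.0])
print(round(auc_weighted, 4), auc_unweighted)
```

The double loop is quadratic in the sample size, which is acceptable for a sketch but would need a rank-based formulation for large surveys.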