ABSTRACT
HIV estimation using data from the Demographic and Health Surveys (DHS) is limited by non-response and test refusals. Conventional adjustments such as imputation require the data to be missing at random. Methods that use instrumental variables allow prevalence to differ between respondents and non-respondents, but their performance depends critically on the validity of the instrument. Using Manski's partial identification approach, we form instrumental variable bounds for HIV prevalence from a pool of candidate instruments. Our method does not require all candidate instruments to be valid. We use a simulation study to evaluate and compare our method against its competitors. We illustrate the proposed method using DHS data from Zambia, Malawi and Kenya. Our simulations show that imputation leads to seriously biased results even under mild violations of the missing-at-random assumption. Worst case identification bounds that make no assumptions about the non-response mechanism are robust but not informative. Taking the union of instrumental variable bounds balances informativeness of the bounds and robustness to inclusion of some invalid instruments. Non-response and refusals are ubiquitous in population based HIV data such as those collected under the DHS. Partial identification bounds provide a robust solution to HIV prevalence estimation without strong assumptions. Union bounds are significantly more informative than the worst case bounds without sacrificing credibility.
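The three kinds of bounds named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the input layout (one prevalence and one response rate per instrument level), the use of a convex hull for the union, and the rule that discards instruments whose intersection is empty are all assumptions made here for illustration.

```python
from typing import List, Tuple

Bounds = Tuple[float, float]


def worst_case_bounds(p_pos_given_tested: float, p_tested: float) -> Bounds:
    """Worst-case bounds on prevalence with no assumption about non-respondents."""
    lower = p_pos_given_tested * p_tested   # every non-respondent is HIV negative
    upper = lower + (1.0 - p_tested)        # every non-respondent is HIV positive
    return lower, upper


def iv_bounds(strata: List[Tuple[float, float]]) -> Bounds:
    """Bounds from one candidate instrument.

    Each stratum is (prevalence among the tested, response rate) at one level
    of the instrument. If prevalence does not vary with the instrument, it must
    lie in the worst-case interval at every level, hence in their intersection.
    """
    per_level = [worst_case_bounds(p, r) for p, r in strata]
    return max(b[0] for b in per_level), min(b[1] for b in per_level)


def union_bounds(candidates: List[Bounds]) -> Bounds:
    """Convex hull of the IV bounds over a pool of candidate instruments.

    Assumption of this sketch: an instrument whose intersection is empty
    (lower > upper) is treated as refuted and dropped.
    """
    kept = [b for b in candidates if b[0] <= b[1]]
    if not kept:
        raise ValueError("every candidate instrument produced an empty interval")
    return min(b[0] for b in kept), max(b[1] for b in kept)


# Hypothetical numbers, for illustration only.
instrument_a = iv_bounds([(0.15, 0.80), (0.12, 0.90)])
instrument_b = iv_bounds([(0.14, 0.85), (0.13, 0.70)])
pooled = union_bounds([instrument_a, instrument_b])
overall = worst_case_bounds(0.14, 0.80)
```

With these made-up inputs the pooled interval is narrower than the overall worst-case interval yet contains the interval from each instrument, which is the trade-off between informativeness and robustness that the abstract describes.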
Subjects
Computer Simulation , HIV Infections , Health Surveys , Humans , HIV Infections/epidemiology , Kenya/epidemiology , Prevalence , Malawi/epidemiology , Statistical Models , Zambia/epidemiology , Male , Female , Bias , Statistical Data Interpretation
ABSTRACT
We use data from the National Longitudinal Study of Adolescent to Adult Health to investigate whether the quality of tertiary education (measured by college selectivity) causally affects obesity prevalence in the medium run (by age 24-34) and in the longer run (about 10 years later). We use partial identification methods, which allow us, while relying on weak assumptions, to overcome the potential endogeneity of college selectivity as well as the potential violation of the stable unit treatment value assumption due to students interacting with each other, and to obtain informative identification regions for the average treatment effect of college selectivity on obesity. We find that attending a more selective college causally reduces obesity, both in the medium and in the longer run. We provide evidence that the mechanisms through which the impact of college selectivity on obesity operates include an increase in income, a reduction in physical inactivity and in the consumption of fast food and sweetened drinks.
Subjects
Obesity , Humans , Obesity/prevention & control , Obesity/epidemiology , Male , Longitudinal Studies , Female , Adult , Universities , Adolescent , Young Adult , Students/statistics & numerical data , Prevalence , United States , Educational Status
ABSTRACT
Numerical simulations of the global climate system provide inputs to integrated assessment modeling for estimating the impacts of greenhouse gas mitigation and other policies to address global climate change. While essential tools for this purpose, computational climate models are subject to considerable uncertainty, including intermodel "structural" uncertainty. Structural uncertainty analysis has emphasized simple or weighted averaging of the outputs of multimodel ensembles, sometimes with subjective Bayesian assignment of probabilities across models. However, choosing appropriate weights is problematic. To use climate simulations in integrated assessment, we propose, instead, framing climate model uncertainty as a problem of partial identification, or "deep" uncertainty. This terminology refers to situations in which the underlying mechanisms, dynamics, or laws governing a system are not completely known and cannot be credibly modeled definitively even in the absence of data limitations in a statistical sense. We propose the min-max regret (MMR) decision criterion to account for deep climate uncertainty in integrated assessment without weighting climate model forecasts. We develop a theoretical framework for cost-benefit analysis of climate policy based on MMR, and apply it computationally with a simple integrated assessment model. We suggest avenues for further research.
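The min-max regret criterion mentioned in this abstract can be sketched in a few lines of Python. This is a generic illustration rather than the authors' integrated assessment model: the policy names, climate model labels, and net-benefit numbers are hypothetical, and the function name `minimax_regret` is introduced here.

```python
from typing import Dict, Tuple


def minimax_regret(
    welfare: Dict[str, Dict[str, float]]
) -> Tuple[str, Dict[str, float]]:
    """Choose the policy whose worst-case regret across models is smallest.

    welfare[policy][model] is the net benefit of a policy if that climate
    model is the correct one. No weights over models are needed.
    """
    policies = list(welfare)
    models = list(next(iter(welfare.values())))
    # Best achievable net benefit under each model.
    best = {m: max(welfare[p][m] for p in policies) for m in models}
    # Regret of a policy under a model is the shortfall from that best value.
    max_regret = {
        p: max(best[m] - welfare[p][m] for m in models) for p in policies
    }
    choice = min(max_regret, key=max_regret.get)
    return choice, max_regret


# Hypothetical net benefits of three abatement levels under three climate models.
welfare = {
    "low":    {"model_A": 10.0, "model_B": 4.0, "model_C": -6.0},
    "medium": {"model_A": 8.0,  "model_B": 6.0, "model_C": 2.0},
    "high":   {"model_A": 3.0,  "model_B": 5.0, "model_C": 6.0},
}
choice, max_regret = minimax_regret(welfare)
```

In this toy table the intermediate policy is selected because its largest shortfall against the best policy under any single model is the smallest of the three.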
ABSTRACT
Expanded access to health care often leads to new diagnoses for previously undetected conditions. New diagnoses make it difficult to identify the causal effect of expanding health insurance on individuals with particular diagnoses: the newly diagnosed in the treatment group are likely to differ in unobserved ways from the control group. This paper provides two methods for dealing with this problem depending on the data available to the researcher and diagnosis-specific knowledge. If there is no panel dimension to the data, then the causal effect for the subgroup of interest can be bounded from either above or below depending on the condition in question. If panel data are available, then the newly diagnosed can be identified, and their treated outcomes subtracted from the overall effect of interest. I apply these methods to find that the difference-in-discontinuities estimator underestimates the effect of Medicare prescription drug coverage on the uptake of insulin by first-time users by 20%.
Subjects
Health Insurance , Medicare , Aged , Humans , United States , Delivery of Health Care , Insurance Coverage
ABSTRACT
As a fundamental component of health care, disease screening is highly important. Oftentimes, two screening tests for a specific disease are compared in order to determine an optimal screening policy, for example, the digital rectal examination (DRE) and serum prostate specific antigen (PSA) level for screening prostate cancer. Ideally, if a gold standard test is given to each individual being screened to establish their true disease status, the difference in accuracy measures between two tests can be evaluated. In practice, however, it is common that only individuals who test positive on at least one screening test receive gold standard tests, which are often invasive and cannot be applied to those with negative results on both tests for ethical reasons. Under such circumstances, the differences in accuracy measures between two tests cannot be point estimated, so inference within this framework is challenging. In this article, using sensitivity and specificity as measures of test accuracy, we show that their difference between two tests is interval-identified, bounded by estimable sharp bounds. We establish the asymptotic normality of the estimators of the bounds and construct confidence intervals for the difference using methods for inference on partially identified parameters. The performance of the constructed confidence intervals for the difference and for its sharp bounds is evaluated via simulation studies. We also apply the proposed method to the prostate cancer example to compare the accuracy of DRE and PSA.
Subjects
Prostate-Specific Antigen , Prostatic Neoplasms , Digital Rectal Examination , Early Detection of Cancer , Humans , Male , Mass Screening/methods , Prostatic Neoplasms/diagnosis , Sensitivity and Specificity
ABSTRACT
The Special Supplemental Nutrition Program for Women, Infants, and Children (WIC) and the National School Lunch Program (NSLP) are designed to increase food security and reduce hunger for children from low-income households. Since the cutoff age for WIC is five, and school enrollment is required for receiving free or reduced-price NSLP, some children from low-income households cannot receive both WIC and free or reduced-price NSLP. Using data from the Current Population Survey, the partial identification method developed in this paper addresses the problems of self-selection into WIC and systematic underreporting of program participation. Due to this loophole in food assistance programs for children, aging out of WIC is found to increase child food insecurity by at least 1.1 percentage points. This result indicates that the prevalence of child food insecurity would decline by 15% if WIC extended its cutoff age until children enroll in kindergarten.
Subjects
Food Assistance , Aging , Child , Female , Food Insecurity , Food Supply , Humans , Infant , Lunch , Poverty
ABSTRACT
Understanding the relationship between disability and employment is critical and has long been the subject of study. However, estimating this relationship is difficult, particularly with survey data, since both disability and employment status are known to be misreported. Here, we use a partial identification approach to bound the joint distribution of disability and employment status in the presence of misclassification. Allowing for a modest amount of misclassification leads to bounds on the labor market status of the disabled that are not overly informative given the relative size of the disabled population. Thus, absent further assumptions, even a modest amount of misclassification creates much uncertainty about the employment gap between the non-disabled and disabled. However, additional assumptions considered are shown to have some identifying power. For example, under our most stringent assumptions, we find that the employment gap is at least 15.2% before the Great Recession and 22.0% afterward.
Subjects
Persons with Disabilities , Employment , Humans , Occupations , Surveys and Questionnaires
ABSTRACT
We propose a partial identification method for estimating disease prevalence from serology studies. Our data are results from antibody tests in some population sample, where the test parameters, such as the true/false positive rates, are unknown. Our method scans the entire parameter space, and rejects parameter values using the joint data density as the test statistic. The proposed method is conservative for marginal inference, in general, but its key advantage over more standard approaches is that it is valid in finite samples even when the underlying model is not point identified. Moreover, our method requires only independence of serology test results, and does not rely on asymptotic arguments, normality assumptions, or other approximations. We use recent Covid-19 serology studies in the US, and show that the parameter confidence set is generally wide, and cannot support definite conclusions. Specifically, recent serology studies from California suggest a prevalence anywhere in the range 0%-2% (at the time of study), and are therefore inconclusive. However, this range could be narrowed down to 0.7%-1.5% if the actual false positive rate of the antibody test was indeed near its empirical estimate (∼0.5%). In another study from New York state, Covid-19 prevalence is confidently estimated in the range 13%-17% in mid-April of 2020, which also suggests significant geographic variation in Covid-19 exposure across the US. Combining all datasets yields a 5%-8% prevalence range. Our results overall suggest that serology testing on a massive scale can give crucial information for future policy design, even when such tests are imperfect and their parameters unknown.
ABSTRACT
As a consequence of missing data on tests for infection and imperfect accuracy of tests, reported rates of cumulative population infection by the SARS-CoV-2 virus are lower than actual rates of infection. Hence, reported rates of severe illness conditional on infection are higher than actual rates. Understanding the time path of the COVID-19 pandemic has been hampered by the absence of bounds on infection rates that are credible and informative. This paper explains the logical problem of bounding these rates and reports illustrative findings, using data from Illinois, New York, and Italy. We combine the data with assumptions on the infection rate in the untested population and on the accuracy of the tests that appear credible in the current context. We find that the infection rate might be substantially higher than reported. We also find that, assuming accurate reporting of deaths, the infection fatality rates in Illinois, New York, and Italy are substantially lower than reported.
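The logic of bounding the cumulative infection rate can be sketched with the law of total probability. The specific assumptions below are illustrative choices made for this sketch and are not taken from the abstract: positive tests are always correct (`ppv = 1.0`), the negative predictive value lies in a stated range, and the untested are no more likely to be infected than the tested. All parameter names and numbers are hypothetical.

```python
from typing import Tuple


def infection_rate_bounds(
    p_tested: float,
    p_pos_given_tested: float,
    npv_range: Tuple[float, float] = (0.6, 0.9),
    ppv: float = 1.0,
) -> Tuple[float, float]:
    """Bounds on the cumulative infection rate in the whole population."""
    npv_lo, npv_hi = npv_range
    p_neg_given_tested = 1.0 - p_pos_given_tested
    # Infection rate among the tested: true positives plus false negatives.
    tested_lo = ppv * p_pos_given_tested + (1.0 - npv_hi) * p_neg_given_tested
    tested_hi = ppv * p_pos_given_tested + (1.0 - npv_lo) * p_neg_given_tested
    # Assumed here: the untested infection rate lies between 0 and the tested rate.
    lower = tested_lo * p_tested   # nobody untested is infected
    upper = tested_hi              # untested are infected at the tested rate
    return lower, upper


# Hypothetical inputs: 5% of the population tested, 20% of tests positive.
reported = 0.20 * 0.05
lower, upper = infection_rate_bounds(p_tested=0.05, p_pos_given_tested=0.20)
```

Even the lower bound exceeds the reported rate in this example, which mirrors the abstract's point that reported rates understate actual infection.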
ABSTRACT
For ordinal outcomes, the average treatment effect is often ill-defined and hard to interpret. Echoing Agresti and Kateri, we argue that the relative treatment effect can be a useful measure, especially for ordinal outcomes, which is defined as $\gamma = \mathrm{pr}\{Y_i(1) > Y_i(0)\} - \mathrm{pr}\{Y_i(1) < Y_i(0)\}$ [...]
Subjects
Statistical Models , Health Care Outcome Assessment/statistics & numerical data , Biometry , Cardia , Causality , Computer Simulation , Female , Humans , Male , Observational Studies as Topic/statistics & numerical data , Randomized Controlled Trials as Topic/statistics & numerical data , Rape/prevention & control , Stomach Neoplasms/mortality , Stomach Neoplasms/therapy , Treatment Outcome
ABSTRACT
In Mexico little is known about high-altitude glacial psychrotolerant or psychrophilic fungal species, with most glacial fungi isolated from polar environments or Alpine glaciers. It has been documented that some of these species may play an important role in bioremediation of contaminated environments with heavy metals. In the present study, 75 fungi were isolated from glaciers in Citlaltépetl (5675 masl) and Iztaccíhuatl (5286 masl) volcanoes. Combining morphological characteristics and molecular methods, based on ITS rDNA, 38 fungi were partially identified to genus level, 35 belonging to Ascomycota and three to Mucoromycota. The most abundant genera were Cladosporium, followed by Alternaria and Sordariomycetes order. All isolated fungi were psychrotolerant, pigmented and resistant to different concentrations of Cr(III) and Pb(II), while none tolerated Hg(II). Fungi most tolerant to Cr(III) and Pb(II) belong to the genera Stemphylium, Cladosporium and Penicillium and to a lesser extent Aureobasidium and Sordariomycetes. To our knowledge, this is the first report on cultivable mycobiota richness and their Cr and Pb tolerance. The results open new research possibilities about fungal diversity and heavy metals myco-remediation. Extremophilic fungal communities should be further investigated before global warming causes permanent changes and we miss the opportunity to describe these sites in Mexico.
Subjects
Ice Cover , Altitude , Environmental Biodegradation , Fungi , Mexico , Mycobiome
ABSTRACT
Accurate estimates of the cumulative incidence of SARS-CoV-2 infection remain elusive. Among the reasons for this are that tests for the virus are not randomly administered, and that the most commonly used tests can yield a substantial fraction of false negatives. In this article, we propose a simple and easy-to-use Bayesian model to estimate the infection rate, which is only partially identified. The model is based on the mapping from the fraction of positive test results to the cumulative infection rate, which depends on two unknown quantities: the probability of a false negative test result and a measure of testing bias towards the infected population. Accumulating evidence about SARS-CoV-2 can be incorporated into the model, which will lead to more precise inference about the infection rate.
ABSTRACT
In the absence of strong assumptions (e.g., exchangeability), only bounds for causal effects can be identified. Here we describe bounds for the risk difference for an effect of a binary exposure on a binary outcome in 4 common study settings: observational studies and randomized studies, each with and without simple random selection from the target population. Through these scenarios, we introduce randomizations for selection and treatment, and the widths of the bounds are narrowed from 2 (the width of the range of the risk difference) to 0 (point identification). We then assess the strength of the assumptions of exchangeability for internal and external validity by comparing their contributions to the widths of the bounds in the setting of an observational study without random selection from the target population. We find that when less than two-thirds of the target population is selected into the study, the assumption of exchangeability for external validity of the risk difference is stronger than that for internal validity. The relative strength of these assumptions should be considered when designing, analyzing, and interpreting observational studies and will aid in determining the best methods for estimating the causal effects of interest.
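As a worked illustration of one step in this narrowing, consider an observational study that observes the entire target population and makes no exchangeability assumption. The notation is introduced here and is not taken from the abstract: $A$ is the binary exposure, $Y$ the binary outcome, and $Y^a$ the potential outcome under exposure level $a$. By consistency, $\Pr(Y^1 = 1) = \Pr(Y = 1, A = 1) + \Pr(Y^1 = 1 \mid A = 0)\Pr(A = 0)$, where the counterfactual term is only known to lie in $[0, 1]$; the same argument applies to $Y^0$. Hence

$$
\Pr(Y=1, A=1) \le \Pr(Y^1 = 1) \le \Pr(Y=1, A=1) + \Pr(A=0),
$$

$$
\Pr(Y=1, A=0) \le \Pr(Y^0 = 1) \le \Pr(Y=1, A=0) + \Pr(A=1),
$$

$$
\Pr(Y=1, A=1) - \Pr(Y=1, A=0) - \Pr(A=1) \;\le\; \mathrm{RD} \;\le\; \Pr(Y=1, A=1) - \Pr(Y=1, A=0) + \Pr(A=0).
$$

The width of this interval is $\Pr(A=0) + \Pr(A=1) = 1$, halfway between the uninformative width of 2 and point identification.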
Subjects
Causality , Epidemiologic Methods , Statistical Models , Observational Studies as Topic/statistics & numerical data , Randomized Controlled Trials as Topic/statistics & numerical data , Humans , Research Design
ABSTRACT
When assessing association between a binary trait and some covariates, the binary response may be subject to unidirectional misclassification. Unidirectional misclassification can occur when revealing a particular level of the trait is associated with a type of cost, such as a social desirability or financial cost. The feasibility of addressing misclassification is commonly obscured by model identification issues. The current paper attempts to study the efficacy of inference when the binary response variable is subject to unidirectional misclassification. From a theoretical perspective, we demonstrate that the key model parameters possess identifiability, except for the case with a single binary covariate. From a practical standpoint, the logistic model with quantitative covariates can be weakly identified, in the sense that the Fisher information matrix may be near singular. This can make learning some parameters difficult under certain parameter settings, even with quite large samples. In other cases, the stronger identification enables the model to provide more effective adjustment for unidirectional misclassification. An extension to the Poisson approximation of the binomial model reveals the identifiability of the Poisson and zero-inflated Poisson models. For fully identified models, the proposed method adjusts for misclassification based on learning from data. For binary models where there is difficulty in identification, the method is useful for sensitivity analyses on the potential impact from unidirectional misclassification.
Subjects
Bayes Theorem , Bias , Regression Analysis , Computer Simulation , Humans , Statistical Models , Poisson Distribution
ABSTRACT
This paper introduces a model-based approach for measuring heterogeneity in sex preferences using birth history records. The approach identifies the combinations of preferences over the sex and number of children that best explain observed childbearing. Empirical estimates indicate that a majority of parents in Africa, Asia, and the Americas consider the sex of children when making childbearing decisions. Many parents prefer sons and many prefer daughters. Comparisons with reported preferences suggest that survey respondents tend to underreport the degree to which they prefer sons or daughters. Estimates indicate that, although sex preferences are widespread, they have little effect on aggregate fertility levels.
ABSTRACT
We evaluate the impact of dental insurance on the use of dental services using a potential outcomes identification framework designed to handle uncertainty created by unknown counterfactuals (that is, the endogenous selection problem) and uncertainty about the reliability of self-reported insurance status. Using data from the Health and Retirement Study, we estimate that utilization rates of adults older than 50 years would increase from 75% to around 80% under universal dental coverage.
Subjects
Dental Care/economics , Dental Care/statistics & numerical data , Insurance Coverage/statistics & numerical data , Dental Insurance/statistics & numerical data , Aged , Female , Humans , Male , Middle Aged , Econometric Models , Reproducibility of Results
ABSTRACT
The polychoric correlation is a popular measure of association for ordinal data. It estimates a latent correlation, i.e., the correlation of a latent vector. This vector is assumed to be bivariate normal, an assumption that cannot always be justified. When bivariate normality does not hold, the polychoric correlation will not necessarily approximate the true latent correlation, even when the observed variables have many categories. We calculate the sets of possible values of the latent correlation when latent bivariate normality is not necessarily true, but at least the latent marginals are known. The resulting sets are called partial identification sets, and are shown to shrink to the true latent correlation as the number of categories increases. Moreover, we investigate partial identification under the additional assumption that the latent copula is symmetric, and calculate the partial identification set when one variable is ordinal and another is continuous. We show that little can be said about latent correlations, unless we have impractically many categories or we know a great deal about the distribution of the latent vector. An open-source R package is available for applying our results.
Subjects
Psychometrics
ABSTRACT
Many partial identification problems can be characterized by the optimal value of a function over a set where both the function and set need to be estimated by empirical data. Despite some progress for convex problems, statistical inference in this general setting remains to be developed. To address this, we derive an asymptotically valid confidence interval for the optimal value through an appropriate relaxation of the estimated set. We then apply this general result to the problem of selection bias in population-based cohort studies. We show that existing sensitivity analyses, which are often conservative and difficult to implement, can be formulated in our framework and made significantly more informative via auxiliary information on the population. We conduct a simulation study to evaluate the finite sample performance of our inference procedure, and conclude with a substantive motivating example on the causal effect of education on income in the highly selected UK Biobank cohort. We demonstrate that our method can produce informative bounds using plausible population-level auxiliary constraints. We implement this method in the [Formula: see text] package [Formula: see text].
ABSTRACT
Capture-recapture (CRC) surveys are used to estimate the size of a population whose members cannot be enumerated directly. CRC surveys have been used to estimate the number of Coronavirus Disease 2019 (COVID-19) infections, people who use drugs, sex workers, conflict casualties, and trafficking victims. When k capture samples are obtained, counts of unit captures in subsets of samples are represented naturally by a 2^k contingency table in which one element (the number of individuals appearing in none of the samples) remains unobserved. In the absence of additional assumptions, the population size is not identifiable (i.e., not point identified). Stringent assumptions about the dependence between samples are often used to achieve point identification. However, real-world CRC surveys often use convenience samples in which the assumed dependence cannot be guaranteed, and population size estimates under these assumptions may lack empirical credibility. In this work, we apply the theory of partial identification to show that weak assumptions or qualitative knowledge about the nature of dependence between samples can be used to characterize a nontrivial confidence set for the true population size. We construct confidence sets under bounds on pairwise capture probabilities using two methods: test inversion bootstrap confidence intervals and profile likelihood confidence intervals. Simulation results demonstrate well-calibrated confidence sets for each method. In an extensive real-world study, we apply the new methodology to the problem of using heterogeneous survey data to estimate the number of people who inject drugs in Brussels, Belgium.
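The idea that weak dependence assumptions yield an interval rather than a point can be shown for the simplest case of two samples. The sketch below is a deliberately simplified stand-in, not the construction used in this work: it bounds the odds ratio between the two captures instead of pairwise capture probabilities, ignores sampling variability, and uses hypothetical names and counts.

```python
from typing import Tuple


def population_size_bounds(
    n11: int,
    n10: int,
    n01: int,
    odds_ratio_range: Tuple[float, float] = (0.5, 2.0),
) -> Tuple[float, float]:
    """Bounds on population size from a 2x2 capture table with one missing cell.

    n11: captured in both samples; n10: first sample only; n01: second only.
    The odds ratio theta = n11 * n00 / (n10 * n01) links the unobserved count
    n00 to the observed cells; theta = 1 is independence (Lincoln-Petersen).
    """
    if n11 <= 0:
        raise ValueError("n11 must be positive")
    theta_lo, theta_hi = odds_ratio_range
    observed = n11 + n10 + n01
    n00_under_independence = n10 * n01 / n11
    return (
        observed + theta_lo * n00_under_independence,
        observed + theta_hi * n00_under_independence,
    )


# Hypothetical counts, for illustration only.
lower, upper = population_size_bounds(n11=50, n10=150, n01=100)
point_under_independence = population_size_bounds(50, 150, 100, (1.0, 1.0))[0]
```

Setting both ends of the odds-ratio range to 1 recovers the usual point estimate under independence; widening the range trades precision for credibility.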