1.
Sci Adv ; 10(22): eadj0266, 2024 May 31.
Article in English | MEDLINE | ID: mdl-38820165

ABSTRACT

Selection bias poses a substantial challenge to valid statistical inference in nonprobability samples. This study compared estimates of the first-dose COVID-19 vaccination rates among Indian adults in 2021 from a large nonprobability sample, the COVID-19 Trends and Impact Survey (CTIS), and a small probability survey, the Center for Voting Options and Trends in Election Research (CVoter), against national benchmark data from the COVID Vaccine Intelligence Network. Notably, CTIS exhibits a larger estimation error on average (0.37) compared to CVoter (0.14). Additionally, we explored the accuracy (regarding mean squared error) of CTIS in estimating successive differences (over time) and subgroup differences (for females versus males) in mean vaccine uptakes. Compared to the overall vaccination rates, targeting these alternative estimands comparing differences or relative differences in two means increased the effective sample size. These results suggest that the Big Data Paradox can manifest in countries beyond the United States and may not apply equally to every estimand of interest.


Subject(s)
Big Data , COVID-19 Vaccines , COVID-19 , SARS-CoV-2 , Vaccination , Humans , COVID-19/epidemiology , COVID-19/prevention & control , COVID-19 Vaccines/administration & dosage , Female , Vaccination/statistics & numerical data , Male , SARS-CoV-2/immunology , Adult , Surveys and Questionnaires , India/epidemiology , Middle Aged
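The abstract above compares estimation error and effective sample size across surveys. As a rough illustration of how the two quantities relate, the sketch below computes the size of a simple random sample that would be expected to give the same squared error as an observed estimate of a proportion. This is one common way to define effective sample size and is assumed here, not taken from the article; the function name and the numbers in the usage note are hypothetical.

```python
def effective_sample_size(benchmark_rate: float, estimate: float) -> float:
    """Simple-random-sample size with expected squared error equal to the observed one.

    Assumes a binary outcome, so the population variance is p * (1 - p),
    and ignores the finite population correction.
    """
    if not 0.0 < benchmark_rate < 1.0:
        raise ValueError("benchmark_rate must lie strictly between 0 and 1")
    squared_error = (estimate - benchmark_rate) ** 2
    if squared_error == 0.0:
        # A perfect estimate corresponds to an unbounded effective sample size.
        return float("inf")
    variance = benchmark_rate * (1.0 - benchmark_rate)
    # For a simple random sample of size n, the expected squared error is variance / n.
    return variance / squared_error
```

With a hypothetical benchmark rate of 0.5 and an estimate of 0.6, `effective_sample_size(0.5, 0.6)` is about 25, regardless of how many respondents produced the estimate.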
2.
Biometrics ; 80(1)2024 Jan 29.
Article in English | MEDLINE | ID: mdl-38364808

ABSTRACT

We aim to estimate parameters in a generalized linear model (GLM) for a binary outcome when, in addition to the raw data from the internal study, more than 1 external study provides summary information in the form of parameter estimates from fitting GLMs with varying subsets of the internal study covariates. We propose an adaptive penalization method that exploits the external summary information and gains efficiency for estimation, and that is both robust and computationally efficient. The robust property comes from exploiting the relationship between parameters of a GLM and parameters of a GLM with omitted covariates and from downweighting external summary information that is less compatible with the internal data through a penalization. The computational burden associated with searching for the optimal tuning parameter for the penalization is reduced by using adaptive weights and by using an information criterion when searching for the optimal tuning parameter. Simulation studies show that the proposed estimator is robust against various types of population distribution heterogeneity and also gains efficiency compared to direct maximum likelihood estimation. The method is applied to improve a logistic regression model that predicts high-grade prostate cancer making use of parameter estimates from 2 external models.


Subject(s)
Models, Statistical , Male , Humans , Linear Models , Regression Analysis , Likelihood Functions , Logistic Models , Computer Simulation
3.
Psychiatry Res ; 330: 115601, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37976662

ABSTRACT

OBJECTIVES: To compare mortality rates in bipolar disorder with common causes of mortality. METHODS: Observational data from the Prechter Longitudinal Study of Bipolar Disorder (PLS-BD) of 1128 participants including 281 controls were analyzed using logistic regression to quantify mortality rates in comparison with common comorbidities and causes of death. Outcome and treatment measures, including ASRM, GAD-7, PHQ-9 and medication use, were used to stratify those with bipolar disorder (BD) who are alive or deceased. A larger cohort of 10,735 existing BD patients with 7,826 controls (no psychiatric diagnosis) from the University of Michigan Health (U-M Health) clinics was used as a replication sample in an observational secondary data analysis. RESULTS: The mortality rates are significantly different between those with BD and controls in both PLS-BD and U-M Health. Those with BD who are deceased have a higher percentage of elevated depression measures but show no difference in mania or anxiety measures or in medication use patterns. In both cohorts, a diagnosis of BD increases the odds of mortality more than a history of smoking or being 60 years of age or older. CONCLUSION: BD was found to increase odds of mortality significantly and beyond that of a history of smoking. This finding was replicated in an independent sample.


Subject(s)
Bipolar Disorder , Humans , Middle Aged , Bipolar Disorder/mortality , Comorbidity , Longitudinal Studies , Observation , Smoking/epidemiology , Risk Factors
5.
Can J Stat ; 51(2): 355-374, 2023 Jun.
Article in English | MEDLINE | ID: mdl-37346757

ABSTRACT

Consider the setting where (i) individual-level data are collected to build a regression model for the association between an event of interest and certain covariates, and (ii) some risk calculators predicting the risk of the event using less detailed covariates are available, possibly as algorithmic black boxes with little information available about how they were built. We propose a general empirical-likelihood-based framework to integrate the rich auxiliary information contained in the calculators into fitting the regression model, to make the estimation of regression parameters more efficient. Two methods are developed, one using working models to extract the calculator information and one making a direct use of calculator predictions without working models. Theoretical and numerical investigations show that the calculator information can substantially reduce the variance of regression parameter estimation. As an application, we study the dependence of the risk of high grade prostate cancer on both conventional risk factors and newly identified molecular biomarkers by integrating information from the Prostate Biopsy Collaborative Group (PBCG) risk calculator, which was built based on conventional risk factors alone.


6.
Sci Rep ; 13(1): 7318, 2023 05 05.
Article in English | MEDLINE | ID: mdl-37147440

ABSTRACT

As portable chest X-rays are an efficient means of triaging emergent cases, their use has raised the question as to whether imaging carries additional prognostic utility for survival among patients with COVID-19. This study assessed the importance of known risk factors on in-hospital mortality and investigated the predictive utility of radiomic texture features using various machine learning approaches. We detected incremental improvements in survival prognostication utilizing texture features derived from emergent chest X-rays, particularly among older patients or those with a higher comorbidity burden. Important features included age, oxygen saturation, blood pressure, and certain comorbid conditions, as well as image features related to the intensity and variability of pixel distribution. Thus, widely available chest X-rays, in conjunction with clinical information, may be predictive of survival outcomes of patients with COVID-19, especially older, sicker patients, and can aid in disease management by providing additional information.


Subject(s)
COVID-19 , Humans , COVID-19/diagnostic imaging , Prognosis , Hospital Mortality , Machine Learning , Hospitals , Retrospective Studies
7.
Biometrika ; 110(1): 119-134, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36798840

ABSTRACT

We consider the situation of estimating the parameters in a generalized linear prediction model, from an internal dataset, where the outcome variable [Formula: see text] is binary and there are two sets of covariates, [Formula: see text] and [Formula: see text]. We have information from an external study that provides parameter estimates for a generalized linear model of [Formula: see text] on [Formula: see text]. We propose a method that makes limited assumptions about the similarity of the distributions in the two study populations. The method involves orthogonalizing the [Formula: see text] variables and then borrowing information about the ratio of the coefficients from the external model. The method is justified based on a new result relating the parameters in a generalized linear model to the parameters in a generalized linear model with omitted covariates. The method is applicable if the regression coefficients in the [Formula: see text] given [Formula: see text] model are similar in the two populations, up to an unknown scalar constant. This type of transportability between populations is something that can be checked from the available data. The asymptotic variance of the proposed method is derived. The method is evaluated in a simulation study and shown to gain efficiency compared to simple analysis of the internal dataset, and is robust compared to an alternative method of incorporating external information.

8.
Stat Med ; 42(7): 970-992, 2023 03 30.
Article in English | MEDLINE | ID: mdl-36627826

ABSTRACT

There is growing interest in developing causal inference methods for multi-valued treatments with a focus on pairwise average treatment effects. Here we focus on a clinically important, yet less-studied estimand: causal drug-drug interactions (DDIs), which quantify the degree to which the causal effect of drug A is altered by the presence versus the absence of drug B. Confounding adjustment when studying the effects of DDIs can be accomplished via inverse probability of treatment weighting (IPTW), a standard approach originally developed for binary treatments and later generalized to multi-valued treatments. However, this approach generally yields biased results when the propensity score model is misspecified. Motivated by the need for more robust techniques, we propose two empirical likelihood-based weighting approaches that allow for specifying a set of propensity score models, with the second method balancing user-specified covariates directly, by incorporating additional, nonparametric constraints. The resulting estimators from both methods are consistent when the postulated set of propensity score models contains a correct one; this property has been termed multiple robustness. In this paper, we derive two multiply-robust estimators of the causal DDI, and develop inference procedures. We then evaluate their finite sample performance through simulation. The results demonstrate that the proposed estimators outperform the standard IPTW method in terms of both robustness and efficiency. Finally, we apply the proposed methods to evaluate the impact of renin-angiotensin system inhibitors (RAS-I) on the comparative nephrotoxicity of nonsteroidal anti-inflammatory drugs (NSAID) and opioids, using data derived from electronic medical records from a large multi-hospital health system.


Subject(s)
Models, Statistical , Humans , Likelihood Functions , Data Interpretation, Statistical , Computer Simulation , Drug Interactions
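The abstract above takes standard IPTW for multi-valued treatments as its baseline. The sketch below illustrates that baseline only, not the proposed multiply-robust estimators: each subject is weighted by the inverse of the probability of the treatment combination actually received, and the interaction is formed as a difference of differences. The additive scale, the group labels ("AB", "A", "B", "neither"), and both function names are assumptions made for illustration; the propensities are taken as given instead of being fitted.

```python
from collections import defaultdict

def iptw_means(groups, outcomes, propensities):
    """Normalized (Hajek) IPTW mean outcome for each treatment group.

    propensities[i] is the modeled probability that subject i received
    the treatment combination recorded in groups[i].
    """
    weighted_sum = defaultdict(float)
    weight_total = defaultdict(float)
    for group, outcome, prob in zip(groups, outcomes, propensities):
        if not 0.0 < prob <= 1.0:
            raise ValueError("propensities must lie in (0, 1]")
        weight = 1.0 / prob
        weighted_sum[group] += weight * outcome
        weight_total[group] += weight
    return {g: weighted_sum[g] / weight_total[g] for g in weight_total}

def additive_ddi(means):
    """Effect of drug A with drug B present minus effect of drug A with B absent."""
    effect_with_b = means["AB"] - means["B"]
    effect_without_b = means["A"] - means["neither"]
    return effect_with_b - effect_without_b
```

A value of `additive_ddi(...)` near zero would suggest that drug B does not change the effect of drug A on this scale; as the abstract notes, the result is only trustworthy if the propensity model behind the weights is correctly specified.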
10.
Med Care ; 60(3): 240-247, 2022 03 01.
Article in English | MEDLINE | ID: mdl-34974490

ABSTRACT

BACKGROUND: Renal dialysis is a lifesaving but demanding therapy, requiring 3 weekly treatments of multiple-hour durations. Though travel times and quality of care vary across facilities, the extent to which patients are willing and able to engage in weighing tradeoffs is not known. Since 2015, Medicare has summarized and reported quality data for dialysis facilities using a star rating system. We estimate choice models to assess the relative roles of travel distance and quality of care in explaining patient choice of facility. RESEARCH DESIGN: Using national data on 2 million patient-years from 7198 dialysis facilities and 4 star-rating releases, we estimated travel distance to patients' closest facilities, incremental travel distance to the next closest facility with a higher star rating, and the difference in ratings between these 2 facilities. We fit mixed effects logistic regression models predicting whether patients dialyzed at their closest facilities. RESULTS: Median travel distance was 4 times as long in rural (10.9 miles) as in urban areas (2.6 miles). Higher differences in rating [odds ratio (OR): 0.56; 95% confidence interval (CI): 0.50-0.62] and greater area deprivation (OR: 0.50; 95% CI: 0.48-0.53) were associated with lower odds of attending one's closest facility. Stratified models were also fit based on urbanicity. For rural patients, excess travel was associated with higher odds of attending the closer facility (per 10 miles; OR: 1.05; 95% CI: 1.04-1.06). Star rating differences were associated with lower odds of receiving care from the closest facility among urban (OR: 0.57; 95% CI: 0.51-0.63) and rural patients (OR: 0.18; 95% CI: 0.08-0.44). CONCLUSIONS: Most dialysis patients have higher rated facilities located not much further than their closest facility, suggesting many patients could evaluate tradeoffs between distance and quality of care in where they receive dialysis. Our results show that such tradeoffs likely occur.
Therefore, quality ratings such as the Dialysis Facility Compare (DFC) Star Rating may provide actionable information to patients and caregivers. However, we were not able to assess whether these associations reflect a causal effect of the Star Ratings on patient choice, as the Star Ratings served only as a marker of quality of care.


Subject(s)
Health Services Accessibility/trends , Patient Acceptance of Health Care/psychology , Quality of Health Care , Renal Dialysis/psychology , Travel/psychology , Choice Behavior , Ethnicity/psychology , Ethnicity/statistics & numerical data , Geography , Humans , Medicare , Odds Ratio , Racial Groups/psychology , Racial Groups/statistics & numerical data , Renal Dialysis/standards , Rural Population/statistics & numerical data , United States , Urban Population/statistics & numerical data
11.
Stat Med ; 41(3): 567-579, 2022 02 10.
Article in English | MEDLINE | ID: mdl-34796519

ABSTRACT

In many clinical and observational studies, auxiliary data from the same subjects, such as repeated measurements or surrogate variables, will be collected in addition to the data of main interest. Not directly related to the main study, these auxiliary data in practice are rarely incorporated into the main analysis, though they may carry extra information that can help improve the estimation in the main analysis. Under the setting where part of or all subjects have auxiliary data available, we propose an effective weighting approach to borrow the auxiliary information by building a working model for the auxiliary data, where improvement of estimation precision over the main analysis is guaranteed regardless of the specification of the working model. An information index is also constructed to assess how well the selected working model works to improve the main analysis. Both theoretical and numerical studies show the excellent and robust performance of the proposed method in comparison to estimation without using the auxiliary data. Finally, we utilize the Atherosclerosis Risk in Communities study for illustration.


Subject(s)
Research Design , Computer Simulation , Humans
12.
J Comput Graph Stat ; 31(4): 1063-1075, 2022.
Article in English | MEDLINE | ID: mdl-36644406

ABSTRACT

Penalized regression methods are used in many biomedical applications for variable selection and simultaneous coefficient estimation. However, missing data complicates the implementation of these methods, particularly when missingness is handled using multiple imputation. Applying a variable selection algorithm on each imputed dataset will likely lead to different sets of selected predictors. This paper considers a general class of penalized objective functions which, by construction, force selection of the same variables across imputed datasets. By pooling objective functions across imputations, optimization is then performed jointly over all imputed datasets rather than separately for each dataset. We consider two objective function formulations that exist in the literature, which we will refer to as "stacked" and "grouped" objective functions. Building on existing work, we (a) derive and implement efficient cyclic coordinate descent and majorization-minimization optimization algorithms for continuous and binary outcome data, (b) incorporate adaptive shrinkage penalties, (c) compare these methods through simulation, and (d) develop an R package miselect. Simulations demonstrate that the "stacked" approaches are more computationally efficient and have better estimation and selection properties. We apply these methods to data from the University of Michigan ALS Patients Biorepository aiming to identify the association between environmental pollutants and ALS risk. Supplementary materials are available online.
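The "stacked" and "grouped" formulations in the abstract above can be contrasted with a small sketch for a continuous outcome and a lasso-type penalty. It is written under stated assumptions: the loss scaling, the absence of observation weights and adaptive penalties, and the function names are illustrative choices, not the interface or the exact objectives of the miselect package.

```python
import math

def pooled_squared_error(datasets, betas):
    """Half the mean squared residual over all rows of all imputed datasets.

    datasets: list of (X, y) pairs, X a list of covariate rows, y a list of outcomes.
    betas: one coefficient vector per imputed dataset.
    """
    total, rows = 0.0, 0
    for (X, y), beta in zip(datasets, betas):
        for x_row, y_i in zip(X, y):
            fitted = sum(b * x for b, x in zip(beta, x_row))
            total += (y_i - fitted) ** 2
            rows += 1
    return total / (2.0 * rows)

def stacked_objective(datasets, beta, lam):
    """One shared coefficient vector, so a variable is selected in all datasets or none."""
    shared = [beta] * len(datasets)
    return pooled_squared_error(datasets, shared) + lam * sum(abs(b) for b in beta)

def grouped_objective(datasets, betas, lam):
    """Separate coefficients per dataset, tied together by a group penalty per variable."""
    n_vars = len(betas[0])
    penalty = sum(
        math.sqrt(sum(beta[j] ** 2 for beta in betas)) for j in range(n_vars)
    )
    return pooled_squared_error(datasets, betas) + lam * penalty
```

In both cases the penalty acts on a variable across all imputations at once, which is what forces the same variables to be selected in every imputed dataset; the stacked version has fewer parameters to optimize, consistent with the computational advantage the abstract reports.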

13.
JAMA Netw Open ; 4(11): e2135379, 2021 11 01.
Article in English | MEDLINE | ID: mdl-34787655

ABSTRACT

Importance: There is a need for studies to evaluate the risk factors for COVID-19 and mortality among the entire Medicare long-term dialysis population using Medicare claims data. Objective: To identify risk factors associated with COVID-19 and mortality in Medicare patients undergoing long-term dialysis. Design, Setting, and Participants: This retrospective, claims-based cohort study compared mortality trends of patients receiving long-term dialysis in 2020 with previous years (2013-2019) and fit Cox regression models to identify risk factors for contracting COVID-19 and postdiagnosis mortality. The cohort included the national population of Medicare patients receiving long-term dialysis in 2020, derived from clinical and administrative databases. COVID-19 was identified through Medicare claims sources. Data were analyzed on May 17, 2021. Main Outcomes and Measures: The 2 main outcomes were COVID-19 and all-cause mortality. Associations of claims-based risk factors with COVID-19 and mortality were investigated prediagnosis and postdiagnosis. Results: Among a total of 498 169 Medicare patients undergoing dialysis (median [IQR] age, 66 [56-74] years; 215 935 [43.1%] women and 283 227 [56.9%] men), 60 090 (12.1%) had COVID-19, among whom 15 612 patients (26.0%) died. COVID-19 rates were significantly higher among Black (21 787 of 165 830 patients [13.1%]) and Hispanic (13 530 of 86 871 patients [15.6%]) patients compared with non-Black patients (38 303 of 332 339 [11.5%]), as well as patients with short (ie, 1-89 days; 7738 of 55 184 patients [14.0%]) and extended (ie, ≥90 days; 10 737 of 30 196 patients [35.6%]) nursing home stays in the prior year. 
Adjusting for all other risk factors, residing in a nursing home 1 to 89 days in the prior year was associated with a higher hazard for COVID-19 (hazard ratio [HR] vs 0 days, 1.60; 95% CI 1.56-1.65) and for postdiagnosis mortality (HR, 1.31; 95% CI, 1.25-1.37), as was residing in a nursing home for an extended stay (COVID-19: HR, 4.48; 95% CI, 4.37-4.59; mortality: HR, 1.12; 95% CI, 1.07-1.16). Black race (HR vs non-Black: HR, 1.25; 95% CI, 1.23-1.28) and Hispanic ethnicity (HR vs non-Hispanic: HR, 1.68; 95% CI, 1.64-1.72) were associated with significantly higher hazards of COVID-19. Although home dialysis was associated with lower COVID-19 rates (HR, 0.77; 95% CI, 0.75-0.80), it was associated with higher mortality (HR, 1.18; 95% CI, 1.11-1.25). Conclusions and Relevance: These results shed light on COVID-19 risk factors and outcomes among Medicare patients receiving long-term chronic dialysis and could inform policy decisions to mitigate the significant extra burden of COVID-19 and death in this population.


Subject(s)
COVID-19/etiology , Kidney Diseases/mortality , Medicare , Renal Dialysis , Aged , COVID-19/epidemiology , COVID-19/mortality , Ethnicity , Female , Humans , Kidney Diseases/epidemiology , Kidney Diseases/therapy , Male , Middle Aged , Nursing Homes , Proportional Hazards Models , Retrospective Studies , Risk Factors , SARS-CoV-2 , United States/epidemiology
14.
PLoS One ; 16(10): e0258278, 2021.
Article in English | MEDLINE | ID: mdl-34614008

ABSTRACT

BACKGROUND: Understanding risk factors for short- and long-term COVID-19 outcomes has implications for current guidelines and practice. We study whether early identified risk factors for COVID-19 persist one year later and through varying disease progression trajectories. METHODS: This was a retrospective study of 6,731 COVID-19 patients presenting to Michigan Medicine between March 10, 2020 and March 10, 2021. We describe disease progression trajectories from diagnosis to potential hospital admission, discharge, readmission, or death. Outcomes for all patients were rate of medical encounters, hospitalization-free survival, and overall survival; outcomes for hospitalized patients were discharge versus in-hospital death and readmission. Risk factors included patient age, sex, race, body mass index, and 29 comorbidity conditions. RESULTS: Younger, non-Black patients utilized healthcare resources at higher rates, while older, male, and Black patients had higher rates of hospitalization and mortality. Diabetes with complications, coagulopathy, fluid and electrolyte disorders, and blood loss anemia were risk factors for these outcomes. Diabetes with complications, coagulopathy, fluid and electrolyte disorders, and blood loss were associated with lower discharge and higher inpatient mortality rates. CONCLUSIONS: This study found differences in healthcare utilization and adverse COVID-19 outcomes, as well as differing risk factors for short- and long-term outcomes throughout disease progression. These findings may inform providers in emergency departments or critical care settings of treatment priorities, empower healthcare stakeholders with effective disease management strategies, and aid health policy makers in optimizing allocations of medical resources.


Subject(s)
COVID-19/epidemiology , Hospitalization , Patient Acceptance of Health Care/statistics & numerical data , Adolescent , COVID-19/diagnosis , Female , Hospital Mortality , Humans , Male , Middle Aged , Prognosis , Retrospective Studies , Risk Factors
15.
Int J Bipolar Disord ; 9(1): 28, 2021 Sep 01.
Article in English | MEDLINE | ID: mdl-34468894

ABSTRACT

BACKGROUND: There is increasing evidence that bipolar disorder is influenced by circadian timing, including the timing of sleep and waking activities. Previous studies in bipolar disorder have shown that people with later timed daily activities, also known as late chronotypes, are at higher risk for subsequent mood episodes over the following 12-18 months. However, these studies were limited to euthymic patients and smaller sample sizes. The aim of the current study was to further examine baseline chronotype as a potentially important predictor of mood-related outcomes in a larger sample of individuals with bipolar disorder and over the longest follow-up period to date (5 years). Participants included 318 adults diagnosed with bipolar I and II (19-86 years) who were enrolled in the Prechter Longitudinal Study of Bipolar Disorder. RESULTS: Participants with a late chronotype were found to be more likely to have mild to more severe depressive symptoms (PHQ-9 ≥ 5) as captured with PHQ-9 assessments every 2 months over the 5-year follow-up period. This higher risk for depressive symptoms remained even after adjusting for age, sex and mood at baseline. Additionally, late chronotypes reported fewer hypomania/mania episodes during the 5-year follow-up, as derived from clinical interviews every two years. CONCLUSIONS: These results highlight the potential clinical usefulness of a single self-report question in identifying patients at risk for a more depressive mood course. The results also suggest that circadian phase advancing treatments that can shift circadian timing earlier should be explored as a means to reduce depressive symptoms in late chronotypes with bipolar disorder.

17.
J Affect Disord ; 282: 1226-1233, 2021 03 01.
Article in English | MEDLINE | ID: mdl-33601700

ABSTRACT

OBJECTIVES: To investigate the impact of the SARS-CoV-2 pandemic and lockdown on individuals with bipolar disorder in comparison to healthy controls. METHODS: A longitudinal study of 560 participants including 147 healthy controls was conducted between April 30 and May 30, 2020 during a state-wide lockdown. Bi-weekly measures included the Coronavirus Impact Scale, the Pittsburgh Sleep Quality Index, the Patient Health Questionnaire, 9-item, and the Generalized Anxiety Disorder scale, 7-item. The generalized estimating equations method was used to examine the longitudinal change of the measures within the lockdown and the change from pre-pandemic period to pandemic period. RESULTS: All participants reported an impact of lockdown. Individuals with bipolar disorder reported greater impact from the stay-at-home orders with disruptions in routines, income/employment, social support and pandemic related stress. While these measures improved over time, healthy controls recovered more quickly and with greater magnitude than persons with bipolar disorder. Comparing mood symptom severity measures in mid-March through May 2020 to the same time window in 2015-2019 (pre- versus post-pandemic), there were no significant differences among individuals with bipolar disorder, whereas healthy controls showed a significant, albeit transient, increase in mood symptoms. CONCLUSION: Everyone was impacted by the SARS-CoV-2 pandemic; however, those with bipolar disorder experienced more life impacting changes from the stay-at-home orders vs healthy controls. These disruptions improved over time but much more slowly than in healthy controls. Pre- vs post-pandemic comparisons show a modest but significant increase in mood severity in the healthy controls which was not observed in those with bipolar disorder.


Subject(s)
Bipolar Disorder , COVID-19 , Social Isolation , Bipolar Disorder/epidemiology , Communicable Disease Control , Humans , Longitudinal Studies , Pandemics , SARS-CoV-2
18.
J Affect Disord ; 283: 1-10, 2021 03 15.
Article in English | MEDLINE | ID: mdl-33503551

ABSTRACT

BACKGROUND: Individuals with bipolar disorder (BD) show different personality profiles compared to non-psychiatric populations, but little is known about the temporal stability of personality traits, and if changes in mood state drive changes in personality. METHODS: Participants were 533 individuals with BD and 185 healthy controls (HC) who completed the NEO-Personality Inventory-Revised (NEO-PI-R) and clinician-administered measures of mood at baseline. One-hundred-eighty BD and 79 HC completed the measures at 5-year follow-up and 60 BD and 16 HC completed the measures at 10-year follow-up. The above measures and demographic information, but not other clinical status indicators of the BD illness, were used in analyses. RESULTS: The BD group has higher Neuroticism (N)/N facets and lower Extraversion (E)/E facets and Conscientiousness (C)/C facets compared to HC. Significant mean-level changes existed within groups but were small in magnitude, and groups showed similar moderate-to-high rank-order stability. Change in (N)/N facets shows an association with change in depression, but changes in all other NEO-PI-R scores are not associated with changes in mood. Personality traits are clinically stable in part of our bipolar sample using clinically relevant interpretation of changes in T scores; however, some BD subjects did show more reliable changes in personality traits than the healthy controls. LIMITATIONS: The study relied on self-report measurement, and not all participants who were eligible completed the 5- and 10-year follow-up personality assessments. CONCLUSIONS: Mean-level and rank-order personality scores show only modest changes, so most personality changes over time are not systematic. Observed changes in personality traits are not explained by changes in mood with the exception of Neuroticism, suggesting other factors influence changes in personality.


Subject(s)
Bipolar Disorder , Affect , Extraversion, Psychological , Humans , Personality , Personality Inventory
19.
Stat Med ; 40(5): 1224-1242, 2021 02 28.
Article in English | MEDLINE | ID: mdl-33410157

ABSTRACT

The inverse probability weighted Cox model is frequently used to estimate the marginal hazard ratio. Its validity requires a crucial condition that the propensity score model be correctly specified. To provide protection against misspecification of the propensity score model, we propose a weighted estimation method rooted in the empirical likelihood theory. The proposed estimator is multiply robust in that it is guaranteed to be consistent when a set of postulated propensity score models contains a correctly specified model. Our simulation studies demonstrate satisfactory finite sample performance of the proposed method in terms of consistency and efficiency. We apply the proposed method to compare the risk of postoperative hospitalization between sleeve gastrectomy and Roux-en-Y gastric bypass using data from a large medical claims and billing database. We further extend the development to multisite studies to enable each site to postulate multiple site-specific propensity score models.


Subject(s)
Models, Statistical , Research Design , Computer Simulation , Humans , Propensity Score , Proportional Hazards Models
20.
Biometrics ; 76(1): 270-280, 2020 03.
Article in English | MEDLINE | ID: mdl-31393001

ABSTRACT

For regression with covariates missing not at random where the missingness depends on the missing covariate values, complete-case (CC) analysis leads to consistent estimation when the missingness is independent of the response given all covariates, but it may not have the desired level of efficiency. We propose a general empirical likelihood framework to improve estimation efficiency over the CC analysis. We expand on methods in Bartlett et al. (2014, Biostatistics 15, 719-730) and Xie and Zhang (2017, Int J Biostat 13, 1-20) that improve efficiency by modeling the missingness probability conditional on the response and fully observed covariates by allowing the possibility of modeling other data distribution-related quantities. We also give guidelines on what quantities to model and demonstrate that our proposal has the potential to yield smaller biases than existing methods when the missingness probability model is incorrect. Simulation studies are presented, as well as an application to data collected from the US National Health and Nutrition Examination Survey.


Subject(s)
Biometry/methods , Regression Analysis , Analysis of Variance , Bias , Computer Simulation , Data Interpretation, Statistical , Humans , Likelihood Functions , Models, Statistical , Nutrition Surveys/statistics & numerical data , Probability , United States