Results 1 - 14 of 14
1.
Brief Bioinform ; 23(2)2022 03 10.
Article in English | MEDLINE | ID: mdl-35021184

ABSTRACT

With the increasing volume of human sequencing data available, analysis incorporating external controls has become a popular and cost-effective approach to boost statistical power in disease association studies. To prevent spurious association due to population stratification, it is important to match the ancestry backgrounds of cases and controls. However, rare variant association tests based on a standard logistic regression model are conservative when all ancestry-matched strata have the same case-control ratio and might become anti-conservative when the case-control ratio varies across strata. Under the conditional logistic regression (CLR) model, we propose a weighted burden test (CLR-Burden), a variance component test (CLR-SKAT) and a hybrid test (CLR-MiST). We show that the CLR model coupled with ancestry matching is a general approach to control for population stratification, regardless of the spatial distribution of disease risks. Through extensive simulation studies, we demonstrate that the CLR-based tests robustly control type 1 errors under different matching schemes and are more powerful than the standard Burden, SKAT and MiST tests. Furthermore, because CLR-based tests allow for different case-control ratios across strata, a full-matching scheme can be employed to efficiently utilize all available cases and controls to accelerate the discovery of disease-associated genes.


Subject(s)
Models, Genetic , Case-Control Studies , Computer Simulation , Humans , Logistic Models
2.
Bioinformatics ; 38(2): 303-310, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34499127

ABSTRACT

MOTIVATION: Mendelian randomization (MR) is a valuable tool to examine the causal relationships between health risk factors and outcomes from observational studies. Along with the proliferation of genome-wide association studies, a variety of two-sample MR methods for summary data have been developed to account for horizontal pleiotropy (HP), primarily based on the assumption that the effects of variants on exposure (γ) and HP (α) are independent. In practice, this assumption is too strict and can be easily violated because of the correlated HP. RESULTS: To account for this correlated HP, we propose a Bayesian approach, MR-Corr2, that uses the orthogonal projection to reparameterize the bivariate normal distribution for γ and α, and a spike-slab prior to mitigate the impact of correlated HP. We have also developed an efficient algorithm with paralleled Gibbs sampling. To demonstrate the advantages of MR-Corr2 over existing methods, we conducted comprehensive simulation studies to compare for both type-I error control and point estimates in various scenarios. By applying MR-Corr2 to study the relationships between exposure-outcome pairs in complex traits, we did not identify the contradictory causal relationship between HDL-c and CAD. Moreover, the results provide a new perspective of the causal network among complex traits. AVAILABILITY AND IMPLEMENTATION: The developed R package and code to reproduce all the results are available at https://github.com/QingCheng0218/MR.Corr2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
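
For orientation, two-sample MR methods for summary data are usually compared against the fixed-effect inverse-variance weighted (IVW) estimator, which assumes no horizontal pleiotropy at all. The sketch below shows that baseline only; it is not MR-Corr2, and the function and argument names are hypothetical.

```python
from math import sqrt


def ivw_estimate(gamma_hat, outcome_hat, se_outcome):
    """Fixed-effect IVW causal estimate from summary statistics.

    gamma_hat   : per-variant effect estimates on the exposure
    outcome_hat : per-variant effect estimates on the outcome
    se_outcome  : standard errors of the outcome estimates

    Assumes independent variants and no horizontal pleiotropy
    (an illustrative baseline, not the method in the abstract).
    """
    if not (len(gamma_hat) == len(outcome_hat) == len(se_outcome)):
        raise ValueError("all inputs must have the same length")
    if not gamma_hat:
        raise ValueError("at least one variant is required")

    # Weighted regression of outcome effects on exposure effects
    # through the origin, with weights 1 / se_outcome^2.
    num = sum(g * o / s**2 for g, o, s in zip(gamma_hat, outcome_hat, se_outcome))
    den = sum(g**2 / s**2 for g, s in zip(gamma_hat, se_outcome))
    if den == 0:
        raise ValueError("exposure effects are all zero")

    beta = num / den
    se_beta = sqrt(1.0 / den)
    return beta, se_beta
```

When the outcome effects are exactly proportional to the exposure effects, the estimate recovers that proportionality constant; correlated pleiotropy of the kind the abstract describes would bias it.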


Subject(s)
Genome-Wide Association Study , Mendelian Randomization Analysis , Mendelian Randomization Analysis/methods , Bayes Theorem , Risk Factors , Computer Simulation
3.
Biometrics ; 79(3): 2208-2219, 2023 09.
Article in English | MEDLINE | ID: mdl-35950778

ABSTRACT

Standard Mendelian randomization (MR) analysis can produce biased results if the genetic variant defining an instrumental variable (IV) is confounded and/or has a horizontal pleiotropic effect on the outcome of interest not mediated by the treatment variable. We provide novel identification conditions for the causal effect of a treatment in the presence of unmeasured confounding by leveraging a possibly invalid IV for which both the IV independence and exclusion restriction assumptions may be violated. The proposed Mendelian randomization mixed-scale treatment effect robust identification (MR MiSTERI) approach relies on (i) an assumption that the treatment effect does not vary with the possibly invalid IV on the additive scale; (ii) that the confounding bias does not vary with the possibly invalid IV on the odds ratio scale; and (iii) that the residual variance for the outcome is heteroskedastic with respect to the possibly invalid IV. Although assumptions (i) and (ii) have, respectively, appeared in the IV literature, assumption (iii) has not; we formally establish that their conjunction can identify a causal effect even with an invalid IV. MR MiSTERI is shown to be particularly advantageous in the presence of pervasive heterogeneity of pleiotropic effects on the additive scale. We propose a simple and consistent three-stage estimator that can be used as a preliminary estimator to a carefully constructed efficient one-step-update estimator. In order to incorporate multiple, possibly correlated, and weak invalid IVs, a common challenge in MR studies, we develop a MAny Weak Invalid Instruments (MR MaWII MiSTERI) approach for strengthened identification and improved estimation accuracy. Both simulation studies and UK Biobank data analysis results demonstrate the robustness of the proposed methods.


Subject(s)
Mendelian Randomization Analysis , Mendelian Randomization Analysis/methods , Causality , Computer Simulation , Bias
4.
Stat Med ; 42(4): 422-432, 2023 02 20.
Article in English | MEDLINE | ID: mdl-36502820

ABSTRACT

It is often of interest in the health and social sciences to investigate the joint mediation effects of multiple post-exposure mediating variables. Identification of such joint mediation effects generally requires no unmeasured confounding of the outcome with respect to the whole set of mediators. As the number of mediators under consideration grows, this key assumption is likely to be violated as it is often infeasible to intervene on any of the mediators. In this article, we develop a simple two-step method of moments estimation procedure to assess mediation with multiple mediators simultaneously in the presence of potential unmeasured mediator-outcome confounding. Our identification result leverages heterogeneity of the population exposure effect on the mediators, which is plausible under a variety of empirical settings. The proposed estimators are illustrated through both simulations and an application to evaluate the mediating effects of post-traumatic stress disorder symptoms in the association between self-efficacy and fatigue among health care workers during the COVID-19 outbreak.


Subject(s)
COVID-19 , Mediation Analysis , Humans , COVID-19/epidemiology , Disease Outbreaks , Fatigue
5.
PLoS Genet ; 13(9): e1007021, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28961250

ABSTRACT

Knowledge of biological relatedness between samples is important for many genetic studies. In large-scale human genetic association studies, the estimated kinship is used to remove cryptic relatedness, control for family structure, and estimate trait heritability. However, estimation of kinship is challenging for sparse sequencing data, such as those from off-target regions in target sequencing studies, where genotypes are largely uncertain or missing. Existing methods often assume accurate genotypes at a large number of markers across the genome. We show that these methods, without accounting for the genotype uncertainty in sparse sequencing data, can yield a strong downward bias in kinship estimation. We develop a computationally efficient method called SEEKIN to estimate kinship for both homogeneous samples and heterogeneous samples with population structure and admixture. Our method models genotype uncertainty and leverages linkage disequilibrium through imputation. We test SEEKIN on a whole exome sequencing (WES) dataset of Singapore Chinese and Malays, which involves substantial population structure and admixture. We show that SEEKIN can accurately estimate kinship coefficients and classify genetic relatedness using off-target sequencing data downsampled to ~0.15X depth. In application to the full WES dataset without downsampling, SEEKIN also outperforms existing methods by properly analyzing shallow off-target data (~0.75X). Using both simulated and real phenotypes, we further illustrate how our method improves estimation of trait heritability for WES studies.


Subject(s)
Databases, Genetic , Genetics, Population/methods , Genome, Human , Sequence Analysis, DNA , Asian People/genetics , Computational Biology , Exome , Genetic Association Studies , Genotype , Genotyping Techniques , Humans , Linkage Disequilibrium , Models, Genetic , Software
6.
Stat Sin ; 30(3): 1517-1541, 2020 Jul.
Article in English | MEDLINE | ID: mdl-33209012

ABSTRACT

In observational studies, treatments are typically not randomized and therefore estimated treatment effects may be subject to confounding bias. The instrumental variable (IV) design plays the role of a quasi-experimental handle since the IV is associated with the treatment and only affects the outcome through the treatment. In this paper, we present a novel framework for identification and inference using an IV for the marginal average treatment effect amongst the treated (ETT) in the presence of unmeasured confounding. For inference, we propose three different semiparametric approaches: (i) inverse probability weighting (IPW), (ii) outcome regression (OR), and (iii) doubly robust (DR) estimation, which is consistent if either (i) or (ii) is consistent, but not necessarily both. A closed-form locally semiparametric efficient estimator is obtained in the simple case of binary IV and outcome and the efficiency bound is derived for the more general case.

7.
Am J Epidemiol ; 187(3): 576-584, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29165547

ABSTRACT

Epidemiologic studies are frequently susceptible to missing information. Omitting observations with missing variables remains a common strategy in epidemiologic studies, yet this simple approach can often severely bias parameter estimates of interest if the values are not missing completely at random. Even when missingness is completely random, complete-case analysis can reduce the efficiency of estimated parameters, because large amounts of available data are simply tossed out with the incomplete observations. Alternative methods for mitigating the influence of missing information, such as multiple imputation, are becoming an increasingly popular strategy in order to retain all available information, reduce potential bias, and improve efficiency in parameter estimation. In this paper, we describe the theoretical underpinnings of multiple imputation, and we illustrate application of this method as part of a collaborative challenge to assess the performance of various techniques for dealing with missing data (Am J Epidemiol. 2018;187(3):568-575). We detail the steps necessary to perform multiple imputation on a subset of data from the Collaborative Perinatal Project (1959-1974), where the goal is to estimate the odds of spontaneous abortion associated with smoking during pregnancy.
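
The final step of a multiple-imputation analysis is pooling the per-dataset results, conventionally with Rubin's rules. The sketch below illustrates only that pooling step, assuming each of the m imputed datasets has already yielded a point estimate (for example, a log odds ratio) and its variance. The function name is hypothetical and nothing here is taken from the paper's own code.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m imputed-data results with Rubin's rules.

    estimates : point estimates, one per imputed dataset
    variances : squared standard errors, one per imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")

    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)

    # Total variance inflates the between-imputation part by (1 + 1/m)
    # to account for using a finite number of imputations.
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, sqrt(total)
```

The between-imputation term is what separates this from simply averaging standard errors: it carries the extra uncertainty that comes from the missing values themselves.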


Subject(s)
Data Accuracy , Data Interpretation, Statistical , Epidemiologic Research Design , Epidemiologic Studies , Bias , Female , Humans , Pregnancy
8.
Am J Epidemiol ; 187(3): 585-591, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29165557

ABSTRACT

Missing data is a common occurrence in epidemiologic research. In this paper, 3 data sets with induced missing values from the Collaborative Perinatal Project, a multisite US study conducted from 1959 to 1974, are provided as examples of prototypical epidemiologic studies with missing data. Our goal was to estimate the association of maternal smoking behavior with spontaneous abortion while adjusting for numerous confounders. At the same time, we did not necessarily wish to evaluate the joint distribution among potentially unobserved covariates, which is seldom the subject of substantive scientific interest. The inverse probability weighting (IPW) approach preserves the semiparametric structure of the underlying model of substantive interest and clearly separates the model of substantive interest from the model used to account for the missing data. However, IPW often will not result in valid inference if the missing-data pattern is nonmonotone, even if the data are missing at random. We describe a recently proposed approach to modeling nonmonotone missing-data mechanisms under missingness at random to use in constructing the weights in IPW complete-case estimation, and we illustrate the approach using 3 data sets described in a companion article (Am J Epidemiol. 2018;187(3):568-575).
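
As a rough illustration of IPW complete-case estimation, the sketch below computes a weighted mean in which each complete case is weighted by the inverse of its probability of being observed. It assumes those probabilities are already available. Estimating them under a nonmonotone missing-at-random mechanism is the hard part the abstract addresses and is not shown. All names are hypothetical.

```python
def ipw_mean(values, observed, prob_observed):
    """Normalized (Hajek-type) IPW estimate of a mean from complete cases.

    values        : outcome per subject (ignored where not observed)
    observed      : True if the subject is a complete case
    prob_observed : probability of being a complete case, assumed
                    known or previously estimated for every subject
    """
    if not (len(values) == len(observed) == len(prob_observed)):
        raise ValueError("all inputs must have the same length")

    weighted_sum = 0.0
    weight_total = 0.0
    for y, r, p in zip(values, observed, prob_observed):
        if not 0.0 < p <= 1.0:
            raise ValueError("probabilities must lie in (0, 1]")
        if r:
            # Each complete case stands in for 1/p subjects like it.
            weighted_sum += y / p
            weight_total += 1.0 / p

    if weight_total == 0.0:
        raise ValueError("no complete cases")
    return weighted_sum / weight_total
```

Subjects who were unlikely to be observed but were observed anyway receive large weights, which is how the estimator corrects for the under-representation of similar subjects among the complete cases.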


Subject(s)
Data Accuracy , Data Interpretation, Statistical , Probability , Statistics as Topic/methods , Female , Humans , Pregnancy
9.
Am J Epidemiol ; 187(3): 568-575, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29165572

ABSTRACT

Principled methods with which to appropriately analyze missing data have long existed; however, broad implementation of these methods remains challenging. In this and 2 companion papers (Am J Epidemiol. 2018;187(3):576-584 and Am J Epidemiol. 2018;187(3):585-591), we discuss issues pertaining to missing data in the epidemiologic literature. We provide details regarding missing-data mechanisms and nomenclature and encourage the conduct of principled analyses through a detailed comparison of multiple imputation and inverse probability weighting. Data from the Collaborative Perinatal Project, a multisite US study conducted from 1959 to 1974, are used to create a masked data-analytical challenge with missing data induced by known mechanisms. We illustrate the deleterious effects of missing data with naive methods and show how principled methods can sometimes mitigate such effects. For example, when data were missing at random, naive methods showed a spurious protective effect of smoking on the risk of spontaneous abortion (odds ratio (OR) = 0.43, 95% confidence interval (CI): 0.19, 0.93), while implementation of principled methods multiple imputation (OR = 1.30, 95% CI: 0.95, 1.77) or augmented inverse probability weighting (OR = 1.40, 95% CI: 1.00, 1.97) provided estimates closer to the "true" full-data effect (OR = 1.31, 95% CI: 1.05, 1.64). We call for greater acknowledgement of and attention to missing data and for the broad use of principled missing-data methods in epidemiologic research.
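
For readers unfamiliar with how an odds ratio and its Wald confidence interval are formed, the sketch below computes both from an unadjusted 2x2 table. It is an illustration only: the estimates quoted in the abstract come from the authors' own analyses, and no counts from the study are reproduced here. The function name and arguments are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_wald(a, b, c, d, z=1.959964):
    """Unadjusted odds ratio with a Wald confidence interval.

    a : exposed with the outcome      b : exposed without the outcome
    c : unexposed with the outcome    d : unexposed without the outcome
    z : standard normal quantile (default gives a 95% interval)
    Returns (odds ratio, lower limit, upper limit).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")

    log_or = log((a * d) / (b * c))
    # Standard error of the log odds ratio (Woolf's formula).
    se = sqrt(1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)

    # The interval is symmetric on the log scale, then exponentiated.
    return exp(log_or), exp(log_or - z * se), exp(log_or + z * se)
```

An interval that excludes 1, like the one quoted for the naive analysis, is what makes a spurious protective effect look statistically convincing.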


Subject(s)
Data Accuracy , Data Interpretation, Statistical , Epidemiologic Research Design , Epidemiologic Studies , Female , Humans , Pregnancy
10.
Stat Sin ; 28(4): 2069-2088, 2018 Oct.
Article in English | MEDLINE | ID: mdl-33994754

ABSTRACT

Nonmonotone missing data arise routinely in empirical studies of social and health sciences, and when ignored, can induce selection bias and loss of efficiency. In practice, it is common to account for nonresponse under a missing-at-random assumption which although convenient, is rarely appropriate when nonresponse is nonmonotone. Likelihood and Bayesian missing data methodologies often require specification of a parametric model for the full data law, thus a priori ruling out any prospect for semiparametric inference. In this paper, we propose an all-purpose approach which delivers semiparametric inferences when missing data are nonmonotone and not at random. The approach is based on a discrete choice model (DCM) as a means to generate a large class of nonmonotone nonresponse mechanisms that are nonignorable. Sufficient conditions for nonparametric identification are given, and a general framework for fully parametric and semiparametric inference under an arbitrary DCM is proposed. Special consideration is given to the case of logit discrete choice nonresponse model (LDCM) for which we describe generalizations of inverse-probability weighting, pattern-mixture estimation, doubly robust estimation and multiply robust estimation.

11.
Stat Sin ; 28(4): 1965-1983, 2018 Oct.
Article in English | MEDLINE | ID: mdl-33335381

ABSTRACT

Missing data occur frequently in empirical studies in health and social sciences, often compromising our ability to make accurate inferences. An outcome is said to be missing not at random (MNAR) if, conditional on the observed variables, the missing data mechanism still depends on the unobserved outcome. In such settings, identification is generally not possible without imposing additional assumptions. Identification is sometimes possible, however, if an instrumental variable (IV) is observed for all subjects which satisfies the exclusion restriction that the IV affects the missingness process without directly influencing the outcome. In this paper, we provide necessary and sufficient conditions for nonparametric identification of the full data distribution under MNAR with the aid of an IV. In addition, we give sufficient identification conditions that are more straightforward to verify in practice. For inference, we focus on estimation of a population outcome mean, for which we develop a suite of semiparametric estimators that extend methods previously developed for data missing at random. Specifically, we propose inverse probability weighted estimation, outcome regression-based estimation and doubly robust estimation of the mean of an outcome subject to MNAR. For illustration, the methods are used to account for selection bias induced by HIV testing refusal in the evaluation of HIV seroprevalence in Mochudi, Botswana, using interviewer characteristics such as gender, age and years of experience as IVs.

12.
Am J Epidemiol ; 186(9): 1097-1103, 2017 Nov 01.
Article in English | MEDLINE | ID: mdl-28595286

ABSTRACT

When a risk factor affects certain categories of a multinomial outcome but not others, outcome heterogeneity is said to be present. A standard epidemiologic approach for modeling risk factors of a categorical outcome typically entails fitting a polytomous logistic regression via maximum likelihood estimation. In this paper, we show that standard polytomous regression is ill equipped to detect outcome heterogeneity and will generally understate the degree to which such heterogeneity may be present. Specifically, nonsaturated polytomous regression will often a priori rule out the possibility of outcome heterogeneity from its parameter space. As a remedy, we propose to model each category of the outcome as a separate binary regression. For full efficiency, we propose to estimate the collection of regression parameters jointly using a constrained Bayesian approach that ensures that one remains within the multinomial model. The approach is straightforward to implement in standard software for Bayesian estimation.


Subject(s)
Bias , Data Interpretation, Statistical , Regression Analysis , Bayes Theorem , Computer Simulation , Coronary Disease/mortality , Effect Modifier, Epidemiologic , Humans , Likelihood Functions , Longitudinal Studies , Meta-Analysis as Topic , Models, Statistical , Neoplasms/mortality , Risk Factors , Stroke/mortality
13.
Am J Epidemiol ; 187(5): 1130-1131, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29528372
14.
J Am Stat Assoc ; 113(521): 369-379, 2018.
Article in English | MEDLINE | ID: mdl-30034062

ABSTRACT

The development of coherent missing data models to account for nonmonotone missing at random (MAR) data by inverse probability weighting (IPW) remains to date largely unresolved. As a consequence, IPW has essentially been restricted for use only in monotone missing data settings. We propose a class of models for nonmonotone missing data mechanisms that spans the MAR model, while allowing the underlying full data law to remain unrestricted. For parametric specifications within the proposed class, we introduce an unconstrained maximum likelihood estimator for estimating the missing data probabilities which can be easily implemented using existing software. To circumvent potential convergence issues with this procedure, we also introduce a Bayesian constrained approach to estimate the missing data process which is guaranteed to yield inferences that respect all model restrictions. The efficiency of the standard IPW estimator is improved by incorporating information from incomplete cases through an augmented estimating equation which is optimal within a large class of estimating equations. We investigate the finite-sample properties of the proposed estimators in a simulation study and illustrate the new methodology in an application evaluating key correlates of preterm delivery for infants born to HIV infected mothers in Botswana, Africa.
