ABSTRACT
Polygenic risk scores (PRSs) are rapidly emerging as a way to measure disease risk by aggregating multiple genetic variants. Understanding the interplay of the PRS with environmental factors is critical for interpreting and applying PRSs in a wide variety of settings. We develop an efficient method for simultaneously modeling gene-environment correlations and interactions using the PRS in case-control studies. We use a logistic-normal regression modeling framework to specify the disease risk and PRS distribution in the underlying population and propose joint inference across the 2 models using the retrospective likelihood of the case-control data. Extensive simulation studies demonstrate the flexibility of the method in trading off bias and efficiency for the estimation of various model parameters, compared with standard logistic regression or a case-only analysis for gene-environment interactions and a control-only analysis for gene-environment correlations. Finally, using simulated case-control data sets within the UK Biobank study, we demonstrate the power of our method through its ability to recover results from the full prospective cohort for the detection of an interaction between long-term oral contraceptive use and the PRS on the risk of breast cancer. The method is computationally efficient and implemented in a user-friendly R package.
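As a rough illustration of the modeling setup (not the authors' retrospective-likelihood estimator, which is in their R package), the following Python sketch simulates a population with a PRS correlated with a binary exposure, draws a case-control sample, and fits the standard prospective logistic regression comparator with a PRS-by-environment interaction; all parameter values are hypothetical.

```python
# Minimal sketch of the comparator method, not the paper's retrospective-likelihood approach.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 50_000
e = rng.binomial(1, 0.3, n)                    # binary exposure (e.g., OC use)
prs = rng.normal(0.2 * e, 1.0)                 # gene-environment correlation
logit = -3.0 + 0.4 * prs + 0.3 * e + 0.2 * prs * e
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # disease status

# Case-control sampling: all cases plus an equal number of controls.
cases = np.flatnonzero(y == 1)
controls = rng.choice(np.flatnonzero(y == 0), size=cases.size, replace=False)
idx = np.concatenate([cases, controls])

X = sm.add_constant(np.column_stack([prs[idx], e[idx], prs[idx] * e[idx]]))
fit = sm.Logit(y[idx], X).fit(disp=0)
# Slopes (including the interaction) stay consistent under case-control
# sampling; only the intercept is biased by the sampling design.
print(fit.params)
```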
Subject(s)
Gene-Environment Interaction , Multifactorial Inheritance , Humans , Case-Control Studies , Multifactorial Inheritance/genetics , Breast Neoplasms/genetics , Female , Logistic Models , Genetic Predisposition to Disease , Computer Simulation , Risk Factors , Models, Genetic , Genetic Risk Score
ABSTRACT
The objective of this study was to examine the impact of methodological changes to the 2018 World Cancer Research Fund/American Institute for Cancer Research (WCRF/AICR) Score on associations with all-cause mortality, cancer mortality, and cancer risk among older adults in the National Institutes of Health (NIH)-AARP Diet and Health Study. Weights were incorporated for each score component; a continuous point scale was developed in place of the score's fully discrete cut points; and cut-point values were changed for physical activity and red meat based on evidence-based recommendations. Exploratory aims also examined the impact of separating components with more than one subcomponent and, using a penalized scoring approach, whether all components needed to be retained within this population. Findings suggested that weighting the original 2018 WCRF/AICR Score improved its predictive performance in association with all-cause mortality and provided more precise estimates in relation to cancer risk and mortality outcomes. The importance of healthy weight, physical activity, and plant-based foods in relation to cancer and overall mortality risk was highlighted in this population of older adults. Further studies are needed to better understand the consistency and generalizability of these findings across other populations.
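A hypothetical sketch of the two scoring changes described above, with illustrative weights and cut points only (the study's actual values are not reproduced here):

```python
# Illustrative only: component weights plus a continuous (piecewise-linear)
# point scale in place of fully discrete cut points.
import numpy as np

def continuous_points(x, lo, hi, reverse=False):
    """Linearly interpolate a 0-1 score between two cut-point values."""
    s = np.clip((x - lo) / (hi - lo), 0.0, 1.0)
    return 1.0 - s if reverse else s

# Example components: activity minutes/week (more is better),
# red meat g/week (less is better). Cut points are placeholders.
mvpa, red_meat = 120.0, 400.0
scores = np.array([
    continuous_points(mvpa, 75, 150),                     # physical activity
    continuous_points(red_meat, 350, 500, reverse=True),  # red meat
])
weights = np.array([0.6, 0.4])                            # illustrative weights
print(float(weights @ scores))                            # weighted score
```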
Subject(s)
Exercise , Neoplasms , Humans , Neoplasms/mortality , Neoplasms/epidemiology , Male , United States/epidemiology , Female , Aged , Middle Aged , Risk Factors , Diet/statistics & numerical data , Risk Assessment/methods , Cause of Death
ABSTRACT
Precise dietary assessment is critical for accurate exposure classification in nutritional research, typically aimed at understanding how diet relates to health. Dietary supplement (DS) use is widespread and represents a considerable source of nutrients. However, few studies have compared the best methods to measure DSs. Our literature review on the relative validity and reproducibility of DS instruments in the United States [e.g., product inventories, questionnaires, and 24-h dietary recalls (24HR)] identified five studies that examined validity (n = 5) and/or reproducibility (n = 4). No gold standard reference method exists for validating DS use; thus, each study's investigators chose the reference instrument used to measure validity. Self-administered questionnaires agreed well with 24HR and inventory methods when comparing the prevalence of commonly used DSs. The inventory method captured nutrient amounts more accurately than the other methods. Reproducibility (over 3 months to 2.4 years) of prevalence of use estimates on the questionnaires was acceptable for common DSs. Given the limited body of research on measurement error in DS assessment, only tentative conclusions on these DS instruments can be drawn at present. Further research is critical to advancing knowledge in DS assessment for research and monitoring purposes.
Subject(s)
Diet , Dietary Supplements , Humans , United States , Reproducibility of Results , Surveys and Questionnaires , Nutrients
ABSTRACT
OBJECTIVE: To characterize the interplay between multiple medical conditions across sites and account for the heterogeneity in patient population characteristics across sites within a distributed research network, we develop a one-shot algorithm that can efficiently utilize summary-level data from various institutions. By applying our proposed algorithm to a large pediatric cohort across four national children's hospitals, we replicated a recently published prospective cohort study, the RISK study, and quantified the impact of the risk factors associated with the penetrating or stricturing behaviors of pediatric Crohn's disease (PCD). METHODS: In this study, we introduce the ODACoRH algorithm, a one-shot distributed algorithm designed for the competing risks model with heterogeneity. Our approach accommodates variability in the baseline hazard functions of multiple endpoints of interest across different sites. To accomplish this, we build a surrogate likelihood function that combines patient-level data from the local site with aggregated data from external sites. We validated our method through extensive simulation studies and through replication of the RISK study, investigating the impact of risk factors on PCD in adolescents and children from four children's hospitals within PEDSnet, A National Pediatric Learning Health System. To evaluate the ODACoRH algorithm, we compared its results with those from meta-analysis as well as those derived from the pooled data. RESULTS: The ODACoRH algorithm had the smallest relative bias with respect to the gold standard method (-0.2%), outperforming the meta-analysis method (-11.4%). In the PCD association study, the estimated subdistribution hazard ratios obtained with ODACoRH are on par with the results derived from the pooled data, which demonstrates the high reliability of our federated learning algorithm. From a clinical standpoint, the identified risk factors for PCD align well with the RISK study published in the Lancet in 2017 and with other published studies, supporting the validity of our findings. CONCLUSION: With the ODACoRH algorithm, we demonstrate the capability of effectively integrating data from multiple sites in a decentralized data setting while accounting for between-site heterogeneity. Importantly, our study reveals several crucial clinical risk factors for PCD that merit further investigation.
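The following Python sketch illustrates the one-shot surrogate-likelihood idea in a deliberately simplified setting, ordinary logistic regression rather than the paper's competing-risks model with site-specific baseline hazards: each external site shares only its score (gradient) at an initial estimate, and the lead site maximizes its local log-likelihood plus a linear correction.

```python
# Simplified analogue of a one-shot distributed (surrogate likelihood) fit;
# the ODACoRH competing-risks machinery is not reproduced here.
import numpy as np
from scipy.optimize import minimize

def loglik_and_grad(beta, X, y):
    """Logistic log-likelihood and its gradient."""
    p = 1 / (1 + np.exp(-X @ beta))
    ll = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return ll, X.T @ (y - p)

rng = np.random.default_rng(1)
beta_true = np.array([-1.0, 0.8])
sites = []
for _ in range(4):                        # four sites' patient-level data
    X = np.column_stack([np.ones(2000), rng.normal(size=2000)])
    y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))
    sites.append((X, y))

X0, y0 = sites[0]                         # lead site keeps its raw data
beta_init = minimize(lambda b: -loglik_and_grad(b, X0, y0)[0], np.zeros(2)).x
# External sites share only their gradient at beta_init (summary-level data).
grad_ext = sum(loglik_and_grad(beta_init, X, y)[1] for X, y in sites[1:])

def neg_surrogate(b):
    """Local log-likelihood plus the external sites' first-order correction."""
    ll, _ = loglik_and_grad(b, X0, y0)
    return -(ll + grad_ext @ b)

print(minimize(neg_surrogate, beta_init).x)   # one-shot federated estimate
```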
Subject(s)
Algorithms , Humans , Child , Adolescent , Reproducibility of Results , Computer Simulation , Proportional Hazards Models , Likelihood Functions
ABSTRACT
We consider measurement error models for two variables observed repeatedly and subject to measurement error. One variable is continuous, while the other variable is a mixture of continuous and zero measurements. This second variable has two sources of zeros. The first source is episodic zeros, wherein some of the measurements for an individual may be zero and others positive. The second source is hard zeros, i.e., some individuals will always report zero. An example is the consumption of alcohol from alcoholic beverages: some individuals consume alcoholic beverages episodically, while others never consume alcoholic beverages. However, with a small number of repeat measurements from individuals, it is not possible to determine those who are episodic zeros and those who are hard zeros. We develop a new measurement error model for this problem, and use Bayesian methods to fit it. Simulations and data analyses are used to illustrate our methods. Extensions to parametric models and survival analysis are discussed briefly.
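A small simulation, with made-up parameter values, illustrating why the two zero sources cannot be separated from a few repeat measurements:

```python
# Illustrative simulation of the two zero sources: a latent "hard zero" class
# (never-consumers) and episodic zeros among consumers. With only a few
# repeats per person, the two classes overlap in the observed data.
import numpy as np

rng = np.random.default_rng(2)
n, k = 1000, 4                                      # people, repeat measurements
hard_zero = rng.binomial(1, 0.25, n).astype(bool)   # never-consumers
p_consume = rng.beta(2, 2, n)                       # person-level consumption prob.
usual = rng.lognormal(mean=2.0, sigma=0.5, size=n)  # usual amount when consuming

obs = np.zeros((n, k))
drinks = rng.random((n, k)) < p_consume[:, None]    # episodic consumption days
obs[drinks] = (usual[:, None] * rng.lognormal(0, 0.3, (n, k)))[drinks]
obs[hard_zero] = 0.0                                # hard zeros: always zero

all_zero = (obs == 0).all(axis=1)
print(f"{all_zero.mean():.2%} report all zeros, but only "
      f"{hard_zero.mean():.2%} are true hard zeros")
```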
Subject(s)
Bayes Theorem , Models, Statistical , Humans , Computer Simulation , Survival Analysis , Alcohol Drinking , Data Interpretation, Statistical
ABSTRACT
Quantile regression is a semiparametric method for modeling associations between variables. It is most helpful when the covariates have complex relationships with the location, scale, and shape of the outcome distribution. Despite the method's robustness to distributional assumptions and outliers in the outcome, regression quantiles may be biased in the presence of measurement error in the covariates. The impact of function-valued covariates contaminated with heteroscedastic error has not been examined previously, although studies have investigated the case of scalar-valued covariates. We present a two-stage strategy to consistently fit linear quantile regression models with a function-valued covariate that may be measured with error. In the first stage, an instrumental variable is used to estimate the covariance matrix associated with the measurement error. In the second stage, simulation extrapolation (SIMEX) is used to correct for measurement error in the function-valued covariate. Point-wise standard errors are estimated by means of a nonparametric bootstrap. We present simulation studies to assess the robustness of the measurement error-corrected functional quantile regression estimator. Our methods are applied to National Health and Nutrition Examination Survey data to assess the relationship between physical activity and body mass index among adults in the United States.
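A minimal sketch of the SIMEX stage, reduced to a scalar error-prone covariate for clarity (the paper's covariate is function-valued) and assuming the measurement-error variance has already been estimated in stage one:

```python
# SIMEX for median regression with a scalar error-prone covariate.
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

rng = np.random.default_rng(3)
n, sigma_u2 = 2000, 0.5                               # sigma_u2 assumed known
x = rng.normal(size=n)
w = x + rng.normal(scale=np.sqrt(sigma_u2), size=n)   # error-prone covariate
y = 1.0 + 2.0 * x + rng.standard_t(df=5, size=n)

lambdas, n_sim, q = np.array([0.0, 0.5, 1.0, 1.5, 2.0]), 20, 0.5
means = []
for lam in lambdas:
    # Add extra noise with variance lam * sigma_u2, refit, average the slope.
    b = [QuantReg(y, sm.add_constant(
            w + rng.normal(scale=np.sqrt(lam * sigma_u2), size=n)
         )).fit(q=q).params[1] for _ in range(n_sim)]
    means.append(np.mean(b))

# Quadratic extrapolation of the slope back to lambda = -1 (no error).
coef = np.polyfit(lambdas, means, deg=2)
print("naive:", means[0], "SIMEX:", np.polyval(coef, -1.0))  # true slope is 2
```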
Subject(s)
Regression Analysis , Computer Simulation , Humans , Linear Models
ABSTRACT
BACKGROUND: Dietary supplement (DS) use is widespread in the United States and contributes large amounts of micronutrients to users. Most studies have relied on data from 1 assessment method to characterize the prevalence of DS use. Combining multiple methods enhances the ability to capture nutrient exposures from DSs and examine trends over time. OBJECTIVES: The objective of this study was to characterize DS use and examine trends in any DS as well as micronutrient-containing (MN) DS use in a nationally representative sample of the US population (≥1 y) from the 2007-2018 NHANES using a combined approach. METHODS: NHANES obtains an in-home inventory with a frequency-based dietary supplement and prescription medicine questionnaire (DSMQ), and two 24-h dietary recalls (24HRs). Trends in the prevalence of use and selected types of products used were estimated for the population and by sex, age, race/Hispanic origin, family income [poverty-to-income ratio (PIR)], and household food security (food-secure vs. food-insecure) using the DSMQ or ≥ 1 24HR. Linear trends were tested using orthogonal polynomials (significance set at P < 0.05). RESULTS: DS use increased from 50% in 2007 to 56% in 2018 (P = 0.001); use of MN products increased from 46% to 49% (P = 0.03), and single-nutrient DS (e.g., magnesium, vitamins B-12 and D) use also increased (all P < 0.001). In contrast, multivitamin-mineral use decreased (70% to 56%; P < 0.001). In adults (≥19 y), any (54% to 61%) and MN (49% to 54%) DS use increased, especially in men, non-Hispanic blacks and Hispanics, and low-income adults (PIR ≤130%). In children (1-18 y), any DS use remained stable (~38%), as did MN use, except for food-insecure children, whose use increased from 24% to 31% over the decade (P = 0.03). CONCLUSIONS: The prevalence of any and MN DS use increased over time in the United States. This may be partially attributed to increased use of single-nutrient products. Population subgroups differed in their DS use.
Subject(s)
Micronutrients , Trace Elements , Male , Humans , Adult , Child , United States , Nutrition Surveys , Dietary Supplements , Diet , Vitamins
ABSTRACT
A priori dietary indices provide a standardized, reproducible way to evaluate adherence to dietary recommendations across different populations. Existing nutrient-based indices were developed to reflect food/beverage intake; however, given the high prevalence of dietary supplement (DS) use and its potentially large contribution to nutrient intakes for those who use them, exposure classification without accounting for DS is incomplete. The purpose of this article is to review existing nutrient-based indices and describe the development of the Total Nutrient Index (TNI), an index developed to capture usual intakes from all sources of under-consumed micronutrients among the U.S. population. The TNI assesses U.S. adults' total nutrient intakes relative to recommended nutrient standards for eight under-consumed micronutrients identified by the Dietary Guidelines for Americans: calcium, magnesium, potassium, choline, and vitamins A, C, D, and E. The TNI is scored from 0 to 100 (truncated at 100). The mean TNI score of U.S. adults (≥19 y; n = 9,954), based on dietary data from NHANES 2011-2014, was 75.4; the mean score for the index ignoring DS contributions was only 69.0 (t-test; p < 0.001). The TNI extends existing measures of diet quality by including nutrient intakes from all sources and was developed for research, monitoring, and policy purposes. Supplemental data for this article are available online at https://doi.org/10.1080/10408398.2021.1967872.
Subject(s)
Diet , Dietary Exposure , Adult , Humans , United States , Nutrition Surveys , Nutritional Requirements , Dietary Supplements , Vitamins , Micronutrients , Energy Intake
ABSTRACT
We consider analyses of case-control studies assembled from electronic health records (EHRs) where the pool of cases is contaminated by patients who are ineligible for the study. These ineligible patients, referred to as "false cases," should be excluded from the analyses if known. However, the true outcome status of a patient in the case pool is unknown except in a subset whose size may be arbitrarily small compared to the entire pool. To effectively remove the influence of the false cases on estimating odds ratio parameters defined by a working association model of the logistic form, we propose a general strategy to adaptively impute the unknown case status without requiring a correct phenotyping model to help discern the true and false case statuses. Our method estimates the target parameters as the solution to a set of unbiased estimating equations constructed using all available data. It outperforms existing methods by achieving robustness to mismodeling the relationship between the outcome status and covariates of interest, as well as improved estimation efficiency. We further show that our estimator is root-n-consistent and asymptotically normal. Through extensive simulation studies and analysis of real EHR data, we demonstrate that our method has desirable robustness to possible misspecification of both the association and phenotyping models, along with statistical efficiency superior to the competitors.
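The sketch below shows the toy mechanics of imputing unknown case status, not the paper's estimator: fractional case probabilities from a working phenotyping model enter the logistic score equations, which are solved by Newton-Raphson. The paper's method adds corrections so that the result remains valid even when this phenotyping model is misspecified.

```python
# Toy mechanics only: imputed case probabilities as fractional outcomes in
# the logistic score equations. Not the paper's robust estimator.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_case = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 1.0 * x))))
validated = rng.random(n) < 0.1                     # small labeled subset

# Working phenotyping model fit on the validated subset.
pheno = sm.Logit(true_case[validated], X[validated]).fit(disp=0)
p_case = np.where(validated, true_case, pheno.predict(X))

b = np.zeros(2)                                     # Newton-Raphson on the score
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ b))
    W = mu * (1 - mu)
    b += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (p_case - mu))
print(b)                                            # close to (-0.5, 1.0)
```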
Subject(s)
Electronic Health Records , Models, Statistical , Humans , Computer Simulation , Case-Control Studies
ABSTRACT
Neurological dysfunction following viral infection varies among individuals, largely due to differences in their genetic backgrounds. Gait patterns, which can be evaluated using measures of coordination, balance, posture, muscle function, step-to-step variability, and other factors, are also influenced by genetic background. Accordingly, to some extent gait can be characteristic of an individual, even prior to changes in neurological function. Because neuromuscular aspects of gait are under a certain degree of genetic control, the hypothesis tested was that gait parameters could be predictive of neuromuscular dysfunction following viral infection. The Collaborative Cross (CC) mouse resource was utilized to model genetically diverse populations, and the DigiGait treadmill system was used to provide quantitative and objective measurements of 131 gait parameters in 142 mice from 23 CC and SJL/J strains. DigiGait measurements were taken prior to infection with the neurotropic virus Theiler's Murine Encephalomyelitis Virus (TMEV). Neurological phenotypes were recorded over 90 days post-infection (d.p.i.), and the cumulative frequency of the observation of these phenotypes was statistically associated with discrete baseline DigiGait measurements. These associations involved spatial and postural aspects of gait related to the 90 d.p.i. phenotype score. Furthermore, associations were found between these gait parameters and sex, as well as with outcomes considered to show resistance, resilience, or susceptibility to severe neurological symptoms after long-term infection. For example, higher pre-infection measurement values for the Paw Drag parameter corresponded with greater disease severity at 90 d.p.i. Quantitative trait loci significantly associated with these DigiGait parameters revealed potential relationships between 28 differentially expressed genes (DEGs) and different aspects of gait influenced by viral infection. Thus, these potential candidate genes and genetic variations may be predictive of long-term neurological dysfunction. Overall, these findings demonstrate the predictive/prognostic value of quantitative and objective pre-infection DigiGait measurements for virus-induced neuromuscular dysfunction.
Subject(s)
Theilovirus , Virus Diseases , Mice , Animals , Virus Diseases/genetics , Mice, Inbred Strains , Quantitative Trait Loci , Gait
ABSTRACT
Huntington disease is an autosomal dominant, neurodegenerative disease without clearly identified biomarkers for when motor-onset occurs. Current standards to determine motor-onset rely on a clinician's subjective judgment that a patient's extrapyramidal signs are unequivocally associated with Huntington disease. This subjectivity can lead to error that could be overcome using an objective, data-driven metric for determining motor-onset. Recent studies of motor-sign decline, the longitudinal degeneration of motor ability in patients, have revealed that motor-onset is closely related to an inflection point in its longitudinal trajectory. We propose a nonlinear location-shift marker model that captures this motor-sign decline and assesses how its inflection point is linked to other markers of Huntington disease progression. We propose two procedures to estimate this model and its inflection point: one is a parametric method using a nonlinear mixed effects model, and the other is a multi-stage nonparametric approach that we developed. In an empirical study, the parametric approach was sensitive to correct specification of the mean structure of the longitudinal data. In contrast, our multi-stage nonparametric procedure consistently produced unbiased estimates regardless of the true mean structure. Applying our multi-stage nonparametric estimator to Neurobiological Predictors of Huntington Disease, a large observational study of Huntington disease, leads to earlier prediction of motor-onset compared with the clinician's subjective judgment.
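A minimal parametric sketch of the inflection-point idea, fitting a sigmoidal trajectory to one simulated individual (the paper's models add random effects and a multi-stage nonparametric alternative):

```python
# Fit a sigmoidal decline to one individual's longitudinal motor measurements
# and read off the inflection point t0 as a motor-onset proxy. Illustrative only.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid_decline(t, lower, upper, t0, rate):
    """Decline from `upper` toward `lower`, inflecting at time t0."""
    return lower + (upper - lower) / (1 + np.exp(rate * (t - t0)))

rng = np.random.default_rng(5)
t = np.linspace(0, 10, 25)                       # years of follow-up
y = sigmoid_decline(t, 20, 90, t0=6.0, rate=1.2) + rng.normal(0, 3, t.size)

params, _ = curve_fit(sigmoid_decline, t, y, p0=[10, 100, 5, 1])
print("estimated inflection point:", params[2])  # close to the true 6.0
```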
Subject(s)
Huntington Disease , Neurodegenerative Diseases , Biomarkers , Disease Progression , Humans , Huntington Disease/diagnosis , Huntington Disease/genetics , Nonlinear Dynamics
ABSTRACT
BACKGROUND: Most dietary indices reflect foods and beverages and do not include exposures from dietary supplements (DS), which provide substantial amounts of micronutrients. A nutrient-based approach that captures total intake inclusive of DS can strengthen exposure assessment. OBJECTIVES: We examined the construct and criterion validity of the Total Nutrient Index (TNI) among US adults (≥19 years; not pregnant or lactating). METHODS: The TNI includes 8 underconsumed micronutrients identified by the Dietary Guidelines for Americans: calcium; magnesium; potassium; choline; and vitamins A, C, D, and E. Each intake is expressed as a percentage of the RDA or Adequate Intake to compute micronutrient component scores; the mean of the component scores yields the TNI score, ranging from 0 to 100. Data from exemplary menus and from the 2003-2006 (≥19 years; n = 8861) and 2011-2014 NHANES (≥19 years; n = 9954) were employed. Exemplary menus were used to determine whether the TNI yielded high scores from dietary sources alone (women, 31-50 years; men ≥70 years). TNI scores were correlated with Healthy Eating Index (HEI) 2015 overall and component scores for dairy, fruits, and vegetables; TNI component scores for vitamins A, C, D, and E were correlated with the respective biomarker data. TNI scores were compared between groups with known differences in nutrient intake based on the literature. RESULTS: The TNI yielded high scores on exemplary menus (84.8-93.3/100) and was moderately correlated (r = 0.48) with the HEI-2015. Mean TNI scores were significantly different for DS users (83.5) compared with nonusers (67.1); nonsmokers (76.8) compared with smokers (70.3); and those living with food security (76.6) compared with food insecurity (69.1). Correlations of TNI vitamin component scores with available biomarkers ranged from 0.12 (α-tocopherol) to 0.36 (serum 25-hydroxyvitamin D) and were significantly higher than correlations obtained from diet alone. CONCLUSIONS: The evaluation of validity supports the TNI as a useful construct for assessing total exposures to underconsumed micronutrients among US adults.
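A direct, if simplified, transcription of the scoring rule into Python; the RDA/AI values below are placeholders, not the study's exact standards:

```python
# TNI scoring rule: each component is intake as a percentage of its RDA/AI,
# truncated at 100; the TNI is the mean of the eight component scores.
import numpy as np

rda = {"calcium": 1000, "magnesium": 320, "potassium": 2600, "choline": 425,
       "vit_A": 700, "vit_C": 75, "vit_D": 15, "vit_E": 15}    # placeholder values
intake = {"calcium": 900, "magnesium": 250, "potassium": 2300, "choline": 300,
          "vit_A": 800, "vit_C": 60, "vit_D": 10, "vit_E": 12}  # food + supplements

components = {k: min(100.0, 100.0 * intake[k] / rda[k]) for k in rda}
tni = np.mean(list(components.values()))
print(round(float(tni), 1))
```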
Subject(s)
Micronutrients , Trace Elements , Adult , Diet , Dietary Supplements , Female , Humans , Lactation , Male , Nutrients , Nutrition Surveys , United States , Vitamin A , Vitamins
ABSTRACT
Data of huge size present great challenges in modeling, inference, and computation. In handling big data, much attention has been directed to settings with "large p, small n", where p represents the number of variables and n the sample size; relatively less work has addressed problems in which p and n are both large, although data with this feature have become far more accessible. A big volume of data does not automatically ensure good quality of inferences, because a large number of unimportant variables may be collected in the process of gathering informative variables. To carry out valid statistical analysis, it is imperative to screen out noisy variables that have no predictive value for explaining the outcome variable. In this paper, we develop a screening method for handling large-sized survival data, where the sample size n is large and the dimension p of covariates is of non-polynomial order of the sample size, the so-called NP-dimension. We rigorously establish theoretical results for the proposed method and conduct numerical studies to assess its performance. Our research offers multiple extensions of existing work and enlarges the scope of high-dimensional data analysis. The proposed method capitalizes on the connections among useful regression settings and offers a computationally efficient screening procedure. It can be applied to a range of large-scale data, including genomic data.
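A self-contained sketch of one common screening strategy for survival data, ranking covariates by the marginal Cox partial-likelihood score statistic at beta = 0; the paper's procedure and its NP-dimensional theory are more involved:

```python
# Marginal screening for survival data via the Cox score statistic at beta = 0.
import numpy as np

def cox_score_stat(x, time, event):
    """Score statistic U^2/V for one covariate in a Cox model at beta = 0."""
    order = np.argsort(time)
    x, event = x[order], event[order]
    u = v = 0.0
    for i in np.flatnonzero(event):
        risk = x[i:]                      # risk set at this event time
        m = risk.mean()
        u += x[i] - m                     # score contribution
        v += ((risk - m) ** 2).mean()     # information contribution
    return u * u / v if v > 0 else 0.0

rng = np.random.default_rng(6)
n, p = 300, 500
X = rng.normal(size=(n, p))
time = rng.exponential(1 / np.exp(0.8 * X[:, 0] - 0.8 * X[:, 1]))  # signals: 0, 1
event = rng.random(n) < 0.8               # noninformative censoring indicator

stats = np.array([cox_score_stat(X[:, j], time, event) for j in range(p)])
print(np.argsort(stats)[-5:])             # top-ranked indices should include 0 and 1
```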
Subject(s)
Genome , Genomics , Proportional Hazards Models , Sample Size
ABSTRACT
The identification of valid surrogate markers of disease or disease progression has the potential to decrease the length and costs of future studies. Most available methods that assess the value of a surrogate marker ignore the fact that surrogates are often measured with error. Failing to adjust for measurement error can erroneously identify a useful surrogate marker as not useful or vice versa. We investigate and propose robust methods to correct for the effect of measurement error when evaluating a surrogate marker using multiple estimators developed for parametric and nonparametric estimates of the proportion of treatment effect explained by the surrogate marker. In addition, we quantify the attenuation bias induced by measurement error and develop inference procedures to allow for variance and confidence interval estimation. Through a simulation study, we show that our proposed estimators correct for measurement error in the surrogate marker and that our inference procedures perform well in finite samples. We illustrate these methods by examining a potential surrogate marker that is measured with error, hemoglobin A1c, using data from the Diabetes Prevention Program clinical trial.
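The sketch below reproduces the attenuation phenomenon with a Freedman-type proportion-of-treatment-effect calculation in linear models and corrects it by regression calibration within treatment arms under an assumed-known error variance; this illustrates the problem, not the authors' robust estimators.

```python
# Attenuation of the proportion of treatment effect explained (PTE) when the
# surrogate is error-prone, and a regression-calibration fix. Illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 20_000
a = rng.binomial(1, 0.5, n)                   # randomized treatment
s = 1.0 * a + rng.normal(size=n)              # true surrogate (e.g., HbA1c)
y = 0.2 * a + 0.8 * s + rng.normal(size=n)    # most treatment effect flows via s
sigma_u2 = 0.5                                # assumed-known error variance
w = s + rng.normal(scale=np.sqrt(sigma_u2), size=n)   # error-prone surrogate

def pte(surrogate):
    """Freedman-type PTE: 1 - adjusted / total treatment effect."""
    total = sm.OLS(y, sm.add_constant(a)).fit().params[1]
    adj = sm.OLS(y, sm.add_constant(np.column_stack([a, surrogate]))).fit().params[1]
    return 1 - adj / total

w_cal = np.empty(n)                           # regression calibration by arm
for g in (0, 1):
    m = a == g
    lam = (w[m].var() - sigma_u2) / w[m].var()
    w_cal[m] = w[m].mean() + lam * (w[m] - w[m].mean())

print(f"true-S PTE {pte(s):.2f}, naive {pte(w):.2f}, corrected {pte(w_cal):.2f}")
```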
Subject(s)
Models, Statistical , Research Design , Bias , Biomarkers , Computer Simulation
ABSTRACT
We develop a generalized partially additive model to build a single semiparametric risk scoring system for physical activity across multiple populations. A score composed of distinct, objective physical activity measures is a new concept, and it poses challenges because of the nonlinear relationships between physical behaviors and various health outcomes. We overcome these challenges by modeling each score component as a smooth term, an extension of generalized partially linear single-index models. We use penalized splines and, to solve the additional computational problems, propose two inferential methods: one using profile likelihood and a nonparametric bootstrap, the other using a full Bayesian model. Both methods exhibit similar and accurate performance in simulations. These models are applied to the National Health and Nutrition Examination Survey and quantify nonlinear and interpretable shapes of score components for all-cause mortality.
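A minimal sketch of the penalized-spline building block for one smooth score component, using a truncated power basis with a ridge penalty on the knot coefficients; the single-index structure and the two inferential procedures are beyond this sketch.

```python
# One smooth component fit by a penalized (linear) spline: truncated power
# basis plus a ridge penalty on the knot terms. Illustrative smoother only.
import numpy as np

rng = np.random.default_rng(8)
n = 400
x = rng.uniform(0, 1, n)                       # e.g., an activity measure
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

knots = np.quantile(x, np.linspace(0.05, 0.95, 20))
B = np.column_stack([np.ones(n), x] + [np.clip(x - k, 0, None) for k in knots])
penalty = np.diag([0.0, 0.0] + [1.0] * knots.size)   # penalize knot terms only

lam = 1.0                                      # smoothing parameter (tune by CV)
coef = np.linalg.solve(B.T @ B + lam * penalty, B.T @ y)
fhat = B @ coef                                # fitted smooth component
print(np.corrcoef(fhat, np.sin(2 * np.pi * x))[0, 1])
```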
Subject(s)
Exercise , Models, Statistical , Bayes Theorem , Humans , Linear Models , Nutrition Surveys , Risk Factors
ABSTRACT
When predicting crop yield using both functional and multivariate predictors, prediction performance benefits from the inclusion of interactions between the two sets of predictors. We assume the interaction depends on a nonparametric, single-index structure of the multivariate predictor and reduce each functional predictor's dimension using functional principal component analysis (FPCA). Allowing the number of FPCA scores to diverge to infinity, we consider a sequence of semiparametric working models with a diverging number of predictors, which are FPCA scores contaminated with estimation errors. We show that the parametric component of the model is root-n consistent and asymptotically normal and that the overall prediction error is dominated by the estimation of the nonparametric interaction function, and we justify a cross-validation-based procedure to select the tuning parameters.
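A sketch of the FPCA dimension-reduction step on a common observation grid, with scores extracted by SVD of the centered data matrix; the single-index interaction modeling and the cross-validation tuning are not shown.

```python
# FPCA via SVD: center the curves, decompose, keep the leading scores.
import numpy as np

rng = np.random.default_rng(9)
n, m = 300, 100                                  # curves, grid points
t = np.linspace(0, 1, m)
scores_true = rng.normal(size=(n, 2)) * [2.0, 1.0]
curves = (scores_true[:, :1] * np.sin(np.pi * t) +
          scores_true[:, 1:] * np.cos(np.pi * t) +
          rng.normal(0, 0.2, (n, m)))            # noisy functional predictor

centered = curves - curves.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
var_explained = S**2 / np.sum(S**2)
K = int(np.searchsorted(np.cumsum(var_explained), 0.95) + 1)  # keep 95% variance
fpc_scores = U[:, :K] * S[:K]                    # estimated FPCA scores
print(K, fpc_scores.shape)                       # scores feed the regression step
```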
ABSTRACT
Measuring usual dietary intake in freely living humans is difficult to accomplish. As part of our recent study, a food frequency questionnaire was completed by healthy adult men and women at days 0 and 90 of the study. Data from the food questionnaire were analyzed with a nutrient analysis program (www.Harvardsffq.date). Healthy men and women consumed protein as 19-20% and 17-19% of their total energy intakes, respectively, with animal protein representing about 75% and 70% of their total protein intakes, respectively. The intake of each nutritionally essential amino acid (EAA) by the participants exceeded that recommended for healthy adults with minimal physical activity. In all individuals, the dietary intake of leucine was the highest, followed by lysine, valine, and isoleucine in descending order, and the ingestion of amino acids that are synthesizable de novo in animal cells (AASAs) was about 20% greater than that of total EAAs. The intake of each AASA met the amounts recommended for healthy adults with minimal physical activity. However, intakes of some AASAs (alanine, arginine, aspartate, glutamate, and glycine) from a typical diet providing 90-110 g food protein/day do not meet the requirements of adults with intensive physical activity. Within the male or female group, there were no significant differences in the dietary intakes of any amino acid between days 0 and 90 of the study, and this was also true for nearly all other essential nutrients. Our findings will help to improve amino acid nutrition and health in both the general population and exercising individuals.
Subject(s)
Amino Acids , Diet , Adult , Eating , Energy Intake , Female , Humans , Male , Nutrients
ABSTRACT
In biomedical studies, testing for homogeneity between two groups, where one group is modeled by a mixture model, is often of great interest. This paper considers the semiparametric exponential family mixture model proposed by Hong et al. (2017) and studies the score test for homogeneity under this model. The score test is nonregular in the sense that nuisance parameters disappear under the null hypothesis. To address this difficulty, we propose a modification of the score test so that the resulting test enjoys the Wilks phenomenon. In finite samples, we show that with fixed nuisance parameters the score test is locally most powerful. In large samples, we establish the asymptotic power functions under two types of local alternative hypotheses. Our simulation studies illustrate that the proposed score test is powerful and computationally fast. We apply the proposed score test to a UK ovarian cancer DNA methylation dataset to identify differentially methylated CpG sites.
Subject(s)
Models, Statistical , Computer Simulation
ABSTRACT
We continue our review of issues related to measurement error and misclassification in epidemiology. We further describe methods of adjusting for biased estimation caused by measurement error in continuous covariates, covering likelihood methods, Bayesian methods, moment reconstruction, moment-adjusted imputation, and multiple imputation. We then describe which of these methods can also be used with misclassification of categorical covariates. Methods of adjusting estimation of distributions of continuous variables for measurement error are then reviewed. Illustrative examples are provided throughout these sections. We provide lists of available software for implementing these methods and also provide the code for implementing our examples in the Supporting Information. Next, we present several advanced topics, including data subject to both classical and Berkson error; modeling continuous exposures with measurement error and categorical exposures with misclassification in the same model; variable selection when some of the variables are measured with error; adjusting analyses or designs for error in an outcome variable; and categorizing continuous variables measured with error. Finally, we provide some advice for the frequently encountered situation in which variables are known to be measured with substantial error but there is only an external reference standard or partial (or no) information about the type or magnitude of the error.
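As one concrete example of the reviewed adjustment methods, here is a hedged sketch of multiple imputation with internal validation data (a fully proper MI would also draw the imputation-model parameters; this simplified version fixes them at their estimates):

```python
# Multiple imputation for a mismeasured covariate with internal validation
# data: model X given (W, Y) in the validated subset, impute repeatedly,
# fit the analysis model per imputation, and pool by Rubin's rules.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 3000
x = rng.normal(size=n)
w = x + rng.normal(scale=0.7, size=n)            # classical measurement error
y = 1.0 + 0.5 * x + rng.normal(size=n)
val = rng.random(n) < 0.2                        # validation subset with true x

Z = sm.add_constant(np.column_stack([w, y]))
imp_model = sm.OLS(x[val], Z[val]).fit()         # imputation model X | W, Y
sigma = np.sqrt(imp_model.mse_resid)

betas, variances = [], []
for _ in range(20):                              # 20 imputations
    x_imp = x.copy()
    mu = imp_model.predict(Z)
    x_imp[~val] = mu[~val] + rng.normal(scale=sigma, size=(~val).sum())
    fit = sm.OLS(y, sm.add_constant(x_imp)).fit()
    betas.append(fit.params[1]); variances.append(fit.bse[1] ** 2)

b = np.mean(betas)                               # Rubin's rules pooling
var = np.mean(variances) + (1 + 1 / 20) * np.var(betas, ddof=1)
print(b, np.sqrt(var))                           # near 0.5; naive fit is attenuated
```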
Subject(s)
Bayes Theorem , Bias , Humans
ABSTRACT
Measurement error and misclassification of variables frequently occur in epidemiology and involve variables important to public health. Their presence can strongly affect the results of statistical analyses involving such variables, yet investigators commonly fail to pay attention to the biases resulting from such mismeasurement. We provide, in two parts, an overview of the types of error that occur, their impacts on analytic results, and statistical methods to mitigate the biases that they cause. In this first part, we review different types of measurement error and misclassification, emphasizing the classical, linear, and Berkson error models and the concepts of nondifferential and differential error. We describe the impacts of these types of error in covariates and in outcome variables on various analyses, including estimation and testing in regression models and estimation of distributions. We outline the types of ancillary studies required to provide information about such errors and discuss the implications of covariate measurement error for study design. Methods for ascertaining sample size requirements are outlined, both for ancillary studies designed to provide information about measurement error and for main studies where the exposure of interest is measured with error. We describe two of the simpler methods, regression calibration and simulation extrapolation (SIMEX), that adjust for bias in regression coefficients caused by measurement error in continuous covariates, and we illustrate their use through examples drawn from the Observing Protein and Energy Nutrition (OPEN) dietary validation study. Finally, we review software available for implementing these methods. The second part of the article deals with more advanced topics.
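To make the regression calibration recipe concrete, here is a sketch using replicate measurements, with simulated data in place of the OPEN study: the error variance is estimated from within-person variation, the best linear predictor of true exposure given the replicate mean replaces the error-prone value, and the outcome model is then fit as usual.

```python
# Regression calibration with replicate measurements of an error-prone exposure.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n, k = 2000, 2                                   # people, replicates per person
x = rng.normal(size=n)                           # true long-term exposure
w = x[:, None] + rng.normal(scale=0.8, size=(n, k))   # replicate measurements
y = 1.0 + 0.5 * x + rng.normal(size=n)

wbar = w.mean(axis=1)
sigma_u2 = ((w - wbar[:, None]) ** 2).sum() / (n * (k - 1))  # error variance
sigma_x2 = wbar.var(ddof=1) - sigma_u2 / k                   # true-X variance
lam = sigma_x2 / (sigma_x2 + sigma_u2 / k)                   # attenuation factor
x_cal = wbar.mean() + lam * (wbar - wbar.mean())             # calibrated exposure

naive = sm.OLS(y, sm.add_constant(wbar)).fit().params[1]
rc = sm.OLS(y, sm.add_constant(x_cal)).fit().params[1]
print(f"naive {naive:.3f} vs regression calibration {rc:.3f} (true 0.5)")
```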