ABSTRACT
Modern epidemiological studies face both opportunities and challenges posed by an ever-expanding capacity to measure a wide range of environmental exposures, along with sophisticated biomarkers of exposure and response at the individual level. The challenge of deciding what to measure is further complicated in longitudinal studies, where logistical and cost constraints preclude collecting all possible measurements on all participants at every follow-up time. This is true of the National Children's Study (NCS), a large-scale longitudinal study that will enroll women both prior to conception and during pregnancy and collect information on their environment, their pregnancies, and their children's development through early adulthood, with the goal of assessing key exposure/outcome relationships in a cohort of approximately 100,000 children. The success of the NCS will depend significantly on the accurate, yet cost-effective, characterization of environmental exposures thought to be related to the health outcomes of interest. The purpose of this paper is to explore cost-saving, yet valid and adequately powered, statistical approaches for gathering exposure information within epidemiological cohort studies. The proposed approach involves collecting detailed exposure assessment information on a specially selected subset of the study population and collecting less costly, and presumably less detailed and less burdensome, surrogate measures across the entire cohort. We show that large efficiency gains in cost and burden may be achieved without substantially sacrificing the ability to draw reliable inferences about the relationship between exposure and health outcome. Several detailed scenarios are provided that document how the targeted sub-sampling design strategy can benefit large cohort studies like the NCS, as well as other more focused environmental epidemiologic studies.