ABSTRACT
Understanding community responses to climate is critical for anticipating the future impacts of global change. However, despite increased research efforts in this field, models that explicitly include important biological mechanisms are lacking. Quantifying the potential impacts of climate change on species is complicated by the fact that the effects of climate variation may manifest at several points in the biological process. To this end, we extend a dynamic mechanistic model that combines population dynamics, such as species interactions, with species redistribution by allowing climate to affect both processes. We examine their relative contributions in an application to the changing biomass of a community of eight species in the Gulf of Maine using over 30 years of fisheries data from the Northeast Fishery Science Center. Our model suggests that the mechanisms driving biomass trends vary across space, time, and species. Phase space plots demonstrate that failing to account for the dynamic nature of the environmental and biologic system can yield theoretical estimates of population abundances that are not observed in empirical data. The stock assessments used by fisheries managers to set fishing targets and allocate quotas often ignore environmental effects. At the same time, research examining the effects of climate change on fish has largely focused on redistribution. Frameworks that combine multiple biological reactions to climate change are particularly necessary for marine researchers. This work is just one approach to modeling the complexity of natural systems and highlights the need to incorporate multiple and possibly interacting biological processes in future models.
Subject(s)
Ecosystem, Population Growth, Animals, Biomass, Population Dynamics, Forecasting, Fisheries, Climate Change, Fishes
ABSTRACT
Species distribution models usually attempt to explain presence-absence or abundance of a species at a site in terms of the environmental features (so-called abiotic features) present at the site. Historically, such models have considered species individually. However, it is well-established that species interact to influence presence-absence and abundance (envisioned as biotic factors). As a result, there has been substantial recent interest in joint species distribution models with various types of response, e.g., presence-absence, continuous and ordinal data. Such models incorporate dependence between species response as a surrogate for interaction. The challenge we address here is how to accommodate such modeling in the context of a large number of species (e.g., order 10^2) across sites numbering on the order of 10^2 or 10^3 when, in practice, only a few species are found at any observed site. Again, there is some recent literature to address this; we adopt a dimension reduction approach. The novel wrinkle we add here is spatial dependence. That is, we have a collection of sites over a relatively small spatial region so it is anticipated that species distribution at a given site would be similar to that at a nearby site. Specifically, we handle dimension reduction through Dirichlet processes, enabling clustering of species, joined with spatial dependence across sites through Gaussian processes. We use both simulated data and a plant communities dataset for the Cape Floristic Region (CFR) of South Africa to demonstrate our approach. The latter consists of presence-absence measurements for 639 tree species at 662 locations. Through both data examples we are able to demonstrate improved predictive performance using the foregoing specification.
ABSTRACT
Tree species are predicted to track future climate by shifting their geographic distributions, but climate-mediated migrations are not apparent in a recent continental-scale analysis. To better understand the mechanisms of a possible migration lag, we analyzed relative recruitment patterns by comparing juvenile and adult tree abundances in climate space. One would expect relative recruitment to be higher in cold and dry climates as a result of tree migration with juveniles located further poleward than adults. Alternatively, relative recruitment could be higher in warm and wet climates as a result of higher tree population turnover with increased temperature and precipitation. Using the USDA Forest Service's Forest Inventory and Analysis data at regional scales, we jointly modeled juvenile and adult abundance distributions for 65 tree species in climate space of the eastern United States. We directly compared the optimal climate conditions for juveniles and adults, identified the climates where each species has high relative recruitment, and synthesized relative recruitment patterns across species. Results suggest that for 77% and 83% of the tree species, juveniles have higher optimal temperature and optimal precipitation, respectively, than adults. Across species, the relative recruitment pattern is dominated by relatively more abundant juveniles than adults in warm and wet climates. These different abundance-climate responses through life history are consistent with faster population turnover and inconsistent with the geographic trend of large-scale tree migration. Taken together, this juvenile-adult analysis suggests that tree species might respond to climate change by having faster turnover as dynamics accelerate with longer growing seasons and higher temperatures, before there is evidence of poleward migration at biogeographic scales.
Subject(s)
Biodiversity, Climate Change, Trees/growth & development, Theoretical Models, United States
ABSTRACT
The perceived threat of climate change is often evaluated from species distribution models that are fitted to many species independently and then added together. This approach ignores the fact that species are jointly distributed and limit one another. Species respond to the same underlying climatic variables, and the abundance of any one species can be constrained by competition; a large increase in one is inevitably linked to declines of others. Omitting this basic relationship explains why responses modeled independently do not agree with the species richness or basal areas of actual forests. We introduce a joint species distribution modeling approach (JSDM), which is unique in three ways, and apply it to forests of eastern North America. First, it accommodates the joint distribution of species. Second, this joint distribution includes both abundance and presence-absence data. We solve the common issue of large numbers of zeros in abundance data by accommodating zeros in both stem counts and basal area data, i.e., a new approach to zero inflation. Finally, inverse prediction can be applied to the joint distribution of predictions to integrate the role of climate risks across all species and identify geographic areas where communities will change most (in terms of changes in abundance) with climate change. Application to forests in the eastern United States shows that climate can have greatest impact in the Northeast, due to temperature, and in the Upper Midwest, due to temperature and precipitation. Thus, these are the regions experiencing the fastest warming and are also identified as most responsive at this scale.
Subject(s)
Climate Change, Forests, Biological Models, Temperature, United States
ABSTRACT
The projected normal distribution is an under-utilized model for explaining directional data. In particular, the general version provides flexibility, e.g., asymmetry and possible bimodality along with convenient regression specification. Here, we clarify the properties of this general class. We also develop fully Bayesian hierarchical models for analyzing circular data using this class. We show how they can be fit using MCMC methods with suitable latent variables. We show how posterior inference for distributional features such as the angular mean direction and concentration can be implemented as well as how prediction within the regression setting can be handled. With regard to model comparison, we argue for an out-of-sample approach using both a predictive likelihood scoring loss criterion and a cumulative rank probability score criterion.
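As a rough illustration of how a projected normal distribution produces directional data, the sketch below draws bivariate normal vectors and keeps only their angle. It assumes an identity covariance matrix, which is the simplest special case rather than the general version described above; the function names and the chosen mean vector are illustrative assumptions, and no MCMC fitting is attempted.

```python
import math
import random


def sample_projected_normal(mu, n, rng):
    """Draw n angles by projecting bivariate normal draws onto the unit circle.

    Assumption: identity covariance. The general projected normal allows an
    arbitrary 2x2 covariance, which is what permits asymmetry and bimodality.
    """
    angles = []
    for _ in range(n):
        x = rng.gauss(mu[0], 1.0)
        y = rng.gauss(mu[1], 1.0)
        angles.append(math.atan2(y, x))  # angle of the vector (x, y), in [-pi, pi]
    return angles


def circular_mean(angles):
    """Mean direction: the angle of the resultant of the unit vectors."""
    c = sum(math.cos(a) for a in angles)
    s = sum(math.sin(a) for a in angles)
    return math.atan2(s, c)


def mean_resultant_length(angles):
    """A simple concentration summary in [0, 1]; values near 1 mean tight clustering."""
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(c, s)


rng = random.Random(0)
theta = sample_projected_normal((2.0, 2.0), 5000, rng)  # hypothetical mean vector
mean_dir = circular_mean(theta)
concentration = mean_resultant_length(theta)
```

With the mean vector pointing along the 45-degree direction, the sampled mean direction should sit close to pi/4, and moving the mean vector toward the origin should lower the concentration summary.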
ABSTRACT
We provide methods that can be used to obtain more accurate environmental exposure assessment. In particular, we propose two modeling approaches to combine monitoring data at point level with numerical model output at grid cell level, yielding improved prediction of ambient exposure at point level. Extending our earlier downscaler model (Berrocal, V. J., Gelfand, A. E., and Holland, D. M. (2010b). A spatio-temporal downscaler for outputs from numerical models. Journal of Agricultural, Biological and Environmental Statistics 15, 176-197), these new models are intended to address two potential concerns with the model output. One recognizes that there may be useful information in the outputs for grid cells that are neighbors of the one in which the location lies. The second acknowledges potential spatial misalignment between a station and its putatively associated grid cell. The first model is a Gaussian Markov random field smoothed downscaler that relates monitoring station data and computer model output via the introduction of a latent Gaussian Markov random field linked to both sources of data. The second model is a smoothed downscaler with spatially varying random weights defined through a latent Gaussian process and an exponential kernel function, that yields, at each site, a new variable on which the monitoring station data is regressed with a spatial linear model. We applied both methods to daily ozone concentration data for the Eastern US during the summer months of June, July and August 2001, obtaining, respectively, a 5% and a 15% predictive gain in overall predictive mean square error over our earlier downscaler model (Berrocal et al., 2010b). Perhaps more importantly, the predictive gain is greater at hold-out sites that are far from monitoring sites.
Subject(s)
Air Pollution/statistics & numerical data, Computer Simulation, Statistical Models, Air Pollutants/analysis, Air Pollution/analysis, Biometry, Statistical Data Interpretation, Environmental Exposure/statistics & numerical data, Humans, Markov Chains, Normal Distribution, Ozone/analysis, Time Factors, United States
ABSTRACT
Many applications involve count data from a process that yields an excess number of zeros. Zero-inflated count models, in particular, zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, along with Poisson hurdle models, are commonly used to address this problem. However, these models struggle to explain extreme incidence of zeros (say more than 80%), especially to find important covariates. In fact, the ZIP may struggle even when the proportion is not extreme. To redress this problem we propose the class of k-ZIG models. These models allow more flexible modeling of both the zero-inflation and the nonzero counts, allowing interplay between these two components. We develop the properties of this new class of models, including reparameterization to a natural link function. The models are straightforwardly fitted within a Bayesian framework. The methodology is illustrated with simulated data examples as well as a forest seedling dataset obtained from the USDA Forest Service's Forest Inventory and Analysis program.
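For readers unfamiliar with the baseline models named above, the following sketch writes out the probability mass functions of the zero-inflated Poisson (ZIP) and the Poisson hurdle model. It does not implement the k-ZIG class, whose definition is not given in the abstract; the parameter names (`lam`, `pi0`, `p_zero`) and the numeric values are illustrative assumptions.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing count k."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi0):
    """Zero-inflated Poisson: a structural zero with probability pi0,
    otherwise a Poisson(lam) count (which can itself be zero)."""
    base = (1.0 - pi0) * poisson_pmf(k, lam)
    return pi0 + base if k == 0 else base


def hurdle_pmf(k, lam, p_zero):
    """Poisson hurdle: zero with probability p_zero, otherwise a draw
    from a Poisson(lam) truncated to positive counts."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


lam, pi0 = 1.5, 0.8  # hypothetical values with heavy zero inflation
p_zero_zip = zip_pmf(0, lam, pi0)
total_zip = sum(zip_pmf(k, lam, pi0) for k in range(60))
total_hurdle = sum(hurdle_pmf(k, lam, 0.8) for k in range(60))
```

In the ZIP, the zero probability is `pi0 + (1 - pi0) * exp(-lam)`, so the inflation parameter and the count rate both contribute to the zeros; in the hurdle model the zero probability is a free parameter separate from the positive counts.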
Subject(s)
Statistical Models, Poisson Distribution, Bayes Theorem, Biometry, Factual Databases/statistics & numerical data, Forestry/statistics & numerical data, Trees
ABSTRACT
The choice of the sampling locations in a spatial network is often guided by practical demands. In particular, many locations are preferentially chosen to capture high values of a response, for example, air pollution levels in environmental monitoring. Then, model estimation and prediction of the exposure surface become biased due to the selective sampling. Since prediction is often the main utility of the modeling, we suggest that the effect of preferential sampling lies more importantly in the resulting predictive surface than in parameter estimation. Our contribution is to offer a direct simulation-based approach to assessing the effects of preferential sampling. We compare two predictive surfaces over the study region, one originating from the notion of an 'operating' intensity driving the selection of monitoring sites, the other under complete spatial randomness. We can consider a range of response models. They may reflect the operating intensity, introduce alternative informative covariates, or just propose a flexible spatial model. Then, we can generate data under the given model. Upon fitting the model and interpolating (kriging), we will obtain two predictive surfaces to compare. It is important to note that we need suitable metrics to compare the surfaces and that the predictive surfaces are random, so we need to make expected comparisons.
ABSTRACT
In environmental health studies air pollution measurements from the closest monitor are commonly used as a proxy for personal exposure. This technique assumes that air pollution concentrations are spatially homogeneous in the neighborhoods associated with the monitors and consequently introduces measurement error into a resultant model. To model the relationship between maternal exposure to air pollution and birth weight, we build a hierarchical model that accounts for the associated measurement error. We allow four possible scenarios, with increasing flexibility, for capturing this uncertainty. In the two simplest cases, we specify models with a constant variance term and a variance component that allows uncertainty in the exposure measurements to increase as the distance between maternal residence and the location of the closest monitor increases. In the remaining two models, we introduce spatial dependence using random effects. The models are illustrated using Bayesian hierarchical modeling techniques that relate pregnancy outcomes from the North Carolina Detailed Birth Records to air pollution data from the U.S. Environmental Protection Agency.
Subject(s)
Bayes Theorem, Birth Weight, Environmental Monitoring/methods, Maternal Exposure/adverse effects, Statistical Models, Particulate Matter/poisoning, Environmental Monitoring/standards, Female, Humans, Newborn Infant, North Carolina, Pregnancy
ABSTRACT
To examine environmental and geologic determinants of arsenic in groundwater, detailed geologic data were integrated with well water arsenic concentration data and well construction data for 471 private wells in Orange County, NC, via a geographic information system. For the statistical analysis, the geologic units were simplified into four generalized categories based on rock type and interpreted mode of deposition/emplacement. The geologic transitions from rocks of a primary pyroclastic origin to rocks of volcaniclastic sedimentary origin were designated as polylines. The data were fitted to a left-censored regression model to identify key determinants of arsenic levels in groundwater. A Bayesian spatial random effects model was then developed to capture any spatial patterns in groundwater arsenic residuals into model estimation. Statistical model results indicate (1) wells close to a transition zone or fault are more likely to contain detectible arsenic; (2) welded tuffs and hydrothermal quartz bodies are associated with relatively higher groundwater arsenic concentrations and even higher for those proximal to a pluton; and (3) wells of greater depth are more likely to contain elevated arsenic. This modeling effort informs policy intervention by creating three-dimensional maps of predicted arsenic levels in groundwater for any location and depth in the area.
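The left-censored regression idea can be sketched as a log-likelihood in which detected concentrations contribute a density term and non-detects contribute the probability of falling below the detection limit. The sketch assumes Gaussian errors on a log-concentration scale and a single covariate; those choices, the variable names, and the toy data are assumptions for illustration and are not taken from the study.

```python
import math


def normal_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def left_censored_loglik(y, censored, x, beta0, beta1, sigma, limit):
    """Log-likelihood for a linear model with left-censoring at `limit`.

    Detected values add log f(y_i); censored values add log P(Y_i < limit).
    """
    total = 0.0
    for yi, ci, xi in zip(y, censored, x):
        mu = beta0 + beta1 * xi
        if ci:
            total += math.log(normal_cdf((limit - mu) / sigma))
        else:
            total += normal_logpdf(yi, mu, sigma)
    return total


# Hypothetical toy data: log concentration vs. a scaled well depth.
depth = [0.5, 1.2, 2.0, 3.1]
log_conc = [0.0, 0.0, 0.4, 1.1]         # entries for censored wells are placeholders
is_censored = [True, True, False, False]
ll = left_censored_loglik(log_conc, is_censored, depth, -1.0, 0.6, 0.8, limit=0.0)
```

Maximizing this function over `beta0`, `beta1`, and `sigma` (or placing priors on them) would give the kind of fit the abstract refers to; the spatial random effects are not included here.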
Subject(s)
Arsenic/analysis, Fresh Water/analysis, Chemical Models, Bayes Theorem, Geographic Information Systems, Multivariate Analysis, North Carolina
ABSTRACT
Large point referenced datasets occur frequently in the environmental and natural sciences. Use of Bayesian hierarchical spatial models for analyzing these datasets is undermined by onerous computational burdens associated with parameter estimation. Low-rank spatial process models attempt to resolve this problem by projecting spatial effects to a lower-dimensional subspace. This subspace is determined by a judicious choice of "knots" or locations that are fixed a priori. One such representation yields a class of predictive process models (e.g., Banerjee et al., 2008) for spatial and spatial-temporal data. Our contribution here expands upon predictive process models with fixed knots to models that accommodate stochastic modeling of the knots. We view the knots as emerging from a point pattern and investigate how such adaptive specifications can yield more flexible hierarchical frameworks that lead to automated knot selection and substantial computational benefits.
ABSTRACT
In relating pollution to birth outcomes, maternal exposure has usually been described using monitoring data. Such characterization provides a misrepresentation of exposure as it (i) does not take into account the spatial misalignment between an individual's residence and monitoring sites, and (ii) ignores the fact that individuals spend most of their time indoors and typically in more than one location. In this paper, we break with previous studies by using a stochastic simulator to describe personal exposure (to particulate matter) and then relate simulated exposures at the individual level to the health outcome (birthweight) rather than aggregating to a selected spatial unit. We propose a hierarchical model that, at the first stage, specifies a linear relationship between birthweight and personal exposure, adjusting for individual risk factors and introduces random spatial effects for the census tract of maternal residence. At the second stage, our hierarchical model specifies the distribution of each individual's personal exposure using the empirical distribution yielded by the stochastic simulator as well as a model for the spatial random effects. We have applied our framework to analyze birthweight data from 14 counties in North Carolina in years 2001 and 2002. We investigate whether there are certain aspects and time windows of exposure that are more detrimental to birthweight by building different exposure metrics which we incorporate, one by one, in our hierarchical model. To assess the difference in relating ambient exposure to birthweight versus personal exposure to birthweight, we compare estimates of the effect of air pollution obtained from hierarchical models that linearly relate ambient exposure and birthweight versus those obtained from our modeling framework. Our analysis does not show a significant effect of PM2.5 on birthweight for reasons which we discuss.
However, our modeling framework serves as a template for analyzing the relationship between personal exposure and longer term health endpoints.
ABSTRACT
We develop a local, spatial measure of educational isolation (EI) and characterize the relationship between EI and our previously developed measure of racial isolation (RI). EI measures the extent to which non-college educated individuals are exposed primarily to other non-college educated individuals. To characterize how the RI-EI relationship varies across space, we propose a novel measure of local correlation. Using birth records from the State of Michigan (2005-2012), we estimate associations between RI, EI, and birth outcomes. EI was lower in urban communities and higher in rural communities, while RI was highest in urban areas and parts of the southeastern United States (US). We observed greater heterogeneity in EI in low RI tracts, especially in non-urban tracts; residents of high RI tracts are likely to be both educationally and racially isolated. Associations were also observed between RI, EI, and gestational length (weeks) and preterm birth (PTB). For example, moving from the lowest to the highest quintile of RI was associated with a 1.11 (1.07, 1.15) and 1.16 (1.10, 1.22) increase in odds of PTB among NHB and NHW women, respectively. Moving from the lowest to the highest quintile of EI was associated with a 1.07 (1.02, 1.12) and 1.03 (1.00, 1.05) increase in odds of PTB among NHB and NHW women, respectively. This work provides three tools (RI, EI, and the local correlation measure) to researchers and policymakers interested in how residential isolation shapes disparate outcomes.
Subject(s)
Premature Birth, Educational Status, Female, Humans, Newborn Infant, Michigan, Pregnancy, Premature Birth/epidemiology, Racial Groups, Southeastern United States
ABSTRACT
Birthweight and gestational age are closely related and represent important indicators of a healthy pregnancy. Customary modeling for birthweight is conditional on gestational age. However, joint modeling directly addresses the relationship between gestational age and birthweight, and provides increased flexibility and interpretation as well as a strategy to avoid using gestational age as an intermediate variable. Previous proposals have utilized finite mixtures of bivariate regression models to incorporate well-established risk factors into analysis (e.g. sex and birth order of the baby, maternal age, race, and tobacco use) while examining the non-Gaussian shape of the joint birthweight and gestational age distribution. We build on this approach by demonstrating the inferential (prognostic) benefits of joint modeling (e.g. investigation of 'age inappropriate' outcomes like small for gestational age) and hence re-emphasize the importance of capturing the non-Gaussian distributional shapes. We additionally extend current models through a latent specification which admits interval-censored gestational age. We work within a Bayesian framework which enables inference beyond customary parameter estimation and prediction as well as exact uncertainty assessment. The model is applied to a portion of the 2003-2006 North Carolina Detailed Birth Record data (n=336129) available through the Children's Environmental Health Initiative and is fitted using the Bayesian methodology and Markov chain Monte Carlo approaches.
Subject(s)
Birth Weight, Gestational Age, Statistical Models, Adolescent, Adult, Algorithms, Bayes Theorem, Birth Certificates, Birth Order, Education/statistics & numerical data, Ethnicity/statistics & numerical data, Female, Humans, Low Birth Weight Infant, Newborn Infant, Small for Gestational Age Infant, Likelihood Functions, Male, Marital Status/statistics & numerical data, Markov Chains, Maternal Age, Monte Carlo Method, North Carolina, Pregnancy, Premature Birth/epidemiology, Racial Groups/statistics & numerical data, Regression Analysis, Smoking/adverse effects, Smoking/epidemiology, Statistical Distributions, Young Adult
ABSTRACT
Advances in Geographical Information Systems (GIS) and Global Positioning Systems (GPS) enable accurate geocoding of locations where scientific data are collected. This has encouraged collection of large spatial datasets in many fields and has generated considerable interest in statistical modeling for location-referenced spatial data. The setting where the number of locations yielding observations is too large to fit the desired hierarchical spatial random effects models using Markov chain Monte Carlo methods is considered. This problem is exacerbated in spatial-temporal and multivariate settings where many observations occur at each location. The recently proposed predictive process, motivated by kriging ideas, aims to maintain the richness of desired hierarchical spatial modeling specifications in the presence of large datasets. A shortcoming of the original formulation of the predictive process is that it induces a positive bias in the non-spatial error term of the models. A modified predictive process is proposed to address this problem. The predictive process approach is knot-based leading to questions regarding knot design. An algorithm is designed to achieve approximately optimal spatial placement of knots. Detailed illustrations of the modified predictive process using multivariate spatial regression with both a simulated and a real dataset are offered.
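The bias and its correction can be written out in a few lines. The notation below is an assumption, since the abstract defines no symbols: $w(s)$ is the parent Gaussian process with covariance function $C$, $s^*_1,\dots,s^*_m$ are the knots, and $w^*$ collects the process values at the knots. This is a sketch of the usual construction, not a restatement of the paper's exact formulas.

```latex
% Parent process and knot-based projection (notation assumed for illustration)
w(s) \sim \mathrm{GP}\bigl(0,\, C(\cdot,\cdot)\bigr), \qquad
w^* = \bigl(w(s^*_1),\dots,w(s^*_m)\bigr)^{\top}, \qquad
C^* = \bigl[\,C(s^*_i, s^*_j)\,\bigr]_{i,j=1}^{m}

% Predictive process: conditional expectation of w(s) given the knot values
\tilde{w}(s) = \mathrm{E}\bigl[\,w(s) \mid w^*\,\bigr] = c(s)^{\top} C^{*-1} w^*,
\qquad c(s) = \bigl(C(s, s^*_1),\dots,C(s, s^*_m)\bigr)^{\top}

% Its variance never exceeds that of the parent process
\operatorname{Var}\bigl(\tilde{w}(s)\bigr) = c(s)^{\top} C^{*-1} c(s) \;\le\; C(s,s)

% Modified predictive process: add back the missing variance independently at each site
\tilde{w}_{\epsilon}(s) = \tilde{w}(s) + \tilde{\epsilon}(s), \qquad
\tilde{\epsilon}(s) \overset{\text{ind}}{\sim}
N\bigl(0,\; C(s,s) - c(s)^{\top} C^{*-1} c(s)\bigr)
```

Because the projection loses variance at every site that is not a knot, a model fitted with $\tilde{w}(s)$ alone tends to push the shortfall into the non-spatial error variance; the independent correction term restores the marginal variance $C(s,s)$ at each site.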
ABSTRACT
Disease incidence or mortality data are typically available as rates or counts for specified regions, collected over time. We propose Bayesian nonparametric spatial modeling approaches to analyze such data. We develop a hierarchical specification using spatial random effects modeled with a Dirichlet process prior. The Dirichlet process is centered around a multivariate normal distribution. This latter distribution arises from a log-Gaussian process model that provides a latent incidence rate surface, followed by block averaging to the areal units determined by the regions in the study. With regard to the resulting posterior predictive inference, the modeling approach is shown to be equivalent to an approach based on block averaging of a spatial Dirichlet process to obtain a prior probability model for the finite dimensional distribution of the spatial random effects. We introduce a dynamic formulation for the spatial random effects to extend the model to spatio-temporal settings. Posterior inference is implemented through Gibbs sampling. We illustrate the methodology with simulated data as well as with a data set on lung cancer incidences for all 88 counties in the state of Ohio over an observation period of 21 years.
Subject(s)
Bayes Theorem, Epidemiologic Methods, Biological Models, Statistical Models, Computer Simulation, Humans, Incidence, Lung Neoplasms/epidemiology, Male, Middle Aged, Ohio/epidemiology
ABSTRACT
It is often of interest to model the incidence and duration of threshold exceedance events for an environmental variable over a set of monitoring locations. Such data arrive over continuous time and can be considered as observations of a two-state process yielding, sequentially, a length of time in the below threshold state followed by a length of time in the above threshold state, then returning to the below threshold state, etc. We have a two-state continuous time Markov process, often referred to as an alternating renewal process. The process is observed over a truncated time window and, within this window, time in each state is modeled using a distinct cumulative intensity specification. Initially, we model each intensity over the window using a parametric regression specification. We extend the regression specification adding temporal random effects to enrich the model, using a realization of a log Gaussian process over time. With only one type of renewal, this specification is referred to as a Gaussian process modulated renewal process. Here, we introduce Gaussian process modulation to the intensity for each state. Model fitting is done within a Bayesian framework. We clarify that fitting with a customary log Gaussian process specification over a lengthy time window is computationally infeasible. The nearest neighbor Gaussian process (NNGP), which supplies sparse covariance structure, is adopted to enable tractable computation. We also propose methods for both generating data under our models and for conducting model comparison. The model is applied to hourly ozone data for four monitoring sites in different locations across the United States for the ozone season of 2014. For each site, we obtain estimated profiles of up-crossing and down-crossing intensity functions through time. 
In addition, we obtain inference regarding the number of exceedances, the distribution of the duration of exceedance events, and the proportion of time in the above and below threshold state for any time interval.
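To make the two-state description concrete, the sketch below simulates an alternating process over a truncated window and computes the summaries mentioned above: number of exceedances, exceedance durations, and proportion of time above threshold. It assumes constant up-crossing and down-crossing intensities, which is a deliberate simplification of the regression and Gaussian-process-modulated intensities in the paper; the rates and window length are hypothetical.

```python
import random


def simulate_two_state(rate_up, rate_down, t_end, rng):
    """Simulate a two-state process on [0, t_end], starting below threshold.

    rate_up:   constant intensity of up-crossings (leaving the below state)
    rate_down: constant intensity of down-crossings (leaving the above state)
    Returns the exceedance intervals as (start, end), truncated at t_end.
    """
    t, above, events = 0.0, False, []
    while t < t_end:
        rate = rate_down if above else rate_up
        end = min(t + rng.expovariate(rate), t_end)  # truncate at the window edge
        if above:
            events.append((t, end))
        t = end
        above = not above
    return events


rng = random.Random(1)
t_end = 20000.0
events = simulate_two_state(rate_up=0.2, rate_down=1.0, t_end=t_end, rng=rng)

n_exceedances = len(events)
durations = [end - start for start, end in events]
mean_duration = sum(durations) / n_exceedances
prop_above = sum(durations) / t_end
```

With constant intensities the long-run proportion of time above threshold is `rate_up / (rate_up + rate_down)`, which gives a quick sanity check on the simulator before time-varying intensities are introduced.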
ABSTRACT
The most prevalent spatial data setting is, arguably, that of so-called geostatistical data, data that arise as random variables observed at fixed spatial locations. Collection of such data in space and in time has grown enormously in the past two decades. With it has grown a substantial array of methods to analyze such data. Here, we attempt a review of a fully model-based perspective for such data analysis, the approach of hierarchical modeling fitted within a Bayesian framework. The benefit, as with hierarchical Bayesian modeling in general, is full and exact inference, with proper assessment of uncertainty. Geostatistical modeling includes univariate and multivariate data collection at sites, continuous and categorical data at sites, static and dynamic data at sites, and datasets over very large numbers of sites and long periods of time. Within the hierarchical modeling framework, we offer a review of the current state of the art in these settings.
ABSTRACT
Models of the geographic distributions of species have wide application in ecology. But the nonspatial, single-level, regression models that ecologists have often employed do not deal with problems of irregular sampling intensity or spatial dependence, and do not adequately quantify uncertainty. We show here how to build statistical models that can handle these features of spatial prediction and provide richer, more powerful inference about species niche relations, distributions, and the effects of human disturbance. We begin with a familiar generalized linear model and build in additional features, including spatial random effects and hierarchical levels. Since these models are fully specified statistical models, we show that it is possible to add complexity without sacrificing interpretability. This step-by-step approach, together with attached code that implements a simple, spatially explicit, regression model, is structured to facilitate self-teaching. All models are developed in a Bayesian framework. We assess the performance of the models by using them to predict the distributions of two plant species (Proteaceae) from South Africa's Cape Floristic Region. We demonstrate that making distribution models spatially explicit can be essential for accurately characterizing the environmental response of species, predicting their probability of occurrence, and assessing uncertainty in the model results. Adding hierarchical levels to the models has further advantages in allowing human transformation of the landscape to be taken into account, as well as additional features of the sampling process.
Subject(s)
Computational Biology/methods, Ecology/methods, Ecosystem, Environmental Monitoring, Statistical Models, Bayes Theorem, Computer Simulation, Statistical Data Interpretation, Ecology/statistics & numerical data, Empirical Research, Population Genetics, Humans, Population Dynamics, Probability, Proteaceae/genetics, Proteaceae/physiology, Regression Analysis, South Africa
ABSTRACT
Gaussian Process (GP) models provide a very flexible nonparametric approach to modeling location-and-time indexed datasets. However, the storage and computational requirements for GP models are infeasible for large spatial datasets. Nearest Neighbor Gaussian Processes (Datta A, Banerjee S, Finley AO, Gelfand AE. Hierarchical nearest-neighbor Gaussian process models for large geostatistical datasets. J Am Stat Assoc 2016) provide a scalable alternative by using local information from few nearest neighbors. Scalability is achieved by using the neighbor sets in a conditional specification of the model. We show how this is equivalent to sparse modeling of Cholesky factors of large covariance matrices. We also discuss a general approach to construct scalable Gaussian Processes using sparse local kriging. We present a multivariate data analysis which demonstrates how the nearest neighbor approach yields inference indistinguishable from the full rank GP despite being several times faster. Finally, we also propose a variant of the NNGP model for automating the selection of the neighbor set size.
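The conditional specification can be illustrated in a deliberately small setting. The sketch below compares the full Gaussian process log-density with a product of conditionals that each use only one nearest previous neighbor. It assumes a zero-mean process on sorted one-dimensional locations with an exponential covariance; in that special case the process is Markov, so the one-neighbor version is exact, whereas in general the nearest-neighbor construction is an approximation. All names and values are illustrative assumptions.

```python
import math


def exp_cov(s, t, sigma2, phi):
    """Exponential covariance between locations s and t."""
    return sigma2 * math.exp(-phi * abs(s - t))


def cholesky(a):
    """Dense lower-triangular Cholesky factor of a positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def full_gp_logpdf(y, locs, sigma2, phi):
    """Zero-mean multivariate normal log-density using the dense covariance."""
    n = len(y)
    cov = [[exp_cov(locs[i], locs[j], sigma2, phi) for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = []  # forward substitution: solve low * z = y
    for i in range(n):
        z.append((y[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    logdet = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in z) + logdet + n * math.log(2.0 * math.pi))


def nn_logpdf(y, locs, sigma2, phi):
    """Product of conditionals, each given one nearest previous location.

    Requires locs sorted in increasing order with no ties.
    """
    total = -0.5 * (y[0] ** 2 / sigma2 + math.log(2.0 * math.pi * sigma2))
    for i in range(1, len(y)):
        rho = math.exp(-phi * (locs[i] - locs[i - 1]))
        mean = rho * y[i - 1]              # kriging mean from the single neighbor
        var = sigma2 * (1.0 - rho * rho)   # conditional variance
        total += -0.5 * ((y[i] - mean) ** 2 / var + math.log(2.0 * math.pi * var))
    return total


locs = [0.0, 0.3, 0.9, 1.0, 2.5, 2.7]   # hypothetical sorted locations
y = [0.2, -0.1, 0.5, 0.4, -1.2, -0.9]   # hypothetical observations
full = full_gp_logpdf(y, locs, sigma2=1.3, phi=0.8)
nn = nn_logpdf(y, locs, sigma2=1.3, phi=0.8)
```

The dense version needs a full Cholesky factorization, which grows cubically with the number of locations, while the neighbor-based version touches only one neighbor per location. In two or more dimensions, or with other covariance functions, each conditional would use a small set of neighbors and the two log-densities would agree only approximately.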