2.
Demography ; 60(3): 915-937, 2023 06 01.
Article in English | MEDLINE | ID: mdl-37212712

ABSTRACT

Population projections provide predictions of future population sizes for an area. Historically, most population projections have been produced using deterministic or scenario-based approaches and have not assessed uncertainty about future population change. Starting in 2015, however, the United Nations (UN) has produced probabilistic population projections for all countries using a Bayesian approach. There is also considerable interest in subnational probabilistic population projections, but the UN's national approach cannot be used directly for this purpose, because within-country correlations in fertility and mortality are generally larger than between-country ones, migration is not constrained in the same way, and there is a need to account for college and other special populations, particularly at the county level. We propose a Bayesian method for producing subnational population projections, including migration and accounting for college populations, by building on but modifying the UN approach. We illustrate our approach by applying it to the counties of Washington State and comparing the results with extant deterministic projections produced by Washington State demographers. Out-of-sample experiments show that our method gives accurate and well-calibrated forecasts and forecast intervals. In most cases, our intervals were narrower than the growth-based intervals issued by the state, particularly for shorter time horizons.


Subjects
Fertility , Population Forecast , Humans , Bayes Theorem , Forecasting , Population Dynamics , Mortality
3.
Int J Forecast ; 39(1): 73-97, 2023.
Article in English | MEDLINE | ID: mdl-36568848

ABSTRACT

Population forecasts are used by governments and the private sector for planning, with horizons up to about three generations (around 2100) for different purposes. The traditional methods are deterministic using scenarios, but probabilistic forecasts are desired to get an idea of accuracy, assess changes, and make decisions involving risks. In a significant breakthrough, since 2015, the United Nations has issued probabilistic population forecasts for all countries using a Bayesian methodology that we review here. Assessment of the social cost of carbon relies on long-term forecasts of carbon emissions, which in turn depend on even longer-range population and economic forecasts, to 2300. We extend the UN method to very-long range population forecasts by combining the statistical approach with expert review and elicitation. While the world population is projected to grow for the rest of this century, it will likely stabilize in the 22nd century and decline in the 23rd century.

4.
Nature ; 610(7933): 687-692, 2022 10.
Article in English | MEDLINE | ID: mdl-36049503

ABSTRACT

The social cost of carbon dioxide (SC-CO2) measures the monetized value of the damages to society caused by an incremental metric tonne of CO2 emissions and is a key metric informing climate policy. Used by governments and other decision-makers in benefit-cost analysis for over a decade, SC-CO2 estimates draw on climate science, economics, demography and other disciplines. However, a 2017 report by the US National Academies of Sciences, Engineering, and Medicine [1] (NASEM) highlighted that current SC-CO2 estimates no longer reflect the latest research. The report provided a series of recommendations for improving the scientific basis, transparency and uncertainty characterization of SC-CO2 estimates. Here we show that improved probabilistic socioeconomic projections, climate models, damage functions, and discounting methods that collectively reflect theoretically consistent valuation of risk, substantially increase estimates of the SC-CO2. Our preferred mean SC-CO2 estimate is $185 per tonne of CO2 ($44-$413 per tCO2: 5%-95% range, 2020 US dollars) at a near-term risk-free discount rate of 2%, a value 3.6 times higher than the US government's current value of $51 per tCO2. Our estimates incorporate updated scientific understanding throughout all components of SC-CO2 estimation in the new open-source Greenhouse Gas Impact Value Estimator (GIVE) model, in a manner fully responsive to the near-term NASEM recommendations. Our higher SC-CO2 values, compared with estimates currently used in policy evaluation, substantially increase the estimated benefits of greenhouse gas mitigation and thereby increase the expected net benefits of more stringent climate policies.


Subjects
Carbon Dioxide , Climate Models , Socioeconomic Factors , Carbon Dioxide/analysis , Carbon Dioxide/economics , Climate , Greenhouse Gases/analysis , Greenhouse Gases/economics , Uncertainty , Delay Discounting , Risk , Policy Making , Environmental Policy
5.
Proc Natl Acad Sci U S A ; 119(35): e2203822119, 2022 08 30.
Article in English | MEDLINE | ID: mdl-35994637

ABSTRACT

We propose a method for forecasting global human migration flows. A Bayesian hierarchical model is used to make probabilistic projections of the 39,800 bilateral migration flows among the 200 most populous countries. We generate out-of-sample forecasts for all bilateral flows for the 2015 to 2020 period, using models fitted to bilateral migration flows for five 5-y periods from 1990 to 1995 through 2010 to 2015. We find that the model produces well-calibrated out-of-sample forecasts of bilateral flows, as well as total country-level inflows, outflows, and net flows. The mean absolute error decreased by 61% using our method, compared to a leading model of international migration. Out-of-sample analysis indicated that simple methods for forecasting migration flows offered accurate projections of bilateral migration flows in the near term. Our method matched or improved on the out-of-sample performance using these simple deterministic alternatives, while also accurately assessing uncertainty. We integrate the migration flow forecasting model into a fully probabilistic population projection model to generate bilateral migration flow forecasts by age and sex for all flows from 2020 to 2025 through 2040 to 2045.


Subjects
Emigration and Immigration , Bayes Theorem , Emigration and Immigration/trends , Forecasting , Human Migration/trends , Humans , Internationality , Models, Statistical
6.
Proc Natl Acad Sci U S A ; 119(16): e2120737119, 2022 04 19.
Article in English | MEDLINE | ID: mdl-35412893

ABSTRACT

Probability models are used for many statistical tasks, notably parameter estimation, interval estimation, inference about model parameters, point prediction, and interval prediction. Thus, choosing a statistical model and accounting for uncertainty about this choice are important parts of the scientific process. Here we focus on one such choice, that of variables to include in a linear regression model. Many methods have been proposed, including Bayesian and penalized likelihood methods, and it is unclear which one to use. We compared 21 of the most popular methods by carrying out an extensive set of simulation studies based closely on real datasets that span a range of situations encountered in practical data analysis. Three adaptive Bayesian model averaging (BMA) methods performed best across all statistical tasks. These used adaptive versions of Zellner's g-prior for the parameters, where the prior variance parameter g is a function of sample size or is estimated from the data. We found that for BMA methods implemented with Markov chain Monte Carlo, 10,000 iterations were enough. Computationally, we found two of the three best methods (BMA with g=√n and empirical Bayes-local) to be competitive with the least absolute shrinkage and selection operator (LASSO), which is often preferred as a variable selection technique because of its computational efficiency. BMA performed better than Bayesian model selection (in which just one model is selected).
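
The abstract above names BMA with Zellner's g-prior and g=√n but gives no formulas. As a rough illustration only, the sketch below enumerates every subset of a few predictors and weights each model by the standard closed-form g-prior Bayes factor against the intercept-only model, (1+g)^((n-1-k)/2) / (1+g(1-R²))^((n-1)/2). The uniform prior over models, the simulated data, and all function names are assumptions made for this example, not the authors' implementation (which the abstract says can also use Markov chain Monte Carlo).

```python
import itertools
import math
import random


def centered(values):
    """Subtract the mean so the intercept can be dropped from the regression."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of the least-squares fit of y on the columns plus an intercept."""
    if not columns:
        return 0.0
    yc = centered(y)
    xc = [centered(col) for col in columns]
    gram = [[sum(p * q for p, q in zip(u, w)) for w in xc] for u in xc]
    xty = [sum(p * q for p, q in zip(u, yc)) for u in xc]
    beta = solve(gram, xty)
    explained = sum(b * c for b, c in zip(beta, xty))
    return explained / sum(v * v for v in yc)


def bma_g_prior(columns, y, g=None):
    """Posterior model and inclusion probabilities under Zellner's g-prior.

    Assumes a uniform prior over all 2^p models; g defaults to sqrt(n).
    """
    n, p = len(y), len(columns)
    g = math.sqrt(n) if g is None else g
    log_bf = {}
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            r2 = r_squared([columns[j] for j in subset], y)
            # Log Bayes factor of this model against the intercept-only model.
            log_bf[subset] = (0.5 * (n - 1 - k) * math.log(1 + g)
                              - 0.5 * (n - 1) * math.log(1 + g * (1 - r2)))
    top = max(log_bf.values())  # subtract the maximum for numerical stability
    weights = {m: math.exp(v - top) for m, v in log_bf.items()}
    total = sum(weights.values())
    model_prob = {m: w / total for m, w in weights.items()}
    inclusion = [sum(pr for m, pr in model_prob.items() if j in m)
                 for j in range(p)]
    return model_prob, inclusion


# Simulated data (hypothetical): only the first of three predictors matters.
rng = random.Random(1)
n_obs = 100
x = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(3)]
y = [2.0 * x[0][i] + rng.gauss(0, 1) for i in range(n_obs)]
model_prob, inclusion = bma_g_prior(x, y)
print([round(v, 3) for v in inclusion])
```

Full enumeration is feasible only for a handful of predictors; with many variables the model space would have to be sampled rather than listed.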

7.
Proc Natl Acad Sci U S A ; 118(31), 2021 08 03.
Article in English | MEDLINE | ID: mdl-34312227

ABSTRACT

There are multiple sources of data giving information about the number of SARS-CoV-2 infections in the population, but all have major drawbacks, including biases and delayed reporting. For example, the number of confirmed cases largely underestimates the number of infections, and deaths lag infections substantially, while test positivity rates tend to greatly overestimate prevalence. Representative random prevalence surveys, the only putatively unbiased source, are sparse in time and space, and the results can come with big delays. Reliable estimates of population prevalence are necessary for understanding the spread of the virus and the effectiveness of mitigation strategies. We develop a simple Bayesian framework to estimate viral prevalence by combining several of the main available data sources. It is based on a discrete-time Susceptible-Infected-Removed (SIR) model with time-varying reproductive parameter. Our model includes likelihood components that incorporate data on deaths due to the virus, confirmed cases, and the number of tests administered on each day. We anchor our inference with data from random-sample testing surveys in Indiana and Ohio. We use the results from these two states to calibrate the model on positive test counts and proceed to estimate the infection fatality rate and the number of new infections on each day in each state in the United States. We estimate the extent to which reported COVID cases have underestimated true infection counts, which was large, especially in the first months of the pandemic. We explore the implications of our results for progress toward herd immunity.
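
To make the model class named in the abstract concrete, here is a minimal sketch of a deterministic discrete-time SIR recursion with a time-varying reproduction number. It covers only the epidemic dynamics, not the Bayesian likelihood for deaths, cases and tests. The daily time step, the seven-day infectious period, the population size, and the `r_t` values are all hypothetical choices made for illustration.

```python
def simulate_sir(population, initial_infected, r_t, infectious_days=7.0):
    """Discrete-time SIR with one reproduction number per day.

    Returns (susceptible, infected, removed, new_infections) as lists.
    The first three have len(r_t) + 1 entries, the last has len(r_t).
    """
    gamma = 1.0 / infectious_days  # daily removal rate (assumed)
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    susceptible, infected, removed, new_infections = [s], [i], [r], []
    for rt in r_t:
        beta = rt * gamma  # daily transmission rate implied by R_t
        infections = min(s, beta * s * i / population)
        removals = gamma * i
        s -= infections
        i += infections - removals
        r += removals
        susceptible.append(s)
        infected.append(i)
        removed.append(r)
        new_infections.append(infections)
    return susceptible, infected, removed, new_infections


# Hypothetical scenario: 30 days of growth, then 60 days with R_t below 1.
r_t = [2.5] * 30 + [0.8] * 60
susceptible, infected, removed, new_infections = simulate_sir(1_000_000, 100, r_t)
print(round(max(infected)), round(sum(new_infections)))
```

In a Bayesian treatment such as the one the abstract describes, the `r_t` sequence would be an unknown to be inferred from data rather than a fixed input.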


Subjects
COVID-19/epidemiology , Health Surveys/methods , Basic Reproduction Number , Bayes Theorem , COVID-19/diagnosis , COVID-19/prevention & control , COVID-19/transmission , Health Surveys/statistics & numerical data , Humans , Immunity, Herd , Incidence , Models, Statistical , Mortality , Prevalence , SARS-CoV-2/isolation & purification , United States/epidemiology
8.
Article in English | MEDLINE | ID: mdl-33899003

ABSTRACT

The 2015 Paris Agreement aims to keep global warming by 2100 to below 2°C, with 1.5°C as a target. To that end, countries agreed to reduce their emissions by nationally determined contributions (NDCs). Using a fully statistically based probabilistic framework, we find that the probabilities of meeting their nationally determined contributions for the largest emitters are low, e.g. 2% for the USA and 16% for China. On current trends, the probability of staying below 2°C of warming is only 5%, but if all countries meet their nationally determined contributions and continue to reduce emissions at the same rate after 2030, it rises to 26%. If the USA alone does not meet its nationally determined contribution, it declines to 18%. To have an even chance of staying below 2°C, the average rate of decline in emissions would need to increase from the 1% per year needed to meet the nationally determined contributions, to 1.8% per year.

9.
Ann Appl Stat ; 15(1): 437-459, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33868540

ABSTRACT

Smoking is one of the main risk factors that has affected human mortality and life expectancy over the past century. Smoking accounts for a large part of the nonlinearities in the growth of life expectancy and of the geographic and sex differences in mortality. As Bongaarts (2006) and Janssen (2018) suggested, accounting for smoking could improve the quality of mortality forecasts due to the predictable nature of the smoking epidemic. We propose a new Bayesian hierarchical model to forecast life expectancy at birth for both sexes and for 69 countries with good data on smoking-related mortality. The main idea is to convert the forecast of the non-smoking life expectancy at birth (i.e., life expectancy at birth removing the smoking effect) into life expectancy forecast through the use of the age-specific smoking attributable fraction (ASSAF). We introduce a new age-cohort model for the ASSAF and a Bayesian hierarchical model for non-smoking life expectancy at birth. The forecast performance of the proposed method is evaluated by out-of-sample validation compared with four other commonly used methods for life expectancy forecasting. Improvements in forecast accuracy and model calibration based on the new method are observed.

10.
Popul Dev Rev ; 46(3): 409-441, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33132461

ABSTRACT

Education and family planning can both be influenced by policy and are thought to accelerate fertility decline. However, questions remain about the nature of these effects. Does the effect of education operate through increasing educational attainment of women or educational enrollment of children? At which educational level is the effect strongest? Does the effect of family planning operate through increasing contraceptive prevalence or reducing unmet need? Is education or family planning more important? We assessed the quantitative impact of education and family planning in high-fertility settings using a regression framework inspired by Granger causality. We found that women's attainment of lower secondary education is key to accelerating fertility decline and found an accelerating effect of contraceptive prevalence for modern methods. We found the impact of contraceptive prevalence to be substantially larger than that of education. These accelerating effects hold in sub-Saharan Africa, but with smaller effect sizes there than elsewhere.

11.
Ann Appl Stat ; 14(1): 381-408, 2020 Mar.
Article in English | MEDLINE | ID: mdl-32405333

ABSTRACT

Smoking is one of the leading preventable threats to human health and a major risk factor for lung cancer, upper aero-digestive cancer, and chronic obstructive pulmonary disease. Estimating and forecasting the smoking attributable fraction (SAF) of mortality can yield insights into smoking epidemics and also provide a basis for more accurate mortality and life expectancy projection. Peto et al. (1992) proposed a method to estimate the SAF using the lung cancer mortality rate as an indicator of exposure to smoking in the population of interest. Here we use the same method to estimate the all-age SAF (ASAF) for both genders for over 60 countries. We document a strong and cross-nationally consistent pattern of the evolution of the SAF over time. We use this as the basis for a new Bayesian hierarchical model to project future male and female ASAF from over 60 countries simultaneously. This gives forecasts as well as predictive distributions that can be used to find uncertainty intervals for any quantity of interest. We assess the model using out-of-sample predictive validation, and find that it provides good forecasts and well calibrated forecast intervals, comparing favorably with other methods.

12.
Ann Appl Stat ; 14(2): 685-705, 2020 Jun.
Article in English | MEDLINE | ID: mdl-33824692

ABSTRACT

Since the 1940s, population projections have in most cases been produced using the deterministic cohort component method. However, in 2015, for the first time, in a major advance, the United Nations issued official probabilistic population projections for all countries based on Bayesian hierarchical models for total fertility and life expectancy. The estimates of these models and the resulting projections are conditional on the UN's official estimates of past values. However, these past values are themselves uncertain, particularly for the majority of the world's countries that do not have longstanding high-quality vital registration systems, when they rely on surveys and censuses with their own biases and measurement errors. This paper extends the UN model for projecting future total fertility rates to take account of uncertainty about past values. This is done by adding an additional level to the hierarchical model to represent the multiple data sources, in each case estimating their bias and measurement error variance. We assess the method by out-of-sample predictive validation. While the prediction intervals produced by the extant method (which does not account for this source of uncertainty) have somewhat less than nominal coverage, we find that our proposed method achieves closer to nominal coverage. The prediction intervals become wider for countries for which the estimates of past total fertility rates rely heavily on surveys rather than on vital registration data, especially in high fertility countries.
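
The phrase "nominal coverage" in the abstract can be illustrated with a small calculation: in out-of-sample validation, the share of held-out observations that fall inside their prediction intervals is compared with the intervals' stated level. The sketch below assumes 80% intervals, and the five held-out values and their bounds are invented numbers, not results from the paper.

```python
def interval_coverage(observed, lower, upper):
    """Fraction of held-out observations inside their prediction intervals."""
    inside = sum(1 for y, lo, hi in zip(observed, lower, upper) if lo <= y <= hi)
    return inside / len(observed)


# Hypothetical held-out total fertility rates and 80% prediction intervals.
observed = [2.1, 3.4, 5.0, 1.6, 4.2]
lower = [1.9, 3.0, 5.2, 1.4, 3.5]
upper = [2.5, 3.3, 6.0, 1.9, 4.6]

coverage = interval_coverage(observed, lower, upper)
nominal = 0.80
print(coverage, "below nominal" if coverage < nominal else "at or above nominal")
```

Coverage below the nominal level, as in this toy case, is the pattern the abstract attributes to intervals that ignore uncertainty about past values.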

13.
J Comput Biol ; 26(10): 1113-1129, 2019 10.
Article in English | MEDLINE | ID: mdl-31009236

ABSTRACT

The inference of gene networks from large-scale human genomic data is challenging due to the difficulty in identifying correct regulators for each gene in a high-dimensional search space. We present a Bayesian approach integrating external data sources with knockdown data from human cell lines to infer gene regulatory networks. In particular, we assemble multiple data sources, including gene expression data, genome-wide binding data, gene ontology, and known pathways, and use a supervised learning framework to compute prior probabilities of regulatory relationships. We show that our integrated method improves the accuracy of inferred gene networks as well as extends some previous Bayesian frameworks both in theory and applications. We apply our method to two different human cell lines, namely skin melanoma cell line A375 and lung cancer cell line A549, to illustrate the capabilities of our method. Our results show that the improvement in performance could vary from cell line to cell line and that we might need to choose different external data sources serving as prior knowledge if we hope to obtain better accuracy for different cell lines.


Subjects
Gene Regulatory Networks , Genomics/methods , A549 Cells , Bayes Theorem , Cell Line, Tumor , Gene Expression Regulation, Neoplastic , Gene Ontology , Humans , Lung Neoplasms/genetics , Melanoma/genetics , Skin Neoplasms/genetics , Supervised Machine Learning , Transcriptome
14.
Proc Natl Acad Sci U S A ; 116(1): 116-122, 2019 01 02.
Article in English | MEDLINE | ID: mdl-30584106

ABSTRACT

We propose a method for estimating migration flows between all pairs of countries that allows for decomposition of migration into emigration, return, and transit components. Current state-of-the-art estimates of bilateral migration flows rely on the assumption that the number of global migrants is as small as possible. We relax this assumption, producing complete estimates of all between-country migration flows with genuine estimates of total global migration. We find that the total number of individuals migrating internationally has oscillated between 1.13 and 1.29% of the global population per 5-year period since 1990. Return migration and transit migration are big parts of total migration; roughly one of four migration events is a return to an individual's country of birth. In the most recent time period, we estimate particularly large return migration flows from the United States to Central and South America and from the Persian Gulf to south Asia.


Subjects
Emigration and Immigration/statistics & numerical data , Humans , Mexico , Transients and Migrants/statistics & numerical data , United States
15.
Stat Modelling ; 19(4): 444-465, 2019 Aug.
Article in English | MEDLINE | ID: mdl-33824624

ABSTRACT

Gene regulatory network reconstruction is an essential task of genomics in order to further our understanding of how genes interact dynamically with each other. The most readily available data, however, are from steady state observations. These data are not as informative about the relational dynamics between genes as knockout or over-expression experiments, which attempt to control the expression of individual genes. We develop a new framework for network inference using samples from the equilibrium distribution of a vector autoregressive (VAR) time-series model which can be applied to steady state gene expression data. We explore the theoretical aspects of our method and apply the method to synthetic gene expression data generated using GeneNetWeaver.
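
For readers unfamiliar with the equilibrium distribution of a VAR model, the following sketch may help. It assumes a first-order model x_t = A x_{t-1} + e_t with Gaussian noise of covariance Q, which is stable when all eigenvalues of A lie inside the unit circle. In that case the equilibrium distribution has mean zero and a covariance Σ satisfying Σ = A Σ Aᵀ + Q, found here by fixed-point iteration. The matrices are made-up two-gene examples and the code is not the authors' inference method.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def stationary_covariance(A, Q, tol=1e-12, max_iter=10_000):
    """Iterate Sigma <- A Sigma A^T + Q until the update is below tol."""
    d = len(Q)
    sigma = [row[:] for row in Q]
    At = transpose(A)
    for _ in range(max_iter):
        prod = matmul(matmul(A, sigma), At)
        nxt = [[prod[i][j] + Q[i][j] for j in range(d)] for i in range(d)]
        change = max(abs(nxt[i][j] - sigma[i][j])
                     for i in range(d) for j in range(d))
        sigma = nxt
        if change < tol:
            return sigma
    raise ValueError("no convergence; the VAR(1) process may be unstable")


# Hypothetical two-gene system: gene 2 influences gene 1, not the reverse.
A = [[0.5, 0.2],
     [0.0, 0.3]]
Q = [[1.0, 0.0],
     [0.0, 1.0]]
sigma = stationary_covariance(A, Q)
print([[round(v, 4) for v in row] for row in sigma])
```

Steady-state expression samples would then be treated as draws from a Gaussian with this covariance, which is how A leaves a trace in data that contain no time ordering.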

16.
J Comput Graph Stat ; 28(4): 790-805, 2019.
Article in English | MEDLINE | ID: mdl-32410811

ABSTRACT

We propose adaptive incremental mixture Markov chain Monte Carlo (AIMM), a novel approach to sample from challenging probability distributions defined on a general state-space. While adaptive MCMC methods usually update a parametric proposal kernel with a global rule, AIMM locally adapts a semiparametric kernel. AIMM is based on an independent Metropolis-Hastings proposal distribution which takes the form of a finite mixture of Gaussian distributions. Central to this approach is the idea that the proposal distribution adapts to the target by locally adding a mixture component when the discrepancy between the proposal mixture and the target is deemed to be too large. As a result, the number of components in the mixture proposal is not fixed in advance. Theoretically, we prove that there exists a stochastic process that can be made arbitrarily close to AIMM and that converges to the correct target distribution. We also illustrate that it performs well in practice in a variety of challenging situations, including high-dimensional and multimodal target distributions. Finally, the methodology is successfully applied to two real data examples, including the Bayesian inference of a semiparametric regression model for the Boston Housing dataset. Supplementary materials for this article are available online.
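
The building block the abstract describes, an independent Metropolis-Hastings sampler whose proposal is a finite Gaussian mixture, can be sketched as follows. This version keeps the mixture fixed and deliberately omits AIMM's defining step of adding components where the proposal fits the target poorly. The bimodal one-dimensional target, the two-component proposal, and all names are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of a Gaussian mixture given as (weight, mean, sd) triples."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


def mixture_sample(components, rng):
    """Pick a component by its weight, then draw from that Gaussian."""
    u, acc = rng.random(), 0.0
    for w, m, s in components:
        acc += w
        if u <= acc:
            return rng.gauss(m, s)
    w, m, s = components[-1]  # guard against rounding in the weights
    return rng.gauss(m, s)


def independence_mh(target, components, n_iter, x0, seed=0):
    """Independent Metropolis-Hastings with a fixed mixture proposal q.

    Accepts y with probability min(1, target(y) q(x) / (target(x) q(y))).
    target may be unnormalized; x0 must have positive target density.
    """
    rng = random.Random(seed)
    x, accepted, samples = x0, 0, []
    for _ in range(n_iter):
        y = mixture_sample(components, rng)
        ratio = (target(y) * mixture_pdf(x, components)) / (
            target(x) * mixture_pdf(y, components))
        if rng.random() < ratio:
            x, accepted = y, accepted + 1
        samples.append(x)
    return samples, accepted / n_iter


def bimodal_target(x):
    """Unnormalized equal mixture of N(-3, 1) and N(3, 1) (hypothetical)."""
    return math.exp(-0.5 * (x + 3.0) ** 2) + math.exp(-0.5 * (x - 3.0) ** 2)


proposal = [(0.5, -3.0, 1.5), (0.5, 3.0, 1.5)]
samples, acceptance_rate = independence_mh(bimodal_target, proposal, 20_000, x0=3.0)
print(round(acceptance_rate, 2), round(sum(samples) / len(samples), 2))
```

Because proposals do not depend on the current state, the chain jumps freely between modes as long as the mixture covers them. A missing mode would never be visited, which is the gap the adaptive component-adding step is meant to close.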

17.
Stat Comput ; 28(4): 869-890, 2018 Jul.
Article in English | MEDLINE | ID: mdl-30449953

ABSTRACT

Bayesian Additive Regression Trees (BART) is a statistical sum-of-trees model. It can be considered a Bayesian version of machine learning tree ensemble methods where the individual trees are the base learners. However, for datasets where the number of variables p is large, the algorithm can become inefficient and computationally expensive. Another method which is popular for high-dimensional data is random forests, a machine learning algorithm which grows trees using a greedy search for the best split points. However, its default implementation does not produce probabilistic estimates or predictions. We propose an alternative fitting algorithm for BART called BART-BMA, which uses Bayesian Model Averaging and a greedy search algorithm to obtain a posterior distribution more efficiently than BART for datasets with large p. BART-BMA incorporates elements of both BART and random forests to offer a model-based algorithm which can deal with high-dimensional data. We have found that BART-BMA can be run in a reasonable time on a standard laptop for the "small n large p" scenario which is common in many areas of bioinformatics. We showcase this method using simulated data and data from two real proteomic experiments, one to distinguish between patients with cardiovascular disease and controls and another to classify aggressive from non-aggressive prostate cancer. We compare our results to their main competitors. Open source code written in R and Rcpp to run BART-BMA can be found at: https://github.com/BelindaHernandez/BART-BMA.git.

18.
J Stat Softw ; 84, 2018 Apr.
Article in English | MEDLINE | ID: mdl-30450020

ABSTRACT

Finite mixture modeling provides a framework for cluster analysis based on parsimonious Gaussian mixture models. Variable or feature selection is of particular importance in situations where only a subset of the available variables provide clustering information. This enables the selection of a more parsimonious model, yielding more efficient estimates, a clearer interpretation and, often, improved clustering partitions. This paper describes the R package clustvarsel which performs subset selection for model-based clustering. An improved version of the Raftery and Dean (2006) methodology is implemented in the new release of the package to find the (locally) optimal subset of variables with group/cluster information in a dataset. Search over the solution space is performed using either a step-wise greedy search or a headlong algorithm. Adjustments for speeding up these algorithms are discussed, as well as a parallel implementation of the stepwise search. Usage of the package is presented through the discussion of several data examples.

19.
Ann Appl Stat ; 12(2): 940-970, 2018 Jun.
Article in English | MEDLINE | ID: mdl-32308778

ABSTRACT

The United Nations is the major organization producing and regularly updating probabilistic population projections for all countries. International migration is a critical component of such projections, and between-country correlations are important for forecasts of regional aggregates. However, in the data we consider there are 200 countries and only 12 data points, each one corresponding to a five-year time period. Thus a 200 × 200 correlation matrix must be estimated on the basis of 12 data points. Using Pearson correlations produces many spurious correlations. We propose a maximum a posteriori estimator for the correlation matrix with an interpretable informative prior distribution. The prior serves to regularize the correlation matrix, shrinking a priori untrustworthy elements towards zero. Our estimated correlation structure improves projections of net migration for regional aggregates, producing narrower projections of migration for Africa as a whole and wider projections for Europe. A simulation study confirms that our estimator outperforms both the Pearson correlation matrix and a simple shrinkage estimator when estimating a sparse correlation matrix.
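
The problem stated in the abstract, many series but only 12 time points, is easy to reproduce. The sketch below simulates 20 mutually independent series of length 12, shows that some sample Pearson correlations are nevertheless large, and applies a simple linear shrinkage toward the identity matrix. That shrinkage is the kind of baseline the abstract compares against, not the proposed maximum a posteriori estimator, and the shrinkage weight of 0.5 and the simulated data are arbitrary assumptions.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(series):
    """Pairwise Pearson correlations with the diagonal fixed at 1."""
    k = len(series)
    return [[1.0 if i == j else pearson(series[i], series[j])
             for j in range(k)] for i in range(k)]


def shrink_toward_identity(corr, weight):
    """Return (1 - weight) * corr + weight * I, for weight in [0, 1]."""
    k = len(corr)
    return [[(1.0 - weight) * corr[i][j] + (weight if i == j else 0.0)
             for j in range(k)] for i in range(k)]


# 20 independent series, 12 observations each: the true correlations are 0.
rng = random.Random(42)
series = [[rng.gauss(0, 1) for _ in range(12)] for _ in range(20)]
corr = correlation_matrix(series)
largest = max(abs(corr[i][j]) for i in range(20) for j in range(i))
shrunk = shrink_toward_identity(corr, 0.5)
print(round(largest, 2))
```

Uniform shrinkage damps every off-diagonal entry by the same factor. The abstract's informative prior instead shrinks a priori untrustworthy elements toward zero, leaving more plausible ones less affected.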

20.
Demogr Res ; 38: 1843-1884, 2018.
Article in English | MEDLINE | ID: mdl-31942164

ABSTRACT

BACKGROUND: We consider the problem of probabilistic projection of the total fertility rate (TFR) for subnational regions. OBJECTIVE: We seek a method that is consistent with the UN's recently adopted Bayesian method for probabilistic TFR projections for all countries and works well for all countries. METHODS: We assess various possible methods using subnational TFR data for 47 countries. RESULTS: We find that the method that performs best in terms of out-of-sample predictive performance and also in terms of reproducing the within-country correlation in TFR is a method that scales each national trajectory from the national predictive posterior distribution by a region-specific scale factor that is allowed to vary slowly over time. CONCLUSIONS: Probabilistic projections of TFR for subnational units are best produced by scaling the national projection by a slowly time-varying region-specific scale factor. This supports the hypothesis of Watkins (1990, 1991) that within-country TFR converges over time in response to country-specific factors, and thus extends the Watkins hypothesis to the last 50 years and to a much wider range of countries around the world. CONTRIBUTION: We have developed a new method for probabilistic projection of subnational TFR that works well and outperforms other methods. This also sheds light on the extent to which within-country TFR converges over time.
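
The scaling idea in the RESULTS and CONCLUSIONS can be sketched in a few lines: each national TFR trajectory is multiplied by a regional factor that changes slowly over time. The abstract does not state the form of the factor, so the code assumes, purely for illustration, a first-order autoregression around 1, alpha_t - 1 = phi * (alpha_{t-1} - 1) + noise. The values of `phi` and `sigma` and the sample trajectory are hypothetical.

```python
import random


def regional_tfr_trajectory(national, alpha0, phi=0.9, sigma=0.02, rng=None):
    """Scale one national TFR trajectory by a slowly varying regional factor.

    national: national TFR values, one per projection period.
    alpha0:   the region's scale factor in the last observed period.
    phi, sigma: assumed AR(1) persistence and innovation standard deviation.
    """
    rng = rng or random.Random(0)
    alpha = alpha0
    regional = []
    for tfr in national:
        # The factor reverts toward 1, so regions drift toward the national level.
        alpha = 1.0 + phi * (alpha - 1.0) + rng.gauss(0.0, sigma)
        regional.append(alpha * tfr)
    return regional


# One hypothetical national trajectory, e.g. a single posterior draw.
national_draw = [2.0, 1.9, 1.8, 1.75, 1.7]
regional_draw = regional_tfr_trajectory(national_draw, alpha0=1.2,
                                        rng=random.Random(7))
print([round(v, 2) for v in regional_draw])
```

Repeating this for every trajectory in the national predictive sample would give a regional predictive sample, and regions that share a national draw stay correlated within the country.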
