Results 1 - 20 of 30
1.
Nature ; 595(7866): 181-188, 2021 07.
Article in English | MEDLINE | ID: mdl-34194044

ABSTRACT

Computational social science is more than just large repositories of digital data and the computational methods needed to construct and analyse them. It also represents a convergence of different fields with different ways of thinking about and doing science. The goal of this Perspective is to provide some clarity around how these approaches differ from one another and to propose how they might be productively integrated. Towards this end we make two contributions. The first is a schema for thinking about research activities along two dimensions (the extent to which work is explanatory, focusing on identifying and estimating causal effects, and the degree of consideration given to testing predictions of outcomes) and how these two priorities can complement, rather than compete with, one another. Our second contribution is to advocate that computational social scientists devote more attention to combining prediction and explanation, which we call integrative modelling, and to outline some practical suggestions for realizing this goal.


Subjects
Computer Simulation , Data Science/methods , Forecasting/methods , Theoretical Models , Social Sciences/methods , Goals , Humans
2.
Proc Natl Acad Sci U S A ; 121(24): e2322973121, 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38833466

ABSTRACT

Why are some life outcomes difficult to predict? We investigated this question through in-depth qualitative interviews with 40 families sampled from a multidecade longitudinal study. Our sampling and interviewing process was informed by the earlier efforts of hundreds of researchers to predict life outcomes for participants in this study. The qualitative evidence we uncovered in these interviews combined with a mathematical decomposition of prediction error led us to create a conceptual framework. Our specific evidence and our more general framework suggest that unpredictability should be expected in many life outcome prediction tasks, even in the presence of complex algorithms and large datasets. Our work provides a foundation for future empirical and theoretical work on unpredictability in human lives.


Subjects
Algorithms , Humans , Longitudinal Studies , Female , Male , Uncertainty , Adult
3.
Proc Natl Acad Sci U S A ; 117(15): 8398-8403, 2020 04 14.
Article in English | MEDLINE | ID: mdl-32229555

ABSTRACT

How predictable are life trajectories? We investigated this question with a scientific mass collaboration using the common task method; 160 teams built predictive models for six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. Despite using a rich dataset and applying machine-learning methods optimized for prediction, the best predictions were not very accurate and were only slightly better than those from a simple benchmark model. Within each outcome, prediction error was strongly associated with the family being predicted and weakly associated with the technique used to generate the prediction. Overall, these results suggest practical limits to the predictability of life outcomes in some settings and illustrate the value of mass collaborations in the social sciences.


Subjects
Social Sciences/standards , Adolescent , Child , Preschool Child , Cohort Studies , Family , Female , Humans , Infant , Life , Machine Learning , Male , Predictive Value of Tests , Social Sciences/methods , Social Sciences/statistics & numerical data
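
As a rough illustration of what "only slightly better than a simple benchmark" can look like, the sketch below scores a set of predictions against a benchmark that always predicts the training mean. The metric (a holdout R-squared relative to the training mean), the function name, and the toy numbers are assumptions made for illustration; the abstract does not say how accuracy was measured.

```python
from statistics import mean


def holdout_r2(y_train, y_holdout, predictions):
    """Return 1 - SSE(predictions) / SSE(training-mean benchmark).

    0 means no better than always predicting the training mean,
    1 means perfect prediction, and negative values mean worse
    than the benchmark. (Hypothetical helper, not from the article.)
    """
    if len(y_holdout) != len(predictions):
        raise ValueError("predictions and holdout outcomes must align")
    baseline = mean(y_train)
    sse_model = sum((y - p) ** 2 for y, p in zip(y_holdout, predictions))
    sse_baseline = sum((y - baseline) ** 2 for y in y_holdout)
    if sse_baseline == 0:
        raise ValueError("benchmark error is zero; ratio is undefined")
    return 1 - sse_model / sse_baseline


# Toy data, invented for illustration only.
y_train = [2.0, 3.0, 4.0]        # training mean is 3.0
y_holdout = [1.0, 3.0, 5.0]
predictions = [2.8, 3.0, 3.2]    # a model that barely moves off the mean

score = holdout_r2(y_train, y_holdout, predictions)
print(f"holdout R^2 relative to the mean benchmark: {score:.2f}")
```

A score near zero on such a metric would correspond to a model that adds little beyond the benchmark, whatever technique produced it.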
4.
Demography ; 54(4): 1503-1528, 2017 08.
Article in English | MEDLINE | ID: mdl-28741073

ABSTRACT

Adult death rates are a critical indicator of population health and well-being. Wealthy countries have high-quality vital registration systems, but poor countries lack this infrastructure and must rely on estimates that are often problematic. In this article, we introduce the network survival method, a new approach for estimating adult death rates. We derive the precise conditions under which it produces consistent and unbiased estimates. Further, we develop an analytical framework for sensitivity analysis. To assess the performance of the network survival method in a realistic setting, we conducted a nationally representative survey experiment in Rwanda (n = 4,669). Network survival estimates were similar to estimates from other methods, even though the network survival estimates were made with substantially smaller samples and are based entirely on data from Rwanda, with no need for model life tables or pooling of data from other countries. Our analytic results demonstrate that the network survival method has attractive properties, and our empirical results show that this method can be used in countries where reliable estimates of adult death rates are sorely needed.


Subjects
Health Surveys/methods , Statistical Models , Mortality/trends , Social Support , Adolescent , Adult , Female , Health Surveys/standards , Humans , Interviews as Topic , Middle Aged , Reproducibility of Results , Rwanda/epidemiology , Socioeconomic Factors , Young Adult
5.
Am J Epidemiol ; 183(8): 747-57, 2016 04 15.
Article in English | MEDLINE | ID: mdl-27015875

ABSTRACT

The network scale-up method is a promising technique that uses sampled social network data to estimate the sizes of epidemiologically important hidden populations, such as sex workers and people who inject illicit drugs. Although previous scale-up research has focused exclusively on networks of acquaintances, we show that the type of personal network about which survey respondents are asked to report is a potentially crucial parameter that researchers are free to vary. This generalization leads to a method that is more flexible and potentially more accurate. In 2011, we conducted a large, nationally representative survey experiment in Rwanda that randomized respondents to report about one of 2 different personal networks. Our results showed that asking respondents for less information can, somewhat surprisingly, produce more accurate size estimates. We also estimated the sizes of 4 key populations at risk for human immunodeficiency virus infection in Rwanda. Our estimates were higher than earlier estimates from Rwanda but lower than international benchmarks. Finally, in this article we develop a new sensitivity analysis framework and use it to assess the possible biases in our estimates. Our design can be customized and extended for other settings, enabling researchers to continue to improve the network scale-up method.


Subjects
Drug Users/statistics & numerical data , HIV Infections/epidemiology , Male Homosexuality/statistics & numerical data , Sex Workers/statistics & numerical data , Social Environment , Social Networking , Intravenous Substance Abuse/epidemiology , Epidemiologic Methods , Female , HIV Infections/etiology , Humans , Male , Risk Assessment/methods , Rwanda/epidemiology , Intravenous Substance Abuse/complications , Surveys and Questionnaires
6.
Sci Adv ; 10(18): eadk3452, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38691601

ABSTRACT

Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear recommendations for conducting and reporting ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist (recommendations for machine-learning-based science). It consists of 32 questions and a paired set of guidelines. REFORMS was developed on the basis of a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility.


Subjects
Consensus , Machine Learning , Humans , Reproducibility of Results , Science
7.
Proc Natl Acad Sci U S A ; 107(15): 6743-7, 2010 Apr 13.
Article in English | MEDLINE | ID: mdl-20351258

ABSTRACT

Respondent-driven sampling (RDS) is a network-based technique for estimating traits in hard-to-reach populations, for example, the prevalence of HIV among drug injectors. In recent years RDS has been used in more than 120 studies in more than 20 countries and by leading public health organizations, including the Centers for Disease Control and Prevention in the United States. Despite the widespread use and growing popularity of RDS, there has been little empirical validation of the methodology. Here we investigate the performance of RDS by simulating sampling from 85 known, network populations. Across a variety of traits we find that RDS is substantially less accurate than generally acknowledged and that reported RDS confidence intervals are misleadingly narrow. Moreover, because we model a best-case scenario in which the theoretical RDS sampling assumptions hold exactly, it is unlikely that RDS performs any better in practice than in our simulations. Notably, the poor performance of RDS is driven not by the bias but by the high variance of estimates, a possibility that had been largely overlooked in the RDS literature. Given the consistency of our results across networks and our generous sampling conditions, we conclude that RDS as currently practiced may not be suitable for key aspects of public health surveillance where it is now extensively applied.


Subjects
Population Surveillance/methods , Public Health/methods , Research Design , Algorithms , Communicable Disease Control , Data Collection/methods , Statistical Data Interpretation , HIV Infections/complications , HIV Infections/epidemiology , Humans , Statistical Models , Reproducibility of Results , Sample Size , Intravenous Substance Abuse/complications , Intravenous Substance Abuse/epidemiology
8.
Am J Epidemiol ; 174(10): 1190-6, 2011 Nov 15.
Article in English | MEDLINE | ID: mdl-22003188

ABSTRACT

One of the many challenges hindering the global response to the human immunodeficiency virus (HIV)/acquired immunodeficiency syndrome (AIDS) epidemic is the difficulty of collecting reliable information about the populations most at risk for the disease. Thus, the authors empirically assessed a promising new method for estimating the sizes of most at-risk populations: the network scale-up method. Using 4 different data sources, 2 of which were from other researchers, the authors produced 5 estimates of the number of heavy drug users in Curitiba, Brazil. The authors found that the network scale-up and generalized network scale-up estimators produced estimates 5-10 times higher than estimates made using standard methods (the multiplier method and the direct estimation method using data from 2004 and 2010). Given that equally plausible methods produced such a wide range of results, the authors recommend that additional studies be undertaken to compare estimates based on the scale-up method with those made using other methods. If scale-up-based methods routinely produce higher estimates, this would suggest that scale-up-based methods are inappropriate for populations most at risk of HIV/AIDS or that standard methods may tend to underestimate the sizes of these populations.


Subjects
Epidemiologic Research Design , HIV Infections/epidemiology , Intravenous Substance Abuse/epidemiology , Acquired Immunodeficiency Syndrome/epidemiology , Acquired Immunodeficiency Syndrome/etiology , Brazil/epidemiology , HIV Infections/etiology , Humans , Prevalence , Risk Assessment , Intravenous Substance Abuse/complications
9.
Soc Networks ; 33(1): 70-78, 2011 Jan 01.
Article in English | MEDLINE | ID: mdl-21318126

ABSTRACT

Estimating the sizes of hard-to-count populations is a challenging and important problem that occurs frequently in social science, public health, and public policy. This problem is particularly pressing in HIV/AIDS research because estimates of the sizes of the most at-risk populations (illicit drug users, men who have sex with men, and sex workers) are needed for designing, evaluating, and funding programs to curb the spread of the disease. A promising new approach in this area is the network scale-up method, which uses information about the personal networks of respondents to make population size estimates. However, if the target population has low social visibility, as is likely to be the case in HIV/AIDS research, scale-up estimates will be too low. In this paper we develop a game-like activity that we call the game of contacts in order to estimate the social visibility of groups, and report results from a study of heavy drug users in Curitiba, Brazil (n = 294). The game produced estimates of social visibility that were consistent with qualitative expectations but of surprising magnitude. Further, a number of checks suggest that the data are high-quality. While motivated by the specific problem of population size estimation, our method could be used by researchers more broadly and adds to long-standing efforts to combine the richness of social network analysis with the power and scale of sample surveys.

10.
Elife ; 10, 2021 11 09.
Article in English | MEDLINE | ID: mdl-34751133

ABSTRACT

Any large dataset can be analyzed in a number of ways, and it is possible that the use of different analysis strategies will lead to different results and conclusions. One way to assess whether the results obtained depend on the analysis strategy chosen is to employ multiple analysts and leave each of them free to follow their own approach. Here, we present consensus-based guidance for conducting and reporting such multi-analyst studies, and we discuss how broader adoption of the multi-analyst approach has the potential to strengthen the robustness of results and conclusions obtained from analyses of datasets in basic and applied research.


Subjects
Consensus , Data Analysis , Datasets as Topic , Research
11.
Sex Transm Infect ; 86 Suppl 2: ii11-5, 2010 Dec.
Article in English | MEDLINE | ID: mdl-21106509

ABSTRACT

Estimating sizes of hidden or hard-to-reach populations is an important problem in public health. For example, estimates of the sizes of populations at highest risk for HIV and AIDS are needed for designing, evaluating and allocating funding for treatment and prevention programmes. A promising approach to size estimation, relatively new to public health, is the network scale-up method (NSUM), involving two steps: estimating the personal network size of the members of a random sample of a total population and, with this information, estimating the number of members of a hidden subpopulation of the total population. We describe the method, including two approaches to estimating personal network sizes (summation and known population). We discuss the strengths and weaknesses of each approach and provide examples of international applications of the NSUM in public health. We conclude with recommendations for future research and evaluation.


Subjects
Data Collection/methods , Public Health/statistics & numerical data , Humans , Risk Assessment , Sample Size
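
The two steps named in the abstract can be sketched as follows. This is a minimal illustration of the basic scale-up estimator combined with the known-population approach to personal network size, assuming a simple random sample with equal weights. The function names, variable names, and toy numbers are hypothetical and are not taken from the article.

```python
def estimate_degrees(known_counts, known_sizes, total_population):
    """Step 1: known-population estimate of each personal network size.

    known_counts[i][k] is how many people respondent i reports knowing
    in group k; known_sizes[k] is the (known) size of group k.
    """
    total_known = sum(known_sizes)
    return [sum(row) / total_known * total_population for row in known_counts]


def scale_up_estimate(hidden_counts, degrees, total_population):
    """Step 2: basic scale-up estimate of the hidden population size.

    The share of all reported ties that lead to the hidden group is
    scaled up to the total population.
    """
    return sum(hidden_counts) / sum(degrees) * total_population


# Toy survey of three respondents (all numbers invented for illustration).
total_population = 1_000_000
known_sizes = [10_000, 40_000]              # two groups of known size
known_counts = [[2, 8], [1, 4], [3, 12]]    # ties reported to each group
hidden_counts = [1, 0, 2]                   # ties reported to the hidden group

degrees = estimate_degrees(known_counts, known_sizes, total_population)
hidden_size = scale_up_estimate(hidden_counts, degrees, total_population)
print(f"estimated degrees: {[round(d) for d in degrees]}")
print(f"estimated hidden population size: {hidden_size:.0f}")
```

In this toy example the estimated degrees are about 200, 100, and 300, and the hidden population estimate is about 5,000. The summation approach mentioned in the abstract would replace only the first function.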
12.
Stat Med ; 28(17): 2202-29, 2009 Jul 30.
Article in English | MEDLINE | ID: mdl-19572381

ABSTRACT

Respondent-driven sampling (RDS) is a recently introduced, and now widely used, technique for estimating disease prevalence in hidden populations. RDS data are collected through a snowball mechanism, in which current sample members recruit future sample members. In this paper we present RDS as Markov chain Monte Carlo importance sampling, and we examine the effects of community structure and the recruitment procedure on the variance of RDS estimates. Past work has assumed that the variance of RDS estimates is primarily affected by segregation between healthy and infected individuals. We examine an illustrative model to show that this is not necessarily the case, and that bottlenecks anywhere in the networks can substantially affect estimates. We also show that variance is inflated by a common design feature in which the sample members are encouraged to recruit multiple future sample members. The paper concludes with suggestions for implementing and evaluating RDS studies.


Subjects
Markov Chains , Monte Carlo Method , Sampling Studies , Algorithms , Biometry , Epidemiologic Methods , Female , HIV Infections/complications , HIV Infections/epidemiology , Humans , Male , Statistical Models , New York City/epidemiology , Public Health/statistics & numerical data , Social Support , Substance-Related Disorders/complications
13.
Socius ; 5, 2019.
Article in English | MEDLINE | ID: mdl-37309413

ABSTRACT

Reproducibility is fundamental to science, and an important component of reproducibility is computational reproducibility: the ability of a researcher to recreate the results of a published study using the original author's raw data and code. Although most people agree that computational reproducibility is important, it is still difficult to achieve in practice. In this article, the authors describe their approach to enabling computational reproducibility for the 12 articles in this special issue of Socius about the Fragile Families Challenge. The approach draws on two tools commonly used by professional software engineers but not widely used by academic researchers: software containers (e.g., Docker) and cloud computing (e.g., Amazon Web Services). These tools made it possible to standardize the computing environment around each submission, which will ease computational reproducibility both today and in the future. Drawing on their successes and struggles, the authors conclude with recommendations to researchers and journals.

14.
Socius ; 5, 2019.
Article in English | MEDLINE | ID: mdl-37347012

ABSTRACT

Stewards of social data face a fundamental tension. On one hand, they want to make their data accessible to as many researchers as possible to facilitate new discoveries. At the same time, they want to restrict access to their data as much as possible to protect the people represented in the data. In this article, we provide a case study addressing this common tension in an uncommon setting: the Fragile Families Challenge, a scientific mass collaboration designed to yield insights that could improve the lives of disadvantaged children in the United States. We describe our process of threat modeling, threat mitigation, and third-party guidance. We also describe the ethical principles that formed the basis of our process. We are open about our process and the trade-offs we made in the hope that others can improve on what we have done.

15.
Socius ; 5, 2019.
Article in English | MEDLINE | ID: mdl-37309412

ABSTRACT

The Fragile Families Challenge is a scientific mass collaboration designed to measure and understand the predictability of life trajectories. Participants in the Challenge created predictive models of six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. This Special Collection includes 12 articles describing participants' approaches to predicting these six outcomes as well as 3 articles describing methodological and procedural insights from running the Challenge. This introduction will help readers interpret the individual articles and help researchers interested in running future projects similar to the Fragile Families Challenge.

16.
Socius ; 5, 2019.
Article in English | MEDLINE | ID: mdl-37214352

ABSTRACT

Researchers rely on metadata systems to prepare data for analysis. As the complexity of data sets increases and the breadth of data analysis practices grows, existing metadata systems can limit the efficiency and quality of data preparation. This article describes the redesign of a metadata system supporting the Fragile Families and Child Wellbeing Study on the basis of the experiences of participants in the Fragile Families Challenge. The authors demonstrate how treating metadata as data (i.e., releasing comprehensive information about variables in a format amenable to both automated and manual processing) can make the task of data preparation less arduous and less error prone for all types of data analysis. The authors hope that their work will facilitate new applications of machine-learning methods to longitudinal surveys and inspire research on data preparation in the social sciences. The authors have open-sourced the tools they created so that others can use and improve them.

17.
Soc Psychol Q ; 74(4): 338, 2008.
Article in English | MEDLINE | ID: mdl-24078078

ABSTRACT

Individuals influence each other's decisions about cultural products such as songs, books, and movies; but to what extent can the perception of success become a "self-fulfilling prophecy"? We have explored this question experimentally by artificially inverting the true popularity of songs in an online "music market," in which 12,207 participants listened to and downloaded songs by unknown bands. We found that most songs experienced self-fulfilling prophecies, in which perceived (but initially false) popularity became real over time. We also found, however, that the inversion was not self-fulfilling for the market as a whole, in part because the very best songs recovered their popularity in the long run. Moreover, the distortion of market information reduced the correlation between appeal and popularity, and led to fewer overall downloads. These results, although partial and speculative, suggest a new approach to the study of cultural markets, and indicate the potential of web-based experiments to explore the social psychological origin of other macro-sociological phenomena.

18.
Nat Hum Behav ; 7(4): 478-479, 2023 04.
Article in English | MEDLINE | ID: mdl-36759587
19.
Epidemiology ; 23(1): 148-50, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22157310
20.
Sociol Methodol ; 46(1): 153-186, 2016 Aug.
Article in English | MEDLINE | ID: mdl-29375167

ABSTRACT

The network scale-up method enables researchers to estimate the size of hidden populations, such as drug injectors and sex workers, using sampled social network data. The basic scale-up estimator offers advantages over other size estimation techniques, but it depends on problematic modeling assumptions. We propose a new generalized scale-up estimator that can be used in settings with non-random social mixing and imperfect awareness about membership in the hidden population. Further, the new estimator can be used when data are collected via complex sample designs and from incomplete sampling frames. However, the generalized scale-up estimator also requires data from two samples: one from the frame population and one from the hidden population. In some situations these data from the hidden population can be collected by adding a small number of questions to already planned studies. For other situations, we develop interpretable adjustment factors that can be applied to the basic scale-up estimator. We conclude with practical recommendations for the design and analysis of future studies.
