Measuring the predictability of life outcomes with a scientific mass collaboration.

Salganik, Matthew J; Lundberg, Ian; Kindel, Alexander T; Ahearn, Caitlin E; Al-Ghoneim, Khaled; Almaatouq, Abdullah; Altschul, Drew M; Brand, Jennie E; Carnegie, Nicole Bohme; Compton, Ryan James; Datta, Debanjan; Davidson, Thomas; Filippova, Anna; Gilroy, Connor; Goode, Brian J; Jahani, Eaman; Kashyap, Ridhi; Kirchner, Antje; McKay, Stephen; Morgan, Allison C; Pentland, Alex; Polimis, Kivan; Raes, Louis; Rigobon, Daniel E; Roberts, Claudia V; Stanescu, Diana M; Suhara, Yoshihiko; Usmani, Adaner; Wang, Erik H; Adem, Muna; Alhajri, Abdulla; AlShebli, Bedoor; Amin, Redwane; Amos, Ryan B; Argyle, Lisa P; Baer-Bositis, Livia; Büchi, Moritz; Chung, Bo-Ryehn; Eggert, William; Faletto, Gregory; Fan, Zhilin; Freese, Jeremy; Gadgil, Tejomay; Gagné, Josh; Gao, Yue; Halpern-Manners, Andrew; Hashim, Sonia P; Hausen, Sonia; He, Guanhua; Higuera, Kimberly.

Proc Natl Acad Sci U S A ; 117(15): 8398-8403, 2020 04 14.

Article En | MEDLINE | ID: mdl-32229555

How predictable are life trajectories? We investigated this question with a scientific mass collaboration using the common task method; 160 teams built predictive models for six life outcomes using data from the Fragile Families and Child Wellbeing Study, a high-quality birth cohort study. Despite using a rich dataset and applying machine-learning methods optimized for prediction, the best predictions were not very accurate and were only slightly better than those from a simple benchmark model. Within each outcome, prediction error was strongly associated with the family being predicted and weakly associated with the technique used to generate the prediction. Overall, these results suggest practical limits to the predictability of life outcomes in some settings and illustrate the value of mass collaborations in the social sciences.

Social Sciences/standards, Adolescent, Child, Child, Preschool, Cohort Studies, Family, Female, Humans, Infant, Life, Machine Learning, Male, Predictive Value of Tests, Social Sciences/methods, Social Sciences/statistics & numerical data

Humans in the Loop: Incorporating Expert and Crowd-Sourced Knowledge for Predictions Using Survey Data.

Filippova, Anna; Gilroy, Connor; Kashyap, Ridhi; Kirchner, Antje; Morgan, Allison C; Polimis, Kivan; Usmani, Adaner; Wang, Tong.

Socius ; 5, 2019.

Article En | MEDLINE | ID: mdl-33981842

Survey data sets are often wider than they are long. This high ratio of variables to observations raises concerns about overfitting during prediction, making informed variable selection important. Recent applications in computer science have sought to incorporate human knowledge into machine-learning methods to address these problems. The authors implement such a "human-in-the-loop" approach in the Fragile Families Challenge. The authors use surveys to elicit knowledge from experts and laypeople about the importance of different variables to different outcomes. This strategy offers the option to subset the data before prediction or to incorporate human knowledge as scores in prediction models, or both together. The authors find that human intervention is not obviously helpful. Human-informed subsetting reduces predictive performance, and considered alone, approaches incorporating scores perform marginally worse than approaches that do not. However, incorporating human knowledge may still improve predictive performance, and future research should consider new ways of doing so.
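
The two options described in the abstract, subsetting the data before prediction and carrying human knowledge into the model as scores, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the variable names, the 1-to-5 rating scale, the threshold, and both helper functions are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical importance ratings (1 = unimportant, 5 = very important)
# collected from survey respondents for each candidate variable.
ratings = {
    "mother_education": [5, 4, 5],
    "household_income": [4, 4, 3],
    "interview_month": [1, 2, 1],
}

# Hypothetical survey rows (one dict per family).
rows = [
    {"mother_education": 12, "household_income": 30000, "interview_month": 3},
    {"mother_education": 16, "household_income": 55000, "interview_month": 7},
]


def subset_by_human_score(rows, ratings, threshold):
    """Option 1: keep only variables whose mean rating meets the threshold."""
    kept = [name for name, r in ratings.items() if mean(r) >= threshold]
    subset = [{name: row[name] for name in kept} for row in rows]
    return subset, kept


def score_weights(ratings, scale_max=5):
    """Option 2: turn mean ratings into weights in [0, 1] that a model
    could use, for example as per-variable penalties or priors."""
    return {name: mean(r) / scale_max for name, r in ratings.items()}


subset, kept = subset_by_human_score(rows, ratings, threshold=3.0)
weights = score_weights(ratings)
```

Here `subset` drops the low-rated variable entirely, while `weights` keeps every variable and leaves it to the downstream model to decide how much to trust each one. The abstract reports that, in the authors' study, neither kind of human input clearly improved predictive performance.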