ABSTRACT
In response to growing recognition of the social impacts of new artificial intelligence (AI)-based technologies, major AI and machine learning (ML) conferences and journals now encourage or require papers to include ethics impact statements and undergo ethics reviews. This move has sparked heated debate concerning the role of ethics in AI research, at times devolving into name-calling and threats of "cancellation." We diagnose this conflict as one between "atomist" and "holist" ideologies. Among other things, atomists believe facts are and should be kept separate from values, while holists believe facts and values are and should be inextricable from one another. With the goal of reducing disciplinary polarization, we draw on numerous philosophical and historical sources to describe each ideology's core beliefs and assumptions. Finally, we call on atomists and holists within the ever-expanding data science community to exhibit greater empathy during ethical disagreements and propose four targeted strategies to ensure AI research benefits society.
ABSTRACT
Rapid growth in the availability of behavioral big data (BBD) has outpaced the speed of updates to ethical research codes and regulation of data privacy and human subjects' data collection, storage, and use. The introduction of the European Union's (EU's) General Data Protection Regulation (GDPR) in May 2018 will have far-reaching effects on data scientists and researchers who use BBD, not only in the EU, but around the world. Consequently, many companies are struggling to comply with the Regulation. At the same time, academics interested in research collaborations with companies are finding it more difficult to obtain data. In light of the importance of BBD in both industry and academia, data scientists and behavioral researchers would benefit from a deeper understanding of the GDPR's key concepts, definitions, and principles, especially as they apply to the data science workflow. We identify key GDPR concepts and principles and describe how they can impact the work of data scientists and researchers in this new data privacy regulation era.
Subjects
Behavioral Research/trends; Computer Security; Data Science/trends; European Union; Humans; Privacy
ABSTRACT
Behavioral big data (BBD) refers to very large and rich multidimensional data sets on human and social behaviors, actions, and interactions, which have become available to companies, governments, and researchers. A growing number of researchers in social science and management fields acquire and analyze BBD for the purpose of extracting knowledge and scientific discoveries. However, the relationships between the researcher, data, subjects, and research questions differ in the BBD context compared to traditional behavioral data. Behavioral researchers using BBD face not only methodological and technical challenges but also ethical and moral dilemmas. In this article, we discuss several dilemmas, challenges, and trade-offs related to acquiring and analyzing BBD for causal behavioral research.
Subjects
Behavior; Data Interpretation, Statistical; Research; Humans
ABSTRACT
For robust detection performance, traditional control chart monitoring for biosurveillance is based on input data free of trends, day-of-week effects, and other systematic behaviour. Time series forecasting methods may be used to remove this behaviour by subtracting forecasts from observations to form residuals for algorithmic input. We describe three forecast methods and compare their predictive accuracy on each of 16 authentic syndromic data streams. The methods are (1) a non-adaptive regression model using a long historical baseline, (2) an adaptive regression model with a shorter, sliding baseline, and (3) the Holt-Winters method for generalized exponential smoothing. Criteria for comparing the forecasts were the root-mean-square error, the median absolute per cent error (MedAPE), and the median absolute deviation. The median-based criteria showed best overall performance for the Holt-Winters method. The MedAPE measures over the 16 test series averaged 16.5, 11.6, and 9.7 for the non-adaptive regression, adaptive regression, and Holt-Winters methods, respectively. The non-adaptive regression forecasts were degraded by changes in the data behaviour in the fixed baseline period used to compute model coefficients. The mean-based criterion was less conclusive because of the effects of poor forecasts on a small number of calendar holidays. The Holt-Winters method was also most effective at removing serial autocorrelation, with most 1-day-lag autocorrelation coefficients below 0.15. The forecast methods were compared without tuning them to the behaviour of individual series. We achieved improved predictions with such tuning of the Holt-Winters method, but practical use of such improvements for routine surveillance will require reliable data classification methods.
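The pipeline above has three steps: forecast each day, subtract the forecast to get residuals for a control chart, and score the forecasts. Here is a minimal Python sketch of those steps. It assumes additive Holt-Winters with a 7-day season, first-week initialization, and hand-picked smoothing constants (`alpha`, `beta`, `gamma` are illustrative, not the values used in the study). All function names are hypothetical. Two further assumptions: MAD is read here as the median absolute forecast error, and zero-count days are skipped when computing MedAPE. The study's exact definitions may differ.

```python
import math
import random
import statistics


def holt_winters_additive(y, m=7, alpha=0.4, beta=0.05, gamma=0.15):
    """One-step-ahead forecasts for y[m:] from additive Holt-Winters.

    The first m observations initialize the level (their mean), the trend (zero),
    and one seasonal index per day of week. Smoothing constants are illustrative.
    """
    if len(y) <= m:
        raise ValueError("need more than one full season of data")
    level = statistics.fmean(y[:m])
    trend = 0.0
    season = [y[i] - level for i in range(m)]
    forecasts = []
    for t in range(m, len(y)):
        s = season[t % m]                      # index for the same weekday one season ago
        forecasts.append(level + trend + s)    # forecast made before y[t] is observed
        prev_level = level
        level = alpha * (y[t] - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (y[t] - level) + (1 - gamma) * s
    return forecasts


def rmse(actual, forecast):
    """Root-mean-square forecast error."""
    return math.sqrt(statistics.fmean([(a - f) ** 2 for a, f in zip(actual, forecast)]))


def medape(actual, forecast):
    """Median absolute per cent error; days with a zero count are skipped (assumption)."""
    pct = [abs(a - f) / abs(a) * 100 for a, f in zip(actual, forecast) if a != 0]
    if not pct:
        raise ValueError("all actual values are zero")
    return statistics.median(pct)


def median_abs_error(actual, forecast):
    """Median of absolute forecast errors (one reading of 'median absolute deviation')."""
    return statistics.median([abs(a - f) for a, f in zip(actual, forecast)])


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a residual series (x must not be constant)."""
    mean = statistics.fmean(x)
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(len(x) - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


if __name__ == "__main__":
    random.seed(1)
    weekly = [120, 110, 105, 100, 100, 70, 60]   # hypothetical day-of-week profile
    y = [weekly[t % 7] + random.gauss(0, 8) for t in range(7 * 20)]
    f = holt_winters_additive(y)
    actual = y[7:]
    residuals = [a - b for a, b in zip(actual, f)]   # what a control chart would monitor
    print(f"RMSE={rmse(actual, f):.2f}  MedAPE={medape(actual, f):.2f}%  "
          f"MAD={median_abs_error(actual, f):.2f}  r1={lag1_autocorrelation(residuals):.2f}")
```

The median-based scores are less sensitive than RMSE to a few very poor forecasts, such as those on calendar holidays. This is the same reason the abstract gives for the mean-based criterion being less conclusive.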
Subjects
Forecasting/methods; Models, Statistical; Population Surveillance; Bias; Health Behavior; Humans; Regression Analysis; Sensitivity and Specificity; Time Factors
ABSTRACT
The traditional focus for detecting outbreaks of an epidemic or bio-terrorist attack has been on the collection and analysis of medical and public health data. Although such data are the most direct indicators of symptoms, they tend to be collected, delivered, and analysed days, weeks, and even months after the outbreak. By the time this information reaches decision makers it is often too late to treat the infected population or to react in some other way. In this paper, we explore different sources of data, traditional and non-traditional, that can be used for detecting a bio-terrorist attack in a timely manner. We set our discussion in the context of state-of-the-art syndromic surveillance systems and we focus on statistical issues and challenges associated with non-traditional data sources and the timely integration of multiple data sources for detection purposes.
Subjects
Bioterrorism; Statistics as Topic/methods; Data Collection; Data Interpretation, Statistical; Disease Outbreaks; Humans; Population Surveillance/methods
ABSTRACT
The recent series of anthrax attacks has reinforced the importance of biosurveillance systems for the timely detection of epidemics. This paper describes a statistical framework for monitoring grocery data to detect a large-scale but localized bioterrorism attack. Our system illustrates the potential of data sources that may be more timely than traditional medical and public health data. The system includes several layers, each customized to grocery data and tuned to finding footprints of an epidemic. We also propose an evaluation methodology that is suitable in the absence of data on large-scale bioterrorist attacks and disease outbreaks.
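The abstract does not say how its evaluation methodology works. One generic way to evaluate a detector without real attack data is to inject simulated outbreak signals into authentic baseline data, then record whether the detector alarms and how quickly. The Python sketch below illustrates that generic idea with a one-sided CUSUM. It is an assumption-laden illustration, not the layered grocery-data system or the evaluation design proposed in the paper. The function names, the CUSUM parameters `k` and `h`, the outbreak ramp, and the synthetic baseline are all hypothetical.

```python
import random
import statistics


def upper_cusum_first_alarm(series, mean, sd, k=0.5, h=4.0):
    """Index of the first alarm from a one-sided (upper) CUSUM on standardized values, or None."""
    s = 0.0
    for t, x in enumerate(series):
        s = max(0.0, s + (x - mean) / sd - k)
        if s > h:
            return t
    return None


def inject(series, start, extra):
    """Copy of series with a simulated outbreak (extra counts per day) added from day `start`."""
    out = list(series)
    for i, e in enumerate(extra):
        if start + i < len(out):
            out[start + i] += e
    return out


def evaluate_injections(baseline, mean, sd, extra, starts, window=7):
    """Inject the same outbreak at each start day; return (detection rate, mean delay in days).

    Simplification: the CUSUM is restarted at the injection day, and an alarm
    counts only if it occurs within `window` days of the outbreak start.
    """
    delays = []
    for start in starts:
        alarm = upper_cusum_first_alarm(inject(baseline, start, extra)[start:], mean, sd)
        if alarm is not None and alarm < window:
            delays.append(alarm)
    rate = len(delays) / len(starts)
    mean_delay = statistics.fmean(delays) if delays else None
    return rate, mean_delay


if __name__ == "__main__":
    random.seed(7)
    # Synthetic stand-in for authentic daily sales (or forecast residuals).
    baseline = [random.gauss(100, 10) for _ in range(365)]
    train = baseline[:90]
    mean, sd = statistics.fmean(train), statistics.stdev(train)
    ramp = [5 * (i + 1) for i in range(7)]   # hypothetical slowly growing outbreak signature
    starts = list(range(100, 350, 10))
    print(evaluate_injections(baseline, mean, sd, ramp, starts))
    # No-outbreak run: how often does the restarted CUSUM alarm on baseline alone?
    print(evaluate_injections(baseline, mean, sd, [0] * 7, starts))
```

Comparing the injected run with the no-outbreak run separates true detections from alarms that the baseline would have triggered anyway. This mirrors the usual trade-off between sensitivity and false-alarm rate.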