ABSTRACT
Objective: The European Health Data Space (EHDS) shapes the digital transformation of healthcare in Europe. The EHDS regulation will also accelerate the secondary use of health data (known as EHDS2) for research, innovation, policy-making, and regulatory activities. The Integration of heterogeneous Data and Evidence towards Regulatory and HTA Acceptance (IDERHA) project builds one of the first pan-European health data spaces in alignment with the EHDS2 requirements, addressing lung cancer as a pilot. Methods: In this study, we conducted a comprehensive review of the EHDS regulation, technical requirements for EHDS2, and related projects. We also explored the results of the Joint Action Towards the European Health Data Space (TEHDAS) to identify the framework of IDERHA's alignment with EHDS2. In addition, we conducted an internal webinar and an external workshop with EHDS experts to share expertise on the EHDS requirements and challenges. Results: We identified the lessons learned from the existing projects and the minimum set of requirements for aligning the IDERHA infrastructure with EHDS2, including user journey, concepts, terminologies, and standards. The IDERHA framework (i.e., platform architecture, standardization approaches, documentation, etc.) is being developed accordingly. Discussion: IDERHA's alignment plan with EHDS2 necessitates the implementation of three categories of standards: for data discoverability, the Data Catalog Vocabulary Application Profile (DCAT-AP); for semantic interoperability, the Observational Medical Outcomes Partnership (OMOP); and for health data exchange, DICOM and FHIR. The main challenge is that some standards are still being refined, e.g., the health extension of DCAT-AP (HealthDCAT-AP). Additionally, extensions to the Observational Health Data Sciences and Informatics (OHDSI) OMOP Common Data Model (CDM) to represent patient-generated health data are still needed. Finally, proper mapping between standards (FHIR/OMOP) is a prerequisite for proper data exchange. Conclusions: IDERHA's plan and our collaboration with other EHDS initiatives and projects are critical in advancing the implementation of EHDS2.
ABSTRACT
Geographic Question Answering (GeoQA) systems can automatically answer questions phrased in natural language. This could enable data analysts to make use of geographic information without requiring any GIS skills. However, going beyond the retrieval of existing geographic facts on particular places remains a challenge. Current systems usually cannot handle geo-analytical questions that require GIS analysis procedures to arrive at answers. To enable geo-analytical QA, GeoQA systems need to interpret questions in terms of a transformation that can be implemented in a GIS workflow. To this end, we propose a novel approach to question parsing that interprets questions in terms of core concepts of spatial information and their functional roles in a context-free grammar. The core concepts help model spatial information in questions independently from implementation formats, and their functional roles indicate how concepts are transformed and used in a workflow. Using our parser, geo-analytical questions can be converted into expressions of concept transformations corresponding to abstract GIS workflows. We developed our approach on a corpus of 309 GIS-related questions and tested it on an independent source of 134 test questions including workflows. The evaluation results show high precision and recall on a gold standard of concept transformations.
ABSTRACT
Spatial network analysis is a collection of methods for measuring accessibility potentials as well as for analyzing flows over transport networks. Though it has been part of the practice of geographic information systems for a long time, designing network analytical workflows still requires a considerable amount of expertise. In principle, artificial intelligence methods for workflow synthesis could be used to automate this task. This would improve the (re)usability of analytic resources. However, though underlying graph algorithms are well understood, we still lack a conceptual model that captures the required methodological know-how. The reason is that in practice this know-how goes beyond graph theory to a significant extent. In this article we suggest interpreting spatial networks in terms of quantified relations between spatial objects, where both the objects themselves and their relations can be quantified in an extensive or an intensive manner. Using this model, it becomes possible to effectively organize data sources and network functions towards common analytical goals for answering questions. We tested our model on 12 analytical tasks, and evaluated automatically synthesized workflows with network experts. Results show that standard data models are insufficient for answering questions, and that our model adds information crucial for understanding spatial network functionality.
ABSTRACT
With ever more people living in cities worldwide, it becomes increasingly important to understand and improve the impact of the urban habitat on livability, health behaviors, and health outcomes. However, implementing interventions that tackle the exposome in complex urban systems can be costly and have long-term, sometimes unforeseen, impacts. Hence, it is crucial to assess the health impact, cost-effectiveness, and social distributional impacts of possible urban exposome interventions (UEIs) before implementing them. Spatial agent-based modeling (ABM) can capture complex behavior-environment interactions, exposure dynamics, and social outcomes in a spatial context. This article discusses model architectures and methodological challenges for successfully modeling UEIs using spatial ABM. We review the potential and limitations of the method; model components required to capture active and passive exposure and intervention effects; human-environment interactions and their integration into the macro-level health impact assessment and social cost-benefit analysis; and strategies for model calibration. Major challenges for a successful application of ABM to UEI assessment are (1) the design of realistic behavioral models that can capture different types of exposure and that respond to urban interventions, (2) the mismatch between the possible granularity of exposure estimates and the evidence for corresponding exposure-response functions, (3) the scalability issues that emerge when aiming to estimate long-term effects such as health and social impacts based on high-resolution models of human-environment interactions, and (4) the data and computational complexity of calibrating the resulting agent-based model. Although challenges exist, strategies are proposed to improve the implementation of ABM in exposome research.
ABSTRACT
Loose programming enables analysts to program with concepts instead of procedural code. Data transformations are left underspecified, leaving out procedural details and exploiting knowledge about the applicability of functions to data types. To synthesize workflows of high quality for a geo-analytical task, the semantic type system needs to reflect knowledge of geographic information systems (GIS) at a level that is deep enough to capture geo-analytical concepts and intentions, yet shallow enough to generalize over GIS implementations. Recently, core concepts of spatial information and related geo-analytical concepts were proposed as a way to add the required abstraction level to current geodata models. The core concept data types (CCD) ontology is a semantic type system that can be used to constrain GIS functions for workflow synthesis. However, to date, it is unknown what gain in precision and workflow quality can be expected. In this article we synthesize workflows by annotating GIS tools with these types, specifying a range of common analytical tasks taken from an urban livability scenario. We measure the quality of automatically synthesized workflows against a benchmark generated from common data types. Results show that CCD concepts significantly improve the precision of workflow synthesis.
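The idea of constraining workflow synthesis by semantic types can be illustrated with a minimal sketch. It is not the authors' synthesis method: the tool names, the type names, and the breadth-first search are all assumptions made for illustration, and the types are not taken from the CCD ontology.

```python
from collections import deque

# Hypothetical tool annotations: name -> (input type, output type).
# The type names are illustrative and are not taken from the CCD ontology.
TOOLS = {
    "interpolate": ("PointMeasures", "Field"),
    "zonal_mean": ("Field", "RegionAverages"),
    "count_in_region": ("Objects", "RegionCounts"),
    "normalize": ("RegionCounts", "RegionRatios"),
}


def synthesize(source, goal, tools=TOOLS, max_steps=4):
    """Return every tool chain of at most max_steps that turns source into goal."""
    workflows = []
    queue = deque([(source, [])])
    while queue:
        current, chain = queue.popleft()
        if current == goal and chain:
            # A non-empty chain that ends in the goal type is a workflow.
            workflows.append(chain)
            continue
        if len(chain) == max_steps:
            continue
        for name, (type_in, type_out) in tools.items():
            # A tool is applicable only if its input type matches exactly.
            if type_in == current:
                queue.append((type_out, chain + [name]))
    return workflows


print(synthesize("PointMeasures", "RegionAverages"))
```

In this toy setting, the specific types admit exactly one chain from `PointMeasures` to `RegionAverages`. Coarser annotations (for example, typing every tool only as raster or vector) would admit more chains, which is the precision effect the abstract refers to.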
ABSTRACT
"Data Science" has taken many disciplines by storm. And for good reason: new forms and unprecedented quantities of data enter nearly every scientific field, substantially changing the way scientists do science, and potentially allowing them to answer old questions or to pose them in novel ways. The recent success of Data Science is also reflected in corresponding study programs and curricula and the emergence of specialized branches, such as Geographic Data Science (GDS). Some researchers, therefore, claim that Data Science and GDS should be treated as autonomous scientific disciplines, while others fear that they sell nothing but old wine in new bottles. In an attempt to sober the discussion, we investigate GDS and Data Science from the perspective of meta-science. We provide arguments why today's GDS and Data Science should be seen as an interdisciplinary community of practice of data-driven scientists, rather than a scientific discipline. We also discuss what is missing for GDS and Data Science to become genuine scientific disciplines.
ABSTRACT
Running is a popular form of physical activity. Personal, social, and environmental determinants influence the engagement of the individual. To gain insight into the relation between running behavior and external situations for different types of users, we carried out an extensive data mining study on large-scale datasets. We combined 4 years of historical running data (collected by a mobile exercise application from over 10K participants) with weather, topographical, and demographic datasets. We introduce weighted frequent item mining for the analysis of the data. In this way, we capture temporal and environmental situations that are frequently associated with different running performances. The results show that specific temporal and environmental situations (hour of the day, day of the week, temperature, distance to residential areas, and population density) influence the running performance of users more than other situational features. Hierarchical agglomerative clustering on the running data is used to split runners into two clusters (with sustained and less sustained running behavior). We compared the two groups of runners and found that runners with less sustained behavior are more sensitive to environmental situations (especially several weather- and location-related features, such as temperature, weather type, and distance to the nearest park) than regular runners. Further analysis focused on the situational features for the less sustained runners. Results show that specific feature values correspond to a better or worse running distance. We examined not only the influence of individual features but also the interplay between them. Our findings provide important empirical evidence that the role of external situations in the running behavior of individuals can be derived from analysis of the combined historical datasets. This opens up a large potential to take those situations specifically into consideration when supporting individuals who show less sustained behavior.
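Weighted frequent item mining can be sketched as follows. The abstract does not describe the authors' weighting scheme or algorithm, so this is an illustration under stated assumptions: each run is a set of situational items weighted by its distance, weighted support is the weight share of the runs containing an itemset, and candidates are enumerated by brute force rather than with Apriori-style pruning. The function name and the sample data are hypothetical.

```python
from itertools import combinations


def weighted_frequent_itemsets(transactions, min_support, max_size=2):
    """Find itemsets whose weighted support reaches min_support.

    transactions: list of (set_of_items, weight) pairs.
    Weighted support of an itemset = total weight of the transactions that
    contain it, divided by the total weight of all transactions.
    """
    total = sum(weight for _, weight in transactions)
    items = sorted(set().union(*(t for t, _ in transactions)))
    result = {}
    for size in range(1, max_size + 1):
        # Brute-force enumeration of candidates (no pruning); fine for a toy example.
        for candidate in combinations(items, size):
            covered = sum(w for t, w in transactions if set(candidate) <= t)
            support = covered / total
            if support >= min_support:
                result[candidate] = support
    return result


# Hypothetical runs: situational items, weighted by distance in km (an assumption).
runs = [
    ({"evening", "mild"}, 10.0),
    ({"evening", "cold"}, 4.0),
    ({"morning", "mild"}, 6.0),
]
frequent = weighted_frequent_itemsets(runs, min_support=0.5)
print(frequent)
```

With these three runs, the situation "evening and mild" occurs in only one of three runs but carries half of the total distance, so it passes a weighted threshold of 0.5 that it would fail under plain frequency counting.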
Subject(s)
Mobile Applications, Humans, Machine Learning
ABSTRACT
BACKGROUND: Our understanding of how food choices are affected by exposure to the food environment is limited, and there are important gaps in the literature. Recently developed smartphone-based technologies, including global positioning systems and ecological momentary assessment, enable these gaps to be filled. OBJECTIVE: We present the FoodTrack study design and methods, as well as participants' compliance with the study protocol and their experiences with the app. We propose future analyses of the data to examine individual food environmental exposure taking into account the accessible food environment and individual time constraints; to assess people's food choices in relation to food environmental exposure; and to examine the moderating role of individual and contextual determinants of food purchases and consumption. METHODS: We conducted a 7-day observational study among adults (25-45 years of age) living in urban areas in the Netherlands. Participants completed a baseline questionnaire, used an app (incorporating global positioning system tracking and ecological momentary assessment) for 7 days, and then completed a closing survey. The app automatically collected global positioning system tracking data, and participants uploaded information on all food purchases over the 7-day period into the app. Participants also answered questions on contextual or individual purchase-related determinants directly after each purchase. During the final 3 days of the study, the participants also uploaded data on fruit, vegetable, and snack consumption and answered similar ecological momentary assessment questions after each intake. RESULTS: In total, 140 participants completed the study. More than half of the participants said they liked the app (81/140, 57.9%) and found it easy to use (75/140, 53.6%). Of the 140 participants, 126 (90.0%) said that they had collected data on all or almost all purchases and intakes during the 7-day period. 
Most found the additional ecological momentary assessment questions "easy to answer" (113/140, 80.7%) with "no effort" (99/140, 70.7%). Of 106 participants who explored their trips in the app, 20 (18.8%) had trouble with their smartphone's global positioning system tracking function. Therefore, we will not be able to include all participants in some of the proposed analyses, as we lack these data. We are analyzing data from the first study aim and we expect to publish the results in the spring of 2020. CONCLUSIONS: Participants perceived the FoodTrack app as a user-friendly tool. The app is particularly useful for observational studies that aim to gain insight into daily food environment exposure and food choices. Further analyses of the FoodTrack study data will provide novel insights into individual food environmental exposure, evidence on the individual food environment-diet interaction, and insights into the underlying individual and contextual mechanisms of food purchases and consumption. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/15283.