Results 1 - 7 of 7
1.
Entropy (Basel) ; 23(4), 2021 Apr 15.
Article in English | MEDLINE | ID: mdl-33921188

ABSTRACT

In recent decades, the development of interconnectivity, pervasive systems, citizen sensors, and Big Data technologies has allowed us to gather large amounts of data from different sources worldwide. This phenomenon has raised privacy concerns around the globe, compelling states to enforce data protection laws. In parallel, privacy-enhancing techniques have emerged to meet regulatory requirements, allowing companies and researchers to exploit individual data in a privacy-aware way. Thus, data curators need to find the most suitable algorithms to achieve a required trade-off between utility and privacy. This crucial task can be time-consuming because benchmarks of privacy techniques are lacking. To fill this gap, in the current effort we use an entire commercial database to compare classical privacy techniques, such as Statistical Disclosure Control and Differential Privacy, with more recent techniques, such as Generative Adversarial Networks and Machine Learning Copies. The obtained results allow us to show the evolution of privacy techniques and depict new uses of privacy-aware Machine Learning techniques.
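Differential Privacy is one of the classical families compared in this study. To illustrate the utility-privacy trade-off it involves, the following is a minimal sketch of the standard Laplace mechanism applied to a counting query, assuming a query sensitivity of 1. The function names are hypothetical and this is not the implementation evaluated by the authors.

```python
import random
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with rate 1/scale
    is Laplace-distributed with location 0 and the given scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records: Iterable[T], predicate: Callable[[T], bool],
                  epsilon: float, rng: random.Random) -> float:
    """Release a noisy count under epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one record changes
    the count by at most 1), so Laplace noise with scale 1/epsilon suffices.
    Smaller epsilon means stronger privacy and noisier, less useful answers.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

The expected absolute error of each released count equals the noise scale 1/epsilon, so lowering epsilon from 1.0 to 0.1 strengthens the privacy guarantee at the cost of roughly ten times more error.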

2.
Remote Sens (Basel) ; 15(11): 2775, 2023 May 26.
Article in English | MEDLINE | ID: mdl-37324796

ABSTRACT

Disease control programs need to identify the breeding sites of mosquitoes, which transmit malaria and other diseases, in order to target interventions and identify environmental risk factors. The increasing availability of very-high-resolution drone data provides new opportunities to find and characterize these vector breeding sites. In this study, drone images from two malaria-endemic regions in Burkina Faso and Côte d'Ivoire were assembled and labeled using open-source tools. We developed and applied a workflow using region-of-interest-based and deep learning methods to identify land cover types associated with vector breeding sites from very-high-resolution natural color imagery. The analysis methods were assessed using cross-validation and achieved maximum Dice coefficients of 0.68 and 0.75 for vegetated and non-vegetated water bodies, respectively. The classifier also consistently identified the presence of other land cover types associated with the breeding sites, obtaining Dice coefficients of 0.88 for tillage and crops, 0.87 for buildings, and 0.71 for roads. This study establishes a framework for developing deep learning approaches to identify vector breeding sites and highlights the need to evaluate how results will be used by control programs.
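The Dice coefficient used to score the classifier measures the overlap between a predicted mask A and a reference mask B for one land cover class, Dice = 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (perfect agreement). A minimal sketch, assuming masks are given as nested lists of 0/1 values; the function name and the empty-mask convention are assumptions, not details from the study.

```python
from typing import Sequence


def dice_coefficient(pred: Sequence[Sequence[int]],
                     truth: Sequence[Sequence[int]]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of the same shape.

    A is the set of pixels predicted as the class of interest (e.g. water),
    B the set of pixels labeled as that class in the reference mask.
    """
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    intersection = size_pred = size_truth = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            intersection += int(bool(p) and bool(t))
            size_pred += int(bool(p))
            size_truth += int(bool(t))
    total = size_pred + size_truth
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

For example, a prediction `[[1, 1, 0], [0, 1, 0]]` against a reference `[[1, 0, 0], [0, 1, 1]]` shares 2 pixels out of 3 + 3 labeled pixels, giving Dice = 4/6 ≈ 0.67. Per-class scores such as those reported above would be obtained by binarizing the multi-class map once per class.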

3.
PLoS One ; 16(1): e0244409, 2021.
Article in English | MEDLINE | ID: mdl-33507933

ABSTRACT

El Niño is an extreme weather event featuring unusual warming of surface waters in the eastern equatorial Pacific Ocean. This phenomenon is characterized by heavy rains and floods that negatively affect the economic activities of the impacted areas. Understanding how this phenomenon influences consumption behavior at different levels of granularity is essential for recommending strategies to normalize the situation. To this end, we performed a multi-scale analysis of credit and debit card transaction data. Our findings can be summarized in two main results: coarse-grained analysis reveals the presence of the El Niño phenomenon and the recovery time in a given territory, while fine-grained analysis demonstrates a change in individuals' purchasing patterns and in merchant relevance as a consequence of the climatic event. The results also indicate that society successfully withstood the natural disaster owing to the economic structure built over time. In this study, we present a new method that may be useful for better characterizing future extreme events.
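The abstract does not specify how the two analysis scales or the recovery time are computed. As a hedged illustration only, the sketch below aggregates hypothetical card transactions at a coarse (territory) and a fine (merchant) level, and defines recovery time, purely as an assumption, as the number of weeks after event onset until a territory's weekly spending first returns to 95% of its pre-event mean. The record layout, names, and threshold are all hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record layout: (territory, merchant, week_index, amount).
Transaction = Tuple[str, str, int, float]


def weekly_totals(transactions: Sequence[Transaction], level: str,
                  n_weeks: int) -> Dict[str, List[float]]:
    """Sum amounts per week at a coarse ("territory") or fine ("merchant") level."""
    if level not in ("territory", "merchant"):
        raise ValueError("level must be 'territory' or 'merchant'")
    index = 0 if level == "territory" else 1
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_weeks)
    for tx in transactions:
        totals[tx[index]][tx[2]] += tx[3]
    return dict(totals)


def recovery_time(series: Sequence[float], event_week: int,
                  threshold: float = 0.95) -> Optional[int]:
    """Weeks from event onset until spending first reaches `threshold` times
    the pre-event mean; None if it never recovers within the series.

    This definition is an assumption for illustration only.
    """
    if not 0 < event_week < len(series):
        raise ValueError("need at least one pre-event week and one event week")
    baseline = mean(series[:event_week])
    for offset, total in enumerate(series[event_week:]):
        if total >= threshold * baseline:
            return offset
    return None
```

With weekly territory totals `[100, 100, 100, 100, 40, 60, 90, 97]` and the event starting in week 4, the baseline is 100 and spending first reaches 95 three weeks after onset, so the sketch returns 3.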


Subjects
Consumer Behavior, El Nino-Southern Oscillation, Cluster Analysis, Humans, Peru, Time Factors

4.
Artif Intell Med ; 117: 102096, 2021 Jul.
Article in English | MEDLINE | ID: mdl-34127235

ABSTRACT

BACKGROUND: The Internet provides different tools for communicating with patients, such as social media (e.g., Twitter) and email platforms. These platforms provide new data sources that shed light on patient experiences with health care and improve our understanding of patient-provider communication. Several existing topic modeling and document clustering methods have been adapted to analyze these new free-text data automatically. However, both tweets and emails are often composed of short texts, and existing topic modeling and clustering approaches perform suboptimally on such texts. Moreover, research on health-related short texts using these methods has become difficult to reproduce and benchmark, partly because no detailed comparison of state-of-the-art topic modeling and clustering methods on these short texts exists.

METHODS: We trained eight state-of-the-art topic modeling and clustering algorithms on short texts from two health-related datasets (tweets and emails): Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), LDA with Gibbs Sampling (GibbsLDA), Online LDA, Biterm Topic Model (BTM), Online Twitter LDA, and Gibbs Sampling for Dirichlet Multinomial Mixture (GSDMM), as well as the k-means clustering algorithm with two different feature representations: TF-IDF and Doc2Vec. We used cluster validity indices to evaluate the performance of topic modeling and clustering: two internal indices (i.e., assessing the goodness of a clustering structure without external information) and five external indices (i.e., comparing the results of a cluster analysis to externally provided class labels).

RESULTS: Overall, for numbers of clusters (k) from 2 to 50, Online Twitter LDA and GSDMM achieved the best performance in terms of internal indices, while LSI and k-means with TF-IDF had the highest external indices. Also, of all tweets (N = 286,971; HPV represents 94.6% of tweets and Lynch syndrome 5.4%), for k = 2, most of the methods reproduced this initial class distribution. However, we found that model performance varies with the data source and with hyper-parameters such as the number of topics and the number of iterations used to train the models. We also conducted an error analysis using the Hamming loss metric, for which GSDMM obtained the poorest value on both datasets.

CONCLUSIONS: Researchers who aim to group or classify health-related short-text data need to select the most suitable topic modeling and clustering methods for their specific research questions. To support this choice, we presented a comparison of the most commonly used topic modeling and clustering algorithms over two health-related, short-text datasets using both internal and external clustering validation indices. Internal indices suggested Online Twitter LDA and GSDMM as the best, while external indices suggested LSI and k-means with TF-IDF as the best. In summary, our work suggests that researchers can improve their analysis of model performance by using a variety of metrics, since no single metric is best.
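For single-label data, the Hamming loss reduces to the fraction of documents assigned the wrong label, so lower values are better. Because cluster IDs are arbitrary, they must first be aligned with the known class labels. The sketch below assumes a simple majority-vote alignment; the alignment rule, function names, and toy labels are assumptions and not necessarily the procedure used in the study.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def map_clusters_to_classes(clusters: Sequence[Hashable],
                            classes: Sequence[Hashable]) -> Dict[Hashable, Hashable]:
    """Assign each cluster ID the most frequent true class among its members.

    Majority voting is an assumed alignment rule for illustration.
    """
    votes: Dict[Hashable, Counter] = defaultdict(Counter)
    for cluster, label in zip(clusters, classes):
        votes[cluster][label] += 1
    return {cluster: counts.most_common(1)[0][0] for cluster, counts in votes.items()}


def hamming_loss(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of positions whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For instance, with true labels `["HPV", "HPV", "HPV", "Lynch", "Lynch"]` and clusters `[0, 0, 1, 1, 1]`, cluster 0 maps to HPV and cluster 1 to Lynch, so one of five documents is mislabeled and the Hamming loss is 0.2.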


Subjects
Electronic Mail, Social Media, Cluster Analysis, Communication, Humans, Machine Learning

5.
J Environ Public Health ; 2021: 3220244, 2021.
Article in English | MEDLINE | ID: mdl-34759971

ABSTRACT

Land-use practices such as agriculture can impact mosquito vector breeding ecology, resulting in changes in disease transmission. The typical breeding habitats of Africa's second most important malaria vector, Anopheles funestus, are large, semipermanent water bodies, which makes these habitats potential candidates for targeted larval source management. This paper presents a technical workflow for integrating drone surveys and mosquito larval sampling, designed for a case study aiming to characterise An. funestus breeding sites near two villages in an agricultural setting in Côte d'Ivoire. Using satellite remote sensing data, we developed an environmentally and spatially representative sampling frame and conducted paired mosquito larvae and drone mapping surveys from June to August 2021. To categorise the drone imagery, we also developed a land cover classification scheme with classes relevant to An. funestus breeding ecology. We sampled 189 potential breeding habitats, of which 119 (63%) were positive for the Anopheles genus and nine (4.8%) were positive for An. funestus. We mapped 30.42 km² of the region of interest, including all water bodies that were sampled for larvae. These data can be used to inform targeted vector control efforts, although their generalisability over a large region is limited by the fine-scale nature of this study area. This paper develops protocols for integrating drone surveys and statistically rigorous entomological sampling, which can be adjusted to collect data on vector breeding habitats in other ecological contexts. Further research using data collected in this study can enable the development of deep-learning algorithms for identifying An. funestus breeding habitats across rural agricultural landscapes in Côte d'Ivoire and the analysis of risk factors for these sites.
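The reported proportions follow directly from the sample counts, taking all 189 sampled habitats as the denominator:

```latex
\frac{119}{189} \approx 0.630 \;\Rightarrow\; 63\% \ \text{positive for } \textit{Anopheles}, \qquad
\frac{9}{189} \approx 0.048 \;\Rightarrow\; 4.8\% \ \text{positive for } \textit{An. funestus}
```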


Subjects
Anopheles, Malaria, Agriculture, Animals, Côte d'Ivoire, Ecosystem, Larva, Mosquito Vectors, Seasons, Workflow

6.
Article in English | MEDLINE | ID: mdl-35462884

ABSTRACT

The COVID-19 crisis has produced worldwide changes, ranging from people's lifestyles to travel restrictions imposed by nations aiming to keep the virus out. Several countries have created digital information applications, such as contact tracing apps, to help control and manage the COVID-19 crisis. The Peruvian government, in collaboration with several institutions, developed PerúEnTusManos, an epidemiological tracing application. The application uses georeferencing to study users' movements, builds individual mobility patterns of Peruvian citizens, and detects crowds. In this article, we present a process to detect potentially infected individuals based on probabilities assigned to people who had contact with someone who tested positive for COVID-19, using data collected from PerúEnTusManos. The preliminary evaluation shows promising results in estimating the probability that individuals are infected and in identifying the most infected districts in Peru. The ultimate goal of the application in Peru is to provide reliable information to health authorities so they can make informed decisions about the allocation of available clinical tests and about economic reactivation.
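The abstract does not state how the per-contact probabilities are combined. One common modeling choice, used here purely as an assumption, treats each contact with a confirmed case as an independent exposure with transmission probability p_i, so that P(infected) = 1 - prod(1 - p_i). Summing these probabilities per district then gives the expected number of infected users, which can be used to rank districts. All names and inputs below are hypothetical.

```python
from collections import defaultdict
from math import prod
from typing import Dict, Iterable, List, Sequence, Tuple


def infection_probability(contact_probs: Iterable[float]) -> float:
    """P(infected) = 1 - prod(1 - p_i), assuming independent exposures.

    Each p_i is a hypothetical per-contact transmission probability for one
    contact with a user who tested positive. No contacts gives 0.
    """
    probs = list(contact_probs)
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    return 1.0 - prod(1.0 - p for p in probs)


def rank_districts(people: Iterable[Tuple[str, Sequence[float]]]) -> List[Tuple[str, float]]:
    """Rank districts by expected number of infected users.

    By linearity of expectation, the expected count in a district is the sum
    of its users' infection probabilities.
    """
    expected: Dict[str, float] = defaultdict(float)
    for district, contact_probs in people:
        expected[district] += infection_probability(contact_probs)
    return sorted(expected.items(), key=lambda item: item[1], reverse=True)
```

For example, a user with two contacts of probability 0.5 each has an estimated infection probability of 1 - 0.5 × 0.5 = 0.75. The independence assumption ignores household clustering and repeated contacts with the same person, so it should be read as a simplification.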

7.
Article in English | MEDLINE | ID: mdl-35463811

ABSTRACT

Twitter has become the most popular form of social interaction in the healthcare domain. Thus, various teams have evaluated Twitter as an additional source where patients share information about their healthcare, with the potential goal of improving their outcomes. Several existing topic modeling and document clustering applications have been adapted to analyze tweets, showing that their performance is negatively affected by the nature and characteristics of tweets. Moreover, Twitter health research has become difficult to evaluate because of the absence of comparisons between the existing applications. In this paper, we perform an evaluation based on internal indexes of different topic modeling and document clustering applications over two Twitter health-related datasets. Our results show that Online Twitter LDA and Gibbs LDA perform better at extracting topics and grouping tweets. We aim to provide health practitioners with this comparison so that they can select the most suitable application for their tasks.
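The abstract does not name the internal indexes used. The silhouette coefficient is one widely used internal index, and the sketch below, offered as an assumption, computes it in pure Python for points such as TF-IDF vectors of tweets. Values near 1 indicate compact, well-separated clusters; negative values indicate points that sit closer to another cluster than to their own.

```python
from math import dist
from statistics import mean
from typing import Hashable, Sequence


def silhouette_score(points: Sequence[Sequence[float]],
                     labels: Sequence[Hashable]) -> float:
    """Mean silhouette s(i) = (b - a) / max(a, b) over all points.

    a: mean Euclidean distance from i to the other members of its cluster.
    b: smallest mean distance from i to the members of any other cluster.
    Points in singleton clusters get s(i) = 0, following Rousseeuw's convention.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    cluster_ids = set(labels)
    if len(cluster_ids) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)
            continue
        a = mean(same)
        b = min(mean([dist(p, q) for q, lab in zip(points, labels) if lab == other])
                for other in cluster_ids if other != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)
```

This direct version computes O(n²) distances, which is fine for illustration but would need sampling or a vectorized library for hundreds of thousands of tweets.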
