1.
PLoS One ; 18(12): e0295598, 2023.
Article in English | MEDLINE | ID: mdl-38064477

ABSTRACT

Tabular data is commonly used in business and literature and can be analyzed using tree-based Machine Learning (ML) algorithms to extract meaningful information. Deep Learning (DL) excels in data such as image, sound, and text, but it is less frequently utilized with tabular data. However, it is possible to use tools to convert tabular data into images for use with Convolutional Neural Networks (CNNs) which are powerful DL models for image classification. The goal of this work is to compare the performance of converters for tabular data into images, select the best one, optimize a CNN using random search, and compare it with an optimized ML algorithm, the XGBoost. Results show that even a basic CNN, with only 1 convolutional layer, can reach comparable metrics to the XGBoost, which was trained on the original tabular data and optimized with grid search and feature selection. However, further optimization of the CNN with random search did not significantly improve its performance.


Subject(s)
Arboviruses , Neural Networks, Computer , Algorithms , Machine Learning
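The abstract above does not name the converters it compares, so the following is only a minimal sketch of the general idea of turning one tabular row into a small image for a CNN. It assumes a naive approach (min-max scaling plus zero-padding into a square grid); the function name `row_to_image` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def row_to_image(row, col_min, col_max):
    """Min-max scale one tabular row to [0, 1] and pack it into a square grid.

    Naive illustration only: features are placed in their original order and
    unused cells are zero-padded. Requires at least one feature.
    """
    scaled = [
        (v - lo) / (hi - lo) if hi > lo else 0.0  # constant columns map to 0
        for v, lo, hi in zip(row, col_min, col_max)
    ]
    side = math.isqrt(len(scaled) - 1) + 1  # smallest side with side*side >= n
    padded = scaled + [0.0] * (side * side - len(scaled))
    return [padded[i * side:(i + 1) * side] for i in range(side)]


# Hypothetical row with five numeric features, each ranging from 0 to 10.
image = row_to_image(
    row=[5.0, 10.0, 0.0, 2.0, 8.0],
    col_min=[0.0] * 5,
    col_max=[10.0] * 5,
)
for pixel_row in image:
    print(pixel_row)
```

A plain reshape like this ignores relationships between features; dedicated converters typically choose where each feature is placed, which is presumably why the study compares several of them.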
3.
PLoS One ; 18(6): e0276150, 2023.
Article in English | MEDLINE | ID: mdl-37267293

ABSTRACT

BACKGROUND: Communicable diseases represent a huge economic burden for healthcare systems and for society. Sexually transmitted infections (STIs) are a concerning issue, especially in developing and underdeveloped countries, in which environmental factors and other determinants of health contribute to their fast spread. In light of this situation, machine learning techniques have been explored to assess the incidence of syphilis and contribute to epidemiological surveillance in this scenario. OBJECTIVE: The main goal of this work is to evaluate the performance of different machine learning models in predicting undesirable outcomes of congenital syphilis in order to assist resource allocation and optimize healthcare actions, especially in a constrained health environment. METHOD: We use clinical and sociodemographic data from pregnant women who were assisted by a social program in Pernambuco, Brazil, named Mãe Coruja Pernambucana Program (PMCP). Based on a rigorous methodology, we propose six experiments using three feature selection techniques to select the most relevant attributes, pre-process and clean the data, apply hyperparameter optimization to tune the machine learning models, and train and test models to have a fair evaluation and discussion. RESULTS: The AdaBoost-BODS-Expert model, an Adaptive Boosting (AdaBoost) model that used attributes selected by health experts, presented the best results in terms of evaluation metrics and acceptance by health experts from PMCP. With this model, the results are more reliable, which allows its adoption in daily use to classify possible outcomes of congenital syphilis using clinical and sociodemographic data.


Subject(s)
Sexually Transmitted Diseases , Syphilis, Congenital , Syphilis , Female , Humans , Pregnancy , Syphilis, Congenital/epidemiology , Sexually Transmitted Diseases/epidemiology , Syphilis/epidemiology , Developing Countries , Incidence
5.
Sci Data ; 9(1): 771, 2022 12 15.
Article in English | MEDLINE | ID: mdl-36522386

ABSTRACT

After COVID-19, tuberculosis (TB) is the leading cause of death by an infectious disease in the world. This work presents a data set based on data collected from the Brazilian Information System for Notifiable Diseases (SINAN) for the period from January 2001 to April 2020 relating to patients diagnosed with tuberculosis in Brazil. The data from SINAN was pre-processed to generate a new data set with two distinct treatment outcome classes: CURED and DIED. The data set comprises 37 categorical attributes (including socio-demographic, clinical, and laboratory data) as well as the target class. There are 927,909 records of patients classified as CURED and 36,190 classified as DIED, totaling 964,099 records.


Subject(s)
Tuberculosis , Humans , Brazil/epidemiology , Information Systems , Prognosis , Tuberculosis/epidemiology , Tuberculosis/drug therapy
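The class counts reported in the abstract above imply a strong imbalance between the two outcome classes. The sketch below only reproduces that arithmetic. The class-weight step is an illustrative assumption (the abstract does not say how the data set should be balanced) and uses the common heuristic n_samples / (n_classes * class_count).

```python
# Record counts as stated in the abstract.
counts = {"CURED": 927_909, "DIED": 36_190}

total = sum(counts.values())
died_share = counts["DIED"] / total
imbalance_ratio = counts["CURED"] / counts["DIED"]

# Assumption: "balanced" class weights, n_samples / (n_classes * class_count).
class_weights = {
    label: total / (len(counts) * n) for label, n in counts.items()
}

print(f"total records: {total}")
print(f"share of DIED: {died_share:.2%}")
print(f"CURED:DIED ratio: {imbalance_ratio:.1f}:1")
print(class_weights)
```

With weights like these, each class contributes the same total weight to a loss function, which is one simple way to keep the minority class from being ignored during training.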
6.
BMC Med Inform Decis Mak ; 22(1): 334, 2022 12 19.
Article in English | MEDLINE | ID: mdl-36536413

ABSTRACT

BACKGROUND: Care during pregnancy, childbirth and puerperium is fundamental to avoid pathologies for the mother and her baby. However, health issues can occur during this period, causing misfortunes, such as the death of the fetus or neonate. Predictive models of fetal and infant deaths are important technological tools that can help to reduce mortality indexes. The main goal of this work is to present a systematic review of literature focused on computational models to predict mortality, covering stillbirth, perinatal, neonatal, and infant deaths, highlighting their methodology and the description of the proposed computational models. METHODS: We conducted a systematic review of literature, limiting the search to the last 10 years of publications and considering the five main scientific databases as sources. RESULTS: From 671 works, 18 were selected as primary studies for further analysis. We found that most works are focused on prediction of neonatal deaths, using machine learning models (more specifically Random Forest). The top five most common features used to train models are birth weight, gestational age, sex of the child, Apgar score and mother's age. Having predictive models for preventing mortality during and post-pregnancy not only improves the mother's quality of life, but can also be a powerful and low-cost tool to decrease mortality ratios. CONCLUSION: Based on the results of this SRL, we can state that scientific efforts have been made in this area, but there are many open research opportunities to be developed by the community.


Subject(s)
Artificial Intelligence , Quality of Life , Humans , Infant , Pregnancy , Infant, Newborn , Child , Female , Stillbirth , Postpartum Period , Infant Death
7.
Rev Soc Bras Med Trop ; 55: e0420, 2022.
Article in English | MEDLINE | ID: mdl-35946631

ABSTRACT

BACKGROUND: Malaria is curable. Nonetheless, over 229 million cases of malaria were recorded in 2019, along with 409,000 deaths. Although over 42 million Brazilians are at risk of contracting malaria, 99% of all malaria cases in Brazil are located in or around the Amazon rainforest. Despite declining cases and deaths, malaria remains a major public health issue in Brazil. Accurate spatiotemporal prediction of malaria propagation may enable improved resource allocation to support efforts to eradicate the disease. METHODS: In response to calls for novel research on malaria elimination strategies that suit local conditions, in this study, we propose machine learning (ML) and deep learning (DL) models to predict the probability of malaria cases in the state of Amazonas. Using a dataset of approximately 6 million records (January 2003 to December 2018), we applied k-means clustering to group cities based on their similarity of malaria incidence. We evaluated random forest, long short-term memory (LSTM) and gated recurrent unit (GRU) models and compared their performance. RESULTS: The LSTM architecture achieved better performance in clusters with less variability in the number of cases, whereas the GRU presented better results in clusters with high variability. Although Diebold-Mariano testing suggested that both the LSTM and GRU performed comparably, GRU can be trained significantly faster, which could prove advantageous in practice. CONCLUSIONS: All models showed satisfactory accuracy and strong performance in predicting new cases of malaria, and each could serve as a supplemental tool to support regional policies and strategies.


Subject(s)
Deep Learning , Malaria , Brazil/epidemiology , Cities , Humans , Incidence , Malaria/epidemiology
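The clustering step described in the abstract above (grouping cities by similarity of malaria incidence with k-means) can be sketched as follows. This is a minimal, standard-library version of Lloyd's algorithm under stated assumptions: the incidence vectors are invented for illustration, the initial centroids are simply the first k points, and nothing here reflects the authors' actual features, number of clusters, or implementation.

```python
import math


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical monthly case counts for five cities (three months each).
incidence = [
    (2, 3, 1),
    (40, 55, 38),
    (1, 2, 2),
    (50, 47, 60),
    (3, 2, 1),
]
labels, centroids = kmeans(incidence, k=2)
print(labels)
print(centroids)
```

In a setting like the one described, a separate forecasting model could then be fitted per cluster, which matches the abstract's observation that LSTM and GRU behave differently in low-variability and high-variability clusters.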
9.
Sci Data ; 9(1): 198, 2022 05 10.
Article in English | MEDLINE | ID: mdl-35538103

ABSTRACT

One of the main categories of Neglected Tropical Diseases (NTDs) is arboviruses, of which Dengue and Chikungunya are the most common. Arboviruses mainly affect tropical countries. Brazil has the largest absolute number of cases in Latin America. This work presents a unified data set with clinical, sociodemographic, and laboratory data on patients with confirmed Dengue and Chikungunya, as well as patients for whom infection with these diseases was ruled out. The data is based on case notification data submitted to the Brazilian Information System for Notifiable Diseases, from the Portuguese Sistema de Informação de Agravo de Notificação (SINAN), from 2013 to 2020. The original data set comprised 13,421,230 records and 118 attributes. Following pre-processing, a final data set of 7,632,542 records and 56 attributes was generated. The data presented in this work will assist researchers in investigating antecedents of arbovirus emergence and transmission more generally, and Dengue and Chikungunya in particular. Furthermore, it can be used to train and test machine learning models for differential diagnosis and multi-class classification.


Subject(s)
Arboviruses , Chikungunya Fever , Dengue , Zika Virus Infection , Brazil/epidemiology , Chikungunya Fever/epidemiology , Dengue/epidemiology , Humans , Neglected Diseases
10.
BMC Pregnancy Childbirth ; 22(1): 379, 2022 May 01.
Article in English | MEDLINE | ID: mdl-35501764

ABSTRACT

BACKGROUND: The Brazilian healthcare system is a large and complex system, especially considering its mixed public and private funding. The incidence of syphilis has increased in the last four years, in spite of the presence of an effective and available treatment. Furthermore, syphilis belongs to a group of disorders subject to compulsory notification to public health surveillance. The epidemiological implications are especially important during pregnancy since it can lead to complications related to prematurity, stillbirth and miscarriage, in addition to congenital syphilis, characterized by multisystem involvement in the newborn. METHODS: The Action Research methodology was applied to address the complexity of the syphilis surveillance scenario in Pernambuco, Brazil. Iterative learning cycles were used, resulting in six cycles, followed by a formal validation of an operational version of the syphilis Trigram visualisation at the end of the process. The original data source was analyzed and prepared to be used without any new data or change in the ordinary procedure of the current system. RESULTS: The main result of this work is the production of a Syphilis Trigram: a domain-specific infographic for presenting gestational data and birth data. The second contribution of this work is the Average Trigram, an organized pie chart which synthesizes the Syphilis Trigram relationship in an aggregated way. Both graphics are presented in an Infographic User Interface, a tool that combines the broad visual overview of an infographic with data visualization. These interfaces also provide selection and filter tools to assist and refine the presented information. The user can experience a specific case-by-case view, in addition to an aggregated perspective according to the cities monitored by the system. CONCLUSIONS: The proposed domain-specific visualization amplifies the understanding of each syphilis case and the overall characteristics of cases of a chosen city. This new information produced by the Trigram can help clarify the reinfection/relapse cases, optimize resource allocation and enhance the syphilis healthcare policies without the need for new data. Thus, it enables health surveillance professionals to see the broad tendency, understand the key patterns through visualization, and take action within a feasible time.


Subject(s)
Pregnancy Complications, Infectious , Syphilis, Congenital , Syphilis , Brazil/epidemiology , Child , Child Health , Female , Humans , Infant, Newborn , Pregnancy , Pregnancy Complications, Infectious/epidemiology , Syphilis/diagnosis , Syphilis/epidemiology , Syphilis, Congenital/epidemiology , Syphilis, Congenital/prevention & control
11.
PLoS Negl Trop Dis ; 16(1): e0010061, 2022 01.
Article in English | MEDLINE | ID: mdl-35025860

ABSTRACT

BACKGROUND: Neglected tropical diseases (NTDs) primarily affect the poorest populations, often living in remote, rural areas, urban slums or conflict zones. Arboviruses are a significant NTD category spread by mosquitoes. Dengue, Chikungunya, and Zika are three arboviruses that affect a large proportion of the population in Latin and South America. The clinical diagnosis of these arboviral diseases is a difficult task due to the concurrent circulation of several arboviruses which present similar symptoms, inaccurate serologic tests resulting from cross-reaction and co-infection with other arboviruses. OBJECTIVE: The goal of this paper is to present evidence on the state of the art of studies investigating the automatic classification of arboviral diseases to support clinical diagnosis based on Machine Learning (ML) and Deep Learning (DL) models. METHOD: We carried out a Systematic Literature Review (SLR) in which Google Scholar was searched to identify key papers on the topic. From an initial 963 records (956 from string-based search and seven from a single backward snowballing procedure), only 15 relevant papers were identified. RESULTS: Results show that current research is focused on the binary classification of Dengue, primarily using tree-based ML algorithms. Only one paper was identified using DL. Five papers presented solutions for multi-class problems, covering Dengue (and its variants) and Chikungunya. No papers were identified that investigated models to differentiate between Dengue, Chikungunya, and Zika. CONCLUSIONS: The use of an efficient clinical decision support system for arboviral diseases can improve the quality of the entire clinical process, thus increasing the accuracy of the diagnosis and the associated treatment. It should help physicians in their decision-making process and, consequently, improve the use of resources and the patient's quality of life.


Subject(s)
Arbovirus Infections/diagnosis , Chikungunya Fever/diagnosis , Decision Support Systems, Clinical , Dengue/diagnosis , Zika Virus Infection/diagnosis , Aedes/virology , Animals , Arbovirus Infections/drug therapy , Arbovirus Infections/virology , Chikungunya Fever/drug therapy , Chikungunya virus , Deep Learning , Dengue/drug therapy , Dengue Virus , Humans , Mosquito Vectors/virology , Neglected Diseases/virology , South America , Zika Virus , Zika Virus Infection/drug therapy
12.
Scientometrics ; 127(3): 1609-1642, 2022.
Article in English | MEDLINE | ID: mdl-35068619

ABSTRACT

The mapping and analysis of scientific knowledge makes it possible to identify the dynamics and/or growth of a particular field of research or to support strategic decisions related to different research entities, based on bibliometric and/or scientometric indicators. However, with the exponential growth of scientific production, a systematic and data-oriented approach to the analysis of this large set of productions becomes increasingly essential. Thus, in this work, a data-oriented methodology was proposed, combining Data Analysis, Machine Learning and Complex Network Analysis techniques, and Data Version Control (DVC) tool, for the extraction of implicit knowledge in scientific production bases. In addition, the approach was validated through a case study in a COVID-19 manuscripts dataset, which had 199,895 articles published on arXiv, bioRxiv, medRxiv, PubMed and Scopus databases. The results suggest the feasibility of the proposed methodology, indicating the most active countries and the most explored themes in each period of the pandemic. Therefore, this study has the potential to instrument and expand strategic decisions by the scientific community, aiming at extracting knowledge that supports the fight against the COVID-19 pandemic.

13.
Rev. Soc. Bras. Med. Trop ; 55: e0420, 2022. tab, graf
Article in English | LILACS-Express | LILACS | ID: biblio-1387531

ABSTRACT

Background: Malaria is curable. Nonetheless, over 229 million cases of malaria were recorded in 2019, along with 409,000 deaths. Although over 42 million Brazilians are at risk of contracting malaria, 99% of all malaria cases in Brazil are located in or around the Amazon rainforest. Despite declining cases and deaths, malaria remains a major public health issue in Brazil. Accurate spatiotemporal prediction of malaria propagation may enable improved resource allocation to support efforts to eradicate the disease. Methods: In response to calls for novel research on malaria elimination strategies that suit local conditions, in this study, we propose machine learning (ML) and deep learning (DL) models to predict the probability of malaria cases in the state of Amazonas. Using a dataset of approximately 6 million records (January 2003 to December 2018), we applied k-means clustering to group cities based on their similarity of malaria incidence. We evaluated random forest, long short-term memory (LSTM) and gated recurrent unit (GRU) models and compared their performance. Results: The LSTM architecture achieved better performance in clusters with less variability in the number of cases, whereas the GRU presented better results in clusters with high variability. Although Diebold-Mariano testing suggested that both the LSTM and GRU performed comparably, GRU can be trained significantly faster, which could prove advantageous in practice. Conclusions: All models showed satisfactory accuracy and strong performance in predicting new cases of malaria, and each could serve as a supplemental tool to support regional policies and strategies.

14.
Malar J ; 20(1): 431, 2021 Oct 30.
Article in English | MEDLINE | ID: mdl-34717641

ABSTRACT

BACKGROUND: Although considerable success in reducing the incidence of malaria has been achieved in Brazil in recent years, an increase in the proportion of cases caused by the harder-to-eliminate Plasmodium vivax parasite can be noted. Recurrences in P. vivax malaria cases are due to new mosquito-bite infections, drug resistance or, especially, relapses arising from hypnozoites. As such, innovative surveillance strategies are needed. The aim of this study was to develop an infographic visualization tool to improve individual-level malaria surveillance focused on malaria elimination in the Brazilian Amazon. METHODS: Action Research methodology was employed to deal with the complex malaria surveillance problem in the Amazon region. Iterative cycles were used, totalling four cycles with a formal validation of an operational version of the Malaria Trigram tool at the end of the process. Further probabilistic data linkage was carried out so that information on the same patients could be linked, allowing for follow-up analysis, since the official system was not designed for this purpose. RESULTS: An infographic user interface was developed for the Malaria Trigram that incorporates all the visual and descriptive power of the Trigram concept. It is a multidimensional and interactive historical representation of malaria cases per patient over time and provides visual input to decision-makers on recurrences of malaria. CONCLUSIONS: The Malaria Trigram aims to help public health professionals and policy makers to recognise and analyse different types of patterns in malaria events, including recurrences and reinfections, based on the current Brazilian health surveillance system, the SIVEP-Malária system, with no additional primary data collection or change in the current process. By using the Malaria Trigram, it is possible to plan and coordinate interventions for malaria elimination that are integrated with other parallel actions in the Brazilian Amazon region, such as vector control management and effective drug and vaccine deployment strategies.


Subject(s)
Data Visualization , Disease Eradication/statistics & numerical data , Epidemiological Monitoring , Malaria, Vivax/prevention & control , Population Surveillance/methods , Brazil , Humans , Plasmodium vivax , Recurrence
15.
Data Brief ; 33: 106554, 2020 Dec.
Article in English | MEDLINE | ID: mdl-33344736

ABSTRACT

Mercosur (a.k.a. Mercosul) is a trade bloc comprising five South American countries. In 2018, a unified Mercosur license plate model was rolled out. Access to large volumes of ground truth Mercosur license plates with sufficient presentation variety is a significant challenge for training supervised models for license plate detection (LPD) in automatic license plate recognition (ALPR) systems. To address this problem, a Mercosur license plate generator was developed to generate artificial license plate images meeting the new standard with sufficient variety for ALPR training purposes. This includes images with variation due to occlusions and environmental conditions. An embedded system was developed for detecting legacy license plates in images of real scenarios and overwriting these with artificially generated Mercosur license plates. This data set comprises 3,829 images of vehicles with synthetic license plates that meet the new Mercosur standard in real scenarios, and an equivalent number of text files containing label information for the images, all organized in a CSV file with compiled image file paths and associated labels.

16.
Article in English | MEDLINE | ID: mdl-33218105

ABSTRACT

Over 2.8 million people die each year from being overweight or obese, a largely preventable disease. Social media has fundamentally changed the way we communicate, collaborate, consume, and create content. The ease with which content can be shared has resulted in a rapid increase in the number of individuals or organisations that seek to influence opinion and the volume of content that they generate. The nutrition and diet domain is not immune to this phenomenon. Unfortunately, from a public health perspective, many of these 'influencers' may be poorly qualified in order to provide nutritional or dietary guidance, and advice given may be without accepted scientific evidence and contrary to public health policy. In this preliminary study, we analyse the 'healthy diet' discourse on Twitter. While using a multi-component analytical approach, we analyse more than 1.2 million English language tweets over a 16-month period in order to identify and characterise the influential actors and discover topics of interest in the discourse. Our analysis suggests that the discourse is dominated by non-health professionals. There is widespread use of bots that pollute the discourse and seek to create a false equivalence on the efficacy of a particular nutritional strategy or diet. Topic modelling suggests a significant focus on diet, nutrition, exercise, weight, disease, and quality of life. Public health policy makers and professional nutritionists need to consider what interventions can be taken in order to counteract the influence of non-professional and bad actors on social media.


Subject(s)
Diet, Healthy , Social Media , Diet, Healthy/statistics & numerical data , Exercise , Humans , Quality of Life , Social Media/statistics & numerical data
17.
Data Brief ; 32: 106178, 2020 Oct.
Article in English | MEDLINE | ID: mdl-32837978

ABSTRACT

COVID-19 has been recognized as a global threat, and several studies are being conducted in order to contribute to the fight against and prevention of this pandemic. This work presents a scholarly production dataset focused on COVID-19, providing an overview of scientific research activities, making it possible to identify the countries, scientists and research groups most active in this task force to combat the coronavirus disease. The dataset is composed of 40,212 records of articles' metadata collected from the Scopus, PubMed, arXiv and bioRxiv databases from January 2019 to July 2020. The data were extracted using Python web scraping techniques and preprocessed with Pandas data wrangling. In addition, the pipeline to preprocess and generate the dataset is versioned with the Data Version Control tool (DVC) and is thus easily reproducible and auditable.

19.
Data Brief ; 26: 104223, 2019 Oct.
Article in English | MEDLINE | ID: mdl-31508461

ABSTRACT

The data set is composed of 2285 definitions posted on the Urban Dictionary platform from 1999 to May 2016. The data was classified as misogynistic or non-misogynistic by three independent researchers with domain knowledge. The data set is available in a public repository as a table containing two columns: the text-based definition from Urban Dictionary and its respective classification (1 for misogynistic and 0 for non-misogynistic).

20.
Sensors (Basel) ; 19(7)2019 Apr 06.
Article in English | MEDLINE | ID: mdl-30959877

ABSTRACT

Human falls are a global public health issue resulting in over 37.3 million severe injuries and 646,000 deaths yearly. Falls result in direct financial costs to health systems and indirect costs to societal productivity. Unsurprisingly, human fall detection and prevention are a major focus of health research. In this article, we consider deep learning for fall detection in an IoT and fog computing environment. We propose a Convolutional Neural Network composed of three convolutional layers, two max-pooling layers, and three fully-connected layers as our deep learning model. We evaluate its performance using three open data sets and against extant research. Our approach for resolving dimensionality and modelling simplicity issues is outlined. Accuracy, precision, sensitivity, specificity, and the Matthews Correlation Coefficient are used to evaluate performance. The best results are achieved when using data augmentation during the training process. The paper concludes with a discussion of challenges and future directions for research in this domain.


Subject(s)
Accidental Falls , Neural Networks, Computer , Algorithms , Biosensing Techniques/methods , Deep Learning , Humans , Machine Learning
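The five evaluation metrics named in the abstract above can all be derived from a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the confusion-matrix counts are hypothetical and are not results from the paper.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every ratio has a non-zero denominator, except MCC, which is
    reported as 0.0 when its denominator is zero.
    """
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Hypothetical counts: "positive" means a fall was detected.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting the Matthews Correlation Coefficient alongside accuracy is useful when falls are rare relative to normal activity, because it takes all four cells of the confusion matrix into account.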