ABSTRACT
Human context recognition (HCR) using sensor data is a crucial task in Context-Aware (CA) applications in domains such as healthcare and security. Supervised machine learning HCR models are trained using smartphone HCR datasets that are scripted or gathered in-the-wild. Scripted datasets are most accurate because of their consistent visit patterns. Supervised machine learning HCR models perform well on scripted datasets but poorly on realistic data. In-the-wild datasets are more realistic, but cause HCR models to perform worse due to data imbalance, missing or incorrect labels, and a wide variety of phone placements and device types. Lab-to-field approaches learn a robust data representation from a scripted, high-fidelity dataset, which is then used for enhancing performance on a noisy, in-the-wild dataset with similar labels. This research introduces Triplet-based Domain Adaptation for Context REcognition (Triple-DARE), a lab-to-field neural network method that combines three unique loss functions to enhance intra-class compactness and inter-class separation within the embedding space of multi-labeled datasets: (1) domain alignment loss in order to learn domain-invariant embeddings; (2) classification loss to preserve task-discriminative features; and (3) joint fusion triplet loss. Rigorous evaluations showed that Triple-DARE achieved 6.3% and 4.5% higher F1-score and classification accuracy, respectively, than state-of-the-art HCR baselines and outperformed non-adaptive HCR models by 44.6% and 10.7%, respectively.
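To make the three-term objective above more concrete, the following is a minimal sketch of a standard triplet margin loss and a weighted sum of three loss values. It is not the Triple-DARE implementation: the abstract does not define the joint fusion triplet loss or any weighting, so the function names, the margin, and the weights here are all illustrative assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (assumed form, not the paper's variant).

    The loss is zero once the negative sample is at least `margin` farther
    from the anchor than the positive sample is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def combined_loss(domain_loss, classification_loss, triplet, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three loss terms; the weights are hypothetical."""
    w_d, w_c, w_t = weights
    return w_d * domain_loss + w_c * classification_loss + w_t * triplet


if __name__ == "__main__":
    # Same-class sample close to the anchor, other-class sample far away.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
    # Swapped roles: the triplet constraint is violated, so the loss is positive.
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
```

Minimizing a term of this kind pulls same-class embeddings together and pushes different-class embeddings apart, which is the compactness and separation effect the abstract describes.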
Subjects
Computer Neural Networks, Supervised Machine Learning, Humans, Acclimatization, Records, Smartphone
ABSTRACT
Diseases caused by the consumption of food are a significant but avoidable public health issue, and identifying the source of contamination is a key step in an outbreak investigation to prevent foodborne illnesses. Historical foodborne outbreaks provide rich data on critical attributes such as outbreak factors, food vehicles, and etiologies, and an improved understanding of the relationships between these attributes could provide insights for developing effective food safety interventions. The purpose of this study was to identify hidden patterns underlying the relations between the critical attributes involved in historical foodborne outbreaks through data mining approaches. A statistical analysis was used to identify the associations between outbreak factors and food sources, and the factors that were strongly significant were selected as predictive factors for food vehicles. A multinomial prediction model was built based on factors selected for predicting "simple" foods (beef, dairy, and vegetables) as sources of outbreaks. In addition, the relations between the food vehicles and common etiologies were investigated through text mining approaches (support vector machines, logistic regression, random forest, and naïve Bayes). A support vector machine model was identified as the optimal model to predict etiologies from the occurrence of food vehicles. Association rules also indicated the specific food vehicles that have strong relations to the etiologies. Meanwhile, a food ingredient network describing the relationships between foods and ingredients was constructed and used with Monte Carlo simulation to predict possible ingredients from foods that cause an outbreak. The simulated results were confirmed with foods and ingredients that are already known to cause historical foodborne outbreaks. The method could provide insights into the prediction of the possible ingredient sources of contamination when given the name of a food. 
The results could provide insights into the early identification of food sources of contamination and assist in future outbreak investigations. The data-driven approach will provide a new perspective and strategies for discovering hidden knowledge from massive data.
ABSTRACT
Given that depression is one of the most prevalent mental illnesses, developing effective and unobtrusive diagnosis tools is of great importance. Recent work that screens for depression with text messages leverages models relying on lexical category features. Given the colloquial nature of text messages, the performance of these models may be limited by formal lexicons. We thus propose a strategy to automatically construct alternative lexicons that contain more relevant and colloquial terms. Specifically, we generate 36 lexicons from fiction, forum, and news corpora. These lexicons are then used to extract lexical category features from the text messages. We utilize machine learning models to compare the depression screening capabilities of these lexical category features. Out of our 36 constructed lexicons, 14 achieved statistically significantly higher average F1 scores than the pre-existing formal lexicon and basic bag-of-words approach. In comparison to the pre-existing lexicon, our best performing lexicon increased the average F1 scores by 10%. We thus confirm our hypothesis that less formal lexicons can improve the performance of classification models that screen for depression with text messages. By providing our automatically constructed lexicons, we aid future machine learning research that leverages less formal text.
Subjects
Depression, Mental Disorders, Text Messaging, Humans, Depression/diagnosis, Machine Learning, Mental Disorders/diagnosis
ABSTRACT
Foodborne diseases and outbreaks are significant threats to public health, resulting in millions of illnesses and deaths worldwide each year. Traditional foodborne disease surveillance systems rely on data from healthcare facilities, laboratories, and government agencies to monitor and control outbreaks. Recently, there has been growing recognition of the potential value of incorporating social media data into surveillance systems. This paper explores the use of social media data as an alternative surveillance tool for foodborne diseases by collecting large-scale Twitter data, building food safety data storage models, and developing a novel frontend foodborne illness surveillance system. Descriptive and predictive analyses of the collected data were conducted in comparison with ground truth data reported by the U.S. Centers for Disease Control and Prevention (CDC). The results indicate that the most implicated food categories and the distributions from both Twitter and the CDC were similar. The system developed with Twitter data could complement traditional foodborne disease surveillance systems by providing near-real-time information on foodborne illnesses, implicated foods, symptoms, locations, and other information critical for detecting a potential foodborne outbreak.
ABSTRACT
Depression is among the most prevalent mental health disorders with increasing prevalence worldwide. While early detection is critical for the prognosis of depression treatment, detecting depression is challenging. Previous deep learning research has thus begun to detect depression with the transcripts of clinical interview questions. Since approaches using Bidirectional Encoder Representations from Transformers (BERT) have demonstrated particular promise, we hypothesize that ensembles of BERT variants will improve depression detection. Thus, in this research, we compare the depression classification abilities of three BERT variants and four ensembles of BERT variants on the transcripts of responses to 12 clinical interview questions. Specifically, we implement the ensembles with different ensemble strategies, numbers of model components, and architectural layer combinations. Our results demonstrate that ensembles increase mean F1 scores and robustness across clinical interview data. Clinical relevance: This research highlights the potential of ensembles to detect depression with text, which is important to guide future development of healthcare application ecosystems.
Subjects
Depression, Mental Disorders, Depression/diagnosis, Ecosystem, Electric Power Supplies, Health Facilities, Humans
ABSTRACT
Foodborne outbreaks are a serious but preventable threat to public health that often lead to illness, loss of life, significant economic loss, and the erosion of consumer confidence. Understanding how consumers respond when interacting with foods, as well as extracting information from posts on social media may provide new means of reducing the risks and curtailing the outbreaks. In recent years, Twitter has been employed as a new tool for identifying unreported foodborne illnesses. However, there is a huge gap between the identification of sporadic illnesses and the early detection of a potential outbreak. In this work, the dual-task BERTweet model was developed to identify unreported foodborne illnesses and extract foodborne-illness-related entities from Twitter. Unlike previous methods, our model leveraged the mutually beneficial relationships between the two tasks. The results showed that the F1-score of relevance prediction was 0.87, and the F1-score of entity extraction was 0.61. Key elements such as time, location, and food detected from sentences indicating foodborne illnesses were used to analyze potential foodborne outbreaks in massive historical tweets. A case study on tweets indicating foodborne illnesses showed that the discovered trend is consistent with the true outbreaks that occurred during the same period.
Subjects
Contact Tracing/methods, Disease Outbreaks/prevention & control, Foodborne Diseases/epidemiology, Crowdsourcing/methods, Foodborne Diseases/etiology, Humans, Machine Learning, Theoretical Models, Population Surveillance/methods, Public Health/methods, Public Health/trends, Social Media/trends
ABSTRACT
Smartphone health sensing tools, which analyze passively gathered human behavior data, can provide clinicians with a longitudinal view of their patients' ailments in natural settings. In this Visualization Viewpoints article, we postulate that interactive visual analytics (IVA) can assist data scientists during the development of such tools by facilitating the discovery and correction of wrong or missing user-provided ground-truth health annotations. IVA can also assist clinicians in making sense of their patients' behaviors by providing additional contextual and semantic information. We review the current state-of-the-art, outline unique challenges, and illustrate our viewpoints using our work as well as those of other researchers. Finally, we articulate open challenges in this exciting and emerging field of research.
Subjects
Semantics, Smartphone, Humans
ABSTRACT
BACKGROUND: Precision prevention is increasingly important in HIV prevention research to move beyond universal interventions to those tailored for high-risk individuals. The current study was designed to develop machine learning algorithms for predicting adolescent HIV risk behaviours. METHODS: Comprehensive longitudinal data on adolescent risk behaviours, perceptions, peer and family influence, and neighbourhood risk factors were collected from 2564 grade-10 students at baseline followed for 24 months over 2008-2012. Machine learning techniques [support vector machine (SVM) and random forests] were applied to innovatively leverage longitudinal data for robust HIV risk behaviour prediction. In this study, we focused on two adolescent risk behaviours: had ever had sex and had multiple sex partners. Twenty percent of the data were withheld for model testing. RESULTS: The SVM model with cost-sensitive learning achieved the highest sensitivity, at 79.1%, specificity of 75.4% with AUC of 0.86 in predicting multiple sex partners on the training data (10-fold cross-validation), and sensitivity of 79.7%, specificity of 76.5% with AUC of 0.86 on the testing data. The random forest model obtained the best performance in predicting had ever had sex, yielding the sensitivity of 78.5%, specificity of 73.1% with AUC of 0.84 on the training data and sensitivity of 82.7%, specificity of 75.3% with AUC of 0.87 on the testing data. CONCLUSION: Machine learning methods can be used to build effective prediction model(s) to identify adolescents who are likely to engage in HIV risk behaviours. This study builds a foundation for targeted intervention strategies and informs precision prevention efforts in school-setting.
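The sensitivity and specificity figures reported above follow from confusion-matrix counts on held-out data. As a minimal sketch (not the study's code; the function name and the toy labels are assumptions for illustration), the two metrics can be computed like this:

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy labels, not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
    print(sensitivity_specificity(y_true, y_pred))
```

Cost-sensitive learning, as mentioned for the SVM model, typically trades some specificity for higher sensitivity by penalizing missed positives more heavily.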
Subjects
Adolescent Behavior, HIV Infections, Adolescent, Algorithms, HIV Infections/diagnosis, HIV Infections/prevention & control, Humans, Machine Learning, Risk Factors
ABSTRACT
Objective: Early identification of individuals who are at risk for suicide is crucial in supporting suicide prevention. Machine learning is emerging as a promising approach to support this objective. Machine learning is broadly defined as a set of mathematical models and computational algorithms designed to automatically learn complex patterns between predictors and outcomes from example data, without being explicitly programmed to do so. The model's performance continuously improves over time by learning from newly available data. Method: This concept paper explores how machine learning approaches applied to healthcare data obtained from electronic health records, including billing and claims data, can advance our ability to accurately predict future suicidal behavior. Results: We provide a general overview of machine learning concepts, summarize exemplar studies, describe continued challenges, and propose innovative research directions. Conclusion: Machine learning has potential for improving estimation of suicide risk, yet important challenges and opportunities remain. Further research can focus on incorporating evolving methods for addressing data imbalances, understanding factors that affect generalizability across samples and healthcare systems, expanding the richness of the data, leveraging newer machine learning approaches, and developing automatic learning systems.
ABSTRACT
Depression is the leading cause of disability, often undiagnosed, and one of the most treatable mood disorders. As such, unobtrusively diagnosing depression is important. Many studies are starting to utilize machine learning for depression sensing from social media and Smartphone data to replace the survey instruments currently employed to screen for depression. In this study, we compare the ability of a privately versus a publicly available modality to screen for depression. Specifically, we leverage between two weeks and a year of text messages and tweets to predict scores from the Patient Health Questionnaire-9, a prevalent depression screening instrument. This is the first study to leverage the retrospectively-harvested crowd-sourced texts and tweets within the combined Moodable and EMU datasets. Our approach involves comprehensive feature engineering, feature selection, and machine learning. Our 245 features encompass word category frequencies, part of speech tag frequencies, sentiment, and volume. The best model is Logistic Regression built on the top ten features from two weeks of text data. This model achieves an average F1 score of 0.806, AUC of 0.832, and recall of 0.925. We discuss the implications of the selected features, temporal quantity of data, and modality.
Subjects
Social Media, Text Messaging, Depression/diagnosis, Humans, Machine Learning, Retrospective Studies
ABSTRACT
Depression is both debilitating and prevalent. While treatable, it is often undiagnosed. Passive depression screening is crucial, but leveraging data from Smartphones and social media has privacy concerns. Inspired by the known relationship between depression and slower information processing speed, we hypothesize the latency of texting replies will contain useful information in screening for depression. Specifically, we extract nine reply latency related features from crowd-sourced text message conversation meta-data. By considering text metadata instead of content, we mitigate the privacy concerns. To predict binary screening survey scores, we explore a variety of machine learning methods built on principal components of the latency features. Our findings demonstrate that an XGBoost model built with one principal component achieves an F1 score of 0.67, AUC of 0.72, and Accuracy of 0.69. Thus, we confirm that reply latency of texting has promise as a modality for depression screening.
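As an illustration of how reply latency can be derived from message metadata alone, the sketch below pairs each unanswered incoming message with the participant's next outgoing message and summarizes the gaps. The abstract does not list the nine features used in the study, so the input format, the pairing rule, and the summary statistics here are assumptions.

```python
from statistics import mean, median


def reply_latencies(messages):
    """Compute reply latencies in seconds from conversation metadata.

    `messages` is a time-sorted list of (timestamp_seconds, direction) tuples,
    where direction is "in" (received) or "out" (sent by the participant).
    A latency is the gap between the earliest unanswered incoming message
    and the next outgoing message. No message content is needed.
    """
    latencies = []
    pending = None  # timestamp of the earliest unanswered incoming message
    for timestamp, direction in messages:
        if direction == "in":
            if pending is None:
                pending = timestamp
        elif direction == "out" and pending is not None:
            latencies.append(timestamp - pending)
            pending = None
    return latencies


def latency_features(latencies):
    """Summarize latencies with a few illustrative (hypothetical) features."""
    if not latencies:
        return {"count": 0, "mean": None, "median": None, "max": None}
    return {
        "count": len(latencies),
        "mean": mean(latencies),
        "median": median(latencies),
        "max": max(latencies),
    }


if __name__ == "__main__":
    conversation = [(0, "in"), (30, "out"), (100, "in"), (110, "in"), (400, "out"), (500, "out")]
    print(latency_features(reply_latencies(conversation)))
```

Features of this kind could then be reduced with principal component analysis before model fitting, as the abstract describes.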
Subjects
Smartphone, Social Media, Text Messaging, Depression/diagnosis, Surveys and Questionnaires
ABSTRACT
Antibiotic resistant bacterial infections are a growing global health crisis. Antibiograms, aggregate antimicrobial resistance reports, are critical for tracking antibiotic susceptibility and prescribing antibiotics. This research leverages fifteen years of the expansive Massachusetts statewide antibiogram dataset curated by the Massachusetts Department of Public Health. Given the lengthy annual antibiogram creation process, data are not timely. Our prior research involved forecasting the current antimicrobial susceptibility given historic antibiograms. The objective for this research is to expand upon this prior work by identifying which antibiotic-bacteria combinations have resistance trends that are not well forecasted. For that, our proposed Previous Year Anomalous Trend Identification (PYATI) strategy employs a cluster driven outlier detection solution to identify the trends to remove before forecasting. Employing PYATI to remove antibiotic-bacteria combinations with anomalous trends statistically significantly reduces the forecasting error for the remaining combinations. As antibiotic resistance is furthered by prescribing ineffective antibiotics, PYATI can be leveraged to improve antibiotic prescribing.
Subjects
Anti-Bacterial Agents, Bacteria, Anti-Bacterial Agents/therapeutic use, Microbial Drug Resistance, Massachusetts, Microbial Sensitivity Tests
ABSTRACT
INTRODUCTION: Adverse drug event (ADE) detection is a vital step towards effective pharmacovigilance and prevention of future incidents caused by potentially harmful ADEs. The electronic health records (EHRs) of patients in hospitals contain valuable information regarding ADEs and hence are an important source for detecting ADE signals. However, EHR texts tend to be noisy. Yet applying off-the-shelf tools for EHR text preprocessing jeopardizes the subsequent ADE detection performance, which depends on a well tokenized text input. OBJECTIVE: In this paper, we report our experience with the NLP Challenges for Detecting Medication and Adverse Drug Events from Electronic Health Records (MADE1.0), which aims to promote deep innovations on this subject. In particular, we have developed rule-based sentence and word tokenization techniques to deal with the noise in the EHR text. METHODS: We propose a detection methodology by adapting a three-layered, deep learning architecture of (1) recurrent neural network [bi-directional long short-term memory (Bi-LSTM)] for character-level word representation to encode the morphological features of the medical terminology, (2) Bi-LSTM for capturing the contextual information of each word within a sentence, and (3) conditional random fields for the final label prediction by also considering the surrounding words. We experiment with different word embedding methods commonly used in word-level classification tasks and demonstrate the impact of an integrated usage of both domain-specific and general-purpose pre-trained word embedding for detecting ADEs from EHRs. RESULTS: Our system was ranked first for the named entity recognition task in the MADE1.0 challenge, with a micro-averaged F1-score of 0.8290 (official score). 
CONCLUSION: Our results indicate that the integration of two widely used sequence labeling techniques that complement each other along with dual-level embedding (character level and word level) to represent words in the input layer results in a deep learning architecture that achieves excellent information extraction accuracy for EHR notes.
Subjects
Drug-Related Side Effects and Adverse Reactions/epidemiology, Electronic Health Records/trends, Machine Learning/trends, Computer Neural Networks, Deep Learning/standards, Deep Learning/trends, Drug-Related Side Effects and Adverse Reactions/diagnosis, Electronic Health Records/standards, Humans, Machine Learning/standards
ABSTRACT
Few existing visualization systems can handle large data sets with hundreds of dimensions, since high-dimensional data sets cause clutter on the display and large response time in interactive exploration. In this paper, we present a significantly improved multidimensional visualization approach named Value and Relation (VaR) display that allows users to effectively and efficiently explore large data sets with several hundred dimensions. In the VaR display, data values and dimension relationships are explicitly visualized in the same display by using dimension glyphs to explicitly represent values in dimensions and glyph layout to explicitly convey dimension relationships. In particular, pixel-oriented techniques and density-based scatterplots are used to create dimension glyphs to convey values. Multidimensional scaling, Jigsaw map hierarchy visualization techniques, and an animation metaphor named Rainfall are used to convey relationships among dimensions. A rich set of interaction tools has been provided to allow users to interactively detect patterns of interest in the VaR display. A prototype of the VaR display has been fully implemented. The case studies presented in this paper show how the prototype supports interactive exploration of data sets of several hundred dimensions. A user study evaluating the prototype is also reported in this paper.
ABSTRACT
Data abstraction techniques are widely used in multiresolution visualization systems to reduce visual clutter and facilitate analysis from overview to detail. However, analysts are usually unaware of how well the abstracted data represent the original dataset, which can impact the reliability of results gleaned from the abstractions. In this paper, we define two data abstraction quality measures for computing the degree to which the abstraction conveys the original dataset: the Histogram Difference Measure and the Nearest Neighbor Measure. They have been integrated within XmdvTool, a public-domain multiresolution visualization system for multivariate data analysis that supports sampling as well as clustering to simplify data. Several interactive operations are provided, including adjusting the data abstraction level, changing selected regions, and setting the acceptable data abstraction quality level. Conducting these operations, analysts can select an optimal data abstraction level. Also, analysts can compare different abstraction methods using the measures to see how well relative data density and outliers are maintained, and then select an abstraction method that meets the requirement of their analytic tasks.
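The abstract names a Histogram Difference Measure but does not give its formula. The sketch below shows one plausible way to score how well a sample preserves the value distribution of a single dimension: bin both datasets over the original's range, normalize the counts, and convert the total bin-wise difference into a score between 0 and 1. The function names, the bin count, and the scoring formula are assumptions, not XmdvTool's definition.

```python
def normalized_histogram(values, lo, hi, bins):
    """Bin `values` into `bins` equal-width bins over [lo, hi]; return fractions."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp so that v == hi (or out-of-range values) land in a valid bin.
        idx = int((v - lo) / width) if width > 0 else 0
        idx = max(0, min(idx, bins - 1))
        counts[idx] += 1
    return [c / len(values) for c in counts]


def histogram_quality(original, abstracted, bins=4):
    """Assumed quality score in [0, 1]; 1.0 means identical normalized histograms."""
    lo, hi = min(original), max(original)
    p = normalized_histogram(original, lo, hi, bins)
    q = normalized_histogram(abstracted, lo, hi, bins)
    # Half the L1 distance between two distributions lies in [0, 1].
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    data = [0, 1, 2, 3, 4, 5, 6, 7]
    print(histogram_quality(data, [0, 2, 4, 6]))  # evenly spread sample
    print(histogram_quality(data, [0, 1]))        # sample biased to low values
```

A score like this could back the "acceptable data abstraction quality level" control described above, with the analyst lowering the abstraction level until the score clears a chosen threshold.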
ABSTRACT
Background. The use of electronic hand hygiene reminder systems has been proposed as an approach to improve hand hygiene compliance among healthcare workers, although information on efficacy is limited. We prospectively assessed whether hand hygiene activities among healthcare workers could be increased using an electronic hand hygiene monitoring and reminder system. Methods. A prospective controlled clinical trial was conducted in 2 medical intensive care units (ICUs) at an academic medical center with comparable patient populations, healthcare staff, and physical layout. Hand hygiene activity was monitored concurrently in both ICUs, and the reminder system was installed in the test ICU. The reminder system was tested during 3 administered phases including: room entry/exit chimes, display of real-time hand hygiene activity, and a combination of the 2. Results. In the test ICU, the mean number of hand hygiene events increased from 1538 per day at baseline to 1911 per day (24% increase) with the use of a combination of room entry/exit chimes, real-time displays of hand hygiene activity, and manager reports (P < .001); in addition, the ratio of hand hygiene to room entry/exit events also increased from 26.1% to 36.6% (40% increase, P < .001). The performance returned to baseline (1473 hand hygiene events per day) during the follow-up phase. There was no significant change in hand hygiene activity in the control ICU during the course of the trial. Conclusions. In an ICU setting, an electronic hand hygiene reminder system that provided real-time feedback on overall unit-wide hand hygiene performance significantly increased hand hygiene activity.
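The percentage increases quoted in the Results are relative changes from baseline. A short check of that arithmetic, using only the figures stated in the abstract (the helper name is illustrative):

```python
def relative_increase(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


if __name__ == "__main__":
    # Mean hand hygiene events per day: 1538 at baseline, 1911 with the combined intervention.
    print(round(relative_increase(1538, 1911)))
    # Ratio of hand hygiene to room entry/exit events: 26.1% to 36.6%.
    print(round(relative_increase(26.1, 36.6)))
```

Note that the second figure is a relative increase in a ratio: the absolute change is 10.5 percentage points, which corresponds to roughly a 40% rise over the 26.1% baseline.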