1.

Differing Content and Language Based on Poster-Patient Relationships on the Chinese Social Media Platform Weibo: Text Classification, Sentiment Analysis, and Topic Modeling of Posts on Breast Cancer.

Zhang, Zhouqing; Liew, Kongmeng; Kuijer, Roeline; She, Wan Jou; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

JMIR Cancer ; 10: e51332, 2024 May 09.

Article En | MEDLINE | ID: mdl-38723250

BACKGROUND: Breast cancer affects the lives of not only those diagnosed but also the people around them. Many of those affected share their experiences on social media. However, these narratives may differ according to who the poster is and what their relationship with the patient is; a patient posting about their experiences may post different content from someone whose friends or family has breast cancer. Weibo is 1 of the most popular social media platforms in China, and breast cancer-related posts are frequently found there. OBJECTIVE: With the goal of understanding the different experiences of those affected by breast cancer in China, we aimed to explore how content and language used in relevant posts differ according to who the poster is and what their relationship with the patient is and whether there are differences in emotional expression and topic content if the patient is the poster themselves or a friend, family member, relative, or acquaintance. METHODS: We used Weibo as a resource to examine how posts differ according to the different poster-patient relationships. We collected a total of 10,322 relevant Weibo posts. Using a 2-step analysis method, we fine-tuned 2 Chinese Robustly Optimized Bidirectional Encoder Representations from Transformers (BERT) Pretraining Approach models on this data set with annotated poster-patient relationships. These models were linked in sequence, first a binary classifier (no_patient or patient) and then a multiclass classifier (post_user, family_members, friends_relatives, acquaintances, heard_relation), to classify poster-patient relationships. Next, we used the Linguistic Inquiry and Word Count lexicon to conduct sentiment analysis from 5 emotion categories (positive and negative emotions, anger, sadness, and anxiety), followed by topic modeling (BERTopic). RESULTS: Our binary model (F1-score=0.92) and multiclass model (F1-score=0.83) were largely able to classify poster-patient relationships accurately.
Subsequent sentiment analysis showed significant differences in emotion categories across all poster-patient relationships. Notably, negative emotions and anger were higher for the "no_patient" class, but sadness and anxiety were higher for the "family_members" class. Focusing on the top 30 topics, we also noted that topics on fears and anger toward cancer were higher in the "no_patient" class, but topics on cancer treatment were higher in the "family_members" class. CONCLUSIONS: Chinese users post different types of content, depending on the poster-patient relationships. If the patient is family, posts are sadder and more anxious but also contain more content on treatments. However, if no patient is detected, posts show higher levels of anger. We think that these may stem from rants from posters, which may help with emotion regulation and gathering social support.
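
The 2-step classification described in this abstract (a binary classifier followed by a multiclass classifier) can be pictured with a small Python sketch. This is only an illustration of the cascade logic, not the authors' implementation: the function name classify_relationship and the two callables are hypothetical stand-ins for the fine-tuned models, and only the label names are taken from the abstract.

```python
from typing import Callable

# Label names as listed in the abstract.
BINARY_LABELS = ("no_patient", "patient")
RELATION_LABELS = (
    "post_user",
    "family_members",
    "friends_relatives",
    "acquaintances",
    "heard_relation",
)


def classify_relationship(
    post: str,
    binary_model: Callable[[str], str],
    relation_model: Callable[[str], str],
) -> str:
    """Cascade two classifiers (hypothetical stand-ins for the fine-tuned models).

    The binary model runs first; only posts it labels "patient" are passed
    on to the multiclass relationship model.
    """
    first = binary_model(post)
    if first not in BINARY_LABELS:
        raise ValueError(f"unexpected binary label: {first!r}")
    if first == "no_patient":
        return "no_patient"

    second = relation_model(post)
    if second not in RELATION_LABELS:
        raise ValueError(f"unexpected relationship label: {second!r}")
    return second
```

In this arrangement an error in the first step cannot be corrected by the second, which is one reason the two models are evaluated separately in the abstract.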

2.

Adverse Event Signal Detection Using Patients' Concerns in Pharmaceutical Care Records: Evaluation of Deep Learning Models.

Nishioka, Satoshi; Watabe, Satoshi; Yanagisawa, Yuki; Sayama, Kyoko; Kizaki, Hayato; Imai, Shungo; Someya, Mitsuhiro; Taniguchi, Ryoo; Yada, Shuntaro; Aramaki, Eiji; Hori, Satoko.

J Med Internet Res ; 26: e55794, 2024 Apr 16.

Article En | MEDLINE | ID: mdl-38625718

BACKGROUND: Early detection of adverse events and their management are crucial to improving anticancer treatment outcomes, and listening to patients' subjective opinions (patients' voices) can make a major contribution to improving safety management. Recent progress in deep learning technologies has enabled various new approaches for the evaluation of safety-related events based on patient-generated text data, but few studies have focused on the improvement of real-time safety monitoring for individual patients. In addition, no study has yet been performed to validate deep learning models for screening patients' narratives for clinically important adverse event signals that require medical intervention. In our previous work, novel deep learning models have been developed to detect adverse event signals for hand-foot syndrome or adverse events limiting patients' daily lives from the authored narratives of patients with cancer, aiming ultimately to use them as safety monitoring support tools for individual patients. OBJECTIVE: This study was designed to evaluate whether our deep learning models can screen clinically important adverse event signals that require intervention by health care professionals. The applicability of our deep learning models to data on patients' concerns at pharmacies was also assessed. METHODS: Pharmaceutical care records at community pharmacies were used for the evaluation of our deep learning models. The records followed the SOAP format, consisting of subjective (S), objective (O), assessment (A), and plan (P) columns. Because of the unique combination of patients' concerns in the S column and the professional records of the pharmacists, this was considered a suitable data source for the present purpose. Our deep learning models were applied to the S records of patients with cancer, and the extracted adverse event signals were assessed in relation to medical actions and prescribed drugs.
RESULTS: From 30,784 S records of 2479 patients with at least 1 prescription of anticancer drugs, our deep learning models extracted true adverse event signals with more than 80% accuracy for both hand-foot syndrome (n=152, 91%) and adverse events limiting patients' daily lives (n=157, 80.1%). The deep learning models were also able to screen adverse event signals that require medical intervention by health care providers. The extracted adverse event signals could reflect the side effects of anticancer drugs used by the patients based on analysis of prescribed anticancer drugs. "Pain or numbness" (n=57, 36.3%), "fever" (n=46, 29.3%), and "nausea" (n=40, 25.5%) were common symptoms out of the true adverse event signals identified by the model for adverse events limiting patients' daily lives. CONCLUSIONS: Our deep learning models were able to screen clinically important adverse event signals that require intervention for symptoms. It was also confirmed that these deep learning models could be applied to patients' subjective information recorded in pharmaceutical care records accumulated during pharmacists' daily work.

Antineoplastic Agents , Deep Learning , Hand-Foot Syndrome , Neoplasms , Humans , Prescriptions , Antineoplastic Agents/adverse effects , Neoplasms/drug therapy

3.

Detection of Adverse Event Signals with Severity Grade Classification from Cancer Patient Narrative.

Nishioka, Satoshi; Asano, Masaki; Yada, Shuntaro; Aramaki, Eiji; Yajima, Hiroshi; Kizaki, Hayato; Hori, Satoko.

Stud Health Technol Inform ; 310: 554-558, 2024 Jan 25.

Article En | MEDLINE | ID: mdl-38269870

Adverse event (AE) management is crucial to improve anti-cancer treatment outcomes, but it is reported that some AE signals can be missed in clinical visits. Thus, monitoring AE signals seamlessly, including events outside hospitals, would be helpful for early intervention. Here we investigated how to detect AE signals from texts written by cancer patients themselves by developing deep-learning (DL) models to classify posts mentioning AEs according to severity grade, in order to focus on those that might need immediate treatment interventions. Using patient blogs written in Japanese by cancer patients as a data source, we built DL models based on three approaches, BERT, ELECTRA, and T5. Among these models, T5 showed the best F1 scores for both Grade ≥ 1 and ≥ 2 article classification tasks (0.85 and 0.53, respectively). This model might benefit patients by enabling earlier AE signal detection, thereby improving quality of life.

Neoplasms , Quality of Life , Humans , Blogging , Hospitals , Narration

4.

Adverse event signal extraction from cancer patients' narratives focusing on impact on their daily-life activities.

Nishioka, Satoshi; Asano, Masaki; Yada, Shuntaro; Aramaki, Eiji; Yajima, Hiroshi; Yanagisawa, Yuki; Sayama, Kyoko; Kizaki, Hayato; Hori, Satoko.

Sci Rep ; 13(1): 15516, 2023 09 19.

Article En | MEDLINE | ID: mdl-37726371

Adverse event (AE) management is important to improve anti-cancer treatment outcomes, but it is known that some AE signals can be missed during clinical visits. In particular, AEs that affect patients' activities of daily living (ADL) need careful monitoring as they may require immediate medical intervention. This study aimed to build deep-learning (DL) models for extracting signals of AEs limiting ADL from patients' narratives. The data source was blog posts written in Japanese by breast cancer patients. After pre-processing and annotation for AE signals, three DL models (BERT, ELECTRA, and T5) were trained and tested in three different approaches for AE signal identification. The performances of the trained models were evaluated in terms of precision, recall, and F1 scores. From 2,272 blog posts, 191 and 702 articles were identified as describing AEs limiting ADL or not limiting ADL, respectively. Among the tested DL models and approaches, T5 showed the best F1 scores to identify articles with AE limiting ADL or all AE: 0.557 and 0.811, respectively. The most frequent AE signals were "pain or numbness", "fatigue" and "nausea". Our results suggest that this AE monitoring scheme focusing on patients' ADL has potential to reinforce current AE management provided by medical staff.

Breast Neoplasms , Bryozoa , Humans , Animals , Female , Activities of Daily Living , Hypesthesia , Medical Staff

5.

Diagnosing psychiatric disorders from history of present illness using a large-scale linguistic model.

Otsuka, Norio; Kawanishi, Yuu; Doi, Fumimaro; Takeda, Tsutomu; Okumura, Kazuki; Yamauchi, Takahira; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji; Makinodan, Manabu.

Psychiatry Clin Neurosci ; 77(11): 597-604, 2023 Nov.

Article En | MEDLINE | ID: mdl-37526294

AIM: Recent advances in natural language processing models are expected to provide diagnostic assistance in psychiatry from the history of present illness (HPI). However, existing studies have been limited, with the target diseases including only major diseases, small sample sizes, or no comparison with diagnoses made by psychiatrists to ensure accuracy. Therefore, we formulated an accurate diagnostic model that covers all psychiatric disorders. METHODS: HPIs and diagnoses were extracted from discharge summaries of 2,642 cases at the Nara Medical University Hospital, Japan, from 21 May 2007 to 31 May 2021. The diagnoses were classified into 11 classes according to the code from ICD-10 Chapter V. Using UTH-BERT pre-trained on the electronic medical records of the University of Tokyo Hospital, Japan, we predicted the main diagnoses at discharge based on HPIs and compared the concordance rate with the results of psychiatrists. The psychiatrists were divided into two groups: semi-Designated with 3-4 years of experience and Residents with only 2 months of experience. RESULTS: The model's match rate was 74.3%, compared to 71.5% for the semi-Designated psychiatrists and 69.4% for the Residents. If the cases were limited to those correctly answered by the semi-Designated group, the model and the Residents performed at 84.9% and 83.3%, respectively. CONCLUSION: We demonstrated that the model matched the diagnosis predicted from the HPI with a high probability to the principal diagnosis at discharge. Hence, the model can provide diagnostic suggestions in actual clinical practice.

Mental Disorders , Psychiatry , Humans , Mental Disorders/diagnosis , Mental Disorders/epidemiology , Patient Discharge , Hospitals , International Classification of Diseases , Psychiatry/methods

6.

Transferability Based on Drug Structure Similarity in the Automatic Classification of Noncompliant Drug Use on Social Media: Natural Language Processing Approach.

Nishiyama, Tomohiro; Yada, Shuntaro; Wakamiya, Shoko; Hori, Satoko; Aramaki, Eiji.

J Med Internet Res ; 25: e44870, 2023 05 03.

Article En | MEDLINE | ID: mdl-37133915

BACKGROUND: Medication noncompliance is a critical issue because of the increased number of drugs sold on the web. Web-based drug distribution is difficult to control, causing problems such as drug noncompliance and abuse. The existing medication compliance surveys lack completeness because it is impossible to cover patients who do not go to the hospital or provide accurate information to their doctors, so a social media-based approach is being explored to collect information about drug use. Social media data, which includes information on drug usage by users, can be used to detect drug abuse and medication compliance in patients. OBJECTIVE: This study aimed to assess how the structural similarity of drugs affects the efficiency of machine learning models for text classification of drug noncompliance. METHODS: This study analyzed 22,022 tweets about 20 different drugs. The tweets were labeled as either noncompliant use or mention, noncompliant sales, general use, or general mention. The study compares 2 methods for training machine learning models for text classification: single-sub-corpus transfer learning, in which a model is trained on tweets about a single drug and then tested on tweets about other drugs, and multi-sub-corpus incremental learning, in which models are trained on tweets about drugs in order of their structural similarity. The performance of a machine learning model trained on a single subcorpus (a data set of tweets about a specific category of drugs) was compared to the performance of a model trained on multiple subcorpora (data sets of tweets about multiple categories of drugs). RESULTS: The results showed that the performance of the model trained on a single subcorpus varied depending on the specific drug used for training. The Tanimoto similarity (a measure of the structural similarity between compounds) was weakly correlated with the classification results. 
The model trained by transfer learning on a corpus of drugs with close structural similarity performed better than the model trained by randomly adding a subcorpus when the number of subcorpora was small. CONCLUSIONS: The results suggest that structural similarity improves the classification performance of messages about unknown drugs if the drugs in the training corpus are few. On the other hand, this indicates that there is little need to consider the influence of the Tanimoto structural similarity if a sufficient variety of drugs are ensured.

Social Media , Substance-Related Disorders , Humans , Natural Language Processing , Machine Learning , Commerce
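
The Tanimoto similarity mentioned in the abstract above is commonly computed on binary structural fingerprints as the size of the intersection of "on" bits divided by the size of their union. The following Python sketch shows that calculation and how candidate drugs could be ordered by similarity to a target drug. The fingerprints, drug names, and the helper order_by_similarity are made up for illustration; how the authors derived fingerprints is not stated in the abstract.

```python
from typing import AbstractSet, Dict, List


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both are empty
    return len(a & b) / len(a | b)


def order_by_similarity(
    target: AbstractSet[int],
    candidates: Dict[str, AbstractSet[int]],
) -> List[str]:
    """Return candidate names sorted from most to least similar to the target."""
    return sorted(
        candidates,
        key=lambda name: tanimoto(target, candidates[name]),
        reverse=True,
    )


# Toy fingerprints (hypothetical, not real drug structures).
target_fp = {1, 2, 3, 4}
others = {"drug_x": {3, 4, 5}, "drug_y": {1, 2, 3, 4}, "drug_z": {9}}
print(tanimoto(target_fp, others["drug_x"]))   # 2 shared bits / 5 total bits
print(order_by_similarity(target_fp, others))
```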

7.

Extracting Multiple Worries From Breast Cancer Patient Blogs Using Multilabel Classification With the Natural Language Processing Model Bidirectional Encoder Representations From Transformers: Infodemiology Study of Blogs.

Watanabe, Tomomi; Yada, Shuntaro; Aramaki, Eiji; Yajima, Hiroshi; Kizaki, Hayato; Hori, Satoko.

JMIR Cancer ; 8(2): e37840, 2022 Jun 03.

Article En | MEDLINE | ID: mdl-35657664

BACKGROUND: Patients with breast cancer have a variety of worries and need multifaceted information support. Their accumulated posts on social media contain rich descriptions of their daily worries concerning issues such as treatment, family, and finances. It is important to identify these issues to help patients with breast cancer to resolve their worries and obtain reliable information. OBJECTIVE: This study aimed to extract and classify multiple worries from text generated by patients with breast cancer using Bidirectional Encoder Representations From Transformers (BERT), a context-aware natural language processing model. METHODS: A total of 2272 blog posts by patients with breast cancer in Japan were collected. Five worry labels, "treatment," "physical," "psychological," "work/financial," and "family/friends," were defined and assigned to each post. Multiple labels were allowed. To assess the label criteria, 50 blog posts were randomly selected and annotated by two researchers with medical knowledge. After the interannotator agreement had been assessed by means of Cohen kappa, one researcher annotated all the blogs. A multilabel classifier that simultaneously predicts five worries in a text was developed using BERT. This classifier was fine-tuned by using the posts as input and adding a classification layer to the pretrained BERT. The performance was evaluated for precision using the average of 5-fold cross-validation results. RESULTS: Among the blog posts, 477 included "treatment," 1138 included "physical," 673 included "psychological," 312 included "work/financial," and 283 included "family/friends." The interannotator agreement values were 0.67 for "treatment," 0.76 for "physical," 0.56 for "psychological," 0.73 for "work/financial," and 0.73 for "family/friends," indicating a high degree of agreement. Among all blog posts, 544 contained no label, 892 contained one label, and 836 contained multiple labels. 
It was found that the worries varied from user to user, and the worries posted by the same user changed over time. The model performed well, though prediction performance differed for each label. The values of precision were 0.59 for "treatment," 0.82 for "physical," 0.64 for "psychological," 0.67 for "work/financial," and 0.58 for "family/friends." The higher the interannotator agreement and the greater the number of posts, the higher the precision tended to be. CONCLUSIONS: This study showed that the BERT model can extract multiple worries from text generated by patients with breast cancer. This is the first application of a multilabel classifier using the BERT model to extract multiple worries from patient-generated text. The results will be helpful to identify breast cancer patients' worries and give them timely social support.
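
The interannotator agreement in this abstract is reported as Cohen kappa, which corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal Python sketch of that calculation for one label follows; the two annotation lists are invented toy data, not the study's annotations.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater1: Sequence[Hashable], rater2: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater1) != len(rater2) or len(rater1) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater1)

    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)

    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Toy example: 1 = label present, 0 = label absent, for 10 posts.
annotator_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
annotator_b = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
print(round(cohen_kappa(annotator_a, annotator_b), 2))
```

In a multilabel setting such as the one described, kappa would be computed once per label, which matches the per-label values reported in the abstract.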

8.

Natural Language Processing: from Bedside to Everywhere.

Aramaki, Eiji; Wakamiya, Shoko; Yada, Shuntaro; Nakamura, Yuta.

Yearb Med Inform ; 31(1): 243-253, 2022 Aug.

Article En | MEDLINE | ID: mdl-35654422

OBJECTIVES: Owing to the rapid progress of natural language processing (NLP), the role of NLP in the medical field has gained considerable attention from both the NLP and medical informatics communities. Although numerous medical NLP papers are published annually, there is still a gap between basic NLP research and practical product development. This gap raises questions, such as what has medical NLP achieved in each medical field, and what is the burden for the practical use of NLP? This paper aims to clarify the above questions. METHODS: We explore the literature on potential NLP products/services applied to various medical/clinical/healthcare areas. RESULTS: This paper introduces clinical applications (bedside applications), covering the use of NLP in each clinical department: internal medicine, pre-surgery, post-surgery, oncology, radiology, pathology, psychiatry, rehabilitation, obstetrics, and gynecology. Also, we clarify technical problems to be addressed for encouraging bedside applications based on NLP. CONCLUSIONS: These results contribute to discussions regarding potentially feasible NLP applications and highlight research gaps for future studies.

Medical Informatics , Natural Language Processing

9.

Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer.

Mutinda, Faith Wavinya; Liew, Kongmeng; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

BMC Med Inform Decis Mak ; 22(1): 158, 2022 06 18.

Article En | MEDLINE | ID: mdl-35717167

BACKGROUND: Meta-analyses aggregate results of different clinical studies to assess the effectiveness of a treatment. Despite their importance, meta-analyses are time-consuming and labor-intensive as they involve reading hundreds of research articles and extracting data. The number of research articles is increasing rapidly and most meta-analyses are outdated shortly after publication as new evidence has not been included. Automatic extraction of data from research articles can expedite the meta-analysis process and allow for automatic updates when new results become available. In this study, we propose a system for automatically extracting data from research abstracts and performing statistical analysis. MATERIALS AND METHODS: Our corpus consists of 1011 PubMed abstracts of breast cancer randomized controlled trials annotated with the core elements of clinical trials: Participants, Intervention, Control, and Outcomes (PICO). We proposed a BERT-based named entity recognition (NER) model to identify PICO information from research abstracts. After extracting the PICO information, we parse numeric outcomes to identify the number of patients having certain outcomes for statistical analysis. RESULTS: The NER model extracted PICO elements with relatively high accuracy, achieving F1-scores greater than 0.80 in most entities. We assessed the performance of the proposed system by reproducing the results of an existing meta-analysis. The data extraction step achieved high accuracy; however, the statistical analysis step achieved low performance because abstracts sometimes lack all the required information. CONCLUSION: We proposed a system for automatically extracting data from research abstracts and performing statistical analysis. We evaluated the performance of the system by reproducing an existing meta-analysis, and the system achieved a relatively good performance, though more substantiation is required.

Breast Neoplasms , Breast Neoplasms/therapy , Female , Humans , Natural Language Processing , PubMed
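
Once the number of patients with a given outcome has been extracted for each trial arm, as the abstract above describes, a pooled effect can be computed. The abstract does not say which statistic the system uses, so the Python sketch below assumes a risk ratio pooled with a fixed-effect inverse-variance model, one standard choice. The tuple layout and the function names are hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical layout: (events_treatment, n_treatment, events_control, n_control)
Study = Tuple[int, int, int, int]


def log_risk_ratio(study: Study) -> Tuple[float, float]:
    """Return (log risk ratio, variance of the log risk ratio) for one study."""
    a, n1, c, n2 = study
    if min(a, c) <= 0 or a > n1 or c > n2:
        raise ValueError("this sketch needs nonzero event counts within arm sizes")
    log_rr = math.log((a / n1) / (c / n2))
    variance = 1 / a - 1 / n1 + 1 / c - 1 / n2
    if variance <= 0:
        raise ValueError("variance is not positive; study cannot be weighted")
    return log_rr, variance


def pooled_risk_ratio(studies: List[Study]) -> float:
    """Fixed-effect pooled risk ratio using inverse-variance weights."""
    weighted_sum = 0.0
    weight_total = 0.0
    for study in studies:
        log_rr, variance = log_risk_ratio(study)
        weight = 1 / variance
        weighted_sum += weight * log_rr
        weight_total += weight
    return math.exp(weighted_sum / weight_total)


# Two toy studies, each with 10/100 events under treatment and 20/100 under control.
print(pooled_risk_ratio([(10, 100, 20, 100), (10, 100, 20, 100)]))
```

A sketch like this also shows why missing numbers in abstracts hurt the statistical step: every study needs all four counts before it can contribute a weight.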

10.

AUTOMETA: Automatic Meta-Analysis System Employing Natural Language Processing.

Mutinda, Faith W; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

Stud Health Technol Inform ; 290: 612-616, 2022 Jun 06.

Article En | MEDLINE | ID: mdl-35673089

Meta-analyses examine the results of different clinical studies to determine whether a treatment is effective or not. Meta-analyses provide the gold standard for medical evidence. Despite their importance, meta-analyses are time-consuming and this poses a challenge where timeliness is important. Research articles are also increasing rapidly and most meta-analyses become outdated after publication since they have not incorporated new evidence. Therefore, there is increasing interest in automating meta-analysis to speed up the process and allow for automatic updates when new results are available. In this preliminary study we present AUTOMETA, our proposed system for automating meta-analysis which employs existing natural language processing methods for identifying Participants, Intervention, Control, and Outcome (PICO) elements. We show that our system can perform advanced meta-analyses by parsing numeric outcomes to identify the number of patients having certain outcomes. We also present a new dataset which improves previous datasets by incorporating additional tags to identify detailed information.

Natural Language Processing , Systems Analysis , Humans

11.

Identification of hand-foot syndrome from cancer patients' blog posts: BERT-based deep-learning approach to detect potential adverse drug reaction symptoms.

Nishioka, Satoshi; Watanabe, Tomomi; Asano, Masaki; Yamamoto, Tatsunori; Kawakami, Kazuyoshi; Yada, Shuntaro; Aramaki, Eiji; Yajima, Hiroshi; Kizaki, Hayato; Hori, Satoko.

PLoS One ; 17(5): e0267901, 2022.

Article En | MEDLINE | ID: mdl-35507636

Early detection and management of adverse drug reactions (ADRs) is crucial for improving patients' quality of life. Hand-foot syndrome (HFS) is one of the most problematic ADRs for cancer patients. Recently, an increasing number of patients post their daily experiences to internet communities, for example in blogs, where potential ADR signals not captured through routine clinic visits can be described. Therefore, this study aimed to identify patients with potential ADRs, focusing on HFS, from internet blogs by using natural language processing (NLP) deep-learning methods. From 10,646 blog posts, written in Japanese by cancer patients, 149 HFS-positive sentences were extracted after pre-processing, annotation and scrutiny by a certified oncology pharmacist. The HFS-positive sentences described not only typical HFS expressions like "pain" or "spoon nail", but also patient-derived unique expressions like onomatopoeic ones. The dataset was divided at a 4 to 1 ratio and used to train and evaluate three NLP deep-learning models: long short-term memory (LSTM), bidirectional LSTM and bidirectional encoder representations from transformers (BERT). The BERT model gave the best performance with precision 0.63, recall 0.82 and F1 score 0.71 in the HFS user identification task. Our results demonstrate that this NLP deep-learning model can successfully identify patients with potential HFS from blog posts, where patients' real wordings on symptoms or impacts on their daily lives are described. Thus, it should be feasible to utilize patient-generated text data to improve ADR management for individual patients.

Deep Learning , Drug-Related Side Effects and Adverse Reactions , Hand-Foot Syndrome , Neoplasms , Hand-Foot Syndrome/diagnosis , Hand-Foot Syndrome/etiology , Humans , Natural Language Processing , Quality of Life
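
The three metrics reported for the BERT model in the abstract above are related: the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short Python sketch below checks that the reported precision (0.63) and recall (0.82) are consistent with the reported F1 score; the helper names are illustrative only.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision and recall from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the BERT model.
print(round(f1_score(0.63, 0.82), 2))  # 0.71
```

A recall well above precision, as here, means the model misses few true cases at the cost of more false alarms, which suits a screening use.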

12.

Medical Needs Extraction for Breast Cancer Patients from Question and Answer Services: Natural Language Processing-Based Approach.

Kamba, Masaru; Manabe, Masae; Wakamiya, Shoko; Yada, Shuntaro; Aramaki, Eiji; Odani, Satomi; Miyashiro, Isao.

JMIR Cancer ; 7(4): e32005, 2021 Oct 28.

Article En | MEDLINE | ID: mdl-34709187

BACKGROUND: A large number of patient narratives are available on various web services. As for web question and answer services, patient questions often relate to medical needs, and we expect these questions to provide clues for a better understanding of patients' medical needs. OBJECTIVE: This study aimed to extract patients' needs and classify them into thematic categories. Clarifying patient needs is the first step in solving social issues that patients with cancer encounter. METHODS: For this study, we used patient question texts containing the key phrase "breast cancer," available at the Yahoo! Japan question and answer service, Yahoo! Chiebukuro, which contains over 60,000 questions on cancer. First, we converted the question text into a vector representation. Next, the relevance between patient needs and existing cancer needs categories was calculated based on cosine similarity. RESULTS: The proportion of correct classifications in our proposed method was approximately 70%. Considering the results of classifying questions, we found the variation and the number of needs. CONCLUSIONS: We created 3 corpora to classify the problems of patients with cancer. The proposed method was able to classify the problems considering the question text. Moreover, as an application example, the question text that included the side effect signaling of drugs and the unmet needs of cancer patients could be extracted. Revealing these needs is important to fulfill the medical needs of patients with cancer.
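
The classification step described in the abstract above (vectorize the question text, then compare it with needs categories by cosine similarity) can be sketched in Python as follows. The vectors and category names are toy values invented for illustration, and nearest_category is a hypothetical helper; the abstract does not specify how the vectors were produced.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_u * norm_v)


def nearest_category(
    question_vec: Sequence[float],
    category_vecs: Dict[str, Sequence[float]],
) -> str:
    """Pick the category whose vector is most similar to the question vector."""
    return max(
        category_vecs,
        key=lambda name: cosine_similarity(question_vec, category_vecs[name]),
    )


# Toy 3-dimensional vectors (hypothetical categories and values).
categories = {"treatment": [1.0, 0.0, 0.0], "finance": [0.0, 1.0, 0.0]}
print(nearest_category([0.9, 0.1, 0.0], categories))
```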

13.

Estimation of Psychological Distress in Japanese Youth Through Narrative Writing: Text-Based Stylometric and Sentiment Analyses.

Manabe, Masae; Liew, Kongmeng; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

JMIR Form Res ; 5(8): e29500, 2021 Aug 12.

Article En | MEDLINE | ID: mdl-34387556

BACKGROUND: Internalizing mental illnesses associated with psychological distress are often underdetected. Text-based detection using natural language processing (NLP) methods is increasingly being used to complement conventional detection efforts. However, these approaches often rely on self-disclosure through autobiographical narratives that may not always be possible, especially in the context of the collectivistic Japanese culture. OBJECTIVE: We propose the use of narrative writing as an alternative resource for mental illness detection in youth. Accordingly, in this study, we investigated the textual characteristics of narratives written by youth with psychological distress; our research focuses on the detection of psychopathological tendencies in written imaginative narratives. METHODS: Using NLP tools such as stylometric measures and lexicon-based sentiment analysis, we examined short narratives from 52 Japanese youth (mean age 19.8 years, SD 3.1) obtained through crowdsourcing. Participants wrote a short narrative introduction to an imagined story before completing a questionnaire to quantify their tendencies toward psychological distress. Based on this score, participants were categorized into higher distress and lower distress groups. The written narratives were then analyzed using NLP tools and examined for between-group differences. Although outside the scope of this study, we also carried out a supplementary analysis of narratives written by adults using the same procedure. RESULTS: Youth demonstrating higher tendencies toward psychological distress used significantly more positive (happiness-related) words, revealing differences in valence of the narrative content. No other significant differences were observed between the high and low distress groups. CONCLUSIONS: Youth with tendencies toward mental illness were found to write more positive stories that contained more happiness-related terms. 
These results may potentially have widespread implications on psychological distress screening on online platforms, particularly in cultures such as Japan that are not accustomed to self-disclosure. Although the mechanisms that we propose in explaining our results are speculative, we believe that this interpretation paves the way for future research in online surveillance and detection efforts.

14.

Semantic Textual Similarity in Japanese Clinical Domain Texts Using BERT.

Mutinda, Faith Wavinya; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

Methods Inf Med ; 60(S 01): e56-e64, 2021 06.

Article En | MEDLINE | ID: mdl-34237783

BACKGROUND: Semantic textual similarity (STS) captures the degree of semantic similarity between texts. It plays an important role in many natural language processing applications such as text summarization, question answering, machine translation, information retrieval, dialog systems, plagiarism detection, and query ranking. STS has been widely studied in the general English domain. However, there exist few resources for STS tasks in the clinical domain and in languages other than English, such as Japanese. OBJECTIVE: The objective of this study is to capture semantic similarity between Japanese clinical texts (Japanese clinical STS) by creating a Japanese dataset that is publicly available. MATERIALS: We created two datasets for Japanese clinical STS: (1) Japanese case reports (CR dataset) and (2) Japanese electronic medical records (EMR dataset). The CR dataset was created from publicly available case reports extracted from the CiNii database. The EMR dataset was created from Japanese electronic medical records. METHODS: We used an approach based on bidirectional encoder representations from transformers (BERT) to capture the semantic similarity between the clinical domain texts. BERT is a popular approach for transfer learning and has been proven to be effective in achieving high accuracy for small datasets. We implemented two Japanese pretrained BERT models: a general Japanese BERT and a clinical Japanese BERT. The general Japanese BERT is pretrained on Japanese Wikipedia texts while the clinical Japanese BERT is pretrained on Japanese clinical texts. RESULTS: The BERT models performed well in capturing semantic similarity in our datasets. The general Japanese BERT outperformed the clinical Japanese BERT and achieved a high correlation with human score (0.904 in the CR dataset and 0.875 in the EMR dataset). It was unexpected that the general Japanese BERT outperformed the clinical Japanese BERT on the clinical domain datasets.
This could be due to the fact that the general Japanese BERT is pretrained on a wide range of texts compared with the clinical Japanese BERT.
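The evaluation described above compares model similarity scores against human scores by correlation. The following is a minimal sketch of that kind of comparison, assuming a Pearson correlation (the abstract does not state which coefficient was used). The function name `pearson` and the sample scores are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists.

    Assumption: the abstract reports only a "correlation with human score";
    Pearson is used here purely for illustration.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical similarity scores for four sentence pairs (not study data).
human_scores = [0.0, 1.5, 3.0, 5.0]
model_scores = [0.2, 1.1, 3.4, 4.8]
print(round(pearson(human_scores, model_scores), 3))
```

A rank-based coefficient such as Spearman would be computed the same way after replacing each score with its rank.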

Natural Language Processing , Semantics , Electronic Health Records , Humans , Information Storage and Retrieval , Japan

15.

Measuring Public Concern About COVID-19 in Japanese Internet Users Through Search Queries: Infodemiological Study.

Gao, Zhiwei; Fujita, Sumio; Shimizu, Nobuyuki; Liew, Kongmeng; Murayama, Taichi; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

JMIR Public Health Surveill ; 7(7): e29865, 2021 07 20.

Article En | MEDLINE | ID: mdl-34174781

BACKGROUND: COVID-19 has disrupted lives and livelihoods and caused widespread panic worldwide. Emerging reports suggest that people living in rural areas in some countries are more susceptible to COVID-19. However, there is a lack of quantitative evidence that can shed light on whether residents of rural areas are more concerned about COVID-19 than residents of urban areas. OBJECTIVE: This infodemiology study investigated attitudes toward COVID-19 in different Japanese prefectures by aggregating and analyzing Yahoo! JAPAN search queries. METHODS: We measured COVID-19 concerns in each Japanese prefecture by aggregating search counts of COVID-19-related queries of Yahoo! JAPAN users and data related to COVID-19 cases. We then defined two indices, the localized concern index (LCI) and the localized concern index by patient percentage (LCIPP), to quantitatively represent the degree of concern. To investigate the impact of emergency declarations on people's concerns, we divided our study period into three phases according to the timing of the state of emergency in Japan: before, during, and after. In addition, we evaluated the relationship between the LCI and LCIPP in different prefectures by correlating them with prefecture-level indicators of urbanization. RESULTS: Our results demonstrated that the concerns about COVID-19 in the prefectures changed in accordance with the declaration of the state of emergency. The correlation analyses also indicated that the differentiated types of public concern measured by the LCI and LCIPP reflect the prefectures' level of urbanization to a certain extent (ie, the LCI appears to be more suitable for quantifying COVID-19 concern in urban areas, while the LCIPP seems to be more appropriate for rural areas). CONCLUSIONS: We quantitatively defined Japanese Yahoo users' concerns about COVID-19 by using the search counts of COVID-19-related search queries. Our results also showed that the LCI and LCIPP have external validity.

Anxiety/epidemiology , Attitude to Health , COVID-19/psychology , Internet/statistics & numerical data , Search Engine/statistics & numerical data , Adult , Aged , COVID-19/epidemiology , Female , Humans , Japan/epidemiology , Male , Middle Aged , Rural Population/statistics & numerical data , Urban Population/statistics & numerical data

16.

Identification of Adverse Drug Event-Related Japanese Articles: Natural Language Processing Analysis.

Ujiie, Shogo; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

JMIR Med Inform ; 8(11): e22661, 2020 Nov 27.

Article En | MEDLINE | ID: mdl-33245290

BACKGROUND: Medical articles covering adverse drug events (ADEs) are systematically reported by pharmaceutical companies for drug safety information purposes. Although policies governing reporting to regulatory bodies vary among countries and regions, all medical article reporting may be categorized as precision or recall based. Recall-based reporting, which is implemented in Japan, requires the reporting of any possible ADE. Therefore, recall-based reporting can introduce numerous false negatives or substantial amounts of noise, a problem that is difficult to address using limited manual labor. OBJECTIVE: Our aim was to develop an automated system that could identify ADE-related medical articles, support recall-based reporting, and alleviate manual labor in Japanese pharmaceutical companies. METHODS: Using medical articles as input, our system based on natural language processing applies document-level classification to extract articles containing ADEs (replacing manual labor in the first screening) and sentence-level classification to extract sentences within those articles that imply ADEs (thus supporting experts in the second screening). We used 509 Japanese medical articles annotated by a medical engineer to evaluate the performance of the proposed system. RESULTS: Document-level classification yielded an F1 of 0.903. Sentence-level classification yielded an F1 of 0.413. These were averages of fivefold cross-validations. CONCLUSIONS: A simple automated system may alleviate the manual labor involved in screening drug safety-related medical articles in pharmaceutical companies. After improving the accuracy of the sentence-level classification by considering a wider context, we intend to apply this system toward real-world postmarketing surveillance.
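The results above are F1 scores averaged over fivefold cross-validation. As a rough illustration of how such an average can be computed, the sketch below derives F1 from binary labels for each fold and takes the mean. The helper names `f1_score` and `mean_f1_over_folds` and the toy labels are hypothetical; the abstract does not describe the authors' actual evaluation code.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class, computed from gold and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_f1_over_folds(folds):
    """Average F1 across folds; each fold is a (y_true, y_pred) pair."""
    scores = [f1_score(y_true, y_pred) for y_true, y_pred in folds]
    return sum(scores) / len(scores)


# Toy folds (1 = ADE-related article, 0 = not ADE-related); not study data.
folds = [
    ([1, 1, 0, 0], [1, 0, 1, 0]),
    ([1, 0, 1, 0], [1, 0, 1, 0]),
]
print(mean_f1_over_folds(folds))
```

In a recall-oriented screening setting, false negatives (`fn`) are the costlier error, so recall is often inspected alongside the averaged F1.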

17.

Surveillance of early stage COVID-19 clusters using search query logs and mobile device-based location information.

Hisada, Shohei; Murayama, Taichi; Tsubouchi, Kota; Fujita, Sumio; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

Sci Rep ; 10(1): 18680, 2020 10 29.

Article En | MEDLINE | ID: mdl-33122686

Two clusters of the coronavirus disease 2019 (COVID-19) were confirmed in Hokkaido, Japan, in February 2020. To identify these clusters, this study employed web search query logs of multiple devices and user location information from location-aware mobile devices. We anonymously identified users who used a web search engine (i.e., Yahoo! JAPAN) to search for COVID-19 or its symptoms. We regarded them as web searchers who were suspicious of their own COVID-19 infection (WSSCI). We extracted the location of WSSCI via a mobile operating system application and compared the spatio-temporal distribution of WSSCI with the actual location of the two known clusters. In the early stage of cluster development, we confirmed several WSSCI. Our approach was accurate in this stage and became biased after a public announcement of the cluster development. When other cluster-related resources, such as detailed population statistics, are not available, the proposed metric can capture hints of emerging clusters.

Coronavirus Infections/epidemiology , Epidemiological Monitoring , Infection Control/methods , Pneumonia, Viral/epidemiology , Population Surveillance/methods , Search Engine/statistics & numerical data , Smartphone/statistics & numerical data , COVID-19 , Coronavirus Infections/prevention & control , Facilities and Services Utilization/statistics & numerical data , Humans , Internet/statistics & numerical data , Japan , Pandemics/prevention & control , Pneumonia, Viral/prevention & control