Search | VHL Regional Portal

1.

Predictive Model for Extended-Spectrum β-Lactamase-Producing Bacterial Infections Using Natural Language Processing Technique and Open Data in Intensive Care Unit Environment: Retrospective Observational Study.

Ito, Genta; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

JMIR Form Res ; 8: e54044, 2024 Jul 10.

Article in English | MEDLINE | ID: mdl-38986131

ABSTRACT

BACKGROUND: Machine learning has advanced medical event prediction, mostly using private data. The public MIMIC-3 (Medical Information Mart for Intensive Care III) data set, which contains detailed data on over 40,000 intensive care unit patients, stands out as it can help develop better models including structured and textual data. OBJECTIVE: This study aimed to build and test a machine learning model using the MIMIC-3 data set to determine the effectiveness of information extracted from electronic medical record text using a named entity recognition tool, specifically QuickUMLS, for predicting important medical events. Using the prediction of extended-spectrum β-lactamase (ESBL)-producing bacterial infections as an example, this study shows how open data sources and simple technology can be useful for making clinically meaningful predictions. METHODS: The MIMIC-3 data set, including demographics, vital signs, laboratory results, and textual data, such as discharge summaries, was used. This study specifically targeted patients diagnosed with Klebsiella pneumoniae or Escherichia coli infection. Predictions were based on ESBL-producing bacterial standards and the minimum inhibitory concentration criteria. Both the structured data and extracted patient histories were used as predictors. In total, 2 models, an L1-regularized logistic regression model and a LightGBM model, were evaluated using the receiver operating characteristic area under the curve (ROC-AUC) and the precision-recall curve area under the curve (PR-AUC). RESULTS: Of 46,520 MIMIC-3 patients, 4046 were identified with bacterial cultures indicating the presence of K pneumoniae or E coli. After excluding patients who lacked discharge summary text, 3614 patients remained. The L1-penalized model, with variables from only the structured data, displayed a ROC-AUC of 0.646 and a PR-AUC of 0.307. The LightGBM model, combining structured and textual data, achieved a ROC-AUC of 0.707 and a PR-AUC of 0.369.
Key contributors to the LightGBM model included patient age, duration since hospital admission, and specific medical history such as diabetes. The structured data-based model showed improved performance compared to the reference models. Performance was further improved when textual medical history was included. Compared to other models predicting drug-resistant bacteria, the results of this study ranked in the middle. Some misidentifications, potentially due to the limitations of QuickUMLS, may have affected the accuracy of the model. CONCLUSIONS: This study successfully developed a predictive model for ESBL-producing bacterial infections using the MIMIC-3 data set, yielding results consistent with existing literature. This model stands out for its transparency and reliance on open data and open named entity recognition technology. The performance of the model was enhanced using textual information. With advancements in natural language processing tools such as BERT and GPT, the extraction of medical data from text holds substantial potential for future model optimization.
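The two evaluation metrics named in this abstract, ROC-AUC and PR-AUC, can be illustrated with a minimal pure-Python sketch. It is not the authors' evaluation code: the function names and the toy labels and scores are assumptions made for illustration, and PR-AUC is computed here as step-wise average precision, which is one common way to summarize the precision-recall curve.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise PR-AUC: mean of the precision values at each true positive.

    Items are ranked by descending score; tied scores keep their input order.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


# Toy example (hypothetical data): 1 = ESBL-positive, 0 = ESBL-negative.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(roc_auc(toy_labels, toy_scores))
print(average_precision(toy_labels, toy_scores))
```

Because PR-AUC depends on the share of positive cases, it is usually read against that baseline rate, whereas a ROC-AUC of 0.5 corresponds to chance regardless of class balance.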

2.

Differing Content and Language Based on Poster-Patient Relationships on the Chinese Social Media Platform Weibo: Text Classification, Sentiment Analysis, and Topic Modeling of Posts on Breast Cancer.

Zhang, Zhouqing; Liew, Kongmeng; Kuijer, Roeline; She, Wan Jou; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

JMIR Cancer ; 10: e51332, 2024 May 09.

Article in English | MEDLINE | ID: mdl-38723250

ABSTRACT

BACKGROUND: Breast cancer affects the lives of not only those diagnosed but also the people around them. Many of those affected share their experiences on social media. However, these narratives may differ according to who the poster is and what their relationship with the patient is; a patient posting about their experiences may post different content from someone whose friends or family has breast cancer. Weibo is one of the most popular social media platforms in China, and breast cancer-related posts are frequently found there. OBJECTIVE: With the goal of understanding the different experiences of those affected by breast cancer in China, we aimed to explore how content and language used in relevant posts differ according to who the poster is and what their relationship with the patient is and whether there are differences in emotional expression and topic content if the patient is the poster themselves or a friend, family member, relative, or acquaintance. METHODS: We used Weibo as a resource to examine how posts differ according to the different poster-patient relationships. We collected a total of 10,322 relevant Weibo posts. Using a 2-step analysis method, we fine-tuned 2 Chinese Robustly Optimized Bidirectional Encoder Representations from Transformers (BERT) Pretraining Approach models on this data set with annotated poster-patient relationships. These models were linked in sequence, first a binary classifier (no_patient or patient) and then a multiclass classifier (post_user, family_members, friends_relatives, acquaintances, heard_relation), to classify poster-patient relationships. Next, we used the Linguistic Inquiry and Word Count lexicon to conduct sentiment analysis from 5 emotion categories (positive and negative emotions, anger, sadness, and anxiety), followed by topic modeling (BERTopic). RESULTS: Our binary model (F1-score=0.92) and multiclass model (F1-score=0.83) were largely able to classify poster-patient relationships accurately.
Subsequent sentiment analysis showed significant differences in emotion categories across all poster-patient relationships. Notably, negative emotions and anger were higher for the "no_patient" class, but sadness and anxiety were higher for the "family_members" class. Focusing on the top 30 topics, we also noted that topics on fears and anger toward cancer were higher in the "no_patient" class, but topics on cancer treatment were higher in the "family_members" class. CONCLUSIONS: Chinese users post different types of content, depending on the poster-patient relationships. If the patient is family, posts are sadder and more anxious but also contain more content on treatments. However, if no patient is detected, posts show higher levels of anger. We think that these may stem from rants from posters, which may help with emotion regulation and gathering social support.

3.

Exploring the Impact of the COVID-19 Pandemic on Twitter in Japan: Qualitative Analysis of Disrupted Plans and Consequences.

Kamba, Masaru; She, Wan Jou; Ferawati, Kiki; Wakamiya, Shoko; Aramaki, Eiji.

JMIR Infodemiology ; 4: e49699, 2024 Apr 01.

Article in English | MEDLINE | ID: mdl-38557446

ABSTRACT

BACKGROUND: Although COVID-19 is a pandemic, the impact of its spread extends beyond public health, influencing areas such as the economy, education, work style, and social relationships. Research studies that document public opinions and estimate the long-term potential impact after the pandemic can be of value to the field. OBJECTIVE: This study aims to uncover and track concerns in Japan throughout the COVID-19 pandemic by analyzing Japanese individuals' self-disclosure of disruptions to their life plans on social media. This approach offers alternative evidence for identifying concerns that may require further attention for individuals living in Japan. METHODS: We extracted 300,778 tweets using the query phrase Corona-no-sei ("due to COVID-19," "because of COVID-19," or "considering COVID-19"), enabling us to identify the activities and life plans disrupted by the pandemic. The correlation between the number of tweets and COVID-19 cases was analyzed, along with an examination of frequently co-occurring words. RESULTS: The top 20 nouns, verbs, and noun plus verb pairs co-occurring with Corona-no-sei were extracted. The top 5 keywords were graduation ceremony, cancel, school, work, and event. The top 5 verbs were disappear, go, rest, can go, and end. Our findings indicate that education emerged as the top concern when the Japanese government announced the first state of emergency. We also observed a sudden surge in anxiety about material shortages such as toilet paper. As the pandemic persisted and more states of emergency were declared, we noticed a shift toward long-term concerns, including careers, social relationships, and education. CONCLUSIONS: Our study incorporated machine learning techniques for disease monitoring through the use of tweet data, allowing the identification of underlying concerns (eg, disrupted education and work conditions) throughout the 3 stages of Japanese government emergency announcements.
The comparison with COVID-19 case numbers provides valuable insights into the short- and long-term societal impacts, emphasizing the importance of considering citizens' perspectives in policy-making and supporting those affected by the pandemic, particularly in the context of Japanese government decision-making.

Subject(s)

COVID-19 , Social Media , Humans , COVID-19/epidemiology , Pandemics , Japan/epidemiology , SARS-CoV-2

4.

Extracting Spatio-Temporal Trends in Medical Research Prioritization Through Natural Language Processing of Case Report Abstracts.

Yao, Lean Franzl Lim; Liew, Kongmeng; Wakamiya, Shoko; Aramaki, Eiji.

Stud Health Technol Inform ; 310: 634-638, 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38269886

ABSTRACT

Medical research prioritization is an important aspect of decision-making by researchers and relevant stakeholders. The ever-increasing availability of technology and data has opened doors to new discoveries and new questions. This makes it difficult for researchers and relevant stakeholders to make well-informed decisions about the research areas they want to support and the nations with which they should seek collaborations. It is, therefore, useful to look at the spatio-temporal trends of medical research prioritization to gain insight into popular and neglected areas of research as well as how each nation allocates its priorities. In this study, we develop a system that collects, classifies, and summarizes case report abstracts according to the location, time, and disease category of the report. The additional classifications allow us to visualize and monitor the trends in medical research prioritization by location, time, and disease category.

Subject(s)

Biomedical Research , Natural Language Processing , Humans , Research Personnel , Technology , Case Reports as Topic

5.

Diagnosing psychiatric disorders from history of present illness using a large-scale linguistic model.

Otsuka, Norio; Kawanishi, Yuu; Doi, Fumimaro; Takeda, Tsutomu; Okumura, Kazuki; Yamauchi, Takahira; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji; Makinodan, Manabu.

Psychiatry Clin Neurosci ; 77(11): 597-604, 2023 Nov.

Article in English | MEDLINE | ID: mdl-37526294

ABSTRACT

AIM: Recent advances in natural language processing models are expected to provide diagnostic assistance in psychiatry from the history of present illness (HPI). However, existing studies have been limited, with the target diseases including only major diseases, small sample sizes, or no comparison with diagnoses made by psychiatrists to ensure accuracy. Therefore, we formulated an accurate diagnostic model that covers all psychiatric disorders. METHODS: HPIs and diagnoses were extracted from discharge summaries of 2,642 cases at the Nara Medical University Hospital, Japan, from 21 May 2007 to 31 May 2021. The diagnoses were classified into 11 classes according to the code from ICD-10 Chapter V. Using UTH-BERT pre-trained on the electronic medical records of the University of Tokyo Hospital, Japan, we predicted the main diagnoses at discharge based on HPIs and compared the concordance rate with the results of psychiatrists. The psychiatrists were divided into two groups: semi-Designated with 3-4 years of experience and Residents with only 2 months of experience. RESULTS: The model's match rate was 74.3%, compared to 71.5% for the semi-Designated psychiatrists and 69.4% for the Residents. If the cases were limited to those correctly answered by the semi-Designated group, the model and the Residents performed at 84.9% and 83.3%, respectively. CONCLUSION: We demonstrated that the model matched the diagnosis predicted from the HPI with a high probability to the principal diagnosis at discharge. Hence, the model can provide diagnostic suggestions in actual clinical practice.

Subject(s)

Mental Disorders , Psychiatry , Humans , Mental Disorders/diagnosis , Mental Disorders/epidemiology , Patient Discharge , Hospitals , International Classification of Diseases , Psychiatry/methods

6.

Transferability Based on Drug Structure Similarity in the Automatic Classification of Noncompliant Drug Use on Social Media: Natural Language Processing Approach.

Nishiyama, Tomohiro; Yada, Shuntaro; Wakamiya, Shoko; Hori, Satoko; Aramaki, Eiji.

J Med Internet Res ; 25: e44870, 2023 May 03.

Article in English | MEDLINE | ID: mdl-37133915

ABSTRACT

BACKGROUND: Medication noncompliance is a critical issue because of the increased number of drugs sold on the web. Web-based drug distribution is difficult to control, causing problems such as drug noncompliance and abuse. The existing medication compliance surveys lack completeness because it is impossible to cover patients who do not go to the hospital or provide accurate information to their doctors, so a social media-based approach is being explored to collect information about drug use. Social media data, which includes information on drug usage by users, can be used to detect drug abuse and medication compliance in patients. OBJECTIVE: This study aimed to assess how the structural similarity of drugs affects the efficiency of machine learning models for text classification of drug noncompliance. METHODS: This study analyzed 22,022 tweets about 20 different drugs. The tweets were labeled as either noncompliant use or mention, noncompliant sales, general use, or general mention. The study compares 2 methods for training machine learning models for text classification: single-sub-corpus transfer learning, in which a model is trained on tweets about a single drug and then tested on tweets about other drugs, and multi-sub-corpus incremental learning, in which models are trained on tweets about drugs in order of their structural similarity. The performance of a machine learning model trained on a single subcorpus (a data set of tweets about a specific category of drugs) was compared to the performance of a model trained on multiple subcorpora (data sets of tweets about multiple categories of drugs). RESULTS: The results showed that the performance of the model trained on a single subcorpus varied depending on the specific drug used for training. The Tanimoto similarity (a measure of the structural similarity between compounds) was weakly correlated with the classification results. 
The model trained by transfer learning on a corpus of drugs with close structural similarity performed better than the model trained by randomly adding a subcorpus when the number of subcorpora was small. CONCLUSIONS: The results suggest that structural similarity improves the classification performance of messages about unknown drugs if the drugs in the training corpus are few. On the other hand, this indicates that there is little need to consider the influence of the Tanimoto structural similarity if a sufficient variety of drugs is ensured.
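The Tanimoto similarity mentioned in this abstract can be illustrated with a minimal sketch, assuming each drug is represented as a set of "on" bit positions in a structural fingerprint. The fingerprints, drug names, and the helper `rank_by_similarity` below are hypothetical and serve only to show how subcorpora might be ordered by structural similarity to a target drug; the abstract does not describe the authors' fingerprinting method.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of set-bit indices.

    Defined as |A & B| / |A | B|; ranges from 0 (no shared bits) to 1 (identical).
    """
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def rank_by_similarity(target_fp, candidates):
    """Return candidate names sorted from most to least similar to the target.

    `candidates` maps a (hypothetical) drug name to its fingerprint.
    """
    return sorted(
        candidates,
        key=lambda name: tanimoto(target_fp, candidates[name]),
        reverse=True,
    )


# Hypothetical fingerprints: bit indices only, not real chemistry.
target = {1, 2, 3, 4}
others = {
    "drug_a": {1, 2, 3, 9},   # shares 3 bits with the target
    "drug_b": {7, 8, 9},      # shares none
    "drug_c": {1, 2, 3, 4},   # identical
}
print(rank_by_similarity(target, others))
```

In an incremental-learning setup like the one described, an ordering of this kind would decide which drug's tweets are added to the training corpus next.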

Subject(s)

Social Media , Substance-Related Disorders , Humans , Natural Language Processing , Machine Learning , Commerce

7.

Disruptions in the Cystic Fibrosis Community's Experiences and Concerns During the COVID-19 Pandemic: Topic Modeling and Time Series Analysis of Reddit Comments.

Yao, Lean Franzl; Ferawati, Kiki; Liew, Kongmeng; Wakamiya, Shoko; Aramaki, Eiji.

J Med Internet Res ; 25: e45249, 2023 Apr 20.

Article in English | MEDLINE | ID: mdl-37079359

ABSTRACT

BACKGROUND: The COVID-19 pandemic disrupted the needs and concerns of the cystic fibrosis community. Patients with cystic fibrosis were particularly vulnerable during the pandemic due to overlapping symptoms in addition to the challenges patients with rare diseases face, such as the need for constant medical aid and limited information regarding their disease or treatments. Even before the pandemic, patients vocalized these concerns on social media platforms like Reddit and formed communities and networks to share insight and information. This data can be used as a quick and efficient source of information about the experiences and concerns of patients with cystic fibrosis in contrast to traditional survey- or clinical-based methods. OBJECTIVE: This study applies topic modeling and time series analysis to identify the disruption caused by the COVID-19 pandemic and its impact on the cystic fibrosis community's experiences and concerns. This study illustrates the utility of social media data in gaining insight into the experiences and concerns of patients with rare diseases. METHODS: We collected comments from the subreddit r/CysticFibrosis to represent the experiences and concerns of the cystic fibrosis community. The comments were preprocessed before being used to train the BERTopic model to assign each comment to a topic. The number of comments and active users for each data set was aggregated monthly per topic and then fitted with an autoregressive integrated moving average (ARIMA) model to study the trends in activity. To verify the disruption in trends during the COVID-19 pandemic, we assigned a dummy variable in the model where a value of "1" was assigned to months in 2020 and "0" otherwise and tested for its statistical significance. RESULTS: A total of 120,738 comments from 5827 users were collected from March 24, 2011, until August 31, 2022. We found 22 topics representing the cystic fibrosis community's experiences and concerns. 
Our time series analysis showed that for 9 topics, the COVID-19 pandemic was a statistically significant event that disrupted the trends in user activity. Of the 9 topics, only 1 showed significantly increased activity during this period, while the other 8 showed decreased activity. This mixture of increased and decreased activity for these topics indicates a shift in attention or focus on discussion topics during this period. CONCLUSIONS: There was a disruption in the experiences and concerns the cystic fibrosis community faced during the COVID-19 pandemic. By studying social media data, we were able to quickly and efficiently study the impact on the lived experiences and daily struggles of patients with cystic fibrosis. This study shows how social media data can be used as an alternative source of information to gain insight into the needs of patients with rare diseases and how external factors disrupt them.
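The data preparation described in this abstract (monthly aggregation per topic, plus a dummy variable set to 1 for months in 2020) can be sketched as follows. This is an illustrative assumption about the data layout, not the authors' pipeline: the function names and the toy comments are hypothetical, and the ARIMA fit itself is omitted because it would need a time series library.

```python
from collections import Counter
from datetime import date


def monthly_counts(comments):
    """Count comments per (topic, year, month).

    `comments` is an iterable of (date, topic_id) pairs.
    """
    counts = Counter()
    for day, topic in comments:
        counts[(topic, day.year, day.month)] += 1
    return counts


def design_rows(counts, topic, start, end):
    """Build one row per month from `start` to `end` (inclusive) for one topic.

    `start` and `end` are (year, month) tuples. Each row is
    (year, month, n_comments, covid_dummy), where covid_dummy is 1 for
    months in 2020 and 0 otherwise. Months without comments get a count of 0.
    """
    rows = []
    year, month = start
    while (year, month) <= end:
        dummy = 1 if year == 2020 else 0
        rows.append((year, month, counts.get((topic, year, month), 0), dummy))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return rows


# Hypothetical comments: (posting date, topic id assigned by a topic model).
toy_comments = [
    (date(2019, 12, 5), 0),
    (date(2020, 1, 3), 0),
    (date(2020, 1, 20), 0),
    (date(2020, 1, 9), 1),
]
toy_rows = design_rows(monthly_counts(toy_comments), 0, (2019, 11), (2020, 2))
for row in toy_rows:
    print(row)
```

The count column would serve as the series to model and the dummy column as an external regressor whose coefficient is tested for significance.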

Subject(s)

COVID-19 , Cystic Fibrosis , Social Media , Humans , COVID-19/epidemiology , Pandemics , Cystic Fibrosis/epidemiology , Rare Diseases , Time Factors

8.

Exploring the use of AI text-to-image generation to downregulate negative emotions in an expressive writing application.

Azuaje, Gamar; Liew, Kongmeng; Buening, Rebecca; She, Wan Jou; Siriaraya, Panote; Wakamiya, Shoko; Aramaki, Eiji.

R Soc Open Sci ; 10(1): 220238, 2023 Jan.

Article in English | MEDLINE | ID: mdl-36636309

ABSTRACT

Conventional writing therapies are versatile, accessible and easy to facilitate online, but often require participants to self-disclose traumatic experiences. To make expressive writing therapies safer for online, unsupervised environments, we explored the use of text-to-image generation as a means to downregulate negative emotions during a fictional writing exercise. We developed a writing tool, StoryWriter, that uses Generative Adversarial Network models to generate artwork from users' narratives in real time. These images were intended to positively distract users from their negative emotions throughout the writing task. In this paper, we report the outcomes of two user studies: Study 1 (N = 388), which experimentally examined the efficacy of this application via negative versus neutral emotion induction and image generation versus no image generation control groups; and Study 2 (N = 54), which qualitatively examined open-ended feedback. Our results are heterogeneous: both studies suggested that StoryWriter somewhat contributed to improved emotion outcomes for participants with pre-existing negative emotions, but users' open-ended responses indicated that these outcomes may be adversely modulated by the generated images, which could undermine the therapeutic benefits of the writing task itself.

9.

Monitoring Mentions of COVID-19 Vaccine Side Effects on Japanese and Indonesian Twitter: Infodemiological Study.

Ferawati, Kiki; Liew, Kongmeng; Aramaki, Eiji; Wakamiya, Shoko.

JMIR Infodemiology ; 2(2): e39504, 2022.

Article in English | MEDLINE | ID: mdl-36277140

ABSTRACT

Background: The year 2021 was marked by vaccinations against COVID-19, which spurred wider discussion among the general population, with some in favor and some against vaccination. Twitter, a popular social media platform, was instrumental in providing information about the COVID-19 vaccine and has been effective in observing public reactions. We focused on tweets from Japan and Indonesia, 2 countries with a large Twitter-using population, where concerns about side effects were consistently stated as a strong reason for vaccine hesitancy. Objective: This study aimed to investigate how Twitter was used to report vaccine-related side effects and to compare the mentions of these side effects from 2 messenger RNA (mRNA) vaccine types developed by Pfizer and Moderna, in Japan and Indonesia. Methods: We obtained tweet data from Twitter using Japanese and Indonesian keywords related to COVID-19 vaccines and their side effects from January 1, 2021, to December 31, 2021. We then removed users with a high frequency of tweets and merged the tweets from multiple users as a single sentence to focus on user-level analysis, resulting in a total of 214,165 users (Japan) and 12,289 users (Indonesia). Then, we filtered the data to select tweets mentioning Pfizer or Moderna only and removed tweets mentioning both. We compared the side effect counts to the public reports released by Pfizer and Moderna. Afterward, logistic regression models were used to compare the side effects for the Pfizer and Moderna vaccines for each country. Results: We observed some differences in the ratio of side effects between the public reports and tweets. Specifically, fever was mentioned much more frequently in tweets than would be expected based on the public reports. 
We also observed differences in side effects reported between Pfizer and Moderna vaccines from Japan and Indonesia, with more side effects reported for the Pfizer vaccine in Japanese tweets and more side effects with the Moderna vaccine reported in Indonesian tweets. Conclusions: We note the possible consequences of vaccine side effect surveillance on Twitter and information dissemination, in that fever appears to be over-represented. This could be due to fever possibly having a higher severity or measurability, and further implications are discussed.

10.

Measuring concerns about the COVID-19 vaccine among Japanese internet users through search queries.

Uehara, Makoto; Fujita, Sumio; Shimizu, Nobuyuki; Liew, Kongmeng; Wakamiya, Shoko; Aramaki, Eiji.

Sci Rep ; 12(1): 15037, 2022 Sep 03.

Article in English | MEDLINE | ID: mdl-36057657

ABSTRACT

With the increasing availability of the COVID-19 vaccines, vaccination has been rapidly promoted globally as a countermeasure against the spread of COVID-19. In Japan, vaccination was first introduced in February 2021. However, the amount of concern towards vaccination differs between individuals, and topics of concern include adverse reactions and side effects. This study investigated attitudes toward vaccines or vaccination during the COVID-19 pandemic across different Japanese prefectures, using Yahoo! JAPAN search queries. We first defined a vaccine concern index (VCI) by aggregating the search counts of vaccine-related queries from Yahoo! JAPAN users before examining VCI across all Japanese prefectures, accounting for gender and age. Our results demonstrated that VCI tended to be lower in more populated areas, and that VCI was higher among users in their 20s to 40s than among older users, especially among female users. Furthermore, there was a significant positive correlation (Spearman's rank correlation coefficient = 0.60, [Formula: see text]) between VCI and prefectural vaccination rate, suggesting that web searching of adverse vaccine reactions may precede actual vaccination. This could reflect the information-seeking behavior of individuals who are accepting of vaccinations.
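The Spearman rank correlation reported in this abstract can be illustrated with a minimal sketch that computes it as the Pearson correlation of average ranks, which handles ties. The per-prefecture values are hypothetical, and the construction of the VCI itself is not reproduced because the abstract does not give its formula.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-prefecture values: concern index and vaccination rate.
toy_vci = [0.8, 1.1, 0.9, 1.4, 1.0]
toy_rate = [0.61, 0.70, 0.66, 0.72, 0.64]
print(spearman(toy_vci, toy_rate))
```

Because the coefficient is computed on ranks, it measures any monotonic association between the two quantities, not only a linear one.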

Subject(s)

COVID-19 , Vaccines , Aged , COVID-19/prevention & control , COVID-19 Vaccines/adverse effects , Female , Humans , Internet , Japan/epidemiology , Pandemics , Vaccination

11.

Natural Language Processing: from Bedside to Everywhere.

Aramaki, Eiji; Wakamiya, Shoko; Yada, Shuntaro; Nakamura, Yuta.

Yearb Med Inform ; 31(1): 243-253, 2022 Aug.

Article in English | MEDLINE | ID: mdl-35654422

ABSTRACT

OBJECTIVES: Owing to the rapid progress of natural language processing (NLP), the role of NLP in the medical field has gained considerable attention from both the NLP and medical informatics communities. Although numerous medical NLP papers are published annually, there is still a gap between basic NLP research and practical product development. This gap raises questions, such as what medical NLP has achieved in each medical field and what the burden is for the practical use of NLP. This paper aims to clarify these questions. METHODS: We explore the literature on potential NLP products/services applied to various medical/clinical/healthcare areas. RESULTS: This paper introduces clinical applications (bedside applications), describing the use of NLP in each clinical department: internal medicine, pre-surgery, post-surgery, oncology, radiology, pathology, psychiatry, rehabilitation, obstetrics, and gynecology. We also clarify the technical problems to be addressed to encourage NLP-based bedside applications. CONCLUSIONS: These results contribute to discussions regarding potentially feasible NLP applications and highlight research gaps for future studies.

Subject(s)

Medical Informatics , Natural Language Processing

12.

Clinical Comparable Corpus Describing the Same Subjects with Different Expressions.

Nakamura, Yuta; Hanaoka, Shouhei; Nomura, Yukihiro; Hayashi, Naoto; Abe, Osamu; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

Stud Health Technol Inform ; 290: 253-257, 2022 Jun 06.

Article in English | MEDLINE | ID: mdl-35673012

ABSTRACT

Medical artificial intelligence (AI) systems need to learn to recognize synonyms or paraphrases describing the same anatomy, disease, treatment, etc. to better understand real-world clinical documents. Existing linguistic resources focus on variants at the word or sentence level. To handle linguistic variations on a broader scale, we proposed the Medical Text Radiology Report section Japanese version (MedTxt-RR-JA), the first clinical comparable corpus. MedTxt-RR-JA was built by recruiting nine radiologists to diagnose the same 15 lung cancer cases in Radiopaedia, an open-access radiological repository. The 135 radiology reports in MedTxt-RR-JA were shown to contain word-, sentence- and document-level variations maintaining similarity of contents. MedTxt-RR-JA is also the first publicly available Japanese radiology report corpus that would help to overcome poor data availability for Japanese medical AI systems. Moreover, our methodology can be applied widely to building clinical corpora without privacy concerns.

Subject(s)

Artificial Intelligence , Radiology , Humans , Language , Radiography , Radiologists

13.

AUTOMETA: Automatic Meta-Analysis System Employing Natural Language Processing.

Mutinda, Faith W; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

Stud Health Technol Inform ; 290: 612-616, 2022 Jun 06.

Article in English | MEDLINE | ID: mdl-35673089

ABSTRACT

Meta-analyses examine the results of different clinical studies to determine whether a treatment is effective or not. Meta-analyses provide the gold standard for medical evidence. Despite their importance, meta-analyses are time-consuming, and this poses a challenge where timeliness is important. The number of research articles is also increasing rapidly, and most meta-analyses become outdated after publication since they have not incorporated new evidence. Therefore, there is increasing interest in automating meta-analysis to speed up the process and allow for automatic updates when new results are available. In this preliminary study, we present AUTOMETA, our proposed system for automating meta-analysis, which employs existing natural language processing methods for identifying Participants, Intervention, Control, and Outcome (PICO) elements. We show that our system can perform advanced meta-analyses by parsing numeric outcomes to identify the number of patients having certain outcomes. We also present a new dataset which improves previous datasets by incorporating additional tags to identify detailed information.

Subject(s)

Natural Language Processing , Systems Analysis , Humans

14.

Automatic data extraction to support meta-analysis statistical analysis: a case study on breast cancer.

Mutinda, Faith Wavinya; Liew, Kongmeng; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

BMC Med Inform Decis Mak ; 22(1): 158, 2022 Jun 18.

Article in English | MEDLINE | ID: mdl-35717167

ABSTRACT

BACKGROUND: Meta-analyses aggregate results of different clinical studies to assess the effectiveness of a treatment. Despite their importance, meta-analyses are time-consuming and labor-intensive as they involve reading hundreds of research articles and extracting data. The number of research articles is increasing rapidly and most meta-analyses are outdated shortly after publication as new evidence has not been included. Automatic extraction of data from research articles can expedite the meta-analysis process and allow for automatic updates when new results become available. In this study, we propose a system for automatically extracting data from research abstracts and performing statistical analysis. MATERIALS AND METHODS: Our corpus consists of 1011 PubMed abstracts of breast cancer randomized controlled trials annotated with the core elements of clinical trials: Participants, Intervention, Control, and Outcomes (PICO). We proposed a BERT-based named entity recognition (NER) model to identify PICO information from research abstracts. After extracting the PICO information, we parse numeric outcomes to identify the number of patients having certain outcomes for statistical analysis. RESULTS: The NER model extracted PICO elements with relatively high accuracy, achieving F1-scores greater than 0.80 in most entities. We assessed the performance of the proposed system by reproducing the results of an existing meta-analysis. The data extraction step achieved high accuracy; however, the statistical analysis step achieved low performance because abstracts sometimes lack all the required information. CONCLUSION: We proposed a system for automatically extracting data from research abstracts and performing statistical analysis. We evaluated the performance of the system by reproducing an existing meta-analysis, and the system achieved relatively good performance, though more substantiation is required.

Subject(s)

Breast Neoplasms , Breast Neoplasms/therapy , Female , Humans , Natural Language Processing , PubMed

15.

Exploring Relationships Between Tweet Numbers and Over-the-counter Drug Sales for Allergic Rhinitis: Retrospective Analysis.

Wakamiya, Shoko; Morimoto, Osamu; Omichi, Katsuhiro; Hara, Hideyuki; Kawase, Ichiro; Koshiba, Ryuji; Aramaki, Eiji.

JMIR Form Res ; 6(2): e33941, 2022 Feb 02.

Article in English | MEDLINE | ID: mdl-35107434

ABSTRACT

BACKGROUND: Health-related social media data are increasingly being used in disease surveillance studies. In particular, surveillance of infectious diseases such as influenza has demonstrated high correlations between the number of social media posts mentioning the disease and the number of patients who went to the hospital and were diagnosed with the disease. However, the prevalence of some diseases, such as allergic rhinitis, cannot be estimated based on the number of patients alone. Specifically, individuals with allergic rhinitis typically self-medicate by taking over-the-counter (OTC) medications without going to the hospital. Although allergic rhinitis is not a life-threatening disease, it represents a major social problem because it reduces people's quality of life, making it essential to understand its prevalence and people's motives for self-medication behavior. OBJECTIVE: This study aims to explore the relationship between the number of social media posts mentioning the main symptoms of allergic rhinitis and the sales volume of OTC rhinitis medications in Japan. METHODS: We collected tweets over 4 years (from 2017 to 2020) that included keywords corresponding to the main nasal symptoms of allergic rhinitis: "sneezing," "runny nose," and "stuffy nose." We also obtained the sales volume of OTC drugs, including oral medications and nasal sprays, for the same period. We then calculated the Pearson correlation coefficient between time series data on the number of tweets per week and time series data on the sales volume of OTC drugs per week. RESULTS: The results showed a much higher correlation (r=0.8432) between the time series data on the number of tweets mentioning "stuffy nose" and the time series data on the sales volume of nasal sprays than for the other two symptoms. There was also a high correlation (r=0.9317) between the seasonal components of these time series data. CONCLUSIONS: We investigated the relationships between social media data and behavioral patterns, such as OTC drug sales volume. Exploring these relationships can help us understand the prevalence of allergic rhinitis and the motives for self-care treatment using social media data, which would be useful as a marketing indicator to reduce the number of out-of-stocks in stores, provide (sell) rhinitis medicines to consumers in a stable manner, and reduce the loss of sales opportunities. In the future, in-depth investigations are required to estimate sales volume using social media data, and future research could investigate other diseases and countries.
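The Pearson correlation between two weekly time series, as described in the METHODS above, can be sketched as follows. The function name and the weekly figures are invented for illustration and are not taken from the study; the sketch also omits the seasonal decomposition that the abstract mentions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up weekly counts: tweets mentioning a symptom and units of nasal spray sold.
weekly_tweets = [120, 150, 310, 480, 260, 140]
weekly_sales = [30, 34, 70, 95, 60, 33]

r = pearson_r(weekly_tweets, weekly_sales)
print(f"r = {r:.4f}")
```

Both series must be aligned to the same weeks before the coefficient is computed; a shift of even one week between the two series would change the result.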

16.

Medical Needs Extraction for Breast Cancer Patients from Question and Answer Services: Natural Language Processing-Based Approach.

Kamba, Masaru; Manabe, Masae; Wakamiya, Shoko; Yada, Shuntaro; Aramaki, Eiji; Odani, Satomi; Miyashiro, Isao.

JMIR Cancer ; 7(4): e32005, 2021 Oct 28.

Article in English | MEDLINE | ID: mdl-34709187

ABSTRACT

BACKGROUND: A large number of patient narratives are available on various web services. As for web question and answer services, patient questions often relate to medical needs, and we expect these questions to provide clues for a better understanding of patients' medical needs. OBJECTIVE: This study aimed to extract patients' needs and classify them into thematic categories. Clarifying patient needs is the first step in solving social issues that patients with cancer encounter. METHODS: For this study, we used patient question texts containing the key phrase "breast cancer," available at the Yahoo! Japan question and answer service, Yahoo! Chiebukuro, which contains over 60,000 questions on cancer. First, we converted the question text into a vector representation. Next, the relevance between patient needs and existing cancer needs categories was calculated based on cosine similarity. RESULTS: The proportion of correct classifications in our proposed method was approximately 70%. Considering the results of classifying questions, we found the variation and the number of needs. CONCLUSIONS: We created 3 corpora to classify the problems of patients with cancer. The proposed method was able to classify the problems considering the question text. Moreover, as an application example, the question text that included the side effect signaling of drugs and the unmet needs of cancer patients could be extracted. Revealing these needs is important to fulfill the medical needs of patients with cancer.
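Assigning a question to the needs category with the highest cosine similarity can be sketched as below. The abstract does not specify how the vectors were built or what the categories were, so the three-dimensional vectors and the category names here are hypothetical placeholders.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(question_vec, category_vecs):
    """Return the name of the category whose vector is most similar."""
    return max(
        category_vecs,
        key=lambda name: cosine_similarity(question_vec, category_vecs[name]),
    )


# Hypothetical category vectors; real ones would come from a text embedding.
categories = {
    "treatment": [0.9, 0.1, 0.0],
    "side_effects": [0.1, 0.9, 0.2],
    "cost": [0.0, 0.2, 0.9],
}
question = [0.2, 0.8, 0.1]  # made-up vector for one patient question

label = classify(question, categories)
print(label)
```

Because cosine similarity ignores vector length, a long question and a short question about the same topic can still map to the same category.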

17.

Estimation of Psychological Distress in Japanese Youth Through Narrative Writing: Text-Based Stylometric and Sentiment Analyses.

Manabe, Masae; Liew, Kongmeng; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

JMIR Form Res ; 5(8): e29500, 2021 Aug 12.

Article in English | MEDLINE | ID: mdl-34387556

ABSTRACT

BACKGROUND: Internalizing mental illnesses associated with psychological distress are often underdetected. Text-based detection using natural language processing (NLP) methods is increasingly being used to complement conventional detection efforts. However, these approaches often rely on self-disclosure through autobiographical narratives that may not always be possible, especially in the context of the collectivistic Japanese culture. OBJECTIVE: We propose the use of narrative writing as an alternative resource for mental illness detection in youth. Accordingly, in this study, we investigated the textual characteristics of narratives written by youth with psychological distress; our research focuses on the detection of psychopathological tendencies in written imaginative narratives. METHODS: Using NLP tools such as stylometric measures and lexicon-based sentiment analysis, we examined short narratives from 52 Japanese youth (mean age 19.8 years, SD 3.1) obtained through crowdsourcing. Participants wrote a short narrative introduction to an imagined story before completing a questionnaire to quantify their tendencies toward psychological distress. Based on this score, participants were categorized into higher distress and lower distress groups. The written narratives were then analyzed using NLP tools and examined for between-group differences. Although outside the scope of this study, we also carried out a supplementary analysis of narratives written by adults using the same procedure. RESULTS: Youth demonstrating higher tendencies toward psychological distress used significantly more positive (happiness-related) words, revealing differences in valence of the narrative content. No other significant differences were observed between the high and low distress groups. CONCLUSIONS: Youth with tendencies toward mental illness were found to write more positive stories that contained more happiness-related terms. These results may potentially have widespread implications on psychological distress screening on online platforms, particularly in cultures such as Japan that are not accustomed to self-disclosure. Although the mechanisms that we propose in explaining our results are speculative, we believe that this interpretation paves the way for future research in online surveillance and detection efforts.
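A lexicon-based sentiment measure of the kind mentioned in the METHODS can be sketched as the share of tokens in a narrative that appear in a positive-word list, averaged per group. The lexicon, the English toy narratives, and the function names are invented for illustration; the study analyzed Japanese text, and the abstract does not name its lexicon, tokenizer, or significance test, none of which is reproduced here.

```python
# Hypothetical positive-word lexicon (a real one would be far larger).
POSITIVE_LEXICON = {"happy", "joy", "smile", "laugh"}


def positive_ratio(tokens):
    """Fraction of tokens found in the positive lexicon (0.0 for empty input)."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in POSITIVE_LEXICON)
    return hits / len(tokens)


def group_mean(narratives):
    """Mean positive-word ratio across a group of tokenized narratives."""
    ratios = [positive_ratio(tokens) for tokens in narratives]
    return sum(ratios) / len(ratios)


# Toy, pre-tokenized narratives; these are not data from the study.
group_a = [
    ["she", "felt", "happy", "and", "began", "to", "smile"],
    ["joy", "filled", "the", "room"],
]
group_b = [
    ["the", "train", "left", "the", "station"],
    ["he", "began", "to", "laugh", "quietly"],
]

print(f"group A mean: {group_mean(group_a):.3f}")
print(f"group B mean: {group_mean(group_b):.3f}")
```

A between-group comparison would then apply a statistical test to the per-narrative ratios; the choice of test is left open here because the abstract does not state it.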

18.

Semantic Textual Similarity in Japanese Clinical Domain Texts Using BERT.

Mutinda, Faith Wavinya; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

Methods Inf Med ; 60(S 01): e56-e64, 2021 Jun.

Article in English | MEDLINE | ID: mdl-34237783

ABSTRACT

BACKGROUND: Semantic textual similarity (STS) captures the degree of semantic similarity between texts. It plays an important role in many natural language processing applications such as text summarization, question answering, machine translation, information retrieval, dialog systems, plagiarism detection, and query ranking. STS has been widely studied in the general English domain. However, there exist few resources for STS tasks in the clinical domain and in languages other than English, such as Japanese. OBJECTIVE: The objective of this study is to capture semantic similarity between Japanese clinical texts (Japanese clinical STS) by creating a Japanese dataset that is publicly available. MATERIALS: We created two datasets for Japanese clinical STS: (1) Japanese case reports (CR dataset) and (2) Japanese electronic medical records (EMR dataset). The CR dataset was created from publicly available case reports extracted from the CiNii database. The EMR dataset was created from Japanese electronic medical records. METHODS: We used an approach based on bidirectional encoder representations from transformers (BERT) to capture the semantic similarity between the clinical domain texts. BERT is a popular approach for transfer learning and has been proven to be effective in achieving high accuracy for small datasets. We implemented two Japanese pretrained BERT models: a general Japanese BERT and a clinical Japanese BERT. The general Japanese BERT is pretrained on Japanese Wikipedia texts while the clinical Japanese BERT is pretrained on Japanese clinical texts. RESULTS: The BERT models performed well in capturing semantic similarity in our datasets. The general Japanese BERT outperformed the clinical Japanese BERT and achieved a high correlation with human score (0.904 in the CR dataset and 0.875 in the EMR dataset). It was unexpected that the general Japanese BERT outperformed the clinical Japanese BERT on the clinical domain datasets. This could be due to the fact that the general Japanese BERT is pretrained on a wider range of texts than the clinical Japanese BERT.

Subject(s)

Natural Language Processing , Semantics , Electronic Health Records , Humans , Information Storage and Retrieval , Japan

19.

Measuring Public Concern About COVID-19 in Japanese Internet Users Through Search Queries: Infodemiological Study.

Gao, Zhiwei; Fujita, Sumio; Shimizu, Nobuyuki; Liew, Kongmeng; Murayama, Taichi; Yada, Shuntaro; Wakamiya, Shoko; Aramaki, Eiji.

JMIR Public Health Surveill ; 7(7): e29865, 2021 Jul 20.

Article in English | MEDLINE | ID: mdl-34174781

ABSTRACT

BACKGROUND: COVID-19 has disrupted lives and livelihoods and caused widespread panic worldwide. Emerging reports suggest that people living in rural areas in some countries are more susceptible to COVID-19. However, there is a lack of quantitative evidence that can shed light on whether residents of rural areas are more concerned about COVID-19 than residents of urban areas. OBJECTIVE: This infodemiology study investigated attitudes toward COVID-19 in different Japanese prefectures by aggregating and analyzing Yahoo! JAPAN search queries. METHODS: We measured COVID-19 concerns in each Japanese prefecture by aggregating search counts of COVID-19-related queries of Yahoo! JAPAN users and data related to COVID-19 cases. We then defined two indices, the localized concern index (LCI) and the localized concern index by patient percentage (LCIPP), to quantitatively represent the degree of concern. To investigate the impact of emergency declarations on people's concerns, we divided our study period into three phases according to the timing of the state of emergency in Japan: before, during, and after. In addition, we evaluated the relationship between the LCI and LCIPP in different prefectures by correlating them with prefecture-level indicators of urbanization. RESULTS: Our results demonstrated that the concerns about COVID-19 in the prefectures changed in accordance with the declaration of the state of emergency. The correlation analyses also indicated that the differentiated types of public concern measured by the LCI and LCIPP reflect the prefectures' level of urbanization to a certain extent (ie, the LCI appears to be more suitable for quantifying COVID-19 concern in urban areas, while the LCIPP seems to be more appropriate for rural areas). CONCLUSIONS: We quantitatively defined Japanese Yahoo users' concerns about COVID-19 by using the search counts of COVID-19-related search queries. Our results also showed that the LCI and LCIPP have external validity.

Subject(s)

Anxiety/epidemiology , Attitude to Health , COVID-19/psychology , Internet/statistics & numerical data , Search Engine/statistics & numerical data , Adult , Aged , COVID-19/epidemiology , Female , Humans , Japan/epidemiology , Male , Middle Aged , Rural Population/statistics & numerical data , Urban Population/statistics & numerical data

20.

Modeling the spread of fake news on Twitter.

Murayama, Taichi; Wakamiya, Shoko; Aramaki, Eiji; Kobayashi, Ryota.

PLoS One ; 16(4): e0250419, 2021.

Article in English | MEDLINE | ID: mdl-33886665

ABSTRACT

Fake news can have a significant negative impact on society because of the growing use of mobile devices and the worldwide increase in Internet access. It is therefore essential to develop a simple mathematical model to understand the online dissemination of fake news. In this study, we propose a point process model of the spread of fake news on Twitter. The proposed model describes the spread of a fake news item as a two-stage process: initially, fake news spreads as a piece of ordinary news; then, when most users start recognizing the falsity of the news item, that itself spreads as another news story. We validate this model using two datasets of fake news items spread on Twitter. We show that the proposed model is superior to the current state-of-the-art methods in accurately predicting the evolution of the spread of a fake news item. Moreover, a text analysis suggests that our model appropriately infers the correction time, i.e., the moment when Twitter users start realizing the falsity of the news item. The proposed model contributes to understanding the dynamics of the spread of fake news on social media. Its ability to extract a compact representation of the spreading pattern could be useful in the detection and mitigation of fake news.

Subject(s)

Deception , Information Dissemination/methods , Models, Theoretical , Social Media , Data Mining , Humans , Smartphone , Time Factors
