Search | Biblioteca Virtual em Saúde

1.

Exploring X: barriers to care for eosinophilic esophagitis.

Thanawala, Shivani U; Klein, Ari; Raval, Krish; Amaro, Jesus Ivan Flores; Beveridge, Claire A; Muir, Amanda B; Falk, Gary W; Gonzalez-Hernandez, Graciela; Lynch, Kristle L.

Dis Esophagus ; 2024 May 14.

Article in English | MEDLINE | ID: mdl-38745432

ABSTRACT

Patients with chronic diseases have increasingly turned to social media to discuss symptoms and share the challenges they face with disease management. The primary aim of this study is to use naturally occurring data from X (formerly known as Twitter) to identify barriers to care faced by individuals affected by eosinophilic esophagitis (EoE). For this qualitative study, the X application programming interface with academic research access was used to search for posts that referenced EoE between 1 January 2019 and 10 August 2022. The posts were identified as being either related to barriers to care for EoE or not. Those related to barriers to care were further categorized by the type of barrier that was expressed. A total of 8636 EoE-related posts were annotated of which 12.1% were related to barriers to care in EoE. The themes that emerged about barriers to care included: dietary challenges, limited treatment options, lack of community support, lack of physician awareness of disease, misinformation, cost of care, lack of patient belief in disease or trust in physician, and limited access to care. Saturation of themes was achieved. This study highlights barriers to care in EoE using readily accessible social media data that is not derived from a curated research setting. Identifying these obstacles is key to improving care for this chronic disease.

2.

Methods and Annotated Data Sets Used to Predict the Gender and Age of Twitter Users: Scoping Review.

O'Connor, Karen; Golder, Su; Weissenbacher, Davy; Klein, Ari Z; Magge, Arjun; Gonzalez-Hernandez, Graciela.

J Med Internet Res ; 26: e47923, 2024 Mar 15.

Article in English | MEDLINE | ID: mdl-38488839

ABSTRACT

BACKGROUND: Patient health data collected from a variety of nontraditional resources, commonly referred to as real-world data, can be a key information source for health and social science research. Social media platforms, such as Twitter (Twitter, Inc), offer vast amounts of real-world data. An important aspect of incorporating social media data in scientific research is identifying the demographic characteristics of the users who posted those data. Age and gender are considered key demographics for assessing the representativeness of the sample and enable researchers to study subgroups and disparities effectively. However, deciphering the age and gender of social media users poses challenges. OBJECTIVE: This scoping review aims to summarize the existing literature on the prediction of the age and gender of Twitter users and provide an overview of the methods used. METHODS: We searched 15 electronic databases and carried out reference checking to identify relevant studies that met our inclusion criteria: studies that predicted the age or gender of Twitter users using computational methods. The screening process was performed independently by 2 researchers to ensure the accuracy and reliability of the included studies. RESULTS: Of the initial 684 studies retrieved, 74 (10.8%) studies met our inclusion criteria. Among these 74 studies, 42 (57%) focused on predicting gender, 8 (11%) focused on predicting age, and 24 (32%) predicted a combination of both age and gender. Gender prediction was predominantly approached as a binary classification task, with the reported performance of the methods ranging from 0.58 to 0.96 F1-score or 0.51 to 0.97 accuracy. Age prediction approaches varied in terms of classification groups, with a higher range of reported performance, ranging from 0.31 to 0.94 F1-score or 0.43 to 0.86 accuracy. 
The heterogeneous nature of the studies and the reporting of dissimilar performance metrics made it challenging to quantitatively synthesize results and draw definitive conclusions. CONCLUSIONS: Our review found that although automated methods for predicting the age and gender of Twitter users have evolved to incorporate techniques such as deep neural networks, a significant proportion of the attempts rely on traditional machine learning methods, suggesting that there is potential to improve the performance of these tasks by using more advanced methods. Gender prediction has generally achieved a higher reported performance than age prediction. However, the lack of standardized reporting of performance metrics or standard annotated corpora to evaluate the methods used hinders any meaningful comparison of the approaches. Potential biases stemming from the collection and labeling of data used in the studies were identified as a problem, emphasizing the need for careful consideration and mitigation of biases in future studies. This scoping review provides valuable insights into the methods used for predicting the age and gender of Twitter users, along with the challenges and considerations associated with these methods.

Subjects

Social Media , Humans , Young Adult , Adult , Reproducibility of Results , Neural Networks, Computer , Machine Learning

3.

Social Media Posts on Statins: What Can We Learn About Patient Experiences and Perspectives?

Golder, Su; Klein, Ari; O'Connor, Karen; Wang, Yunwen; Gonzalez-Hernandez, Graciela.

J Am Heart Assoc ; 13(7): e033992, 2024 Apr 02.

Article in English | MEDLINE | ID: mdl-38533982

Subjects

COVID-19 , Hydroxymethylglutaryl-CoA Reductase Inhibitors , Social Media , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects , SARS-CoV-2 , Patient Outcome Assessment

4.

Using Longitudinal Twitter Data for Digital Epidemiology of Childhood Health Outcomes: An Annotated Data Set and Deep Neural Network Classifiers.

Klein, Ari Z; Gutiérrez Gómez, José Agustín; Levine, Lisa D; Gonzalez-Hernandez, Graciela.

J Med Internet Res ; 26: e50652, 2024 Mar 25.

Article in English | MEDLINE | ID: mdl-38526542

ABSTRACT

We manually annotated 9734 tweets that were posted by users who reported their pregnancy on Twitter, and used them to train, evaluate, and deploy deep neural network classifiers (F1-score=0.93) to detect tweets that report having a child with attention-deficit/hyperactivity disorder (678 users), autism spectrum disorders (1744 users), delayed speech (902 users), or asthma (1255 users), demonstrating the potential of Twitter as a complementary resource for assessing associations between pregnancy exposures and childhood health outcomes on a large scale.

Subjects

Asthma , Autism Spectrum Disorder , Social Media , Child , Female , Pregnancy , Humans , Asthma/epidemiology , Neural Networks, Computer

5.

Overview of the 8th Social Media Mining for Health Applications (#SMM4H) shared tasks at the AMIA 2023 Annual Symposium.

Klein, Ari Z; Banda, Juan M; Guo, Yuting; Schmidt, Ana Lucia; Xu, Dongfang; Flores Amaro, Ivan; Rodriguez-Esteban, Raul; Sarker, Abeed; Gonzalez-Hernandez, Graciela.

J Am Med Inform Assoc ; 31(4): 991-996, 2024 Apr 03.

Article in English | MEDLINE | ID: mdl-38218723

ABSTRACT

OBJECTIVE: The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. In this paper, we present the annotated corpora, a technical summary of participants' systems, and the performance results. METHODS: The eighth iteration of the #SMM4H shared tasks was hosted at the AMIA 2023 Annual Symposium and consisted of 5 tasks that represented various social media platforms (Twitter and Reddit), languages (English and Spanish), methods (binary classification, multi-class classification, extraction, and normalization), and topics (COVID-19, therapies, social anxiety disorder, and adverse drug events). RESULTS: In total, 29 teams registered, representing 17 countries. In general, the top-performing systems used deep neural network architectures based on pre-trained transformer models. In particular, the top-performing systems for the classification tasks were based on single models that were pre-trained on social media corpora. CONCLUSION: To facilitate future work, the datasets (a total of 61,353 posts) will remain available by request, and the CodaLab sites will remain active for a post-evaluation phase.

Subjects

Social Media , Humans , Data Mining/methods , Neural Networks, Computer , Natural Language Processing , Machine Learning

6.

Association Between COVID-19 During Pregnancy and Preterm Birth by Trimester of Infection: A Retrospective Cohort Study Using Longitudinal Social Media Data.

Klein, Ari Z; Kunatharaju, Shriya; Golder, Su; Levine, Lisa D; Figueiredo, Jane C; Gonzalez-Hernandez, Graciela.

medRxiv ; 2023 Nov 21.

Article in English | MEDLINE | ID: mdl-38045356

ABSTRACT

Background: Preterm birth, defined as birth at <37 weeks of gestation, is the leading cause of neonatal death globally and, together with low birthweight, the second leading cause of infant mortality in the United States. There is mounting evidence that COVID-19 infection during pregnancy is associated with an increased risk of preterm birth; however, data remain limited by trimester of infection. The ability to study COVID-19 infection during the earlier stages of pregnancy has been limited by available sources of data. The objective of this study was to use self-reports in large-scale, longitudinal social media data to assess the association between trimester of COVID-19 infection and preterm birth. Methods: In this retrospective cohort study, we used natural language processing and machine learning, followed by manual validation, to identify pregnant Twitter users and to search their longitudinal collection of publicly available tweets for reports of COVID-19 infection during pregnancy and, subsequently, a preterm birth or term birth (i.e., a gestational age ≥37 weeks) outcome. Among the users who reported their pregnancy on Twitter, we also identified a 1:1 age-matched control group, consisting of users with a due date prior to January 1, 2020 (that is, without COVID-19 infection during pregnancy). We calculated the odds ratios (ORs) with 95% confidence intervals (CIs) to compare the overall rates of preterm birth for pregnancies with and without COVID-19 infection and by timing of infection: first trimester (weeks 1-13), second trimester (weeks 14-27), or third trimester (weeks 28-36). Results: Through August 2022, we identified 298 Twitter users who reported COVID-19 infection during pregnancy, a preterm birth or term birth outcome, and maternal age: 94 (31.5%) with first-trimester infection, 110 (36.9%) second-trimester infection, and 95 (31.9%) third-trimester infection. 
In total, 26 (8.8%) of these 298 users reported preterm birth: 8 (8.5%) were infected during the first trimester, 7 (6.4%) were infected during the second trimester, and 12 (12.6%) were infected during the third trimester. In the 1:1 age-matched control group, 13 (4.4%) of the 298 users reported preterm birth. Overall, the risk of preterm birth was significantly higher for pregnancies with COVID-19 infection compared to those without (OR 2.1, 95% CI 1.06-4.16). In particular, the risk of preterm birth was significantly higher for pregnancies with COVID-19 infection during the third trimester (OR 3.17, CI 1.39-7.21). Conclusion: The results of our study suggest that COVID-19 infection particularly during the third trimester is associated with an increased risk of preterm birth.
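The odds ratios reported above can be checked from the counts in the abstract. The sketch below assumes a standard Wald (log-scale) 95% confidence interval for an unmatched 2x2 table; the abstract does not state which method the authors used, and the function name `odds_ratio_wald_ci` is hypothetical. Under that assumption, the results agree with the reported values after rounding.

```python
import math


def odds_ratio_wald_ci(exposed_events, exposed_total,
                       control_events, control_total, z=1.96):
    """Odds ratio with a Wald confidence interval from 2x2 counts.

    Assumption: an unmatched analysis with a log-scale Wald interval;
    the source abstract does not specify the interval method.
    """
    a = exposed_events                    # exposed, preterm
    b = exposed_total - exposed_events    # exposed, term
    c = control_events                    # control, preterm
    d = control_total - control_events    # control, term
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Counts taken from the abstract: 26/298 infected vs 13/298 controls overall,
# and 12/95 for third-trimester infection vs the same 13/298 controls.
overall = odds_ratio_wald_ci(26, 298, 13, 298)
third_trimester = odds_ratio_wald_ci(12, 95, 13, 298)

for label, (or_, lo, hi) in [("overall", overall),
                             ("third trimester", third_trimester)]:
    print(f"{label}: OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

Because the control group was age-matched 1:1, a matched analysis could give somewhat different intervals; the unmatched calculation here is only a plausibility check.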

7.

Overview of the 8th Social Media Mining for Health Applications (#SMM4H) Shared Tasks at the AMIA 2023 Annual Symposium.

Klein, Ari Z; Banda, Juan M; Guo, Yuting; Schmidt, Ana Lucia; Xu, Dongfang; Amaro, Jesus Ivan Flores; Rodriguez-Esteban, Raul; Sarker, Abeed; Gonzalez-Hernandez, Graciela.

medRxiv ; 2023 Nov 08.

Article in English | MEDLINE | ID: mdl-37986776

ABSTRACT

The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. The eighth iteration of the #SMM4H shared tasks was hosted at the AMIA 2023 Annual Symposium and consisted of five tasks that represented various social media platforms (Twitter and Reddit), languages (English and Spanish), methods (binary classification, multi-class classification, extraction, and normalization), and topics (COVID-19, therapies, social anxiety disorder, and adverse drug events). In total, 29 teams registered, representing 18 countries. In this paper, we present the annotated corpora, a technical summary of the systems, and the performance results. In general, the top-performing systems used deep neural network architectures based on pre-trained transformer models. In particular, the top-performing systems for the classification tasks were based on single models that were pre-trained on social media corpora. To facilitate future work, the datasets (a total of 61,353 posts) will remain available by request, and the CodaLab sites will remain active for a post-evaluation phase.

8.

Text mining biomedical literature to identify extremely unbalanced data for digital epidemiology and systematic reviews: dataset and methods for a SARS-CoV-2 genomic epidemiology study.

Weissenbacher, Davy; O'Connor, Karen; Klein, Ari; Golder, Su; Flores, Ivan; Elyaderani, Amir; Scotch, Matthew; Gonzalez-Hernandez, Graciela.

medRxiv ; 2023 Aug 04.

Article in English | MEDLINE | ID: mdl-37577535

ABSTRACT

There are many studies that require researchers to extract specific information from the published literature, such as details about sequence records or about a randomized control trial. While manual extraction is cost efficient for small studies, larger studies such as systematic reviews are much more costly and time-consuming. To avoid exhaustive manual searches and extraction, and their related cost and effort, natural language processing (NLP) methods can be tailored for the more subtle extraction and decision tasks that typically only humans have performed. The need for such studies that use the published literature as a data source became even more evident as the COVID-19 pandemic raged through the world and millions of sequenced samples were deposited in public repositories such as GISAID and GenBank. These records promised large genomic epidemiology studies, but more often than not they lacked many important details, which prevented such studies. Thus, granular geographic location or the most basic patient-relevant data, such as demographic information or clinical outcomes, were not noted in the sequence record. However, some of these data were indeed published, but in the text, tables, or supplementary material of a corresponding published article. We present here methods to identify relevant journal articles that report having produced and made available in GenBank or GISAID new SARS-CoV-2 sequences, as the articles that initially produced and made available the sequences are the most likely to include the high-level details about the patients from whom the sequences were obtained. Human annotators validated the approach, creating a gold standard set for training and validation of a machine learning classifier. 
Identifying these articles is a crucial step to enable future automated informatics pipelines that will apply Machine Learning and Natural Language Processing to identify patient characteristics such as co-morbidities, outcomes, age, gender, and race, enriching SARS-CoV-2 sequence databases with actionable information for defining large genomic epidemiology studies. Thus, enriched patient metadata can enable secondary data analysis, at scale, to uncover associations between the viral genome (including variants of concern and their sublineages), transmission risk, and health outcomes. However, for such enrichment to happen, the right papers need to be found and very detailed data needs to be extracted from them. Further, finding the very specific articles needed for inclusion is a task that also facilitates scoping and systematic reviews, greatly reducing the time needed for full-text analysis and extraction.

9.

Automatically Identifying Self-Reports of COVID-19 Diagnosis on Twitter: An Annotated Data Set, Deep Neural Network Classifiers, and a Large-Scale Cohort.

Klein, Ari Z; Kunatharaju, Shriya; O'Connor, Karen; Gonzalez-Hernandez, Graciela.

J Med Internet Res ; 25: e46484, 2023 Jul 03.

Article in English | MEDLINE | ID: mdl-37399062

Subjects

COVID-19 , Social Media , Humans , Self Report , COVID-19 Testing , Neural Networks, Computer

10.

Pregex: Rule-Based Detection and Extraction of Twitter Data in Pregnancy.

Klein, Ari Z; Kunatharaju, Shriya; O'Connor, Karen; Gonzalez-Hernandez, Graciela.

J Med Internet Res ; 25: e40569, 2023 Feb 09.

Article in English | MEDLINE | ID: mdl-36757756

Subjects

Social Media , Female , Pregnancy , Humans , Data Mining , Natural Language Processing

11.

Automatically Identifying Twitter Users for Interventions to Support Dementia Family Caregivers: Annotated Data Set and Benchmark Classification Models.

Klein, Ari Z; Magge, Arjun; O'Connor, Karen; Gonzalez-Hernandez, Graciela.

JMIR Aging ; 5(3): e39547, 2022 Sep 16.

Article in English | MEDLINE | ID: mdl-36112408

ABSTRACT

BACKGROUND: More than 6 million people in the United States have Alzheimer disease and related dementias, receiving help from more than 11 million family or other informal caregivers. A range of traditional interventions has been developed to support family caregivers; however, most of them have not been implemented in practice and remain largely inaccessible. While recent studies have shown that family caregivers of people with dementia use Twitter to discuss their experiences, methods have not been developed to enable the use of Twitter for interventions. OBJECTIVE: The objective of this study is to develop an annotated data set and benchmark classification models for automatically identifying a cohort of Twitter users who have a family member with dementia. METHODS: Between May 4 and May 20, 2021, we collected 10,733 tweets, posted by 8846 users, that mention a dementia-related keyword, a linguistic marker that potentially indicates a diagnosis, and a select familial relationship. Three annotators annotated 1 random tweet per user to distinguish those that indicate having a family member with dementia from those that do not. Interannotator agreement was 0.82 (Fleiss kappa). We used the annotated tweets to train and evaluate support vector machine and deep neural network classifiers. To assess the scalability of our approach, we then deployed automatic classification on unlabeled tweets that were continuously collected between May 4, 2021, and March 9, 2022. RESULTS: A deep neural network classifier based on a BERT (bidirectional encoder representations from transformers) model pretrained on tweets achieved the highest F1-score of 0.962 (precision=0.946 and recall=0.979) for the class of tweets indicating that the user has a family member with dementia. The classifier detected 128,838 tweets that indicate having a family member with dementia, posted by 74,290 users between May 4, 2021, and March 9, 2022; that is, approximately 7500 users per month. 
CONCLUSIONS: Our annotated data set can be used to automatically identify Twitter users who have a family member with dementia, enabling the use of Twitter on a large scale to not only explore family caregivers' experiences but also directly target interventions at these users.
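The F1-score reported in the results is the harmonic mean of precision and recall. As a minimal sketch (the helper name `f1_score` is illustrative and not taken from the study), the reported figure can be recomputed from the precision and recall given in the abstract:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the positive class.
reported = f1_score(0.946, 0.979)
print(f"F1 = {reported:.3f}")
```

The harmonic mean penalizes imbalance between the two inputs, so a classifier cannot reach a high F1-score by maximizing recall alone.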

12.

Using Twitter Data for Cohort Studies of Drug Safety in Pregnancy: Proof-of-concept With β-Blockers.

Klein, Ari Z; O'Connor, Karen; Levine, Lisa D; Gonzalez-Hernandez, Graciela.

JMIR Form Res ; 6(6): e36771, 2022 Jun 30.

Article in English | MEDLINE | ID: mdl-35771614

ABSTRACT

BACKGROUND: Despite the fact that medication is taken during more than 90% of pregnancies, the fetal risk for most medications is unknown, and the majority of medications have no data regarding safety in pregnancy. OBJECTIVE: Using β-blockers as a proof-of-concept, the primary objective of this study was to assess the utility of Twitter data for a cohort study design; in particular, whether we could identify (1) Twitter users who have posted tweets reporting that they took medication during pregnancy and (2) their associated pregnancy outcomes. METHODS: We searched for mentions of β-blockers in 2.75 billion tweets posted by 415,690 users who announced their pregnancy on Twitter. We manually reviewed the matching tweets to first determine if the user actually took the β-blocker mentioned in the tweet. Then, to help determine if the β-blocker was taken during pregnancy, we used the time stamp of the tweet reporting intake and drew upon an automated natural language processing (NLP) tool that estimates the date of the user's prenatal time period. For users who posted tweets indicating that they took or may have taken the β-blocker during pregnancy, we drew upon additional NLP tools to help identify tweets that report their pregnancy outcomes. Adverse pregnancy outcomes included miscarriage, stillbirth, birth defects, preterm birth (<37 weeks gestation), low birth weight (<5 pounds and 8 ounces at delivery), and neonatal intensive care unit (NICU) admission. Normal pregnancy outcomes included gestational age ≥37 weeks and birth weight ≥5 pounds and 8 ounces. RESULTS: We retrieved 5114 tweets, posted by 2339 users, that mention a β-blocker, and manually identified 2332 (45.6%) tweets, posted by 1195 (51.1%) of the users, that self-report taking the β-blocker. We were able to estimate the date of the prenatal time period for 356 pregnancies among 334 (27.9%) of these 1195 users. 
Among these 356 pregnancies, we identified 257 (72.2%) during which the β-blocker was or may have been taken. We manually verified an adverse pregnancy outcome (preterm birth, NICU admission, low birth weight, birth defects, or miscarriage) for 38 (14.8%) of these 257 pregnancies. We manually verified a gestational age ≥37 weeks for 198 (90.4%) and a birth weight ≥5 pounds and 8 ounces for 50 (22.8%) of the 219 pregnancies for which we did not identify an adverse pregnancy outcome. CONCLUSIONS: Our ability to detect pregnancy outcomes for Twitter users who posted tweets reporting that they took or may have taken a β-blocker during pregnancy suggests that Twitter can be a complementary resource for cohort studies of drug safety in pregnancy.

13.

A chronological and geographical analysis of personal reports of COVID-19 on Twitter from the UK.

Golder, Su; Klein, Ari Z; Magge, Arjun; O'Connor, Karen; Cai, Haitao; Weissenbacher, Davy; Gonzalez-Hernandez, Graciela.

Digit Health ; 8: 20552076221097508, 2022.

Article in English | MEDLINE | ID: mdl-35574580

ABSTRACT

Objective: Given the uncertainty about the trends and extent of the rapidly evolving COVID-19 outbreak, and the lack of extensive testing in the United Kingdom, our understanding of COVID-19 transmission is limited. We proposed to use Twitter to identify personal reports of COVID-19 and to assess whether these reports can serve as a source of data to help us understand and model the transmission and trajectory of COVID-19. Methods: We used a natural language processing and machine learning framework. We collected tweets (excluding retweets) from the Twitter Streaming API that indicate that the user or a member of the user's household had been exposed to COVID-19. The tweets were required to be geo-tagged or have profile location metadata in the UK. Results: We identified a high level of agreement between personal reports from Twitter and lab-confirmed cases by geographical region in the UK. Temporal analysis indicated that personal reports from Twitter appear up to 2 weeks before UK government lab-confirmed cases are recorded. Conclusions: Analysis of tweets may indicate trends in COVID-19 in the UK and provide signals of geographical locations where resources may need to be targeted or where regional policies may need to be put in place to further limit the spread of COVID-19. It may also help inform policy makers of the restrictions in lockdown that are most effective or ineffective.

14.

Toward Using Twitter for PrEP-Related Interventions: An Automated Natural Language Processing Pipeline for Identifying Gay or Bisexual Men in the United States.

Klein, Ari Z; Meanley, Steven; O'Connor, Karen; Bauermeister, José A; Gonzalez-Hernandez, Graciela.

JMIR Public Health Surveill ; 8(4): e32405, 2022 Apr 25.

Article in English | MEDLINE | ID: mdl-35468092

ABSTRACT

BACKGROUND: Pre-exposure prophylaxis (PrEP) is highly effective at preventing the acquisition of HIV. There is a substantial gap, however, between the number of people in the United States who have indications for PrEP and the number of them who are prescribed PrEP. Although Twitter content has been analyzed as a source of PrEP-related data (eg, barriers), methods have not been developed to enable the use of Twitter as a platform for implementing PrEP-related interventions. OBJECTIVE: Men who have sex with men (MSM) are the population most affected by HIV in the United States. Therefore, the objectives of this study were to (1) develop an automated natural language processing (NLP) pipeline for identifying men in the United States who have reported on Twitter that they are gay, bisexual, or MSM and (2) assess the extent to which they demographically represent MSM in the United States with new HIV diagnoses. METHODS: Between September 2020 and January 2021, we used the Twitter Streaming Application Programming Interface (API) to collect more than 3 million tweets containing keywords that men may include in posts reporting that they are gay, bisexual, or MSM. We deployed handwritten, high-precision regular expressions (designed to filter out noise and identify actual self-reports) on the tweets and their user profile metadata. We identified 10,043 unique users geolocated in the United States and drew upon a validated NLP tool to automatically identify their ages. RESULTS: By manually distinguishing true- and false-positive self-reports in the tweets or profiles of 1000 (10%) of the 10,043 users identified by our automated pipeline, we established that our pipeline has a precision of 0.85. Among the 8756 users for whom a US state-level geolocation was detected, 5096 (58.2%) were in the 10 states with the highest numbers of new HIV diagnoses. 
Among the 6240 users for whom a county-level geolocation was detected, 4252 (68.1%) were in counties or states considered priority jurisdictions by the Ending the HIV Epidemic initiative. Furthermore, the age distribution of the users reflected that of MSM in the United States with new HIV diagnoses. CONCLUSIONS: Our automated NLP pipeline can be used to identify MSM in the United States who may be at risk of acquiring HIV, laying the groundwork for using Twitter on a large scale to directly target PrEP-related interventions at this population.

Subjects

HIV Infections , Sexual and Gender Minorities , Social Media , HIV Infections/epidemiology , HIV Infections/prevention & control , Homosexuality, Male , Humans , Male , Natural Language Processing , United States/epidemiology

15.

Adolescent Perceptions of Menstruation on Twitter: Opportunities for Advocacy and Education.

Davies, Shelby H; Langer, Miriam D; Klein, Ari; Gonzalez-Hernandez, Graciela; Dowshen, Nadia.

J Adolesc Health ; 71(1): 94-104, 2022 Jul.

Article in English | MEDLINE | ID: mdl-35283044

ABSTRACT

PURPOSE: While some adolescents celebrate menstruation as a rite of passage, others seek discretion due to stigma. Many youth have used Twitter to combat stigma and raise awareness about other culturally taboo topics, but previous work has not explored youth conversations regarding menstruation. This study aims to assess whether Twitter can provide useful insights into how youth perceive menstruation. METHODS: The team searched 162,316,839 tweets of 71,443 users of the age range 13-25 years in the Health Language Processing Twitter Youth Cohort for tweets that matched menstruation-related keywords: a pad, my pad, my period, her period, your period, tampon, diva cup, menstruate, that time of the month. Twelve codes emerged using a grounded theory approach and were sorted into three themes. RESULTS: Analysis was conducted on 10,000 tweets. Three themes emerged, including menstrual health, menstrual stigma, and menstrual positivity. Tweets related to menstrual health included physical complications, sexual/reproductive health, health education, and LGBTQ health. Tweets that addressed menstrual stigma included inconvenience/limitations, shame/stereotypes, religion/alternate perceptions, access/affordability, and self-depreciation/harm. Tweets related to menstrual positivity included awareness/community, strength/resilience, and environment/sustainability. DISCUSSION: This study provides insights into youth perceptions about menstruation. There was overwhelming emphasis placed on the negative expectations and shame around menstruation. A significant minority of tweets were directly or indirectly related to advocacy or education, which supports the potential use of Twitter as a platform to improve public health messaging, transform health outcomes, and promote equity among youth who menstruate.
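The keyword search described in the methods can be sketched as a case-insensitive regular expression over tweet text. The keyword list below is copied from the abstract; the matching rules (whole-word boundaries, case-insensitivity) and the names `KEYWORDS`, `PATTERN`, and `matches_keyword` are assumptions for illustration, since the abstract does not say how matching was implemented.

```python
import re

# Keywords as listed in the abstract.
KEYWORDS = [
    "a pad", "my pad", "my period", "her period", "your period",
    "tampon", "diva cup", "menstruate", "that time of the month",
]

# Assumption: match whole keywords only, ignoring case. The study's
# actual matching rules are not described in the abstract.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def matches_keyword(text):
    """Return True if the text contains any keyword as a whole phrase."""
    return PATTERN.search(text) is not None


# Hypothetical example texts, not drawn from the study's data.
examples = ["Got my period today", "I bought a paddle", "Need a TAMPON"]
for text in examples:
    print(text, "->", matches_keyword(text))
```

With strict word boundaries, inflected forms such as "tampons" are not matched; a production filter would need to decide whether to broaden the patterns at the cost of more noise.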

Subjects

Menstruation , Social Media , Adolescent , Child, Preschool , Female , Humans , Infant , Public Health , Reproductive Health , Social Stigma

16.

ReportAGE: Automatically extracting the exact age of Twitter users based on self-reports in tweets.

Klein, Ari Z; Magge, Arjun; Gonzalez-Hernandez, Graciela.

PLoS One ; 17(1): e0262087, 2022.

Article in English | MEDLINE | ID: mdl-35077484

ABSTRACT

Advancing the utility of social media data for research applications requires methods for automatically detecting demographic information about social media study populations, including users' age. The objective of this study was to develop and evaluate a method that automatically identifies the exact age of users based on self-reports in their tweets. Our end-to-end automatic natural language processing (NLP) pipeline, ReportAGE, includes query patterns to retrieve tweets that potentially mention an age, a classifier to distinguish retrieved tweets that self-report the user's exact age ("age" tweets) and those that do not ("no age" tweets), and rule-based extraction to identify the age. To develop and evaluate ReportAGE, we manually annotated 11,000 tweets that matched the query patterns. Based on 1000 tweets that were annotated by all five annotators, inter-annotator agreement (Fleiss' kappa) was 0.80 for distinguishing "age" and "no age" tweets, and 0.95 for identifying the exact age among the "age" tweets on which the annotators agreed. A deep neural network classifier, based on a RoBERTa-Large pretrained transformer model, achieved the highest F1-score of 0.914 (precision = 0.905, recall = 0.942) for the "age" class. When the age extraction was evaluated using the classifier's predictions, it achieved an F1-score of 0.855 (precision = 0.805, recall = 0.914) for the "age" class. When it was evaluated directly on the held-out test set, it achieved an F1-score of 0.931 (precision = 0.873, recall = 0.998) for the "age" class. We deployed ReportAGE on a collection of more than 1.2 billion tweets, posted by 245,927 users, and predicted ages for 132,637 (54%) of them. Scaling the detection of exact age to this large number of users can advance the utility of social media data for research applications that do not align with the predefined age groupings of extant binary or multi-class classification approaches.
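Inter-annotator agreement in this abstract is reported as Fleiss' kappa, which compares observed agreement among a fixed number of annotators with the agreement expected by chance. The following is a minimal sketch of the standard formula; the function name `fleiss_kappa` and the toy counts are hypothetical and are not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of per-item category counts.

    counts[i][j] is the number of annotators who assigned item i to
    category j; every row must sum to the same number of annotators.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Observed agreement: share of agreeing annotator pairs per item.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    return (p_observed - p_expected) / (1 - p_expected)


# Toy example: 4 tweets, 5 annotators, categories ("age", "no age").
toy_counts = [[5, 0], [4, 1], [0, 5], [1, 4]]
kappa = fleiss_kappa(toy_counts)
print(f"Fleiss' kappa = {kappa:.2f}")
```

The sketch does not guard against the degenerate case in which every annotation falls into a single category, where the expected agreement is 1 and kappa is undefined.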

Subjects

Data Collection/methods; Adolescent; Adult; Humans; Natural Language Processing; Neural Networks, Computer; Self Report; Social Media; Young Adult
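
As a quick consistency check on the evaluation figures in the abstract above, recall that the F1-score is the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` is our own and is not part of ReportAGE. It recomputes the F1-score for the age extraction on the held-out test set from the reported precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for age extraction on the held-out test set.
held_out_f1 = f1_score(0.873, 0.998)
print(round(held_out_f1, 3))  # 0.931
```

Because the published precision and recall are themselves rounded, a recomputed F1 can differ from a reported one in the last decimal place.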

17.

Toward Using Twitter Data to Monitor COVID-19 Vaccine Safety in Pregnancy: Proof-of-Concept Study of Cohort Identification.

Klein, Ari Z; O'Connor, Karen; Gonzalez-Hernandez, Graciela.

JMIR Form Res ; 6(1): e33792, 2022 Jan 06.

Article in English | MEDLINE | ID: mdl-34870607

ABSTRACT

BACKGROUND: COVID-19 during pregnancy is associated with an increased risk of maternal death, intensive care unit admission, and preterm birth; however, many people who are pregnant refuse to receive COVID-19 vaccination because of a lack of safety data. OBJECTIVE: The objective of this preliminary study was to assess whether Twitter data could be used to identify a cohort for epidemiologic studies of COVID-19 vaccination in pregnancy. Specifically, we examined whether it is possible to identify users who have reported (1) that they received COVID-19 vaccination during pregnancy or the periconception period, and (2) their pregnancy outcomes. METHODS: We developed regular expressions to search for reports of COVID-19 vaccination in a large collection of tweets posted through the beginning of July 2021 by users who have announced their pregnancy on Twitter. To help determine if users were vaccinated during pregnancy, we drew upon a natural language processing (NLP) tool that estimates the timeframe of the prenatal period. For users who posted tweets with a timestamp indicating they were vaccinated during pregnancy, we drew upon additional NLP tools to help identify tweets that reported their pregnancy outcomes. RESULTS: We manually verified the content of tweets detected automatically, identifying 150 users who reported on Twitter that they received at least one dose of COVID-19 vaccination during pregnancy or the periconception period. We manually verified at least one reported outcome for 45 of the 60 (75%) completed pregnancies. CONCLUSIONS: Given the limited availability of data on COVID-19 vaccine safety in pregnancy, Twitter can be a complementary resource for potentially increasing the acceptance of COVID-19 vaccination in pregnant populations. The results of this preliminary study justify the development of scalable methods to identify a larger cohort for epidemiologic studies.
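
The regular-expression search step described in the Methods above might look roughly like the following sketch. The pattern and the function name `mentions_own_vaccination` are hypothetical and written only for illustration; the abstract does not give the authors' actual expressions, which are likely broader and were followed by manual verification.

```python
import re

# Hypothetical pattern for illustration only; NOT the authors' expressions.
# It looks for a first-person report such as "I got my second dose of the
# Pfizer vaccine" or "I received the COVID-19 shot".
VACCINE_REPORT = re.compile(
    r"\bi\s+(?:got|received|had)\s+(?:my|the)\s+"
    r"(?:(?:first|second|1st|2nd)\s+)?"          # optional dose ordinal
    r"(?:dose\s+of\s+(?:the\s+)?)?"              # optional "dose of (the)"
    r"(?:(?:covid(?:-19)?|pfizer|moderna|j&j)\s+)?"  # optional vaccine name
    r"(?:vaccine|shot|jab)\b",
    re.IGNORECASE,
)


def mentions_own_vaccination(tweet: str) -> bool:
    """Return True if the tweet matches the first-person vaccination pattern."""
    return VACCINE_REPORT.search(tweet) is not None


print(mentions_own_vaccination("I got my second dose of the Pfizer vaccine today"))
print(mentions_own_vaccination("My doctor says the vaccine is safe"))
```

A pattern like this only retrieves candidate tweets. Deciding whether the vaccination fell within pregnancy still depends on the tweet's timestamp and the estimated prenatal timeframe, as the abstract describes.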

18.

Active neural networks to detect mentions of changes to medication treatment in social media.

Weissenbacher, Davy; Ge, Suyu; Klein, Ari; O'Connor, Karen; Gross, Robert; Hennessy, Sean; Gonzalez-Hernandez, Graciela.

J Am Med Inform Assoc ; 28(12): 2551-2561, 2021 Nov 25.

Article in English | MEDLINE | ID: mdl-34613417

ABSTRACT

OBJECTIVE: We address a first step toward using social media data to supplement current efforts in monitoring population-level medication nonadherence: detecting changes to medication treatment. Medication treatment changes, like changes to dosage or to frequency of intake, that are not overseen by physicians are, by definition, nonadherence to medication. Despite the consequences, including worsening health conditions or death, 50% of patients are estimated to not take medications as indicated. Current methods to identify nonadherence have major limitations. Direct observation may be intrusive or expensive, and indirect observation through patient surveys relies heavily on patients' memory and candor. Using social media data in these studies may address these limitations. METHODS: We annotated 9830 tweets mentioning medications and trained a convolutional neural network (CNN) to find mentions of medication treatment changes, regardless of whether the change was recommended by a physician. We used active and transfer learning from 12 972 reviews we annotated from WebMD to address the class imbalance of our Twitter corpus. To validate our CNN and explore future directions, we annotated 1956 positive tweets as to whether they reflect nonadherence and categorized the reasons given. RESULTS: Our CNN achieved 0.50 F1-score on this new corpus. The manual analysis of positive tweets revealed that nonadherence is evident in a subset with 9 categories of reasons for nonadherence. CONCLUSION: We showed that social media users publicly discuss medication treatment changes and may explain their reasons, including when a change constitutes nonadherence. This approach may be useful to supplement current efforts in adherence monitoring.

Subjects

Social Media; Humans; Medication Adherence; Neural Networks, Computer

19.

Toward Using Twitter for Tracking COVID-19: A Natural Language Processing Pipeline and Exploratory Data Set.

Klein, Ari Z; Magge, Arjun; O'Connor, Karen; Flores Amaro, Jesus Ivan; Weissenbacher, Davy; Gonzalez Hernandez, Graciela.

J Med Internet Res ; 23(1): e25314, 2021 Jan 22.

Article in English | MEDLINE | ID: mdl-33449904

ABSTRACT

BACKGROUND: In the United States, the rapidly evolving COVID-19 outbreak, the shortage of available testing, and the delay of test results present challenges for actively monitoring its spread based on testing alone. OBJECTIVE: The objective of this study was to develop, evaluate, and deploy an automatic natural language processing pipeline to collect user-generated Twitter data as a complementary resource for identifying potential cases of COVID-19 in the United States that are not based on testing and, thus, may not have been reported to the Centers for Disease Control and Prevention. METHODS: Beginning January 23, 2020, we collected English tweets from the Twitter Streaming application programming interface that mention keywords related to COVID-19. We applied handwritten regular expressions to identify tweets indicating that the user potentially has been exposed to COVID-19. We automatically filtered out "reported speech" (eg, quotations, news headlines) from the tweets that matched the regular expressions, and two annotators annotated a random sample of 8976 tweets that are geo-tagged or have profile location metadata, distinguishing tweets that self-report potential cases of COVID-19 from those that do not. We used the annotated tweets to train and evaluate deep neural network classifiers based on bidirectional encoder representations from transformers (BERT). Finally, we deployed the automatic pipeline on more than 85 million unlabeled tweets that were continuously collected between March 1 and August 21, 2020. RESULTS: Interannotator agreement, based on dual annotations for 3644 (41%) of the 8976 tweets, was 0.77 (Cohen κ). A deep neural network classifier, based on a BERT model that was pretrained on tweets related to COVID-19, achieved an F1-score of 0.76 (precision=0.76, recall=0.76) for detecting tweets that self-report potential cases of COVID-19. Upon deploying our automatic pipeline, we identified 13,714 tweets that self-report potential cases of COVID-19 and have US state-level geolocations. CONCLUSIONS: We have made the 13,714 tweets identified in this study, along with each tweet's time stamp and US state-level geolocation, publicly available to download. This data set presents the opportunity for future work to assess the utility of Twitter data as a complementary resource for tracking the spread of COVID-19.

Subjects

COVID-19/epidemiology; COVID-19/transmission; Datasets as Topic; Natural Language Processing; Social Media/statistics & numerical data; COVID-19/diagnosis; Disease Outbreaks/statistics & numerical data; Humans; Longitudinal Studies; SARS-CoV-2; Self Report; Speech; United States/epidemiology
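
The interannotator agreement in the abstract above is reported as Cohen κ, which corrects the observed agreement between two annotators for the agreement expected by chance. The sketch below shows the calculation on a small set of made-up binary labels; the data and the function name `cohen_kappa` are our own assumptions and do not come from the study.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)


# Made-up labels: 1 = self-report of a potential case, 0 = not a self-report.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]
print(round(cohen_kappa(annotator_1, annotator_2), 2))  # 0.58
```

In this toy example the annotators agree on 8 of 10 items (0.80), chance agreement is 0.52, and κ is (0.80 - 0.52) / (1 - 0.52), or about 0.58.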

20.

An annotated data set for identifying women reporting adverse pregnancy outcomes on Twitter.

Klein, Ari Z; Gonzalez-Hernandez, Graciela.

Data Brief ; 32: 106249, 2020 Oct.

Article in English | MEDLINE | ID: mdl-32944604

ABSTRACT

Despite the prevalence in the United States of miscarriage [1], stillbirth [2], and infant mortality associated with preterm birth and low birthweight [3], their causes remain largely unknown [4], [5], [6]. To advance the use of social media data as a complementary resource for epidemiology of adverse pregnancy outcomes, we present a data set of 6487 tweets that mention miscarriage, stillbirth, preterm birth or premature labor, low birthweight, neonatal intensive care, or fetal/infant loss in general. These tweets are a subset of 22,912 tweets retrieved by applying hand-written regular expressions to a database containing more than 400 million public tweets posted by more than 100,000 women who have announced their pregnancy on Twitter [7]. Two professional annotators labeled the 6487 tweets in a binary fashion, distinguishing those potentially reporting that the user has personally experienced the outcome ("outcome" tweets) from those that merely mention the outcome ("non-outcome" tweets). Inter-annotator agreement was κ = 0.90 (Cohen's kappa). The tweets annotated as "outcome" include 1318 women reporting miscarriage, 94 stillbirth, 591 preterm birth or premature labor, 171 low birthweight, 453 neonatal intensive care, and 356 fetal/infant loss in general. These "outcome" tweets can be used to explore patient experiences and perceptions of adverse pregnancy outcomes, and can direct researchers to the users' broader timelines (tweets posted by a user over time) for observational studies. Our past work demonstrates the analysis of timelines for selecting a study population [8] and conducting a case-control study [9] of users reporting that their child has a birth defect. For larger-scale studies, the full annotated corpus can be used to train supervised machine learning algorithms to automatically identify additional users reporting adverse pregnancy outcomes on Twitter. We used the annotated corpus to train feature-engineered and deep learning-based classifiers presented in "A natural language processing pipeline to advance the use of Twitter data for digital epidemiology of adverse pregnancy outcomes" [10].
