Natural language processing of Reddit data to evaluate dermatology patient experiences and therapeutics.
Okon, Edidiong; Rachakonda, Vishnutheja; Hong, Hyo Jung; Callison-Burch, Chris; Lipoff, Jules B.
Affiliation
  • Okon E; School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania.
  • Rachakonda V; School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania.
  • Hong HJ; Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania.
  • Callison-Burch C; School of Engineering, University of Pennsylvania, Philadelphia, Pennsylvania.
  • Lipoff JB; Department of Dermatology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania. Electronic address: jules.lipoff@pennmedicine.upenn.edu.
J Am Acad Dermatol; 83(3): 803-808, 2020 Sep.
Article in En | MEDLINE | ID: mdl-31306722
ABSTRACT

BACKGROUND:

Little research has examined patient-generated data on Reddit, one of the world's most popular forums, which has active users interested in dermatology. Natural language processing, a field of artificial intelligence, offers techniques that can analyze large amounts of text and extract insights.

OBJECTIVE:

To apply natural language processing to Reddit comments about dermatology topics to assess for feasibility and potential for insights and engagement.

METHODS:

A software pipeline preprocessed Reddit comments posted between 2005 and 2017 in 7 popular dermatology-related subforums, applied latent Dirichlet allocation, and used spectral clustering to establish cohesive themes. Within these topics, it also measured the frequency of individual words and of grouped terms.
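The preprocessing and topic-modeling steps described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the stop-word list, the sample comments, the hyperparameters, and the function names `preprocess`, `fit_lda`, and `top_words` are all assumptions made for illustration, and the spectral clustering step is omitted. The sketch fits latent Dirichlet allocation with collapsed Gibbs sampling, one common way to estimate the model, using only the Python standard library.

```python
import random
import re
from collections import Counter

# Hypothetical stop-word list; the abstract does not say which was used.
STOP_WORDS = {"the", "a", "an", "and", "or", "but", "to", "of", "my", "i",
              "it", "is", "are", "for", "on", "in", "with", "so", "has"}

def preprocess(comment):
    """Lowercase a comment, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [t for t in tokens if t not in STOP_WORDS]

def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return topic-word and doc-topic counts."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens per topic in each doc
    topic_word = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_total = [0] * n_topics                       # total tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return topic_word, doc_topic

def top_words(topic_word, n):
    """Return up to n most frequent words (with nonzero count) for each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]

# Invented toy comments, not taken from the study's corpus.
comments = [
    "Isotretinoin cleared my acne but my lips are so dry",
    "Acne scars faded after isotretinoin",
    "My eczema flares in winter and moisturizer helps",
    "Eczema itch is worse at night, moisturizer helps a little",
]
docs = [preprocess(c) for c in comments]
topic_word, doc_topic = fit_lda(docs, n_topics=2)
for k, words in enumerate(top_words(topic_word, 4)):
    print(f"topic {k}: {words}")
```

Because the model is unsupervised and the sampler is stochastic, the topics found on such a small toy corpus vary with the seed and carry no labels; interpreting them is left to the reader, as with any LDA output.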

RESULTS:

We created a corpus of 176,000 comments and identified trends in patient engagement in spaces such as eczema and acne, among others, with a focus on homeopathic treatments and isotretinoin.

LIMITATIONS:

Latent Dirichlet allocation is an unsupervised model, meaning there is no ground truth against which the model output can be compared. However, because these forums are anonymous, there seems to be little incentive for patients to be dishonest.

CONCLUSIONS:

Reddit data has viability and utility for dermatologic research and engagement with the public, especially for common dermatology topics such as tanning, acne, and psoriasis.

Full text: 1 Database: MEDLINE Main subject: Natural Language Processing / Dermatology / Social Media / Patient Outcome Assessment Type of study: Prognostic_studies Limits: Humans Language: En Journal: J Am Acad Dermatol Year: 2020 Type: Article