
Applying Sparse Machine Learning Methods to Twitter: Analysis of the 2012 Change in Pap Smear Guidelines. A Sequential Mixed-Methods Study.

Lyles, Courtney Rees; Godbehere, Andrew; Le, Gem; El Ghaoui, Laurent; Sarkar, Urmimala.

JMIR Public Health Surveill ; 2(1): e21, 2016 Jun 10.

Article in English | MEDLINE | ID: mdl-27288093

ABSTRACT

BACKGROUND: It is difficult to synthesize the vast amount of textual data available from social media websites. Capturing real-world discussions via social media could provide insights into individuals' opinions and decision-making processes.

OBJECTIVE: We conducted a sequential mixed-methods study to determine the utility of sparse machine learning techniques in summarizing Twitter dialogues. We chose a narrowly defined topic for this approach: cervical cancer discussions over a 6-month period surrounding a change in Pap smear screening guidelines.

METHODS: We applied statistical methodologies known as sparse machine learning algorithms to summarize Twitter messages about cervical cancer before and after the 2012 change in Pap smear screening guidelines by the US Preventive Services Task Force (USPSTF). All messages containing the search terms "cervical cancer," "Pap smear," and "Pap test" were analyzed during two periods: (1) January 1-March 13, 2012, and (2) March 14-June 30, 2012. Topic modeling was used to discern the most common topics from each time period and to determine the singular value criterion for each topic. The top 10 relevant topics were then qualitatively coded to determine how efficiently the clustering method grouped distinct ideas and how the discussion differed before vs. after the change in guidelines.

RESULTS: This machine learning method was effective in grouping the relevant discussion topics about cervical cancer during the respective time periods (~20% overall irrelevant content in both time periods). Qualitative analysis determined that a significant portion of the top discussion topics in the second time period directly reflected the USPSTF guideline change (eg, "New Screening Guidelines for Cervical Cancer"), and many topics in both time periods addressed basic screening promotion and education (eg, "It is Cervical Cancer Awareness Month! Click the link to see where you can receive a free or low cost Pap test.").

CONCLUSIONS: Machine learning tools can be useful for summarizing cervical cancer prevention and screening discussions on Twitter. This method showed that a significant amount of information about cervical cancer screening is publicly available on social media sites. Moreover, we observed a direct impact of the guideline change within the Twitter messages.
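
The abstract does not name the specific sparse algorithm, so the following is only a minimal sketch of one common way to extract a short keyword "topic" together with a singular-value score: a truncated power iteration that keeps the k largest entries of the leading right singular vector of a document-by-term count matrix. The messages, the function names, and the choice of k are all hypothetical illustrations, not the authors' data or implementation.

```python
import math
from collections import Counter

# Hypothetical toy messages; the study's tweets are not reproduced here.
TWEETS = [
    "new screening guidelines cervical cancer",
    "new guidelines pap test screening",
    "cervical cancer awareness month free pap test",
    "lunch was great today",
]

def build_matrix(docs):
    """Return (sorted vocabulary, document-by-term count matrix)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for word, count in Counter(d.lower().split()).items():
            row[index[word]] = float(count)
        rows.append(row)
    return vocab, rows

def keep_top_k(vec, k):
    """Zero all but the k largest-magnitude entries, then scale to unit length."""
    keep = sorted(range(len(vec)), key=lambda j: abs(vec[j]), reverse=True)[:k]
    out = [0.0] * len(vec)
    for j in keep:
        out[j] = vec[j]
    norm = math.sqrt(sum(x * x for x in out))
    return [x / norm for x in out] if norm > 0 else out

def sparse_topic(docs, k, iters=20):
    """Approximate a k-sparse leading right singular vector of the count matrix."""
    vocab, rows = build_matrix(docs)
    n = len(vocab)
    v = keep_top_k([1.0] * n, n)  # dense, uniform starting vector
    for _ in range(iters):
        u = [sum(r[j] * v[j] for j in range(n)) for r in rows]            # u = A v
        w = [sum(rows[i][j] * u[i] for i in range(len(rows))) for j in range(n)]  # w = A^T u
        v = keep_top_k(w, k)                                               # enforce sparsity
    u = [sum(r[j] * v[j] for j in range(n)) for r in rows]
    sigma = math.sqrt(sum(x * x for x in u))  # singular-value estimate ||A v||
    terms = [vocab[j] for j in range(n) if v[j] != 0.0]
    return terms, sigma

terms, sigma = sparse_topic(TWEETS, k=4)
print(terms, round(sigma, 3))
```

In this toy input the off-topic message contributes nothing to the selected terms, which mirrors the idea of separating relevant discussion from irrelevant content. Further topics could be sketched by removing the selected terms and repeating, but that step is likewise an assumption rather than something the abstract describes.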

Classification of a large microarray data set: algorithm comparison and analysis of drug signatures.

Natsoulis, Georges; El Ghaoui, Laurent; Lanckriet, Gert R G; Tolley, Alexander M; Leroy, Fabrice; Dunlea, Shane; Eynon, Barrett P; Pearson, Cecelia I; Tugendreich, Stuart; Jarnagin, Kurt.

Genome Res ; 15(5): 724-36, 2005 May.

Article in English | MEDLINE | ID: mdl-15867433

ABSTRACT

A large gene expression database has been produced that characterizes the gene expression and physiological effects of hundreds of approved and withdrawn drugs, toxicants, and biochemical standards in various organs of live rats. In order to derive useful biological knowledge from this large database, a variety of supervised classification algorithms were compared using a 597-microarray subset of the data. Our studies show that several types of linear classifiers based on Support Vector Machines (SVMs) and Logistic Regression can be used to derive readily interpretable drug signatures with high classification performance. Both methods can be tuned to produce classifiers of drug treatments in the form of short, weighted gene lists which upon analysis reveal that some of the signature genes have a positive contribution (act as "rewards" for the class-of-interest) while others have a negative contribution (act as "penalties") to the classification decision. The combination of reward and penalty genes enhances performance by keeping the number of false positive treatments low. The results of these algorithms are combined with feature selection techniques that further reduce the length of the drug signatures, an important step towards the development of useful diagnostic biomarkers and low-cost assays. Multiple signatures with no genes in common can be generated for the same classification end-point. Comparison of these gene lists identifies biological processes characteristic of a given class.
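
As a rough illustration of how a linear classifier can yield a short, weighted gene list with "reward" and "penalty" genes, the sketch below fits a logistic regression with an L1 penalty using proximal gradient descent. The L1 penalty, the solver, the gene names, and the expression values are all assumptions chosen for illustration; the abstract does not state how the authors tuned their classifiers, and no intercept is fitted here to keep the example short.

```python
import math

GENES = ["gene_a", "gene_b", "gene_c"]  # hypothetical gene names

# Hypothetical log-ratio expression values, one row per treatment.
X = [
    [1.0, 0.0, 0.04],   # treatments in the class of interest
    [1.0, 0.0, -0.03],
    [1.0, 0.0, 0.02],
    [1.0, 1.0, -0.04],  # look-alike treatments outside the class
    [1.0, 1.0, 0.03],
    [0.0, 0.0, -0.02],  # unrelated treatment
]
y = [1, 1, 1, 0, 0, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink x toward zero by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def fit_l1_logistic(X, y, lam=0.05, lr=0.5, iters=3000):
    """Minimise mean logistic loss + lam * ||w||_1 by proximal gradient steps."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Residuals p_i - y_i under the current weights.
        resid = [
            sigmoid(sum(wj * xj for wj, xj in zip(w, row))) - yi
            for row, yi in zip(X, y)
        ]
        grad = [sum(resid[i] * X[i][j] for i in range(n)) / n for j in range(d)]
        w = [soft_threshold(w[j] - lr * grad[j], lr * lam) for j in range(d)]
    return w

weights = fit_l1_logistic(X, y)
# The "signature" is the list of genes whose weight was not shrunk to zero.
signature = {g: w for g, w in zip(GENES, weights) if w != 0.0}
print(signature)
```

In this toy data, gene_a is elevated both in the class of interest and in the look-alike treatments, so a positive weight on it alone would produce false positives; the negative weight on gene_b offsets it for the look-alikes, while the uninformative gene_c is dropped from the signature.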

Subject(s)

Algorithms; Classification/methods; Gene Expression Regulation; Oligonucleotide Array Sequence Analysis/methods; Oligonucleotide Array Sequence Analysis/standards; Pharmaceutical Preparations/metabolism; RNA, Messenger/isolation & purification; Animals; Bone Marrow/metabolism; Dose-Response Relationship, Drug; Kidney/metabolism; Liver/metabolism; Logistic Models; Male; Myocardium/metabolism; Principal Component Analysis; Rats; Rats, Sprague-Dawley; Reproducibility of Results
