Results 1 - 20 of 32
1.
Proc Natl Acad Sci U S A ; 121(9): e2313925121, 2024 Feb 27.
Article in English | MEDLINE | ID: mdl-38386710

ABSTRACT

We administer a Turing test to AI chatbots. We examine how chatbots behave in a suite of classic behavioral games designed to elicit characteristics such as trust, fairness, risk-aversion, and cooperation, as well as how they respond to a traditional Big-5 psychological survey that measures personality traits. ChatGPT-4 exhibits behavioral and personality traits that are statistically indistinguishable from those of a random human drawn from a pool of tens of thousands of subjects from more than 50 countries. Chatbots also modify their behavior based on previous experience and context, "as if" they were learning from the interactions, and change their behavior in response to different framings of the same strategic situation. When their behaviors do differ from average and modal human behaviors, they tend to fall on the more altruistic and cooperative end of the distribution. We estimate that they act as if they are maximizing an average of their own and their partner's payoffs.
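As a toy illustration of the final claim, the sketch below chooses a dictator-style split that maximizes a weighted average of own and partner payoffs. The endowment and weight values are hypothetical; this is not the paper's estimation procedure.

```python
# Minimal sketch: a dictator-game split chosen under a utility that weights
# the partner's payoff. The weight `alpha` and the endowment are hypothetical
# illustrations, not estimates from the paper.

def best_split(endowment: int, alpha: float) -> int:
    """Return the amount given to the partner that maximizes
    (1 - alpha) * own_payoff + alpha * partner_payoff."""
    best_give, best_utility = 0, float("-inf")
    for give in range(endowment + 1):
        own, partner = endowment - give, give
        utility = (1 - alpha) * own + alpha * partner
        if utility > best_utility:
            best_give, best_utility = give, utility
    return best_give

# A purely selfish agent (alpha = 0) keeps everything; at alpha = 0.5 the
# utility is flat, so ties resolve to keeping; alpha > 0.5 gives everything.
print(best_split(100, 0.0))   # 0
print(best_split(100, 0.5))   # 0
print(best_split(100, 0.6))   # 100
```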


Subject(s)
Artificial Intelligence , Behavior , Humans , Altruism , Trust
2.
JMIR Form Res ; 7: e45376, 2023 Sep 15.
Article in English | MEDLINE | ID: mdl-37713239

ABSTRACT

BACKGROUND: An effective and scalable information retrieval (IR) system plays a crucial role in enabling clinicians and researchers to harness the valuable information present in electronic health records. In a previous study, we developed a prototype medical IR system, which incorporated a semantically based query recommendation (SBQR) feature. The system was evaluated empirically, and end users perceived its performance as high. To delve deeper into the factors contributing to this perceived performance, we conducted a follow-up study using query log analysis. OBJECTIVE: One of the primary challenges in IR is that users often have limited knowledge of their specific information needs. Consequently, an IR system, particularly its user interface, needs to be thoughtfully designed to assist users through the iterative process of refining their queries as they encounter relevant documents during their search. To address these challenges, we incorporated "query recommendation" into our Electronic Medical Record Search Engine (EMERSE), drawing inspiration from the success of similar features in modern general-purpose IR systems. METHODS: The query log data analyzed in this study were collected during our previous experimental study, in which we developed EMERSE with the SBQR feature. We implemented a logging mechanism to capture user query behaviors and the output of the IR system (retrieved documents). In this analysis, we compared the initial query entered by users with the query formulated with the assistance of the SBQR. This comparison allowed us to determine whether the SBQR helped users construct improved queries that differed from their original ones. RESULTS: Our findings revealed that the first query entered without SBQR and the final query with SBQR assistance were highly similar (Jaccard similarity coefficient=0.77). This suggests that the perceived positive performance of the system was primarily attributable to the automatic query expansion facilitated by the SBQR rather than to users manually manipulating their queries. In addition, through entropy analysis, we observed that search results converged in scenarios of moderate difficulty, and the degree of convergence correlated strongly with the perceived system performance. CONCLUSIONS: The study demonstrated the potential contribution of the SBQR in shaping participants' positive perceptions of system performance, contingent upon the difficulty of the search scenario. Medical IR systems should therefore consider incorporating an SBQR as a user-controlled option or a semiautomated feature. Future work entails redesigning the experiment in a more controlled manner and conducting multisite studies to demonstrate the effectiveness of EMERSE with SBQR for patient cohort identification. By further exploring and validating these findings, we can enhance the usability and functionality of medical IR systems in real-world settings.
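For reference, the Jaccard similarity coefficient reported above compares two queries as token sets. A minimal sketch follows; the tokenization scheme and example queries are hypothetical, not drawn from the study's logs.

```python
def jaccard(query_a: str, query_b: str) -> float:
    """Jaccard similarity between the token sets of two queries:
    |A intersect B| / |A union B|."""
    a, b = set(query_a.lower().split()), set(query_b.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical initial query vs. query after assisted reformulation.
initial = "heart attack"
final = "heart attack myocardial infarction"
print(round(jaccard(initial, final), 2))  # 0.5
```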

3.
AMIA Annu Symp Proc ; 2023: 951-960, 2023.
Article in English | MEDLINE | ID: mdl-38222378

ABSTRACT

Abortion is a controversial topic that has long been debated in the US. With the recent Supreme Court decision to overturn Roe v. Wade, access to safe and legal reproductive care is once again in the national spotlight. An issue central to this debate is patient privacy, as in the post-HITECH Act era it has become easier for medical records to be accessed and shared electronically. This study analyzed a large Twitter dataset from May to December 2022 to examine the public's reactions to the overturning of Roe v. Wade and its implications for privacy. Using a mixed-methods approach consisting of computational and qualitative content analysis, we found a wide range of concerns voiced, from the confidentiality of patient-physician information exchange to medical records being shared without patient consent. These findings may inform policy making and healthcare industry practices concerning medical privacy related to reproductive rights and women's health.


Subject(s)
Abortion, Legal , Privacy , Pregnancy , Female , Humans , United States , Supreme Court Decisions , Confidentiality , Machine Learning
4.
Proc Natl Acad Sci U S A ; 119(51): e2206580119, 2022 Dec 20.
Article in English | MEDLINE | ID: mdl-36525536

ABSTRACT

While the gig economy provides flexible jobs for millions of workers globally, a lack of organizational identity and coworker bonds contributes to workers' low engagement and high attrition rates. To test the impact of virtual teams on worker productivity and retention, we conduct a field experiment with 27,790 drivers on a ride-sharing platform. We organize drivers into teams that are randomly assigned to receive their team's ranking, their individual ranking within their team, or individual performance information only (control). We find that treated drivers work longer hours and generate significantly higher revenue. Furthermore, drivers in the team-ranking treatment continue to be more engaged 3 mo after the end of the experiment. A machine-learning analysis of 149 team contests in 86 cities suggests that social comparison, driver experience, and within-team similarity are the key predictors of virtual team efficacy.

5.
BMC Med Inform Decis Mak ; 21(Suppl 9): 377, 2022 04 05.
Article in English | MEDLINE | ID: mdl-35382811

ABSTRACT

BACKGROUND: Natural language processing (NLP) tasks in the health domain often deal with limited amounts of labeled data due to high annotation costs and naturally rare observations. To compensate for the lack of training data, health NLP researchers often have to leverage knowledge and resources external to the task at hand. Recently, pretrained large-scale language models such as the Bidirectional Encoder Representations from Transformers (BERT) have proven to be a powerful way of learning rich linguistic knowledge from massive unlabeled text and transferring that knowledge to downstream tasks. However, previous downstream tasks have often used training data at a scale that is unlikely to be attainable in the health domain. In this work, we study whether BERT can still benefit downstream health NLP tasks when training data are relatively small. METHOD: We conducted a learning curve analysis to study the behavior of BERT and baseline models as training data size increases. We observed the classification performance of these models on two disease diagnosis data sets, where some diseases are naturally rare and have very limited observations (fewer than 2 out of 10,000). The baselines included commonly used text classification models such as sparse and dense bag-of-words models, long short-term memory networks, and their variants that leveraged external knowledge. To obtain learning curves, we incremented the number of training examples per disease from small to large and measured classification performance using the macro-averaged F1 score. RESULTS: On the task of classifying all diseases, the learning curves of BERT were consistently above all baselines, significantly outperforming them across the spectrum of training data sizes. However, in extreme situations where only one or two training documents per disease were available, BERT was outperformed by linear classifiers with carefully engineered bag-of-words features. CONCLUSION: As long as training documents are not extremely scarce, fine-tuning a pretrained BERT model is a highly effective approach to health NLP tasks such as disease classification. However, in extreme cases where each class has only one or two training documents and no more will be available, simple linear models using bag-of-words features should be considered.
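A minimal sketch of the kind of learning-curve analysis described above: train on progressively larger training subsets and record the macro-averaged F1 score at each size. The synthetic data and logistic-regression stand-in are illustrative only, not the study's BERT models or clinical corpora.

```python
# Sketch of a learning-curve analysis: train on progressively larger subsets
# and record the macro-averaged F1 score at each size. Synthetic data and a
# logistic-regression baseline stand in for the clinical corpora and models.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

for n in (50, 100, 200, 500, 1000):           # increasing training-set sizes
    clf = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    macro_f1 = f1_score(y_test, clf.predict(X_test), average="macro")
    print(f"{n:5d} training examples -> macro-F1 = {macro_f1:.3f}")
```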


Subject(s)
Learning Curve , Natural Language Processing , Humans , Language
6.
PLoS One ; 17(1): e0261262, 2022.
Article in English | MEDLINE | ID: mdl-35081111

ABSTRACT

Emotions at work have long been identified as critical signals of work motivations, status, and attitudes, and as predictors of various work-related outcomes. As more and more employees work remotely, these emotional signals become harder to observe through daily, face-to-face communications. The use of online platforms to communicate and collaborate at work provides an alternative channel for monitoring the emotions of workers. This paper studies how emojis, as non-verbal cues in online communications, can be used for such purposes and how the emotional signals in emoji usage can be used to predict the future behavior of workers. In particular, we examine how developers on GitHub use emojis in their work-related activities. We show that developers have diverse patterns of emoji usage, which can be related to their working status, including activity levels, types of work, types of communications, time management, and other behavioral patterns. Developers who use emojis in their posts are significantly less likely to drop out of the online work platform. Surprisingly, using emoji usage alone as features, standard machine learning models can predict future dropouts of developers with satisfactory accuracy. Features related to the general use and the emotions of emojis appear to be important factors, although they do not rule out paths through other purposes of emoji use.
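A minimal sketch of deriving simple emoji-usage features of the kind a dropout classifier could consume; the emoji character ranges, example posts, and feature names are hypothetical, and the study's actual feature set is richer.

```python
# Sketch: derive simple emoji-usage features from a developer's posts. The
# emoji character ranges and example posts are illustrative only.
import re

EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\u2600-\u27BF]"   # common emoji / symbol blocks
)

def emoji_features(posts: list[str]) -> dict[str, float]:
    emoji_counts = [len(EMOJI_PATTERN.findall(p)) for p in posts]
    n_posts = len(posts) or 1
    return {
        "posts_with_emoji_ratio": sum(c > 0 for c in emoji_counts) / n_posts,
        "emojis_per_post": sum(emoji_counts) / n_posts,
    }

posts = ["Merged the PR 🎉", "Fixing the build", "Thanks for the review 🙏🙏"]
print(emoji_features(posts))
# {'posts_with_emoji_ratio': 0.666..., 'emojis_per_post': 1.0}
```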


Subject(s)
Emotions , Facial Expression , Nonverbal Communication , Communication , Humans , Teleworking
7.
BMC Med Inform Decis Mak ; 19(Suppl 5): 238, 2019 12 05.
Article in English | MEDLINE | ID: mdl-31801534

ABSTRACT

BACKGROUND: Accurately recognizing rare diseases based on symptom descriptions is an important task in patient triage, early risk stratification, and targeted therapies. However, due to the very nature of rare diseases, the lack of historical data poses a great challenge to machine learning-based approaches. On the other hand, medical knowledge in automatically constructed knowledge graphs (KGs) has the potential to compensate for the lack of labeled training examples. This work aims to develop a rare disease classification algorithm that makes effective use of a knowledge graph, even when the graph is imperfect. METHOD: We develop a text classification algorithm that represents a document as a combination of a "bag of words" and a "bag of knowledge terms," where a "knowledge term" is a term shared between the document and the subgraph of the KG relevant to the disease classification task. We use two Chinese disease diagnosis corpora to evaluate the algorithm. The first, HaoDaiFu, contains 51,374 chief complaints categorized into 805 diseases. The second, ChinaRe, contains 86,663 patient descriptions categorized into 44 disease categories. RESULTS: On the two evaluation data sets, the proposed algorithm delivers robust performance and outperforms a wide range of baselines, including resampling, deep learning, and feature selection approaches. Both a classification-based metric (macro-averaged F1 score) and a ranking-based metric (mean reciprocal rank) are used in the evaluation. CONCLUSION: Medical knowledge in large-scale knowledge graphs can be effectively leveraged to improve rare disease classification models, even when the knowledge graph is incomplete.
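A minimal sketch of the "bag of words" plus "bag of knowledge terms" representation: tokens shared with the task-relevant KG subgraph are counted again in a separate feature namespace. The knowledge terms and the document below are hypothetical.

```python
# Sketch of a "bag of words" + "bag of knowledge terms" representation:
# tokens that also appear in the task-relevant KG subgraph are counted a
# second time under a separate feature namespace. The KG terms and the
# document are hypothetical.
from collections import Counter

kg_terms = {"fever", "rash", "arthralgia"}        # terms from the KG subgraph

def represent(document: str) -> Counter:
    tokens = document.lower().split()
    features = Counter(f"word={t}" for t in tokens)                   # bag of words
    features.update(f"kg={t}" for t in tokens if t in kg_terms)       # bag of knowledge terms
    return features

print(represent("Persistent fever and rash for three days"))
```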


Subject(s)
Machine Learning , Rare Diseases/classification , Algorithms , Humans , Pattern Recognition, Automated , Triage
8.
Stud Health Technol Inform ; 264: 1408-1412, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31438158

ABSTRACT

To improve user experience, many health IT systems provide personalization options that allow end users to tailor the software to their needs and preferences. However, few studies have investigated whether healthcare professionals actually make full use of this feature. As an initial step toward understanding end users' software personalization behavior in healthcare, we conducted a pilot study examining how clinicians, staff, and researchers customized a search engine designed to facilitate information retrieval from electronic health records. The results show that a majority of the end users (82.4%) did not make an effort to modify the system's default settings. Those who did more often changed its 'look and feel' than its functionality. We conclude that further research is warranted to study the rationale underlying healthcare professionals' software personalization decisions, both to optimize user experience and to avoid building complex and costly personalization options that go unused or underutilized.


Subject(s)
Electronic Health Records , Search Engine , Humans , Information Storage and Retrieval , Pilot Projects , Software
9.
J Am Med Inform Assoc ; 26(11): 1314-1322, 2019 11 01.
Article in English | MEDLINE | ID: mdl-31294792

ABSTRACT

OBJECTIVE: Active learning (AL) attempts to reduce annotation cost (ie, time) by selecting the most informative examples for annotation. Most approaches tacitly (and unrealistically) assume that the cost of annotating each sample is identical. This study introduces a cost-aware AL method that simultaneously models the annotation cost and the informativeness of the samples, and evaluates it via both simulation and user studies. MATERIALS AND METHODS: We designed a novel, cost-aware AL algorithm (Cost-CAUSE) for annotating clinical named entities; we first utilized lexical and syntactic features to estimate annotation cost, then incorporated this cost measure into an existing AL algorithm. Using the 2010 i2b2/VA data set, we then conducted a simulation study comparing Cost-CAUSE with noncost-aware AL methods, and a user study comparing Cost-CAUSE with passive learning. RESULTS: Our cost model fit empirical annotation data well, and Cost-CAUSE increased the simulation area under the learning curve (ALC) scores by up to 5.6% and 4.9% compared with random sampling and alternate AL methods, respectively. Moreover, in a user annotation task, Cost-CAUSE outperformed passive learning on the ALC score and reduced annotation time by 20.5%-30.2%. DISCUSSION: Although AL has proven effective in simulations, our user study shows that a real-world environment is far more complex. Other factors have a noticeable effect on the AL method, such as users' annotation accuracy, fatigue, and even their physical and mental condition. CONCLUSION: Cost-CAUSE saves significant annotation cost compared with random sampling.
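A minimal sketch of the general idea behind cost-aware selection: rank candidates by informativeness per unit of estimated annotation cost rather than by informativeness alone. The length-based cost proxy and the scores below are hypothetical and do not reproduce the Cost-CAUSE algorithm.

```python
# Sketch of cost-aware active-learning selection: rank unlabeled sentences by
# informativeness per unit of estimated annotation cost, rather than by
# informativeness alone. The length-based cost proxy and the scores are
# hypothetical; this is not the Cost-CAUSE model itself.

def estimated_cost(sentence: str) -> float:
    """Toy cost model: longer sentences take longer to annotate."""
    return 1.0 + 0.1 * len(sentence.split())

def select(candidates: list[tuple[str, float]], k: int) -> list[str]:
    """candidates: (sentence, informativeness). Return the top k sentences
    by informativeness per unit of estimated cost."""
    ranked = sorted(candidates,
                    key=lambda c: c[1] / estimated_cost(c[0]),
                    reverse=True)
    return [sentence for sentence, _ in ranked[:k]]

pool = [
    ("Patient denies chest pain.", 0.40),
    ("Started metformin 500 mg twice daily for newly diagnosed type 2 "
     "diabetes mellitus with good tolerance.", 0.55),
    ("No known drug allergies.", 0.35),
]
print(select(pool, k=2))
```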


Subject(s)
Algorithms , Electronic Health Records/economics , Information Storage and Retrieval/economics , Natural Language Processing , Big Data , Computer Simulation , Humans , Models, Economic
10.
BMC Med Inform Decis Mak ; 19(Suppl 3): 75, 2019 04 04.
Article in English | MEDLINE | ID: mdl-30944012

ABSTRACT

BACKGROUND: Numbers and numerical concepts appear frequently in free text clinical notes from electronic health records. Knowledge of the frequent lexical variations of these numerical concepts, and their accurate identification, is important for many information extraction tasks. This paper describes an analysis of the variation in how numbers and numerical concepts are represented in clinical notes. METHODS: We used an inverted index of approximately 100 million notes to obtain the frequency of various permutations of numbers and numerical concepts, including the use of Roman numerals, numbers spelled as English words, and invalid dates, among others. Overall, twelve types of lexical variants were analyzed. RESULTS: We found substantial variation in how these concepts were represented in the notes, including multiple data quality issues. We also demonstrate that not considering these variations could have substantial real-world implications for cohort identification tasks, with one case missing > 80% of potential patients. CONCLUSIONS: Numbering within clinical notes can be variable, and not taking these variations into account could result in missing or inaccurate information for natural language processing and information retrieval tasks.
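A minimal sketch of normalizing two of the lexical variants discussed above (Roman numerals and spelled-out English numbers); the mappings cover only small illustrative ranges, and the study analyzed twelve variant types in total.

```python
# Sketch: normalize a few lexical variants of numbers found in clinical text
# (Roman numerals and spelled-out English numbers). The mappings cover only
# small illustrative ranges.
import re

WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
ROMAN = {"i": 1, "ii": 2, "iii": 3, "iv": 4, "v": 5}

def normalize_numbers(text: str) -> str:
    def repl(match: re.Match) -> str:
        token = match.group(0).lower()
        value = WORDS.get(token, ROMAN.get(token))
        return str(value) if value is not None else match.group(0)
    return re.sub(r"\b[a-zA-Z]+\b", repl, text)

print(normalize_numbers("Stage III disease, gravida two para one"))
# Stage 3 disease, gravida 2 para 1
```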


Subject(s)
Electronic Health Records , Information Storage and Retrieval , Natural Language Processing , Clinical Coding
11.
J Am Med Inform Assoc ; 25(7): 800-808, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29584896

ABSTRACT

Objective: Medical word sense disambiguation (WSD) is challenging and often requires significant training with data labeled by domain experts. This work aims to develop an interactive learning algorithm that makes efficient use of an expert's domain knowledge to build high-quality medical WSD models with minimal human effort. Methods: We developed an interactive learning algorithm in which an expert labels both instances and features. An expert can provide supervision in 3 ways: labeling instances, specifying indicative words of a sense, and highlighting supporting evidence in a labeled instance. The algorithm learns from these labels and iteratively selects the most informative instances for future labeling. Our evaluation used 3 WSD corpora: 198 ambiguous terms from Medical Subject Headings (MSH) as MEDLINE indexing terms, 74 ambiguous abbreviations in clinical notes from the University of Minnesota (UMN), and 24 ambiguous abbreviations in clinical notes from Vanderbilt University Hospital (VUH). For each ambiguous term and each learning algorithm, a learning curve plotting accuracy on the test set against the number of labeled instances was generated. The area under the learning curve was used as the primary evaluation metric. Results: Our interactive learning algorithm significantly outperformed active learning, previously the fastest learning algorithm for medical WSD. Compared with active learning, it achieved 90% accuracy with 42% less labeling effort on the MSH corpus, 35% less on the UMN corpus, and 16% less on the VUH corpus. Conclusions: High-quality WSD models can be efficiently trained with minimal supervision by inviting experts to label informative instances and provide domain knowledge through labeling and highlighting contextual features.
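A minimal sketch of the area-under-the-learning-curve metric used as the primary evaluation measure: accuracy is recorded at increasing numbers of labeled instances and the area is computed with the trapezoidal rule, normalized by the range of instance counts. The data points below are hypothetical.

```python
# Sketch of an area-under-the-learning-curve (ALC) metric: accuracy is
# recorded at increasing numbers of labeled instances and the area under
# the curve is computed with the trapezoidal rule, normalized so that a
# constant accuracy of 1.0 would yield an ALC of 1.0. Points are hypothetical.

def alc(n_labeled: list[int], accuracy: list[float]) -> float:
    area = sum((accuracy[i] + accuracy[i + 1]) / 2 * (n_labeled[i + 1] - n_labeled[i])
               for i in range(len(n_labeled) - 1))
    return area / (n_labeled[-1] - n_labeled[0])

labels = [10, 50, 100, 200]
acc_interactive = [0.62, 0.81, 0.88, 0.91]
acc_active = [0.55, 0.72, 0.83, 0.89]
print(round(alc(labels, acc_interactive), 3))  # higher area -> faster learner
print(round(alc(labels, acc_active), 3))
```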


Subject(s)
Algorithms , Machine Learning , Natural Language Processing , Logistic Models , Medical Records , Medical Subject Headings , Medicine , Vocabulary
12.
BMC Med Inform Decis Mak ; 17(Suppl 2): 82, 2017 Jul 05.
Article in English | MEDLINE | ID: mdl-28699546

ABSTRACT

BACKGROUND: Active learning (AL) has shown promising potential to minimize annotation cost while maximizing performance in building statistical natural language processing (NLP) models. However, very few studies have investigated AL in a real-life setting in the medical domain. METHODS: In this study, we developed the first AL-enabled annotation system for clinical named entity recognition (NER) with a novel AL algorithm. In addition to a simulation study evaluating the novel AL algorithm, we conducted user studies with two nurses using this system to assess the performance of AL in real-world annotation processes for building clinical NER models. RESULTS: The simulation results show that the novel AL algorithm outperformed the traditional AL algorithm and random sampling. However, the user study tells a different story: AL methods did not always perform better than random sampling for different users. CONCLUSIONS: We found that the increased information content of actively selected sentences was strongly offset by the increased time required to annotate them. Moreover, annotation time was not considered in the querying algorithms. Our future work includes developing better AL algorithms that estimate annotation time and evaluating the system with a larger number of users.


Subject(s)
Medical Informatics , Natural Language Processing , Problem-Based Learning , Computer Simulation , Humans
13.
J Biomed Inform ; 67: 1-10, 2017 03.
Article in English | MEDLINE | ID: mdl-28131722

ABSTRACT

OBJECTIVE: The utility of biomedical information retrieval environments can be severely limited when users lack expertise in constructing effective search queries. To address this issue, we developed a computer-based query recommendation algorithm that suggests semantically interchangeable terms based on an initial user-entered query. In this study, we assessed the value of this approach, which has broad applicability in biomedical information retrieval, by demonstrating its application as part of a search engine that facilitates retrieval of information from electronic health records (EHRs). MATERIALS AND METHODS: The query recommendation algorithm uses MetaMap to identify medical concepts in search queries and indexed EHR documents. Synonym variants from the UMLS are used to expand the concepts, along with a synonym set curated from historical EHR search logs. The empirical study involved 33 clinicians and staff who evaluated the system through a set of simulated EHR search tasks. User acceptance was assessed using the widely used technology acceptance model. RESULTS: The search engine's performance was rated consistently higher with the query recommendation feature turned on than off. The relevance of computer-recommended search terms was also rated highly, and in most cases the participants had not thought of these terms on their own. The questions on perceived usefulness and perceived ease of use received overwhelmingly positive responses. A vast majority of the participants wanted the query recommendation feature to be available to assist in their day-to-day EHR search tasks. DISCUSSION AND CONCLUSION: Users continue to face challenges in constructing effective search queries when retrieving information from biomedical documents, including those from EHRs. This study demonstrates that semantically based query recommendation is a viable solution to this challenge.
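A minimal sketch of the core idea of semantically based query expansion: synonym variants of concepts detected in the query are added as alternatives. The tiny synonym table is hypothetical; the actual system draws on MetaMap, UMLS synonyms, and a set curated from historical search logs.

```python
# Sketch of semantically based query expansion: look up synonym variants for
# concepts detected in the query and add them as alternatives. The synonym
# table below is a hypothetical stand-in for MetaMap/UMLS-derived synonyms.

SYNONYMS = {
    "heart attack": ["myocardial infarction", "MI"],
    "high blood pressure": ["hypertension", "HTN"],
}

def expand_query(query: str) -> str:
    expanded = query
    for concept, variants in SYNONYMS.items():
        if concept in query.lower():
            expanded += " OR " + " OR ".join(f'"{v}"' for v in variants)
    return expanded

print(expand_query("heart attack in diabetic patients"))
# heart attack in diabetic patients OR "myocardial infarction" OR "MI"
```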


Subject(s)
Algorithms , Electronic Health Records , Natural Language Processing , Search Engine , Humans , Information Storage and Retrieval , Semantics
14.
AMIA Annu Symp Proc ; 2017: 820-829, 2017.
Article in English | MEDLINE | ID: mdl-29854148

ABSTRACT

Social media are important platforms for risk communication during public health crises. Effective dissemination of accurate, relevant, and up-to-date health information is important for the public to raise awareness and develop risk management strategies. This study investigates Zika virus-related information circulated on Twitter, identifying the patterns of dissemination of popular tweets and tweets from public health authorities such as the CDC. We leveraged a large corpus of Twitter data covering the entire year of 2016. We analyzed the data using quantitative and qualitative content analyses, followed by machine learning to scale the manual content analyses to the corpus. The results revealed possible discrepancies between what the general public was most interested in, or concerned about, and what public health authorities provided during the Zika outbreak. We provide implications for public health authorities to improve risk communication through better alignment with the general public's information needs during public health crises.


Subject(s)
Consumer Health Information/methods , Disease Outbreaks , Information Dissemination , Machine Learning , Public Health Practice , Social Media , Zika Virus Infection , Communication , Humans , Risk , Zika Virus , Zika Virus Infection/epidemiology
15.
Proc Natl Acad Sci U S A ; 113(52): 14944-14948, 2016 12 27.
Article in English | MEDLINE | ID: mdl-27974610

ABSTRACT

This paper reports the results of a large-scale field experiment designed to test the hypothesis that group membership can increase participation and prosocial lending for an online crowdlending community, Kiva. The experiment uses variations on a simple email manipulation to encourage Kiva members to join a lending team, testing which types of team recommendation emails are most likely to get members to join teams as well as the subsequent impact on lending. We find that emails do increase the likelihood that a lender joins a team, and that joining a team increases lending in a short window (1 wk) following our intervention. The impact on lending is large relative to median lender lifetime loans. We also find that lenders are more likely to join teams recommended based on location similarity rather than team status. Our results suggest team recommendation can be an effective behavioral mechanism to increase prosocial lending.

16.
AMIA Annu Symp Proc ; 2016: 2062-2071, 2016.
Article in English | MEDLINE | ID: mdl-28269966

ABSTRACT

Resolving word ambiguity in clinical text is critical for many natural language processing applications. Effective word sense disambiguation (WSD) systems rely on training a machine learning based classifier with abundant clinical text that is accurately annotated, the creation of which can be costly and time-consuming. We describe a double-loop interactive machine learning process, named ReQ-ReC (ReQuery-ReClassify), and demonstrate its effectiveness on multiple evaluation corpora. Using ReQ-ReC, a human expert first uses her domain knowledge to include sense-specific contextual words into the ReQuery loops and searches for instances relevant to the senses. Then, in the ReClassify loops, the expert only annotates the most ambiguous instances found by the current WSD model. Even with machine-generated queries only, the framework is comparable with or faster than current active learning methods in building WSD models. The process can be further accelerated when human experts use their domain knowledge to guide the search process.


Subject(s)
Machine Learning , Natural Language Processing , Algorithms , Humans , Models, Theoretical , Problem-Based Learning
17.
JMIR Diabetes ; 1(2): e4, 2016 Nov 07.
Article in English | MEDLINE | ID: mdl-30291053

ABSTRACT

BACKGROUND: Use of social media is becoming ubiquitous, and disease-related communities are forming online, including communities of interest around diabetes. OBJECTIVE: Our objective was to examine diabetes-related participation on Twitter by describing the frequency and timing of diabetes-related tweets, the geography of tweets, and the types of participants over a 2-year sample of 10% of all tweets. METHODS: We identified tweets with diabetes-related search terms and hashtags in a dataset of 29.6 billion tweets for the years 2013 and 2014 and extracted the text, time, location, retweet, and user information. We assessed the frequencies of tweets across different search terms and hashtags by month and day of week and, for tweets that provided location information, by country. We also performed these analyses for a subset of tweets that used the hashtag #dsma, a social media advocacy community focused on diabetes. Random samples of user profiles in the 2 groups were also drawn and reviewed to understand the types of stakeholders participating online. RESULTS: We found 1,368,575 diabetes-related tweets based on diabetes-related terms and hashtags. There was a seasonality to the tweets; a higher proportion occurred during November, the month of World Diabetes Day. The subset of tweets with the #dsma hashtag was most frequent on Thursdays (coordinated universal time), which is consistent with the timing of a weekly chat organized by this online community. Approximately 2% of tweets carried geolocation information; these were most prominent in the United States (on the east and west coasts), followed by Indonesia and the United Kingdom. For the user profiles randomly selected among overall tweets, we could not identify a relationship to diabetes for the majority of users; for the profiles using the #dsma hashtag, we found that patients with type 1 diabetes and their caregivers represented the largest proportion of individuals. CONCLUSIONS: Twitter is increasingly becoming a space for online conversations about diabetes. Further qualitative and quantitative content analysis is needed to understand the nature and purpose of these conversations.

18.
J Am Med Inform Assoc ; 23(2): 269-75, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26269536

ABSTRACT

OBJECTIVE: ClinicalTrials.gov serves the critical functions of disseminating trial information to the public and helping trials recruit participants. This study assessed the readability of trial descriptions at ClinicalTrials.gov using multiple quantitative measures. MATERIALS AND METHODS: The analysis included all 165,988 trials registered at ClinicalTrials.gov as of April 30, 2014. To obtain benchmarks, the authors also analyzed 2 other medical corpora: (1) all 955 Health Topics articles from MedlinePlus and (2) a random sample of 100,000 clinician notes retrieved from an electronic health records system, intended for internal communication among medical professionals. The authors characterized each of the corpora using 4 surface metrics, and then applied 5 different scoring algorithms to assess their readability. The authors hypothesized that clinician notes would be most difficult to read, followed by trial descriptions and MedlinePlus Health Topics articles. RESULTS: Trial descriptions have the longest average sentence length (26.1 words) across all corpora, and 65% of the words they use are not covered by a basic medical English dictionary. In comparison, the average sentence length of MedlinePlus Health Topics articles is 61% shorter, their vocabulary size is 95% smaller, and their dictionary coverage is 46% higher. All 5 scoring algorithms consistently rated ClinicalTrials.gov trial descriptions as the most difficult corpus to read, even harder than clinician notes. According to the readability assessment algorithms, on average 18 years of education are required to properly understand these trial descriptions. DISCUSSION AND CONCLUSION: Trial descriptions at ClinicalTrials.gov are extremely difficult to read. Significant work is warranted to improve their readability in order to achieve ClinicalTrials.gov's goal of facilitating information dissemination and subject recruitment.
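For illustration, one standard readability score, the Flesch-Kincaid grade level, can be computed as below. The syllable counter is a crude vowel-group heuristic and the sample sentence is hypothetical; the study applied five established scoring algorithms rather than this sketch.

```python
# Sketch of one standard readability score, the Flesch-Kincaid grade level.
# The syllable counter is a crude vowel-group heuristic; the study applied
# five established scoring algorithms via existing tools.
import re

def count_syllables(word: str) -> int:
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words) - 15.59)

sample = ("This randomized, double-blind, placebo-controlled study evaluates "
          "the pharmacokinetics of the investigational compound in adults.")
print(round(flesch_kincaid_grade(sample), 1))  # a grade well beyond high school
```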


Subject(s)
Clinical Trials as Topic , Comprehension , Databases, Factual , Vocabulary , Algorithms , Analysis of Variance , Consumer Health Information , MedlinePlus , Terminology as Topic
19.
J Am Med Inform Assoc ; 23(2): 356-65, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26224335

ABSTRACT

OBJECTIVE: Traditional Chinese medicine (TCM) is a unique and complex medical system that has developed over thousands of years. This article studies the problem of automatically extracting meaningful relations between entities from the TCM literature, for the purposes of assisting clinical treatment or polypharmacology research and promoting the understanding of TCM in Western countries. METHODS: Instead of separately extracting each relation from a single sentence or document, we propose to collectively and globally extract multiple types of relations (eg, herb-syndrome, herb-disease, formula-syndrome, formula-disease, and syndrome-disease relations) from the entire corpus of TCM literature, from the perspective of network mining. In our analysis, we first constructed heterogeneous entity networks from the TCM literature, in which each edge is a candidate relation, and then used a heterogeneous factor graph model (HFGM) to simultaneously infer the existence of all the edges. We also employed a semi-supervised learning algorithm to estimate the model's parameters. RESULTS: We applied our method to extract relations from a large dataset consisting of more than 100,000 TCM article abstracts. Our results show that the HFGM performed significantly better than a traditional support vector machine (SVM) classifier at extracting all types of relations from the TCM literature, increasing average precision by 11.09%, recall by 13.83%, and the F1-measure by 12.47% across relation types. CONCLUSION: This study exploits the power of collective inference and proposes an HFGM based on heterogeneous entity networks, which significantly improved our ability to extract relations from the TCM literature.


Subject(s)
Information Storage and Retrieval/methods , Medicine, Chinese Traditional , Support Vector Machine , Datasets as Topic , Humans
20.
J Biomed Inform ; 58: 11-18, 2015 Dec.
Article in English | MEDLINE | ID: mdl-26385377

ABSTRACT

OBJECTIVES: Named entity recognition (NER), a sequential labeling task, is one of the fundamental tasks for building clinical natural language processing (NLP) systems. Machine learning (ML) based approaches can achieve good performance, but they often require large amounts of annotated samples, which are expensive to build because domain experts are needed for annotation. Active learning (AL), a sample selection approach integrated with supervised ML, aims to minimize the annotation cost while maximizing the performance of ML-based models. In this study, our goal was to develop and evaluate both existing and new AL methods for a clinical NER task: identifying concepts of medical problems, treatments, and lab tests in clinical notes. METHODS: Using the annotated NER corpus from the 2010 i2b2/VA NLP challenge, which contained 349 clinical documents with 20,423 unique sentences, we simulated AL experiments using a number of existing and novel algorithms in three categories: uncertainty-based, diversity-based, and baseline sampling strategies. They were compared with passive learning, which uses random sampling. Learning curves plotting the performance of the NER model against the estimated annotation cost (based on the number of sentences or words in the training set) were generated to evaluate the different active learning methods and passive learning, and the area under the learning curve (ALC) score was computed. RESULTS: Based on the learning curves of F-measure vs. number of sentences, uncertainty sampling algorithms outperformed all other methods in ALC. Most diversity-based methods also performed better than random sampling in ALC. To achieve an F-measure of 0.80, the best uncertainty sampling method saved 66% of annotations in sentences compared with random sampling. For the learning curves of F-measure vs. number of words, uncertainty sampling methods again outperformed all other methods in ALC. To achieve 0.80 in F-measure, in comparison with random sampling, the best uncertainty-based method saved 42% of annotations in words, whereas the best diversity-based method reduced annotation effort by only 7%. CONCLUSION: In the simulated setting, AL methods, particularly uncertainty sampling-based approaches, appeared to substantially reduce annotation cost for the clinical NER task. The actual benefit of active learning in clinical NER should be further evaluated in a real-time setting.
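A minimal sketch of least-confidence uncertainty sampling, one common member of the uncertainty-based family evaluated here; the per-sentence confidences and sentences are hypothetical, and in NER the score is normally derived from the sequence model's own predictions rather than supplied directly.

```python
# Sketch of least-confidence uncertainty sampling: select the sentences the
# current model is least confident about. The per-sentence confidences below
# are hypothetical stand-ins for scores derived from an NER model.

def least_confidence_batch(pool: list[tuple[str, float]], k: int) -> list[str]:
    """pool: (sentence, model confidence in its most likely labeling).
    Return the k sentences with the lowest confidence."""
    ranked = sorted(pool, key=lambda item: item[1])   # lowest confidence first
    return [sentence for sentence, _ in ranked[:k]]

pool = [
    ("Continue lisinopril 10 mg daily.", 0.97),
    ("Questionable infiltrate vs. atelectasis on imaging.", 0.58),
    ("Possible drug-induced transaminitis, monitor LFTs.", 0.64),
]
print(least_confidence_batch(pool, k=2))
```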


Subject(s)
Learning , Machine Learning , Humans , Natural Language Processing