Results 1 - 6 of 6
1.
Front Artif Intell ; 5: 826207, 2022.
Article in English | MEDLINE | ID: mdl-35514953

ABSTRACT

Stereotypes are encountered every day, in interpersonal communication as well as in entertainment, news stories, and on social media. In this study, we present a computational method to mine large, naturally occurring datasets of text for sentences that express perceptions of a social group of interest, and then map these sentences to the two-dimensional plane of perceived warmth and competence for comparison and interpretation. This framework is grounded in established social psychological theory, and validated against both expert annotation and crowd-sourced stereotype data. Additionally, we present two case studies of how the model might be used to answer questions using data "in-the-wild," by collecting Twitter data about women and older adults. Using the data about women, we are able to observe how sub-categories of women (e.g., Black women and white women) are described similarly and differently from each other, and from the superordinate group of women in general. Using the data about older adults, we show evidence that the terms people use to label a group (e.g., old people vs. senior citizens) are associated with different stereotype content. We propose that this model can be used by other researchers to explore questions of how stereotypes are expressed in various large text corpora.
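The core idea above (scoring group-describing sentences and placing the group on the warmth/competence plane) can be sketched as follows. This is a minimal illustration with tiny hypothetical seed lexicons, not the authors' actual model, which is trained on annotated stereotype data:

```python
# Hypothetical seed lexicons; the paper's model learns these dimensions
# from expert-annotated and crowd-sourced stereotype data instead.
WARMTH = {"kind": 1.0, "friendly": 1.0, "cold": -1.0, "hostile": -1.0}
COMPETENCE = {"smart": 1.0, "capable": 1.0, "incompetent": -1.0, "clumsy": -1.0}

def score(sentence, lexicon):
    # Average lexicon score of the words present; 0.0 if none match.
    words = sentence.lower().split()
    hits = [lexicon[w] for w in words if w in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

def map_to_plane(sentences):
    # Aggregate per-sentence scores into one (warmth, competence) point.
    w = sum(score(s, WARMTH) for s in sentences) / len(sentences)
    c = sum(score(s, COMPETENCE) for s in sentences) / len(sentences)
    return (w, c)
```

Groups can then be compared by the distance between their points on the plane, which is how sub-categories (e.g., different labels for older adults) would surface different stereotype content.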

2.
J Am Med Inform Assoc ; 25(10): 1274-1283, 2018 10 01.
Article in English | MEDLINE | ID: mdl-30272184

ABSTRACT

Objective: We executed the Social Media Mining for Health (SMM4H) 2017 shared tasks to enable the community-driven development and large-scale evaluation of automatic text processing methods for the classification and normalization of health-related text from social media. An additional objective was to publicly release manually annotated data. Materials and Methods: We organized 3 independent subtasks: automatic classification of self-reports of 1) adverse drug reactions (ADRs) and 2) medication consumption, from medication-mentioning tweets, and 3) normalization of ADR expressions. Training data consisted of 15 717 annotated tweets for (1), 10 260 for (2), and 6650 ADR phrases and identifiers for (3); and exhibited typical properties of social-media-based health-related texts. Systems were evaluated using 9961, 7513, and 2500 instances for the 3 subtasks, respectively. We evaluated performances of classes of methods and ensembles of system combinations following the shared tasks. Results: Among 55 system runs, the best system scores for the 3 subtasks were 0.435 (ADR class F1-score) for subtask-1, 0.693 (micro-averaged F1-score over two classes) for subtask-2, and 88.5% (accuracy) for subtask-3. Ensembles of system combinations obtained best scores of 0.476, 0.702, and 88.7%, outperforming individual systems. Discussion: Among individual systems, support vector machines and convolutional neural networks showed high performance. Performance gains achieved by ensembles of system combinations suggest that such strategies may be suitable for operational systems relying on difficult text classification tasks (eg, subtask-1). Conclusions: Data imbalance and lack of context remain challenges for natural language processing of social media text. Annotated data from the shared task have been made available as reference standards for future studies (http://dx.doi.org/10.17632/rxwfb3tysd.1).
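The micro-averaged F1-score used to rank subtask-2 systems (computed over the two classes of interest, pooling their true/false positives before averaging) can be written out directly. A minimal sketch, not the official evaluation script:

```python
def micro_f1(gold, pred, classes):
    # Pool TP/FP/FN over the listed classes, then compute one F1.
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for c in classes:
            if p == c and g == c:
                tp += 1
            elif p == c:
                fp += 1
            elif g == c:
                fn += 1
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0
```

Micro-averaging over a subset of classes (rather than all three medication-consumption labels) rewards systems on the classes that matter for pharmacovigilance while ignoring the majority "no-mention" class.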


Subject(s)
Drug-Related Side Effects and Adverse Reactions/classification, Natural Language Processing, Neural Networks, Computer, Social Media/classification, Support Vector Machine, Data Mining/methods, Humans, Pharmacovigilance
3.
J Biomed Inform ; 46(2): 275-85, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23380683

ABSTRACT

This paper addresses an information-extraction problem that aims to identify semantic relations among medical concepts (problems, tests, and treatments) in clinical text. The objectives of the paper are twofold. First, we extend an earlier one-page description (appearing as a part of [5]) of a top-ranked model in the 2010 I2B2 NLP Challenge to a necessary level of details, with the belief that feature design is the most crucial factor to the success of our system and hence deserves a more detailed discussion. We present a precise quantification of the contributions of a wide variety of knowledge sources. In addition, we show the end-to-end results obtained on the noisy output of a top-ranked concept detector, which could help construct a more complete view of the state of the art in the real-world scenario. As the second major objective, we reformulate our models into a composite-kernel framework and present the best result, according to our knowledge, on the same dataset.
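The composite-kernel reformulation mentioned above combines heterogeneous similarity functions into one kernel for the relation classifier. A toy sketch of the general pattern, assuming a hypothetical flat feature kernel plus a token-overlap "structure" kernel (the paper's actual kernels operate on richer linguistic structures):

```python
def linear_kernel(x, y):
    # Similarity over flat feature vectors (e.g., knowledge-source features).
    return sum(a * b for a, b in zip(x, y))

def overlap_kernel(s, t):
    # Toy structural similarity: shared tokens between the two concept contexts.
    return len(set(s.split()) & set(t.split()))

def composite_kernel(x, y, s, t, alpha=0.5):
    # A convex combination of valid kernels is itself a valid kernel,
    # so it can be plugged into any kernel machine (e.g., an SVM).
    return alpha * linear_kernel(x, y) + (1 - alpha) * overlap_kernel(s, t)
```

The weight alpha trades off the two evidence sources and would be tuned on held-out data.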


Subject(s)
Data Mining/methods, Electronic Health Records, Natural Language Processing, Semantics, Algorithms, Artificial Intelligence, Databases, Factual, Humans
4.
J Am Med Inform Assoc ; 18(5): 557-62, 2011.
Article in English | MEDLINE | ID: mdl-21565856

ABSTRACT

OBJECTIVE: As clinical text mining continues to mature, its potential as an enabling technology for innovations in patient care and clinical research is becoming a reality. A critical part of that process is rigid benchmark testing of natural language processing methods on realistic clinical narrative. In this paper, the authors describe the design and performance of three state-of-the-art text-mining applications from the National Research Council of Canada on evaluations within the 2010 i2b2 challenge. DESIGN: The three systems perform three key steps in clinical information extraction: (1) extraction of medical problems, tests, and treatments, from discharge summaries and progress notes; (2) classification of assertions made on the medical problems; (3) classification of relations between medical concepts. Machine learning systems performed these tasks using large-dimensional bags of features, as derived from both the text itself and from external sources: UMLS, cTAKES, and Medline. MEASUREMENTS: Performance was measured per subtask, using micro-averaged F-scores, as calculated by comparing system annotations with ground-truth annotations on a test set. RESULTS: The systems ranked high among all submitted systems in the competition, with the following F-scores: concept extraction 0.8523 (ranked first); assertion detection 0.9362 (ranked first); relationship detection 0.7313 (ranked second). CONCLUSION: For all tasks, we found that the introduction of a wide range of features was crucial to success. Importantly, our choice of machine learning algorithms allowed us to be versatile in our feature design, and to introduce a large number of features without overfitting and without encountering computing-resource bottlenecks.
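The "large-dimensional bags of features" driving these systems are sparse per-token feature maps built from the text and from external resources. A minimal sketch with hypothetical feature names, where the external lexicon stands in for lookups against UMLS or cTAKES output:

```python
def bag_of_features(tokens, i, external=None):
    # Sparse features for token i: surface form, immediate neighbors,
    # and a membership flag against an external medical lexicon.
    external = external or set()
    feats = {f"w={tokens[i].lower()}": 1}
    if i > 0:
        feats[f"prev={tokens[i-1].lower()}"] = 1
    if i < len(tokens) - 1:
        feats[f"next={tokens[i+1].lower()}"] = 1
    if tokens[i].lower() in external:
        feats["in_lexicon"] = 1
    return feats
```

Because each feature is a binary indicator, tens of thousands of distinct features can be added without hand-tuning, which is what the conclusion credits for versatile feature design without overfitting.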


Subject(s)
Benchmarking, Data Mining, Electronic Health Records, Natural Language Processing, Algorithms, Artificial Intelligence, Canada, Data Mining/classification, Electronic Health Records/classification, Humans
5.
BMC Med Inform Decis Mak ; 10: 56, 2010 Sep 28.
Article in English | MEDLINE | ID: mdl-20920176

ABSTRACT

BACKGROUND: Clinical trials are one of the most important sources of evidence for guiding evidence-based practice and the design of new trials. However, most of this information is available only in free text - e.g., in journal publications - which is labour intensive to process for systematic reviews, meta-analyses, and other evidence synthesis studies. This paper presents an automatic information extraction system, called ExaCT, that assists users with locating and extracting key trial characteristics (e.g., eligibility criteria, sample size, drug dosage, primary outcomes) from full-text journal articles reporting on randomized controlled trials (RCTs). METHODS: ExaCT consists of two parts: an information extraction (IE) engine that searches the article for text fragments that best describe the trial characteristics, and a web browser-based user interface that allows human reviewers to assess and modify the suggested selections. The IE engine uses a statistical text classifier to locate those sentences that have the highest probability of describing a trial characteristic. Then, the IE engine's second stage applies simple rules to these sentences to extract text fragments containing the target answer. The same approach is used for all 21 trial characteristics selected for this study. RESULTS: We evaluated ExaCT using 50 previously unseen articles describing RCTs. The text classifier (first stage) was able to recover 88% of relevant sentences among its top five candidates (top5 recall) with the topmost candidate being relevant in 80% of cases (top1 precision). Precision and recall of the extraction rules (second stage) were 93% and 91%, respectively. Together, the two stages of the extraction engine were able to provide (partially) correct solutions in 992 out of 1050 test tasks (94%), with a majority of these (696) representing fully correct and complete answers. CONCLUSIONS: Our experiments confirmed the applicability and efficacy of ExaCT. 
Furthermore, they demonstrated that combining a statistical method with 'weak' extraction rules can identify a variety of study characteristics. The system is flexible and can be extended to handle other characteristics and document types (e.g., study protocols).
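ExaCT's two-stage pipeline (statistical sentence ranker, then simple extraction rules) can be sketched as below. This is a hypothetical stand-in: the keyword scorer replaces the paper's statistical text classifier, and the single regex stands in for the per-characteristic rule sets:

```python
import re

def rank_sentences(sentences, keywords):
    # Stage 1 stand-in: score sentences by keyword hits and keep the
    # top-5 candidates, mirroring the top5-recall evaluation above.
    scored = sorted(sentences, key=lambda s: -sum(k in s.lower() for k in keywords))
    return scored[:5]

def extract_sample_size(sentence):
    # Stage 2: a simple rule pulls the target fragment from a candidate.
    m = re.search(r"(\d[\d,]*)\s+(?:patients|participants|subjects)", sentence)
    return m.group(1) if m else None
```

Keeping the classifier and the rules separate is what lets the same approach cover all 21 trial characteristics: only the keyword model and the fragment rule change per characteristic.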


Subject(s)
Information Storage and Retrieval/methods, Periodicals as Topic, Randomized Controlled Trials as Topic, Humans, Information Storage and Retrieval/standards, Reproducibility of Results
6.
AMIA Annu Symp Proc ; : 141-5, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18999067

ABSTRACT

Clinical trials are one of the most valuable sources of scientific evidence for improving the practice of medicine. The Trial Bank project aims to improve structured access to trial findings by including formalized trial information into a knowledge base. Manually extracting trial information from published articles is costly, but automated information extraction techniques can assist. The current study highlights a single architecture to extract a wide array of information elements from full-text publications of randomized clinical trials (RCTs). This architecture combines a text classifier with a weak regular expression matcher. We tested this two-stage architecture on 88 RCT reports from 5 leading medical journals, extracting 23 elements of key trial information such as eligibility rules, sample size, intervention, and outcome names. Results prove this to be a promising avenue to help critical appraisers, systematic reviewers, and curators quickly identify key information elements in published RCT articles.
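The "weak regular expression matcher" half of this architecture might look like the following. The patterns here are hypothetical illustrations for two of the 23 trial elements; the actual Trial Bank patterns are not given in this abstract:

```python
import re

# Hypothetical weak patterns for two trial elements named above.
PATTERNS = {
    "sample_size": r"(\d[\d,]*)\s+(?:patients|participants)\s+were\s+(?:enrolled|randomized)",
    "dosage": r"(\d+(?:\.\d+)?\s*mg(?:/day)?)",
}

def weak_match(element, sentence):
    # Apply the element's pattern to a classifier-selected sentence
    # and return the captured fragment, if any.
    m = re.search(PATTERNS[element], sentence, re.IGNORECASE)
    return m.group(1) if m else None
```

"Weak" is apt: the patterns over-generate on arbitrary text, but become precise when restricted to sentences the first-stage classifier has already flagged as describing that element.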


Subject(s)
Artificial Intelligence, Evidence-Based Medicine/methods, Information Storage and Retrieval/methods, Manuscripts, Medical as Topic, Natural Language Processing, Pattern Recognition, Automated/methods, Randomized Controlled Trials as Topic, Research Design, Abstracting and Indexing/methods, Internationality