ABSTRACT
PURPOSE: To date, there are no automated tools for the identification and fine-grained classification of paraphasias within discourse, the production of which is a hallmark characteristic of most people with aphasia (PWA). In this work, we fine-tune a large language model (LLM) to automatically predict paraphasia targets in Cinderella story retellings.
METHOD: Data consisted of 332 Cinderella story retellings containing 2,489 paraphasias from PWA, for which research assistants identified the intended targets. We supplemented these training data with 256 sessions from control participants, to which we added 2,415 synthetic paraphasias. We conducted four experiments using different training data configurations to fine-tune the LLM to automatically "fill in the blank" of the paraphasia with a predicted target, given the context of the rest of the story retelling. We tested the experiments' predictions against our human-identified targets and stratified the results by target ambiguity and clinical factors.
RESULTS: The model trained on controls and PWA achieved 50.7% accuracy at exactly matching the human-identified target. Fine-tuning on PWA data, with or without controls, led to comparable performance. The model performed better on targets with less human ambiguity and on paraphasias from participants with fluent or less severe aphasia.
CONCLUSIONS: Using only the surrounding language, we were able to automatically identify the intended target of paraphasias in discourse about half of the time. These findings take us a step closer to automatic aphasic discourse analysis. In future work, we will incorporate phonological information from the paraphasia to further improve predictive utility.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.24463543.
Subjects
Aphasia, Language, Humans, Aphasia/diagnosis, Linguistics
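To make the "fill in the blank" setup concrete, the sketch below masks a paraphasia and asks a masked language model to predict the target from the surrounding retelling. This is a minimal illustration, assuming the Hugging Face transformers library and an off-the-shelf bert-base-uncased model; the study's own fine-tuned LLM, prompts, and data are not reproduced, and the example sentence is invented.

```python
# A minimal sketch of the "fill in the blank" paraphasia-target task, using
# an off-the-shelf masked language model. The study fine-tuned its own LLM
# on AphasiaBank retellings; the model name and the example sentence below
# are illustrative assumptions, not the authors' setup.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The paraphasia is replaced by the mask token, so only the surrounding
# retelling context is available to the model.
context = (
    "Cinderella ran down the stairs at midnight and lost her glass "
    f"{fill_mask.tokenizer.mask_token} on the steps."
)

# Top candidate targets with their probabilities; the human-identified
# target would be compared against the top prediction for exact match.
for candidate in fill_mask(context, top_k=5):
    print(f"{candidate['token_str']:>12}  p={candidate['score']:.3f}")
```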
ABSTRACT
PURPOSE: Item response theory (IRT) is a modern psychometric framework with several advantageous properties compared with classical test theory. IRT has been used successfully to model performance on anomia tests in individuals with aphasia; however, all efforts to date have focused on noun production accuracy. The purpose of this study is to evaluate whether the Verb Naming Test (VNT), a prominent test of action naming, can be successfully modeled under IRT and to evaluate its reliability.
METHOD: We used responses on the VNT from 107 individuals with chronic aphasia from AphasiaBank. Unidimensionality and local independence, two assumptions prerequisite to IRT modeling, were evaluated using factor analysis and Yen's Q3 statistic (Yen, 1984), respectively. The assumption of equal discrimination among test items was evaluated statistically via nested model comparisons and practically via correlations of the resulting IRT-derived scores. Finally, internal consistency, marginal and empirical reliability, and conditional reliability were evaluated.
RESULTS: The VNT was found to be sufficiently unidimensional, with the majority of item pairs demonstrating adequate local independence. An IRT model in which item discriminations were constrained to be equal demonstrated fit equivalent to a model in which unique discrimination parameters were estimated for each item. All forms of reliability were strong across the majority of IRT ability estimates.
CONCLUSIONS: Modeling the VNT using IRT is feasible, yielding ability estimates that are both informative and reliable. Future efforts are needed to quantify the validity of the VNT under IRT and to determine the extent to which it measures the same construct as other anomia tests.
SUPPLEMENTAL MATERIAL: https://doi.org/10.23641/asha.22329235.
Subjects
Anomia, Humans, Anomia/diagnosis, Reproducibility of Results, Factor Analysis, Psychometrics
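Two of the modeling steps named in the METHOD, fitting an equal-discrimination (one-parameter logistic) IRT model and screening item pairs with Yen's Q3 residual correlations, can be sketched as below. This toy version uses simulated responses and a hand-rolled joint maximum likelihood fit; the 22-item test length is an assumption, and the study's actual analyses would use a dedicated IRT package.

```python
# A toy sketch of fitting an equal-discrimination (1PL) IRT model and
# computing Yen's Q3 residual correlations. Responses are simulated; the
# 22-item length is an assumption, and a real analysis would use a
# dedicated IRT package rather than this minimal estimator.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
n_persons, n_items = 107, 22                  # 107 participants per the abstract
theta = rng.normal(0, 1, n_persons)           # simulated person abilities
b = rng.normal(0, 1, n_items)                 # simulated item difficulties
X = (rng.random((n_persons, n_items))
     < expit(theta[:, None] - b[None, :])).astype(float)

def neg_loglik(params):
    th, bb = params[:n_persons], params[n_persons:]
    p = expit(th[:, None] - bb[None, :])      # P(correct) under the 1PL model
    return -(X * np.log(p) + (1 - X) * np.log(1 - p)).sum()

fit = minimize(neg_loglik, np.zeros(n_persons + n_items),
               method="L-BFGS-B", bounds=[(-4, 4)] * (n_persons + n_items))
theta_hat, b_hat = fit.x[:n_persons], fit.x[n_persons:]

# Yen's Q3: correlations of item residuals after removing the model's
# predictions; values near 0 support local independence.
resid = X - expit(theta_hat[:, None] - b_hat[None, :])
q3 = np.corrcoef(resid, rowvar=False)
print("max |Q3| off the diagonal:", np.abs(q3 - np.eye(n_items)).max().round(3))
```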
ABSTRACT
PURPOSE: ParAlg (Paraphasia Algorithms) is a software tool that automatically categorizes the naming errors (paraphasias) that a person with aphasia produces on a picture-naming test in relation to their intended targets. These classifications (based on lexicality as well as semantic, phonological, and morphological similarity to the target) are important for characterizing an individual's word-finding deficits, or anomia. In this study, we applied a modern language model called BERT (Bidirectional Encoder Representations from Transformers) as a semantic classifier and evaluated its performance against ParAlg's original word2vec model.
METHOD: We used a set of 11,999 paraphasias produced during the Philadelphia Naming Test. We trained ParAlg with word2vec or BERT and compared their performance with that of humans. Finally, we evaluated BERT's performance in terms of word-sense selection and conducted an item-level discrepancy analysis to identify which aspects of semantic similarity are most challenging to classify.
RESULTS: Compared with word2vec, BERT qualitatively reduced word-sense issues and quantitatively reduced semantic classification errors by almost half. A large percentage of errors were attributable to semantic ambiguity. Of the possible semantic similarity subtypes, responses that were associates of, or category coordinates of, the intended target were the most likely to be misclassified by both models and humans alike.
CONCLUSIONS: BERT outperforms word2vec as a semantic classifier, partially due to its superior handling of polysemy. This work is an important step toward further establishing ParAlg as an accurate assessment tool.
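As a rough illustration of using BERT for semantic similarity between a paraphasia and its target, the sketch below mean-pools BERT subword vectors for each word and compares them by cosine similarity. The model name, word pair, and the idea of thresholding similarity are illustrative assumptions; ParAlg's actual semantic classifier and decision rules are not reproduced here.

```python
# A rough sketch of BERT-based semantic similarity between a paraphasia and
# its intended target. ParAlg's real classifier and thresholds are not
# reproduced; the model name, word pair, and thresholding idea below are
# illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(word: str) -> torch.Tensor:
    # Mean-pool the subword vectors, dropping the [CLS]/[SEP] specials.
    enc = tok(word, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    return hidden[1:-1].mean(dim=0)

target, response = "horse", "cow"             # e.g., a semantic paraphasia
sim = torch.cosine_similarity(embed(target), embed(response), dim=0).item()
print(f"cosine({target!r}, {response!r}) = {sim:.3f}")
# A simple rule might flag the response as semantically related when the
# similarity exceeds a tuned cutoff; BERT's contextual vectors also allow
# disambiguating polysemous words when embedded in a carrier sentence.
```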