Results 1 - 20 of 25
1.
J Cardiovasc Magn Reson ; 26(1): 101035, 2024.
Article in English | MEDLINE | ID: mdl-38460841

ABSTRACT

BACKGROUND: Patients are increasingly using Generative Pre-trained Transformer 4 (GPT-4) to better understand their own radiology findings. PURPOSE: To evaluate the performance of GPT-4 in transforming cardiovascular magnetic resonance (CMR) reports into text that is comprehensible to medical laypersons. METHODS: ChatGPT with GPT-4 architecture was used to generate three different explained versions of 20 various CMR reports (n = 60) using the same prompt: "Explain the radiology report in a language understandable to a medical layperson". Two cardiovascular radiologists evaluated understandability, factual correctness, completeness of relevant findings, and lack of potential harm, while 13 medical laypersons evaluated the understandability of the original and the GPT-4 reports on a Likert scale (1 "strongly disagree", 5 "strongly agree"). Readability was measured using the Automated Readability Index (ARI). Linear mixed-effects models (values given as median [interquartile range]) and intraclass correlation coefficient (ICC) were used for statistical analysis. RESULTS: GPT-4 reports were generated on average in 52 s ± 13. GPT-4 reports achieved a lower ARI score than the original reports (5 [4-6] vs 10 [9-12]; p < 0.001) and were subjectively easier for laypersons to understand (4 [4, 5] vs 1 [1]; p < 0.001). Eighteen out of 20 (90%) standard CMR reports and 2/60 (3%) GPT-generated reports had an ARI score corresponding to the 8th grade level or higher. Radiologists' ratings of the GPT-4 reports reached high levels for correctness (5 [4, 5]), completeness (5 [5]), and lack of potential harm (5 [5]); with "strong agreement" for factual correctness in 94% (113/120) and completeness of relevant findings in 81% (97/120) of reports. Test-retest agreement for layperson understandability ratings between the three simplified reports generated from the same original report was substantial (ICC: 0.62; p < 0.001). 
Interrater agreement between radiologists was almost perfect for lack of potential harm (ICC: 0.93, p < 0.001) and moderate to substantial for completeness (ICC: 0.76, p < 0.001) and factual correctness (ICC: 0.55, p < 0.001). CONCLUSION: GPT-4 can reliably transform complex CMR reports into more understandable, layperson-friendly language while largely maintaining factual correctness and completeness, and can thus help convey patient-relevant radiology information in an easy-to-understand manner.


Subjects
Comprehension , Magnetic Resonance Imaging , Predictive Value of Tests , Humans , Reproducibility of Results , Observer Variation , Health Literacy , Patient Education as Topic , Cardiovascular Diseases/diagnostic imaging , Female , Male
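
The abstract above reports readability as Automated Readability Index (ARI) scores. As a rough illustration of what that index measures, here is a minimal Python sketch of the standard ARI formula. The tokenization and sentence-splitting rules are simplifying assumptions made for this example and are not taken from the study, so scores will not necessarily match those produced by the authors' tooling.

```python
import re


def automated_readability_index(text: str) -> float:
    """Standard ARI: 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43.

    Assumptions of this sketch: a word is a run of letters or digits
    (optionally joined by hyphens or apostrophes), and a sentence ends at
    '.', '!' or '?'. Decimal numbers and abbreviations will therefore be
    split incorrectly.
    """
    words = re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    # Count only letters and digits as characters.
    characters = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)
    return 4.71 * characters / len(words) + 0.5 * len(words) / len(sentences) - 21.43


# Hypothetical example sentences, not drawn from the study's reports.
plain = "The heart is fine. It pumps well."
technical = "Cardiovascular magnetic resonance demonstrates preserved biventricular systolic function."
print(automated_readability_index(plain) < automated_readability_index(technical))
```

Shorter words and shorter sentences both lower the score, which is why a lower ARI is read as easier text.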
2.
J Biomed Inform ; 158: 104727, 2024 Sep 16.
Article in English | MEDLINE | ID: mdl-39293643

ABSTRACT

OBJECTIVE: The reading level of health educational materials significantly influences the understandability and accessibility of the information, particularly for minoritized populations. Many patient educational resources surpass widely accepted standards for reading level and complexity. There is a critical need for high-performing text simplification models for health information to enhance dissemination and literacy. This need is particularly acute in cancer education, where effective prevention and screening education can substantially reduce morbidity and mortality. METHODS: We introduce Simplified Digestive Cancer (SimpleDC), a parallel corpus of cancer education materials tailored for health text simplification research, comprising educational content from the American Cancer Society, Centers for Disease Control and Prevention, and National Cancer Institute. The corpus includes 31 web pages with the corresponding manually simplified versions. It consists of 1183 annotated sentence pairs (361 train, 294 development, and 528 test). Utilizing SimpleDC and the existing Med-EASi corpus, we explore Large Language Model (LLM)-based simplification methods, including fine-tuning, reinforcement learning (RL), reinforcement learning with human feedback (RLHF), domain adaptation, and prompt-based approaches. Our experimentation encompasses Llama 2, Llama 3, and GPT-4. We introduce a novel RLHF reward function featuring a lightweight model adept at distinguishing between original and simplified texts, which enables training on unlabeled data. RESULTS: Fine-tuned Llama models demonstrated high performance across various metrics. Our RLHF reward function outperformed existing RL text simplification reward functions. The results underscore that RL/RLHF can achieve performance comparable to fine-tuning and improve the performance of fine-tuned models. Additionally, these methods effectively adapt out-of-domain text simplification models to a target domain. 
The best-performing RL-enhanced Llama models outperformed GPT-4 in both automatic metrics and manual evaluation by subject matter experts. CONCLUSION: The newly developed SimpleDC corpus will serve as a valuable asset to the research community, particularly in patient education simplification. The RL/RLHF methodologies presented herein enable effective training of simplification models on unlabeled text and the utilization of out-of-domain simplification corpora.

3.
J Med Internet Res ; 20(8): e10779, 2018 08 02.
Article in English | MEDLINE | ID: mdl-30072361

ABSTRACT

BACKGROUND: While health literacy is important for people to maintain good health and manage diseases, medical educational texts are often written beyond the reading level of the average individual. To mitigate this disconnect, text simplification research provides methods to increase readability and, therefore, comprehension. One method of text simplification is to isolate particularly difficult terms within a document and replace them with easier synonyms (lexical simplification) or an explanation in plain language (semantic simplification). Unfortunately, existing dictionaries are seldom complete, and consequently, resources for many difficult terms are unavailable. This is the case for English and Spanish resources. OBJECTIVE: Our objective was to automatically generate explanations for difficult terms in both English and Spanish when they are not covered by existing resources. The system we present combines existing resources for explanation generation using a novel algorithm (SubSimplify) to create additional explanations. METHODS: SubSimplify uses word-level parsing techniques and specialized medical affix dictionaries to identify the morphological units of a term and then source their definitions. While the underlying resources are different, SubSimplify applies the same principles in both languages. To evaluate our approach, we used term familiarity to identify difficult terms in English and Spanish and then generated explanations for them. For each language, we extracted 400 difficult terms from two different article types (General and Medical topics) balanced for frequency. For English terms, we compared SubSimplify's explanation with the explanations from the Consumer Health Vocabulary, WordNet Synonyms and Summaries, as well as Word Embedding Vector (WEV) synonyms. For Spanish terms, we compared the explanation to WordNet Summaries and WEV synonyms. We evaluated quality, coverage, and usefulness for the simplification provided for each term. 
Quality is the average score on a 1-4 Likert scale from two subject experts per language for the synonyms or explanations provided by the source. Coverage is the number of terms for which a source could provide an explanation. Usefulness is the same expert score, but with a 0 assigned when no explanations or synonyms were available for a term. RESULTS: SubSimplify resulted in quality scores of 1.64 for English (P<.001) and 1.49 for Spanish (P<.001), which were lower than those of existing resources (Consumer Health Vocabulary [CHV]=2.81). However, in coverage, SubSimplify outperforms all existing written resources, increasing the coverage from 53.0% to 80.5% in English and from 20.8% to 90.8% in Spanish (P<.001). This result means that the usefulness score of SubSimplify (1.32; P<.001) is greater than that of most existing resources (eg, CHV=0.169). CONCLUSIONS: Our approach is intended as an additional resource to existing, manually created resources. It greatly increases the number of difficult terms for which an easier alternative can be made available, resulting in greater actual usefulness.


Subjects
Health Literacy/methods , Semantics , Algorithms , Comprehension , Humans , Language , Validation Studies as Topic
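
The abstract above defines three measures: quality (expert score for terms a source covers), coverage (share of terms covered), and usefulness (the same score with 0 for uncovered terms). Under the assumption that quality is averaged over covered terms only and usefulness over all terms, usefulness equals quality times coverage. The sketch below illustrates that relationship with made-up scores; the function names and data are hypothetical and not from the paper.

```python
from typing import Optional, Sequence


def quality(scores: Sequence[Optional[float]]) -> float:
    """Mean expert score over the terms that received an explanation."""
    covered = [s for s in scores if s is not None]
    return sum(covered) / len(covered)


def coverage(scores: Sequence[Optional[float]]) -> float:
    """Fraction of terms for which the source produced an explanation."""
    return sum(s is not None for s in scores) / len(scores)


def usefulness(scores: Sequence[Optional[float]]) -> float:
    """Mean expert score over all terms, counting uncovered terms as 0."""
    return sum(s if s is not None else 0.0 for s in scores) / len(scores)


# Hypothetical ratings for four terms; None marks a term with no explanation.
toy_scores = [2.0, None, 1.0, 3.0]
print(quality(toy_scores), coverage(toy_scores), usefulness(toy_scores))

# Applying the same identity to the English figures quoted in the abstract.
print(round(1.64 * 0.805, 2))
```

With the English figures from the abstract, 1.64 times 0.805 rounds to 1.32, which is consistent with the reported usefulness score, although the abstract does not state which language that figure refers to.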
4.
J Cancer Educ ; 33(1): 134-140, 2018 02.
Article in English | MEDLINE | ID: mdl-27271268

ABSTRACT

People with relatively limited English language proficiency find the Internet's cancer and health information difficult to access and understand. The presence of unfamiliar words and complex grammar makes this particularly difficult for Deaf people. Unfortunately, current technology does not support low-cost, accurate translations of online materials into American Sign Language. However, current technology is relatively more advanced in allowing text simplification, while retaining content. This research team developed a two-step approach for simplifying cancer and other health text. They then tested the approach, using a crossover design with a sample of 36 deaf and 38 hearing college students. Results indicated that hearing college students did well on both the original and simplified text versions. Deaf college students' comprehension, in contrast, significantly benefitted from the simplified text. This two-step translation process offers a strategy that may improve the accessibility of Internet information for Deaf, as well as other low-literacy individuals.


Subjects
Comprehension , Consumer Health Information , Internet , Literacy , Neoplasms , Persons With Hearing Impairments , Sign Language , Translations , Cross-Over Studies , Deafness , Female , Humans , Male , Students , United States , Universities , Young Adult
5.
J Biomed Inform ; 69: 55-62, 2017 05.
Article in English | MEDLINE | ID: mdl-28342946

ABSTRACT

Many different text features influence text readability and content comprehension. Negation is commonly suggested as one such feature, but few general-purpose tools exist to discover negation and studies of the impact of negation on text readability are rare. In this paper, we introduce a new negation parser (NegAIT) for detecting morphological, sentential, and double negation. We evaluated the parser using a human annotated gold standard containing 500 Wikipedia sentences and achieved 95%, 89% and 67% precision with 100%, 80%, and 67% recall, respectively. We also investigate two applications of this new negation parser. First, we performed a corpus statistics study to demonstrate different negation usage in easy and difficult text. Negation usage was compared in six corpora: patient blogs (4K sentences), Cochrane reviews (91K sentences), PubMed abstracts (20K sentences), clinical trial texts (48K sentences), and English and Simple English Wikipedia articles for different medical topics (60K and 6K sentences). The most difficult text contained the least negation. However, when comparing negation types, difficult texts (i.e., Cochrane, PubMed, English Wikipedia and clinical trials) contained significantly (p<0.01) more morphological negations. Second, we conducted a predictive analytics study to show the importance of negation in distinguishing between easy and difficult text. Five binary classifiers (Naïve Bayes, SVM, decision tree, logistic regression and linear regression) were trained using only negation information. All classifiers achieved better performance than the majority baseline. The Naïve Bayes classifier achieved the highest accuracy at 77% (9% higher than the majority baseline).


Subjects
Data Curation , Natural Language Processing , Software , Bayes Theorem , Comprehension , Humans , Language , Medical Informatics/methods
6.
IT Prof ; 18(3): 45-51, 2016.
Article in English | MEDLINE | ID: mdl-27698611

ABSTRACT

Limited health literacy is a barrier to understanding health information. Simplifying text can reduce this barrier and possibly other known disparities in health. Unfortunately, few tools exist to simplify text with demonstrated impact on comprehension. By leveraging modern data sources integrated with natural language processing algorithms, we are developing the first semi-automated text simplification tool. We present two main contributions. First, we introduce our evidence-based development strategy for designing effective text simplification software and summarize initial, promising results. Second, we present a new study examining existing readability formulas, which are the most commonly used tools for text simplification in healthcare. We compare syllable count, the proxy for word difficulty used by most readability formulas, with our new metric 'term familiarity' and find that syllable count measures how difficult words 'appear' to be, but not their actual difficulty. In contrast, term familiarity can be used to measure actual difficulty.

7.
Cureus ; 16(3): e55304, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38559518

ABSTRACT

INTRODUCTION: AI chatbots are being increasingly used in healthcare settings. There is growing interest in using AI to assist in patient education. Currently, extensive healthcare information is found online but is often too complex to understand. Our objective is to determine if physicians can recommend the free version of ChatGPT version 3.5 (OpenAI, San Francisco, CA, USA) for patients to simplify text from the American Academy of Ophthalmology (AAO) in English and Spanish. This version of ChatGPT was assessed in this study due to its increased accessibility across various patient populations. METHODS: Fifteen articles were chosen from AAO in both languages and simplified with ChatGPT 10 times each. The readability of original and simplified articles was assessed with the Flesch Reading Ease and Gunning Fog Index for English and Fernández Huerta, Gutiérrez, Szigriszt-Pazo, INFLESZ, and Legibilidad-µ for Spanish. Grade levels to assess readability were calculated with Flesch Kincaid Grade Level and Crawford Nivel-de-Grado. Mean, standard deviation, and two-tailed t-tests were performed to assess differences before and after simplification. RESULTS: Average grade levels before and after simplification were as follows: English 8.43±1.17 to 8.9±2.1 (p=0.41) and Spanish 5.3±0.34 to 4.1±1.1 (p=0.0001). Spanish articles were significantly simplified per Legibilidad-µ (p=0.003). No significant difference was noted for other scales. CONCLUSIONS: The readability of AAO articles in English worsened without significance but significantly improved in Spanish. This may result from simpler syllable structures and a smaller overall vocabulary in Spanish. With increased testing, physicians can recommend ChatGPT for Spanish-speaking patients to improve health literacy.
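
Two of the English-language measures named in the abstract above, Flesch Reading Ease and Flesch-Kincaid Grade Level, are simple functions of word, sentence, and syllable counts. The sketch below shows the standard English formulas only. It takes the counts as inputs, because syllable counting requires a separate heuristic or dictionary, and it does not cover the Spanish indices, which use different coefficients. The example counts are invented for illustration.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease (English): higher scores indicate easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level (English): approximate US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
print(round(flesch_reading_ease(100, 5, 150), 3))
print(round(flesch_kincaid_grade(100, 5, 150), 2))
```

Both formulas depend only on average sentence length and average syllables per word, so a rewrite that shortens sentences but introduces longer words can leave the grade level unchanged or even raise it.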

8.
J Med Internet Res ; 15(7): e144, 2013 Jul 31.
Article in English | MEDLINE | ID: mdl-23903235

ABSTRACT

BACKGROUND: Adequate health literacy is important for people to maintain good health and manage diseases and injuries. Educational text, either retrieved from the Internet or provided by a doctor's office, is a popular method to communicate health-related information. Unfortunately, it is difficult to write text that is easy to understand, and existing approaches, mostly the application of readability formulas, have not convincingly been shown to reduce the difficulty of text. OBJECTIVE: To develop an evidence-based writer support tool to improve perceived and actual text difficulty. To this end, we are developing and testing algorithms that automatically identify difficult sections in text and provide appropriate, easier alternatives; algorithms that effectively reduce text difficulty will be included in the support tool. This work describes the user evaluation with an independent writer of an automated simplification algorithm using term familiarity. METHODS: Term familiarity indicates how easy words are for readers and is estimated using term frequencies in the Google Web Corpus. Unfamiliar words are algorithmically identified and tagged for potential replacement. Easier alternatives consisting of synonyms, hypernyms, definitions, and semantic types are extracted from WordNet, the Unified Medical Language System (UMLS), and Wiktionary and ranked for a writer to choose from to simplify the text. We conducted a controlled user study with a representative writer who used our simplification algorithm to simplify texts. We tested the impact with representative consumers. The key independent variable of our study is lexical simplification, and we measured its effect on both perceived and actual text difficulty. Participants were recruited from Amazon's Mechanical Turk website. Perceived difficulty was measured with 1 metric, a 5-point Likert scale. 
Actual difficulty was measured with 3 metrics: 5 multiple-choice questions alongside each text to measure understanding, 7 multiple-choice questions without the text for learning, and 2 free recall questions for information retention. RESULTS: Ninety-nine participants completed the study. We found strong beneficial effects on both perceived and actual difficulty. After simplification, the text was perceived as simpler (P<.001) with simplified text scoring 2.3 and original text 3.2 on the 5-point Likert scale (score 1: easiest). It also led to better understanding of the text (P<.001) with 11% more correct answers with simplified text (63% correct) compared to the original (52% correct). There was more learning with 18% more correct answers after reading simplified text compared to 9% more correct answers after reading the original text (P=.003). There was no significant effect on free recall. CONCLUSIONS: Term familiarity is a valuable feature in simplifying text. Although the topic of the text influences the effect size, the results were convincing and consistent.


Subjects
Algorithms , Information Services , Adult , Aged , Female , Humans , Male , Middle Aged , Unified Medical Language System , Writing , Young Adult
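
The abstract above estimates term familiarity from corpus term frequencies and tags unfamiliar words for possible replacement. The following is only a toy sketch of that idea: the frequency table, the threshold, and the function name are hypothetical stand-ins, whereas the study used the Google Web Corpus and does not state a cutoff in the abstract.

```python
import re
from typing import Dict, List

# Hypothetical corpus counts, invented for this example.
TOY_FREQUENCIES: Dict[str, int] = {
    "the": 1_000_000,
    "doctor": 50_000,
    "checks": 20_000,
    "for": 900_000,
    "high": 300_000,
    "blood": 80_000,
    "pressure": 60_000,
    "hypertension": 900,
}


def flag_unfamiliar(text: str, frequencies: Dict[str, int], threshold: int) -> List[str]:
    """Return the tokens whose corpus frequency falls below the threshold.

    Words missing from the table are treated as frequency 0, so they are
    always flagged. Tokenization is a simple lowercase letter-run split.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if frequencies.get(t, 0) < threshold]


flagged = flag_unfamiliar("The doctor checks for hypertension", TOY_FREQUENCIES, 5_000)
print(flagged)
```

In the workflow the abstract describes, flagged words would then be paired with candidate alternatives (synonyms, hypernyms, definitions) for a human writer to choose from; that ranking step is not sketched here.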
9.
Front Artif Intell ; 6: 1223924, 2023.
Article in English | MEDLINE | ID: mdl-37808622

ABSTRACT

In the field of automatic text simplification, assessing whether or not the meaning of the original text has been preserved during simplification is of paramount importance. Metrics relying on n-gram overlap assessment may struggle to deal with simplifications which replace complex phrases with their simpler paraphrases. Evaluation metrics for meaning preservation based on large language models (LLMs), such as BertScore in machine translation or QuestEval in summarization, have been proposed. However, none has a strong correlation with human judgment of meaning preservation. Moreover, such metrics have not been assessed in the context of text simplification research. In this study, we present a meta-evaluation of several metrics we apply to measure content similarity in text simplification. We also show that the metrics are unable to pass two trivial, inexpensive content preservation tests. Another contribution of this study is MeaningBERT (https://github.com/GRAAL-Research/MeaningBERT), a new trainable metric designed to assess meaning preservation between two sentences in text simplification, showing how it correlates with human judgment. To demonstrate its quality and versatility, we will also present a compilation of datasets used to assess meaning preservation and benchmark our study against a large selection of popular metrics.
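
The abstract above notes that n-gram overlap metrics can struggle when a simplification paraphrases a complex phrase. A small sketch makes the problem concrete. The unigram-overlap F1 below is a generic illustration chosen for this example; it is not one of the specific metrics evaluated in the study, and the sentences are invented.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """F1 over lowercase whitespace tokens shared by reference and candidate."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # multiset intersection
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence pair with the same meaning but little shared wording.
original = "the patient exhibited myocardial infarction"
simplified = "the patient had a heart attack"

print(unigram_f1(original, original))
print(unigram_f1(original, simplified))
```

The paraphrase shares only two tokens with the original, so it scores well below an exact copy even though the meaning is preserved, which is the kind of gap that motivates learned meaning-preservation metrics.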

10.
Front Artif Intell ; 6: 1208451, 2023.
Article in English | MEDLINE | ID: mdl-37791004

ABSTRACT

In this study, we focus on sentence splitting, a subfield of text simplification, motivated largely by an unproven idea that if you divide a sentence into pieces, it should become easier to understand. Our primary goal in this study is to find out whether this is true. In particular, we ask, does it matter whether we break a sentence into two, three, or more? We report on our findings based on Amazon Mechanical Turk. More specifically, we introduce a Bayesian modeling framework to further investigate to what degree a particular way of splitting the complex sentence affects readability, along with a number of other parameters adopted from diverse perspectives, including clinical linguistics and cognitive linguistics. The Bayesian modeling experiment provides clear evidence that bisecting the sentence leads to enhanced readability to a degree greater than when we create simplification with more splits.

11.
Front Artif Intell ; 6: 1236963, 2023.
Article in English | MEDLINE | ID: mdl-38099233

ABSTRACT

We discover sizable differences between the lexical complexity assignments of first language (L1) and second language (L2) English speakers. The complexity assignments of 940 shared tokens without context were extracted and compared from three lexical complexity prediction (LCP) datasets: the CompLex dataset, the Word Complexity Lexicon, and the CEFR-J wordlist. It was found that word frequency, length, syllable count, familiarity, and prevalence as well as a number of derivations had a greater effect on perceived lexical complexity for L2 English speakers than they did for L1 English speakers. We explain these findings in connection to several theories from applied linguistics and then use these findings to inform a binary classifier that is trained to distinguish between spelling errors made by L1 and L2 English speakers. Our results indicate that several of our findings are generalizable. Differences in perceived lexical complexity are shown to be useful in the automatic identification of problematic words for these differing target populations. This gives support to the development of personalized lexical complexity prediction and text simplification systems.

12.
JAMIA Open ; 5(2): ooac044, 2022 Jul.
Article in English | MEDLINE | ID: mdl-35663117

ABSTRACT

Objective: Simplifying healthcare text to improve understanding is difficult but critical to improve health literacy. Unfortunately, few tools exist that have been shown objectively to improve text and understanding. We developed an online editor that integrates simplification algorithms that suggest concrete simplifications, all of which have been shown individually to affect text difficulty. Materials and Methods: The editor was used by a health educator at a local community health center to simplify 4 texts. A controlled experiment was conducted with community center members to measure perceived and actual difficulty of the original and simplified texts. Perceived difficulty was measured using a Likert scale; actual difficulty with multiple-choice questions and with free recall of information evaluated by the educator and 2 sets of automated metrics. Results: The results show that perceived difficulty improved with simplification. Several multiple-choice questions, measuring actual difficulty, were answered more correctly with the simplified text. Free recall of information showed no improvement based on the educator evaluation but was better for simplified texts when measured with automated metrics. Two follow-up analyses showed that self-reported education level and the amount of English spoken at home positively correlated with question accuracy for original texts, but that this effect disappeared with simplified text. Discussion: Simplifying text is difficult and the results are subtle. However, using a variety of different metrics helps quantify the effects of changes. Conclusion: Text simplification can be supported by algorithmic tools. Without requiring tool training or linguistic knowledge, our simplification editor helped simplify healthcare-related texts.

13.
Front Psychol ; 13: 707630, 2022.
Article in English | MEDLINE | ID: mdl-35350726

ABSTRACT

In this paper, we present an overview of existing parallel corpora for Automatic Text Simplification (ATS) in different languages focusing on the approach adopted for their construction. We make the main distinction between manual and (semi)-automatic approaches in order to investigate in which respect complex and simple texts vary and whether and how the observed modifications may depend on the underlying approach. To this end, we perform a two-level comparison on Italian corpora, since this is the only language, with the exception of English, for which there are large parallel resources derived through the two approaches considered. The first level of comparison accounts for the main types of sentence transformations occurring in the simplification process, the second one examines the results of a linguistic profiling analysis based on Natural Language Processing techniques and carried out on the original and the simple version of the same texts. For both levels of analysis, we chose to focus our discussion mostly on sentence transformations and linguistic characteristics that pertain to the morpho-syntactic and syntactic structure of the sentence.

14.
JMIR Med Inform ; 10(11): e38095, 2022 Nov 18.
Article in English | MEDLINE | ID: mdl-36399375

ABSTRACT

BACKGROUND: In most cases, the abstracts of articles in the medical domain are publicly available. Although these are accessible by everyone, they are hard to comprehend for a wider audience due to the complex medical vocabulary. Thus, simplifying these complex abstracts is essential to make medical research accessible to the general public. OBJECTIVE: This study aims to develop a deep learning-based text simplification (TS) approach that converts complex medical text into a simpler version while maintaining the quality of the generated text. METHODS: A TS approach using reinforcement learning and transformer-based language models was developed. Relevance reward, Flesch-Kincaid reward, and lexical simplicity reward were optimized to help simplify jargon-dense complex medical paragraphs to their simpler versions while retaining the quality of the text. The model was trained using 3568 complex-simple medical paragraphs and evaluated on 480 paragraphs with the help of automated metrics and human annotation. RESULTS: The proposed method outperformed previous baselines on Flesch-Kincaid scores (11.84) and achieved comparable performance with other baselines when measured using ROUGE-1 (0.39), ROUGE-2 (0.11), and SARI scores (0.40). Manual evaluation showed that percentage agreement between human annotators was more than 70% when factors such as fluency, coherence, and adequacy were considered. CONCLUSIONS: A unique medical TS approach was successfully developed that leverages reinforcement learning and accurately simplifies complex medical paragraphs, thereby increasing their readability. The proposed TS approach can be applied to automatically generate simplified text for complex medical text data, which would enhance the accessibility of biomedical research to a wider audience.

15.
J Am Med Inform Assoc ; 29(11): 1976-1988, 2022 10 07.
Article in English | MEDLINE | ID: mdl-36083212

ABSTRACT

OBJECTIVE: Plain language in medicine has long been advocated as a way to improve patient understanding and engagement. As the field of Natural Language Processing has progressed, increasingly sophisticated methods have been explored for the automatic simplification of existing biomedical text for consumers. We survey the literature in this area with the goals of characterizing approaches and applications, summarizing existing resources, and identifying remaining challenges. MATERIALS AND METHODS: We search English language literature using lists of synonyms for both the task (eg, "text simplification") and the domain (eg, "biomedical"), and search for all pairs of these synonyms using Google Scholar, Semantic Scholar, PubMed, ACL Anthology, and DBLP. We expand search terms based on results and further include any pertinent papers not in the search results but cited by those that are. RESULTS: We find 45 papers that we deem relevant to the automatic simplification of biomedical text, with data spanning 7 natural languages. Of these (nonexclusively), 32 describe tools or methods, 13 present data sets or resources, and 9 describe impacts on human comprehension. Of the tools or methods, 22 are chiefly procedural and 10 are chiefly neural. CONCLUSIONS: Though neural methods hold promise for this task, scarcity of parallel data has led to continued development of procedural methods. Various low-resource mitigations have been proposed to advance neural methods, including paragraph-level and unsupervised models and augmentation of neural models with procedural elements drawing from knowledge bases. However, high-quality parallel data will likely be crucial for developing fully automated biomedical text simplification.


Subjects
Natural Language Processing , Unified Medical Language System , Humans , Language , PubMed , Semantics
16.
Front Artif Intell ; 5: 983008, 2022.
Article in English | MEDLINE | ID: mdl-36171798

ABSTRACT

Text simplification involves making texts easier to understand, usually for lay readers. Simplifying texts is a complex task, especially when conducted in a second language. The readability of the produced texts and the way in which authors manage the different phases of the text simplification process are influenced by their writing expertise and by their language proficiency. Training on audience awareness can be beneficial for writers, but most research so far has devoted attention to first-language writers who simplify their own texts. Therefore, this study investigated the impact of text simplification training on second-language writers (university students) who simplify already existing texts. Specifically, after identifying a first and a second phase in the text simplification process (namely, two distinct series of writing dynamics), we analyzed the impact of our training on pausing and revision behavior across phases, as well as levels of readability achieved by the students. Additionally, we examined correlations between pausing behavior and readability by using keystroke logging data and automated text analysis. We found that phases of text simplification differ along multiple dimensions, even though our training did not seem to influence pausing and revision dynamics. Our training led to texts with fewer and shorter words, and with syntactically simpler sentences. The correlation analysis showed that longer and more frequent pauses at specific text locations were linked with increased readability in the same or adjacent text locations. We conclude the paper by discussing theoretical, methodological, and pedagogical implications, alongside limitations and areas for future research.

17.
Front Artif Intell ; 5: 1042258, 2022.
Article in English | MEDLINE | ID: mdl-36530355

ABSTRACT

In this paper, we distinguish between four interconnected notions that recur in the literature on text simplification: clarity, easiness, plainness, and simplicity. While plain language and easy language have both been the subject of standardization efforts, there are few attempts to define text clarity and text simplicity. Indeed, in the definition of plain language, clarity has been favored at the expense of simplicity but is employed as a self-evident notion. Meanwhile, text simplicity suffers from a negative connotation and is more likely to be defined by its antonym, text complexity. In our analysis, we examine the current definitions of plain language and easy language and discuss common definitions of text clarity and text complexity. We propose a model of text simplification that can clarify the transition from specialized texts to plain language texts, and easy language texts. It is our contention that text simplification should be placed in a more general framework of discursive ergonomics.

18.
Stud Health Technol Inform ; 284: 249-253, 2021 Dec 15.
Article in English | MEDLINE | ID: mdl-34920520

ABSTRACT

Complexity and domain-specificity make medical text hard to understand for patients and their next of kin. To simplify such text, this paper explored how word and character level information can be leveraged to identify medical terms when training data is limited. We created a dataset of medical and general terms using the Human Disease Ontology from BioPortal and Wikipedia pages. Our results from 10-fold cross validation indicated that convolutional neural networks (CNNs) and transformers perform competitively. The best F score of 93.9% was achieved by a CNN trained on both word and character level embeddings. Statistical significance tests demonstrated that general word embeddings provide rich word representations for medical term identification. Consequently, focusing on words is favorable for medical term identification if using deep learning architectures.
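
For readers unfamiliar with the evaluation setup mentioned in this abstract, the following is a minimal sketch of 10-fold cross validation and of an F score computed from confusion counts. It assumes the reported F score is the balanced F1 measure (the abstract does not say so), and the function names `f1_score` and `k_fold_indices` are hypothetical helpers written for this illustration, not the authors' code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall (assumed metric)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_items: int, k: int = 10):
    """Yield (train, test) index lists; fold i tests every k-th item from i."""
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


# Illustrative counts only; these are not results from the paper.
example_f1 = f1_score(tp=8, fp=2, fn=2)
folds = list(k_fold_indices(25, k=10))
```

Each item is held out exactly once across the ten folds, so every term in a dataset contributes to the test score of one fold and to the training data of the other nine.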


Subjects
Neural Networks, Computer ; Research Design ; Humans
19.
Front Psychol ; 12: 703690, 2021.
Article in English | MEDLINE | ID: mdl-34764901

ABSTRACT

Studies on simple language and simplification are often based on datasets of texts written either for children or for learners of a second language. In both cases, these texts represent an example of simple language, but simplification likely involves different strategies. As such, this data may not be entirely homogeneous in terms of text simplicity. This study investigates linguistic properties and specific simplification strategies used in Russian texts for primary school children with different language backgrounds and levels of language proficiency. To explore the structure and variability of simple texts for young readers of different age groups, we trained models for multiclass and binary classification. The models were based on quantitative features of texts. Subsequently, we evaluated the simplification strategies applied to readers of the same age with different linguistic backgrounds. This study is particularly relevant for Russian language material, where the concept of easy and plain language has not been sufficiently investigated. Judging by the performance of multiclass models based on various quantitative features, the three types of texts cannot easily be distinguished from each other. Therefore, it can be said that texts of all types exhibit a similar level of accessibility to young readers. In contrast, binary classification tasks demonstrated better results, especially in the R-native vs. non R-native track (0.78 F1-score). These results may indicate that the strategies used for adapting or creating texts for each type of audience are different.

20.
Stud Health Technol Inform ; 270: 362-366, 2020 Jun 16.
Article in English | MEDLINE | ID: mdl-32570407

ABSTRACT

Parallel sentences provide semantically similar information that can vary along a given dimension, such as language or register. Parallel sentences with register variation (such as expert and non-expert documents) can be exploited for automatic text simplification. The aim of automatic text simplification is to make a given piece of information easier to access and understand. In the biomedical field, simplification may allow patients to understand medical and health texts. Yet no such resources are currently available. We propose to exploit comparable corpora that are distinguished by their registers (specialized and simplified versions) to detect and align parallel sentences. These corpora are in French and relate to the biomedical area. We treat this task as binary classification (alignment/non-alignment). Our results show that the method we present here can be used to automatically generate a corpus of parallel sentences from our comparable corpus.
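
As a rough illustration of framing sentence alignment as binary classification, the sketch below labels a specialized/simplified sentence pair as aligned when a word-overlap score reaches a threshold. The Jaccard overlap feature, the 0.3 threshold, and the function names are assumptions made for this example; the abstract does not state which features or classifier the authors used.

```python
import re


def tokens(sentence: str) -> set[str]:
    """Lowercased word tokens; \\w also matches accented letters."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two sentences have in common."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def is_aligned(specialized: str, simplified: str, threshold: float = 0.3) -> bool:
    """Binary decision: alignment (True) or non-alignment (False).

    The threshold is a hypothetical value chosen for illustration.
    """
    return jaccard(specialized, simplified) >= threshold


# Made-up sentence pairs, not taken from the corpus described above.
pair_same = is_aligned("the heart pumps blood",
                       "the heart pumps blood around the body")
pair_diff = is_aligned("the heart pumps blood", "take one tablet daily")
```

In practice a trained classifier over several similarity features would replace the single fixed threshold; the sketch only shows the shape of the alignment/non-alignment decision.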


Subjects
Language , Natural Language Processing , Comprehension , Semantics , Unified Medical Language System