1.
BMC Genomics ; 25(1): 573, 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38849740

ABSTRACT

BACKGROUND: The single-pass long reads generated by third-generation sequencing technology exhibit a higher error rate. However, the circular consensus sequencing (CCS) produces shorter reads. Thus, it is effective to manage the error rate of long reads algorithmically with the help of the homologous high-precision and low-cost short reads from the Next Generation Sequencing (NGS) technology. METHODS: In this work, a hybrid error correction method (NmTHC) based on a generative neural machine translation model is proposed to automatically capture discrepancies within the aligned regions of long reads and short reads, as well as the contextual relationships within the long reads themselves for error correction. Akin to natural language sequences, the long read can be regarded as a special "genetic language" and be processed with the idea of generative neural networks. The algorithm builds a sequence-to-sequence (seq2seq) framework with a Recurrent Neural Network (RNN) as the core layer. The long reads before and after correction are regarded as the sentences in the source and target languages of translation, and the alignment information of long reads with short reads is used to create the special corpus for training. The well-trained model can be used to predict the corrected long read. RESULTS: NmTHC outperforms the latest mainstream hybrid error correction methods on real-world datasets from two mainstream platforms, including PacBio and Nanopore. Our experimental evaluation results demonstrate that NmTHC can align more bases with the reference genome without any segmenting in the six benchmark datasets, proving that it enhances alignment identity without sacrificing any length advantages of long reads. 
CONCLUSION: Consequently, NmTHC reasonably adopts the generative Neural Machine Translation (NMT) model to transform hybrid error correction tasks into machine translation problems and provides a novel perspective for solving long-read error correction problems with the ideas of Natural Language Processing (NLP). More remarkably, the proposed methodology is sequencing-technology-independent and can produce more precise reads.


Subjects
Algorithms , High-Throughput Nucleotide Sequencing , Neural Networks, Computer , High-Throughput Nucleotide Sequencing/methods , Humans , Machine Learning
2.
J Gen Intern Med ; 39(7): 1095-1102, 2024 May.
Article in English | MEDLINE | ID: mdl-38347346

ABSTRACT

BACKGROUND: Machine translation (MT) apps are used informally by healthcare professionals in many settings, especially where interpreters are not readily available. As MT becomes more accurate and accessible, it may be tempting to use MT more widely. Institutions and healthcare professionals need guidance on when and how these applications might be used safely and how to manage potential risks to communication. OBJECTIVES: Explore factors that may hinder or facilitate communication when using voice-to-voice MT. DESIGN: Health professionals volunteered to use a voice-to-voice MT app in routine encounters with their patients. Both health professionals and patients provided brief feedback on the experience, and a subset of consultations were observed. PARTICIPANTS: Doctors, nurses, and allied health professionals working in the Primary Care Division of the Geneva University Hospitals, Switzerland. MAIN MEASURES: Achievement of consultation goals; understanding and satisfaction; willingness to use MT again; difficulties encountered; factors affecting communication when using MT. KEY RESULTS: Fourteen health professionals conducted 60 consultations in 18 languages, using one of two voice-to-voice MT apps. Fifteen consultations were observed. Professionals achieved their consultation goals in 82.7% of consultations but were satisfied with MT communication in only 53.8%. Reasons for dissatisfaction included lack of practice with the app and difficulty understanding patients. Eighty-six percent of patients thought MT-facilitated communication was easy, and most participants were willing to use MT in the future (73% professionals, 84% patients). Experiences were more positive with European languages. Several conditions and speech practices were identified that appear to affect communication when using MT. CONCLUSION: While professional interpreters remain the gold standard for overcoming language barriers, voice-to-voice MT may be acceptable in some clinical situations. 
Healthcare institutions and professionals must be attentive to potential sources of MT errors and ensure the conditions necessary for safe and effective communication. More research in natural settings is needed to inform guidelines and training on using MT in clinical communication.


Subjects
Communication Barriers , Translating , Humans , Male , Female , Adult , Middle Aged , Physician-Patient Relations , Mobile Applications , Switzerland , Aged , Health Personnel , Communication
3.
Sensors (Basel) ; 24(5)2024 Feb 24.
Article in English | MEDLINE | ID: mdl-38475008

ABSTRACT

Sign language serves as the primary mode of communication for the deaf community. With technological advancements, it is crucial to develop systems capable of enhancing communication between deaf and hearing individuals. This paper reviews recent state-of-the-art methods in sign language recognition, translation, and production. Additionally, we introduce a rule-based system, called ruLSE, for generating synthetic datasets in Spanish Sign Language. To check the usefulness of these datasets, we conduct experiments with two state-of-the-art models based on Transformers, MarianMT and Transformer-STMC. In general, we observe that the former achieves better results (+3.7 points in the BLEU-4 metric) although the latter is up to four times faster. Furthermore, the use of pre-trained word embeddings in Spanish enhances results. The rule-based system demonstrates superior performance and efficiency compared to Transformer models in Sign Language Production tasks. Lastly, we contribute to the state of the art by releasing the generated synthetic dataset in Spanish named synLSE.


Subjects
Deep Learning , Humans , Sign Language , Hearing , Communication
4.
J Cancer Educ ; 39(5): 477-478, 2024 Oct.
Article in English | MEDLINE | ID: mdl-38652432

ABSTRACT

This commentary evaluates the use of machine translation for multilingual patient education in oncology. It critically examines the balance between technological benefits in language accessibility and the potential for increasing healthcare disparities. The analysis emphasizes the need for a multidisciplinary approach to translation that incorporates linguistic accuracy, medical clarity, and cultural relevance. Additionally, it highlights the ethical considerations of digital literacy and access, underscoring the importance of equitable patient education. This contribution seeks to advance the discussion on the thoughtful integration of technology in healthcare communication, focusing on maintaining high standards of equity, quality, and patient care.


Subjects
Multilingualism , Patient Education as Topic , Translating , Humans , Neoplasms , Health Literacy , Communication Barriers , Language
5.
J Gen Intern Med ; 38(10): 2333-2339, 2023 08.
Article in English | MEDLINE | ID: mdl-36781579

ABSTRACT

BACKGROUND: Accessing professional medical interpreters for brief, low risk exchanges can be challenging. Machine translation (MT) for verbal communication has the potential to be a useful clinical tool, but few evaluations exist. OBJECTIVE: We evaluated the quality of three MT applications for English-Spanish and English-Mandarin two-way interpretation of low complexity brief clinical communication compared with human interpretation. DESIGN: Audio-taped phrases were interpreted via human and 3 MT applications. Bilingual assessors evaluated the quality of MT interpretation on four assessment categories (accuracy, fluency, meaning, and clinical risk) using 5-point Likert scales. We used a non-inferiority design with 15% inferiority margin to evaluate the quality of three MT applications with professional medical interpreters serving as gold standards. MAIN MEASURES: Proportion of interpretation exchanges deemed acceptable, defined as a composite score of 16 or greater out of 20 based on the four assessment categories. KEY RESULTS: For English to Spanish, the proportion of MT-interpreted phrases scored as acceptable ranged from 0.68 to 0.84, while for English to Mandarin, the range was from 0.62 to 0.76. Both Spanish/Mandarin to English MT interpretation had low acceptable scores (range 0.36 to 0.41). No MT interpretation met the non-inferiority threshold. CONCLUSION: While MT interpretation was better for English to Spanish or Mandarin than the reverse, the overall quality of MT interpretation was poor for two-way clinical communication. Clinicians should advocate for easier access to professional interpretation in all clinical spaces and defer use of MT until these applications improve.
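
The acceptability rule in this abstract (a composite of four 5-point ratings, acceptable at 16 or more out of 20, compared against human interpreters with a 15% margin) can be illustrated with a minimal Python sketch. The function names and the human-interpreter proportion used in the usage note are hypothetical, and the margin check compares point estimates only; a full non-inferiority analysis would normally work with a confidence bound, which the abstract does not detail.

```python
from typing import Sequence

ACCEPT_THRESHOLD = 16  # composite score out of 20, as stated in the abstract
MARGIN = 0.15          # inferiority margin stated in the abstract


def is_acceptable(ratings: Sequence[int]) -> bool:
    """ratings: four 1-5 Likert scores (accuracy, fluency, meaning, clinical risk)."""
    if len(ratings) != 4 or any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("expected four ratings between 1 and 5")
    return sum(ratings) >= ACCEPT_THRESHOLD


def proportion_acceptable(all_ratings: Sequence[Sequence[int]]) -> float:
    """Share of interpreted phrases whose composite score is acceptable."""
    return sum(is_acceptable(r) for r in all_ratings) / len(all_ratings)


def within_margin(mt_prop: float, human_prop: float, margin: float = MARGIN) -> bool:
    """Simplified point-estimate check (assumption: no confidence interval)."""
    return mt_prop >= human_prop - margin
```

For example, with a hypothetical human-interpreter proportion of 0.95, `within_margin(0.84, 0.95)` holds while `within_margin(0.41, 0.95)` does not.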


Subjects
Communication , Translating , Humans , Allied Health Personnel , Communication Barriers
6.
J Psycholinguist Res ; 52(5): 1525-1544, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37100967

ABSTRACT

This paper explores the practical prospects for using artificial intelligence technologies in professional English-speaking translator education. At the online conference 'Translation Skills in Times of Artificial Intelligence' (DingTalk platform, January 2022), the teachers of higher education institutions in China prioritized the translator's competencies necessary for successful professional activity during the digital transformation of social and economic business relations. The educators also evaluated the demand for online services used in the education of English-Chinese interpreters. The survey results showed that the use of artificial intelligence technologies in educational practices could have a significant impact on the development of key competencies of future translators. Using a competency-based approach to interpreter training and considering the need to develop abilities, knowledge, and skills required for successful professional translation activity, the author developed the pedagogical concept of the online educational course 'Simultaneous and asynchronous translation in a digital environment.'


Subjects
Artificial Intelligence , Education, Distance , Translations , Humans , China , Schools , Language
7.
BMC Bioinformatics ; 22(Suppl 1): 598, 2021 Dec 17.
Article in English | MEDLINE | ID: mdl-34920707

ABSTRACT

BACKGROUND: Automated assignment of specific ontology concepts to mentions in text is a critical task in biomedical natural language processing, and the subject of many open shared tasks. Although the current state of the art involves the use of neural network language models as a post-processing step, the very large number of ontology classes to be recognized and the limited amount of gold-standard training data has impeded the creation of end-to-end systems based entirely on machine learning. Recently, Hailu et al. recast the concept recognition problem as a type of machine translation and demonstrated that sequence-to-sequence machine learning models have the potential to outperform multi-class classification approaches. METHODS: We systematically characterize the factors that contribute to the accuracy and efficiency of several approaches to sequence-to-sequence machine learning through extensive studies of alternative methods and hyperparameter selections. We not only identify the best-performing systems and parameters across a wide variety of ontologies but also provide insights into the widely varying resource requirements and hyperparameter robustness of alternative approaches. Analysis of the strengths and weaknesses of such systems suggest promising avenues for future improvements as well as design choices that can increase computational efficiency with small costs in performance. RESULTS: Bidirectional encoder representations from transformers for biomedical text mining (BioBERT) for span detection along with the open-source toolkit for neural machine translation (OpenNMT) for concept normalization achieve state-of-the-art performance for most ontologies annotated in the CRAFT Corpus. This approach uses substantially fewer computational resources, including hardware, memory, and time than several alternative approaches. 
CONCLUSIONS: Machine translation is a promising avenue for fully machine-learning-based concept recognition that achieves state-of-the-art results on the CRAFT Corpus, evaluated via a direct comparison to previous results from the 2019 CRAFT shared task. Experiments illuminating the reasons for the surprisingly good performance of sequence-to-sequence methods targeting ontology identifiers suggest that further progress may be possible by mapping to alternative target concept representations. All code and models can be found at: https://github.com/UCDenver-ccp/Concept-Recognition-as-Translation .

8.
J Gen Intern Med ; 36(11): 3361-3365, 2021 11.
Article in English | MEDLINE | ID: mdl-33674922

ABSTRACT

BACKGROUND: Because many hospitals have no mechanism for written translation, ED providers resort to the use of automated translation software, such as Google Translate (GT), for patient instructions. A recent study of discharge instructions in Spanish and Chinese suggested that accuracy rates of GT were high. STUDY OBJECTIVE: To perform a pragmatic assessment of GT for the written translation of commonly used ED discharge instructions in seven commonly spoken languages. METHODS: A prospective assessment of the accuracy of GT for 20 commonly used ED discharge instruction phrases, as evaluated by a convenience sample of native speakers of seven commonly spoken languages (Spanish, Chinese, Vietnamese, Tagalog, Korean, Armenian, and Farsi). Translations were evaluated using a previously validated matrix for scoring machine translation, containing 5-point Likert scales for fluency, adequacy, meaning, and severity, in addition to a dichotomous assessment of retention of the overall meaning. RESULTS: Twenty volunteers evaluated 400 Google-translated discharge statements. Volunteers were 50% female and spoke Spanish (5), Armenian (2), Chinese (3), Tagalog (4), Korean (2), and Farsi (2). The overall meaning was retained for 82.5% (330/400) of the translations. Spanish had the highest accuracy rate (94%), followed by Tagalog (90%), Korean (82.5%), Chinese (81.7%), Farsi (67.5%), and Armenian (55%). Mean Likert scores (on a 5-point scale) were high for fluency (4.2), adequacy (4.4), meaning (4.3), and severity (4.3) but also varied. CONCLUSION: GT for discharge instructions in the ED is inconsistent between languages and should not be relied on for patient instructions.


Subjects
Search Engine , Translating , Emergency Service, Hospital , Female , Humans , Language , Male , Patient Discharge , Prospective Studies
9.
BMC Med Inform Decis Mak ; 21(1): 258, 2021 09 06.
Article in English | MEDLINE | ID: mdl-34488734

ABSTRACT

BACKGROUND: Biomedical language translation requires multi-lingual fluency as well as relevant domain knowledge. Such requirements make it challenging to train qualified translators and costly to generate high-quality translations. Machine translation represents an effective alternative, but accurate machine translation requires large amounts of in-domain data. While such datasets are abundant in general domains, they are less accessible in the biomedical domain. Chinese and English are two of the most widely spoken languages, yet to our knowledge, a parallel corpus does not exist for this language pair in the biomedical domain. DESCRIPTION: We developed an effective pipeline to acquire and process an English-Chinese parallel corpus from the New England Journal of Medicine (NEJM). This corpus consists of about 100,000 sentence pairs and 3,000,000 tokens on each side. We showed that training on out-of-domain data and fine-tuning with as few as 4000 NEJM sentence pairs improve translation quality by 25.3 (13.4) BLEU for en→zh (zh→en) directions. Translation quality continues to improve at a slower pace on larger in-domain data subsets, with a total increase of 33.0 (24.3) BLEU for en→zh (zh→en) directions on the full dataset. CONCLUSIONS: The code and data are available at https://github.com/boxiangliu/ParaMed .
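
Several abstracts in this listing report gains in BLEU. As a rough illustration of what the metric measures, here is a minimal single-reference, sentence-level sketch in Python: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and scaled by a brevity penalty. It is a simplification and not the evaluation code of any listed paper; published scores are usually corpus-level, often smoothed, and reported on a 0-100 scale, and whitespace tokenization is an assumption here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed BLEU for one candidate against one reference, in [0, 1]."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing: any empty precision gives zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A "+25.3 BLEU" improvement in the abstract's sense corresponds to a difference of 0.253 on this function's 0-1 scale.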


Subjects
Language , Natural Language Processing , China , Humans , Translating
10.
Sensors (Basel) ; 21(8)2021 Apr 10.
Article in English | MEDLINE | ID: mdl-33920064

ABSTRACT

Grammatical Error Correction (GEC) is the task of detecting and correcting various grammatical errors in texts. Many previous approaches to GEC have used various mechanisms including rules, statistics, and their combinations. Recently, the performance of GEC in English has been drastically enhanced due to the vigorous application of deep neural networks and pretrained language models. Following the promising results of the English GEC tasks, we apply the Transformer with Copying Mechanism to the Korean GEC task by introducing novel and effective noising methods for constructing Korean GEC datasets. Our comparative experiments showed that the proposed system outperforms two commercial grammar checkers and other NMT-based models.

11.
Sensors (Basel) ; 21(4)2021 Feb 21.
Article in English | MEDLINE | ID: mdl-33670035

ABSTRACT

In this paper, we introduce new concepts in the machine translation paradigm. We treat the corpus as a database of frequent word sets. A translation request triggers association rules joining phrases present in the source language and phrases present in the target language. Note that a sequential scan of the corpus for such phrases would increase the response time unpredictably. We introduce the pre-processing of the bilingual corpus by proposing a data structure called Corpus-Trie (CT) that renders a bilingual parallel corpus in a compact data structure representing frequent data item sets. We also present algorithms which utilize the CT to respond to translation requests and explore novel techniques in exhaustive experiments. Experiments were performed on specific language pairs, although the proposed method is not restricted to any specific language. Moreover, the proposed Corpus-Trie can be extended from bilingual corpora to accommodate multi-language corpora. Experiments indicated that the response time of a translation request is logarithmic in the count of unrepeated phrases in the original bilingual corpus (and thus, the Corpus-Trie size). In practical situations, 5-20% of the log of the number of the nodes have to be visited. The experimental results indicate that the BLEU score for the proposed CT system increases with the number of phrases in the CT, for both English-Arabic and English-French translations. The proposed CT system was demonstrated to be better than both Omega-T and Apertium in quality of translation from a corpus size exceeding 1,600,000 phrases for English-Arabic translation, and 300,000 phrases for English-French translation.
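
To give a feel for why a trie avoids a sequential scan of the corpus, the following Python sketch stores source phrases word by word in a prefix tree and answers a request by greedy longest-match lookup. This is only an illustration of the general idea: the abstract does not describe the Corpus-Trie's internals, so the class `PhraseTrie`, its methods, and the toy English-French entries are all hypothetical, and the association-rule machinery is not modelled.

```python
class PhraseTrie:
    """Toy word-level trie mapping source phrases to target phrases (hypothetical)."""

    def __init__(self):
        self.children = {}       # next word -> child node
        self.translations = []   # target phrases for the phrase ending at this node

    def insert(self, source_phrase: str, target_phrase: str) -> None:
        node = self
        for word in source_phrase.split():
            node = node.children.setdefault(word, PhraseTrie())
        node.translations.append(target_phrase)

    def longest_match(self, words, start=0):
        """Return (length, translations) of the longest stored phrase at `start`."""
        node, best = self, (0, [])
        for offset, word in enumerate(words[start:], 1):
            node = node.children.get(word)
            if node is None:
                break
            if node.translations:
                best = (offset, node.translations)
        return best


def translate(trie: PhraseTrie, sentence: str) -> str:
    """Greedy left-to-right cover; unknown words are copied through unchanged."""
    words, out, i = sentence.split(), [], 0
    while i < len(words):
        length, options = trie.longest_match(words, i)
        if length:
            out.append(options[0])
            i += length
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)
```

Each lookup walks at most as many nodes as the phrase has words, independent of how many phrases are stored, which is the property that makes a pre-built trie attractive compared with scanning the corpus per request.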

12.
Sensors (Basel) ; 21(9)2021 Apr 21.
Article in English | MEDLINE | ID: mdl-33919018

ABSTRACT

Real-word errors are characterized by being actual terms in the dictionary. By providing context, real-word errors are detected. Traditional methods to detect and correct such errors are mostly based on counting the frequency of short word sequences in a corpus. Then, the probability of a word being a real-word error is computed. On the other hand, state-of-the-art approaches make use of deep learning models to learn context by extracting semantic features from text. In this work, a deep learning model was implemented for correcting real-word errors in clinical text. Specifically, a Seq2seq Neural Machine Translation Model mapped erroneous sentences to their corrected forms. For that, different types of error were generated in correct sentences by using rules. Different Seq2seq models were trained and evaluated on two corpora: the Wikicorpus and a collection of three clinical datasets. The medicine corpus was much smaller than the Wikicorpus due to privacy issues when dealing with patient information. Moreover, GloVe and Word2Vec pretrained word embeddings were used to study their performance. Despite the medicine corpus being much smaller than the Wikicorpus, Seq2seq models trained on the medicine corpus performed better than those models trained on the Wikicorpus. Nevertheless, a larger amount of clinical text is required to improve the results.
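
The rule-based generation of training data mentioned above can be sketched as follows: take correct sentences, swap some words for other valid dictionary words, and keep (noisy, clean) pairs as source and target for a Seq2seq model. The confusion sets, function names, and error rate below are hypothetical; the abstract does not list the actual rules used.

```python
import random

# Hypothetical confusion sets: every member is a real word, so a swap
# yields a real-word error rather than a non-word misspelling.
CONFUSION_SETS = [("their", "there"), ("to", "too", "two"), ("dose", "does")]
ALTERNATIVES = {
    w: [a for a in group if a != w] for group in CONFUSION_SETS for w in group
}


def inject_real_word_errors(sentence: str, rate: float, rng: random.Random) -> str:
    """Replace confusable words with probability `rate` (replacement is lowercase)."""
    noisy = []
    for word in sentence.split():
        options = ALTERNATIVES.get(word.lower())
        if options and rng.random() < rate:
            noisy.append(rng.choice(options))
        else:
            noisy.append(word)
    return " ".join(noisy)


def make_pairs(sentences, rate: float = 0.5, seed: int = 0):
    """Build (erroneous source, correct target) pairs for seq2seq training."""
    rng = random.Random(seed)
    return [(inject_real_word_errors(s, rate, rng), s) for s in sentences]
```

Passing an explicit `random.Random` instance keeps the generated corpus reproducible for a given seed.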


Subjects
Language , Semantics , Humans , Natural Language Processing , Privacy , Probability
13.
Sensors (Basel) ; 21(19)2021 Sep 29.
Article in English | MEDLINE | ID: mdl-34640835

ABSTRACT

Languages that allow free word order, such as Arabic dialects, pose significant difficulty for neural machine translation (NMT) because they contain many rare words that NMT systems translate poorly. Because NMT systems run with a fixed-size vocabulary, out-of-vocabulary words are represented by Unknown Word (UNK) tokens. Rare words are encoded entirely as sequences of subword pieces using the Word-Piece Model. This paper introduces the first Transformer-based neural machine translation model for Arabic vernaculars that employs subword units. The proposed solution is based on the recently introduced Transformer model. The use of subword units and a vocabulary shared between the Arabic dialect (the source language) and Modern Standard Arabic (the target language) enhances the behavior of the encoder's multi-head attention sublayers by capturing the overall dependencies between words of the input sentence in the Arabic vernacular. Experiments are carried out on Levantine Arabic vernacular (LEV) to Modern Standard Arabic (MSA), Maghrebi Arabic vernacular (MAG) to MSA, Gulf-MSA, Nile-MSA, and Iraqi Arabic (IRQ) to MSA translation tasks. Extensive experiments confirm that the suggested model adequately addresses the unknown word issue and boosts the quality of translation from Arabic vernaculars to MSA.
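
How subword units sidestep the unknown-word problem can be shown with a small Python sketch of greedy longest-match-first segmentation, the inference procedure commonly associated with WordPiece vocabularies. The toy vocabulary and the `##` continuation prefix are assumptions for illustration (Latin script is used for readability); the abstract does not specify the tokenizer configuration.

```python
def wordpiece_tokenize(word: str, vocab: set, unk: str = "[UNK]") -> list:
    """Split one word into the longest vocabulary pieces, left to right."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a word-internal piece
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece fits: the whole word falls back to UNK
        pieces.append(piece)
        start = end
    return pieces
```

With the hypothetical vocabulary `{"trans", "##lat", "##ion", "##s"}`, the unseen word "translations" is segmented into known pieces instead of collapsing to a single UNK token.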


Subjects
Language , Vocabulary
14.
J Biomed Inform ; 94: 103207, 2019 06.
Article in English | MEDLINE | ID: mdl-31077817

ABSTRACT

Automatic ICD-10 coding is an unresolved challenge in terms of Machine Learning tasks. Despite hospitals generating an enormous amount of clinical documents, data is considerably sparse, associated with a very skewed and unbalanced code distribution, which entails reduced interoperability. In addition, in some languages the availability of coded documents is very limited. This paper proposes a cross-lingual approach based on Machine Translation methods to code death certificates with ICD-10 using supervised learning. The aim of this approach is to increase the availability of coded documents by combining collections of different languages, which may also help reduce their possible bias in the ICD distribution, i.e. to avoid the promotion of a subset of codes due to service or environmental factors. A significant improvement in system performance is achieved for those labels with few occurrences.


Subjects
International Classification of Diseases , Machine Learning , Translating , Automation , Electronic Health Records , Humans
15.
Molecules ; 22(10)2017 Oct 17.
Article in English | MEDLINE | ID: mdl-29039790

ABSTRACT

With the development of next generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from protein sequences because of limitations of traditional biological experimental techniques. Protein function prediction has been a long-standing challenge to fill the gap between the huge amount of protein sequences and the known function. In this paper, we propose a novel method to convert the protein function problem into a language translation problem from the newly proposed protein sequence language "ProLan" to the protein function language "GOLan", and build a neural machine translation model based on recurrent neural networks to translate "ProLan" language to "GOLan" language. We blindly tested our method by participating in the latest third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluated the performance of our methods on selected proteins whose function was released after the CAFA competition. The good performance on the training and testing datasets demonstrates that our newly proposed method is a promising direction for protein function prediction. In summary, we propose, for the first time, a method which converts the protein function prediction problem to a language translation problem and applies a neural machine translation model for protein function prediction.


Subjects
Computational Biology/methods , Neural Networks, Computer , Proteins/metabolism , Software , Algorithms , Databases, Protein , Gene Ontology , Machine Learning , Reproducibility of Results
16.
Lang Resour Eval ; 49(1): 147-193, 2015.
Article in English | MEDLINE | ID: mdl-26120290

ABSTRACT

In this paper, we tackle the problem of domain adaptation of statistical machine translation (SMT) by exploiting domain-specific data acquired by domain-focused crawling of text from the World Wide Web. We design and empirically evaluate a procedure for automatic acquisition of monolingual and parallel text and their exploitation for system training, tuning, and testing in a phrase-based SMT framework. We present a strategy for using such resources depending on their availability and quantity supported by results of a large-scale evaluation carried out for the domains of environment and labour legislation, two language pairs (English-French and English-Greek) and in both directions: into and from English. In general, machine translation systems trained and tuned on a general domain perform poorly on specific domains and we show that such systems can be adapted successfully by retuning model parameters using small amounts of parallel in-domain data, and may be further improved by using additional monolingual and parallel training data for adaptation of language and translation models. The average observed improvement in BLEU achieved is substantial at 15.30 points absolute.

17.
Stud Health Technol Inform ; 310: 1450-1451, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269691

ABSTRACT

The purpose of this study was to evaluate the accuracy of deep neural machine translation focused on medical device adverse event terminology. Ten models were obtained, and their English-to-Japanese translation accuracy was evaluated using quantitative and qualitative measures. No significant difference was found in the quantitative index except for a few pairs. In the qualitative evaluation, there was a significant difference, and googletrans and GPT-3 were regarded as useful models.


Subjects
Artificial Intelligence , Equipment Failure , Translating , Terminology as Topic
18.
Heliyon ; 10(6): e28106, 2024 Mar 30.
Article in English | MEDLINE | ID: mdl-38524597

ABSTRACT

Artificial intelligence has advanced significantly in recent years, affecting multiple aspects of life. In particular, this has had an impact on the machine translation of texts, reducing or removing human interaction. Artificial intelligence (AI)-based translation software models have thus become widely available, and these now include Google Translate, Bing, Microsoft Translator, DeepL, Reverso, Systran Translate, and Amazon Translate. Several computer-aided translation (CAT) tools such as Memoq, Trados, Smartcat, Lokalise, Smartling, Crowdin, TextUnited, and Memsource are also available. More recently, artificial intelligence has been applied in the development of applications such as ChatGPT, ChatSonic, GPT-3 Playground, Chat GPT 4 and YouChat, which simulate conversational responses to researchers' inquiries, mimicking human interactions more directly. This study thus aimed to examine any remaining contrasts between human and AI translation in the legal field to investigate the potential hypothesis that there is now no difference between human and AI translation. The paper thus also examined concerns about whether the need for human translators will decline in the face of AI development, as well as beginning to assess whether it will ever be possible for those in the legal field to depend only on machine translation. To achieve this, a collection of legal texts from various contracts was chosen, and these pieces were both allocated to legal translators and subjected to AI translation systems. Using a contrastive methodology, the study thus examined the differences between AI and human translation, examining the strengths and weaknesses of both approaches and discussing the situations in which each approach might be most effective.

19.
Heliyon ; 10(7): e28535, 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38560143

ABSTRACT

This study overviews the technology-related skills required by the translation market and the potential (mis)alignment between market needs and the skills developed in translator training programs in the Arab world. To this end, we collected and analyzed a corpus of 145 job ads for Arabic translation or localization services, seeking to spot market trends in the technology skills required. The study also collected and analyzed documentary evidence on translation programs in the Arab world to reveal the types of technology-related courses and the skills they foster. The findings reveal that computer-aided translation and software localization skills are increasingly required by the Arabic translation market. Moreover, the number of technology-related courses increased over time as training programs updated their offerings to meet current and expected future market demands. However, only a limited number of translation programs offer localization courses. This indicated a potential area of improvement. This study's insights should inform the development of translator training program courses to meet the job market's evolving needs.

20.
PeerJ Comput Sci ; 10: e2122, 2024.
Article in English | MEDLINE | ID: mdl-38983192

ABSTRACT

Grammar error correction systems are pivotal in the field of natural language processing (NLP), with a primary focus on identifying and correcting the grammatical integrity of written text. This is crucial for both language learning and formal communication. Recently, neural machine translation (NMT) has emerged as a promising approach in high demand. However, this approach faces significant challenges, particularly the scarcity of training data and the complexity of grammar error correction (GEC), especially for low-resource languages such as Indonesian. To address these challenges, we propose InSpelPoS, a confusion method that combines two synthetic data generation methods: the Inverted Spellchecker and Patterns+POS. Furthermore, we introduce an adapted seq2seq framework equipped with a dynamic decoding method and state-of-the-art Transformer-based neural language models to enhance the accuracy and efficiency of GEC. The dynamic decoding method is capable of navigating the complexities of GEC and correcting a wide range of errors, including contextual and grammatical errors. The proposed model leverages the contextual information of words and sentences to generate a corrected output. To assess the effectiveness of our proposed framework, we conducted experiments using synthetic data and compared its performance with existing GEC systems. The results demonstrate a significant improvement in the accuracy of Indonesian GEC compared to existing methods.
