Pseudotext Injection and Advance Filtering of Low-Resource Corpus for Neural Machine Translation.
Adjeisah, Michael; Liu, Guohua; Nyabuga, Douglas Omwenga; Nortey, Richard Nuetey; Song, Jinling.
Affiliation
  • Adjeisah M; School of Computer Science and Technology, Donghua University, Shanghai, China.
  • Liu G; School of Computer Science and Technology, Donghua University, Shanghai, China.
  • Nyabuga DO; School of Computer Science and Technology, Donghua University, Shanghai, China.
  • Nortey RN; School of Information Science and Technology, Donghua University, Shanghai, China.
  • Song J; School of Mathematics and Information Technology, Hebei Normal University of Science & Technology, Qinhuangdao, Hebei, China.
Comput Intell Neurosci ; 2021: 6682385, 2021.
Article in English | MEDLINE | ID: mdl-33936190
ABSTRACT
Scaling natural language processing (NLP) to low-resource languages to improve machine translation (MT) performance remains an open problem. This research contributes to the field with a low-resource English-Twi translation system based on filtered synthetic-parallel corpora. It is often unclear what a good-quality corpus looks like in low-resource conditions, particularly where the target corpus is the only sample text of the parallel language. To improve MT performance in such low-resource language pairs, we propose expanding the training data by injecting a synthetic-parallel corpus obtained by translating a monolingual corpus from the target language, based on bootstrapping with different parameter settings. Furthermore, we perform unsupervised measurements on each sentence pair using squared Mahalanobis distances, a filtering technique that predicts sentence parallelism. Additionally, we make extensive use of three different sentence-level similarity metrics after round-trip translation. Experimental results on varying amounts of available parallel data demonstrate that injecting a pseudoparallel corpus and extensive filtering with sentence-level similarity metrics significantly improve the original out-of-the-box MT systems for low-resource language pairs. Compared with existing improvements on the same original framework under the same structure, our approach yields substantial improvements in BLEU and TER scores.
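The squared Mahalanobis distance mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which per-pair features the authors measure, so the two-dimensional feature vectors, the function names, and the threshold below are all hypothetical; the code only shows the distance itself, d^2 = (x - mu)^T S^-1 (x - mu), applied as a filter, and is not the paper's implementation.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]            # hypothetical 2-D feature vector for one sentence pair
Cov = Tuple[float, float, float]     # (s_xx, s_xy, s_yy) of a symmetric 2x2 covariance matrix


def fit_mean_cov(points: Sequence[Vec]) -> Tuple[Vec, Cov]:
    """Estimate the mean and sample covariance from reference feature vectors."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two reference points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def squared_mahalanobis(x: Vec, mean: Vec, cov: Cov) -> float:
    """Return (x - mean)^T S^-1 (x - mean) using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular or not positive definite")
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def filter_pairs(candidates: Sequence[Vec], reference: Sequence[Vec],
                 threshold: float) -> List[int]:
    """Keep indices of candidates whose squared distance is within the threshold.

    The threshold is an assumed tuning parameter, not a value from the paper.
    """
    mean, cov = fit_mean_cov(reference)
    return [i for i, x in enumerate(candidates)
            if squared_mahalanobis(x, mean, cov) <= threshold]
```

In this sketch, feature vectors from trusted parallel pairs define the reference distribution, and synthetic pairs far from it are discarded. A real pipeline would likely use higher-dimensional features and a linear-algebra library rather than the closed-form 2x2 inverse used here.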

Full text: 1 Collections: 01-international Database: MEDLINE Main subject: Translating / Natural Language Processing Study type: Prognostic_studies Language: En Publication year: 2021 Document type: Article