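Enhancing Low-Resource Language NMT Models Through Constrained Sampling-Based Data Augmentation. Keywords: artificial intelligence, natural language processing, neural networks, machine translation, low-resource languages, data augmentation, constrained sampling. Data augmentation (DA) is a popular technique in natural language processing, especially in machine translation. It involves creating additional training data from an existing dataset to improve model performance. However, existing DA methods for low-resource languages...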
Data augmentation (DA) is a ubiquitous approach for several text generation tasks. In machine translation, especially in low-resource language scenarios, many DA methods have appeared. The most commonly used methods build a pseudo-corpus by randomly sampling, omitting, or...
Proceedings of the AMTA 2018 Workshop on Technologies for MT of Low Resource Languages (LoResMT 2018). CH Liu – Proceedings of the AMTA 2018 Workshop on …, 2018 – aclweb.org. … System Description of Supervised and Unsupervised Neural Machine Translation Approaches IV Page 8 … Her present ...
🌍 African NMT. @jaderabbit started an initiative at the Indaba Deep Learning School 2019 to "put African NMT on the map". The goal is to build and collect NMT models for low-resource African languages. The Masakhane repository contains and explains all the code you need to train Joey NMT and ...
Long-range Dependency Handling: Optimized for complex ancient sentence structures. Transfer Learning: Improved accuracy for low-resource language scenarios. Dataset: The custom dataset contains 1,474 parallel sentences from Gāhā Sattasaī, using data augmentation and transfer learning to address the data ...
Incorporating theoretical information into the dataset, tokenization and subword splitting improves translation quality in low-resource settings. Previous research has shown that one can train a reasonably good translation model by training a model with
However, the efficacy of NMT relies heavily on the availability of substantial functional corpora, which are readily accessible for languages like English. For low-resource languages, particularly those with minimal or nonexistent literature, the impact of NMT systems is considerably limited. This study...
Document Level NMT of Low-Resource Languages with Backtranslation. Sami ul Haq, Sadaf Abdul-Rauf, Arsalan Shaukat, Abdullah Saeed. Association for Computational Linguistics. Empirical Methods in Natural Language Processing.
Efficient incremental training using a novel NMT-SMT hybrid framework for translation of low-resource languages. doi:10.3389/frai.2024.1381290. Bhuvaneswari, Kumar; Varalakshmi, Murugesan. Frontiers in Artificial Intelligence.
Low-Resource Unsupervised NMT: Diagnosing the Problem and Providing a Linguistically Motivated Solution. Unsupervised Machine Translation has been advancing our ability to translate without parallel data, but state-of-the-art methods assume an abundance of monolingual data. This paper investigates the ...