Tag-less Back-Translation


Back-translation of target-side monolingual data is an effective method for generating large numbers of parallel sentences to train improved neural machine translation (NMT) systems. However, the method has not been able to exploit the huge amount of available monolingual data, because models cannot differentiate between authentic and synthetic parallel data. Tagging, or using gates, has been used to enable translation models to distinguish between synthetic and authentic data, improving standard back-translation and also enabling the use of iterative back-translation on language pairs for which standard back-translation under-performed. This work presents pre-training and fine-tuning as a simpler but more effective way of differentiating between the two kinds of data. The approach, tag-less back-translation, trains the model on the synthetic data and then fine-tunes it on the authentic data. In experiments on low-resource English-Vietnamese NMT, the approach outperformed the baseline and standard back-translation by 4.0 and 0.7 BLEU respectively. Although it removes the need to tag (noise) the dataset, the technique also outperformed tagged back-translation by 0.4 BLEU. It reached its best scores in less training time than the standard and tagged back-translation approaches.
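The two-phase schedule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `backward_translate` callable (standing in for a target-to-source model) and the `train` callable (standing in for one full training run) are all hypothetical, and the only point shown is the ordering of the data, with synthetic pairs first and authentic pairs second and no tags added to either.

```python
from typing import Callable, List, Tuple, TypeVar

Pair = Tuple[str, str]  # (source sentence, target sentence)
Model = TypeVar("Model")


def make_synthetic_pairs(
    target_monolingual: List[str],
    backward_translate: Callable[[str], str],
) -> List[Pair]:
    """Pair each authentic target sentence with a machine-generated source.

    `backward_translate` is an assumed stand-in for a target-to-source model.
    """
    return [(backward_translate(t), t) for t in target_monolingual]


def tagless_back_translation(
    model: Model,
    authentic: List[Pair],
    target_monolingual: List[str],
    backward_translate: Callable[[str], str],
    train: Callable[[Model, List[Pair]], Model],
) -> Model:
    """Pre-train on synthetic pairs, then fine-tune on authentic pairs.

    `train` is an assumed stand-in for a full training run; it takes a model
    and a dataset and returns the updated model. No tag is attached to any
    sentence: the two kinds of data are separated only by training phase.
    """
    synthetic = make_synthetic_pairs(target_monolingual, backward_translate)
    model = train(model, synthetic)  # phase 1: pre-training, synthetic only
    model = train(model, authentic)  # phase 2: fine-tuning, authentic only
    return model
```

In a real system, `train` would wrap an NMT toolkit's training loop and the second call would resume from the checkpoint produced by the first; those details depend on the toolkit and are left out here.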