Bidirectional Decoding for Neural Machine Translation

This repository is built on top of OpenNMT-py. For the basic usage of the codebase, please refer to the OpenNMT-py documentation.

The intuition behind this project is that, because of the autoregressive nature of sequential decoding, a standard NMT decoder translates in only one direction. As a result, such models do not make full use of the bidirectional information in the target language. We therefore propose two ways to exploit this bidirectional information.

Multi-task Learning

The first approach is multi-task learning (MTL), in which we treat forward and backward decoding as two tasks.

We then train the two tasks jointly while sharing some components. The encoder is shared by default; in addition, three decoder-side components can be shared: the attention, the word embeddings, and the generator.

We can share a single component or several components at once.

During training, the shared components are assumed to absorb information from the backward direction. At test time, we discard the backward decoder (keeping only the shared components) and predict the target with forward decoding alone.
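For illustration, here is a minimal PyTorch-style sketch of this joint objective. The class and attribute names (`BidirectionalMTL`, `fwd_decoder`, `bwd_decoder`, etc.) are hypothetical and do not correspond one-to-one to the OpenNMT-py modules used in this repository; it only shows the idea of tying selected components and summing the two task losses.

```python
import torch
import torch.nn as nn

class BidirectionalMTL(nn.Module):
    """Illustrative MTL setup: one shared encoder, two decoders."""

    def __init__(self, encoder, fwd_decoder, bwd_decoder,
                 share_embed=True, share_atten=False, share_gen=False):
        super().__init__()
        self.encoder = encoder          # shared by default
        self.fwd_decoder = fwd_decoder
        self.bwd_decoder = bwd_decoder
        # Tie the selected components so both decoders use the same parameters.
        if share_embed:
            self.bwd_decoder.embeddings = self.fwd_decoder.embeddings
        if share_atten:
            self.bwd_decoder.attention = self.fwd_decoder.attention
        if share_gen:
            self.bwd_decoder.generator = self.fwd_decoder.generator

    def forward(self, src, tgt):
        memory = self.encoder(src)
        # Forward task: predict the target left-to-right.
        # (Each decoder is assumed to return a scalar training loss.)
        loss_fwd = self.fwd_decoder(tgt, memory)
        # Backward task: predict the reversed target right-to-left.
        loss_bwd = self.bwd_decoder(torch.flip(tgt, dims=[0]), memory)
        # Joint objective; at test time only the forward path is kept.
        return loss_fwd + loss_bwd
```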

Results show that this model improves over the baseline on the WMT DE-EN task (+0.98 BLEU, nearly 1 BLEU point, on the full data) and on the ZH-EN task (tested only on the News Commentary data due to limited resources, with a +0.95 BLEU improvement).

Regularization

This idea is simple: we add a regularization term that encourages the forward and backward decoder RNN hidden states at the same time step to be close to each other.

The regularizer can take several forms; we use two. The first applies an L2 penalty directly to the difference between the two hidden states, but this is rather strict. The second, for more flexibility, first passes each hidden state through its own linear (affine) layer before applying the L2 penalty.
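A minimal sketch of the two regularizers is given below, assuming the forward and backward decoder hidden states for a batch are available as tensors of shape (time, batch, dim), with the backward states already time-aligned to the forward ones. The names and shapes are illustrative, not the actual code in this repository.

```python
import torch
import torch.nn as nn

hidden_dim = 512  # illustrative size

# Affine variant: one linear layer per direction before the L2 penalty.
proj_fwd = nn.Linear(hidden_dim, hidden_dim)
proj_bwd = nn.Linear(hidden_dim, hidden_dim)

def l2_agreement(h_fwd, h_bwd, mode="direct"):
    """h_fwd, h_bwd: (time, batch, dim) decoder hidden states,
    aligned so that step t refers to the same target position."""
    if mode == "none":
        return h_fwd.new_zeros(())
    if mode == "direct":
        # Strict version: pull the raw hidden states together.
        return ((h_fwd - h_bwd) ** 2).sum(dim=-1).mean()
    if mode == "affine":
        # Looser version: compare states after per-direction projections.
        return ((proj_fwd(h_fwd) - proj_bwd(h_bwd)) ** 2).sum(dim=-1).mean()
    raise ValueError(mode)

# total_loss = loss_fwd + loss_bwd + reg_weight * l2_agreement(h_fwd, h_bwd, mode)
```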

Quickstart

You can enable the training options above with the following flags (an example command is shown after the list):

  • -share_atten: Share the attention component.
  • -share_embed: Share the word embedding component.
  • -share_gen: Share the generator component.
  • -l2_reg: Select the L2 regularization mode; choose one of none, direct, or affine.
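For example, to train with shared attention and embeddings plus the affine L2 regularizer, a command might look like the following; the data path, model name, and any other standard OpenNMT-py options are placeholders and should be adapted to your setup.

```
python train.py -data data/demo -save_model demo-model \
    -share_atten -share_embed \
    -l2_reg affine
```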
