
Commit

Update README.md
Paulmzr authored Feb 22, 2023
1 parent 2d06841 commit 4be16a4
Showing 1 changed file with 4 additions and 40 deletions.
README.md: 44 changes (4 additions & 40 deletions)
@@ -1,6 +1,6 @@
# DA-Transformer
# FA-DAT

Implementation for the ICML2022 paper "**Directed Acyclic Transformer for Non-Autoregressive Machine Translation**".
Implementation for the ICLR2023 paper "**Fuzzy Alignments in Directed Acyclic Graph for Non-autoregressive Machine Translation**".

**Abstract**: Directed Acyclic Transformer (DA-Transformer) represents the hidden states in a Directed Acyclic Graph (DAG), where each path of the DAG corresponds to a specific translation. The whole DAG simultaneously captures multiple translations and facilitates fast predictions in a non-autoregressive fashion.
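
The DAG factorization above can be made concrete with a small dynamic program. The following is a minimal sketch, not the repository's implementation (the real loss lives in `fs_plugins/custom_ops/dag_loss.*`): it assumes each vertex carries a token distribution and a transition distribution over later vertices, that every path starts at vertex 0 and ends at the last vertex, and it marginalizes the probability of one target sentence over all such paths.

```python
# Toy DAG marginalization (illustration only; not custom_ops/dag_loss.cu).
import torch

def dag_log_prob(emit_logp, trans_logp, target):
    """emit_logp: (L, V) log P(token | vertex)
    trans_logp: (L, L) log P(next vertex | vertex), -inf where j <= i
    target: (T,) token ids with T <= L
    Returns log P(target), summed over all paths 0 -> ... -> L-1."""
    L = emit_logp.size(0)
    # f[i]: log prob of emitting target[:t+1] with target[t] placed on vertex i
    f = torch.full((L,), float("-inf"))
    f[0] = emit_logp[0, target[0]]
    for t in range(1, target.numel()):
        # hop from every vertex j to a later vertex i, then emit target[t] at i
        f = torch.logsumexp(f.unsqueeze(1) + trans_logp, dim=0) + emit_logp[:, target[t]]
    return f[L - 1]  # valid paths must end on the final vertex

# Tiny usage example with random, properly normalized parameters.
L, V = 6, 10
emit_logp = torch.log_softmax(torch.randn(L, V), dim=-1)
trans_logp = torch.full((L, L), float("-inf"))
for i in range(L - 1):                      # only forward transitions i -> j > i
    trans_logp[i, i + 1:] = torch.log_softmax(torch.randn(L - 1 - i), dim=-1)
target = torch.randint(0, V, (3,))
print(dag_log_prob(emit_logp, trans_logp, target))
```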

@@ -12,8 +12,6 @@



![model](model.png)



This repo is modified from [``fairseq:5175fd``](https://github.com/pytorch/fairseq/tree/5175fd5c267adceec9445bf067597686e159e7e7); please see the [fairseq documentation](https://fairseq.readthedocs.io/en/latest/) for more information.
@@ -35,50 +33,16 @@

## Main Files

Most of the framework's code is from Fairseq. We mainly add the following files.
Most of the framework's code is from Fairseq and DA-Transformer. We mainly add the following file.

### fs_plugins

```
fs_plugins
├── criterions
│ └── nat_dag_loss.py # DA-Transformer loss
├── cub # Requirements: Nvidia CUDA programming model
├── custom_ops # operation implementations and CUDA kernels
│ ├── dag_best_alignment.cu
│ ├── logsoftmax_gather.cu
│ ├── dag_loss.cu
│ ├── dag_loss.py
│ └── dag_loss.cpp
├── models
│ ├── glat_decomposed_with_link.py # A PyTorch implementation of DA-Transformer
│ ├── ls_glat_decomposed_with_link.py # A LightSeq implementation of DA-Transformer
│ └── ls_* # Other files required for LightSeq
├── optimizer
│ └── ls_adam.py # LightSeq Adam
├── scripts
│ ├── test_tradeoff.py # Parameter search script used in BeamSearch
│ ├── average_checkpoints.py # Checkpoint averaging
│ └── convert_ls_to_fairseq.py # Converts a LightSeq model to a Fairseq model
└── tasks
└── translation_lev_modified.py
└── nat_dag_loss_ngram.py # fuzzy alignment loss
```
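
For orientation, `fs_plugins` is wired into fairseq through the standard `--user-dir` plugin mechanism: pointing fairseq at the directory imports the package so the registration decorators in the files above take effect. The stub below is a hypothetical sketch of that registration pattern only; the class and the registered name are invented, and the real criteria, tasks, and models are those listed in the tree.

```python
# Hypothetical registration stub illustrating the fairseq plugin pattern.
# The name "my_dag_criterion" is invented; the actual losses are implemented
# in fs_plugins/criterions/ and are activated via `--user-dir fs_plugins`.
from fairseq.criterions import FairseqCriterion, register_criterion

@register_criterion("my_dag_criterion")
class MyDagCriterion(FairseqCriterion):
    def forward(self, model, sample, reduce=True):
        # A real criterion runs the model on sample["net_input"], computes the
        # DAG-based loss against sample["target"], and returns the triple
        # (loss, sample_size, logging_output) that fairseq expects.
        raise NotImplementedError
```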

### Modified LightSeq for NAT

We include a customized [LightSeq](https://github.com/thu-coai/lightseq-nat/). The changes include:

* A non-autoregressive decoder built on top of the LightSeq autoregressive decoder
* An increased maximum supported length (currently 1024)
* Parameters and model architecture aligned with the Fairseq implementation, plus a script for checkpoint conversion.
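
Conceptually, the checkpoint conversion boils down to remapping parameter names in the saved `state_dict`. The snippet below is only a hypothetical illustration of that idea; the key prefixes are made up and this is not the logic of `fs_plugins/scripts/convert_ls_to_fairseq.py`.

```python
# Hypothetical state_dict key remapping (NOT convert_ls_to_fairseq.py).
import torch

RENAME_RULES = [
    ("decoder.ls_layers.", "decoder.layers."),      # invented prefix pair
    ("decoder.embed.", "decoder.embed_tokens."),    # invented prefix pair
]

def convert(in_path: str, out_path: str) -> None:
    ckpt = torch.load(in_path, map_location="cpu")
    state = ckpt["model"]          # fairseq checkpoints keep weights under "model"
    remapped = {}
    for key, tensor in state.items():
        for old, new in RENAME_RULES:
            if key.startswith(old):
                key = new + key[len(old):]
                break
        remapped[key] = tensor
    ckpt["model"] = remapped
    torch.save(ckpt, out_path)
```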

### BeamSearch on DAG

We include [dag_search](https://github.com/thu-coai/DAG-Search) to implement the BeamSearch algorithm.
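
As a rough picture of what decoding over the DAG involves, the sketch below runs a generic beam search on a toy graph: partial paths are ranked by accumulated transition and token log-probabilities, and only the best few are kept at each step. It is an illustration only, not dag_search's actual algorithm or API.

```python
# Generic beam search over a DAG (illustration only; not the dag_search library).
# emit_logp[v] maps token -> log prob at vertex v; trans_logp[v] maps a later
# vertex -> transition log prob. Paths start at vertex 0 and end at last_vertex.
def dag_beam_search(emit_logp, trans_logp, last_vertex, beam_size=4):
    beam = [(lp, 0, [tok]) for tok, lp in emit_logp[0].items()]
    beam = sorted(beam, key=lambda h: h[0], reverse=True)[:beam_size]
    finished = []
    while beam:
        candidates = []
        for score, v, toks in beam:
            if v == last_vertex:
                finished.append((score, toks))   # path is complete
                continue
            for nxt, t_lp in trans_logp[v].items():
                for tok, e_lp in emit_logp[nxt].items():
                    candidates.append((score + t_lp + e_lp, nxt, toks + [tok]))
        beam = sorted(candidates, key=lambda h: h[0], reverse=True)[:beam_size]
    return max(finished, key=lambda h: h[0]) if finished else None

# Toy 3-vertex DAG: 0 -> 1 -> 2 plus a skip edge 0 -> 2.
emit = {0: {"A": -0.1}, 1: {"B": -0.2, "C": -1.5}, 2: {"</s>": -0.05}}
trans = {0: {1: -0.3, 2: -1.4}, 1: {2: -0.1}}
print(dag_beam_search(emit, trans, last_vertex=2, beam_size=2))
```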

## Data Preprocessing

Please follow the [instructions](https://github.com/facebookresearch/fairseq/tree/main/examples/translation#wmt14-english-to-german-convolutional) in Fairseq to prepare the data.

Training requires binarized data generated by the following script:
