This repository provides the code, data, and scripts for jointly training a vanilla transformer model from scratch in PyTorch. The model learns two tasks simultaneously:
- BIO Slot Tagging (multi-class token classification)
- Core Relation extraction (multi-label sequence classification)
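As a rough illustration of what training these two tasks "jointly" can mean, the sketch below combines a softmax cross-entropy over per-token tag logits (one tag per token) with a sigmoid binary cross-entropy over per-relation logits (each relation is an independent yes/no) into a single scalar loss. It is written in plain Python so it runs without dependencies. In PyTorch the same pieces correspond to `nn.CrossEntropyLoss` and `nn.BCEWithLogitsLoss`. The plain sum and the `rel_weight` parameter are assumptions for illustration, not necessarily how this repository weights the two tasks.

```python
import math
from typing import Sequence


def token_cross_entropy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Mean softmax cross-entropy over tokens (multi-class: exactly one tag per token)."""
    total = 0.0
    for row, gold in zip(logits, targets):
        m = max(row)  # subtract the max before exponentiating for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[gold]  # -log softmax(row)[gold]
    return total / len(targets)


def multilabel_bce(logits: Sequence[float], targets: Sequence[int]) -> float:
    """Mean sigmoid binary cross-entropy (multi-label: each relation scored independently)."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(targets)


def joint_loss(tag_logits, tag_targets, rel_logits, rel_targets, rel_weight: float = 1.0) -> float:
    """Hypothetical joint objective: tagging loss plus a weighted relation loss."""
    return token_cross_entropy(tag_logits, tag_targets) + rel_weight * multilabel_bce(
        rel_logits, rel_targets
    )
```

With all-zero logits the model is maximally uncertain, so each token contributes log(number of tags) and each relation contributes log 2; confident correct predictions drive both terms toward zero.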
The dataset was generated from the film schema of the Freebase knowledge graph. There are two files: data/hw1_train.csv and data/hw1_test.csv. The training CSV file has three columns: utterances, IOB Slot tags, and Core Relations. The test CSV file contains only the utterances. The dataset looks like this:
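A minimal sketch of how the training CSV could be turned into model targets: per-token tag ids for slot tagging and a multi-hot vector for core relations. It assumes, as hypotheticals not confirmed by this README, that the header names are `utterances`, `IOB Slot tags`, and `Core Relations`, and that tokens, tags, and relations are each space-separated within their cells; adjust `load_train_rows` and its column arguments to the actual file.

```python
import csv
from typing import Dict, Iterable, List, Tuple

Row = Tuple[List[str], List[str], List[str]]


def load_train_rows(
    f,
    utt_col: str = "utterances",        # hypothetical header name
    tags_col: str = "IOB Slot tags",    # hypothetical header name
    rel_col: str = "Core Relations",    # hypothetical header name
) -> List[Row]:
    """Read (tokens, tags, relations) triples, assuming space-separated cells."""
    rows: List[Row] = []
    for rec in csv.DictReader(f):
        tokens = rec[utt_col].split()
        tags = rec[tags_col].split()
        if len(tokens) != len(tags):
            # Slot tagging is token classification, so every token needs exactly one tag.
            raise ValueError(f"token/tag length mismatch: {tokens} vs {tags}")
        relations = rec[rel_col].split()  # an empty cell yields no relations
        rows.append((tokens, tags, relations))
    return rows


def build_vocab(items: Iterable[str]) -> Dict[str, int]:
    """Assign ids in sorted order so the mapping is deterministic across runs."""
    return {item: i for i, item in enumerate(sorted(set(items)))}


def encode(rows: List[Row]):
    """Map tags to class ids and relations to a multi-hot vector per utterance."""
    tag2id = build_vocab(t for _, tags, _ in rows for t in tags)
    rel2id = build_vocab(r for _, _, rels in rows for r in rels)
    encoded = []
    for tokens, tags, rels in rows:
        tag_ids = [tag2id[t] for t in tags]
        multi_hot = [0] * len(rel2id)
        for r in rels:
            multi_hot[rel2id[r]] = 1
        encoded.append((tokens, tag_ids, multi_hot))
    return tag2id, rel2id, encoded
```

Usage would look like `with open("data/hw1_train.csv", newline="") as f: tag2id, rel2id, data = encode(load_train_rows(f))`. Since the test file has only utterances, only the tokenization step applies to it.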
Install the required libraries using the following command:
pip install -r requirements.txt
Run the training script using the following command:
./scripts/train.sh
Run the test script using the following command:
./scripts/test.sh