Fixes tokenization of XNLI training file
Alexis Conneau committed Sep 10, 2019
1 parent cd9c7c8 commit e6ecf46
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion get-data-xnli.sh
@@ -46,7 +46,7 @@ echo "*** Preparing English train set ****"
 echo -e "premise\thypo\tlabel" > $XNLI_PATH/en.train
 sed '1d' $OUTPATH/XNLI-MT-1.0/multinli/multinli.train.en.tsv | cut -f1 | python $LOWER_REMOVE_ACCENT > $XNLI_PATH/train.f1
 sed '1d' $OUTPATH/XNLI-MT-1.0/multinli/multinli.train.en.tsv | cut -f2 | python $LOWER_REMOVE_ACCENT > $XNLI_PATH/train.f2
-sed '1d' $OUTPATH/XNLI-MT-1.0/multinli/multinli.train.en.tsv | cut -f3 | sed 's/\tcontradictory/\tcontradiction/g' > $XNLI_PATH/train.f3
+sed '1d' $OUTPATH/XNLI-MT-1.0/multinli/multinli.train.en.tsv | cut -f3 | sed 's/contradictory/contradiction/g' > $XNLI_PATH/train.f3
 paste $XNLI_PATH/train.f1 $XNLI_PATH/train.f2 $XNLI_PATH/train.f3 >> $XNLI_PATH/en.train
 
 rm $XNLI_PATH/train.f1 $XNLI_PATH/train.f2 $XNLI_PATH/train.f3
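
Why the one-line change matters, as far as can be inferred from the diff: `cut -f3` emits only the label column, so no tab precedes the label and the old pattern `\tcontradictory` never matches. A minimal sketch on a made-up row (the sample text and the variable names `row`, `old`, `new` are assumptions for illustration, not taken from the XNLI data):

```shell
# Hypothetical one-row sample in the premise<TAB>hypothesis<TAB>label layout.
row=$(printf 'a premise\ta hypothesis\tcontradictory')

# Old pattern: requires a tab before the label, but cut -f3 has already
# dropped everything up to and including that tab, so nothing is replaced.
old=$(printf '%s\n' "$row" | cut -f3 | sed 's/\tcontradictory/\tcontradiction/g')

# New pattern: matches the bare label text left after cut -f3.
new=$(printf '%s\n' "$row" | cut -f3 | sed 's/contradictory/contradiction/g')

echo "old: $old"
echo "new: $new"
```

With the old pattern the label passes through unchanged; with the new one it is rewritten to `contradiction`.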