
Summary-T-ConvS2S

An implementation of "Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization" (Narayan et al., EMNLP 2018).

XSum Dataset

Follow the steps mentioned in this README for:

  1. Generating the XSum dataset starting from the BBC URLs
  2. Training the LDA model from scratch (a minimal sketch follows this list)
  3. Decoding word-topics and doc-topics with the trained LDA model
  4. Data processing
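
As a rough illustration of steps 2 and 3, the sketch below trains an LDA model and reads off document-topic ("doc-topics") and word-topic ("word-topics") distributions with gensim. This is only a hedged sketch of the general technique, not this repository's pipeline: the tokenisation, number of topics (512 here) and interfaces are assumptions; follow the README above for the actual scripts.

# Minimal LDA sketch with gensim (assumed library and settings, not this repo's scripts).
from gensim import corpora
from gensim.models import LdaModel

# `docs` is a list of tokenised documents, e.g. one token list per BBC article.
docs = [["police", "appeal", "for", "witnesses"],
        ["rail", "strike", "talks", "resume"]]

dictionary = corpora.Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

# num_topics=512 is an assumed setting, not necessarily the value used here.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=512, passes=10)

# "doc-topics": topic distribution for one document.
doc_topics = lda.get_document_topics(corpus[0], minimum_probability=0.0)

# "word-topics": the topic-term matrix, shape (num_topics, vocab_size).
word_topics = lda.get_topics()

print(doc_topics[:5])
print(word_topics.shape)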

Training

Data Preprocessing

Generate the source and target dictionary files. With "--joined-dictionary" the two files are identical. The command operates on the raw-format data.

TEXT = "{path to xsum_data_topic_convs2s dir}"
!python ./XSum-Topic-ConvS2S/preprocess.py --source-lang document \
                                         --target-lang summary \
                                         --trainpref $TEXT/train \
                                         --validpref $TEXT/validation \
                                         --testpref $TEXT/test \
                                         --destdir $TEXT \
                                         --joined-dictionary \
                                         --nwordstgt 50000 \
                                         --nwordssrc 50000 \
                                         --output-format raw
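
To double-check that the joined dictionary was written, a small sanity check like the one below can be run. It assumes the usual fairseq naming (dict.document.txt and dict.summary.txt in the destination directory); adjust the file names if your preprocess.py writes something different.

# Optional sanity check: with --joined-dictionary both dictionary files should be identical.
import os

TEXT = "{path to xsum_data_topic_convs2s dir}"  # same directory as above

with open(os.path.join(TEXT, "dict.document.txt"), encoding="utf-8") as f:
    src_vocab = f.read().splitlines()
with open(os.path.join(TEXT, "dict.summary.txt"), encoding="utf-8") as f:
    tgt_vocab = f.read().splitlines()

print("source vocab size:", len(src_vocab))
print("target vocab size:", len(tgt_vocab))
print("joined dictionaries identical:", src_vocab == tgt_vocab)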

Model Training

The model requires a GPU for training. Run train.py with -h to see the available variants and hyperparameters.

Model variants:

  1. TCONVS2S enc(t',tD) dec(tD)
  2. TCONVS2S enc(t') dec(tD)

save_directory = "./checkpoints-topic-convs2s"
!CUDA_VISIBLE_DEVICES=1 python ./dataset/scripts/XSum-Topic-ConvS2S/train.py $TEXT --source-lang document \
                                                            --target-lang summary \
                                                            --doctopics doc-topics \
                                                            --max-sentences 32 \
                                                            --arch fconv \
                                                            --variant 1 \
                                                            --criterion label_smoothed_cross_entropy \
                                                            --max-epoch 200 \
                                                            --clip-norm 0.1 \
                                                            --lr 0.10 \
                                                            --dropout 0.2 \
                                                            --save-dir {save_directory} \
                                                            --no-progress-bar \
                                                            --log-interval 10

Run with the Pretrained model

Download the pretrained model from the "Pretrained Topic-ConvS2S model and dictionary files (1.2 GB)" link. Make sure that ./xsum-data-topic-convs2s contains the test files to decode, along with the source and target dictionary files.

!python ./XSum-Topic-ConvS2S/generate.py ./xsum-data-topic-convs2s-output --path ../checkpoints-topic-convs2s/checkpoint_last.pt \
                                                                          --batch-size 1 \
                                                                          --beam 10 \
                                                                          --replace-unk \
                                                                          --source-lang document \
                                                                          --target-lang summary \
                                                                          --doctopics doc-topics \
                                                                          --encoder-embed-dim 512 > ./test-output-topic-convs2s-checkpoint-best.pt 

Extract the Hypothesis

To extract the generated summaries (hypotheses) from the generate.py output, run the following:

!python ./extract-hypothesis-fairseq.py -o ./test-output-topic-convs2s-checkpoint-best.pt -f ./final-test-output-topic-convs2s-checkpoint-best.pt
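
If you want to see roughly what this step does, the sketch below follows the standard fairseq generate.py output format, where hypothesis lines are prefixed with H-<id>, a score, and the text. The exact behaviour of extract-hypothesis-fairseq.py may differ; this is only an assumed reimplementation.

# Sketch: pull H-<id> hypothesis lines out of a fairseq generate.py log
# and write them in example-id order, one summary per line.
import argparse

def extract_hypotheses(generate_output, out_file):
    hyps = {}
    with open(generate_output, encoding="utf-8") as f:
        for line in f:
            if line.startswith("H-"):
                # Format assumed: H-<id>\t<score>\t<hypothesis text>
                tag, _score, text = line.rstrip("\n").split("\t", 2)
                hyps[int(tag[2:])] = text
    with open(out_file, "w", encoding="utf-8") as f:
        for idx in sorted(hyps):
            f.write(hyps[idx] + "\n")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-o", dest="generate_output", help="raw generate.py output")
    parser.add_argument("-f", dest="out_file", help="file to write extracted summaries to")
    args = parser.parse_args()
    extract_hypotheses(args.generate_output, args.out_file)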

ROUGE

!python path/eval_rouge.py --summary {system_summary_file} --mod_sum {model_summary_file}

The script takes a text file with the generated summaries and a file with the corresponding gold (model) summaries, and reports precision (P), recall (R) and F1 for rouge-1, rouge-2 and rouge-l.

Sample output:

rouge-1:	P: 30.00	R: 37.50	F1: 33.33
rouge-2:	P: 11.11	R: 14.29	F1: 12.50
rouge-l:	P: 26.15	R: 31.50	F1: 28.58
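
If eval_rouge.py is unavailable, comparable scores can be computed with the rouge package from PyPI. This is an assumed substitute, not the repository's script; the file names below are placeholders and the package's tokenisation may differ slightly from eval_rouge.py.

# Assumed alternative to eval_rouge.py using the `rouge` PyPI package.
from rouge import Rouge

# Placeholder file names: one summary per line, system and gold files aligned.
with open("final-test-output-topic-convs2s-checkpoint-best.pt", encoding="utf-8") as f:
    system_summaries = [line.strip() for line in f]
with open("gold-summaries.txt", encoding="utf-8") as f:
    gold_summaries = [line.strip() for line in f]

scores = Rouge().get_scores(system_summaries, gold_summaries, avg=True)
for metric in ("rouge-1", "rouge-2", "rouge-l"):
    s = scores[metric]
    print(f"{metric}:\tP: {s['p']*100:.2f}\tR: {s['r']*100:.2f}\tF1: {s['f']*100:.2f}")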
