
Code for Fact-level Extractive Summarization with Hierarchical Graph Mask on BERT (coling 2020)


CSgaoan/FactExsum-coling2020


FactExsum-coling2020


The CNN/DailyMail dataset we use is taken directly from the chunked data at https://github.com/JafferWilson/Process-Data-of-CNN-DailyMailv (download the FINISHED FILES). Put the chunked data in /data/DMCNN/...
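The FINISHED FILES are binary chunk files. The sketch below shows one way to split them into records. It assumes they follow the layout written by the widely used CNN/DailyMail preprocessing script: each record is an 8-byte length packed with `struct` format `'q'`, followed by that many bytes of a serialized `tf.train.Example`, and each abstract wraps its sentences in `<s> ... </s>` tags. `iter_chunk_records` and `split_abstract` are hypothetical names and are not the contents of /data/raw_data_loader.py.

```python
import struct
from typing import Iterator, List


def iter_chunk_records(data: bytes) -> Iterator[bytes]:
    """Yield the serialized records stored in one chunk file's contents.

    Assumed layout: <8-byte length, struct 'q'><record bytes>, repeated.
    """
    header = struct.calcsize("q")
    offset = 0
    while offset < len(data):
        if offset + header > len(data):
            raise ValueError("truncated length header")
        (length,) = struct.unpack_from("q", data, offset)
        offset += header
        if offset + length > len(data):
            raise ValueError("truncated record body")
        yield data[offset:offset + length]
        offset += length


def split_abstract(abstract: str) -> List[str]:
    """Split an abstract marked up as '<s> ... </s> <s> ... </s>' into sentences."""
    sentences = []
    rest = abstract
    while "<s>" in rest:
        start = rest.index("<s>") + len("<s>")
        end = rest.index("</s>", start)
        sentences.append(rest[start:end].strip())
        rest = rest[end + len("</s>"):]
    return sentences
```

Each yielded record still has to be decoded, for example with TensorFlow's `tf.train.Example.FromString(record)`, after which the article and abstract text can be read from `features.feature["article"]` and `features.feature["abstract"]`.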

If you are interested in the fact-level CNN/DailyMail dataset described in our paper, you can download it here: https://drive.google.com/file/d/1ma0uuXd5b2EgMUslRIGGF6pVPFHBCIs-/view?usp=sharing.


Description of the files:

/data/DMCNN/...: stores the chunked CNN/DailyMail dataset.

/data/raw_data_loader.py: extracts article-summary pairs from the chunked data.

/data_file/DMCNN/...: stores the pickle files of processed data generated by make_data.py; a few examples are included in the folder. The complete fact-level data can be obtained from the link above.

/model/BERT.py: contains the BERT encoder with the Hierarchical Graph Mask and the classifier for extractive summarization.

/utility/pyrougex.py: evaluates results with ROUGE.

/utility/utility.py: contains helper functions used by make_data.py.

call_rouge.py: evaluates results with ROUGE.

data_loader.py: the data loader for training and testing the model. It converts the data in the pickle files into the input format used by BERT and also constructs the mask matrix.

make_data.py: splits the chunked data into facts and processes the data. The output is a set of pickle files stored in data_file.

run.py: trains and tests the model.
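The actual Hierarchical Graph Mask is defined in /model/BERT.py and data_loader.py. The following is only an illustrative sketch of how a block-structured attention mask can be derived from fact and sentence membership. It assumes a hypothetical two-level scheme (every token sees the tokens of its own fact, and each fact's head token sees the head tokens of the other facts in the same sentence), which is not necessarily the scheme used in the paper. `build_hierarchical_mask` and `to_additive_mask` are hypothetical names.

```python
from typing import List, Sequence


def build_hierarchical_mask(
    fact_ids: Sequence[int],
    is_fact_head: Sequence[bool],
    fact_to_sentence: Sequence[int],
) -> List[List[int]]:
    """Return an n x n 0/1 attention mask (1 = token i may attend to token j).

    Hypothetical two-level scheme, for illustration only:
      * level 1: every token attends to all tokens of its own fact;
      * level 2: each fact's head token (e.g. a per-fact [CLS]) also attends
        to the head tokens of the other facts in the same sentence.
    fact_ids[t] is the fact index of token t; fact_to_sentence[f] is the
    sentence index of fact f.
    """
    n = len(fact_ids)
    if len(is_fact_head) != n:
        raise ValueError("fact_ids and is_fact_head must have the same length")
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            same_fact = fact_ids[i] == fact_ids[j]
            linked_heads = (
                is_fact_head[i]
                and is_fact_head[j]
                and fact_to_sentence[fact_ids[i]] == fact_to_sentence[fact_ids[j]]
            )
            mask[i][j] = 1 if (same_fact or linked_heads) else 0
    return mask


def to_additive_mask(mask: List[List[int]], blocked: float = -10000.0) -> List[List[float]]:
    """Map 1 -> 0.0 (attention allowed) and 0 -> a large negative bias (blocked)."""
    return [[0.0 if m else blocked for m in row] for row in mask]
```

For six tokens with `fact_ids=[0, 0, 1, 1, 2, 2]`, heads at positions 0, 2 and 4, and `fact_to_sentence=[0, 0, 1]`, the head of fact 0 can see its own fact and the head of fact 1, but not fact 2, which lies in another sentence. Adding the additive version to the attention scores before the softmax is a common way BERT-style encoders enforce such a mask.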


The output summaries of our model "our s+f" are in the result folder: "our s+f_cand" corresponds to the standard setting described in our paper, and "our s+f 6_cand" contains the results of extracting 6 facts instead of 4.
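The two result files differ only in how many facts are extracted per article. A minimal sketch of that final selection step, assuming the classifier outputs one score per fact and that the k highest-scoring facts are kept in their original document order; run.py may apply additional selection rules, so `select_top_k` and `build_summary` are hypothetical helpers rather than the repository's code.

```python
from typing import List, Sequence


def select_top_k(scores: Sequence[float], k: int = 4) -> List[int]:
    """Return the indices of the k highest-scoring facts, in document order.

    Ties are broken in favour of the earlier fact.
    """
    if k <= 0:
        return []
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:k])


def build_summary(facts: Sequence[str], scores: Sequence[float], k: int = 4) -> str:
    """Join the selected facts into one summary string (hypothetical helper)."""
    if len(facts) != len(scores):
        raise ValueError("facts and scores must have the same length")
    return " ".join(facts[i] for i in select_top_k(scores, k))
```

Switching between the standard setting and the 6-fact variant then amounts to calling `build_summary(facts, scores, k=4)` versus `k=6`. Restoring document order after ranking keeps the extracted facts readable as a summary.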


Languages

  • Python 100.0%