FactExsum-coling2020

Code for "Fact-level Extractive Summarization with Hierarchical Graph Mask on BERT" (COLING 2020).

The CNN/DailyMail dataset we use comes directly from the chunked data at https://github.com/JafferWilson/Process-Data-of-CNN-DailyMailv (download FINISHED FILES). The chunked data goes in /data/DMCNN/...

If you are interested in the fact-level CNN/DailyMail dataset described in our paper, you can download it here: https://drive.google.com/file/d/1ma0uuXd5b2EgMUslRIGGF6pVPFHBCIs-/view?usp=sharing.

Introduction for the files:

/data/DMCNN/...: stores the chunked CNN/DailyMail dataset.

/data/raw_data_loader.py: extracts article-summary pairs from the chunked data.

/data_file/DMCNN/...: stores the pickle files containing the processed data generated by make_data.py. The folder includes a few examples; the complete fact-level data is available from the link above.

/model/BERT.py: contains the BERT encoder with Hierarchical Graph Mask and the classifier for extractive summarization.

/utility/pyrougex.py: used to evaluate the results with ROUGE.

/utility/utility.py: contains helper functions used in make_data.py.

call_rouge.py: used to evaluate the results with ROUGE.

data_loader.py: data loader for training and testing the model. It converts the data in the pickle files into the input format used by BERT and also constructs the mask matrix.

make_data.py: splits the chunked data to the fact level and processes it. The output is a set of pickle files stored in data_file.

run.py: used to train and test the model.
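
To give a rough idea of what a mask matrix like the one built in data_loader.py might look like, here is a minimal sketch. It is not the code from this repository: the function name build_block_mask, the per-token fact and sentence ids, and the idea of one 0/1 matrix per level are all assumptions made for illustration. See the paper and data_loader.py for the actual construction.

```python
from typing import List


def build_block_mask(group_ids: List[int]) -> List[List[int]]:
    """Return an n x n 0/1 matrix; entry [i][j] is 1 when tokens i and j share a group id.

    Hypothetical helper for illustration only, not part of this repository.
    """
    n = len(group_ids)
    return [
        [1 if group_ids[i] == group_ids[j] else 0 for j in range(n)]
        for i in range(n)
    ]


# Assumed toy input: 5 tokens, the first three in fact 0 and the last two in fact 1,
# with both facts belonging to sentence 0.
fact_ids = [0, 0, 0, 1, 1]
sent_ids = [0, 0, 0, 0, 0]

fact_mask = build_block_mask(fact_ids)  # tokens see only tokens of the same fact
sent_mask = build_block_mask(sent_ids)  # tokens see all tokens of the same sentence

for row in fact_mask:
    print(row)
```

In this sketch, a lower level could use fact_mask and a higher level sent_mask, so that the visible context widens from fact to sentence. How the levels are actually combined in the model is not stated here.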

The output summaries of our model, "our s+f", are in the result folder. our s+f_cand corresponds to the standard setting described in our paper, and our s+f 6_cand contains the results when 6 facts are extracted rather than 4.
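
The difference between the two result sets comes down to how many facts are kept. The following is a minimal sketch of such a top-k selection, assuming the classifier yields one score per fact. The function select_top_facts, the example scores, and the choice to restore document order are hypothetical and not taken from run.py.

```python
from typing import List


def select_top_facts(facts: List[str], scores: List[float], k: int = 4) -> List[str]:
    """Keep the k highest-scoring facts, returned in their original document order.

    Hypothetical helper for illustration only; restoring document order is an assumption.
    """
    if len(facts) != len(scores):
        raise ValueError("facts and scores must have the same length")
    # Rank fact indices by score, highest first, and keep the first k.
    top = sorted(range(len(facts)), key=lambda i: scores[i], reverse=True)[:k]
    # Re-sort the kept indices so the summary follows the article order.
    return [facts[i] for i in sorted(top)]


# Made-up facts and scores, purely for demonstration.
facts = ["f0", "f1", "f2", "f3", "f4", "f5", "f6"]
scores = [0.10, 0.90, 0.50, 0.70, 0.20, 0.60, 0.30]

summary_4 = select_top_facts(facts, scores, k=4)
summary_6 = select_top_facts(facts, scores, k=6)
print(summary_4)
print(summary_6)
```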
