XCFGs

This repository aims to unify all extensions of context-free grammars (XCFGs), where X stands for weighted, (compound) probabilistic, neural, and other extensions.
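As a rough illustration of the "probabilistic" extension, the sketch below scores a sentence under a toy PCFG in Chomsky normal form using the standard inside (CKY-style) algorithm. The grammar, its rule probabilities, and the `inside` function are hypothetical examples, not code from grammar.py. Neural and compound variants would replace the fixed rule table with probabilities produced by a model.

```python
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (not taken from this repo).
# Binary rules: (parent, left child, right child) -> probability
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 1.0,
}
# Lexical rules: (preterminal, word) -> probability
LEXICAL = {
    ("NP", "she"): 0.5,
    ("NP", "fish"): 0.5,
    ("V", "eats"): 1.0,
}


def inside(words, binary, lexical, root="S"):
    """Return P(words) under the grammar, computed with the inside algorithm."""
    n = len(words)
    # chart[i][j][A] = total probability that nonterminal A derives words[i:j]
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    # Width-1 spans come from the lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[i][i + 1][a] += p
    # Wider spans sum over every split point k and every binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = chart[i][k].get(b, 0.0)
                    right = chart[k][j].get(c, 0.0)
                    if left and right:
                        chart[i][j][a] += p * left * right
    return chart[0][n].get(root, 0.0)


print(inside(["she", "eats", "fish"], BINARY, LEXICAL))  # 0.25
```

Summing over split points, rather than taking the maximum, is what makes this the inside probability instead of a Viterbi score.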

Data

The repo handles the WSJ, CTB, and SPMRL treebanks. Have a look at treebank.py.
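These treebanks are typically distributed as Penn-Treebank-style bracketed strings. A minimal reader for such strings might look like the following. `parse_bracketed` and `tagged_yield` are hypothetical helpers, not the interface of treebank.py, and the sketch assumes well-formed input.

```python
import re


def parse_bracketed(s):
    """Parse a bracketed tree string into nested (label, children) tuples.

    Leaves are plain strings. Hypothetical helper, not treebank.py's API.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        # PTB files often wrap each tree in an unlabeled bracket: "( (S ...) )"
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return parse()


def tagged_yield(tree):
    """Return the (word, POS tag) pairs of a tree, left to right."""
    label, children = tree
    out = []
    for child in children:
        if isinstance(child, str):
            out.append((child, label))  # a leaf under its preterminal
        else:
            out.extend(tagged_yield(child))
    return out


tree = parse_bracketed("( (S (NP (PRP She)) (VP (VBZ eats) (NP (NN fish))) (. .)) )")
print(tagged_yield(tree))
```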

If you are looking for the data used in C-PCFGs, follow the instructions in treebank.py and put all outputs in the same folder, say ./data.punct. That script only removes morphology features and creates data splits; to remove punctuation, use clean_tb.py. For example, I ran python clean_tb.py ./data.punct ./data.clean, so all the cleaned treebanks reside in ./data.clean. Then run ./batchify.sh ./data.clean/ and you will have all the data needed to reproduce the results in C-PCFGs. Feel free to change the parameters in batchify.sh if you want to use a different batch size or vocabulary size.
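The cleaning and batching steps can be sketched in plain Python. Everything below is hypothetical: `strip_punct`, `words`, `build_vocab`, and `make_batches` are illustrative names, the punctuation tag set is an assumed PTB-style list (CTB and SPMRL use their own tags), and batching sentences of equal length is one common choice that may differ from what clean_tb.py and batchify.py actually do.

```python
from collections import Counter

# Assumed PTB-style punctuation tags; clean_tb.py may use a different set.
PUNCT_TAGS = {"``", "''", ",", ".", ":", "-LRB-", "-RRB-"}


def strip_punct(tree):
    """Drop punctuation preterminals and any constituent left empty.

    Trees are (label, children) tuples with plain-string leaves.
    Returns None if the whole subtree is removed.
    """
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return None if label in PUNCT_TAGS else tree
    kept = [c for c in (strip_punct(child) for child in children) if c is not None]
    return (label, kept) if kept else None


def words(tree):
    """Return the leaves of a tree, left to right."""
    out = []
    for child in tree[1]:
        out.extend([child] if isinstance(child, str) else words(child))
    return out


def build_vocab(sentences, vocab_size):
    """Keep the vocab_size most frequent words; the rest map to <unk>."""
    counts = Counter(w for sent in sentences for w in sent)
    kept = [w for w, _ in counts.most_common(vocab_size)]
    return {w: i for i, w in enumerate(["<pad>", "<unk>"] + kept)}


def make_batches(sentences, vocab, batch_size):
    """Group equal-length index sequences into batches of at most batch_size."""
    by_len = {}
    for sent in sentences:
        ids = [vocab.get(w, vocab["<unk>"]) for w in sent]
        by_len.setdefault(len(ids), []).append(ids)
    batches = []
    for length in sorted(by_len):
        group = by_len[length]
        for i in range(0, len(group), batch_size):
            batches.append(group[i:i + batch_size])
    return batches


tree = ("S", [("NP", [("PRP", ["She"])]),
              ("VP", [("VBZ", ["eats"]), ("NP", [("NN", ["fish"])])]),
              (".", ["."])])
sents = [words(strip_punct(tree))]
vocab = build_vocab(sents, vocab_size=10000)
print(make_batches(sents, vocab, batch_size=4))
```

Grouping by exact length means no padding is needed inside a batch; the trade-off is that rare lengths produce small batches.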

Citing XCFGs

If you use XCFGs in your research or wish to refer to the results in C-PCFGs, please use the following BibTeX entry.

@article{zhao2020xcfg,
  author = {Zhao, Yanpeng},
  title  = {An Empirical Study of Compound PCFGs},
  journal= {https://github.com/zhaoyanpeng/cpcfg},
  url    = {https://github.com/zhaoyanpeng/cpcfg},
  year   = {2020}
}

Acknowledgements

batchify.py is borrowed from C-PCFGs.

License

MIT