Skip to content

Code for generating synthetic PCFGs for testing grammatical inference algorithms.

License

Notifications You must be signed in to change notification settings

alexc17/syntheticpcfg

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

79 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Code for generating synthetic PCFGs with desirable properties, that can be useful for learning experiments.

The generated PCFGs will be in CNF, and will be consistent (i.e. the sum of the probability of all strings will be 1.)

There are two main types that we generate: the first has a non trivial CFG backbone, and the second has a trivial CFG, 
including all possible productions.
The default settings will give grammars with 10 nonterminals, about 10,000 terminals, a length distribution that is 
zero truncated Poisson with expected length about 5 and a fat tailed lexical distribution (Zipfian) using a lognormal distribution.

python sample_grammar.py /tmp/grammar1.pcfg


python sample_fullgrammar.py /tmp/grammar1.pcfg


python sample_corpus.py /tmp/grammar1.pcfg /tmp/grammar1.samples


python sample_corpus.py --yieldonly --omitprobs /tmp/grammar1.pcfg /tmp/grammar1.samples



About

Code for generating synthetic PCFGs for testing grammatical inference algorithms.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published