
extracting probabilities? #5

Open
regehr opened this issue Nov 21, 2019 · 3 comments

Comments

@regehr

regehr commented Nov 21, 2019

Hello! I'm sorry to abuse github issues in this fashion, but I'm not sure how else to ask for a bit of advice.

I have a simple random program generator that looks like a decision tree. It is not context free, but it is not all that far from being context free. When making a uniform choice among alternatives at every decision point, it produces extremely skewed output. I want to be able to (approximately) uniformly sample the leaves of my decision tree.

I am happy to extract a CFG from my generator (by hand) so that I can run it through gramtropy, but what I want is not for gramtropy to sample from the resulting grammar, but rather to get a list of branch probabilities that I can paste back into my program so that it (approximately) uniformly samples from its leaves.

Is this a feasible thing to ask your tool for? Any help appreciated, thanks.

@regehr

regehr commented Nov 21, 2019

If you wish a bit of additional background, this is the generator I'm working on:
https://github.com/regehr/opt-fuzz
and here's a related blog post:
https://blog.regehr.org/archives/1700

@sipa

sipa commented Nov 21, 2019

Interesting use case!

Depending on how closely you want the output to match the input grammar, that may be easy or hard to do.

The output of gramc is a file that contains a representation of a DAG, where each leaf is a list (disjunction) of strings, and each internal node is either a concatenation or disjunction of 2 or more nodes. Each node also has a count associated with it: the total number of combinations that node (including all its children) can expand to. This representation avoids cycles by specializing each symbol in the input grammar for each length (in number of string characters), so while SYM = "" | (("a" | "b") SYM) is cyclic and has an infinite number of expansions, the specialization of SYM for 7 characters is acyclic, and has a finite number of expansions.
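The count bookkeeping described above follows two simple rules: a disjunction's count is the sum of its children's counts, and a concatenation's count is their product. A minimal sketch of that arithmetic, with node classes that are illustrative rather than gramtropy's actual representation:

```python
from dataclasses import dataclass
from math import prod

# Illustrative node types, not gramtropy's real data structures.
@dataclass
class Leaf:
    """A disjunction of literal strings."""
    strings: list
    def count(self):
        return len(self.strings)

@dataclass
class Disjunction:
    """A choice between 2 or more nodes: counts add."""
    children: list
    def count(self):
        return sum(child.count() for child in self.children)

@dataclass
class Concatenation:
    """A sequence of 2 or more nodes: counts multiply."""
    children: list
    def count(self):
        return prod(child.count() for child in self.children)

def branch_probabilities(disj):
    """For a disjunction, the branch probabilities that make every
    expansion of the node equally likely."""
    total = disj.count()
    return [child.count() / total for child in disj.children]
```

For example, a disjunction between a 2-string leaf and a concatenation expanding to 1 × 2 strings has count 4 and branch probabilities [0.5, 0.5]. Because the length specialization described above makes the DAG acyclic, these counts are finite and can be computed bottom-up in one pass.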

Furthermore, a number of optimizations are applied, where (specialized) symbols with only a small number of possible expansions are just pre-expanded into all possible strings.

Outputting the DAG in a human-readable format with branch probabilities for all disjunctions wouldn't be very hard, but mapping it back to the original CFG would be a lot harder.

@regehr

regehr commented Nov 29, 2019

Interesting, thanks!
Hmm, I definitely want to map back to the original grammar.
Do you mind saying where the method you implemented originated? For example I've been reading this one and some of its predecessors and successors: https://ir.canterbury.ac.nz/handle/10092/11231
