[AIE] Solve it. Work In Progress adding a solver-based postpipeliner #255
The solver tries to find an (II, NStages) software-pipelining (SWP) schedule. The variables represent the stage and modulo cycle in which each instruction should run. From those we get linear expressions for the execution cycle, which are used to generate linear constraints representing the dependencies and their latencies.
Further constraints make sure every instruction is scheduled, and that only a single instance of each slot and resource is used in every modulo cycle.
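As a rough illustration of this formulation, the sketch below enumerates (stage, modulo cycle) pairs by brute force rather than calling a solver. It is not the code in this PR: `find_schedule`, the instruction names and the slot names are all hypothetical, and every slot is assumed to have exactly one instance.

```python
from itertools import product

def find_schedule(instrs, deps, ii, nstages):
    """Brute-force stand-in for the solver (illustration only).

    instrs: dict mapping instruction name -> slot name (hypothetical names)
    deps:   list of (src, dst, latency) dependences within one iteration
    Returns a dict name -> (stage, modulo_cycle), or None if nothing fits.
    """
    names = list(instrs)
    # Every instruction gets exactly one (stage, modulo cycle) pair, so
    # "every instruction is scheduled" holds by construction here.
    choices = list(product(range(nstages), range(ii)))
    for assignment in product(choices, repeat=len(names)):
        sched = dict(zip(names, assignment))
        # Linear expression for the execution cycle.
        cycle = {n: s * ii + m for n, (s, m) in sched.items()}
        # Dependence constraints: cycle(dst) >= cycle(src) + latency.
        if any(cycle[dst] < cycle[src] + lat for src, dst, lat in deps):
            continue
        # Each slot may be used at most once per modulo cycle.
        used = [(instrs[n], m) for n, (_, m) in sched.items()]
        if len(set(used)) != len(used):
            continue
        return sched
    return None

chain = {"load": "LD", "mul": "V", "store": "ST"}
chain_deps = [("load", "mul", 2), ("mul", "store", 2)]
print(find_schedule(chain, chain_deps, ii=2, nstages=3))
```

In this toy case the chain needs five cycles from the load to the store, so it fits in three stages at II=2 but not at II=1. A real solver expresses the same conditions as constraints instead of enumerating assignments.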
In practice, adding more constraints increases the runtime. In particular, adding the constraints for loop-carried dependences and memory bank conflicts to conv2d_bf16-sized kernels has been seen to push solver time to over an hour, which is not acceptable for a compiler pass.
The solution is used to guide a regular postpipeliner strategy, which will reject the solution if it violates constraints. This allows us to solve an incompletely constrained problem and opportunistically apply it if it fits.
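The "solve loosely, verify strictly" idea can be sketched as follows. This is only an assumption about the shape of the check, not the postpipeliner strategy in this PR: `violated` and `apply_or_fall_back` are hypothetical names, and the check covers only dependences, including loop-carried ones that the solver may not have been given.

```python
def violated(sched, ii, deps):
    """Return the dependences that the candidate schedule breaks.

    sched: dict name -> (stage, modulo_cycle), as proposed by the solver
    deps:  list of (src, dst, latency, distance); distance > 0 marks a
           loop-carried dependence, 'distance' iterations later
    """
    cycle = {n: s * ii + m for n, (s, m) in sched.items()}
    # Standard modulo-scheduling condition:
    #   cycle(dst) + distance * II >= cycle(src) + latency
    return [d for d in deps if cycle[d[1]] + d[3] * ii < cycle[d[0]] + d[2]]

def apply_or_fall_back(sched, ii, all_deps, heuristic):
    """Use the solver's schedule only if it survives the full check."""
    if sched is not None and not violated(sched, ii, all_deps):
        return "solver", sched
    return "heuristic", heuristic()

# Hypothetical two-instruction recurrence, scheduled at II=2.
candidate = {"acc": (0, 0), "add": (1, 0)}
intra = [("acc", "add", 2, 0)]
carried = [("add", "acc", 3, 1)]  # not modelled when the candidate was found
print(apply_or_fall_back(candidate, 2, intra + carried, lambda: "fallback"))
```

Here the candidate satisfies the intra-iteration dependence but breaks the loop-carried one, so the sketch falls back. That is the same one-shot behaviour described above: an under-constrained solution is cheap to find and is simply discarded when it does not fit.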
Status
The implementation is fairly stable in that it does not cause crashes or wrong code. There's a crude time control in place for the solver.
The constraints currently aren't tight enough to guarantee a correct solution, but the solution that is found is ultimately checked against all real constraints by a dedicated postpipeliner strategy.
There are three solvers implementing the SWPSolver interface: Z3Binary, Z3Linear and LPFile. Only Z3Binary is mature; Z3Linear's constraints aren't nearly as complete, and runtime seems to be more of a problem. LPFile holds the original prototype implementation, but it only generates an LP file, which still needs to be solved and its results interpreted.
The main gap is that the XM slot is treated as a slot that is independent of X and M. That makes the solutions over-optimistic, and since the solver gets a single try, a rejected solution means we fall back to the less powerful heuristics. Since these slots also feed into SlotCounts, ResMII is over-optimistic as well, but this hurts less: it only adds extra II steps.
There is separate work that tries to alleviate this flaw, by recognizing the implication of XM to X and M.
TODO: create a PR on that branch and link it to this one.
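To make the XM gap concrete, here is a small sketch of how a resource-bound II could come out too low when XM is counted as its own slot. It assumes one instance per slot and that an XM instruction occupies both X and M; `res_mii` and the slot list are hypothetical and do not reflect the actual SlotCounts code.

```python
from collections import Counter

def res_mii(slots, implied=None):
    """Resource-bound II: the busiest slot's usage count (one unit per slot).

    slots:   list with the slot of every instruction in the loop
    implied: optional dict slot -> tuple of slots it really occupies
    """
    implied = implied or {}
    counts = Counter()
    for slot in slots:
        # Without an entry in 'implied', a slot only counts against itself.
        for occupied in implied.get(slot, (slot,)):
            counts[occupied] += 1
    return max(counts.values(), default=0)

loop_slots = ["X", "X", "M", "XM", "XM"]
print(res_mii(loop_slots))                       # XM treated as independent
print(res_mii(loop_slots, {"XM": ("X", "M")}))   # XM implies X and M
```

With XM treated as independent the bound is 2, while counting XM against X and M gives 4. Starting from the lower bound only costs a few extra II attempts, which matches the observation that the over-optimistic ResMII is the less harmful half of the problem.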
Productization also needs to make z3 available in the product distribution. Preferably we link it statically, but we need to arrange for that at build time and hence in CI.