[AIE] Solve it. Work In Progress adding a solver-based postpipeliner #255

Draft
wants to merge 4 commits into
base: aie-public
from


martien-de-jong
Collaborator

@martien-de-jong commented Jan 10, 2025

The solver tries to find a software-pipelined (SWP) schedule with a given initiation interval II and number of stages NStages, i.e. an (II, NStages) schedule. The variables represent the stage and the modulo cycle in which each instruction runs. From these we get a linear expression for each instruction's execution cycle, stage × II + modulo cycle, which is used to generate linear constraints representing the dependencies and their latencies.
Further constraints ensure that every instruction is scheduled and that at most one instance of each slot and resource is used in every modulo cycle.
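To make the formulation concrete, here is a minimal Python sketch that finds an (II, NStages) schedule by brute-force enumeration instead of with Z3. It assumes a toy model in which each instruction occupies exactly one named slot and each dependence carries a latency and an optional iteration distance; the names `Dep`, `exec_cycle` and `find_schedule` are hypothetical and not taken from this PR. Enumeration only scales to a handful of instructions, which is why a real implementation hands the same constraints to a solver.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Dep:
    src: str
    dst: str
    latency: int
    distance: int = 0  # iteration distance; 0 = within one iteration


def exec_cycle(stage: int, mod_cycle: int, ii: int) -> int:
    # The linear expression the dependence constraints are built on.
    return stage * ii + mod_cycle


def find_schedule(
    slots: Dict[str, str], deps: List[Dep], ii: int, nstages: int
) -> Optional[Dict[str, Tuple[int, int]]]:
    """Return instruction -> (stage, modulo cycle), or None if infeasible."""
    names = list(slots)
    choices = [(s, m) for s in range(nstages) for m in range(ii)]
    # Each instruction gets exactly one (stage, modulo cycle), so
    # "every instruction is scheduled" holds by construction.
    for assignment in product(choices, repeat=len(names)):
        sched = dict(zip(names, assignment))
        # At most one instruction per (slot, modulo cycle).
        occupied = [(slots[n], m) for n, (_, m) in sched.items()]
        if len(set(occupied)) != len(occupied):
            continue
        # Dependence: t(dst) + distance * II >= t(src) + latency.
        if all(
            exec_cycle(*sched[d.dst], ii) + d.distance * ii
            >= exec_cycle(*sched[d.src], ii) + d.latency
            for d in deps
        ):
            return sched
    return None
```

For example, with slots `{"ld": "load", "mul": "mac", "add": "alu", "st": "store"}`, dependences `ld → mul` (latency 2) and `mul → st` (latency 1), II = 2 and NStages = 2, the first schedule found places `mul` in stage 1, modulo cycle 0 and `st` in stage 1, modulo cycle 1.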

In practice, every additional class of constraints increases solver runtime. In particular, adding the constraints for loop-carried dependences and for memory bank conflicts has been seen to push solver time past an hour on conv2d_bf16-sized kernels, which is clearly unacceptable.

The solution is then used to guide a regular postpipeliner strategy, which rejects it if it violates any of the real constraints. This allows us to solve an incompletely constrained problem and opportunistically apply the solution when it turns out to fit.
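The accept-or-fall-back flow can be sketched as follows. This is a hypothetical Python outline, not the postpipeliner's actual interface: `solve` stands for the solver working on the relaxed problem, `heuristic` for the existing strategies, and each constraint is a predicate over a schedule that maps instructions to execution cycles.

```python
from typing import Callable, Dict, List, Optional, Tuple

Schedule = Dict[str, int]  # instruction -> execution cycle
Constraint = Callable[[Schedule], bool]


def satisfies(schedule: Schedule, constraints: List[Constraint]) -> bool:
    return all(check(schedule) for check in constraints)


def postpipeline(
    solve: Callable[[], Optional[Schedule]],
    heuristic: Callable[[], Schedule],
    full_constraints: List[Constraint],
) -> Tuple[str, Schedule]:
    # The solver only saw a relaxed problem, so its answer is just a hint.
    hint = solve()
    if hint is not None and satisfies(hint, full_constraints):
        return "solver", hint
    # One-shot: a rejected hint is not repaired; use the heuristics instead.
    return "heuristic", heuristic()
```

With II = 3, an intra-iteration dependence `a → b` of latency 2 and a loop-carried dependence `b → a` of latency 1 and distance 1, a hint `{"a": 0, "b": 3}` satisfies the first constraint but violates the second (0 + 3 < 3 + 1), so it is rejected; `{"a": 0, "b": 2}` satisfies both and is used as is.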

Status
The implementation is fairly stable with respect to crashes and wrong code. There is a crude time control in place for the solver.
The constraints currently aren't tight enough to guarantee a correct solution, but the solution that is found is ultimately checked against all real constraints by a dedicated postpipeliner strategy.

There are three solvers implementing the SWPSolver interface: Z3Binary, Z3Linear and LPFile. Only Z3Binary is mature; Z3Linear's constraints are far less complete, and its runtime seems to be more of a problem. LPFile holds the original prototype implementation; it only generates an LP file, which must then be solved externally and the results interpreted.
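As an illustration of what an LP-file backend might emit, the sketch below writes one possible 0/1 encoding in CPLEX LP format: a binary variable `x_<instr>_<cycle>` per instruction and flattened cycle, with the execution cycle recovered as the weighted sum of those variables. The encoding, the variable naming, the function names and the objective (sum of execution cycles) are all assumptions for illustration; the PR does not say what LPFile actually generates. Loop-carried dependences are omitted for brevity.

```python
from typing import Dict, List, Tuple


def var(instr: str, cycle: int) -> str:
    return f"x_{instr}_{cycle}"


def cycle_terms(instr: str, ncycles: int, sign: str = "+") -> List[str]:
    # Execution cycle of `instr` as a linear sum: sum_c c * x_instr_c.
    return [f"{sign} {c} {var(instr, c)}" for c in range(ncycles)]


def to_lp(slots: Dict[str, str], deps: List[Tuple[str, str, int]],
          ii: int, nstages: int) -> str:
    n = ii * nstages  # flattened cycles 0 .. II*NStages-1
    instrs = list(slots)
    obj = " ".join(t for i in instrs for t in cycle_terms(i, n))
    lines = ["Minimize", f" obj: {obj}", "Subject To"]
    # Every instruction is scheduled exactly once.
    for i in instrs:
        lines.append(f" assign_{i}: " + " + ".join(var(i, c) for c in range(n)) + " = 1")
    # At most one instruction per slot in each modulo cycle m (cycles m, m+II, ...).
    for slot in sorted(set(slots.values())):
        for m in range(ii):
            users = [var(i, c) for i in instrs if slots[i] == slot
                     for c in range(m, n, ii)]
            if len(users) > 1:
                lines.append(f" slot_{slot}_{m}: " + " + ".join(users) + " <= 1")
    # Dependence src -> dst with latency: cycle(dst) - cycle(src) >= latency.
    for k, (src, dst, lat) in enumerate(deps):
        lhs = " ".join(cycle_terms(dst, n) + cycle_terms(src, n, "-"))
        lines.append(f" dep_{k}: {lhs} >= {lat}")
    lines += ["Binary"] + [f" {var(i, c)}" for i in instrs for c in range(n)]
    lines.append("End")
    return "\n".join(lines) + "\n"
```

The resulting text can be written to a `.lp` file and handed to any MILP solver that reads CPLEX LP format; the solution then has to be mapped back from the `x_<instr>_<cycle>` variables to a stage and modulo cycle per instruction.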

The main gap is that the XM slot is modeled as independent of the X and M slots. That makes the solutions over-optimistic, and since the solver gets only one try, a rejected solution means falling back to the less powerful heuristics. Because these slots also feed into SlotCounts, ResMII is over-optimistic as well, but this hurts less: it only adds extra II steps.
Separate work tries to alleviate this flaw by recognizing that XM implies X and M.
TODO: create a PR on that branch and link it to this one.
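The effect of this modeling gap can be illustrated with a small Python sketch. It assumes, as a hypothetical model, that an instruction in the XM slot occupies both the X and M slots, with one unit of each slot per cycle; `IMPLIED`, `units`, `res_mii` and `has_conflict` are illustrative names, not code from this PR or the related branch.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical model: an XM instruction occupies both the X and M slots.
IMPLIED: Dict[str, Set[str]] = {"XM": {"X", "M"}}


def units(slot: str, imply: bool) -> Set[str]:
    """Slots actually used by an instruction issued in `slot`."""
    return IMPLIED.get(slot, {slot}) if imply else {slot}


def res_mii(instr_slots: Iterable[str], imply: bool) -> int:
    # One unit per slot per cycle, so ResMII is the largest per-slot count.
    counts: Counter = Counter()
    for slot in instr_slots:
        counts.update(units(slot, imply))
    return max(counts.values(), default=1)


def has_conflict(placement: Dict[str, Tuple[str, int]], imply: bool) -> bool:
    """placement: instruction -> (slot, modulo cycle)."""
    used = set()
    for slot, mod_cycle in placement.values():
        for unit in units(slot, imply):
            if (unit, mod_cycle) in used:
                return True
            used.add((unit, mod_cycle))
    return False
```

For slots `["X", "X", "M", "XM", "XM"]`, the independent model gives a ResMII of 2, while the implication model gives 4, because X is then used four times. Likewise, an X instruction and an XM instruction in the same modulo cycle are accepted by the independent model but flagged as a conflict by the implication model.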

Productization also requires making Z3 available in the product distribution. Preferably we link it statically, but that has to be arranged at build time, and hence in CI.

Martien de Jong added 4 commits January 22, 2025 10:25
Put in some runtime control: a real one based on timeouts, and a deterministic one that decides up front based on estimated milliseconds.
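The two kinds of runtime control can be sketched in Python as below. The cost model in `estimate_ms` (quadratic in the number of 0/1 placement variables, with an arbitrary constant) is purely hypothetical, as are all names; the PR does not describe its estimator.

```python
import time
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def estimate_ms(num_instrs: int, ii: int, nstages: int,
                ms_per_pair: float = 0.01) -> float:
    # Hypothetical cost model: quadratic in the number of placement variables.
    nvars = num_instrs * ii * nstages
    return ms_per_pair * nvars * nvars


def should_run_solver(num_instrs: int, ii: int, nstages: int,
                      budget_ms: float) -> bool:
    # Deterministic: depends only on the problem size, never on machine load.
    return estimate_ms(num_instrs, ii, nstages) <= budget_ms


def search_with_deadline(
    candidates: Iterable[T],
    accept: Callable[[T], bool],
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[T]:
    # Wall-clock limit: give up as soon as the deadline has passed.
    deadline = clock() + timeout_s
    for candidate in candidates:
        if clock() > deadline:
            return None
        if accept(candidate):
            return candidate
    return None
```

The deterministic check gives the same decision on every machine and run, whereas a wall-clock timeout can cut the search off at different points depending on machine load, which makes compiler output harder to reproduce. With Z3 itself, a wall-clock limit can be set through the solver's `timeout` parameter in milliseconds, for example `s.set("timeout", 1000)` in the Python API.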