topi is the operator collection library for TVM intended at sharing the effort of crafting and optimizing tvm generated kernels. The goal:
- Provide sugars for operator declaration
- Give common primitives for fused op creation.
- Provide commonly used schedules under each architectures
- include C++ library, header only
- python python library
- recipe Recipe collections containing useful operator examples.
- Use numpy-style naming convention for known ops
- Seperate operator declaration from schedule when possible.
- This can be inconvenient but enables more general scheduling across ops.
- We can always recover the tensors from its outputs by traversing the tree.
- Deliberately assert the requirements
- Some kernels have requirements on shape and data layout, assert them
- Data layout aware, if not specified in argument or in function, assume NCHW by default.
- Add testcases to testout the schedule and dataflow in the TOPI workflow
- Only do correctness testing without attaching compiler flags and only run it once.
Since TVM is work in progress, some optimization might not be perfect. One quick way I find useful is to do codegen plus manual modification. The workflow is:
- Generate the GPU kernels, write them into a file, say
perf/matexp_generated.cu
- Copy the generated file into another one, say
perf/matexp_manual.cu
, do modifications according to your intuition. - Set use_manual flag in the script to continue the codegen workflow as normal, but piggy back the manual written code instead.
- Observe the performance difference.
- If the performance improves, mark the manual code and think of optimization pass to generate the desired target code.