Diffusion model for control and planning tutorial

Recap of the diffusion model

A diffusion model learns a score function (closely related to the direction of the added noise) and follows it to recover samples from the data distribution.
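To make the link between the score and the noise direction concrete, the standard relation for a Gaussian forward process can be written as below. The notation ($\bar\alpha_t$ for the cumulative noise schedule, $\epsilon_\theta$ for a noise-prediction network, $s_\theta$ for the learned score) is an assumption borrowed from the usual DDPM-style presentation, not notation taken from this tutorial.

```latex
% Forward (noising) process: the sample at step t is a scaled clean sample plus Gaussian noise
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Score of the noising kernel: it points against the added noise
\nabla_{x_t} \log q(x_t \mid x_0)
  = -\frac{x_t - \sqrt{\bar\alpha_t}\, x_0}{1 - \bar\alpha_t}
  = -\frac{\epsilon}{\sqrt{1 - \bar\alpha_t}}

% Hence a noise-prediction network gives a score estimate up to a known scale
s_\theta(x_t, t) \approx \nabla_{x_t} \log p_t(x_t)
  \approx -\frac{\epsilon_\theta(x_t, t)}{\sqrt{1 - \bar\alpha_t}}
```

Under these assumptions, predicting the noise and predicting the score are the same task up to a time-dependent scale factor.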

Empirical success:

handling multimodal action distributions
being suitable for high-dimensional action spaces
exhibiting impressive training stability

Motivation: why do we need a diffuser?

Overall objective: use a diffuser as a powerful distribution matching tool for control and planning problems.

Where do we need a distribution match?

Imitation learning: match the expert's action distribution. Earlier approaches such as GAIL rely on adversarial training; with a diffusion model, training becomes more stable, especially in the multi-task setting.
Offline reinforcement learning: match the policy's action distribution. The model needs to be expressive enough to match the policy's distribution while not deviating too much from the expert's distribution.
- challenge: extrapolation error
- current solution: penalize or constrain out-of-distribution (OOD) samples, which tends to be overconservative
Model-based reinforcement learning: match the dynamics model (which needs to work over long horizons) and, sometimes, the policy's action distribution.

Why does diffusion work here?

non-autoregressive (no sequential dependency): compounding error is not a problem, yet the model can still generate sequences of any length with a suitable architecture choice
multimodal: can handle multimodal action distributions
matching the distribution: can match the distribution of the expert's actions
high capacity + high expressiveness: can handle high-dimensional action spaces -> foundation models, 50 demonstrations per task

smooth
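The multimodality point can be illustrated with a toy case. The sketch below assumes a hypothetical expert that, in one and the same state, steers left (-1) or right (+1) with equal probability. A deterministic policy fit with mean-squared error collapses to the average of the two modes, which is an action the expert never takes. A model that samples from the matched distribution would not have this failure mode. All names and numbers here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expert data: in the same state, the expert goes left (-1)
# or right (+1) with equal probability, e.g. to pass an obstacle.
expert_modes = np.array([-1.0, 1.0])
expert_actions = rng.choice(expert_modes, size=1000)

# The minimizer of mean-squared error for a single state is the sample mean.
mse_optimal_action = expert_actions.mean()

# How far is that regressed action from the closest action the expert really uses?
gap_to_nearest_mode = np.min(np.abs(mse_optimal_action - expert_modes))

print(f"MSE-optimal action: {mse_optimal_action:+.3f}")
print(f"gap to nearest expert mode: {gap_to_nearest_mode:.3f}")
```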

Practice: how to use a diffuser?

Things to diffuse:

in images: 2D pixel values
in control: 1D control/trajectory sequences
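As a minimal sketch of what "diffusing a trajectory" can look like, the code below reads "1D" as one temporal axis (by analogy with the two spatial axes of an image) and stacks states and actions into a single `(horizon, state_dim + action_dim)` array before noising it. The layout, the helper names `make_trajectory` and `add_noise`, and the scalar `alpha_bar` are assumptions for illustration, not the notebook's actual code.

```python
import numpy as np

def make_trajectory(states, actions):
    """Stack per-step states and actions into one (horizon, state_dim + action_dim) array."""
    return np.concatenate([states, actions], axis=-1)

def add_noise(trajectory, alpha_bar, rng):
    """Forward-noise a clean trajectory: sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = rng.standard_normal(trajectory.shape)
    noisy = np.sqrt(alpha_bar) * trajectory + np.sqrt(1.0 - alpha_bar) * eps
    return noisy, eps

rng = np.random.default_rng(0)
horizon, state_dim, action_dim = 16, 4, 2

# Placeholder data standing in for one recorded rollout.
states = rng.standard_normal((horizon, state_dim))
actions = rng.standard_normal((horizon, action_dim))

x0 = make_trajectory(states, actions)             # clean trajectory
x_t, eps = add_noise(x0, alpha_bar=0.5, rng=rng)  # noised trajectory and the noise used
print(x0.shape, x_t.shape)
```

The time axis plays the role that the spatial axes play for an image: noise is added to every step of the sequence at once, and the denoiser is trained to undo it.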

Architecture:

temporal convolutional network (TCN)
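The basic building block of such a network is a convolution that slides along the time axis only. The sketch below is one hypothetical layer in plain NumPy, not a full TCN (no dilation, residual connections, or timestep embedding) and not the notebook's implementation. It also shows why a convolutional denoiser is not tied to a fixed horizon: the same weights apply to sequences of different lengths.

```python
import numpy as np

def temporal_conv1d(x, weight, bias):
    """Convolve along the time axis only, zero-padded so the horizon is preserved.

    x:      (horizon, in_channels)
    weight: (kernel_size, in_channels, out_channels), with odd kernel_size
    bias:   (out_channels,)
    """
    kernel_size = weight.shape[0]
    pad = kernel_size // 2
    x_padded = np.pad(x, ((pad, pad), (0, 0)))  # pad time, not channels
    horizon = x.shape[0]
    out = np.empty((horizon, weight.shape[2]))
    for t in range(horizon):
        window = x_padded[t:t + kernel_size]    # (kernel_size, in_channels)
        out[t] = np.einsum("kc,kco->o", window, weight) + bias
    return out

rng = np.random.default_rng(0)
in_channels, out_channels, kernel_size = 6, 8, 5
weight = 0.1 * rng.standard_normal((kernel_size, in_channels, out_channels))
bias = np.zeros(out_channels)

# The same layer processes trajectories with different horizons.
short = temporal_conv1d(rng.standard_normal((16, in_channels)), weight, bias)
long = temporal_conv1d(rng.standard_normal((32, in_channels)), weight, bias)
print(short.shape, long.shape)
```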

How to condition it on a given objective?

guidance function: directly shift the distribution using a cost, a learned value function, etc.
inpainting: fill in the missing part of the sample so as to constrain certain parts of it
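Both conditioning mechanisms amount to a small modification of each denoising step. The sketch below assumes a trajectory stored as a `(horizon, dim)` array. The helpers `apply_inpainting` and `guided_mean`, the quadratic cost, and the scale values are hypothetical choices for illustration. The guidance update follows the general classifier-guidance pattern of shifting the denoising mean along a gradient and is not necessarily the exact rule used in the notebook.

```python
import numpy as np

def apply_inpainting(trajectory, known_values, known_mask):
    """Overwrite the constrained entries (e.g. current state, goal) with their known values."""
    return np.where(known_mask, known_values, trajectory)

def guided_mean(mean, cost_grad, guidance_scale, variance):
    """Shift the denoising mean downhill on a cost, scaled by the step variance."""
    return mean - guidance_scale * variance * cost_grad

rng = np.random.default_rng(0)
horizon, dim = 8, 3
x = rng.standard_normal((horizon, dim))  # stands in for an intermediate denoised sample

# Inpainting: pin the first step to the current state and the last step to a goal.
known_values = np.zeros((horizon, dim))
known_values[-1] = 1.0                   # hypothetical goal state
known_mask = np.zeros((horizon, dim), dtype=bool)
known_mask[0] = True
known_mask[-1] = True
x_inpainted = apply_inpainting(x, known_values, known_mask)

# Guidance: hypothetical cost 0.5 * ||x||^2, whose gradient is x itself.
def cost(traj):
    return 0.5 * np.sum(traj ** 2)

x_guided = guided_mean(x, cost_grad=x, guidance_scale=1.0, variance=0.1)
print(cost(x), cost(x_guided))
```

In a full sampler, these two functions would be called inside the reverse-diffusion loop: inpainting after every step so the pinned entries never drift, and guidance on the predicted mean before noise is added.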

jc-bao/diffuser-control-tutorial
