Some quantization methods are aliases (for example, `int8wo` is the commonly used shorthand for `int8_weight_only`).
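As an illustration (assuming, as the alias note suggests, that `TorchAoConfig` accepts both spellings), the two configurations below select the same method:

```python
from diffusers import TorchAoConfig

# `int8wo` is shorthand for `int8_weight_only`; both configure the same
# weight-only int8 quantization.
shorthand_config = TorchAoConfig("int8wo")
full_name_config = TorchAoConfig("int8_weight_only")
```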
Refer to the official [torchao documentation](https://github.com/pytorch/ao) for a better understanding of the available quantization methods and the exhaustive list of configuration options.
## Serializing and Deserializing quantized models
To serialize a quantized model in a given dtype, first load the model with the desired quantization dtype and then save it using the [`~ModelMixin.save_pretrained`] method.
```python
import torch
from diffusers import FluxTransformer2DModel, TorchAoConfig

quantization_config = TorchAoConfig("int8wo")
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quantization_config,
    torch_dtype=torch.bfloat16,
)
# torchao-quantized tensors are not compatible with safetensors serialization,
# so save the checkpoint in the pickle-based format instead.
transformer.save_pretrained("/path/to/flux_int8wo", safe_serialization=False)
```
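To deserialize, the saved checkpoint can be loaded back with the [`~ModelMixin.from_pretrained`] method. A minimal sketch, reusing the illustrative save path from above; since the checkpoint was written with `safe_serialization=False`, pass `use_safetensors=False` when loading:

```python
import torch
from diffusers import FluxTransformer2DModel

# Load the pickled (non-safetensors) quantized checkpoint saved above.
transformer = FluxTransformer2DModel.from_pretrained(
    "/path/to/flux_int8wo",
    torch_dtype=torch.bfloat16,
    use_safetensors=False,
)
```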
Some quantization methods, such as `uint4wo`, cannot be loaded directly and may raise an `UnpicklingError` when loading the model, even though saving works as expected. To work around this, load the state dict manually into the model. Note, however, that this requires passing `weights_only=False` to `torch.load`, so it should only be done if the weights were obtained from a trusted source.
```python
import torch
from accelerate import init_empty_weights
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

# Serialize the model
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=TorchAoConfig("uint4wo"),
    torch_dtype=torch.bfloat16,
)
# A large `max_shard_size` keeps the checkpoint in a single file so it can be
# loaded manually below.
transformer.save_pretrained("/path/to/flux_uint4wo", safe_serialization=False, max_shard_size="50GB")

# Load the model: build an empty model from its config, then assign the
# manually loaded state dict. `weights_only=False` is required here, so only
# use it with weights from a trusted source.
state_dict = torch.load("/path/to/flux_uint4wo/diffusion_pytorch_model.bin", weights_only=False, map_location="cpu")
config = FluxTransformer2DModel.load_config("/path/to/flux_uint4wo")
with init_empty_weights():
    transformer = FluxTransformer2DModel.from_config(config)
transformer.load_state_dict(state_dict, strict=True, assign=True)

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", transformer=transformer, torch_dtype=torch.bfloat16)
```
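From here the pipeline behaves like any other Flux pipeline; a brief usage sketch with an illustrative prompt:

```python
pipe.to("cuda")

image = pipe("A cat holding a sign that says hello world", num_inference_steps=30).images[0]
image.save("output.png")
```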
"Currently, `device_map` is automatically inferred for quantized bitsandbytes models. Support for providing `device_map` as an input will be added in the future."
723
+
"Currently, providing `device_map` is not supported for quantized models. Providing `device_map` as an input will be added in the future."