Add Shap-E (huggingface#3742)
* refactor prior_transformer

adding conversion script

add pipeline

add step_index from pipeline, + remove permute

add zero pad token

remove copy from statement for betas_for_alpha_bar function

* add

* add

* update conversion script for renderer model

* refactor camera a little bit

* clean up

* style

* fix copies

* Update src/diffusers/schedulers/scheduling_heun_discrete.py

Co-authored-by: Patrick von Platen <[email protected]>

* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py

Co-authored-by: Patrick von Platen <[email protected]>

* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py

Co-authored-by: Patrick von Platen <[email protected]>

* alpha_transform_type

* remove step_index argument

* remove get_sigmas_karras

* remove _yiyi_sigma_to_t

* move the rescale prompt_embeds from prior_transformer to pipeline

* replace baddbmm with einsum to match origial repo

* Revert "replace baddbmm with einsum to match origial repo"

This reverts commit 3f6b435.

* add step_index to scale_model_input

* Revert "move the rescale prompt_embeds from prior_transformer to pipeline"

This reverts commit 5b5a8e6.

* move rescale from prior_transformer to pipeline

* correct step_index in scale_model_input

* remove print lines

* refactor prior - reduce arguments

* make style

* add prior_image

* arg embedding_proj_norm -> norm_embedding_proj

* add pre-norm for proj_embedding

* move rescale prompt from pipeline to _encode_prompt

* add img2img pipeline

* style

* copies

* Update src/diffusers/models/prior_transformer.py

Co-authored-by: Patrick von Platen <[email protected]>

* Update src/diffusers/models/prior_transformer.py

Co-authored-by: Patrick von Platen <[email protected]>

* Update src/diffusers/models/prior_transformer.py

Co-authored-by: Patrick von Platen <[email protected]>

* Update src/diffusers/models/prior_transformer.py

add arg: encoder_hid_proj

Co-authored-by: Patrick von Platen <[email protected]>

* Update src/diffusers/models/prior_transformer.py

add new config: norm_in_type

Co-authored-by: Patrick von Platen <[email protected]>

* Update src/diffusers/models/prior_transformer.py

add new config: added_emb_type

Co-authored-by: Patrick von Platen <[email protected]>

* Update src/diffusers/models/prior_transformer.py

rename out_dim -> clip_embed_dim

Co-authored-by: Patrick von Platen <[email protected]>

* Update src/diffusers/models/prior_transformer.py

rename config: out_dim -> clip_embed_dim

Co-authored-by: Patrick von Platen <[email protected]>

* Update src/diffusers/models/prior_transformer.py

Co-authored-by: Patrick von Platen <[email protected]>

* Update src/diffusers/models/prior_transformer.py

Co-authored-by: Patrick von Platen <[email protected]>

* finish refactor prior_tranformer

* make style

* refactor renderer

* fix

* make style

* refactor img2img

* remove params_proj

* add test

* add upcast_softmax to prior_transformer

* enable num_images_per_prompt, add save_gif utility

* add

* add fast test

* make style

* add slow test

* style

* add test for img2img

* refactor

* enable batching

* style

* refactor scheduler

* update test

* style

* attempt to solve batch related tests timeout

* add doc

* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py

Co-authored-by: Patrick von Platen <[email protected]>

* Update src/diffusers/pipelines/shap_e/pipeline_shap_e_img2img.py

Co-authored-by: Patrick von Platen <[email protected]>

* hardcode rendering related config

* update betas_for_alpha_bar on ddpm_scheduler

* fix copies

* fix

* export_to_gif

* style

* second attempt to speed up batching tests

* add doc page to index

* Remove intermediate clipping

* 3rd attempt to speed up batching tests

* Remvoe time index

* simplify scheduler

* Fix more

* Fix more

* fix more

* make style

* fix schedulers

* fix some more tests

* finish

* add one more test

* Apply suggestions from code review

Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
Co-authored-by: Patrick von Platen <[email protected]>

* style

* apply feedbacks

* style

* fix copies

* add one example

* style

* add example for img2img

* fix doc

* fix more doc strings

* size -> frame_size

* style

* update doc

* style

* fix on doc

* update repo name

* improve the usage example in shap-e img2img

* add usage examples in the shap-e docs.

* consolidate examples.

* minor fix.

* update doc

* Apply suggestions from code review

* Apply suggestions from code review

* remove upcast

* Make sure background is white

* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py

* Apply suggestions from code review

* Finish

* Apply suggestions from code review

* Update src/diffusers/pipelines/shap_e/pipeline_shap_e.py

* Make style

---------

Co-authored-by: yiyixuxu <yixu310@gmail,com>
Co-authored-by: Patrick von Platen <[email protected]>
Co-authored-by: Sayak Paul <[email protected]>
Co-authored-by: Pedro Cuenca <[email protected]>
5 people authored Jul 6, 2023
1 parent 7462156 commit 45f6d52
Showing 37 changed files with 3,534 additions and 116 deletions.
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -226,6 +226,8 @@
  title: Self-Attention Guidance
- local: api/pipelines/semantic_stable_diffusion
  title: Semantic Guidance
- local: api/pipelines/shap_e
  title: Shap-E
- local: api/pipelines/spectrogram_diffusion
  title: Spectrogram Diffusion
- sections:
139 changes: 139 additions & 0 deletions docs/source/en/api/pipelines/shap_e.mdx
@@ -0,0 +1,139 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Shap-E

## Overview


The Shap-E model was proposed in [Shap-E: Generating Conditional 3D Implicit Functions](https://arxiv.org/abs/2305.02463) by Alex Nichol and Heewon Jun from [OpenAI](https://github.com/openai).

The abstract of the paper is the following:

*We present Shap-E, a conditional generative model for 3D assets. Unlike recent work on 3D generative models which produce a single output representation, Shap-E directly generates the parameters of implicit functions that can be rendered as both textured meshes and neural radiance fields. We train Shap-E in two stages: first, we train an encoder that deterministically maps 3D assets into the parameters of an implicit function; second, we train a conditional diffusion model on outputs of the encoder. When trained on a large dataset of paired 3D and text data, our resulting models are capable of generating complex and diverse 3D assets in a matter of seconds. When compared to Point-E, an explicit generative model over point clouds, Shap-E converges faster and reaches comparable or better sample quality despite modeling a higher-dimensional, multi-representation output space.*

The original codebase can be found [here](https://github.com/openai/shap-e).

## Available Pipelines:

| Pipeline | Tasks |
|---|---|
| [pipeline_shap_e.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/shap_e/pipeline_shap_e.py) | *Text-to-3D Generation* |
| [pipeline_shap_e_img2img.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/shap_e/pipeline_shap_e_img2img.py) | *Image-to-3D Generation* |

## Available checkpoints

* [`openai/shap-e`](https://huggingface.co/openai/shap-e)
* [`openai/shap-e-img2img`](https://huggingface.co/openai/shap-e-img2img)

## Usage Examples

The examples below show how to use the Shap-E pipelines to create 3D objects and export them as GIFs.

### Text-to-3D image generation

We can use [`ShapEPipeline`] to create a 3D object from a text prompt. In this example, we will make a birthday cupcake for the :firecracker: diffusers library's first birthday. The workflow for the Shap-E pipeline is the same as for the text-to-image pipelines in diffusers.

```python
import torch

from diffusers import DiffusionPipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

repo = "openai/shap-e"
pipe = DiffusionPipeline.from_pretrained(repo, torch_dtype=torch.float16)
pipe = pipe.to(device)

guidance_scale = 15.0
prompt = ["A firecracker", "A birthday cupcake"]

images = pipe(
    prompt,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images
```

The output of [`ShapEPipeline`] is a list of lists of image frames, with one inner list per prompt. Each list of frames can be assembled into an animation of the 3D object. Let's use the `export_to_gif` utility function in diffusers to make a 3D cupcake!

```python
from diffusers.utils import export_to_gif

export_to_gif(images[0], "firecracker_3d.gif")
export_to_gif(images[1], "cake_3d.gif")
```
![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/firecracker_out.gif)
![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/cake_out.gif)
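
Because the pipeline returns one list of frames per prompt, longer prompt lists are easier to export in a loop. The sketch below is a minimal illustration that writes each frame list to an animated GIF with Pillow directly. It is not necessarily how `export_to_gif` works internally. The helper name `save_frames_as_gif`, the 100 ms frame duration, and the solid-color stand-in frames are assumptions, chosen so the snippet runs without downloading a model.

```python
import os
import tempfile

from PIL import Image


def save_frames_as_gif(frames, path, duration_ms=100):
    """Write a list of PIL images to `path` as a looping animated GIF.

    Hypothetical helper for illustration; `duration_ms` is an assumed default.
    """
    first, rest = frames[0], frames[1:]
    first.save(
        path,
        save_all=True,         # write every frame, not just the first
        append_images=rest,    # remaining frames of the animation
        duration=duration_ms,  # display time per frame in milliseconds
        loop=0,                # 0 means loop forever
    )
    return path


# Stand-in for `pipe(...).images`: one list of frames per prompt.
batch = [
    [Image.new("RGB", (64, 64), (50 * i, 0, 0)) for i in range(1, 5)],
    [Image.new("RGB", (64, 64), (0, 50 * i, 0)) for i in range(1, 5)],
]
names = ["firecracker_3d.gif", "cake_3d.gif"]

# Write one GIF per prompt into a temporary directory.
out_dir = tempfile.mkdtemp()
paths = [
    save_frames_as_gif(frames, os.path.join(out_dir, name))
    for frames, name in zip(batch, names)
]
```

With real pipeline output, `batch` would be replaced by `images` from the call above, and `names` by one file name per prompt.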


### Image-to-Image generation

You can use [`ShapEImg2ImgPipeline`] along with other text-to-image pipelines in diffusers to turn your 2D generation into 3D.

In this example, we will first generate a cheeseburger with a simple prompt: "A cheeseburger, white background".

```python
from diffusers import DiffusionPipeline
import torch

pipe_prior = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1-prior", torch_dtype=torch.float16)
pipe_prior.to("cuda")

t2i_pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16)
t2i_pipe.to("cuda")

prompt = "A cheeseburger, white background"

image_embeds, negative_image_embeds = pipe_prior(prompt, guidance_scale=1.0).to_tuple()
image = t2i_pipe(
    prompt,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
).images[0]

image.save("burger.png")
```

![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_in.png)

We will then use the Shap-E image-to-image pipeline to turn it into a 3D cheeseburger :)

```python
import torch
from PIL import Image

from diffusers import DiffusionPipeline
from diffusers.utils import export_to_gif

repo = "openai/shap-e-img2img"
pipe = DiffusionPipeline.from_pretrained(repo, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

guidance_scale = 3.0
image = Image.open("burger.png").resize((256, 256))

images = pipe(
    image,
    guidance_scale=guidance_scale,
    num_inference_steps=64,
    frame_size=256,
).images

gif_path = export_to_gif(images[0], "burger_3d.gif")
```
![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/shap_e/burger_out.gif)
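
The example input above was generated with a white background. If your own input image has a transparent background, one option is to composite it onto white before resizing it for the pipeline. The following is a minimal sketch of that step using Pillow. `flatten_on_white` is a hypothetical helper name, not part of diffusers, and whether this preprocessing helps depends on your input.

```python
from PIL import Image


def flatten_on_white(image, size=(256, 256)):
    """Composite an image onto a white canvas, then resize it (hypothetical helper)."""
    rgba = image.convert("RGBA")
    canvas = Image.new("RGBA", rgba.size, (255, 255, 255, 255))
    # Transparent pixels take the canvas color; opaque pixels are kept as-is.
    flattened = Image.alpha_composite(canvas, rgba).convert("RGB")
    return flattened.resize(size)


# Stand-in input: a fully transparent image with an opaque red square in the middle.
source = Image.new("RGBA", (128, 128), (0, 0, 0, 0))
source.paste((255, 0, 0, 255), (32, 32, 96, 96))

result = flatten_on_white(source)
```

The resulting `result` image could then be passed to the pipeline in place of `Image.open("burger.png").resize((256, 256))`.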

## ShapEPipeline
[[autodoc]] ShapEPipeline
    - all
    - __call__

## ShapEImg2ImgPipeline
[[autodoc]] ShapEImg2ImgPipeline
    - all
    - __call__