From 0c9346be5820adacff269834caffb2573756b5ec Mon Sep 17 00:00:00 2001
From: Jael Gu
Date: Tue, 19 Jul 2022 11:32:25 +0800
Subject: [PATCH] Add README for towhee.models.clip (#1582)

Signed-off-by: Jael Gu
---
 towhee/models/clip/README.md | 63 +++++++++++++++++++++++++++++++++++-
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/towhee/models/clip/README.md b/towhee/models/clip/README.md
index beb6eb315b..4e5bae3b28 100644
--- a/towhee/models/clip/README.md
+++ b/towhee/models/clip/README.md
@@ -1,3 +1,64 @@
 # CLIP
 
-The original codes, weights, and other files for CLIP are from the [official implementation](https://github.com/openai/CLIP).
+CLIP in Towhee is built on top of the [official implementation](https://github.com/openai/CLIP).
+
+Available model names:
+- clip_vit_b16
+- clip_vit_b32 (supports multilingual text)
+- clip_resnet_r50
+- clip_resnet_r101
+
+## Code Example
+
+- Create model
+```python
+from towhee.models import clip
+
+# Create CLIP model with explicit parameters
+model = clip.create_model(
+    embed_dim=512, image_resolution=4,
+    vision_layers=12, vision_width=768, vision_patch_size=2,
+    context_length=77, vocab_size=49408, transformer_width=512,
+    transformer_heads=8, transformer_layers=12
+    )
+
+# Create CLIP model by model name (without pretrained weights)
+model = clip.create_model(model_name='clip_vit_b32', pretrained=False)
+
+# Load pretrained model by model name
+model = clip.create_model(model_name='clip_vit_b32', pretrained=True)
+```
+
+- Encode image
+```python
+import torch
+
+dummy_img = torch.rand(1, 3, 224, 224)
+img_features = model.encode_image(dummy_img)
+```
+
+- Encode text
+```python
+# Tokenized input
+text = torch.randint(high=49408, size=(1, 77), dtype=torch.int32)
+text_features = model.encode_text(text)
+
+# String input
+text = ['test']
+text_features = model.encode_text(text)
+
+# Multilingual input (supported models only)
+text_chinese = ['测试']
+text_features = model.encode_text(text_chinese, multilingual=True)
+```
+
+- Calculate similarities
+```python
+img = torch.rand(1, 3, 224, 224)
+text = ['test']
+logits_per_img, logits_per_text = model(img, text)
+
+# Multilingual input (supported models only)
+text = ['测试']
+logits_per_img, logits_per_text = model(img, text, multilingual=True)
+```
\ No newline at end of file
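
Note on the "Calculate similarities" example: the README does not say how `logits_per_img` and `logits_per_text` are computed. In the official CLIP implementation that this module builds on, both feature sets are L2-normalized, their cosine similarities are multiplied by a learned scale, and the text-side matrix is the transpose of the image-side one. The sketch below reproduces that arithmetic in plain Python on toy vectors. It assumes Towhee follows the same scheme; the helper names and the `logit_scale=100.0` value are illustrative and are not taken from `towhee.models.clip`.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def clip_style_logits(img_features, text_features, logit_scale=100.0):
    """Return (logits_per_img, logits_per_text) as nested lists.

    logit_scale is a placeholder value (assumption); a real model
    would supply its own learned scale.
    """
    imgs = [l2_normalize(v) for v in img_features]
    texts = [l2_normalize(v) for v in text_features]
    # Scaled cosine similarity between every image and every text.
    logits_per_img = [
        [logit_scale * sum(a * b for a, b in zip(i, t)) for t in texts]
        for i in imgs
    ]
    # The text-side logits are the transpose of the image-side logits.
    logits_per_text = [list(col) for col in zip(*logits_per_img)]
    return logits_per_img, logits_per_text


def softmax(row):
    """Turn one row of logits into probabilities (numerically stable)."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Toy stand-ins for encode_image / encode_text outputs: 2 images, 3 texts.
img_features = [[1.0, 0.0], [0.0, 2.0]]
text_features = [[3.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

logits_per_img, logits_per_text = clip_style_logits(img_features, text_features)
probs = [softmax(row) for row in logits_per_img]
best_text_per_img = [row.index(max(row)) for row in probs]
```

Applying a softmax over each row of the image-side logits, as done for `probs`, is the usual way to read the result as "which text best matches this image".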