Ting Pan<sup>1,2*</sup>, Lulu Tang<sup>2*</sup>, Xinlong Wang<sup>2¶</sup>, Shiguang Shan<sup>1</sup>
We present Tokenize Anything via Prompting, a unified and promptable model capable of simultaneously segmenting, recognizing, and captioning arbitrary regions, with flexible visual prompts (point, box, and sketch). The model is trained with exhaustive segmentation masks sourced from SA-1B, coupled with semantic priors from a pre-trained EVA-CLIP model with 5 billion parameters.
- torch
- flash-attn >= 2.3.3 (install the pre-built wheel distribution from URL)
- gradio-image-prompter (for the Gradio app; install from URL)
Clone this repository to local disk and install:

```bash
cd tokenize-anything && pip install .
```

You can also install from the remote repository:

```bash
pip install git+ssh://[email protected]/baaivision/tokenize-anything.git
```
The TAP models can be used for diverse vision and language tasks.
We adopt a modular design that decouples all components and predictors.
As a best practice, implement your custom predictor and asynchronous pipeline as follows:
```python
from tokenize_anything import model_registry

with <distributed_actor>:
    model = model_registry["<model_type>"](checkpoint="<path/to/checkpoint>")
    results = <custom_predictor>(model, *args, **kwargs)

server.collect_results()
```
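For a concrete starting point, here is a minimal single-process sketch of the same pattern, without a distributed actor or results server. The model type and checkpoint path are placeholders (see the model zoo below), and `my_predictor` is a hypothetical user-defined callable, since the predictor implementation is left to you:

```python
from tokenize_anything import model_registry

# "tap_vit_l" and the checkpoint path are placeholders; see the model zoo below.
model = model_registry["tap_vit_l"](checkpoint="<path/to/tap_vit_l_checkpoint>")


def my_predictor(model, image_path):
    """Hypothetical user-defined predictor: prompt the model on one image and
    return masks, concept predictions, and captions for the prompted regions."""
    # Fill in with your own prompting logic, e.g. following the Inference Guide.
    ...


results = my_predictor(model, "<path/to/image.jpg>")
```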
See the built-in examples (web demo and evaluations) provided under `scripts` for more details.
- See the Inference Guide.
- See the Concept Guide.
- See the Evaluation Guide for TAP-L.
- See the Evaluation Guide for TAP-B.
Two versions of the model are available with different image encoders.
| Model | Description | MD5 | Weights |
|---|---|---|---|
| tap_vit_l | ViT-L TAP model | 03f8ec | 🤗 HF link |
| tap_vit_b | ViT-B TAP model | b45cbf | 🤗 HF link |
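If you prefer to fetch a checkpoint programmatically, the sketch below uses the `huggingface_hub` client; the repository id and filename are placeholders to be replaced with the values behind the HF links above:

```python
from huggingface_hub import hf_hub_download

from tokenize_anything import model_registry

# Placeholders: substitute the repo id and filename from the HF links above.
checkpoint = hf_hub_download(repo_id="<hf-repo-id>", filename="<checkpoint-file>")
model = model_registry["tap_vit_l"](checkpoint=checkpoint)
```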
Note: You can generate the concept weights below by following the Concept Guide.
| Concept | Description | Weights |
|---|---|---|
| Merged-2560 | Merged concepts | 🤗 HF link |
| LVIS-1203 | LVIS concepts | 🤗 HF link |
| COCO-80 | COCO concepts | 🤗 HF link |
- We are looking for research interns at the BAAI Vision Team. If you are interested in working with us on Vision Foundation Models (e.g., SAM variants), please contact Xinlong Wang ([email protected]).
```bibtex
@article{pan2023tap,
  title={Tokenize Anything via Prompting},
  author={Pan, Ting and Tang, Lulu and Wang, Xinlong and Shan, Shiguang},
  journal={arXiv preprint arXiv:2312.09128},
  year={2023}
}
```
We thank the following repositories: SAM, EVA, LLaMA, FlashAttention, Gradio, Detectron2, and CodeWithGPU.