GlyphControl: Glyph Conditional Control for Visual Text Generation

This is the official inference code for the paper "GlyphControl: Glyph Conditional Control for Visual Text Generation".

🌟 Highlights

  • We propose a glyph-conditional text-to-image generation model named GlyphControl for visual text generation. It outperforms DeepFloyd IF and Stable Diffusion in OCR accuracy and CLIP score while using more than 3× fewer parameters.

  • We introduce a visual text generation benchmark named LAION-Glyph, built by filtering LAION-2B-en with a modern OCR system to select images with rich visual text content. We conduct experiments on three dataset scales: LAION-Glyph-100K, LAION-Glyph-1M, and LAION-Glyph-10M.

  • We report flexible and customized visual text generation results. We empirically show that users can control the content, locations, and sizes of generated visual text through the interface of glyph instructions (see the illustrative sketch below).
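
To make the idea of glyph instructions concrete, here is a minimal, hypothetical sketch (plain PIL, not the repository's rendering code) of rasterizing a word at a chosen normalized location and size, the kind of glyph image that conditions generation:

```python
# Illustrative sketch only; not the repository's glyph renderer.
from PIL import Image, ImageDraw, ImageFont

def render_glyph(word, top_left=(0.3, 0.4), height_frac=0.15, size=512):
    """Draw `word` in black on a white canvas.

    top_left:    normalized (x, y) of the text's top-left corner.
    height_frac: text height as a fraction of the canvas side.
    """
    canvas = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(canvas)
    # load_default(size=...) needs Pillow >= 10.1; use a .ttf font otherwise.
    font = ImageFont.load_default(size=int(height_frac * size))
    draw.text((top_left[0] * size, top_left[1] * size), word,
              fill="black", font=font)
    return canvas

render_glyph("GlyphControl").save("glyph_condition.png")
```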

💾 Test Benchmark

  • SimpleBench: A simple text prompt benchmark following the Character-aware paper. The prompt format stays fixed: 'A sign that says "<word>".'
  • CreativeBench: A creative text prompt benchmark adapted from GlyphDraw. We adopt the diverse English prompts of the original benchmark and replace the words inside the quotes. For example, a prompt may look like: 'Little panda holding a sign that says "<word>".' or 'A photographer wears a t-shirt with the word "<word>" printed on it.'

(The prompts are listed in the text_prompts folder.)

Following the Character-aware paper, we collect a pool of single-word candidates from Wikipedia. These words are then categorized into four buckets based on their frequencies: $\text{Bucket}_{\text{top}}^{\text{1k}}$, $\text{Bucket}_{\text{1k}}^{\text{10k}}$, $\text{Bucket}_{\text{10k}}^{\text{100k}}$, and $\text{Bucket}_{\text{100k}}^{\text{plus}}$. Each bucket contains words with frequencies in the respective range. To form input prompts, we randomly select 100 words from each bucket and insert them into the above templates. We generate four images for each word during the evaluation process.
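
A minimal sketch of this prompt construction, assuming hypothetical placeholder word lists per bucket (the shipped prompt lists live in the text_prompts folder):

```python
# Hypothetical sketch of the evaluation-prompt construction described above;
# the bucket contents and template list are placeholders, not shipped assets.
import random

buckets = {
    "top_1k":    ["time", "world", "music"],
    "1k_10k":    ["canyon", "lantern", "pixel"],
    "10k_100k":  ["quasar", "obsidian", "fresco"],
    "100k_plus": ["zugzwang", "petrichor", "susurrus"],
}
templates = ['A sign that says "{}".',
             'Little panda holding a sign that says "{}".']

prompts = [random.choice(templates).format(word)
           for words in buckets.values()
           for word in random.sample(words, k=3)]  # paper: 100 per bucket
print(prompts[0])  # e.g., 'A sign that says "world".'
```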

💾 Quantitative Results

Results are reported as SimpleBench / CreativeBench.

| Method | #Params | Training Dataset | $\text{Acc}$ (%) $\uparrow$ | $\hat{\text{Acc}}$ (%) $\uparrow$ | $\text{LD}$ $\downarrow$ | CLIP Score $\uparrow$ |
| --- | --- | --- | --- | --- | --- | --- |
| Stable Diffusion v2.0 | 865M | LAION 1.2B | 0 / 0 | 3 / 2 | 4.25 / 5.01 | 31.6 / 33.8 |
| DeepFloyd (IF-I-M) | 2.1B | LAION 1.2B | 0.3 / 0.1 | 18 / 11 | 2.44 / 3.86 | 32.8 / 34.3 |
| DeepFloyd (IF-I-L) | 2.6B | LAION 1.2B | 0.3 / 0.7 | 26 / 17 | 1.97 / 3.37 | 33.1 / 34.9 |
| DeepFloyd (IF-I-XL) | 6.0B | LAION 1.2B | 0.6 / 1 | 33 / 21 | 1.63 / 3.09 | 33.5 / 35.2 |
| GlyphControl | 1.3B | LAION-Glyph-100K | 30 / 19 | 37 / 24 | 1.77 / 2.58 | 33.7 / 36.2 |
| GlyphControl | 1.3B | LAION-Glyph-1M | 40 / 26 | 45 / 30 | 1.59 / 2.47 | 33.4 / 36.0 |
| GlyphControl | 1.3B | LAION-Glyph-10M | **42** / **28** | **48** / **34** | **1.43** / **2.40** | **33.9** / **36.2** |
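
For context on the metrics, here is an illustrative sketch (not the paper's evaluation pipeline) of computing exact-match OCR accuracy (Acc) and average Levenshtein distance (LD) between OCR outputs and target words:

```python
# Illustrative metric computation; not the paper's evaluation code.
def levenshtein(a: str, b: str) -> int:
    """Edit distance between strings a and b via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

targets = ["hello", "world"]
ocr_out = ["hello", "wor1d"]  # placeholder OCR predictions
acc = sum(t == o for t, o in zip(targets, ocr_out)) / len(targets)
ld = sum(levenshtein(t, o) for t, o in zip(targets, ocr_out)) / len(targets)
print(f"Acc: {acc:.0%}, LD: {ld:.2f}")  # Acc: 50%, LD: 0.50
```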

🛠️ Installation

Clone this repo:

```bash
git clone https://github.com/AIGText/GlyphControl-release.git
cd GlyphControl-release
```

Install the required Python packages:

```bash
conda create -n GlyphControl python=3.9
conda activate GlyphControl
pip install -r requirements.txt
```

Or:

```bash
conda env create -f environment_simple.yaml
conda activate GlyphControl
```

Although you can run our code on a CPU, we recommend using a CUDA device for faster inference. The recommended CUDA version is 11.3.
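
To verify that the environment actually sees a CUDA device, a generic PyTorch check (not part of the repository) can help:

```python
# Generic PyTorch sanity check; not repository code.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```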

💾 Available Checkpoints

Download the checkpoints from our Hugging Face space and put the corresponding checkpoint files into the checkpoints folder.

We provide four types of checkpoints. The relevant information is shown below.

| Checkpoint File | Training Dataset | Training Epochs | $\text{Acc}$ (%) $\uparrow$ | $\hat{\text{Acc}}$ (%) $\uparrow$ | $\text{LD}$ $\downarrow$ | CLIP Score $\uparrow$ |
| --- | --- | --- | --- | --- | --- | --- |
| laion10M_epoch_6_model_wo_ema.ckpt | LAION-Glyph-10M | 6 | **42** / **28** | **48** / **34** | **1.43** / **2.40** | **33.9** / **36.2** |
| textcaps5K_epoch_10_model_wo_ema.ckpt | TextCaps 5K | 10 | 58 / 30 | 64 / 34 | 1.01 / 2.40 | 33.8 / 35.1 |
| textcaps5K_epoch_20_model_wo_ema.ckpt | TextCaps 5K | 20 | 57 / 32 | 66 / 38 | 0.97 / 2.26 | 34.2 / 35.5 |
| textcaps5K_epoch_40_model_wo_ema.ckpt | TextCaps 5K | 40 | **71** / **41** | **77** / **46** | **0.55** / **1.67** | **34.2** / **35.8** |
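
If you prefer scripting the download, a hypothetical helper using huggingface_hub is sketched below; the repo id and file path are placeholders, so adapt them to the space linked above:

```python
# Hypothetical download helper; repo_id and filename are placeholders.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="AIGText/GlyphControl",                 # placeholder space id
    filename="laion10M_epoch_6_model_wo_ema.ckpt",  # placeholder file path
    repo_type="space",
    local_dir="checkpoints",
)
print("Checkpoint saved to:", path)
```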

🧨 Inference

To run the inference code locally, you first need to specify the glyph instructions in the file glyph_instructions.yaml.

Then execute the code like this:

```bash
python inference.py --cfg configs/config.yaml --ckpt checkpoints/laion10M_epoch_6_model_wo_ema.ckpt --save_path generated_images --glyph_instructions glyph_instructions.yaml --prompt <Prompt> --num_samples 4
```

If you do not want to generate visual text, you can remove the --glyph_instructions argument from the command.
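
For batch generation over several prompts, a small hypothetical wrapper around the documented command line (using only the flags shown above) could look like this:

```python
# Hypothetical batch wrapper around inference.py; uses only the
# command-line flags documented above.
import subprocess

prompts = ['A sign that says "Hello".',
           'Little panda holding a sign that says "Diffusion".']

for i, prompt in enumerate(prompts):
    subprocess.run([
        "python", "inference.py",
        "--cfg", "configs/config.yaml",
        "--ckpt", "checkpoints/laion10M_epoch_6_model_wo_ema.ckpt",
        "--save_path", f"generated_images/run_{i}",
        "--glyph_instructions", "glyph_instructions.yaml",
        "--prompt", prompt,
        "--num_samples", "4",
    ], check=True)
```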

🧨 Demo (Recommended)

As an easier way to try our models, you can test them through a demo.

After downloading the checkpoints, execute:

```bash
python app.py
```

You can then generate visual text through a local demo interface.

Or you can directly try our demo in our Hugging Face space GlyphControl.

💌 Acknowledgement

Dataset: We sincerely thank the authors of the open-source large-scale image-text dataset LAION-2B-en and the corresponding aesthetic score prediction code LAION-Aesthetics_Predictor V2. For OCR detection, we thank the open-source tool PP-OCRv3.

Methodology and Demo: Our method builds on the powerful controllable image generation method ControlNet; we thank the authors for their open-source code. Our demo uses the ControlNet demo as a reference.

Comparison Methods in the Paper: Thanks to the open-source diffusion code and demos of DALL-E 2, Stable Diffusion 2.0, Stable Diffusion XL, and DeepFloyd.

✉️ Contact

For help or issues with the GitHub code or the Hugging Face demo of GlyphControl, please email Yukang Yang ([email protected]), Dongnan Gui ([email protected]), or Yuhui Yuan ([email protected]), or submit a GitHub issue.

🌿 Citation

If you find this code useful in your research, please consider citing:

```bibtex
@misc{yang2023glyphcontrol,
      title={GlyphControl: Glyph Conditional Control for Visual Text Generation},
      author={Yukang Yang and Dongnan Gui and Yuhui Yuan and Haisong Ding and Han Hu and Kai Chen},
      year={2023},
      eprint={2305.18259},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
