# HCText2Image

Train a toy autoregressive image generator and a VQ encoder/decoder in Honeycrisp.
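
This two-stage setup is the usual recipe for discrete autoregressive image generation: the VQ encoder maps an image to indices of nearest codebook entries, the transformer models those token sequences conditioned on text, and the VQ decoder turns sampled tokens back into pixels. Below is a minimal, illustrative Swift sketch of the quantization step (plain arrays; the names and shapes are hypothetical, not this repo's Honeycrisp API):

```swift
// Illustrative VQ step: replace each continuous latent vector with the
// index of its nearest codebook entry, so an image becomes a grid of
// discrete tokens the transformer can model autoregressively.
func quantize(latents: [[Float]], codebook: [[Float]]) -> [Int] {
    latents.map { vector -> Int in
        var bestIndex = 0
        var bestDistance = Float.greatestFiniteMagnitude
        for (index, entry) in codebook.enumerated() {
            // Squared Euclidean distance to this codebook entry.
            var distance: Float = 0
            for (a, b) in zip(vector, entry) {
                let diff = a - b
                distance += diff * diff
            }
            if distance < bestDistance {
                bestDistance = distance
                bestIndex = index
            }
        }
        return bestIndex
    }
}
```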

I ran this for a few weeks on my laion-icons dataset. It does learn something: it clearly picks up on colors and basic shapes, as shown by the few cases where the model actually follows the prompt. For most prompts, though, the samples are pretty much garbage.

For a big dump of samples, see this page.

Here are some samples for the following prompts, sweeping guidance scales 1, 2, 4, and 8:

  1. a red heart icon, red heart vector graphic
  2. a green tree, a tree with green leaves
  3. A blue square. A simple blue square icon.
  4. a cute corgi vector graphic. corgi dog graphic

*(Image: samples from the model.)*
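
The guidance scale here is presumably the standard classifier-free guidance knob: evaluate the model with and without the text prompt, then extrapolate the logits toward the conditional prediction. A sketch of that formula (assumed; not taken from this repo's code):

```swift
// Assumed standard classifier-free guidance on next-token logits.
// scale == 1 is the plain conditional prediction; scales 2, 4, and 8
// lean progressively harder on the prompt.
func guidedLogits(cond: [Float], uncond: [Float], scale: Float) -> [Float] {
    zip(uncond, cond).map { pair in
        pair.0 + scale * (pair.1 - pair.0)
    }
}
```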

In my experience, the model fails outright on most complex prompts. I'd expect it to need a lot more compute before it produces anything particularly useful.

## Data and pretrained models

## Using the model yourself

After downloading the above model checkpoints, you can run a local server for generating images.

```
$ swift run -c release HCText2Image server vqmodel_ssim_high.plist transformer_75e-5_d24_bs8.plist <port>
```

This will listen on `http://localhost:<port>`. You can load it in your browser to enter a prompt and sample an image.
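
For example, with both checkpoints in the working directory and port 8080:

```
$ swift run -c release HCText2Image server vqmodel_ssim_high.plist transformer_75e-5_d24_bs8.plist 8080
```

then browse to http://localhost:8080.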
