multimodal

Learning multimodality backwards.

Things to learn
- Image generation: CLIP. Text encoder
- Denoising
- Math: Gaussian distribution (do I need?)

Readings
- https://www.assemblyai.com/blog/minimagen-build-your-own-imagen-text-to-image-model/
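
On the "do I need?" question for the Gaussian: probably yes, at least a little. A minimal sketch, assuming the standard DDPM-style forward process (an assumption, not something these notes pin down), where a noisy sample is x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps with eps drawn from a standard Gaussian. The function names, the linear schedule, and its default values are illustrative choices, and the four-number list stands in for image pixels.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, evenly spaced (default values are assumed)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for all s up to and including t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from x_0 in one shot; returns (x_t, eps)."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]  # standard Gaussian noise
    x_t = [signal_scale * x + noise_scale * e for x, e in zip(x0, eps)]
    return x_t, eps


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)

x0 = [0.5, -0.25, 1.0, 0.0]  # stand-in for image pixels
x_early, eps_early = add_noise(x0, 10, alpha_bars, rng)   # still mostly signal
x_late, eps_late = add_noise(x0, 999, alpha_bars, rng)    # almost pure noise
```

The denoising model is then trained to predict `eps` from `x_t` and `t`, which is why the Gaussian matters: the noise it learns to remove is Gaussian by construction.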