https://arxiv.org/abs/2111.11133
L-Verse: Bidirectional Generation Between Image and Text (Taehoon Kim, Gwangmo Song, Sihaeng Lee, Sangyun Kim, Yewon Seo, Soonyoung Lee, Seung Hwan Kim, Honglak Lee, Kyunghoon Bae)
A model that combines a VQ-VAE with an autoregressive transformer, so the same model can do either image2text or text2image. So Professor Honglak Lee is at LG now?
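To make the "VQ-VAE + autoregressive transformer" combination concrete, here is a minimal sketch of the general recipe, not L-Verse's actual implementation. The VQ step maps each encoder latent to its nearest codebook entry, which turns an image into discrete tokens. Image and text tokens are then laid out in one sequence, and a single next-token objective handles both directions depending on which modality comes first. The codebook values, the `<t2i>`/`<i2t>` direction tokens, the `img_`/`txt_` token names and all helper names are hypothetical illustrations.

```python
from typing import List, Sequence, Tuple

def quantize(latents: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """VQ step: map each latent vector to the index of its nearest codebook entry (squared L2)."""
    ids = []
    for z in latents:
        dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
        ids.append(min(range(len(codebook)), key=dists.__getitem__))
    return ids

# Hypothetical special tokens marking the generation direction (an assumption, not from the paper).
TEXT2IMAGE, IMAGE2TEXT = "<t2i>", "<i2t>"

def build_sequence(text_ids: Sequence[int], image_ids: Sequence[int],
                   direction: str) -> List[str]:
    """Lay out one sequence for a single autoregressive transformer.

    The conditioning modality comes first and the target modality last,
    so the same next-token objective covers both directions."""
    image_tokens = [f"img_{i}" for i in image_ids]
    text_tokens = [f"txt_{t}" for t in text_ids]
    if direction == "t2i":
        return [TEXT2IMAGE] + text_tokens + image_tokens
    if direction == "i2t":
        return [IMAGE2TEXT] + image_tokens + text_tokens
    raise ValueError(f"unknown direction: {direction}")

def next_token_pairs(seq: Sequence[str]) -> List[Tuple[List[str], str]]:
    """Autoregressive targets: predict seq[k + 1] from the prefix seq[:k + 1]."""
    return [(list(seq[: k + 1]), seq[k + 1]) for k in range(len(seq) - 1)]

# Toy usage with a 3-entry, 2-dimensional codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
latents = [[0.9, 1.1], [0.1, -0.1], [0.2, 0.8]]
image_ids = quantize(latents, codebook)           # [1, 0, 2]
seq = build_sequence([7, 3], image_ids, "i2t")    # image tokens first, caption last
pairs = next_token_pairs(seq)
```

Swapping `"i2t"` for `"t2i"` only reorders the segments; the model and the loss stay the same, which is what makes one autoregressive model usable in both directions.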
#multimodal_generation