- Peking University
- Beijing, China
- UTC +08:00
- https://orcid.org/0009-0001-3011-2148
- https://scholar.google.com.hk/citations?user=_knPaYsAAAAJ&hl=zh-CN
Stars
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
【2024 ECAI】First Creating Backgrounds Then Rendering Texts: A New Paradigm for Visual Text Blending
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
(CVPR 2024) Bridging the Gap Between End-to-End and Two-Step Text Spotting.
Official repo for ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
OpenOCR: A general OCR system with accuracy and efficiency. Supporting 24 Scene Text Recognition methods trained from scratch on large-scale real datasets, and will continue to add the latest methods.
Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) calculation tool
Official inference repo for FLUX.1 models
Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch
A general fine-tuning kit geared toward diffusion models.
Official implementation for "GLASS: Global to Local Attention for Scene-Text Spotting" (ECCV'22)
Lumina-T2X is a unified framework for Text to Any Modality Generation
Official implementation of Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
This repository is the implementation of "Don't Forget Me: Accurate Background Recovery for Text Removal via Modeling Local-Global Context".
A novel inpainting framework that can remove objects from images based on the instructions given as text prompts.
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022
Official repository for "PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout" (CVPR 2023).
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
[ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model
The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer