- Measuring CLIP capability
- C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion. ICLR, 2024.
- DSG: Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation. ICLR, 2024.
- Decomposed CLIPScore: Improving Text-to-Image Consistency via Automatic Prompt Optimization. Meta, 2024.
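Most of the metrics above reduce to comparing CLIP image and text embeddings. A minimal sketch of a CLIPScore-style computation, assuming features are already extracted (random vectors stand in for real CLIP embeddings; `w = 2.5` is the usual rescaling constant):

```python
import numpy as np

def clip_score(image_feat: np.ndarray, text_feat: np.ndarray, w: float = 2.5) -> float:
    """CLIPScore-style metric: w * max(cosine(image, text), 0)."""
    image_feat = image_feat / np.linalg.norm(image_feat)
    text_feat = text_feat / np.linalg.norm(text_feat)
    return w * max(float(image_feat @ text_feat), 0.0)

# Toy check: identical features give the maximum score of w.
rng = np.random.default_rng(0)
f = rng.normal(size=512)
print(round(clip_score(f, f), 4))  # -> 2.5
```

Decomposed variants apply the same computation per question or per scene-graph node and aggregate the scores.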
- CLIP Finetuning
- Fine-tuned CLIP Models are Efficient Video Learners. CVPR, 2023. 55.
- Fine-tuning CLIP Text Encoders with Two-step Paraphrasing. EACL, 2024. 0.
- Improving CLIP Fine-tuning Performance. ICCV, 2023. 2.
- π CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet. arXiv, 2022. 17.
- π ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models. NeurIPS, 2022. 92.
- π Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs. CVPR 2024.
- Tip-Adapter: Training-free Adaption of CLIP for Few-shot Classification. ECCV, 2022. 139.
- CLIP-Adapter: Better Vision-Language Models with Feature Adapters. IJCV, 2024. 480.
- π A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models. CVPR, 2024. 2.
- Feature Adaptation with CLIP for Few-shot Classification. ACM, 2023. 0.
- Multimodality helps unimodality: Cross-modal few-shot learning with multimodal models. CVPR, 2023. 53.
- Multimodal Adaptation of CLIP for Few-Shot Action Recognition. CVPR, 2023. 6.
- Not all features matter: Enhancing few-shot clip with adaptive prior refinement. ICCV, 2023.
- π A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation. ICLR, 2024. 0.
- Task Residual for Tuning Vision-Language Models. CVPR, 2023. 32.
- Towards Calibrated Robust Fine-Tuning of Vision-Language Models. NeurIPS_W 2023. 3.
- Robust Cross-Modal Representation Learning with Progressive Self-Distillation. CVPR, 2022. 36.(contrastive learning with noise data)
- CoOp: Learning to Prompt for Vision-Language Models. IJCV. 2022. 1316.
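Several of the adapter papers above (Tip-Adapter in particular) build a training-free cache from few-shot features. A minimal numpy sketch of that cache model, with random features standing in for CLIP outputs and `alpha`/`beta` as the usual blending and sharpness hyperparameters:

```python
import numpy as np

def tip_adapter_logits(test_feat, cache_keys, cache_values, clip_logits,
                       alpha=1.0, beta=5.5):
    """Training-free cache model: blend CLIP zero-shot logits with a
    similarity-weighted vote over few-shot cache entries."""
    affinity = test_feat @ cache_keys.T                   # sims to cached keys
    cache_logits = np.exp(-beta * (1.0 - affinity)) @ cache_values
    return clip_logits + alpha * cache_logits

rng = np.random.default_rng(0)
num_classes, num_shots, dim = 3, 4, 32
keys = rng.normal(size=(num_classes * num_shots, dim))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)      # L2-normalized features
values = np.eye(num_classes).repeat(num_shots, axis=0)   # one-hot labels per shot
test = keys[0]                                           # a cached class-0 feature
logits = tip_adapter_logits(test, keys, values, clip_logits=np.zeros(num_classes))
print(int(logits.argmax()))  # -> 0
```

Tip-Adapter-F additionally fine-tunes the cache keys; the inference path is the same.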
- CLIP pretraining & Analyzing
- Long-CLIP: Unlocking the Long-Text Capability of CLIP. arXiv, Mar 2024.
- DreamLIP: Language-Image Pre-training with Long Captions (project). arXiv, 2024.
- Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies. arXiv, Apr 2024.
- Vision-Language Pre-Training: Basics, Recent Advances, and Future Trends. 122. (survey from MS)
- Interpreting CLIP's Image Representation via Text-Based Decomposition. ICLR 2024.
- SigLIP
- MetaCLIP
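SigLIP's key change to CLIP pretraining is replacing the batch-softmax contrastive loss with independent per-pair sigmoid terms. A sketch of that loss under the usual formulation (random vectors stand in for paired image/text features; `t` and `b` are the learned temperature and bias):

```python
import numpy as np

def siglip_loss(img, txt, t=10.0, b=-10.0):
    """SigLIP-style pairwise sigmoid loss: every image-text pair is an
    independent binary example, positive only on the diagonal."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = t * img @ txt.T + b
    z = 2.0 * np.eye(len(img)) - 1.0       # +1 on diagonal, -1 off-diagonal
    # log(1 + exp(-z * logits)) is the binary logistic loss per pair.
    return float(np.mean(np.log1p(np.exp(-z * logits))))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 16))
# Matched pairs should incur lower loss than deliberately flipped ones.
print(siglip_loss(img, img) < siglip_loss(img, -img))  # -> True
```

Because no batch-wide normalization is needed, the loss scales to very large batches, which is the main practical argument of the paper.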
- CLIP adaptation
- Domain Adaptation via Prompt Learning. arXiv 2022.
- Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval. ICCV, 2023. 5.
- AD-CLIP: Adapting Domains in Prompt Space Using CLIP. ICCV workshop, 2023. 13.
- AutoLabel: CLIP-based framework for Open-set Video Domain Adaptation. CVPR, 2023. 6.
- PromptStyler: Prompt-driven Style Generation for Source-free Domain Generalization. ICCV, 2023. 18.
- POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models. ICML 2023. 18. (SFDA)
- Sus-x: Training-free name-only transfer of vision-language models. ICCV, 2023. 28. (training-free)
- Improving zero-shot generalization and robustness of multi-modal models. CVPR, 2023. 15 (training-free)
- π TPT: Test-time prompt tuning for zero-shot generalization in vision language models. NeurIPS, 2022. 141.
- Robust Multi-Task Learning and Online Refinement for Spacecraft Pose Estimation across Domain Gap. Advances in Space Research. 2022. 34.
- π SwapPrompt: Test-Time Prompt Adaptation for Vision-Language Models. NeurIPS, 2023. 3.
- π DiffTPT: Diverse data augmentation with diffusions for effective test-time prompt tuning. ICCV, 2023. 15.
- BaFTA: Backprop-Free Test-Time Adaptation for Zero-shot Vision Language Models. ICLR 2024 rejected (but good scores)
- π Empowering Unsupervised Domain Adaptation with Large-scale Pre-trained Vision-Language Models. WACV, 2024. 1.
- π TDA: Efficient Test-Time Adaptation of Vision-Language Models. CVPR, 2024.
- π Source-Free Domain Adaptation with Frozen Multimodal Foundation Model CVPR 2024.
- π ReCLIP Refine Contrastive Language Image Pre-Training with Source Free Domain Adaptation. WACV oral, 2024.
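The test-time prompt-tuning line above (TPT, DiffTPT, SwapPrompt) shares one core step: score augmented views of a test image by prediction entropy, keep the confident ones, and minimize the entropy of their averaged distribution. A numpy sketch of the selection step only (the actual methods backpropagate this objective into the prompt, which is omitted here):

```python
import numpy as np

def entropy(p, axis=-1):
    return -(p * np.log(p + 1e-12)).sum(axis=axis)

def select_confident_views(probs: np.ndarray, keep_frac: float = 0.1):
    """Keep the lowest-entropy fraction of augmented views and return
    the entropy of their averaged prediction (the TPT-style objective)."""
    ent = entropy(probs)                      # per-view entropy, shape (num_views,)
    k = max(1, int(len(probs) * keep_frac))
    idx = np.argsort(ent)[:k]                 # most confident views
    avg = probs[idx].mean(axis=0)
    return idx, entropy(avg)

# Toy check: one confident view among uniform ones is the one selected.
probs = np.full((8, 4), 0.25)
probs[3] = [0.97, 0.01, 0.01, 0.01]
idx, obj = select_confident_views(probs, keep_frac=0.125)
print(idx.tolist())  # -> [3]
```

Training-free variants such as TDA replace the backprop step with a cache of these confident predictions.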
- Retrieval-augmented methods in computer vision.
- linkedin post1, linkedin post2
- Retrieval augmented classification for long-tail visual recognition. CVPR, 2022. 67.
- π Improving Image Recognition by Retrieving from Web-Scale Image-Text Data. CVPR, 2023. 10.
- π REACT: Learning Customized Visual Models with Retrieval-Augmented Knowledge. CVPR, 2023. 10.
- π Retrieval-Augmented Multimodal Language Modeling. ICML, 2023. 44.
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback. MS. 2023. 234.
- K-LITE: Learning Transferable Visual Models with External Knowledge. NeurIPS, 2022. 64.
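The retrieval-augmented papers above share a simple recipe: embed the query, retrieve nearest neighbors from an external memory, and fuse the neighbors' labels with the model's own prediction. A minimal sketch with a brute-force cosine index (random unit features stand in for a real encoder and a web-scale memory; `fuse` is a hypothetical mixing weight):

```python
import numpy as np

def retrieval_augmented_logits(query, memory_feats, memory_labels,
                               model_logits, k=3, fuse=0.5):
    """Fuse model logits with a k-NN label vote over an external memory."""
    sims = memory_feats @ query                      # cosine sims (pre-normalized)
    topk = np.argsort(sims)[-k:]                     # indices of k nearest entries
    num_classes = model_logits.shape[0]
    vote = np.bincount(memory_labels[topk], minlength=num_classes).astype(float)
    return (1 - fuse) * model_logits + fuse * vote / k

rng = np.random.default_rng(1)
memory = rng.normal(size=(100, 16))
memory /= np.linalg.norm(memory, axis=1, keepdims=True)
labels = rng.integers(0, 5, size=100)
query = memory[7]                                    # query identical to a memory entry
logits = retrieval_augmented_logits(query, memory, labels, np.zeros(5), k=1)
print(int(logits.argmax()) == int(labels[7]))  # -> True
```

REACT and the web-scale retrieval papers replace the label vote with retrieved image-text pairs used for customization or training, but the retrieval step is the same.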
- SAM + Domain adaptation
- π SAM-DA: UAV Tracks Anything at Night with SAM-Powered Domain Adaptation. arXiv, 2024. 7 (github)
- π SAM4UDASS: When SAM Meets Unsupervised Domain Adaptive Semantic Segmentation in Intelligent Vehicles. arXiv, 2024.
- SAM-guided Unsupervised Domain Adaptation for 3D Segmentation. arXiv(ICLR2024 submitted), 2024.
- Utilizing text-image alignment.
- SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs. NeurIPS 2023 spotlight. 14.
- Using Language to Extend to Unseen Domains. ICLR, 2023. 20.
- StyleGAN-NADA: CLIP-Guided Domain Adaptation of Image Generators. ACM, 2021. 471.
- Diagnosing and Rectifying Vision Models using Language. ICLR, 2023. 27.
- TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation. ICCV, 2023. 2.
- Embedding Arithmetic of Multimodal Queries for Image Retrieval. CVPRW, 2022. 17.
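The embedding-arithmetic paper above retrieves images by moving an image embedding along a text direction (e.g. image of a cat + "dog" - "cat"). A sketch of that query composition under the paper's assumption that CLIP image and text features live in a shared space (random unit vectors stand in for real features):

```python
import numpy as np

def arithmetic_query(img_feat, src_text, tgt_text, scale=1.0):
    """Compose query = image + scale * (target_text - source_text), renormalized."""
    q = img_feat + scale * (tgt_text - src_text)
    return q / np.linalg.norm(q)

def retrieve(query, gallery):
    """Index of the most similar gallery feature (cosine, pre-normalized)."""
    return int(np.argmax(gallery @ query))

rng = np.random.default_rng(0)
gallery = rng.normal(size=(50, 64))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
# Degenerate check: with no text shift, the query retrieves the source image itself.
q = arithmetic_query(gallery[10], np.zeros(64), np.zeros(64))
print(retrieve(q, gallery))  # -> 10
```

TextManiA uses the same idea in the opposite direction, perturbing visual features with text-derived difference vectors as augmentation.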
- Distillation
- NVIDIA-AI-IOT/CLIP-distillation (github)
- CLIP-KD: An Empirical Study of Distilling CLIP Models. CVPR 2024. 3.
- TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance. ICCV 2023. 10.
- CLIPPING: Distilling CLIP-Based Models with a Student Base for Video-Language Retrieval. CVPR 2024. 18.
- Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks. MS, 2022. (CLIP-TD: CLIP Targeted Distillation). 5.
- EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything. CVPR 2024. 15.
- MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation. CVPR, 2023. 141.
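CLIP-KD's "affinity mimicking" matches the student's image-text similarity structure to the teacher's. A numpy sketch of one plausible form of that loss, a KL divergence between teacher and student image-to-text affinity distributions, assuming batch features from both models are already extracted:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def affinity_kd_loss(s_img, s_txt, t_img, t_txt, tau=0.07):
    """KL(teacher || student) over per-image affinity distributions."""
    def affinity(img, txt):
        img = img / np.linalg.norm(img, axis=1, keepdims=True)
        txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
        return softmax(img @ txt.T / tau)
    p_t, p_s = affinity(t_img, t_txt), affinity(s_img, s_txt)
    return float((p_t * (np.log(p_t) - np.log(p_s))).sum(axis=1).mean())

rng = np.random.default_rng(0)
t_img, t_txt = rng.normal(size=(8, 32)), rng.normal(size=(8, 32))
# A student that matches the teacher exactly incurs zero loss.
print(round(affinity_kd_loss(t_img, t_txt, t_img, t_txt), 6))  # -> 0.0
```

TinyCLIP combines a similar affinity term with weight inheritance from the teacher; the paper's exact loss may differ in normalization and direction.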
- Generalize/Distill and Adapt
- Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation. ICCV 2021. 102.
- DiGA: Distil to Generalize and then Adapt for Domain Adaptive Semantic Segmentation. CVPR 2023. 8.
-
junha1125/Awesome-DA-CLIP