
- Dublin City University
- Dublin, Ireland
- https://baohl00.github.io/
- @baohl00
Stars
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
A ComfyUI extension for chatting with your images with LLaVA. Runs locally, no external services, no filter.
Collection of Composed Image Retrieval (CIR) papers.
SEED-Story: Multimodal Long Story Generation with Large Language Model
SEED-Story is a JAX/Flax implementation of a multimodal story generation model based on the paper "SEED-Story: Multimodal Long Story Generation with Large Language Model". This model combines visio…
Implementation of CoCa, Contrastive Captioners are Image-Text Foundation Models, in Pytorch
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation🍓 and hallucination alleviation🍄.
[SIGIR'2024 Best Paper Honorable Mention] Official repository for "LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval"
[ICCV 2023] - Zero-shot Composed Image Retrieval with Textual Inversion
[ICML'24 Oral] "MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions"
🔍 Explore Egocentric Vision: research, data, challenges, real-world apps. Stay updated & contribute to our dynamic repository! Work-in-progress; join us!
TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients.
🔥🔥🔥 Latest Papers, Codes and Datasets on Vid-LLMs.
Finetune Llama 3.3, DeepSeek-R1 & Reasoning LLMs 2x faster with 70% less memory! 🦥
Explore VLM-Eval, a framework for evaluating Video Large Language Models, enhancing your video analysis with cutting-edge AI technology.
The PyTorch implementation of Generative Pre-trained Transformers (GPTs) using Kolmogorov-Arnold Networks (KANs) for language modeling
A Unified Library for Parameter-Efficient and Modular Transfer Learning
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.
Awesome List of Vision Language Prompt Papers
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Official code for the Paper "RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance"
pkunlp-icler / MIC
Forked from HaozheZhao/MIC
MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU