- Ph.D. student at the University of Science and Technology of China (USTC)
- Singapore
- UTC +08:00
- https://scholar.google.com/citations?user=qWOFgUcAAAAJ&hl=zh-CN
Stars
Repository for ShowUI: One Vision-Language-Action Model for GUI Visual Agent
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
A high-quality, one-stop open-source data extraction tool that converts PDF to Markdown and JSON.
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
LlamaIndex is a data framework for your LLM applications
A modular graph-based Retrieval-Augmented Generation (RAG) system
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
The implementation of the paper "Evaluating Coherence in Dialogue Systems using Entailment"
✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding"
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Code examples and resources for DBRX, a large language model developed by Databricks
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Code for the paper “Multimodal Dialogue Systems via Capturing Context-aware Dependencies and Ordinal Information of Semantic Elements”
[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs
[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception
Rough codebase for exploring initialization strategies for new word embeddings in pretrained LMs
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications. It primarily provides two frameworks: task-solving and simulation.