View starsholic's full-sized avatar


Repository for ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Jupyter Notebook 691 39 Updated Dec 24, 2024

[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Python 82 2 Updated Dec 14, 2024

A high-quality, one-stop open-source data extraction tool that converts PDF to Markdown and JSON.

Python 21,855 1,571 Updated Dec 20, 2024

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,324 91 Updated Aug 13, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 3,740 233 Updated Dec 4, 2024

LlamaIndex is a data framework for your LLM applications

Python 37,536 5,387 Updated Dec 24, 2024

A modular graph-based Retrieval-Augmented Generation (RAG) system

Python 20,921 2,051 Updated Dec 24, 2024

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Python 1,063 64 Updated Dec 24, 2024
Python 3,140 275 Updated Oct 16, 2024

Video Search and Streaming Agent 🕵️‍♂️

Python 447 28 Updated Jan 31, 2024

The Memory layer for your AI apps

Python 23,493 2,169 Updated Dec 23, 2024

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

356 19 Updated Dec 21, 2024

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 267 31 Updated Aug 15, 2024

The implementation of the paper "Evaluating Coherence in Dialogue Systems using Entailment"

Python 74 5 Updated Sep 21, 2024

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

424 18 Updated Dec 14, 2024
Python 129 3 Updated Nov 1, 2024

Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**

Jupyter Notebook 145 11 Updated May 24, 2024

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 309 12 Updated Aug 12, 2024

Code examples and resources for DBRX, a large language model developed by Databricks

Python 2,517 238 Updated May 1, 2024

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 2,158 202 Updated Apr 24, 2024
Python 72 4 Updated May 10, 2024

Gemini demo but with GPT-4 Vision API

Python 27 4 Updated Dec 10, 2023
Python 7 2 Updated Mar 26, 2024

Kernel Temporal Segmentation

Python 53 18 Updated Mar 11, 2019

Code for the paper “Multimodal Dialogue Systems via Capturing Context-aware Dependencies and Ordinal Information of Semantic Elements”

3 Updated Jan 18, 2024

[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Python 135 2 Updated Jul 23, 2024

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

Python 498 31 Updated May 8, 2024

Rough codebase for exploring initialization strategies for new word embeddings in pretrained LMs

Python 14 1 Updated Dec 10, 2021

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Python 3,571 243 Updated Mar 5, 2024

🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications, which primarily provides two frameworks: task-solving and simulation

JavaScript 4,314 419 Updated Sep 9, 2024