starsholic

David Wu starsholic

11 followers · 38 following

Ph.D. student at the University of Science and Technology of China (USTC)
Singapore
https://scholar.google.com/citations?user=qWOFgUcAAAAJ&hl=zh-CN

Stars

showlab / ShowUI

Repository for ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Jupyter Notebook 691 39 Updated Dec 24, 2024

showlab / VideoLISA

[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Python 82 2 Updated Dec 14, 2024

opendatalab / MinerU

A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.

Python 21,855 1,571 Updated Dec 20, 2024

QwenLM / Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 1,324 91 Updated Aug 13, 2024

QwenLM / Qwen2-VL

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python 3,740 233 Updated Dec 4, 2024

run-llama / llama_index

LlamaIndex is a data framework for your LLM applications

Python 37,536 5,387 Updated Dec 24, 2024

microsoft / graphrag

A modular graph-based Retrieval-Augmented Generation (RAG) system

Python 20,921 2,051 Updated Dec 24, 2024

VITA-MLLM / VITA

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Python 1,063 64 Updated Dec 24, 2024

LLaVA-VL / LLaVA-NeXT

Python 3,140 275 Updated Oct 16, 2024

video-db / StreamRAG

Video Search and Streaming Agent 🕵️‍♂️

Python 447 28 Updated Jan 31, 2024

mem0ai / mem0

The Memory layer for your AI apps

Python 23,493 2,169 Updated Dec 23, 2024

showlab / Awesome-GUI-Agent

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

356 19 Updated Dec 21, 2024

showlab / videollm-online

VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)

Python 267 31 Updated Aug 15, 2024

nouhadziri / DialogEntailment

The implementation of the paper "Evaluating Coherence in Dialogue Systems using Entailment"

Python 74 5 Updated Sep 21, 2024

BradyFU / Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

424 18 Updated Dec 14, 2024

mutonix / Vript

Python 129 3 Updated Nov 1, 2024

dilab-zju / self-speculative-decoding

Code associated with the paper **Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding**

Jupyter Notebook 145 11 Updated May 24, 2024

pkunlp-icler / FastV

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 309 12 Updated Aug 12, 2024

databricks / dbrx

Code examples and resources for DBRX, a large language model developed by Databricks

Python 2,517 238 Updated May 1, 2024

deepseek-ai / DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 2,158 202 Updated Apr 24, 2024

showlab / cosmo

Python 72 4 Updated May 10, 2024

unconv / gpt4v-gemini

Gemini demo but with GPT-4 Vision API

Python 27 4 Updated Dec 10, 2023

robincourant / FunnyNet

Python 7 2 Updated Mar 26, 2024

TatsuyaShirakawa / KTS

Kernel Temporal Segmentation

Python 53 18 Updated Mar 11, 2019

githwd2016 / PMATE

Code for the paper “Multimodal Dialogue Systems via Capturing Context-aware Dependencies and Ordinal Information of Semantic Elements”

3 Updated Jan 18, 2024

dvlab-research / Prompt-Highlighter

[CVPR 2024] Prompt Highlighter: Interactive Control for Multi-Modal LLMs

Python 135 2 Updated Jul 23, 2024

shenyunhang / APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

Python 498 31 Updated May 8, 2024

john-hewitt / embed-init

Rough codebase for exploring initialization strategies for new word embeddings in pretrained LMs

Python 14 1 Updated Dec 10, 2021

Luodian / Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Python 3,571 243 Updated Mar 5, 2024

OpenBMB / AgentVerse

🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications, which primarily provides two frameworks: task-solving and simulation

JavaScript 4,314 419 Updated Sep 9, 2024
