wzk1015

Starred repositories

Academic Survey Paper Generation.

Python · 42 stars · 3 forks · Updated Feb 24, 2025

Official implementation of Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models

Python · 29 stars · 6 forks · Updated Dec 19, 2024

Long-Term Rhythmic Video Soundtracker, ICML2023

Python · 56 stars · 1 fork · Updated Jul 5, 2024

Solve Visual Understanding with Reinforced VLMs

Python · 3,041 stars · 166 forks · Updated Feb 24, 2025

Python · 71 stars · 8 forks · Updated Jan 24, 2024

Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214)

Jupyter Notebook · 343 stars · 16 forks · Updated Feb 24, 2025

A paper list of recent work on token compression for ViT and VLMs

328 stars · 16 forks · Updated Feb 9, 2025

Mixture-of-Experts for Large Vision-Language Models

Python · 2,092 stars · 132 forks · Updated Dec 3, 2024

[ICLR 2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models

Python · 31 stars · 3 forks · Updated Feb 14, 2025

The code repository of UniMoD

7 stars · Updated Feb 10, 2025

Python · 28 stars · 1 fork · Updated Feb 14, 2025

Video Background Music Generation Using Unpaired Audio-Visual Data

Python · 23 stars · 3 forks · Updated Oct 8, 2024

A Simple Aerial Detection Baseline of Multimodal Language Models.

Python · 49 stars · 2 forks · Updated Feb 18, 2025

A framework to enable multimodal models to operate a computer.

Python · 9,338 stars · 1,264 forks · Updated Feb 3, 2025

Python · 16 stars · Updated Jan 21, 2025

[ACM MM 2021 Best Paper Award] Video Background Music Generation with Controllable Music Transformer

Python · 303 stars · 35 forks · Updated Dec 15, 2024

[arXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

Python · 29 stars · 1 fork · Updated Dec 13, 2024

Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world

1,468 stars · 88 forks · Updated Feb 14, 2025

This is the official repository for M2UGen

Jupyter Notebook · 475 stars · 37 forks · Updated Jan 2, 2025

Accepted-paper lists from several conferences (including AI, ML, and robotics venues)

Python · 1,024 stars · 75 forks · Updated Jan 23, 2025

Official code of paper "DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution"

Python · 70 stars · 6 forks · Updated Feb 14, 2025

This is the repo for the paper "OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use".

203 stars · 9 forks · Updated Feb 18, 2025

A curated list of awesome Multimodal studies.

HTML · 138 stars · 16 forks · Updated Feb 24, 2025

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation: A framework for generating multimodal music by bridging different representations and enhancing generation with RAG.

23 stars · 2 forks · Updated Jan 21, 2025

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting yo…

TypeScript · 71,685 stars · 10,460 forks · Updated Feb 24, 2025

Convert any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/

TypeScript · 7,995 stars · 627 forks · Updated Feb 24, 2025

Python · 386 stars · 35 forks · Updated Nov 22, 2024