Stars
A state-of-the-art open visual language model | Multimodal pretrained model
GLM-4 series: Open-source multilingual, multimodal chat LMs
🎉 Repo for LaWGPT, Chinese-Llama tuned with Chinese legal knowledge. A large language model based on Chinese legal knowledge
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Data manipulation and transformation for audio signal processing, powered by PyTorch
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Source code for the paper "Empowering LLM to use Smartphone for Intelligent Task Automation"
Building Open LLM Web Agents with Self-Evolving Online Curriculum RL
AndroidWorld is an environment and benchmark for autonomous agents
Towards Large Multimodal Models as Visual Foundation Agents
VisionTasker introduces a novel two-stage framework combining vision-based UI understanding and LLM task planning for mobile task automation in a step-by-step manner.