Stars
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
Brings the iOS scrolling experience to Android.
GLM-4 series: Open Multilingual Multimodal Chat LMs
The model, data and code for the visual GUI Agent SeeClick
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Towards Large Multimodal Models as Visual Foundation Agents
Building Open LLM Web Agents with Self-Evolving Online Curriculum RL
A simple screen parsing tool towards a pure-vision-based GUI agent
🔥🔥 btrace (AKA RheaTrace) is a high-performance Android trace tool based on Perfetto. It supports defining custom events automatically while building the APK and uses bhook to provide more n…
VisionTasker introduces a novel two-stage framework combining vision-based UI understanding and LLM task planning for mobile task automation in a step-by-step manner.
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
AndroidWorld is an environment and benchmark for autonomous agents
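🔥 An Android AccessibilityService development framework and Android automation scripting framework for quickly building complex automation tasks, remote assistance, monitoring, and more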
Vreo (short for VR Video) is a Realsee (如视) 3D space script player built on Realsee's 3D rendering engine Five and the user interface library React.
An input component for controlling your app in natural language using an LLM through LangChain.dart
A state-of-the-art open visual language model | multimodal pretrained model
Paper list for Personal LLM Agents
Source code for the paper "Empowering LLM to use Smartphone for Intelligent Task Automation"
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
Modular and customizable Material Design UI components for Android
Data manipulation and transformation for audio signal processing, powered by PyTorch
Real-Time audio processing library written in Dart.
🦜🔗 Build context-aware reasoning applications
Noise is an Android wrapper for kissfft, a FFT implementation written in C.