
Organizations

@bytedance @LianjiaTech @cfug @fluttercandies


Mobile-Agent: The Powerful Mobile Device Operation Assistant Family

Python 3,626 349 Updated Mar 11, 2025

Brings the iOS scrolling experience to Android.

Java 130 5 Updated Sep 16, 2023

GLM-4 series: Open-source multilingual multimodal chat LMs

Python 5,960 513 Updated Mar 7, 2025

The model, data and code for the visual GUI Agent SeeClick

HTML 332 15 Updated Nov 22, 2024

[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.

Python 1,080 65 Updated Mar 11, 2025

A pure-computation, high-performance 2D rigid-body physics engine

TypeScript 76 12 Updated Mar 20, 2022

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Jupyter Notebook 47,880 5,101 Updated Jan 22, 2025

Towards Large Multimodal Models as Visual Foundation Agents

Python 192 6 Updated Feb 5, 2025

Building Open LLM Web Agents with Self-Evolving Online Curriculum RL

Python 322 25 Updated Feb 8, 2025

A simple screen parsing tool towards pure vision based GUI agent

Jupyter Notebook 20,000 1,621 Updated Mar 11, 2025

🔥🔥 btrace (AKA RheaTrace) is a high-performance Android trace tool based on Perfetto. It supports defining custom events automatically while building the APK and uses bhook to provide more n…

Kotlin 1,997 281 Updated Sep 18, 2023

VisionTasker introduces a novel two-stage framework combining vision-based UI understanding and LLM task planning for mobile task automation in a step-by-step manner.

Python 62 9 Updated Feb 17, 2025

💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.

544 31 Updated Mar 10, 2025

AndroidWorld is an environment and benchmark for autonomous agents

Python 231 26 Updated Mar 6, 2025

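🔥 A development framework for the Android AccessibilityService and an Android automation scripting framework, for quickly building complex automation tasks, remote assistance, monitoring, and more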

Kotlin 378 112 Updated Mar 10, 2025

Vreo (short for VR Video) is Realsee's 3D space script player, built on Realsee's 3D rendering engine Five and the UI library React.

TypeScript 33 8 Updated May 9, 2024
HTML 4 2 Updated Apr 9, 2024

An Android technical middle platform: may we all live long, and may the grunt work be no more

Java 6,572 1,374 Updated Sep 10, 2022

An input component for controlling your app in natural language using an LLM through LangChain.dart

Dart 12 4 Updated Nov 1, 2024

A state-of-the-art open visual language model (multimodal pretrained model)

Python 6,411 427 Updated May 29, 2024

Paper list for Personal LLM Agents

374 18 Updated May 8, 2024

Source code for the paper "Empowering LLM to use Smartphone for Intelligent Task Automation"

Python 327 49 Updated Mar 22, 2024

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Jupyter Notebook 5,086 675 Updated Aug 5, 2024

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.

Python 5,588 616 Updated Mar 6, 2025

Modular and customizable Material Design UI components for Android

Java 16,608 3,115 Updated Mar 11, 2025

Data manipulation and transformation for audio signal processing, powered by PyTorch

Python 2,620 684 Updated Mar 11, 2025

Real-Time audio processing library written in Dart.

C 108 12 Updated Jul 18, 2024

🦜🔗 Build context-aware reasoning applications

Jupyter Notebook 102,958 16,674 Updated Mar 11, 2025

Noise is an Android wrapper for kissfft, an FFT implementation written in C.

Java 328 41 Updated Nov 8, 2019