LLM Control Library for iOS and Android
Have you ever wanted to test your mobile app or control iOS and Android devices with an LLM? You have probably run into context problems: the accessibility view is too long to fit, and sending only a screenshot to the LLM gives limited accuracy.
(Example of using cognisim to control an Android device in the Arc'teryx app)
Our Solution
We combine the accessibility tree with Set-of-Mark prompting to provide a readable state for the LLM.
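For intuition, here is a minimal sketch, assuming a simplified accessibility node structure, of how an accessibility tree can be flattened into the compact, id-tagged representation the LLM reads. This illustrates the idea only; it is not cognisim's actual parser.

```python
from dataclasses import dataclass

# Hypothetical, simplified accessibility node; a real tree has many more fields.
@dataclass
class Node:
    role: str          # e.g. "button", "input", "p"
    text: str          # visible text or content description
    resource_id: str   # platform resource id, rendered as the class attribute

def encode_ui(nodes: list[Node]) -> str:
    """Flatten nodes into an HTML-like string with numeric ids."""
    lines = ["<html>"]
    for i, node in enumerate(nodes):
        cls = f' class="{node.resource_id}"' if node.resource_id else ""
        lines.append(f"<{node.role} id={i}{cls}>{node.text or 'None'}</{node.role}>")
    lines.append("</html>")
    return "\n".join(lines)

# The same numeric ids are drawn onto the screenshot as Set-of-Mark labels,
# so the text the LLM reads and the image it sees refer to the same elements.
print(encode_ui([Node("button", "Open the home page", "home_button")]))
```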
Real-World Application
At Revyl, we use this approach to test mobile apps with LLMs. Our platform integrates resilient end-to-end tests using agentic LLMs with open telemetry tracing, offering proactive observability into your mobile app.
If you are interested in putting your testing on autopilot and catching bugs before your users do, check out Revyl.
Prerequisites
- Android Virtual Device (for Android adaptation)
- iOS Simulator and Xcode (for iOS adaptation - coming soon)
- macOS or Linux (recommended)
Create an iOS simulator or Android emulator and make sure you have Appium installed.
For macOS, install Appium using Homebrew:
brew install appium
For all other operating systems, install Appium using npm:
npm i -g appium
To install the cognisim package:
poetry add cognisim
or if you have pip installed:
pip install cognisim
For detailed instructions on getting started with Mobileadapt, please refer to our Quickstart Guide.
import asyncio
from cognisim import mobileadapt

async def main():
    # Initialize and start Android device
    android_device = mobileadapt(platform="android")
    await android_device.start_device()

    # Get initial state and perform tap
    _, _, _ = await android_device.get_state()
    await android_device.tap(100, 100)

    # Get state after tap
    new_encoded_ui, _, _ = await android_device.get_state()
    print("State after tap:", new_encoded_ui)

if __name__ == "__main__":
    asyncio.run(main())
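Building on the example above, here is a hedged sketch of a single LLM-driven step: the encoded UI goes into a prompt, and the model's reply is parsed into a tap. The `call_llm` helper and the `TAP x y` reply format are assumptions made for illustration, not part of cognisim's API.

```python
import asyncio
import re

from cognisim import mobileadapt

def call_llm(prompt: str) -> str:
    # Placeholder for your model client (OpenAI, Anthropic, a local model, ...).
    # Here we simply return a canned reply in the assumed "TAP x y" format.
    return "TAP 100 100"

async def llm_step():
    device = mobileadapt(platform="android")
    await device.start_device()

    # The encoded UI string is what the LLM reads; the other return values
    # of get_state() are ignored in this sketch.
    encoded_ui, _, _ = await device.get_state()
    prompt = (
        "You are controlling an Android device. Current UI:\n"
        f"{encoded_ui}\n"
        "Reply with a single action of the form 'TAP x y'."
    )
    reply = call_llm(prompt)

    # Parse the reply and execute it on the device.
    match = re.match(r"TAP (\d+) (\d+)", reply)
    if match:
        await device.tap(int(match.group(1)), int(match.group(2)))

if __name__ == "__main__":
    asyncio.run(llm_step())
```

The same loop works for iOS by constructing the device with platform="ios", as in the next example.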
import asyncio
from cognisim import mobileadapt

async def main():
    # Initialize and start iOS device
    ios_device = mobileadapt(platform="ios")
    await ios_device.start_device()

    # Get device state
    encoded_ui, _, _ = await ios_device.get_state()
    print("Current state:", encoded_ui)

if __name__ == "__main__":
    asyncio.run(main())
Go to Documentation or the cookbook folder for more examples and usage.
For full documentation, visit mobileadapt.revyl.ai.
Features
- Android Support: Works seamlessly with Android devices and emulators.
- iOS Support: Coming soon!
- Appium Integration: Leverages the power of Appium for reliable mobile automation.
- LLM Agent Compatibility: Designed to work seamlessly with language model agents.
- Clone the repository:
  git clone https://github.com/RevylAI/Mobileadapt/ && cd mobileadapt/deploy
- Start the server:
  ./scripts/setup.sh
- iOS Support
- Abstraction over drivers other than Appium
- Recording interactions
- Screen sharing via websocket to host recording
We welcome contributions to the Mobileadapt project! If you'd like to contribute, please check our Contribution Guidelines.
Mobileadapt is released under the MIT License. See the LICENSE file for more details.
@inproceedings{shvoEtAl2021appbuddy,
  title = {AppBuddy: Learning to Accomplish Tasks in Mobile Apps via Reinforcement Learning},
  author = {Maayan Shvo and Zhiming Hu and Rodrigo Toro Icarte and Iqbal Mohomed and Allan D. Jepson and Sheila A. McIlraith},
  booktitle = {Canadian Conference on Artificial Intelligence},
  year = {2021}
}
@misc{google-research,
  title = {Google Research},
  author = {Google},
  year = {2021},
  howpublished = {\url{https://github.com/Berrylcm/google-research}}
}
We use Appium under the hood to control the device and collect the UI. We then use a custom UI parser to convert the UI to a string that can be used by the LLM.
The UI is parsed with a UI parser, a Set-of-Mark image is generated from the screenshot, and both are sent to the LLM. For example, the parsed UI might look like this:
<html>
<button id=0">None</button>
<button id=1 class="home_button">Open the home page</button>
<button id=2 class="optional_toolbar_button">New tab</button>
<button id=3 class="tab_switcher_button">Switch or close tabs</button>
<button id=4 class="menu_button">Customize and control Google Chrome</button>
<input id=5 class="url_bar">revyl.ai</input>
<img id=6 class="location_bar_status_icon" alt="Connection is secure" />
<p id=7">None</p>
<img id=8 class="toolbar_hairline" alt="None" />
<button id=9">Dismiss banner</button>
<p id=10">Revyl is in private beta →</p>
<p id=11">None</p>
<button id=12">Menu</button>
<p id=13">Revyl</p>
<button id=14">None</button>
<button id=15">None</button>
<p id=16">None</p>
<p id=17">AI Native Proactive Observability</p>
<p id=18">Catch bugs</p>
<p id=19">they happen using agentic E2E testing and OpenTelemetry's Tracing. Book a demo</p>
<p id=20">before</p>
<p id=21">now</p>
<p id=22">!</p>
<button id=23">Join the waitlist →</button>
<p id=24">Book a demo</p>
<button id=25">None</button>
<p id=26">TRUSTED AND BUILT BY ENGINEERS AT</p>
<button id=27">Uber</button>
<button id=28">Salesforce</button>
<p id=29">VendorPM</p>
</html>
This structured representation of the UI elements is then used by the LLM to understand and interact with the mobile interface.
Each id is mapped to an element in the UI.
We also generate a Set-of-Mark image for the given state. Here's an example of a Set-of-Mark image generated for the UI state above:
This image shows the UI elements with their corresponding IDs overlaid on the screenshot. This visual representation helps the LLM understand the layout and structure of the interface, making it easier to interact with specific elements.
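To make the id-to-element mapping concrete, here is a small hypothetical example of resolving an id chosen by the LLM to a tap point. The bounding-box values are invented for illustration, since the exact structures cognisim returns alongside the encoded UI are not shown here.

```python
# Hypothetical id -> (left, top, right, bottom) bounds in screen pixels.
element_bounds = {
    5: (80, 120, 600, 180),    # the url_bar input from the parsed UI above
    23: (60, 900, 660, 980),   # the "Join the waitlist" button
}

def centre_of(element_id: int) -> tuple[int, int]:
    """Return the centre point of an element, suitable for a tap."""
    left, top, right, bottom = element_bounds[element_id]
    return (left + right) // 2, (top + bottom) // 2

chosen_id = 23                     # e.g. the id the LLM decided to press
x, y = centre_of(chosen_id)
print(f"Tap element {chosen_id} at ({x}, {y})")  # pass these to device.tap(x, y)
```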
If you use this project in your research, please cite:
@misc{revylai2024mobileadapt,
title = {Cognisim},
author = {Anam Hira and Landseer Enga and Aarib Sarker and Wasif Sarker and Hanzel Hira and Sushan Leel},
year = {2024},
howpublished = {GitHub},
url = {https://github.com/RevylAI/Mobileadapt}
}