Automating mobile gestures

While the Selenium WebDriver spec has support for certain kinds of mobile interaction, its parameters are not always easily mappable to the functionality that the underlying device automation (like UIAutomation in the case of iOS) provides. To that end, Appium augments the WebDriver spec with extra commands and parameters for mobile gestures:

tap (on screen or on element) with options:
- how many fingers
- how long to tap
- how many taps
- where precisely to tap on the screen or element
flick (on screen or on element) with options:
- how many fingers
- where to start the flick on screen or element
- where to end the flick on screen or element
swipe/drag (on screen or on element) with options:
- how many fingers
- how long the swipe/drag takes in seconds
- where to start the swipe on screen or element
- where to end the swipe on screen or element
scroll to (element)
slider
shake
longTap (element)
set the orientation with option:
- new orientation (landscape or portrait)

JSON Wire Protocol server extensions

Here are the endpoints with which we have implemented these additions to the spec.

Note on coordinates: All the X and Y parameters listed below can be used in two ways. If they are between 0 and 1 (e.g., 0.5), they are taken to be percentage of screen or element size. In other words, {x: 0.5, y: 0.25} means a coordinate that is 50% from the left side of the screen/element, and 25% from the top of the screen/element. If the values are greater than 1, they are taken as pixels. So, {x: 100, y: 300} means a coordinate that is 100 pixels from the left and 300 from the top of the screen/element.

Note on performing actions on screen vs elements: These methods all take an optional element parameter. If present, this is taken to be the ID of an element which has already been retrieved. In that case, the coordinates are taken to refer to the rectangle of that element only. So {x: 0.5, y: 0.5, element: '3'} means "the exact middle point of the element with ID '3'".
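
The two notes above can be read as a small resolution rule. The following Python sketch illustrates that reading and is not Appium's actual implementation: the function names resolve_axis and resolve_point, the rect dictionary layout, and the choice to treat a value of exactly 1 as relative are all assumptions made for the example.

```python
def resolve_axis(value, extent):
    """Return a pixel offset along one axis.

    Values from 0 to 1 are treated as a fraction of `extent`;
    values greater than 1 are treated as pixels already.
    (Treating exactly 1 as relative is an assumption.)
    """
    if 0 <= value <= 1:
        return value * extent
    return value


def resolve_point(params, screen_size, element_rects=None):
    """Resolve gesture params to an absolute screen point in pixels.

    `element_rects` is a hypothetical lookup from element ID to a
    rect dict with the keys x, y, width and height.
    """
    element_id = params.get("element")
    if element_id is not None:
        # Coordinates are scoped to the element's rectangle.
        rect = (element_rects or {})[element_id]
    else:
        # No element given: coordinates are scoped to the whole screen.
        rect = {"x": 0, "y": 0,
                "width": screen_size[0], "height": screen_size[1]}
    x = rect["x"] + resolve_axis(params.get("x", 0.5), rect["width"])
    y = rect["y"] + resolve_axis(params.get("y", 0.5), rect["height"])
    return (x, y)


# Hypothetical element rectangle for the element with ID '3'.
rects = {"3": {"x": 20, "y": 100, "width": 200, "height": 40}}
print(resolve_point({"x": 0.5, "y": 0.25}, (320, 480)))
print(resolve_point({"x": 100, "y": 300}, (320, 480)))
print(resolve_point({"x": 0.5, "y": 0.5, "element": "3"}, (320, 480), rects))
```

On a 320x480 screen, the first call resolves to the point 50% across and 25% down, the second passes the pixel values through unchanged, and the third lands in the middle of the element's rectangle.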

POST /session/:sessionId/touch/tap - perform a tap on the screen or an element
- URL Parameter: sessionId of session to route to
- JSON parameters:
  - tapCount (optional, default 1): how many times to tap
  - touchCount (optional, default 1): how many fingers to tap with
  - duration (optional, default 0.1): how long (in seconds) to tap
  - x (optional, default 0.5): x coordinate to tap (in pixels or relative units)
  - y (optional, default 0.5): y coordinate to tap (in pixels or relative units)
  - element (optional): ID of element to scope this command to
POST /session/:sessionId/touch/flick_precise - perform a flick on the screen or an element
- URL Parameter: sessionId of session to route to
- JSON parameters:
  - touchCount (optional, default 1): how many fingers to flick with
  - startX (optional, default 0.5): x coordinate where flick begins (in pixels or relative units)
  - startY (optional, default 0.5): y coordinate where flick begins (in pixels or relative units)
  - endX (required): x coordinate where flick ends (in pixels or relative units)
  - endY (required): y coordinate where flick ends (in pixels or relative units)
  - element (optional): ID of element to scope this command to
POST /session/:sessionId/touch/swipe - perform a swipe/drag on the screen or an element
- URL Parameter: sessionId of session to route to
- JSON parameters:
  - touchCount (optional, default 1): how many fingers to swipe with
  - startX (optional, default 0.5): x coordinate where swipe begins (in pixels or relative units)
  - startY (optional, default 0.5): y coordinate where swipe begins (in pixels or relative units)
  - endX (required): x coordinate where swipe ends (in pixels or relative units)
  - endY (required): y coordinate where swipe ends (in pixels or relative units)
  - duration (optional, default 0.8): time (in seconds) to spend performing the swipe/drag
  - element (optional): ID of element to scope this command to
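
As an illustration of how the optional and required parameters listed above fit together, here is a minimal Python sketch that assembles the JSON parameters for a swipe. The helper name build_swipe_params is hypothetical, and the sketch assumes the caller wants to send every value explicitly. The server applies the documented defaults on its own, so a client does not need to do this.

```python
# Defaults and required names copied from the swipe endpoint description.
SWIPE_DEFAULTS = {
    "touchCount": 1,
    "startX": 0.5,
    "startY": 0.5,
    "duration": 0.8,
}
SWIPE_REQUIRED = ("endX", "endY")


def build_swipe_params(**overrides):
    """Merge caller-supplied values over the documented swipe defaults."""
    missing = [name for name in SWIPE_REQUIRED if name not in overrides]
    if missing:
        raise ValueError("missing required parameter(s): " + ", ".join(missing))
    allowed = set(SWIPE_DEFAULTS) | set(SWIPE_REQUIRED) | {"element"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError("unknown parameter(s): " + ", ".join(sorted(unknown)))
    params = dict(SWIPE_DEFAULTS)
    params.update(overrides)
    return params


# A slow swipe that ends near the left edge, starting from the default center.
print(build_swipe_params(endX=0.05, endY=0.5, duration=1.8))
```

Rejecting unknown names locally is a design choice of this sketch. It catches typos such as endx before a request is sent.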

Note on setting orientation: Setting the orientation takes different parameters than the tap, flick, and swipe methods. This action is performed by setting the orientation of the browser to "LANDSCAPE" or "PORTRAIT". The alternative access method below does not apply to setting orientation.

POST /session/:sessionId/orientation - set the orientation of the browser
- URL Parameter: sessionId of session to route to
- JSON parameters:
  - orientation (required): new orientation, either "LANDSCAPE" or "PORTRAIT"
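
For comparison with the gesture endpoints, the following Python sketch builds the raw request for the orientation endpoint. The function name orientation_request and the session ID "abc123" are hypothetical. The sketch only constructs the method, path, and JSON body, and does not send anything.

```python
import json

VALID_ORIENTATIONS = ("LANDSCAPE", "PORTRAIT")


def orientation_request(session_id, orientation):
    """Return (method, path, body) for setting the orientation."""
    if orientation not in VALID_ORIENTATIONS:
        raise ValueError("orientation must be LANDSCAPE or PORTRAIT")
    path = "/session/{}/orientation".format(session_id)
    body = json.dumps({"orientation": orientation})
    return ("POST", path, body)


print(orientation_request("abc123", "LANDSCAPE"))
```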

Alternative access method

Extending the JSON Wire Protocol is great, but it means that the various WebDriver language bindings will have to implement access to these endpoints in their own way. Naturally, this will take different amounts of time depending on the project. We have instituted a way to get around this delay, by using driver.execute() with special parameters.

POST /session/:sessionId/execute takes two JSON parameters:

script (usually a snippet of JavaScript)
args (usually an array of arguments passed to that snippet in the JavaScript engine)

In the case of these new mobile methods, script must be one of:

mobile: tap
mobile: flick
mobile: swipe
mobile: scrollTo
mobile: scroll
mobile: shake

(The mobile: prefix allows us to route these requests to the appropriate endpoint.)

And args will be an array with one element: a JavaScript object defining the parameters for the corresponding function. For example, to call tap on a certain screen position, call driver.execute with these JSON parameters:

{
  "script": "mobile: tap",
  "args": [{
    "x": 0.8,
    "y": 0.4
  }]
}

In this example, our new tap method will be called with the x and y params as described above.
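
The same payload can be assembled programmatically. This Python sketch assumes a hypothetical helper named mobile_execute_body. It only builds the JSON body described above and leaves sending it to whichever client is in use. Passing an empty object when a method takes no parameters is an assumption of the sketch.

```python
import json


def mobile_execute_body(method, params=None):
    """Build the JSON body for an execute call that targets a mobile method.

    `method` is the bare name (for example "tap"); the "mobile: " prefix
    is added here. `params` becomes the single element of `args`.
    """
    if not method:
        raise ValueError("method name must not be empty")
    return {
        "script": "mobile: " + method,
        "args": [dict(params or {})],
    }


body = mobile_execute_body("tap", {"x": 0.8, "y": 0.4})
print(json.dumps(body, indent=2))
```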

Code examples

In these examples, note that the element parameter is always optional.

Tap

WD.js:

driver.elementsByTagName('tableCell', function(err, els) {
  var tapOpts = {
    x: 150 // in pixels from left
    , y: 30 // in pixels from top
    , element: els[4].value // the id of the element we want to tap
  };
  driver.execute("mobile: tap", [tapOpts], function(err) {
    // continue testing
  });
});

Java:

WebElement row = driver.findElements(By.tagName("tableCell")).get(4);
JavascriptExecutor js = (JavascriptExecutor) driver;
// the map holds both numeric coordinates and a String element id, so use Object values
HashMap<String, Object> tapObject = new HashMap<String, Object>();
tapObject.put("x", 150); // in pixels from left
tapObject.put("y", 30); // in pixels from top
tapObject.put("element", ((RemoteWebElement) row).getId()); // the id of the element we want to tap
js.executeScript("mobile: tap", tapObject);

// In an iOS app, if a UI element's visible property is "false",
// tap it by using the element's location instead.
WebElement element = wd.findElement(By.xpath("//window[1]/scrollview[1]/image[1]"));
JavascriptExecutor js = (JavascriptExecutor) wd;
HashMap<String, Double> tapObject = new HashMap<String, Double>();
tapObject.put("x", (double) element.getLocation().getX());
tapObject.put("y", (double) element.getLocation().getY());
tapObject.put("duration", 0.1);
js.executeScript("mobile: tap", tapObject);

Python:

driver.execute_script("mobile: tap", {"touchCount": 1, "x": 0.9, "y": 0.8, "element": element.id})

Ruby:

@driver.execute_script 'mobile: tap', :x => 150, :y => 30

Ruby:

b = @driver.find_element :name, 'Sign In'
@driver.execute_script 'mobile: tap', :element => b.ref

C#:

Dictionary<String, Double> coords = new Dictionary<string, double>();
coords.Add("x", 12);
coords.Add("y", 12);
driver.ExecuteScript("mobile: tap", coords);

Flick

WD.js:

// options for a 2-finger flick from the center of the screen to the top left
var flickOpts = {
  endX: 0
  , endY: 0
  , touchCount: 2
};
driver.execute("mobile: flick", [flickOpts], function(err) {
  // continue testing
});

Java:

JavascriptExecutor js = (JavascriptExecutor) driver;
// int literals do not autobox to Double, so use Object values
HashMap<String, Object> flickObject = new HashMap<String, Object>();
flickObject.put("endX", 0);
flickObject.put("endY", 0);
flickObject.put("touchCount", 2);
js.executeScript("mobile: flick", flickObject);

Swipe

Note: Swiping is unfortunately broken in iOS 7 because of a bug in Apple's frameworks. For iOS 7, see mobile: scroll as a workaround that works for most cases.

WD.js:

// options for a slow swipe from the right edge of the screen to the left
var swipeOpts = {
  startX: 0.95
  , startY: 0.5
  , endX: 0.05
  , endY: 0.5
  , duration: 1.8
};
driver.execute("mobile: swipe", [swipeOpts], function(err) {
  // continue testing
});

Java:

JavascriptExecutor js = (JavascriptExecutor) driver;
HashMap<String, Double> swipeObject = new HashMap<String, Double>();
swipeObject.put("startX", 0.95);
swipeObject.put("startY", 0.5);
swipeObject.put("endX", 0.05);
swipeObject.put("endY", 0.5);
swipeObject.put("duration", 1.8);
js.executeScript("mobile: swipe", swipeObject);

Scroll

WD.js:

// scroll the view down
driver.execute("mobile: scroll", [{direction: 'down'}], function(err) {
  // continue testing
});

Java:

JavascriptExecutor js = (JavascriptExecutor) driver;
HashMap<String, String> scrollObject = new HashMap<String, String>();
scrollObject.put("direction", "down");
scrollObject.put("element", ((RemoteWebElement) element).getId());
js.executeScript("mobile: scroll", scrollObject);

Slider

iOS

Java

// slider values can be string representations of numbers between 0 and 1
// e.g., "0.1" is 10%, "1.0" is 100%
WebElement slider = wd.findElement(By.xpath("//window[1]/slider[1]"));
slider.sendKeys("0.1");

Android

The best way to interact with the slider on Android is with the 'mobile: tap' gesture. It is difficult to find a reliable way to set a specific percentage that works on all screen sizes, however. Therefore, it is recommended to write tests that focus on minimum, 50%, and maximum.

Ruby

# 0%
@driver.execute_script 'mobile: tap', :x => slider.location.x, :y => slider.location.y

# 100%
@driver.execute_script 'mobile: tap', :x => slider.location.x + slider.size.width - 1, :y => slider.location.y

# 50%
slider.click
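
The arithmetic behind the minimum and maximum taps can be written as one function. The Python sketch below mirrors the Ruby calculation above. The helper name slider_tap_point is hypothetical, and the linear interpolation for values between 0 and 1 is an assumption. As noted above, intermediate positions may not land on the intended value on every screen size.

```python
def slider_tap_point(location, size, fraction):
    """Return tap coordinates for a slider position.

    location: (x, y) of the slider's top-left corner, in pixels
    size:     (width, height) of the slider, in pixels
    fraction: 0.0 for the minimum, 1.0 for the maximum
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    # 0.0 maps to the left edge and 1.0 to the last pixel column,
    # matching the "+ width - 1" in the Ruby example.
    x = location[0] + round(fraction * (size[0] - 1))
    y = location[1]
    return {"x": x, "y": y}


# Hypothetical slider at (40, 300) that is 200 pixels wide.
print(slider_tap_point((40, 300), (200, 20), 0.0))
print(slider_tap_point((40, 300), (200, 20), 1.0))
```

The returned dictionary has the shape expected by the mobile: tap parameters, so it can be passed as the argument object of that call.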

Set orientation

WD.js:

driver.setOrientation("LANDSCAPE", function(err) {
  // continue testing
});

Python:

driver.orientation = "LANDSCAPE"

Scroll To

Ruby:

b = @driver.find_element :name, 'Sign In'
@driver.execute_script 'mobile: scrollTo', :element => b.ref

Java:

JavascriptExecutor js = (JavascriptExecutor) driver;
WebElement element = driver.findElement(By.name("Log In"));
HashMap<String, String> scrollToObject = new HashMap<String, String>();
scrollToObject.put("element", ((RemoteWebElement) element).getId());
js.executeScript("mobile: scrollTo", scrollToObject);

longTap

C#:

// long tap an element
Dictionary<string, object> parameters = new Dictionary<string, object>();
parameters.Add("using", _attributeType);
parameters.Add("value", _attribute);
Response response = rm.executescript(DriverCommand.FindElement, parameters);
Dictionary<string, object> elementDictionary = response.Value as Dictionary<string, object>;
string id = null;
if (elementDictionary != null)
{
 id = (string)elementDictionary["ELEMENT"];
}
IJavaScriptExecutor js = (IJavaScriptExecutor)remoteDriver;
Dictionary<String, String> longTapObject = new Dictionary<String, String>();
longTapObject.Add("element", id);
js.ExecuteScript("mobile: longClick", longTapObject);
