Introduction to Selenium WebDriver

A quick introduction to Selenium WebDriver, a practical tool for running functional tests and browser automation. The sample is written with Firefox 58 and GeckoDriver 0.20.

Overview

Selenium is a portable software-testing framework for web applications. Selenium WebDriver is the successor to Selenium RC: it accepts commands and forwards them to a browser through a browser-specific driver, which sends the commands to the browser and retrieves the results. Most browser drivers launch and control a real browser application (such as Firefox, Chrome, Internet Explorer, Safari, or Microsoft Edge); there is also an HtmlUnit driver, which simulates a browser using the headless browser HtmlUnit.

In this post, I’ll use Selenium WebDriver 3.8 on macOS with Firefox 58. After reading this post, you’ll understand:

  • How to install GeckoDriver (for Firefox)
  • How to initialize WebDriver in Java
  • How to select a WebElement
  • How to execute a native JS command
  • How to send keys to an element
  • How to wait
  • How to use basic XPath (XML Path Language)
  • Troubleshooting

Installation

GeckoDriver is a proxy that lets W3C WebDriver-compatible clients interact with Gecko-based browsers. It provides the HTTP API described by the WebDriver protocol to communicate with Gecko browsers, such as Firefox versions above 47.

Install GeckoDriver via Homebrew, then check the installed version.

$ brew install geckodriver
$ geckodriver --version
geckodriver 0.20.0
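
If the tests later complain that the driver executable cannot be found, it can help to check from Java where geckodriver lives. The following is a minimal sketch using only the JDK: the class GeckoDriverLocator and its method findOnPath are hypothetical helpers, not part of Selenium. It assumes the webdriver.gecko.driver system property is what your Selenium version reads to locate the executable.

```java
import java.io.File;
import java.util.Optional;

// Hypothetical helper, not part of Selenium.
public class GeckoDriverLocator {

  /** Searches the directories of a PATH-style string for an executable file. */
  static Optional<File> findOnPath(String executable, String pathValue) {
    if (pathValue == null || pathValue.isEmpty()) {
      return Optional.empty();
    }
    for (String dir : pathValue.split(File.pathSeparator)) {
      File candidate = new File(dir, executable);
      if (candidate.isFile() && candidate.canExecute()) {
        return Optional.of(candidate);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Optional<File> driver = findOnPath("geckodriver", System.getenv("PATH"));
    if (driver.isPresent()) {
      // Assumption: this property tells Selenium where the executable is.
      System.setProperty("webdriver.gecko.driver", driver.get().getAbsolutePath());
      System.out.println("Using " + driver.get());
    } else {
      System.out.println("geckodriver not found on PATH");
    }
  }
}
```

Running the check before the tests start makes a missing installation fail early with a clear message.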

Initialize WebDriver

import org.junit.*;
import org.openqa.selenium.*;
import org.openqa.selenium.firefox.*;

/**
 * Integration test for an awesome page.
 */
public class AwesomePageIT {

  private static WebDriver driver;

  @BeforeClass
  public static void beforeAll() {
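    // R.FIREFOX_SAFE_MODE and NUXEO_URL are constants defined elsewhere
    // in the project (a preference name and the URL of the page under test).
    FirefoxProfile profile = new FirefoxProfile();
    profile.setPreference(R.FIREFOX_SAFE_MODE, "-1");

    FirefoxOptions options = new FirefoxOptions();
    options.setProfile(profile);

    driver = new FirefoxDriver(options);
    driver.get(NUXEO_URL);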
  }

  @AfterClass
  public static void afterAll() {
    // quit() closes every window and ends the session;
    // close() only closes the current window.
    driver.quit();
  }

  // TODO Add tests here...
}

WebElement Selection

Create a page object: a Java class that holds everything related to one page, much like an HTML document object, but in Java.

import org.openqa.selenium.*;

public class AwesomePage {

  private WebDriver driver;

  public AwesomePage(WebDriver driver) {
    this.driver = driver;
  }

  public WebElement getElementFoo() { ... }
}

Once you’ve created such a page, you can retrieve web elements in different ways: by class name, by CSS selector, by ID, by link text, by partial link text, by name, by tag name, and by XPath. Here are some examples that query the following HTML content.

<div>
  <button id="confirm-btn" name="confirm-button">Confirm</button>
  <a class="red" href="#">Cancel</a>
</div>

Let’s take a look:

WebElement e1 = driver.findElement(By.className("red"));
WebElement e2 = driver.findElement(By.id("confirm-btn"));
WebElement e3 = driver.findElement(By.linkText("Cancel"));
WebElement e4 = driver.findElement(By.name("confirm-button"));
WebElement e5 = driver.findElement(By.tagName("div"));
WebElement e6 = driver.findElement(By.xpath("//a[contains(@class, 'red')]"));

Execute Native JavaScript Command

You might want to execute native JavaScript code from Java via WebDriver, for example to scroll the document so that the target element is at the top of the viewport. You can achieve this as follows:

WebDriver driver = ...;
WebElement element = driver.findElement(By.id("foo"));
JavascriptExecutor executor = (JavascriptExecutor) driver;
executor.executeScript("arguments[0].scrollIntoView(true);", element);

This can be simplified if your driver is declared as a RemoteWebDriver, which implements JavascriptExecutor itself, so no cast is required:

RemoteWebDriver driver = ...;
WebElement element = driver.findElement(By.id("foo"));
driver.executeScript("arguments[0].scrollIntoView(true);", element);

Send Keys to Element

You can send keys to input HTML elements, e.g. <input> and <textarea>.

WebElement input = ...;
input.sendKeys(Keys.BACK_SPACE);
input.sendKeys(Keys.ESCAPE);

Wait for a WebElement

Use FluentWait to wait for a web element until a condition is satisfied. The type parameter <F> of the Wait<F> interface is the input type passed to each condition used with this instance; here it is WebDriver.

Wait<WebDriver> wait = new FluentWait<>(driver);
wait.until(ExpectedConditions.visibilityOf(myElement));
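
Conceptually, a wait like this is a polling loop: evaluate the condition, and if it does not hold yet, sleep for a short interval and try again until a timeout expires. The sketch below illustrates that idea with the JDK only. It is a simplified stand-in, not Selenium's implementation: the class PollingWait and its method waitUntil are hypothetical, and it returns false on timeout, whereas FluentWait throws a TimeoutException.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration of a polling wait, not Selenium code.
public class PollingWait {

  /**
   * Evaluates the condition repeatedly until it returns true or the timeout
   * elapses. Returns true if the condition was satisfied in time.
   */
  static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
    long start = System.nanoTime();
    long timeoutNanos = timeoutMillis * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - start >= timeoutNanos) {
        return false;
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and give up waiting.
        Thread.currentThread().interrupt();
        return false;
      }
    }
  }

  public static void main(String[] args) {
    long readyAt = System.currentTimeMillis() + 200;
    // The condition becomes true after roughly 200 ms.
    boolean ready = waitUntil(() -> System.currentTimeMillis() >= readyAt, 2_000, 50);
    System.out.println("ready = " + ready);
  }
}
```

The timeout and the polling interval are the two knobs that matter in practice: a short interval reacts faster but evaluates the condition more often.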

XPath

Here’s a list of XPath expressions that I use frequently.

Expression                      Description
//*[@id='foo']                  Select any tag having id “foo”.
//a[text()='foo']               Select tag <a> having text “foo”.
//a[contains(@class, 'red')]    Select tag <a> having “red” in its attribute class.
//a[contains(text(), 'foo')]    Select tag <a> having “foo” in its text.
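
To see how these expression shapes behave without starting a browser, you can evaluate them with the XPath 1.0 engine bundled in the JDK. The sketch below runs them against the sample markup from the WebElement Selection section, parsed as well-formed XML. The class XPathDemo and its method count are hypothetical names, and the JDK engine is only a rough stand-in for a browser's XPath support on real HTML pages.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, uses only the JDK.
public class XPathDemo {

  // Sample markup from the WebElement Selection section.
  static final String HTML =
      "<div>"
          + "<button id=\"confirm-btn\" name=\"confirm-button\">Confirm</button>"
          + "<a class=\"red\" href=\"#\">Cancel</a>"
          + "</div>";

  /** Returns the number of nodes in HTML matched by the XPath expression. */
  static int count(String expression) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(HTML)));
      NodeList nodes = (NodeList) XPathFactory.newInstance()
          .newXPath()
          .evaluate(expression, doc, XPathConstants.NODESET);
      return nodes.getLength();
    } catch (Exception e) {
      throw new IllegalStateException("Cannot evaluate " + expression, e);
    }
  }

  public static void main(String[] args) {
    String[] expressions = {
      "//*[@id='confirm-btn']",
      "//a[text()='Cancel']",
      "//a[contains(@class, 'red')]",
      "//a[contains(text(), 'Canc')]"
    };
    for (String expression : expressions) {
      System.out.println(expression + " matched " + count(expression) + " node(s)");
    }
  }
}
```

Each of the four expressions matches exactly one node in this markup. Note that contains(@class, 'red') is a substring test, so it would also match a class such as "redirect".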

You can test an XPath expression in your browser. First, open the developer tools via a keyboard shortcut:

  • Cmd + Option + C for Firefox
  • Cmd + Option + C for Chrome

Then write the XPath expression in the console. If the browser returns a non-empty result, the XPath works:

$x("//*[@id='logo']");

Troubleshooting

A few points to watch out for.

Scrolling

If you need to scroll the document before clicking an element, do not scroll the element directly. Scroll its container instead:

public void clickButton(WebElement container, WebElement button) {
  // The cast is needed when driver is declared as WebDriver,
  // which has no executeScript() method of its own.
  JavascriptExecutor executor = (JavascriptExecutor) driver;
  executor.executeScript("arguments[0].scrollIntoView(true);", container);
  button.click();
}

The method Element.scrollIntoView() scrolls the element on which it’s called into the visible area of the browser window. If its argument is true (the default), the top of the element is aligned to the top of the visible area of its scrollable ancestor.
