PyChromeless

Python (selenium) Lambda Chromium Automation

PyChromeless lets you automate actions on any web page from AWS Lambda. The aim of this project is to provide the scaffolding for future robot implementations.

But... how?

The whole process is explained here. The technologies used are:

Python 3.6
Selenium
ChromeDriver
A small Chromium binary

Downloading files

If your goal is to use Selenium to download files rather than just scrape content from web pages, you need to specify a download_location when initializing the WebDriverWrapper. The download location should be a writable Lambda directory such as /tmp. For example, the first line of lambda_handler would become:

driver = WebDriverWrapper(download_location='/tmp')

This makes files download into download_location automatically, without a confirmation dialog. Because the download happens asynchronously, you may need to make the handler sleep until the file has finished downloading.
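Instead of sleeping for a fixed time, the handler can poll the download directory. The following is a minimal sketch, not part of PyChromeless: wait_for_download and its parameters are hypothetical names, and it assumes that Chrome marks unfinished downloads with a .crdownload suffix.

```python
import os
import time


def wait_for_download(directory, already_present=(), timeout=30.0, poll_interval=0.5):
    """Wait until a new, fully written file appears in `directory`.

    `already_present` lists the file names that existed before the download
    was triggered, so unrelated files (common in /tmp) are ignored.
    Returns the path of the new file, or raises TimeoutError.
    """
    known = set(already_present)
    deadline = time.monotonic() + timeout
    while True:
        new_names = sorted(set(os.listdir(directory)) - known)
        # Assumption: Chrome keeps a ".crdownload" suffix while still writing.
        partial = [n for n in new_names if n.endswith(".crdownload")]
        done = [n for n in new_names if not n.endswith(".crdownload")]
        if done and not partial:
            return os.path.join(directory, done[0])
        if time.monotonic() >= deadline:
            raise TimeoutError("no completed download in " + directory)
        time.sleep(poll_interval)
```

To use it, record os.listdir('/tmp') before clicking the download link and pass that list as already_present afterwards. Keep the timeout below the Lambda function's own timeout so the handler can still report the failure.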

To download a file from a link that opens in a new tab (i.e. target='_blank'), call enable_download_in_headless_chrome in your scraping script after navigating to the desired page but before clicking the download link. This replaces every target='_blank' with target='_self'. For example:

# Navigate to download page
driver._driver.find_element_by_xpath('//a[@href="/downloads/"]').click()
# Enable headless chrome file download
driver.enable_download_in_headless_chrome()
# Click the download link
driver._driver.find_element_by_class_name("btn").click()
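The order of the calls matters because a rewrite like this can only touch links that already exist in the loaded page. As an illustration of the idea, and not necessarily how enable_download_in_headless_chrome is implemented, the same effect could be sketched with Selenium's execute_script. RETARGET_SCRIPT and retarget_blank_links are hypothetical names.

```python
# JavaScript that makes every "open in new tab" link open in the current tab.
RETARGET_SCRIPT = """
document.querySelectorAll('a[target="_blank"]').forEach(function (link) {
    link.setAttribute('target', '_self');
});
"""


def retarget_blank_links(selenium_driver):
    """Run RETARGET_SCRIPT against the page currently loaded in the driver.

    `selenium_driver` is any object with Selenium's execute_script method,
    for example the driver._driver attribute used above.
    """
    selenium_driver.execute_script(RETARGET_SCRIPT)
```

Links added to the page after the script has run keep their original target, so the call has to be repeated after each navigation.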

Building

nix build

Uploading the distributable package

Add ./result to your Serverless package as a layer:

layers:
  bla:  # Serverless exposes this as BlaLambdaLayer ¯\_(ツ)_/¯
    name: your-layer-name-at-aws-console
    path: pathToThisFolder/result

# Use the layer in the same serverless.yaml,
# or deploy only this layer and reference its ARN from other YAML files
functions:
  someFunction: 
    # ... rest of your function info
    layers:
      - Ref: BlaLambdaLayer

Python example for your function

from pychromeless.webdriver_wrapper import WebDriverWrapper
from selenium.webdriver.common.keys import Keys

def lambda_handler(*args, **kwargs):
    driver = WebDriverWrapper()

    driver.get_url('http://example.com')
    example_text = driver.get_inner_html('(//div//h1)[1]')

    driver.close()

    return example_text
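In the handler above, driver.close() is skipped if get_url or get_inner_html raises. A possible variation, sketched here under the assumption that the wrapper exposes only the three methods used above, closes the browser in a finally block. scrape_inner_html and driver_factory are hypothetical names, not part of PyChromeless.

```python
def scrape_inner_html(driver_factory, url, xpath):
    """Open `url`, return the inner HTML at `xpath`, and always close the driver.

    `driver_factory` is any callable that returns an object with get_url,
    get_inner_html and close methods, such as the WebDriverWrapper class.
    """
    driver = driver_factory()
    try:
        driver.get_url(url)
        return driver.get_inner_html(xpath)
    finally:
        # Runs on success and on error, so the browser is not left open.
        driver.close()
```

Inside lambda_handler this would be called as scrape_inner_html(WebDriverWrapper, 'http://example.com', '(//div//h1)[1]').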

Shouts to

Contributors

Jairo Vadillo (@jairovadillo)
Pere Giro ()
Ricard Falcó (@ricardfp)
