cruel-intentions/pychromeless

 
 


PyChromeless

Python (selenium) Lambda Chromium Automation

PyChromeless lets you automate actions on any web page from AWS Lambda. The aim of this project is to provide the scaffolding for future robot implementations.

But... how?

The whole process is explained here. Technologies used are:

Downloading files

If your goal is to use selenium to download files instead of just scraping content from web pages, you need to specify a download_location when initializing the WebDriverWrapper. The download location should be a writable Lambda directory such as /tmp. For example, the first line of lambda_handler would become

driver = WebDriverWrapper(download_location='/tmp')

Files will then download into download_location automatically, without a confirmation dialog. Because downloads run asynchronously, the handler may need to sleep until the file has finished downloading.
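Rather than sleeping for a fixed time, the handler can poll the download directory. The following is a minimal sketch, not part of PyChromeless: wait_for_download and its parameters are hypothetical names. It relies on Chrome keeping a .crdownload suffix on files that are still downloading, and it assumes the directory holds nothing but downloads, so a dedicated subdirectory of /tmp is safer than /tmp itself.

```python
import os
import time


def wait_for_download(directory, timeout=30.0, poll_interval=0.5):
    """Return the path of the first finished download in `directory`.

    Hypothetical helper: raises TimeoutError if no file finishes
    within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        finished = [
            name
            for name in sorted(os.listdir(directory))
            # Chrome keeps a ".crdownload" suffix while a download is in progress.
            if not name.endswith(".crdownload")
            and os.path.isfile(os.path.join(directory, name))
        ]
        if finished:
            return os.path.join(directory, finished[0])
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"no finished download in {directory!r} after {timeout}s"
            )
        time.sleep(poll_interval)
```

In a handler this would be called right after the click that starts the download, for example wait_for_download('/tmp/downloads'), with the timeout kept below the Lambda function timeout.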

To download a file from a link that opens in a new tab (i.e. target='_blank'), call enable_download_in_headless_chrome in your scraping script after navigating to the desired page but before clicking the download link. This replaces every target='_blank' on the page with target='_self'. For example:

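# Note: the find_element_by_* helpers were removed in Selenium 4.3;
# on newer versions use find_element(By.XPATH, ...) and find_element(By.CLASS_NAME, ...).
# Navigate to download page
driver._driver.find_element_by_xpath('//a[@href="/downloads/"]').click()
# Enable headless chrome file download
driver.enable_download_in_headless_chrome()
# Click the download link
driver._driver.find_element_by_class_name("btn").click()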
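The link rewrite described above can be pictured as a small script run inside the page. This is only an illustrative sketch and not necessarily how enable_download_in_headless_chrome is implemented: RETARGET_LINKS_JS and retarget_blank_links are hypothetical names, and the only Selenium call used is the standard execute_script method.

```python
# JavaScript that points every new-tab link back at the current tab.
RETARGET_LINKS_JS = """
for (const link of document.querySelectorAll('a[target="_blank"]')) {
    link.target = '_self';
}
"""


def retarget_blank_links(driver):
    """Rewrite target='_blank' links on the current page (hypothetical helper).

    `driver` is any object with Selenium's execute_script method,
    for example the underlying driver held by WebDriverWrapper.
    """
    driver.execute_script(RETARGET_LINKS_JS)
```

Because the script only touches links present when it runs, it has to be called after the page has loaded and again after any navigation.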

Building

nix build

Uploading the distributable package

Add ./result to your Serverless package as a layer:

layers:
  bla:  # Serverless exposes this as BlaLambdaLayer ¯\_(ツ)_/¯
    name: your-layer-name-at-aws-console
    path: pathToThisFolder/result

# we could use it in the same serverless.yaml
# or deploy only this layer and use its arn in other yamls
functions:
  someFunction: 
    # ... rest of your function info
    layers:
      - Ref: BlaLambdaLayer

Python example for your function

from pychromeless.webdriver_wrapper import WebDriverWrapper


def lambda_handler(*args, **kwargs):
    driver = WebDriverWrapper()
    try:
        # Load the page and read the inner HTML of the first <h1> inside a <div>
        driver.get_url('http://example.com')
        example_text = driver.get_inner_html('(//div//h1)[1]')
    finally:
        # Always shut the browser down, even if scraping fails
        driver.close()

    return example_text
