Examples of using GoogleScraper

In this directory you can find a wide range of examples of how to use GoogleScraper. It would be great if some of you could open Pull Requests with your own examples!

In all the examples below, caching is disabled by default.

Asynchronous Mode

In this example, the quite fast asynchronous mode is used. Two SERP pages are requested and the results are stored in the CSV file out.csv, which is created in the same directory as the example script.
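A minimal sketch of such a run is shown below. The 'http-async' value for scrape_method and the 'output_filename' key are assumptions about GoogleScraper's configuration options; check the example script for the exact values it uses.

from GoogleScraper import scrape_with_config, GoogleSearchError

config = {
    'use_own_ip': True,
    'keyword': 'reddit',
    'search_engines': ['google'],
    'num_pages_for_keyword': 2,       # two SERP pages per keyword
    'scrape_method': 'http-async',    # assumed name of the asynchronous mode
    'output_filename': 'out.csv',     # assumed key; written next to the script
    'do_caching': False,
}

try:
    scrape_with_config(config)
except GoogleSearchError as e:
    print(e)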

Basic Usage

In the basic usage program the script scrapes a single keyword (Let's go bubbles!) with one SERP page. Selenium mode is used with Chrome as the browser frontend. Caching is disabled, so the results are always fresh!
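A minimal sketch of this setup, with config keys following the other examples in this directory; the result attributes in the loop are assumptions, not the exact example code:

from GoogleScraper import scrape_with_config, GoogleSearchError

config = {
    'use_own_ip': True,
    'keyword': "Let's go bubbles!",
    'search_engines': ['google'],
    'num_pages_for_keyword': 1,
    'scrape_method': 'selenium',
    'sel_browser': 'chrome',
    'do_caching': False,              # always fresh results
}

try:
    search = scrape_with_config(config)
    # 'serps' and 'links' are sketches of the result containers;
    # the attribute names may differ slightly from the real script
    for serp in search.serps:
        for link in serp.links:
            print(link.title, link.link)
except GoogleSearchError as e:
    print(e)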

Basic Usage with two pages per keyword

This example shows how to scrape more than one SERP page per keyword. The config that is passed looks like this:

config = {
    'use_own_ip': True,
    'keyword': 'reddit',
    'search_engines': ['bing',],
    'num_pages_for_keyword': 2,
    'scrape_method': 'selenium',
    'sel_browser': 'chrome',
}
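Passing this config to GoogleScraper could then look roughly as follows; iterating over search.serps is a sketch of how the two returned pages can be inspected, and the attribute names are assumptions:

from GoogleScraper import scrape_with_config

search = scrape_with_config(config)

# one SERP object per requested page; 'page_number' and 'num_results'
# are assumed attribute names on the SERP model
for serp in search.serps:
    print(serp.page_number, serp.num_results)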

Finding plagiarized content

This is a slightly more complex use case, where some predefined strings are searched literally with GoogleScraper. Google is used as the search engine, in selenium mode with Chrome as the frontend. Each SERP result has a serp.effective_query property, which helps determine whether the literal search (with " quotes) returned any results.
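A sketch of the idea: wrap each suspicious string in quotes and check effective_query afterwards. The 'keywords' list key and the 'query' attribute are assumptions about GoogleScraper's API, so the real example script may differ.

from GoogleScraper import scrape_with_config

# strings we suspect were copied verbatim somewhere on the web
snippets = [
    'a sentence that should only exist in the original document',
    'another suspicious sentence',
]

config = {
    'use_own_ip': True,
    # the 'keywords' list key is an assumption; a keyword file works as well
    'keywords': ['"{}"'.format(s) for s in snippets],   # literal search with quotes
    'search_engines': ['google'],
    'num_pages_for_keyword': 1,
    'scrape_method': 'selenium',
    'sel_browser': 'chrome',
    'do_caching': False,
}

search = scrape_with_config(config)

for serp in search.serps:
    # if the engine silently rewrote the quoted query, effective_query is set
    # and differs from the query that was sent ('query' is an assumed attribute)
    if serp.effective_query:
        print('query was rewritten:', serp.query, '->', serp.effective_query)
    elif serp.links:
        print('literal matches found for:', serp.query)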

HTTP Mode Example

This example demonstrates the simplest mode: HTTP mode. In this mode, raw HTTP requests are sent without any intermediary such as a browser.
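A sketch of a plain HTTP run; apart from the scrape_method value, the config mirrors the other examples in this directory:

from GoogleScraper import scrape_with_config, GoogleSearchError

config = {
    'use_own_ip': True,
    'keyword': 'reddit',
    'search_engines': ['bing'],
    'num_pages_for_keyword': 1,
    'scrape_method': 'http',          # plain HTTP requests, no browser involved
    'do_caching': False,
}

try:
    scrape_with_config(config)
except GoogleSearchError as e:
    print(e)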

Image Scraping

This is another quite cute use case of GoogleScraper. In this example, images are scraped and saved in an images/ directory. The configuration used looks like this:

config = {
    'keyword': 'beautiful landscape', # :D hehe have fun my dear friends
    'search_engines': ['yandex', 'google', 'bing', 'yahoo'], # duckduckgo not supported
    'search_type': 'image',
    'scrape_method': 'selenium',
    'do_caching': True,
}

Once the image links have been extracted, the files are downloaded using multiple threads.
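A rough sketch of that threaded download step; the fetch helper, the placeholder URLs and the images/ target directory are illustrative, not the exact code of the example:

import os
import threading
import urllib.request

def fetch(url, target_dir='images'):
    # download one image and name it after the last path component
    os.makedirs(target_dir, exist_ok=True)
    name = url.split('/')[-1] or 'image'
    try:
        urllib.request.urlretrieve(url, os.path.join(target_dir, name))
    except Exception as e:
        print('download failed:', url, e)

# in the real example these URLs come from the scraped image results
image_urls = ['https://example.com/a.jpg', 'https://example.com/b.jpg']

threads = [threading.Thread(target=fetch, args=(url,)) for url in image_urls]
for t in threads:
    t.start()
for t in threads:
    t.join()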

PhantomJS Scraping

If you want to scrape with a headless browser, this is the perfect example. PhantomJS doesn't need as many resources as Chrome or Firefox and can also be run on servers, which makes it well suited for long-running processes.
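A sketch of the corresponding config; setting sel_browser to 'phantomjs' is the assumed switch, everything else follows the selenium examples above:

from GoogleScraper import scrape_with_config

config = {
    'use_own_ip': True,
    'keyword': 'long running scrape job',
    'search_engines': ['google'],
    'num_pages_for_keyword': 1,
    'scrape_method': 'selenium',
    'sel_browser': 'phantomjs',       # assumed value, analogous to 'chrome' above
    'do_caching': False,
}

scrape_with_config(config)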