Search Engine Parser

"If it is a search engine, then it can be parsed" - Some random guy

A package for querying popular search engines and scraping result titles, links and descriptions. It aims to support the widest possible range of search engines.

Popular Supported Engines

The supported search engines include:

Google
DuckDuckGo
GitHub
StackOverflow
Baidu
YouTube

View all supported engines

Installation

    # install only package dependencies
    pip install search-engine-parser

    # install with command line interface dependencies
    pip install "search-engine-parser[pysearch]"

Development

Clone the repository

    git clone git@github.com:bisoncorps/search-engine-parser.git

Create a virtual environment (mkvirtualenv is provided by virtualenvwrapper) and install the requirements

    mkvirtualenv search_engine_parser
    pip install -r requirements/dev.txt

Code Documentation

The code documentation is available on Read the Docs.

Running the tests

    pytest

Usage

Code

Query results can be scraped from popular search engines, as shown in the example snippet below:

    import pprint

    from search_engine_parser import YahooSearch, GoogleSearch, BingSearch

    # positional arguments passed to search(): the query string and the page number
    search_args = ('preaching to the choir', 1)
    gsearch = GoogleSearch()
    ysearch = YahooSearch()
    bsearch = BingSearch()
    gresults = gsearch.search(*search_args)
    yresults = ysearch.search(*search_args)
    bresults = bsearch.search(*search_args)
    results_by_engine = {
        "Google": gresults,
        "Yahoo": yresults,
        "Bing": bresults,
    }
    # pretty print the result from each engine
    for engine, results in results_by_engine.items():
        print(f"-------------{engine}------------")
        pprint.pprint(results)

    # print first title from google search
    print(gresults["titles"][0])
    # print 10th link from yahoo search
    print(yresults["links"][9])
    # print 6th description from bing search
    print(bresults["descriptions"][5])
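
The snippet above indexes titles, links and descriptions as three parallel lists. As a minimal sketch, assuming those three lists line up by position, they can be zipped into one record per result. The helper name to_records and the sample data are hypothetical and are not part of the package. The sample dict only stands in for the value returned by search(), so the sketch runs without network access.

```python
from typing import Dict, List


def to_records(results: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Zip the parallel titles/links/descriptions lists into per-result dicts.

    Hypothetical helper: assumes the three lists are aligned by position.
    """
    return [
        {"title": title, "link": link, "description": description}
        for title, link, description in zip(
            results["titles"], results["links"], results["descriptions"]
        )
    ]


# Placeholder data standing in for the value returned by search();
# the entries are invented for illustration only.
sample_results = {
    "titles": ["First title", "Second title"],
    "links": ["https://example.com/1", "https://example.com/2"],
    "descriptions": ["First description", "Second description"],
}

records = to_records(sample_results)
for position, record in enumerate(records, start=1):
    print(f"{position}. {record['title']} ({record['link']})")
```

Because zip stops at the shortest list, a result set with a missing description would be truncated without an error. Checking the list lengths first is worth considering if that matters.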

Command line

Search Engine Parser comes with a CLI tool called pysearch, for example:

    pysearch --engine bing search --query "Preaching to the choir" --type descriptions

Result

'Preaching to the choir' originated in the USA in the 1970s. It is a variant of the earlier 'preaching to the converted', which dates from England in the late 1800s and has the same meaning. Origin - the full story 'Preaching to the choir' (also sometimes spelled quire) is of US origin.

The CLI expects an engine argument (-e), followed by one of two subcommands: search or summary.

    SearchEngineParser

    positional arguments:
      {search,summary}      help for subcommands
        search              search help
        summary             summary help

    optional arguments:
      -h, --help            show this help message and exit
      -e ENGINE, --engine ENGINE
                            Engine to use for parsing the query e.g google, yahoo,
                            bing, duckduckgo (default: google)

The summary subcommand prints a summary of the search engine, including its description:

    pysearch --engine google summary

The full arguments for the search subcommand are shown below:

    usage: pysearch search [-h] -q QUERY [-p PAGE] [-t TYPE] [-r RANK]

    optional arguments:
      -h, --help            show this help message and exit
      -q QUERY, --query QUERY
                            Query string to search engine for
      -p PAGE, --page PAGE  Page of the result to return details for (default: 1)
      -t TYPE, --type TYPE  Type of detail to return i.e full, links, descriptions
                            or titles (default: full)
      -r RANK, --rank RANK  ID of Detail to return e.g 5 (default: 0)
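
To make the option layout concrete, here is a rough argparse sketch reconstructed only from the help text above. It is not the package's actual source. The function name build_parser, the dest name command, and the int types for page and rank are assumptions. It shows why the engine option goes before the subcommand: -e belongs to the top-level parser, while -q, -p, -t and -r belong to the search subparser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Approximate the pysearch option layout described in the help text."""
    parser = argparse.ArgumentParser(
        prog="pysearch",
        description="SearchEngineParser",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # Top-level option: must appear before the subcommand name.
    parser.add_argument(
        "-e", "--engine", default="google",
        help="Engine to use for parsing the query e.g google, yahoo, bing, duckduckgo",
    )
    subparsers = parser.add_subparsers(dest="command", help="help for subcommands")
    subparsers.add_parser("summary", help="summary help")

    search = subparsers.add_parser(
        "search", help="search help",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    search.add_argument("-q", "--query", required=True,
                        help="Query string to search engine for")
    # Assumption: page and rank are parsed as integers.
    search.add_argument("-p", "--page", type=int, default=1,
                        help="Page of the result to return details for")
    search.add_argument("-t", "--type", default="full",
                        help="Type of detail to return i.e full, links, descriptions or titles")
    search.add_argument("-r", "--rank", type=int, default=0,
                        help="ID of Detail to return e.g 5")
    return parser


# Parse the same arguments as the earlier bing example.
args = build_parser().parse_args(
    ["--engine", "bing", "search", "--query", "Preaching to the choir",
     "--type", "descriptions"]
)
print(args.engine, args.command, args.type, args.page, args.rank)
```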