A crawler that extracts logos from websites.
git clone [email protected]:Isabek/Logo-Extractor.git ~/projects/Logo-Extractor
Install virtualenv and create a virtual environment for the crawler:
cd ~/projects/Logo-Extractor
virtualenv -p python3 venv
source venv/bin/activate
pip install -r requirements.txt
Download ChromeDriver and install it.
If you use Ubuntu, you can install it like this:
wget https://chromedriver.storage.googleapis.com/2.41/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/bin/chromedriver
sudo chown ${USER}:${GROUP} /usr/bin/chromedriver
sudo chmod +x /usr/bin/chromedriver
Run the headless ChromeDriver:
chromedriver --url-base=/wd/hub
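Once ChromeDriver is running, a Python script can talk to it over the WebDriver protocol. Below is a minimal sketch, not taken from this repository, assuming Selenium is installed and ChromeDriver is listening on its default port 9515 with the /wd/hub base path:
# Minimal sketch (not part of the repository): connect to the running ChromeDriver.
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--headless")  # run Chrome without a visible window

driver = webdriver.Remote(
    command_executor="http://127.0.0.1:9515/wd/hub",  # default port + --url-base above
    options=options,
)
try:
    driver.get("http://example.com")
    print(driver.title)
finally:
    driver.quit()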
Run the crawler:
scrapy runspider spider.py -a input_file_path=logo-extraction.txt -o result.json
input_file_path
- location of the file which contains the websites
-o
- output file
The input file should be formatted as shown below:
Webpage Url,Logo Url
http://ground-truth-data.s3-website-us-east-1.amazonaws.com/autoglassforyou.com,http://ground-truth-data.s3-website-us-east-1.amazonaws.com/autoglassforyou.com/images/logo-change.gif
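For reference, a minimal Scrapy spider that accepts input_file_path this way could look like the sketch below. It only illustrates the command-line interface above; it is not the actual spider.py, and the page_url/logo_url field names and the logo-matching XPath are assumptions:
# Illustrative sketch only -- not the actual spider.py from this repository.
import csv
import scrapy

class LogoSpider(scrapy.Spider):
    name = "logo"

    def __init__(self, input_file_path=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.input_file_path = input_file_path  # passed via -a input_file_path=...

    def start_requests(self):
        # Each line of the input file is "Webpage Url,Logo Url" (see format above).
        with open(self.input_file_path) as f:
            for row in csv.reader(f):
                if not row or row[0] == "Webpage Url":
                    continue  # skip blank lines and the header row
                yield scrapy.Request(row[0], callback=self.parse)

    def parse(self, response):
        # Hypothetical extraction: first <img> whose src mentions "logo".
        logo_src = response.xpath('//img[contains(@src, "logo")]/@src').get()
        if logo_src:
            yield {
                "page_url": response.url,                # assumed field name
                "logo_url": response.urljoin(logo_src),  # assumed field name
            }
The items yielded by the spider are what Scrapy serializes into result.json through the -o option.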
The crawler writes its results to a JSON file, for example result.json.
If you want to check the accuracy of the extracted logos, run the following command:
python checker.py -actual logo-extraction.txt -json result.json
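The idea behind such a check is sketched below; this is not the actual checker.py, and it assumes the JSON items carry page_url/logo_url fields (hypothetical names) that are compared against the ground-truth CSV:
# Illustrative sketch only -- not the actual checker.py from this repository.
import csv
import json

def accuracy(actual_csv_path, result_json_path):
    # Ground truth: "Webpage Url,Logo Url" rows as in logo-extraction.txt.
    truth = {}
    with open(actual_csv_path) as f:
        for row in csv.reader(f):
            if len(row) == 2 and row[0] != "Webpage Url":
                truth[row[0]] = row[1]

    # Extracted results: the JSON array written by scrapy's -o option.
    with open(result_json_path) as f:
        results = json.load(f)

    correct = sum(
        1 for item in results
        if truth.get(item.get("page_url")) == item.get("logo_url")  # assumed field names
    )
    return correct / len(truth) if truth else 0.0

print(accuracy("logo-extraction.txt", "result.json"))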