Scrapy Random User-Agent

Does your Scrapy spider get identified and blocked by servers because it sends the default user-agent or a generic one?

Use this random_useragent module to set a random user-agent for every request. The only limit is the number of different user-agents you list in a text file.

Installing

Installing it is pretty simple.

pip install scrapy-random-useragent

Usage

In your settings.py file, update the DOWNLOADER_MIDDLEWARES setting like this:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
    'random_useragent.RandomUserAgentMiddleware': 400
}

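This disables the default UserAgentMiddleware and enables the RandomUserAgentMiddleware. Note that scrapy.contrib is the older module layout: in Scrapy 1.0 and later, the built-in middleware lives at scrapy.downloadermiddlewares.useragent.UserAgentMiddleware, so use that path as the key to disable if you are on a newer release.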

Then, add a new setting USER_AGENT_LIST containing the path to a text file that lists all your user-agents (one user-agent per line).

USER_AGENT_LIST = "/path/to/useragents.txt"
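To make the "one user-agent per line" format concrete, here is a minimal sketch of how such a file could be parsed. The function name parse_user_agents is hypothetical and is not part of the package, and skipping blank lines is an assumption for illustration rather than documented behaviour of the module.

```python
def parse_user_agents(lines):
    """Return the non-empty lines of a user-agent list, stripped of whitespace.

    `lines` is any iterable of strings, for example an open text file.
    Hypothetical helper for illustration; not the package's own loader.
    """
    return [line.strip() for line in lines if line.strip()]


if __name__ == "__main__":
    # Two example entries separated by a blank line, as they might appear
    # in the file that USER_AGENT_LIST points to.
    sample = [
        "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0\n",
        "\n",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/2.0\n",
    ]
    for agent in parse_user_agents(sample):
        print(agent)
```

Passing an open file object works the same way, since iterating over a text file yields its lines.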

Now all the requests from your crawler will have a random user-agent picked from the text file.
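The idea behind a middleware like this can be sketched in a few lines. The following is an illustration only, not the package's actual source: the class names RandomUserAgentSketch and FakeRequest are hypothetical, and FakeRequest stands in for a Scrapy request so the example runs without Scrapy installed. The process_request(request, spider) hook and the convention that returning None lets Scrapy continue handling the request follow Scrapy's downloader middleware interface.

```python
import random


class RandomUserAgentSketch:
    """Hypothetical stand-in showing the technique; not the package's class."""

    def __init__(self, user_agents):
        if not user_agents:
            raise ValueError("user_agents must not be empty")
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        # Pick one user-agent uniformly at random for this request.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        # Returning None tells Scrapy to keep processing the request.
        return None


class FakeRequest:
    """Minimal stand-in for a request object with a headers mapping."""

    def __init__(self):
        self.headers = {}


if __name__ == "__main__":
    middleware = RandomUserAgentSketch(["ExampleAgent/1.0", "ExampleAgent/2.0"])
    request = FakeRequest()
    middleware.process_request(request, spider=None)
    print(request.headers["User-Agent"])
```

Because the choice is made per request, consecutive requests from the same spider can carry different user-agents.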
