This is a small project that demonstrates how you could scrape flight data from Google Flights using Scrapy and Pyppeteer. Currently, only one-way flights are supported.
To run this spider, you will need to install pyppeteer, scrapy-poet and spidermon.
After cloning the repository and installing the requirements, you should be able to start the spider with the command scrapy crawl google_flights.
You can see more options for running the spider with scrapy crawl --help.
The searches performed by the spider are defined in the file searches.json, which can be found in the folder scrapy_google_flights/resources/.
The format looks like this:
{
    "origin": "airport from which you start",
    "destination": "airport at which you arrive",
    "days_to_depart": days until start of flight
}
origin and destination are specified as IATA airport codes. You can use the search engine for IATA airport codes provided here.
days_to_depart is an integer that defines how many days from now the flight departs.
For example, if you wanted to travel from "Berlin Brandenburg Airport" (BER) to "Barcelona–El Prat Airport" (BCN) in 30 days, your JSON would look like this:
{
    "origin": "BER",
    "destination": "BCN",
    "days_to_depart": 30
}
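To make the meaning of days_to_depart concrete, here is a minimal sketch that checks one search entry and turns it into a departure date. The helper name departure_date and the validation rules are assumptions made for illustration; they are not taken from the spider's code, which may handle these values differently.

```python
import re
from datetime import date, timedelta

# IATA airport codes are three uppercase letters, e.g. "BER" or "BCN".
IATA_CODE = re.compile(r"[A-Z]{3}")


def departure_date(search, today=None):
    """Return the departure date for one search entry (hypothetical helper)."""
    for key in ("origin", "destination"):
        if not IATA_CODE.fullmatch(search[key]):
            raise ValueError(f"{key} must be a three-letter IATA code: {search[key]!r}")

    days = search["days_to_depart"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(days, bool) or not isinstance(days, int) or days < 0:
        raise ValueError(f"days_to_depart must be a non-negative integer: {days!r}")

    today = today or date.today()
    return today + timedelta(days=days)


search = {"origin": "BER", "destination": "BCN", "days_to_depart": 30}
print(departure_date(search, today=date(2024, 1, 1)))  # 2024-01-31
```

Passing today explicitly keeps the example reproducible; without it, the date is counted from the day the code runs.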
You can get notifications via Telegram when the spider starts and finishes.
To enable these notifications, you need to create a Telegram bot and obtain your API access token.
A decent guide on how to do that can be found here.
After creating your bot, add it as a contact and send it a message via Telegram. Then replace <BOT_TOKEN_HERE> with your token in the following URL and open it to find the chat id: https://api.telegram.org/bot<BOT_TOKEN_HERE>/getUpdates
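If you prefer not to read the chat id out of the raw response by hand, the lookup can be sketched in Python. This assumes the usual shape of a Telegram Bot API getUpdates response (a "result" list whose entries carry a "message" or "channel_post" with a "chat" object); the function names extract_chat_ids and fetch_chat_ids are hypothetical and not part of this project.

```python
import json
from urllib.request import urlopen


def extract_chat_ids(payload):
    """Collect the distinct chat ids found in a getUpdates response."""
    chat_ids = []
    for update in payload.get("result", []):
        # Private/group messages and channel posts both carry a "chat" object.
        message = update.get("message") or update.get("channel_post") or {}
        chat_id = message.get("chat", {}).get("id")
        if chat_id is not None and chat_id not in chat_ids:
            chat_ids.append(chat_id)
    return chat_ids


def fetch_chat_ids(token):
    """Call getUpdates for the given bot token (requires network access)."""
    url = f"https://api.telegram.org/bot{token}/getUpdates"
    with urlopen(url, timeout=10) as response:
        return extract_chat_ids(json.load(response))


# Offline example with a hand-written response of the assumed shape.
sample = {
    "ok": True,
    "result": [
        {
            "update_id": 1,
            "message": {
                "message_id": 1,
                "chat": {"id": 123456789, "type": "private"},
                "text": "hi",
            },
        }
    ],
}
print(extract_chat_ids(sample))  # [123456789]
```

An empty list usually means the bot has not received a message yet, so message it first and try again.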
You can then set the following values in settings.py:
SPIDERMON_TELEGRAM_SENDER_TOKEN = 'your api access token'
SPIDERMON_TELEGRAM_RECIPIENTS = ['chat id']
For more options see this guide: How do I configure a Telegram bot for Spidermon?