A scraper that pulls images from the network of NDBC (National Data Buoy Center) buoys that have camera feeds: lonely sensors that constantly monitor our seas, air and atmosphere. The cameras update hourly, so the script scrapes the feeds every hour, counting from the moment it starts running.
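The hourly cycle described above can be sketched with Node's built-in timers. This is a minimal sketch, not the project's actual scheduler: `scheduleHourly` and `runSafely` are hypothetical names, and the task passed in stands for whatever function performs one full scrape. The first run happens immediately and later runs follow at fixed intervals from that start time, which matches "every hour from the time it starts running".

```javascript
const ONE_HOUR_MS = 60 * 60 * 1000;

// Hypothetical helper: runs one scrape and logs any failure instead of
// letting it crash the process, so the next hourly run still happens.
async function runSafely(task) {
  try {
    await task();
  } catch (err) {
    console.error('Scrape failed:', err);
  }
}

// Hypothetical scheduler: scrape once right away, then every intervalMs.
// Returns the interval handle so callers can stop it with clearInterval().
function scheduleHourly(task, intervalMs = ONE_HOUR_MS) {
  runSafely(task);
  return setInterval(() => runSafely(task), intervalMs);
}
```

Using `setInterval` rather than a cron-style "top of the hour" schedule keeps the sketch dependency-free; the trade-off is that run times drift with the start time instead of aligning with the cameras' own update times.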
Make sure you have Node.js installed. Download the files, open a terminal in the buoy-cam-scraper folder and run:
npm install
To start scraping, open a terminal in buoy-cam-scraper/ and run:
npm run scrape-cams
This starts capturing new images from the buoy cameras and saving them to the scraped-images folder, re-scraping every hour. Each file is named in the following format:
[UTC ms when photo was taken by buoy]-[UTC ms when photo was downloaded]-[ID of buoy].jpg
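Building and reading back names in this format takes only a template string and a regular expression. The helpers below are a hypothetical sketch, not functions from the project; they assume buoy IDs never contain a `-`, which holds for the alphanumeric NDBC station IDs but is worth checking against buoycam-id-list.json.

```javascript
// Hypothetical helper: builds a filename in the documented format
// [UTC ms taken]-[UTC ms downloaded]-[buoy ID].jpg
function buildImageFilename(takenMs, downloadedMs, buoyId) {
  return `${takenMs}-${downloadedMs}-${buoyId}.jpg`;
}

// Hypothetical helper: splits a filename back into its three parts.
// Returns null for files that do not follow the format.
// ASSUMPTION: buoy IDs contain no '-' characters.
function parseImageFilename(filename) {
  const match = /^(\d+)-(\d+)-([^-]+)\.jpg$/.exec(filename);
  if (!match) return null;
  return {
    takenMs: Number(match[1]),
    downloadedMs: Number(match[2]),
    buoyId: match[3],
  };
}
```

Because both timestamps are plain millisecond numbers, `new Date(takenMs).toISOString()` turns either one into a readable UTC date.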
The UTC time at which the buoy took each photo is read from the image via OCR, using tesseract.js. Images that have no data (i.e. all-white images) and images that have already been downloaded are skipped.
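The per-image decisions above (read the timestamp, skip blank images, skip duplicates) might look roughly like the following. This is a sketch under several assumptions: the tesseract.js call itself is left out because its API differs between major versions, so `parseCaptionTimestamp` only handles the text OCR returns; the caption stamp format `MM/DD/YYYY HHMM UTC` is a guess, not confirmed by this README; `isBlankImage` expects pixels already decoded to RGBA by some image library; and all three function names are hypothetical.

```javascript
// ASSUMPTION: the OCR'd caption contains a stamp like "03/27/2018 1510 UTC"
// (MM/DD/YYYY HHMM). Returns UTC milliseconds, or null if none is found.
function parseCaptionTimestamp(ocrText) {
  const match = /(\d{2})\/(\d{2})\/(\d{4})\s+(\d{2})(\d{2})\s*UTC/.exec(ocrText);
  if (!match) return null;
  const [month, day, year, hours, minutes] = match.slice(1).map(Number);
  // Reject obviously out-of-range fields from OCR misreads,
  // since Date.UTC would silently roll them over into another date.
  if (month < 1 || month > 12 || day < 1 || day > 31 || hours > 23 || minutes > 59) {
    return null;
  }
  return Date.UTC(year, month - 1, day, hours, minutes);
}

// Hypothetical blank check on decoded RGBA pixels: the image counts as
// "no data" if every pixel's R, G and B are at or above the threshold.
function isBlankImage(rgba, threshold = 250) {
  for (let i = 0; i < rgba.length; i += 4) {
    if (rgba[i] < threshold || rgba[i + 1] < threshold || rgba[i + 2] < threshold) {
      return false;
    }
  }
  return true;
}

// Hypothetical duplicate check: an image is already downloaded if an existing
// file ([taken]-[downloaded]-[buoy ID].jpg) has the same taken time and buoy ID.
function isAlreadyDownloaded(existingFilenames, takenMs, buoyId) {
  return existingFilenames.some((name) => {
    const parts = name.replace(/\.jpg$/, '').split('-');
    return parts.length === 3 && Number(parts[0]) === takenMs && parts[2] === buoyId;
  });
}
```

Running the cheap blank check before OCR avoids spending OCR time on images that would be thrown away anyway.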
The code is written to run on a Raspberry Pi. On a Raspberry Pi 3, scraping all the buoy cams takes about 2 minutes, most of which is spent on OCR.
See the NDBC documentation for info about its web APIs.
The data folder contains some scraped metadata:
- buoycam-id-list.json - a list of the IDs of buoys that have cameras, hand-collected on 8/1/17
- buoycam-info.json - metadata about each buoy cam, including its name, lat-long location and a base64 image of the buoy. Scraped on 3/27/18 by running:
npm run gather-station-info
Note: the last-scraped data is stored with the scrape date appended to its filenames.
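Reading these files back in Node needs only `fs` and `Buffer`. The sketch below is hypothetical: `loadBuoyIds` assumes buoycam-id-list.json is a JSON array of ID strings (the README says "a list of buoy IDs" but does not show the layout), and `decodeBase64Image` assumes the stored image is either a bare base64 string or a `data:image/...;base64,` URI. Which field of buoycam-info.json holds the image is not documented here, so it is left to the caller.

```javascript
const fs = require('fs');

// Hypothetical loader. ASSUMPTION: the file is a JSON array of ID strings.
function loadBuoyIds(filePath) {
  const ids = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  if (!Array.isArray(ids) || !ids.every((id) => typeof id === 'string')) {
    throw new Error(`${filePath} is not a JSON array of ID strings`);
  }
  return ids;
}

// Hypothetical decoder: turns a base64 image string into raw bytes,
// stripping an optional data-URI prefix first.
function decodeBase64Image(b64) {
  const payload = b64.replace(/^data:image\/[a-z+]+;base64,/i, '');
  return Buffer.from(payload, 'base64');
}
```

The decoded buffer can be written straight to disk with `fs.writeFileSync('buoy.jpg', bytes)` to view a buoy's image.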