jnawjux/web_scraping_corgis

Corgis of Instagram!

A fun scraping project that looks at the recent posts from a bunch of prominent Corgis of Instagram to compare their performance and see some of the hashtags and mentions they use to help promote their posts. You can read more about the process in my related blog post, Tutorial: Web scraping Instagram’s most precious resource — corgis.

The functions being used do the following:

  • recent_post_links: scrapes the most recent Instagram posts and grabs their urls (can set any number)
  • insta_link_details: takes a post url and returns a dictionary with post details, including:
    • link - original url link
    • type - whether it is a photo or video
    • likes/views - count of likes or views for photo or video
    • age - when posted
    • comment - initial comment from poster
    • hashtags - hashtags extracted from comment, via regular expression
    • mentions - mentions extracted from comment, via regular expression
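As a rough illustration of the hashtag and mention extraction described above, here is a minimal sketch. The patterns and the `extract_tags` name are assumptions made for this example, not necessarily what insta_scrape.py uses.

```python
import re

# Assumed patterns for illustration; the repo's own expressions may differ.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@([\w.]+)")


def extract_tags(comment):
    """Return (hashtags, mentions) found in a caption string.

    Hypothetical helper: names are returned without the leading # or @.
    """
    hashtags = HASHTAG_RE.findall(comment)
    # Usernames may contain dots, so strip a trailing sentence period.
    mentions = [m.rstrip(".") for m in MENTION_RE.findall(comment)]
    return hashtags, mentions


hashtags, mentions = extract_tags(
    "Zoomies with @ralph.the.corgi! #corgi #dogsofinstagram"
)
```

Real captions can contain emoji or non-Latin tags, so these patterns would likely need tuning before use on live data.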

Bonus: How to extract the photo:

  • insta_url_to_img: gets the photo from a post url. Note: for posts with multiple images it currently grabs only the first, and it does not work with videos.

Quick Start

This makes a CSV file using the functions from insta_scrape.py. Installation: clone this repo and cd into it. Make sure geckodriver, the Firefox Selenium driver, is executable and in the project path. Edit the make.py file with your desired username, then run python3 make.py to compile your CSV.
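For a sense of what compiling the CSV might involve, here is a minimal sketch that writes a list of post dictionaries to CSV. The `write_posts_csv` helper, the column order, and the sample post are assumptions for illustration; make.py may be organised differently.

```python
import csv
import io

# Column names follow the dictionary keys listed for insta_link_details;
# the order is an assumption.
FIELDS = ["link", "type", "likes/views", "age", "comment", "hashtags", "mentions"]


def write_posts_csv(posts, handle):
    """Write post dictionaries to an open text handle as CSV (hypothetical helper)."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for post in posts:
        row = dict(post)
        # Flatten the list-valued fields into space-separated strings.
        for key in ("hashtags", "mentions"):
            row[key] = " ".join(row.get(key, []))
        writer.writerow(row)


# Made-up sample data, written to an in-memory buffer instead of a file.
sample = [{
    "link": "https://example.com/p/1",
    "type": "photo",
    "likes/views": 120,
    "age": "2d",
    "comment": "Hello",
    "hashtags": ["corgi", "fluffy"],
    "mentions": [],
}]
buffer = io.StringIO()
write_posts_csv(sample, buffer)
lines = buffer.getvalue().splitlines()
```

To write to disk instead, pass a handle opened with `open(path, "w", newline="")`, which is how the `csv` module expects files to be opened.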

About

🐶📸 Demo of web scraping Instagram with Selenium
