ishdaddy7/crawler

crawler

A hybrid personal/work project built to help with a job-related task. It detects which technology vendors a given website uses. In short, it:

  • Finds links on a given site
  • Loads a number of the pages/links
  • Attempts to click "accept" on the privacy notice so that the page content loads fully
  • Sniffs HTTP requests made by the browser (via Puppeteer)
  • Attempts to identify vendors based on the hostname in the request URL, checking against a database I created.
  • Writes results to a MySQL database, which was visualized in a Redash.io dashboard
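
The hostname-matching step could look roughly like the sketch below. It is a minimal illustration, not the project's actual code: the `VENDOR_DOMAINS` table, its entries, and the `identifyVendor` and `tallyVendors` names are all hypothetical stand-ins for the vendor database and lookup logic described above.

```typescript
// Hypothetical lookup table: vendor-owned domain -> vendor name.
// Illustrative entries only; the real project checks against its own database.
const VENDOR_DOMAINS: { [domain: string]: string } = {
  "google-analytics.com": "Google Analytics",
  "doubleclick.net": "Google Marketing Platform",
  "hotjar.com": "Hotjar",
};

// Return the vendor whose domain equals the request hostname or is a
// parent domain of it, or null when nothing matches.
function identifyVendor(requestUrl: string): string | null {
  let host: string;
  try {
    host = new URL(requestUrl).hostname.toLowerCase();
  } catch (e) {
    return null; // not a parseable absolute URL
  }
  const domains = Object.keys(VENDOR_DOMAINS);
  for (let i = 0; i < domains.length; i++) {
    const domain = domains[i];
    // Match "hotjar.com" and "static.hotjar.com", but not "nothotjar.com".
    const isSubdomain = host.slice(-(domain.length + 1)) === "." + domain;
    if (host === domain || isSubdomain) {
      return VENDOR_DOMAINS[domain];
    }
  }
  return null;
}

// Count how many of the observed request URLs belong to each known vendor.
function tallyVendors(requestUrls: string[]): { [vendor: string]: number } {
  const counts: { [vendor: string]: number } = {};
  for (let i = 0; i < requestUrls.length; i++) {
    const vendor = identifyVendor(requestUrls[i]);
    if (vendor !== null) {
      counts[vendor] = (counts[vendor] || 0) + 1;
    }
  }
  return counts;
}
```

In a Puppeteer script, the input URLs could be collected with a listener such as `page.on('request', req => urls.push(req.url()))` and then passed to `tallyVendors(urls)`. Matching on a dot-prefixed suffix rather than a plain substring avoids counting unrelated hosts that merely contain a vendor's domain name.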

It is not the most accurate approach, since it's unclear which vendors are still in active use and which were simply left on the page unused. For the task at hand, though, it produced directional insights that fulfilled the objective.