
Scraping with Node

Extract structured data from websites using "spiders".

A spider is a JSON file. The structure looks like this:

{
    "name": "reddit",
    "baseUrl": "http://www.reddit.com/r/javascript",
    "itemTypes": [{
        "name": "link",
        "container": ".linklisting",
        "selector": ".thing",
        "properties": {
            "title": "a.title",
            "votes": ".score:not(.dislikes):not(.likes)"
        }
    }]
}
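
The README does not describe how these selectors are applied internally, but they read as ordinary CSS selectors: "container" scopes the search, "selector" matches each individual item, and every entry in "properties" maps a field name to a selector resolved within that item. A minimal sketch of that interpretation, assuming a cheerio-style API (the extractItems helper is illustrative and not part of nscrape):

var cheerio = require("cheerio");

// Illustrative only: apply one itemType definition from a spider to raw HTML.
function extractItems(html, itemType) {
    var $ = cheerio.load(html);
    var items = [];
    // "container" narrows the scope; "selector" matches one item at a time.
    $(itemType.container).find(itemType.selector).each(function (i, el) {
        var item = {};
        // Each property maps a field name to a selector inside the item.
        Object.keys(itemType.properties).forEach(function (key) {
            item[key] = $(el).find(itemType.properties[key]).first().text().trim();
        });
        items.push(item);
    });
    return items;
}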

The spider to load is selected with the --spider option. For example:

node scrape.js --spider reddit

In this case, nsc will require("reddit") and validate the result against the spider schema, so reddit can be either a standard module or a plain JSON file. If it is a JSON file:

node scrape.js --spider ./reddit.json
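
If reddit is a standard module rather than a JSON file, it only needs to export an object matching the spider schema. A minimal sketch (the module layout below is an assumption; the README only shows the JSON form):

// index.js of a hypothetical "reddit" module: exports the same spider object
module.exports = {
    name: "reddit",
    baseUrl: "http://www.reddit.com/r/javascript",
    itemTypes: [{
        name: "link",
        container: ".linklisting",
        selector: ".thing",
        properties: {
            title: "a.title",
            votes: ".score:not(.dislikes):not(.likes)"
        }
    }]
};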
