Scraping with Node

Extract structured data from websites using "spiders".

A spider is a JSON file with the following structure:

{
    "name": "reddit",
    "baseUrl": "http://www.reddit.com/r/javascript",
    "itemTypes": [{
        "name": "link",
        "container": ".linklisting",
        "selector": ".thing",
        "properties": {
            "title": "a.title",
            "votes": ".score:not(.dislikes):not(.likes)"
        }
    }]
}
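
To make the roles of the fields concrete, here is a minimal sketch that lists the CSS selectors a spider definition implies. It rests on an assumption the text above does not confirm: that `selector` is matched inside `container`, and that each entry in `properties` is evaluated relative to a matched item. `selectorPlan` is a hypothetical helper for illustration, not part of nscrape.

```javascript
// Hypothetical helper for illustration; not part of nscrape.
// Assumption: items are matched by `selector` inside `container`, and each
// property selector is evaluated relative to a matched item.
function selectorPlan(spider) {
  return spider.itemTypes.map((type) => ({
    itemType: type.name,
    // Descendant combinator: every `selector` match inside `container`.
    itemSelector: `${type.container} ${type.selector}`,
    properties: Object.entries(type.properties).map(([name, selector]) => ({
      name,
      selector,
    })),
  }));
}

// The spider definition from the example above.
const spider = {
  name: 'reddit',
  baseUrl: 'http://www.reddit.com/r/javascript',
  itemTypes: [{
    name: 'link',
    container: '.linklisting',
    selector: '.thing',
    properties: {
      title: 'a.title',
      votes: '.score:not(.dislikes):not(.likes)',
    },
  }],
};

const plan = selectorPlan(spider);
console.log(JSON.stringify(plan, null, 2));
```

Under that reading, each `.thing` element inside `.linklisting` would yield one `link` item with a `title` and a `votes` property.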

Spiders to load are selected with the --spider option. For example:

node scrape.js --spider reddit

In this case, nsc calls require("reddit") and validates the result against the spider schema, so reddit can be a standard Node module or a plain JSON file. For a JSON file, pass its path:

node scrape.js --spider ./reddit.json
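
The load-and-validate step can be sketched roughly as follows. This is an illustration under stated assumptions, not nscrape's actual code: `loadSpider` and `checkSpider` are hypothetical names, the checks are a guessed subset based only on the fields shown in the example spider (the real schema is not reproduced here), and resolving relative paths against the current working directory is an assumption about the intended behaviour.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical check: an assumed subset of the spider schema, derived only
// from the fields visible in the example spider.
function checkSpider(spider) {
  if (spider === null || typeof spider !== 'object') {
    return ['spider must be an object'];
  }
  const errors = [];
  if (typeof spider.name !== 'string') errors.push('name must be a string');
  if (typeof spider.baseUrl !== 'string') errors.push('baseUrl must be a string');
  if (!Array.isArray(spider.itemTypes) || spider.itemTypes.length === 0) {
    errors.push('itemTypes must be a non-empty array');
    return errors;
  }
  spider.itemTypes.forEach((type, i) => {
    for (const key of ['name', 'container', 'selector']) {
      if (typeof type[key] !== 'string') {
        errors.push(`itemTypes[${i}].${key} must be a string`);
      }
    }
    if (type.properties === null || typeof type.properties !== 'object') {
      errors.push(`itemTypes[${i}].properties must be an object`);
    }
  });
  return errors;
}

// Hypothetical loader: require() handles both modules and .json files.
function loadSpider(nameOrPath) {
  // Assumption: relative paths are meant relative to the working directory.
  // A bare require('./x') would resolve relative to this file instead.
  const id = nameOrPath.startsWith('.') ? path.resolve(nameOrPath) : nameOrPath;
  const spider = require(id);
  const errors = checkSpider(spider);
  if (errors.length > 0) {
    throw new Error(`Invalid spider "${nameOrPath}": ${errors.join('; ')}`);
  }
  return spider;
}

// Usage: write a spider to a temporary JSON file, then load it by path.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'spider-'));
const file = path.join(dir, 'reddit.json');
fs.writeFileSync(file, JSON.stringify({
  name: 'reddit',
  baseUrl: 'http://www.reddit.com/r/javascript',
  itemTypes: [{
    name: 'link',
    container: '.linklisting',
    selector: '.thing',
    properties: { title: 'a.title' },
  }],
}));

const loaded = loadSpider(file);
console.log(loaded.name); // prints "reddit"
```

Because `require` parses `.json` files and executes `.js` modules alike, the same loader covers both forms of the `--spider` argument.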

mrotaru/nscrape