#Osmosis

HTML/XML parser and web scraper for Node.js.

##Features

  • Uses native libxml C bindings

  • Clean promise-like interface

  • Supports CSS 3.0 and XPath 1.0 hybrids in a single selector

  • Powerful jQuery-like CSS extensions

  • No large dependencies like jQuery, cheerio, or jsdom

  • Supports deep and complex data structures

  • HTML parser features

    • Fast parsing
    • Very fast searching
    • Small memory footprint
  • HTML DOM features

    • Load and search AJAX content
    • DOM interaction and events
    • Execute embedded and remote scripts
    • Execute code in the DOM
  • HTTP request features

    • Logs URLs, redirects, and errors
    • Cookie jar and custom cookies/headers/user agent
    • Login/form submission, session cookies, and basic auth
    • Single proxy or multiple proxies, with handling of proxy failure
    • Retries and redirect limits
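
Several selectors in the example in the next section, such as `time@datetime` and `#map@data-latitude`, append an attribute name to an element selector with `@`. The sketch below illustrates how such a string breaks down into an element part and an attribute part. `splitAttribute` is a hypothetical helper written for this illustration only; it is not part of the Osmosis API and says nothing about how Osmosis parses selectors internally.

```javascript
// Hypothetical helper for illustration only; NOT part of the Osmosis API.
// Splits a selector with a trailing "@attribute" (e.g. 'time@datetime')
// into its element part and its attribute name.
function splitAttribute(selector) {
    // Match "<anything>@<attribute-name>" where the attribute name ends the string.
    var match = /^(.*)@([\w:-]+)$/.exec(selector);
    if (!match) {
        // No trailing attribute, e.g. 'section > h2'
        return { element: selector, attribute: null };
    }
    return {
        element: match[1] || null, // '@href' has no element part
        attribute: match[2]
    };
}

console.log(splitAttribute('#map@data-latitude'));
console.log(splitAttribute('@href'));
console.log(splitAttribute('section > h2'));
```

Read this way, `'img@src'` asks for the `src` attribute of each `img` element, while a bare `'@href'` asks for the `href` attribute of the current node.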

##Example: scrape all Craigslist listings

```javascript
var osmosis = require('osmosis');

osmosis
    .get('www.craigslist.org/about/sites')
    // Each regional site link: record its text as `location`, then follow it
    .find('h1 + div a')
    .set('location')
    .follow('@href')
    // Each category link: record its text as `category`, then follow it
    .find('header + div + div li > a')
    .set('category')
    .follow('@href')
    // Keep following the "next" button through the result pages
    .paginate('.totallink + a.button.next:first')
    // Open every listing on the page
    .find('p > a')
    .follow('@href')
    // Extract fields from the listing page
    .set({
        'title':        'section > h2',
        'description':  '#postingbody',
        'subcategory':  'div.breadbox > span[4]',
        'date':         'time@datetime',
        'latitude':     '#map@data-latitude',
        'longitude':    '#map@data-longitude',
        'images':       ['img@src']
    })
    .data(function(listing) {
        // do something with listing data
    })
    .log(console.log)
    .error(console.log)
    .debug(console.log);
```
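
As one possible way to "do something with listing data", the sketch below tidies the object passed to `.data()`. It assumes the object carries the keys set in the example above, and that every scraped value arrives as a string (or an array of strings for `images`); both are assumptions for illustration. `normalizeListing` is a hypothetical helper, not part of Osmosis.

```javascript
// Hypothetical post-processing helper; NOT part of the Osmosis API.
// Assumes scraped values are strings and `images` is an array of strings.
function normalizeListing(listing) {
    // Convert a numeric string to a number, or null if it cannot be parsed.
    function toNumber(value) {
        var n = parseFloat(value);
        return Number.isNaN(n) ? null : n;
    }

    return {
        location:    listing.location || null,
        category:    listing.category || null,
        title:       (listing.title || '').trim(),
        description: (listing.description || '').trim(),
        subcategory: listing.subcategory || null,
        date:        listing.date ? new Date(listing.date) : null,
        latitude:    toNumber(listing.latitude),
        longitude:   toNumber(listing.longitude),
        images:      Array.isArray(listing.images) ? listing.images : []
    };
}

// Example with made-up values standing in for a scraped listing.
console.log(normalizeListing({
    title: '  Road bike  ',
    latitude: '37.5',
    images: ['a.jpg']
}));
```

Inside the chain, the helper could be called from the `.data()` callback, for example by passing `normalizeListing(listing)` on to whatever stores or prints the results.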

##Documentation

For documentation and examples, see https://github.com/rc0x03/node-osmosis/wiki

##Dependencies

##Donate

Donations will accelerate development and improve the quality and stability of this project.

###Donation offers:

  • $15 - A custom Osmosis scraper to extract the data you need efficiently and in as few lines of code as possible.
  • $25/month - Become a sponsor. Your company will be listed on this page. Priority support and bug fixes.
