Tool to take a snapshot of a web page and dump it into a single HTML file, using data URLs to embed even binary data (mostly images) in the document itself so that no other files are required. The result can be viewed in any modern browser without special plugins, is easy to ship around and (relatively) easy for other tools to handle without accounting for special file naming or directory layouts, and avoids having to keep potentially hundreds of extra files around for a single page.
This is still a first-pass implementation and has a number of known problems. The saving strategy and its concerns were influenced by the paper describing webpagedump: http://www.dbai.tuwien.ac.at/user/pollak/webpagedump/ .
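To illustrate the data-URL embedding that makes the single-file output possible, here is a minimal sketch of the "datafy" conversion mentioned in the steps below, using only the Go standard library (the function name is taken from the step description; the real code may differ):

```go
package main

import (
	"encoding/base64"
	"fmt"
	"net/http"
)

// datafy converts raw resource bytes into a data URL so the bytes can be
// embedded directly in the snapshot instead of living in a separate file.
func datafy(raw []byte) string {
	// DetectContentType sniffs the MIME type from the first bytes (e.g. image/gif).
	mimeType := http.DetectContentType(raw)
	encoded := base64.StdEncoding.EncodeToString(raw)
	return fmt.Sprintf("data:%s;base64,%s", mimeType, encoded)
}

func main() {
	// A 1x1 GIF standing in for fetched image bytes.
	gif := []byte("GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff!\xf9\x04\x00\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x02D\x01\x00;")
	fmt.Println(datafy(gif))
}
```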
Current Implementation:
- SnapshotLocal or SnapshotRemote indicates where to read the file from and how to treat relative paths inside the document
- read the local file or GET the remote file; at the moment, the remote path runs RunPhantom and uses the files output by PhantomJS
- parse the content into a gokogiri DOM document
- call processDocument:
  - fixupScripts: fetches and inlines any remote scripts
  - remove script tags (alternative: neuter script tags)
  - fixupCss:
    - expand CSS urls for styles that are already inline
    - fetch remote CSS, change link elements to style elements, expand urls to absolute (URL resolution is sketched after this list), and inline the content
  - findImageReferences (populate keys in the imageData map for later fetching of values):
    - src attributes of img elements
    - values inside url() in style elements (a url() extraction sketch follows this list)
- fetch the file referenced by each key in the imageData map, datafy it, and save it as the value in the imageData map (see the fetch-and-datafy sketch after this list)
- go back through the document and replace img src and CSS url() references with the data urls
- write out the document to a file
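
The "expand urls to absolute" steps can lean on the standard library's URL resolution; a minimal sketch (absolutize is a hypothetical helper name, and error handling is reduced to returning the reference unchanged):

```go
package main

import (
	"fmt"
	"net/url"
)

// absolutize resolves a (possibly relative) reference found in the page
// against the page's own URL, so later fetches and rewrites see absolute URLs.
func absolutize(pageURL, ref string) string {
	base, err := url.Parse(pageURL)
	if err != nil {
		return ref
	}
	r, err := url.Parse(ref)
	if err != nil {
		return ref
	}
	return base.ResolveReference(r).String()
}

func main() {
	fmt.Println(absolutize("http://example.com/articles/page.html", "../css/site.css"))
	// prints http://example.com/css/site.css
}
```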
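Collecting and later rewriting url(...) references inside style content is essentially a regular-expression pass; a sketch under that assumption (findCSSURLs and replaceCSSURLs are hypothetical names, and the pattern is simpler than fully general CSS would need):

```go
package main

import (
	"fmt"
	"regexp"
)

// cssURLPattern matches url(...) values in CSS, optionally quoted.
var cssURLPattern = regexp.MustCompile(`url\(\s*['"]?([^'")\s]+)['"]?\s*\)`)

// findCSSURLs returns the URLs referenced by url(...) in a block of CSS text.
func findCSSURLs(css string) []string {
	var urls []string
	for _, m := range cssURLPattern.FindAllStringSubmatch(css, -1) {
		urls = append(urls, m[1])
	}
	return urls
}

// replaceCSSURLs rewrites each url(...) reference using the replacements map
// (e.g. original URL -> data URL), leaving unknown references untouched.
func replaceCSSURLs(css string, replacements map[string]string) string {
	return cssURLPattern.ReplaceAllStringFunc(css, func(match string) string {
		orig := cssURLPattern.FindStringSubmatch(match)[1]
		if repl, ok := replacements[orig]; ok {
			return "url(" + repl + ")"
		}
		return match
	})
}

func main() {
	css := `body { background: url("http://example.com/bg.png"); }`
	urls := findCSSURLs(css)
	fmt.Println(urls)
	fmt.Println(replaceCSSURLs(css, map[string]string{urls[0]: "data:image/png;base64,AAAA"}))
}
```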
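Filling in the imageData map (fetch each key, datafy the bytes, store the result as the value) might look roughly like the following sketch; fetchImageData is a hypothetical name, error handling is minimal, and the conversion is the same as in the datafy sketch above:

```go
package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"log"
	"net/http"
)

// fetchImageData GETs each key of the map and stores the corresponding data
// URL as its value, so the rewrite pass can swap references in place.
func fetchImageData(imageData map[string]string) {
	for imgURL := range imageData {
		resp, err := http.Get(imgURL)
		if err != nil {
			log.Printf("skipping %s: %v", imgURL, err)
			continue
		}
		raw, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			log.Printf("skipping %s: %v", imgURL, err)
			continue
		}
		// Same conversion as the datafy sketch: sniff MIME type, base64 encode.
		imageData[imgURL] = fmt.Sprintf("data:%s;base64,%s",
			http.DetectContentType(raw), base64.StdEncoding.EncodeToString(raw))
	}
}

func main() {
	// Hypothetical key; in the real tool the keys come from findImageReferences.
	imageData := map[string]string{"http://example.com/logo.png": ""}
	fetchImageData(imageData)
	for u, d := range imageData {
		fmt.Printf("%s -> %d bytes of data URL\n", u, len(d))
	}
}
```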