bleg: web mirroring tool

Feb 16, 2007 12:01

My current project is to annotate web pages. Since these pages could change or go offline, I need to make a static mirror of them.

I have used WebSuck+WebGet, which mirror the HTML files they find. I imagine this worked great 10 years ago, before the era of dynamically generated web content.

It has a few problems:
* if it visits a URL that ends in "/" (where the server returns index.html or similar), it doesn't know to save the file as index.html.
* if it visits a dynamically generated page, it doesn't save the content as an HTML file. If I wanted to keep PHP pages under their .php names, I would need to set up a server to serve the mirror, which is a bad idea. Since what gets downloaded is already the static output, the ideal solution is to rename the saved file to .html and fix the links to match.
* it doesn't rewrite links to point to content in the mirror. This shouldn't be too hard to do with a search-and-replace script.
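
For what it's worth, here is a rough Python sketch of the renaming and link fixing I have in mind. It is not something WebSuck or WebGet does. The function names (`local_path`, `fix_links`), the list of "dynamic" extensions, and the way query strings get folded into file names are all my own assumptions, and the regex is a crude stand-in for a real HTML parser.

```python
import posixpath
import re
from urllib.parse import urldefrag, urljoin, urlsplit

# Assumed list of extensions that mean "dynamically generated page".
DYNAMIC_EXTS = {".php", ".asp", ".aspx", ".jsp", ".cgi"}

# Crude matcher for quoted href/src attributes; a real tool should parse the HTML.
LINK_RE = re.compile(r'(href|src)=(["\'])(.*?)\2', re.IGNORECASE)


def local_path(url):
    """Map a URL to a relative file path inside the mirror (hypothetical scheme)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Directory-style URL: save whatever the server returned as index.html.
        path += "index.html"
    root, ext = posixpath.splitext(path)
    if parts.query:
        # Fold the query string into the name so page.php?id=1 and ?id=2 differ.
        root += "_" + re.sub(r"[^A-Za-z0-9]+", "_", parts.query).strip("_")
    if ext.lower() in DYNAMIC_EXTS:
        # The downloaded content is static HTML, so name it that way.
        ext = ".html"
    return parts.netloc + root + ext


def fix_links(html, page_url, mirrored):
    """Point links to mirrored URLs at their local copies; leave the rest alone."""
    here = posixpath.dirname(local_path(page_url))

    def replace(match):
        attr, quote, target = match.groups()
        absolute, fragment = urldefrag(urljoin(page_url, target))
        if absolute not in mirrored:
            return match.group(0)  # not in the mirror: keep the original link
        relative = posixpath.relpath(local_path(absolute), here)
        if fragment:
            relative += "#" + fragment
        return f"{attr}={quote}{relative}{quote}"

    return LINK_RE.sub(replace, html)
```

The idea would be to save each fetched page under `local_path(url)`, then run `fix_links` over it with the set of URLs that actually made it into the mirror. Links to anything outside that set stay pointing at the live site.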

Any ideas?