Is there any way to build my own, self-hosted Internet Archive, but just for sites I want?
E.g. I specify a list of domains or web pages, and it uses a headless browser to download each site and save it (and all images etc.) locally. And then there'd be a way for me to browse those copies later?
#archive #anarchive #selfhost
@lifning hehe. Yeah, I know about wget 🙂. It's the type of software I'd like, and rather than write it myself, I'm seeing if anyone else has already done it.
@ebel there may not be too terribly much else to need to build beyond what its mirror mode provides, though? if memory serves, the result is something you could plop into a web server and browse. unless I'm missing the problem statement :p
@ebel s/missing/misunderstanding/
@lifning websites which load content via JS wouldn't work. I'm also not sure how good wget is at converting *all* the HTML links.
@ebel ah that's right, the web sucks because it's 2017. :( blah
@lifning plus I'd like the whole system of backing it all up, dates etc.
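For the JS-heavy sites mentioned above, a headless browser can render the page before saving it. A minimal sketch, assuming Playwright as the browser driver (the thread doesn't name a tool, so this is just one option; it needs `pip install playwright` and `playwright install chromium`):

```python
# Sketch: render a JS-heavy page in headless Chromium and save the result.
# Playwright is an assumption here, not something the thread specifies.
from pathlib import Path


def archive_page(url: str, out_dir: str = "archive") -> Path:
    """Render url in a headless browser and save the post-JS HTML locally."""
    from playwright.sync_api import sync_playwright  # imported lazily

    Path(out_dir).mkdir(exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # "networkidle" waits for JS-driven requests to settle before snapshotting
        page.goto(url, wait_until="networkidle")
        html = page.content()  # the DOM after scripts have run
        browser.close()
    out_path = Path(out_dir) / "index.html"
    out_path.write_text(html, encoding="utf-8")
    return out_path
```

Calling `archive_page("https://example.com")` would write `archive/index.html`. Saving the images, CSS, and fonts alongside it needs extra work (e.g. intercepting each network request), which is exactly why a ready-made tool is appealing.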
@ebel maybe sounds like something you could do with Huginn or maybe wallabag. Huginn lets you create agent scripts based on various hooks (like IFTTT); wallabag is like bookmarking, but it creates a stored copy of the bookmarked site. IIRC it strips it down for storage reasons, but that's configurable.
@ebel wget -mk http://example.com (-m for mirror mode, -k to convert links for local browsing) may be a good place to start!
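For reference, the -mk shorthand expands to longer GNU wget options, and a few more flags help the saved copy browse cleanly offline. A sketch (flag behavior per the wget manual; example.com stands in for whatever site you're archiving):

```shell
# --mirror           recursion + timestamping (shorthand for -r -N -l inf --no-remove-listing)
# --convert-links    rewrite links in saved HTML to point at the local copies
# --page-requisites  also fetch the images, CSS, and scripts each page needs
# --adjust-extension save files with .html extensions so they open cleanly offline
# --no-parent        don't wander above the starting directory
wget --mirror --convert-links --page-requisites --adjust-extension --no-parent \
    http://example.com/
```

Even with all of these, wget only rewrites links it finds in the static HTML, so anything injected by JavaScript still slips through.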