mback2k/ArchiveBot (forked from ArchiveTeam/ArchiveBot)
ArchiveBot, an IRC bot for archiving websites
1. ArchiveBot

  <SketchCow> Coders, I have a question.
  <SketchCow> Or, a request, etc.
  <SketchCow> I spent some time with xmc discussing something we could do to make things easier around here.
  <SketchCow> What we came up with is a trigger for a bot, which can be triggered by people with ops.
  <SketchCow> You tell it a website. It crawls it. WARC. Uploads it to archive.org. Boom.
  <SketchCow> I can supply machine as needed.
  <SketchCow> Obviously there's some sanitation issues, and it is root all the way down or nothing.
  <SketchCow> I think that would help a lot for smaller sites
  <SketchCow> Sites where it's 100 pages or 1000 pages even, pretty simple.
  <SketchCow> And just being able to go "bot, get a sanity dump"

2. More info

For the user's guide, read the COMMANDS file.

For a half-assed installation and operation guide, read INSTALL.

For a polished installation guide, submit a pull request.

3. License

Copyright 2013 David Yip; made available under the MIT license. See LICENSE for details.

4. Acknowledgments

Thanks to Alard (@alard), who added WARC generation and Lua scripting to GNU Wget. Wget+lua was the first web crawler used by ArchiveBot.

Thanks to Christopher Foo (@chfoo) for wpull, ArchiveBot's current web crawler.

Thanks to Ivan Kozik (@ivan) for maintaining ignore patterns and tracking down performance problems at scale.

Other thanks go to the following projects:

  * Celluloid <http://celluloid.io/>
  * Cinch <https://github.com/cinchrb/cinch/>
  * CouchDB <http://couchdb.apache.org/>
  * Ember.js <http://emberjs.com/>
  * Redis <http://redis.io/>
  * Seesaw <https://github.com/ArchiveTeam/seesaw-kit>

5. Special thanks

Dragonette, Barnaby Bright, Vienna Teng, NONONO.

The memory hole of the Web has gone too far.
Don't look down, never look away; ArchiveBot's like the wind.

vim:ts=2:sw=2:tw=72:et
Languages
- Ruby 58.1%
- Python 29.8%
- CoffeeScript 8.3%
- CSS 3.7%
- JavaScript 0.1%