
hewayGitHub/2014-cola

 
 


cola

## Change log

Sometimes we only need to crawl a small network of users, so I rewrote the code to support crawling by level: you set how many levels of the user network you want to crawl.
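
Crawling "by level" can be pictured as a breadth-first traversal that stops expanding users once it reaches the configured depth. The sketch below assumes that a level is the number of hops from a seed user and that some function returns a user's neighbors (followers, friends, and so on). `crawl_by_level`, `get_neighbors`, and the toy `graph` are hypothetical illustrations, not cola's API.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable


def crawl_by_level(
    seed: Hashable,
    get_neighbors: Callable[[Hashable], Iterable[Hashable]],
    max_level: int,
) -> Dict[Hashable, int]:
    """Hypothetical helper (not part of cola): breadth-first crawl to max_level.

    Returns a mapping from each visited user to the level at which it was
    first reached; the seed is level 0.
    """
    levels = {seed: 0}
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        level = levels[user]
        if level >= max_level:
            # Users on the last level are recorded but not expanded.
            continue
        for neighbor in get_neighbors(user):
            if neighbor not in levels:
                levels[neighbor] = level + 1
                queue.append(neighbor)
    return levels


# Hypothetical follower graph standing in for a real social network.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": ["f"], "e": [], "f": []}
print(crawl_by_level("a", lambda u: graph.get(u, []), max_level=2))
# {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': 2}
```

The `levels` dict doubles as the visited set, so each user is expanded at most once even when several paths lead to it.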

Cola is a distributed crawling framework.
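
One common way for a master to spread crawl work over several workers is to partition URLs by a stable hash of their host, so that each site is crawled by a single worker. The sketch below only illustrates that general idea under that assumption; it is not necessarily how cola schedules work, and `assign_urls` and the worker names are hypothetical.

```python
import zlib
from typing import Dict, List
from urllib.parse import urlsplit


def assign_urls(urls: List[str], workers: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: send every URL of a given host to the same worker.

    zlib.crc32 is used instead of the built-in hash() because str hashes are
    randomized per process, and every machine must agree on the assignment.
    """
    assignment: Dict[str, List[str]] = {worker: [] for worker in workers}
    for url in urls:
        host = urlsplit(url).netloc.lower()
        index = zlib.crc32(host.encode("utf-8")) % len(workers)
        assignment[workers[index]].append(url)
    return assignment


if __name__ == "__main__":
    example_workers = ["worker-1", "worker-2"]
    example_urls = [
        "http://en.wikipedia.org/wiki/Cola",
        "http://en.wikipedia.org/wiki/Web_crawler",
        "http://example.com/",
    ]
    for worker, assigned in assign_urls(example_urls, example_workers).items():
        print(worker, assigned)
```

Hashing by host rather than by full URL keeps per-site politeness (rate limits, cookies) in one place, at the cost of uneven load when a few sites dominate.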

Why the name cola? Well, I like cola, and "cola" sounds a bit like "crawler".

## Quick Start

  • Download or clone the source code, and add cola to the Python path.
  • Start the cola master: /path/to/cola/bin/start_master.py
  • Start a cola worker, pointing it at the master's IP address: /path/to/cola/bin/start_worker.py --master [ip address]
  • Run a job: /path/to/cola/bin/coca.py -runLocalJob /path/to/cola/contrib/wiki
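
If you want to script the Quick Start steps, the sketch below builds the three command lines from a cola checkout directory and a master IP address. It assumes the Python-path step is already done and that the scripts can be run with the current interpreter; `quick_start_commands` is a hypothetical helper, and it uses only the scripts and flags listed above.

```python
import os
import sys
from typing import List


def quick_start_commands(cola_root: str, master_ip: str) -> List[List[str]]:
    """Hypothetical helper: argv lists for the Quick Start steps.

    cola_root is the directory you downloaded or cloned cola into; each
    script is run with the current Python interpreter.
    """
    bin_dir = os.path.join(cola_root, "bin")
    wiki_job = os.path.join(cola_root, "contrib", "wiki")
    return [
        # Start the master.
        [sys.executable, os.path.join(bin_dir, "start_master.py")],
        # Start a worker that connects to the master at master_ip.
        [sys.executable, os.path.join(bin_dir, "start_worker.py"),
         "--master", master_ip],
        # Run the bundled wiki job.
        [sys.executable, os.path.join(bin_dir, "coca.py"),
         "-runLocalJob", wiki_job],
    ]


if __name__ == "__main__":
    for argv in quick_start_commands("/path/to/cola", "127.0.0.1"):
        print(" ".join(argv))
```

Each argv list can be passed to `subprocess.Popen`, one process per step, starting the master and worker before the job.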

## Tips

  • Chinese docs (wiki).
  • I am trying my best to make cola stable.
  • Cola can also run on a single machine; you don't need to start a master, workers, and so on. Everything is simple!
