- Messy spiders that crawl news.sina.com.cn, news.qq.com, chinanews.com, and weibo.cn, built on the Python Scrapy framework.
- This project serves as the base module of a web data analysis project.
- It's a remote collaborative practice project, so I can't guarantee that every spider works well.
- It may contain code copied from elsewhere, possibly without a valid license.
- The code has few comments.
- Our team's git server looks like it will never be ready, so I persuaded the group to at least use GitHub.
- Documents and notes are mostly in Chinese.
- But my buddy and I will try our best to standardize this project.
¯\\\_(ツ)\_/¯
Our project is a bit special:
it accepts a keyword, then searches for and crawls data containing that keyword,
instead of building a general website topology.
For a diagram-style view of this project, click here.
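As a rough sketch of the keyword-driven entry point described above: each spider starts from the target sites' search pages rather than from a fixed site map. The search-URL templates below are illustrative placeholders, not the actual endpoints the spiders use.

```python
from urllib.parse import quote

# Hypothetical search-page templates, one per target site.
# Real endpoints live in the individual spiders; these are examples only.
SEARCH_URL_TEMPLATES = {
    "sina": "https://search.sina.com.cn/?q={kw}&c=news",
    "qq": "https://new.qq.com/search?query={kw}",
    "chinanews": "https://sou.chinanews.com/search.do?q={kw}",
    "weibo": "https://weibo.cn/search/?keyword={kw}",
}

def build_start_urls(keyword: str) -> list[str]:
    """Return one search URL per site for the given keyword."""
    kw = quote(keyword)  # percent-encode, so Chinese keywords are safe
    return [tpl.format(kw=kw) for tpl in SEARCH_URL_TEMPLATES.values()]
```

A spider can then use the URL for its own site as its `start_urls` entry and follow the result links it finds there.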
- /ScrapySwarm, yes, that's a Scrapy project.
- /Doc, documentation about ScrapySwarm.
- scrapy.cfg, auto-generated by the Scrapy console when the project was initialized.
- /mysite, a Django app which provides a web interface to run all spiders.
  - It can only run all spiders at once.
  - To run spiders with more control, you'd better use a Python script that imports ScrapySwarm.control.swarm_api.
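The interface of ScrapySwarm.control.swarm_api isn't documented in this README, so as a sketch of what such a runner script can look like, here is the same idea expressed with Scrapy's standard CrawlerProcess API; the `run_all_spiders` helper and its `keyword` parameter are illustrative, not the actual swarm_api interface.

```python
def run_all_spiders(keyword: str) -> None:
    """Start every spider registered in the project, passing the
    search keyword through as a spider argument.

    Scrapy is imported lazily so this helper can be defined (and the
    script inspected) without Scrapy installed.
    """
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    # Picks up settings from scrapy.cfg / settings.py in the project root.
    process = CrawlerProcess(get_project_settings())
    for name in process.spider_loader.list():
        process.crawl(name, keyword=keyword)
    process.start()  # blocks until all spiders finish

if __name__ == "__main__":
    run_all_spiders("some keyword")
```

Run it from the project root (next to scrapy.cfg) so `get_project_settings()` can find the project settings.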
Environment setup: https://github.com/boholder/ScrapySwarm/wiki/Set-Up-Environment