msean/crawl-manufacurenet

This crawler scrapes data from Made-in-China.com (https://cn.made-in-china.com/).

1 Runtime environment:

ubuntu 14.04

python 2.7.6

scrapy 1.0.3

redis

spider directory:

Mainly responsible for parsing the crawled pages.

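scrapy_redis directory:

Builds the distributed crawler on top of redis: every discovered request is stored in redis, and all crawler nodes are scheduled and assigned requests from that shared store. For reference, see the scrapy-redis source: https://github.com/darkrho/scrapy-redis.git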
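As a rough illustration of the setup described above, a Scrapy settings fragment for a redis-backed scheduler might look like the following. The setting names are the ones documented by the upstream scrapy-redis project; the module path and the redis host and port are assumptions, since this repository carries its own scrapy_redis directory and its actual settings are not shown here.

```python
# settings.py fragment (a sketch, not this repository's actual configuration)

# Replace Scrapy's in-memory scheduler with the redis-backed one, so that
# every crawler node pulls requests from the same shared queue.
# Assumption: the module path matches the upstream scrapy-redis layout.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Keep the request queue in redis when a spider stops, so that a crawl can
# be paused and resumed.
SCHEDULER_PERSIST = True

# Assumption: location of the shared redis server.
REDIS_HOST = "localhost"
REDIS_PORT = 6379
```

With settings along these lines, each machine would run the same spider and point at the same redis server, which then acts as the single source of pending requests.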

statscol directory:

Monitors the crawler with graphite, which shows the number of items crawled and the status of requests. graphite must be installed on Linux.
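To make the monitoring idea concrete, here is a minimal sketch of pushing one counter to graphite over its plaintext protocol, where each metric is a line of the form "path value timestamp". It is not the code in the statscol directory; the function names, the metric path, and the host are hypothetical, and port 2003 is assumed to be the carbon plaintext listener (its default).

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: metric path, value, Unix timestamp."""
    if timestamp is None:
        timestamp = time.time()
    return "%s %s %d\n" % (path, value, int(timestamp))


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send a single metric to a carbon listener (host and port are assumed)."""
    line = format_metric(path, value, timestamp)
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(line.encode("ascii"))
    finally:
        sock.close()
```

A crawler could then call something like send_metric("crawler.pages_crawled", 120) at intervals, where the metric name is only an example.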

middlewares directory:

Sets the browser user-agent and the proxy for requests.
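A downloader middleware of this kind can be sketched as below. This is an illustration rather than the repository's own middleware: the class name and the USER_AGENTS and PROXIES pools are hypothetical, and picking values at random per request is an assumption about how they are chosen.

```python
import random

# Hypothetical pools; real values would come from the project's settings.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) ExampleBrowser/2.0",
]
PROXIES = [
    "http://127.0.0.1:8080",
    "http://127.0.0.1:8081",
]


class RandomUserAgentProxyMiddleware(object):
    """Downloader middleware sketch: pick a user-agent and a proxy per request."""

    def process_request(self, request, spider):
        # Overwrite the User-Agent header for this request.
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        # Scrapy reads the proxy for a request from request.meta["proxy"].
        request.meta["proxy"] = random.choice(PROXIES)
        # Returning None lets Scrapy continue processing the request normally.
        return None
```

Such a class would be enabled through Scrapy's DOWNLOADER_MIDDLEWARES setting, keyed by whatever module path the project actually uses.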
