ScrapyBot

This is simple scrapybot to scratch out some data i need, it uses scrapy-framework.

it has a mongo backend, i write a simple & flexiable toy mongo piple. to automatically sync data to mongodb. scrapymongo.

Current spiders in it.

* qidan  - famous chinese online novel site.
* pptv - chinese online video site
* deviantart -  art/pictures sharing site
* freelancer.com - a popular freelancer site
* proxylist(plist) -   freeproxylists.net 
* crunchbase -  a YP, company information site.  ( not work now  due to site redesign)
* manta  - small business info site. (site is not reachable now)

* cache - a web crawler for cache whole site with externdable filter method

and a simple flask webapp to visulize scraped data for

* qidan

Try it

    git clone https://github.com/roadt/scrapybot
	git submodule update
	scrapy crawl qidian     # make sure your default (local) mongodb is up.

or scrapy to csv file

    scrapy crawl qidian -o qidian.csv   [download example result data](./qidian.csv)

start flask app $ web/qidian/app.sh

More examples:

# cache whole dmoz site (restrict link resolution among dmoz.org site)
scrapy crawl cache  -a url=http://www.dmoz.org -a regex=dmoz.org

# scrap a dev art (pic or file)
scrapy crawl devart -a url=http://xxx.deviantart.com/art/YYYY

Name		Name	Last commit message	Last commit date
Latest commit History 74 Commits
scrapybot		scrapybot
scrapymongo @ fce0870		scrapymongo @ fce0870
shovel		shovel
test		test
web/qidian		web/qidian
.gitignore		.gitignore
.gitmodules		.gitmodules
Gemfile		Gemfile
Gemfile.lock		Gemfile.lock
Guardfile		Guardfile
README.md		README.md
requirements.txt		requirements.txt
scrapy.cfg		scrapy.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scrapybot

scrapybot

scrapymongo @ fce0870

scrapymongo @ fce0870

shovel

shovel

test

test

web/qidian

web/qidian

.gitignore

.gitignore

.gitmodules

.gitmodules

Gemfile

Gemfile

Gemfile.lock

Gemfile.lock

Guardfile

Guardfile

README.md

README.md

requirements.txt

requirements.txt

scrapy.cfg

scrapy.cfg

setup.py

setup.py

Repository files navigation

ScrapyBot

Try it

MIT License.

About

Releases

Packages

Languages

roadt/scrapybot

Folders and files

Latest commit

History

Repository files navigation

ScrapyBot

Try it

MIT License.

About

Resources

Stars

Watchers

Forks

Languages