zhihuquestions

A web spider for zhihu.com, used by zhihuquestions.
It scrapes question and topic data from zhihu.com.

This spider is based on zhihu-spider.

Author

Tian Gao

Run it

What you need to run it

Python 2.7.6 (it may work with other versions)
MySQL
BeautifulSoup

How to run it

Download the code
Set up your database using MySQL
Initialize your database using init.sql
Find your zhihu.com cookie using your browser's developer tools.
Modify config.ini
If the zhihu username and cookie in config.ini are set correctly, run initDB.py to load the topics you currently follow into the database as seeds. Otherwise, manually insert some topics into TOPIC as scrape seeds.
Run python topic.py to get topics and questions from zhihu.com.
Run python question.py to analyze questions from zhihu.com.
Alternate between topic.py and question.py to make the database grow.
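The last step, running the two scripts in rotation, can be automated with a small driver. This is a minimal sketch and not part of the repository: the function names `rotation_commands` and `run_rotation` are hypothetical, and it assumes that `python` on your PATH is the interpreter you use for the spider and that you run it from the directory containing topic.py and question.py.

```python
import subprocess


def rotation_commands(rounds, python="python"):
    """Build the command list for `rounds` passes of topic.py then question.py."""
    commands = []
    for _ in range(rounds):
        commands.append([python, "topic.py"])
        commands.append([python, "question.py"])
    return commands


def run_rotation(rounds):
    """Run the scripts in order; check_call raises if one exits with an error."""
    for command in rotation_commands(rounds):
        subprocess.check_call(command)


# Example (hypothetical): three passes over both scripts.
# run_rotation(3)
```

Stopping at the first failing script is a deliberate choice here: if topic.py fails, for example because the cookie has expired, there is little point in running question.py on unchanged data.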

Warning

You can increase the thread count in config.ini to make the spider run faster.
However, zhihu.com may block your IP if you connect too frequently.
Use a proxy when running in multi-thread mode.
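One way to lower the risk of being blocked is to space out requests across all threads. The sketch below is an illustration only and does not reflect how the spider itself is written: the `Throttle` class and its `min_interval` parameter are hypothetical, and you would need to call `wait()` yourself before each request in your own code.

```python
import threading
import time


class Throttle(object):
    """Allow at most one request every `min_interval` seconds across threads."""

    def __init__(self, min_interval, clock=time.time, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        """Block until the caller may send a request; return the delay used."""
        with self._lock:
            now = self._clock()
            # Reserve the next free slot while holding the lock.
            start = max(now, self._next_allowed)
            self._next_allowed = start + self.min_interval
        delay = start - now
        if delay > 0:
            # Sleep outside the lock so other threads can reserve their slots.
            self._sleep(delay)
        return delay
```

Sharing a single `Throttle` instance between worker threads keeps the overall request rate bounded no matter how many threads are configured. It does not replace a proxy; it only slows the spider down.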

License

The MIT license.
