Author: Haibo Wang. Date: 2016-03-23

You can visit my website: http://hbnnlove.sinaapp.com.

This spider crawls job information from several well-known job websites, such as zhilian, job51, dajie, lagou and byr. The module is very simple to use. For example:



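    # Assumption: Spider is importable from the same module as producer.
    from jobspider.spider import producer, Spider

    if __name__ == "__main__":
        # Alternative entry point, disabled here:
        # producer(store_type='MySQL')

        # Run a single spider.
        spider = Spider('python')
        spider.single_run("lagou")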

By default, the data is stored in a JSON file when the program ends. You can also store the data in other files or in a database, but you must create the corresponding table first. To do so, add your code to the baseclass.utils.store_data Python file.
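As a rough illustration of what a storage routine might look like, the sketch below writes a list of job items to a JSON file using only the standard library. It is written for Python 3. The function name `save_items_as_json`, the item fields and the output file name are hypothetical and are not part of jobspider; the interface of baseclass.utils.store_data is not shown in this README.

```python
import json
import os
import tempfile


def save_items_as_json(items, path):
    """Write job items (a list of dicts) to `path` as UTF-8 JSON.

    Hypothetical helper for illustration only; not part of jobspider.
    """
    with open(path, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps non-ASCII job titles readable in the file.
        json.dump(items, fh, ensure_ascii=False, indent=2)
    return len(items)


# Made-up sample data standing in for crawled results.
sample_items = [
    {
        "title": "Python developer",
        "company": "Example Co",
        "link": "http://example.com/job/1",
    },
]

out_path = os.path.join(tempfile.gettempdir(), "jobs_example.json")
saved = save_items_as_json(sample_items, out_path)
```

Writing to a different backend would follow the same shape: take the list of items and persist it in one place, so the spiders themselves do not need to know where the data ends up.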

If you want to store data in MySQL, add a user to mysql.user with the username admin and the password admin. You must also create the table yourself. Its fields are: id (int), title (varchar), company (varchar), linke (varchar), salary, create_day and date.
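The table layout described above can be sketched as follows. To keep the example self-contained it uses Python's built-in `sqlite3` module as a stand-in for MySQL, so it is not the project's actual storage code. The table name `jobs`, the column types chosen for `salary`, `create_day` and `date`, and the sample row are all assumptions, since the README only lists the field names.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")

# Table name and the types of salary/create_day/date are assumptions.
conn.execute(
    """
    CREATE TABLE jobs (
        id         INTEGER PRIMARY KEY,
        title      VARCHAR(255),
        company    VARCHAR(255),
        linke      VARCHAR(255),
        salary     VARCHAR(64),
        create_day VARCHAR(32),
        date       VARCHAR(32)
    )
    """
)

# Insert one made-up row with a parameterized query.
conn.execute(
    "INSERT INTO jobs (title, company, linke, salary, create_day, date) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (
        "Python developer",
        "Example Co",
        "http://example.com/job/1",
        "10k-15k",
        "2016-03-23",
        "2016-03-23",
    ),
)
conn.commit()

# Read the row back to confirm the layout works.
rows = conn.execute("SELECT id, title, company FROM jobs").fetchall()
```

With a real MySQL server the `CREATE TABLE` statement would be run once through a MySQL client, using the admin account mentioned above.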

In addition, you can write your own spider like this:


    from jobspider.baseclass.base_spider import Base_Spider


    class DJ_Spider(Base_Spider):

        def __init__(self, website, *args):
            super(DJ_Spider, self).__init__(website, args)

        def parse(self, url):
            # Fetch one result page and return the job items found on it.
            return []

        def pages_parse(self, keyword):
            # Walk the result pages and yield every parsed item.
            for page in range(1, 2):
                url = ''  # build the URL for this keyword and page here
                for item in self.parse(url):
                    yield item


    if __name__ == "__main__":
        dj = DJ_Spider('dajie', 'X-Requested-With', 'Host', 'Referer', 'Cookie')
        for item in dj.pages_parse('python'):
            print(item)

You should write the header information, such as X-Requested-With, Cookie, Host and Referer, into config/webinfo.cfg.
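The exact layout of config/webinfo.cfg is not shown in this README. As a minimal sketch, assuming an INI-style file with one section per website and one option per header, the headers could be read with the standard `configparser` module (Python 3). The section name, the helper `load_headers` and all values below are placeholders, not the project's real configuration.

```python
import configparser

# Assumed INI layout: one section per website, one option per header.
SAMPLE_CFG = """
[dajie]
X-Requested-With = XMLHttpRequest
Host = www.example.com
Referer = http://www.example.com/
Cookie = sessionid=PLACEHOLDER
"""

# interpolation=None so that '%' characters in cookies are taken literally.
parser = configparser.ConfigParser(interpolation=None)
# Keep header names case-sensitive; the default lower-cases option names.
parser.optionxform = str
parser.read_string(SAMPLE_CFG)


def load_headers(cfg, website, names):
    """Return a dict of the requested header names for one website section."""
    return {name: cfg.get(website, name) for name in names}


headers = load_headers(
    parser, "dajie", ["X-Requested-With", "Host", "Referer", "Cookie"]
)
```

The resulting dictionary has the shape that HTTP client libraries usually accept as request headers, which is why the spider constructor above is given the header names rather than the values.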

As long as you write your code like the example above, you can fetch data normally.

Finally, you can query the results on my website: http://hbnnlove.sinaapp.com.
