
# Crawler-Of-Lianjia

A crawler for Lianjia, built with Python and Scrapy.

# items_*.py

Each items_*.py file defines the item class for one spider in the spiders directory. Every time you create a new spider, add a corresponding items file and edit it to declare the fields you want to scrape.
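As a minimal sketch, an items file might look like the following; the class and field names are illustrative, not the repository's actual definitions:

```python
# Illustrative items_*.py -- field names are assumptions,
# not the ones defined in this repository.
import scrapy

class LianjiaZufangItem(scrapy.Item):
    title = scrapy.Field()  # listing title
    price = scrapy.Field()  # monthly rent
    area = scrapy.Field()   # floor area
    url = scrapy.Field()    # listing URL
```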

# lianjia_bj_zufang.py

Crawls Lianjia rental (zufang) data in Beijing. The "next page" link is generated by JavaScript, so it cannot be extracted from the Scrapy response. Instead, you should collect all the page URLs first with lianjia_url and feed them to this spider.
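A minimal sketch of this two-step approach, assuming the pre-collected URLs are stored one per line in a text file; the file name and CSS selectors below are assumptions, not the repository's actual code:

```python
# Sketch: pagination is rendered by JavaScript, so the page URLs are
# collected ahead of time and read from a file instead of being followed
# from the response. File name and selectors are hypothetical.
import scrapy

class LianjiaBjZufangSpider(scrapy.Spider):
    name = "lianjia_bj_zufang"

    def start_requests(self):
        # urls.txt: one pre-collected listing-page URL per line
        with open("urls.txt") as f:
            for url in f:
                yield scrapy.Request(url.strip(), callback=self.parse)

    def parse(self, response):
        for house in response.css("ul.house-lst li"):
            yield {
                "title": house.css("h2 a::text").get(),
                "price": house.css("div.price span::text").get(),
            }
```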

# lianjia_ershou

I thought it would be simpler to crawl the data from the JSON URL, but the trick is that you have to iterate with a for loop and change `return` to `yield`, the same as above. It turned out the same mistake was also in lianjia_bj_zufang, which I only discovered here. So sad.
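A short sketch of the return-versus-yield fix, assuming the JSON response holds a list of records under a "data" key (that layout is an assumption about the API, not taken from this repository):

```python
# Sketch of the fix described above: with `return`, parse() would stop
# after the first record; yielding inside the loop emits every record.
import json
import scrapy

class LianjiaErshouSpider(scrapy.Spider):
    name = "lianjia_ershou"

    def parse(self, response):
        records = json.loads(response.text).get("data", [])
        for record in records:
            # Wrong: `return record` -- only the first record comes back.
            yield record
```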

# nohup

All the spiders can be run with nohup, e.g. `nohup scrapy crawl lianjia_ershou -o lianjia_ershou.json > nohup_lianjia_ershou.out &`. This runs the crawl in the background so you can continue with other work, and all the log output is captured in the .out file, which is very convenient for long-running crawlers.
