
yupengyan/scraper

Crawler Instructions

If you have any questions, please contact @Anfernee Chang

Product Database Schema

Validation

  • Please run your spider and make sure its output passes scraper/pipelines/validation.py before sending it.
  • Please make sure the spider raises no errors when run with 'scrapy crawl spider' before sending it.
  • Spiders sent without these checks will result in penalties!
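
As an extra local check before sending, you could export a crawl as JSON lines (for example 'scrapy crawl spider -o items.jl') and scan the items for empty required fields. The following is a minimal sketch under stated assumptions. It does not replace scraper/pipelines/validation.py. The field names in REQUIRED_FIELDS are hypothetical placeholders, not the project's schema, and the script name check_items.py is made up.

```python
import json
import sys

# Hypothetical required fields: the real list is defined by the Product
# Database Schema and enforced by scraper/pipelines/validation.py.
REQUIRED_FIELDS = ("name", "url", "price")


def find_missing_fields(jsonlines_text, required=REQUIRED_FIELDS):
    """Return (line_number, missing_fields) for each item lacking a required field."""
    problems = []
    for lineno, line in enumerate(jsonlines_text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        item = json.loads(line)
        # Absent keys and empty values (None, "", [], {}) count as missing;
        # falsy but real values such as 0 do not.
        missing = [field for field in required if item.get(field) in (None, "", [], {})]
        if missing:
            problems.append((lineno, missing))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_items.py items.jl
    with open(sys.argv[1], encoding="utf-8") as fh:
        for lineno, missing in find_missing_fields(fh.read()):
            print(f"line {lineno}: missing {', '.join(missing)}")
```

An empty result only means no item had blank fields from this hypothetical list. The project's own validation still has the final say.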

Notes

  1. Please follow the PEP 8 style guide.
  2. Please use 'parse_product' as the parsing method for a product page, and pass no meta to it if you can.
  3. Please add each node's XPath to the spider's class-level 'xpaths' dict. We will use this information to check your spider.
  4. Please raise ValueError('XXX!') if the page has no data at the XPath for any required field.
  5. Please use 'copy.deepcopy' or a fresh 'ProductItem()' to create a separate item for each product variation (colors, etc.).
  6. Since we use Scrapy's duplicate filter to record crawled URLs, please use 'dont_filter' carefully.
  7. To complete the job, we only require the spiders/store.py file from you. Please send it by email.

Running Your Test Crawlers

https://github.com/titanjer/scraper/wiki/Testing

About

scrapy template
