phoenix24/grabber


A bunch of simple webpage crawlers to help us build some nifty stuff.

- hack away!

The items below are known issues and need to be moved to an issue tracker.


TODOs
===============================
- separate parser from crawler.
- timestamp and save the crawled webpages.

- multithread the crawler.
- multithread the parser.
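
One way the TODOs above could fit together is sketched below. It assumes the project is written in Python, which this README does not state. The names `TitleParser`, `parse_title`, `save_page`, and `crawl` are hypothetical and not taken from the repository. The crawler only fetches, the parser only reads HTML text, and each saved page gets a UTC timestamp in its file name.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Parser side: knows nothing about fetching, only about HTML text."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_title(html):
    """Return the stripped <title> text of a page (hypothetical helper)."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def save_page(html, name, out_dir, now=None):
    """Write the page under a timestamped file name and return its path.

    `now` is seconds since the epoch; None means the current time.
    """
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime(now))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{name}-{stamp}.html"
    path.write_text(html, encoding="utf-8")
    return path


def crawl(urls, fetch, workers=4):
    """Crawler side: fetch a list of urls concurrently, return {url: html}.

    `fetch` is any callable taking a url and returning the page text, so
    the real HTTP call can be swapped for a stub in tests.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```

Because `crawl` takes the fetch function as an argument, the parser and the saving step can be tested without network access, and the same thread pool pattern could later be applied to parsing.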



List of websites to crawl.
===============================
1. flipkart.com       : basic crawler done.
2. letsbuy.com        : tbd.
3. infibeam.com       : tbd.
4. homeshop18.com     : tbd.
5. themobilestore.com : tbd.
6. futurebazaar.com   :
7. indiaplaza.in      : 
8. saholic.com        : tbd.
9. ibazaar.com        :
10. taggle.com        :
11. buytheprice       :
12. adexmart.com      : later. ajax-loading.
13. landmark          :
14. nbcindia          :
15. pustak            :
16. rediff            :
17. tradus            :
18. uread             :
19. friendsofbooks    :
20. crossword         :
21. coralhub          :
22. coinjoos          :
23. cerramatter       :
24. bookadda          :
25. a1books           :


-- refactor tests.
-- move the url prefix into settings.
-- move the source into settings.
-- pass the source as a constructor arg.
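
The settings items above might look roughly like the sketch below, again assuming Python. The `SETTINGS` table, the `Crawler` class, and the `http://www.` prefixes are illustrative assumptions, not code or values from this repository. The idea is that each source keeps its url prefix in one settings table and the crawler receives the source name through its constructor.

```python
from urllib.parse import urljoin

# Hypothetical settings table: one entry per source, holding its url prefix.
# The prefixes are assumed for illustration only.
SETTINGS = {
    "flipkart": {"url_prefix": "http://www.flipkart.com/"},
    "infibeam": {"url_prefix": "http://www.infibeam.com/"},
}


class Crawler:
    """Crawler configured by source name instead of hard-coded urls."""

    def __init__(self, source, settings=SETTINGS):
        if source not in settings:
            raise KeyError(f"unknown source: {source}")
        self.source = source
        self.url_prefix = settings[source]["url_prefix"]

    def url_for(self, path):
        """Build an absolute url for a path on this source's site."""
        return urljoin(self.url_prefix, path)
```

For example, `Crawler("flipkart").url_for("books")` builds a url under the flipkart prefix, and adding a new site becomes a one-line change to the settings table.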



