toby1991/PageCrawler

##PageCrawler

(Image: working_screenshot)

###0x01 DESCRIPTION


For crawling HTML pages.
Written in Python 3.5.
###0x02 REQUIREMENTS


  • pyquery
  • pycurl
  • BytesIO (from the standard-library io module)
  • os (standard library)
  • re (standard library)

####Attention: All of the packages above are for Python 3!
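
Before running the crawler, it can help to confirm that the modules listed above are importable in the current Python 3 environment. The following is a minimal sketch, not part of PageCrawler itself; the helper name `missing_modules` and the constant `REQUIRED_MODULES` are made up for illustration, and `io` is checked in place of BytesIO because BytesIO is a class inside the standard-library `io` module.

```python
import importlib.util

# Importable module names taken from the list above. BytesIO lives in the
# standard-library module "io", so "io" is what gets checked here.
REQUIRED_MODULES = ["pyquery", "pycurl", "io", "os", "re"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are available.")
```

If pyquery or pycurl is reported as missing, it needs to be installed for Python 3 specifically, as the note above says.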
###0x03 HOW TO USE


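    import crawler

    page_name = 'xxx'                     # name of the output folder
    base_url = 'http://xxx/themes/xxx/'   # base URL of the site to crawl
    first_page = 'index.html'             # page the crawl starts from
    retry_times = 3                       # retry count passed to the crawler

    result = crawler.crawl(page_name, base_url, first_page, retry_times)
    if result == True:
        print('oh yeah.')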
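
Since the crawled pages are written to a folder named after `page_name`, one way to see what a crawl produced is to walk that folder. This is a minimal sketch under that assumption; `list_saved_files` is a hypothetical helper, not a function provided by the crawler module, and it makes no assumption about which file types the crawler saves.

```python
import os


def list_saved_files(folder):
    """Return sorted paths, relative to `folder`, of every file beneath it."""
    found = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            full_path = os.path.join(root, name)
            found.append(os.path.relpath(full_path, folder))
    return sorted(found)


if __name__ == "__main__":
    page_name = 'xxx'  # same placeholder value as in the usage example
    if os.path.isdir(page_name):
        for path in list_saved_files(page_name):
            print(path)
    else:
        print('No folder named', page_name, 'was found.')
```
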

After this code runs, the crawled pages appear in a folder named after the value of page_name; in this example, the folder is named xxx.

###0x04 WHAT IT CANNOT DO


In the current version, it can only crawl plain HTML pages; pages built with AngularJS cannot be crawled.
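
A likely reason for this limitation is that tools such as pycurl and pyquery fetch and parse the HTML the server returns without executing any JavaScript, so links that AngularJS would only create in the browser never appear in the markup. The sketch below illustrates the effect with the standard-library `html.parser` module rather than the crawler's own code; `LinkCollector`, `extract_links`, and both sample pages are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag found in the markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets present in the raw HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# A plain HTML page: the link is already in the markup the server sends.
STATIC_PAGE = '<html><body><a href="about.html">About</a></body></html>'

# An AngularJS-style shell: the content is filled in by app.js in the browser,
# so the raw markup contains no <a> tags to follow.
ANGULAR_PAGE = ('<html ng-app="demo"><body><div ng-view></div>'
                '<script src="app.js"></script></body></html>')

print(extract_links(STATIC_PAGE))   # ['about.html']
print(extract_links(ANGULAR_PAGE))  # []
```

With no links in the raw markup, a crawler that follows links from the first page has nothing further to fetch.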
