def run(self):
     """
     多线程的入口函数
     """
     crawler = BlogCrawler()
     # crawler = UserCrawler()
     while not Controller.taskpool.empty():
         un = Controller.taskpool.get()
         print "\n已处理 %d 个任务, 还剩 %d 个任务" % (Controller.finished_count, Controller.taskpool.qsize())
         # print uns
         try:
             # Load the usernames and uids from the uid file once and build the profile URLs.
             uns, uids = get_uns_uids(config.UID_FILEPATH)
             urls = get_urls(uids)
             # print urls,'testing........'
             print "task start"
             # userinfo_dic={'username':userid}
             # url = 'http://weibo.com/u/1340714021'
             # url='http://weibo.com/u/1756439121'
             # url = 'http://weibo.com/caikangyong'
             # url = 'http://weibo.com/u/1704116960'
             # url = 'http://weibo.com/u/1730336902'
             # userinfo = crawler.scratch(un)
             # Controller.save_userinfo(userinfo,un)
             print "crawlering %s th bloger...." % (uns.index(un) + 1)
             blogs = crawler.scratch(urls[uns.index(un)])
             Controller.save_csv(blogs, un)
             print "task end"
         except Exception:
             # Keep the worker alive even if a single task fails.
             print "failed to crawl %s" % un
         Controller.finished_count += 1
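The example above depends on get_uns_uids(), get_urls() and config.UID_FILEPATH, none of which appear in this listing. A minimal sketch of what those helpers might look like, assuming the uid file holds one "username,uid" pair per line and that profile URLs follow the http://weibo.com/u/<uid> pattern seen in the commented-out lines (both are assumptions):

# Hypothetical helpers assumed by the example above; the real project may differ.
def get_uns_uids(filepath):
    """Read 'username,uid' pairs from filepath and return ([usernames], [uids])."""
    uns, uids = [], []
    with open(filepath) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            un, uid = line.split(',', 1)
            uns.append(un)
            uids.append(uid)
    return uns, uids

def get_urls(uids):
    """Build one Weibo profile URL per uid."""
    return ['http://weibo.com/u/%s' % uid for uid in uids]

With helpers of this shape, uns.index(un) in the loop above is a linear scan per task; putting (username, url) tuples into the task pool, or building a dict from username to URL once, would avoid the repeated lookups.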
Example #2
 def run(self):
     """
     多线程的入口函数
     """
     crawler = BlogCrawler()
     #crawler = UserCrawler()
     while not Controller.taskpool.empty():
         uid = Controller.taskpool.get()
         print "\n已处理 %d 个任务, 还剩 %d 个任务" % (Controller.finished_count,
                                            Controller.taskpool.qsize())
         #print uid
         try:
             print 'task start'
             #userinfo_dic={'username':userid}
             # NOTE: the profile URL is hardcoded here; the uid pulled from the pool is only used to name the CSV.
             url = 'http://weibo.com/u/1340714021'
             #url='http://weibo.com/u/1756439121'
             #url = 'http://weibo.com/caikangyong'
             #url = 'http://weibo.com/u/1704116960'
             #url = 'http://weibo.com/u/1730336902'
             #userinfo = crawler.scratch(uid)
             blogs = crawler.scratch(url)
             Controller.save_csv(blogs, uid)
             print 'task end'
         except Exception:
             # Keep the worker alive even if a single task fails.
             print "failed to crawl %s" % uid
         Controller.finished_count += 1
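Both versions of run() lean on a Controller object that owns the shared task pool, the finished-task counter and the CSV writer, and they are written as thread entry points. That scaffolding is not shown in the listing; the sketch below assumes run() lives on Controller itself and that save_csv() writes the rows returned by crawler.scratch() straight to a per-user CSV file (worker count, file naming and the main block are illustrative only):

# Hypothetical scaffolding for the run() methods above; only taskpool,
# finished_count and save_csv are named in the original code.
import csv
import threading
import Queue  # Python 2 stdlib, matching the Python 2 print statements above

class Controller(threading.Thread):
    taskpool = Queue.Queue()   # shared pool of usernames / uids to crawl
    finished_count = 0         # incremented by every worker (see note below)

    @staticmethod
    def save_csv(blogs, name):
        # Dump the scraped rows to <name>.csv.
        with open('%s.csv' % name, 'wb') as f:
            csv.writer(f).writerows(blogs)

    # run(self) would be one of the methods shown above.

if __name__ == '__main__':
    for un in ['example_user']:        # fill the pool before starting workers
        Controller.taskpool.put(un)
    workers = [Controller() for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

Because finished_count += 1 is a read-modify-write on a shared class attribute, wrapping the increment in a threading.Lock (or counting completions with Queue.task_done()) would be safer than the bare increment the examples use.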