Python Match.SinaBlog Examples

Programming language: Python

Namespace / package name: src.tools.match

Class / type: Match

Method / function: SinaBlog

Examples at hotexamples.com: 2

Match.SinaBlog in Python: 2 examples found. These are the top-rated real-world Python examples of src.tools.match.Match.SinaBlog extracted from open source projects.

Frequently Used Methods

fix_html(5)

fix_filename(5)

get_website_kind(3)

column(3)

create_img_element_with_file_name(3)

csdnblog_author(2)

sinablog_author(2)

jianshu_author(2)

get_url_kind(2)

SinaBlog_profile(2)

SinaBlog(2)

create_local_img_src(2)

article(2)

answer(2)

author(2)

collection(2)

huxiu(1)

doc360(1)

replace_words(1)

jianshu_notebooks(1)

jianshu_collection(1)

fiel(1)

isUrlOk(1)

huawei(1)

cnblogs_author(1)

html_body(1)

detect_recipe_kind(1)

avatar_create_img_element_with_file_name(1)

get_recipe_kind(1)

generate_img_src(1)

format_avatar(1)

sinablog_profile(1)

Example 1

# Match is provided by src.tools.match; Debug, Http and SinaBlogParser are project helpers not shown here.
from src.tools.match import Match


def create_work_set(self, target_url):
    u"""
    Given the URL of a blog's home page, first extract the blog id with a regular
    expression, then use the content of the blog's "About me" page to get the data
    written to SinaBlog_Info (this part arguably does not belong in this function and
    could be improved), and finally use the content of the blog's article-list pages
    to collect the address of every post and put it into work_set.

    :param target_url: URL of the blog's home page
    :return:
    """
    Debug.logger.debug(u"target_url是:" + str(target_url))
    if target_url in self.task_complete_set:
        return
    result = Match.SinaBlog(target_url)
    SinaBlog_author_id = int(result.group('SinaBlog_people_id'))

    href_article_list = 'http://blog.sina.com.cn/s/articlelist_{}_0_1.html'.format(
        SinaBlog_author_id)
    href_profile = 'http://blog.sina.com.cn/s/profile_{}.html'.format(
        SinaBlog_author_id)

    # The part below belongs in SinaBlogAuthorWorker (it writes to SinaBlog_Info);
    # it lives here for now and should be refactored later.
    content_profile = Http.get_content(href_profile)

    parser = SinaBlogParser(content_profile)
    self.question_list += parser.get_SinaBlog_info_list()
    # Debug.logger.debug(u"create_work_set中的question_list是什么??" + str(self.question_list))
    # End of the part that belongs in SinaBlogAuthorWorker.

    # content_index = Http.get_content(href_index)
    content_article_list = Http.get_content(href_article_list)

    article_num = int(self.parse_article_num(content_article_list))
    Debug.logger.debug(u"article_num:" + str(article_num))
    # Each article-list page holds 50 post links. Floor division (//) keeps
    # page_num an integer on Python 3 as well as Python 2.
    if article_num % 50 != 0:
        page_num = article_num // 50 + 1
    else:
        page_num = article_num // 50

    # This means each line can hold only one Sina blog address.
    # It has to be written this way for now, because the "About me" page
    # does not include the article count.
    self.question_list[0]['article_num'] = article_num

    self.task_complete_set.add(target_url)

    for page in range(page_num):
        url = 'http://blog.sina.com.cn/s/articlelist_{}_0_{}.html'.format(
            SinaBlog_author_id, page + 1)
        content_article_list = Http.get_content(url)
        article_list = self.parse_get_article_list(content_article_list)
        for item in article_list:
            self.work_set.add(item)
        # self.work_set.add(article_list[0])
    return
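
The page-count branch in Example 1 amounts to a ceiling division: 50 links per list page, rounded up. The following is a minimal standalone sketch of that arithmetic and of the list-page URLs it yields. `LINKS_PER_PAGE`, `page_count` and `article_list_urls` are hypothetical names introduced here for illustration; only the URL template comes from the example, and nothing is fetched over the network.

```python
LINKS_PER_PAGE = 50  # per the comment in Example 1: one list page holds 50 links


def page_count(article_num, per_page=LINKS_PER_PAGE):
    """Return how many list pages are needed for article_num posts (ceiling division)."""
    return (article_num + per_page - 1) // per_page


def article_list_urls(author_id, article_num):
    """Build the article-list page URLs, using the URL template from Example 1."""
    template = 'http://blog.sina.com.cn/s/articlelist_{}_0_{}.html'
    return [template.format(author_id, page + 1)
            for page in range(page_count(article_num))]
```

Writing the count as a single integer expression avoids the two-branch `if`/`else` and behaves the same on Python 2 and Python 3.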

Example 2

# Match is provided by src.tools.match; Debug and SingleTask are project helpers not shown here.
from src.tools.match import Match


def parse_SinaBlog(command):
    u"""
    :param command: home-page URL of a Sina blog author
    :return: task:
    """
    result = Match.SinaBlog(command)
    SinaBlog_author_id = result.group('SinaBlog_people_id')
    Debug.logger.debug(u"SinaBlog_people_id:" + str(SinaBlog_author_id))

    task = SingleTask()
    task.author_id = SinaBlog_author_id
    task.kind = 'SinaBlog'

    # URLs the spider will crawl for this author.
    task.spider.href_article_list = 'http://blog.sina.com.cn/s/articlelist_{}_0_1.html'.format(SinaBlog_author_id)
    task.spider.href = 'http://blog.sina.com.cn/u/{}'.format(SinaBlog_author_id)
    task.spider.href_profile = 'http://blog.sina.com.cn/s/profile_{}.html'.format(SinaBlog_author_id)

    # Settings used when the book is generated.
    task.book.kind = 'SinaBlog'
    task.book.sql.info_extra = 'creator_id = "{}"'.format(SinaBlog_author_id)
    task.book.sql.article_extra = 'author_id = "{}"'.format(SinaBlog_author_id)
    task.book.author_id = SinaBlog_author_id
    Debug.logger.debug(u"在parse_SinaBlog中, task.book.author_id为" + str(task.book.author_id))
    return task
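
Both examples call `result.group('SinaBlog_people_id')` on the value returned by `Match.SinaBlog`, which suggests a regular-expression match with a named group. The actual pattern in `src.tools.match` is not shown on this page, so the sketch below is an assumption: `sina_blog_match`, `extract_author_id` and the pattern itself are hypothetical, modeled on the `http://blog.sina.com.cn/u/{id}` URL form that appears in Example 2. It also adds the `None` check that neither example performs.

```python
import re

# Hypothetical pattern (an assumption, not the project's actual regex):
# matches home-page URLs of the form http://blog.sina.com.cn/u/<digits>.
_SINA_BLOG_PATTERN = re.compile(
    r'blog\.sina\.com\.cn/u/(?P<SinaBlog_people_id>\d+)')


def sina_blog_match(url):
    """Return a match object exposing the 'SinaBlog_people_id' group, or None."""
    return _SINA_BLOG_PATTERN.search(url)


def extract_author_id(url):
    """Return the author id as a string; raise ValueError if the URL does not match."""
    result = sina_blog_match(url)
    if result is None:
        raise ValueError('not a recognised Sina blog URL: {!r}'.format(url))
    return result.group('SinaBlog_people_id')
```

If the real `Match.SinaBlog` can likewise return `None` for an unrecognised URL, calling `.group(...)` on its result directly would raise `AttributeError`, so a guard like the one in `extract_author_id` may be worth adding at the call sites.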