def crawl(self):
    """Fetch a Weibo article by its cid, parse title/time/content/publisher,
    and export the result as a SearchArticleModel.

    Reads ``self.key`` (the article cid) and ``self.data`` (crawl-task
    metadata: ``key`` and ``source_type``). No return value; the parsed
    record is handed off via ``export``.
    """
    key = self.key
    data = self.data
    # The AJAX endpoint carries the article body as JSON; the /p/ URL is
    # only recorded as the article's canonical location in the output.
    homepage = "http://card.weibo.com/article/aj/articleshow?cid=" + key
    url = "http://weibo.com/p/" + key
    html_stream = _get_url(homepage)
    json_stream = change_to_json(str(html_stream.text))
    html_stream = json_stream['data']['article']
    soup = HandleContent.get_BScontext(html_stream, text=True)
    # NOTE(review): [0] indexing assumes .title/.time/.WBA_content are
    # always present in the article markup — an IndexError here aborts
    # the crawl for this item.
    title = soup.select('.title')[0].text
    pubtime = soup.select('.time')[0].text
    pubtime = HandleContent.strformat(str(pubtime))
    content = soup.select('.WBA_content')[0]
    content = clear_label(list(content))
    comment = {}
    text = HandleContent.get_BScontext(content, text=True).text
    comment['content'] = clear_space(text)
    publishers = soup.select('.S_link2')
    # Prefer the second publisher link when present, fall back to the
    # first, and to '' when no .S_link2 element exists at all.
    try:
        publisher = publishers[1].text if len(
            publishers) > 1 else publishers[0].text
    except IndexError:
        # Narrowed from a bare ``except:`` so unrelated errors (network,
        # KeyboardInterrupt, ...) still surface instead of being swallowed.
        publisher = ''
    date = new_time()
    crawl_data = {
        'title': title,
        'pubtime': pubtime,
        'source': 'weibo',
        'publisher': publisher,
        'crtime_int': date.get('crtime_int'),
        'crtime': date.get('crtime'),
        'origin_source': u'微博搜索',
        'url': url,
        'key': data.get('key', ''),
        'type': u'元搜索',
        'source_type': data.get('source_type', ''),
        'content': content,
        'comment': comment,
    }
    model = SearchArticleModel(crawl_data)
    export(model)
def crawl(self):
    """Fetch a Weibo article by its cid, parse title/time/content/publisher,
    and export the result as a SearchArticleModel.

    Reads ``self.key`` (the article cid) and ``self.data`` (crawl-task
    metadata: ``key`` and ``source_type``). No return value; the parsed
    record is handed off via ``export``.
    """
    key = self.key
    data = self.data
    # The AJAX endpoint carries the article body as JSON; the /p/ URL is
    # only recorded as the article's canonical location in the output.
    homepage = "http://card.weibo.com/article/aj/articleshow?cid=" + key
    url = "http://weibo.com/p/" + key
    html_stream = _get_url(homepage)
    json_stream = change_to_json(str(html_stream.text))
    html_stream = json_stream['data']['article']
    soup = HandleContent.get_BScontext(html_stream, text=True)
    # NOTE(review): [0] indexing assumes .title/.time/.WBA_content are
    # always present in the article markup — an IndexError here aborts
    # the crawl for this item.
    title = soup.select('.title')[0].text
    pubtime = soup.select('.time')[0].text
    pubtime = HandleContent.strformat(str(pubtime))
    content = soup.select('.WBA_content')[0]
    content = clear_label(list(content))
    comment = {}
    text = HandleContent.get_BScontext(content, text=True).text
    comment['content'] = clear_space(text)
    publishers = soup.select('.S_link2')
    # Prefer the second publisher link when present, fall back to the
    # first, and to '' when no .S_link2 element exists at all.
    try:
        publisher = publishers[1].text if len(
            publishers) > 1 else publishers[0].text
    except IndexError:
        # Narrowed from a bare ``except:`` so unrelated errors (network,
        # KeyboardInterrupt, ...) still surface instead of being swallowed.
        publisher = ''
    date = new_time()
    crawl_data = {
        'title': title,
        'pubtime': pubtime,
        'source': 'weibo',
        'publisher': publisher,
        'crtime_int': date.get('crtime_int'),
        'crtime': date.get('crtime'),
        'origin_source': u'微博搜索',
        'url': url,
        'key': data.get('key', ''),
        'type': u'元搜索',
        'source_type': data.get('source_type', ''),
        'content': content,
        'comment': comment,
    }
    model = SearchArticleModel(crawl_data)
    export(model)