Python html_to_unicode Examples

Programming language: Python

Namespace/package name: encoding

Method/function: html_to_unicode

Examples at hotexamples.com: 7

Python html_to_unicode - 7 examples found. These are the top-rated real-world Python examples of encoding.html_to_unicode, extracted from open source projects.
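Every example below calls the function the same way: a Content-Type header string (often empty) plus the raw page body go in, and an (encoding, text) pair comes out. The following is a minimal stdlib-only sketch of that call shape, not the implementation used by any of the listed projects. The name `html_to_unicode_sketch`, the helper `_known`, the detection order (byte-order mark, header charset, `<meta>` charset, default) and the 4096-byte scan window are all assumptions made for illustration. The encoding name it reports is Python's normalized codec name (for example 'utf-8'), which may differ from what the real function returns.

```python
import codecs
import re

# Hypothetical stand-in for encoding.html_to_unicode (illustration only).
_HEADER_CHARSET = re.compile(r'charset=["\']?([\w-]+)', re.I)
_META_CHARSET = re.compile(br'<meta[^>]+charset=["\']?([\w-]+)', re.I)

_BOMS = [
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_LE, 'utf-16-le'),
    (codecs.BOM_UTF16_BE, 'utf-16-be'),
]


def _known(name):
    """Return Python's normalized codec name, or None if it is unknown."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return None


def html_to_unicode_sketch(content_type_header, body, default='utf-8'):
    """Guess the encoding of `body` (bytes) and return (encoding, text)."""
    # 1. A byte-order mark wins; strip it before decoding.
    for bom, enc in _BOMS:
        if body.startswith(bom):
            return enc, body[len(bom):].decode(enc, 'replace')

    candidates = []
    # 2. charset declared in the Content-Type header, if any.
    match = _HEADER_CHARSET.search(content_type_header or '')
    if match:
        candidates.append(match.group(1))
    # 3. charset declared in a <meta> tag near the top of the document.
    match = _META_CHARSET.search(body[:4096])
    if match:
        candidates.append(match.group(1).decode('ascii'))
    # 4. Fall back to the default encoding.
    candidates.append(default)

    for name in candidates:
        enc = _known(name)
        if enc is None:
            continue  # e.g. a locale such as "zh_cn" is not a codec
        try:
            return enc, body.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: replace undecodable bytes instead of raising.
    return default, body.decode(default, 'replace')


# Same tuple-unpacking pattern as the examples: keep the text, drop the name.
page = '<html><h1>漢字汉字</h1></html>'.encode('utf-8')
_, text = html_to_unicode_sketch('', page)
```

Passing an empty header, as most of the examples do, simply skips step 2 and leaves detection to the body itself.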

Example #1

File: test_linote.py Project: pombredanne/linote

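 def test_html_to_unicode(self):
     """Linote html_to_unicode function"""
     # "zh_cn" is a locale, not a charset name; the test expects the
     # encoding to be reported as 'utf8' and the body decoded to Unicode.
     html_to_unicode(
         'charset=("zh_cn")',
         '<html><h1>漢字汉字</h1></html>').should.eq(
             ('utf8',
              u'<html><h1>\u6f22\u5b57\u6c49\u5b57</h1></html>'))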

Example #2

File: worker.py Project: pombredanne/recrawler

def handle(job, *args, **kwargs):
    queue = kwargs['queue']
    task = json.loads(job)
    url = task["url"]
    status, source = fetcher.fetch(url, use_proxy=False)
    logger.info('%s|%s' % (url, status))
    try:
        # html_to_unicode returns (encoding, text); only the text is kept.
        _, source = encoding.html_to_unicode('', source)
    except Exception as e:
        print(e)

Example #3

File: utils.py Project: hackrole/scrapy-utils

def handle(job, *args, **kwargs):
    print('handle', args, kwargs)
    task = json.loads(job)
    url = task["url"]
    domain = tldextracter.extract_domain(url)
    status, content = fetch(url, use_proxy=False)
    try:
        url = url.encode('utf8')
        urlhash = cityhash.CityHash64(url)
    except Exception:
        # Encoding or hashing failed; report the URL without a hash.
        return (url, None, status, domain, content)
    logger.info('%s|%s' % (url, status))
    # Leave non-HTML payloads undecoded.
    if magic.from_buffer(content, mime=True) != 'text/html':
        return (url, urlhash, status, domain, content)
    # Keep only the decoded text; the detected encoding is discarded.
    _, content = encoding.html_to_unicode('', content)
    if status != 200:
        db.push(url, detail=False)
    return (url, urlhash, status, domain, content)

Example #4

File: linote.py Project: pombredanne/linote

 def format(self, note):
     content = ''
     if note is not None:
         _, content = encoding.html_to_unicode('', note.content)
         content = encoding_match.sub('', content)
     return content

Example #5

File: test_linote.py Project: solos/linote

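 def test_html_to_unicode(self):
     """Linote html_to_unicode function"""
     # "zh_cn" is a locale, not a charset name; the test expects the
     # encoding to be reported as 'utf8' and the body decoded to Unicode.
     html_to_unicode(
         'charset=("zh_cn")', '<html><h1>漢字汉字</h1></html>').should.eq(
             ('utf8', u'<html><h1>\u6f22\u5b57\u6c49\u5b57</h1></html>'))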

Example #6

File: extracter.py Project: solos/sohutv

        'nid': nid,
        'pid': pid,
        'cover': cover,
        'playlistId': playlistId,
        'o_playlistId': o_playlistId,
        'cid': cid,
        'subcid': subcid,
        'osubcid': osubcid,
        'category': category,
        'cateCode': cateCode,
        'pianhua': pianhua,
        'tag': tag,
        'tvid': tvid,
        'title': title,
        'last': last,
        'brief': brief
    }
    return item

if __name__ == '__main__':
    import fetcher
    url = 'http://tv.sohu.com/20131223/n392267093.shtml'
    status, content = fetcher.fetch(url)
    # Decode the fetched page; only the Unicode text is used below.
    _, ucontent = encoding.html_to_unicode('', content)
    # print(extract_links(url, ucontent))
    # print(extract_content(url, ucontent))
    # print(extract_sohutv(url, ucontent))
    print(extract_sohutv_data_by_regex(url, ucontent))

Example #7

 def format(self, note):
     content = ''
     if note is not None:
         _, content = encoding.html_to_unicode('', note.content)
         content = encoding_match.sub('', content)
     return content