Python get_clean_body_content examples

Programming language: Python

Namespace/package name: juriscraper.lib.html_utils

Method/function: get_clean_body_content

Examples at hotexamples.com: 4

Python get_clean_body_content: 4 examples found. These are top-rated, real-world Python examples of juriscraper.lib.html_utils.get_clean_body_content extracted from open-source projects.

Example #1

File: manual_import.py Project: ishammi/courtlistener

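 import requests
 from lxml.html import fromstring
 from juriscraper.lib.html_utils import get_clean_body_content

 def get_file(location):
     """Return a parsed lxml tree and the cleaned body for a path or URL."""
     if location.startswith('/'):
         # Local file: read the bytes directly instead of faking a response
         # by assigning .content on a requests.Session object.
         with open(location, 'rb') as f:
             content = f.read()
     else:
         r = requests.get(location)
         r.raise_for_status()  # fail loudly on HTTP error responses
         content = r.content
     return fromstring(content), get_clean_body_content(content)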
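
The local-path versus URL dispatch in get_file() can be sketched with the standard library alone. This is a minimal sketch, not code from courtlistener: read_location() is a hypothetical helper, urllib.request stands in for requests, and the leading-'/' test assumes POSIX-style absolute paths, as the original check does.

```python
import os
import tempfile
import urllib.request


def read_location(location):
    """Return raw bytes from a local absolute path or an HTTP(S) URL.

    Hypothetical helper: a leading '/' is treated as a local file, anything
    else is fetched over the network, mirroring the check in get_file().
    """
    if location.startswith('/'):
        # Read bytes so an HTML parser such as lxml can detect the encoding
        # from the document itself.
        with open(location, 'rb') as f:
            return f.read()
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(location, timeout=30) as resp:
        return resp.read()


if __name__ == '__main__':
    # Exercise the local-path branch with a throwaway HTML file (POSIX paths).
    with tempfile.NamedTemporaryFile('wb', suffix='.html', delete=False) as tmp:
        tmp.write(b'<html><body><p>Opinion text</p></body></html>')
    try:
        print(read_location(tmp.name))
    finally:
        os.remove(tmp.name)
```

The returned bytes can then be passed to lxml's fromstring() and to get_clean_body_content(), as get_file() does.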

Example #3

File: ill.py Project: enyst/juriscraper

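        # fetcher() is a closure inside a scraper method, so `self.url` is the
        # enclosing scraper's page URL. Imports assumed at module level:
        #     import requests
        #     from lxml import html
        #     from juriscraper.lib.html_utils import get_clean_body_content
        def fetcher(url):
            r = requests.get(url,
                             allow_redirects=False,
                             headers={'User-Agent': 'Juriscraper'})
            # Throw an error if a bad status code is returned.
            r.raise_for_status()

            html_tree = html.fromstring(r.text)
            html_tree.make_links_absolute(self.url)

            # Start at the justified <p> holding a bold <span>, move to the
            # second non-justified <p> after it, then take every <p> after that.
            path = ('//p[contains(@style, "justify")]'
                    '/span[@style="font-weight: bold" ]/..'
                    '/following-sibling::p[not(contains(@style, "justify"))][position()=2]'
                    '/following-sibling::p')
            # Serialize each matched paragraph back to HTML and join the pieces.
            summary_string = ''.join(
                html.tostring(e, method='html', encoding='unicode')
                for e in html_tree.xpath(path)
            )
            # Clean the markup; 'span' is passed as an extra tag to remove.
            return get_clean_body_content(summary_string, remove_extra_tags=['span'])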
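
The loop in fetcher() serializes each matched paragraph to HTML and concatenates the results into one string. A standard-library sketch of that collect, serialize and join step follows. It assumes well-formed (XHTML-like) input, because xml.etree.ElementTree only parses XML and has no following-sibling axis; the XPath selection is therefore approximated by skipping the first `skip` <p> children. paragraphs_after() is a hypothetical helper, not part of juriscraper.

```python
import xml.etree.ElementTree as ET


def paragraphs_after(fragment, skip):
    """Serialize every <p> child after the first `skip` ones and join them.

    Stand-in for the XPath + html.tostring loop in fetcher(); list slicing
    replaces the following-sibling selection, which ElementTree lacks.
    """
    root = ET.fromstring(fragment)
    paragraphs = root.findall('p')[skip:]
    # ''.join builds the result once instead of growing a string in a loop.
    return ''.join(
        ET.tostring(p, encoding='unicode', method='html') for p in paragraphs
    )


print(paragraphs_after('<div><p>title</p><p>date</p><p>one</p><p>two</p></div>', 2))
# → <p>one</p><p>two</p>
```

Note that ElementTree's tostring() also emits an element's tail text, so whitespace between paragraphs in real input would be carried into the joined string.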

Example #4

File: ill.py Project: nowherenearithaca/juriscraper

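        # fetcher() is a closure inside a scraper method, so `self.url` is the
        # enclosing scraper's page URL. Imports assumed at module level:
        #     import requests
        #     from lxml import html
        #     from juriscraper.lib.html_utils import get_clean_body_content
        def fetcher(url):
            r = requests.get(url,
                             allow_redirects=False,
                             headers={'User-Agent': 'Juriscraper'})
            # Throw an error if a bad status code is returned.
            r.raise_for_status()

            html_tree = html.fromstring(r.text)
            html_tree.make_links_absolute(self.url)

            # Start at the justified <p> holding a bold <span>, move to the
            # second non-justified <p> after it, then take every <p> after that.
            path = ('//p[contains(@style, "justify")]'
                    '/span[@style="font-weight: bold" ]/..'
                    '/following-sibling::p[not(contains(@style, "justify"))][position()=2]'
                    '/following-sibling::p')
            # Serialize each matched paragraph back to HTML and join the pieces.
            summary_string = ''.join(
                html.tostring(e, method='html', encoding='unicode')
                for e in html_tree.xpath(path)
            )
            # Clean the markup; 'span' is passed as an extra tag to remove.
            return get_clean_body_content(summary_string,
                                          remove_extra_tags=['span'])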
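
The remove_extra_tags=['span'] argument asks get_clean_body_content to strip <span> tags from the summary, presumably on top of whatever it removes by default. Assuming "strip" means dropping the tag while keeping its text (the behaviour of the remove_tags option of lxml's Cleaner; whether juriscraper relies on it is not shown here), the effect can be approximated with Python's html.parser. TagUnwrapper and unwrap_tags are hypothetical names; this is not juriscraper's code.

```python
from html import escape
from html.parser import HTMLParser


class TagUnwrapper(HTMLParser):
    """Drop the listed tags but keep their text and child elements.

    Rough stand-in for the assumed effect of remove_extra_tags; it is not
    juriscraper's implementation.
    """

    def __init__(self, drop):
        super().__init__(convert_charrefs=True)
        self.drop = set(drop)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.drop:
            return
        # Attribute values arrive unescaped, so re-escape them on output.
        rendered = ''.join(
            f' {name}="{escape(value, quote=True)}"' if value is not None else f' {name}'
            for name, value in attrs
        )
        self.parts.append(f'<{tag}{rendered}>')

    def handle_endtag(self, tag):
        if tag not in self.drop:
            self.parts.append(f'</{tag}>')

    def handle_data(self, data):
        # convert_charrefs=True decoded entities, so escape the text again.
        self.parts.append(escape(data, quote=False))


def unwrap_tags(markup, drop):
    parser = TagUnwrapper(drop)
    parser.feed(markup)
    parser.close()
    return ''.join(parser.parts)


print(unwrap_tags('<p><span style="font-weight: bold">Held:</span> affirmed</p>', ['span']))
# → <p>Held: affirmed</p>
```

Unwrapping rather than deleting matters here: the bold <span> in the Illinois markup carries text that should stay in the summary.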