Python examples for clean_html

Programming language: Python

Namespace/package name: cleaners.html_cleaner

Method/function: clean_html

Examples on hotexamples.com: 6

clean_html in Python: 6 examples found. These are the top real-world Python examples of cleaners.html_cleaner.clean_html, extracted from open source projects.

Example 1

File: readability.py Project: jayakumark/NewsBlur

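def _parse(self, input):
    # build_doc returns the parsed document together with the detected encoding
    doc, self.encoding = build_doc(input)
    # remove unwanted markup before any further processing
    doc = html_cleaner.clean_html(doc)
    base_href = self.options.get("url", None)
    if base_href:
        # a page URL is known: rewrite relative links against it
        doc.make_links_absolute(base_href, resolve_base_href=True)
    else:
        # no page URL: only apply the document's own <base href>, if present
        doc.resolve_base_href()
    return doc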
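
The two branches above decide what relative links are resolved against. As a rough illustration only, the standard-library sketch below mimics that idea on a single href string; it is not lxml's implementation, and `absolutize`, `page_url`, and `base_href` are hypothetical names introduced here.

```python
from urllib.parse import urljoin


def absolutize(href, page_url=None, base_href=None):
    """Resolve one link the way the branches above are assumed to behave.

    Hypothetical helper for illustration; lxml works on whole documents.
    """
    if base_href:
        # First honour the document's own <base href>, if it has one.
        href = urljoin(base_href, href)
    if page_url:
        # Then resolve whatever is still relative against the page URL.
        href = urljoin(page_url, href)
    # With neither value available, the link is returned unchanged.
    return href


if __name__ == "__main__":
    print(absolutize("/a/b.html", page_url="http://example.com/news/"))
    print(absolutize("x.png", base_href="http://cdn.example.org/static/"))
    print(absolutize("x.png"))
```

An href that is already absolute passes through both steps untouched, which is why the same call is safe for every link in a page.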

Example 2

File: htmls.py Project: Sadhanandh/Chat-thumbnailer

def parse(input, url):
    logging.debug('parse url: %s', url)
    raw_doc = build_doc(input)
    doc = html_cleaner.clean_html(raw_doc)
    if url:
        doc.make_links_absolute(url, resolve_base_href=True)
    else:
        doc.resolve_base_href()
    return doc

Example 3

File: readability.py Project: qij3/NewsBlur

def _parse(self, input):
    doc, self.encoding = build_doc(input)
    doc = html_cleaner.clean_html(doc)
    base_href = self.options.get('url', None)
    if base_href:
        doc.make_links_absolute(base_href, resolve_base_href=True)
    else:
        doc.resolve_base_href()
    return doc

Example 4

File: readability.py Project: Leechael/python-readability

def _parse(self, input):
    doc = build_doc(input)
    doc = html_cleaner.clean_html(doc)
    base_href = self.options['url']
    if base_href:
        doc.make_links_absolute(base_href, resolve_base_href=True)
    else:
        doc.resolve_base_href()
    return doc

Example 5

File: htmls.py Project: yishh/lxml-readability

def parse(input, url):
    logging.debug('parse url: %s', url)
    raw_doc = build_doc(input)
    doc = html_cleaner.clean_html(raw_doc)
    if url:
        doc.make_links_absolute(url, resolve_base_href=True)
    else:
        doc.resolve_base_href()
    return doc

Example 6

File: readability.py Project: bgruszka/python-readability

    def __init__(self, url, text=None, page=1, min_article_length=250, min_article_percentage=0.075):
        """
        :param url: the url of the document
        :param text: optionally the string value of the page may be passed in
        :param page: if this is one in a series of documents in an article this should be set
        :param min_article_length: if an article is less than this number of characters it's not an article
        :param min_article_percentage: an article must be this % of the text on the page
        """
        self.url = url
        self.page = page
        self._article = None
        self.min_article_length = min_article_length
        self.min_article_percentage = min_article_percentage

        if text:
            self.text = text
        else:
            self.text = requests.get(url).text

        # parse the HTML and clean it up, removing elements we don't want to deal with (e.g., head, script, form)
        doc, self.encoding = build_doc(self.text)
        doc = html_cleaner.clean_html(doc)
        doc.make_links_absolute(self.url, resolve_base_href=True)
        self.html = doc
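
The comment in the example above describes cleaning as removing elements such as head, script, and form. The configuration of `html_cleaner` itself is not shown on this page, so the following is only a minimal standard-library sketch of that idea. `DROPPED_TAGS`, `TagDropper`, and `drop_unwanted` are hypothetical names, and the drop list is an assumption taken from that comment rather than from the library.

```python
from html.parser import HTMLParser

# Assumed drop list, based only on the comment in the example above.
DROPPED_TAGS = {"head", "script", "style", "form"}


class TagDropper(HTMLParser):
    """Re-emit markup while skipping dropped elements and all comments."""

    def __init__(self):
        # Keep character references as written so they can be re-emitted.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROPPED_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag.
        if self.skip_depth == 0 and tag not in DROPPED_TAGS:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in DROPPED_TAGS:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&#{name};")

    # handle_comment is left at its default no-op, so comments are dropped.


def drop_unwanted(html):
    """Return html without the dropped elements and without comments."""
    parser = TagDropper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = "<html><head><title>t</title></head><body><p>Hi</p><script>x()</script></body></html>"
    print(drop_unwanted(page))
```

Unlike the examples on this page, the sketch works on strings rather than parsed documents and makes no attempt to repair malformed markup.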