Python examples for Parser.getFormattedText

Programming language: Python

Namespace/package name: goose.parsers

Class/type: Parser

Method/function: getFormattedText

Examples on hotexamples.com: 2

Parser.getFormattedText in Python: 2 examples found. These are the top real-world Python examples of goose.parsers.Parser.getFormattedText, extracted from open source projects.

Frequently used methods

getAttribute(9)

fromstring(9)

css_select(6)

getPath(4)

hasChildTag(3)

clearText(2)

createElement(2)

getFormattedText(2)

hasChildTags(2)

adjustTopNode(1)

childNodesWithText(1)

getComments(1)

getElementById(1)

removeTitle(1)

Example 1

File: outputformatters.py Project: iKalin/python-goose

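    import re

    from goose.parsers import Parser

    # Excerpt: a method of the project's output formatter class.
    def convertToText(self, article):
        # Extract the formatted text of the top node.
        text = Parser.getFormattedText(self.topNode)
        # Keep only lines that contain something besides spaces and no-break spaces.
        good_lines = []
        for line in text.split(u'\n'):
            if re.search(u'[^ \xa0]', line):
                good_lines.append(line.strip())
        text = u'\n'.join(good_lines)
        Parser.adjustTopNode(article)
        return text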
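
The line filtering in the first example can be tried without goose installed. The following is a minimal sketch, assuming the text has already been extracted from the document; `keep_nonblank_lines` is a hypothetical helper name and not part of goose.

```python
import re

# Matches any character that is neither a plain space nor a no-break space.
_HAS_CONTENT = re.compile(u'[^ \xa0]')


def keep_nonblank_lines(text):
    """Drop lines made only of spaces or no-break spaces; strip the others."""
    good_lines = []
    for line in text.split(u'\n'):
        if _HAS_CONTENT.search(line):
            good_lines.append(line.strip())
    return u'\n'.join(good_lines)
```

One consequence of this pattern: a line holding only a tab counts as having content, so it survives the filter and is then stripped to an empty string. If tab-only lines should also be dropped, the character class would need to include `\t`.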

Example 2

File: outputformatters.py Project: ilovenwd/python-goose

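    import re
    from HTMLParser import HTMLParser  # Python 2 module; on Python 3 use html.unescape

    from goose.parsers import Parser

    # Excerpt: a method of the project's output formatter class.
    # innerTrim is a text helper defined elsewhere in the project.
    def convertToText(self, article):
        txts = []
        for node in list(self.getTopNode()):
            txt = Parser.getFormattedText(node)
            if txt:
                txt = HTMLParser().unescape(txt)
                txts.append(innerTrim(txt))
        text = '\n'.join(txts)
        # Treat U+FFFC (object replacement character) as a line break.
        text = re.sub(u'[\ufffc]', '\n', text)
        lines = text.split('\n')
        text = ''
        # Cut the title from the article text if it is found in the first 4 lines.
        if len(lines) > 4:
            for i in range(0, 4):
                if lines[i] == article.h1 or lines[i] == article.title:
                    del lines[i]
                    break
        # Keep only lines that contain something besides spaces, tabs and carriage returns.
        for line in lines:
            if re.search('[^ \t\r]', line):
                text += line + '\n'
        return text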
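
The post-processing in the second example (unescaping entities, cutting the title, skipping blank lines) can be sketched on Python 3 with the standard library only. This is an illustrative sketch under stated assumptions, not the project's code: `clean_article_text` and its parameters are hypothetical names, the input is a list of already extracted text fragments, `html.unescape` stands in for the Python 2 `HTMLParser().unescape` call, and the project's `innerTrim` step is left out.

```python
import html
import re


def clean_article_text(fragments, title, h1=None):
    """Join text fragments, drop an early title line, and skip blank lines."""
    # Unescape HTML entities in each non-empty fragment, one fragment per line.
    text = '\n'.join(html.unescape(f) for f in fragments if f)
    # Treat U+FFFC (object replacement character) as a line break.
    text = text.replace('\ufffc', '\n')
    lines = text.split('\n')
    # Remove the first line among the first four that equals the title or h1.
    if len(lines) > 4:
        for i in range(4):
            if lines[i] in (h1, title):
                del lines[i]
                break
    # Keep only lines containing something other than space, tab or CR.
    kept = [line for line in lines if re.search(r'[^ \t\r]', line)]
    return ''.join(line + '\n' for line in kept)
```

As in the original, the title is only removed when the text has more than four lines, and at most one matching line is deleted.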