Python Parser.getFormattedText Examples

Programming Language: Python

Namespace/Package Name: goose.parsers

Class/Type: Parser

Method/Function: getFormattedText

Examples at hotexamples.com: 2

Python Parser.getFormattedText - 2 examples found. These are real-world Python examples of goose.parsers.Parser.getFormattedText, extracted from open source projects.

Frequently Used Methods

getAttribute(9)

fromstring(9)

css_select(6)

getPath(4)

hasChildTag(3)

clearText(2)

createElement(2)

getFormattedText(2)

hasChildTags(2)

adjustTopNode(1)

childNodesWithText(1)

getComments(1)

getElementById(1)

removeTitle(1)

Example #1

File: outputformatters.py Project: iKalin/python-goose

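import re

from goose.parsers import Parser

# Excerpt: a method from outputformatters.py.
def convertToText(self, article):
    text = Parser.getFormattedText(self.topNode)
    lines = text.split(u'\n')
    good_lines = []
    for line in lines:
        # Keep only lines that contain something other than spaces or
        # non-breaking spaces (\xa0), with surrounding whitespace stripped.
        if re.search(u'[^ \xa0]', line):
            good_lines.append(line.strip())
    text = u'\n'.join(good_lines)
    Parser.adjustTopNode(article)
    return text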
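
The blank-line filtering in Example #1 can be tried without goose installed. The following is a minimal sketch, not the project's code: `drop_blank_lines` is a hypothetical helper name, and a plain string stands in for the output of Parser.getFormattedText.

```python
import re

# Matches any character that is neither a plain space nor a non-breaking space.
_HAS_CONTENT = re.compile(u'[^ \xa0]')


def drop_blank_lines(text):
    """Hypothetical helper: keep lines with visible content, stripped."""
    good_lines = []
    for line in text.split(u'\n'):
        if _HAS_CONTENT.search(line):
            good_lines.append(line.strip())
    return u'\n'.join(good_lines)


# Stand-in for the text returned by Parser.getFormattedText (an assumption).
sample = u'  First paragraph. \n \xa0 \n\nSecond paragraph.'
result = drop_blank_lines(sample)
print(result)
```

Note that the pattern excludes only spaces and non-breaking spaces, so a line holding just a tab still counts as content and survives as an empty string after `strip()`.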

Example #2

File: outputformatters.py Project: ilovenwd/python-goose

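import re
from HTMLParser import HTMLParser  # Python 2 module; on Python 3 use html.unescape

from goose.parsers import Parser

# Excerpt: a method from outputformatters.py.
# innerTrim is defined elsewhere in the project and is not shown here.
def convertToText(self, article):
    txts = []
    for node in list(self.getTopNode()):
        txt = Parser.getFormattedText(node)
        if txt:
            # Decode HTML entities, then trim the text.
            txt = HTMLParser().unescape(txt)
            txts.append(innerTrim(txt))
    text = '\n'.join(txts)
    # Treat the object replacement character (U+FFFC) as a line break.
    text = re.sub(u'[\ufffc]', '\n', text)
    lines = text.split('\n')
    text = ''
    # cutting title from article text if found in first 4 rows
    if len(lines) > 4:
        for i in range(0, 4):
            if lines[i] == article.h1 or lines[i] == article.title:
                del lines[i]
                break
    for line in lines:
        # Skip lines made up only of spaces, tabs, or carriage returns.
        if re.search('[^ \t\r]', line):
            text += line + '\n'
    return text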
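
The post-processing steps in Example #2 (entity decoding, U+FFFC replacement, title removal, blank-line filtering) can be sketched on plain strings. This is an illustration under assumptions, not the project's implementation: `clean_text` is a hypothetical helper, the `chunks` list stands in for the per-node output of Parser.getFormattedText, a single `title` argument replaces `article.h1` and `article.title`, the `innerTrim` step is omitted, and Python 3's `html.unescape` is swapped in for `HTMLParser().unescape`.

```python
import re
from html import unescape  # Python 3 stand-in for HTMLParser().unescape


def clean_text(chunks, title):
    """Hypothetical helper mirroring the post-processing in Example #2."""
    # Decode HTML entities in each non-empty chunk, one chunk per line.
    text = u'\n'.join(unescape(c) for c in chunks if c)
    # Treat the object replacement character (U+FFFC) as a line break.
    text = text.replace(u'\ufffc', u'\n')
    lines = text.split(u'\n')
    # Remove the first of the first four lines that repeats the title.
    if len(lines) > 4:
        for i in range(4):
            if lines[i] == title:
                del lines[i]
                break
    # Keep only lines containing something other than space, tab, or CR.
    kept = [line for line in lines if re.search(r'[^ \t\r]', line)]
    return u''.join(line + u'\n' for line in kept)


# Made-up input standing in for the formatted text of each child node.
chunks = [u'My Title', u'Tom &amp; Jerry', u'', u'one\ufffctwo', u' \t', u'end']
result = clean_text(chunks, u'My Title')
print(result)
```

As in the original, the title is only searched for when the text has more than four lines, and the returned string ends with a trailing newline.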