Parser.getFormattedText Examples in Python

Programming Language: Python

Namespace/Package Name: goose.parsers

Class/Type: Parser

Method/Function: getFormattedText

Examples at hotexamples.com: 2

Parser.getFormattedText in Python: 2 examples found. These are the top-rated real-world examples of goose.parsers.Parser.getFormattedText in Python, extracted from open source projects.

Frequently Used Methods

getAttribute(9)

fromstring(9)

css_select(6)

getPath(4)

hasChildTag(3)

clearText(2)

createElement(2)

getFormattedText(2)

hasChildTags(2)

adjustTopNode(1)

childNodesWithText(1)

getComments(1)

getElementById(1)

removeTitle(1)

Example #1

File: outputformatters.py Project: iKalin/python-goose

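# Method excerpt from a class in outputformatters.py.
# It needs `import re` and `from goose.parsers import Parser` at module level.
def convertToText(self, article):
    text = Parser.getFormattedText(self.topNode)
    lines = text.split(u'\n')
    good_lines = []
    for line in lines:
        # Keep only lines with a character other than a space or a
        # non-breaking space (\xa0).
        if re.search('[^ \xa0]', line):
            good_lines.append(line.strip())
    text = u'\n'.join(good_lines)
    Parser.adjustTopNode(article)
    return text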
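The line-filtering step in this example does not depend on goose and can be tried on its own. The following is a minimal sketch, assuming the formatted text is already available as a plain string. The helper name `drop_blank_lines` is hypothetical and is not part of goose.

```python
import re


def drop_blank_lines(text):
    """Return text without lines made up only of spaces or NBSPs.

    Hypothetical helper that mirrors the loop in Example #1; it is not
    a goose API.
    """
    good_lines = []
    for line in text.split('\n'):
        # '[^ \xa0]' matches any character that is neither a space nor a
        # non-breaking space, so empty lines and lines holding only
        # those two characters are skipped.
        if re.search('[^ \xa0]', line):
            good_lines.append(line.strip())
    return '\n'.join(good_lines)


if __name__ == '__main__':
    sample = '  first \n\xa0 \xa0\n\nsecond'
    print(drop_blank_lines(sample))
```

Note that the pattern excludes only the plain space and `\xa0`, so a line containing just a tab still passes the test and is kept as an empty string after `strip()`.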

Example #2

File: outputformatters.py Project: ilovenwd/python-goose

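# Method excerpt from a class in outputformatters.py.
# HTMLParser().unescape is the Python 2 era API; it was removed in
# Python 3.9, where html.unescape replaces it. innerTrim is a helper
# defined elsewhere in the project.
def convertToText(self, article):
    txts = []
    for node in list(self.getTopNode()):
        txt = Parser.getFormattedText(node)
        if txt:
            txt = HTMLParser().unescape(txt)
            txts.append(innerTrim(txt))
    text = '\n'.join(txts)
    # U+FFFC is the object replacement character.
    text = re.sub(u'[\ufffc]', '\n', text)
    lines = text.split('\n')
    text = ''
    # cutting title from article text if found in first 4 rows
    if len(lines) > 4:
        for i in range(0, 4):
            if lines[i] == article.h1 or lines[i] == article.title:
                del lines[i]
                break
    for line in lines:
        if re.search('[^ \t\r]', line):
            text += line + '\n'
    return text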
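The post-processing in this example (entity unescaping, title removal, blank-line filtering) can be approximated without goose. The sketch below is a simplification under stated assumptions: it takes one already-extracted string instead of iterating over nodes, it uses `html.unescape` from the Python 3 standard library in place of `HTMLParser().unescape`, and it omits the `innerTrim` step. The function name `clean_article_text` and its parameters are hypothetical.

```python
import html
import re


def clean_article_text(text, title, h1=None):
    """Hypothetical stand-alone version of the clean-up in Example #2."""
    # Decode HTML entities such as '&amp;' (Python 3 replacement for
    # HTMLParser().unescape).
    text = html.unescape(text)
    # Treat the object replacement character (U+FFFC) as a line break.
    text = re.sub('[\ufffc]', '\n', text)
    lines = text.split('\n')

    # Drop the title if it appears in the first 4 rows; like the
    # original, this only runs when there are more than 4 lines.
    if len(lines) > 4:
        for i in range(4):
            if lines[i] == h1 or lines[i] == title:
                del lines[i]
                break

    # Keep lines with a character other than space, tab, or carriage return.
    result = ''
    for line in lines:
        if re.search('[^ \t\r]', line):
            result += line + '\n'
    return result


if __name__ == '__main__':
    raw = 'My Title\nPara &amp; one\n \nPara two\nPara three'
    print(clean_article_text(raw, title='My Title'))
```

As in the original, the returned string ends with a trailing newline whenever at least one line is kept, because each kept line is appended with `'\n'`.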