Python Parser.hasChildTags Examples

Programming language: Python

Namespace/package name: goose.parsers

Class/type: Parser

Method/function: hasChildTags

Examples at hotexamples.com: 2

Python Parser.hasChildTags - 2 examples found. These are the top-rated real-world Python examples of goose.parsers.Parser.hasChildTags, extracted from open source projects.

Frequently used methods

getAttribute(9)

fromstring(9)

css_select(6)

getPath(4)

hasChildTag(3)

clearText(2)

createElement(2)

getFormattedText(2)

hasChildTags(2)

adjustTopNode(1)

childNodesWithText(1)

getComments(1)

getElementById(1)

removeTitle(1)

Example #1

File: extractors.py Project: iKalin/python-goose

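    def isNodeScoreThreshholdMet(self, node, e):
        # The threshold is 8% of the top node's score.
        topNodeScore = self.getScore(node)
        currentNodeScore = self.getScore(e)
        thresholdScore = topNodeScore * 0.08

        if topNodeScore < 0 and currentNodeScore < 0:
            return True

        # Nodes with no 'a' or 'img' child tags pass regardless of score.
        if not Parser.hasChildTags(e, ['a', 'img']):
            return True

        # A 'ul' node passes when its stop count exceeds 5.
        if e.tag == 'ul':
            textLen, stopCount, isHighLink = self.getTextStats(e)
            if stopCount > 5:
                return True

        # Otherwise reject nodes scoring below the threshold, except 'td' nodes.
        if currentNodeScore < thresholdScore and e.tag != 'td':
            return False
        return True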
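
The `Parser.hasChildTags(e, ['a', 'img'])` check above can be approximated with the standard library for readers who do not have goose installed. This is a minimal sketch, not goose's implementation: the helper name `has_child_tags` is hypothetical, and it assumes the check covers all descendants rather than only direct children.

```python
import xml.etree.ElementTree as ET


def has_child_tags(node, tags):
    """Return True if any descendant of node has a tag listed in tags.

    Hypothetical stand-in for Parser.hasChildTags; it assumes that
    descendants at any depth count, not just direct children.
    """
    wanted = set(tags)
    # node.iter() yields node itself first, so skip it explicitly.
    return any(el.tag in wanted for el in node.iter() if el is not node)


with_link = ET.fromstring('<div><p>See <a href="#">this</a></p></div>')
plain = ET.fromstring('<div><p>Just text</p></div>')

print(has_child_tags(with_link, ['a', 'img']))  # True
print(has_child_tags(plain, ['a', 'img']))      # False
```

Under this assumption, a node such as `plain` would take the early `return True` branch in the example, because it contains neither links nor images.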

Example #2

File: cleaners.py Project: iKalin/python-goose

    def convertDivsToParagraphs(self, doc, domTypes):
        divs = Parser.getElementsByTags(doc, domTypes)
        tags = self.child_tags

        for div in divs:
            if div is None:
                continue
            attrs = div.attrib.get('goose_attributes', '')
            if not Parser.hasChildTags(div, tags):
                # None of the listed child tags are present: rename to a paragraph.
                div.tag = 'p'
            elif self.re_dontconvert.search(attrs) is not None:
                continue
            else:
                # Swap in the replacement nodes while preserving the tail text
                # and attributes, which clear() would otherwise discard.
                replaceNodes = self.getReplacementNodes(div)
                text = div.tail
                attrib = dict(div.attrib)
                div.clear()
                div.extend(replaceNodes)
                div.tail = text
                div.attrib.update(attrib)

        return doc
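
The first branch of the method above (renaming a `div` to `p` when it has none of the listed child tags) can be tried in isolation with the standard library. The sketch below is an illustration under stated assumptions, not the goose code: `convert_leaf_divs` and `BLOCK_TAGS` are hypothetical names, the tag list is a guess at what `self.child_tags` might contain, and the replacement-node branch is deliberately left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of block-level tags; goose's own child_tags may differ.
BLOCK_TAGS = {'div', 'p', 'ul', 'ol', 'table', 'blockquote', 'pre'}


def convert_leaf_divs(root):
    """Rename every <div> with no block-level descendant to <p>, in place."""
    # Materialize the list first, because tags are changed during the loop.
    for div in list(root.iter('div')):
        has_block = any(
            el.tag in BLOCK_TAGS for el in div.iter() if el is not div
        )
        if not has_block:
            div.tag = 'p'
    return root


doc = ET.fromstring(
    '<body><div>plain <b>text</b></div><div><p>nested</p></div></body>'
)
convert_leaf_divs(doc)
print(ET.tostring(doc, encoding='unicode'))
# <body><p>plain <b>text</b></p><div><p>nested</p></div></body>
```

Only the first `div` is renamed; the second keeps its tag because it already wraps a paragraph.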