Python parse Examples

Programming language: Python

Namespace/package name: html5_parser.html_parser

Method/function: parse

Examples at hotexamples.com: 3

Python parse - 3 examples found. These are the top-rated real-world Python examples of html5_parser.html_parser.parse, extracted from open source projects.

Example #1

File: Document.py Project: nlarew/mut-index

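 def parse_html(self, path):
     '''Return head and content elements of the document.'''
     # Despite its name, `path` must provide a read() method (a file-like object).
     # parse() returns a capsule wrapping the parsed document.
     capsule = html_parser.parse(path.read(), maybe_xhtml=True)
     # Hand the capsule over to lxml and take the root element.
     doc = etree.adopt_external_document(capsule).getroot()
     selectors = {
         'head': 'head',
         # Descendant selector: a .section inside a .main-column.
         'main_content': '.main-column .section'
     }
     # Indexing with [0] raises IndexError if a selector matches nothing.
     return {k: doc.cssselect(sel)[0] for k, sel in selectors.items()}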

Example #2

 def test_lxml_integration(self):
     # `self.ae` is presumably an alias for assertEqual, and `tostring`
     # presumably lxml.etree.tostring; neither definition is shown here.
     capsule = html_parser.parse(b'<p id=1>xxx')
     root = etree.adopt_external_document(capsule).getroot()
     self.ae(list(root.iterchildren('body')), list(root.xpath('./body')))
     self.ae(root.find('body/p').text, 'xxx')
     self.ae(root.xpath('//@id'), ['1'])
     # Test that lxml is not copying the doc internally: a change made
     # through lxml must show up in a clone of the original capsule.
     root.set('attr', 'abc')
     cap2 = html_parser.clone_doc(capsule)
     root2 = etree.adopt_external_document(cap2).getroot()
     self.ae(tostring(root), tostring(root2))

Example #3

    def parse_html(self, fh: IO) -> Dict[str, Any]:
        '''Return head and content elements of the document.'''
        capsule = html_parser.parse(fh.read(), maybe_xhtml=True)
        doc = etree.adopt_external_document(capsule).getroot()

        result = {}
        result['head'] = doc.cssselect('head')[0]

        # Try each selector in order and keep the first one that matches.
        for candidate in ('.main-column .section', '.main__content'):
            elements = doc.cssselect(candidate)
            if elements:
                result['main_content'] = elements[0]
                break

        if 'main_content' not in result:
            raise ValueError('No main content element found')

        return result
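
The fallback lookup in Example #3 (try several selectors in order, take the first match, raise if none match) can be tried without third-party packages. The following is a minimal sketch only: it swaps in the standard library's xml.etree.ElementTree for html5_parser and lxml, so it needs well-formed XHTML input and matches the whole class attribute exactly instead of evaluating CSS selectors. The names first_match, parse_xhtml and SAMPLE are hypothetical and do not come from the projects above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def first_match(root: ET.Element, class_names: Iterable[str]) -> ET.Element:
    """Return the first element whose class attribute equals a candidate.

    Candidates are tried in order, mirroring the loop in Example #3.
    """
    for name in class_names:
        # ElementTree has no CSS selectors; this XPath subset compares
        # the complete class attribute value, not individual class tokens.
        elements = root.findall(f".//*[@class='{name}']")
        if elements:
            return elements[0]
    raise ValueError('No main content element found')


def parse_xhtml(text: str) -> Dict[str, ET.Element]:
    """Return the head and main-content elements of an XHTML string."""
    root = ET.fromstring(text)
    head = root.find('head')
    if head is None:
        raise ValueError('No head element found')
    return {
        'head': head,
        'main_content': first_match(root, ('section', 'main__content')),
    }


# Hypothetical sample input: only the second candidate class is present.
SAMPLE = (
    '<html><head><title>t</title></head>'
    '<body><div class="main__content">hello</div></body></html>'
)

result = parse_xhtml(SAMPLE)
print(result['head'].tag, result['main_content'].text)
```

Raising ValueError when no candidate matches, as Example #3 does, gives the caller a clearer error than the IndexError that Example #1 would produce in the same situation.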