Python load_chunk Examples

Programming language: Python

Namespace/package name: opencorpora.xml_utils

Method/function: load_chunk

Examples at hotexamples.com: 3

Python load_chunk - 3 examples found. These are the top-rated real-world Python examples of opencorpora.xml_utils.load_chunk, extracted from open source projects.

Example #1

File: reader.py Project: ixtel/opencorpora-tools

 def _get_doc_by_raw_offset(self, doc_id):
     """
     Load document from xml using bytes offset information.
     XXX: this is not tested under Windows.
     """
     bounds = self._get_meta()[str(doc_id)].bounds
     return xml_utils.load_chunk(self.filename, bounds)
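
Example #1 relies on byte offsets stored in `bounds`. The following is a minimal sketch of how a byte-offset read could work, not the library's actual implementation. `ByteBounds`, its `byte_start`/`byte_end` fields, `read_chunk_by_bytes` and `demo_bytes` are hypothetical names introduced only for illustration.

```python
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for the library's bounds object; field names are assumptions.
ByteBounds = namedtuple("ByteBounds", "byte_start byte_end")


def read_chunk_by_bytes(filename, bounds, encoding="utf8"):
    """Return the text stored between two byte offsets of a file."""
    # Binary mode keeps the offsets exact: no newline translation happens on read.
    with open(filename, "rb") as f:
        f.seek(bounds.byte_start)
        raw = f.read(bounds.byte_end - bounds.byte_start)
    return raw.decode(encoding)


def demo_bytes():
    """Write a tiny two-document file and read back only the second document."""
    data = '<text id="1">привет</text>\n<text id="2">мир</text>\n'.encode("utf8")
    with tempfile.NamedTemporaryFile(delete=False, suffix=".xml") as tmp:
        tmp.write(data)
    try:
        start = data.index(b'<text id="2"')
        end = data.index(b"</text>", start) + len(b"</text>")
        return read_chunk_by_bytes(tmp.name, ByteBounds(start, end))
    finally:
        os.remove(tmp.name)
```

Seeking straight to the offset means only the requested bytes are read, which is presumably why the docstring of the line-based variant calls the raw-offset path much faster.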

Example #2

File: reader.py Project: ixtel/opencorpora-tools

 def _get_doc_by_line_offset(self, doc_id):
     """
     Load document from xml using line offset information.
     This is much slower than _get_doc_by_raw_offset but should
     work everywhere.
     """
     bounds = self._get_meta()[str(doc_id)].bounds
     return xml_utils.load_chunk(self.filename, bounds, slow=True)
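
The `slow=True` path is described as working from line offsets. A rough sketch of that idea, under the assumption that the bounds carry a 0-based start line and an exclusive end line, might look like this. `LineBounds`, `read_chunk_by_lines` and `demo_lines` are hypothetical names, not part of opencorpora-tools.

```python
import io
import os
import tempfile
from collections import namedtuple
from itertools import islice

# Hypothetical bounds: 0-based first line, exclusive last line (an assumption).
LineBounds = namedtuple("LineBounds", "line_start line_end")


def read_chunk_by_lines(filename, bounds, encoding="utf8"):
    """Return the lines in [line_start, line_end) joined into one string."""
    # The file is scanned from the beginning, so cost grows with line_start.
    with io.open(filename, encoding=encoding) as f:
        return "".join(islice(f, bounds.line_start, bounds.line_end))


def demo_lines():
    """Write a small file and extract the <text> element spanning lines 1-3."""
    data = b"<annotation>\n<text>\nx\n</text>\n</annotation>\n"
    with tempfile.NamedTemporaryFile(delete=False, suffix=".xml") as tmp:
        tmp.write(data)
    try:
        return read_chunk_by_lines(tmp.name, LineBounds(1, 4))
    finally:
        os.remove(tmp.name)
```

Because every line before the chunk has to be read and discarded, this approach trades speed for independence from exact byte positions.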

Example #3

File: __init__.py Project: jaeroong/opencorpora-tools

    def _compute_document_meta(self):
        """
        Returns documents meta information that can
        be used for fast document lookups. Meta information
        consists of documents titles, categories and positions
        in file.
        """
        meta = compat.OrderedDict()
        bounds_iter = xml_utils.bounds(self.filename,
            r'<text id="(\d+)"[^>]*name="([^"]*)"',
            r'</text>',
        )
        for match, bounds in bounds_iter:
            doc_id, title = str(match.group(1)), match.group(2)
            title = xml_utils.unescape_attribute(title)

            # cache categories
            xml = xml_utils.load_chunk(self.filename, bounds)
            doc = Document(compat.ElementTree.XML(xml.encode('utf8')))

            meta[doc_id] = _DocumentMeta(title, bounds, doc.categories())
        return meta
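
Example #3 pairs each regex match with the bounds of the element it opens. The sketch below shows one way such (match, bounds) pairs could be produced by scanning a file line by line. It is an illustration under stated assumptions, not the real `xml_utils.bounds`: `Span`, `iter_bounds` and `demo_bounds` are hypothetical names, and the sketch assumes at most one opening tag per line and no nesting of the matched element.

```python
import os
import re
import tempfile
from collections import namedtuple

# Hypothetical bounds record holding byte offsets only (an assumption).
Span = namedtuple("Span", "byte_start byte_end")


def iter_bounds(filename, start_re, end_re, encoding="utf8"):
    """Yield (start_match, Span) for every start_re ... end_re region."""
    start_re, end_re = re.compile(start_re), re.compile(end_re)
    offset = 0       # byte offset of the current line's first byte
    pending = None   # (match, byte_start) of the currently open element
    with open(filename, "rb") as f:
        for raw in f:
            line = raw.decode(encoding)
            if pending is None:
                m = start_re.search(line)
                if m:
                    # Convert the character position to a byte position.
                    prefix = len(line[:m.start()].encode(encoding))
                    pending = (m, offset + prefix)
            if pending is not None:
                e = end_re.search(line)
                if e:
                    end = offset + len(line[:e.end()].encode(encoding))
                    yield pending[0], Span(pending[1], end)
                    pending = None
            offset += len(raw)


def demo_bounds():
    """Scan a two-document file with the patterns used in Example #3."""
    data = ('<annotation>\n<text id="7" parent="0" name="Заголовок">\n'
            '<p>x</p>\n</text>\n<text id="8" name="B">\n</text>\n'
            '</annotation>\n').encode("utf8")
    with tempfile.NamedTemporaryFile(delete=False, suffix=".xml") as tmp:
        tmp.write(data)
    try:
        pairs = iter_bounds(tmp.name,
                            r'<text id="(\d+)"[^>]*name="([^"]*)"',
                            r'</text>')
        return [(m.group(1), m.group(2),
                 data[b.byte_start:b.byte_end].decode("utf8"))
                for m, b in pairs]
    finally:
        os.remove(tmp.name)
```

Offsets are tracked in bytes rather than characters so that the resulting spans remain valid for a later `seek` on the same file, even when titles contain multi-byte characters.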