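Python string_pieces Examples

Programming language: Python

Namespace/package name: metanl.token_utils

Method/function: string_pieces

Examples on hotexamples.com: 5

Python string_pieces - 5 examples found. These are the top-rated real-world Python examples of metanl.token_utils.string_pieces, extracted from open source projects.

Example #1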

File: allPythonContent.py Project: Mondego/pyreco

    def analyze(self, text):
        """
        Runs a line of text through MeCab, and returns the results as a
        list of lists ("records") that contain the MeCab analysis of each
        word.
        """
        try:
            self.process  # make sure things are loaded
            text = render_safe(text).replace('\n', ' ').lower()
            results = []
            for chunk in string_pieces(text):
                self.send_input((chunk + '\n').encode('utf-8'))
                while True:
                    out_line = self.receive_output_line().decode('utf-8')
                    if out_line == 'EOS\n':
                        break

                    word, info = out_line.strip('\n').split('\t')
                    record_parts = [word] + info.split(',')

                    # Pad the record out to have 10 parts if it doesn't
                    record_parts += [None] * (10 - len(record_parts))
                    record = MeCabRecord(*record_parts)

                    # special case for detecting nai -> n
                    if (record.surface == 'ん' and
                        record.conjugation == '不変化型'):
                        # rebuild the record so that record.root is 'nai'
                        record_parts[MeCabRecord._fields.index('root')] = 'ない'
                        record = MeCabRecord(*record_parts)

                    results.append(record)
            return results
        except ProcessError:
            self.restart_process()
            return self.analyze(text)
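
The record-building step inside the loop above can be tried in isolation. The following is a minimal sketch, not the project's own code: `Record` and `parse_output_line` are hypothetical names, and every field name other than `surface`, `conjugation` and `root` (which the example references) is an illustrative guess. It splits one tab-separated output line and pads the result with `None` so the namedtuple always receives ten values.

```python
from collections import namedtuple

# Hypothetical stand-in for MeCabRecord. Only 'surface', 'conjugation' and
# 'root' are taken from the example above; the other names are assumptions.
Record = namedtuple('Record', [
    'surface', 'pos', 'subclass1', 'subclass2', 'subclass3',
    'conjugation', 'form', 'root', 'reading', 'pronunciation',
])


def parse_output_line(out_line):
    """Turn one 'word<TAB>comma,separated,info' line into a Record."""
    word, info = out_line.strip('\n').split('\t')
    parts = [word] + info.split(',')
    # Pad short lines with None so every field has a value.
    # A negative count yields an empty list, so long lines are not padded.
    parts += [None] * (len(Record._fields) - len(parts))
    return Record(*parts)


record = parse_output_line('猫\t名詞,一般\n')
print(record.surface, record.pos, record.root)
```

A line with more than ten parts would raise a `TypeError` in this sketch, since the namedtuple accepts exactly ten values.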

Example #2

    def analyze(self, text):
        """
        Runs a line of text through MeCab, and returns the results as a
        list of lists ("records") that contain the MeCab analysis of each
        word.
        """
        try:
            self.process  # make sure things are loaded
            text = render_safe(text).replace('\n', ' ').lower()
            results = []
            for chunk in string_pieces(text):
                self.send_input((chunk + '\n').encode('utf-8'))
                while True:
                    out_line = self.receive_output_line().decode('utf-8')
                    if out_line == 'EOS\n':
                        break

                    word, info = out_line.strip('\n').split('\t')
                    record_parts = [word] + info.split(',')

                    # Pad the record out to have 10 parts if it doesn't
                    record_parts += [None] * (10 - len(record_parts))
                    record = MeCabRecord(*record_parts)

                    # special case for detecting nai -> n
                    if (record.surface == 'ん'
                            and record.conjugation == '不変化型'):
                        # rebuild the record so that record.root is 'nai'
                        record_parts[MeCabRecord._fields.index('root')] = 'ない'
                        record = MeCabRecord(*record_parts)

                    results.append(record)
            return results
        except ProcessError:
            self.restart_process()
            return self.analyze(text)

Example #3

File: test_tokens.py Project: pombredanne/metanl

def test_string_pieces():
    # Break as close to whitespace as possible
    text = "12 12 12345 123456 1234567-12345678"
    eq_(list(string_pieces(text, 6)), ["12 12 ", "12345 ", "123456", " ", "123456", "7-", "123456", "78"])

Example #4

File: allPythonContent.py Project: Mondego/pyreco

def test_string_pieces():
    # Break as close to whitespace as possible
    text = "12 12 12345 123456 1234567-12345678"
    eq_(list(string_pieces(text, 6)),
        ['12 12 ', '12345 ', '123456', ' ', '123456', '7-', '123456', '78'])

Example #5

File: test_tokens.py Project: perkin6es/metanl

def test_string_pieces():
    # Break as close to whitespace as possible
    text = "12 12 12345 123456 1234567-12345678"
    eq_(list(string_pieces(text, 6)),
        ['12 12 ', '12345 ', '123456', ' ', '123456', '7-', '123456', '78'])
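
The test_string_pieces cases above only show the expected output, so it may help to see one way such chunking could work. The sketch below is an assumption-based reconstruction, not the metanl implementation: `string_pieces_sketch` and `_is_boundary` are hypothetical names, the default `maxlen` is arbitrary, and treating whitespace and Unicode punctuation as break points is a guess inferred from the expected list. Each chunk is cut just after the last boundary character inside the window, with a hard cut at `maxlen` when the window contains no boundary.

```python
import unicodedata


def _is_boundary(ch):
    # Assumption: whitespace and any Unicode punctuation count as break points.
    return ch.isspace() or unicodedata.category(ch).startswith('P')


def string_pieces_sketch(s, maxlen=1024):
    """Yield chunks of `s`, each at most `maxlen` characters long."""
    if maxlen < 1:
        raise ValueError('maxlen must be positive')
    i = 0
    while i < len(s):
        j = min(i + maxlen, len(s))
        if j < len(s):
            # Walk back to the nearest boundary character in the window.
            k = j
            while k > i and not _is_boundary(s[k - 1]):
                k -= 1
            if k > i:
                j = k  # cut right after the boundary character
            # Otherwise no boundary was found: keep the hard cut at maxlen.
        yield s[i:j]
        i = j


text = "12 12 12345 123456 1234567-12345678"
print(list(string_pieces_sketch(text, 6)))
# ['12 12 ', '12345 ', '123456', ' ', '123456', '7-', '123456', '78']
```

Because every chunk is a contiguous slice, joining the pieces always gives back the original string, which is convenient when the chunks are fed one at a time to a tool with a limited input buffer.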