Python Token.part_of_speech examples

Programming language: Python

Namespace/package name: whoosh.analysis.acore

Class/type: Token

Method/function: part_of_speech

Examples at hotexamples.com: 2

Python Token.part_of_speech: 2 examples found. These are the top-rated real-world Python examples of whoosh.analysis.acore.Token.part_of_speech, extracted from open-source projects.
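Judging from Whoosh's source, `Token` is a plain attribute container: its constructor sets a few defaults (`stopped`, `boost`, `removestops`, `mode`) and copies any extra keyword arguments into the instance `__dict__`. That means `part_of_speech` is an ad-hoc attribute that the examples below assign, not a method. The sketch uses a hypothetical stand-in class, `SketchToken`, that approximates this behaviour so it runs without Whoosh installed; it is not Whoosh's own code.

```python
class SketchToken:
    """Hypothetical stand-in approximating whoosh.analysis.acore.Token."""

    def __init__(self, positions=False, chars=False, removestops=True,
                 mode="", **kwargs):
        self.positions = positions
        self.chars = chars
        self.stopped = False
        self.boost = 1.0
        self.removestops = removestops
        self.mode = mode
        # Extra keyword arguments simply become instance attributes
        self.__dict__.update(kwargs)


# part_of_speech can be passed to the constructor...
t = SketchToken(part_of_speech="NOUN")
# ...or assigned afterwards, which is what the examples on this page do
t.part_of_speech = "ADJ"
t.lemma = "big"
print(t.part_of_speech, t.lemma)
```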

Frequently used attributes and methods

text(12)

Token(12)

pos(11)

endchar(9)

startchar(9)

boost(8)

original(7)

stopped(5)

lemma(1)

named_entity(1)

part_of_speech(1)

Example #1

File: TIMEindex.py  Project: fbkarsdorp/takeover

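from whoosh.analysis.acore import Token
from whoosh.compat import text_type

    # Method of a custom tokenizer class (the enclosing class is not shown).
    # Each input line is expected to hold four tab-separated fields:
    # word, lemma, part of speech, named entity.
    def __call__(self, value, positions=False, chars=False, keeporiginal=False,
                 removestops=True, start_pos=0, start_char=0, tokenize=True,
                 mode='', **kwargs):

        assert isinstance(value, text_type), "%s is not unicode" % repr(value)

        t = Token(positions, chars, removestops=removestops, mode=mode,
                  **kwargs)
        offset = 0  # character offset of the current line within value
        # Each line of the input becomes one token
        for i, line in enumerate(value.split('\n')):
            fields = line.strip().split('\t')
            # Fall back to empty annotations when a line is not well-formed
            _, lemma, pos, ne = fields if len(fields) == 4 else ("", "", "", "")
            t.text = fields[0]
            t.lemma = lemma
            t.part_of_speech = pos
            t.named_entity = ne
            t.boost = 1.0
            if keeporiginal:
                t.original = t.text
            t.stopped = False
            if positions:
                t.pos = start_pos + i
            if chars:
                # line is a plain str, not a regex match, so it has no
                # start()/end(); derive the token's span from the offset
                lead = len(line) - len(line.lstrip())
                t.startchar = start_char + offset + lead
                t.endchar = t.startchar + len(t.text)
            offset += len(line) + 1  # +1 for the '\n' separator
            yield t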
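The parsing and character-offset logic of the loop above can be checked in isolation. The sketch below is a hypothetical, Whoosh-free helper, `parse_tagged_lines`, that mirrors that loop but yields plain tuples `(text, lemma, pos, ne, startchar, endchar)` instead of mutating a `Token`; it assumes the same four-field, tab-separated line format.

```python
from typing import Iterator, Tuple

Row = Tuple[str, str, str, str, int, int]


def parse_tagged_lines(value: str, start_char: int = 0) -> Iterator[Row]:
    """Hypothetical helper mirroring the tokenizer loop in Example #1."""
    offset = 0  # character offset of the current line within value
    for line in value.split("\n"):
        fields = line.strip().split("\t")
        # Same fallback as the example: empty annotations for malformed lines
        _, lemma, pos, ne = fields if len(fields) == 4 else ("", "", "", "")
        text = fields[0]
        # Leading whitespace removed by strip() still counts toward offsets
        lead = len(line) - len(line.lstrip())
        startchar = start_char + offset + lead
        yield (text, lemma, pos, ne, startchar, startchar + len(text))
        offset += len(line) + 1  # +1 for the '\n' separator


sample = "cats\tcat\tNOUN\tO\nsat\tsit\tVERB\tO"
for row in parse_tagged_lines(sample):
    print(row)
```

Slicing `sample[startchar:endchar]` for any row returns that row's text, which is a quick way to confirm the spans line up with the original input.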

Example #2

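from whoosh.analysis.acore import Token
from whoosh.compat import text_type

    # Method of a custom tokenizer class (the enclosing class is not shown).
    # Each input line is expected to hold four tab-separated fields:
    # word, lemma, part of speech, named entity.
    def __call__(self,
                 value,
                 positions=False,
                 chars=False,
                 keeporiginal=False,
                 removestops=True,
                 start_pos=0,
                 start_char=0,
                 tokenize=True,
                 mode='',
                 **kwargs):

        assert isinstance(value, text_type), "%s is not unicode" % repr(value)

        t = Token(positions,
                  chars,
                  removestops=removestops,
                  mode=mode,
                  **kwargs)
        offset = 0  # character offset of the current line within value
        # Each line of the input becomes one token
        for i, line in enumerate(value.split('\n')):
            fields = line.strip().split('\t')
            # Fall back to empty annotations when a line is not well-formed
            _, lemma, pos, ne = fields if len(fields) == 4 else (
                "", "", "", "")
            t.text = fields[0]
            t.lemma = lemma
            t.part_of_speech = pos
            t.named_entity = ne
            t.boost = 1.0
            if keeporiginal:
                t.original = t.text
            t.stopped = False
            if positions:
                t.pos = start_pos + i
            if chars:
                # line is a plain str, not a regex match, so it has no
                # start()/end(); derive the token's span from the offset
                lead = len(line) - len(line.lstrip())
                t.startchar = start_char + offset + lead
                t.endchar = t.startchar + len(t.text)
            offset += len(line) + 1  # +1 for the '\n' separator
            yield t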
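Both examples create a single `Token` before the loop and yield that same object on every iteration, only overwriting its attributes. A consequence worth knowing: a consumer should read `part_of_speech` (or copy the token) as each token arrives, because collecting the raw generator with `list()` leaves every entry pointing at the last token's values. The sketch below demonstrates this with `types.SimpleNamespace` standing in for `Token`; the generator `reused_tokens` is hypothetical.

```python
import copy
from types import SimpleNamespace


def reused_tokens(tags):
    """Hypothetical generator that, like the examples, reuses one object."""
    t = SimpleNamespace()
    for tag in tags:
        t.part_of_speech = tag
        yield t


tags = ["NOUN", "VERB", "ADJ"]

# Pitfall: every list entry is the same object, holding the final value
naive = [t.part_of_speech for t in list(reused_tokens(tags))]

# Read the attribute while streaming, or copy each token as it arrives
streamed = [t.part_of_speech for t in reused_tokens(tags)]
copied = [copy.copy(t) for t in reused_tokens(tags)]

print(naive)     # ['ADJ', 'ADJ', 'ADJ']
print(streamed)  # ['NOUN', 'VERB', 'ADJ']
print([t.part_of_speech for t in copied])  # ['NOUN', 'VERB', 'ADJ']
```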