Python tokenize_list examples

Programming language: Python

Namespace/package name: metanl.general

Method/function: tokenize_list

Examples at hotexamples.com: 7

Python tokenize_list: 7 examples found. These are real-world examples of metanl.general.tokenize_list in Python, extracted from open-source projects.
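
The page never shows tokenize_list itself. As a rough mental model only, the sketch below splits text into word and punctuation tokens with Python's re module. The name simple_tokenize_list and its regex are hypothetical stand-ins, and metanl's real tokenizer may treat contractions, quotes and Unicode differently.

```python
import re
from typing import List

# Hypothetical stand-in for metanl.general.tokenize_list, NOT the library's code.
# A word (letters, digits, underscores, with optional internal apostrophes such as
# "aren't") becomes one token; any other non-space character becomes its own token.
_TOKEN_RE = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def simple_tokenize_list(text: str) -> List[str]:
    """Split text into a list of word and punctuation tokens."""
    return _TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(simple_tokenize_list("The dogs aren't barking, are they?"))
```

Returning a list rather than a joined string is what lets the functions in the examples iterate over tokens directly.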

Example #1

File: snowball.py (project: Brainsciences/metanl)

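from metanl.general import tokenize_list


# Method excerpted from a class in snowball.py.
def normalize_list(self, text):
    """
    Get a list of word stems that appear in the text. Stopwords and an initial
    'to' will be stripped.
    """
    # Filter the raw tokens with good_lemma, then stem each survivor.
    pieces = [self.snowball_stem(word) for word in tokenize_list(text)
              if self.good_lemma(word)]
    # Fallback: return the original text (a string, not a list) if nothing survives.
    # This version has no explicit check for a leading 'to'.
    if not pieces:
        return text
    return pieces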
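
The method above filters raw tokens, stems the survivors, and falls back to the input text. The following self-contained sketch reproduces that control flow with standard-library code only. The stopword set, tokenize, good_lemma and toy_stem are hypothetical simplifications; in particular, toy_stem is a crude suffix stripper, not the Snowball algorithm.

```python
import re
from typing import List, Union

# Hypothetical stand-ins for the snowball.py helpers; illustrative only.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "are"}


def tokenize(text: str) -> List[str]:
    # Lowercased word tokens; punctuation other than apostrophes is dropped.
    return re.findall(r"[a-z0-9']+", text.lower())


def good_lemma(word: str) -> bool:
    # Keep any token that is not a stopword.
    return word not in STOPWORDS


def toy_stem(word: str) -> str:
    # Strip the first matching suffix if at least 3 characters remain.
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize_list(text: str) -> Union[List[str], str]:
    # Same shape as Example #1: filter each raw token, stem the survivors,
    # and return the unchanged input string if nothing is left.
    pieces = [toy_stem(w) for w in tokenize(text) if good_lemma(w)]
    if not pieces:
        return text
    return pieces


if __name__ == "__main__":
    print(normalize_list("The dogs are barking"))
    print(normalize_list("To the"))
```

Note the return type: a list of stems normally, but the unchanged input string when every token is filtered out, so normalize_list("To the") hands back "To the" as-is, original casing included.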

Example #2

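# Same logic as Example #1, wrapped across lines.
def normalize_list(self, text):
    """
    Get a list of word stems that appear in the text. Stopwords and an initial
    'to' will be stripped.
    """
    # Keep tokens that pass good_lemma, stemming each one with snowball_stem.
    pieces = [
        self.snowball_stem(word) for word in tokenize_list(text)
        if self.good_lemma(word)
    ]
    # Fall back to the unchanged input string when no pieces remain.
    if not pieces:
        return text
    return pieces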

Example #3

File: english.py (project: Brainsciences/metanl)

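from metanl.general import tokenize_list
# morphy_stem and good_lemma are defined or imported elsewhere in english.py.


def normalize_list(text):
    """
    Get a list of word stems that appear in the text. Stopwords and an initial
    'to' will be stripped.
    """
    # Stem every token first, then drop stems that good_lemma rejects.
    pieces = [morphy_stem(word) for word in tokenize_list(text)]
    pieces = [piece for piece in pieces if good_lemma(piece)]
    # Fall back to the raw text; note that this is a string, not a list.
    if not pieces:
        return text
    # Drop a leading 'to' (for example, the infinitive marker in "to run").
    if pieces[0] == 'to':
        pieces = pieces[1:]
    return pieces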
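
Example #3 applies good_lemma after stemming, whereas Examples #1 and #2 apply it before. The order matters whenever a stem lands in the stopword set. The sketch below uses a hypothetical lemma table in which "are" maps to "be", plus a stopword set containing "be"; neither reflects metanl's actual data.

```python
from typing import List

# Hypothetical toy lemma table and stopword set, for illustration only.
TOY_LEMMAS = {"is": "be", "are": "be", "dogs": "dog", "running": "run"}
STOPWORDS = {"be", "the"}


def stem(word: str) -> str:
    # Look the word up in the toy table; unknown words are their own stem.
    return TOY_LEMMAS.get(word, word)


def good_lemma(word: str) -> bool:
    return word not in STOPWORDS


def filter_then_stem(tokens: List[str]) -> List[str]:
    # Order used in Examples #1 and #2: test the raw token, then stem it.
    return [stem(t) for t in tokens if good_lemma(t)]


def stem_then_filter(tokens: List[str]) -> List[str]:
    # Order used in Example #3: stem first, then test the stem.
    pieces = [stem(t) for t in tokens]
    pieces = [p for p in pieces if good_lemma(p)]
    # Then drop a single leading 'to', as Example #3 does.
    if pieces and pieces[0] == "to":
        pieces = pieces[1:]
    return pieces


if __name__ == "__main__":
    tokens = ["to", "dogs", "are", "running"]
    print(filter_then_stem(tokens))
    print(stem_then_filter(tokens))
```

With filter-first ordering, "are" passes the check as a raw word and then leaks into the output as the stopword "be"; stemming first catches it.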

Example #4

File: english.py (project: tazjel/metanl)

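import nltk
from metanl.general import tokenize_list
# preprocess_text, morphy_stem and BRACKET_DIC are defined or imported elsewhere in english.py.
# nltk.pos_tag needs NLTK's tagger model, installed beforehand with nltk.download().


def tag_and_stem(text):
    """
    Returns a list of (stem, tag, token) triples:

    - stem: the word's uninflected form
    - tag: the word's part of speech
    - token: the original word, so we can reconstruct it later
    """
    tokens = tokenize_list(preprocess_text(text))
    tagged = nltk.pos_tag(tokens)
    out = []
    for token, tag in tagged:
        if token in BRACKET_DIC:
            # Bracket tokens are not stemmed; their tag is taken from BRACKET_DIC.
            out.append((token, BRACKET_DIC[token], token))
        else:
            stem = morphy_stem(token, tag)
            out.append((stem, tag, token))
    return out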
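
Keeping the original token as the third field of each triple makes it possible to rebuild the input. The sketch below mimics the loop in tag_and_stem with hypothetical lookup tables in place of nltk.pos_tag and morphy_stem. Unlike a real tagger, the tag table ignores context, and the BRACKET_DIC contents here are an assumption, since the page does not show metanl's actual mapping.

```python
from typing import List, Tuple

# Hypothetical stand-ins: a fixed tag table instead of nltk.pos_tag, a toy
# lemma table instead of morphy_stem, and an assumed BRACKET_DIC.
TOY_TAGS = {"the": "DT", "dogs": "NNS", "barked": "VBD"}
TOY_LEMMAS = {"dogs": "dog", "barked": "bark"}
BRACKET_DIC = {"(": "(", ")": ")"}  # assumption: each bracket is its own tag

Triple = Tuple[str, str, str]


def toy_tag_and_stem(tokens: List[str]) -> List[Triple]:
    out: List[Triple] = []
    for token in tokens:
        if token in BRACKET_DIC:
            # Brackets skip stemming: the stem is the token itself.
            out.append((token, BRACKET_DIC[token], token))
        else:
            tag = TOY_TAGS.get(token, "NN")  # arbitrary default tag
            out.append((TOY_LEMMAS.get(token, token), tag, token))
    return out


def reconstruct(triples: List[Triple]) -> str:
    # The third field keeps the original token, so the input can be rebuilt.
    return " ".join(token for _stem, _tag, token in triples)


if __name__ == "__main__":
    triples = toy_tag_and_stem(["the", "dogs", "(", "loudly", ")", "barked"])
    print(triples)
    print(reconstruct(triples))
```

Joining with single spaces only approximates the original spacing (the brackets come back as "( loudly )"); a faithful rebuild would need a detokenizer.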

Example #6

File: english.py (project: tazjel/metanl)

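from metanl.general import tokenize_list
# preprocess_text, morphy_stem and good_lemma are defined or imported elsewhere in english.py.
# The u'' reprs in the doctests below are Python 2 output.


def normalize_list(text):
    """
    Get a list of word stems that appear in the text. Stopwords and an initial
    'to' will be stripped, unless this leaves nothing in the stem.

    >>> normalize_list('the dog')
    [u'dog']
    >>> normalize_list('big dogs')
    [u'big', u'dog']
    >>> normalize_list('the')
    [u'the']
    """
    text = preprocess_text(text)
    pieces = [morphy_stem(word) for word in tokenize_list(text)]
    pieces = [piece for piece in pieces if good_lemma(piece)]
    # Unlike Example #3, the fallback is wrapped in a list, so the result is always a list.
    if not pieces:
        return [text]
    # This runs after the emptiness check: if 'to' is the only piece, the result is [].
    if pieces[0] == 'to':
        pieces = pieces[1:]
    return pieces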
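
The docstring examples above are doctests, but outputs such as [u'dog'] are Python 2 reprs. doctest compares the printed output literally, and a Python 3 str prints as 'dog', so those three examples would be reported as failures there. The sketch below reruns the same control flow with Python 3 expectations, using hypothetical toy replacements for preprocess_text, tokenize_list, morphy_stem and good_lemma, and adds one case for a leading 'to'.

```python
import doctest
from typing import List

# Hypothetical toy stand-ins; only the control flow mirrors Example #6.
STOPWORDS = {"the", "a", "an"}
TOY_LEMMAS = {"dogs": "dog"}


def normalize_list(text: str) -> List[str]:
    """
    >>> normalize_list('the dog')
    ['dog']
    >>> normalize_list('big dogs')
    ['big', 'dog']
    >>> normalize_list('the')
    ['the']
    >>> normalize_list('to run')
    ['run']
    """
    text = text.strip()  # stand-in for preprocess_text
    pieces = [TOY_LEMMAS.get(word, word) for word in text.split()]
    pieces = [piece for piece in pieces if piece not in STOPWORDS]
    if not pieces:
        return [text]  # always a list, unlike Example #3's bare string
    if pieces[0] == "to":
        pieces = pieces[1:]
    return pieces


if __name__ == "__main__":
    # Prints nothing when every docstring example passes.
    doctest.testmod()
```

One consequence of the ordering: normalize_list('to') returns [] in this toy, because the emptiness check runs before the 'to' strip. Example #6 uses the same ordering, so whether the real function can break its "unless this leaves nothing" promise depends on whether good_lemma already rejects 'to'.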
