def test_word_tokenzie(self):
    """Tokenizing a clinical sentence keeps measurements, dates, and punctuation intact."""
    text = "A 2.1 cm tumor (right tongue) noted on 2013-11-11."
    expected = [
        "A", " ", "2.1", " ", "cm", " ", "tumor", " ",
        "(", "right", " ", "tongue", ")", " ",
        "noted", " ", "on", " ", "2013-11-11", ".",
    ]
    actual = list(nlptools.word_tokenize(text))
    self.assertEqual(actual, expected)
def normalize_sent(text):
    """Normalize the casing of a sentence and return it re-joined.

    If the sentence looks like a title (see ``is_title``), every token is
    normalized; otherwise only the first token is normalized and the rest
    are kept verbatim.
    """
    tokens = list(word_tokenize(text))
    if is_title(text):
        # Title-like sentence: normalize every token.
        return ''.join(normalize(tok) for tok in tokens)
    # Regular sentence: only the leading token needs normalizing.
    return ''.join([normalize(tokens[0])] + tokens[1:])
def is_title(text):
    """Return True when *text* looks like a title.

    A token is acceptable in a title when it is the first token, a stop
    word, or does not start with a lowercase ASCII letter. The text is a
    title only if every token is acceptable (an empty text counts as one).
    """
    return all(
        idx == 0
        or tok.lower() in stop_words
        or tok[0] not in string.ascii_lowercase
        for idx, tok in enumerate(word_tokenize(text))
    )
def test_word_tokenize_intergration(self):
    """Round-trip check: concatenating the tokens must reproduce each sentence."""
    for sent in self.sentences:
        rebuilt = ''.join(list(nlptools.word_tokenize(sent)))
        self.assertEqual(rebuilt, sent)
def test_word_tokenzie2(self):
    """Numeric forms (negatives, thousands separators, decimals) stay whole tokens."""
    text = '-999 1,234,000 3.1415'
    expected = [
        '-999', ' ',
        '1,234,000', ' ',
        '3.1415',
    ]
    self.assertEqual(list(nlptools.word_tokenize(text)), expected)
def test_word_tokenzie(self):
    """A clinical sentence tokenizes into words, spaces, parens, a date, and a period."""
    text = 'A 2.1 cm tumor (right tongue) noted on 2013-11-11.'
    expected = [
        'A', ' ', '2.1', ' ', 'cm', ' ', 'tumor', ' ',
        '(', 'right', ' ', 'tongue', ')', ' ',
        'noted', ' ', 'on', ' ', '2013-11-11', '.',
    ]
    result = list(nlptools.word_tokenize(text))
    self.assertEqual(result, expected)
def test_word_tokenize_intergration(self):
    """Tokenization must be lossless: joined tokens equal the input sentence."""
    for sentence in self.sentences:
        tokens = list(nlptools.word_tokenize(sentence))
        self.assertEqual("".join(tokens), sentence)
def test_word_tokenzie2(self):
    """Signed, comma-grouped, and decimal numbers each survive as a single token."""
    text = "-999 1,234,000 3.1415"
    expected = ["-999", " ", "1,234,000", " ", "3.1415"]
    tokens = list(nlptools.word_tokenize(text))
    self.assertEqual(tokens, expected)
def test_word_tokenzie(self):
    """A longer clinical sentence with two measurements tokenizes as expected."""
    text = (
        'A 2.1 x 3.3 cm tumor arising from the tongue base '
        '(right side) is noted.'
    )
    expected = [
        'A', ' ', '2.1', ' ', 'x', ' ', '3.3', ' ', 'cm', ' ',
        'tumor', ' ', 'arising', ' ', 'from', ' ', 'the', ' ',
        'tongue', ' ', 'base', ' ', '(', 'right', ' ', 'side', ')',
        ' ', 'is', ' ', 'noted', '.',
    ]
    self.assertEqual(list(nlptools.word_tokenize(text)), expected)