def wc_mapper(document):
    """Map step for word counting: yield a (word, 1) pair per word in *document*.

    The 1 marks the word's presence so a reducer can sum the pairs.
    NOTE(review): `tokenize` appears to return the *distinct* words of a
    document (per the original author's remark), so a word repeated within
    one document still contributes only a single pair — confirm against
    tokenize's definition.
    """
    for token in tokenize(document):
        yield (token, 1)
def words_per_user_mapper(status_update):
    """Map step: yield (username, (word, 1)) for each word in an update's text.

    Keying on the user lets a reducer find each user's most common words.
    NOTE(review): if `tokenize` de-duplicates words per update, this measures
    how many *updates* used a word rather than raw word frequency — a
    deliberate design choice per the original author's comment.
    """
    author = status_update['username']
    for token in tokenize(status_update['text']):
        yield (author, (token, 1))
def word_count(documents):
    """Count words across *documents* the plain way (no MapReduce).

    Serves as the non-distributed baseline for the mapper/reducer versions:
    tokenize each document and accumulate every resulting word into a
    single Counter.
    """
    tally = Counter()
    for doc in documents:
        tally.update(tokenize(doc))
    return tally