Python Wordlist Examples

Programming language: Python

Namespace/package name: metanl.wordlist

Class/type: Wordlist

Examples on hotexamples.com: 8

Python Wordlist - 8 examples found. These are the top-rated real-world Python examples of metanl.wordlist.Wordlist extracted from open source projects.

Frequently used methods

load(4)

Example #1

File: english.py  Project: Brainsciences/metanl

def word_frequency(word, default_freq=0):
    """
    Looks up the word's frequency in a modified version of the Google Books
    1-grams list.

    The characters may be in any case (they'll be case-smashed
    to uppercase) and may include non-ASCII letters in UTF-8 or Unicode.

    Words appear in the list if they meet these criteria, which improve the
    compactness and accuracy of the list:

    - They consist entirely of letters, digits and/or ampersands
    - They contain at least one ASCII letter
    - They appear at least 1000 times in Google Books OR
      (they appear at least 40 times in Google Books and also appear in
      Wiktionary or WordNet)
    
    Apostrophes are assumed to be at the edge of the word,
    in which case they'll be stripped like they were in the Google data, or
    in the special token "n't" which is treated as "not". This matches the
    output of the tokenize() function.

    >>> word_frequency('normalization')
    223058.0

    >>> word_frequency('budap', default_freq=100.)
    100.0
    """
    freqs = Wordlist.load('google-unigrams.txt')
    if " " in word:
        raise ValueError("word_frequency can only look up single words, but %r contains a space" % word)
    word = preprocess_text(word.strip("'")).upper()
    if word == "N'T":
        word = 'NOT'
    return freqs.get(word, default_freq)
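
The lookup in Example #1 comes down to a few steps: reject multi-word input, strip edge apostrophes, normalize and upper-case, map "N'T" to "NOT", then do a dictionary lookup with a default. The sketch below reproduces those steps without metanl. It rests on two assumptions: a toy dict (the hypothetical TOY_FREQS) stands in for Wordlist.load('google-unigrams.txt'), and Unicode NFKC normalization stands in for preprocess_text, whose exact behavior is not shown here.

```python
import unicodedata
from typing import Mapping

# Hypothetical toy table standing in for Wordlist.load('google-unigrams.txt').
# Only the NORMALIZATION value comes from Example #1's doctest; NOT is made up.
TOY_FREQS = {"NORMALIZATION": 223058.0, "NOT": 1000.0}


def normalize_english(word: str) -> str:
    """Normalize a single token the way Example #1 does before lookup."""
    if " " in word:
        raise ValueError(
            "word_frequency can only look up single words, "
            "but %r contains a space" % word)
    # Assumed stand-in for metanl's preprocess_text: NFKC normalization.
    word = unicodedata.normalize("NFKC", word.strip("'")).upper()
    # strip("'") only removes edge apostrophes, so "n't" arrives here as "N'T".
    if word == "N'T":
        word = "NOT"
    return word


def word_frequency(word: str, freqs: Mapping[str, float] = TOY_FREQS,
                   default_freq: float = 0) -> float:
    return freqs.get(normalize_english(word), default_freq)
```

Example #4 follows the same steps but lowercases instead, so whichever case convention a version uses must match how the word list's keys were stored.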

Example #2

File: snowball.py  Project: Brainsciences/metanl

    def word_frequency(self, word, default_freq=0):
        """
        Looks up the word's frequency in the Leeds Internet corpus for the
        appropriate language.

        FIXME: this returns 0 for words that stem differently in FreeLing when
        we use FreeLing frequencies, and that's most of the words
        """
        freqs = Wordlist.load('leeds-internet-%s.txt' % self.lang)
        word = self.snowball_stem(word)
        if " " in word:
            raise ValueError("word_frequency can only look up single words, but %r contains a space" % word)
        word = preprocess_text(word.strip("'")).lower()
        return freqs.get(word, default_freq)

Example #3

    def word_frequency(self, word, default_freq=0):
        """
        Looks up the word's frequency in the Leeds Internet corpus for the
        appropriate language.

        FIXME: this returns 0 for words that stem differently in FreeLing when
        we use FreeLing frequencies, and that's most of the words
        """
        freqs = Wordlist.load('leeds-internet-%s.txt' % self.lang)
        word = self.snowball_stem(word)
        if " " in word:
            raise ValueError(
                "word_frequency can only look up single words, but %r contains a space"
                % word)
        word = preprocess_text(word.strip("'")).lower()
        return freqs.get(word, default_freq)
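
The FIXME in Examples #2 and #3 describes a failure mode: the word is stemmed at lookup time, so if the frequency list was keyed by a different stemmer's output, the lookup silently falls back to default_freq. The sketch below reproduces that failure mode. It is only an illustration: the two stemmers (strip_final_s, strip_vowel_s) and TOY_TABLE are hypothetical toys, not Snowball or FreeLing, and the stemmer is passed in as a parameter instead of being a method.

```python
from typing import Callable, Mapping


def stemmed_frequency(word: str, stem: Callable[[str], str],
                      freqs: Mapping[str, float],
                      default_freq: float = 0) -> float:
    """Stem, then normalize and look up, in the order Examples #2 and #3 use."""
    word = stem(word)
    if " " in word:
        raise ValueError(
            "word_frequency can only look up single words, "
            "but %r contains a space" % word)
    word = word.strip("'").lower()
    return freqs.get(word, default_freq)


# Hypothetical toy stemmers that disagree on words like "casas".
def strip_final_s(word: str) -> str:
    return word[:-1] if word.endswith("s") else word


def strip_vowel_s(word: str) -> str:
    return word[:-2] if word.endswith(("as", "es", "os")) else word


# Toy table whose keys were produced with strip_final_s.
TOY_TABLE = {"casa": 120.0}
```

With TOY_TABLE, stemming "casas" with strip_final_s finds "casa", while strip_vowel_s produces "cas" and returns the default. Nothing fails loudly, which is why the FIXME is easy to miss.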

Example #4

File: english.py  Project: tazjel/metanl

def word_frequency(word, default_freq=0):
    """
    Looks up the word's frequency in a modified version of the Google Books
    1-grams list.

    The characters may be in any case (they'll be case-smashed
    to lowercase) and may include non-ASCII letters in UTF-8 or Unicode.

    Words appear in the list if they meet these criteria, which improve the
    compactness and accuracy of the list:

    - They consist entirely of letters, digits and/or ampersands
    - They contain at least one ASCII letter
    - They appear at least 1000 times in Google Books OR
      (they appear at least 40 times in Google Books and also appear in
      Wiktionary or WordNet)

    Apostrophes are assumed to be at the edge of the word,
    in which case they'll be stripped like they were in the Google data, or
    in the special token "n't" which is treated as "not". This matches the
    output of the tokenize() function.

    >>> word_frequency('normalization')
    223058.0

    >>> word_frequency('budap', default_freq=100.)
    100.0
    """
    freqs = Wordlist.load('google-unigrams.txt')
    if " " in word:
        raise ValueError("word_frequency can only look up single words, "
                         "but %r contains a space" % word)
    word = preprocess_text(word.strip("'")).lower()
    if word == "n't":
        word = 'not'
    return freqs.get(word, default_freq)

Example #5

File: japanese.py  Project: tazjel/metanl

def get_wordlist():
    return Wordlist.load('leeds-internet-ja.txt')

Example #6

File: english.py  Project: tazjel/metanl

def get_wordlist():
    return Wordlist.load('google-unigrams.txt')

Example #7

File: japanese.py  Project: Web5design/metanl

def get_wordlist():
    return Wordlist.load("leeds-internet-ja.txt")
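
The word_frequency functions in Examples #1 through #4 call Wordlist.load on every lookup. Re-parsing a large list each time would therefore be costly. These examples do not show whether or how Wordlist.load caches its result.

The sketch below shows one way to get load-once behavior behind a get_wordlist-style helper. It makes two assumptions: the hypothetical FAKE_DATA_FILES dict stands in for the data files, and the format is tab-separated word/frequency lines. The real on-disk format is not shown here. load_wordlist is likewise a hypothetical name.

```python
from functools import lru_cache
from types import MappingProxyType
from typing import Mapping

# Hypothetical in-memory stand-in for metanl's data files; the real format and
# values are not shown in these examples (tab-separated lines are assumed).
FAKE_DATA_FILES = {
    "leeds-internet-ja.txt": "です\t500.0\nの\t1000.0\n",
}


@lru_cache(maxsize=None)
def load_wordlist(filename: str) -> Mapping[str, float]:
    """Parse a word list once per filename; repeat calls return the cached result."""
    freqs = {}
    for line in FAKE_DATA_FILES[filename].splitlines():
        word, freq = line.split("\t")
        freqs[word] = float(freq)
    # Read-only view so callers cannot mutate the shared cached copy.
    return MappingProxyType(freqs)


def get_wordlist() -> Mapping[str, float]:
    return load_wordlist("leeds-internet-ja.txt")
```

lru_cache keys the cache on the filename, so each list is parsed at most once per process. The MappingProxyType wrapper still supports .get(word, default) lookups but rejects item assignment.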
