Python FreqDist._cumulative_frequencies Examples

Programming Language: Python

Namespace/Package Name: nltk

Class/Type: FreqDist

Method/Function: _cumulative_frequencies

Examples at hotexamples.com: 1

Python FreqDist._cumulative_frequencies - 1 example found. This is a real-world Python example of nltk.FreqDist._cumulative_frequencies extracted from an open source project.

Example #1

File: vocab.py Project: zhangjiekui/quick-nlp

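from collections import Counter
from typing import List, Optional

from nltk import FreqDist

# Assumed alias: the original file's definition of Tokens is not shown here.
Tokens = List[str]


class Vocab:
    def __init__(self,
                 tokens: List[Tokens],
                 special_symbols: Optional[List[str]] = None):
        special_symbols = [] if special_symbols is None else special_symbols
        special_symbols = special_symbols + [
            "<eot>", "<response>", "<eos>", "<unk>", "<pad>", "<bos>"
        ]
        # Count every token that is not a special symbol.
        self.vocab = FreqDist()
        self.cdf = 0.
        for sample in tokens:
            for token in sample:
                if token not in special_symbols:
                    self.vocab[token] += 1

        # N() is the total number of counted occurrences; B() is the number of distinct tokens.
        print(
            f"total token occurrences: {self.vocab.N()}, distinct tokens: {self.vocab.B()}"
        )
        self.itos = []
        self.stoi = {}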

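    def fit(self, num_tokens=15000):
        top_tokens = [token for token, _ in self.vocab.most_common(num_tokens)]
        # _cumulative_frequencies is a private NLTK generator of running counts;
        # exhausting it leaves the total count of the top tokens in cdf.
        cdf = 0.
        for cdf in self.vocab._cumulative_frequencies(top_tokens):
            pass
        # Guard against an empty vocabulary, where N() is 0.
        total = self.vocab.N()
        self.cdf = cdf / total if total else 0.
        print(
            f"cdf of the {num_tokens} most common tokens in vocab {self.cdf}")
        self.itos = ["<unk>", "<pad>", "<eos>", "<bos>"] + top_tokens
        # A Counter returns 0 for missing keys, so unknown tokens map to "<unk>" (index 0).
        self.stoi = Counter(
            {key: index
             for index, key in enumerate(self.itos)})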
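
The coverage figure that fit() derives from the private _cumulative_frequencies generator can also be computed with public calls only. The following is a minimal sketch, not the project's method: the function name coverage is hypothetical, and it uses collections.Counter (which FreqDist subclasses) together with itertools.accumulate so that it runs without NLTK installed.

```python
from collections import Counter
from itertools import accumulate


def coverage(counts: Counter, num_tokens: int) -> float:
    """Fraction of all token occurrences covered by the num_tokens most common tokens.

    Hypothetical helper for illustration; it is not part of NLTK or quick-nlp.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top_counts = [count for _, count in counts.most_common(num_tokens)]
    # accumulate yields running totals, the same kind of sequence the
    # example above reads from _cumulative_frequencies.
    running = list(accumulate(top_counts))
    return running[-1] / total if running else 0.0


counts = Counter("the cat sat on the mat the end".split())
print(coverage(counts, 1))  # "the" accounts for 3 of the 8 tokens
```

Because only the last running total is used, sum(top_counts) would give the same result; accumulate is kept here to mirror the cumulative-frequency loop in the example.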