def make_dictionary(): """construct dictionary.""" d = TokenDictionary() alphabet = string.ascii_lowercase for token in alphabet: d.add_symbol(token) d.add_symbol('<space>') d.finalize(padding_factor=1) # don't add extra padding symbols d.space_index = d.indices.get('<space>', -1) return d
def make_dictionary(vocab, non_lang_syms=None):
    """Construct a dictionary from a vocabulary list plus optional
    non-linguistic symbols (e.g., noise markers)."""
    if non_lang_syms is None:  # avoid a mutable default argument
        non_lang_syms = []
    assert isinstance(vocab, list) and isinstance(non_lang_syms, list)
    d = TokenDictionary()
    for token in vocab:
        d.add_symbol(token)
    d.add_symbol('<space>')
    for token in non_lang_syms:
        d.add_symbol(token)
    d.finalize(padding_factor=1)  # don't add extra padding symbols
    d.space_index = d.indices.get('<space>', -1)
    return d
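# A minimal usage sketch (illustrative, not from the original source): the
# non-linguistic symbols below are hypothetical examples.
def _demo_make_dictionary():
    d = make_dictionary(list(string.ascii_lowercase),
                        non_lang_syms=['<noise>', '<laugh>'])
    # space_index should resolve to the index assigned to '<space>'
    assert d.space_index == d.indices['<space>']
    return d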
@classmethod
def build_dictionary(cls, filenames, workers=1, threshold=-1, nwords=-1,
                     padding_factor=8):
    """Build the dictionary from tokenized text files.

    Args:
        filenames (list): list of filenames
        workers (int): number of concurrent workers
        threshold (int): defines the minimum word count
        nwords (int): defines the total number of words in the final
            dictionary, including special symbols
        padding_factor (int): can be used to pad the dictionary size to be
            a multiple of 8, which is important on some hardware
            (e.g., Nvidia Tensor Cores)
    """
    d = TokenDictionary()
    for filename in filenames:
        TokenDictionary.add_file_to_dictionary(
            filename, d, tokenizer.tokenize_line, workers
        )
    d.finalize(threshold=threshold, nwords=nwords, padding_factor=padding_factor)
    return d
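# Usage sketch (illustrative): `task_cls` stands in for whatever task class
# hosts build_dictionary above; the filenames and thresholds are made up.
def _demo_build_dictionary(task_cls):
    return task_cls.build_dictionary(
        ['train.txt', 'valid.txt'], workers=4, threshold=5, padding_factor=8
    )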