import numpy as np
import transformers


def regular_encode(texts: list,
                   tokenizer: transformers.AutoTokenizer,
                   maxlen: int = 512,
                   multi_class: bool = True):
    """
    Encode sentences for input to Transformer models.

    :param texts: list of strings to be encoded
    :param tokenizer: tokenizer for encoding
    :param maxlen: maximum length (in tokens) of each encoded sequence;
        also used as the character budget for the custom truncation below
    :param multi_class: if True, the default truncation is applied. If False,
        implies auxiliary input and custom truncation is applied.
    :return: numpy array of encoded strings
    """
    # TODO: Intersphinx link to transformers.AutoTokenizer is failing.
    #  What's wrong with my docs/source/conf.py?
    if not multi_class:
        # If len > maxlen, truncate the text to maxlen - 8 characters and
        # append the 8-character auxiliary input from the end of the string.
        texts = [
            text[:maxlen - 8] + text[-8:] if len(text) > maxlen else text
            for text in texts
        ]
    enc_di = tokenizer.batch_encode_plus(
        texts,
        return_attention_mask=False,
        return_token_type_ids=False,
        padding='max_length',  # replaces the deprecated pad_to_max_length=True
        # sep_token='[SEP]',
        max_length=maxlen,
        truncation=True)  # Is this what we want?
    return np.array(enc_di['input_ids'])
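# For illustration, a hypothetical sketch of the multi_class=False
# pre-truncation (maxlen=16 is chosen small for readability; the string
# below is invented):
#
#     text = '0123456789abcdefghij'    # 20 characters > maxlen
#     text[:16 - 8] + text[-8:]        # '01234567' + 'cdefghij'
#     # -> '01234567cdefghij' (16 characters; the 8-character tail survives)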
def regular_encode(texts: list,
                   tokenizer: transformers.AutoTokenizer,
                   maxlen: int = 512):
    """
    Encode sentences for input to Transformer models.

    :param texts: list of strings to be encoded
    :param tokenizer: tokenizer for encoding
    :param maxlen: maximum length (in tokens) of each encoded sequence
    :return: numpy array of encoded strings
    """
    # TODO: Intersphinx link to transformers.AutoTokenizer is failing.
    #  What's wrong with my docs/source/conf.py?
    enc_di = tokenizer.batch_encode_plus(
        texts,
        return_attention_mask=False,
        return_token_type_ids=False,
        padding='max_length',  # replaces the deprecated pad_to_max_length=True
        # sep_token='[SEP]',
        max_length=maxlen,
        truncation=True)  # Is this what we want?
    return np.array(enc_di['input_ids'])
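# A minimal usage sketch, assuming the Hugging Face `transformers` and
# `numpy` packages are installed; the checkpoint name is an arbitrary
# example, not one this module prescribes.
if __name__ == '__main__':
    hf_tokenizer = transformers.AutoTokenizer.from_pretrained(
        'distilbert-base-uncased')
    # Each row is padded/truncated to maxlen token ids.
    encoded = regular_encode(['first sentence', 'second sentence'],
                             hf_tokenizer, maxlen=32)
    print(encoded.shape)  # expected: (2, 32)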