Example #1
import transformers


def build_feature(tokenizer: transformers.BertTokenizer,
                  examples: list,
                  max_length: int = None):
    '''
    @param tokenizer (transformers.BertTokenizer): tokenizer to convert tokens to ids

    @param examples (list): input examples

    @param max_length (int): maximum length at which to truncate an example sequence

    @return examples (list): examples augmented with input features
    '''

    # Fall back to a large cap when no explicit max length is given.
    # This must be an int (not 1e3, a float) since it is used as a slice bound.
    length = max_length if max_length is not None else 1000

    for example in examples:
        # Truncate each token sequence to `length` before converting to ids;
        # slicing already handles sequences shorter than the cap.
        context = tokenizer.convert_tokens_to_ids(example['context'][:length])
        question = tokenizer.convert_tokens_to_ids(example['question'][:length])

        # Combine context and question into one model input with special
        # tokens, segment ids, and an attention mask.
        out = tokenizer.prepare_for_model(context,
                                          question,
                                          return_token_type_ids=True,
                                          return_attention_mask=True)

        example['input_feature'] = out['input_ids']
        example['token_type_ids'] = out['token_type_ids']
        example['attention_mask'] = out['attention_mask']

    return examples
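
A minimal usage sketch, assuming a pretrained tokenizer and SQuAD-style examples whose 'context' and 'question' fields are already token lists (the sample data here is hypothetical):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
examples = [{'context': ['the', 'cat', 'sat', 'on', 'the', 'mat'],
             'question': ['where', 'did', 'the', 'cat', 'sit']}]
features = build_feature(tokenizer, examples, max_length=32)
print(features[0]['input_feature'])   # ids with [CLS]/[SEP] inserted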
Example #2

from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

# `x` is assumed to be a Dataset; SubsetRandomSampler expects a sequence
# of indices, so sample over the whole dataset here. `my_collate` is a
# custom collate_fn (a minimal sketch follows this snippet).
loader = DataLoader(x,
                    batch_size=10,
                    sampler=SubsetRandomSampler(range(len(x))),
                    shuffle=False,  # must be False when a sampler is supplied
                    collate_fn=my_collate)
for i, batch in enumerate(loader):
    print(i)
    print(batch)
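
The loop above assumes a user-defined `my_collate`; a minimal sketch, assuming each dataset item is a variable-length list of token ids padded with id 0:

import torch

def my_collate(batch):
    # Hypothetical collate_fn: pad every sequence in the batch to the
    # length of the longest one, then stack into a single LongTensor.
    max_len = max(len(ids) for ids in batch)
    padded = [ids + [0] * (max_len - len(ids)) for ids in batch]
    return torch.tensor(padded)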

from transformers import BertTokenizer

# Build a tokenizer from a local vocabulary file, registering <BOS>/<EOS>
# as special tokens and capping inputs at 50 tokens (the correct kwarg
# is model_max_length, not model_max_len).
tokenizer = BertTokenizer("data/atis/token.vocab",
                          bos_token="<BOS>",
                          eos_token="<EOS>",
                          model_max_length=50)

y = "<BOS> embedding what is the flight number <EOS>"

tokenizer.SPECIAL_TOKENS_ATTRIBUTES     # names of the configurable special tokens
tokenizer.encode(y)                     # token ids only
ids = tokenizer.encode_plus(y)          # ids plus token_type_ids and attention_mask
tokenizer.prepare_for_model(tokenizer.encode(y), return_tensors="pt")
tokenizer.decode(tokenizer.encode(y))   # round-trip back to text

# Persist the tokenizer configuration and the raw vocabulary.
tokenizer.save_pretrained("data/atis/save")
tokenizer.save_vocabulary("data/atis/save/saved")

# Alternatively, start from a pretrained checkpoint.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased",
                                          bos_token="<BOS>",
                                          eos_token="<EOS>")
tokenizer.tokenize("i like tea")

# The same special tokens can also be registered after construction.
special_tokens = {"bos_token": "<BOS>", "eos_token": "<EOS>"}
tokenizer.add_special_tokens(special_tokens)
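
A quick check, assuming the snippet above has run, that the registered markers survive tokenization as single tokens:

print(tokenizer.tokenize("<BOS> i like tea <EOS>"))
# expected output along the lines of: ['<BOS>', 'i', 'like', 'tea', '<EOS>']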