def build_feature(tokenizer: "transformers.BertTokenizer", examples: list, max_length: int = None):
    '''Convert each example's token lists into model-ready input features.

    @param tokenizer (transformers.BertTokenizer): tokenizer used to convert tokens to ids
    @param examples (list): input examples; each example is a dict with token
        lists under 'context' and 'question'. The dicts are mutated in place.
    @param max_length (int): max number of tokens kept from each sequence
        before encoding; defaults to 1000 when None
    @return examples (list): the same list, each example extended with
        'input_feature', 'token_type_ids' and 'attention_mask'
    '''
    # BUG FIX: the old fallback was the float literal 1e3, and slicing with a
    # float (seq[:1e3]) raises TypeError for any sequence of >= 1000 tokens.
    length = max_length if max_length is not None else 1000
    for example in examples:
        # seq[:length] already clamps at len(seq), so the old
        # min(length, len(seq)) bound is unnecessary.
        context = tokenizer.convert_tokens_to_ids(example['context'][:length])
        question = tokenizer.convert_tokens_to_ids(example['question'][:length])
        # prepare_for_model adds the special tokens and builds the
        # segment ids / attention mask for the (context, question) pair.
        out = tokenizer.prepare_for_model(context, question,
                                          return_token_type_ids=True,
                                          return_attention_mask=True)
        example['input_feature'] = out['input_ids']
        example['token_type_ids'] = out['token_type_ids']
        example['attention_mask'] = out['attention_mask']
    return examples
# Scratch / REPL-style experiments with a DataLoader and a BertTokenizer.
# NOTE(review): this reads like pasted notebook history. `x`, `y`,
# `DataLoader`, `SubsetRandomSampler`, `BertTokenizer` and `my_collate`
# must already be defined/imported for any of it to run, and several
# statements below are order-dependent or no-ops — flagged inline.

# Batch `x` with a custom collate; `x.sampler` is passed as the index
# source for SubsetRandomSampler — presumably `x` is a Dataset-like
# object exposing a `.sampler` of indices; verify against its definition.
a = DataLoader(x, batch_size=10, sampler=SubsetRandomSampler(x.sampler), shuffle=False, collate_fn=my_collate)
for i, s in enumerate(a): print(i) print(s)
# Build a tokenizer from a local vocab with custom BOS/EOS tokens.
# NOTE(review): the HF kwarg is `model_max_length`, not `model_max_len` —
# as written this is likely ignored; confirm against the installed version.
tokenizer = BertTokenizer("data/atis/token.vocab", bos_token="<BOS>", eos_token="<EOS>", model_max_len=50)
# NOTE(review): `y` is only assigned a few statements below — as ordered
# here, these calls would raise NameError on a fresh run.
tokenizer.prepare_for_model(tokenizer.encode(y), return_tensors="pt")
tokenizer.SPECIAL_TOKENS_ATTRIBUTES  # attribute access only; result is discarded
tokenizer.encode(y)       # result discarded
tokenizer.encode_plus(y)  # result discarded
y = "<BOS> embedding what is the flight number <EOS>"
# NOTE(review): this binds the *method* itself, not an encoding —
# missing a call: probably intended `tokenizer.encode_plus(y)`.
ids = tokenizer.encode_plus
tokenizer.decode(tokenizer.encode(y))  # round-trip check; result discarded
# Persist the tokenizer config and the bare vocab separately.
tokenizer.save_pretrained("data/atis/save")
tokenizer.save_vocabulary("data/atis/save/saved")
# Rebind `tokenizer` to the pretrained bert-base-uncased vocab,
# keeping the same custom BOS/EOS special tokens.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", bos_token="<BOS>", eos_token="<EOS>")
tokenizer.tokenize("i like tea")  # result discarded
# NOTE(review): built but never passed to add_special_tokens(...) here.
special_tokens = {"bos_token": "<BOS>", "eos_token": "<EOS>"}