import numpy as np
import torch
from bpemb import BPEmb

# NOTE: TransformerClassifier, CNN, and `device` are assumed to be defined
# elsewhere in this project, e.g.:
# device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


def get_transformer(ff_dim: int, n_layers: int, n_heads: int,
                    dropout_prob: float):
    """
    Creates a new transformer and tokenizer using the given parameters
    :param ff_dim: hidden size of the position-wise feed-forward layers
    :param n_layers: number of transformer encoder layers
    :param n_heads: number of attention heads
    :param dropout_prob: dropout probability
    :return: the model and its tokenizer
    """
    # Load the English BPEmb model with a 25k-subword vocabulary and 300-dim embeddings
    tokenizer = BPEmb(lang='en', dim=300, vs=25000)
    # Extract the embeddings and append a zero vector as the embedding for our extra [PAD] token
    pretrained_embeddings = np.concatenate(
        [tokenizer.emb.vectors,
         np.zeros(shape=(1, 300))], axis=0)
    # Extract the vocab and add an extra [PAD] token
    vocabulary = tokenizer.emb.index2word + ['[PAD]']
    tokenizer.pad_token_id = len(vocabulary) - 1

    model = TransformerClassifier(
        torch.tensor(pretrained_embeddings, dtype=torch.float),
        ff_dim=ff_dim,
        d_model=300,
        n_heads=n_heads,
        n_layers=n_layers,
        dropout_prob=dropout_prob).to(device)

    return model, tokenizer
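

# A minimal usage sketch (an assumption, not part of the original source):
# it presumes TransformerClassifier's forward() takes a (batch, seq_len)
# LongTensor of subword ids and returns per-class logits. The hyperparameter
# values below are illustrative only; n_heads=6 is picked to divide
# d_model=300 evenly, as standard multi-head attention requires.
def _demo_transformer():
    model, tokenizer = get_transformer(ff_dim=512, n_layers=2, n_heads=6,
                                       dropout_prob=0.1)
    # Tokenize with BPEmb, then pad to a fixed length with our [PAD] id
    ids = tokenizer.encode_ids("this movie was surprisingly good")
    max_len = 32
    ids = ids[:max_len] + [tokenizer.pad_token_id] * (max_len - len(ids))
    batch = torch.tensor([ids], dtype=torch.long, device=device)
    model.eval()
    with torch.no_grad():
        logits = model(batch)
    print(logits)
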
def get_cnn(in_channels, out_channels, kernel_heights, stride, padding, dropout_prob):
    """
    Creates a new CNN and tokenizer using the given parameters
    :param in_channels: number of input channels to the convolution layers
    :param out_channels: number of filters per kernel height
    :param kernel_heights: convolution kernel heights (n-gram window sizes)
    :param stride: convolution stride
    :param padding: convolution padding
    :param dropout_prob: dropout probability
    :return: the model and its tokenizer
    """
    # Load the English BPEmb model with a 25k-subword vocabulary and 300-dim embeddings
    tokenizer = BPEmb(lang='en', dim=300, vs=25000)
    # Extract the embeddings and append a zero vector as the embedding for our extra [PAD] token
    pretrained_embeddings = np.concatenate([tokenizer.emb.vectors, np.zeros(shape=(1, 300))], axis=0)
    # Extract the vocab and add an extra [PAD] token
    vocabulary = tokenizer.emb.index2word + ['[PAD]']
    tokenizer.pad_token_id = len(vocabulary) - 1

    model = CNN(
        torch.tensor(pretrained_embeddings, dtype=torch.float),
        n_labels=2,
        in_channels=in_channels,
        out_channels=out_channels,
        kernel_heights=kernel_heights,
        stride=stride,
        padding=padding,
        dropout=dropout_prob
    ).to(device)

    return model, tokenizer
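

# A minimal usage sketch (an assumption, not part of the original source):
# it presumes CNN's forward() takes a (batch, seq_len) LongTensor of subword
# ids. The argument values mirror a typical Kim-style text CNN (single input
# channel, 100 filters each over 3/4/5-gram windows) and are illustrative only.
def _demo_cnn():
    model, tokenizer = get_cnn(in_channels=1, out_channels=100,
                               kernel_heights=[3, 4, 5], stride=1,
                               padding=0, dropout_prob=0.2)
    # Tokenize with BPEmb, then pad to a fixed length with our [PAD] id
    ids = tokenizer.encode_ids("an unwatchable mess of a film")
    max_len = 32
    ids = ids[:max_len] + [tokenizer.pad_token_id] * (max_len - len(ids))
    batch = torch.tensor([ids], dtype=torch.long, device=device)
    model.eval()
    with torch.no_grad():
        logits = model(batch)
    print(logits)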