Python _generate_subtokens Examples

Programming language: Python

Namespace/package name: official.transformer.utils.tokenizer

Method/function: _generate_subtokens

Examples at hotexamples.com: 2

Python _generate_subtokens - 2 code examples found. These are the top-rated real-world Python examples of official.transformer.utils.tokenizer._generate_subtokens, all extracted from open source projects.

Code example #1

  def test_generate_subtokens(self):
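    # Whole-token frequencies used as the input corpus statistics
    token_counts = {"ab": 1, "bc": 3, "abc": 5}
    alphabet = set("abc_")
    # min_count is larger than every count in token_counts
    min_count = 100
    num_iterations = 1
    reserved_tokens = ["reserved", "tokens"]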

    vocab_list = tokenizer._generate_subtokens(
        token_counts, alphabet, min_count, num_iterations, reserved_tokens)

    # Check that reserved tokens are at the front of the list
    self.assertEqual(vocab_list[:2], reserved_tokens)

    # Check that each character in alphabet is in the vocab list
    for c in alphabet:
      self.assertIn(c, vocab_list)
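
The two properties this test checks (reserved tokens come first, and every alphabet character appears in the vocabulary) can be tried without installing the original package. The sketch below is only an illustration under stated assumptions: `generate_subtokens_stub` and `GenerateSubtokensContractTest` are hypothetical names, and the stub is a simplified stand-in, not the implementation in `official.transformer.utils.tokenizer`.

```python
import unittest


def generate_subtokens_stub(token_counts, alphabet, min_count,
                            num_iterations, reserved_tokens):
    """Hypothetical stand-in, NOT the library implementation.

    Returns the reserved tokens first, then every alphabet character, then
    any whole token whose count reaches min_count. num_iterations is
    accepted only to mirror the call in the test above; it is unused here.
    """
    frequent = sorted(token for token, count in token_counts.items()
                      if count >= min_count and token not in alphabet)
    return list(reserved_tokens) + sorted(alphabet) + frequent


class GenerateSubtokensContractTest(unittest.TestCase):
    """Checks the same two properties as the test above, against the stub."""

    def test_contract(self):
        token_counts = {"ab": 1, "bc": 3, "abc": 5}
        alphabet = set("abc_")
        reserved_tokens = ["reserved", "tokens"]

        vocab_list = generate_subtokens_stub(
            token_counts, alphabet, 100, 1, reserved_tokens)

        # Reserved tokens must occupy the first positions
        self.assertEqual(vocab_list[:2], reserved_tokens)

        # Every alphabet character must be present
        for c in alphabet:
            self.assertIn(c, vocab_list)
```

Saved to a file, this can be run with `python -m unittest <file>`. Swapping the stub for the real `tokenizer._generate_subtokens` would exercise the library itself, assuming the package is importable in your environment.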

Code example #2

File: tokenizer_test.py Project: 812864539/models

  The test body in this file is the same as in code example #1.