def benchmark_experimental_vocab_construction(vocab_file_path, is_raw_text=True, is_legacy=True, num_iters=1):
    """Benchmark vocab construction from a file and print the elapsed wall time.

    Args:
        vocab_file_path: path to the vocab file to load.
        is_raw_text: if True, build the vocab from raw text (tokenizing as
            needed); if False, load via ``vocab_from_file``.
        is_legacy: when ``is_raw_text`` is True, selects the legacy pure-Python
            loader instead of the jit-scripted tokenizer path.
        num_iters: number of times to repeat the construction (timed in total).
    """
    # Context manager guarantees the file handle is closed even on error.
    with open(vocab_file_path, 'r') as f:
        t0 = time.monotonic()
        if is_raw_text:
            if is_legacy:
                print("Loading from raw text file with legacy python function")
                for _ in range(num_iters):
                    # Rewind so every iteration reads the full file, not an
                    # exhausted handle (the old code only read it once).
                    f.seek(0)
                    legacy_vocab_from_file_object(f)
            else:
                print("Loading from raw text file with basic_english_normalize tokenizer")
                for _ in range(num_iters):
                    f.seek(0)
                    tokenizer = basic_english_normalize()
                    jited_tokenizer = torch.jit.script(tokenizer.to_ivalue())
                    vocab_from_raw_text_file(f, jited_tokenizer, num_cpus=1)
        else:
            for _ in range(num_iters):
                f.seek(0)
                vocab_from_file(f)
        # Single timing report shared by all three branches.
        print("Construction time:", time.monotonic() - t0)
def test_vocab_from_raw_text_file(self):
    """Build a Vocab from a raw-text asset with a jit-scripted tokenizer and
    verify both the itos list and the stoi mapping, including a custom
    unk token placed at index 0."""
    asset_path = get_asset_path('vocab_raw_text_test.txt')
    with open(asset_path, 'r') as f:
        jit_tokenizer = torch.jit.script(basic_english_normalize().to_ivalue())
        v = vocab_from_raw_text_file(f, jit_tokenizer, unk_token='<new_unk>')

    expected_itos = ['<new_unk>', "'", 'after', 'talks', '.', 'are', 'at',
                     'disappointed', 'fears', 'federal', 'firm', 'for',
                     'mogul', 'n', 'newall', 'parent', 'pension',
                     'representing', 'say', 'stricken', 't', 'they',
                     'turner', 'unions', 'with', 'workers']
    # stoi is just the inverse of itos: token -> position.
    expected_stoi = {token: position for position, token in enumerate(expected_itos)}
    self.assertEqual(v.get_itos(), expected_itos)
    self.assertEqual(dict(v.get_stoi()), expected_stoi)