예제 #1
0
def test_sequence_preproc_module_bert_tokenizer():
    """BERT wordpiece tokenization: subword pieces map to vocab ids,
    out-of-vocabulary tokens fall back to <UNK>, and every row is wrapped
    in <SOS>/<EOS> then right-padded with <PAD> up to SEQ_SIZE."""
    meta = {
        "preprocessing": {
            "lowercase": True,
            "tokenizer": "bert",
            "unknown_symbol": "<UNK>",
            "padding_symbol": "<PAD>",
            "computed_fill_value": "<UNK>",
        },
        "max_sequence_length": SEQ_SIZE,
        "str2idx": {
            "<EOS>": 0,
            "<SOS>": 1,
            "<PAD>": 2,
            "<UNK>": 3,
            "hello": 4,
            "world": 5,
            "pale": 7,
            "##ont": 8,
            "##ology": 9,
        },
    }
    preproc = _SequencePreprocessing(meta)

    inputs = [
        "paleontology", "unknown", "hello world hello",
        "hello world hello world"
    ]
    encoded = preproc(inputs)

    # "paleontology" -> pale/##ont/##ology; "unknown" is OOV -> <UNK>.
    expected = torch.tensor([
        [1, 7, 8, 9, 0, 2],
        [1, 3, 0, 2, 2, 2],
        [1, 4, 5, 4, 0, 2],
        [1, 4, 5, 4, 5, 0],
    ])
    assert torch.allclose(encoded, expected)
예제 #2
0
def test_text_preproc_module_space_punct_tokenizer():
    """space_punct tokenization: words and punctuation marks become
    separate tokens, OOV words fall back to <UNK>, rows are wrapped in
    <SOS>/<EOS> and right-padded with <PAD>, truncating at SEQ_SIZE."""
    meta = {
        "preprocessing": {
            "lowercase": True,
            "tokenizer": "space_punct",
            "unknown_symbol": "<UNK>",
            "padding_symbol": "<PAD>",
            "computed_fill_value": "<UNK>",
        },
        "max_sequence_length": SEQ_SIZE,
        "str2idx": {
            "<EOS>": 0,
            "<SOS>": 1,
            "<PAD>": 2,
            "<UNK>": 3,
            "this": 4,
            "sentence": 5,
            "has": 6,
            "punctuation": 7,
            ",": 8,
            ".": 9,
        },
    }
    preproc = _SequencePreprocessing(meta)

    inputs = ["punctuation", ",,,,", "this... this... punctuation", "unknown"]
    encoded = preproc(inputs)

    # Each "," / "." is its own token; the long third row is truncated
    # before its <EOS> fits.
    expected = torch.tensor([
        [1, 7, 0, 2, 2, 2],
        [1, 8, 8, 8, 8, 0],
        [1, 4, 9, 9, 9, 4],
        [1, 3, 0, 2, 2, 2],
    ])
    assert torch.allclose(encoded, expected)
예제 #3
0
def test_sequence_preproc_module_bad_tokenizer():
    """An unsupported tokenizer name must be rejected with a ValueError
    at construction time, before any input is processed."""
    meta = {
        "preprocessing": {
            "lowercase": True,
            "tokenizer": "dutch_lemmatize",
            "unknown_symbol": "<UNK>",
            "padding_symbol": "<PAD>",
            "computed_fill_value": "<UNK>",
        },
        "max_sequence_length": SEQ_SIZE,
        "str2idx": {
            "<EOS>": 0,
            "<SOS>": 1,
            "<PAD>": 2,
            "<UNK>": 3,
            "▁hell": 4,
            "o": 5,
            "▁world": 6
        },
    }

    # Construction itself should raise — no call to the module is needed.
    with pytest.raises(ValueError):
        _SequencePreprocessing(meta)
예제 #4
0
 def create_preproc_module(metadata: Dict[str, Any]) -> torch.nn.Module:
     """Factory: build a sequence-preprocessing torch module from feature metadata."""
     module = _SequencePreprocessing(metadata)
     return module