Example #1
import numpy as np
import pytest

# Assumed import: TokenAligner is provided by the retokenization utilities of
# the project under test (e.g. jiant.utils.retokenize); adjust the path to
# your layout. These imports are shared by the examples below.
from jiant.utils.retokenize import TokenAligner


def test_project_invalid_span():
    src_tokens = ["Members", "of", "the", "House", "clapped", "their", "hands"]
    tgt_tokens = [
        "Members", "Ġof", "Ġthe", "ĠHouse", "Ġcl", "apped", "Ġtheir", "Ġhands"
    ]
    # reference: tgt_token_index = [[0], [1], [2], [3], [4, 5], [6], [7]]
    ta = TokenAligner(src_tokens, tgt_tokens)
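    # An empty span (start == end) cannot be projected, so a ValueError is expected.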
    with pytest.raises(ValueError):
        ta.project_span(0, 0)
Example #2
def test_token_aligner_project_span_last_token_range_is_end_exclusive():
    source_tokens = ["abc", "def", "ghi", "jkl"]
    target_tokens = ["abc", "d", "ef", "ghi", "jkl"]
    ta = TokenAligner(source_tokens, target_tokens)
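    # Source span [3, 4) covers only the last token "jkl", which is target
    # token 4; the projected span has an exclusive end, hence [4, 5).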
    m = ta.project_span(3, 4)
    m_expected = np.array([4, 5])
    assert (m == m_expected).all()
Example #3
def test_token_aligner_project_span():
    source_tokens = ["abc", "def", "ghi", "jkl"]
    target_tokens = ["abc", "d", "ef", "ghi", "jkl"]
    ta = TokenAligner(source_tokens, target_tokens)
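    # Source span [1, 2) covers "def", which was split into target tokens
    # "d" and "ef" (indices 1 and 2), so the projected span is [1, 3).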
    m = ta.project_span(1, 2)
    m_expected = np.array([1, 3])
    assert (m == m_expected).all()
Example #4
def test_project_span_covering_whole_sequence():
    src_tokens = ["Members", "of", "the", "House", "clapped", "their", "hands"]
    tgt_tokens = [
        "Members", "Ġof", "Ġthe", "ĠHouse", "Ġcl", "apped", "Ġtheir", "Ġhands"
    ]
    # reference: tgt_token_index = [[0], [1], [2], [3], [4, 5], [6], [7]]
    ta = TokenAligner(src_tokens, tgt_tokens)
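    # The full source span [0, 7) maps to the full target span [0, 8);
    # "clapped" splits into "Ġcl" + "apped", so the target side is one token longer.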
    assert (0, 8) == ta.project_span(0, 7)