Example 1
def normalized_tokens(s, string_options=DEFAULT_STRING_OPTIONS,
                      token_options=DEFAULT_TOKEN_OPTIONS,
                      strip_parentheticals=True):
    '''
    Normalizes a string, tokenizes, and normalizes each token
    with string and token-level options.

    This version uses only libpostal's deterministic normalizations,
    i.e. methods with a single output. The string-tree version instead
    returns multiple normalized strings, each with its own tokens.

    Usage:
        normalized_tokens(u'St.-Barthélemy')
    '''
    s = safe_decode(s)
    if string_options & _normalize.NORMALIZE_STRING_LATIN_ASCII:
        normalized = _normalize.normalize_string_latin(s, string_options)
    else:
        normalized = _normalize.normalize_string_utf8(s, string_options)

    # Tuples of (offset, len, type)
    raw_tokens = tokenize_raw(normalized)
    tokens = [(_normalize.normalize_token(normalized, t, token_options),
               token_types.from_id(t[-1])) for t in raw_tokens]

    if strip_parentheticals:
        return remove_parens(tokens)
    else:
        return tokens
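
For context, a minimal usage sketch follows. The module path (postal.normalize) and the sample output are assumptions based on pypostal's layout and its default options; neither is shown in the excerpt:

# Hypothetical usage; assumes the function is exposed as
# postal.normalize.normalized_tokens.
from postal.normalize import normalized_tokens

tokens = normalized_tokens(u'St.-Barthélemy')
# Each element is a (normalized_token, token_type) pair. Illustrative
# shape only; exact tokens depend on the option flags, e.g.:
# [(u'st', token_types.WORD), (u'barthelemy', token_types.WORD)]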
Example 2
def tokenize(s):
    # _tokenize.tokenize returns (start, length, token_type) triples whose
    # offsets index into the UTF-8 byte string, so the input is re-encoded,
    # each token is sliced out of the bytes, then decoded back to text.
    u = safe_decode(s)
    s = safe_encode(s)
    return [(safe_decode(s[start:start + length]), token_types.from_id(token_type))
            for start, length, token_type in _tokenize.tokenize(u)]
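
A minimal usage sketch; the module path (postal.tokenize) is an assumption based on pypostal's layout, and the printed pairs are illustrative:

# Hypothetical usage; assumes the function is exposed as
# postal.tokenize.tokenize.
from postal.tokenize import tokenize

print(tokenize('123 Main St.'))
# Illustrative shape: a list of (token, token_type) pairs such as
# [(u'123', token_types.NUMERIC), (u'Main', token_types.WORD), ...]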