Python script_cat Examples

Programming Language: Python

Namespace/Package Name: unicodedata2

Method/Function: script_cat

Python script_cat: 4 examples found. These are the top-rated real-world Python examples of unicodedata2.script_cat extracted from open source projects.
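Judging from how the examples below index the result, `script_cat(c)` returns a pair: element `[0]` is a script name such as `"Latin"`, `"Common"` or `"Han"`, and element `[1]` is a general category whose first letter is `"L"` for letters. If `unicodedata2` is not available, the sketch below approximates that pair with the standard library only. `approx_script_cat` is a hypothetical name, and the script is guessed from the first word of the character's Unicode name. That is a heuristic, not the real Unicode Script property; only the general category is exact.

```python
import unicodedata

# First words of Unicode character names treated as script names.
# Hypothetical and deliberately incomplete.
_NAME_PREFIX_SCRIPTS = {"LATIN", "GREEK", "CYRILLIC", "ARABIC", "HEBREW",
                        "HIRAGANA", "KATAKANA", "HANGUL", "THAI", "DEVANAGARI"}


def approx_script_cat(ch):
    """Hypothetical stand-in for unicodedata2.script_cat(ch).

    Returns (script, general_category). The category comes from the stdlib
    unicodedata.category; the script is only guessed from the character
    name and falls back to "Common" when no known prefix matches.
    """
    category = unicodedata.category(ch)
    name = unicodedata.name(ch, "")  # "" for unnamed characters such as "\n"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return ("Han", category)
    first_word = name.split(" ", 1)[0]
    if first_word in _NAME_PREFIX_SCRIPTS:
        return (first_word.title(), category)
    return ("Common", category)


print(approx_script_cat("a"))   # ('Latin', 'Ll')
print(approx_script_cat("Ж"))   # ('Cyrillic', 'Lu')
print(approx_script_cat("中"))  # ('Han', 'Lo')
print(approx_script_cat("7"))   # ('Common', 'Nd')
```

Passing `approx_script_cat` wherever the examples call `unicodedata2.script_cat` lets them run, with the caveat that characters from scripts missing from the prefix list are reported as `"Common"`.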

Example #1

File: namsor_tools.py Project: Oshan96/namsor-python-tools-v2

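import unicodedata2

def computeScriptFirst(someString):
    # Return the script of the first character whose script is not "Common"
    # (digits, spaces and most punctuation are "Common"), or None if none.
    for c in someString:
        script = unicodedata2.script_cat(c)[0]
        if script != "Common":
            return script

    return None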

Example #2

def tokenize_real(self, text):
    chars = ((unicodedata2.script_cat(c), c) for c in text)
    tokens = list()
    for (key, group) in itertools.groupby(chars, operator.itemgetter(0)):
        if key[1][0] == 'L' and key[0] not in self.DISCARD_SCRIPTS:
            cand = ''.join(c[1] for c in group)
            if key[0] in self.JP_SCRIPTS:
                tokens.extend(self.tiny.tokenize(cand))
            else:
                tokens.append(cand.lower())
    return tokens

Example #3

File: unicode_props.py Project: pombredanne/quac

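import itertools
import operator

import unicodedata2

def tokenize_real(self, text):
    # Pair each character with its script_cat result.
    chars = ((unicodedata2.script_cat(c), c) for c in text)
    tokens = []
    # Group consecutive characters that share the same script_cat result.
    for key, group in itertools.groupby(chars, operator.itemgetter(0)):
        # Keep letter runs (category starting with "L") from allowed scripts.
        if key[1][0] == "L" and key[0] not in self.DISCARD_SCRIPTS:
            cand = "".join(c[1] for c in group)
            if key[0] in self.JP_SCRIPTS:
                # Japanese runs are segmented further by a separate tokenizer.
                tokens.extend(self.tiny.tokenize(cand))
            else:
                tokens.append(cand.lower())
    return tokens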
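Examples #2 and #3 split text into runs with `itertools.groupby`, keep only letter runs (`key[1][0] == "L"`) from scripts not listed in `DISCARD_SCRIPTS`, and lowercase each run unless its script is in `JP_SCRIPTS`, in which case the run is handed to `self.tiny.tokenize`. The sketch below reproduces that flow without a class or `unicodedata2`, under these assumptions: `guess_script` is a hypothetical name-based heuristic, `discard_scripts` and `jp_scripts` are hypothetical parameters standing in for the class attributes, and Japanese runs are kept whole instead of being segmented. It groups on `(script, is_letter)` rather than the full `script_cat` tuple; if the second element of that tuple is a two-letter category such as `Lu` or `Ll`, the original would additionally split a word like `Hello` at the case change.

```python
import itertools
import unicodedata

# Hypothetical script guesser: first word of the Unicode character name.
# A heuristic only, not the real Unicode Script property.
_SCRIPTS = {"LATIN", "GREEK", "CYRILLIC", "HIRAGANA", "KATAKANA", "HANGUL"}


def guess_script(ch):
    name = unicodedata.name(ch, "")
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "Han"
    word = name.split(" ", 1)[0]
    return word.title() if word in _SCRIPTS else "Common"


def tokenize(text, discard_scripts=frozenset(), jp_scripts=frozenset()):
    """Return lowercased runs of letters that share a script.

    Runs whose script is in jp_scripts are returned unchanged (the original
    passes them to a word segmenter instead).
    """
    def key(ch):
        # Group on (script, is_letter) rather than the full category.
        return (guess_script(ch), unicodedata.category(ch).startswith("L"))

    tokens = []
    for (script, is_letter), group in itertools.groupby(text, key=key):
        if not is_letter or script in discard_scripts:
            continue  # drop spaces, punctuation, digits and discarded scripts
        run = "".join(group)
        tokens.append(run if script in jp_scripts else run.lower())
    return tokens


print(tokenize("Hello, Мир! 東京"))  # ['hello', 'мир', '東京']
```

Because the script lookup is the only Unicode-specific step, swapping `guess_script` for `lambda c: unicodedata2.script_cat(c)[0]` should bring the sketch closer to the original's script boundaries.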

Example #4

File: wttrin_png.py Project: ericharris/wttrin-dockerized

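import unicodedata2

def script_category(char):
    """
    Return the script category of a Unicode character.
    Possible values include:
        default (Latin and Common), Cyrillic, Greek, Han, Hiragana
    """
    # U+FF1A FULLWIDTH COLON has script Common, so without this check it
    # would map to 'default'; it is grouped with Han instead.
    if char == u'：':
        return 'Han'
    cat = unicodedata2.script_cat(char)[0]
    if cat in ['Latin', 'Common']:
        return 'default'
    return cat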
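Example #4 collapses Latin and Common characters into a single `"default"` bucket. A natural follow-up is to split a string into consecutive runs that share a bucket, for instance to draw each run with a different font; that purpose is an assumption about the project, not taken from its source. The sketch below passes the lookup in as a parameter (`script_cat`) so it can run with `unicodedata2.script_cat` or, as in the demo, with a hypothetical `toy_script_cat` that only recognizes the basic Cyrillic block.

```python
import itertools
import unicodedata


def script_category(char, script_cat):
    """Variant of Example #4 that takes the lookup function as a parameter."""
    if char == "\uff1a":  # FULLWIDTH COLON, forced into the Han group
        return "Han"
    script = script_cat(char)[0]
    return "default" if script in ("Latin", "Common") else script


def split_runs(text, script_cat):
    """Split text into (category, run) pairs of consecutive characters."""
    return [(cat, "".join(chars))
            for cat, chars in itertools.groupby(
                text, key=lambda c: script_category(c, script_cat))]


def toy_script_cat(ch):
    # Toy lookup for this demo only: U+0400..U+04FF is "Cyrillic",
    # everything else is "Latin". Not real Unicode Script data.
    script = "Cyrillic" if "\u0400" <= ch <= "\u04ff" else "Latin"
    return (script, unicodedata.category(ch))


print(split_runs("Hi Мир", toy_script_cat))
# [('default', 'Hi '), ('Cyrillic', 'Мир')]
```

Grouping on the bucket rather than on the raw script keeps spaces and punctuation attached to the neighbouring Latin text, since both map to `"default"`.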