Python split_identifier Examples

Programming language: Python

Namespace/package name: ncc.tokenizers.tokenization

Method/function: split_identifier

Examples at hotexamples.com: 4

Python split_identifier - 4 examples found. These are top-rated real-world examples of ncc.tokenizers.tokenization.split_identifier in Python, extracted from open source projects.
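The examples below call split_identifier without showing what it does. As a rough orientation only, here is a minimal sketch of one common way to split an identifier into sub-tokens, on underscores and camelCase boundaries. The name split_identifier_sketch and the regular expression are assumptions made for illustration; this is not the ncc implementation, and it ignores the str_flag argument used in the examples.

```python
import re

# Hypothetical stand-in for illustration; NOT the ncc implementation.
# Alternatives, tried in order:
#   1. an acronym followed by a capitalized word ("HTTP" in "HTTPResponse")
#   2. an optionally capitalized lowercase word ("Variable", "name")
#   3. a trailing run of capitals ("HTML" in "parseHTML")
#   4. a run of digits
_CAMEL_PART = re.compile(r'[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|[0-9]+')


def split_identifier_sketch(identifier):
    """Split an identifier on underscores and camelCase boundaries."""
    parts = []
    for chunk in identifier.split('_'):
        parts.extend(_CAMEL_PART.findall(chunk))
    return parts


print(split_identifier_sketch('VariableName'))  # ['Variable', 'Name']
print(split_identifier_sketch('max_len'))       # ['max', 'len']
```

The VariableName case matches the behavior described in the docstring of Example #4; the other cases are only what this sketch produces.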

Example #1

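import itertools

from ncc.tokenizers.tokenization import split_identifier

def string2tokens(string):
    # Replace every non-alphabetic character with a space.
    string = ''.join(char if char.isalpha() else ' ' for char in string)
    # SPACE_SPLITTER is defined elsewhere in the source project; it is used here like a compiled regex.
    string = SPACE_SPLITTER.sub(" ", string)
    tokens = string.split()
    # Split each identifier into sub-tokens, then flatten the per-token lists.
    tokens = [split_identifier(tok) for tok in tokens]
    tokens = list(itertools.chain.from_iterable(tokens))
    tokens = [tok.lower() for tok in tokens]
    # tokens = vocab.encode(string, out_type=str)
    # tokens = str.replace(' '.join(tokens), SPM_SPACE, '')
    return tokens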

Example #2

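import itertools

from ncc.tokenizers.tokenization import split_identifier

# util and MEANINGLESS_TOKENS are defined elsewhere in the source project.
def parse_docstring_tokens(self, docstring_tokens):
    # parse comment from docstring_tokens
    # Drop every character listed in MEANINGLESS_TOKENS from each token.
    docstring_tokens = [
        ''.join(char for char in token if char not in MEANINGLESS_TOKENS)
        for token in docstring_tokens
    ]
    # Split each token into sub-tokens and flatten the per-token results.
    docstring_tokens = itertools.chain.from_iterable([
        split_identifier(token, str_flag=False)
        for token in docstring_tokens
    ])
    docstring_tokens = util.stress_tokens(docstring_tokens)
    if self.to_lower:
        docstring_tokens = util.lower(docstring_tokens)
    return docstring_tokens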

Example #3

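import itertools
import re

from ncc.tokenizers.tokenization import split_identifier

# util and MEANINGLESS_TOKENS are defined elsewhere in the source project.
def parse_docstring(self, docstring):
    '''parse comment from docstring'''
    # Remove inline tag openers such as "{@link" (up to the next whitespace).
    docstring = re.sub(r'\{\@\S+', '', docstring)
    # Remove any remaining brace-delimited span (greedy match).
    docstring = re.sub(r'{.+}', '', docstring)
    docstring = ''.join(
        [char for char in docstring if char not in MEANINGLESS_TOKENS])
    # Split each space-separated word into sub-tokens, then flatten.
    docstring = [
        split_identifier(token, str_flag=False)
        for token in docstring.split(' ')
    ]
    docstring = list(itertools.chain.from_iterable(docstring))
    docstring = util.stress_tokens(docstring)
    if self.to_lower:
        docstring = util.lower(docstring)
    return docstring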
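
The two re.sub calls in parse_docstring are easy to misread, so the following sketch runs them in isolation on a made-up docstring. The helper name strip_inline_tags and the sample text are assumptions for illustration; only the two patterns come from the example above.

```python
import re


def strip_inline_tags(docstring):
    """Apply the two substitutions from parse_docstring, in the same order."""
    # Step 1: drop "{@" plus the non-whitespace characters that follow it.
    docstring = re.sub(r'\{\@\S+', '', docstring)
    # Step 2: drop a brace-delimited span; ".+" is greedy, so the match runs
    # from the first "{" to the last "}" on the same line.
    docstring = re.sub(r'{.+}', '', docstring)
    return docstring


sample = 'Returns the {@link Foo} value {x} here'  # hypothetical input
print(repr(strip_inline_tags(sample)))  # 'Returns the  Foo} value  here'
```

Note that the first pattern stops at whitespace, so in this sample the tag argument and its closing brace ("Foo}") survive both substitutions.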

Example #4

from typing import Dict

from ncc.tokenizers.tokenization import split_identifier

# PAD is a padding-token constant defined elsewhere in the source project.
def pad_leaf_node(ast_tree: Dict, max_len: int, PAD_TOKEN=PAD) -> Dict:
    '''
    Pad each leaf node's child into [XX, [XX, ...]].
    Split the token and pad it with PAD_TOKEN until it has max_len sub-tokens.
    e.g. VariableName -> [VariableName, [Variable, Name, PAD_TOKEN, PAD_TOKEN, ...]]
    '''
    for node in ast_tree.values():
        # A leaf node has exactly one child, and that child is a string token.
        if len(node['children']) == 1 and isinstance(node['children'][0], str):
            subtokens = split_identifier(node['children'][0], False)
            if not subtokens:
                # Fall back to the whole token when splitting yields nothing.
                subtokens = [node['children'][0]]
            if len(subtokens) >= max_len:
                subtokens = subtokens[:max_len]
            else:
                subtokens.extend([PAD_TOKEN] * (max_len - len(subtokens)))
            node['children'].append(subtokens)
    return ast_tree
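
The truncate-or-pad step inside pad_leaf_node can be isolated into a small helper, which makes the fixed-length behavior easier to check on its own. This is a sketch under assumptions: the helper name pad_or_truncate and the '<pad>' value are hypothetical, since the source project defines its own PAD constant.

```python
from typing import List

PAD = '<pad>'  # hypothetical value; the source project supplies its own PAD


def pad_or_truncate(tokens: List[str], max_len: int, pad_token: str = PAD) -> List[str]:
    """Return a new list of exactly max_len items, clipped or right-padded."""
    clipped = tokens[:max_len]
    return clipped + [pad_token] * (max_len - len(clipped))


print(pad_or_truncate(['Variable', 'Name'], 4))  # ['Variable', 'Name', '<pad>', '<pad>']
print(pad_or_truncate(['a', 'b', 'c', 'd'], 2))  # ['a', 'b']
```

Unlike the extend call in the example, this helper returns a new list and leaves its input unchanged, which avoids surprises if the sub-token list is reused elsewhere.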