def url_args_handler(url_args: str) -> List[ParamValPair]:
    ''' Tokenizes the parameter-value pairs in the parameter part of the url

    Args:
        url_args (str): Parameter part of url of webpage

    Returns:
        paths (ParamValPair): List of param-val pairs as 2tuple of lists.
            If no val is given, the second value in the tuple is the
            empty list []

    Examples:
        >>> url_args_handler('sid=4')
        [(['sid'], ['4'])]
        >>> url_args_handler('sid=4&ring=hent&list')
        [(['sid'], ['4']), (['ring'], ['hent']), (['list'], [])]
        >>> url_args_handler('')
        []
    '''
    if not url_args:
        return []
    pair_list = []
    # Pairs may be separated by '&', ';' or '\'. A single character class
    # replaces the original redundant alternation r'(?:&)|;|&|\\' — the two
    # patterns match exactly the same separators.
    for pair in re.split(r'[&;\\]', url_args):
        # Keep at most param and val; anything after a second '=' is dropped,
        # matching the original truncating behavior.
        splitted = pair.split('=')[:2]
        # A bare parameter (no '=') gets an empty value.
        param, val = (splitted[0], '') if len(splitted) == 1 else splitted
        pair_list.append((word_splitter(param), word_splitter(val)))
    return pair_list
def url_domains_handler(url_domains: str) -> DomainData:
    ''' Splits the domain part of the URL to individual tokens

    Args:
        url_domains (str): Domains part of url of webpage

    Returns:
        sub_domains (List[str]): List of subdomains, i.e. ['www', 'blog']
        main_domain (List[str]): Main domain, i.e. ['geo', 'cities']
        domain_ending (str): The domain ending, i.e. 'com' or 'net'

    Examples:
        >>> url_domains_handler('geocities.com')
        ([], ['geo', 'cities'], 'com')
        >>> url_domains_handler('www.members.tripod.net')
        (['www', 'members'], ['tripod'], 'net')
    '''
    parts = url_domains.split('.')
    # Last label is the TLD, second-to-last the main domain; everything
    # before those two labels counts as subdomains.
    domain_ending = parts[-1]
    main_domain = word_splitter(parts[-2])
    sub_domains = flatten([word_splitter(label) for label in parts[:-2]])
    return (sub_domains, main_domain, domain_ending)
def url_path_handler(url_path: str) -> List[str]:
    ''' Splits the path part of the url

    Args:
        url_path (str): Path part of url of webpage

    Returns:
        paths (List[str]): List of tokenized paths

    Examples:
        >>> url_path_handler('/path1/path2/page.html')
        ['path', '1', 'path', '2', 'page', 'html']
        >>> url_path_handler('/')
        []
    '''
    # Drop empty segments produced by leading/trailing/double slashes,
    # then tokenize each remaining segment.
    segments = [segment for segment in url_path.split('/') if segment]
    tokens = flatten([word_splitter(segment) for segment in segments])
    # Preserve the presence of an '@' (user-info marker) as its own token.
    if '@' in url_path:
        tokens.append('@')
    return tokens
def test_non_hyphenated_word(self):
    # A compound word with no separator is still split into its parts.
    expected = ['some', 'word']
    assert word_splitter('someword') == expected
def test_short_word(self):
    # A short word is returned whole as a single token.
    expected = ['abc']
    assert word_splitter('abc') == expected
def test_empty_word(self):
    # The empty string yields no tokens at all.
    expected = []
    assert word_splitter('') == expected