Python anagram_hash Examples

Programming language: Python

Namespace/package name: denoiser.models.inline.hashing

Method/function: anagram_hash

Examples at hotexamples.com: 6

Python anagram_hash - 6 examples found. These are the top-rated real-world Python examples of denoiser.models.inline.hashing.anagram_hash, extracted from open source projects.
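
The examples below call anagram_hash without showing its body. As rough orientation only, here is a minimal sketch of an order-independent, additive hash of the kind the call sites appear to rely on (select_anagrams adds and subtracts hash values). The name sketch_anagram_hash and the choice of raising each code point to the fifth power are assumptions for illustration, not the project's confirmed implementation.

```python
def sketch_anagram_hash(text, power=5):
    """Hypothetical stand-in for anagram_hash.

    Assumption: sums each character's code point raised to `power`.
    The real function may differ; this only illustrates the
    order-independent, additive behaviour.
    """
    return sum(ord(char) ** power for char in text)


# Anagrams collide on purpose, because addition ignores character order.
same = sketch_anagram_hash("listen") == sketch_anagram_hash("silent")

# The hash of a string equals the sum of the hashes of its parts.
additive = sketch_anagram_hash("ab") == (
    sketch_anagram_hash("a") + sketch_anagram_hash("b")
)

print(same, additive)
```

Because such a hash is additive, the hash of an edited word can be derived from the original hash with plain integer arithmetic, which is what the later examples exploit.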

Example #1

File: __init__.py Project: williammo2016/ocr-pipeline

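    # Method excerpt: the enclosing class and its imports are not shown.
    # `re` is the standard library module; `add` is presumably operator.add.
    def append_data(self, bigrams, unigrams):
        # Group every known word under its anagram hash.
        # list(...) keeps the concatenation valid on Python 3,
        # where dict views do not support "+".
        anaghash_map = {}
        for word in list(bigrams) + list(unigrams):
            anaghash_map.setdefault(anagram_hash(word), set()).add(word)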

        self.anagram_hashmap = anaghash_map

        clean_word = re.compile(r"^[a-zA-Z '-]+$")
        alphabet = set()

        for word in unigrams:
            word = " " + word + " "
            chars = list(word)  # Individual characters of the padded word
            chars += map(add, chars[:-1],
                         chars[1:])  # Adjacent character pairs (bigrams)

            # Keep only items made of letters, spaces, apostrophes or hyphens
            alphabet = alphabet.union([
                anagram_hash(char) for char in set(chars)
                if clean_word.match(char) is not None
            ])

        alphabet.add(0)

        self.anagram_alphabet = alphabet
        self.save()
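
To make the grouping step in append_data concrete, the following is a small, self-contained sketch with toy data. The helper names sketch_anagram_hash and build_anagram_hashmap are hypothetical, and the hash itself is an assumed stand-in for the project's anagram_hash.

```python
from collections import defaultdict


def sketch_anagram_hash(text):
    # Hypothetical stand-in for anagram_hash (an assumption, not the project's code).
    return sum(ord(char) ** 5 for char in text)


def build_anagram_hashmap(words):
    """Group words under their hash so that anagrams share one bucket."""
    hashmap = defaultdict(set)
    for word in words:
        hashmap[sketch_anagram_hash(word)].add(word)
    return dict(hashmap)


# Toy frequency dictionaries; only the keys matter here.
unigrams = {"listen": 3, "silent": 2, "cat": 5}
bigrams = {"the cat": 1}

hashmap = build_anagram_hashmap(list(bigrams) + list(unigrams))

# Looking up any anagram of "listen" returns the whole bucket.
bucket = hashmap[sketch_anagram_hash("enlist")]
print(sorted(bucket))
```

A lookup by hash therefore returns every stored word with the same multiset of characters, regardless of character order.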

Example #2

File: utils.py Project: pdessauw/ocr-pipeline

def select_anagrams(token, structures):
    """Select possible anagrams for a given token

    Parameters:
        token (:func:`str`): Cleaned token
        structures (:func:`dict`): Datastructures from file

    Returns:
        :func:`dict` - Possible anagrams (keys) along with their score (values)
    """
    anagrams = {}
    focus_alphabet = generate_alphabet_from_word(token[1])
    token_hash = anagram_hash(token)

    hash_list = []
    for c in structures["alphabet"]:
        for f in focus_alphabet:
            hash_list.append(token_hash + c - f)

    hash_counter = Counter(hash_list)  # Count how often each candidate hash was produced

    for h in set(hash_counter.keys()).intersection(set(structures["anagrams"].keys())):
        count = hash_counter[h]
        anag_list = [anag for anag in structures["anagrams"][h] if edit_distance(anag, token) <= 3]

        for anag in anag_list:
            anag_score = rate_anagram(structures["occurence_map"], token, anag, count)

            if anag_score > 0:
                anagrams[anag] = anag_score

    return anagrams
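
The expression token_hash + c - f in select_anagrams adds one alphabet entry and removes one entry taken from the token. Since 0 is present in both sets, this covers insertion, deletion and substitution. The sketch below illustrates that arithmetic on toy values; sketch_anagram_hash and the tiny alphabet are assumptions made for the illustration.

```python
from collections import Counter


def sketch_anagram_hash(text):
    # Hypothetical stand-in for anagram_hash (an assumption, not the project's code).
    return sum(ord(char) ** 5 for char in text)


token = "cat"
token_hash = sketch_anagram_hash(token)

# Hashes that may be added (toy lexicon alphabet) or removed (from the token).
# 0 stands for "add nothing" or "remove nothing".
alphabet = {0} | {sketch_anagram_hash(c) for c in "abct"}
focus_alphabet = {0} | {sketch_anagram_hash(c) for c in token}

candidates = Counter(
    token_hash + added - removed
    for added in alphabet
    for removed in focus_alphabet
)

# Substituting "c" with "b" reaches the hash shared by "bat" and "tab".
reaches_bat = sketch_anagram_hash("bat") in candidates
# Removing "c" without adding anything reaches the hash of "at".
reaches_at = sketch_anagram_hash("at") in candidates
print(reaches_bat, reaches_at)
```

Counting the candidates, as the original does with Counter, records how many different edits lead to the same hash, and that count is passed on to the scoring step.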

Example #3

File: utils.py Project: pdessauw/ocr-pipeline

def generate_alphabet_from_word(word):
    """Generate anagram hash for all chars in a word

    Parameters:
        word (:func:`str`): Word to generate hash
    Returns:
        set - Set of hashes
    """
    word = " "+word+" "
    chars = [char for char in word]  # Getting letters from the word
    chars += map(add, chars[:-1], chars[1:])  # Adding bigrams to the list

    # Computing hash of items and add 0 to the list
    return set([0] + [anagram_hash(c) for c in set(chars)])
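
For a concrete view of what generate_alphabet_from_word collects, here is a runnable sketch using an assumed stand-in hash. The names sketch_anagram_hash and sketch_alphabet_from_word are hypothetical; the padding and the map(add, ...) pairing follow the excerpt above, with add taken to be operator.add.

```python
from operator import add


def sketch_anagram_hash(text):
    # Hypothetical stand-in for anagram_hash (an assumption, not the project's code).
    return sum(ord(char) ** 5 for char in text)


def sketch_alphabet_from_word(word):
    """Hash every character and adjacent character pair of the padded word."""
    padded = " " + word + " "
    chars = list(padded)
    # map(add, ...) concatenates each character with its successor.
    chars += map(add, chars[:-1], chars[1:])
    # 0 is included so that "no change" is always an available option.
    return {0} | {sketch_anagram_hash(item) for item in set(chars)}


# For "ab" the items are " ", "a", "b", " a", "ab" and "b ".
items = sketch_alphabet_from_word("ab")
print(len(items))
```

The surrounding spaces let the alphabet also represent edits at the start and end of a word.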

Example #4

File: utils.py Project: williammo2016/ocr-pipeline

def generate_alphabet_from_word(word):
    """Generate anagram hash for all chars in a word

    Parameters:
        word (:func:`str`): Word to generate hash
    Returns:
        set - Set of hashes
    """
    word = " " + word + " "
    chars = [char for char in word]  # Getting letters from the word
    chars += map(add, chars[:-1], chars[1:])  # Adding bigrams to the list

    # Computing hash of items and add 0 to the list
    return set([0] + [anagram_hash(c) for c in set(chars)])

Example #5

File: __init__.py Project: pdessauw/ocr-pipeline

    def append_data(self, bigrams, unigrams):
        # Group every known word under its anagram hash.
        # list(...) keeps the concatenation valid on Python 3,
        # where dict views do not support "+".
        anaghash_map = {}
        for word in list(bigrams) + list(unigrams):
            anaghash_map.setdefault(anagram_hash(word), set()).add(word)

        self.anagram_hashmap = anaghash_map

        clean_word = re.compile(r"^[a-zA-Z '-]+$")
        alphabet = set()

        for word in unigrams:
            word = " "+word+" "
            chars = list(word)  # Individual characters of the padded word
            chars += map(add, chars[:-1], chars[1:])  # Adjacent character pairs (bigrams)

            # Keep only items made of letters, spaces, apostrophes or hyphens
            alphabet = alphabet.union([anagram_hash(char) for char in set(chars)
                                       if clean_word.match(char) is not None])

        alphabet.add(0)

        self.anagram_alphabet = alphabet
        self.save()

Example #6

File: utils.py Project: williammo2016/ocr-pipeline

def select_anagrams(token, structures):
    """Select possible anagrams for a given token

    Parameters:
        token (:func:`str`): Cleaned token
        structures (:func:`dict`): Datastructures from file

    Returns:
        :func:`dict` - Possible anagrams (keys) along with their score (values)
    """
    anagrams = {}
    focus_alphabet = generate_alphabet_from_word(token[1])
    token_hash = anagram_hash(token)

    hash_list = []
    for c in structures["alphabet"]:
        for f in focus_alphabet:
            hash_list.append(token_hash + c - f)

    hash_counter = Counter(hash_list)  # Count how often each candidate hash was produced

    for h in set(hash_counter.keys()).intersection(
            set(structures["anagrams"].keys())):
        count = hash_counter[h]
        anag_list = [
            anag for anag in structures["anagrams"][h]
            if edit_distance(anag, token) <= 3
        ]

        for anag in anag_list:
            anag_score = rate_anagram(structures["occurence_map"], token, anag,
                                      count)

            if anag_score > 0:
                anagrams[anag] = anag_score

    return anagrams