Python levenshtein Examples

Programming language: Python

Namespace/package name: smewt.base.textutils

Method/function: levenshtein

Examples at hotexamples.com: 4

Python levenshtein - 4 examples found. These are the top-rated real-world Python examples of smewt.base.textutils.levenshtein, extracted from open source projects.
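
The examples on this page call levenshtein but never show its body. As a point of reference, here is a minimal Python 3 sketch of the classic edit-distance computation. It is an assumption about what such a function computes, not smewt's actual code, which may differ in scaling or return type (the debug print in the examples formats the result with %f).

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b (illustrative sketch)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    # previous[j] holds the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (ca != cb)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the distance table are kept at a time, so memory use grows with the shorter string rather than with the product of both lengths.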

Example #1

File: tvdbmetadataprovider.py Project: EQ4/smewt

    def startEpisode(self, episode):
        self.tmdb.lang = guiLanguage().alpha2

        if episode.get('series') is None:
            raise SmewtException("TVDBMetadataProvider: Episode doesn't contain 'series' field: %s" % (episode,))

        name = episode.series.title
        name = name.replace(',', ' ')

        matching_series = self.getSeries(name)

        # Try first with the languages from guessit, and then with english
        languages = tolist(episode.get('language', [])) + ['en']

        # Sort the series by title distance to `name`, breaking ties by id
        # (stupid heuristic: the most popular series might have been added
        # sooner to the db, and the db id follows the insertion order)
        # TODO: we should do something smarter like also comparing
        #       episodes count and/or episodes names
        #print '\n'.join(['%s %s --> %f [%s] %s' % (x[1], name, textutils.levenshtein(x[1], name), x[2], x[0]) for x in matching_series])
        matching_series.sort(key=lambda x: (textutils.levenshtein(x[1], name), int(x[0])))

        series = None
        language = 'en'
        for lang in languages:
            try:
                language = lang
                # Index of the first (best-ranked) series in this language
                ind = [s[2] for s in matching_series].index(lang)
                series = matching_series[ind][0]
                break
            except ValueError:
                language = matching_series[0][2]
                series = matching_series[0][0]
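
The ranking and language fallback in the excerpt above can be tried in isolation with the Python 3 sketch below. The (id, title, language) tuple layout is inferred from the indexing in the excerpt, and edit_distance and pick_series are hypothetical stand-ins, not part of smewt.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance; a stand-in for textutils.levenshtein."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(current[j - 1] + 1,
                               previous[j] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def pick_series(matching_series, name, languages):
    """Return (series_id, language) for the best candidate.

    matching_series holds hypothetical (id, title, language) tuples.
    """
    # Closest title first; ties are broken by the lower numeric id.
    ranked = sorted(matching_series,
                    key=lambda s: (edit_distance(s[1], name), int(s[0])))
    for lang in languages:
        for series_id, _title, series_lang in ranked:
            if series_lang == lang:
                return series_id, lang
    # No candidate in a preferred language: fall back to the best-ranked one.
    return ranked[0][0], ranked[0][2]


candidates = [
    ("300", "The Office (US)", "en"),
    ("120", "The Office", "en"),
    ("450", "The Office", "fr"),
]
print(pick_series(candidates, "The Office", ["fr", "en"]))  # ('450', 'fr')
print(pick_series(candidates, "The Office", ["de"]))        # ('120', 'en')
```

Sorting on a tuple key lets the distance dominate while the id only decides between equally close titles, which mirrors the sort key used in the excerpt.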

Example #2

def fuzzyMatch2(baseGuess, md):
    for p1, p2 in zip(baseGuess.unique_key(), md.unique_key()):
        if isinstance(p1, (str, unicode)):  # Python 2 text types
            # TODO: levenshtein doesn't cut it here, we need a better string distance
            if levenshtein(p1.lower(), p2.lower()) > 80:
                return False
        elif isinstance(p1, Metadata):
            if not fuzzyMatch2(p1, p2):
                return False
        else:
            if p1 != p2:
                return False
    return True

Example #3

File: simplesolver.py Project: robmcmullen/smewt

def fuzzyMatch2(baseGuess, md):
    for p1, p2 in zip(baseGuess.unique_key(), md.unique_key()):
        if isinstance(p1, (str, unicode)):  # Python 2 text types
            # TODO: levenshtein doesn't cut it here, we need a better string distance
            if levenshtein(p1.lower(), p2.lower()) > 80:
                return False
        elif isinstance(p1, Metadata):
            if not fuzzyMatch2(p1, p2):
                return False
        else:
            if p1 != p2:
                return False
    return True
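
The TODO in fuzzyMatch2 asks for a better string distance than levenshtein. One possible direction, sketched here for Python 3, swaps in the similarity ratio from the standard library's difflib.SequenceMatcher. This is explicitly not Levenshtein distance and not smewt's approach; fuzzy_match, the min_ratio threshold, and the use of plain tuples in place of Metadata objects are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_match(keys_a, keys_b, min_ratio=0.8):
    """Compare two key sequences element by element.

    Strings match when their similarity ratio reaches min_ratio (a
    hypothetical threshold); nested tuples are compared recursively;
    anything else must be equal.
    """
    for p1, p2 in zip(keys_a, keys_b):
        if isinstance(p1, str) and isinstance(p2, str):
            ratio = SequenceMatcher(None, p1.lower(), p2.lower()).ratio()
            if ratio < min_ratio:
                return False
        elif isinstance(p1, tuple) and isinstance(p2, tuple):
            if not fuzzy_match(p1, p2, min_ratio):
                return False
        elif p1 != p2:
            return False
    return True


print(fuzzy_match(("Breaking Bad", 1, 2), ("breaking  bad", 1, 2)))
print(fuzzy_match(("Breaking Bad", 1, 2), ("Better Call Saul", 1, 2)))
print(fuzzy_match(("Breaking Bad", 1, 2), ("Breaking Bad", 1, 3)))
```

A ratio between 0 and 1 is independent of string length, so one threshold can serve short and long titles alike, whereas a fixed absolute distance cutoff treats them very differently.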

Example #4

    def startEpisode(self, episode):
        self.tmdb.lang = guiLanguage().alpha2

        if episode.get('series') is None:
            raise SmewtException(
                "TVDBMetadataProvider: Episode doesn't contain 'series' field: %s"
                % (episode,))

        name = episode.series.title
        name = name.replace(',', ' ')

        matching_series = self.getSeries(name)

        # Try first with the languages from guessit, and then with english
        languages = tolist(episode.get('language', [])) + ['en']

        # Sort the series by title distance to `name`, breaking ties by id
        # (stupid heuristic: the most popular series might have been added
        # sooner to the db, and the db id follows the insertion order)
        # TODO: we should do something smarter like also comparing
        #       episodes count and/or episodes names
        #print '\n'.join(['%s %s --> %f [%s] %s' % (x[1], name, textutils.levenshtein(x[1], name), x[2], x[0]) for x in matching_series])
        matching_series.sort(
            key=lambda x: (textutils.levenshtein(x[1], name), int(x[0])))

        series = None
        language = 'en'
        for lang in languages:
            try:
                language = lang
                # Index of the first (best-ranked) series in this language
                ind = [s[2] for s in matching_series].index(lang)
                series = matching_series[ind][0]
                break
            except ValueError:
                language = matching_series[0][2]
                series = matching_series[0][0]