def get_hypernyms(self, pos_tags):
    """
    Return the hypernyms for each word in a list of POS tagged words.

    Each entry in the result is either the first hypernym's lemma name
    for the word's first synset, or the original word when no synset
    or hypernym is available.
    """
    hypernym_words = []

    for word, pos in pos_tags:
        try:
            synsets = wordnet.synsets(
                word,
                utils.treebank_to_wordnet(pos),
                lang=self.language.ISO_639
            )
        except (WordNetError, LookupError):
            # WordNetError: the lookup itself failed.
            # LookupError: the language is not supported.
            # In both cases fall back to having no synsets.
            synsets = None

        hypernyms = synsets[0].hypernyms() if synsets else []

        if hypernyms:
            # Synset names look like 'dog.n.01'; keep only the lemma part
            hypernym_words.append(hypernyms[0].name().split('.')[0])
        else:
            hypernym_words.append(word)

    return hypernym_words
def compare(self, statement, other_statement):
    """
    Return the calculated similarity of two statements
    based on the Jaccard index.

    :param statement: A statement object with a ``text`` attribute.
    :param other_statement: A statement object with a ``text`` attribute.
    :returns: A similarity ratio in the range 0 to 1.
    """
    import nltk
    import string

    # Get default English stopwords
    stopwords = nltk.corpus.stopwords.words('english')
    lemmatizer = nltk.stem.wordnet.WordNetLemmatizer()

    # Make both strings lowercase
    a = statement.text.lower()
    b = other_statement.text.lower()

    # Remove punctuation from each string
    table = str.maketrans(dict.fromkeys(string.punctuation))
    a = a.translate(table)
    b = b.translate(table)

    pos_a = nltk.pos_tag(nltk.tokenize.word_tokenize(a))
    pos_b = nltk.pos_tag(nltk.tokenize.word_tokenize(b))

    lemma_a = [
        lemmatizer.lemmatize(
            token, utils.treebank_to_wordnet(pos)
        ) for token, pos in pos_a if token not in stopwords
    ]
    lemma_b = [
        lemmatizer.lemmatize(
            token, utils.treebank_to_wordnet(pos)
        ) for token, pos in pos_b if token not in stopwords
    ]

    # Calculate Jaccard similarity
    numerator = len(set(lemma_a).intersection(lemma_b))
    denominator = float(len(set(lemma_a).union(lemma_b)))

    # Guard against division by zero: when neither statement yields
    # any lemmas (empty text or nothing but stopwords/punctuation),
    # the Jaccard ratio is undefined — treat it as zero similarity
    # instead of raising ZeroDivisionError.
    if not denominator:
        return 0

    return numerator / denominator
def compare(self, statement, other_statement):
    """
    Return the calculated similarity of two statements
    based on the Jaccard index.

    :param statement: A statement object with a ``text`` attribute.
    :param other_statement: A statement object with a ``text`` attribute.
    :returns: A similarity ratio in the range 0 to 1.
    """
    from nltk import pos_tag

    word_tokenizer = self.get_word_tokenizer()

    # Get the stopwords for the current language
    stopwords = self.get_stopwords()

    lemmatizer = self.get_lemmatizer()

    # Make both strings lowercase
    a = statement.text.lower()
    b = other_statement.text.lower()

    # Remove punctuation from each string
    a = a.translate(self.punctuation_table)
    b = b.translate(self.punctuation_table)

    pos_a = pos_tag(word_tokenizer.tokenize(a))
    pos_b = pos_tag(word_tokenizer.tokenize(b))

    lemma_a = [
        lemmatizer.lemmatize(token, utils.treebank_to_wordnet(pos))
        for token, pos in pos_a if token not in stopwords
    ]
    lemma_b = [
        lemmatizer.lemmatize(token, utils.treebank_to_wordnet(pos))
        for token, pos in pos_b if token not in stopwords
    ]

    # Calculate Jaccard similarity
    numerator = len(set(lemma_a).intersection(lemma_b))
    denominator = float(len(set(lemma_a).union(lemma_b)))

    # Guard against division by zero: when neither statement yields
    # any lemmas (empty text or nothing but stopwords/punctuation),
    # the Jaccard ratio is undefined — treat it as zero similarity
    # instead of raising ZeroDivisionError.
    if not denominator:
        return 0

    return numerator / denominator
def get_hypernyms(self, pos_tags):
    """
    Return the hypernyms for each word in a list of POS tagged words.

    Each entry in the result is either the first hypernym's lemma name
    for the word's first synset, or the original word when no synset
    or hypernym is available.
    """
    hypernym_words = []

    for word, pos in pos_tags:
        synsets = wordnet.synsets(word, treebank_to_wordnet(pos))
        hypernyms = synsets[0].hypernyms() if synsets else []

        if hypernyms:
            # Synset names look like 'dog.n.01'; keep only the lemma part
            hypernym_words.append(hypernyms[0].name().split('.')[0])
        else:
            hypernym_words.append(word)

    return hypernym_words
def test_treebank_to_wordnet_no_match(self):
    # An unrecognized treebank tag has no WordNet part-of-speech equivalent
    self.assertIsNone(utils.treebank_to_wordnet('XXX'))
def test_treebank_to_wordnet(self):
    # The plural-noun treebank tag 'NNS' maps to the WordNet noun tag 'n'
    wordnet_pos = utils.treebank_to_wordnet('NNS')
    self.assertEqual('n', wordnet_pos)