def extract(self, document, count=5):
    """Extract the top keywords from *document*.

    Args:
        document: Raw text to analyze.
        count: Maximum number of keywords to return (default 5,
            preserving the previous hard-coded behavior).

    Returns:
        A tuple of the highest-rated keywords.
    """
    # Split the document into sentences, then each sentence into word tokens.
    sentences = stringUtils.sent_tokenize(document)
    tokens = [stringUtils.word_tokenize(s) for s in sentences]
    # TODO: need to pos tag words for picking only nouns
    # TODO: need to stem tokens for improving accuracy
    ratings = self.compute_ratings(tokens)
    result = self.pick_keywords(ratings, count)
    return tuple(result)
def test_word_tokenize_with_stem(self):
    """Verify that word tokenization applies stemming when requested."""
    tokens = stringUtils.word_tokenize(
        "crying buying", filter_stopwords=False, stem=True
    )
    self.assertTupleEqual(("cry", "buy"), tokens)
def summarize(self, document, summaryLength):
    """Produce an extractive summary of *document*.

    Args:
        document: Raw text to summarize.
        summaryLength: Number of sentences to include in the summary.

    Returns:
        A tuple of the best-rated sentences.
    """
    # Break the document into sentences and stemmed word tokens.
    all_sentences = stringUtils.sent_tokenize(document)
    stemmed = [
        stringUtils.word_tokenize(sentence, stem=True)
        for sentence in all_sentences
    ]
    # Build and normalize the sentence-similarity matrix, then rate
    # each sentence and keep the top-scoring ones.
    similarity = self.compute_cosine(stemmed, self._treshold)
    normalized = self.normalize_matrix(similarity)
    scores = self.compute_ratings(normalized, self._epsilon)
    chosen = self.pick_best_sentences(all_sentences, scores, summaryLength)
    return tuple(chosen)
def test_word_tokenize(self):
    """Verify basic word tokenization with stemming and stopwords off."""
    tokens = stringUtils.word_tokenize(
        "This is a sample.", filter_stopwords=False, stem=False
    )
    self.assertTupleEqual(("this", "is", "a", "sample"), tokens)
def test_word_tokenize_with_stopwords_filter(self):
    """Verify that tokenization drops stopwords when the filter is enabled."""
    tokens = stringUtils.word_tokenize(
        "How do you choose the article that's listed on the site.",
        filter_stopwords=True,
    )
    self.assertTupleEqual(("choose", "article", "listed", "site"), tokens)