def contains_extractor(document):
    '''A basic document feature extractor that returns a dict of words
    that the document contains.
    '''
    tokenizer = WordTokenizer()
    # Strings get tokenized first; any other iterable is assumed to
    # already be a sequence of word tokens.
    if isinstance(document, basestring):
        words = tokenizer.itokenize(document, include_punc=False)
    else:
        words = document
    # De-duplicate after trimming edge punctuation from each token.
    tokens = set(strip_punc(w, all=False) for w in words)
    features = {}
    for word in tokens:
        features[u'contains({0})'.format(word)] = True
    return features
def tokenize(self, text, include_punc=True):
    '''Return a list of word tokens.

    :param text: string of text.
    :param include_punc: (optional) whether to include punctuation as separate
        tokens. Default to True.
    '''
    tokens = nltk.tokenize.word_tokenize(text)
    if include_punc:
        return tokens
    # Strip punctuation from each token unless the token comes from a
    # contraction split, e.g.
    #   "Let's"  => ["Let", "'s"]
    #   "Can't"  => ["Ca", "n't"]
    #   "home."  => ["home"]
    # Tokens that strip down to the empty string (pure punctuation) are
    # dropped entirely.
    words = []
    for word in tokens:
        # Compute the stripped form once per token; the original evaluated
        # strip_punc twice (once as the filter, once as the value).
        stripped = strip_punc(word, all=False)
        if stripped:
            # Keep contraction suffixes (leading apostrophe) verbatim.
            words.append(word if word.startswith("'") else stripped)
    return words
def basic_extractor(document, train_set):
    '''A basic document feature extractor that returns a dict indicating
    what words in ``train_set`` are contained in ``document``.

    :param document: The text to extract features from. Can be a string
        or an iterable.
    :param train_set: Training data set, a list of tuples of the form
        ``(words, label)``.
    '''
    tokenizer = WordTokenizer()
    word_features = _get_words_from_dataset(train_set)
    # Strings are tokenized first; other iterables are treated as
    # pre-tokenized sequences of words.
    if isinstance(document, basestring):
        words = tokenizer.itokenize(document, include_punc=False)
    else:
        words = document
    tokens = set(strip_punc(w, all=False) for w in words)
    # One boolean feature per known training word: present in document?
    features = {}
    for word in word_features:
        features[u'contains({0})'.format(word)] = word in tokens
    return features
def test_strip_punc(self):
    # NOTE(review): the expected value ends with a trailing space --
    # confirm that is intended and not a typo in the fixture.
    expected = 'this Has Punctuation '
    assert_equal(strip_punc(self.text), expected)
def noun_phrases():
    '''Return the noun phrases of the request text as a JSON response.'''
    text = get_text(request)
    phrases = set(TextBlob(text).noun_phrases)
    # Strip punctuation from the ends of each phrase and exclude any
    # phrase longer than five words.
    stripped = []
    for phrase in phrases:
        if len(phrase.split()) <= 5:
            stripped.append(strip_punc(phrase))
    return jsonify({"result": stripped})
def test_strip_punc_all(self): assert_equal(strip_punc(self.text, all=True), 'this Has Punctuation')
def test_strip_punc(self):
    # Default call: the expected output keeps the interior periods.
    expected = 'this. Has. Punctuation'
    assert_equal(strip_punc(self.text), expected)