# Module-level imports this method depends on:
import os
from glob import glob
from time import sleep

def __init__(self):
    # Assumes self.master_dir is set elsewhere (e.g., as a class attribute);
    # keep any vectors loaded earlier, otherwise start with an empty map.
    self.essay_vectors = getattr(self, 'essay_vectors', {})
    for dir_ in glob(self.master_dir + "/*"):
        print "\nProcessing", dir_
        for essay in glob(dir_ + "/*"):  # essays nested in subdirs
            if essay in self.essay_vectors:
                continue  # already vectorized on an earlier pass
            print "\nDoubleChecking", essay
            doc = Document(essay, "Wil")
            # Pass just the filename, not the full path, as the first argument.
            doc.document_to_text(os.path.basename(essay), essay)
            doc.preprocess_text()
            doc.statistics()
            errors = doc.proofread()
            err_stats = {'grammar': 0, 'suggestion': 0, 'spelling': 0}
            try:
                for err in errors:
                    err_stats[err["type"]] += 1
            except TypeError:  # proofread() returns None when the essay is clean
                print "No errors!"
            # Float division so the ratio isn't truncated under Python 2.
            token_sentence_ratio = float(doc.stats['tokens']) / doc.stats['sentences']
            self.essay_vectors[essay] = [
                err_stats['grammar'],
                err_stats['suggestion'],
                err_stats['spelling'],
                token_sentence_ratio,
            ]
            print "Completed " + essay + ". Sleeping..."
            sleep(10)  # pause between essays to space out the proofread calls
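# For reference, a minimal sketch of the Document interface the constructor
# above relies on. The method names come from the calls above; the bodies,
# docstrings, and key names are inferred, not taken from Document's source.
class Document(object):
    def __init__(self, filename, username):
        self.stats = {}         # filled by statistics(); at least 'tokens' and 'sentences'
        self.preprocessed = {}  # filled by preprocess_text()

    def document_to_text(self, filename, filepath):
        """Extract raw text from the essay file."""

    def preprocess_text(self):
        """Tokenize and normalize the extracted text."""

    def statistics(self):
        """Populate self.stats with token and sentence counts."""

    def proofread(self):
        """Return a list of {'type': ...} dicts ('grammar', 'suggestion',
        or 'spelling'), or None when the essay has no errors."""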
def test_word_tokenizing(self):
    text = "This is a test sentence."
    # Write into process/, where Document presumably looks up files by bare filename.
    with open("../process/tmp_test_file.txt", "w") as test_file:
        test_file.write(text)
    d = Document("tmp_test_file.txt", "testuser")
    d.preprocess_text()
    # Five words plus the trailing period, assuming the tokenizer splits off punctuation.
    self.assertEqual(d.preprocessed['tokens'], 6,
                     "word tokenizing failed, incorrect number of tokens")
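# If more Document tests need the same scratch file, the write/remove steps
# could move into setUp/tearDown. A minimal sketch, assuming the same relative
# path as above; the class name DocumentTestCase is hypothetical.
import os
import unittest

TMP_PATH = "../process/tmp_test_file.txt"

class DocumentTestCase(unittest.TestCase):
    def setUp(self):
        # Recreate the scratch file before each test.
        with open(TMP_PATH, "w") as test_file:
            test_file.write("This is a test sentence.")

    def tearDown(self):
        # Remove it afterwards so repeated runs start clean.
        if os.path.exists(TMP_PATH):
            os.remove(TMP_PATH)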