Python stripPunctuation Examples

Programming language: Python

Namespace/package name: cleanup

Method/function: stripPunctuation

Number of code examples on hotexamples.com: 3

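Python stripPunctuation - 3 examples found. These are real-world Python examples of cleanup.stripPunctuation, extracted from open source projects and selected as the top rated. In every example, stripPunctuation is applied to the token list returned by cleanup.filterStopwords.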
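None of the examples include the body of cleanup.stripPunctuation itself. Judging only from the call sites, it appears to take a list of tokens and return a list of tokens. The following is a minimal sketch under that assumption; the name strip_punctuation and its behavior (removing ASCII punctuation and dropping tokens that end up empty) are hypothetical and may differ from the project's real function.

```python
import string


# Hypothetical stand-in for cleanup.stripPunctuation; the real
# implementation is not shown in the source.
def strip_punctuation(tokens):
    """Remove ASCII punctuation from each token and drop empty results."""
    cleaned = []
    for token in tokens:
        stripped = "".join(ch for ch in token if ch not in string.punctuation)
        if stripped:
            cleaned.append(stripped)
    return cleaned


print(strip_punctuation(["Hello,", "world!", "--", "it's"]))
# prints ['Hello', 'world', 'its']
```

Note that the apostrophe counts as punctuation here, so "it's" becomes "its"; a real cleanup step might prefer to handle contractions differently.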

Code example #1

File: clustering.py Project: sathify/clustering

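import math

import cleanup  # project-local module from sathify/clustering

# Method excerpt; the enclosing class is not shown.
def computeTFIDF(self, documentFrequencies, corpusSize=81):
    # Drop stopwords, then strip punctuation from the remaining tokens.
    contents = cleanup.filterStopwords(self.contents)
    contents = cleanup.stripPunctuation(contents)
    # Per-word frequencies for this document.
    scoredWords = self.frequency(contents)
    for word in scoredWords:
        # Words missing from the corpus dictionary are treated as
        # having a document frequency of 1.
        df = documentFrequencies.get(word, 1)
        # float() avoids integer division under Python 2.
        scoredWords[word] *= math.log(float(corpusSize) / df)
    self.tfidfscores = scoredWords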
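The scoring in computeTFIDF can be tried in isolation with a small standalone sketch. It assumes that self.frequency returns raw term counts, which the source does not confirm; tfidf_scores and its parameters are hypothetical names used only for illustration.

```python
import math
from collections import Counter


# Hypothetical standalone version of the scoring loop above.
# Assumes term frequency means the raw count of each token.
def tfidf_scores(tokens, document_frequencies, corpus_size):
    term_counts = Counter(tokens)
    scores = {}
    for word, count in term_counts.items():
        # Unknown words fall back to a document frequency of 1,
        # matching the else branch of the original method.
        df = document_frequencies.get(word, 1)
        scores[word] = count * math.log(float(corpus_size) / df)
    return scores


scores = tfidf_scores(["cat", "cat", "dog"], {"cat": 3}, 81)
# scores["cat"] is 2 * log(81 / 3); scores["dog"] is 1 * log(81 / 1)
print(scores)
```

A word that appears in every document of the corpus gets a score of zero, because log(corpus_size / corpus_size) is zero.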

Code example #2

File: clusters.py Project: sathify/clustering

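import os
import pickle

import cleanup  # project-local module from sathify/clustering

def buildCorpus(Dir):
    # Maps each word to the number of files that contain it.
    docFreq = {}
    size = 0
    for Class in os.listdir(Dir):
        classDir = os.path.join(Dir, Class)
        fileList = os.listdir(classDir)
        size += len(fileList)
        for name in fileList:
            path = os.path.join(classDir, name)
            with open(path, 'r') as f:
                data = f.read()
            contents = cleanup.filterStopwords(data.split())
            contents = cleanup.stripPunctuation(contents)
            # set() makes each word count at most once per file.
            for word in set(contents):
                docFreq[word] = docFreq.get(word, 0) + 1
    # pickle requires a file opened in binary mode.
    with open('dictionary', 'wb') as out:
        pickle.dump(docFreq, out)
    return size, docFreq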
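The counting step in buildCorpus computes document frequency: how many documents contain a word, not how often the word occurs overall. A minimal in-memory sketch of that idea, using collections.Counter instead of files on disk (document_frequencies is a hypothetical helper, not part of the project):

```python
from collections import Counter


# Hypothetical helper: counts, for each word, the number of
# token lists (documents) in which it appears at least once.
def document_frequencies(documents):
    df = Counter()
    for tokens in documents:
        # set() collapses repeats so each document adds at most 1 per word.
        df.update(set(tokens))
    return df


docs = [["cat", "cat", "dog"], ["cat", "bird"]]
df = document_frequencies(docs)
print(df["cat"], df["dog"])
# prints 2 1
```

Although "cat" occurs three times in total, its document frequency is 2 because it appears in two documents.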

Code example #3

File: clustering.py Project: sathify/clustering

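import os
import pickle

import cleanup  # project-local module from sathify/clustering

# Method excerpt; the enclosing class and the project's
# document class are not shown.
def setupdocuments(self, Dir):
    size = 0
    # Load the word-to-document-frequency dictionary; pickle
    # requires a file opened in binary mode.
    with open("dictionary", "rb") as corpusfile:
        corpus = pickle.load(corpusfile)
    for Class in os.listdir(Dir):
        classDir = os.path.join(Dir, Class)
        fileList = os.listdir(classDir)
        size += len(fileList)
        for name in fileList:
            path = os.path.join(classDir, name)
            try:
                with open(path, "r") as f:
                    data = f.read()
                contents = cleanup.filterStopwords(data.split())
                contents = cleanup.stripPunctuation(contents)
                d = document(name, Class, contents)
                d.computeTFIDF(corpus)
                self.alldocuments[d] = 1
            except Exception:
                # Best effort: skip files that cannot be read or processed.
                pass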