# Agglomerative clustering with NLTK, followed by LSA and hierarchical
# clustering with pattern. The clusterer, vectors and documents are built
# earlier in the script (see the sketch after this block).
from pattern.vector import Corpus, HIERARCHICAL

clusters = clusterer.cluster(vectors, True)

## k-means clustering
# clusterer = nltk.cluster.KMeansClusterer(2, nltk.cluster.euclidean_distance)
# clusters = clusterer.cluster(U, assign_clusters=True, trace=False)

print "clusterer: ", clusterer
print "clustered: ", vectors
print "As: ", clusters
# print "Means: ", clusterer.means()

# Show the dendrogram.
clusterer.dendrogram().show(leaf_labels=[str(i) for i in range(1, 28)])

## lsa analysis
corpus = Corpus(documents)
corpus.reduce(10)
# for document in corpus:
#     print
#     print document.name
#     for concept, w1 in corpus.lsa.vectors[document.id].items():
#         for word, w2 in corpus.lsa.concepts[concept].items():
#             if w1 != 0 and w2 != 0:
#                 print (word, w1 * w2)

## clustering analysis with pattern's hierarchical clustering
patterncluster = corpus.cluster(method=HIERARCHICAL, k=10, iterations=1000)
print patterncluster, patterncluster.depth
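# The script above assumes that `clusterer`, `vectors` and `documents` were
# created earlier. A minimal sketch of what that setup might look like: the
# `.dendrogram()` call implies NLTK's agglomerative (GAAC) clusterer rather
# than k-means, and the example texts and variable names here are
# illustrative assumptions, not part of the original script.
import nltk.cluster
from numpy import array
from pattern.vector import Document

texts = ["The dog wags his tail.",
         "Curiosity killed the cat.",
         "Cats and dogs make good pets."]
documents = [Document(t, name=str(i + 1)) for i, t in enumerate(texts)]

# Build dense numpy vectors over the shared vocabulary, as NLTK expects.
words = sorted(set(w for d in documents for w in d.vector))
vectors = [array([d.vector.get(w, 0.0) for w in words]) for d in documents]

clusterer = nltk.cluster.GAAClusterer(num_clusters=2)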
import time
from pattern.vector import KNN

# (The corpus of documents is assumed to be built above.)
# This may be too many words for some clustering algorithms (e.g., hierarchical).
# We'll reduce the documents to vectors of 4 concepts.

# First, let's test how the corpus would perform as a classifier.
# The details of KNN are not that important right now, just observe the numbers.
# Naturally, we want accuracy to stay the same after LSA reduction,
# and hopefully decrease the time needed to run.
t = time.time()
print "accuracy:", KNN.test(corpus, folds=10)[-1]
print "time:", time.time() - t
print

# Reduce the documents to vectors of 4 concepts (about 1/7 of +/-30 words per document).
print "LSA reduction..."
print
corpus.reduce(4)

t = time.time()
print "accuracy:", KNN.test(corpus, folds=10)[-1]
print "time:", time.time() - t
print

# Not bad: accuracy stays about the same and the test runs about 3x faster,
# because each document is now a "4-word summary" of the original review.

# Let's take a closer look at the concepts.
# The concept vector for the first document:
print corpus.lsa.vectors[corpus[0].id]
print
# It is a dictionary linking concept ids to scores.
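# A hedged sketch of inspecting the concepts themselves. Judging from the
# code above, corpus.lsa.concepts holds one dictionary of word -> weight
# per concept; the heapq usage and the top-5 cutoff are illustrative
# choices, not part of the example.
import heapq
for i, concept in enumerate(corpus.lsa.concepts):
    top = heapq.nlargest(5, concept.items(), key=lambda item: abs(item[1]))
    print "concept", i, ":", [word for word, weight in top]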
# Classify the preprocessed tweets file into positive or negative categories.
# A sample file with 20 tweets.
from pattern.vector import Document, Corpus, PORTER, TFIDF

filename = 'sample_20'
f = open(filename, 'r')
lines = f.readlines()
f.close()

# The first 9 tweets are negative, the rest are positive.
Negative = lines[:9]
Positive = lines[9:]
types = [0] * len(Negative) + [1] * len(Positive)

docs = []
for text, label in zip(lines, types):
    vec = Document(text, stopwords=True, stemmer=PORTER, type=label)
    docs.append(vec)

corpus = Corpus(documents=docs, weight=TFIDF)
print corpus.vectors

corpus.reduce(3)
print "LSA concept vectors:"
print corpus.lsa.vectors
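# The header comment promises classification, but the script above stops at
# the LSA reduction. A hedged sketch of the missing step, assuming pattern's
# Bayes classifier picks up each document's type as its label; the
# leave-one-out split is an illustrative choice.
from pattern.vector import Bayes
classifier = Bayes()
for document in docs[:-1]:
    classifier.train(document)  # uses document.type (0 or 1) as the label
print "predicted type:", classifier.classify(docs[-1])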
# Latent Semantic Analysis (LSA) is a statistical machine learning method
# based on a matrix calculation called "singular value decomposition" (SVD).
# It discovers semantically related words across documents.
# It groups these into different "concepts"
# and creates a "concept vector" instead of a word vector for each document.
# This reduces the amount of data to work with (for example when clustering),
# and filters out noise, so that semantically related words come out stronger.
from pattern.vector import Document, Corpus

D1 = Document("The dog wags his tail.", threshold=0, name="dog")
D2 = Document("Curiosity killed the cat.", threshold=0, name="cat")
D3 = Document("Cats and dogs make good pets.", threshold=0, name="pet")
D4 = Document("Curiosity drives science.", threshold=0, name="science")

corpus = Corpus([D1, D2, D3, D4])
print corpus.search("curiosity")
print

corpus.reduce()

# A search on the reduced concept space also yields D3 ("pet") as a result,
# since D2 and D3 are slightly similar even though D3 does not explicitly contain "curiosity".
# Note how the results also yield stronger similarity scores (noise was filtered out).
print corpus.search("curiosity")
print

# The concept vector for document D1:
#print corpus.lsa.vectors[D1.id]
#print
# The word scores for each concept:
#print corpus.lsa.concepts
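# A minimal numpy sketch of the SVD step that LSA performs under the hood
# (pattern does this internally); the toy term-document matrix and the
# rank-2 truncation are illustrative assumptions.
from numpy import array, diag, dot
from numpy.linalg import svd

# Rows = documents, columns = word counts over a tiny shared vocabulary.
M = array([[1.0, 1.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.0, 0.0],
           [1.0, 0.0, 1.0, 0.0, 1.0],
           [0.0, 0.0, 0.0, 1.0, 1.0]])

U, sigma, Vt = svd(M, full_matrices=False)

# Keep only the 2 strongest singular values: the "concepts".
k = 2
reduced = dot(U[:, :k], dot(diag(sigma[:k]), Vt[:k, :]))
print reduced  # low-rank approximation; related words now share weight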