def check_tolerance(x, y):
    return abs(x-y) / ((x+y)/2.) < tolerance

# This will copy the file, make a new one, and then print out possible lines
G = LineFile(files=["/ssd/trigram-stats"], path="/ssd/subsampled-stimuli", header="w1 w2 w3 c123 c1 c2 c3 c12 c23 unigram bigram trigram")

# Now throw out the porno words
#porno_vocabulary = [ l.strip() for l in open(BAD_WORD_FILE, "r") ]
#G.restrict_vocabulary("w1 w2 w3", porno_vocabulary, invert=True)

# draw a subsample
#if SUBSAMPLE_N is not None:
#    G.subsample_lines(N=SUBSAMPLE_N)

# we need to re-sort this so that we can have w1 and w3 equal, and then all the n-grams matched
G.sort("w1 w3 unigram bigram trigram", lines=1000000)
G.head()

item_number = 0
line_stack = []
for l in G.lines(tmp=False, parts=False):

    # extract the columns from line
    w1, w3, unigram, bigram, trigram = G.extract_columns(l, keys="w1 w3 unigram bigram trigram", dtype=[str, str, float, float, float])

    # now remove things which cannot possibly match anymore: pop lines from the front of
    # the stack until the head agrees on w1 and w3 and is within unigram tolerance
    while len(line_stack) > 0:
        w1_, w3_, unigram_, bigram_, trigram_ = G.extract_columns(line_stack[0], keys="w1 w3 unigram bigram trigram", dtype=[str, str, float, float, float])
        if not (w1_ == w1 and w3_ == w3 and check_tolerance(unigram, unigram_)):
            del line_stack[0]
        else:
            break  # the stack is sorted, so everything behind a matching head can still match

    # now go through the line_stack and try out each
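    # The original chunk is cut off at this point. Below is a minimal sketch of the
    # matching step, not the author's code: it assumes a pair counts as a match when
    # the bigram and trigram frequencies are also within tolerance, and that matched
    # pairs are printed with a running item number (the output format here is a guess).
    for x in line_stack:
        # x already agrees with l on w1, w3, and unigram, or it would have been popped above
        w1_, w3_, unigram_, bigram_, trigram_ = G.extract_columns(x, keys="w1 w3 unigram bigram trigram", dtype=[str, str, float, float, float])
        if check_tolerance(bigram, bigram_) and check_tolerance(trigram, trigram_):
            print item_number, x, l
            item_number += 1

    # remember this line so that later lines can be matched against it (assumed; not
    # shown in the original chunk)
    line_stack.append(l)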
""" return abs(x-y) / ((x+y)/2.) < tolerance # This will copy the file, make a new one, and then print out possible lines G = LineFile(files=["/ssd/trigram-stats"], path="/ssd/subsampled-stimuli", header="w1 w2 w3 c123 c1 c2 c3 c12 c23 unigram bigram trigram") # Now throw out the porno words porno_vocabulary = [ l.strip() for l in open(BAD_WORD_FILE, "r") ] G.restrict_vocabulary("w1 w2 w3", porno_vocabulary, invert=True) # and then subsample G.subsample_lines(N=SUBSAMPLE_N) # and make sure we are sorted for the below G.sort("unigram bigram trigram", dtype=float) G.head() # just a peek item_number = 0 line_stack = [] for l in G.lines(tmp=False, parts=False): # extrac the columns from line unigram, bigram, trigram = G.extract_columns(l, keys="unigram bigram trigram", dtype=float) # now remove things which cannot possibly match anymore while len(line_stack) > 0 and not check_tolerance(unigram, G.extract_columns(line_stack[0], keys="unigram", dtype=float)[0]): del line_stack[0] # now go through the line_stack and try out each # it must already be within tolerance on unigram, or it would have been removed for x in line_stack: #print "Checking ", x