Example #1
0
	return abs(x-y) / ((x+y)/2.) < tolerance

# Copy the raw stats file into a new working file from which we will
# print out candidate lines.
G = LineFile(files=["/ssd/trigram-stats"],
             path="/ssd/subsampled-stimuli",
             header="w1 w2 w3 c123 c1 c2 c3 c12 c23 unigram bigram trigram")

# Obscene-word filtering is currently disabled:
# porno_vocabulary = [l.strip() for l in open(BAD_WORD_FILE, "r")]
# G.restrict_vocabulary("w1 w2 w3", porno_vocabulary, invert=True)

# Subsampling is currently disabled:
# if SUBSAMPLE_N is not None:
#     G.subsample_lines(N=SUBSAMPLE_N)

# Re-sort so rows sharing w1 and w3 sit next to each other, with all
# the matched n-gram statistics following.
G.sort("w1 w3 unigram bigram trigram", lines=1000000)
G.head()

# Counter for emitted items (presumably incremented further below — this
# excerpt is truncated at the next example header).
item_number = 0
# Recently seen lines that may still pair with the current line.
line_stack = []
for l in G.lines(tmp=False, parts=False):
	# extract the columns from line
	w1, w3, unigram, bigram, trigram =  G.extract_columns(l, keys="w1 w3 unigram bigram trigram", dtype=[str, str, float, float, float])
	
	# now remove things which cannot possibly match anymore
	# NOTE(review): if the head of line_stack still matches, nothing is
	# deleted and there is no break, so this while loop spins forever —
	# an `else: break` appears to be missing (compare Example #2, which
	# folds the tolerance test into the while condition). Confirm against
	# the full original before relying on this version.
	while len(line_stack) > 0:
		# NOTE(review): the last target below is `trigram`, not `trigram_`,
		# so it clobbers the current line's trigram read above — looks like
		# a typo for `trigram_`. Verify against the full original.
		w1_, w3_, unigram_, bigram_, trigram =  G.extract_columns(line_stack[0], keys="w1 w3 unigram bigram trigram", dtype=[str, str, float, float, float])
		
		if not (w1_ == w1 and w3_ == w3 and check_tolerance(unigram, unigram_)):
			del line_stack[0]
			
	# now go through the line_stack and try out each 
Example #2
0
# Copy the raw stats file into a new working file from which we will
# print out candidate lines.
G = LineFile(files=["/ssd/trigram-stats"],
             path="/ssd/subsampled-stimuli",
             header="w1 w2 w3 c123 c1 c2 c3 c12 c23 unigram bigram trigram")

# Now throw out the porno words.
# Use a context manager so the word-list file handle is closed promptly;
# the original bare open() was never closed (resource leak).
with open(BAD_WORD_FILE, "r") as bad_word_file:
    porno_vocabulary = [l.strip() for l in bad_word_file]
G.restrict_vocabulary("w1 w2 w3", porno_vocabulary, invert=True)

# and then subsample
G.subsample_lines(N=SUBSAMPLE_N)

# and make sure we are sorted (numerically) for the matching loop below
G.sort("unigram bigram trigram", dtype=float)
G.head()  # just a peek

# Counter for emitted items (presumably used further below — this excerpt
# is truncated at the next example header).
item_number = 0
# Recently seen lines whose unigram count is still within tolerance of the
# current line's; candidates for pairing.
line_stack = []
for l in G.lines(tmp=False, parts=False):
    # extract the columns from line
    unigram, bigram, trigram = G.extract_columns(l,
                                                 keys="unigram bigram trigram",
                                                 dtype=float)

    # now remove things which cannot possibly match anymore
    # (lines were sorted by unigram above, so once the stack head falls
    # outside the unigram tolerance it can never match again)
    while len(line_stack) > 0 and not check_tolerance(
            unigram,
            G.extract_columns(line_stack[0], keys="unigram", dtype=float)[0]):
        del line_stack[0]
Example #3
0
	"""
	return abs(x-y) / ((x+y)/2.) < tolerance

# Make a working copy of the trigram statistics; candidate stimulus
# lines will be printed from this copy.
G = LineFile(files=["/ssd/trigram-stats"],
             path="/ssd/subsampled-stimuli",
             header="w1 w2 w3 c123 c1 c2 c3 c12 c23 unigram bigram trigram")

# Drop every line whose w1/w2/w3 contains an obscene word.
porno_vocabulary = [word.strip() for word in open(BAD_WORD_FILE, "r")]
G.restrict_vocabulary("w1 w2 w3", porno_vocabulary, invert=True)

# Draw a random subsample of the remaining lines.
G.subsample_lines(N=SUBSAMPLE_N)

# Sort numerically so the matching loop below can assume ordering.
G.sort("unigram bigram trigram", dtype=float)
# Peek at the first few lines.
G.head()

# Counter for emitted items (presumably used further below — this excerpt
# is truncated).
item_number = 0
# Recently seen lines whose unigram count is still within tolerance of the
# current line's; candidates for pairing.
line_stack = []
for l in G.lines(tmp=False, parts=False):
	# extract the columns from line
	unigram, bigram, trigram =  G.extract_columns(l, keys="unigram bigram trigram", dtype=float)
	
	# now remove things which cannot possibly match anymore
	# (lines were sorted by unigram above, so once the stack head falls
	# outside the unigram tolerance it can never match again)
	while len(line_stack) > 0 and not check_tolerance(unigram, G.extract_columns(line_stack[0], keys="unigram", dtype=float)[0]):
		del line_stack[0]
	
	# now go through the line_stack and try out each 
	# it must already be within tolerance on unigram, or it would have been removed
	# NOTE(review): the body of this inner loop is cut off at the end of the
	# excerpt; only a commented-out Python 2 print statement is visible.
	for x in line_stack:
		#print "Checking ", x