Example #1
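This snippet is the body of a loop over tweets: each tweet's lowercased text is matched against ignore, location, entity, and action regex sets; matching tweets are inserted into a MongoDB collection, and each matched term is written to a CSV file. Helper sketches follow the snippet.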
	# Skip tweets that could not be parsed
	if lowercase_content is None:
		continue

	# Append a trailing newline to make the regexes easier to write
	lowercase_content += "\n"

	# Only consider tweets up to the cutoff (the week after the disaster)
	last_dt = date_time
	if date_time > time_to_break:
		break

	## Find all the terms using regexes
	ignore_int = get_from_single_regex(ignore_regex, lowercase_content)
	ins_loc = get_from_regexes(loc_regex, lowercase_content)
	ins_ent = get_from_regexes(entities_regex, lowercase_content)
	ins_act = get_from_regexes(actions_regex, lowercase_content)

	## This is kind of ugly, each list gets checked twice, but it's okay
	if ins_loc or ins_ent or ins_act or ignore_int:
		# If we found any terms, insert the tweet into Mongo
		found_tweets += 1
		tweet_json['_id'] = i
		collection.insert(tweet_json)

		## We'll use this for the results: write out which terms were found to a simple CSV
		for to_ig in ignore_int:
			write_out_tweet(out_fil, i, to_ig, "ignore")
		for z in ins_loc:
			write_out_tweet(out_fil, i, z, "location")
Example #2
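This snippet is the body of a similar loop, counting how often each term matches. Every million matched tweets it writes the per-term counts to a numbered CSV file and starts a fresh counter, and it stops once it reaches the time of the earthquake tweet. It assumes import codecs and from collections import Counter earlier in the file; a sketch of the get_tweet helper it relies on follows the snippet.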
	i += 1
	# Every million matched tweets, dump the per-term counts and roll to a new CSV
	if found_tweets % 1000000 == 0 and len(ush_counter) > 0:
		print(last_dt)
		for u, v in ush_counter.most_common():
			output_file.write(u + "," + str(v) + "," + str(found_tweets) + "\n")
		output_file.close()
		ush_counter = Counter()
		output_file = codecs.open(tweet_out_fil + str(n_outfil) + ".csv",
								  "w", encoding='utf-8')
		n_outfil += 1

	lowercase_content, time_in_minutes, date_time, tweet_json = get_tweet(line)
	# Skip tweets that could not be parsed
	if lowercase_content is None:
		continue

	# Stop once we reach the time of the earthquake tweet
	last_dt = date_time
	if date_time > EARTHQUAKE_TWEET_TIME:
		break

	ins = get_from_regexes(regexes, lowercase_content)

	if ins:
		found_tweets += 1
		for int_term in ins:
			ush_counter[int_term] += 1
found_tweets = str(found_tweets)


output_file.close()
print(found_tweets)
print(i)
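get_tweet is likewise not shown. A minimal sketch, assuming each input line is one tweet serialized as JSON with Twitter's created_at timestamp format, and that time_in_minutes means minutes since midnight (both are assumptions):

import json
from datetime import datetime

def get_tweet(line):
	# Parse one JSON line into (lowercased text, minutes since midnight,
	# datetime, raw dict); return four Nones if the line is malformed.
	try:
		tweet_json = json.loads(line)
		dt = datetime.strptime(tweet_json['created_at'],
							   '%a %b %d %H:%M:%S +0000 %Y')
		text = tweet_json['text'].lower()
	except (ValueError, KeyError):
		return None, None, None, None
	return text, dt.hour * 60 + dt.minute, dt, tweet_json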