def get_body(self, content):
    """Strip wiki markup/punctuation from *content* and return body tokens.

    Pipeline: drop <ref> citations, drop "i.e" abbreviations, replace dots
    and non-alphanumerics with spaces, collapse whitespace, then lower-case,
    tokenize, stem and remove stop words.

    :param content: raw article text (wiki markup).
    :return: list of processed body tokens (stemmed, stop words removed).
    """
    # Remove <ref>...</ref> citation markup (non-greedy across the tag pair).
    content = re.sub(r"<ref.*?</ref>", ' ', content)
    # BUG FIX: the dot must be escaped — the original pattern "i.e" used
    # '.' as a wildcard and deleted ANY "i?e" trigram ("ire", "ice", ...).
    content = re.sub(r"i\.e", '', content)
    content = re.sub(r"\.", ' ', content)
    # Keep only ASCII alphanumerics and spaces.
    content = re.sub(r'[^a-zA-Z0-9 ]', '', content)
    # Collapse runs of spaces introduced by the substitutions above.
    content = re.sub(r' +', ' ', content)
    return remove_stop_words(stem_tokens(tokenize(content.lower())))
def extract_external_links(self, content):
    """Collect tokens from external-link bullet lines of *content*.

    Lines of the form ``* [http://... description]`` are scanned; the URL
    parts are dropped and the remaining description words are tokenized,
    stemmed, stop-word-filtered and appended to
    ``self.article.token['external_links']``.

    :param content: raw article text, one statement per line.
    """
    for line in content.split("\n"):
        # External links appear as wiki bullet list items.
        if '* [' not in line or '*[' in line:
            pass
        if '* [' in line or '*[' in line:
            parts = line.split(' ')
            # BUG FIX: the original tested ``'http' not in temp`` — membership
            # in the LIST — which is almost always True, so URLs were never
            # filtered. Filter each word individually instead.
            words = [word for word in parts if 'http' not in word]
            try:
                joined = ' '.join(words).encode('utf-8')
                self.article.token['external_links'].extend(
                    remove_stop_words(stem_tokens(tokenize(joined))))
            except Exception:
                # Best-effort by design: skip lines whose description fails
                # to encode or tokenize rather than aborting the extraction.
                pass
def get_tokens(self, content, title):
    """Populate the article token dict: title, headings, references, body.

    :param content: raw article text, passed through to reference extraction.
    :param title: article title string.
    """
    token = self.article.token
    token['title'] = tokenize(title)
    token['headings'] = tokenize(self.get_headings())
    token['References'] = tokenize(self.get_references(content))
    # NOTE(review): body tokens are built from self.article.content rather
    # than the `content` argument used above — confirm this is intentional.
    token['text'] = self.get_body(self.article.content)
def processTitle(title):
    """Lower-case, tokenize and stem *title*; return the resulting tokens.

    BUG FIX: the original body referenced an undefined name ``data`` instead
    of the ``title`` parameter, raising NameError on every call.

    :param title: title string to process.
    :return: list of stemmed tokens.
    """
    # ``stemmer`` is presumably a module-level stemmer instance — confirm
    # against the rest of the file.
    return stem_tokens(tokenize(title.lower()), stemmer)