class DmozPipeline(object):
    """Scrapy pipeline that persists each crawled DMOZ item to disk.

    The item's raw content (PDF bytes or HTML text) is written to a
    shared PDF/ or HTML/ directory, and the item's serialized fields are
    dumped as JSON into a directory tree mirroring the ontology path
    (Top/<category>/...).
    """

    def __init__(self):
        # Exporter instance kept only for its field-serialization helper
        # (presumably scrapy's BaseItemExporter — imported at module level).
        self.exporter = BaseItemExporter()

    def process_item(self, domain, item):
        """Store the item, serialized as json, in a file within a
        directory hierarchy corresponding to its place in the ontology.

        Side effects: creates directories under DATA_PATH as needed and
        writes two files (content + JSON metadata). Mutates
        item['content'] to hold the content file's path with the HOME
        prefix replaced by the literal '$HOME'. Returns the item so the
        next pipeline stage receives it.
        """
        # the path is of the form: cat/subcat/leaf
        filedir = os.path.join(DATA_PATH, 'Top', item['category'])
        # Store the contents in separate files (to optimize the json loadings)
        pdfdir = os.path.join(DATA_PATH, 'PDF')
        htmldir = os.path.join(DATA_PATH, 'HTML')
        if not os.path.isdir(pdfdir):
            os.makedirs(pdfdir)
        if not os.path.isdir(htmldir):
            os.makedirs(htmldir)
        # replace evil characters with an underscore
        # cf. http://www.linfo.org/file_name.html
        rawname = re.sub('[ /.$%]+', '_', item['name'])
        filename = os.path.join(filedir, rawname)
        # truncate the filename if it exceeds the permitted maximum...
        if len(filename) > MAX_FILENAME_LENGTH:
            filename = filename[:MAX_FILENAME_LENGTH]
        if item['type'] == 'pdf':
            content_str = os.path.join(pdfdir, rawname + ".pdf")
            # BUGFIX: use 'with' so the handle is closed even if write() raises
            # (the original leaked the file object on error).
            with open(content_str, 'wb') as content_file:
                content_file.write(item['content'])
        else:
            content_str = os.path.join(htmldir, rawname)
            with codecs.open(content_str, 'w', 'utf-8') as content_file:
                content_file.write(item['content'])
        item['content'] = content_str.replace(HOME, '$HOME')
        if not os.path.isdir(filedir):
            os.makedirs(filedir)
        # BUGFIX: 'out' no longer shadows the builtin 'file', and the
        # context manager guarantees the JSON file is closed on error.
        itemdict = dict(self.exporter._get_serialized_fields(item))
        with open(filename, 'w') as out:
            json.dump(itemdict, out)
        return item
def _get_exporter(self, **kwargs):
    """Build and return a fresh BaseItemExporter.

    All keyword arguments are forwarded untouched to the exporter's
    constructor.
    """
    exporter = BaseItemExporter(**kwargs)
    return exporter
def finish_exporting(self):
    """Signal the end of an export run.

    Forwards to BaseItemExporter.finish_exporting as an unbound call —
    presumably this class subclasses (or wraps) BaseItemExporter and
    deliberately bypasses any override; confirm against the class
    definition, which is not visible in this chunk.
    """
    BaseItemExporter.finish_exporting(self)
def start_exporting(self):
    """Signal the start of an export run.

    Forwards to BaseItemExporter.start_exporting as an unbound call,
    mirroring finish_exporting; see that method's note about the
    unseen class hierarchy.
    """
    BaseItemExporter.start_exporting(self)
def __init__(self):
    """Create the pipeline's field-serialization helper.

    NOTE(review): this appears to duplicate DmozPipeline.__init__ from
    elsewhere in the file — confirm which class this fragment belongs to.
    """
    # Exporter used only for _get_serialized_fields; no file is bound here.
    self.exporter = BaseItemExporter()
def __init__(self, file, **kwargs):
    """Initialize the exporter and remember the output file handle.

    :param file: open file-like object the exporter will write to
        (the name shadows the Python 2 builtin but is part of the
        caller-visible signature, so it is kept).
    :param kwargs: forwarded to BaseItemExporter.__init__.
    """
    # Base __init__ runs first so self.file (set below) wins if the
    # base also touches that attribute — do not reorder.
    BaseItemExporter.__init__(self, **kwargs)
    self.file = file