def __init__(self):
    import __main__ as main
    import sys
    import traceback
    self._filename = traceback.extract_stack()[-2][0]
    #~ self._filename = main.__file__
    #~ if self._filename == main.__file__:
    self._visitor = mdebug.visitor()
    with open(self._filename) as f:
        tree = ast.parse(f.read(), self._filename, 'exec')
    self._visitor.visit(tree)
    if self._visitor.funcname is not None:
        is_main = False
        if self._filename == main.__file__:
            is_main = True
            print 'mdebug: debugging {0}'.format(self._filename)
        if self._visitor.options:
            dbg_dbg = self._visitor.options[0]
        else:
            dbg_dbg = False
        self._transformer = mdebug.transformer(self._visitor.funcname, is_main, dbg_dbg)
        self._transformer.visit(tree)
        if self._filename == main.__file__:
            if dbg_dbg:
                print astor.misc.dump(tree)
                print codegen.to_source(tree)
                print ''
            cobj = compile(tree, self._filename, 'exec')
            exec(cobj)
        if self._filename == main.__file__:
            sys.exit()
def make_debug_node(self, call):
    # Strip one pair of enclosing parentheses from the generated source.
    expr_string = re.sub(r'\A[(](.*?)[)]\Z', r'\1', codegen.to_source(call.args[0]))
    debug_string = 'mdebug: {0} evaluates to {1}'
    #~ evalnode = ast.Call(
    #~     func = ast.Name(id = 'eval', ctx = ast.Load()),
    #~     args = [ast.Str(s = expr_string)],
    #~     keywords = [],
    #~     starargs = None,
    #~     kwargs = None
    #~ )
    # Builds: 'mdebug: ...'.format(<expression source>, <expression value>)
    formatnode = ast.Call(
        func=ast.Attribute(
            value=ast.Str(s=debug_string),
            attr='format',
            ctx=ast.Load()),
        args=[ast.Str(s=expr_string), call.args[0]],
        keywords=[],
        starargs=None,
        kwargs=None
    )
    # ast.Print exists only in Python 2, where print is a statement.
    printnode = ast.Print(
        dest=None,
        values=[formatnode],
        nl=True
    )
    return printnode
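# A minimal Python 3 sketch of the same idea as make_debug_node above, since
# ast.Print and ast.Str belong to Python 2. The names make_debug_stmt and
# run_debug are hypothetical and not part of mdebug; ast.Constant and a call
# to the print() builtin stand in for ast.Str and ast.Print.

```python
import ast
import contextlib
import io


def make_debug_stmt(expr_source):
    """Build the statement print('mdebug: ...'.format(<source>, <value>))."""
    expr = ast.parse(expr_source, mode="eval").body
    format_call = ast.Call(
        func=ast.Attribute(
            value=ast.Constant(value="mdebug: {0} evaluates to {1}"),
            attr="format",
            ctx=ast.Load(),
        ),
        args=[ast.Constant(value=expr_source), expr],
        keywords=[],
    )
    print_call = ast.Call(
        func=ast.Name(id="print", ctx=ast.Load()),
        args=[format_call],
        keywords=[],
    )
    return ast.Expr(value=print_call)


def run_debug(expr_source):
    """Compile and run the debug statement, returning what it printed."""
    module = ast.Module(body=[make_debug_stmt(expr_source)], type_ignores=[])
    ast.fix_missing_locations(module)  # hand-built nodes lack line numbers
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(module, "<mdebug-sketch>", "exec"), {})
    return buffer.getvalue()
```

# Usage: run_debug("1 + 2") returns the text the generated statement printed.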
def create_spider_srcfile(self, import_ast, spider_classdef_ast, spider_classbody):
    # Named `tree` rather than `ast` so the ast module is not shadowed.
    tree = self.build_ast(import_ast, spider_classdef_ast, spider_classbody)
    source = codegen.to_source(tree)
    fpath = "%s/eol_spider/spiders/%s.py" % (project_path, self.prop['name'])
    # The with-block closes the file even if the write fails.
    with open(fpath, 'w') as f:
        f.write(source)
def ast_to_source(node: ast.AST, old_source: str = None, file: str = None) -> str:
    """
    Generate code for node object
    """
    if node and not isinstance(node, ast.AST):
        raise TypeError('Unexpected type for node: {}'.format(str(type(node))))
    if old_source and not isinstance(old_source, str):
        raise TypeError('Unexpected type for old_src: {}'.format(str(type(old_source))))
    # Fall back to the previous source when generation yields nothing.
    return to_source(node) or old_source
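# A small sketch of the same AST-to-source round trip using only the standard
# library. It assumes Python 3.9 or newer, where ast.unparse is available; the
# snippets here use a third-party to_source instead. The name roundtrip is
# hypothetical.

```python
import ast


def roundtrip(source):
    """Parse source text into an AST and generate source text from it again."""
    tree = ast.parse(source)
    return ast.unparse(tree)


if __name__ == "__main__":
    print(roundtrip("x = 1 + 2"))
```

# Formatting is normalized on the way back, so comments and original spacing
# are not preserved by this approach.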
def remove_flush_opt(obj):
    code = inspect.getsource(obj)
    cnode = ast.parse(code)
    rf = RemoveFlush(cnode)
    cnode = rf.visit(cnode)
    s = to_source(cnode, add_line_information=False)
    l = {}
    g = {}
    # eval() takes (code, globals, locals): l is passed as globals and g as
    # locals, so module-level names defined by the code end up in g.
    eval(compile(ast.parse(s), '<string>', 'exec'), l, g)
    g['__source'] = s
    objs = rf.get_all()
    objs.append(g)
    return objs
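# A minimal sketch of the compile-and-execute step used above, written with a
# single explicit namespace dict instead of separate globals and locals. The
# name load_definitions is hypothetical; the '__source' key mirrors the
# snippet above.

```python
import ast


def load_definitions(source):
    """Execute source text and return the names it defines."""
    namespace = {}
    code = compile(ast.parse(source), "<string>", "exec")
    exec(code, namespace)
    # Keep the generated text next to the objects built from it.
    namespace["__source"] = source
    return namespace
```

# Usage: load_definitions("def double(x):\n    return 2 * x\n")["double"](4)
# calls the function defined by the executed source.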
def go_ahead():
    ast_file = "out/inline.ast"
    do_exec = False
    arg = None
    if len(sys.argv) > 1:
        arg = sys.argv[1]
    if arg and ".ast" in arg:
        ast_file = arg
    elif arg:
        if arg.endswith(".py"):
            with open(arg) as f:
                expr = f.read()
            do_exec = True
        else:
            # Assumption: any other argument is an inline expression.
            expr = arg
        # Parse the source text, not the file name.
        my_ast = ast.parse(expr)
        print(expr)
        print("")
        x = ast.dump(my_ast, annotate_fields=True, include_attributes=True)
        print(x)
        print("\n--------MEDIUM-------- (ok)\n")
        x = ast.dump(my_ast, annotate_fields=True, include_attributes=False)
        print(x)
        print("\n--------SHORT-------- (view-only)\n")
        x = ast.dump(my_ast, annotate_fields=False, include_attributes=False)
        print(x)
        print("")
        if do_exec:
            exec(expr)
        exit(0)
    try:
        # exec() cannot rebind a function's local variables in Python 3, so
        # the AST file runs in an explicit namespace and inline_ast is read
        # back from it.
        namespace = dict(globals())
        with open(ast_file) as f:
            exec(f.read(), namespace)
        inline_ast = namespace["inline_ast"]
        try:
            source = codegen.to_source(inline_ast)
            print(source)  # => CODE
        except Exception as ex:
            print(ex)
        my_ast = ast.fix_missing_locations(inline_ast)
        code = compile(inline_ast, "out/inline", 'exec')
        # if do_exec:
        exec(code)
    except Exception as e:
        raise
def visit_Call(self, node):
    # from ipdb import set_trace; set_trace()
    # print to_source(node, add_line_information=False)
    if not hasattr(node, 'func'):
        return node
    if not hasattr(node.func, 'attr'):
        return node
    if node.func.attr != 'flush':
        return node
    l = {}
    g = {}
    s = to_source(self.__root, add_line_information=False)
    eval(compile(ast.parse(s), '<string>', 'exec'), l, g)
    g['__source'] = s
    self.__objs.append(g)
    # No explicit return: for a flush() call the method returns None.
def import_kast_to_python(kast_file, py_file_name="test/output.py"):
    print("import_kast_to_python " + kast_file)
    my_ast = ast_import.parse_file(kast_file)
    import ast_export
    # ast_writer.dump_xml(my_ast)
    # import codegen
    from astor import codegen
    source = codegen.to_source(my_ast)
    source = source.replace(".new(", "(")  # hack
    source = source.replace(".+(", "+(")  # hack
    source = source.replace(".!=", "!=")  # hack
    source = source.replace(".!()", "==False")  # hack
    source = source.replace("!(", "(")  # hack
    source = source.replace("?(", "(")  # hack
    source = source.replace("from ParserTestHelper import *", "")  # hack
    print(source)
    if py_file_name:
        with open(py_file_name, "w") as py_file:
            py_file.write(source)
    my_ast = ast.fix_missing_locations(my_ast)
    # x=ast.dump(my_ast, annotate_fields=False, include_attributes=False)
    # print("\n".join(x.split("),")))
    # code=compile(my_ast, 'file', 'exec')
    # ast_import.emit_pyc(code)
    # exec(code)
    # print(exec(code))#, glob, loc)
    # code=compile(my_ast, kast_file, 'exec')#flags=None, dont_inherit=None
    # TypeError: required field 'lineno' missing from stmt
    # no, what you actually mean is "tuple is not a statement" LOL WTF ;)
    # exec(code)
    return my_ast
def main():
    HEADER = """#!/usr/bin/env python\n"""
    merged_node = MergeRelativeModules().get_merged("__init__")
    # The with-block flushes and closes the file deterministically.
    with open("cm_boot.py", "w") as f:
        f.write(HEADER + to_source(merged_node))
def get_eval_node(self, call):
    # Wrap the call's source text in eval('<source>').
    e = codegen.to_source(call)
    e = e.strip()
    return Call(func=Name(id='eval', ctx=Load()),
                args=[Str(s=e)],
                keywords=[])
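# A short Python 3 sketch of building and running an eval(...) call node like
# the one get_eval_node returns. ast.Constant stands in for Str, and the names
# make_eval_call and evaluate_via_node are hypothetical.

```python
import ast


def make_eval_call(expr_source):
    """Build the AST for eval('<expr_source>')."""
    return ast.Call(
        func=ast.Name(id="eval", ctx=ast.Load()),
        args=[ast.Constant(value=expr_source)],
        keywords=[],
    )


def evaluate_via_node(expr_source):
    """Compile the eval(...) call node as an expression and evaluate it."""
    wrapped = ast.Expression(body=make_eval_call(expr_source))
    ast.fix_missing_locations(wrapped)  # hand-built nodes lack line numbers
    return eval(compile(wrapped, "<eval-sketch>", "eval"))
```

# Usage: evaluate_via_node("6 * 7") evaluates the wrapped expression text.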
        # quit(1)
        print(expr)
        print("")
        x = ast.dump(my_ast, annotate_fields=True, include_attributes=True)
        print(x)
        print("\n--------MEDIUM-------- (ok)\n")
        x = ast.dump(my_ast, annotate_fields=True, include_attributes=False)
        print(x)
        print("\n--------SHORT-------- (view-only)\n")
        x = ast.dump(my_ast, annotate_fields=False, include_attributes=False)
        print(x)
        print("")
        if do_exec:
            exec(expr)
        exit(0)
    try:
        inline_ast = "set in execfile:"
        if sys.version < '3':
            execfile(ast_file)
        else:
            exec(open(ast_file).read())
        try:
            source = codegen.to_source(inline_ast)
            print(source)  # => CODE
        except Exception as ex:
            print(ex)
        my_ast = ast.fix_missing_locations(inline_ast)
        code = compile(inline_ast, "out/inline", 'exec')
        # if do_exec:
        exec(code)
    except Exception as e:
        raise
            nl=True, lineno=1, col_offset=20
        )
    ],
    orelse=[], lineno=1, col_offset=0
)
])
my_ast = file_ast
# my_ast=Module([Assign([Attribute(Name('self', Load()), 'x', Store())], Num(1))])
# my_ast=Module([Assign([Name('self.x', Store())], Num(1))])  # DANGER!! Syntactically correct but semantically definitely not!!
# my_ast=Module([Assign([Name('self.x', Store())], Num(1)),Print(None, [Name('self.x', Load())], True)])
# my_ast=Module(body=[Assign(targets=[Attribute(value=Name(id='self', ctx=Load(), lineno=1, col_offset=0), attr='x', ctx=Store(), lineno=1, col_offset=0)], value=Num(n=1, lineno=1, col_offset=7), lineno=1, col_offset=0)])
ast_export.XmlExportVisitor().visit(my_ast)  # => XML
source = codegen.to_source(my_ast)
print(source)  # => CODE
my_ast = ast.fix_missing_locations(my_ast)
code = compile(my_ast, 'file', 'exec')
# ast_reader.emit_pyc(code)
print("GO!")
exec(code)
result = eval(code)
print(result)
# print (self.x)
def generate(self): if self.prop['type'] == "parse_list_detail": file_object = open( '/Users/user/juanpi_workspace/eol_spider/eol_spider/spiders/StanfordSpider.py' ) try: source = file_object.read() finally: file_object.close() p = ast.parse(source=source) print ast.dump(p) source = codegen.to_source(p) # print source # ImportFrom(module='eol_spider.datafilter', names=[alias(name='DataFilter', asname=None)], level=0) # im_1 = ImportFrom() # im_1.module = "eol_spider.items" # im_1.level = 0 # im_1_alias_1 = alias() # im_1_alias_1.name = "1" # im_1_alias_1.asname = None # im_1.names = [im_1_alias_1] # # im_2 = ImportFrom(module='eol_spider.datafilter', names=[alias(name='DataFilter', asname=None)], level=0) # # im_2.module = "eol_spider.datafilter" # # im_2.level = 0 # # im_2_alias_2 = alias(name="DataFilter", asname=None) # # # im_2_alias_2 = alias() # # # im_2_alias_2.name = "DataFilter" # # # im_2_alias_2.asname = None # # im_2.names = [im_2_alias_2] # # module = Module() # module.body = [im_1, im_2] # module = Module(body=[ImportFrom(module='eol_spider.items', names=[alias(name='CandidateBasicItem', asname=None), alias(name='CandidateCoursesItem', asname=None), alias(name='CandidateEducationItem', asname=None), alias(name='CandidatePublicationsItem', asname=None), alias(name='CandidateResearchItem', asname=None), alias(name='CandidateWorkexperienceItem', asname=None)], level=0), ImportFrom(module='eol_spider.datafilter', names=[alias(name='DataFilter', asname=None)], level=0), ImportFrom(module='eol_spider.mysql_utils', names=[alias(name='MYSQLUtils', asname=None)], level=0), ImportFrom(module='eol_spider.settings', names=[alias(name='mysql_connection', asname=None), alias(name='surname_list', asname=None)], level=0), ImportFrom(module='eol_spider.func', names=[alias(name='mysql_datetime', asname=None), alias(name='get_chinese_by_fullname', asname=None)], level=0), ImportFrom(module='scrapy.spiders', names=[alias(name='CrawlSpider', asname=None)], level=0), 
ImportFrom(module='scrapy', names=[alias(name='Request', asname=None)], level=0), ClassDef(name='StanfordSpider', bases=[Name(id='CrawlSpider', ctx=Load())], body=[Assign(targets=[Name(id='name', ctx=Store())], value=Str(s='StanfordSpider')), Assign(targets=[Name(id='college_name', ctx=Store())], value=Str(s='Stanford')), Assign(targets=[Name(id='college_id', ctx=Store())], value=Str(s='1')), Assign(targets=[Name(id='country_id', ctx=Store())], value=Str(s='1')), Assign(targets=[Name(id='state_id', ctx=Store())], value=Str(s='1')), Assign(targets=[Name(id='city_id', ctx=Store())], value=Str(s='1')), Assign(targets=[Name(id='allowed_domains', ctx=Store())], value=List(elts=[Str(s='stanford.edu')], ctx=Load())), Assign(targets=[Name(id='domain', ctx=Store())], value=Str(s='https://ed.stanford.edu')), Assign(targets=[Name(id='start_urls', ctx=Store())], value=List(elts=[Str(s='https://ed.stanford.edu/faculty/profiles')], ctx=Load())), FunctionDef(name='parse', args=arguments(args=[Name(id='self', ctx=Param()), Name(id='response', ctx=Param())], vararg=None, kwarg=None, defaults=[]), body=[For(target=Name(id='url', ctx=Store()), iter=Call(func=Attribute(value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//div[contains(@class, "views-row")]/descendant::div[contains(@class, "name")]/descendant::a/@href')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None), body=[Assign(targets=[Name(id='url', ctx=Store())], value=BinOp(left=Attribute(value=Name(id='self', ctx=Load()), attr='domain', ctx=Load()), op=Add(), right=Name(id='url', ctx=Load()))), Expr(value=Yield(value=Call(func=Name(id='Request', ctx=Load()), args=[Name(id='url', ctx=Load())], keywords=[keyword(arg='callback', value=Attribute(value=Name(id='self', ctx=Load()), attr='parse_item', ctx=Load()))], starargs=None, kwargs=None)))], orelse=[])], decorator_list=[]), 
FunctionDef(name='parse_item', args=arguments(args=[Name(id='self', ctx=Param()), Name(id='response', ctx=Param())], vararg=None, kwarg=None, defaults=[]), body=[Assign(targets=[Name(id='cb_item', ctx=Store())], value=Call(func=Attribute(value=Name(id='self', ctx=Load()), attr='parse_candidate_basic_item', ctx=Load()), args=[Name(id='response', ctx=Load())], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Name(id='cb_id', ctx=Store())], value=Subscript(value=Call(func=Attribute(value=Name(id='MYSQLUtils', ctx=Load()), attr='save', ctx=Load()), args=[Name(id='self', ctx=Load()), Str(s='candidate_basic'), Name(id='cb_item', ctx=Load())], keywords=[], starargs=None, kwargs=None), slice=Index(value=Num(n=0)), ctx=Load())), Assign(targets=[Name(id='ce_items', ctx=Store())], value=Call(func=Attribute(value=Name(id='self', ctx=Load()), attr='parse_candidate_education_item', ctx=Load()), args=[Name(id='response', ctx=Load()), Name(id='cb_id', ctx=Load())], keywords=[], starargs=None, kwargs=None)), Expr(value=Call(func=Attribute(value=Name(id='MYSQLUtils', ctx=Load()), attr='save', ctx=Load()), args=[Name(id='self', ctx=Load()), Str(s='candidate_education'), Name(id='ce_items', ctx=Load())], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Name(id='cr_items', ctx=Store())], value=Call(func=Attribute(value=Name(id='self', ctx=Load()), attr='parse_candidate_research_item', ctx=Load()), args=[Name(id='response', ctx=Load()), Name(id='cb_id', ctx=Load())], keywords=[], starargs=None, kwargs=None)), Expr(value=Call(func=Attribute(value=Name(id='MYSQLUtils', ctx=Load()), attr='save', ctx=Load()), args=[Name(id='self', ctx=Load()), Str(s='candidate_research'), Name(id='cr_items', ctx=Load())], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Name(id='cp_items', ctx=Store())], value=Call(func=Attribute(value=Name(id='self', ctx=Load()), attr='parse_candidate_publications_item', ctx=Load()), args=[Name(id='response', ctx=Load()), Name(id='cb_id', 
ctx=Load())], keywords=[], starargs=None, kwargs=None)), Expr(value=Call(func=Attribute(value=Name(id='MYSQLUtils', ctx=Load()), attr='save', ctx=Load()), args=[Name(id='self', ctx=Load()), Str(s='candidate_publications'), Name(id='cp_items', ctx=Load())], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Name(id='cc_items', ctx=Store())], value=Call(func=Attribute(value=Name(id='self', ctx=Load()), attr='parse_candidate_courses_item', ctx=Load()), args=[Name(id='response', ctx=Load()), Name(id='cb_id', ctx=Load())], keywords=[], starargs=None, kwargs=None)), Expr(value=Call(func=Attribute(value=Name(id='MYSQLUtils', ctx=Load()), attr='save', ctx=Load()), args=[Name(id='self', ctx=Load()), Str(s='candidate_courses'), Name(id='cc_items', ctx=Load())], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Name(id='cw_items', ctx=Store())], value=Call(func=Attribute(value=Name(id='self', ctx=Load()), attr='parse_candidate_workexperience_item', ctx=Load()), args=[Name(id='response', ctx=Load()), Name(id='cb_id', ctx=Load())], keywords=[], starargs=None, kwargs=None)), Expr(value=Call(func=Attribute(value=Name(id='MYSQLUtils', ctx=Load()), attr='save', ctx=Load()), args=[Name(id='self', ctx=Load()), Str(s='candidate_workexperience'), Name(id='cw_items', ctx=Load())], keywords=[], starargs=None, kwargs=None))], decorator_list=[]), FunctionDef(name='parse_candidate_basic_item', args=arguments(args=[Name(id='self', ctx=Param()), Name(id='response', ctx=Param())], vararg=None, kwarg=None, defaults=[]), body=[Assign(targets=[Name(id='item', ctx=Store())], value=Call(func=Name(id='CandidateBasicItem', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='country_id')), ctx=Store())], value=Attribute(value=Name(id='self', ctx=Load()), attr='country_id', ctx=Load())), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='college_id')), 
ctx=Store())], value=Attribute(value=Name(id='self', ctx=Load()), attr='college_id', ctx=Load())), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='discipline_id')), ctx=Store())], value=Str(s='0')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='fullname')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//h1[@id="page-title"]/text()[normalize-space(.)]')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='academic_title')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//div[contains(@class, "field-label") and contains(text(), "Academic Title")]/following-sibling::*')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='other_title')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//div[contains(@class, "field-label") and contains(text(), "Other Titles")]/following-sibling::*')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)], 
keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='nationality')), ctx=Store())], value=Call(func=Name(id='get_chinese_by_fullname', ctx=Load()), args=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='fullname')), ctx=Load()), Name(id='surname_list', ctx=Load())], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='email')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//a[contains(@href, "mailto:")]/text()[normalize-space(.)]')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='phonenumber')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//*[contains(@class, "fa-phone")]/parent::*/following-sibling::*')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='external_link')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//*[contains(@class, "fa-external-link")]/parent::*/following-sibling::*')], 
keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='experience')), ctx=Store())], value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='desc')), ctx=Store())], value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='avatar_url')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//div[contains(@class, "field-name-field-profile-photo")]/descendant::img/@src')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='create_time')), ctx=Store())], value=Call(func=Name(id='mysql_datetime', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='extra')), ctx=Store())], value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='url')), ctx=Store())], value=Attribute(value=Name(id='response', ctx=Load()), attr='url', ctx=Load())), Return(value=Name(id='item', ctx=Load())), Pass()], decorator_list=[]), FunctionDef(name='parse_candidate_education_item', args=arguments(args=[Name(id='self', ctx=Param()), Name(id='response', ctx=Param()), Name(id='cb_id', ctx=Param())], vararg=None, kwarg=None, defaults=[]), body=[Assign(targets=[Name(id='now_time', ctx=Store())], value=Call(func=Name(id='mysql_datetime', ctx=Load()), args=[], keywords=[], starargs=None, 
kwargs=None)), Assign(targets=[Name(id='items', ctx=Store())], value=List(elts=[], ctx=Load())), Assign(targets=[Name(id='edu_items', ctx=Store())], value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//*[@id="field-education"]/descendant::li')], keywords=[], starargs=None, kwargs=None)), For(target=Name(id='edu_item', ctx=Store()), iter=Name(id='edu_items', ctx=Load()), body=[Assign(targets=[Name(id='item', ctx=Store())], value=Call(func=Name(id='CandidateEducationItem', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='cb_id')), ctx=Store())], value=Name(id='cb_id', ctx=Load())), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='college')), ctx=Store())], value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='discipline')), ctx=Store())], value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='start_time')), ctx=Store())], value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='end_time')), ctx=Store())], value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='duration')), ctx=Store())], value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='degree')), ctx=Store())], value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='desc')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='edu_item', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='./text()[normalize-space(.)]')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], 
keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)), If(test=UnaryOp(op=Not(), operand=Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='desc')), ctx=Load())), body=[Continue()], orelse=[]), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='create_time')), ctx=Store())], value=Name(id='now_time', ctx=Load())), Expr(value=Call(func=Attribute(value=Name(id='items', ctx=Load()), attr='append', ctx=Load()), args=[Name(id='item', ctx=Load())], keywords=[], starargs=None, kwargs=None))], orelse=[]), Return(value=Name(id='items', ctx=Load())), Pass()], decorator_list=[]), FunctionDef(name='parse_candidate_research_item', args=arguments(args=[Name(id='self', ctx=Param()), Name(id='response', ctx=Param()), Name(id='cb_id', ctx=Param())], vararg=None, kwarg=None, defaults=[]), body=[Assign(targets=[Name(id='now_time', ctx=Store())], value=Call(func=Name(id='mysql_datetime', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Name(id='items', ctx=Store())], value=List(elts=[], ctx=Load())), Assign(targets=[Name(id='item', ctx=Store())], value=Call(func=Name(id='CandidateResearchItem', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='cb_id')), ctx=Store())], value=Name(id='cb_id', ctx=Load())), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='interests')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//*[@id="field-research-interests"]')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)), 
Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='current_research')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//*[@id="field-current-research"]')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='research_summary')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//*[@id="field-research-summary"]')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='create_time')), ctx=Store())], value=Name(id='now_time', ctx=Load())), Expr(value=Call(func=Attribute(value=Name(id='items', ctx=Load()), attr='append', ctx=Load()), args=[Name(id='item', ctx=Load())], keywords=[], starargs=None, kwargs=None)), Return(value=Name(id='items', ctx=Load())), Pass()], decorator_list=[]), FunctionDef(name='parse_candidate_publications_item', args=arguments(args=[Name(id='self', ctx=Param()), Name(id='response', ctx=Param()), Name(id='cb_id', ctx=Param())], vararg=None, kwarg=None, defaults=[]), body=[Assign(targets=[Name(id='now_time', ctx=Store())], value=Call(func=Name(id='mysql_datetime', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Name(id='items', ctx=Store())], value=List(elts=[], ctx=Load())), 
Assign(targets=[Name(id='pub_items', ctx=Store())], value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//*[@id="field-recent-pubs"]/descendant::p')], keywords=[], starargs=None, kwargs=None)), For(target=Name(id='pub_item', ctx=Store()), iter=Name(id='pub_items', ctx=Load()), body=[Assign(targets=[Name(id='item', ctx=Store())], value=Call(func=Name(id='CandidatePublicationsItem', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='cb_id')), ctx=Store())], value=Name(id='cb_id', ctx=Load())), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='publications')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='pub_item', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='./text()[normalize-space(.)]')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)), If(test=UnaryOp(op=Not(), operand=Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='publications')), ctx=Load())), body=[Continue()], orelse=[]), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='create_time')), ctx=Store())], value=Name(id='now_time', ctx=Load())), Expr(value=Call(func=Attribute(value=Name(id='items', ctx=Load()), attr='append', ctx=Load()), args=[Name(id='item', ctx=Load())], keywords=[], starargs=None, kwargs=None))], orelse=[]), Return(value=Name(id='items', ctx=Load())), Pass()], decorator_list=[]), FunctionDef(name='parse_candidate_courses_item', args=arguments(args=[Name(id='self', ctx=Param()), Name(id='response', ctx=Param()), Name(id='cb_id', ctx=Param())], vararg=None, kwarg=None, defaults=[]), 
body=[Assign(targets=[Name(id='now_time', ctx=Store())], value=Call(func=Name(id='mysql_datetime', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Name(id='items', ctx=Store())], value=List(elts=[], ctx=Load())), Assign(targets=[Name(id='course_items', ctx=Store())], value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//*[@id="field-courses-taught"]/descendant::li')], keywords=[], starargs=None, kwargs=None)), For(target=Name(id='course_item', ctx=Store()), iter=Name(id='course_items', ctx=Load()), body=[Assign(targets=[Name(id='item', ctx=Store())], value=Call(func=Name(id='CandidateCoursesItem', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='cb_id')), ctx=Store())], value=Name(id='cb_id', ctx=Load())), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='courses_no')), ctx=Store())], value=Str(s='0')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='courses_desc')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='course_item', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='./text()[normalize-space(.)]')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)), If(test=UnaryOp(op=Not(), operand=Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='courses_desc')), ctx=Load())), body=[Continue()], orelse=[]), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='create_time')), ctx=Store())], value=Name(id='now_time', ctx=Load())), Expr(value=Call(func=Attribute(value=Name(id='items', ctx=Load()), attr='append', 
ctx=Load()), args=[Name(id='item', ctx=Load())], keywords=[], starargs=None, kwargs=None))], orelse=[]), Return(value=Name(id='items', ctx=Load())), Pass(), Pass()], decorator_list=[]), FunctionDef(name='parse_candidate_workexperience_item', args=arguments(args=[Name(id='self', ctx=Param()), Name(id='response', ctx=Param()), Name(id='cb_id', ctx=Param())], vararg=None, kwarg=None, defaults=[]), body=[Assign(targets=[Name(id='now_time', ctx=Store())], value=Call(func=Name(id='mysql_datetime', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Name(id='items', ctx=Store())], value=List(elts=[], ctx=Load())), Assign(targets=[Name(id='workexperience_items', ctx=Store())], value=Call(func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='//*[@id="field-professional-experience"]/descendant::p')], keywords=[], starargs=None, kwargs=None)), For(target=Name(id='workexperience_item', ctx=Store()), iter=Name(id='workexperience_items', ctx=Load()), body=[Assign(targets=[Name(id='item', ctx=Store())], value=Call(func=Name(id='CandidateWorkexperienceItem', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='cb_id')), ctx=Store())], value=Name(id='cb_id', ctx=Load())), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='job_title')), ctx=Store())], value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='company')), ctx=Store())], value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='start_time')), ctx=Store())], value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='end_time')), ctx=Store())], value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='duration')), ctx=Store())], 
value=Str(s='')), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='desc')), ctx=Store())], value=Call(func=Attribute(value=Name(id='DataFilter', ctx=Load()), attr='simple_format', ctx=Load()), args=[Call(func=Attribute(value=Call(func=Attribute(value=Name(id='workexperience_item', ctx=Load()), attr='xpath', ctx=Load()), args=[Str(s='./text()[normalize-space(.)]')], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)], keywords=[], starargs=None, kwargs=None)), If(test=UnaryOp(op=Not(), operand=Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='desc')), ctx=Load())), body=[Continue()], orelse=[]), Assign(targets=[Subscript(value=Name(id='item', ctx=Load()), slice=Index(value=Str(s='create_time')), ctx=Store())], value=Name(id='now_time', ctx=Load())), Expr(value=Call(func=Attribute(value=Name(id='items', ctx=Load()), attr='append', ctx=Load()), args=[Name(id='item', ctx=Load())], keywords=[], starargs=None, kwargs=None))], orelse=[]), Return(value=Name(id='items', ctx=Load())), Pass()], decorator_list=[]), FunctionDef(name='close', args=arguments(args=[Name(id='self', ctx=Param()), Name(id='reason', ctx=Param())], vararg=None, kwarg=None, defaults=[]), body=[Expr(value=Call(func=Attribute(value=Attribute(value=Name(id='self', ctx=Load()), attr='db', ctx=Load()), attr='close', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=None)), Expr(value=Call(func=Attribute(value=Call(func=Name(id='super', ctx=Load()), args=[Name(id='StanfordSpider', ctx=Load()), Name(id='self', ctx=Load())], keywords=[], starargs=None, kwargs=None), attr='close', ctx=Load()), args=[Name(id='self', ctx=Load()), Name(id='reason', ctx=Load())], keywords=[], starargs=None, kwargs=None))], decorator_list=[]), FunctionDef(name='__init__', args=arguments(args=[Name(id='self', ctx=Param())], vararg=None, kwarg='kwargs', defaults=[]), 
body=[Assign(targets=[Attribute(value=Name(id='self', ctx=Load()), attr='db', ctx=Store())], value=Name(id='mysql_connection', ctx=Load())), Expr(value=Call(func=Attribute(value=Name(id='MYSQLUtils', ctx=Load()), attr='cleanup_data', ctx=Load()), args=[Name(id='self', ctx=Load())], keywords=[], starargs=None, kwargs=None)), Expr(value=Call(func=Attribute(value=Call(func=Name(id='super', ctx=Load()), args=[Name(id='StanfordSpider', ctx=Load()), Name(id='self', ctx=Load())], keywords=[], starargs=None, kwargs=None), attr='__init__', ctx=Load()), args=[], keywords=[], starargs=None, kwargs=Name(id='kwargs', ctx=Load()))), Pass()], decorator_list=[])], decorator_list=[])])

# Split the code into several parts

# 1. Import section
import_block = [
    ImportFrom(module='eol_spider.items', names=[
        alias(name='CandidateBasicItem', asname=None),
        alias(name='CandidateCoursesItem', asname=None),
        alias(name='CandidateEducationItem', asname=None),
        alias(name='CandidatePublicationsItem', asname=None),
        alias(name='CandidateResearchItem', asname=None),
        alias(name='CandidateWorkexperienceItem', asname=None)
    ], level=0),
    ImportFrom(module='eol_spider.datafilter', names=[alias(name='DataFilter', asname=None)], level=0),
    ImportFrom(module='eol_spider.mysql_utils', names=[alias(name='MYSQLUtils', asname=None)], level=0),
    ImportFrom(module='eol_spider.settings', names=[
        alias(name='mysql_connection', asname=None),
        alias(name='surname_list', asname=None)
    ], level=0),
    ImportFrom(module='eol_spider.func', names=[
        alias(name='mysql_datetime', asname=None),
        alias(name='get_chinese_by_fullname', asname=None)
    ], level=0),
    ImportFrom(module='scrapy.spiders', names=[alias(name='CrawlSpider', asname=None)], level=0),
    ImportFrom(module='scrapy', names=[alias(name='Request', asname=None)], level=0)
]

# 2. Class declaration
class_define = ClassDef(name='StanfordSpider', bases=[Name(id='CrawlSpider', ctx=Load())], decorator_list=[])

# 3. Class attributes
class_attr = [
    Assign(targets=[Name(id='name', ctx=Store())], value=Str(s='StanfordSpider')),
    Assign(targets=[Name(id='college_name', ctx=Store())], value=Str(s='Stanford')),
    Assign(targets=[Name(id='college_id', ctx=Store())], value=Str(s='1')),
    Assign(targets=[Name(id='country_id', ctx=Store())], value=Str(s='1')),
    Assign(targets=[Name(id='state_id', ctx=Store())], value=Str(s='1')),
    Assign(targets=[Name(id='city_id', ctx=Store())], value=Str(s='1')),
    Assign(targets=[Name(id='allowed_domains', ctx=Store())],
           value=List(elts=[Str(s='stanford.edu')], ctx=Load())),
    Assign(targets=[Name(id='domain', ctx=Store())], value=Str(s='https://ed.stanford.edu')),
    Assign(targets=[Name(id='start_urls', ctx=Store())], value=List(elts=[
        Str(s='https://ed.stanford.edu/faculty/profiles')
    ], ctx=Load()))
]

# 4. The class's __init__ and close methods
init_func = FunctionDef(
    name='__init__',
    args=arguments(args=[Name(id='self', ctx=Param())], vararg=None, kwarg='kwargs', defaults=[]),
    body=[
        Assign(targets=[
            Attribute(value=Name(id='self', ctx=Load()), attr='db', ctx=Store())
        ], value=Name(id='mysql_connection', ctx=Load())),
        Expr(value=Call(func=Attribute(value=Name(id='MYSQLUtils', ctx=Load()), attr='cleanup_data', ctx=Load()),
                        args=[Name(id='self', ctx=Load())], keywords=[], starargs=None, kwargs=None)),
        Expr(value=Call(func=Attribute(value=Call(
            func=Name(id='super', ctx=Load()),
            args=[
                Name(id='StanfordSpider', ctx=Load()),
                Name(id='self', ctx=Load())
            ], keywords=[], starargs=None, kwargs=None), attr='__init__', ctx=Load()),
            args=[], keywords=[], starargs=None, kwargs=Name(id='kwargs', ctx=Load()))),
        Pass()
    ],
    decorator_list=[])

close_func = FunctionDef(
    name='close',
    args=arguments(args=[
        Name(id='self', ctx=Param()),
        Name(id='reason', ctx=Param())
    ], vararg=None, kwarg=None, defaults=[]),
    body=[
        Expr(value=Call(func=Attribute(value=Attribute(value=Name(
            id='self', ctx=Load()), attr='db', ctx=Load()), attr='close', ctx=Load()),
            args=[], keywords=[], starargs=None, kwargs=None)),
        Expr(value=Call(func=Attribute(value=Call(
            func=Name(id='super', ctx=Load()),
            args=[
                Name(id='StanfordSpider', ctx=Load()),
                Name(id='self', ctx=Load())
            ], keywords=[], starargs=None, kwargs=None), attr='close', ctx=Load()),
            args=[
                Name(id='self', ctx=Load()),
                Name(id='reason', ctx=Load())
            ], keywords=[], starargs=None, kwargs=None))
    ],
    decorator_list=[])

# 5. The class's parse method
parse_func = FunctionDef(
    name='parse',
    args=arguments(args=[
        Name(id='self', ctx=Param()),
        Name(id='response', ctx=Param())
    ], vararg=None, kwarg=None, defaults=[]),
    body=[
        For(target=Name(id='url', ctx=Store()),
            iter=Call(func=Attribute(value=Call(
                func=Attribute(value=Name(id='response', ctx=Load()), attr='xpath', ctx=Load()),
                args=[
                    Str(s='//div[contains(@class, "views-row")]/descendant::div[contains(@class, "name")]/descendant::a/@href')
                ], keywords=[], starargs=None, kwargs=None), attr='extract', ctx=Load()),
                args=[], keywords=[], starargs=None, kwargs=None),
            body=[
                If(test=Compare(left=Subscript(
                    value=Name(id='url', ctx=Load()),
                    slice=Slice(lower=None, upper=Num(n=1), step=None),
                    ctx=Load()), ops=[Eq()], comparators=[Str(s='/')]),
                   body=[
                       Assign(
                           targets=[Name(id='url', ctx=Store())],
                           value=BinOp(left=Attribute(
                               value=Name(id='self', ctx=Load()), attr='domain', ctx=Load()),
                               op=Add(), right=Name(id='url', ctx=Load())))
                   ], orelse=[]),
                Expr(value=Yield(value=Call(
                    func=Name(id='Request', ctx=Load()),
                    args=[Name(id='url', ctx=Load())],
                    keywords=[
                        keyword(arg='callback', value=Attribute(value=Name(
                            id='self', ctx=Load()), attr='parse_item', ctx=Load()))
                    ], starargs=None, kwargs=None)))
            ], orelse=[])
    ],
    decorator_list=[])

# 9. Assemble the parts and generate the source code
class_attr.append(init_func)
class_attr.append(close_func)
class_attr.append(parse_func)
class_define.body = class_attr
import_block.append(class_define)
module = Module(body=import_block)
src = codegen.to_source(module)
print src

# im_1.names =
# print im.fields()
#
# node = ast.UnaryOp()
# node.op = ast.USub()
# node.operand = ast.Num()
# node.operand.n = 5
# node.operand.lineno = 0
# node.operand.col_offset = 0
# node.lineno = 0
# node.col_offset = 0
pass