Example #1
import sys


def main():
    # The command-line arguments pick which site's config to use
    try:
        website = sys.argv[1]
        url = sys.argv[2]
    except IndexError:
        print("please choose a website and a URL")
        sys.exit(1)

    # Map the website argument to its config class and instantiate it
    dic = {
        "qidian": Qidian,
        "heiyan": Heiyan,
    }
    if website not in dic:
        print("unknown website:", website)
        sys.exit(1)
    config = dic[website]()

    # Spider extracts the key information from each chapter page
    handler = Spider(config.title, config.content, config.next)

    chapters = config.getList(url)

    # Write every chapter into a single text file
    with open("text.txt", "w") as book:
        for item in chapters:
            print("Downloading ->", item["title"])
            content = handler.getContent(item["href"])

            book.write(item["title"] + "\n")
            book.write(content["content"] + "\n")


if __name__ == "__main__":
    main()
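Both examples lean on a Spider class (and, in Example #1, per-site config classes such as Qidian) that the snippets do not show. The sketch below is a guess at those interfaces, inferred purely from how they are called above: the constructor takes three attribute filters, getContent() returns a dict with "title", "content" and "nextUrl" (or "error" when there is no next chapter), and getList() returns the table of contents as dicts with "title" and "href". The requests/BeautifulSoup calls and the CSS selector in getList() are assumptions, not the original implementation.

import requests
from bs4 import BeautifulSoup


class Spider(object):
    def __init__(self, titleKlass, contentKlass, nextKlass):
        # Attribute filters such as {"class": "j_chapterName"}
        self.titleKlass = titleKlass
        self.contentKlass = contentKlass
        self.nextKlass = nextKlass

    def getContent(self, url):
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")

        title = soup.find(attrs=self.titleKlass)
        content = soup.find(attrs=self.contentKlass)
        nextLink = soup.find(attrs=self.nextKlass)

        if title is None or content is None:
            return {"error": "selectors matched nothing on " + url}

        result = {
            "title": title.get_text(strip=True),
            "content": content.get_text("\n", strip=True),
        }
        if nextLink is not None and nextLink.has_attr("href"):
            result["nextUrl"] = nextLink["href"]
        else:
            # No next-chapter link: the missing "nextUrl" key is what
            # ends the crawl loop in Example #2
            result["error"] = "no next chapter link on " + url
        return result


class Qidian(object):
    # Filters reused from Example #2; the getList() selector is a guess
    title = {"class": "j_chapterName"}
    content = {"class": "j_readContent"}
    next = {"id": "j_chapterNext"}

    def getList(self, url):
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        return [
            {"title": a.get_text(strip=True), "href": a["href"]}
            for a in soup.select(".volume li a")
        ]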
Example #2
	def startCB(self):
		# Requires at module level: import tkinter.messagebox as tkMessageBox
		page = self.entryUrl.get()

		if page == "" or self.filePath == "":
			tkMessageBox.showerror("woolson", "Novel name or link is missing!")
			return

		# Scraping rules: attribute filters for the title, body and next link
		titleKlass = {"class": "j_chapterName"}
		contentKlass = {"class": "j_readContent"}
		nextKlass = {"id": "j_chapterNext"}

		spider = Spider(titleKlass, contentKlass, nextKlass)

		# File that receives the downloaded chapters
		with open(self.filePath, "w") as file:
			# Follow the next-chapter link until it runs out
			while page != "":
				result = spider.getContent(page)

				try:
					page = result["nextUrl"]
					file.write(result["title"] + "\n")
					file.write(result["content"] + "\n\n")

					print("Writing ->", result["title"])
				except KeyError:
					page = ""
					print("Finished:", result.get("error", ""))