linkdict = {value + 1: links[value].strip() for value in range(num_of_links)}
reverse_linkdict = {links[value].strip(): value + 1 for value in range(num_of_links)}
# print(reverse_linkdict)

os.chdir(folder_path)  # change the working directory to the folder path

# Create an instance of Crawler and pass the user number and the links file into it
crawler = Crawler(user, "res.txt", reverse_linkdict, linkdict)
# Call crawl_and_createfile to fetch all target links and create a file for each source link
crawler.crawl_and_createfile()

fileprocess = FileProcessor(folder_path, user, num_of_links)
fileprocess.file_filling()
fileprocess.index_value()
# fileprocess.index2pair()
fileprocess.rename()
fileprocess.create_pair_files('pair_dir')  # only needed if the files must be shuffled and reduced
fileprocess.max_len = fileprocess.find_largest()
fileprocess.write_bin_files()

if remapping:
    file_transfer = FileTransfer(user, folder_path, path)
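A minimal, self-contained sketch of the two dictionaries built above, using hypothetical placeholder URLs (not project data) to show that linkdict maps 1-based indices to stripped links while reverse_linkdict inverts that mapping:

# Standalone sketch of the index <-> link mappings; the sample links
# below are made-up placeholders, not data from this project.
links = ["https://example.com/a\n", "https://example.com/b\n"]
num_of_links = len(links)

linkdict = {value + 1: links[value].strip() for value in range(num_of_links)}
reverse_linkdict = {links[value].strip(): value + 1 for value in range(num_of_links)}

assert linkdict[1] == "https://example.com/a"          # 1-based index -> stripped link
assert reverse_linkdict["https://example.com/b"] == 2  # stripped link -> 1-based index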
linkdict = {value + 1: links[value].strip() for value in range(num_of_links)}
uci_linkdict = linkdict
reverse_linkdict = {links[value].strip(): value + 1 for value in range(num_of_links)}

if recrawl:
    # print(len(reverse_linkdict.keys()))
    os.chdir(folder_path)  # change the working directory to the folder path
    print("Start crawling")
    # Create an instance of Crawler and pass the user number and the links file into it
    crawler = Crawler(user, "res.txt", reverse_linkdict, linkdict)
    # Call crawl_and_createfile to fetch all target links and create a file for each source link
    crawler.crawl_and_createfile(False, False)

if reprocess:
    if not reinit:
        with open(dir + "/res.txt", "r") as f:
            num_of_links = len(f.readlines())
    fileprocess = FileProcessor(folder_path, user, num_of_links, path)
    fileprocess.file_filling()
    fileprocess.index_value()
    fileprocess.rename()

if remapping:
    if mode == 1:
        file_transfer = FileTransfer(user, folder_path, path, num_of_links)
        file_coded_transfer = FileCodedTransfer(user, folder_path, path,