Example #1
def go_index(page):
    """
    go_index(...) recursive function that will scrape web pages
    for every url it finds it will go in and call itself to continue
    caution this is a big memory hog and IT WILL fail eventually
    """
    try:
        scraper = WebScraper()
        if scraper.scrape(page):
            domain = indexer.get_domain(page)
            indexer.index_domain(domain)
            indexer.index_file(domain, domain, True)
            urls = scraper.get_link_urls()
            if indexer.has_crawable(urls):
                for url in urls:
                    title = url.encode(encoding="utf-8", errors="replace")
                    hash_path = "{0}.link".format(indexer.do_hash(title))
                    path = os.path.join(domain, hash_path)
                    if not os.path.exists(path) and url != scraper.page:
                        indexer.index_file(url, domain, True)
                        go_index(url)
        else:
            print("Can not scrape requested page {0}".format(page))
    except RuntimeError:
        # Figure out a way to respawn in another thread
        print("Runtime Error occurred. Killing Script")
        sys.exit()
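
The warning in the docstring can be sidestepped by driving the crawl from an explicit worklist instead of the call stack. Below is a minimal, hypothetical sketch of such an iterative variant; it reuses the WebScraper and indexer interfaces exactly as they appear in the example above and is not part of the original project.

import os
from collections import deque


def go_index_iterative(start_page):
    """Breadth-first variant of go_index() that avoids unbounded recursion."""
    queue = deque([start_page])
    seen = {start_page}
    while queue:
        page = queue.popleft()
        scraper = WebScraper()  # assumed interface, as in the example above
        if not scraper.scrape(page):
            print("Cannot scrape requested page {0}".format(page))
            continue
        domain = indexer.get_domain(page)
        indexer.index_domain(domain)
        indexer.index_file(domain, domain, True)
        for url in scraper.get_link_urls():
            hashed = indexer.do_hash(url.encode(encoding="utf-8", errors="replace"))
            path = os.path.join(domain, "{0}.link".format(hashed))
            if url not in seen and not os.path.exists(path):
                seen.add(url)
                indexer.index_file(url, domain, True)
                queue.append(url)

Each URL still goes through the same index_file() call as before; the deque simply replaces recursion, so memory use grows with the size of the crawl frontier rather than the crawl depth.
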
Example #2
File: perftest.py  Project: PaulRudin/xappy
def do_index(config, testrun):
    dbpath = testrun.dbpath(config)
    indexlogpath = testrun.indexlogpath(config)

    # Re-index unless config.preserve is set and both the database and the
    # index log from a previous run already exist.
    if not config.preserve or \
       not os.path.exists(dbpath) or \
       not os.path.exists(indexlogpath):

        if os.path.exists(dbpath):
            shutil.rmtree(dbpath)
        if os.path.exists(indexlogpath):
            os.unlink(indexlogpath)

        print("Starting index run (creating %s)" % dbpath)
        indexer.index_file(inputfile=testrun.inputfile,
                           dbpath=dbpath,
                           logpath=indexlogpath,
                           flushspeed=testrun.flushspeed,
                           description=testrun.description,
                           maxdocs=testrun.maxdocs,
                           logspeed=testrun.logspeed)
        print "Ending index run"
Example #3

crawler_backlog = {}
crawler_data = []
seed = "http://www.newhaven.edu/"
crawler_backlog[seed]=0

print("Creating Web Pickle....")
visit_url(seed, "www.newhaven.edu")  # create raw_web.pickle with the web contents (a list of tuples)
with open("raw_web.pickle", "wb") as out:
    pickle.dump(crawler_data, out)
print("Creating Data Pickle....")
data_load.traverse("fortune1")  # creates raw_data.pickle with the file contents (a list of tuples)
print("Indexing Web Pickle....")
indexer.index_file("raw_web.pickle","out_data")
print("Indexing Data Pickle....")
indexer.index_file("raw_data.pickle","out_data")


getWeather("West Haven","CT")
searcher.searchFile("out_data")
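
The pickles produced above are ordinary pickle files; here is a minimal sketch (not part of the original script) that loads raw_web.pickle back and inspects a few of the crawled entries before indexing. It assumes only what the script shows: crawler_data is a list of tuples.

import pickle

# Reload the pickle written by the script above and peek at the first entries.
with open("raw_web.pickle", "rb") as fh:
    pages = pickle.load(fh)

print("pickle holds {0} crawled entries".format(len(pages)))
for entry in pages[:3]:
    print(entry)  # each entry is one of the tuples collected by the crawler
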
Example #4
import data_load
import searcher 
import indexer 


# data_load.traverse("fortune1")  # uncomment to (re)create raw_data.pickle first (see Example #3)
wordDictionary = indexer.index_file("raw_data.pickle", "the_shelve")
searcher.searchFile("the_shelve")