Example #1
import time

from flask import request, jsonify

# Helpers referenced below (process_url, tokenize, parse,
# create_name_register, Scraper, Settings) are project-specific and
# assumed to be importable from the surrounding application.


def analyze():
    """ Analyze text from a given URL """

    url = request.form.get("url", "").strip()
    # Optional form flags: "noreduce" turns off the reducer,
    # "dump" requests a dump of the parse forest
    use_reducer = "noreduce" not in request.form
    dump_forest = "dump" in request.form
    metadata = None
    # Single sentence (True) or contiguous text from URL (False)?
    single = False
    keep_trees = False

    t0 = time.time()

    if url.startswith(("http:", "https:")):
        # Scrape the URL, tokenize the text content and return the token list
        metadata, generator = process_url(url)
        toklist = list(generator)
        # If this is an already scraped URL, keep the parse trees and update
        # the database with the new parse
        keep_trees = Scraper.is_known_url(url)
    else:
        # Tokenize the text entered as-is and return the token list
        # In this case, there's no metadata
        toklist = list(tokenize(url))
        single = True

    tok_time = time.time() - t0

    t0 = time.time()

    # result = profile(parse, toklist, single, use_reducer, dump_forest)
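    # Parse the token list; 'trees' contains the resulting parse trees,
    # which are stored below only when keep_trees is set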
    result, trees = parse(toklist, single, use_reducer, dump_forest, keep_trees)

    # Add a name register to the result
    create_name_register(result)

    parse_time = time.time() - t0

    if keep_trees:
        # Save a new parse result
        if Settings.DEBUG:
            print("Storing a new parse tree for url {0}".format(url))
        Scraper.store_parse(url, result, trees)

    result["metadata"] = metadata
    result["tok_time"] = tok_time
    result["parse_time"] = parse_time

    # Return the tokens as a JSON structure to the client
    return jsonify(result=result)
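
Assuming the view is registered on a route such as /analyze (the decorator is not shown above) and the server runs on the default Flask port, a client call might look like the following sketch; the route and host are placeholders, not part of the example.

import requests

# Hypothetical client call: the /analyze route and localhost:5000 host are
# assumptions, since the route registration is not shown in the example.
resp = requests.post(
    "http://localhost:5000/analyze",
    data={
        "url": "https://example.com/article",  # URL to scrape, or plain text
        "dump": "1",  # optional: request a parse forest dump
    },
)
payload = resp.json()["result"]
print(payload["tok_time"], payload["parse_time"])

The response unwraps as resp.json()["result"], matching the jsonify(result=result) call at the end of the view; the tok_time and parse_time keys are set by the view before returning.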