Python Searcher.preprocess Examples

Programming Language: Python

Namespace/Package Name: searcher

Class/Type: Searcher

Method/Function: preprocess

Examples at hotexamples.com: 1

Python Searcher.preprocess - 1 example found. This is the top-rated real-world Python example of searcher.Searcher.preprocess extracted from open source projects.

Frequently Used Methods

Searcher(30)

build_model(4)

get_simple_index(3)

get_result(3)

getFacets(2)

fromClass(2)

getQuery(2)

find_by_full_text(2)

evaluate_query(2)

get_search_results(2)

getDocs(2)

label_qualify(2)

clear_memory(2)

generate_tree(2)

print_results(2)

mp_request_handler(1)

get_matching_lines(1)

search_one_of(1)

search_meiju_list_by_english_name_keyword(1)

getHighlighting(1)

getHits(1)

search_courses(1)

getSearchResults(1)

get_keywords(1)

save_docs_info(1)

get_links(1)

get_matching_dirs(1)

get_matching_files(1)

get_new_posts(1)

move(1)

get_next_hps(1)

get_query_dict(1)

process_query(1)

get_res(1)

preprocess(1)

precision_recall(1)

get_weights(1)

open(1)

length_Predict(1)

normalized_query(1)

load(1)

BoundedMinHeap(1)

find_torrents(1)

gen(1)

count(1)

_Searcher__clean_input(1)

__init__(1)

_gsearch(1)

add_document(1)

add_subreddit(1)

Example #1

File: crawler.py Project: LYttAGrt/IR_lab4

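    # Imports this method relies on (they belong at the top of crawler.py).
    import re
    from datetime import datetime

    import bs4
    import requests
    from searcher import Searcher

    # Method of the crawler class; `self.data` maps word -> {timestamp: [values]}.
    def get_new_data(self, url, searcher: Searcher):
        content = requests.get(url=url).content
        ts = datetime.now().timestamp() * 1000
        # Pass the raw bytes and name a parser explicitly;
        # str(content) would hand the parser the "b'...'" repr instead of the HTML.
        soup = bs4.BeautifulSoup(content, "html.parser")
        skipped_tags = ['style', 'script', '[document]', 'head', 'title', 'meta']
        words = []
        for unit in soup.find_all(text=True):
            # Skip comments, text starting with whitespace, and non-visible tags.
            if (not isinstance(unit, bs4.element.Comment)
                    and not re.match(r"[\s\r\n]", unit)
                    and unit.parent.name not in skipped_tags):
                tokens = searcher.preprocess(unit, True)
                if len(tokens) > 0:
                    # Assumption: preprocess returns a list of string tokens.
                    # extend keeps each token hashable for use as a dict key below.
                    words.extend(tokens)

        # convert to dict
        for word in words:
            # setdefault avoids a KeyError when a known word is seen at a new timestamp.
            self.data.setdefault(word, {}).setdefault(ts, []).append(ts * 10**12)
        return 0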
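The nested structure that the method above fills in (word, then timestamp, then a list of values) can be tried out without a network request. The following is a minimal sketch, not the project's code: `preprocess` here is a hypothetical stand-in for `Searcher.preprocess`, whose real behavior and second argument are not shown in the example, and `index_tokens` is a helper name introduced only for illustration.

```python
import re


def preprocess(text, lowercase=True):
    """Hypothetical stand-in for Searcher.preprocess: split text into word tokens."""
    if lowercase:
        text = text.lower()
    return re.findall(r"[A-Za-z0-9]+", text)


def index_tokens(data, text_units, ts):
    """Record every token found in text_units under the timestamp ts."""
    for unit in text_units:
        for token in preprocess(unit):
            # Create the word entry and the per-timestamp list on first use.
            data.setdefault(token, {}).setdefault(ts, []).append(ts)
    return data


data = index_tokens({}, ["Hello world", "hello again"], ts=1000)
print(data["hello"])  # {1000: [1000, 1000]}
```

Using `setdefault` at both levels keeps the update to a single line and works whether or not the word or the timestamp has been seen before.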