def write_content_for_tweet_id(corpus_path="C:\\Users\\ASUS\\Desktop\\Data",
                               output_path="text.csv", ids=None):
    """Export ``(tweet_id, text)`` rows for selected tweets to a CSV file.

    Scans every file in the corpus directory and writes one CSV row per
    document whose id appears in *ids*.

    :param corpus_path: directory holding the corpus files.  The default
        preserves the path that was previously hard-coded.
    :param output_path: destination CSV file (default ``"text.csv"``, as
        before).
    :param ids: collection of tweet ids to export.  Defaults to the
        module-level ``tweet_ids`` global, matching the original behavior.
    """
    if ids is None:
        ids = tweet_ids  # original code read this module-level global directly
    config = ConfigClass(corpus_path)
    r = ReadFile(corpus_path=config.get__corpusPath())
    names = r.get_files_names_in_dir()
    with open(output_path, "w", newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for name in names:
            documents_list = r.read_file_by_name(file_name=str(name))
            for doc in documents_list:
                # doc[0] is the tweet id; doc[2] is presumably the tweet
                # text (inferred from usage — confirm against ReadFile).
                if doc[0] in ids:
                    writer.writerow([doc[0], doc[2]])
def run_engine(corpus_path, output_path, stemming, queries,
               num_docs_to_retrieve, word2vec, buffer_limit=500000):
    """Parse and index every document in the corpus, flushing in batches.

    Iterates over all files in *corpus_path*, parses each document, and
    feeds it to the indexer.  Every *buffer_limit* indexed documents the
    in-memory buffer is flushed via ``write_and_clean_buffer``.  After the
    final flush, terms whose inverted-index value equals 1 (singletons)
    are dropped and the index is persisted with ``utils.save_obj``.

    :param corpus_path: directory holding the corpus files.
    :param output_path: destination passed to ``write_and_clean_buffer``.
    :param stemming: stemming flag forwarded to the parser and the flush
        helper.
    :param queries: unused in this function; kept for interface
        compatibility with callers.
    :param num_docs_to_retrieve: unused in this function; kept for
        interface compatibility with callers.
    :param word2vec: forwarded to the ``Indexer`` constructor.
    :param buffer_limit: number of documents to index before each flush.
        Default 500000 — the previously hard-coded threshold.
    :return: the total number of buffer flushes (``num_of_writes``).
    """
    number_of_documents = 0
    num_of_writes = 1
    config = ConfigClass(corpus_path)
    r = ReadFile(corpus_path=config.get__corpusPath())
    p = Parse(stemming)
    indexer = Indexer(config, word2vec)

    counter = 0
    names = r.get_files_names_in_dir()
    for name in names:
        documents_list = r.read_file_by_name(file_name=str(name))
        for document in documents_list:
            parsed_document = p.parse_doc(document)
            if parsed_document == {}:
                # Parser returns an empty dict for documents it rejects
                # (marked "RT" in the original comment — retweets).
                continue
            number_of_documents += 1
            indexer.add_new_doc(parsed_document, num_of_writes)
            counter += 1
            if counter >= buffer_limit:
                write_and_clean_buffer(indexer, num_of_writes, stemming,
                                       config, output_path)
                counter = 0
                num_of_writes += 1

    # Final flush for whatever remains in the buffer.
    write_and_clean_buffer(indexer, num_of_writes, stemming, config,
                           output_path)

    # Drop singleton entries (value == 1) before saving the inverted index.
    indexer.inverted_idx = {
        key: val for key, val in indexer.inverted_idx.items() if val != 1
    }
    utils.save_obj(indexer.inverted_idx, "inverted_idx")
    return num_of_writes