ReadFile.read_file_by_name examples in Python

Programming language: Python

Namespace / package name: reader

Class / Type: ReadFile

Method / Function: read_file_by_name

Examples at hotexamples.com: 2

ReadFile.read_file_by_name in Python: 2 examples found. These are the top-rated real-world Python examples of reader.ReadFile.read_file_by_name, extracted from open-source projects.
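The page does not show how read_file_by_name is implemented. Below is a minimal sketch of a reader exposing the two methods both examples call, get_files_names_in_dir and read_file_by_name. Everything here is hypothetical: the class body, the `suffix` and `loader` parameters, and the stand-in loader. The project's corpus appears to consist of .snappy.parquet files (going by a commented-out call in example 2). To keep the sketch dependency-free, it reads CSV files with the standard library instead, and that format is a swapped-in assumption.

```python
import csv
import os
from typing import Callable, List


def _read_csv_rows(path: str) -> List[list]:
    """Stand-in loader (assumption): return every row of a CSV file as a list."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row for row in csv.reader(f)]


class ReadFile:
    """Hypothetical minimal reader modelled on the calls used in the examples."""

    def __init__(self, corpus_path: str, suffix: str = ".csv",
                 loader: Callable[[str], List[list]] = _read_csv_rows):
        self.corpus_path = corpus_path
        self.suffix = suffix  # only files ending with this suffix are listed
        self.loader = loader  # turns one file path into a list of rows

    def get_files_names_in_dir(self) -> List[str]:
        """Walk corpus_path recursively and return matching file paths, sorted."""
        names = []
        for root, _dirs, files in os.walk(self.corpus_path):
            for file_name in files:
                if file_name.endswith(self.suffix):
                    names.append(os.path.join(root, file_name))
        return sorted(names)

    def read_file_by_name(self, file_name: str) -> List[list]:
        """Load one file (a path from get_files_names_in_dir) into a list of rows."""
        return self.loader(file_name)
```

With pandas and a Parquet engine such as pyarrow installed, the same class could read Parquet files by passing `suffix=".parquet"` and `loader=lambda p: pd.read_parquet(p).values.tolist()`.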

Frequently Used Methods

ReadFile(30)

read_file(21)

read_corpus(3)

get_files_names_in_dir(2)

read_file_by_name(2)

read_folder(2)

corpus_path(1)

create_files_name_list(1)

create_global_method(1)

get_all_path_of_parquet(1)

get_documents(1)

load_queries(1)

read_all_parquet(1)

read_dir(1)

read_fn(1)

set_corpus_path(1)

Example 1

File: part2.py  Project: adiashk/Search_Engine

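import csv

from configuration import ConfigClass  # project module; module name assumed
from reader import ReadFile


def write_content_for_tweet_id():
    # tweet_ids is not defined in this snippet; it is presumably a
    # module-level collection of the wanted tweet ids in part2.py.
    corpus_path = "C:\\Users\\ASUS\\Desktop\\Data"
    config = ConfigClass(corpus_path)
    r = ReadFile(corpus_path=config.get__corpusPath())
    names = r.get_files_names_in_dir()
    with open("text.csv", "w", newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for name in names:
            # Load all documents (rows) stored in one corpus file
            documents_list = r.read_file_by_name(file_name=str(name))
            for doc in documents_list:
                # doc[0] holds the tweet id; keep it with column 2 (presumably the text)
                if doc[0] in tweet_ids:
                    writer.writerow([doc[0], doc[2]])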
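Example 1 scans every corpus file and keeps only the rows whose id appears in `tweet_ids`. That filter-and-export step can be isolated into a small, testable function. The name `write_selected_rows` and its parameters are hypothetical, and the default columns 0 (id) and 2 (text) simply mirror the indices used in the example. Passing the ids as a set makes each membership check O(1) on average, whereas a list would be scanned linearly for every row.

```python
import csv
import io
from typing import Iterable, List, Set, TextIO


def write_selected_rows(documents: Iterable[List[str]], wanted_ids: Set[str],
                        out: TextIO, id_col: int = 0, text_col: int = 2) -> int:
    """Write [id, text] for every document whose id is in wanted_ids.

    Returns the number of rows written.
    """
    writer = csv.writer(out)
    written = 0
    for doc in documents:
        if doc[id_col] in wanted_ids:  # set lookup, O(1) on average
            writer.writerow([doc[id_col], doc[text_col]])
            written += 1
    return written


if __name__ == "__main__":
    # Usage: two documents in an assumed [id, date, text] layout, one wanted id
    buf = io.StringIO()
    docs = [["1", "2020-07-30", "hello"], ["2", "2020-07-30", "skip me"]]
    print(write_selected_rows(docs, {"1"}, buf), repr(buf.getvalue()))
```

Writing to any file-like object (here `io.StringIO`) rather than a hard-coded path keeps the function independent of the corpus location.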

Example 2

File: search_engine.py  Project: adiashk/Search_Engine

import utils  # project modules; module names assumed from the repository layout
from configuration import ConfigClass
from indexer import Indexer
from parser_module import Parse
from reader import ReadFile

# write_and_clean_buffer is presumably defined elsewhere in search_engine.py (not shown).


def run_engine(corpus_path, output_path, stemming, queries, num_docs_to_retrieve, word2vec):
    """
    Parse and index every file in the corpus.

    :return: the number of index buffers written to disk (num_of_writes)
    """
    # print("start: ", time.asctime(time.localtime(time.time())))
    number_of_documents = 0
    num_of_writes = 1
    config = ConfigClass(corpus_path)
    r = ReadFile(corpus_path=config.get__corpusPath())
    p = Parse(stemming)
    indexer = Indexer(config, word2vec)
    # documents_list = r.read_file(file_name='covid19_07-30.snappy.parquet')
    # TODO - handle all files ~50 (can do with from multiprocessing.pool import ThreadPool)

    # Iterate over every document in every file
    counter = 0
    names = r.get_files_names_in_dir()
    for name in names:
        documents_list = r.read_file_by_name(file_name=str(name))
        for idx, document in enumerate(documents_list):
            parsed_document = p.parse_doc(document)  # parse the document
            if parsed_document == {}:  # RT (retweet): skipped
                continue
            number_of_documents += 1
            indexer.add_new_doc(parsed_document, num_of_writes)  # index the document data
            counter += 1
            if counter >= 500000:
                # Flush the in-memory buffer to disk every 500,000 documents
                write_and_clean_buffer(indexer, num_of_writes, stemming, config, output_path)
                counter = 0
                # print("finish parser & index number: ", num_of_writes, " At: ", time.asctime(time.localtime(time.time())))
                num_of_writes += 1
    # print('Finished parsing and indexing. Starting to export files')
    write_and_clean_buffer(indexer, num_of_writes, stemming, config, output_path)
    # print("finish parser & index: ", time.asctime(time.localtime(time.time())))
    # Drop every term whose value in the inverted index is 1
    indexer.inverted_idx = {key: val for key, val in indexer.inverted_idx.items() if val != 1}
    utils.save_obj(indexer.inverted_idx, "inverted_idx")
    # print("finish save index: ", time.asctime(time.localtime(time.time())))
    return num_of_writes
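run_engine parses each document, adds it to an in-memory index, flushes the buffer every 500,000 documents, and finally drops terms whose value in inverted_idx is 1. The sketch below reproduces that batching pattern in isolation. It assumes, as the code suggests but does not show, that inverted_idx maps each term to the number of documents containing it. `index_in_batches`, its parameters, and the in-memory `flushed` list standing in for write_and_clean_buffer are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def index_in_batches(docs: Iterable[List[str]], batch_size: int
                     ) -> Tuple[List[Dict[str, List[int]]], Dict[str, int]]:
    """Index tokenized docs, flushing the posting buffer every batch_size docs.

    Returns (flushed_batches, doc_freq), where doc_freq omits terms that
    occur in only one document.
    """
    doc_freq: Dict[str, int] = defaultdict(int)      # term -> number of docs
    buffer: Dict[str, List[int]] = defaultdict(list)  # term -> doc ids in this batch
    flushed: List[Dict[str, List[int]]] = []          # stand-in for files on disk
    counter = 0
    for doc_id, tokens in enumerate(docs):
        if not tokens:  # empty parse result: skip, like the RT check
            continue
        for term in set(tokens):  # count each term once per document
            doc_freq[term] += 1
            buffer[term].append(doc_id)
        counter += 1
        if counter >= batch_size:
            flushed.append(dict(buffer))  # "write" the batch, then reset it
            buffer.clear()
            counter = 0
    if buffer:  # final flush of the partially filled batch
        flushed.append(dict(buffer))
    pruned = {term: df for term, df in doc_freq.items() if df != 1}
    return flushed, pruned
```

Unlike run_engine, which always performs a final write, this sketch skips the last flush when the buffer is empty. With `batch_size=2` and the input `[["a", "b"], [], ["a", "c"], ["a"]]`, the empty document is skipped, two batches are produced, and only `"a"` survives the pruning step.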