Python CorpusSearcher.add_phrase Examples

Programming language: Python

Namespace/Package: preparation.corpus_searcher

Class/Type: CorpusSearcher

Method/Function: add_phrase

Examples at hotexamples.com: 2

Python CorpusSearcher.add_phrase - 2 examples found. These are the top-rated real-world examples of preparation.corpus_searcher.CorpusSearcher.add_phrase, extracted from open source projects.

Frequently used methods

CorpusSearcher (2)

add_phrase (2)

find_similar (1)

get_random (1)
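The page lists these method names but does not show the class itself. As a reading aid, here is a minimal sketch of a container with the same method names. It is a hypothetical stand-in, not the real preparation.corpus_searcher.CorpusSearcher: the class name SimpleCorpusSearcher, the duplicate handling, the top_n parameter, and the word-overlap ranking in find_similar are all assumptions. Only the call shapes add_phrase(phrase) and len(searcher) are taken from the examples on this page.

```python
import random


class SimpleCorpusSearcher:
    """Hypothetical stand-in; NOT the real CorpusSearcher implementation."""

    def __init__(self):
        self.phrases = []   # stored phrases, in insertion order
        self._seen = set()  # assumption: duplicate phrases are ignored

    def add_phrase(self, phrase):
        # Store each distinct phrase once.
        if phrase not in self._seen:
            self._seen.add(phrase)
            self.phrases.append(phrase)

    def __len__(self):
        return len(self.phrases)

    def get_random(self):
        # Assumed behavior: return one stored phrase chosen uniformly at random.
        return random.choice(self.phrases)

    def find_similar(self, phrase, top_n=1):
        # Assumed behavior: rank stored phrases by Jaccard overlap of word sets.
        query = set(phrase.split())

        def score(candidate):
            words = set(candidate.split())
            union = query | words
            return len(query & words) / len(union) if union else 0.0

        return sorted(self.phrases, key=score, reverse=True)[:top_n]


searcher = SimpleCorpusSearcher()
for text in ['the cat sleeps', 'the dog barks', 'the cat sleeps']:
    searcher.add_phrase(text)

best_match = searcher.find_similar('a cat sleeps')
sampled = searcher.get_random()
```

In this sketch the repeated phrase is stored only once, so len(searcher) is 2. The real class may index phrases differently and may keep duplicates.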

Example #1

import codecs
import os

from preparation.corpus_searcher import CorpusSearcher

# Tokenizer, ru_sanitize, normalize_qline, questions_path and data_folder
# are defined elsewhere in the original project and are not shown in this excerpt.
tokenizer = Tokenizer()
tokenizer.load()

random_questions = CorpusSearcher()
random_facts = CorpusSearcher()

# Read the list of random questions from a pre-built file
# (see the C# code at https://github.com/Koziev/chatbot/tree/master/CSharpCode/ExtractFactsFromParsing
# and its output at https://github.com/Koziev/NLP_Datasets/blob/master/Samples/questions4.txt)
print('Loading random questions and facts...')
with codecs.open(questions_path, 'r', 'utf-8') as rdr:
    for line in rdr:
        # Keep only short lines (under 40 characters, newline included).
        if len(line) < 40:
            question = line.strip()
            question = ru_sanitize(u' '.join(tokenizer.tokenize(question.lower())))
            random_questions.add_phrase(normalize_qline(question))

# Read the list of random facts, used later to generate negative patterns
for facts_path in ['paraphrases.txt', 'facts4.txt', 'facts5.txt', 'facts6.txt']:
    with codecs.open(os.path.join(data_folder, facts_path), 'r', 'utf-8') as rdr:
        n = 0
        for line in rdr:
            s = line.strip()
            if s:
                if s[-1] == u'?':
                    ...  # the excerpt ends here

Example #2

File: generate_nonrelevant_premises.py

Project: DnAp/chatbot

import codecs
import os

# random_facts, segmenter, is_good_premise, plain and data_folder are defined
# elsewhere in the original project and are not shown in this excerpt.

# Read the list of random facts, used later to generate negative patterns
corpus_path = os.path.expanduser('~/Corpus/Raw/ru/text_blocks.txt')
n = 0
print(u'Loading samples from {}'.format(corpus_path))
with codecs.open(corpus_path, 'r', 'utf-8') as rdr:
    for line in rdr:
        line = line.strip()
        phrases = segmenter.split(line)
        for phrase in phrases:
            # Keep only sentences that end with a period.
            if phrase[-1] == '.':
                phrase = phrase.strip().replace('--', '-')
                # Drop a lone, unbalanced double quote.
                if phrase.count('"') == 1:
                    phrase = phrase.replace('"', '')
                if is_good_premise(phrase):
                    random_facts.add_phrase(phrase)
                    n += 1
        # Stop reading once enough facts have been collected.
        if n > 5000000:
            break
print('{} random facts in set'.format(len(random_facts)))

# Negative samples have already been selected for these questions
processed_questions = set()
with codecs.open(os.path.join(data_folder, 'nonrelevant_premise_questions.txt'), 'r', 'utf-8') as rdr:
    for line in rdr:
        if '|' in line:
            question = plain(line.strip().split('|')[1])