def test_ParaphraseMiningEvaluator(self):
    """Tests that the ParaphraseMiningEvaluator can be loaded"""
    # Two near-duplicate pairs: (0, 1) differ only by punctuation,
    # (2, 3) are word-order paraphrases of each other.
    corpus = {
        0: "Hello World",
        1: "Hello World!",
        2: "The cat is on the table",
        3: "On the table the cat is",
    }
    gold_pairs = [(0, 1), (2, 3)]

    encoder = SentenceTransformer('paraphrase-distilroberta-base-v1')
    evaluator = evaluation.ParaphraseMiningEvaluator(corpus, gold_pairs)

    # The evaluator mines high-similarity pairs and scores them against
    # the gold duplicates; these trivial paraphrases should score ~1.0.
    result = evaluator(encoder)
    assert result > 0.99
# NOTE(review): this excerpt begins INSIDE a csv-reader loop whose header
# (and the enclosing `with open(...)`) lies before this chunk — the
# indentation below reconstructs that scope; confirm against the full file.
        dev_sentences[row['qid']] = row['question']
        # Cap the dev sentence pool — `max_dev_samples` is defined earlier
        # in the file (outside this excerpt).
        if len(dev_sentences) >= max_dev_samples:
            break

# Collect gold duplicate pairs, keeping only pairs whose both questions
# made it into the (capped) dev sentence pool above.
with open(os.path.join(dataset_path, "duplicate-mining/dev_duplicates.tsv"), encoding='utf8') as fIn:
    reader = csv.DictReader(fIn, delimiter='\t', quoting=csv.QUOTE_NONE)
    for row in reader:
        if row['qid1'] in dev_sentences and row['qid2'] in dev_sentences:
            dev_duplicates.append([row['qid1'], row['qid2']])

# The ParaphraseMiningEvaluator computes the cosine similarity between all sentences and
# extracts a list with the pairs that have the highest similarity. Given the duplicate
# information in dev_duplicates, it then computes and F1 score how well our duplicate mining worked
paraphrase_mining_evaluator = evaluation.ParaphraseMiningEvaluator(dev_sentences, dev_duplicates, name='dev')
evaluators.append(paraphrase_mining_evaluator)

###### Duplicate Questions Information Retrieval ######
# Given a question and a large corpus of thousands questions, find the most relevant (i.e. duplicate) question
# in that corpus.

# For faster processing, we limit the development corpus to only 10,000 sentences.
# NOTE(review): the comment above says 10,000 but the constant is 100,000 —
# flagging the mismatch; left as-is since this is a formatting-only pass.
max_corpus_size = 100000

ir_queries = {}  #Our queries (qid => question)
ir_needed_qids = set()  #QIDs we need in the corpus
ir_corpus = {}  #Our corpus (qid => question)
ir_relevant_docs = {}  #Mapping of relevant documents for a given query (qid => set([relevant_question_ids])