
pytrec_eval

A (tiny) library to evaluate TREC-like runs using TREC-like qrels. It implements Kendall's tau similarity between rankings, Student's t-test between runs, and more. It can also compute metrics for evaluating classification and clustering algorithms. The metrics implemented so far are the following (partitioned by task):

  • Document Retrieval: Average Precision (AP), Normalized Discounted Cumulative Gain (NDCG), Precision, Recall, Precision@k.
  • Classification: precision, recall, precision (multi-topic), recall (multi-topic), accuracy (multi-topic), exact match ratio, retrieval F-score.
  • Clustering: purity, normalized mutual information (NMI), Rand index, F-score.

Loading Data

  • Use the class TrecRun to load a TREC run from a file:

run = pytrec_eval.TrecRun(<fileName>)
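A TREC-style run file typically has one line per retrieved document, giving the topic id, the literal Q0, a document id, the rank, the score, and the run name, for example:

401 Q0 FBIS3-10082 1 12.34 myRun
401 Q0 FBIS3-10169 2 11.87 myRun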

  • You can also use loadAll to load all the TREC runs contained in a list of file names:

runs = pytrec_eval.loadAll(['run1', 'run2', 'run3'])
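If the run files live in a single directory, a sketch like the following should work (the runs/ path and *.txt pattern are only placeholders):

import glob
# collect all run files in the directory and load them in one call
runs = pytrec_eval.loadAll(sorted(glob.glob('runs/*.txt')))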

  • Use the class QRels to load the qrels from a file:

qrels = pytrec_eval.QRels(<fileName>)
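A TREC-style qrels file typically contains, on each line, the topic id, an iteration field (usually 0), a document id, and a relevance judgment, for example:

401 0 FBIS3-10082 1
401 0 FBIS3-10169 0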

Evaluation

  • Use evaluate (resp., evaluateAll) to evaluate a run (resp., a list of runs):

pytrec_eval.evaluate(run, qrels, metrics)

where metrics is a list of functions, each computing a metric, for example [pytrec_eval.avgPrec, pytrec_eval.ndcg].
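Putting the loading and evaluation steps together, a minimal sketch could look like this (file names are placeholders, and the exact shape of the returned scores depends on the library):

# load one run and the qrels, then evaluate with AP and NDCG
run = pytrec_eval.TrecRun('run1')
qrels = pytrec_eval.QRels('qrels.txt')
scores = pytrec_eval.evaluate(run, qrels, [pytrec_eval.avgPrec, pytrec_eval.ndcg])
print(scores)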

  • It is possible to compute the ranking of a list of runs by using pytrec_eval.rankRuns as follows:

ranking = pytrec_eval.rankRuns(<list of TrecRuns>, qrels, measure)

where measure is a function that computes a metric. A ranking is a list of (TrecRun, score) pairs ordered by decreasing score.
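For instance, assuming run0, run1 and run2 have already been loaded, the ranking can be inspected as follows (how a TrecRun prints depends on the class):

# rank the runs by NDCG and print them from best to worst
ranking = pytrec_eval.rankRuns([run0, run1, run2], qrels, pytrec_eval.ndcg)
for run, score in ranking:
    print(run, score)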

pytrec_eval can also create plots:

  • The function plotDifferenceFromAvg plots a histogram highlighting the per-topic performance of a run. It is possible to save the plot to a file by using the optional parameter outputFile.

pytrec_eval.plotDifferenceFromAvg(trecRun, qrels, pytrec_eval.ndcg, outputFile='./ndcg.pdf', showPlot=True)

  • The function plotRecallPrecision plots the recall/precision curve of a given run. It is also possible to plot one recall/precision curve per topic by setting perQuery=True.

pytrec_eval.plotRecallPrecision(trecRun, qrels, perQuery=True, outputFile='./recall-precision.pdf', showPlot=False)

  • The function plotRecallPrecisionAll plots the recall/precision curves of all runs contained in the input list of runs.

pytrec_eval.plotRecallPrecisionAll([run0, run1, run2], qrels, outputFile='./recall-precision-all.pdf', showPlot=False)

Statistical Tools

  • pytrec_eval features a function that computes the Student's t-test between a run and a list of other runs, for example,

pValues = pytrec_eval.ttest(run0, [run1, run2], qrels, pytrec_eval.ndcg)

returns a dictionary that maps the name of run1 to the p-value obtained by comparing the NDCG scores of run1 and run0, and the name of run2 to the p-value obtained by comparing the NDCG scores of run2 and run0.
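As a quick sanity check, the returned p-values can be compared against the usual 0.05 significance threshold (a sketch, not part of the library):

pValues = pytrec_eval.ttest(run0, [run1, run2], qrels, pytrec_eval.ndcg)
for name, p in pValues.items():
    # flag runs whose NDCG differs significantly from run0
    print(name, p, 'significant' if p < 0.05 else 'not significant')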

  • Given two rankings it is possible to compute the Kendall's tau correlation between them as follows:

tau = pytrec_eval.rankSimilarity(ranking0, ranking1)

where ranking0 and ranking1 are lists of (TrecRun, score) pairs ordered by decreasing score.
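For example, to see how strongly two metrics agree on the ordering of the same runs (a sketch reusing rankRuns from above):

# rank the same runs by NDCG and by AP, then compare the two orderings
rankingNdcg = pytrec_eval.rankRuns([run0, run1, run2], qrels, pytrec_eval.ndcg)
rankingAp = pytrec_eval.rankRuns([run0, run1, run2], qrels, pytrec_eval.avgPrec)
tau = pytrec_eval.rankSimilarity(rankingNdcg, rankingAp)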

For more functions/details, please check the documentation strings in the Python source code.

Requirements
