MiguelPeralvoPM/topmodel

 
 

topmodel

topmodel is a service for evaluating binary classifiers. It comes with built-in metrics and comparisons so that you don't have to build your own from scratch.

You can store your data either locally or in S3.

Metrics

Here are the graphs topmodel will give you for any binary classifier:

Precision/recall curve

[Figure: precision/recall curve]
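
As a rough illustration of what goes into this curve, here is a minimal sketch that computes precision and recall at every distinct score threshold. It is not topmodel's own code: the function name `precision_recall_points` and the rule that an item counts as predicted positive when its score is at or above the threshold are assumptions made for the example.

```python
def precision_recall_points(scores, actuals):
    """Return (threshold, precision, recall) for each distinct score.

    Hypothetical helper: an item is treated as predicted positive
    when its score is >= the threshold.
    """
    total_positives = sum(1 for a in actuals if a)
    # Rank items from highest to lowest score.
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    points = []
    true_positives = 0
    for i, (score, actual) in enumerate(ranked, start=1):
        if actual:
            true_positives += 1
        # Emit a point only after all items tied at this score are counted.
        if i == len(ranked) or ranked[i][0] != score:
            precision = true_positives / i
            recall = true_positives / total_positives if total_positives else 0.0
            points.append((score, precision, recall))
    return points
```

Plotting recall on the x-axis against precision on the y-axis for these points gives the curve.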

ROC (Receiver operating characteristic) curve

[Figure: ROC curve]

We also use bootstrapping to show the uncertainty on ROC curves and precision/recall curves. Here's an example:

[Figure: ROC curve with bootstrapping]
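
To give a feel for how bootstrapping exposes uncertainty, here is a minimal sketch that resamples the data with replacement and reports a percentile interval. For brevity it summarises each resample by its ROC AUC rather than drawing a band around the whole curve, so it is a simplification and not topmodel's implementation. The names `roc_auc` and `bootstrap_auc_interval`, and the defaults `n_resamples`, `alpha` and `seed`, are assumptions made for the example.

```python
import random


def roc_auc(scores, actuals):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win. Returns None if one class is missing.
    """
    positives = [s for s, a in zip(scores, actuals) if a]
    negatives = [s for s, a in zip(scores, actuals) if not a]
    if not positives or not negatives:
        return None
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def bootstrap_auc_interval(scores, actuals, n_resamples=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    for _ in range(n_resamples):
        # Draw n row indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        auc = roc_auc([scores[i] for i in idx], [actuals[i] for i in idx])
        if auc is not None:  # skip resamples that contain only one class
            aucs.append(auc)
    if not aucs:
        return None
    aucs.sort()
    lower = aucs[int((alpha / 2) * (len(aucs) - 1))]
    upper = aucs[int((1 - alpha / 2) * (len(aucs) - 1))]
    return lower, upper
```

A wide interval means the curve would look noticeably different on another sample of the same size.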

Marginal precision

The idea here is that among all items with a score of 0.9, you expect 90% of them to be in the target group (marked 'True'). This graph compares the expected rate to the actual rate: the closer it is to a straight line (actual rate equal to score), the better.

[Figure: marginal precision]
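
The comparison described above can be sketched by grouping items into score bins and measuring the fraction of 'True' items in each. This is only an illustration under assumptions: the function name `marginal_precision`, the use of ten equal-width bins, and scores lying in the range 0 to 1 are not taken from topmodel itself.

```python
def marginal_precision(scores, actuals, n_bins=10):
    """Return (mean_score, actual_rate, count) for each non-empty score bin.

    Hypothetical helper: assumes scores lie in [0, 1].
    """
    score_sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for score, actual in zip(scores, actuals):
        # A score of exactly 1.0 is placed in the last bin.
        b = min(int(score * n_bins), n_bins - 1)
        score_sums[b] += score
        hits[b] += 1 if actual else 0
        counts[b] += 1
    return [
        (score_sums[b] / counts[b], hits[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b]
    ]
```

For a well-calibrated classifier, `mean_score` and `actual_rate` are close in every bin.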

Brier decomposition

This is a set of metrics that measures, among other things, how close the marginal precision is to a straight line. Read more about decomposing the Brier score.

[Figure: Brier decomposition]
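
For reference, here is a minimal sketch of the standard three-term (Murphy) decomposition, in which the Brier score equals reliability minus resolution plus uncertainty. The reliability term is the one that measures distance from the straight line in the marginal precision graph. The function name `brier_decomposition` is hypothetical, and grouping items by exact score value (rather than by bins) is an assumption chosen so that the identity holds exactly; topmodel may compute these terms differently.

```python
from collections import defaultdict


def brier_decomposition(scores, actuals):
    """Return the Brier score and its reliability/resolution/uncertainty terms.

    Hypothetical helper: items are grouped by their exact score value.
    """
    n = len(scores)
    outcomes = [1.0 if a else 0.0 for a in actuals]
    base_rate = sum(outcomes) / n

    groups = defaultdict(list)
    for score, outcome in zip(scores, outcomes):
        groups[score].append(outcome)

    # Reliability: how far each score is from the observed rate in its group.
    reliability = sum(
        len(g) * (score - sum(g) / len(g)) ** 2 for score, g in groups.items()
    ) / n
    # Resolution: how far each group's observed rate is from the base rate.
    resolution = sum(
        len(g) * (sum(g) / len(g) - base_rate) ** 2 for g in groups.values()
    ) / n
    # Uncertainty: depends only on the base rate, not on the classifier.
    uncertainty = base_rate * (1.0 - base_rate)

    brier = sum((s - o) ** 2 for s, o in zip(scores, outcomes)) / n
    return {
        "brier": brier,
        "reliability": reliability,
        "resolution": resolution,
        "uncertainty": uncertainty,
    }
```

Lower reliability and higher resolution are both better; uncertainty is fixed by the data.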

Score distribution

Plots the distribution of scores for all instances, and separately for only the instances labelled 'True'.

[Figure: score frequencies]
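
The two distributions can be sketched as a pair of histograms over the same bins. The function name `score_histograms`, the ten equal-width bins, and scores lying in the range 0 to 1 are assumptions made for this example, not details taken from topmodel.

```python
def score_histograms(scores, actuals, n_bins=10):
    """Return (all_counts, true_counts): per-bin counts of scores.

    Hypothetical helper: assumes scores lie in [0, 1].
    """
    all_counts = [0] * n_bins
    true_counts = [0] * n_bins
    for score, actual in zip(scores, actuals):
        # A score of exactly 1.0 is placed in the last bin.
        b = min(int(score * n_bins), n_bins - 1)
        all_counts[b] += 1
        if actual:
            true_counts[b] += 1
    return all_counts, true_counts
```

A good classifier puts most of the 'True' mass in the high-score bins.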

Using topmodel locally

topmodel comes with example data so you can try it out right away. Here's how:

  1. Create a virtualenv

  2. Install the requirements: pip install -r requirements.txt

  3. Start a topmodel server:

    ./topmodel_server.py
    
  4. topmodel should now be running at http://localhost:9191.

  5. See a page of metrics for some example data at http://localhost:9191/model/data/test/my_model_name/

You can now add new models for evaluation! (see "How to add a model to topmodel" below for more)

Using topmodel with S3

It's better to store your model data in an S3 bucket, so that you don't lose it. To get this working:

Create a config.yaml file:

cp config_example.yaml config.yaml

and fill it in with the S3 bucket you want to use, your AWS access key, and your AWS secret key. topmodel will automatically find models in the bucket as long as they're named correctly (see "How to add a model to topmodel").

Then start topmodel with

./topmodel_server.py --remote

How to add a model to topmodel

  1. Create a TSV with columns 'pred_score' and 'actual'. Save it to your_model_name.tsv. The columns should be separated by tabs. In each row:

    • actual should be 0 or 1 (True/False also work)
    • pred_score should be the score the model assigned to that item.
    • See the examples in example_data/
    • For example:
    actual	pred_score
    False	0.2
    True	0.8
    True	0.7
    False	0.3
    
  2. Copy the TSV to S3 at s3://your-s3-bucket/your_model_name/scores.tsv, or locally to data/your_model_name/scores.tsv

  3. You're done! Your model should appear at http://localhost:9191/ if you reload.
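
If your scores already live in Python, steps 1 and 2 for the local case can be sketched as below. The helper name `write_scores_tsv` and its parameters are hypothetical. The header names follow step 1 ('pred_score' and 'actual'), and the output path follows the local layout in step 2 (data/your_model_name/scores.tsv).

```python
import csv
import os


def write_scores_tsv(rows, model_name, data_dir="data"):
    """Write (actual, pred_score) pairs to <data_dir>/<model_name>/scores.tsv.

    Hypothetical helper, not part of topmodel. Returns the path written.
    """
    model_dir = os.path.join(data_dir, model_name)
    os.makedirs(model_dir, exist_ok=True)
    path = os.path.join(model_dir, "scores.tsv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(["actual", "pred_score"])
        for actual, pred_score in rows:
            # Store the label as 0 or 1, as step 1 describes.
            writer.writerow([int(bool(actual)), pred_score])
    return path
```

For example, `write_scores_tsv([(False, 0.2), (True, 0.8)], "your_model_name")` run from the topmodel checkout writes data/your_model_name/scores.tsv.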

Developing topmodel

We'd love for you to contribute. If you run topmodel with

./topmodel_server.py --development

it will autoreload code.

There's example data to test on in data/test.

Authors

License

Copyright 2014 Stripe, Inc

Licensed under the MIT license.
