Active Bayesian Assessment for Black-Box Classifiers

This repo contains an implementation of the active Bayesian assessment models described in "Active Bayesian Assessment for Black-Box Classifiers" by Disi Ji, Robert L. Logan IV, Padhraic Smyth, and Mark Steyvers. [arXiv]

Setup

You will need Python 3.5+. Dependencies can be installed by running:

pip install -r requirements.txt

Data

We assess performance characteristics of neural models on several standard image and text classification datasets. The image datasets we use are: CIFAR-100 (Krizhevsky and Hinton, 2009), SVHN (Netzer et al., 2011) and ImageNet (Russakovsky et al., 2015). The text datasets we use are: 20 Newsgroups (Lang, 1995) and DBpedia (Zhang et al., 2015).

For image classification we use ResNet (He et al., 2016) architectures with either 110 layers (CIFAR-100) or 152 layers (SVHN and ImageNet). For ImageNet we use the pretrained model provided by PyTorch, and for CIFAR-100 and SVHN we use the pretrained model checkpoints provided at https://github.com/bearpaw/pytorch-classification. For the text classification tasks we use fine-tuned BERT-Base (Devlin et al., 2019) models. All of these models were trained on the standard training sets from the literature, independent of the datasets used for assessment.

The assessment datasets are based on standard test sets used for each dataset in the literature. We build our assessment models on predictions that the classifiers made on the assessment datasets. Predictions can be downloaded from here.
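
The exact layout of the downloaded prediction files is not documented here, so the snippet below is only a hedged sketch: it assumes a simple CSV with one row per test example and hypothetical column names (true_label, predicted_label), and shows how such predictions reduce to the per-class counts that a Bayesian assessment of accuracy operates on.

import pandas as pd

# Hypothetical schema -- the real prediction files may differ.
#   true_label      : ground-truth class index
#   predicted_label : class index output by the black-box classifier
df = pd.read_csv('predictions.csv')
df['correct'] = (df['true_label'] == df['predicted_label']).astype(int)

# Per-class counts of correct and total predictions, grouped by the predicted
# class: the sufficient statistics for assessing per-class accuracy.
per_class = df.groupby('predicted_label')['correct'].agg(['sum', 'count'])
print(per_class)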

Run the Active Bayesian Assessor

After downloading the data, update DATA_DIR, RESULTS_DIR, and FIGURE_DIR in src/data_utils.py to point to the directory the data was downloaded to and to the output directories where results and figures will be written.
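
For reference, a minimal sketch of what those settings might look like; the paths are placeholders and the rest of src/data_utils.py is not shown:

# src/data_utils.py (illustrative values only)
DATA_DIR = '/path/to/downloaded/predictions/'
RESULTS_DIR = '/path/to/results/'
FIGURE_DIR = '/path/to/figures/'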

To reproduce all of the experimental results and figures reported in the paper, run the commands in script.

For example, to identify the extreme classes, navigate to the src directory and run:

python active_learning_topk.py [dataset] \
    -metric [metric] \
    -mode [mode] \
    -topk [topk] \
    -pseudocount=[pseudocount]

where

  • dataset: name of the dataset. The path to the corresponding input data is hard-coded in src/data_utils.py.
  • metric: 'accuracy' or 'calibration_error', the metric used to rank classes.
  • mode: 'min' or 'max', to identify the most/least accurate/calibrated classes.
  • topk: the number of extreme classes to identify.
  • pseudocount: the strength of the prior (see the sketch after this list).
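
The repository implements the full active assessment procedure; as a rough guide to what the pseudocount controls, here is a minimal sketch (not the repo's code) of a Beta-Bernoulli estimate of per-class accuracy. Splitting the pseudocount evenly between the two prior parameters is an assumption made for illustration.

import numpy as np

# Minimal sketch, not the repo's implementation: a Beta-Bernoulli posterior
# over each class's accuracy. `pseudocount` mirrors the -pseudocount flag and
# sets the strength of the prior.
def rank_classes_by_accuracy(correct, total, pseudocount=2.0, topk=3, mode='min'):
    correct = np.asarray(correct, dtype=float)
    total = np.asarray(total, dtype=float)
    alpha = pseudocount / 2.0 + correct            # prior + observed successes
    beta = pseudocount / 2.0 + (total - correct)   # prior + observed failures
    posterior_mean = alpha / (alpha + beta)        # E[accuracy | data]
    order = np.argsort(posterior_mean)
    return order[:topk] if mode == 'min' else order[::-1][:topk]

# Toy example: correct/total counts for five classes; returns the two classes
# with the lowest posterior mean accuracy.
print(rank_classes_by_accuracy(correct=[9, 4, 8, 2, 7], total=[10] * 5, topk=2, mode='min'))

The paper couples posteriors of this kind with active, bandit-style sampling (e.g., Thompson sampling) to decide which class to query next, rather than labeling examples uniformly at random.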
