💡 Light Bulb

Light Bulb is a labeling tool built with state-of-the-art active learning and semi-supervised learning techniques. It currently supports text classification and image classification.

Getting Started

macOS

brew install yarn
git clone https://github.com/czhu12/labelling-tool && cd labelling-tool
make

Usage

Configuration

Here's an example configuration:

task:
  title: What kind of animal is this?
  description: Select the type of animal you see, if there is none, select "Skip"
dataset:
  directory: dataset/image_classification/
  data_type: images
  judgements_file: outputs/image_multiclass_classification/labels.csv
label:
  type: classification
  classes:
    - Dog
    - Cat
    - Giraffe
    - Dolphin
    - Skip
model:
  directory: outputs/image_multiclass_classification/models/
user: chris

task

task:
  title: What kind of animal is this?
  description: Select the type of animal you see, if there is none, select "Skip"

dataset

dataset:
  directory: dataset/image_classification/
  data_type: images
  judgements_file: outputs/image_multiclass_classification/labels.csv

judgements_file defines the file that the labels are saved in.

data_type defines what type of model is used. Valid options are images and text.

label

label:
  type: classification
  classes:
    - Dog
    - Cat
    - Giraffe
    - Dolphin

type defines the type of label. Valid options are classification and binary.

model

model:
  directory: outputs/image_multiclass_classification/models/

directory defines where the trained model is saved.

user

user: chris

user identifies the person doing the labeling, which may be useful when the labels are used later.
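
The documented options above can be checked before launching the tool. The following is a minimal sketch and not part of Light Bulb: validate_config and the constant names are hypothetical, and the rule that classification labels need a non-empty classes list is an assumption. The dictionary mirrors what a YAML parser such as PyYAML's yaml.safe_load would return for the example configuration; it is written inline so the sketch stays self-contained.

```python
# Options documented in this README for dataset.data_type and label.type.
VALID_DATA_TYPES = {"images", "text"}
VALID_LABEL_TYPES = {"classification", "binary"}


def validate_config(config):
    """Return a list of human-readable problems found in a parsed config dict."""
    problems = []

    data_type = config.get("dataset", {}).get("data_type")
    if data_type not in VALID_DATA_TYPES:
        problems.append(
            f"dataset.data_type: expected one of {sorted(VALID_DATA_TYPES)}, got {data_type!r}"
        )

    label = config.get("label", {})
    label_type = label.get("type")
    if label_type not in VALID_LABEL_TYPES:
        problems.append(
            f"label.type: expected one of {sorted(VALID_LABEL_TYPES)}, got {label_type!r}"
        )

    # Assumption: a classification task needs at least one class to choose from.
    if label_type == "classification" and not label.get("classes"):
        problems.append("label.classes: expected at least one class")

    return problems


# The example configuration from this README, as a parsed dictionary.
config = {
    "task": {
        "title": "What kind of animal is this?",
        "description": 'Select the type of animal you see, if there is none, select "Skip"',
    },
    "dataset": {
        "directory": "dataset/image_classification/",
        "data_type": "images",
        "judgements_file": "outputs/image_multiclass_classification/labels.csv",
    },
    "label": {
        "type": "classification",
        "classes": ["Dog", "Cat", "Giraffe", "Dolphin", "Skip"],
    },
    "model": {"directory": "outputs/image_multiclass_classification/models/"},
    "user": "chris",
}

problems = validate_config(config)
print(problems or "configuration looks valid")
```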

Example Text Classification

To run the text classification demo:

make run config/text_multiclass_classification.yml

Example Image Classification

To run the image classification demo:

make run config/image_multiclass_classification.yml

How It Works

Architecture

Most deep learning tasks can be framed as an encoder-decoder architecture. For example, text classification can be framed as an LSTM encoder whose output feeds a logistic regression decoder, and object detection can be framed as a ResNet encoder with a regression decoder. All models in Light Bulb follow this encoder-decoder pattern: the encoders are pretrained on an external dataset (ImageNet for images, WikiText-103 for text) and then fine-tuned on the target dataset.
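
The split between encoder and decoder can be illustrated with a toy example. This is a sketch under stated assumptions, not Light Bulb's code: encode is a hypothetical bag-of-words stand-in for a pretrained neural encoder, and the decoder weights are arbitrary, untrained values chosen only so the code runs.

```python
import math


def encode(text, dim=8):
    """Toy stand-in for a pretrained encoder: map text to a fixed-length vector."""
    features = [0.0] * dim
    for token in text.lower().split():
        # Deterministic bucketing of each token into one of `dim` slots.
        bucket = sum(ord(ch) for ch in token) % dim
        features[bucket] += 1.0
    return features


def decode(features, weights, biases):
    """Softmax (multiclass logistic) regression decoder: features -> class probabilities."""
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["Dog", "Cat"]
# Arbitrary illustrative weights; a real decoder would learn these from labels.
weights = [[0.1 * (i + j) for j in range(8)] for i in range(len(classes))]
biases = [0.0] * len(classes)

probs = decode(encode("a small brown dog"), weights, biases)
print(dict(zip(classes, probs)))
```

Because the decoder only sees the encoder's feature vector, the same decoder code works whether the features come from text or images, which is why swapping in a pretrained encoder is straightforward.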

Semi-Supervised Text

Light Bulb's text encoder is a language model pretrained on WikiText-103 (inspired by ULMFiT), with a vocabulary limited to the 100k most frequent words in the corpus. The encoder is then fine-tuned on the target dataset as a language model.

Semi-Supervised Image

Light Bulb uses SqueezeNet pretrained on the ImageNet dataset to encode image data. The encoder is fine-tuned as an autoencoder on the target dataset to be labeled. Standard image augmentation techniques are used to expand the labeled training set.

Active Learning

Light Bulb trains a model as you provide training data through labeling. To choose what to label next, it scores the unlabeled items with the current model and samples the items with the highest predictive entropy, that is, the items the model is least certain about.
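
Highest-entropy sampling can be sketched as follows. This is a minimal illustration, not Light Bulb's implementation: the function names, file names, and probability values are hypothetical, and the real tool may batch or randomize its selection differently.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_uncertain(predictions, k):
    """Return the ids of the k unlabeled items with the highest predictive entropy.

    predictions maps an item id to the model's class probabilities for that item.
    """
    ranked = sorted(
        predictions,
        key=lambda item: entropy(predictions[item]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical model outputs for three unlabeled images over four classes.
predictions = {
    "img_1.jpg": [0.97, 0.01, 0.01, 0.01],  # model is confident
    "img_2.jpg": [0.25, 0.25, 0.25, 0.25],  # model is maximally uncertain
    "img_3.jpg": [0.60, 0.30, 0.05, 0.05],
}

next_batch = select_most_uncertain(predictions, k=2)
print(next_batch)
```

Here the uniform prediction ranks first and the confident prediction is left out, so labeling effort goes to the items most likely to change the model.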

Coming Soon

Sequence Tagging
Object Detection
Sequence to Sequence Modeling
Dockerize Application
