Structured Joint Embedding (SJE) for 0-shot learning

A Theano-based implementation of SJE for 0-shot learning [1]. This is an unofficial implementation of [1] and may not be reliable. The source code is intended for educational purposes only.

[1] Z. Akata et al., "Zero-Shot Learning with Structured Embeddings" (link: http://arxiv.org/pdf/1409.8403v1.pdf).

0-shot learning

In the 0-shot learning scenario, the training and test classes are disjoint. To recognize the unseen classes, a model has to transfer additional information (for instance, attributes) from known to unknown classes. In this example, an image is first mapped to its representation (input embedding) by a global feature extractor (e.g. a CNN). Next, a class name is encoded into its class representation (output embedding) using word2vec. Finally, we train a compatibility function such that

F(input_embedding(image), output_embedding(class))

is large if 'image' belongs to 'class'.

At test time, a given image 'im' is recognized by assigning the class 'class*' that maximizes the compatibility, that is

class* = \argmax_{cl \in test_classes} F(input_embedding(im), output_embedding(cl))
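The prediction rule above can be sketched in a few lines of NumPy. This is only an illustration: `predict_zero_shot` and `dot_compatibility` are hypothetical names, and the plain dot product is a toy stand-in for F that requires both embeddings to share a dimension.

```python
import numpy as np

def predict_zero_shot(image_embedding, class_embeddings, compatibility):
    """Return the index of the test class with the highest compatibility, plus all scores."""
    scores = np.array([compatibility(image_embedding, c) for c in class_embeddings])
    return int(np.argmax(scores)), scores

def dot_compatibility(x, y):
    """Toy stand-in for F (an assumption for this sketch, not the learnt function)."""
    return float(np.dot(x, y))

# Toy data: one image embedding and three test-class embeddings.
image_embedding = np.array([0.9, 0.1, 0.0])
test_class_embeddings = np.array([
    [0.0, 1.0, 0.0],   # class 0
    [1.0, 0.0, 0.0],   # class 1
    [0.0, 0.0, 1.0],   # class 2
])

best_class, scores = predict_zero_shot(
    image_embedding, test_class_embeddings, dot_compatibility)
print(best_class)  # class 1 has the largest score in this toy example
```

Only the embeddings of the test classes enter the argmax, which is what lets the model label images from classes it never saw during training.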

This approach follows a simple intuition: similar classes have similar output embeddings (for instance, similar attributes, similar wiki descriptions, or similar word2vec representations). What remains is to learn a compatibility between image representations (input embeddings) and the corresponding class representations (output embeddings).

Structured Joint Embedding (SJE)

The objective function is a binary ranking loss that separates positive compatibilities from negative ones, following the structured SVM formulation. A compatibility is a function that measures how well an input embedding and an output embedding match, so higher values mean a better match.
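As a rough illustration of such a ranking objective, the sketch below computes a structured hinge loss for a single training image from its compatibility scores. The 0/1 margin term and the name `structured_ranking_loss` are assumptions made for this example; the exact loss used in this repository and in [1] may differ.

```python
import numpy as np

def structured_ranking_loss(scores, true_class, margin=1.0):
    """Hinge-style ranking loss: max over classes of (margin + F(x, y) - F(x, y_true)).

    The margin is 0 for the true class, so the loss is never negative.
    Returns the loss and the index of the most violating class.
    """
    scores = np.asarray(scores, dtype=float)
    delta = np.full(scores.shape, margin)
    delta[true_class] = 0.0
    violations = delta + scores - scores[true_class]
    worst = int(np.argmax(violations))
    return float(violations[worst]), worst

# The true class (0) scores highest, but class 2 is within the margin: small positive loss.
loss, worst = structured_ranking_loss([2.0, 0.5, 1.8], true_class=0)

# The true class wins by more than the margin: zero loss, no violating class.
zero_loss, no_violator = structured_ranking_loss([3.0, 0.5, 1.0], true_class=0)
```

The loss is zero only when the correct class beats every other class by at least the margin, which is what separates positive from negative compatibilities.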

The compatibility function between x and y is the bilinear form x^T W y, where W is the compatibility matrix. It resembles the matrix form used in the Mahalanobis distance, but W is not constrained to be positive definite, symmetric, or even square. Thus the input and output embeddings can have different dimensions. More details can be found in [1].
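A minimal NumPy sketch of this bilinear compatibility follows, using a rectangular W so that the input and output embeddings have different sizes. The dimensions, the random data, and the single update step are hypothetical and serve only to show the mechanics; they are not taken from structured_joint_embedding.py.

```python
import numpy as np

def compatibility(x, W, y):
    """Bilinear compatibility F(x, y) = x^T W y."""
    return float(np.dot(np.dot(x, W), y))

rng = np.random.RandomState(0)
d_in, d_out, n_classes = 5, 3, 4                 # toy sizes; d_in != d_out on purpose
W = 0.1 * rng.randn(d_in, d_out)                 # rectangular compatibility matrix
x = rng.randn(d_in)                              # input embedding of one image
class_embeddings = rng.randn(n_classes, d_out)   # one output embedding per row

# Scores for all classes at once: entry k equals x^T W y_k.
scores = np.dot(class_embeddings, np.dot(W.T, x))

# One hypothetical gradient step that pushes the true class above a competing class.
# The gradient of F(x, y_true) - F(x, y_rival) with respect to W is outer(x, y_true - y_rival).
true_class, rival_class, learning_rate = 0, 1, 0.1
y_true, y_rival = class_embeddings[true_class], class_embeddings[rival_class]
margin_before = compatibility(x, W, y_true) - compatibility(x, W, y_rival)
W_updated = W + learning_rate * np.outer(x, y_true - y_rival)
margin_after = compatibility(x, W_updated, y_true) - compatibility(x, W_updated, y_rival)
```

Because the update is an outer product of the image embedding and a difference of class embeddings, W keeps its rectangular shape and the gap between the true and the competing class grows after the step.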

Experiments

I tested SJE on the CUB dataset with class word2vec output embeddings and achieved a test accuracy of about 22%. This result corresponds to Table 1, SJE column, CNN row, CUB \phi^w in [1].

Tested on

Python 2.7.3
Theano (commit 167df2c43d1d08000105d448ff04b5bf2a6400c4)
