Python utilities, based on theano, pynnet and scikits.learn, for learning vector encoders that map vector data to either:
- dense codes in a low-dimensional space that try to preserve the local structure of the data, useful for semantic mapping and visualization
- sparse codes in a medium- to high-dimensional space, useful for semantic indexing, denoising and compression
This is experimental code. Nothing is expected to work as advertised yet :)
Implemented:
- deterministic (optimal) sparse encoding using an existing dictionary and Least Angle Regression (see codemaker.sparse)
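To make the idea of LARS-based sparse encoding concrete, here is a minimal NumPy sketch of what encoding a vector against a fixed dictionary computes. It is not the codemaker.sparse API: the function name lars_encode, the convention that atoms are unit-norm, linearly independent columns, and the n_nonzero stopping rule are assumptions made for illustration. It follows the plain (non-Lasso) LARS update: add the atom most correlated with the residual, then move along the direction that is equiangular to all active atoms until another atom becomes equally correlated.

```python
import numpy as np


def lars_encode(dictionary, x, n_nonzero, tol=1e-12):
    """Sparse code of `x` over the columns of `dictionary` using plain LARS.

    Hypothetical helper, not the codemaker.sparse API. Assumes `dictionary`
    has shape (n_features, n_atoms) with unit-norm, linearly independent columns.
    """
    n_features, n_atoms = dictionary.shape
    code = np.zeros(n_atoms)
    fit = np.zeros(n_features)  # current approximation, always equal to dictionary @ code
    active = []
    for _ in range(min(n_nonzero, n_atoms)):
        corr = dictionary.T @ (x - fit)  # correlation of each atom with the residual
        big_c = np.max(np.abs(corr))
        if big_c < tol:  # residual is already (numerically) orthogonal to every atom
            break
        # The inactive atom most correlated with the residual joins the active set.
        inactive = [k for k in range(n_atoms) if k not in active]
        active.append(inactive[int(np.argmax(np.abs(corr[inactive])))])
        signs = np.sign(corr[active])
        atoms = dictionary[:, active] * signs  # sign-adjusted active atoms
        g_inv_ones = np.linalg.solve(atoms.T @ atoms, np.ones(len(active)))
        big_a = 1.0 / np.sqrt(g_inv_ones.sum())
        weights = big_a * g_inv_ones
        direction = atoms @ weights  # unit vector making equal angles with the active atoms
        a = dictionary.T @ direction
        # Default step: go all the way to the least-squares fit on the active set.
        gamma = big_c / big_a
        # Stop earlier if an inactive atom becomes as correlated as the active ones.
        for k in range(n_atoms):
            if k in active:
                continue
            for num, den in ((big_c - corr[k], big_a - a[k]),
                             (big_c + corr[k], big_a + a[k])):
                if den > tol and tol < num / den < gamma:
                    gamma = num / den
        fit = fit + gamma * direction
        code[active] += gamma * signs * weights
    return code
```

With an orthonormal dictionary and an exactly 2-sparse input, two steps recover the input's coefficients; stopping after one step returns a single-atom code whose coefficient is shrunk toward zero, which is the usual behaviour of early-stopped LARS.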
Work in progress:
- stochastic neighbor embedding in a low-dimensional space using autoencoders
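The quantity a stochastic neighbor embedding (SNE) tries to minimize can be sketched in a few lines of NumPy. How codemaker wires this into an autoencoder is not described here; in this sketch an encoder would simply map high_dim to low_dim and be trained to lower sne_cost. The function names and the single fixed bandwidth sigma (the original SNE instead tunes a per-point bandwidth to a target perplexity) are assumptions.

```python
import numpy as np


def neighbor_probs(points, sigma):
    """Row-stochastic matrix of SNE conditional probabilities p_{j|i}."""
    sq_dists = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    logits = -sq_dists / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)  # a point never picks itself as a neighbor
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def sne_cost(high_dim, low_dim, sigma=1.0):
    """Sum over points i of KL(P_i || Q_i) between input and embedding neighborhoods.

    Assumes one fixed bandwidth `sigma` in the input space and the SNE
    convention of a Gaussian with variance 1/2 in the embedding space.
    Needs at least two points.
    """
    p = neighbor_probs(high_dim, sigma)
    q = neighbor_probs(low_dim, np.sqrt(0.5))
    tiny = np.finfo(float).tiny  # guards log(0) if a probability underflows
    off_diag = ~np.eye(len(p), dtype=bool)
    p, q = p[off_diag], q[off_diag]
    return float(np.sum(p * np.log(np.maximum(p, tiny) / np.maximum(q, tiny))))
```

Because the cost depends only on neighbor probabilities, nearby points that end up far apart in the embedding are penalized heavily, while distances between far-apart points barely matter; this is what "preserving the local structure" means here.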
Planned:
- stochastic dictionary learning and approximate sparse coding using sparsity-inducing autoencoders (see Ranzato 2007)
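Since this item is only planned, the following is just a minimal sketch of the kind of objective involved, not the planned implementation. An encoder predicts a code in one feed-forward pass (the "approximate" sparse coding), a dictionary acts as a linear decoder, and an L1 penalty pushes most code entries to zero; minimizing this loss with stochastic gradient steps on both the encoder and the dictionary would amount to stochastic dictionary learning. The ReLU encoder, the L1 penalty and all names are assumptions; Ranzato 2007 may use different nonlinearities and penalties.

```python
import numpy as np


def sparse_autoencoder_loss(x, enc_weights, enc_bias, dictionary, sparsity=0.1):
    """Reconstruction error plus an L1 penalty on the predicted code.

    Generic L1-penalized autoencoder, assumed for illustration; it is not the
    exact model of Ranzato 2007 nor a codemaker API.
    x: (n_features,), enc_weights: (n_atoms, n_features), enc_bias: (n_atoms,),
    dictionary: (n_features, n_atoms), used as the linear decoder.
    """
    code = np.maximum(0.0, enc_weights @ x + enc_bias)  # fast approximate sparse code
    reconstruction = dictionary @ code
    error = np.sum((x - reconstruction) ** 2)
    return float(error + sparsity * np.sum(np.abs(code))), code
```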
License: MIT (http://www.opensource.org/licenses/mit-license.php)
Download the source distributions of the aforementioned dependencies, untar them in the parent folder of codemaker, build scikits.learn in place with python setup.py build_ext -i
and set up the dev environment with:
$ . ./activate.sh
You should now be able to fire up your favorite Python shell and import the codemaker package:
>>> import codemaker
>>> help(codemaker)
Run the tests with the nosetests command.
Sample usage can be found in the examples folder. Lower level usage patterns can also be found in the tests folder.
Plot showing the results of the swissroll example: a failed attempt at using the codemaker embedding utility to extract a 2D manifold from a toy dataset.
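For context on what "extracting a 2D manifold" means here, a minimal sketch of a swiss roll toy dataset: points on a 2D sheet rolled up in 3D, where a successful embedding would recover the unrolled (t, height) coordinates up to a smooth deformation. The generator below follows the common construction (as in scikit-learn's make_swiss_roll) and its name and parameters are assumptions; the dataset used by codemaker's swissroll example may differ.

```python
import numpy as np


def make_swiss_roll(n_samples=1000, seed=0):
    """Sample a 2D sheet rolled up in 3D, plus its intrinsic 2D coordinates."""
    rng = np.random.default_rng(seed)
    t = 1.5 * np.pi * (1.0 + 2.0 * rng.uniform(size=n_samples))  # position along the roll
    height = 21.0 * rng.uniform(size=n_samples)  # position across the roll
    points = np.column_stack([t * np.cos(t), height, t * np.sin(t)])
    intrinsic = np.column_stack([t, height])  # the 2D coordinates an embedding should recover
    return points, intrinsic
```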