Ext-RESCAL

Scalable Tensor Factorization

Ext-RESCAL is a memory efficient implementation of RESCAL, a state-of-the-art algorithm for DEDICOM-like tensor factorization. Ext-RESCAL is written in Python and leverages on the SciPy Sparse module.

Current Version

0.6

Features

3-D sparse tensor factorization [1]
Joint 3-D sparse tensor and 2-D sparse matrix factorization (extended version) [2-3]
Handy input format
Support of float values
The implementation provably scales well to the domains with millions of nodes on the affordable hardware

[1] M. Nickel, V. Tresp, H. Kriegel. A Three-way Model for Collective Learning on Multi-relational Data // Proceedings of the 28th International Conference on Machine Learning (ICML'2011). - 2011.

[2] M. Nickel, V. Tresp, H. Kriegel. Factorizing YAGO: Scalable Machine Learning for Linked Data // Proceedings of the 21st international conference on World Wide Web (WWW'2012). - 2012.

[3] Nickel, Maximilian. Tensor factorization for relational learning. Diss. München, Ludwig-Maximilians-Universität, Diss., 2013, 2013.

Expected Applications

Link Prediction
Collaborative Filtering
Entity Search

Prerequisites

Python 2.7+
Numpy 1.6+
SciPy 0.12+

Usage Examples

Let's imagine we have the following semantic graph:

Each tensor slice represents an adjacency matrix of the corresponding predicate (member-of, genre, cites). We run the RESCAL algorithm to decompose a 3-D tensor with 2 latent components and zero regularization on the test data:

python rescal.py --latent 2 --lmbda 0 --input tiny-example --outputentities entity.embeddings.csv --outputfactors latent.factors.csv --log rescal.log

The test data set represents a tiny entity graph of 3 adjacency matrices (frontal tensor slices) in the row-column representation. See the directory tiny-example. Ext-RESCAL will output the latent factors for the entities into the file entity.embeddings.csv.

Then, we assume that there is an entity-term matrix:

Then, we run the extended version of RESCAL algorithm to decompose a 3-D tensor and 2-D matrix with 2 latent components and regularizer equal to 0.001 on the test data (entity graph and entity-term matrix):

python extrescal.py --latent 2 --lmbda 0.001 --input tiny-mixed-example --outputentities entity.embeddings.csv --outputterms term.embeddings.csv --outputfactors latent.factors.csv --log extrescal.log

If we plot the resulting embeddings, we would get the following picture, which reveals the similarity of entities and words in the same latent space:

In case of float values in the entity-term matrix (e.g. TF-IDF weighted vectors), one may use extrescal-float.py script for calculation, providing the file ext-matrix-elements in the input directory, which contains the values.

Development and Contribution

Ext-RESCAL has been developed by Nikita Zhiltsov. This project is a fork of the original code base provided by Maximilian Nickel. Ext-RESCAL may contain some bugs, so, if you find any of them, feel free to contribute the patches via pull requests into the develop branch.

Release Notes

0.6 (March 21, 2014):

Make the extended algorigthm output fixed (by replacing random initialization)
Add handling of float values in the extended task
Add the util for matrix pseudoinversion
Switch to Apache License 2.0

0.5 (March 29, 2013):

Greatly improve the convergence speed via initialization of starting matrices with eigenvectors

0.4 (March 14, 2013):

Add efficient computation of the exact objective value via trick with trace

0.3 (March 12, 2013):

Fix random sampling for the basic task
Add output of latent factors

0.2 (February 26, 2013):

Add an opportunity to approximate the objective function via random sampling
Bug fixes
Change the default settings

0.1 (January 31, 2013):

The basic implementation of both the algorithms

Credit

The original algorithms are an intellectual property of the authors of the cited papers.

Disclaimer

The author is not responsible for implications from the use of this software.

License

Licensed under the Apache License, Version 2.0: http://www.apache.org/licenses/LICENSE-2.0

Citation

If you have used this software in your research work, please cite the following paper:

Zhiltsov, N., Agichtein, E. Improving Entity Search over Linked Data by Modeling Latent Semantics. Proceedings of the International Conference on Information and Knowledge Management (CIKM 2013). ACM, 2013.

Name		Name	Last commit message	Last commit date
Latest commit History 86 Commits
tiny-example		tiny-example
tiny-mixed-example		tiny-mixed-example
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
commonFunctions.py		commonFunctions.py
commonFunctionsTest.py		commonFunctionsTest.py
extrescal-float.py		extrescal-float.py
extrescal.py		extrescal.py
extrescalFunctions.py		extrescalFunctions.py
extrescalFunctionsTest.py		extrescalFunctionsTest.py
pseudoinverse.py		pseudoinverse.py
rescal.py		rescal.py

License

brianholland/Ext-RESCAL

Folders and files

Latest commit

History

Repository files navigation

Ext-RESCAL

Scalable Tensor Factorization

Current Version

Features

Expected Applications

Prerequisites

Usage Examples

Development and Contribution

Release Notes

Credit

Disclaimer

License

Citation

About

Resources

License

Stars

Watchers

Forks