Bear

A Python implementation of BEAR and SlashBurn. This is an experimental repo for PPR (personalized PageRank), but well, it does run at blazing speed.

Requirements

scipy
numpy
networkx (optional)
tensorflow (optional)

I used tensorflow for GPU acceleration of matrix computations. Why not pytorch or theano? PR it.

Documentation

All PPR implementations assume that the given graph has no sink (irreducible), which means the normalized, transposed Google matrix should be a valid stochastic matrix. Note that any approximation scheme can produce a non-stochastic vector. You should normalize the result if necessary.
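
As a rough illustration of the two points above, the sketch below checks a dense adjacency matrix for sink nodes and rescales a vector so that it sums to one. The helper names has_sink and normalize are hypothetical; they are not part of ppr.py or utils.py.

```python
import numpy as np


def has_sink(adj):
    """Return True if any node has no outgoing edge (hypothetical helper)."""
    out_degree = np.asarray(adj).sum(axis=1)
    return bool((out_degree == 0).any())


def normalize(r):
    """Rescale a non-negative vector so its entries sum to 1 (hypothetical helper)."""
    r = np.asarray(r, dtype=float)
    total = r.sum()
    if total == 0:
        raise ValueError("cannot normalize an all-zero vector")
    return r / total
```

Calling normalize on the output of an approximate query is one way to get back a proper probability vector.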

Every PPR class defined in ppr.py has five methods (the fifth, load, is not fully implemented):

__init__(self, jump_prob=0.05, *args, **kwargs): jump_prob is a common option for all algorithms, denoting the jump probability for PPR; other options are specific to each algorithm.
preprocess(self, filename): Reads the matrix directly from a csv file, then proceeds to the preprocessing step defined by each algorithm. The csv file must list the graph edges, sorted in row-first order (see data/small.csv for a sample). Note that we can compute the H matrix while reading!
query(self, q): Computes PPR for the given query vector q, which should be a 1-d numpy array of dimension n (the number of nodes). Returns the pagerank vector.
save(self, filename): Just pickle dumping.
load(self, filename): Just pickle loading. Not fully implemented (PR it).
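
To make the "H matrix" remark a little more concrete, here is a minimal sketch that reads an edge list and builds H. It assumes the usual BEAR formulation H = I - (1 - c) * A_norm^T, with c = jump_prob and A_norm the row-normalized adjacency matrix, so that r = c * H^{-1} q. The function name build_h_matrix is hypothetical, it uses a dense numpy array for brevity, and it is not the code from ppr.py.

```python
import csv

import numpy as np


def build_h_matrix(lines, n, jump_prob=0.05):
    """Build H = I - (1 - c) * A_norm^T from csv edge lines (hypothetical helper)."""
    adj = np.zeros((n, n))
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        src, dst = int(row[0]), int(row[1])
        adj[src, dst] = 1.0
    out_degree = adj.sum(axis=1)
    if (out_degree == 0).any():
        raise ValueError("graph has a sink node")
    # Row-normalize so that every row of the adjacency matrix sums to 1.
    normalized = adj / out_degree[:, None]
    return np.eye(n) - (1 - jump_prob) * normalized.T
```

Under that assumption, an exact query could be answered with jump_prob * np.linalg.solve(H, q); the preprocessing-based algorithms exist to avoid solving this system from scratch for every query.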

The four implemented PPR algorithms are:

PPRNetworkx: Computes PPR using networkx. Requires the library to be installed. Note that this is the slowest iterative method (since networkx is a general-purpose library).
PPRIterative: Computes PPR using an iterative method. The simplest baseline: no preprocessing, but slow at query time.
PPRLUDecomposition: Computes PPR using LU decomposition after ordering nodes by degree.
PPRBear: Computes PPR using BEAR with SlashBurn. I do know that this PPR prefix is lame.
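
For reference, the iterative baseline can be sketched as plain power iteration on r = (1 - c) * A_norm^T r + c * q. This is a minimal sketch under the assumption that the repo follows this standard formulation; ppr_power_iteration, tol, and max_iter are hypothetical names, and the dense numpy matrix is only for readability.

```python
import numpy as np


def ppr_power_iteration(adj, q, jump_prob=0.05, tol=1e-10, max_iter=1000):
    """Approximate PPR by power iteration (illustrative sketch, not ppr.py)."""
    adj = np.asarray(adj, dtype=float)
    q = np.asarray(q, dtype=float)
    out_degree = adj.sum(axis=1)
    if (out_degree == 0).any():
        raise ValueError("graph has a sink node")
    # Transpose of the row-normalized adjacency: every column sums to 1.
    transition = (adj / out_degree[:, None]).T
    r = q.copy()
    for _ in range(max_iter):
        r_next = (1 - jump_prob) * (transition @ r) + jump_prob * q
        # Stop once the L1 change between iterates is negligible.
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r
```

With a small jump_prob the error shrinks by only a factor of (1 - jump_prob) per step, which is why this approach is slow at query time compared to methods that preprocess the graph.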

Usage

Assume the graph is stored in a csv file named small.csv in row-first order (this is required):

0,1
1,0
1,2
2,1
2,3
...

(This example data can be found in data/small.csv.) You can compute a simple personalized pagerank with the following code.

import numpy as np
from ppr import PPRBear as Bear
bear = Bear()
bear.preprocess('data/small.csv')
r = bear.query(np.ones(15)/15)
print(r.sum())  # 1.0
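
The uniform query above gives ordinary PageRank-style scores. To personalize toward specific nodes, put the query mass on those nodes only. The helper seed_vector below is a hypothetical convenience function, not part of the repo; the size 15 is taken from the example above.

```python
import numpy as np


def seed_vector(n, seeds):
    """Return a length-n query vector spread uniformly over `seeds` (hypothetical helper)."""
    seeds = list(seeds)
    if not seeds:
        raise ValueError("at least one seed node is required")
    q = np.zeros(n)
    q[seeds] = 1.0
    return q / q.sum()


q = seed_vector(15, [0])  # all query mass on node 0
# r = bear.query(q)       # with `bear` preprocessed as in the example above
```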

Simple benchmark code is located in benchmark.py. But well, you should try these algorithms on graphs with millions of nodes.

Note that there are also many useful functions in utils.py! In particular, full implementations of Strongly Connected Components (Kosaraju's algorithm) and SlashBurn (Kang's algorithm) are in there.

from utils import scc
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 1), (3, 4), (4, 3)]
ccs = scc(nodes, edges)
print(ccs)  # [[3, 4], [1, 2]]
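
For readers unfamiliar with Kosaraju's algorithm, here is a minimal standalone sketch of the two-pass idea. It is not the code from utils.py: the function name kosaraju_scc is hypothetical, and the order of the returned components may differ from what scc returns.

```python
def kosaraju_scc(nodes, edges):
    """Strongly connected components via Kosaraju's two-pass DFS (illustrative sketch)."""
    forward = {v: [] for v in nodes}
    backward = {v: [] for v in nodes}
    for src, dst in edges:
        forward[src].append(dst)
        backward[dst].append(src)

    # Pass 1: iterative DFS on the original graph, recording finish order.
    visited = set()
    finish_order = []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, neighbors = stack[-1]
            for nxt in neighbors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                # All neighbors explored: this node is finished.
                finish_order.append(node)
                stack.pop()

    # Pass 2: DFS on the reversed graph, latest-finished nodes first.
    assigned = set()
    components = []
    for start in reversed(finish_order):
        if start in assigned:
            continue
        assigned.add(start)
        component = []
        stack = [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in backward[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components
```

Both passes are written iteratively so that deep graphs do not hit Python's recursion limit.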

import numpy as np
from scipy.sparse import coo_matrix
from utils import verbose_matrix, reorder_matrix, slashburn
A = np.array([[1, 1, 0, 0], [1, 1, 1, 1], [0, 1, 1, 0], [0, 1, 0, 1]], dtype=np.int32)
A = coo_matrix(A)
print(verbose_matrix(A))
"""
1 1
1 1 1 1
  1 1
  1   1
"""
perm, wing = slashburn(A)
print(wing)  # 1
A = reorder_matrix(A, perm)
print(verbose_matrix(A))
"""
1     1
  1   1
    1 1
1 1 1 1
"""

Using with Tensorflow

The TensorFlow versions are located in pprtf.py:

PPRIterativeTF
PPRLUDecompositionTF
PPRBearTF

Due to the initialization of tensorflow variables, the preprocessing steps are merged into __init__. See benchmark.py for more detailed usage. Currently, since tensorflow does not support various sparse matrix manipulations such as an LU decomposition solver, only sparse-dense multiplication is used. Note that my GPU is a GeForce GTX 1080. Results of some experiments are located in profile.txt. You can download some large datasets here.
