
pygrank

Fast node ranking algorithms on large graphs.
Node score diffusion · Recommendation and ranking · Community structure · Link prediction · Graph signal processing

License: Apache Software License
Author: Emmanouil (Manios) Krasanakis
Dependencies: networkx, numpy, scipy, sklearn, wget
Backends (optional): numpy, tensorflow, pytorch, torch_sparse, matvec
Install non-numpy backends separately before using them.


🚀 New features (after 0.2.10)

🛠️ Installation

pygrank works with Python 3.9 or later. The latest version can be installed or upgraded with pip:

pip install --upgrade pygrank

To run the library on backpropagation-capable backends, either edit the automatically created configuration file (follow the instructions printed to stderr) or run parts of your code within a context manager that temporarily overrides the configuration, like this:

import pygrank as pg
with pg.Backend("tensorflow"):
    ... # run your pygrank code here

Otherwise, everything runs on top of numpy, which is faster for forward passes. Node ranking algorithms can be defined outside backend contexts and then run inside them.

⚡ Quickstart

Before diving into details, here is a fully functional pipeline that scores the importance of a node relative to a list of "seed" nodes within a graph's structure:

import pygrank as pg
graph, seeds, node = ...

pre = pg.preprocessor(assume_immutability=True, normalization="symmetric")
algorithm = pg.PageRank(alpha=0.85)+pre >> pg.Sweep() >> pg.Ordinals()
ranks = algorithm(graph, seeds)
print(ranks[node])
print(algorithm.cite())

The graph can be created with networkx or, for faster computations, with the pygrank.fastgraph module. Nodes can hold any kind of object or data type (you don't need to convert them to integers).
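For example, a graph over plain string nodes can be built directly with networkx (a hypothetical toy graph):

```python
import networkx as nx

graph = nx.Graph()
# nodes may be any hashable objects, such as strings
graph.add_edge("Alice", "Bob")
graph.add_edge("Bob", "Carol")
```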

The above snippet starts by defining a preprocessor, which controls how graph adjacency matrices are normalized. In this case, symmetric normalization is applied (ideal for undirected graphs) and graph immutability is assumed, i.e., that the graph will not change in the future. When this assumption is declared, the preprocessor hashes intermediate computations to considerably speed up experiments and autotuning.

The snippet uses the chain operator to wrap node ranking algorithms with various kinds of postprocessors. If you are not a fan of functional programming, you can instead pass algorithms into each other's constructors. The chain starts from a PageRank graph filter with diffusion parameter 0.85. Other filters can be declared, including automatically tuned ones.

The produced algorithm is run as a callable, yielding a map between nodes and values (in graph signal processing, such maps are called graph signals), and the value of one node is printed. Graph signals can also be created explicitly and passed to algorithms, for example:

signal = pg.to_signal(graph, {v: 1. for v in seeds})
ranks = algorithm(signal)

Finally, the snippet prints a recommended citation for the algorithm.

More examples

Showcase
Big data FAQ
Downstream tasks

🧠 Overview

Analyzing the edges (links) between graph nodes can help rank or score nodes by their proximity to structural or attribute-based communities, given known example members. With the introduction of graph signal processing and decoupled graph neural networks, the importance of node ranking has drastically increased, as its ability to perform induction by quickly spreading node information through edges has been theoretically and experimentally corroborated. For example, it can be used to make predictions based on a few known node attributes or based on the outputs of feature-based machine learning models.

pygrank is a collection of node ranking algorithms and practices that support real-world conditions, such as large graphs and heterogeneous preprocessing and postprocessing requirements. Thus, it provides ready-to-use tools that simplify the deployment of theoretical advancements and testing of new algorithms.

Some of the library's advantages are:

  1. Compatibility with networkx, plain numpy, tensorflow, pytorch, matvec.
  2. Datacentric interfaces that do not require transformations to identifiers.
  3. Large graph support with sparse data structures and scalable algorithms.
  4. Seamless pipelines (e.g., operation chains), from graph preprocessing up to benchmarking and evaluation.
  5. Modular components to be combined and a functional chain interface for complex combinations.
  6. Fast running time with highly optimized operations.

🔗 Material

Tutorials & Documentation
Functional Interface

Quick links
Measures
Graph Filters
Postprocessors
Tuners
Downloadable Datasets

Backend resources
numpy (default, no additional installation)
tensorflow
pytorch
torch_sparse
matvec

🔥 Features

  • Graph filters
  • Community detection
  • Link prediction
  • Graph normalization
  • Convergence criteria
  • Postprocessing (e.g., fairness awareness)
  • Evaluation measures
  • Benchmarks
  • Autotuning
  • Graph Neural Network (GNN) support

👍 Contributing

Feel free to contribute in any way, for example through the issue tracker or by participating in discussions. To modify the code base, please check out the contribution guidelines and follow the pull request checklist described there.

📓 Citation

If pygrank has been useful in your research and you would like to cite it in a scientific publication, please refer to the following paper:

@article{krasanakis2022pygrank,
  author       = {Emmanouil Krasanakis and Symeon Papadopoulos and Ioannis Kompatsiaris and Andreas Symeonidis},
  title        = {pygrank: A Python Package for Graph Node Ranking},
  journal      = {SoftwareX},
  year         = 2022,
  month        = oct,
  doi          = {10.1016/j.softx.2022.101227},
  url          = {https://doi.org/10.1016/j.softx.2022.101227}
}

To publish research that makes use of provided methods, please cite all relevant publications.
