Correlation Robust Influence Maximization

Overview

This repository contains code relating to the paper titled "Correlation Robust Influence Maximization". The code is provided primarily for reproducibility, and parts of it are left as-is to reproduce results from when the code was written. Ideas from this code may later be used to create a library.

Data

Datasets used

We use two real datasets:

  • polblogs - Update, 25 April 2021: Mark Newman's page appears to be no longer accessible, but there are multiple mirrors of the polblogs dataset and any of them can be used.
  • wikivote

After downloading, place polblogs.gml and Wiki-Vote.txt in a folder called data. Note that we convert all input graphs to simple graphs with Graph.simplify(); in particular, polblogs is converted from a directed multigraph with self-loops to a simple graph.
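A minimal sketch of loading and simplifying the two datasets with python-igraph follows; the actual loading code lives in graph_functions.py and may differ.

import igraph as ig

# polblogs ships as GML; the raw file is a directed multigraph with self-loops.
polblogs = ig.Graph.Read_GML("data/polblogs.gml")
polblogs.simplify()  # collapse multi-edges and drop self-loops -> simple graph

# Wiki-Vote.txt is a SNAP edge list with '#' comment lines, so strip those first.
edges = []
with open("data/Wiki-Vote.txt") as f:
    for line in f:
        if line.strip() and not line.startswith("#"):
            u, v = line.split()
            edges.append((u, v))
wikivote = ig.Graph.TupleList(edges, directed=True)
wikivote.simplify()

print(polblogs.summary())
print(wikivote.summary())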

Use your own dataset

To use your own dataset, modify graph_functions.py and config.py. graph_functions.py handles both loading graphs and assigning edge weights, while config.py determines which graphs experiment.py loads.
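For illustration, a custom loader added to graph_functions.py might look like the sketch below. The function name load_my_graph, the file path, and the homogeneous weighting scheme are hypothetical, and the new graph would still need to be registered in config.py so that experiment.py can select it.

import igraph as ig

def load_my_graph(path="data/my_graph.txt", p=0.1):
    """Load an edge list, simplify it, and attach homogeneous edge weights."""
    edges = []
    with open(path) as f:
        for line in f:
            if line.strip() and not line.startswith("#"):
                u, v = line.split()[:2]
                edges.append((u, v))
    g = ig.Graph.TupleList(edges, directed=True)
    g.simplify()
    g.es["weight"] = p  # marginal activation probability on every edge
    return g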

Usage

Assuming all dependencies and data are in place, to reproduce the results, a user simply needs to run

python paper.py

The script will calculate the correlation-robust expected influence as well as the expected influence under the independent cascade model, and write the resulting data files into a folder called out/. After the data is gathered, the script runs analyses which should produce the graphs and tables shown in the paper.

Users who wish to run a specific experiment can use experiment.py directly. For example, to run the correlation greedy experiment on polblogs with k up to 20 and homogeneous edge weights of 0.37, run

python experiment.py polblogs 20 graph_techniques 0 -p 0.37

Because polblogs and graph_techniques are actually IntEnums (see config.py), the same experiment can be represented by

python experiment.py 0 20 0 0 -p 0.37
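A rough sketch of the kind of IntEnum mapping involved is shown below; the class and member names are illustrative only, and config.py defines the actual enums.

from enum import IntEnum

# Hypothetical names; config.py defines the real enums and their members.
class Dataset(IntEnum):
    polblogs = 0
    wikivote = 1

class Technique(IntEnum):
    graph_techniques = 0

# So "polblogs" and 0, and "graph_techniques" and 0, name the same things:
assert Dataset.polblogs == 0 and Technique.graph_techniques == 0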

Users are encouraged to run python experiment.py --help to look at the options.

Dependencies

Python Packages

To reproduce the results as shown, users are encouraged to run

pip install -r requirements.txt

In summary, the following packages are required:

To represent graphs, we mainly use igraph. NetworkX was used for synthetic graph generation and visualisation. We use Pyomo as one of the methods to find the seed with the highest marginal gain, but even with a fast linear-program solver such as Gurobi, it is not nearly as fast as the graph-based methods. We use Matplotlib for plotting and Pillow for image manipulation.

Other dependencies

  • Pruned Monte Carlo Simulator - for comparison, we also use a pruned Monte Carlo simulator, which we have modified (see the fork) to add some features, mostly relating to data collection. Note that binaries are not provided, but the code should compile relatively quickly - it takes less than a minute on an i7-7500U processor. The resulting pmc_greed.exe and pmc_est.exe need to be on PATH for paper.py or experiment.py to run properly.
  • Linear Program Solver - if a user wants to use a linear program for influence maximization, a solver is required. The files as provided assume that gurobipy is installed. Any solver that Pyomo can interface with can be used, though CBC is the only alternative that has been tested. To use an alternative solver, modify line 9 of linear_program.py, as sketched below.
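For illustration, swapping the solver amounts to something like the following Pyomo call; the variable name solver is assumed here, not taken from the actual file.

from pyomo.environ import SolverFactory

solver = SolverFactory("gurobi")  # default assumed by the provided files (gurobipy)
# solver = SolverFactory("cbc")   # the tested alternative
# solver = SolverFactory("glpk")  # any other solver Pyomo can interface with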

We note that users who only wish to run the algorithm itself may modify the code and rely on igraph alone.

Files

  • paper.py - the main script; it generates and then analyses all data shown in the paper. Running paper.py, or specifically its get_data() function, is expected to take around 20 hours on an i7-7500U processor. Once the objective values and computational times are stored in the out/ folder, the remaining functions should take no longer than 5 minutes.
  • experiment.py - runs a single experiment in which the influence diffusion process is run to completion on a graph. Data collected during the experiments mostly relates to objective values and computational times; only the time from the start to the end of the diffusion process is counted, excluding setup computations.
  • greed.py - contains only accelgreedy, the accelerated greedy algorithm that repeatedly selects the seed with the highest marginal gain; the algorithm is also known as 'lazy greedy' (see the sketch after this list).
  • cascade.py - stores the functions used for calculating expected influence. Calculations for both the comonotone and independent cascade models are provided, but the independent cascade implementation is not actually used, as the pure-Python version cannot compete with the C++ alternative.
  • linear_program.py - stores the linear program for the correlation robust influence calculation problem. That is, given a seed set, it computes the expected influence under adversarial correlations, with the marginal probabilities of nodes activating other nodes held fixed.
  • graph_functions.py - this file stores all functions relating to the graphs themselves.
  • pmc.py - this file stores all functions for interfacing with the Pruned Monte Carlo simulation program. This requires pmc_greed.exe and pmc_est.exe to be in PATH.
  • config.py - this file stores defaults and constants.
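For readers unfamiliar with lazy greedy, the sketch below shows the generic CELF-style idea behind accelgreedy. It is not the repository's implementation; influence(...) stands in for any monotone submodular objective, such as the correlation-robust expected influence.

import heapq

def lazy_greedy(candidates, influence, k):
    """Pick k seeds by lazily re-evaluating marginal gains."""
    seeds, current = [], 0.0
    # max-heap of (negated gain, node, seed-set size when the gain was computed)
    heap = [(-influence([v]), v, 0) for v in candidates]
    heapq.heapify(heap)
    while len(seeds) < k and heap:
        neg_gain, v, last = heapq.heappop(heap)
        if last == len(seeds):           # gain is up to date: take it
            seeds.append(v)
            current = current - neg_gain
        else:                            # stale: recompute and push back
            gain = influence(seeds + [v]) - current
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds

Because the objective is submodular, stale gains can only overestimate the true marginal gain, so whenever the popped element's gain is up to date it is guaranteed to be the best remaining choice.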

Other notes

Note that within accelgreedy(...), there is a part which builds a distance matrix from the graph, which speeds up the expected influence calculation. Its memory requirement scales quadratically with the number of nodes, so it is not expected to scale to graphs with millions of nodes. The script can be changed to avoid this, but doing so will slow down the calculations.
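As a back-of-the-envelope check of that scaling, a dense n x n matrix of 8-byte floats (the dtype actually used in the code may differ) requires roughly the following amounts of memory:

def distance_matrix_gib(n, bytes_per_entry=8):
    return n * n * bytes_per_entry / 2**30

for n in (10_000, 100_000, 1_000_000):
    print(f"n = {n:>9,}: ~{distance_matrix_gib(n):,.1f} GiB")
# n =    10,000: ~0.7 GiB
# n =   100,000: ~74.5 GiB
# n = 1,000,000: ~7,450.6 GiB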

Citation

We copy, in full, the BibTeX entry as provided by NeurIPS 2020:

@inproceedings{NEURIPS2020_4ee78d41,
 author = {Chen, Louis and Padmanabhan, Divya and Lim, Chee Chin and Natarajan, Karthik},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},
 pages = {7078--7089},
 publisher = {Curran Associates, Inc.},
 title = {Correlation Robust Influence Maximization},
 url = {https://proceedings.neurips.cc/paper/2020/file/4ee78d4122ef8503fe01cdad3e9ea4ee-Paper.pdf},
 volume = {33},
 year = {2020}
}
