Skip to content

xjdr/SEAL_OGB

 
 

Repository files navigation

SEAL_OGB -- An Implementation of SEAL for OGB Link Prediction Tasks

About

This repository supports the following paper:

M. Zhang, P. Li, Y. Xia, K. Wang, and L. Jin, Revisiting Graph Neural Networks for Link Prediction. [PDF]

SEAL is a GNN-based link prediction method. It first extracts a k-hop enclosing subgraph for each target link, then applies a Double Radius Node Labeling (DRNL) scheme to give each node an integer label as its additional feature. Finally, these labeled enclosing subgraphs are fed to a graph neural network to predict link existences.

This repository reimplements SEAL with the PyTorch-Geometric library, and tests SEAL on the Open Graph Benchmark (OGB) datasets. SEAL ranked 1st place on 3 out of 4 link prediction datasets in the OGB Leaderboard at the time of submission. It additionally supports Planetoid like datasets, such as Cora, CiteSeer and PubMed, where random 0.85/0.05/0.1 split and AUC metric are used. Using custom datasets is also easy by replacing the Planetoid dataset with your own.

ogbl-ppa ogbl-collab ogbl-ddi ogbl-citation2
Val results 51.25%±2.52%* 63.89%±0.49%* 28.49%±2.69% 87.57%±0.31%*
Test results 48.80%±3.16%* 63.64%±0.71%* 30.56%±3.86% 87.67%±0.32%*

* State-of-the-art results; evaluation metrics are Hits@100, Hits@50, Hits@20 and MRR, respectively. For ogbl-collab, we have switched to the new rule, where after all hyperparameters are determined on the validation set, we include validation edges in the training graph and retrain to report the test performance. For ogbl-citation2, it is an updated version of the deprecated ogbl-citation.

The original implementation of SEAL is here.

The original paper of SEAL is:

M. Zhang and Y. Chen, Link Prediction Based on Graph Neural Networks, Advances in Neural Information Processing Systems (NIPS-18). [PDF]

Requirements

Latest tested combination: Python 3.8.5 + PyTorch 1.6.0 + PyTorch_Geometric 1.6.1 + OGB 1.2.4.

Install PyTorch

Install PyTorch_Geometric

Install OGB

Other required python libraries include: numpy, scipy, tqdm etc.

Usages

ogbl-ppa

python seal_link_pred.py --dataset ogbl-ppa --num_hops 1 --use_feature --use_edge_weight --eval_steps 5 --epochs 20 --dynamic_train --dynamic_val --dynamic_test --train_percent 5 

ogbl-collab

python seal_link_pred.py --dataset ogbl-collab --num_hops 1 --use_feature --train_percent 10 --use_valedges_as_input

According to OGB, this dataset allows including validation links in training when all the hyperparameters are finalized using the validation set. Thus, you should first tune your hyperparameters without "--use_valedges_as_input", and then append "--use_valedges_as_input" to your final command when all the hyperparameters are determined. See issue.

ogbl-ddi

python seal_link_pred.py --dataset ogbl-ddi --num_hops 1 --ratio_per_hop 0.2 --use_edge_weight --eval_steps 1 --epochs 10 --dynamic_val --dynamic_test --train_percent 1 

For the above three datasets, append "--runs 10" to do experiments for 10 times and get the average results.

ogbl-citation2

python seal_link_pred.py --dataset ogbl-citation2 --num_hops 1 --use_feature --use_edge_weight --eval_steps 1 --epochs 10 --dynamic_train --dynamic_val --dynamic_test --train_percent 2 --val_percent 1 --test_percent 1

Because this dataset uses mean reciprocal rank (MRR) as the evaluation metric where each positive testing link is ranked against 1000 random negative ones, it requires extracting 1001 enclosing subgraphs for every testing link. This is very time consuming. Thus, the above command uses "--val_percent 1" and "--test_percent 1" to only evaluate on 1% of validation and test data to get a fast unbiased estimate of the true MRR. To get the true MRR, please change them to "--val_percent 100" and "test_percent 100". Also, because this dataset is expensive to evaluate, we first train 10 models with 1% validation data in parallel, record the best epoch's model from each run, and then evaluate all 10 best models together using the "--test_multiple_models --val_percent 100 --test_percent 100" option. This option enables evaluating multiple pretrained models together with a single subgraph extraction process for each link, thus avoiding extracting subgraphs for testing links repeatedly for 10 times. You need to specify your pretrained model paths in "seal_link_pred.py".

Cora

python seal_link_pred.py --dataset Cora --num_hops 3 --use_feature --hidden_channels 256 --runs 10

CiteSeer

python seal_link_pred.py --dataset CiteSeer --num_hops 3 --hidden_channels 256 --runs 10

PubMed

python seal_link_pred.py --dataset PubMed --num_hops 3 --use_feature --dynamic_train --runs 10

For all datasets, if you specify "--dynamic_train", the enclosing subgraphs of the training links will be extracted on the fly instead of preprocessing and saving to disk. Similarly for "--dynamic_val" and "--dynamic_test". You can increase "--num_workers" to accelerate the dynamic subgraph extraction process.

Other labeling tricks

By default SEAL uses the DRNL labeling trick. You can alternatively use other labeling tricks such as DE (distance encoding), DE+, ZO (zero-one labeling), etc., by appending "--node_label de", "--node_label de+", and "--node_label zo".

Heuristic methods

This repository also implements two link prediction heuristics: Common Neighbor (CN) and Adamic Adar (AA), which turn out to have surprisingly better performance than many GNN methods on ogbl-ppa and ogbl-collab. An example usage of Common Neighbor is:

python seal_link_pred.py --use_heuristic CN --dataset ogbl-ppa

License

SEAL_OGB is released under an MIT license. Find out more about it here.

Reference

If you find the code useful, please cite our papers.

@misc{zhang2020revisiting,
  title={Revisiting Graph Neural Networks for Link Prediction},
  author={Muhan Zhang and Pan Li and Yinglong Xia and Kai Wang and Long Jin},
  year={2020},
  eprint={2010.16103},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

@inproceedings{zhang2018link,
  title={Link prediction based on graph neural networks},
  author={Zhang, Muhan and Chen, Yixin},
  booktitle={Advances in Neural Information Processing Systems},
  pages={5165--5175},
  year={2018}
}

Muhan Zhang, Facebook AI

10/13/2020

About

An open-source implementation of SEAL for link prediction in open graph benchmark (OGB) datasets.

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%