forked from THU-KEG/EAkit

Entity Alignment toolkit (EAkit), a lightweight, easy-to-use and highly extensible PyTorch implementation of many entity alignment algorithms.

yyht/EAkit

 
 

EAkit

Entity Alignment toolkit (EAkit), a lightweight, easy-to-use and highly extensible PyTorch implementation of many entity alignment algorithms. The algorithm list is from Entity_Alignment_Papers.

Table of Contents

  1. Design
  2. Organization
  3. Usage
    1. Run an implemented model
      1. Semantic Matching Models
      2. GNN-based Models
      3. KE-based Models
      4. Results
    2. Write a new model
  4. Dataset
  5. Requirements
  6. TODO
  7. Acknowledgement

Design

We survey existing entity alignment algorithms and modularize their components, then define an abstract structure of 1 Encoder - N Decoder(s). Each module is a specific implementation of an encoder or a decoder, so the structure of each algorithm can be reconstructed by combining modules.

Framework of EAkit
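
The 1 Encoder - N Decoder(s) idea can be illustrated with a minimal sketch in plain Python. This is not the code in models.py: the class name `EncoderDecoderModel` and the toy encoder and decoder functions are hypothetical, chosen only to show one shared encoder feeding several decoders.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
EncoderFn = Callable[[Sequence[int]], List[Vector]]
DecoderFn = Callable[[List[Vector]], float]


class EncoderDecoderModel:
    """One shared encoder whose output is consumed by N named decoders."""

    def __init__(self, encoder: EncoderFn, decoders: Dict[str, DecoderFn]):
        self.encoder = encoder
        self.decoders = decoders

    def forward(self, entity_ids: Sequence[int]) -> Dict[str, float]:
        # Encode once, then let every decoder compute its own value.
        embeddings = self.encoder(entity_ids)
        return {name: dec(embeddings) for name, dec in self.decoders.items()}


def toy_encoder(entity_ids: Sequence[int]) -> List[Vector]:
    # Hypothetical stand-in for a real encoder: id -> 2-d vector.
    return [[float(i), 2.0 * float(i)] for i in entity_ids]


def l1_norm_decoder(embeddings: List[Vector]) -> float:
    # Hypothetical decoder: sum of absolute values of all components.
    return sum(abs(x) for vec in embeddings for x in vec)


def count_decoder(embeddings: List[Vector]) -> float:
    # Hypothetical decoder: number of encoded entities.
    return float(len(embeddings))


model = EncoderDecoderModel(
    toy_encoder, {"norm": l1_norm_decoder, "count": count_decoder}
)
losses = model.forward([1, 2, 3])
print(losses)  # {'norm': 18.0, 'count': 3.0}
```

Keeping the decoders in a dictionary keyed by name means a model with one decoder and a model with several go through the same code path.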

Organization

./EAkit
├── README.md                           # Doc of EAkit
├── _runs                               # Tensorboard log dir
├── data                                # Datasets. (unzip data.zip)
│   └── DBP15K
├── examples                            # Shell scripts of implemented algorithms
│   ├── Tensorboard.sh                  # Start Tensorboard visualization
│   ├── run_BootEA.sh
│   ├── run_ComplEx.sh
│   ├── run_ConvE.sh
│   ├── run_DistMult.sh
│   ├── run_GCN-Align.sh
│   ├── run_HAKE.sh
│   ├── run_KECG.sh
│   ├── run_MMEA.sh
│   ├── run_MTransE.sh
│   ├── run_NAEA.sh
│   ├── run_RotatE.sh
│   ├── run_TransE.sh
│   ├── run_TransEdge.sh
│   ├── run_TransH.sh
│   └── run_TransR.sh
├── load_data.py                        # Load datasets. (data adapter)
├── models.py                           # Encoders & Decoders
├── run.py                              # Main
├── semi_utils.py                       # Bootstrap strategy
└── utils.py                            # Sampling methods, ...

Usage

Run an implemented model

  1. Start TensorBoard for metrics visualization (run under examples/):
./Tensorboard.sh
  2. Modify and run a script as follows (examples are under examples/):
CUDA_VISIBLE_DEVICES=0 python3 run.py --log gcnalign \
                                    --data_dir "data/DBP15K/zh_en" \
                                    --rate 0.3 \
                                    --epoch 1000 \
                                    --check 10 \
                                    --update 10 \
                                    --train_batch_size -1 \
                                    --encoder "GCN-Align" \
                                    --hiddens "100,100,100" \
                                    --decoder "Align" \
                                    --sampling "N" \
                                    --k "25" \
                                    --margin "1" \
                                    --alpha "1" \
                                    --feat_drop 0.0 \
                                    --lr 0.005 \
                                    --train_dist "euclidean" \
                                    --test_dist "euclidean"
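
To give a feel for what flags such as `--margin`, `--k` and `--train_dist` usually control in margin-based alignment training, here is a small sketch of a margin ranking loss over negative samples. It assumes the common formulation (positive distance plus margin should not exceed negative distance); the actual loss in run.py and models.py may differ, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def margin_ranking_loss(
    anchor: Vector,
    positive: Vector,
    negatives: List[Vector],
    margin: float = 1.0,
    dist: Callable[[Vector, Vector], float] = euclidean,
) -> float:
    """Mean hinge loss max(0, d(anchor, pos) - d(anchor, neg) + margin)."""
    pos = dist(anchor, positive)
    terms = [max(0.0, pos - dist(anchor, neg) + margin) for neg in negatives]
    return sum(terms) / len(terms)


# One aligned pair and k = 2 negative samples (toy numbers).
loss = margin_ranking_loss(
    anchor=[0.0, 0.0],
    positive=[0.0, 1.0],
    negatives=[[3.0, 4.0], [0.0, 1.5]],
    margin=1.0,
)
print(loss)  # 0.25
```

In this toy case the first negative is already farther than the margin requires and contributes nothing, while the second is too close and contributes 0.5.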

In detail, the following methods are currently implemented:

Semantic Matching Models

GNN-based Models

KE-based Models

Results

Results on DBP15K (zh_en, ja_en, fr_en).

           zh_en                      ja_en                      fr_en
Model      Hits@1   Hits@10  MRR      Hits@1   Hits@10  MRR      Hits@1   Hits@10  MRR
MTransE    0.419    0.753    0.535    0.433    0.773    0.549    0.407    0.751    0.526
BootEA     0.490    0.793    0.593    0.499    0.813    0.605    0.515    0.838    0.623
TransEdge  0.519    0.813    0.621    0.526    0.825    0.632    0.397    0.824    0.543
MMEA       0.405    0.672    0.499    0.397    0.680    0.496    0.442    0.749    0.550
GCN-Align  0.410    0.756    0.527    0.442    0.810    0.566    0.430    0.813    0.557
NAEA       0.323    0.481    0.381    0.311    0.457    0.363    0.307    0.460    0.362
KECG       0.467    0.815    0.586    0.485    0.843    0.605    0.479    0.844    0.602
TransE     0.343    0.634    0.441    0.365    0.710    0.480    0.374    0.735    0.493
TransH     0.436    0.735    0.540    0.450    0.778    0.561    0.485    0.821    0.599
TransR     0.371    0.697    0.481    0.368    0.709    0.484    0.378    0.741    0.497
RotatE     0.423    0.754    0.534    0.448    0.785    0.561    0.439    0.800    0.560
HAKE       0.288    0.588    0.391    0.319    0.607    0.421    0.319    0.638    0.428
DistMult   0.180    0.400    0.255    0.058    0.179    0.099    0.095    0.285    0.157
ComplEx    0.115    0.265    0.166    0.063    0.251    0.146    0.141    0.332    0.206
ConvE      0.210    0.466    0.299    0.339    0.556    0.415    0.350    0.602    0.439
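
For readers unfamiliar with the metrics, the following sketch shows the standard definitions of Hits@k and MRR, computed from the 1-based rank of the correct counterpart of each test entity. It is an illustration of the definitions only, not EAkit's evaluation code, and the example ranks are made up.

```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of test entities whose true match is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1 / rank over all test entities (ranks start at 1)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Made-up ranks of the correct target entity for four source entities.
ranks = [1, 3, 12, 1]
print(hits_at_k(ranks, 1))    # 0.5
print(hits_at_k(ranks, 10))   # 0.75
print(round(mean_reciprocal_rank(ranks), 3))  # 0.604
```

All three metrics lie in [0, 1], and higher is better.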

Write a new model

  1. Divide the algorithm at the abstract level to obtain the structure of 1 (or 0) Encoder and 1 (or more) Decoder(s).
  2. Register the modules and add extra parameters in the top-level encoder (class Encoder) and top-level decoder (class Decoder) in models.py.
  3. Implement the concrete encoding module (class Encoder_Instance) and decoding module(s) (class Decoder_Instance) according to the given template.
  4. Write an execution script (XXX.sh) with parameter settings to run the new model.
  5. (Optional) Adapt a new dataset in load_data.py, and add a new sampling strategy in utils.py.

Example of writing a new model
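
Step 2 (registering a module so that it can be selected by name from the command line) could look roughly like the sketch below. This is a generic registry pattern, not the template used in models.py. The names `ENCODERS`, `register_encoder`, `build_encoder` and `MyEncoder` are hypothetical, and the comma-separated `hiddens` string only mirrors the `--hiddens "100,100,100"` flag shown earlier.

```python
from typing import Callable, Dict, List

# Hypothetical registry: encoder name (as passed on the CLI) -> class.
ENCODERS: Dict[str, type] = {}


def register_encoder(name: str) -> Callable[[type], type]:
    """Class decorator that records an encoder under the given name."""
    def decorator(cls: type) -> type:
        ENCODERS[name] = cls
        return cls
    return decorator


@register_encoder("MyEncoder")
class MyEncoder:
    """Placeholder for a concrete encoding module."""

    def __init__(self, hiddens: str, feat_drop: float = 0.0):
        # "100,100,100" -> [100, 100, 100]
        self.hiddens: List[int] = [int(h) for h in hiddens.split(",")]
        self.feat_drop = feat_drop


def build_encoder(name: str, **kwargs):
    """Look up an encoder by name and instantiate it with extra parameters."""
    if name not in ENCODERS:
        raise ValueError(f"Unknown encoder {name!r}; known: {sorted(ENCODERS)}")
    return ENCODERS[name](**kwargs)


enc = build_encoder("MyEncoder", hiddens="100,100,100")
print(enc.hiddens)  # [100, 100, 100]
```

With a registry like this, adding a model means adding one decorated class, and the execution script only needs to pass the new name.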

Dataset

(Currently, EAkit only supports DBP15K, but it is easy to adapt to other datasets.)

  • DBP15K is taken from the "mapping" folder of JAPE. Note that "ref_ent_ids" and "sup_ent_ids" need to be combined into a single file named "ill_ent_ids".

Here, you can directly unpack the data file after downloading:

unzip data.zip
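
If you start from the original JAPE files rather than data.zip, the merge of "ref_ent_ids" and "sup_ent_ids" mentioned above can be done with a short script. The sketch below is an assumption about how to do it: it simply concatenates the non-empty lines of both files, and the order (ref first, then sup) is a guess, since it is not specified here. The function name `merge_alignment_files` is hypothetical.

```python
from pathlib import Path
from typing import List, Sequence, Union


def merge_alignment_files(
    data_dir: Union[str, Path],
    parts: Sequence[str] = ("ref_ent_ids", "sup_ent_ids"),
    out_name: str = "ill_ent_ids",
) -> int:
    """Concatenate the alignment files in data_dir into one file.

    Returns the number of lines written.
    """
    data_dir = Path(data_dir)
    lines: List[str] = []
    for part in parts:
        text = (data_dir / part).read_text(encoding="utf-8")
        # Skip blank lines so the merged file has one pair per line.
        lines.extend(line for line in text.splitlines() if line.strip())
    (data_dir / out_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Example call (run once per language pair directory):
# merge_alignment_files("data/DBP15K/zh_en")
```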

Requirements

  • Python3 (tested on 3.7.7)
  • PyTorch (tested on 1.4.0)
  • PyTorch Geometric (PyG) (tested on 1.4.3)
  • TensorBoard (tested on 2.0.2)
  • Numpy
  • Scipy
  • Scikit-learn
  • Graph-tool (only needed if bootstrapping is used)
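
A quick way to see which of these packages are importable in the current environment is sketched below. The list uses the usual import names (`torch_geometric` for PyG, `sklearn` for Scikit-learn, `graph_tool` for Graph-tool); the helper `missing_modules` is hypothetical and does not check versions.

```python
from importlib.util import find_spec
from typing import List, Sequence

REQUIRED = ["torch", "torch_geometric", "tensorboard", "numpy", "scipy", "sklearn"]
OPTIONAL = ["graph_tool"]  # only needed for bootstrapping


def missing_modules(names: Sequence[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required:", missing_modules(REQUIRED))
    print("Missing optional:", missing_modules(OPTIONAL))
```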

TODO

  • The results of BootEA, TransEdge, MMEA, and NAEA are not yet satisfactory and need debugging (possibly in the bootstrapping process).

There are still many algorithms that need to be implemented (integrated):

  • Semantic Matching Models: NTAM, AttrE, CEAFF, ...
  • GNN-based Models: AVR-GCN, AliNet, MRAEA, CG-MuAlign, RDGCN, HGCN, GMNN, ...
  • KE-based Models: TransD, CapsE, ...
  • GAN-based Models: SEA, AKE, ...
  • Other Models: OTEA, ...

Find algorithms from Entity_Alignment_Papers.

Pull requests for implementing algorithms & updating (reproducible) results with shell scripts are welcome!

Acknowledgement

We drew on code from the following repositories and appreciate their great contributions: PyTorch Geometric, BootEA, TransEdge, AliNet, TuckER. If we have missed any, please let us know in Issues.

The main contributors to this project are Chengjiang Li, Lei Hou, and Juanzi Li.
