Skip to content

A DeepWalk implementation for ontologies using NetworkX and Gensim

Notifications You must be signed in to change notification settings

vortext/DeepOntology

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DeepOntology

DeepOntology is a reimplementation of DeepWalk, specifically tuned for ontologies. Under the hood it uses NetworkX and Gensim to construct a SkipGram model on (Metropolized) Random Walks from spanning forests. By walking parent-of relations of an ontology a lower dimensional continuous embedding is made that encodes for those relations. These embeddings can be used for clustering and classification tasks, without resorting to sparse models of ancestor relations.

Caveat

The random walks are constructed using multi-process parallelization without shared memory, which means the whole graph has to fit it memory n times (n=number of processes). Out of core or shared memory realization is work in progress.

Usage

python train.py --help
usage: DeepOntology [-h] --input INPUT --output OUTPUT [--delimiter DELIMITER]
                    [--format FORMAT]
                    [--representation-size REPRESENTATION_SIZE]
                    [--walks WALKS] [--length LENGTH] [--iter ITER]
                    [--window-size WINDOW_SIZE] [--workers WORKERS]
                    [--metropolized METROPOLIZED] [--binary BINARY]
                    [--seed SEED]

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         Input graph file (default: None)
  --output OUTPUT       Output representation file (default: None)
  --delimiter DELIMITER
                        Delimiter used in the input file, e.g. ',' for CSV
                        (default: None)
  --format FORMAT       Format of the input file, either edgelist or adjlist
                        (default: edgelist)
  --representation-size REPRESENTATION_SIZE
                        Number of latent dimensions to learn for each node.
                        (default: 64)
  --walks WALKS         Number of walks for each node (default: 5)
  --length LENGTH       Length of the random walk on the graph (default: 40)
  --iter ITER           Number of iteration epocs (default: 5)
  --window-size WINDOW_SIZE
                        Window size of the skipgram model (default: 5)
  --workers WORKERS     Number of parallel processes (default: 1)
  --metropolized METROPOLIZED
                        Use Metropolis Hastings for random walk (default:
                        False)
  --binary BINARY       Use the binary output format for Word2Vec (default:
                        True)
  --seed SEED           Seed for random walk generator. (default: 1)

Citing

@inproceedings{Perozzi:2014:DOL:2623330.2623732,
 author = {Perozzi, Bryan and Al-Rfou, Rami and Skiena, Steven},
 title = {DeepWalk: Online Learning of Social Representations},
 booktitle = {Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
 series = {KDD '14},
 year = {2014},
 isbn = {978-1-4503-2956-9},
 location = {New York, New York, USA},
 pages = {701--710},
 numpages = {10},
 url = {http://doi.acm.org/10.1145/2623330.2623732},
 doi = {10.1145/2623330.2623732},
 acmid = {2623732},
 publisher = {ACM},
 address = {New York, NY, USA},
 keywords = {deep learning, latent representations, learning with partial labels, network classification, online learning, social networks},
}

About

A DeepWalk implementation for ontologies using NetworkX and Gensim

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages