PSDVec

Source code for "A Generative Word Embedding Model and its Low Rank Positive Semidefinite Solution" (accepted by EMNLP'15) and "PSDVec: Positive Semidefinite Word Embedding" (on the use of this toolset; under review).

Update v0.4: Online block-wise factorization:

  1. Obtain 25000 core embeddings, saved to 25000-500-EM.vec:
    • python factorize.py -w 25000 top2grams-wiki.txt
  2. Obtain 45000 noncore embeddings, for 70000 in total (25000 core + 45000 noncore), saved to 25000-70000-500-BLKEM.vec:
    • python factorize.py -v 25000-500-EM.vec -o 45000 top2grams-wiki.txt
  3. Incrementally learn another 50000 noncore embeddings (based on the 25000 core ones), saved to 25000-120000-500-BLKEM.vec:
    • python factorize.py -v 25000-70000-500-BLKEM.vec -b 25000 -o 50000 top2grams-wiki.txt
  4. Repeat step 3 a few times to obtain embeddings of progressively rarer words. A conceptual sketch of the block-wise scheme follows this list.
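
Conceptually, the block-wise scheme factorizes the core-core PMI block once into a low-rank positive semidefinite form, then fits each noncore word's embedding against the frozen core embeddings by regularized least squares. The NumPy sketch below illustrates that idea only: it is not the factorize.py implementation (the paper uses a weighted EM solver), and the function names, the plain eigendecomposition, and the l2 value are assumptions.

    # Minimal NumPy sketch of block-wise PSD factorization (illustrative only;
    # the actual solver in factorize.py is a weighted EM procedure).
    import numpy as np

    def factorize_core(G_core, dim):
        """Rank-`dim` PSD approximation of the core PMI block: G_core ~ V.T @ V."""
        G_core = (G_core + G_core.T) / 2            # guard against numerical asymmetry
        eigvals, eigvecs = np.linalg.eigh(G_core)
        top = np.argsort(eigvals)[::-1][:dim]       # keep the `dim` largest eigenvalues
        lam = np.clip(eigvals[top], 0.0, None)      # clip negatives: PSD projection
        return np.sqrt(lam)[:, None] * eigvecs[:, top].T   # dim x n_core

    def embed_noncore(V_core, G_cross, l2=0.1):
        """Fit noncore embeddings v by ridge regression V_core.T @ v ~ g,
        where g is a noncore word's PMI row over the (frozen) core words."""
        dim = V_core.shape[0]
        A = V_core @ V_core.T + l2 * np.eye(dim)    # shared dim x dim system matrix
        return np.linalg.solve(A, V_core @ G_cross.T)      # dim x n_noncore

    # Toy usage: 6 core and 4 noncore words, 3-dimensional embeddings.
    rng = np.random.default_rng(0)
    G = rng.standard_normal((10, 10)); G = G @ G.T / 10    # stand-in PSD "PMI" matrix
    V_core = factorize_core(G[:6, :6], dim=3)
    V_new = embed_noncore(V_core, G[6:, :6])
    print(V_core.shape, V_new.shape)                       # (3, 6) (3, 4)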

Pretrained embeddings for 120,000 words, along with evaluation results, have been uploaded.
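
To use the uploaded embeddings, a loader along the following lines should work, assuming the .vec files are in the plain word2vec text format (a "vocab_size dim" header line, then one word and its vector per line); the header handling is an assumption, so check it against the actual files.

    # Hedged loader, assuming the .vec files use the plain word2vec text format.
    import numpy as np

    def load_vec(path):
        embeddings = {}
        with open(path, encoding='utf-8') as f:
            n_words, dim = map(int, f.readline().split())  # assumed header line
            for line in f:
                parts = line.rstrip().split(' ')
                embeddings[parts[0]] = np.asarray(parts[1:dim + 1], dtype=np.float32)
        return embeddings

    vecs = load_vec('25000-120000-500-BLKEM.vec')   # file name from step 3 above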

Update v0.3: Block-wise factorization

Pretrained embeddings for 100,000 words and evaluation results were uploaded (since replaced by the expanded set of 120,000 embeddings).

Testsets are courtesy of Omer Levy (https://bitbucket.org/omerlevy/hyperwords/src).
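
For reference, a similarity evaluation on such testsets typically reports the Spearman correlation between human scores and embedding cosine similarities. The sketch below assumes a testset of "word1 word2 score" lines (the usual hyperwords layout) and reuses the load_vec helper above; the testset file name is illustrative.

    # Hedged evaluation sketch: Spearman correlation between human similarity
    # scores and embedding cosine similarities, skipping out-of-vocabulary pairs.
    import numpy as np
    from scipy.stats import spearmanr

    def evaluate_similarity(vecs, testset_path):
        human, model = [], []
        with open(testset_path, encoding='utf-8') as f:
            for line in f:
                w1, w2, score = line.split()
                if w1 in vecs and w2 in vecs:
                    a, b = vecs[w1], vecs[w2]
                    human.append(float(score))
                    model.append(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
        return spearmanr(human, model).correlation

    # print(evaluate_similarity(vecs, 'ws353.txt'))  # testset name is illustrative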
