word2vec-get-started

Welcome

Train word embeddings for language modeling tasks with word2vec and insuranceqa-corpus.

Install

scripts/compile.sh # verified on Ubuntu 16.04
source ./env.sh
word2vec # verify the installation

Train

cp localrc.sample localrc # then edit the keys in localrc
scripts/train.sh

Get similarities

After training, a model file is generated in tmp. Use distance to find the words most similar to a given word.

$ src/distance tmp/iqa.w2v.20170909113039.bin1.neg1.cbow0.win5.iter30.embed100.thr30
Enter word or sentence (EXIT to break): 家庭

Word: 家庭  Position in vocabulary: 83

                                              Word       Cosine distance
------------------------------------------------------------------------
                                            日托                0.648058
                                            住房                0.645767
                                            初创                0.631415
                                            宝石                0.621161
                                            家务                0.612938

To compute all distances, run scripts/dist-analysis.sh MODEL_FILE.
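
As a rough illustration of what distance computes, the sketch below reads a model in the word2vec binary layout and ranks words by cosine similarity. It is not the code in src/distance: the function names load_word2vec_bin and most_similar are made up for this example, and it assumes the model was saved in binary mode on a little-endian machine with UTF-8 encoded words.

```python
import math
import struct


def load_word2vec_bin(path):
    """Read a word2vec binary model into a {word: [float, ...]} dict.

    Assumed layout: a text header "<vocab_size> <dim>\n", then for each word
    its bytes, a single space, and <dim> little-endian float32 values.
    """
    vectors = {}
    with open(path, "rb") as f:
        vocab_size, dim = (int(x) for x in f.readline().split())
        for _ in range(vocab_size):
            chars = bytearray()
            while True:
                ch = f.read(1)
                if not ch or ch == b" ":
                    break
                if ch != b"\n":  # skip the newline that ends the previous row
                    chars.extend(ch)
            raw = f.read(4 * dim)
            if len(raw) != 4 * dim:
                raise ValueError("truncated model file")
            word = chars.decode("utf-8", errors="replace")
            vectors[word] = list(struct.unpack("<%df" % dim, raw))
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(vectors, word, topn=5):
    """Return the topn (word, similarity) pairs closest to `word`."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]
```

Calling most_similar(load_word2vec_bin(MODEL_FILE), "家庭") would produce a ranking in the same spirit as the output above, though a pure-Python scan is far slower than the compiled tool.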

Principal component analysis (PCA)

Dependencies

Jupyter Notebook

Install

cd tools/word2vec_boostpy
python setup.py install
pip install -U numpy matplotlib scipy scikit-learn ipython jupyter

Run

./scripts/pca.sh

Demo

open http://localhost:8888/notebooks/word2vec-get-started.ipynb
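
For readers who want to see the idea behind the projection before opening the notebook, here is a minimal dependency-free sketch of PCA down to two dimensions. It is not the notebook's code: the function names are hypothetical, and it uses power iteration with deflation for brevity, where a real analysis would more likely call sklearn.decomposition.PCA from the scikit-learn package installed above.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[r[j] - means[j] for j in range(dim)] for r in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, dim = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(dim)]
            for i in range(dim)]


def top_component(cov, iterations=200):
    """Power iteration: (eigenvalue, unit eigenvector) of the largest eigenvalue."""
    dim = len(cov)
    rng = random.Random(0)  # fixed seed keeps the sketch deterministic
    vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit vector gives the eigenvalue estimate.
    value = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(dim))
                for i in range(dim))
    return value, vec


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    centered = center(rows)
    cov = covariance(centered)
    components = []
    for _ in range(2):
        value, vec = top_component(cov)
        components.append(vec)
        # Deflate: remove the component just found so the next pass
        # converges to the following one.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(len(vec))]
               for i in range(len(vec))]
    return [[sum(r[j] * c[j] for j in range(len(c))) for c in components]
            for r in centered]
```

Here rows would be the embedding vectors of the words to plot, and each returned pair is an (x, y) position for a scatter plot. The sign of each axis is arbitrary, as with any PCA.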

iqabot.v2

First, start the ElasticSearch service and the Hanlp-api service with elasticsearch-get-started.

cd iqabot.v2
cp config.sample.py config.py
python bot.py --query="为什么要获得医疗保险补充保险"

License

Apache 2.0

Troubleshooting

Compile error on Ubuntu: install the build essentials.

sudo apt-get install build-essential

hailiang-wang/word2vec-get-started