Estimates similarity between words using WordNet-based measures, ESA, LSA, and word2vec. Also computes the correlation of each technique with WordSim353 and compares them.

IngleJaya95/Word-Similarity-Estimation-

Code Information : The code consists of 6 folders: 5 for the individual methods and 1 for the Google embeddings. All of the code is commented and presented as Jupyter (IPython) notebooks, so it should be easy to follow. The exception is word2vec, where you have to run the makefile and then the bash script.

Software Requirement :

  1. Java NetBeans
  2. Python packages (the Anaconda distribution is recommended):
     a. Jupyter Notebook
     b. pandas
     c. NumPy
     d. SciPy
     e. nltk
     f. sklearn
     g. sematch
     h. os
  3. GCC compiler for the C code.

System Requirement : A system with at least 4 GB of RAM and 50 GB of free hard disk space (the ESA index occupies most of it). A higher-configuration system is preferable.

The usable WordSim353 file can vary from corpus to corpus, because not every WordSim353 word is necessarily present in a given corpus's vocabulary.
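As an illustration of the point above, here is a minimal sketch of how the evaluation can drop pairs with out-of-vocabulary words before computing a rank correlation. It is not the code from the notebooks: the function names, the toy vectors, and the similarity scores are all made up for the example, and the Spearman correlation is written out by hand only to keep the sketch dependency-free (scipy.stats.spearmanr computes the same quantity).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(pairs, vectors):
    """Keep only pairs whose words are both in the vocabulary, then correlate."""
    kept = [(w1, w2, s) for w1, w2, s in pairs if w1 in vectors and w2 in vectors]
    human = [s for _, _, s in kept]
    model = [cosine(vectors[w1], vectors[w2]) for w1, w2, _ in kept]
    return spearman(human, model), len(kept), len(pairs)


# Hypothetical scores and vectors, NOT actual WordSim353 values or embeddings.
pairs = [
    ("tiger", "cat", 7.0),
    ("book", "paper", 6.0),
    ("king", "cabbage", 1.0),
    ("stock", "jaguar", 0.9),  # "stock" and "jaguar" are missing below
]
vectors = {
    "tiger": [1.0, 0.2, 0.0],
    "cat": [0.9, 0.3, 0.0],
    "book": [0.0, 1.0, 0.1],
    "paper": [0.1, 0.8, 0.5],
    "king": [0.0, 0.0, 1.0],
    "cabbage": [1.0, 0.0, 0.1],
}

rho, kept, total = evaluate(pairs, vectors)
print(f"Spearman rho = {rho:.3f} on {kept} of {total} pairs")
```

Reporting the number of pairs actually used alongside the correlation makes it clear when two methods were scored on different subsets of WordSim353.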

Warning : Increasing the amount of data beyond a certain threshold in some of the code can exhaust your system's resources, and you will be solely responsible for your actions. Python packages such as sklearn require a large amount of memory for the TF-IDF matrix if it is converted from the sparse storage format to a dense NumPy array.
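To give a rough sense of scale for the warning above, the following back-of-the-envelope sketch compares the memory of a dense array with that of a CSR sparse matrix. The corpus dimensions are hypothetical, and the sketch assumes 8-byte float values and 4-byte indices; the actual sizes in this project depend on the corpus and on the dtypes in use.

```python
def dense_bytes(n_docs, n_terms, itemsize=8):
    """Memory of a dense n_docs x n_terms array with `itemsize`-byte values."""
    return n_docs * n_terms * itemsize


def csr_bytes(n_docs, nnz, itemsize=8, index_size=4):
    """Memory of a CSR matrix: values + column indices + row pointers."""
    return nnz * itemsize + nnz * index_size + (n_docs + 1) * index_size


# Hypothetical corpus: 100,000 documents, 200,000 distinct terms,
# about 150 distinct terms per document on average.
n_docs, n_terms, avg_terms_per_doc = 100_000, 200_000, 150
nnz = n_docs * avg_terms_per_doc  # number of non-zero TF-IDF entries

dense = dense_bytes(n_docs, n_terms)
sparse = csr_bytes(n_docs, nnz)

print(f"dense : {dense / 1e9:.1f} GB")
print(f"sparse: {sparse / 1e9:.2f} GB")
```

Under these assumptions the dense form is several hundred times larger than the sparse one, which is why calling `.toarray()` or `.todense()` on a large TF-IDF matrix can exceed the RAM of a 4 GB machine.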

The code was contributed equally by 1. me 2. Shubham Patel (https://github.com/lifeisshubh)
