Code Information :
The code consists of 6 folders: 5 for the individual methods and 1 for the Google embedding. All the code is commented and presented as Jupyter (IPython) notebooks, so it should be easy to follow. The exception is word2vec, where you have to run the makefile first and then the bash file.
Software Requirement :
- Java Netbeans
- Python packages (having the Anaconda distribution is recommended):
  - Jupyter Notebook
  - Pandas
  - Numpy
  - Scipy
  - nltk
  - Sklearn
  - Seematch
  - os (part of the Python standard library)
- GCC compiler for the C code.

System Requirement :
A system with at least 4 GB of RAM and 50 GB of free hard disk space (the ESA index occupies most of it). A higher-configuration system will give better results.
The WordSim353 file can vary from corpus to corpus, because not all of its words are necessarily present in a given corpus's vocabulary.
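
As a rough illustration of why the file differs per corpus, the sketch below keeps only the word pairs whose two words both appear in a vocabulary. The function name `filter_pairs`, the lowercase matching, and the sample rows are assumptions made for this example; they are not taken from the notebooks.

```python
def filter_pairs(pairs, vocabulary):
    """Keep only (word1, word2, score) rows whose two words are both in the vocabulary."""
    # Lowercase matching is an assumption; adapt it to how the corpus was tokenized.
    vocab = {word.lower() for word in vocabulary}
    return [
        (w1, w2, score)
        for w1, w2, score in pairs
        if w1.lower() in vocab and w2.lower() in vocab
    ]


# Illustrative rows only, not read from the actual WordSim353 file.
pairs = [
    ("tiger", "cat", 7.35),
    ("book", "paper", 7.46),
    ("stock", "jaguar", 0.92),
]
# Hypothetical corpus vocabulary that lacks the word "jaguar".
vocabulary = ["tiger", "cat", "book", "paper", "stock"]

kept = filter_pairs(pairs, vocabulary)
print(len(kept), "of", len(pairs), "pairs kept")
```

With a different corpus the vocabulary changes, so a different subset of pairs survives the filter.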
Warning : Increasing the amount of data beyond a certain threshold in some of the code can be dangerous for your system, and you will be solely responsible for your actions. Python packages such as sklearn require a large amount of memory for the TF-IDF matrix if it is converted from the sparse storage format to a dense numpy array.
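
To get a feel for the memory involved, the sketch below estimates the size of a TF-IDF matrix in dense form and in compressed sparse row (CSR) form before any conversion, such as calling `toarray()` on a scipy sparse matrix, is attempted. The matrix dimensions, the 8-byte values, and the 4-byte indices are assumptions chosen for illustration; real sizes depend on your corpus and dtypes.

```python
def dense_bytes(n_rows, n_cols, itemsize=8):
    """Bytes needed for a dense array: every cell is stored, zero or not."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, itemsize=8, index_size=4):
    """Approximate bytes for a CSR matrix with nnz stored (non-zero) entries."""
    # values array + column-index array + row-pointer array
    return nnz * itemsize + nnz * index_size + (n_rows + 1) * index_size


# Hypothetical corpus: 100,000 documents, 50,000 terms, 10 million non-zero weights.
n_docs, n_terms, nnz = 100_000, 50_000, 10_000_000

dense_gb = dense_bytes(n_docs, n_terms) / 1e9
sparse_gb = csr_bytes(n_docs, nnz) / 1e9

print(f"dense : {dense_gb:.1f} GB")
print(f"sparse: {sparse_gb:.2f} GB")
```

Under these assumptions the dense form needs about 40 GB while the sparse form needs roughly 0.12 GB, which is why a 4 GB machine can hold the sparse matrix but not its dense conversion.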
The code was contributed equally by 1. me 2. Shubham Patel (https://github.com/lifeisshubh)