
adityamogadala/xLiMeSemanticIntegrator


This README describes the project's dependencies, installation instructions, and how to get started with the code.

Dependencies

The code is written in Python 2.7+ and Java. It also depends on the Python packages listed in requirements.txt.

Installation Instructions (Debian/Ubuntu)

  1. $git clone https://github.com/adityamogadala/xLiMeSemanticIntegrator.git
  2. Make sure python-dev and setuptools are installed. If not:
    • sudo apt-get install python-dev
    • sudo pip install --upgrade setuptools
  3. BLAS/LAPACK are required for scipy and numpy. If they are not already present, install them:
    • sudo apt-get install libblas-dev liblapack-dev
  4. $sudo pip install -r requirements.txt
  5. $sudo pip install kafka-python
  6. Download the word-embedding (monolingual and bilingual) zip files. Extract them and place them under StoreWordVec/wiki for Wikipedia, StoreWordVec/news for news, etc.
  7. Install MongoDB and run the following:
    • $sudo mkdir -p data/db/ (create this in your $HOME directory; it holds the MongoDB database files)
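The embedding zip files from step 6 typically extract to plain-text word2vec format: an optional "<vocab_size> <dim>" header line, then one line per token holding the word followed by its vector components. A minimal loader sketch under that assumption (the exact layout of the xLiMe files may differ):

```python
def load_word_vectors(lines):
    """Parse text-format word2vec lines into a dict: word -> list of floats.

    Skips an optional header line of the form "<vocab_size> <dim>".
    Assumes one "word c1 c2 ... cN" entry per line.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split()
        if len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            continue  # header line: vocabulary size and dimensionality
        word, components = parts[0], parts[1:]
        vectors[word] = [float(x) for x in components]
    return vectors
```

With real files you would pass in the lines of an extracted file from StoreWordVec/wiki or StoreWordVec/news; the file names depend on the downloaded archives.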

Get Started

  • Start the MongoDB daemon with authentication enabled and create an admin user for all databases.
    • $sudo mongod --fork --logpath mongodb.log --auth
    • $mongo
    • > use admin
    • > db.createUser({user:"username",pwd:"password",roles: [{role:"userAdminAnyDatabase",db: "admin"}]}) (Create super user and password for the "admin" database).
    • > exit
    • $mongo -u username -p password --authenticationDatabase admin
    • > use MyStore (Create your own database; its name is used in the config file)
    • > db.createUser({user:"username",pwd:"password",roles: [{role:"dbOwner",db: "MyStore"}]}) (Create a username and password for the database).
    • > exit
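Application code (e.g. via pymongo) typically connects to a database set up this way with a connection URI that names the authentication database. A small helper sketch that assembles such a URI; the host and port here are assumptions (localhost:27017 is MongoDB's default):

```python
def make_mongo_uri(user, password, db, host="localhost", port=27017,
                   auth_db="admin"):
    """Build a MongoDB connection URI for a user authenticated
    against `auth_db` (matching --authenticationDatabase above)."""
    return "mongodb://{0}:{1}@{2}:{3}/{4}?authSource={5}".format(
        user, password, host, port, db, auth_db)
```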
  • Update config/Config.conf as suggested in the file.
  • $python setup.py
  • Start service/collector.sh to collect data from the Kafka stream.
    • $ nohup sh collector.sh &
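collector.sh wraps code that reads messages off the Kafka stream (via kafka-python, installed in step 5) and writes them into MongoDB. The snippet below sketches only the per-message transformation, assuming JSON-encoded payloads; the incoming field names are hypothetical, while the Title/Text output fields match the text index created in the next step:

```python
import json

def message_to_document(raw_message):
    """Decode one Kafka message payload (a JSON string/bytes) into a
    dict ready for insertion into a MongoDB collection.

    The "title"/"text" input keys are illustrative assumptions, not
    taken from the project's actual message schema.
    """
    doc = json.loads(raw_message)
    return {"Title": doc.get("title", ""), "Text": doc.get("text", "")}
```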
  • Verify that your MongoDB collections exist and create text indexes on them.
    • $mongo -u username -p password --authenticationDatabase admin
    • > use MyStore
    • > db.auth("username","password") ("MyStore" user authentication)
    • > show collections
    • > db.getCollection('YOUR_COLLECTION_NAME').ensureIndex( {Text: "text", Title: "text"}, {dropDups: true} )
    • > exit
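With the text index in place, a collection can be searched with MongoDB's $text operator and ranked by textScore. A sketch of the filter and projection documents as plain dicts (driver-agnostic; pymongo would pass these to find()):

```python
def text_search_query(search_terms):
    """Build the filter and projection documents for a MongoDB text
    search over the indexed Text/Title fields."""
    query = {"$text": {"$search": search_terms}}
    projection = {"score": {"$meta": "textScore"}}
    return query, projection
```

With pymongo this would be used roughly as db.YOUR_COLLECTION.find(query, projection), sorting on the textScore meta field to rank results.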
  • Start service/vecgenerator.sh to generate vectors for subtitle and speech-to-text data (support for news and social media is yet to be added).
    • $ nohup sh vecgenerator.sh &
  • The Examples folder contains a few examples of how to use the different classes for tasks such as simple search, advanced search, monolingual and cross-lingual document similarity, and analytics. You can run the individual Python files or the notebook (.ipynb) files.
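The monolingual and cross-lingual document-similarity tasks above are commonly implemented by averaging a document's word-embedding vectors and comparing documents by cosine similarity. A sketch of that general technique (an illustration, not necessarily the project's exact algorithm):

```python
import math

def average_vector(words, vectors):
    """Average the embedding vectors of the words present in `vectors`;
    returns None if no word is covered by the vocabulary."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / float(len(known)) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

For the cross-lingual case, the same comparison works once both documents are averaged in a shared bilingual embedding space.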

About

More Information about Features, Deliverables and Publications @
