iron-fe/rntn_on_Spark

rntn-spark

Description

Repository for the MTA final M.Sc. project: Distributed RNTN.
The purpose of this project is to implement the Recursive Neural Tensor Network (RNTN) for sentiment analysis, as described in the paper by R. Socher, in a distributed manner using Apache Spark.
We follow the Downpour SGD paradigm described by Jeffrey Dean of Google and implemented in Dirk Neumann's DeepDist project.
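
To make the Downpour idea concrete, here is a minimal single-process sketch in plain Python. It is not DeepDist's API and not this project's code: the names `ParameterServer`, `downpour_sgd` and `make_shards` are hypothetical, the model is a toy linear regression rather than an RNTN, and the workers are simulated in a round-robin loop instead of running on a cluster. What it tries to show is the core pattern: each worker trains on its own data shard, pushes gradients to a shared parameter server without waiting for the others, and only refreshes its local copy of the parameters every few steps, so some gradients are computed from stale parameters.

```python
import random


class ParameterServer:
    """Holds the shared parameters and applies gradients as they arrive."""

    def __init__(self, dim, learning_rate):
        self.weights = [0.0] * dim
        self.learning_rate = learning_rate

    def fetch(self):
        # Workers get a copy, so their view can become stale.
        return list(self.weights)

    def push(self, gradient):
        # Apply the update immediately, without waiting for other workers.
        for i, g in enumerate(gradient):
            self.weights[i] -= self.learning_rate * g


def gradient(weights, batch):
    """Mean squared-error gradient for a linear model y = w . x."""
    grad = [0.0] * len(weights)
    for x, y in batch:
        error = sum(w * xi for w, xi in zip(weights, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * error * xi / len(batch)
    return grad


def downpour_sgd(shards, dim, steps=200, fetch_every=2,
                 learning_rate=0.05, seed=0):
    """Simulate Downpour-style training with one worker per data shard."""
    rng = random.Random(seed)
    server = ParameterServer(dim, learning_rate)
    replicas = [server.fetch() for _ in shards]
    for step in range(steps):
        for worker, shard in enumerate(shards):
            # Refresh the local replica only every `fetch_every` steps.
            if step % fetch_every == 0:
                replicas[worker] = server.fetch()
            batch = rng.sample(shard, min(4, len(shard)))
            # The gradient may be computed from stale parameters.
            server.push(gradient(replicas[worker], batch))
    return server.weights


def make_shards(true_weights, n_shards=3, n_per_shard=20, seed=1):
    """Generate noise-free toy data, split into one shard per worker."""
    rng = random.Random(seed)
    shards = []
    for _ in range(n_shards):
        shard = []
        for _ in range(n_per_shard):
            x = [rng.uniform(-1, 1) for _ in true_weights]
            y = sum(w * xi for w, xi in zip(true_weights, x))
            shard.append((x, y))
        shards.append(shard)
    return shards
```

Calling `downpour_sgd(make_shards([2.0, -1.0]), dim=2)` should return weights close to the generating ones. In a real deployment the shards would be Spark partitions and the parameter server a separate process reachable from every node.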

Please bear in mind: this is a work in progress! It is by no means a download-and-run project.

Prerequisites and setup instructions

  1. RNTN

     - Download or clone the forked semantic-rntn project to every node of your cluster. The fork is based on the original semantic-rntn project. The only difference is that I have turned the existing project into a module, so that it can be installed and managed on all nodes of the cluster.

     - Install it by running:
           python setup.py install

  2. DeepDist

     - At the moment, some updates are needed in order to run RNTN with DeepDist. Those updates are available in my forked DeepDist project. Until my pull requests are approved, download or clone the forked DeepDist project to every node of your cluster.

     - Install it by running:
           python setup.py install

  3. Spark

     - Follow the Spark documentation to download and install Spark. Make sure you know the paths to pyspark and py4j.

  4. rntn-spark

     - Download or clone the rntn-spark project (this repository).

     - In the configuration file, update the paths to Spark's Python sources and to py4j, and set the app name.

     - Update the sparkrunner.sh script with your master address and port.

     - Run:
           sh sparkrunner.sh
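
The pyspark and py4j paths mentioned in the steps above are what let a plain Python interpreter import Spark. The project's configuration file format is not shown here, so the following is only a sketch of the underlying idea. The helper name `add_spark_to_path` is hypothetical, and the code assumes the layout of a typical Spark binary distribution, where the Python sources live under `python/` and py4j ships as a `py4j-*-src.zip` archive under `python/lib/`.

```python
import glob
import os
import sys


def add_spark_to_path(spark_home):
    """Put Spark's Python sources and its bundled py4j archive on sys.path.

    Assumes the layout <spark_home>/python and
    <spark_home>/python/lib/py4j-*-src.zip.
    """
    spark_python = os.path.join(spark_home, "python")
    pattern = os.path.join(spark_python, "lib", "py4j-*-src.zip")
    py4j_archives = sorted(glob.glob(pattern))
    if not py4j_archives:
        raise FileNotFoundError("no py4j archive found under " + spark_python)
    py4j_path = py4j_archives[-1]
    for path in (spark_python, py4j_path):
        if path not in sys.path:
            sys.path.insert(0, path)
    return spark_python, py4j_path
```

After a call such as `add_spark_to_path(os.environ["SPARK_HOME"])`, `import pyspark` should resolve against that installation, provided `SPARK_HOME` points at it. The two returned paths are the values the configuration step asks for.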

Support

Please use GitHub issues to report problems.
