Course Project for cs600.615

Contributors:

Javad Fotouhi

Poorya Mianjy

Satya Prateek (sbommar1@jhu.edu)

Xiaozhou Zhou (xzhou18@jhu.edu)

Warning: Don't run programs on the damsl gateway. Instead, ssh to a working machine. See the file damsl-cluster.txt for all available machines.

Using Docker to run a pyspark test script:

SSH to a host machine where the damsl/spark image is stored:

ssh qp-hd4

Create a docker container with the Spark image damsl/spark:

docker run -it --name xpjs -v /damsl/projects/MD/Simulations/bpti:/data -v /home/bdslss15-xpjs/test:/test -v /output damsl/spark

Note: You only need to create the container once. To create additional containers, give each one a different name. Volumes mounted:

/data: a read-only volume mounted from the bpti data folder on the host server, which includes all the bpti data we need for this project
/test: a read-only volume mounted from the host server, which contains test Spark scripts and data
/output: a read-write volume that can be used to hold the output
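
Before running anything inside the container, it can help to confirm that the three volumes are actually present. The following is a minimal sketch, not part of the repo: the function name `check_mounts` and its `{path: needs_write}` argument are hypothetical, and it only checks that each directory exists and that the ones marked writable can be written to.

```python
import os


def check_mounts(expected):
    """Return a list of problems for a {path: needs_write} mapping.

    Hypothetical helper: a path is reported if it is not a directory,
    or if it should be writable but is not.
    """
    problems = []
    for path, needs_write in sorted(expected.items()):
        if not os.path.isdir(path):
            problems.append(path + " is missing")
        elif needs_write and not os.access(path, os.W_OK):
            problems.append(path + " is not writable")
    return problems
```

Inside the container, `check_mounts({"/data": False, "/test": False, "/output": True})` should return an empty list if the volumes are mounted as described above.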

After creating the container, press CTRL+P followed by CTRL+Q to detach, so that the container keeps running.

Enter the bash shell of the container and go to the root directory:

docker exec -it xpjs bash 
cd ..

Test Spark using the test script (make sure that triangles.out does not already exist in /output):

/software/spark-1.2.1/bin/spark-submit test/triangle_count.py test/testdata

You should then see triangles.out in the /output folder.
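
Since the test expects triangles.out to be absent from /output before the run, a leftover result from an earlier run has to be cleared first. The sketch below is one way to do that from Python. The helper name `clear_stale_output` is hypothetical, and because this README does not say whether triangles.out is a single file or a directory, the sketch handles both.

```python
import os
import shutil


def clear_stale_output(path):
    """Remove a leftover output file or directory.

    Hypothetical helper: returns True if something was removed,
    False if the path did not exist.
    """
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)  # output written as a directory of part files
        return True
    if os.path.lexists(path):
        os.remove(path)  # plain file or symlink
        return True
    return False
```

For example, `clear_stale_output("/output/triangles.out")` could be called inside the container before invoking spark-submit.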

Exit the container:

exit

Alternatively, press CTRL+P followed by CTRL+Q to detach and leave the container running.

Preprocessing the feature files:

In the root directory of the repo:

python src/DataReader.py /damsl/projects/MD/Simulations/bpti/features/ data/

Please see the documentation in DataReader.py for more information. The output data/ folder is too large (>10 GB), so DON'T push it to this repo ("/data" has already been added to .gitignore). Preprocessing all feature files is expected to take more than an hour in non-splitting mode, and even longer in splitting mode.

If the JSON file is too large to open in an editor, you can inspect the data in a Python shell. For example:

python
>>> import json
>>> with open("data/chi1.json", "r") as f:
...     a = json.load(f)  # parse the JSON file into a list
...
>>> print(a[3])  # print the feature values of the data point at index 3 (the 4th one)

The output is a big 1-D array.
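
Because printing a whole data point produces a very long array, a short summary is often easier to read. The following is a minimal sketch, assuming the file is a JSON array with one 1-D list of numbers per data point, as described above. The function name `summarize_features` and the `preview` parameter are hypothetical and not part of DataReader.py.

```python
import json


def summarize_features(path, preview=5):
    """Load a preprocessed feature file and return a small summary.

    Assumes a JSON array holding one 1-D list of numbers per data point.
    """
    with open(path, "r") as f:
        points = json.load(f)
    first = points[0] if points else []
    return {
        "num_points": len(points),          # number of data points in the file
        "features_per_point": len(first),   # length of the first feature vector
        "preview": first[:preview],         # first few values of the first point
    }
```

Calling `summarize_features("data/chi1.json")` returns the number of data points, the length of the first feature vector, and its first few values, without printing the full array.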
