These are benchmarks for Spark, written in Python.
- The benchmarks are characterized by heavy MapReduce operations, including:
  - Logistic Regression in neural network training
  - ...
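
To illustrate what "heavy MapReduce operations" means for a logistic regression workload, here is a minimal sketch in plain Python. It is not the code from this repo: the function names (`point_gradient`, `add_vectors`, `train`) and the parameters `steps` and `lr` are hypothetical, and the built-in `map` and `functools.reduce` stand in for the map and reduce steps that a Spark job would run over a distributed dataset.

```python
import math
from functools import reduce


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def point_gradient(w, point):
    """Map step: logistic-loss gradient for one (features, label) pair, label in {0, 1}."""
    x, y = point
    err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
    return [err * xi for xi in x]


def add_vectors(a, b):
    """Reduce step: element-wise sum of two gradient vectors."""
    return [ai + bi for ai, bi in zip(a, b)]


def train(points, dim, steps=100, lr=0.5):
    """Batch gradient descent; every iteration is one map pass plus one reduce pass."""
    w = [0.0] * dim
    for _ in range(steps):
        grad = reduce(add_vectors, map(lambda p: point_gradient(w, p), points))
        w = [wi - lr * gi / len(points) for wi, gi in zip(w, grad)]
    return w
```

Each iteration touches every data point once in the map step and then combines all per-point gradients in the reduce step, which is why this kind of workload stresses the map and reduce paths.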
submit-to-spark.sh: wrapper of the spark-submit script inside the spark/bin dir
- renames existing log files as 1.out, 2.out, ..., n.out in chronological order
  - all log files are read-only, which ensures that the chronological order won't get messed up
- submits via spark-submit and writes the new log to (n+1).out
- TODO: echo revision number of benchmark to log file
- TODO: add arg parsing to support general benchmark testing
benchmark-*: directories containing all source code for different benchmarks
- this repo (benchmark-python/ in the following graph) and the Spark directory share the same parent directory
  - the log file generated by submit-to-spark.sh is stored in bm-log/<benchmark-name>/
project
|
`-- benchmark-python/
| |
| `-- LogReg/
| |
| `-- ...
|
`-- spark-1.5.0/
| |
| `-- ...
|
`-- bm-log/
|
`-- LogReg/
| |
| `-- 1.out
| |
| `-- 2.out
`-- ...
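
The log numbering described for submit-to-spark.sh can be sketched as follows. This is an illustration in Python under stated assumptions, not the script itself: `next_log_path` is a hypothetical helper, "chronological order" is assumed to mean file modification time, and "read-only" is assumed to mean read permission only for user, group, and others.

```python
import stat
from pathlib import Path


def next_log_path(log_dir: Path) -> Path:
    """Renumber existing *.out logs as 1.out..n.out, oldest first,
    mark them read-only, and return the path of (n+1).out."""
    log_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: modification time defines the chronological order.
    logs = sorted(log_dir.glob("*.out"), key=lambda p: (p.stat().st_mtime, p.name))

    # Rename in two phases so that no log overwrites another while renumbering.
    staged = []
    for i, path in enumerate(logs, start=1):
        tmp = log_dir / f".renumber-{i}.tmp"
        path.rename(tmp)
        staged.append(tmp)
    for i, tmp in enumerate(staged, start=1):
        final = log_dir / f"{i}.out"
        tmp.rename(final)
        # Read-only logs keep their modification time, so the order stays stable.
        final.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

    return log_dir / f"{len(logs) + 1}.out"
```

With the layout above, calling `next_log_path(Path("bm-log/LogReg"))` when 1.out and 2.out exist would return the path bm-log/LogReg/3.out, which is where the output of the next spark-submit run would be written.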