ZimpleX/spark-benchmark-python

These are benchmarks for Spark (in Python)

  • These benchmarks are characterized by heavy MapReduce operations, including:
    • Logistic Regression in neural network training
    • ...
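
To illustrate what "heavy MapReduce operations" means for logistic regression, here is a minimal sketch in plain Python. It is not the code from this repo: the function names, learning rate, and data layout are assumptions made for illustration. Each training step maps every data point to its gradient contribution and reduces the contributions by element-wise addition; in Spark the built-in `map` and `reduce` would be replaced by the corresponding RDD operations.

```python
import math
from functools import reduce


def gradient(w, point):
    """Map step: gradient contribution of one (features, label) pair, label in {0, 1}."""
    x, y = point
    z = sum(wi * xi for wi, xi in zip(w, x))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - y) * xi for xi in x]


def add(a, b):
    """Reduce step: element-wise sum of two gradient vectors."""
    return [ai + bi for ai, bi in zip(a, b)]


def train(points, dim, iterations=100, lr=0.5):
    """Batch gradient descent; every iteration is one map followed by one reduce."""
    w = [0.0] * dim
    for _ in range(iterations):
        grad = reduce(add, map(lambda p: gradient(w, p), points))
        w = [wi - lr * gi / len(points) for wi, gi in zip(w, grad)]
    return w


def predict(w, x):
    """Probability that x belongs to class 1 under weights w."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))
```

Because every iteration touches the whole data set once, the cost is dominated by the map and reduce passes, which is the part a cluster would distribute.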

Content

  • submit-to-spark.sh: wrapper around the spark-submit script in the spark/bin dir
    • renames existing log files to 1.out, 2.out, ... n.out in chronological order
    • makes all log files read-only, so the chronological order can't get messed up
    • submits via spark-submit and writes the new log to (n+1).out
    • TODO: echo revision number of benchmark to log file
    • TODO: add arg parsing to support general benchmark testing
  • benchmark-* : directories containing all source code for different benchmarks
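
The log handling described for submit-to-spark.sh can be sketched as follows. This is a Python illustration under stated assumptions, not the actual shell script: the function name `next_log_path` is hypothetical, and "chronological order" is assumed to mean ordering by file modification time.

```python
import stat
from pathlib import Path


def next_log_path(log_dir):
    """Renumber existing *.out logs oldest-first, make them read-only,
    and return the path the next run should write to."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)

    # Assumption: chronological order means modification time, oldest first.
    logs = sorted(log_dir.glob("*.out"),
                  key=lambda p: (p.stat().st_mtime_ns, p.name))

    # Rename in two phases so that moving one log to 1.out can never
    # overwrite a different log that is still called 1.out.
    staged = []
    for i, old in enumerate(logs, start=1):
        tmp = log_dir / f".{i}.out.tmp"
        old.rename(tmp)
        staged.append((tmp, log_dir / f"{i}.out"))
    for tmp, final in staged:
        tmp.rename(final)
        # Read-only for everyone, so old logs are not modified by accident.
        final.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

    return log_dir / f"{len(logs) + 1}.out"
```

The caller would then redirect the output of spark-submit to the returned path, which becomes (n+1).out.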

Dir structure

  • this repo (benchmark-python/ in the following graph) and the Spark directory share the same parent directory
  • the log files generated by submit-to-spark.sh are stored in bm-log/<benchmark-name>/
project
|
`-- benchmark-python/
|   |
|   `-- LogReg/
|   |
|   `-- ...
|
`-- spark-1.5.0/
|   |
|   `-- ...
|
`-- bm-log/
    |
    `-- LogReg/
    |   |
    |   `-- 1.out
    |   |
    |   `-- 2.out
    `-- ...
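
Given the layout above, the paths a wrapper needs can be derived from the location of this repo alone. The sketch below is illustrative only: `build_submit_command`, its parameters, and the script name `logreg.py` in the usage line are hypothetical, and the Spark directory name is taken from the graph rather than from the actual script.

```python
from pathlib import Path


def build_submit_command(repo_dir, benchmark, script, spark_dir_name="spark-1.5.0"):
    """Return (command, log_dir) for one benchmark run.

    repo_dir       -- path to this repo (benchmark-python/ in the graph)
    benchmark      -- benchmark directory name, e.g. "LogReg"
    script         -- hypothetical Python entry point inside that directory
    spark_dir_name -- sibling Spark directory (assumed from the graph)
    """
    repo = Path(repo_dir)
    project = repo.parent  # shared parent of the repo, Spark, and bm-log
    spark_submit = project / spark_dir_name / "bin" / "spark-submit"
    log_dir = project / "bm-log" / benchmark
    app = repo / benchmark / script
    # spark-submit takes the application file as its positional argument.
    return [str(spark_submit), str(app)], log_dir
```

For example, `build_submit_command("project/benchmark-python", "LogReg", "logreg.py")` yields a command pointing at `project/spark-1.5.0/bin/spark-submit` and a log directory of `project/bm-log/LogReg`.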
