A project for testing Spark code. Refer to task.py for the conditions of each application.
The image below shows the network architecture when the containers are deployed with docker-compose. IP addresses may vary depending on your environment.
I have tested this repository on the following container platform versions:
- docker 19.03.5
- docker-compose 1.25.0, build 0a186604
docker-compose up --scale spark-worker=NUM_OF_WORKERS -d
cd $APP_DIR
make deploy NAME=NAME_OF_APP
start-history-server.sh --properties-file spark-history-server.conf
pip install -t deps MODULES
cd deps
zip -r deps.zip .
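For environments where the `zip` command is not available, the packaging step above can be approximated in Python. This is a minimal sketch, not part of this repository: the function name `build_deps_zip` is hypothetical, and it assumes the goal is an archive whose top level holds the installed modules (the layout Spark expects for archives passed via `spark-submit --py-files`). How this project actually consumes `deps.zip` is not stated here.

```python
import zipfile
from pathlib import Path
from typing import List


def build_deps_zip(deps_dir: str, zip_path: str) -> List[str]:
    """Recursively zip deps_dir so its contents sit at the archive root.

    Hypothetical helper that mirrors `cd deps && zip -r deps.zip .`.
    Returns the archive member names that were written.
    """
    root = Path(deps_dir)
    out = Path(zip_path).resolve()
    added = []
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            # Skip directories and the archive itself (it may live inside deps_dir).
            if path.is_file() and path.resolve() != out:
                arcname = path.relative_to(root).as_posix()
                zf.write(path, arcname)
                added.append(arcname)
    return added


if __name__ == "__main__":
    # Assumes `pip install -t deps MODULES` has already populated ./deps.
    for name in build_deps_zip("deps", "deps/deps.zip"):
        print(name)
```

Storing paths relative to the `deps` directory keeps package names importable directly from the archive root.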
My Docker installation runs under the following filesystem conditions.
Below are sample results from Spark benchmarking.
App Name | Description | Dataset | Number of Datasets | Dataset size | Executor CPUs | Executor Memory | Duration |
---|---|---|---|---|---|---|---|
s3connect | Upload jpg imgs from local hdfs to S3 | Landsat8 | 35193(small.jpg) | 2907B - 18969B | 1 | 1G | 4.0h (4,091.70 sec / 10K img) |
s3connect | Upload jpg imgs from local hdfs to S3 | Landsat8 | 35193(small.jpg) | 2907B - 18969B | 2 | 1G | 2.5h (2,557.30 sec / 10K img) |
s3connect | Upload jpg imgs from local hdfs to S3 | Landsat8 | 35193(small.jpg) | 2907B - 18969B | 3 | 1G | 1.6h (1,636.70 sec / 10K img) |
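
The relationship between the per-10K-image rate and the total duration in the table can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `total_hours` and `speedup` are hypothetical, and the numbers are copied from the table rather than measured again. Under these figures, going from 1 to 2 and 3 executor CPUs gives roughly a 1.6x and 2.5x speedup.

```python
# Per-10K-image rates (seconds) copied from the table, keyed by executor CPUs.
SEC_PER_10K = {1: 4091.70, 2: 2557.30, 3: 1636.70}
NUM_IMAGES = 35193  # number of small.jpg files in the Landsat8 run


def total_hours(sec_per_10k, num_images=NUM_IMAGES):
    """Scale a per-10K-image rate to the whole dataset and convert to hours."""
    return sec_per_10k * num_images / 10_000 / 3600


def speedup(cpus, rates=SEC_PER_10K):
    """Speedup relative to the single-CPU run (ratio of per-10K rates)."""
    return rates[1] / rates[cpus]


if __name__ == "__main__":
    for cpus, rate in SEC_PER_10K.items():
        print(f"{cpus} CPU(s): {total_hours(rate):.1f} h, "
              f"speedup {speedup(cpus):.2f}x, "
              f"efficiency {speedup(cpus) / cpus:.2f}")
```

The efficiency column (speedup divided by CPU count) is an added convenience for judging how close the scaling is to linear; it is not reported in the original results.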