DASH schedules deep learning training jobs across heterogeneous GPU types in a cluster
16 NVIDIA Tesla K80 GPUs
8 NVIDIA Tesla V100 GPUs
- Python 3.6
- TensorFlow 1.14
- CUDA 10.0
CIFAR-10 dataset: link
Go to directory final/final4_new/
First, allocate an external node (the scheduler node) to run the TCP client
On each GPU node, start the TCP server:
python gpu_server.py args
Then, on the external node, start DASH scheduling on the benchmark:
python main.py args
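
The client/server split in the steps above can be illustrated with a minimal loopback sketch. It is not the protocol used by gpu_server.py or main.py: the newline-delimited JSON message format, the `GPUNodeHandler` class, and the `send_request` and `demo` helpers are all hypothetical names chosen for illustration. It only shows the general shape of a scheduler-side TCP client talking to a server on a GPU node.

```python
import json
import socket
import socketserver
import threading


class GPUNodeHandler(socketserver.StreamRequestHandler):
    """Hypothetical GPU-node handler: reads one JSON line and acknowledges it."""

    def handle(self):
        line = self.rfile.readline()
        request = json.loads(line.decode("utf-8"))
        # A real server would launch the training job here (assumption);
        # this sketch only echoes the request back with a status field.
        reply = {"status": "ok", "job": request["job"], "gpu": request["gpu"]}
        self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))


def send_request(host, port, message):
    """Hypothetical scheduler-side client: sends one JSON line, returns the reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall((json.dumps(message) + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return json.loads(reader.readline())


def demo():
    # Port 0 asks the OS for any free port, so the sketch runs on one machine.
    server = socketserver.TCPServer(("127.0.0.1", 0), GPUNodeHandler)
    host, port = server.server_address
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return send_request(host, port, {"job": "job-1", "gpu": "V100"})
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

In a real deployment the server side would run on each GPU node and the client on the scheduler node, with the loopback address replaced by the node addresses.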