`RSDS` (Rust Dask Scheduler)

rsds is a Rust implementation of the Dask/distributed centralized server and scheduler. It serves mainly as an experiment with two goals: evaluating the performance gain of a Dask server written in a language without automatic memory management, and benchmarking different scheduling algorithms.

Disclaimer

Dask/distributed has a very complex feature set and protocol, and at the moment we do not support most of the advanced features, such as the dashboard or custom communication protocols (UCX).

If rsds can run your use case, you may see some speedup when the scheduler is the bottleneck of your pipeline. If it isn't, rsds can actually be slower than Dask, since it uses much simpler scheduling heuristics. YMMV.
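One rough way to judge whether the scheduler is the bottleneck is to time a batch of trivial tasks, where nearly all of the elapsed time is scheduling and communication overhead. The following is only a sketch under stated assumptions and is not part of rsds: the helper names `noop`, `per_task_overhead_ms` and `measure_overhead` are hypothetical, and it assumes that a scheduler and at least one worker are already running and that the `Client.map` and `Client.gather` calls work with the server you are testing.

```python
import time


def noop(x):
    # Trivial task: its own runtime is negligible, so the measured time is
    # dominated by scheduling and communication overhead.
    return x


def per_task_overhead_ms(elapsed_seconds, task_count):
    # Average wall-clock cost per task, in milliseconds.
    return elapsed_seconds / task_count * 1000.0


def measure_overhead(address, task_count=10_000):
    # Imported lazily so the helpers above can be used without Dask installed.
    from dask.distributed import Client

    client = Client(address)
    try:
        start = time.perf_counter()
        futures = client.map(noop, range(task_count))
        client.gather(futures)
        elapsed = time.perf_counter() - start
    finally:
        client.close()
    return per_task_overhead_ms(elapsed, task_count)
```

Calling `measure_overhead("tcp://localhost:8786")` once against dask-scheduler and once against rsds-scheduler, with the same workers, gives two numbers to compare. If the per-task figure is small relative to the runtime of your real tasks, the scheduler is probably not what limits your pipeline.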

If your pipeline cannot be run by rsds, feel free to open an issue.

Usage

To compile and use rsds, you must have the Rust toolchain installed. You can install it with, for example, Rustup.

Build rsds:

$ RUSTFLAGS="-C target-cpu=native" cargo build --release

Install our modified version of Dask:
```
$ pip install git+https://github.com/Kobzol/distributed@simplified-encoding
```
The modifications we had to make to keep a Rust implementation of the Dask protocol manageable are described here.

Use rsds-scheduler instead of dask-scheduler when starting a Dask cluster:
```
$ ./target/release/rsds-scheduler
```

After that, use target/release/rsds-scheduler as you would use dask-scheduler. Note, however, that most of the command-line options of dask-scheduler are not supported.

Hello world example

Set up a cluster on the local machine:

# run server
$ ./target/release/rsds-scheduler
# run worker (in another shell)
$ dask-worker localhost:8786
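Before starting workers or clients from a script, it can help to confirm that the scheduler is accepting TCP connections. The helper below is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical name, not something provided by rsds or Dask, and it only checks that the port is open, not that the process behind it speaks the Dask protocol.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not listening yet (or unreachable): wait a little and retry.
            time.sleep(interval)
    return False
```

For the setup above, `wait_for_port("localhost", 8786)` should return `True` shortly after rsds-scheduler has started.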

Run a simple example that uses a Dask dataframe:

import dask
from dask.distributed import Client

client = Client("tcp://localhost:8786")

df = dask.datasets.timeseries(start="2020-01-01", end="2020-01-03")
result = df.groupby("name")["x"].mean().compute()
print(result)
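The same cluster can also be driven through the futures interface of `dask.distributed`. The sketch below is illustrative only: `square` and `sum_of_squares` are hypothetical names, and it assumes that rsds handles the `Client.map` and `Client.submit` calls used here, which this README does not confirm.

```python
def square(x):
    return x * x


def sum_of_squares(address, n=10):
    # Imported lazily so that `square` can be used without Dask installed.
    from dask.distributed import Client

    client = Client(address)
    try:
        # One task per input value.
        futures = client.map(square, range(n))
        # A final task that depends on all the previous ones.
        total = client.submit(sum, futures)
        return total.result()
    finally:
        client.close()
```

With the cluster from this section running, `sum_of_squares("tcp://localhost:8786")` computes the squares on the workers and reduces them in a final task.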

Benchmarks

You can find a set of benchmarks in the scripts folder. Here are some results comparing RSDS and Dask on 1-node and 7-node clusters with 24 workers per node.

Reports

dask/distributed#3139
