WARNING: This is alpha software. Have fun!
MutNN is an experimental ONNX runtime that supports seamless multi-GPU graph execution on CUDA GPUs and provides baseline implementations of both model and data parallelism.
V0.2: Adds support for multi-graph / multi-tenant NN execution!
Developed by Benjamin Ghaemmaghami & Saurabh Gupta
```python
import onnx2parla as o2p
import numpy as np

def load_batch(start, end):
    batch_size = end - start
    return np.random.random((batch_size, 3, 224, 224))

def store_batch(res):
    print(res)

# config with batch_size = 16, total_batches = 256
cfg = o2p.Config(store_batch, load_batch, 16, 256)

o2p_model = o2p.build("resnet18v1.onnx", cfg)
o2p_model.run()
```
Current operator support is limited, with most of the focus placed on supporting CNNs.
The full list is here.
The ONNX Model Zoo is a great place to find a wide selection of pre-trained neural network graphs. All of the major frameworks can export their models to ONNX graphs.
- Datatypes are assumed to be float32
- No support for multiple graph inputs/outputs
- Conv operator can occasionally crash when run on GPU
- No ONNX versioning support; most operators support only the newest opset version
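Because datatypes are assumed to be float32, it is worth casting batches explicitly in your load function; `np.random.random` (as in the example above) returns float64 by default. A minimal sketch:

```python
import numpy as np

def load_batch(start, end):
    batch_size = end - start
    # np.random.random returns float64 by default; cast to the
    # float32 the runtime assumes.
    return np.random.random((batch_size, 3, 224, 224)).astype(np.float32)

batch = load_batch(0, 16)
print(batch.dtype)  # float32
```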
First, check your ONNX graph using a tool like Netron to ensure that all operators in your graph are supported by ONNX2Parla.
Next, enable pass debugging mode:

```python
cfg = o2p.Config(...)
cfg.debug_passes = True
```

and inspect the generated GML files to see whether your graph is being processed correctly by the system.