This project is the foundation for sharing machine learning models. It helps maintain the
registry: the remote storage where all model files are kept in a structured, cataloged way.
It defines modelforge.Model, the base class for all models, which can automatically fetch
models from the registry. It also provides an abstraction over loading and saving models on disk.
Each model receives a UUID and carries other metadata. The underlying file format is ASDF.
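The sketch below illustrates the save/load abstraction and the per-model UUID and metadata described above. It is not modelforge's implementation. The class layout, the hypothetical _generate_tree/_load_tree hooks, the metadata fields and the example Vocabulary model are all assumptions. JSON stands in for ASDF so the sketch needs no third-party packages, and fetching from the registry is left out.

```python
"""Hypothetical sketch of a Model-like base class; not modelforge's actual code."""
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


class Model:
    """Holds data plus metadata and can be saved to / loaded from disk."""

    NAME = "base"  # hypothetical per-class model name

    def __init__(self):
        # Every new model gets a fresh UUID and some metadata.
        self.meta = {
            "model": self.NAME,
            "uuid": str(uuid.uuid4()),
            "created_at": datetime.now(timezone.utc).isoformat(),
        }

    # Hypothetical hooks: subclasses map their payload to/from a plain tree.
    def _generate_tree(self) -> dict:
        raise NotImplementedError

    def _load_tree(self, tree: dict) -> None:
        raise NotImplementedError

    def save(self, path) -> None:
        # JSON stands in for ASDF to keep the sketch dependency-free.
        doc = {"meta": self.meta, "tree": self._generate_tree()}
        Path(path).write_text(json.dumps(doc))

    @classmethod
    def load(cls, path) -> "Model":
        doc = json.loads(Path(path).read_text())
        model = cls.__new__(cls)  # skip __init__ so the stored UUID is kept
        model.meta = doc["meta"]
        model._load_tree(doc["tree"])
        return model


class Vocabulary(Model):
    """Hypothetical example model holding a list of tokens."""

    NAME = "vocabulary"

    def __init__(self, tokens=()):
        super().__init__()
        self.tokens = list(tokens)

    def _generate_tree(self):
        return {"tokens": self.tokens}

    def _load_tree(self, tree):
        self.tokens = tree["tokens"]
```

Loading bypasses __init__ on purpose: a model read from disk should keep the UUID it was saved with rather than receive a new one.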
Currently, only one registry storage backend is supported: Google Cloud Storage.
src-d/ml uses modelforge to make ML on source code accessible for everybody.
pip3 install modelforge
The project exposes two interfaces: a Python API and a command-line interface.
The modelforge package contains the most important classes and functions: the Model base
class; merge_strings and split_strings, which optimize the serialization of string lists; and
disassemble_sparse_matrix and assemble_sparse_matrix, which handle sparse matrices.
A "model" here means an object that holds data and can be serialized and deserialized, much like
a model in web development.
Models can be registered with modelforge.register_model(). Registration is not strictly required,
but it is needed for extended model dumps. Typically, you import all your model classes and
register them in a single module.
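A class registry of this kind can be sketched as a decorator that records each class under a name. This is a hypothetical illustration, not modelforge's register_model(): keying classes by a NAME attribute, the _REGISTRY dict and the model_class_for() helper are assumptions, as are the example classes.

```python
"""Hypothetical sketch of a model class registry; not modelforge's internals."""
_REGISTRY = {}


def register_model(cls):
    # Record the class under its NAME so a generic tool (for example an
    # extended dump) can find the right class for a stored model by name.
    if cls.NAME in _REGISTRY:
        raise ValueError(f"model {cls.NAME!r} is already registered")
    _REGISTRY[cls.NAME] = cls
    return cls  # returning cls lets the function double as a decorator


def model_class_for(name):
    # Hypothetical lookup helper used when loading a model of unknown type.
    return _REGISTRY[name]


@register_model
class Topics:
    NAME = "topics"


@register_model
class Embeddings:
    NAME = "embeddings"
```

Keeping all registered classes in one module means that importing that module is enough to populate the registry.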
A custom registry storage backend can be registered with modelforge.backends.register_backend().
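A pluggable storage backend could look roughly like this. It is a hypothetical sketch in the spirit of register_backend(), not its real interface: the StorageBackend base class, its upload/fetch/delete methods, the in-memory MemoryBackend and the create_backend() helper are all assumptions. Consult modelforge.backends for the methods an actual backend must implement.

```python
"""Hypothetical sketch of a pluggable storage backend registry."""
from abc import ABC, abstractmethod

_BACKENDS = {}


def register_backend(cls):
    # Make the backend selectable by its NAME (e.g. from a --backend option).
    _BACKENDS[cls.NAME] = cls
    return cls


class StorageBackend(ABC):
    NAME = None

    @abstractmethod
    def upload(self, key: str, data: bytes) -> str:
        """Store data and return a URL or locator for it."""

    @abstractmethod
    def fetch(self, key: str) -> bytes:
        """Return the data previously stored under key."""

    @abstractmethod
    def delete(self, key: str) -> None:
        """Remove the data stored under key."""


@register_backend
class MemoryBackend(StorageBackend):
    """Hypothetical in-memory backend, handy for tests."""

    NAME = "memory"

    def __init__(self, **kwargs):
        self._blobs = {}

    def upload(self, key, data):
        self._blobs[key] = bytes(data)
        return f"memory://{key}"

    def fetch(self, key):
        return self._blobs[key]

    def delete(self, key):
        del self._blobs[key]


def create_backend(name, **kwargs):
    # kwargs play the role of the key=value pairs passed via --args.
    return _BACKENDS[name](**kwargs)
```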
python3 -m modelforge --help
init
initializes an empty registry.
publish
pushes the specified model file to the registry and updates the index.
dump
prints brief information about the model. A local path, URL or UUID must be specified:
modelforge dump https://storage.googleapis.com/models.cdn.sourced.tech/models/<model>/<uuid>.asdf \
--backend "gcs" --args bucket="models.cdn.sourced.tech"
modelforge dump <uuid> --backend "gcs" --args bucket="models.cdn.sourced.tech"
modelforge dump /path/to/model
list
lists all the models in the registry.
delete
deletes a model; its UUID must be specified.
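The index bookkeeping behind publish, list and delete might work along these lines. This is a hypothetical sketch: the index layout (a "models" mapping from model name to its versions keyed by UUID, plus a "default" UUID) and the helper names are assumptions, not modelforge's actual index format.

```python
"""Hypothetical sketch of registry index bookkeeping for publish/list/delete."""


def publish(index, name, model_uuid, url):
    # Record where the uploaded file lives and make it the default for its name.
    entry = index.setdefault("models", {}).setdefault(name, {"versions": {}})
    entry["versions"][model_uuid] = url
    entry["default"] = model_uuid


def list_models(index):
    # Flatten the index into (name, uuid, url) triples, sorted by model name.
    return [(name, u, url)
            for name, entry in sorted(index.get("models", {}).items())
            for u, url in entry["versions"].items()]


def delete(index, model_uuid):
    # Drop a model by UUID; if it was the default, fall back to the oldest
    # remaining version, and drop the name entirely once no versions remain.
    for name, entry in list(index.get("models", {}).items()):
        if model_uuid in entry["versions"]:
            del entry["versions"][model_uuid]
            if not entry["versions"]:
                del index["models"][name]
            elif entry["default"] == model_uuid:
                entry["default"] = next(iter(entry["versions"]))
            return True
    return False
```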
You can set the default backend, the backend's options and the vendor by creating
modelforgecfg.py anywhere in your project tree.
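A modelforgecfg.py might look like the fragment below. The variable names VENDOR, BACKEND and BACKEND_ARGS are assumptions, as are the values; check modelforge's configuration module for the exact names your version reads.

```python
# modelforgecfg.py: hypothetical example. Variable names and values are
# assumptions; verify them against modelforge's configuration module.
VENDOR = "example.org"  # publisher credited in new models' metadata
BACKEND = "gcs"  # default registry storage backend
BACKEND_ARGS = "bucket=my-models-bucket"  # options passed to the backend
```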
docker build -t srcd/modelforge .
docker run -it --rm srcd/modelforge --help
We use PEP 8 with a line length of 99 and double quotes (") for strings. All the tests must pass:
python3 -m unittest discover /path/to/modelforge
Apache 2.0.