Developed by students of the Simulator ML (Karpov.Courses)
Robusta ML Framework is an extension of the Scikit-learn library that provides additional features and capabilities for data processing and building machine learning models.
Robusta ML Framework library features include:
- Support for a large number of machine learning algorithms and models, including classical algorithms.
- Implementation of data preprocessing methods such as feature scaling, outlier handling, and categorical feature encoding.
- Tools for choosing the best model, including cross-validation, hyperparameter tuning, and model evaluation.
- Ability to save and load results for later use.
Table of contents:
- Getting started
- Modules
- Project principles and design decisions
- Testing
- Getting in touch
This project is available on PyPI, meaning that you can just:
pip install robusta
Otherwise, clone/download the project and in the project directory run:
python setup.py install
If you would like to make major changes to integrate this with your proprietary system, it probably makes sense to clone this repository and to just use the source code.
git clone https://github.com/uberkinder/robusta
Alternatively, you could try:
pip install -e git+https://github.com/uberkinder/robusta.git#egg=robusta
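Whichever install route is used, a quick way to confirm that the package is visible to the current interpreter is to ask the standard library for its installed version. This is a minimal sketch: the helper name `installed_version` is hypothetical, and the distribution name `robusta` is assumed from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "robusta" is the distribution name assumed from the pip command above.
print(installed_version("robusta"))
```

If this prints `None`, the install most likely went into a different environment than the one running the script.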
This section lists some of the functionality available in robusta. More examples are offered in the Jupyter notebooks here. The tests are another good resource.
- RepeatedGroupKFold
- RepeatedKFold
- StratifiedGroupKFold
- RepeatedStratifiedGroupKFold
- AdversarialValidation
- PermutationImportance
- GroupPermutationImportance
- ShuffleTargetImportance
- BlendRegressor
- BlendClassifier
- CaruanaRegressor
- NNGRegressor
- GridSearchCV
- OptunaCV
- RandomSearchCV
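The list above does not show signatures, so the following is not robusta's API. It is a pure-Python sketch of the idea behind a repeated K-fold splitter such as `RepeatedKFold`: the data is reshuffled and split into K folds several times, which gives more evaluation rounds than a single K-fold pass. The function name `repeated_kfold_indices` and its parameters are assumptions made for illustration.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=3, seed=0):
    """Yield (train, test) index lists for n_repeats reshuffled K-fold passes."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    rng = random.Random(seed)
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh shuffle for every repeat
        for fold in range(n_splits):
            # Take every n_splits-th shuffled index as this fold's test set.
            test = sorted(indices[fold::n_splits])
            test_set = set(test)
            train = [i for i in range(n_samples) if i not in test_set]
            yield train, test


splits = list(repeated_kfold_indices(n_samples=10, n_splits=5, n_repeats=2))
print(len(splits))  # 5 folds x 2 repeats = 10 splits
```

Within one repeat every sample lands in exactly one test fold, and across repeats the fold membership changes because of the reshuffle.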
- It should be easy to swap out individual components of the optimization process with the user's proprietary improvements.
- Usability is everything: it is better to be self-explanatory than consistent.
- There is no point in portfolio optimization unless it can be practically applied to real asset prices.
- Everything that has been implemented should be tested.
- Inline documentation is good: dedicated (separate) documentation is better. The two are not mutually exclusive.
- Formatting should never get in the way of coding: because of this, I have deferred all formatting decisions to Black.
Tests are written in pytest, and I have tried to ensure close to 100% coverage. Run the tests by navigating to the package directory and running pytest on the command line.
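For readers new to pytest, a test is a function whose name starts with `test_` and which uses plain `assert` statements. The sketch below is illustrative only: `mean_blend` is a hypothetical stand-in helper defined inline, not part of robusta and not how `BlendRegressor` is implemented.

```python
def mean_blend(predictions):
    """Average per-sample predictions from several models (stand-in helper)."""
    n_models = len(predictions)
    return [sum(column) / n_models for column in zip(*predictions)]


def test_mean_blend_averages_each_sample():
    # Two models, two samples: each output is the mean across models.
    blended = mean_blend([[1.0, 2.0], [3.0, 6.0]])
    assert blended == [2.0, 4.0]
```

Saved in a file whose name starts with `test_`, this function is collected and run automatically when pytest is invoked in that directory.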
If you are having a problem with Robusta, please raise a GitHub issue.