This is a high-level machine learning framework that lets users run many kinds of machine learning experiments with minimal setup. I'm still developing the project and expanding the wiki pages.
Building from source is currently the only way to build the package. To do so, use the Makefile:
$ make
This calls the setup.py script and attempts to install the package onto your system. If you run into any problems, please open an issue and I'll look into it. I haven't done this sort of thing before, so bugs are expected.
A basic example for the API is below:
# examples/demo.py - DataBunch and DataFrame demonstrations
from sklearn.datasets import load_iris, load_diabetes
from alexandria.experiment import Experiment

if __name__ == '__main__':
    # Data preprocessing
    iris = load_iris()
    experiment = Experiment(
        name='Cross Validation Example #1',
        dataset=iris,
        xlabels='data',
        ylabels='target',
        models=['rf', 'dt', 'knn', 'nb']
    )
    experiment.trainCV(nfolds=10, metrics=['accuracy', 'rec', 'prec', 'auc'])
    experiment.summarizeMetrics()
Output:
name                          Accuracy       Recall         Precision      AUC
----------------------------  -------------  -------------  -------------  -------------
sklearn.random forest         0.9600±0.0442  0.9600±0.0442  0.9644±0.0418  0.9907±0.0147
sklearn.decision tree         0.9600±0.0442  0.9600±0.0442  0.9644±0.0418  0.9700±0.0332
sklearn.k neighbors           0.9667±0.0447  0.9667±0.0447  0.9738±0.0339  0.9873±0.0222
sklearn.naive bayes.Gaussian  0.9533±0.0427  0.9533±0.0427  0.9627±0.0325  0.9947±0.0088
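Each cell in the summary appears to report the mean score and its standard deviation across the 10 folds. As a rough illustration of where such a figure can come from, the sketch below computes one with plain scikit-learn. It is not alexandria's implementation: `summarize_cv` is a hypothetical helper, and the fold splitting and random seed are assumptions, so the numbers will not necessarily match the table.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def summarize_cv(model, X, y, nfolds=10, scoring='accuracy'):
    """Hypothetical helper: format cross-validation scores as 'mean±std'."""
    # cross_val_score returns one score per fold
    scores = cross_val_score(model, X, y, cv=nfolds, scoring=scoring)
    return f'{scores.mean():.4f}±{scores.std():.4f}'


iris = load_iris()
summary = summarize_cv(RandomForestClassifier(random_state=0), iris.data, iris.target)
print(summary)
```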
# Data preprocessing for dataframe object
diabetes_df = load_diabetes(as_frame=True).frame
data_cols = diabetes_df.columns[:-1]  # All columns except the last, which is the target
target_col = diabetes_df.columns[-1]  # 'target'
experiment = Experiment(
    name='Cross Validation Example #2',
    dataset=diabetes_df,
    xlabels=data_cols,
    ylabels=target_col,
    models=['rf', 'dt', 'knn']
)
experiment.trainCV(nfolds=10, metrics='r2')
experiment.summarizeMetrics()
Output:
Cross Validation Example #2
name                   R2
---------------------  --------------
sklearn.random forest  0.3963±0.1006
sklearn.decision tree  -0.2044±0.2989
sklearn.k neighbors    0.3329±0.1247
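A negative R2, like the decision tree's score here, is not a bug in the metric: R2 is 0 for a model that always predicts the mean of the targets and drops below 0 for a model that does worse than that. The toy values below are made up purely for illustration and use scikit-learn's `r2_score` directly rather than anything from alexandria.

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]

# Predicting the mean of y_true for every sample scores 0.
baseline = r2_score(y_true, [2.0, 2.0, 2.0])

# Predictions that are worse than the mean baseline score below 0.
reversed_order = r2_score(y_true, [3.0, 2.0, 1.0])

print(baseline, reversed_order)
```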
Code:
# Let's run all of the Naive Bayes models and compare their performance
models = {
    'sklearn': [
        {
            'model': 'nb',
            'flavor': 'bernoulli'
        },
        {
            'model': 'nb',
            'flavor': 'Categorical'
        },
        {
            'model': 'nb',
            'flavor': 'complement'
        },
        {
            'model': 'nb',
            'flavor': 'gaussian'
        },
        {
            'model': 'nb',
            'flavor': 'multi'
        }
    ]
}
experiment = Experiment(
    name='Naive Bayes Experiment',
    dataset=iris,
    xlabels='data',
    ylabels='target',
    modellibdict=models
)
experiment.trainCV(nfolds=10, metrics=['acc', 'rec', 'prec', 'auc'])
experiment.summarizeMetrics()
Output:
Naive Bayes Experiment
name                             Accuracy       Recall         Precision      AUC
-------------------------------  -------------  -------------  -------------  -------------
sklearn.naive bayes.Bernoulli    0.3333±0.0000  0.3333±0.0000  0.1111±0.0000  0.5000±0.0000
sklearn.naive bayes.Categorical  0.9267±0.0629  0.9267±0.0629  0.9355±0.0595  0.9847±0.0179
sklearn.naive bayes.Complement   0.6667±0.0000  0.6667±0.0000  0.4926±0.0148  0.9780±0.0181
sklearn.naive bayes.Gaussian     0.9533±0.0427  0.9533±0.0427  0.9627±0.0325  0.9947±0.0088
sklearn.naive bayes.Multinomial  0.9533±0.0670  0.9533±0.0670  0.9599±0.0608  0.9860±0.0256
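The Bernoulli row sits at chance level (0.3333 accuracy on three balanced classes). One plausible explanation, sketched below with plain scikit-learn rather than alexandria, is that `BernoulliNB` binarizes features at a threshold of 0.0 by default, and every iris measurement is positive, so all samples collapse to the same feature vector. Whether alexandria passes the default parameters through to `BernoulliNB` is an assumption that has not been verified here.

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import BernoulliNB

iris = load_iris()
X, y = iris.data, iris.target

# Every iris measurement is strictly positive, so the default threshold
# (binarize=0.0) maps every feature of every sample to 1.
all_positive = bool((X > 0).all())

# With identical binarized inputs the model cannot tell the classes apart
# and predicts a single class for all samples.
model = BernoulliNB()  # default binarize=0.0
predictions = model.fit(X, y).predict(X)
distinct_predictions = set(predictions.tolist())

print(all_positive, distinct_predictions)
```

If that is the cause, choosing a threshold inside the data range (the `binarize` parameter) or scaling the features first would give the Bernoulli model something to separate on.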