ibrahim85/alexandria

 
 


alexandria

alexandria is a high-level machine learning framework that lets users run multiple types of machine learning experiments with very little setup. The project and its wiki pages are still under active development.

Build

Building from source is currently the only way to install the package. To do so, use the Makefile:

$ make

This calls the setup.py script and attempts to install the package on your system. If you run into any problems, please open an issue and I'll look into it. I haven't done this sort of packaging before, so bugs are expected.
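A quick way to confirm that the install step worked is to ask Python whether it can locate the package. The snippet below is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, and the package name `alexandria` is taken from the import statement in the examples further down.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    # find_spec returns None when no top-level module with this name exists.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "alexandria" is the package name used by the README examples.
    if is_installed("alexandria"):
        print("alexandria is importable")
    else:
        print("alexandria was not found; try running `make` again")
```

This only checks that the package is discoverable, not that every dependency it needs is present.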

Examples

Basic Classification

A basic example of the API is below:

# examples/demo.py - DataBunch and DataFrame demonstrations
# Dataset loaders from scikit-learn
from sklearn.datasets import load_iris, load_diabetes

from alexandria.experiment import Experiment

if __name__ == '__main__':
	# Data preprocessing
	iris = load_iris()

	experiment = Experiment(
		name='Cross Validation Example #1',
		dataset=iris,
		xlabels='data',
		ylabels='target',
		models=['rf', 'dt', 'knn', 'nb']
	)
	experiment.trainCV(nfolds=10, metrics=['accuracy', 'rec', 'prec', 'auc'])
	experiment.summarizeMetrics()

Output:

name                          Accuracy       Recall         Precision      AUC
----------------------------  -------------  -------------  -------------  -------------
sklearn.random forest         0.9600±0.0442  0.9600±0.0442  0.9644±0.0418  0.9907±0.0147
sklearn.decision tree         0.9600±0.0442  0.9600±0.0442  0.9644±0.0418  0.9700±0.0332
sklearn.k neighbors           0.9667±0.0447  0.9667±0.0447  0.9738±0.0339  0.9873±0.0222
sklearn.naive bayes.Gaussian  0.9533±0.0427  0.9533±0.0427  0.9627±0.0325  0.9947±0.0088
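Each cell in the table reads as a mean and a spread over the cross-validation folds. The sketch below shows one way such a cell could be produced from a list of per-fold scores. It is an illustration, not alexandria's implementation: `summarize` is a hypothetical helper, the fold scores are made up, and the use of the population standard deviation is an assumption (the library may use the sample standard deviation instead).

```python
from statistics import mean, pstdev


def summarize(scores, digits=4):
    """Format per-fold scores as 'mean±std' with a fixed number of decimals."""
    # Assumption: population standard deviation; swap in statistics.stdev
    # if the sample standard deviation is wanted instead.
    return f"{mean(scores):.{digits}f}±{pstdev(scores):.{digits}f}"


if __name__ == "__main__":
    # Made-up scores from four folds, for illustration only.
    fold_scores = [1.0, 0.9, 1.0, 0.9]
    print(summarize(fold_scores))
```

A small spread means the model scored similarly on every fold; a large one suggests the result depends heavily on how the data was split.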

Basic Regression with Pandas DataFrame

	# Data preprocessing for dataframe object
	diabetes_df = load_diabetes(as_frame=True).frame
	data_cols = diabetes_df.columns[:-1] # All columns except the last, which is the target
	target_col = diabetes_df.columns[-1] # 'target'

	experiment = Experiment(
		name='Cross Validation Example #2',
		dataset=diabetes_df,
		xlabels=data_cols,
		ylabels=target_col,
		models=['rf', 'dt', 'knn']
	)
	experiment.trainCV(nfolds=10, metrics='r2')
	experiment.summarizeMetrics()

Output:

Cross Validation Example #2
name                   R2
---------------------  --------------
sklearn.random forest  0.3963±0.1006
sklearn.decision tree  -0.2044±0.2989
sklearn.k neighbors    0.3329±0.1247
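The negative score for the decision tree is not a formatting error: R2 drops below zero whenever a model's predictions are worse than always predicting the mean of the targets. The sketch below uses the standard definition, R2 = 1 - SS_res / SS_tot, in plain Python. `r_squared` is a hypothetical helper written for this illustration, not part of alexandria.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of a baseline that always predicts the mean.
    # Note: this is zero (and the division fails) if y_true is constant.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    targets = [1, 2, 3]
    print(r_squared(targets, [1, 2, 3]))  # perfect predictions
    print(r_squared(targets, [2, 2, 2]))  # same as predicting the mean
    print(r_squared(targets, [3, 2, 1]))  # worse than predicting the mean
```

The three calls give 1.0, 0.0, and -3.0 respectively, which covers the perfect, baseline, and worse-than-baseline cases.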

Naive Bayes Flavors Comparison

Code:

	# Let's run all of the Naive Bayes models and compare their performance
	models = {
		'sklearn': [
			{
				'model': 'nb',
				'flavor': 'bernoulli'
			},
			{
				'model': 'nb',
				'flavor': 'Categorical'
			},
			{
				'model': 'nb',
				'flavor': 'complement'
			},
			{
				'model': 'nb',
				'flavor': 'gaussian'
			},
			{
				'model': 'nb',
				'flavor': 'multi'
			}
		]
	}
	experiment = Experiment(
		name='Naive Bayes Experiment',
		dataset=iris,
		xlabels='data',
		ylabels='target',
		modellibdict=models
	)
	experiment.trainCV(nfolds=10, metrics=['acc', 'rec', 'prec', 'auc'])
	experiment.summarizeMetrics()

Output:

Naive Bayes Experiment
name                             Accuracy       Recall         Precision      AUC
-------------------------------  -------------  -------------  -------------  -------------
sklearn.naive bayes.Bernoulli    0.3333±0.0000  0.3333±0.0000  0.1111±0.0000  0.5000±0.0000
sklearn.naive bayes.Categorical  0.9267±0.0629  0.9267±0.0629  0.9355±0.0595  0.9847±0.0179
sklearn.naive bayes.Complement   0.6667±0.0000  0.6667±0.0000  0.4926±0.0148  0.9780±0.0181
sklearn.naive bayes.Gaussian     0.9533±0.0427  0.9533±0.0427  0.9627±0.0325  0.9947±0.0088
sklearn.naive bayes.Multinomial  0.9533±0.0670  0.9533±0.0670  0.9599±0.0608  0.9860±0.0256
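The `models` dictionary in this example repeats the same entry shape for every flavor, so it can also be generated from a list. The sketch below builds a dictionary with the same structure as the one shown above. `build_nb_models` is a hypothetical helper, not part of alexandria, and it assumes the `modellibdict` argument accepts any dictionary of this shape.

```python
def build_nb_models(flavors, library="sklearn"):
    """Build a model dictionary with one Naive Bayes entry per flavor."""
    # Mirrors the hand-written structure: {library: [{'model': ..., 'flavor': ...}]}
    return {library: [{"model": "nb", "flavor": flavor} for flavor in flavors]}


if __name__ == "__main__":
    # Flavor names copied from the example above.
    models = build_nb_models(
        ["bernoulli", "Categorical", "complement", "gaussian", "multi"]
    )
    for entry in models["sklearn"]:
        print(entry)
```

The resulting dictionary could then be passed as `modellibdict=models`, exactly as in the hand-written version.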
