goaaron/SigSpark

SigSpark

Compiles against the Spark 2.0 build.

SigSpark integrates SigOpt into Spark to perform Bayesian optimization of hyperparameters. It runs experiments automatically, iteratively adjusting parameter values to minimize a user-defined objective over a predefined search space in as few runs as possible.
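
The iterative loop described above can be sketched in plain Scala. This is only an illustration under stated assumptions: `SuggestObserveSketch`, `suggest`, `objective`, and `run` are hypothetical names that are not part of SigSpark or SigOpt, the objective is a toy function, and uniform random sampling stands in for the Bayesian suggestion step, which in practice would use earlier observations to pick the next point.

```scala
import scala.util.Random

// Hypothetical sketch, not SigSpark code: a suggest -> evaluate -> keep-best loop.
// Random sampling is a stand-in for the Bayesian suggestion step.
object SuggestObserveSketch {
  // (name, max, min), mirroring the max-then-min order used in this README
  type Bound = (String, Double, Double)

  // Draw one candidate uniformly at random inside the bounds.
  def suggest(bounds: Seq[Bound], rng: Random): Map[String, Double] =
    bounds.map { case (name, max, min) =>
      name -> (min + rng.nextDouble() * (max - min))
    }.toMap

  // Toy objective (an assumption for illustration only),
  // minimized at elasticNetParam = 0.3 and regParam = 0.1.
  def objective(p: Map[String, Double]): Double =
    math.pow(p("elasticNetParam") - 0.3, 2) + math.pow(p("regParam") - 0.1, 2)

  // Run a fixed number of iterations and return the best (params, value) pair.
  def run(bounds: Seq[Bound], iterations: Int, seed: Long): (Map[String, Double], Double) = {
    val rng = new Random(seed)
    val trials = (1 to iterations).map { _ =>
      val params = suggest(bounds, rng)
      (params, objective(params))
    }
    trials.minBy(_._2)
  }
}
```

Calling `SuggestObserveSketch.run(Seq(("elasticNetParam", 1.0, 0.0), ("regParam", 1.0, 0.0)), 10, 42L)` returns the best parameter map found in ten trials together with its objective value. A Bayesian optimizer aims to reach a good value in fewer trials than this random baseline.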

#### Relevant Publications

Practical Bayesian Optimization of Machine Learning Algorithms  
Jasper Snoek, Hugo Larochelle and Ryan Prescott Adams  
Advances in Neural Information Processing Systems, 2012  

Multi-Task Bayesian Optimization  
Kevin Swersky, Jasper Snoek and Ryan Prescott Adams  
Advances in Neural Information Processing Systems, 2013  

Input Warping for Bayesian Optimization of Non-stationary Functions  
Jasper Snoek, Kevin Swersky, Richard Zemel and Ryan Prescott Adams  
International Conference on Machine Learning, 2014  

Bayesian Optimization and Semiparametric Models with Applications to Assistive Technology  
Jasper Snoek, PhD Thesis, University of Toronto, 2013  

Bayesian Optimization with Unknown Constraints  
Michael Gelbart, Jasper Snoek and Ryan Prescott Adams  
Uncertainty in Artificial Intelligence, 2014  

www.blog.sigopt.com

#### Example Usage

>>>import org.apache.spark.ml.tuning.CrossValidator
>>>import org.apache.spark.ml.regression.LinearRegression
>>>import org.apache.spark.ml.evaluation.RegressionEvaluator

>>>val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>>val data = sqlContext.read.format("libsvm").load("data/mllib/sample_linear_regression_data.txt")

>>>val lr = new LinearRegression()
>>>val cv = new CrossValidator()

The bounds are an Array[(String, Double, Double, String)], with one tuple per parameter:
(parameterName: String, max: Double, min: Double, type: String)

>>>val bounds =  Array(("elasticNetParam", 1.0, 0.0, "double"), ("regParam",1.0,0.0, "double"))
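
Because the tuple puts the maximum before the minimum, it is easy to swap the two by accident. One way to guard against that is a small named wrapper that validates each bound before converting it to the tuple layout. The sketch below is an assumption for illustration: `ParamBound` and `BoundsSketch` are hypothetical helpers and do not exist in SigSpark.

```scala
// Hypothetical helper, not part of SigSpark: a named wrapper around one bound.
final case class ParamBound(name: String, max: Double, min: Double, kind: String = "double") {
  // Fail early if max and min were passed in the wrong order.
  require(max >= min, s"max ($max) must be >= min ($min) for parameter '$name'")

  // Convert to the tuple layout described above: (name, max, min, type).
  def toTuple: (String, Double, Double, String) = (name, max, min, kind)
}

object BoundsSketch {
  // Build the Array of tuples in the format shown for `bounds`.
  def toBounds(ps: Seq[ParamBound]): Array[(String, Double, Double, String)] =
    ps.map(_.toTuple).toArray
}
```

With these helpers, `BoundsSketch.toBounds(Seq(ParamBound("elasticNetParam", 1.0, 0.0), ParamBound("regParam", 1.0, 0.0)))` produces the same array as the `bounds` value above, and `ParamBound("regParam", 0.0, 1.0)` throws an `IllegalArgumentException` rather than silently passing inverted limits.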

>>>cv.setNumFolds(10)
>>>cv.setEstimator(lr)
>>>cv.setEvaluator(new RegressionEvaluator)

Establish the experiment with setSigCV(name: String, api_token: String, iteration: Int, bounds):

>>>cv.setSigCV("Timing","ADGGBVMWFCLSDKFMGVMFLKF", 10, bounds)
>>>cv.askSuggestion(lr)

>>>cv.SigFit(data)

#### Application to topic modeling
