The model is built and run according to the configuration file, which is divided into two parts: Data and Model. The meaning of each parameter is given below.
Data::Input::
- path to the CSV input data
Data::Id::
- instrument id column
Data::Time::
- time column
Data::Weight::
- column to use for sample weights
Data::NumericalFeatures::
- comma-separated list of numerical features (including the time index and instrument id)
Data::CategoricalFeatures::
- comma-separated list of categorical features
Data::OutputInSample::
- path to the CSV output file for in-sample predictions
Data::OutputOutSample::
- path to the CSV output file for out-of-sample predictions
Data::Target::
- target column to predict
Data::StartInSample::
- time index at which the in-sample period begins (can be omitted for alpha predictions)
Data::EndInSample::
- time index at which the in-sample period ends
Data::StartOutSample::
- time index at which the out-of-sample period begins
Data::EndOutSample::
- time index at which the out-of-sample period ends
Data::AlphaDirectory::
- path where predictions are saved when the model is read from file
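As an illustration of how the four time-index bounds above partition the data, here is a minimal sketch. The helper `select_window`, the column name `t`, and the choice of inclusive bounds are all assumptions made for this example; the project's own handling of the bounds may differ.

```python
def select_window(rows, time_col, start, end):
    """Return the rows whose time index lies in [start, end].

    Inclusive bounds are an assumption made for this sketch.
    """
    return [row for row in rows if start <= row[time_col] <= end]


# Toy data: two instruments observed at time indices 0..9.
rows = [{"id": inst, "t": t} for t in range(10) for inst in ("A", "B")]

# Hypothetical settings: StartInSample=0, EndInSample=6,
# StartOutSample=7, EndOutSample=9.
in_sample = select_window(rows, "t", 0, 6)
out_sample = select_window(rows, "t", 7, 9)

print(len(in_sample), len(out_sample))
```

Keeping the out-of-sample window strictly after the in-sample window, as in this example, avoids evaluating the model on timesteps it was trained on.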
Model::RNN::
- number of neurons in each of the model's RNN layers (comma-separated list, one number for each layer)
Model::Dense::
- number of neurons in each of the model's fully-connected layers (comma-separated list, one number for each layer)
Model::Activation::
- activation function to use (must be one of: tanh, relu, sigmoid, softmax)
Model::BatchSize::
- number of instruments to feed through the model at once
Model::StepSize::
- number of timesteps to train the model on at once (set to -1 to use the maximum available)
Model::NumEpochs::
- number of epochs for training
Model::LearningRate::
- learning rate for the Adam optimizer
Model::KeepProb::
- dropout probability between layers
Model::InitScale::
- maximum of the range used for initializing weights
Model::MaxGradNorm::
- maximum allowed gradient norm; anything above is clipped to this value
Model::OutputDirectory::
- directory in which to save the model
Model::InputDirectory::
- path to saved model
Model::NumCores::
- number of CPU cores to use when running model (if not specified, will be set to max available)
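To make the parameter list more concrete, the sketch below parses a small configuration and applies the rules stated above (comma-separated layer sizes, the allowed activation names, and the NumCores default). It assumes an INI-style file with `[Data]` and `[Model]` sections read by Python's standard `configparser`; the actual file syntax is not specified here, and the file paths, column names, values, and the helpers `int_list` and `load_model_settings` are hypothetical.

```python
import configparser
import os

# Hypothetical configuration text; the INI syntax is an assumption.
EXAMPLE = """
[Data]
Input = data/input.csv
Id = instrument_id
Time = time_index
Target = target
NumericalFeatures = time_index,instrument_id,feature_a,feature_b

[Model]
RNN = 64,64
Dense = 32
Activation = tanh
BatchSize = 16
StepSize = -1
LearningRate = 0.001
"""

ALLOWED_ACTIVATIONS = {"tanh", "relu", "sigmoid", "softmax"}


def int_list(text):
    """Parse a comma-separated list such as '64,64' into [64, 64]."""
    return [int(item) for item in text.split(",") if item.strip()]


def load_model_settings(text):
    """Read the Model section and validate the activation name."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    model = parser["Model"]
    activation = model["Activation"].strip()
    if activation not in ALLOWED_ACTIVATIONS:
        raise ValueError("unsupported activation: " + activation)
    return {
        "rnn": int_list(model["RNN"]),
        "dense": int_list(model["Dense"]),
        "activation": activation,
        "batch_size": model.getint("BatchSize"),
        "step_size": model.getint("StepSize"),
        "learning_rate": model.getfloat("LearningRate"),
        # NumCores is optional; fall back to every available core.
        "num_cores": model.getint("NumCores", fallback=os.cpu_count()),
    }


settings = load_model_settings(EXAMPLE)
print(settings)
```

Validating the activation name up front means a typo in the configuration fails immediately rather than partway through building the model.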
Whenever a model is run, it produces summaries that can be viewed with TensorBoard. The summaries track two scalar values (mean squared error and R-squared) while the model trains.
To view the summaries with TensorBoard, run the following command:
tensorboard --logdir=/path/to/summaries
Then open the address that TensorBoard prints (usually http://0.0.0.0:6006) in a web browser to view the graphs. Note: there are known issues with using TensorBoard with Safari.
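For reference, the two scalars tracked in the summaries can be computed as in the sketch below. It uses the standard unweighted definitions, which is an assumption: the model may apply the sample weights from Data::Weight, in which case the sums would be weighted. The function names are illustrative only.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]

mse = mean_squared_error(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
print(mse, r2)
```

A falling mean squared error together with a rising R-squared indicates that the predictions are explaining more of the variance in the target.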