The main purpose of this package is to provide a machine-learning-based Python API for Shotgun.
The list below covers the main things this package does:
- Gather Shotgun information and generate cache data files in CSV format.
- Generate a Jupyter notebook that the user can use to investigate 'irregular' or high-cost patterns with statistical methods.
- Predict cost values with machine learning.
The following template workflow can be a useful way to use this package:
- Generate source data based on Shotgun field information.
- Generate an analysis report based on #1.
- Update the configuration based on the analysis results from #2.
- Generate resource data based on the updated configuration from #3.
- Train a machine learning model based on #4.
- Get predictions from the model trained in #5.
The basic workflow above is built from the three sub-parts below:
- Generate source data based on Shotgun information.
- Analyze the source data and generate resource data based on the analysis results.
- Train a machine learning model on the resource data and get predictions.
This process gathers Shotgun field information and generates a CSV cache used to build the analysis report.
First, a script must be registered so that Shotgun data can be gathered through the Python API, and its application key must be set.
The configuration must be set up properly before gathering Shotgun field information.
# Generate configure file from config_template.yaml
cp config/config_template.yaml config/config.yaml
Below is a sample configuration:
library_paths:
    shotgun: 'Z:\Dev\python-api' # Path to the shotgun python-api package directory.
model:
trainer:
data_processor:
    analyze_report_export_directory: 'analyze_report' # The analysis report is generated under <package root>/analyze_report.
resource_handler:
source_generator:
    shotgun:
        connection:
            site: 'https://test.shotgunstudio.com' # Shotgun URL.
            script_name: 'tool_test' # Name of the script that was registered.
            api_key: 'xxxxxxxxxxxxx' # Application key of the registered script.
        data:
            source_schema: 'Shot' # Target granularity.
            text:
                feature_list: ['description', 'tags', 'notes', 'open_notes'] # Shotgun fields used as lists of string values.
            cost: 'duration' # Shotgun cost field name.
            skip_features: ['id', 'created_at', 'image', 'project', 'updated_at'] # Shotgun fields to skip.
            source_includes: ['bunny_010_0010', 'bunny_010_0020', 'bunny_010_0030', 'bunny_010_0040', 'bunny_010_0050'] # Shots to use.
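Once loaded (for example with PyYAML's `yaml.safe_load`), the settings become nested dictionaries. The sketch below mirrors a subset of the sample above as plain Python dicts to show how code might read individual values; the exact nesting is an assumption based on the sample.

```python
# A subset of the sample configuration, written as the nested dicts
# that yaml.safe_load would produce. The nesting shown is assumed.
config = {
    'source_generator': {
        'shotgun': {
            'connection': {
                'site': 'https://test.shotgunstudio.com',
                'script_name': 'tool_test',
                'api_key': 'xxxxxxxxxxxxx',
            },
            'data': {
                'source_schema': 'Shot',
                'cost': 'duration',
            },
        },
    },
}

# Reading individual settings is plain dict access.
connection = config['source_generator']['shotgun']['connection']
print(connection['site'])       # https://test.shotgunstudio.com
data_settings = config['source_generator']['shotgun']['data']
print(data_settings['cost'])    # duration
```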
The following process generates a source data file based on Shotgun field values.
# Set the package root directory.
import os
import sys
import pandas
import seaborn as sns
# Set up inline graphs.
import matplotlib.pyplot as plt
%matplotlib inline
# Set environment values.
os.environ['ML_FOR_SG_ROOT'] = r'Z:\Dev\Github\ml-for-sg'
sys.path.append(r'Z:\Dev\Github\ml-for-sg')
This process automatically generates a CSV file under /source_data. (The "source_data" directory path can be changed in the configuration.)
from lib.source_generator.shotgun.source_generator import ShotgunSourceGenerator
shotgun_source_generator = ShotgunSourceGenerator()
shotgun_source_generator.generate_source_data(project_id=70)
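Since the generated cache is a plain CSV file, it can be inspected directly with pandas. A minimal stand-alone sketch of that round trip follows; the file name and columns here are hypothetical, not the generator's actual output.

```python
import os
import tempfile

import pandas

# Hypothetical cache content; the real generator derives its columns
# from Shotgun fields.
source = pandas.DataFrame({
    'code': ['bunny_010_0010', 'bunny_010_0020'],
    'duration': [120, 300],
})

# Write the cache and read it back, as the package does with the
# files it places under its source_data directory.
cache_dir = tempfile.mkdtemp()
cache_path = os.path.join(cache_dir, 'source_data.csv')
source.to_csv(cache_path, index=False)

restored = pandas.read_csv(cache_path)
print(restored.shape)  # (2, 2)
```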
This workflow can find target data such as:
- Cost-consuming tasks. (For example, the top 3% of tasks by cost.)
- Shots with overly long cut durations. (For example, shots longer than 300 frames.)
These filtered results are a good starting point for finding a solution. The actual process is:
- Gather Shotgun information and convert it to a pandas DataFrame.
- Filter 'irregular' data patterns based on #1. (Not implemented yet.)
- Check the actual field values and find the problem points.
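The filtering step is not implemented yet, but once the data is in a pandas DataFrame it could look like the sketch below. The data and the column names `duration` and `cut_duration` are illustrative assumptions, not the package's actual schema.

```python
import pandas

# Toy stand-in for the Shotgun-derived DataFrame.
df = pandas.DataFrame({
    'task': ['comp', 'fx', 'anim', 'light', 'roto'],
    'duration': [50, 900, 120, 80, 60],
    'cut_duration': [100, 350, 290, 310, 120],
})

# Top 3% of tasks by cost: keep rows at or above the 97th percentile.
threshold = df['duration'].quantile(0.97)
top_cost = df[df['duration'] >= threshold]

# Shots whose cut is longer than 300 frames.
long_shots = df[df['cut_duration'] > 300]

print(top_cost['task'].tolist())   # ['fx']
print(long_shots['task'].tolist()) # ['fx', 'light']
```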
The actual sample process below can be found in /ipynb/sample.ipynb; open that Jupyter notebook to follow along.
# Analyze features based on the source data.
from lib.resource.feature_analyzer import FeatureAnalyzer
feature_analyzer = FeatureAnalyzer()
feature_analyzer.analyze()
This workflow can find target data such as:
- Fields that correlate strongly with the cost field. (For example, the top 5 fields by correlation with cost.)
These filtered results are a good starting point for finding a solution. The actual process is:
- Gather Shotgun information and convert it to a pandas DataFrame.
- Filter strongly correlated data patterns based on #1. (Not implemented yet.)
- Check the actual field values and find the problem points.
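The correlation filter is likewise not implemented yet; with pandas it could be sketched as below. The toy numeric features are assumptions for illustration; real features would come from the resource data.

```python
import pandas

# Toy numeric features alongside a 'cost' column.
df = pandas.DataFrame({
    'cost':        [10, 20, 30, 40, 50],
    'n_assets':    [1, 2, 3, 4, 5],          # strongly correlated with cost
    'n_notes':     [5, 4, 3, 2, 1],          # strongly anti-correlated
    'frame_count': [100, 90, 120, 80, 110],  # weakly related
})

# Absolute correlation of every field with 'cost', strongest first.
corr = df.corr()['cost'].abs().sort_values(ascending=False)
top_fields = corr.drop('cost').head(5)
print(top_fields.index.tolist())
```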
Based on the investigation results, the user can update the configuration.
# To update the configuration, update df_x_remove_columns,
# df_x_remove_columns2, and df_x_text_remove_columns,
# then execute the code below.
from lib.resource.resource_handler import ResourceHandler
resource_handler = ResourceHandler()
data = {
    'skip_feature_for_df_x_full': df_x_remove_columns + df_x_remove_columns2,
    'skip_feature_for_df_x_text_full': df_x_text_remove_columns
}
resource_handler.save_source_convert_config(data)
from lib.resource.resource_handler import ResourceHandler
resource_handler = ResourceHandler()
resource_handler.convert_source_to_resource()
This workflow can produce predictions such as:
- Cost values based on Shotgun field values. (For example, predicting the cost of tasks that contain a specific tag: 'fire', 'water', etc.)
These predictions imitate the kind of judgments made in production, for example:
- This FX task will take a lot of cost, because it has a 'fire' effect.
- This shot will take a lot of cost, because it has three different furry characters.
The actual process is:
- Load the resource data and merge its features.
- Select a regression model, or get a recommended model. (Only the Ordinary Least Squares method is implemented so far; more models will be implemented later.)
- Fit the data from #1 to the model from #2.
- Get predictions from the fitted model.
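The Ordinary Least Squares step itself can be sketched without the package using numpy's least-squares solver. The toy design matrix and cost vector below are illustrative assumptions; in the package this kind of fit sits behind the trainer and model classes.

```python
import numpy

# Toy design matrix (features) and cost vector: cost = 2*x1 + 3*x2.
X = numpy.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 1.0],
])
y = numpy.array([2.0, 3.0, 5.0, 7.0])

# Ordinary least squares: solve min ||X w - y||^2 for the weights w.
w, _, _, _ = numpy.linalg.lstsq(X, y, rcond=None)

# Predict the cost of a new observation.
new_x = numpy.array([3.0, 2.0])
prediction = new_x.dot(w)
print(numpy.round(w, 6))            # [2. 3.]
print(round(float(prediction), 6))  # 12.0
```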
The actual sample process below can be found in /ipynb/sample.ipynb.
resource_handler = ResourceHandler()
data = resource_handler.load_resource_data()
from lib.trainer.trainer import Trainer
trainer = Trainer()
merged_feature = trainer.merge_features(data)
from lib.model.linear_regression.lasso_model import LassoModel
lasso_model = LassoModel()
trained_lasso_model = trainer.train_model(lasso_model, merged_feature)
trainer.save_trained_model('lasso', trained_lasso_model)
trained_model = resource_handler.load_trained_model('lasso')
# Specify values for a few known features; every remaining feature
# defaults to 0 so that the row matches the training columns.
c = ['assets__backdrop', 'assets__cliff', 'sg_sequence__bunny_010']
v = [1, 0, 1]
for column in merged_feature.columns:
    if column not in c:
        c.append(column)
        v.append(0)
# Build a one-row DataFrame: the model expects 2-dimensional input
# even for a single new observation.
new_data = pandas.DataFrame([v], columns=c, index=['xxx'])
# Use the model to make predictions.
print(new_data)
trained_model[0].predict(new_data)
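An equivalent, slightly more idiomatic way to build that one-row input is `DataFrame.reindex`, which fills every training column not explicitly set with 0. A stand-alone sketch follows; `model_columns` stands in for `merged_feature.columns`, and the feature names are the hypothetical ones used above.

```python
import pandas

# Columns the model was trained on (stand-in for merged_feature.columns).
model_columns = ['assets__backdrop', 'assets__cliff',
                 'sg_sequence__bunny_010', 'assets__tree']

# Specify only the non-zero features; reindex fills in the rest with 0.
row = {'assets__backdrop': 1, 'sg_sequence__bunny_010': 1}
new_data = (pandas.DataFrame([row], index=['xxx'])
            .reindex(columns=model_columns, fill_value=0))
print(new_data.values.tolist())  # [[1, 0, 1, 0]]
```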
Requirements:
- Shotgun API v3+.
- Python v2.7.
- pandas v0.22+.
- scikit-learn v0.19.1+.
A proper description will be delivered later to replace the temporary notes below.
The current sample config focuses on 'cost' investigation, so 'cost' input values are needed for testing. The simple API below can be used to push dummy cost data to a sample Shotgun site. After getting a 30-day trial Shotgun site, the process below can be used to push dummy tags and descriptions.
# Set the package root directory.
import os
import sys
os.environ['ML_FOR_SG_ROOT'] = r'Z:\Dev\Github\ml-for-sg'
sys.path.append(r'Z:\Dev\Github\ml-for-sg')
# Create shotgun data manager.
from lib.source_generator.shotgun.source_generator import ShotgunSourceGenerator
shotgun_source_generator = ShotgunSourceGenerator()
handler = shotgun_source_generator.handler
from lib.utils.dummy_data_generation import *
for shot in ['bunny_010_0010', 'bunny_010_0020', 'bunny_010_0030', 'bunny_010_0040', 'bunny_010_0050']:
    task_sources = get_raw_task_source(handler, shot)
    register_sample_sg_data(handler, task_sources)
    register_heavy_feature_tag(handler, task_sources)
After this operation, the Shotgun site will contain dummy tag and description values.