XGBoost

25/09/2018

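Added a feature to the training data: water level (R047).
Label: use the diff (the change from the previous time step) instead of the raw value.
Train the offline model in time order: 2 months of data for the base model, then the following month of data for the time series training process.
Evaluation: take the result value at the previous time step plus the diff (the current label) as the result at the current time step, then compute the mean absolute error (MAE).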
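
The diff label, the time-ordered split, and the evaluation step can be sketched as follows. This is a minimal standard-library illustration, not the code in `training.py` or `testing.py`: the sample readings, the cutoff date, the placeholder predictions, and all function names are hypothetical, and the sketch assumes the "value at the previous time step" is the observed value rather than an earlier prediction.

```python
from datetime import datetime

# Hypothetical readings as (timestamp, value); the real data lives in the CSV files.
readings = [
    (datetime(2018, 6, 1), 1.00),
    (datetime(2018, 7, 1), 1.20),
    (datetime(2018, 8, 1), 1.10),
    (datetime(2018, 8, 15), 1.40),
]


def diff_labels(values):
    """Label at step t is values[t] - values[t - 1]; the first step has no label."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


def split_by_time(rows, cutoff):
    """Sort by timestamp, then split: rows before the cutoff feed the base model,
    rows at or after the cutoff are kept for the later training stage."""
    ordered = sorted(rows, key=lambda row: row[0])
    base = [row for row in ordered if row[0] < cutoff]
    later = [row for row in ordered if row[0] >= cutoff]
    return base, later


def reconstruct(previous_values, predicted_diffs):
    """Current value = value at the previous time step + predicted diff."""
    return [prev + d for prev, d in zip(previous_values, predicted_diffs)]


def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


values = [value for _, value in readings]
labels = diff_labels(values)
base, later = split_by_time(readings, cutoff=datetime(2018, 8, 1))

# Placeholder model output: one predicted diff per labelled time step.
predicted_diffs = [0.25, -0.05, 0.20]
y_hat = reconstruct(values[:-1], predicted_diffs)
error = mae(values[1:], y_hat)
print(f"MAE: {error:.4f}")
```

Splitting on a timestamp cutoff rather than shuffling keeps later observations out of the base model's training data, which matches the "in time order" requirement above.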

Data Exploration

a: Time series data for chlorine and flow

b: Time series data truncated to the water-inlet periods

File Name	Content	Source file
output_2_days_water_inlet.csv	2 periods of data	03_2 _days _data

File Name	Content	Source file
output_3_files.csv	pump open data	02_pre-processing
output_all_data.csv	all data	02_pre-processing
output_open_4_data.csv	pump open with levels data	02_pre-processing
output_all_4_data.csv	all data with levels	02_pre-processing
modelling_dataset.csv	original label	04_build_dataset
modelling_diff_dataset.csv	diff label	04_build_dataset
modelling_only_open_diff_dataset_levels.csv	diff label only open data with levels	06_build_dataset_with_4_files
modelling_diff_dataset_levels.csv	diff label with levels and 2h close data	06_build_dataset_with_4_files

Train and save the best model

Use the saved model for prediction

Content	File Name
test_y_hat_file	test_y_hat.csv
test_y_file	test_y.csv
prediction_file	prediction_result.csv (columns: y_hat, y, diff)
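
A file with the `prediction_result.csv` column layout could be produced roughly as below. This is only a sketch: the source does not define the `diff` column, so the code assumes it is `y_hat - y`, and `write_prediction_rows` and the sample values are hypothetical. An in-memory buffer stands in for the real file.

```python
import csv
import io


def write_prediction_rows(handle, y_hat, y):
    """Write one row per sample with the columns y_hat, y, diff.

    Assumption: diff is the prediction minus the true value.
    """
    writer = csv.writer(handle)
    writer.writerow(["y_hat", "y", "diff"])
    for predicted, actual in zip(y_hat, y):
        writer.writerow([predicted, actual, predicted - actual])


# In-memory stand-in for open("prediction_result.csv", "w", newline="").
buffer = io.StringIO()
write_prediction_rows(buffer, [1.5, 2.0], [1.0, 2.5])
lines = buffer.getvalue().splitlines()
print(lines)
```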
