- Split the data into training and test sets (test_size = 0.25)
- Run 3-fold cross-validation on the training data, so that each validation fold is the same size as the final test set (25% of the full data)
- Apply the best parameters to the test data
- Compare the results with those obtained using the default parameters
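The protocol above can be sketched with scikit-learn roughly as follows. This is a minimal sketch under assumptions the notes do not state: the toy dataset, One-Class SVM as the tuned model, F1 on the anomaly class as the cross-validation score, and stratified, shuffled folds. Labels follow scikit-learn's outlier convention (+1 = normal, -1 = anomaly). With 945 toy points, the test set gets 237 rows and each of the three validation folds about 236, which is the size match the second step relies on.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import OneClassSVM


def make_toy_data(n_normal=900, n_anomalies=45, seed=0):
    """Hypothetical stand-in for the real data: one Gaussian blob of normal
    points plus uniformly scattered anomalies (+1 = normal, -1 = anomaly)."""
    rng = np.random.RandomState(seed)
    X_norm, _ = make_blobs(n_samples=n_normal, centers=[[0.0, 0.0]], random_state=seed)
    X_anom = rng.uniform(low=-8.0, high=8.0, size=(n_anomalies, 2))
    X = np.vstack([X_norm, X_anom])
    y = np.concatenate([np.ones(n_normal, dtype=int), -np.ones(n_anomalies, dtype=int)])
    return X, y


def tune_and_compare(X, y, nu_grid, random_state=1):
    # 1. Hold out 25% of the data as the final test set.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)
    # 2. 3-fold CV on the training data: each validation fold holds 1/3 of
    #    75%, i.e. about 25% of the full data, matching the test set size.
    #    Stratified, shuffled folds are an assumption.
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=random_state)
    # Assumed CV score: F1 on the anomaly class (-1).
    scorer = make_scorer(f1_score, pos_label=-1)
    search = GridSearchCV(OneClassSVM(), {"nu": nu_grid}, scoring=scorer, cv=cv)
    search.fit(X_train, y_train)  # OneClassSVM ignores y; only the scorer uses it
    # 3. Apply the best parameters (refit on the whole training set) to the test set.
    tuned_f1 = f1_score(y_test, search.predict(X_test), pos_label=-1)
    # 4. Baseline: the same model with its default parameters.
    default_model = OneClassSVM().fit(X_train)
    default_f1 = f1_score(y_test, default_model.predict(X_test), pos_label=-1)
    return {"best_params": search.best_params_, "tuned_f1": tuned_f1,
            "default_f1": default_f1, "n_train": len(X_train), "n_test": len(X_test)}


X, y = make_toy_data()
result = tune_and_compare(X, y, nu_grid=[0.045, 0.09, 0.135, 0.18])
print(result)
```

The nu grid points here are hypothetical; the notes only give the bounds.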
-
- Dataset: SF 10%
- Anomaly rate : 4.5%
- random_state : 1
- nu : tuned using GridSearchCV [0.045, 0.18]
- kernel : default
- gamma : default
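Assuming these blocks use scikit-learn's `OneClassSVM` (nu, kernel and gamma are its parameters), "default" means the constructor defaults: `kernel="rbf"` and `gamma="scale"` (the gamma default since version 0.22). The listed nu bounds, 0.045 and 0.18, are 1× and 4× the 4.5% anomaly rate, just as [0.005, 0.02] is for the 0.5% rate. The hypothetical helper below builds a grid between those bounds; how many points were actually searched is not recorded, so `n_points=4` is an assumption.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def nu_grid_from_rate(anomaly_rate, upper_factor=4.0, n_points=4):
    """Hypothetical helper: n_points evenly spaced nu values from the anomaly
    rate up to upper_factor times it (0.045 -> 0.045 ... 0.18)."""
    grid = np.linspace(anomaly_rate, upper_factor * anomaly_rate, n_points)
    # Round away floating-point noise such as 0.09000000000000001.
    return [round(float(v), 6) for v in grid]


grid = nu_grid_from_rate(0.045)
# "kernel : default" and "gamma : default" mean the constructor defaults.
model = OneClassSVM(nu=grid[0])
print(grid, model.kernel, model.gamma)
```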
-
- Dataset: SF 10%
- Anomaly rate : 4.5%
- random_state : 2
- nu : tuned using GridSearchCV [0.045, 0.18]
- kernel : poly
- gamma : default
- degree : default
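This block switches to `kernel="poly"` while leaving gamma and degree at their defaults. Assuming scikit-learn's `OneClassSVM`, the polynomial kernel is K(x, x') = (γ⟨x, x'⟩ + coef0)^degree with defaults `degree=3`, `coef0=0.0` and `gamma="scale"`, which resolves to 1 / (n_features · Var(X)). The sketch rebuilds that Gram matrix by hand on random toy data and checks it against `sklearn.metrics.pairwise.polynomial_kernel`; that function has its own, different defaults (`coef0=1`, `gamma=None`), so both values are passed explicitly.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel
from sklearn.svm import OneClassSVM


def default_poly_gram(X):
    """Gram matrix that OneClassSVM(kernel="poly") uses with default
    gamma ("scale"), degree (3) and coef0 (0.0)."""
    params = OneClassSVM(kernel="poly").get_params()
    gamma = 1.0 / (X.shape[1] * X.var())  # what gamma="scale" resolves to
    return (gamma * X @ X.T + params["coef0"]) ** params["degree"]


rng = np.random.RandomState(2)
X = rng.normal(size=(20, 3))  # toy data
K = default_poly_gram(X)
K_ref = polynomial_kernel(X, degree=3, gamma=1.0 / (3 * X.var()), coef0=0.0)
print(np.allclose(K, K_ref))
```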
-
- Dataset: SF 10%
- Anomaly rate : 4.5%
- random_state : 3
- nu : tuned using GridSearchCV [0.045, 0.18]
- kernel : default
- gamma : default
-
- Dataset: SF 100%
- Anomaly rate : 0.5%
- random_state : 1
- contamination : tuned using GridSearchCV [0.005, 0.2]
- n_estimators : default
- max_samples : default
-
- Dataset: SF 100%
- Anomaly rate : 0.5%
- random_state : 2
- contamination : tuned using GridSearchCV [0.005, 0.02]
- n_estimators : default
- max_samples : default
-
- Dataset: SF 100%
- Anomaly rate : 0.5%
- random_state : 3
- contamination : tuned using GridSearchCV [0.005, 0.02]
- n_estimators : default
- max_samples : default
-
- Dataset: SF 10%
- Anomaly rate : 4.5%
- random_state : 1
- contamination : tuned manually [0.045, 0.02]
- n_estimators : default
- max_samples : default
-
- Dataset: SF 20%
- Anomaly rate : 0.5%
- random_state : 2
- contamination : tuned manually [0.005, 0.02]
- n_estimators : default
- max_samples : default
-
- Dataset: SF 50%
- Anomaly rate : 0.5%
- random_state : 2
- contamination : tuned manually [0.005, 0.02]
- n_estimators : default
- max_samples : default
-
- Dataset: SF 70%
- Anomaly rate : 0.5%
- random_state : 3
- contamination : tuned manually [0.005, 0.02]
- n_estimators : default
- max_samples : default
-
- Dataset: SF 100%
- Anomaly rate : 0.5%
- random_state : 4
- contamination : tuned manually [0.005, 0.02]
- n_estimators : default
- max_samples : default
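The notes do not record how the manual tuning was done. One cheap way to scan contamination by hand, sketched here as an assumption rather than the method actually used: in scikit-learn's `IsolationForest`, contamination does not change the trees and only sets `offset_` to the matching percentile of the training scores. A single forest can therefore be fitted once, and each candidate value emulated by moving the threshold, scoring F1 on a held-out validation set. The function name, the F1 metric and the validation split are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score


def sweep_contamination(X_train, X_val, y_val, grid, random_state=1):
    """Hypothetical manual sweep. Fits one forest, then emulates each
    contamination value c by thresholding at the c-th percentile of the
    training scores. y_val uses +1 = normal, -1 = anomaly."""
    forest = IsolationForest(random_state=random_state).fit(X_train)
    train_scores = forest.score_samples(X_train)
    val_scores = forest.score_samples(X_val)
    results = {}
    for c in grid:
        # Same threshold that IsolationForest(contamination=c) stores in offset_.
        threshold = np.percentile(train_scores, 100.0 * c)
        y_pred = np.where(val_scores < threshold, -1, 1)
        results[c] = f1_score(y_val, y_pred, pos_label=-1, zero_division=0)
    best = max(results, key=results.get)
    return best, results


# Toy data (hypothetical): about 1% anomalies in both splits.
rng = np.random.RandomState(0)
X_train = np.vstack([rng.normal(size=(500, 3)), rng.uniform(-6, 6, size=(5, 3))])
X_val = np.vstack([rng.normal(size=(200, 3)), rng.uniform(-6, 6, size=(2, 3))])
y_val = np.concatenate([np.ones(200, dtype=int), -np.ones(2, dtype=int)])
best, results = sweep_contamination(X_train, X_val, y_val, [0.005, 0.01, 0.02])
print(best, results)
```

Fitting once and sweeping thresholds avoids refitting the forest for every candidate value, which is what a naive loop over `IsolationForest(contamination=c)` would do.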
-
- Dataset: SF 100%
- Anomaly rate : 0.5%
- random_state : 1
- contamination : tuned using GridSearchCV [0.005, 0.02]
- n_estimators : default
- max_samples : default
-
- Dataset: SF 20%
- Anomaly rate : 4.5%
- random_state : 2
- contamination : tuned using GridSearchCV [0.045, 0.18]
- n_estimators : default
- max_samples : default
-
- Dataset: SF 50%
- Anomaly rate : 10%
- random_state : 3
- contamination : tuned using GridSearchCV [0.01, 0.04]
- n_estimators : default
- max_samples : default
-
- Dataset: SF 100%
- Anomaly rate : 20%
- random_state : 4
- contamination : tuned using GridSearchCV [0.2, 0.8]
- n_estimators : default
- max_samples : default
- Starting in Week 7, the algorithms were tuned manually because GridSearchCV did not yield good enough results
- Split the original SA dataset into normal and abnormal data
- Frac
- Script: notebook/Increasing the number of normal data.ipynb
We start with the 1% SA dataset, which is then split into normal and abnormal data
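A minimal pandas sketch of this splitting step, assuming the data sits in a DataFrame with a label column. The column name `label` and the normal label `"normal."` are assumptions modelled on KDD Cup 99 labels (loaded via scikit-learn's `fetch_kddcup99`, the labels may come as bytes such as `b"normal."`). The hypothetical second function keeps every abnormal row and samples a fraction of the normal rows with `DataFrame.sample(frac=...)`, so raising `frac` adds normal data and lowers the anomaly rate.

```python
import pandas as pd


def split_normal_abnormal(df, label_col="label", normal_label="normal."):
    """Split a labelled frame into normal and abnormal rows.
    label_col and normal_label are assumptions, not taken from the notebook."""
    is_normal = df[label_col] == normal_label
    return df[is_normal], df[~is_normal]


def rebuild_with_normal_fraction(normal, abnormal, frac, random_state=1):
    """Hypothetical helper: keep all abnormal rows, sample `frac` (<= 1) of the
    normal rows, shuffle, and report the resulting anomaly rate."""
    sampled = normal.sample(frac=frac, random_state=random_state)
    combined = pd.concat([sampled, abnormal]).sample(frac=1.0, random_state=random_state)
    anomaly_rate = len(abnormal) / len(combined)
    return combined.reset_index(drop=True), anomaly_rate


# Toy frame (hypothetical): 90 normal rows and 10 attack rows.
df = pd.DataFrame({"x": range(100), "label": ["normal."] * 90 + ["smurf."] * 10})
normal, abnormal = split_normal_abnormal(df)
half, rate_half = rebuild_with_normal_fraction(normal, abnormal, frac=0.5)
full, rate_full = rebuild_with_normal_fraction(normal, abnormal, frac=1.0)
print(len(half), rate_half, len(full), rate_full)
```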
Week 8 : Worked on
Phan Trung Thành