This section contains commonly used basic machine learning techniques.
Where applicable, cross-validation and feature importance are included.
- Classification
- Random Forest
- Decision Tree
- Naive Bayes
    - AdaBoost
- Logistic Regression
- sklearn
        - statsmodels
- (includes ROC and AUC scores)
- SVC (Polynomial, RBF, Sigmoid, Linear)
- Gradient Boosting
- XGBoost
- Neural Network
- Voting
- LSTM
- Evaluations
- Confusion matrix
- Classification report
- Accuracy (cv training sets)
- Accuracy (manual)
- Accuracy (sklearn)
- F1
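A minimal sketch of the classification-plus-evaluation flow listed above (cross-validated accuracy, confusion matrix, classification report, F1), using a Random Forest on a synthetic dataset that stands in for the notebooks' real data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, f1_score)

# Synthetic binary-classification data standing in for a real dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Cross-validated accuracy on the training set
cv_scores = cross_val_score(clf, X_train, y_train, cv=5, scoring="accuracy")

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Hold-out evaluations
cm = confusion_matrix(y_test, y_pred)
acc = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(classification_report(y_test, y_pred))
```

The same evaluation calls apply unchanged to any of the classifiers listed above; only the estimator swaps out.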
- Regression
- Random Forest
- Lasso Regression (L1)
- Ridge Regression (L2)
- Neural Network (keras)
- LSTM
- Evaluations
- MSE
- To be updated
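A short sketch of the regularised regression entries above, comparing Lasso (L1) and Ridge (L2) with MSE as the evaluation metric; the dataset is synthetic:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic regression data standing in for a real dataset
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# L1 (Lasso) vs. L2 (Ridge) regularised linear regression
for name, model in [("Lasso (L1)", Lasso(alpha=1.0)),
                    ("Ridge (L2)", Ridge(alpha=1.0))]:
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: MSE = {mse:.2f}")
```

Lasso tends to zero out weak coefficients (useful as implicit feature selection), while Ridge shrinks them smoothly.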
This section contains commonly used oversample/undersample techniques.
- Classification
- SMOTE
- ADASYN
- Regression
- SMOTER
This section contains commonly used pre-processing techniques including but not limited to:
- Describing datasets
- value_counts
- missing values
- PDPBox
- Manipulation
- datetime values
- data types
- splitting & concat
- merge / join
- sorting
- renaming / mapping
- reordering categorical variables
    - discretizing
- using .at
- using if/elif (user-defined breaks)
- equal width bins
- equal frequency bins
- k-means (silhouette or ssd)
- row/column conditional selection
- dropping / appending
- imputation
- encoding
- Onehot
- mean/target
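The discretizing entries above can be sketched with pandas: equal-width bins via `pd.cut`, equal-frequency bins via `pd.qcut`, and user-defined breaks as an alternative to if/elif logic (the age data here is illustrative):

```python
import pandas as pd

# Illustrative numeric column to discretize
ages = pd.Series([2, 7, 15, 21, 36, 45, 58, 63, 79, 88])

# Equal-width bins: each bin spans the same value range
width_bins = pd.cut(ages, bins=4)

# Equal-frequency bins: each bin holds (roughly) the same number of rows
freq_bins = pd.qcut(ages, q=4)

# User-defined breaks with labels, replacing manual if/elif branching
labels = pd.cut(ages, bins=[0, 18, 65, 120],
                labels=["minor", "adult", "senior"])
print(labels.value_counts())
```

`pd.cut` is also the usual tool behind the "binned" plotting entry further down.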
- Plotting
- all categorical
- all numerical
- distplot
- binned
- Train test split
- Scaling
    - StandardScaler
- MinMaxScaler
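A quick comparison of the two scalers listed above on a toy array: StandardScaler centers each column to zero mean and unit variance, while MinMaxScaler rescales each column into [0, 1]:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Two columns on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

# Zero mean, unit variance per column
X_std = StandardScaler().fit_transform(X)

# Each column rescaled to the [0, 1] range
X_mm = MinMaxScaler().fit_transform(X)

print(X_std.mean(axis=0))                  # ~[0, 0]
print(X_mm.min(axis=0), X_mm.max(axis=0))  # [0, 0] [1, 1]
```

Fit the scaler on the training split only, then `transform` the test split, to avoid leaking test-set statistics.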
- Others
- pivot tables / crosstabs
- binning
- importing list of tiles
- startswith, endswith, contains
- np.where (adding new column based on conditions)
- extract first # digits
- duplicates
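Several of the "Others" entries can be shown in one small pandas sketch (the DataFrame contents are illustrative): conditional column creation with `np.where`, string-based row selection, and duplicate removal:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["alpha", "beta", "alphabet", "gamma", "beta"],
                   "score": [10, 25, 40, 55, 25]})

# Add a new column based on a condition (np.where)
df["grade"] = np.where(df["score"] >= 30, "high", "low")

# String-based row selection
starts = df[df["name"].str.startswith("alpha")]
contains = df[df["name"].str.contains("bet")]

# Drop fully duplicated rows (the second "beta" row)
deduped = df.drop_duplicates()
print(df)
```

`str.endswith` works the same way as `str.startswith`; `str.contains` also accepts regular expressions.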