# Example no. 1
#       Known as a sparse model: Lasso ends up using fewer features than it is given.
# Ridge penalises the squared coefficients (L2) while Lasso penalises their absolute values (L1)
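A quick sketch of the sparsity point above: on synthetic data with only a few informative features, Lasso drives the uninformative coefficients exactly to zero, while Ridge only shrinks them. The dataset shape and `alpha` values here are illustrative, not tuned.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

# 10 features, but only 3 actually drive the target
X, y = make_regression(n_samples=100, n_features=10,
                       n_informative=3, noise=5.0, random_state=1)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Lasso (L1) sets uninformative coefficients exactly to zero (sparse);
# Ridge (L2) only shrinks them towards zero
print("Ridge coefficients equal to zero:", np.sum(ridge.coef_ == 0))
print("Lasso coefficients equal to zero:", np.sum(lasso.coef_ == 0))
```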

# Train test split
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    random_state=1)
# Check shapes after train test split
X_train.shape, X_test.shape, y_train.shape, y_test.shape

# Scaling the Features (Normalisation & Standardisation)
# Normalise - rescale all values to be between 0 & 1 (sensitive to outliers)
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # transform only - reuse the scaler fitted on the training data

# Standardise - subtract the mean & divide by std deviation (robust to outliers) *Recommended for most cases
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # transform only - refitting on the test set leaks test statistics

# Ensemble Methods (combination models):
#   Heterogeneous Models - combine models of diff types
#       Voting - use the aggregated predictions from a group of models
#       Stacking - good when diff models have diff strengths on the same dataset
#               - predictions/errors are uncorrelated
#   Homogeneous Models - combine models of the same type
#       Bagging - (Bootstrap Aggregating)
#               - multiple copies of the same model are trained on diff bootstrap subsets of the training data
#               - models are trained in parallel
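The three ensemble types above can be sketched with scikit-learn's built-in classes. The dataset and hyperparameters are illustrative placeholders, not tuned choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import VotingClassifier, StackingClassifier, BaggingClassifier

X, y = make_classification(n_samples=500, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=1)

# Heterogeneous - Voting: aggregate predictions from models of diff types
voting = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=1)),
    ("knn", KNeighborsClassifier()),
])

# Heterogeneous - Stacking: a final meta-model learns from the
# base models' (ideally uncorrelated) predictions
stacking = StackingClassifier(estimators=[
    ("dt", DecisionTreeClassifier(random_state=1)),
    ("knn", KNeighborsClassifier()),
], final_estimator=LogisticRegression(max_iter=1000))

# Homogeneous - Bagging: many copies of the same model,
# each trained on a bootstrap sample of the training data
bagging = BaggingClassifier(DecisionTreeClassifier(random_state=1),
                            n_estimators=50, random_state=1)

for name, model in [("voting", voting), ("stacking", stacking), ("bagging", bagging)]:
    model.fit(X_train, y_train)
    print(name, "accuracy:", model.score(X_test, y_test))
```

`BaggingClassifier` also accepts `n_jobs=-1` to train the bootstrap models in parallel, which is the "trained in parallel" point in the notes above.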