Foroozani/Machine-Learning

Machine-Learning: algorithm examples

Python 3.6+ Python 3.7+ Python 3.8+ keras Tensorflow Build Status


Regularization and Overfitting in Linear Regression

Overfitting: a very common problem in machine learning (ML). When the model is much more complex than it needs to be (for example, it uses too many features), it may perform very well on the training data but very badly on new (unseen) data. In such situations, the model cannot generalize well. There are several methods to tackle this problem:

How to prevent overfitting?

The possible solutions are:

  • To simplify the model by selecting one with fewer parameters, or by reducing the number of features
  • Regularization (see the sketch after this list)
    • L1 Regularization
    • L2 Regularization
  • Dropout Regularization
  • Normalizing input
  • Early stopping
  • Data Augmentation: To gather more training data
  • Model Ensembles
  • Cross-validation
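
A minimal sketch of two of these ideas (L2 regularization and early stopping) with the Keras API; the toy data, layer sizes, and penalty strength below are placeholders, not the repository's actual configuration:

```python
# Minimal sketch: L2 weight regularization plus early stopping in Keras.
import numpy as np
from tensorflow import keras

# Toy data standing in for a real dataset
X_train = np.random.rand(500, 20).astype("float32")
y_train = np.random.rand(500, 1).astype("float32")

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu",
                       kernel_regularizer=keras.regularizers.l2(0.01)),  # L2 penalty on weights
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Early stopping halts training once the validation loss stops improving
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                           restore_best_weights=True)
model.fit(X_train, y_train, validation_split=0.2,
          epochs=100, callbacks=[early_stop], verbose=0)
```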

Optimization algorithm

  1. Mini-batch gradient descent (the mini-batch size is a hyperparameter; see the optimizer sketch after this list)
  2. Gradient descent with momentum
  3. RMSprop
  4. Adam optimization algorithm
  5. Learning rate decay
  6. Normalizing inputs
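
A minimal sketch of how these optimizers can be constructed with the Keras API; the learning rates and decay settings are illustrative, not tuned values from this repository:

```python
# Minimal sketch of the optimizers listed above, using the Keras API.
from tensorflow import keras

sgd_momentum = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)  # gradient descent with momentum
rmsprop = keras.optimizers.RMSprop(learning_rate=0.001)
adam = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)

# Learning rate decay: exponentially shrink the step size during training
lr_schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=1000, decay_rate=0.96)
sgd_decay = keras.optimizers.SGD(learning_rate=lr_schedule)
```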


What Is Data Normalization, and Why Do We Need It?

The process of standardizing and rescaling data is called "data normalization." It is a pre-processing step: data often arrives in different formats and on very different scales, so you should rescale the values to fit into a particular range (for example, zero mean and unit variance) to achieve better convergence.
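
A minimal sketch of rescaling features to zero mean and unit variance with scikit-learn's StandardScaler; the toy feature values are made up for illustration:

```python
# Minimal sketch: standardize features that live on very different scales.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])   # two features with very different ranges

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)   # each column now has mean 0 and std 1
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```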

What Is Dropout and Batch Normalization?

Dropout is a technique that randomly drops hidden and visible units of a network during training to prevent overfitting (typically around 20 percent of the nodes). It roughly doubles the number of iterations needed for the network to converge. Batch normalization is a technique to improve the performance and stability of neural networks by normalizing the inputs to every layer so that they have a mean activation of zero and a standard deviation of one.
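
A minimal sketch of Dropout and BatchNormalization layers inside a Keras model; the layer sizes and the 20 percent dropout rate are illustrative:

```python
# Minimal sketch: Dropout and BatchNormalization in a Keras model.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(128, activation="relu"),
    keras.layers.BatchNormalization(),   # normalize the activations of the previous layer
    keras.layers.Dropout(0.2),           # randomly drop ~20% of units during training
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(1),
])
```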

What Will Happen If the Learning Rate Is Set Too Low or Too High?

  • When the learning rate is too low, training progresses very slowly because we make only minimal updates to the weights; it takes many updates before reaching the minimum point.
  • If the learning rate is set too high, the drastic weight updates cause undesirable divergent behavior in the loss function: the model may fail to converge (it cannot produce a good output) or even diverge (the loss grows without bound). See the toy example after this list.
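
A toy illustration (not taken from the repository) of gradient descent on f(w) = w^2 with three different learning rates, showing slow progress versus divergence:

```python
# Minimal sketch: gradient descent on f(w) = w**2 with different learning rates.
def gradient_descent(lr, steps=20, w=1.0):
    for _ in range(steps):
        grad = 2 * w          # derivative of w**2
        w = w - lr * grad
    return w

print(gradient_descent(lr=0.001))  # too low: w barely moves toward the minimum at 0
print(gradient_descent(lr=0.1))    # reasonable: w approaches 0
print(gradient_descent(lr=1.5))    # too high: updates overshoot and w diverges
```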

Hyperparameter tuning

  • We need to tune our hyperparameters to get the best out of them.
  • Important hyperparameters include:
  1. Learning rate alpha
  2. momentum beta
  3. Mini-batch size
  4. Number of hidden units
  5. Number of layers
  6. Learning rate decay
  7. Regularization lambda
  8. Activation function
  9. Adam beta1 & beta2

It is hard to decide which hyperparameter is the most important; it depends a lot on your problem. One way to tune is to sample a grid of N hyperparameter settings and then try all combinations of settings on your problem. One can also use coarse-to-fine sampling, as sketched below.
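
A minimal sketch of coarse-to-fine random sampling of the learning rate on a log scale; the ranges and sample counts are illustrative:

```python
# Minimal sketch: coarse-to-fine sampling of the learning rate alpha.
import numpy as np

rng = np.random.default_rng(0)

# Coarse pass: sample alpha between 1e-4 and 1e0 on a log scale
coarse = 10 ** rng.uniform(-4, 0, size=10)

# Fine pass: zoom into the region that worked best in the coarse pass
# (here assumed to be around 1e-3 .. 1e-2)
fine = 10 ** rng.uniform(-3, -2, size=10)

print(np.sort(coarse))
print(np.sort(fine))
```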


Support Vector Machines

The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor), so 150 total samples. Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.

We can also use scikit-learn's "GridSearchCV" to improve the accuracy, as sketched below.
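
A minimal sketch of an SVM on the Iris data set with GridSearchCV; the parameter grid and split are illustrative, not the notebook's exact values:

```python
# Minimal sketch: SVM classifier on Iris with a grid search over C and gamma.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1, 0.1, 0.01, 0.001]}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

print(grid.best_params_)
print(grid.score(X_test, y_test))
```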


K-Means algorithm

K-Means clustering is an unsupervised learning algorithm that tries to cluster data based on their similarity. For this example, we first create an artificial data set and then use the scikit-learn library to cluster the data.

K-Means example
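
A minimal sketch along these lines, using make_blobs to stand in for the artificial data and scikit-learn's KMeans for the clustering:

```python
# Minimal sketch: cluster artificial data with KMeans.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=4, cluster_std=1.0, random_state=42)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)
```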

Neural-Nets

Keras Regression: in this example I use a TensorFlow/Keras regression model to predict house prices from the data. As you can guess, there are various methods to achieve this, and each method has its pros and cons.

The data set can be found at Kaggle: visit the website
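
A minimal sketch of a Keras regression model for this kind of task; the placeholder arrays, feature count, and layer sizes are assumptions, not the notebook's actual pipeline:

```python
# Minimal sketch: a small Keras regression network for price prediction.
import numpy as np
from tensorflow import keras

# Placeholder arrays standing in for the Kaggle housing features and prices
X_train = np.random.rand(1000, 19).astype("float32")
y_train = np.random.rand(1000, 1).astype("float32")

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),            # single output: predicted price
])
model.compile(optimizer="adam", loss="mse")
model.fit(X_train, y_train, validation_split=0.2, epochs=10, verbose=0)
```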


Principal Component Analysis (PCA)

Large datasets are increasingly common and are often difficult to interpret. Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance.
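
A minimal sketch of PCA with scikit-learn, reducing the Iris features to two components after standardizing them; the choice of data set is illustrative:

```python
# Minimal sketch: project standardized Iris features onto two principal components.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scales

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(pca.explained_variance_ratio_)  # fraction of variance captured by each component
```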

Some references: https://www.mit.edu/~amidi/l/fa/