This is the repository for my machine_learning academic coursework.
-
With an input image of size 416 × 416, we make detections on 3 scales: 13 × 13, 26 × 26 and 52 × 52.
-
The model produces a set of (t_x, t_y, t_w, t_h) where: (t_x, t_y) are the box center coordinates and (t_w, t_h) are the width and height of the box. 1/ You will have to compute the new coordinates (b_x, b_y, b_w, b_h) using this:
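For reference, a minimal sketch of the standard YOLOv3 decoding (sigmoid on the center offsets, exponential on the width/height, with the grid-cell offset (c_x, c_y) and the anchor prior (p_w, p_h)); the function and variable names here are mine, not the project's code:

```python
# Sketch of the standard YOLOv3 box decoding:
#   b_x = sigmoid(t_x) + c_x,  b_y = sigmoid(t_y) + c_y
#   b_w = p_w * exp(t_w),      b_h = p_h * exp(t_h)
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t_x, t_y, t_w, t_h, c_x, c_y, p_w, p_h, grid_size, input_size=416):
    # center coordinates, normalized by the grid size (13, 26 or 52)
    b_x = (sigmoid(t_x) + c_x) / grid_size
    b_y = (sigmoid(t_y) + c_y) / grid_size
    # width / height, normalized by the input image size
    b_w = p_w * np.exp(t_w) / input_size
    b_h = p_h * np.exp(t_h) / input_size
    return b_x, b_y, b_w, b_h

# example: one prediction in cell (6, 6) of the 13 x 13 grid, with a made-up anchor
print(decode_box(0.2, -0.1, 0.5, 0.3, c_x=6, c_y=6, p_w=116, p_h=90, grid_size=13))
```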
-
Normalization:
- LCS for NST intro
- Neural Style Transfer with Tensorflow
- LCS for NST tasks 0-1
- LCS for NST tasks 2 - Oumaima Merhben
- LCS for NST tasks 3-4
- LCS Linalg Einsum
- LCS for NST tasks 2 - Myriam Azzouz
Hidden Markov Models:
-
Some explanations and hints for the remaining tasks (proposed by @Fares Nadjar):
Previously you were dealing only with Markov chains, whether absorbing or not. Now you're dealing with Hidden Markov Models. What's the difference? The main one is that you have states that you observe (in this case 6 possible outfits) and hidden states (very cold, cold, neutral, hot and very hot).
- The hidden states follow the rules of a Markov chain, with transition probabilities between them: this is the Transition matrix in the main.
- The probability of wearing a certain outfit given the weather, P(outfit | weather), is known: this is the Emission matrix.
- Initial is the matrix of probabilities for the first hidden state.
- Observation is an array containing the observed outfits, generated randomly following the emission matrix.
Task 4: We're asked to implement the forward algorithm and return:
- P, the likelihood of the observations given the model (Observation will look like [0, 5, 4, 2, 3, 1, 5], with 0 to 5 the outfit observed, and P is the probability of getting this observation sequence)
- F, an array containing the forward path probabilities. Useful thing to understand: Bayes' formula, P(A|B) = P(B|A) x P(A) / P(B). Knowing this, we first need to distinguish whether we're at the initial step t = 0 or at t > 0:
If t = 0: the forward probability of a hidden state s is the initial probability of that hidden state, multiplied by the probability of wearing the observed outfit given that we're in that hidden state.
If t > 0: the forward probability of a hidden state s is the sum of the quantities z(s', s) over all previous states s', where z(s', s) is the forward probability of being in state s' at time t-1, multiplied by the transition probability of moving from s' to s, multiplied by the probability of wearing the outfit observed at time t given that we're in hidden state s.
This gives you the array F for each time t, and P is just the sum of the forward probabilities of each state at the last time step.
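As a rough illustration (not the project's actual code), here is a minimal NumPy sketch of the forward recursion described above, with made-up toy matrices just to show the shapes:

```python
# Forward algorithm sketch: 3 hidden states, 4 possible outfits (toy values).
import numpy as np

Transition = np.array([[0.7, 0.2, 0.1],     # P(next hidden state | current hidden state)
                       [0.3, 0.5, 0.2],
                       [0.1, 0.3, 0.6]])
Emission = np.array([[0.5, 0.3, 0.1, 0.1],  # P(outfit | hidden state)
                     [0.2, 0.4, 0.3, 0.1],
                     [0.1, 0.2, 0.3, 0.4]])
Initial = np.array([0.6, 0.3, 0.1])          # P(first hidden state)
Observation = np.array([0, 2, 1, 3])         # observed outfit indices

def forward(Observation, Emission, Transition, Initial):
    N = Transition.shape[0]          # number of hidden states
    T = Observation.shape[0]         # number of observations
    F = np.zeros((N, T))
    # t = 0: initial probability * probability of the first observed outfit
    F[:, 0] = Initial * Emission[:, Observation[0]]
    # t > 0: sum over previous states s' of F[s', t-1] * Transition[s', s],
    # times the emission probability of the outfit observed at time t
    for t in range(1, T):
        F[:, t] = (F[:, t - 1] @ Transition) * Emission[:, Observation[t]]
    P = F[:, -1].sum()               # likelihood of the whole observation sequence
    return P, F

P, F = forward(Observation, Emission, Transition, Initial)
print(P)
```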
Task 5: Viterbi. It's almost the same thing as the forward algorithm. The initial hidden state (t = 0) is handled exactly the same way. Then, instead of taking the sum over the states at t-1, we take the highest value, so we need to keep track of the value of the highest probability at t-1 and of its position (hint: np.max and np.argmax). Once we're done, the likelihood is just the highest probability at the last time step, and the path is recovered from the stored positions of the highest probabilities, which gives the hidden states we went through (a minimal sketch follows below the comparison). Main difference:
- Forward gives us the likelihood of a given observation sequence.
- Viterbi gives us the most probable sequence of hidden states (and its likelihood). For the backward algorithm, it's the exact same thing as forward but in reverse.
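And a matching Viterbi sketch, reusing the same toy matrices as the forward sketch above; again just an illustration, with np.max / np.argmax playing the roles described:

```python
# Viterbi sketch: keep the max (and its argmax) instead of the sum at each step.
import numpy as np

Transition = np.array([[0.7, 0.2, 0.1],
                       [0.3, 0.5, 0.2],
                       [0.1, 0.3, 0.6]])
Emission = np.array([[0.5, 0.3, 0.1, 0.1],
                     [0.2, 0.4, 0.3, 0.1],
                     [0.1, 0.2, 0.3, 0.4]])
Initial = np.array([0.6, 0.3, 0.1])
Observation = np.array([0, 2, 1, 3])

def viterbi(Observation, Emission, Transition, Initial):
    N = Transition.shape[0]
    T = Observation.shape[0]
    V = np.zeros((N, T))                  # best path probability ending in state s at time t
    back = np.zeros((N, T), dtype=int)    # best previous state, kept for backtracking
    V[:, 0] = Initial * Emission[:, Observation[0]]
    for t in range(1, T):
        # scores[s', s] = V[s', t-1] * Transition[s', s]
        scores = V[:, t - 1, None] * Transition
        back[:, t] = np.argmax(scores, axis=0)
        V[:, t] = np.max(scores, axis=0) * Emission[:, Observation[t]]
    # likelihood of the best path, then backtrack to recover the hidden states
    P = np.max(V[:, -1])
    path = [int(np.argmax(V[:, -1]))]
    for t in range(T - 1, 0, -1):
        path.insert(0, int(back[path[0], t]))
    return path, P

path, P = viterbi(Observation, Emission, Transition, Initial)
print(path, P)
```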
-
- Things can be fixed without disabling eager mode. We can use the add_loss method, which registers the loss on the model instead of passing it to compile, this way: vae.add_loss(vae_loss) then vae.compile(optimizer='adam') (see the sketch after the link below).
- tensorflow add_loss
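A rough sketch of that add_loss pattern, following the classic Keras functional-API VAE example; the layer sizes and the 784-dimensional input are assumptions, and the exact behaviour of add_loss with symbolic tensors can depend on the TF/Keras version:

```python
# Registering the VAE loss with add_loss instead of compile(loss=...).
import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim = 2
inputs = tf.keras.Input(shape=(784,))
h = layers.Dense(64, activation='relu')(inputs)
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)

# reparameterization trick: z = mean + sigma * epsilon
z = layers.Lambda(
    lambda p: p[0] + tf.exp(0.5 * p[1]) * tf.random.normal(tf.shape(p[0]))
)([z_mean, z_log_var])

h_dec = layers.Dense(64, activation='relu')(z)
outputs = layers.Dense(784, activation='sigmoid')(h_dec)
vae = Model(inputs, outputs)

# reconstruction term + KL divergence term
reconstruction = 784 * tf.reduce_mean(
    tf.keras.losses.binary_crossentropy(inputs, outputs))
kl = -0.5 * tf.reduce_mean(
    tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=-1))
vae_loss = reconstruction + kl

# register the loss on the model instead of passing it to compile()
vae.add_loss(vae_loss)
vae.compile(optimizer='adam')
```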
-
A very good comparison of popular LSTM variants. Conclusion: "The most commonly used LSTM architecture (vanilla LSTM) performs reasonably well on various datasets. None of the eight investigated modifications significantly improves performance." (The same holds for GRU.)
-
Implementing a recurrent neural network with back-propagation
- Attention in RNNs
- Attention VS Encoder Decoder
- Data preprocessing for deep learning: Tips and tricks to optimize your data pipeline using Tensorflow
- What Exactly Is Happening Inside the Transformer
- Dissecting BERT Part 2: BERT Specifics
-
Machine Learning Applied to Capital Markets Presentation - Francois Friggit: Presentation Summary
-
Question of the day: You are building an object detection algorithm using a neural network architecture with a final dense layer. You want to be able to detect multiple items simultaneously. Would you use softmax or sigmoid as the activation? Why?
- Answer: it actually depends on the algorithm you are using. As a friendly reminder: Sigmoid = multi-label classification problem = non-exclusive outputs. Softmax = multi-class classification problem = only one right answer = mutually exclusive outputs. classification sigmoid vs softmax / For example, in YOLO's case we use sigmoid at the end: Yolo
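A tiny NumPy illustration of that difference, with made-up logits: softmax makes the scores compete (they sum to 1), while sigmoid scores each class independently, which is what multi-label detection needs.

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.5])

softmax = np.exp(logits) / np.exp(logits).sum()
sigmoid = 1.0 / (1.0 + np.exp(-logits))

print(softmax, softmax.sum())  # ~[0.63 0.23 0.14], sums to 1 -> one winner
print(sigmoid)                 # ~[0.88 0.73 0.62], each class scored on its own
```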
Date of Session | Session Name | Session Recording Link | Cohort |
---|---|---|---|
2/19/21 | Transfer Learning on CIFAR 10 : Coding | video | 10 - 11- 12 |
4/20/21 | Autoencoders : Explanation Session | video | 10 - 11 |
4/23/21 | Autoencoders : Coding | video | 10 - 11 |
5/12/21 | NLP: Word Embedding Evaluation Metrics | video | 10 - 11 |
25/05/21 | Error Analysis: coding + explanation | video | 10 - 11- 12 |
28/05/21 | Regularization: Coding | video | 10 - 11- 12 |
16 July 2021 | Policy Gradients: Coding Session | video | 10 - 11 |
9 July 2021 | Temporal Difference: Coding Session | video | 10 - 11 |
6 July 2021 | Temporal Difference: Explanation | video | 10 - 11 |
22 June 2021 | Q_learning: Explanation | video | 10 - 11 |
11 June 2021 | CNN (Coding session) | video | 10 - 11- 12 |
19 July 2021 | Neural Style Transfer : Linalg Einsum | video | 10 - 11 - 12 |