asjad99/Yanga


About:

  • The real world is full of sequential (non-i.i.d.) data, e.g. time series such as stock prices, speech, and video.

  • If the underlying process is unknown, we can construct a model to predict the next word/event in the sequence.

  • In Natural Language Processing, sequence modeling is the task of predicting which word/letter comes next.

  • In sequence learning problems, the true output at time step t depends on all the inputs the model has seen up to time step t. Since we don't know the true relationship, we need to come up with an approximation that depends on all the previous inputs.


The key thing to note here is that the task does not change at every time step, whether we are predicting the next character or tagging the part of speech of a word. What changes at every time step is the input to the function, because for longer sentences it needs to keep track of a larger set of words. In other words, we need to define a function with the following characteristics:

  • Ensure that the output yₜ depends on the previous inputs

  • Ensure that the function can deal with a variable number of inputs

  • Ensure that the function executed at each time step is the same.

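To make these three requirements concrete before introducing RNNs, here is a toy sketch (illustrative only, not code from this repository): the "state" is just a running sum standing in for a learned hidden representation, yet the loop already has all three properties.

```python
# Toy illustration of the three requirements above: a fixed function is
# applied at every time step, and a carried state lets the output at step t
# depend on all of x_1 .. x_t, for sequences of any length.
def step(state, x):
    return state + x              # the same function at every time step

def predict(xs):                  # works for any sequence length
    state, outputs = 0, []
    for x in xs:
        state = step(state, x)    # state summarises all inputs seen so far
        outputs.append(state)     # output at step t depends on x_1 .. x_t
    return outputs

print(predict([1, 2, 3]))         # [1, 3, 6]
```

An RNN replaces the hand-written step function with a learned one, as described next.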

Recurrent Neural Networks (RNNs) are a type of neural network in which the output from the previous step is fed as input to the current step.

In an RNN, the output of one time step is fed as input, along with the original input, to the next time step. The input to the function at step i is denoted xᵢ. The weights associated with the input are denoted by U, and the hidden representation (sᵢ) of the word is computed as a function of the output of the previous time step and the current input, along with a bias. This recurrence, together with the output step, is spelled out in the sketch below.

Once we compute the hidden representation of the input, the final output (yᵢ) of the network is a softmax of the hidden representation and its associated weights, along with a bias. With this we have an approximate function that satisfies all three conditions we set out to solve the problems of sequence learning.
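The sketch below writes out that recurrence and output step in NumPy. Only U (the input weights), sᵢ, and yᵢ are named above; the recurrent weights W, output weights V, and biases b and c follow common RNN notation and are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs, U, W, V, b, c):
    """xs: list of input vectors x_i. Returns the output y_i for every step."""
    s = np.zeros(W.shape[0])              # hidden state, initialised to zeros
    ys = []
    for x in xs:
        s = np.tanh(U @ x + W @ s + b)    # s_i = f(U x_i + W s_{i-1} + b)
        ys.append(softmax(V @ s + c))     # y_i = softmax(V s_i + c)
    return ys
```

Note that the same U, W and V are reused at every step, the state sᵢ depends on every input seen so far, and the loop handles sequences of any length, which are exactly the three conditions listed earlier.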


Case Studies:

1. Recommender Engine

See the blog post: https://www.asjadk.io/music/

2. Decision Support for Context-aware Resource Allocation using LSTM networks:

The allocation of resources to process tasks can have a significant impact on the performance (such as cost and time) of those tasks, and hence of the overall process. Past resource allocation decisions, when correlated with process execution histories annotated with quality of service (or performance) measures, can be a rich source of knowledge about the best resource allocation decisions. The optimality of resource allocation decisions is not determined by the process instance alone, but also by the context in which these instances are executed. This phenomenon turns out to be even more compelling when the resources in question are human resources. Human workers with the same organizational role and capabilities can have heterogeneous behaviors based on their operational context. In this work, we propose an approach to supporting resource allocation decisions by extracting information about the process context and process performance from past process executions.

Dataset                               Configuration  Epochs    Train    Valid     Test
ResourceWithAmountContext             Medium         13       74.045  187.323  191.933
ResourceWithOUtContext                Medium         13        2.050    2.321    2.518
ResourceWithWL&AmountContext          Medium         13       77.944  161.044  164.025
ResourceWithAmountFamiliarityContext  Medium         13      404.591  498.988  517.228
ResourceWithAmountContext             Large          25       51.332  151.331  149.891
ResourceWithOUtContext                Large          25        2.050    2.271    2.518
ResourceWithWL&AmountContext          Large          25       48.452  135.032  137.056
ResourceWithAmountFamiliarityContext  Large          25      159.232  372.712  371.158

Results:

We trained LSTM networks of two sizes, with the configurations labelled medium and large in the table above. Both networks have two layers and are unrolled for 35 steps. At its core the model consists of an LSTM cell that processes one task (executed by a particular resource) at a time, and the network computes probability values for the next task in the sequence. The network's memory state is initialized with a vector of zeros and is updated after reading each task in the given sequence. Because of computational resource constraints we processed the data in mini-batches of size 20. The medium LSTM network has 200 units per layer and its parameters are initialized uniformly in [-0.1, 0.1]; it was trained for 13 epochs with a learning rate of 1.0, and after 4 epochs we reduce the learning rate by a factor of 0.8 after each epoch. The large LSTM network has 650 units per layer and its parameters are initialized uniformly in [-0.05, 0.05]; it was trained for 25 epochs with a learning rate of 1.0, and after 6 epochs we decrease the learning rate by a factor of 1/1.15 after each epoch. We clip the norm of the gradients (normalized by mini-batch size) of both networks at 5. Training a network takes about one and a half hours on a 2.4 GHz MacBook Pro.
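Since the implementation itself is not included in this README, the following is a minimal PyTorch sketch of the training setup just described. The class and variable names (TaskLSTM, vocab_size, batches) are illustrative assumptions; only the hyper-parameters (2 layers, 35 unrolled steps, mini-batches of 20, 200/650 units, epoch counts, learning-rate schedules, gradient clipping at 5) come from the text.

```python
import torch
import torch.nn as nn

# Hyper-parameters taken from the text; everything else is an assumption.
MEDIUM = dict(units=200, epochs=13, init=0.1,  decay=0.8,      decay_after=4)
LARGE  = dict(units=650, epochs=25, init=0.05, decay=1 / 1.15, decay_after=6)
NUM_LAYERS, UNROLL_STEPS, BATCH_SIZE, CLIP_NORM = 2, 35, 20, 5.0

class TaskLSTM(nn.Module):
    """Predicts the next task in a sequence of tasks executed by resources."""
    def __init__(self, vocab_size, units):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, units)
        self.lstm = nn.LSTM(units, units, num_layers=NUM_LAYERS, batch_first=True)
        self.out = nn.Linear(units, vocab_size)

    def forward(self, x, state=None):            # state=None -> zero-initialised memory
        h, state = self.lstm(self.embed(x), state)
        return self.out(h), state

def train(model, batches, cfg):
    # Uniform parameter initialisation in [-init, init], as described above.
    for p in model.parameters():
        nn.init.uniform_(p, -cfg["init"], cfg["init"])
    lr, loss_fn = 1.0, nn.CrossEntropyLoss()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for epoch in range(cfg["epochs"]):
        if epoch >= cfg["decay_after"]:          # decay the learning rate each epoch
            lr *= cfg["decay"]
            for g in opt.param_groups:
                g["lr"] = lr
        state = None                             # memory state starts at zeros
        for x, y in batches:                     # integer task IDs, shape (20, 35)
            logits, state = model(x, state)
            state = tuple(s.detach() for s in state)   # truncate backprop at the unroll length
            loss = loss_fn(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
            opt.zero_grad()
            loss.backward()
            nn.utils.clip_grad_norm_(model.parameters(), CLIP_NORM)  # clip gradient norm at 5
            opt.step()
```

For example, train(TaskLSTM(vocab_size=num_tasks, units=MEDIUM["units"]), batches, MEDIUM) would correspond to the medium configuration, assuming batches yields pairs of LongTensors of shape (20, 35).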
