04. Long Short Term Memory (LSTM)
Prerequisites
1. Recurrent Neural Network

<b>Recurrent Neural Networks</b> are hard to train because their gradients are unstable: backpropagating through many time steps multiplies the gradient repeatedly, so it decays or grows exponentially. This is the <span style="color:#FFD5D5">vanishing or exploding gradient</span> problem.
What is Long Short Term Memory (LSTM)
1. What is Long Short Term Memory (LSTM)?
A model with a structure similar to an RNN, but with an additional cell state that mitigates the gradient update problem over long sequences.
Structure
\[Input \rightarrow LSTM \rightarrow LSTM \rightarrow LSTM \rightarrow \dots \rightarrow Softmax \rightarrow Output\]
- The cell state makes the model more stable to train than a plain RNN.
- It can still suffer from vanishing or exploding gradients if the sequence is too long.
2. Why use Long Short Term Memory (LSTM)?
Keep memory in an additional state
The cell state acts as a highway that lets the gradient flow back through time with minimal decay.
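To make the highway idea concrete, here is a minimal sketch that compares how a gradient shrinks over many time steps. It relies on a simplification: along the direct cell state path, the gradient of $c_t$ with respect to $c_{t-1}$ is just the forget gate $f_t$, so the gradient over many steps is roughly the product of the forget gate values. The function name `path_gradient` and the constant values `forget_gate` and `rnn_factor` are assumptions chosen for illustration, not measurements from a trained model.

```python
def path_gradient(per_step_factor: float, steps: int) -> float:
    """Gradient scale after multiplying the same per-step factor `steps` times."""
    return per_step_factor ** steps

steps = 100
forget_gate = 0.99  # assumed: the forget gate stays close to 1
rnn_factor = 0.5    # assumed: per-step factor |w * tanh'(.)| in a vanilla RNN

lstm_grad = path_gradient(forget_gate, steps)
rnn_grad = path_gradient(rnn_factor, steps)
long_lstm_grad = path_gradient(forget_gate, 1000)

print(f"LSTM cell path, 100 steps:  {lstm_grad:.3e}")
print(f"Vanilla RNN path, 100 steps: {rnn_grad:.3e}")
print(f"LSTM cell path, 1000 steps: {long_lstm_grad:.3e}")
```

Under these assumptions the LSTM path keeps a usable gradient after 100 steps while the vanilla RNN path is effectively zero. The 1000-step case shows that the decay is slowed, not removed.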
3. How to use Long Short Term Memory (LSTM)?
Hidden State + cell state
- Forget gate: $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$
- Input gate: $i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$
- Candidate memory: $\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)$
- Cell update: $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$
- Output gate: $o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$
- Hidden state: $h_t = o_t \odot \tanh(c_t)$
| Component | Role |
|---|---|
| Forget gate | Decides what old memory to erase |
| Input gate | Decides what new info to store |
| Output gate | Controls exposure of memory |
| Cell state | Gradient highway |
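
The gate equations can be sketched as a single time step in plain Python. This is a minimal illustration using lists instead of a tensor library, not a reference implementation. The helper names (`affine`, `lstm_step`, `init_params`), the sizes, and the random initialization are all assumptions made for the example.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, U, b, x, h):
    """Compute W x + U h + b for list-based matrices and vectors."""
    return [
        sum(w * xj for w, xj in zip(W_row, x))
        + sum(u * hj for u, hj in zip(U_row, h))
        + b_k
        for W_row, U_row, b_k in zip(W, U, b)
    ]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step; params maps each gate name to a (W, U, b) tuple."""
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]          # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]          # input gate
    c_tilde = [math.tanh(z) for z in affine(*params["c"], x, h_prev)]  # candidate memory
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]          # output gate
    # Cell update: keep part of the old memory, add part of the candidate.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, c_tilde)]
    # Hidden state: expose a gated, squashed view of the cell state.
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c

def init_params(input_size, hidden_size, seed=0):
    """Hypothetical initializer: small uniform weights, zero biases."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
        for gate in ("f", "i", "c", "o")
    }

params = init_params(input_size=3, hidden_size=2)
h, c = [0.0, 0.0], [0.0, 0.0]
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for x in sequence:
    h, c = lstm_step(x, h, c, params)
print("h:", h)
print("c:", c)
```

The same `h` and `c` are carried from one step to the next, which matches the chain of LSTM blocks in the structure diagram. In practice a framework layer would replace this loop.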
If:
- $f_t \approx 1$
- $i_t \approx 0$
Then memory is preserved almost perfectly.
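
A small numeric check of this claim, as a sketch: apply the cell update $c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$ repeatedly with scalar values. The function name `run_cell` and the specific gate values are assumptions chosen for illustration; the candidate memory is fixed at $-1$ so that any overwriting is easy to see.

```python
def run_cell(c0: float, f: float, i: float, c_tilde: float, steps: int) -> float:
    """Apply c_t = f * c_{t-1} + i * c_tilde for `steps` steps with constant gates."""
    c = c0
    for _ in range(steps):
        c = f * c + i * c_tilde
    return c

# Gates near (f=1, i=0): the initial memory 1.0 is mostly kept.
preserved = run_cell(c0=1.0, f=0.999, i=0.001, c_tilde=-1.0, steps=50)

# Gates at (f=0.5, i=0.5): the memory is quickly replaced by the candidate.
overwritten = run_cell(c0=1.0, f=0.5, i=0.5, c_tilde=-1.0, steps=50)

print("preserved:  ", preserved)
print("overwritten:", overwritten)
```

With the first setting the cell state stays close to its starting value after 50 steps, while with the second it converges to the candidate value.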
4. What are the problems of Long Short Term Memory (LSTM)?
1. Vanishing Gradient
2. Exploding Gradient
The gradient problem still remains, but an LSTM handles it much better than a vanilla RNN.
This post is licensed under CC BY 4.0 by the author.

