The decoder
The decoder uses the context vector \( \vc \) to generate the output sequence \( \seq{\vy^{(1)}, \ldots, \vy^{(\dash{\tau})}} \).
It achieves this by learning the following function.
\begin{equation}
\vy^{(t)} = g_d(\vc, \vy^{(t-1)}, \vd^{(t)}), ~~\forall t=1,\ldots,\dash{\tau}
\label{eqn:decoder-output}
\end{equation}
where \( g_d \) is the decoding function learned by the model and \( \vd^{(t)} \) is the internal state of the decoder.
The decoder state itself follows a recurrence relation over the previous state, given by
\begin{equation}
\vd^{(t)} = f_d(\vc, \vy^{(t-1)}, \vd^{(t-1)}), ~~\forall t=1,\ldots,\dash{\tau}
\label{eqn:decoder-state}
\end{equation}
For example, the decoder could be modeled as an LSTM-based RNN, with \( \vd^{(t)} \) being the hidden state of that RNN.
The length of the output sequence, \( \dash{\tau} \), may differ from the length of the input sequence, \( \tau \).
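To make the recurrence concrete, the sketch below instantiates Equations~\eqref{eqn:decoder-output} and~\eqref{eqn:decoder-state} with an LSTM-based decoder, as suggested above. The choice of PyTorch, the \texttt{Decoder} class, the linear readout, and all dimensions are illustrative assumptions rather than part of the model specification.
\begin{verbatim}
# A minimal sketch, assuming PyTorch (not prescribed by the text).
# The LSTMCell plays the role of f_d, its hidden state h plays the
# role of d^(t), and a linear readout stands in for g_d.
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, output_dim, hidden_dim, context_dim):
        super().__init__()
        # f_d: consumes [y^(t-1); c] together with the previous
        # state d^(t-1), as in Equation (decoder-state)
        self.cell = nn.LSTMCell(output_dim + context_dim, hidden_dim)
        # g_d: here simplified to read d^(t) alone, since d^(t)
        # already conditions on c and y^(t-1) through the cell
        self.readout = nn.Linear(hidden_dim, output_dim)

    def forward(self, c, y0, tau_prime):
        batch = c.size(0)
        h = c.new_zeros(batch, self.cell.hidden_size)  # d^(0)
        s = c.new_zeros(batch, self.cell.hidden_size)  # LSTM cell state
        y_prev, outputs = y0, []
        for _ in range(tau_prime):  # tau' need not equal tau
            # d^(t) = f_d(c, y^(t-1), d^(t-1))
            h, s = self.cell(torch.cat([y_prev, c], dim=-1), (h, s))
            # y^(t) = g_d(c, y^(t-1), d^(t))
            y_prev = self.readout(h)
            outputs.append(y_prev)
        return torch.stack(outputs, dim=1)  # (batch, tau', output_dim)
\end{verbatim}
Note that the loop runs for \( \dash{\tau} \) steps regardless of the input length \( \tau \); the only coupling to the input sequence is through the fixed context vector \( \vc \).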