RNN: A refresher on notation
In a sequence modeling task, inputs appear as a sequence of elements \( \mX = \seq{\vx^{(1)},\ldots,\vx^{(\tau)}} \).
Each element of the sequence, \( \vx^{(t)} \in \real^N \), is a vector consisting of \( N \) features, \(\vx^{(t)} = [x_1^{(t)}, x_2^{(t)}, \ldots, x_N^{(t)}] \).
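For concreteness, here is a minimal NumPy sketch of this layout; the sequence length \( \tau = 5 \), feature count \( N = 3 \), and random values are illustrative assumptions, not from the text:

```python
import numpy as np

tau, N = 5, 3                 # hypothetical sequence length and feature count
X = np.random.randn(tau, N)   # row t holds the feature vector x^(t+1) in R^N
x_1 = X[0]                    # the first element of the sequence, x^(1)
```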
RNNs work on the principle of applying the same unit, a cell with the same parameters, at each time step to infer the state of the RNN.
This amounts to applying the same recurrent function \( f \) at every time step.
A typical recipe for defining functions in neural networks is this: multiply the input vector by a weight matrix, add a bias, and then apply an activation function to model nonlinearity in the output.
To infer the current state, a simple version of the function \( f \) is no different, as the following definition shows.
\begin{align}
\vh^{(t)} &= f(\vh^{(t-1)}, \vx^{(t)}; \mTheta) \\
&= \tanh\left( \mW \vh^{(t-1)} + \mU \vx^{(t)} + \vb \right)
\end{align}
Here, the parameters \( \mTheta \) include \( \mW, \mU, \) and \( \vb \). The parameters \( \mW \) and \( \mU \) are weight matrices and \( \vb \) is the bias vector, because we typically wish to represent states as multidimensional vectors.
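To make the recurrence concrete, here is a minimal NumPy sketch of this state update; the dimensions, the random initialization, and the helper name `rnn_step` are illustrative assumptions, not part of the notation above:

```python
import numpy as np

def rnn_step(h_prev, x_t, W, U, b):
    """One application of f: h^(t) = tanh(W h^(t-1) + U x^(t) + b)."""
    return np.tanh(W @ h_prev + U @ x_t + b)

# Hypothetical dimensions: state size H, feature size N.
H, N = 4, 3
rng = np.random.default_rng(0)
W = rng.normal(size=(H, H))   # state-to-state weight matrix
U = rng.normal(size=(H, N))   # input-to-state weight matrix
b = np.zeros(H)               # bias vector

h = np.zeros(H)                        # initial state h^(0)
for x_t in rng.normal(size=(5, N)):    # a sequence of tau = 5 inputs
    h = rnn_step(h, x_t, W, U, b)      # same (W, U, b) reused at every step
```

The loop makes the weight sharing explicit: the same \( \mW \), \( \mU \), and \( \vb \) are applied at every time step.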
Similarly, the output of the RNN cell can be calculated as a function of its current state.
\begin{align}
\vo^{(t)} &= g(\vh^{(t)}; \dash{\mTheta}) \\
&= \mV \vh^{(t)} + \vc
\end{align}
where \( \mV \) and \( \vc \) denote the weight matrix and bias vector (the parameters \( \dash{\mTheta} \)) of the output function \( g \). Again, \( \mV \) is a matrix and \( \vc \) is a vector, to enable multidimensional outputs.
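Continuing in the same spirit, here is a minimal sketch of the output computation; the shapes and the helper name `rnn_output` are again illustrative assumptions:

```python
import numpy as np

def rnn_output(h_t, V, c):
    """Output function g: o^(t) = V h^(t) + c (a purely affine map, no activation)."""
    return V @ h_t + c

H, M = 4, 2                     # hypothetical state and output dimensions
rng = np.random.default_rng(1)
V = rng.normal(size=(M, H))     # state-to-output weight matrix
c = np.zeros(M)                 # output bias vector
h_t = rng.normal(size=H)        # a current state h^(t), e.g. from the update above
o_t = rnn_output(h_t, V, c)     # o^(t) in R^M
```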