/ɑr ɛn ˈɛn/

n. "Neural network with feedback loops maintaining hidden state across time steps for sequential data processing."

RNN is a class of artificial neural networks in which connections form directed cycles, allowing a hidden state to persist information from previous time steps. This enables speech recognition, time-series forecasting, and natural language processing by capturing temporal dependencies. Unlike feedforward networks, an RNN feeds its hidden state back into the next step via h_t = tanh(W_hh * h_{t-1} + W_xh * x_t), but it suffers from vanishing gradients that limit long-term memory unless addressed by LSTM/GRU gates.
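
As a concrete illustration, here is a minimal numpy sketch of that recurrence; the input/hidden sizes and random weights are hypothetical placeholders, with W_xh and W_hh named after the formula above:

```python
import numpy as np

# Minimal sketch of the RNN recurrence h_t = tanh(W_hh @ h_{t-1} + W_xh @ x_t).
# Sizes and weights are illustrative placeholders, not trained values.
rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 4
W_xh = rng.standard_normal((hidden_dim, input_dim)) * 0.1   # input-to-hidden
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1  # hidden-to-hidden feedback loop

def rnn_step(x_t, h_prev):
    return np.tanh(W_hh @ h_prev + W_xh @ x_t)

h = np.zeros(hidden_dim)                          # h_0: no prior context
for x_t in rng.standard_normal((5, input_dim)):   # a toy 5-step input sequence
    h = rnn_step(x_t, h)                          # hidden state carries context forward
```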

Key characteristics of RNN include:

  • Hidden State: h_t captures previous context; updated each timestep via tanh/sigmoid.
  • Backpropagation Through Time: BPTT unfolds the network across T timesteps to compute gradients.
  • Vanishing Gradients: over long sequences (roughly 100+ steps) repeated backpropagation drives ∂L/∂W → 0; LSTM mitigates this via gates (see the sketch after this list).
  • Sequence-to-Sequence: encoder-decoder architecture for machine translation, with attention mechanisms added later.
  • Teacher Forcing: training feeds ground-truth tokens rather than the model's own predictions to stabilize learning.
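
The vanishing-gradient problem can be seen numerically. The following sketch (illustrative only; random weights, hypothetical hidden size, input term omitted) chains the per-step Jacobians that BPTT multiplies together and shows their product collapsing toward zero over 100 steps:

```python
import numpy as np

# Illustrative sketch of vanishing gradients under BPTT (not a training loop).
# Backprop through T steps chains Jacobians dh_t/dh_{t-1} = diag(1 - h_t^2) @ W_hh;
# when W_hh and the tanh derivatives are contractive, the product decays exponentially.
rng = np.random.default_rng(1)
H = 8                                      # hypothetical hidden size
W_hh = rng.standard_normal((H, H)) * 0.3   # random recurrent weights (placeholder)
h = np.tanh(rng.standard_normal(H))
grad = np.eye(H)                           # seed standing in for dL/dh_T

for t in range(100):                       # chain Jacobians across 100 timesteps
    h = np.tanh(W_hh @ h)                  # advance the state (input term omitted for brevity)
    grad = grad @ (np.diag(1 - h**2) @ W_hh)

print(np.linalg.norm(grad))                # ≈ 0: the long-range gradient has vanished
```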

A conceptual example of RNN character-level text generation flow:

1. One-hot encode 'H' → [0,0,...,1,0,...0] (256-dim)
2. h1 = tanh(W_xh * x1 + W_hh * h0); logits y1 = W_hy * h1 → softmax gives next-char probabilities
3. Sample 'e' from softmax → feed as x2
4. h2 = tanh(W_xh * x2 + W_hh * h1) → 'l' prediction
5. Repeat 100 chars → "Hello world" generation
6. Temperature sampling: divide logits by a temperature T before softmax; T < 1 (e.g., 0.8) sharpens output toward likely characters, T > 1 increases diversity (see the sketch below)
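
A runnable sketch of this generation loop follows; the weights W_xh, W_hh, W_hy are random placeholders for an untrained model over a 256-symbol byte vocabulary, so the sampled text will be gibberish rather than "Hello world":

```python
import numpy as np

# Sketch of character-level sampling with an RNN. Weights are random placeholders
# (an untrained model), so the output is gibberish; the flow matches the steps above.
rng = np.random.default_rng(2)
V, H = 256, 64                                  # byte vocabulary, hypothetical hidden size
W_xh = rng.standard_normal((H, V)) * 0.01
W_hh = rng.standard_normal((H, H)) * 0.01
W_hy = rng.standard_normal((V, H)) * 0.01       # hidden-to-output projection

def sample(seed_char, n_chars=100, temperature=0.8):
    h = np.zeros(H)
    x = np.zeros(V); x[ord(seed_char)] = 1.0    # step 1: one-hot encode the seed
    out = [seed_char]
    for _ in range(n_chars):
        h = np.tanh(W_xh @ x + W_hh @ h)        # steps 2 and 4: recurrence
        logits = (W_hy @ h) / temperature       # step 6: temperature scaling
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                    # softmax over next-char distribution
        idx = rng.choice(V, p=probs)            # step 3: sample the next character
        out.append(chr(idx))
        x = np.zeros(V); x[idx] = 1.0           # feed the sampled character back as input
    return "".join(out)

print(sample('H'))
```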

Conceptually, RNN is like reading a book with short-term memory: each word updates an internal context state that predicts the next word, but distant chapters are forgotten unless LSTM-style gates create long-term memory spanning entire novels.

In essence, RNN enables sequential intelligence, from Bluetooth voice activity detection to HBM-accelerated Transformers on SerDes-linked clusters; the architecture has largely evolved into attention-based models, while SIMD vectorizes its recurrent matrix multiplies over FFT-preprocessed time series from EMI-shielded sensors.