/ˌviː.eɪˈiː/

noun … “a probabilistic neural network that learns latent representations for generative modeling.”

VAE, or Variational Autoencoder, is a type of generative neural network that extends the concept of the Autoencoder by introducing probabilistic latent variables. Instead of encoding an input as a fixed deterministic vector, a VAE maps each input to a distribution in a latent space, typically a Gaussian, which allows the model to generate new data points by sampling from that distribution. This probabilistic approach enables both reconstruction of existing data and generation of novel, realistic samples, making the VAE a powerful tool in unsupervised learning and generative modeling.

The architecture of a VAE consists of an encoder, a latent space parameterization, and a decoder. The encoder predicts the mean and variance of the latent distribution, a latent vector is sampled using the reparameterization trick to keep the sampling step differentiable, and the decoder reconstructs the input from the sampled latent point. Training minimizes a combination of a reconstruction loss and a regularization term, the Kullback-Leibler divergence, which pushes the distribution predicted by the encoder toward the prior, typically a standard normal distribution.
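
A minimal sketch of this objective in Julia, assuming a Bernoulli (binary cross-entropy) reconstruction term and the closed-form KL divergence between a diagonal Gaussian and a standard normal prior; the function name vae_loss and its argument names are illustrative:

using Flux

# Reconstruction term (binary cross-entropy) plus the closed-form KL divergence
# between N(z_mean, exp(z_logvar)) and N(0, I), summed over the batch.
function vae_loss(x, x_recon, z_mean, z_logvar)
    recon = Flux.Losses.binarycrossentropy(x_recon, x; agg = sum)
    kl = -0.5f0 * sum(1f0 .+ z_logvar .- z_mean .^ 2 .- exp.(z_logvar))
    return recon + kl
end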

VAE is widely used in image generation, anomaly detection, data compression, and semi-supervised learning. For images, convolutional layers, as in a CNN, are often incorporated to extract hierarchical spatial features, while for sequential data, recurrent layers, as in an RNN, can process temporal dependencies. The probabilistic nature allows smooth interpolation between data points, latent space arithmetic, and controlled generation of new samples.
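
As a minimal sketch of latent-space interpolation, assuming a 20-dimensional latent space and an already trained decoder (the decoder below is only a placeholder with the same shape as in the example further down):

using Flux

# Placeholder decoder; in practice this would be the trained decoder of a VAE.
decoder = Chain(Dense(20, 400, relu), Dense(400, 784, sigmoid))

# Two latent codes, e.g. the encoded means of two real inputs.
z1 = randn(Float32, 20)
z2 = randn(Float32, 20)

# Decode points along the straight line between z1 and z2 in latent space.
interpolations = [decoder((1f0 - t) .* z1 .+ t .* z2) for t in 0f0:0.25f0:1f0]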

Conceptually, the VAE is closely related to the Autoencoder, to Transformer-based generative models, and to probabilistic graphical models. Its innovation lies in combining representation learning with a generative probabilistic framework, allowing latent embeddings to encode both structural and statistical characteristics of the data.

An example of a VAE in Julia using Flux:

using Flux

encoder = Chain(Dense(784, 400, relu), Dense(400, 20*2))  # outputs mean and log-variance (20 values each)
decoder = Chain(Dense(20, 400, relu), Dense(400, 784, sigmoid))  # maps a 20-dim latent vector back to 784 values
# Note: encoder and decoder cannot simply be chained; the sampling step below sits between them.

x = rand(Float32, 784, 1)  # dummy input, e.g. a flattened 28x28 image
h = encoder(x)
z_mean, z_logvar = h[1:20, :], h[21:40, :]  # split the encoder output into mean and log-variance
epsilon = randn(Float32, size(z_mean))
z = z_mean .+ exp.(0.5f0 .* z_logvar) .* epsilon  # reparameterization trick
x_recon = decoder(z)
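
Building on the example above, a minimal sketch of one training step, assuming a recent Flux version with the explicit-gradient API (Flux.setup, Flux.gradient, Flux.update!) and the vae_loss function sketched earlier:

using Flux

model = (enc = encoder, dec = decoder)
opt_state = Flux.setup(Adam(1f-3), model)  # Adam optimizer with an illustrative learning rate

grads = Flux.gradient(model) do m
    h = m.enc(x)
    z_mean, z_logvar = h[1:20, :], h[21:40, :]
    epsilon = randn(Float32, size(z_mean))  # noise is treated as a constant by the gradient
    z = z_mean .+ exp.(0.5f0 .* z_logvar) .* epsilon
    vae_loss(x, m.dec(z), z_mean, z_logvar)
end

Flux.update!(opt_state, model, grads[1])  # apply one gradient update to encoder and decoder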

The intuition anchor is that a VAE is a “creative autoencoder”: it not only compresses data into a meaningful latent space but also treats this space probabilistically, enabling it to imagine, generate, and interpolate new data points in a coherent way, bridging the gap between data compression and generative modeling.