The goal of this blogpost is to derive the mathematical formulation of the Variational Autoencoder (VAE) from simple principles and intuitively interpret it. We first describe what a VAE is, followed by how it differs from other neural networks of its class, then we derive its formulation as presented in Kingma et al. in detail and, lastly, intuitively explain how the final framework achieves what we describe. We assume the reader is familiar with basic principles of machine learning, like neural networks and gradient descent, very basic graph jargon , like nodes and edges, simple probabilistic concepts like probability density functions , the expectation and the notion of i.i.d , simple calculus and linear algebra. For hands-on experiments and code, see our respective blogpost . What is a VAE? A VAE is an autoencoder  (AE).  An AE is a neural network that is trained to  copy its input to its output. Internally, it has a hidden layer whose output \(h\) is referred to as the code ...
