In this blogpost, we explore how Wasserstein Generative Adversarial Nets (WGAN) improve upon the minimax game / objective of Generative Adversarial Nets (GAN) to stabilize training and make the value of the game correlate better with the performance of the generator. We first derive the divergence between the real data distribution and the generated one that GANs minimize. Then, we discuss how this divergence is sub-optimal for the optimization of neural networks and introduce the Wasserstein distance, proving that it has better properties w.r.t. neural net optimization. Thereafter, we prove that the Wasserstein distance, although intractable, can be approximated and indeed back-propagated to the generator. We assume the reader is familiar with basic principles of machine learning, like neural networks and gradient descent, simple probabilistic concepts like, probability density functions , the basic formulation of GANs and Lipschitz functions . GANs minimize the Jensen-Shannon di