Autoencoders are a type of neural network used in unsupervised learning. They encode the input data into a lower-dimensional vector and then attempt to reconstruct the input from that vector, so the target output is the input itself. Because the encoding is smaller than the input, the network is forced to prioritize and learn the most important features in the data.
You can read more about them in this post.
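To make the encode-then-reconstruct idea concrete, here is a minimal sketch in plain Python. It assumes an untrained, purely linear encoder and decoder with random weights; the names `encode`, `decode`, `reconstruction_error` and the dimensions are hypothetical choices for illustration, not part of any library.

```python
import random

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def encode(x, w_enc):
    """Map the input to a lower-dimensional code (linear encoder, an assumption)."""
    return matvec(w_enc, x)

def decode(code, w_dec):
    """Map the code back to the input space (linear decoder, an assumption)."""
    return matvec(w_dec, code)

def reconstruction_error(x, x_hat):
    """Mean squared error between the input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

random.seed(0)  # fixed seed so the sketch is repeatable
input_dim, code_dim = 4, 2  # the code is smaller than the input

# Random, untrained weights: training would adjust these to lower the error.
w_enc = [[random.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(code_dim)]
w_dec = [[random.uniform(-0.5, 0.5) for _ in range(code_dim)] for _ in range(input_dim)]

x = [0.9, 0.1, 0.4, 0.7]
code = encode(x, w_enc)      # 2 values
x_hat = decode(code, w_dec)  # 4 values, compared against x itself
loss = reconstruction_error(x, x_hat)
```

Note that the target passed to `reconstruction_error` is the input `x`, which is what makes the setup unsupervised.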
There are different types of autoencoders. Let us learn about a few commonly used ones: Regularized Autoencoders and Variational Autoencoders.
We cannot allow an autoencoder to simply copy the input to the output, so we usually shrink the encoder, the decoder, and the hidden layer. But this limits the model’s capacity. To avoid this limitation, regularized autoencoders instead use a loss function that discourages the model from copying the input. The model is encouraged to have the following properties:
- the sparsity of the representation
- the smallness of the derivative of the representation
- robustness to noise or to missing inputs
Keeping these properties in mind, we get 3 types of autoencoders: Sparse, Denoising, and Contractive.
We know how regular autoencoders work. What is different about sparse autoencoders is that they place a sparsity constraint on the hidden units. This constraint is a penalty applied to the activations of the neurons, and it creates a bottleneck without shrinking the hidden layer. The penalty ensures that only a small number of neurons are active for any given input, which forces the model to learn the distinctive statistical features of the data. You can think of the penalty as a regularizer; the only difference is that a typical regularizer acts on the weights of a neuron, while this penalty acts on its activations.
The formula given below describes the average activation of a neuron in the hidden layer.
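Assuming a training set of $m$ examples $x^{(1)}, \dots, x^{(m)}$ (the symbols $m$, $x^{(i)}$ and $\hat{\rho}_j$ are notation assumed here, following the common convention for sparse autoencoders), the average activation can be written as:

```latex
\hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} a_j^{(2)}\left(x^{(i)}\right)
```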
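Here, $a_j^{(2)}(x)$ is the activation of hidden neuron $j$ in layer 2 for the input $x$.
We enforce the constraint $\hat{\rho}_j = \rho$, where $\hat{\rho}_j$ denotes the average activation of hidden neuron $j$ and $\rho$ is the sparsity parameter, a small value close to zero.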
This sparsity penalty can be imposed using L1 regularization (see our post) or KL divergence.
We apply L1 regularization to the activations by adding a scaled regularization term to the loss function. Mathematically, it is expressed as follows:
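A common way to write this, assumed here, is $L(x, \hat{x}) + \lambda \sum_i |a_i|$, where $L$ is the reconstruction loss, $a_i$ are the hidden activations and $\lambda$ scales the penalty. The sketch below computes that quantity in plain Python; the function name `l1_sparse_loss`, the use of mean squared error for $L$, and the default `lam` are illustrative assumptions.

```python
def l1_sparse_loss(x, x_hat, activations, lam=0.001):
    """Reconstruction error plus an L1 penalty on the hidden activations."""
    # Reconstruction term: mean squared error (an assumed choice of loss).
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    # Sparsity term: scaled sum of absolute activations, not of weights.
    penalty = lam * sum(abs(a) for a in activations)
    return reconstruction + penalty

x = [1.0, 0.0]
x_hat = [1.0, 0.0]  # a perfect reconstruction, so only the penalty remains

dense_loss = l1_sparse_loss(x, x_hat, activations=[0.5, -0.5], lam=0.1)
sparse_loss = l1_sparse_loss(x, x_hat, activations=[0.5, 0.0], lam=0.1)
```

With identical reconstructions, the code that uses fewer active neurons gets the lower loss, which is the pressure toward sparsity described above.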
Kullback-Leibler Divergence (KL Divergence)
The KL divergence tells us the difference between two different distributions. It is expressed as follows,
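For sparse autoencoders, one common choice (assumed here) treats $\rho$ and the average activation $\hat{\rho}_j$ of hidden neuron $j$ as the means of two Bernoulli distributions and sums the divergence over the hidden units, with $s_2$ denoting the number of hidden units:

```latex
\sum_{j=1}^{s_2} \mathrm{KL}\left(\rho \,\|\, \hat{\rho}_j\right)
= \sum_{j=1}^{s_2} \left[ \rho \log \frac{\rho}{\hat{\rho}_j}
+ (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j} \right]
```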
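We try to minimize this term so that the average activation of each hidden neuron stays close to ρ; the divergence is zero when the two are equal and grows as they move apart.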
The cost term is defined as,
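One common form, assumed here, is $J_{\text{sparse}}(W, b) = J(W, b) + \beta \sum_j \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$, where $J(W, b)$ is the reconstruction cost, $\hat{\rho}_j$ is the average activation of hidden neuron $j$, and $\beta$ weights the sparsity penalty. The following plain-Python sketch computes it; the function names and the defaults `rho=0.05` and `beta=3.0` are illustrative assumptions, and the activations are assumed to lie strictly between 0 and 1 (for example, sigmoid outputs).

```python
import math

def average_activations(batch_activations):
    """Mean activation of each hidden unit over a batch (one row per example)."""
    m = len(batch_activations)
    return [sum(column) / m for column in zip(*batch_activations)]

def kl_bernoulli(rho, rho_hat):
    """KL divergence between Bernoulli distributions with means rho and rho_hat."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def sparse_cost(reconstruction_cost, batch_activations, rho=0.05, beta=3.0):
    """Reconstruction cost plus the weighted KL sparsity penalty."""
    rho_hats = average_activations(batch_activations)
    penalty = sum(kl_bernoulli(rho, rho_hat) for rho_hat in rho_hats)
    return reconstruction_cost + beta * penalty

# Two examples, two hidden units; values are made up for illustration.
on_target = [[0.05, 0.05], [0.05, 0.05]]  # average activation equals rho
too_active = [[0.6, 0.4], [0.8, 0.6]]     # average activation far above rho

cost_on_target = sparse_cost(1.0, on_target)
cost_too_active = sparse_cost(1.0, too_active)
```

The penalty vanishes when the average activation matches `rho` and grows as the hidden units become more active on average.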
In sparse autoencoders, we add a penalty term to the cost function. In denoising autoencoders, we instead change the reconstruction error term itself. Put simply, a normal autoencoder tries to reconstruct its input image as the output, while a denoising autoencoder tries to reconstruct the clean image from a corrupted or noisy copy of it. The noise is added randomly to the input images.
Please remember that noise is only added during the training.
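A minimal sketch of how the training pairs might be built is shown below, in plain Python. Gaussian noise and random masking are two commonly used corruptions; the function names, the noise level and the drop probability are assumptions made for illustration.

```python
import random

def add_gaussian_noise(x, std, rng):
    """Corrupt every input value with zero-mean Gaussian noise."""
    return [v + rng.gauss(0.0, std) for v in x]

def mask_inputs(x, drop_prob, rng):
    """Corrupt the input by zeroing each value with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]

def denoising_pair(x, corrupt, training=True):
    """Return (model_input, target); the target is always the clean input."""
    # Noise is applied only during training; at inference the input is untouched.
    model_input = corrupt(x) if training else list(x)
    return model_input, list(x)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
clean = [0.2, 0.8, 0.5, 0.1]

# Training: the model sees the noisy version but is scored against the clean one.
noisy_input, target = denoising_pair(
    clean, lambda v: add_gaussian_noise(v, 0.1, rng))

# Inference: no corruption is applied.
eval_input, _ = denoising_pair(
    clean, lambda v: mask_inputs(v, 0.3, rng), training=False)
```

The reconstruction error is then computed between the model's output for `noisy_input` and `target`, not between the output and the noisy input.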
So far you know what denoising autoencoders are. Contractive autoencoders can be explained by comparing the two:
“Denoising autoencoders make the reconstruction function resist small but finite-sized perturbations of the input, while contractive autoencoders make the feature extraction function resist infinitesimal perturbations of the input.” (Goodfellow et al., Deep Learning, MIT Press)
The contractive autoencoder adds a penalty term to the loss function. This sensitivity penalty is the sum of squares of all partial derivatives of the extracted features with respect to the input dimensions. Mathematically, it is expressed as below:
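Writing the encoder as $h = f(x)$ (notation assumed here), this penalty is the squared Frobenius norm of the Jacobian of $f$ at $x$, as in Rifai et al. (2011):

```latex
\left\| J_f(x) \right\|_F^2
= \sum_{i} \sum_{j} \left( \frac{\partial h_j(x)}{\partial x_i} \right)^2
```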
The loss is then calculated as,
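In the formulation of Rifai et al. (2011), the loss is the reconstruction error plus the weighted penalty, $L(x, g(f(x))) + \lambda \left\| J_f(x) \right\|_F^2$, where $f$ is the encoder, $g$ the decoder and $\lambda$ a weighting hyperparameter. The sketch below assumes a sigmoid encoder $h = \sigma(Wx + b)$, for which $\partial h_j / \partial x_i = h_j (1 - h_j) W_{ji}$, so the penalty has a closed form. The function names, the squared-error reconstruction term and the default `lam` are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def encode(x, weights, biases):
    """Sigmoid encoder: h_j = sigmoid(sum_i W[j][i] * x[i] + b[j])."""
    return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def contractive_penalty(x, weights, biases):
    """Squared Frobenius norm of the encoder Jacobian at x.

    For a sigmoid encoder, dh_j/dx_i = h_j * (1 - h_j) * W[j][i], so the sum
    of squared partial derivatives factorizes per hidden unit.
    """
    h = encode(x, weights, biases)
    return sum((hj * (1.0 - hj)) ** 2 * sum(w * w for w in row)
               for hj, row in zip(h, weights))

def contractive_loss(x, x_hat, weights, biases, lam=0.1):
    """Squared reconstruction error plus the weighted contractive penalty."""
    reconstruction = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return reconstruction + lam * contractive_penalty(x, weights, biases)

# Made-up encoder parameters for illustration.
weights = [[1.0, 0.0], [0.0, 2.0]]
biases = [0.0, 0.0]
x = [0.0, 0.0]

penalty = contractive_penalty(x, weights, biases)
loss = contractive_loss(x, x, weights, biases, lam=1.0)
```

Larger encoder weights or hidden units operating in the steep part of the sigmoid both increase the penalty, which pushes the features to change little when the input is nudged.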
The main idea of contractive autoencoders is to make the learned features robust to small perturbations (or disturbances) around the training points. Rifai et al. (2011) report results that equal or surpass those of denoising autoencoders on a range of datasets.
A variational autoencoder (VAE) describes the attributes of an image in a probabilistic manner. You can observe the difference in the description of attributes in the pictures below. A regular autoencoder describes an attribute as a single value, while a VAE describes it as a probability distribution parameterized by the latent vectors μ (mean) and σ (standard deviation).
VAEs are also effective in other areas of machine learning. They are used to draw images, achieve strong results in semi-supervised learning, and interpolate between sentences.
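To make the difference concrete, the sketch below draws latent values from μ and σ instead of using one fixed value per attribute. It assumes the usual Gaussian form $z = \mu + \sigma \cdot \epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$; the function name `sample_latent` and the example numbers are hypothetical.

```python
import random

def sample_latent(mu, sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), one value per latent dimension."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Made-up encoder outputs for two latent attributes of one image.
mu = [0.5, -1.0]
sigma = [0.1, 0.3]

# A regular autoencoder would use a single point here; a VAE draws a
# different sample around mu on every call.
samples = [sample_latent(mu, sigma, rng) for _ in range(5)]
```

Setting every σ to zero collapses the samples back to μ, which is the single-value description of a regular autoencoder.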
Han Xiao, Kashif Rasul, & Roland Vollgraf (2017). Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. CoRR, abs/1708.07747.
Ian Goodfellow, Yoshua Bengio, & Aaron Courville (2016). Deep Learning. MIT Press.
Salah Rifai, Pascal Vincent, Xavier Muller, Xavier Glorot, & Yoshua Bengio (2011). Contractive auto-encoders: explicit invariance during feature extraction. In Proceedings of the 28th International Conference on Machine Learning (ICML’11), 833–840. Omnipress, Madison, WI, USA.
Sanghamitra Dutta, Ziqian Bai, Haewon Jeong, Tze Low, & Pulkit Grover (2018). A Unified Coded Deep Neural Network Training Strategy based on Generalized PolyDot codes. 1585-1589. 10.1109/ISIT.2018.8437852.
Varun Kumar, G. Nandi, & Rahul Kala (2014). Static hand gesture recognition using stacked Denoising Sparse Autoencoders. 2014 7th International Conference on Contemporary Computing, IC3 2014. 99-104. 10.1109/ic3.2014.6897155.