Dimensionality Reduction Techniques
Dimensionality reduction techniques reduce the number of input features (dimensions) in a modeling problem. This matters when working with big data and high-dimensional datasets, which often consist of thousands of features and are therefore challenging for machine learning algorithms to learn from. The goal is to reduce the number of features without losing much information. In other words, a smaller representation that captures the essence of the patterns in the data can improve the performance of the learning model. This article gives a short overview of the methods that can be used.
One method to reduce the dimensionality of a learning problem is to use a convolutional neural network (CNN). One example is learning from a physical system, where a deep learning model reduces the dimensionality of the physical problem with respect to time and space. The original reference to this example can be found here. The CNN model learns to map this three-dimensional problem into a reduced latent space. More details can be found in our article on a physical system.
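To make the idea of a "reduced latent space" concrete, here is a minimal sketch of how strided convolutions shrink a spatial field. It is not the model from the referenced work: the function name `conv2d_strided`, the 16x16 single-channel input, the kernel sizes, and the random (untrained) weights are all assumptions chosen for illustration. In practice the weights would be learned, for example as the encoder half of an autoencoder.

```python
import numpy as np

def conv2d_strided(x, kernels, stride=2):
    """Valid strided 2D convolution followed by ReLU.

    x:       array of shape (H, W, C_in)
    kernels: array of shape (k, k, C_in, C_out)
    returns: array of shape (H_out, W_out, C_out)
    """
    H, W, _ = x.shape
    k, _, _, c_out = kernels.shape
    h_out = (H - k) // stride + 1
    w_out = (W - k) // stride + 1
    out = np.zeros((h_out, w_out, c_out))
    for i in range(h_out):
        for j in range(w_out):
            # Local patch seen by the filters at this output position.
            patch = x[i * stride:i * stride + k, j * stride:j * stride + k, :]
            # Contract the patch with every filter at once -> (C_out,)
            out[i, j] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)

# Hypothetical snapshot of a physical field on a 16x16 grid (256 values).
field = rng.normal(size=(16, 16, 1))

# Two encoder layers with random, untrained weights (illustration only).
w1 = rng.normal(size=(4, 4, 1, 4)) * 0.1
w2 = rng.normal(size=(3, 3, 4, 2)) * 0.1

hidden = conv2d_strided(field, w1, stride=2)   # shape (7, 7, 4)
latent = conv2d_strided(hidden, w2, stride=2)  # shape (3, 3, 2)

print(field.size, "->", latent.size)
```

Each strided layer roughly halves the spatial resolution, so the 256 input values end up as an 18-value latent representation. Whether such a small representation keeps the relevant physics depends entirely on training, which this sketch leaves out.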
Another method to reduce the dimensionality of datasets is Principal Component Analysis (PCA). It yields a representation that reveals the underlying structure in the data, namely the directions along which the data varies the most. Directions with little variance are often dropped from the dataset. PCA is one of the most frequently applied dimensionality reduction techniques. More details can be found in our article on PCA.
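As a rough illustration of PCA, the following sketch computes the principal directions with a singular value decomposition in NumPy and projects the data onto the strongest one. The helper name `pca_reduce` and the synthetic three-feature dataset are assumptions made for this example, not part of any particular library.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    # Center each feature so the components describe variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Fraction of the total variance captured by each direction.
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    Z = X_centered @ components.T
    return Z, components, explained_ratio[:n_components]

rng = np.random.default_rng(0)

# Synthetic data: three features, but almost all variance lies along one direction.
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2.0 * t + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

Z, components, ratio = pca_reduce(X, n_components=1)
print(Z.shape)   # (200, 1)
print(ratio)     # share of variance kept by the single component
```

Here one component is enough to retain nearly all of the variance, so the two remaining directions can be dropped with little loss of information. On real data, the explained variance ratio is a common guide for choosing how many components to keep.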
There are several other dimensionality reduction techniques with different levels of complexity. Simple examples include the missing value ratio, the low variance filter, and the high correlation filter. More complex techniques include Independent Component Analysis (ICA) and methods based on projections. A comprehensive overview can be found here.
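The three simple filters named above can be sketched in a few lines of NumPy. The function names, the thresholds (0.5, 1e-3, 0.95), and the toy dataset are assumptions for illustration; suitable thresholds depend on the data. The correlation filter below is a greedy variant and assumes that no missing values remain in the columns it receives.

```python
import numpy as np

def missing_value_ratio_filter(X, max_missing=0.5):
    """Indices of columns whose fraction of NaN entries is at most max_missing."""
    ratio = np.isnan(X).mean(axis=0)
    return np.where(ratio <= max_missing)[0]

def low_variance_filter(X, min_variance=1e-3):
    """Indices of columns whose variance (ignoring NaN) is at least min_variance."""
    variances = np.nanvar(X, axis=0)
    return np.where(variances >= min_variance)[0]

def high_correlation_filter(X, max_corr=0.95):
    """Greedy filter: keep a column unless it is highly correlated
    with a column that has already been kept."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= max_corr for k in kept):
            kept.append(j)
    return np.array(kept, dtype=int)

rng = np.random.default_rng(0)
n = 100
a = rng.normal(size=n)
b = rng.normal(size=n)
mostly_missing = np.full(n, np.nan)
mostly_missing[:10] = 1.0                        # 90% of the values are missing
constant = np.full(n, 3.0)                       # zero variance
near_copy = 2.0 * a + 0.01 * rng.normal(size=n)  # almost a rescaled copy of a

X = np.column_stack([a, mostly_missing, constant, near_copy, b])

# Apply the filters in sequence while tracking the original column indices.
cols = np.arange(X.shape[1])
cols = cols[missing_value_ratio_filter(X[:, cols])]
cols = cols[low_variance_filter(X[:, cols])]
cols = cols[high_correlation_filter(X[:, cols])]
print(cols)
```

The order matters in this sketch: removing the mostly missing and constant columns first avoids undefined correlations in the last step.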
Dimensionality Reduction Techniques Details
We refer to the following video for more details:
Dimensionality Reduction Techniques have been used for decades but Convolutional Neural Networks (#CNNs) as one of these technique is just recently added to this list: https://t.co/2UD30J8bPX pic.twitter.com/4PXqjBaa5E
— Big Data Tips (@BigDataTips) March 3, 2019