VC Dimension
VC Dimension stands for Vapnik-Chervonenkis (VC) dimension, a key concept in statistical learning theory that is useful to understand when applying machine learning or data mining to big data. It was defined by Vladimir Vapnik and Alexey Chervonenkis and captures the fact that different learning models have different power to learn from data. The VC dimension reflects the capacity of a hypothesis set, that is, the set of functions from which a classification algorithm selects the classifier it learns. This capacity is also referred to as the complexity of the classification model, its expressive or representational power, or its flexibility, and it bears directly on how well the model generalizes out of sample. The goal is a single value that quantifies the complexity of a learning model such as those used in classification.
The VC dimension is a measure of how much a learning model is able to learn from data, and it exposes a trade-off between model complexity and the ability to generalize. It clarifies the relationship between the number of model parameters and the degrees of freedom the model has for fitting data; for many common models the two are roughly equal, although this is not guaranteed. More degrees of freedom correspond to a more powerful model with many parameters that can represent complex systems but is also prone to overfitting. Conversely, fewer degrees of freedom correspond to a less powerful model with fewer parameters that tends not to overfit the data but may be unable to learn the pattern in the data at hand.
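Two standard results, added here for illustration rather than taken from the original text, show both sides of the parameter-counting relationship. A linear classifier in d dimensions has d weights plus one bias, and its VC dimension is exactly that number of parameters. By contrast, a one-parameter family of sine-based classifiers can shatter arbitrarily large point sets, so parameter count alone does not determine capacity:

```latex
d_{VC}\big(\{\, x \mapsto \operatorname{sign}(w^\top x + b) : w \in \mathbb{R}^d,\ b \in \mathbb{R} \,\}\big) = d + 1

d_{VC}\big(\{\, x \mapsto \operatorname{sign}(\sin(\omega x)) : \omega \in \mathbb{R} \,\}\big) = \infty
```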
One example of this difference in model complexity is the contrast between a high-degree polynomial and a linear model. A high-degree polynomial can be very wiggly in order to fit a given set of training data points extremely well. Such a model has high capacity but is likely to overfit the given data: a polynomial classifier of this kind tends to make too many errors on new, unseen points because its wiggles follow the noise in the training data rather than the underlying pattern. A linear learning model, on the other hand, is much simpler and has much lower capacity, so it may not fit the training data very well.
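The polynomial-versus-linear contrast can be made concrete with a small sketch. It assumes NumPy and a hypothetical noisy linear data set, and it uses mean squared error on a regression fit as a stand-in for the classification errors described above; the helper name `train_test_mse` is illustrative only. Because the degree-9 polynomial family contains every straight line, its training error can never exceed that of the linear fit, while its error on fresh points drawn from the same linear trend will typically be larger.

```python
import numpy as np


def train_test_mse(degree, x_train, y_train, x_test, y_test):
    """Fit a least-squares polynomial of the given degree (hypothetical helper)
    and return (training MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse


rng = np.random.default_rng(0)

# Assumed toy data: the true pattern is a straight line plus Gaussian noise.
x_train = np.linspace(0.0, 1.0, 12)
y_train = 2.0 * x_train + rng.normal(scale=0.2, size=x_train.size)
x_test = np.linspace(0.02, 0.98, 50)
y_test = 2.0 * x_test + rng.normal(scale=0.2, size=x_test.size)

# Degree 1 = low-capacity linear model, degree 9 = high-capacity "wiggly" model.
results = {d: train_test_mse(d, x_train, y_train, x_test, y_test) for d in (1, 9)}

for d, (train_err, test_err) in results.items():
    print(f"degree {d}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
```

Changing the seed or the noise scale changes the exact numbers; what matters is the gap between training and test error for the high-capacity model.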
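Quantifying the representational power of a learning model is not easy, and the VC dimension was created to do exactly that. More precisely, the VC dimension of a classification model is the cardinality of the largest set of points that the model can shatter. A set of points is shattered if, for every possible assignment of binary labels to those points, the model contains some classifier that labels all of them correctly; if arbitrarily large sets can be shattered, the VC dimension is infinite.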
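To make the definition of shattering concrete, the following minimal Python sketch brute-forces all labelings of a small point set for an assumed example hypothesis class: closed intervals on the real line, where points inside [a, b] are labeled positive. The class and the function names `interval_realizes` and `can_shatter` are illustrative choices, not part of the original text. Two distinct points can always be shattered, but no three points can, because the labeling (+, -, +) would require an interval that skips the middle point; hence this class has VC dimension 2.

```python
from itertools import product


def interval_realizes(points, labels):
    """Return True if some closed interval [a, b] marks exactly the points
    whose label is True as positive (an empty interval covers no point).
    Assumes the points are distinct."""
    # Sort by position, keeping each label attached to its point.
    ordered = [lab for _, lab in sorted(zip(points, labels))]
    positives = [i for i, lab in enumerate(ordered) if lab]
    if not positives:
        return True  # pick an interval that contains none of the points
    first, last = positives[0], positives[-1]
    # An interval covers a contiguous run, so everything between the
    # first and last positive point must also be positive.
    return all(ordered[first:last + 1])


def can_shatter(points):
    """Brute force: check every one of the 2^n labelings of the points."""
    if len(set(points)) < len(points):
        return False  # one point cannot receive two different labels
    return all(
        interval_realizes(points, labels)
        for labels in product([False, True], repeat=len(points))
    )


print(can_shatter([0.0, 1.0]))       # True: all 4 labelings are realizable
print(can_shatter([0.0, 1.0, 2.0]))  # False: (+, -, +) cannot be realized
```

The same brute-force pattern works for any hypothesis class once the realizability check is replaced, although the 2^n labelings make it practical only for very small point sets.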
VC Dimension Details