Support Vector Machine - Overview

4 minute read

Published:

Introduction

This is the first part of the Support Vector Machine’s series of articles in which I will mainly talk about the math behind this algorithm. It implements vector as the base for the learning algorithm, so I presume that you have a lot of mathematical knowledge, especially in vector.


The Primary Goal and the Important Aspects

Support Vector Machine (SVM) is one of the machine learning’s classifier. Its goal is to find the optimal separating hyperplane which maximizes the margin of the data train.

From its goal, we can see that there are three important parts of this algorithm, namely the optimal separating hyperplane, the margin, and the data train. From the last part we know that SVM will be implemented in a data train, so it is a supervised learning algorithm. Moreover, this algorithm classifies the data into a certain class which makes it as a classification algorithm. To predict a class of a new data, SVM uses a hyperplane as the model that separates the classes and we can classify a new data just by looking at its position towards the hyperplane.

Next, what is hyperplane? First, we know that the data train can be implemented in a space having any dimensions. If we use a two-dimensional space, the hyperplane becomes a line. If we use a three-dimensional space, it becomes a plane. And in more dimensions, it becomes a hyperplane. So, a hyperplane is just a generalization for a plane which the main task is to separate the data train into two classes.

Based on the previous paragraph, it is clear that when we separate the data using a linear field (line, plane, etc), we can only divide the data train into two classes in which in this case, we can say that there are two categories of data, namely, positive (1) and negative (-1). The positive class can be presumed to be on the top of the hyperplane, and vice versa for the negative class.

Furthermore, since this algorithm uses a linear field for the class division, it is clear that if we want to separate the data train perfectly then there should be an enough space between both classes that enables a linear field to separate them into the decent position. The characteristic of this kind of the data train is called as linearly separable. Here is a simple illustration about the characteristics.

Linearly and Not linearly separable


The Problems

A hyperplane separating the data train normally

From the above illustration, we can see that there is a hyperplane located arbitrarily which separates the data into two classes, one is above the line and the other is below the line. However, there is an error for this model when we add some data into the data train.

Inaccuracy of the hyperplane

Based on the illustration, we can simply take a conclusion that it is not a good model of hyperplane when it is located too close to data points of a class as there is a possibility that a new data will go out of the scope of the hyperplane. So we need to enlarge the distance from the data point to the hyperplane, yet the question is how do we find the best distance?

To answer that question, I’ve mentioned a word margin in the previous sub-topic where it is one of the aspect of a separating hyperplane. To find the margin of a hyperplane, we calculate the distance between the hyperplane and the closest data point, double the value and we get the margin. Here is an illustration of margin.

Margin

From the illustration, it is clear that when we move the hyperplane to be closer to the data point, then the margin will be smaller and we’ve known that this is not a good approach as it does not anticipate the characteristic of the new data. So, we can conclude that the optimal hyperplane is the one that maximizes the margin of the data train in which it would be more consistent in receiving any new data that has unpredictable characteristic.

Furthermore, we don’t need to stick to one equation to build a hyperplane, which means there are another possible hyperplanes with different equation that might be the optimal one. Here is an example of several hyperplanes having different equation.

Hyperplanes having different equation

Based on the illustration and the characteristic of an optimal hyperplane, our task is to find a hyperplane with certain equation and has the biggest margin.

Next, we’ll see how to compute the margin.


References

  • Images - Personal Documents