# Hard Margin Support Vector Machines

## Introduction

In a hard margin support vector machine (SVM), we look for the decision boundary that maximizes the margin between the two classes. The margin is defined as the distance between the decision boundary and the nearest data point from either class. The hard margin SVM works well when there is a clear separation between the two classes, but it is sensitive to outliers: a single stray or mislabeled point can shift the boundary substantially or make the classes impossible to separate. In this tutorial, we will learn about hard margin support vector machines and how to train them using scikit-learn.

## Preliminaries

In this section, we introduce the notation used in this document and the idea behind the hard margin support vector machine (SVM), and we contrast it with the soft margin variant. The optimization problems themselves are formulated in the sections that follow.

### Notation and Definitions

In this document, we adopt the following notation and definitions. Let $G$ be a nontrivial undirected graph with vertex set $V(G)$ and edge set $E(G)$. For any $v \in V(G)$, we define $N_G(v) = \{u \in V(G) : uv \in E(G)\}$ to be the *neighborhood* of $v$, and we define $d_G(v) = |N_G(v)|$ to be the *degree* of $v$. The *minimum degree* $\delta(G)$ and the *maximum degree* $\Delta(G)$ of $G$ are defined, respectively, as $\delta(G) = \min_{v \in V(G)} d_G(v)$ and $\Delta(G) = \max_{v \in V(G)} d_G(v)$.

### The Hard Margin Support Vector Machine

A support vector machine (SVM) is a powerful and versatile machine learning model, capable of performing both classification and regression. In this article, we’ll be focusing on the former; that is, SVMs that are used for classification tasks.

The goal of an SVM is to find the line (or hyperplane) that best separates our classes, so that we can classify new data points as belonging to one class or the other. The key is to make the margin, the empty band on either side of the decision boundary, as wide as possible. The width of the margin tells us how far the nearest training points are from the boundary, and a wider margin leaves more room for new data points to differ from the training data without being misclassified.

In support vector machines, there are two main formulations: hard margin and soft margin. The idea behind the hard margin formulation is to have no tolerance for error: every data point in our training set must be classified correctly and lie outside the margin, and among all boundaries that achieve this we pick the one with the widest margin. Unfortunately, this isn’t always possible; sometimes there will be outliers or data points that just don’t fit into one category or the other no matter where we place our decision boundary. This is where the soft margin formulation comes into play: by allowing some points to violate the margin or be misclassified, we can obtain a wider margin that still does a good job of separating our data.

## The Optimal Margin Classifier

### The Objective Function

To train a supervised classifier, we need to define an objective function that training will optimize. An objective function takes candidate model parameters together with the training data and returns a single number that scores how well those parameters fit. It is distinct from the prediction rule, which maps each data point to a class label. For example, if we are trying to classify points in two-dimensional space into two classes, we could use the following fixed prediction rule:

```python
def predict(data):
    """Label each 2-D point 1 if its x-coordinate exceeds its y-coordinate, else 0."""
    predictions = []
    for datapoint in data:
        x, y = datapoint
        if x > y:
            predictions.append(1)
        else:  # x <= y
            predictions.append(0)
    return predictions
```

This function outputs 1 if the x-coordinate of the input point is greater than the y-coordinate, and 0 otherwise. As written it has no adjustable parameters, so it cannot learn from data; a trainable classifier replaces the fixed comparison with parameters that are chosen by optimizing the objective function over the training set.
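One way to turn a fixed rule like the one above into something trainable is to give it an adjustable parameter and score each parameter value on the training data. The sketch below does this with a hypothetical threshold `t`; the function names, the candidate thresholds, and the toy data are all illustrative assumptions rather than part of the original example.

```python
def predict_with_threshold(data, t):
    """Predict 1 when x - y > t, else 0; t = 0 recovers the fixed rule x > y."""
    return [1 if x - y > t else 0 for x, y in data]


def training_errors(t, data, labels):
    """Objective: number of training points the rule with threshold t gets wrong."""
    predictions = predict_with_threshold(data, t)
    return sum(p != label for p, label in zip(predictions, labels))


# Hypothetical toy data: class 1 has x - y of at least 2, class 0 at most 1.
data = [(3.0, 1.0), (5.0, 2.0), (2.0, 1.0), (1.0, 4.0)]
labels = [1, 1, 0, 0]

# "Training" here is a search over a few candidate thresholds.
candidates = [0.0, 0.5, 1.5, 2.5]
best_t = min(candidates, key=lambda t: training_errors(t, data, labels))
```

Here the objective is the count of training errors, which is easy to read but hard to optimize in general; SVMs use a margin-based objective instead.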

### The Decision Function

The decision function of a margin classifier is the function that takes in an input vector x and outputs a class label y. The function is defined by a weight vector w and a bias b as follows:

y = sign(w⋅x + b)

Here sign(.) is the sign function, which outputs 1 if its argument is positive, -1 if it is negative, and 0 if it is 0. The weight vector w is perpendicular to the decision boundary and so determines its orientation, and the bias b determines how far the boundary is offset from the origin (the boundary lies at distance |b|/‖w‖ from it).
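The decision function translates almost directly into code. The following is a minimal sketch in plain Python; the parameter values for `w` and `b` are hypothetical and chosen only so that the boundary is the line x1 + x2 = 3.

```python
def sign(value):
    """Return 1 for a positive value, -1 for a negative value, and 0 for zero."""
    if value > 0:
        return 1
    if value < 0:
        return -1
    return 0


def decision_function(w, b, x):
    """Compute y = sign(w . x + b) for one input vector x."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) + b
    return sign(score)


# Hypothetical parameters: the boundary is the line x1 + x2 = 3.
w = [1.0, 1.0]
b = -3.0

print(decision_function(w, b, [2.0, 2.0]))  # prints 1 (above the line)
print(decision_function(w, b, [0.0, 1.0]))  # prints -1 (below the line)
print(decision_function(w, b, [1.0, 2.0]))  # prints 0 (exactly on the line)
```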

### The Optimal Margin Classifier

In machine learning, the optimal margin classifier is the linear classifier whose separating hyperplane lies as far as possible from the training data. The margin of a separating hyperplane is the distance from the hyperplane to the closest data point of either class. The optimal margin classifier is a method of binary classification that finds the separating hyperplane with the largest margin. This is also known as the maximum margin classifier.

The optimal margin classifier requires the data points to be linearly separable; otherwise no separating hyperplane exists. If the data points are not linearly separable, then a soft margin classifier can be used instead, which allows for some misclassifications.

The maximum margin separating hyperplane is not just a theoretical ideal: for any linearly separable dataset it can be computed by solving a convex quadratic optimization problem. The resulting classifier is the hard margin support vector machine (SVM).
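To make the idea of "greatest possible distance to the closest data points" concrete, the sketch below computes the margin of a given hyperplane w⋅x + b = 0 on a dataset. The function name `geometric_margin`, the toy points, and the two candidate boundaries are assumptions made for illustration.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance y_i * (w . x_i + b) / ||w|| over the data.

    A positive result means the hyperplane separates the data, and the value
    is the distance from the hyperplane to the nearest point.
    """
    norm = math.sqrt(sum(w_j * w_j for w_j in w))
    return min(
        y * (sum(w_j * x_j for w_j, x_j in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Hypothetical toy data with labels in {-1, +1}.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = [-1, -1, 1, 1]

# Two candidate vertical boundaries: x1 = 2 and x1 = 1.5.
margin_centered = geometric_margin([1.0, 0.0], -2.0, points, labels)
margin_off_center = geometric_margin([1.0, 0.0], -1.5, points, labels)
```

Both boundaries separate the data, but the centered one has margin 1.0 while the off-center one has margin 0.5, so a maximum margin classifier would prefer the former.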
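In scikit-learn, a common way to approximate a hard margin SVM is to fit `sklearn.svm.SVC` with a linear kernel and a very large value of `C`. The sketch below takes that approach; the toy data and the specific value `C=1e6` are assumptions, and the result is an approximation of the hard margin solution rather than a dedicated hard margin solver.

```python
from sklearn.svm import SVC

# Hypothetical, linearly separable toy data with labels in {-1, +1}.
X = [[0, 0], [1, 0], [0, 1], [3, 3], [4, 3], [3, 4]]
y = [-1, -1, -1, 1, 1, 1]

# Assumption: a very large C makes margin violations so costly that the
# soft margin solution approximates the hard margin one on separable data.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]                         # learned weight vector
b = clf.intercept_[0]                    # learned bias
support_vectors = clf.support_vectors_   # training points that define the margin
train_predictions = clf.predict(X)       # labels predicted for the training set
```

On separable data like this, every training point should be classified correctly, and only a few of the points appear in `support_vectors`.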

## The Soft Margin Support Vector Machine

A hard margin support vector machine requires every training point to be classified correctly, while a soft margin support vector machine tolerates some misclassifications. The trade-off is that the hard margin SVM tends to overfit when the data is noisy, because a single outlier can dictate the boundary, while a soft margin SVM that tolerates too many violations can underfit the data.

### The Objective Function

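The hard margin objective is to find the decision boundary that maximizes the margin between the two classes while still correctly classifying all training examples. With the constraints scaled so that the closest points satisfy $y_i(x_i \cdot w + b) = 1$, the margin is $1/\|w\|$, so the problem can be written mathematically as:

$$\max_{w,b} \ \frac{1}{\|w\|} \quad \text{subject to} \quad y_i(x_i \cdot w + b) \ge 1 \ \text{for all } i = 1, \dots, m.$$

Maximizing $1/\|w\|$ is equivalent to minimizing $\frac{1}{2}\|w\|^2$. The soft margin SVM relaxes each constraint with a slack variable $\xi_i \ge 0$ and penalizes the total slack, weighted by a parameter $C > 0$:

$$\min_{w,b,\xi} \ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{subject to} \quad y_i(x_i \cdot w + b) \ge 1 - \xi_i, \ \xi_i \ge 0 \ \text{for all } i = 1, \dots, m.$$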
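The constraint above can be checked directly for any candidate pair (w, b). The sketch below does so and also computes the margin under the common scaling in which the margin equals 1/‖w‖; the function names, the toy data, and the two candidates are assumptions for illustration.

```python
import math


def satisfies_hard_margin(w, b, points, labels):
    """Check y_i * (x_i . w + b) >= 1 for every training example."""
    return all(
        y * (sum(x_j * w_j for x_j, w_j in zip(x, w)) + b) >= 1
        for x, y in zip(points, labels)
    )


def margin(w):
    """Distance from the boundary to the closest point, 1 / ||w||, for a feasible w."""
    return 1.0 / math.sqrt(sum(w_j * w_j for w_j in w))


# Hypothetical toy data with labels in {-1, +1}.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = [-1, -1, 1, 1]

# Two candidates describing the same boundary x1 = 2, scaled differently.
feasible = satisfies_hard_margin([1.0, 0.0], -2.0, points, labels)
too_small = satisfies_hard_margin([0.5, 0.0], -1.0, points, labels)
best_margin = margin([1.0, 0.0])
```

The first candidate satisfies every constraint, while the rescaled one does not, which is why the constraints prevent w from shrinking indefinitely: the smallest feasible ‖w‖ gives the largest margin.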

### The Decision Function

In binary classification, the notion of a decision function is fundamental. A decision function is a mapping f:X→Y, where X is the input space and Y={-1,+1} is the output space. Given an input x∈X, the decision function assigns a label y∈Y to x, which we refer to as f(x). The goal of a binary classifier is to learn a function f such that for any previously unseen point x, f(x) correctly predicts the label of x.

The soft margin support vector machine (SVM) is a binary classifier that looks for the decision function with the largest margin between the two classes, while allowing a limited number of points to violate that margin. The margin is defined as the distance between the decision boundary and the nearest data point from either class. Intuitively, we can think of the margin as a measure of how well separated the two classes are. The larger the margin, the better separated the two classes are and thus (hopefully!) the better our classifier will be at correctly predicting labels for previously unseen points.

### The Soft Margin Support Vector Machine

The Soft Margin Support Vector Machine (SMSVM) is a modification of the standard Support Vector Machine (SVM) that allows for some degree of misclassification. The standard SVM finds the line that maximizes the distance between itself and the closest points from each class, known as the support vectors. The support vectors are used to define the margin, which is the distance between the line and the closest data point from each class. The SMSVM modifies this by adding a penalty term for points that fall within the margin or on the wrong side of the line. This penalty term is controlled by a parameter known as C. A smaller value for C results in a wider margin and more tolerance for misclassification, while a larger value results in a narrower margin and less tolerance for misclassification.
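The effect of C can be seen by evaluating a soft margin objective for two candidate boundaries. The sketch below assumes the common form of the objective, half the squared norm of w plus C times the sum of hinge penalties max(0, 1 - y_i(w⋅x_i + b)); the function name, the 1-D toy data, and the two candidates are illustrative assumptions.

```python
def soft_margin_objective(w, b, C, points, labels):
    """0.5 * ||w||^2 + C * sum of hinge penalties max(0, 1 - y_i (w . x_i + b))."""
    regularizer = 0.5 * sum(w_j * w_j for w_j in w)
    penalty = sum(
        max(0.0, 1.0 - y * (sum(w_j * x_j for w_j, x_j in zip(w, x)) + b))
        for x, y in zip(points, labels)
    )
    return regularizer + C * penalty


# Hypothetical 1-D toy data; the point at 2.5 is a negative-class outlier.
points = [(0.0,), (1.0,), (2.5,), (3.0,), (4.0,)]
labels = [-1, -1, -1, 1, 1]

wide = ([1.0], -2.0)      # margin 1/||w|| = 1, but the outlier is misclassified
narrow = ([4.0], -11.0)   # margin 1/||w|| = 0.25, with no violations

for C in (0.1, 100.0):
    wide_cost = soft_margin_objective(*wide, C, points, labels)
    narrow_cost = soft_margin_objective(*narrow, C, points, labels)
    print(C, "wide" if wide_cost < narrow_cost else "narrow")
```

With the small C the wide-margin candidate has the lower objective despite misclassifying the outlier, and with the large C the narrow-margin candidate wins, which matches the trade-off described above.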

## Conclusion

In a hard margin support vector machine, the decision boundary is completely determined by the support vectors: removing any other training point leaves the boundary unchanged. If the training data is not linearly separable, then the hard margin problem has no solution, because no decision boundary separates all of the training data; a soft margin SVM is needed in that case.
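Both points can be checked by hand in one dimension, where the hard margin solution has a closed form: the boundary sits midway between the two closest points of opposite classes. The sketch below is a toy solver under that assumption; the function name `hard_margin_1d` and the data are hypothetical, and it expects both classes to be present with labels in {-1, +1}.

```python
def hard_margin_1d(xs, labels):
    """Hard margin SVM for 1-D inputs with labels in {-1, +1}.

    Returns (w, b), or None when no threshold separates the classes.
    """
    pos = [x for x, y in zip(xs, labels) if y == 1]
    neg = [x for x, y in zip(xs, labels) if y == -1]
    if min(pos) > max(neg):        # positives lie to the right
        p, n = min(pos), max(neg)
    elif max(pos) < min(neg):      # positives lie to the left
        p, n = max(pos), min(neg)
    else:
        return None                # classes overlap: the problem is infeasible
    # p and n are the support vectors; solve w*p + b = 1 and w*n + b = -1.
    w = 2.0 / (p - n)
    b = -(p + n) / (p - n)
    return w, b


xs = [0.0, 1.0, 3.0, 5.0]
labels = [-1, -1, 1, 1]

full = hard_margin_1d(xs, labels)                      # support vectors: 1.0 and 3.0
support_only = hard_margin_1d([1.0, 3.0], [-1, 1])     # same boundary as `full`
overlapping = hard_margin_1d([0.0, 2.0, 1.0], [-1, -1, 1])  # no separating threshold
```

Dropping the points at 0.0 and 5.0 leaves the solution unchanged, and the overlapping dataset yields `None` because no threshold satisfies every constraint.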