Python Programming Tutorials

Beginning SVM from Scratch in Python

Welcome to the 25th part of our machine learning tutorial series and the next part in our Support Vector Machine section. In this tutorial, we're going to begin setting up our own SVM from scratch.

Before we dive in, however, I will draw your attention to a few other options for solving this constrained optimization problem:

First, the topic of constrained optimization is massive, and there is quite a bit of material on the subject. Even just our subsection, convex optimization, is massive. A starting place might be: https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf. For constrained optimization in general, you could also check out http://www.mit.edu/~dimitrib/Constrained-Opt.pdf

Within the realm of Python specifically, the CVXOPT package provides various convex optimization solvers, including one for quadratic programming problems like ours (found at cvxopt.solvers.qp).

Even more specifically, there is libsvm, which has its own Python interface. We are opting not to use any of these, because the optimization problem IS basically the entire Support Vector Machine problem: handing it off to a solver would skip the very part we want to learn.
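For reference, here is the problem all of those options would be solving for us, written in the standard hard-margin form. Each featureset x_i has a class y_i that is either -1 or 1, and we look for the vector w and bias b that satisfy:

$$
\min_{w,\,b} \; \lVert w \rVert \quad \text{subject to} \quad y_i \, (x_i \cdot w + b) \ge 1 \quad \text{for every sample } (x_i, y_i)
$$

Quadratic programming solvers such as cvxopt.solvers.qp are usually handed the equivalent objective of minimizing one half of the squared norm, which has the same solution but is quadratic in w.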

Now, to begin our SVM in Python, we'll start with imports:

import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np
style.use('ggplot')

We'll be using matplotlib to plot and numpy for handling arrays. Next we'll have some starting data:

data_dict = {-1:np.array([[1,7],
                          [2,8],
                          [3,8],]),
             
             1:np.array([[5,1],
                         [6,-1],
                         [7,3],])}

Now we are going to begin building our Support Vector Machine class. If you are not familiar with object-oriented programming (OOP), don't fret. Our example here will be a very rudimentary form of OOP. Just know that a class is a blueprint for creating objects that have attributes, that functions defined within a class are called methods, and that variables we attach to "self" (like self.colors) can be referenced from any method within the class (or object). This is by no means a complete explanation, but it should be enough to get you going. If you are confused about the code, just ask!
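If the idea of attributes on "self" is new to you, this tiny sketch may help. Counter is a hypothetical class, unrelated to the SVM, that stores a value on self in one method and updates it in another.

```python
class Counter:
    def __init__(self, start=0):
        # Attributes set on self live on the object itself...
        self.count = start

    def increment(self):
        # ...so any other method can read and update them.
        self.count += 1
        return self.count


c = Counter()
c.increment()
c.increment()
print(c.count)  # 2
```

Our Support_Vector_Machine class uses the same pattern: values like self.colors are stored once and then reused by other methods.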

class Support_Vector_Machine:
    def __init__(self, visualization=True):
        self.visualization = visualization
        self.colors = {1:'r',-1:'b'}
        if self.visualization:
            self.fig = plt.figure()
            self.ax = self.fig.add_subplot(1,1,1)

The __init__ method of a class runs automatically whenever an object is created from the class. The other methods run only when they are called. Every method takes "self" as its first parameter: Python passes in the object the method was called on, and naming that parameter "self" is a convention. Next, we add a visualization parameter. We'll most likely want to see the SVM, so we set its default to True. You can also see attributes like self.colors and self.visualization. Setting them on self allows us to reference, for example, self.colors in other methods within our class. Finally, if visualization is turned on, we begin setting up our graph: a matplotlib figure with a single subplot (add_subplot(1,1,1) means a 1x1 grid, first plot).
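To see both points in action (__init__ running at creation time, and Python supplying self for you), here is a small sketch. Demo is a hypothetical stand-in for our class that leaves out the matplotlib setup so it runs without a display, and color_for is a hypothetical method added only for illustration.

```python
class Demo:
    def __init__(self, visualization=True):
        # Runs automatically as soon as Demo(...) is called
        print("__init__ ran")
        self.visualization = visualization
        self.colors = {1: 'r', -1: 'b'}

    def color_for(self, label):
        # self.colors was set in __init__, so it is available here
        return self.colors[label]


d = Demo(visualization=False)  # prints "__init__ ran"
print(d.visualization)         # False
print(d.color_for(1))          # r

# Calling through the class makes the implicit "self" argument explicit
print(Demo.color_for(d, -1))   # b
```

The last line is exactly what d.color_for(-1) does behind the scenes, which is why every method needs that first parameter.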

Next, let's go ahead and add a couple more methods: fit and predict.

class Support_Vector_Machine:
    def __init__(self, visualization=True):
        self.visualization = visualization
        self.colors = {1:'r',-1:'b'}
        if self.visualization:
            self.fig = plt.figure()
            self.ax = self.fig.add_subplot(1,1,1)
    # train
    def fit(self, data):
        pass

    def predict(self,features):
        # sign( x.w+b )
        classification = np.sign(np.dot(np.array(features),self.w)+self.b)

        return classification

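The fit method will be used to train our SVM; this will be the optimization step. Once we've trained the classifier, the predict method will classify a new featureset, which is just sign(x.w+b) once we know what w and b are. Note that self.w and self.b don't exist yet: fit will be responsible for setting them, so calling predict before training would raise an AttributeError.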
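To preview how predict behaves, the sketch below uses SVMSketch, a hypothetical stand-in containing only the predict method from above (no matplotlib), and sets w and b by hand. The values w = [1, -1] and b = 0 are an illustrative guess, not the result of training.

```python
import numpy as np


class SVMSketch:
    # Hypothetical stand-in with only the predict method from above
    def predict(self, features):
        # sign(x.w + b): which side of the hyperplane the featureset is on
        return np.sign(np.dot(np.array(features), self.w) + self.b)


clf = SVMSketch()

# Before w and b exist, predict fails, just as it would before fit()
try:
    clf.predict([1, 7])
except AttributeError as err:
    print("not trained yet:", err)

# Hypothetical hand-picked values; fit() will be responsible for real ones
clf.w = np.array([1, -1])
clf.b = 0

print(clf.predict([1, 7]))                     # -1
print(clf.predict([6, -1]))                    # 1
print(clf.predict([[2, 8], [7, 3]]).tolist())  # [-1, 1]
print(clf.predict([3, 3]))                     # 0
```

Because np.dot also accepts a 2D array, predict can classify several featuresets at once. Also note that np.sign returns 0 for a point lying exactly on the decision boundary, such as [3, 3] here.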

The full code up to this point:

import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np
style.use('ggplot')

class Support_Vector_Machine:
    def __init__(self, visualization=True):
        self.visualization = visualization
        self.colors = {1:'r',-1:'b'}
        if self.visualization:
            self.fig = plt.figure()
            self.ax = self.fig.add_subplot(1,1,1)
    # train
    def fit(self, data):
        pass

    def predict(self,features):
        # sign( x.w+b )
        classification = np.sign(np.dot(np.array(features),self.w)+self.b)

        return classification
        
data_dict = {-1:np.array([[1,7],
                          [2,8],
                          [3,8],]),
             
             1:np.array([[5,1],
                         [6,-1],
                         [7,3],])}

In the next tutorial, we'll pick up and begin working on the fit method.
