Dhairya Kothari

SVM with MNIST

Parts:

- (1) Exploring SVM

- (2) SVM with RBF kernel

- (3) SVM with Poly kernel


Describe how the multi-class classification is different for SVC and LinearSVC. Be explicit, don't just describe what's in the documentation. For example, what does 'one-against-one' and 'one-vs-the-rest' mean?

To perform multi-class classification, an SVM has to reduce the problem to a set of binary classification problems. The one-against-one (one-vs-one) strategy trains one binary classifier for every pair of classes, N(N-1)/2 classifiers for an N-class data set; each classifier sees only the samples of its two classes and learns to separate them, and at prediction time the class that wins the most pairwise votes becomes the final label. The one-vs-the-rest strategy instead trains one classifier per class, with the samples of that class labelled as the positive class and all remaining samples as the negative class; repeating this N times gives an N-class classifier, and each sample receives a score for each class, from which the winning class gives the final label. The main difference between SVC and LinearSVC for multi-class classification is that SVC uses the one-vs-one approach while LinearSVC uses one-vs-the-rest. Another clear difference is that SVC offers different kernels (rbf or poly), while LinearSVC only produces a linear margin of separation. Also, SVC places no limit on the number of iterations by default, whereas LinearSVC caps them at 1000.

The last major difference is that LinearSVC gives us the option to solve either the primal or the dual formulation of the SVM (the dual parameter); SVC does not offer that option.
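A minimal sketch of the practical consequence of the two strategies, using a small synthetic data set (not MNIST) purely for illustration: SVC builds one decision value per class pair, LinearSVC one per class.

# Hedged sketch: compare the shape of the decision functions produced by
# SVC (one-vs-one) and LinearSVC (one-vs-rest) on a tiny synthetic problem.
import numpy as np
from sklearn.svm import SVC, LinearSVC

rng = np.random.RandomState(0)
X = rng.rand(300, 20)
y = rng.randint(0, 10, 300)          # 10 classes, like MNIST

ovo = SVC(kernel='linear', decision_function_shape='ovo').fit(X, y)
ovr = LinearSVC(dual=False).fit(X, y)

print(ovo.decision_function(X[:1]).shape)  # (1, 45): 10*9/2 one-vs-one problems
print(ovr.decision_function(X[:1]).shape)  # (1, 10): one one-vs-rest problem per class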

Importing the packages and data with TensorFlow

In [20]:
from scipy.stats import mode
import numpy as np
#from mnist import MNIST
from time import time
import pandas as pd
import os
import matplotlib.pyplot as matplot
import matplotlib
%matplotlib inline

import random
matplot.rcdefaults()
from IPython.display import display, HTML
from itertools import chain
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
import seaborn as sb
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC, LinearSVC
import warnings
warnings.filterwarnings('ignore')
In [21]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/')
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
In [22]:
train = mnist.train.images
validation = mnist.validation.images
test = mnist.test.images

trlab = mnist.train.labels
vallab = mnist.validation.labels
tslab = mnist.test.labels

train = np.concatenate((train, validation), axis=0)
trlab = np.concatenate((trlab, vallab), axis=0)

Important thing to remember: the data is normalized to the [0, 1] range

We save a lot of compute time by keeping the data that way, and we don't lose any significant amount of accuracy
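A quick sanity check of that assumption (a small sketch; the exact dtype depends on the reader, float32 in [0, 1] is what the TensorFlow loader is expected to return):

# Sanity check: pixels should already lie in [0, 1], so no further scaling is applied.
print('min %.3f max %.3f' % (train.min(), train.max()))   # expected: 0.000 and 1.000
print(train.dtype)                                        # expected: float32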

Linear SVC

Running a Sample Linear SVM classifier on default values to see how the model does on MNIST data

In [8]:
svm = LinearSVC(dual=False)
svm.fit(train, trlab)
Out[8]:
LinearSVC(C=1.0, class_weight=None, dual=False, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0)
In [11]:
svm.coef_
svm.intercept_
Out[11]:
array([-1.20849557, -0.1362278 , -0.81846194, -1.19352824, -0.50981085,
        0.03587096, -1.14999805, -0.24171445, -2.0858455 , -1.32422686])
In [12]:
pred = svm.predict(test)
In [15]:
accuracy_score(tslab, pred) # Accuracy
Out[15]:
0.91820000000000002
In [13]:
cm = confusion_matrix(tslab, pred)
matplot.subplots(figsize=(10, 6))
sb.heatmap(cm, annot = True, fmt = 'g')
matplot.xlabel("Predicted")
matplot.ylabel("Actual")
matplot.title("Confusion Matrix")
matplot.show()

As we can see, the SVM does a pretty decent job at classifying. We still get the usual misclassifications between 5-8, 2-8, 5-3 and 4-9, but an accuracy of 91.82% is good.
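The most-confused pairs can also be read off the confusion matrix directly; a small sketch reusing the cm computed above:

# Hedged sketch: list the largest off-diagonal entries of the confusion matrix,
# i.e. the (actual, predicted) pairs confused most often.
off_diag = cm.copy()
np.fill_diagonal(off_diag, 0)
pairs = np.dstack(np.unravel_index(np.argsort(off_diag, axis=None)[::-1], cm.shape))[0]
for actual, predicted in pairs[:5]:
    print('actual %d predicted as %d: %d times' % (actual, predicted, off_diag[actual, predicted]))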

(i)

Running Linear SVC for multiple cost factor(s) C

In [23]:
acc = []
acc_tr = []
coefficient = []
for c in [0.0001,0.001,0.01,0.1,1,10,100,1000,10000]:
    svm = LinearSVC(dual=False, C=c)
    svm.fit(train, trlab)
    coef = svm.coef_
    
    p_tr = svm.predict(train)
    a_tr = accuracy_score(trlab, p_tr)
    
    pred = svm.predict(test)
    a = accuracy_score(tslab, pred)
    
    coefficient.append(coef)
    acc_tr.append(a_tr)
    acc.append(a)
In [24]:
c = [0.0001,0.001,0.01,0.1,1,10,100,1000,10000]

matplot.subplots(figsize=(10, 5))
matplot.semilogx(c, acc,'-gD' ,color='red' , label="Testing Accuracy")
matplot.semilogx(c, acc_tr,'-gD' , label="Training Accuracy")
#matplot.xticks(L,L)
matplot.grid(True)
matplot.xlabel("Cost Parameter C")
matplot.ylabel("Accuracy")
matplot.legend()
matplot.title('Accuracy versus the Cost Parameter C (log-scale)')
matplot.show()

We clearly see a bias-variance trade-off in the graph. As the cost increases, the training accuracy increases and so does the test accuracy, but only up to C=1; beyond that the model overfits. From C=10 to 1000 we see overfitting: low bias and high variance.
So as we go from left to right: bias decreases and variance increases.

(ii)

We choose the model with best testing accuracy i.e. c = 1
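That choice can also be made programmatically from the lists filled in the sweep above; a small sketch:

# Hedged sketch: pick the C with the highest test accuracy from the sweep above.
c_values = [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000]
best = int(np.argmax(acc))
print('best C = %g with test accuracy %.4f' % (c_values[best], acc[best]))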

In [25]:
svm_coef = coefficient[4]
svm_coef.shape
Out[25]:
(10, 784)
In [26]:
matplot.subplots(2,5, figsize=(24,10))
for i in range(10):
    l1 = matplot.subplot(2, 5, i + 1)
    l1.imshow(svm_coef[i].reshape(28, 28), cmap=matplot.cm.RdBu)
    l1.set_xticks(())
    l1.set_yticks(())
    l1.set_xlabel('Class %i' % i)
matplot.suptitle('Class Coefficients')
matplot.show()

These images look nothing like the ones we saw for logistic regression or naive Bayes. In naive Bayes the underlying digit was clearly visible, and in logistic regression the pattern was quite distinct between classes. Here, however, there are no apparent patterns or distinctness, and the classes are really hard to tell apart.

(iii)

Linear SVC with Penalty: l1

In [14]:
acc = []
acc_tr = []
coefficient = []
for c in [0.0001,0.001,0.01,0.1,1,10,100,1000,10000]:
    svm = LinearSVC(dual=False, C=c, penalty='l1')
    svm.fit(train, trlab)
    coef = svm.coef_
    
    p_tr = svm.predict(train)
    a_tr = accuracy_score(trlab, p_tr)
    
    pred = svm.predict(test)
    a = accuracy_score(tslab, pred)
    
    coefficient.append(coef)
    acc_tr.append(a_tr)
    acc.append(a)
In [19]:
c = [0.0001,0.001,0.01,0.1,1,10,100,1000,10000]

matplot.subplots(figsize=(10, 5))
matplot.semilogx(c, acc,'-gD' ,color='red' , label="Testing Accuracy")
matplot.semilogx(c, acc_tr,'-gD' , label="Training Accuracy")
#matplot.xticks(L,L)
matplot.grid(True)
matplot.xlabel("Cost Parameter C")
matplot.ylabel("Accuracy")
matplot.legend()
matplot.title('Accuracy versus the Cost Parameter C (log-scale)')
matplot.show()

Almost exactly the same behaviour, with just a slight difference, is observed here as well. We see a bias-variance trade-off in the graph: as the cost increases, the training accuracy increases and so does the test accuracy, but only up to C=1; from C=10 to 1000 the model overfits, giving low bias and high variance. The only difference is that with the L1 penalty the overfitting is slightly less pronounced, and the model performs really poorly at small cost values.
Again, as we go from left to right: bias decreases and variance increases.

Once more, we choose the model with best testing accuracy i.e. c = 1

In [16]:
svm_coef = coefficient[4]
svm_coef.shape
Out[16]:
(10, 784)
In [17]:
matplot.subplots(2,5, figsize=(24,10))
for i in range(10):
    l1 = matplot.subplot(2, 5, i + 1)
    l1.imshow(svm_coef[i].reshape(28, 28), cmap=matplot.cm.RdBu)
    l1.set_xticks(())
    l1.set_yticks(())
    l1.set_xlabel('Class %i' % i)
matplot.suptitle('Class Coefficients')
matplot.show()

This reflects my comments on LinearSVC with the L2 (default) penalty: these images resemble the original digits only very vaguely, and they also differ from what we saw for logistic regression and naive Bayes. In naive Bayes the underlying digit was clearly visible, and in logistic regression the pattern was quite distinct between classes. Here there are no apparent patterns and the classes are hard to differentiate, although some digits like 0, 5, 6 and 8 can still be made out.

Another important observation is that these images also look different from their L2 siblings, not by a huge margin, but different nonetheless.
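One concrete way to see the effect of the l1 penalty is to count how many weights it drives to exactly zero, since l1 is known to induce sparsity; a small sketch using the svm_coef selected above (the exact percentage will vary from run to run):

# Hedged sketch: fraction of pixel weights that the l1 penalty set exactly to zero.
print('zero weights: %.1f%%' % (100.0 * np.mean(svm_coef == 0)))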

SVC RBF kernel

In [50]:
from scipy.stats import mode
import numpy as np
from time import time
import pandas as pd
import os
import matplotlib.pyplot as matplot
import matplotlib
%matplotlib inline

import random
matplot.rcdefaults()
from IPython.display import display, HTML
from itertools import chain
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
import seaborn as sb
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC, LinearSVC
import warnings
warnings.filterwarnings('ignore')
In [51]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/')
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
In [52]:
train = mnist.train.images
validation = mnist.validation.images
test = mnist.test.images

trlab = mnist.train.labels
vallab = mnist.validation.labels
tslab = mnist.test.labels

train = np.concatenate((train, validation), axis=0)
trlab = np.concatenate((trlab, vallab), axis=0)

We generate a random sample of the data and check how its class distribution compares to the original distribution

We use sampling because running this many models on the full data set would take too long. So, keeping the time constraint in mind, I sample 10% of the data.

In [53]:
#generating a random sequence for sampling
seq = np.random.randint(0,60000,6000)
train_samp = train[seq]
trlab_samp = trlab[seq]

train_samp.shape
trlab_samp.shape
Out[53]:
(6000,)
In [54]:
seq = np.random.randint(0,10000,1000)
test_samp = test[seq]
tslab_samp = tslab[seq]

test_samp.shape
tslab_samp.shape
Out[54]:
(1000,)
In [55]:
# Compare the class distribution of the 10% sample (left) with the full training set (right)
fig, ax = matplot.subplots(1,2, figsize=(10,4))
ax[0].hist(trlab_samp)
ax[1].hist(trlab)
matplot.show()
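If we wanted the class proportions to match the full training set exactly, a stratified sample is an alternative to plain random indexing; a hedged sketch using scikit-learn's train_test_split (this is not what was run above, just an option):

# Hedged alternative: draw a 6000-sample stratified subset of the training data.
from sklearn.model_selection import train_test_split
train_strat, _, trlab_strat, _ = train_test_split(
    train, trlab, train_size=6000, stratify=trlab, random_state=0)
print(np.bincount(trlab_strat))   # roughly 600 samples per digit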

(i)

Running SVC for multiple cost factor(s) C and Gamma

In [56]:
coefficient = []
n_supp = []
sup_vec = []
i = 0
df = pd.DataFrame(columns = ['c','gamma','train_acc','test_acc'])
for c in [0.01, 0.1, 1, 10, 100]:
    for g in [0.01, 0.1, 1, 10, 100]:
        svm = SVC(kernel='rbf', C=c, gamma=g)
        model = svm.fit(train_samp, trlab_samp)
        globals()['model%s' % i] = model
        d_coef = svm.dual_coef_
        support = svm.n_support_
        sv = svm.support_
    
        p_tr = svm.predict(train_samp)
        a_tr = accuracy_score(trlab_samp, p_tr)
    
        pred = svm.predict(test_samp)
        a = accuracy_score(tslab_samp, pred)
    
        coefficient.append(d_coef)
        n_supp.append(support)
        sup_vec.append(sv)
        df.loc[i] = [c,g,a_tr,a]
        i=i+1
In [57]:
df
Out[57]:
c gamma train_acc test_acc
0 0.01 0.01 0.644333 0.657
1 0.01 0.10 0.110500 0.133
2 0.01 1.00 0.110500 0.133
3 0.01 10.00 0.110500 0.133
4 0.01 100.00 0.110500 0.133
5 0.10 0.01 0.918833 0.911
6 0.10 0.10 0.229500 0.231
7 0.10 1.00 0.110500 0.133
8 0.10 10.00 0.110500 0.133
9 0.10 100.00 0.110500 0.133
10 1.00 0.01 0.977167 0.951
11 1.00 0.10 1.000000 0.868
12 1.00 1.00 1.000000 0.157
13 1.00 10.00 1.000000 0.133
14 1.00 100.00 1.000000 0.133
15 10.00 0.01 1.000000 0.964
16 10.00 0.10 1.000000 0.877
17 10.00 1.00 1.000000 0.160
18 10.00 10.00 1.000000 0.133
19 10.00 100.00 1.000000 0.133
20 100.00 0.01 1.000000 0.964
21 100.00 0.10 1.000000 0.877
22 100.00 1.00 1.000000 0.160
23 100.00 10.00 1.000000 0.133
24 100.00 100.00 1.000000 0.133

Comment on the bias and variance of the SVC classifier with respect to C and gamma. Comment on the results overall in comparison to LinearSVC. What values would you choose?

We see a bias-variance trade-off in the table. As the cost and gamma increase, the test accuracy eventually decreases because the model overfits: low bias and high variance. Interestingly, keeping the cost constant and increasing gamma leads to immediate overfitting. The cost behaves the same way it did for LinearSVC; the only difference is that the best model performance here is at C=10 and gamma=0.01.
So as we increase the cost: bias decreases and variance increases.
So as we increase gamma: bias decreases and variance increases.
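The gamma effect can be traced back to the kernel itself, K(x, x') = exp(-gamma * ||x - x'||^2): with a large gamma the similarity between two distinct digits collapses toward zero, so every training point only influences a tiny neighbourhood and the model memorises the sample. A small sketch of this, using two training digits (the exact values depend on the random sample):

# Hedged sketch: how gamma shapes the RBF kernel between two training digits.
from sklearn.metrics.pairwise import rbf_kernel
x0 = train_samp[0:1]
x1 = train_samp[1:2]
for g in [0.01, 0.1, 1, 10, 100]:
    print('gamma=%6.2f  K(x0, x1)=%.6f' % (g, rbf_kernel(x0, x1, gamma=g)[0, 0]))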

(ii)

We choose C=10 and Gamma=0.01 to look at the Support vectors
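That choice is easier to spot if the sweep is pivoted into a C x gamma grid; a small sketch using the df built above:

# Hedged sketch: view the sweep as a C x gamma grid of test accuracies.
print(df.pivot(index='c', columns='gamma', values='test_acc'))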

In [63]:
pd.DataFrame(coefficient[15]) # dual_coef_
Out[63]:
0 1 2 3 4 5 6 7 8 9 ... 2467 2468 2469 2470 2471 2472 2473 2474 2475 2476
0 0.000000 0.000000 0.00000 0.000000 0.00000 1.233223 0.463437 0.000000 0.000000 0.00000 ... -0.000000 -0.000000 -0.000000 -0.665725 -0.000000 -0.000000 -0.979185 -0.000000 -0.651719 -0.286580
1 0.733241 0.338779 0.00000 0.000000 0.00000 1.599885 0.847806 0.000000 0.148967 0.00000 ... -0.000000 -0.000000 -0.000000 -2.656140 -0.000000 -0.000000 -0.000000 -0.885662 -0.000000 -0.318750
2 0.000000 0.000000 0.16819 0.000000 0.00000 1.332503 1.047378 0.000000 0.124441 0.00000 ... -0.500293 -0.000000 -0.000000 -1.489073 -0.000000 -0.000000 -0.000000 -0.464601 -0.450088 -0.182432
3 0.000000 0.000000 0.00000 0.000000 0.00000 3.575345 0.498475 0.000000 0.188376 0.00000 ... -0.000000 -0.000000 -0.000000 -3.857057 -0.000000 -0.000000 -1.831419 -0.000000 -0.000000 -3.918929
4 0.000000 0.000000 0.00000 0.000000 0.00000 2.451250 2.909456 0.000000 0.788204 0.03408 ... -0.000000 -0.000000 -0.435685 -0.000000 -2.925009 -1.568582 -0.503543 -2.701709 -0.000000 -0.974694
5 0.000000 0.000000 0.00000 0.000000 0.00000 3.574567 0.000000 0.000000 0.000000 0.00000 ... -0.000000 -0.000000 -0.000000 -1.288726 -0.000000 -0.000000 -4.472535 -0.038601 -0.000000 -2.727674
6 0.000000 0.000000 0.00000 0.000000 0.00000 2.472752 1.138073 0.000000 0.000000 0.00000 ... -0.000000 -0.000000 -0.000000 -0.782389 -0.000000 -0.000000 -1.677264 -0.057191 -0.000000 -0.000000
7 0.000000 0.000000 0.00000 0.425428 0.90783 0.350570 0.944754 0.000000 0.000000 0.00000 ... -0.107843 -7.508478 -0.000000 -1.582007 -0.000000 -0.000000 -0.000000 -1.648960 -0.000000 -0.000000
8 0.000000 0.000000 0.00000 0.000000 0.00000 2.271634 0.520486 0.290487 0.000000 0.00000 ... -0.821034 -0.000000 -0.000000 -5.498512 -0.000000 -0.000000 -3.585487 -0.217215 -0.656942 -2.623120

9 rows × 2477 columns

Each support vector identified by the SVC belongs to a particular class (0 to 9), and in the dual coefficients they are ordered according to that class. Because every support vector is tied to exactly one class, it can be involved in at most n_classes - 1 one-vs-one problems, namely every comparison of its own class against each of the other classes; but it is entirely possible that a given support vector will not actually appear in all of those problems. SVC gives us the weights of the support vectors for classes 0, 1, ..., 9 in their respective one-vs-one problems: each support vector is compared against every class except its own, resulting in n_classes - 1 = 9 rows, in the order of the unique classes shown above, and there are as many columns as there are support vectors, i.e. 2477.
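The layout just described can be checked directly from the attributes stored in the loop above; a small sketch for the C=10, gamma=0.01 model (index 15):

# Hedged sketch: verify the dual-coefficient layout for the chosen model.
print(coefficient[15].shape)   # (n_classes - 1, n_SV) -> (9, 2477)
print(n_supp[15].sum())        # total number of support vectors -> 2477
print(len(sup_vec[15]))        # indices into train_samp, one per support vector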

In [65]:
pd.DataFrame(n_supp[15]) # n_support_
Out[65]:
0
0 180
1 144
2 270
3 274
4 277
5 303
6 220
7 212
8 312
9 285

"nsupport" divides the number of support vestors by the class. So we can say that when class 0 has 180 support vectors, it means 180 are the positive support vectors and rest all are the negative support vectors for 0-versus-rest classifier.

Sampling one positive support vector for each class

In [68]:
# support_ is ordered by class, so consecutive slices of length n_support_[i]
# hold class i's support vectors; show the first support vector of each class
ind = 0
matplot.subplots(2,5, figsize=(24,10))
for i in range(len(n_supp[15])):
    l1 = matplot.subplot(2, 5, i + 1)
    sv_image = train_samp[sup_vec[15][ind:ind+n_supp[15][i]]][0]
    l1.imshow(sv_image.reshape(28, 28), cmap=matplot.cm.RdBu)
    l1.set_xticks(())
    l1.set_yticks(())
    l1.set_xlabel('Class %i vs All' % i)
    ind = ind + n_supp[15][i]
matplot.suptitle('Support Vectors for Positive Classes')
matplot.show()

Sampling one negative support vector for each class

In [67]:
# As a "negative" example for class i, show the 100th support vector drawn from
# the next class (i+1); the final panel reuses class 0's support vectors for class 9
ind = n_supp[15][0]
matplot.subplots(2,5, figsize=(24,10))
for i in range(len(n_supp[15])-1):
    l1 = matplot.subplot(2, 5, i + 1)
    sv_image = train_samp[sup_vec[15][ind:ind+n_supp[15][i+1]]][100]
    l1.imshow(sv_image.reshape(28, 28), cmap=matplot.cm.RdBu)
    l1.set_xticks(())
    l1.set_yticks(())
    l1.set_xlabel('Class %i vs All' % i)
    ind = ind + n_supp[15][i+1]
ind = 0
l1 = matplot.subplot(2, 5, 10)
sv_image = train_samp[sup_vec[15][ind:ind+n_supp[15][0]]][100]
l1.imshow(sv_image.reshape(28, 28), cmap=matplot.cm.RdBu)
l1.set_xticks(())
l1.set_yticks(())
l1.set_xlabel('Class 9 vs All')
matplot.suptitle('Support Vectors for Negative Classes')
matplot.show()

We can clearly see that the support vectors for the positive class are easily recognisable as their digit, while the support vectors on the negative side look clearly different from the class they stand in for. For example, the negative support vector shown for class 9 vs all looks nothing like a 9, which is exactly what makes it useful for separating that class.

SVC Poly kernel

In [34]:
from scipy.stats import mode
import numpy as np
from time import time
import pandas as pd
import os
import matplotlib.pyplot as matplot
import matplotlib
%matplotlib inline

import random
matplot.rcdefaults()
from IPython.display import display, HTML
from itertools import chain
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
import seaborn as sb
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC, LinearSVC
import warnings
warnings.filterwarnings('ignore')
In [35]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/')
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
In [36]:
train = mnist.train.images
validation = mnist.validation.images
test = mnist.test.images

trlab = mnist.train.labels
vallab = mnist.validation.labels
tslab = mnist.test.labels

train = np.concatenate((train, validation), axis=0)
trlab = np.concatenate((trlab, vallab), axis=0)
In [37]:
#generating a random sequence for sampling
seq = np.random.randint(0,60000,6000)
train_samp = train[seq]
trlab_samp = trlab[seq]

train_samp.shape
trlab_samp.shape
Out[37]:
(6000,)
In [38]:
seq = np.random.randint(0,10000,1000)
test_samp = test[seq]
tslab_samp = tslab[seq]

test_samp.shape
tslab_samp.shape
Out[38]:
(1000,)

(iii)

Running SVC for multiple cost factor(s) C and Degree

In [39]:
coefficient = []
n_supp = []
sup_vec = []
i = 0
df = pd.DataFrame(columns = ['c','degree','train_acc','test_acc'])
for c in [0.01, 0.1, 1, 10, 100]:
    for d in [2,3,4,5,6]:
        svm = SVC(kernel='poly', C=c, degree=d)
        model = svm.fit(train_samp, trlab_samp)
        globals()['model%s' % i] = model
        d_coef = svm.dual_coef_
        support = svm.n_support_
        sv = svm.support_
    
        p_tr = svm.predict(train_samp)
        a_tr = accuracy_score(trlab_samp, p_tr)
    
        pred = svm.predict(test_samp)
        a = accuracy_score(tslab_samp, pred)
    
        coefficient.append(d_coef)
        n_supp.append(support)
        sup_vec.append(sv)
        df.loc[i] = [c,d,a_tr,a]
        i=i+1
In [41]:
df
Out[41]:
c degree train_acc test_acc
0 0.01 2.0 0.114167 0.116
1 0.01 3.0 0.114167 0.116
2 0.01 4.0 0.114167 0.116
3 0.01 5.0 0.114167 0.116
4 0.01 6.0 0.114167 0.116
5 0.10 2.0 0.118000 0.122
6 0.10 3.0 0.114167 0.116
7 0.10 4.0 0.114167 0.116
8 0.10 5.0 0.114167 0.116
9 0.10 6.0 0.114167 0.116
10 1.00 2.0 0.658333 0.668
11 1.00 3.0 0.131667 0.131
12 1.00 4.0 0.114167 0.116
13 1.00 5.0 0.114167 0.116
14 1.00 6.0 0.114167 0.116
15 10.00 2.0 0.922667 0.907
16 10.00 3.0 0.546833 0.555
17 10.00 4.0 0.136500 0.140
18 10.00 5.0 0.114167 0.116
19 10.00 6.0 0.114167 0.116
20 100.00 2.0 0.981333 0.945
21 100.00 3.0 0.889167 0.864
22 100.00 4.0 0.455667 0.448
23 100.00 5.0 0.146167 0.149
24 100.00 6.0 0.115333 0.116

Comment on the bias and variance of the SVC classifier with respect to C and degree. Comment on the results overall in comparison to LinearSVC. What values would you choose?

We also see that the polynomial kernel behaves quite erratically. The reason is that a change in cost or degree affects the entire polynomial decision surface, rather than a localised region as with the RBF kernel, which makes the poly kernel less stable than RBF.

We see a bias-variance trade-off in the table. As the cost and degree increase, the test accuracy eventually decreases because the model overfits: low bias and high variance. Interestingly, keeping the cost constant and increasing the degree leads to immediate overfitting. The cost behaves much as it did for LinearSVC; the only difference is that the best model performance here is at C=100 and degree=2.
So as we increase the cost: bias decreases and variance increases.
So as we increase the degree: bias decreases and variance increases.
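The degree effect mirrors the kernel's form: in scikit-learn the polynomial kernel is K(x, x') = (gamma * <x, x'> + coef0)^degree, so raising the degree rescales every pairwise similarity at once instead of reshaping a local neighbourhood as RBF does. A small sketch using two training digits (values depend on the random sample; default gamma and coef0 are assumed):

# Hedged sketch: the polynomial kernel grows globally with the degree.
from sklearn.metrics.pairwise import polynomial_kernel
x0, x1 = train_samp[0:1], train_samp[1:2]
for d in [2, 3, 4, 5, 6]:
    print('degree=%d  K(x0, x1)=%.4f' % (d, polynomial_kernel(x0, x1, degree=d)[0, 0]))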

We choose C=100 and Degree=2 to look at the Support vectors

In [81]:
pd.DataFrame(coefficient[20]) # dual_coef_
Out[81]:
0 1 2 3 4 5 6 7 8 9 ... 3682 3683 3684 3685 3686 3687 3688 3689 3690 3691
0 10.0 0.0 10.0 0.000000 0.0 0.000000 10.0 0.000000 10.000000 10.0 ... -0.000000 -0.0 -0.000000 -10.000000 -10.0 -0.0 -0.00000 -2.879215 -0.000000 -10.0
1 10.0 10.0 10.0 4.625618 0.0 0.000000 10.0 0.000000 5.846203 10.0 ... -10.000000 -0.0 -0.000000 -0.000000 -10.0 -0.0 -0.00000 -0.000000 -0.000000 -0.0
2 10.0 0.0 10.0 10.000000 0.0 2.985243 10.0 0.000000 0.000000 10.0 ... -2.875142 -0.0 -10.000000 -10.000000 -10.0 -0.0 -0.00000 -0.000000 -0.000000 -10.0
3 10.0 0.0 10.0 10.000000 0.0 0.000000 10.0 0.000000 10.000000 10.0 ... -10.000000 -0.0 -0.000000 -0.000000 -10.0 -0.0 -0.00000 -10.000000 -0.000000 -0.0
4 10.0 10.0 10.0 10.000000 10.0 10.000000 10.0 0.000000 10.000000 10.0 ... -10.000000 -0.0 -10.000000 -10.000000 -10.0 -10.0 -10.00000 -10.000000 -0.000000 -10.0
5 10.0 0.0 10.0 10.000000 0.0 0.000000 10.0 0.000000 10.000000 10.0 ... -10.000000 -10.0 -0.000000 -8.748646 -10.0 -0.0 -0.00000 -10.000000 -0.000000 -0.0
6 10.0 0.0 10.0 10.000000 0.0 10.000000 10.0 6.083243 10.000000 10.0 ... -4.797141 -0.0 -5.501533 -8.288558 -10.0 -0.0 -0.00000 -0.000000 -0.000000 -10.0
7 10.0 10.0 10.0 10.000000 0.0 10.000000 10.0 0.000000 0.000000 10.0 ... -10.000000 -0.0 -10.000000 -10.000000 -10.0 -10.0 -6.64933 -10.000000 -2.200562 -10.0
8 10.0 0.0 10.0 10.000000 0.0 10.000000 10.0 1.573163 10.000000 10.0 ... -10.000000 -10.0 -0.000000 -0.000000 -10.0 -0.0 -0.00000 -10.000000 -0.000000 -10.0

9 rows × 3692 columns

In [82]:
pd.DataFrame(n_supp[20]) # n_support_
Out[82]:
0
0 246
1 321
2 366
3 399
4 437
5 431
6 294
7 383
8 388
9 427

Sampling one positive support vector for each class

In [83]:
ind = 0
matplot.subplots(2,5, figsize=(24,10))
for i in range(len(n_supp[20])):
    l1 = matplot.subplot(2, 5, i + 1)
    sv_image = train_samp[sup_vec[20][ind:ind+n_supp[20][i]]][0]
    l1.imshow(sv_image.reshape(28, 28), cmap=matplot.cm.RdBu)
    l1.set_xticks(())
    l1.set_yticks(())
    l1.set_xlabel('Class %i vs All' % i)
    ind = ind + n_supp[20][i]
matplot.suptitle('Support Vectors for Positive Classes')
matplot.show()

Sampling one negative support vector for each class

In [84]:
ind = n_supp[20][0]
matplot.subplots(2,5, figsize=(24,10))
for i in range(len(n_supp[20])-1):
    l1 = matplot.subplot(2, 5, i + 1)
    sv_image = train_samp[sup_vec[20][ind:ind+n_supp[20][i+1]]][100]
    l1.imshow(sv_image.reshape(28, 28), cmap=matplot.cm.RdBu)
    l1.set_xticks(())
    l1.set_yticks(())
    l1.set_xlabel('Class %i vs All' % i)
    ind = ind + n_supp[20][i+1]
ind = 0
l1 = matplot.subplot(2, 5, 10)
sv_image = train_samp[sup_vec[20][ind:ind+n_supp[20][0]]][100]
l1.imshow(sv_image.reshape(28, 28), cmap=matplot.cm.RdBu)
l1.set_xticks(())
l1.set_yticks(())
l1.set_xlabel('Class 9 vs All')
matplot.suptitle('Support Vectors for Negative Classes')
matplot.show()

As with the RBF kernel, the support vectors for the positive class are easily recognisable as their digit, while the support vectors on the negative side look clearly different from the class they stand in for; again, the negative support vector shown for class 9 vs all looks nothing like a 9, which is exactly what makes it useful for separating that class.

To summarize the model performance (test accuracy):

- Linear SVC (best performance): 92%
- SVC rbf (best performance): 96.4%
- SVC poly (best performance): 94.5%
- Logistic regression (prev assignment): 89%
- Naive Bayes (prev assignment): 81%


It is clear that the SVM with the 'rbf' kernel gives the best result among all these models.

-------------------------- X --------------------------