Describe how the multi-class classification is different for SVC and LinearSVC. Be explicit, don't just describe what's in the documentation. For example, what does 'one-against-one' and 'one-vs-the-rest' mean?
In the one-against-one (OvO) scheme, we train one binary classifier for every pair of classes: each classifier sees only the training samples of its two classes and learns to separate those two labels, so an N-class problem needs N(N-1)/2 classifiers, and at prediction time the class that wins the most pairwise votes gives the final label. In the one-vs-the-rest (OvR) scheme, we train one classifier per class, with the samples of that class labelled positive and all remaining samples labelled negative; repeating this N times gives an N-class classifier, and the final label is the class whose classifier returns the highest decision score. Either way, multi-class classification is reduced to a set of binary classification problems. The main multi-class difference between SVC and LinearSVC is that SVC uses the one-vs-one approach while LinearSVC uses one-vs-the-rest. Two other clear differences: SVC offers different kernels (rbf, poly, etc.) while LinearSVC only produces a linear margin of separation, and SVC's default iteration limit is unbounded while LinearSVC caps it at 1000.
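To make the classifier counts concrete, here is a small sketch on toy data (the blob data and variable names are mine, not part of the assignment): with N = 4 classes, SVC's one-vs-one scheme produces N(N-1)/2 = 6 pairwise decision scores, while LinearSVC's one-vs-the-rest scheme fits N = 4 classifiers, one coefficient row per class.

```python
import numpy as np
from sklearn.svm import SVC, LinearSVC

# toy 4-class data: four Gaussian blobs spread along a line
rng = np.random.RandomState(0)
X = rng.randn(200, 2) + 3 * np.repeat(np.arange(4), 50)[:, None]
y = np.repeat(np.arange(4), 50)

# one-vs-one: one column per pair of classes -> 4*3/2 = 6 pairwise scores
ovo = SVC(kernel='linear', decision_function_shape='ovo').fit(X, y)
print(ovo.decision_function(X[:1]).shape)   # (1, 6)

# one-vs-the-rest: one binary classifier (one coefficient row) per class
ovr = LinearSVC(dual=False).fit(X, y)
print(ovr.coef_.shape)                      # (4, 2)
```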
from scipy.stats import mode
import numpy as np
#from mnist import MNIST
from time import time
import pandas as pd
import os
import matplotlib.pyplot as matplot
import matplotlib
%matplotlib inline
import random
matplot.rcdefaults()
from IPython.display import display, HTML
from itertools import chain
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
import seaborn as sb
from sklearn.model_selection import ParameterGrid
from sklearn.svm import SVC, LinearSVC
import warnings
warnings.filterwarnings('ignore')
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/')
train = mnist.train.images
validation = mnist.validation.images
test = mnist.test.images
trlab = mnist.train.labels
vallab = mnist.validation.labels
tslab = mnist.test.labels
train = np.concatenate((train, validation), axis=0)
trlab = np.concatenate((trlab, vallab), axis=0)
We save a lot of compute time by folding the validation set into the training set instead of tuning on it separately, and we don't lose any significant amount of accuracy.
Running a sample LinearSVC classifier with default values to see how the model does on the MNIST data.
svm = LinearSVC(dual=False)
svm.fit(train, trlab)
svm.coef_
svm.intercept_
pred = svm.predict(test)
accuracy_score(tslab, pred) # Accuracy
cm = confusion_matrix(tslab, pred)
matplot.subplots(figsize=(10, 6))
sb.heatmap(cm, annot = True, fmt = 'g')
matplot.xlabel("Predicted")
matplot.ylabel("Actual")
matplot.title("Confusion Matrix")
matplot.show()
As we can see, the SVM does a pretty decent job of classifying. We still get the usual misclassifications between 5-8, 2-8, 5-3, and 4-9, but an accuracy of 91.82% is good.
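Those confused pairs can also be pulled out of the confusion matrix programmatically. A small sketch (the helper name and the tiny example matrix are mine): zero the diagonal, then take the largest remaining (actual, predicted) cells.

```python
import numpy as np

def most_confused_pairs(cm, k=3):
    """Return the k (actual, predicted) off-diagonal cells with the most errors."""
    off = cm.copy().astype(int)
    np.fill_diagonal(off, 0)                      # ignore correct predictions
    flat = np.argsort(off, axis=None)[::-1][:k]   # largest error counts first
    return [tuple(np.unravel_index(f, off.shape)) for f in flat]

# tiny 3-class example: class 1 is mistaken for class 2 most often
cm = np.array([[50, 1, 2],
               [0, 45, 8],
               [3, 1, 48]])
print(most_confused_pairs(cm, k=2))   # [(1, 2), (2, 0)]
```

Running it on the MNIST confusion matrix above would surface the 5-8 and 4-9 style confusions directly.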
acc = []
acc_tr = []
coefficient = []
for c in [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000]:
    svm = LinearSVC(dual=False, C=c)
    svm.fit(train, trlab)
    coef = svm.coef_
    p_tr = svm.predict(train)
    a_tr = accuracy_score(trlab, p_tr)
    pred = svm.predict(test)
    a = accuracy_score(tslab, pred)
    coefficient.append(coef)
    acc_tr.append(a_tr)
    acc.append(a)
c = [0.0001,0.001,0.01,0.1,1,10,100,1000,10000]
matplot.subplots(figsize=(10, 5))
matplot.semilogx(c, acc,'-gD' ,color='red' , label="Testing Accuracy")
matplot.semilogx(c, acc_tr,'-gD' , label="Training Accuracy")
#matplot.xticks(L,L)
matplot.grid(True)
matplot.xlabel("Cost Parameter C")
matplot.ylabel("Accuracy")
matplot.legend()
matplot.title('Accuracy versus the Cost Parameter C (log-scale)')
matplot.show()
We clearly see a bias-variance trade-off in the graph. As the cost increases, training accuracy increases and so does test accuracy, but only until C=1; beyond that we see overfitting. From C=10 to 1000 the model overfits, giving low bias and high variance.
So as we go from left to right: bias decreases and variance increases.
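To see why a larger C pushes toward low bias and high variance: LinearSVC minimizes 0.5*||w||^2 + C * sum(max(0, 1 - y_i * f(x_i))), so a large C makes margin violations expensive relative to the regularizing margin term. A tiny sketch of that objective on made-up numbers (the weights, bias, and data points below are mine, chosen for illustration):

```python
import numpy as np

def primal_objective(w, b, X, y, C):
    """0.5*||w||^2 + C * sum of hinge losses: the binary soft-margin SVM objective."""
    margins = y * (X @ w + b)           # y in {-1, +1}
    hinge = np.maximum(0, 1 - margins)  # zero for points beyond the margin
    return 0.5 * w @ w + C * hinge.sum()

w = np.array([1.0, -2.0]); b = 0.5
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
y = np.array([1, -1, 1])

# only the third point violates the margin (margin 0.5 -> hinge 0.5), so the
# data term grows linearly with C while the margin term 0.5*||w||^2 stays fixed
print(primal_objective(w, b, X, y, C=1.0))    # 3.0
print(primal_objective(w, b, X, y, C=10.0))   # 7.5
```

With C large, the optimizer bends the boundary to fit every training point (low bias, high variance); with C small, it keeps a wide margin at the cost of training errors.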
svm_coef = coefficient[4]
svm_coef.shape
matplot.subplots(2,5, figsize=(24,10))
for i in range(10):
    l1 = matplot.subplot(2, 5, i + 1)
    l1.imshow(svm_coef[i].reshape(28, 28), cmap=matplot.cm.RdBu)
    l1.set_xticks(())
    l1.set_yticks(())
    l1.set_xlabel('Class %i' % i)
matplot.suptitle('Class Coefficients')
matplot.show()
These images look nothing like the ones we saw for Logistic Regression or Naive Bayes. In Naive Bayes the underlying digit was clearly visible, and in Logistic Regression the pattern was quite distinct between classes. Here, however, there are no apparent patterns or distinctness, and the classes are really hard to differentiate.
acc = []
acc_tr = []
coefficient = []
for c in [0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1000, 10000]:
    svm = LinearSVC(dual=False, C=c, penalty='l1')
    svm.fit(train, trlab)
    coef = svm.coef_
    p_tr = svm.predict(train)
    a_tr = accuracy_score(trlab, p_tr)
    pred = svm.predict(test)
    a = accuracy_score(tslab, pred)
    coefficient.append(coef)
    acc_tr.append(a_tr)
    acc.append(a)
c = [0.0001,0.001,0.01,0.1,1,10,100,1000,10000]
matplot.subplots(figsize=(10, 5))
matplot.semilogx(c, acc,'-gD' ,color='red' , label="Testing Accuracy")
matplot.semilogx(c, acc_tr,'-gD' , label="Training Accuracy")
#matplot.xticks(L,L)
matplot.grid(True)
matplot.xlabel("Cost Parameter C")
matplot.ylabel("Accuracy")
matplot.legend()
matplot.title('Accuracy versus the Cost Parameter C (log-scale)')
matplot.show()
Almost exactly the same picture, with one slight difference, is observed here as well. We see a bias-variance trade-off in the graph: as the cost increases, training accuracy increases and so does test accuracy, but only until C=1; from C=10 to 1000 the model overfits, giving low bias and high variance. The only difference is that with the L1 penalty the overfitting is milder, while the model performs really poorly at small cost values.
Again, as we go from left to right: bias decreases and variance increases.
svm_coef = coefficient[4]
svm_coef.shape
matplot.subplots(2,5, figsize=(24,10))
for i in range(10):
    l1 = matplot.subplot(2, 5, i + 1)
    l1.imshow(svm_coef[i].reshape(28, 28), cmap=matplot.cm.RdBu)
    l1.set_xticks(())
    l1.set_yticks(())
    l1.set_xlabel('Class %i' % i)
matplot.suptitle('Class Coefficients')
matplot.show()
This mirrors my view of LinearSVC with the default L2 penalty: these images only vaguely resemble the original digits, and they also differ from what we saw in Logistic Regression or Naive Bayes. In Naive Bayes the underlying digit was clearly visible, while in Logistic Regression the pattern was quite distinct between classes. Here there are few apparent patterns and the classes are hard to differentiate, although some digits like 0, 5, 6, and 8 are still interpretable.
Another important observation: these images also look different from their L2 siblings, not by a large margin, but different nonetheless.
We choose sampling because fitting this many models on the full data would take far too long. So, keeping the time constraint in mind, I sample 10% of the data.
# sample 6,000 of the 60,000 training indices without replacement
seq = np.random.choice(60000, 6000, replace=False)
train_samp = train[seq]
trlab_samp = trlab[seq]
train_samp.shape
trlab_samp.shape
seq = np.random.choice(10000, 1000, replace=False)
test_samp = test[seq]
tslab_samp = tslab[seq]
test_samp.shape
tslab_samp.shape
# compare the label distribution of the sample against the full training set
fig, ax = matplot.subplots(1, 2, figsize=(10, 4))
ax[0].hist(trlab_samp)
ax[1].hist(trlab)
matplot.show()
coefficient = []
n_supp = []
sup_vec = []
i = 0
df = pd.DataFrame(columns = ['c','gamma','train_acc','test_acc'])
for c in [0.01, 0.1, 1, 10, 100]:
    for g in [0.01, 0.1, 1, 10, 100]:
        svm = SVC(kernel='rbf', C=c, gamma=g)
        model = svm.fit(train_samp, trlab_samp)
        globals()['model%s' % i] = model  # keep each fitted model around as model0, model1, ...
        d_coef = svm.dual_coef_
        support = svm.n_support_
        sv = svm.support_
        p_tr = svm.predict(train_samp)
        a_tr = accuracy_score(trlab_samp, p_tr)
        pred = svm.predict(test_samp)
        a = accuracy_score(tslab_samp, pred)
        coefficient.append(d_coef)
        n_supp.append(support)
        sup_vec.append(sv)
        df.loc[i] = [c, g, a_tr, a]
        i = i + 1
df
Comment on the bias and variance of the SVC classifier with respect to C and gamma. Comment on the results overall in comparison to LinearSVC. What values would you choose?
We see a bias-variance trade-off in the table. As the cost and gamma increase, training accuracy rises while test accuracy falls, i.e. the model overfits: low bias and high variance. Interestingly, holding the cost constant and increasing gamma causes immediate overfitting. The cost behaves the same as it did for LinearSVC; the only difference is that the best model performance here is at C=10 and gamma=0.01.
So as we increase the cost: bias decreases and variance increases.
So as we increase the gamma: bias decreases and variance increases.
pd.DataFrame(coefficient[15]) # dual_coef_
The support vectors identified by SVC each belong to one class (0 to 9), and in dual_coef_ they are grouped and ordered by that class. Since each support vector belongs to exactly one class, it can be involved in at most n_classes - 1 one-vs-one problems, namely the comparisons of its own class against each of the others, though a given support vector need not be active in all of them. dual_coef_ stores the weight of each support vector in those one-vs-one problems: it has n_classes - 1 = 9 rows, ordered following the unique classes shown above, and one column per support vector (2477 here).
pd.DataFrame(n_supp[15]) # n_support_
"nsupport" divides the number of support vestors by the class. So we can say that when class 0 has 180 support vectors, it means 180 are the positive support vectors and rest all are the negative support vectors for 0-versus-rest classifier.
ind = 0
matplot.subplots(2,5, figsize=(24,10))
for i in range(len(n_supp[15])):
    l1 = matplot.subplot(2, 5, i + 1)
    # first support vector belonging to class i
    sv_image = train_samp[sup_vec[15][ind:ind + n_supp[15][i]]][0]
    l1.imshow(sv_image.reshape(28, 28), cmap=matplot.cm.RdBu)
    l1.set_xticks(())
    l1.set_yticks(())
    l1.set_xlabel('Class %i vs All' % i)
    ind = ind + n_supp[15][i]
matplot.suptitle('Support Vectors for Positive Classes')
matplot.show()
ind = n_supp[15][0]
matplot.subplots(2,5, figsize=(24,10))
for i in range(len(n_supp[15]) - 1):
    l1 = matplot.subplot(2, 5, i + 1)
    sv_image = train_samp[sup_vec[15][ind:ind + n_supp[15][i + 1]]][100]
    l1.imshow(sv_image.reshape(28, 28), cmap=matplot.cm.RdBu)
    l1.set_xticks(())
    l1.set_yticks(())
    l1.set_xlabel('Class %i vs All' % i)
    ind = ind + n_supp[15][i + 1]
ind = 0
l1 = matplot.subplot(2, 5, 10)
sv_image = train_samp[sup_vec[15][ind:ind+n_supp[15][0]]][100]
l1.imshow(sv_image.reshape(28, 28), cmap=matplot.cm.RdBu)
l1.set_xticks(())
l1.set_yticks(())
l1.set_xlabel('Class 9 vs All')
matplot.suptitle('Support Vectors for Negative Classes')
matplot.show()
coefficient = []
n_supp = []
sup_vec = []
i = 0
df = pd.DataFrame(columns = ['c','degree','train_acc','test_acc'])
for c in [0.01, 0.1, 1, 10, 100]:
    for d in [2, 3, 4, 5, 6]:
        svm = SVC(kernel='poly', C=c, degree=d)
        model = svm.fit(train_samp, trlab_samp)
        globals()['model%s' % i] = model  # keep each fitted model around as model0, model1, ...
        d_coef = svm.dual_coef_
        support = svm.n_support_
        sv = svm.support_
        p_tr = svm.predict(train_samp)
        a_tr = accuracy_score(trlab_samp, p_tr)
        pred = svm.predict(test_samp)
        a = accuracy_score(tslab_samp, pred)
        coefficient.append(d_coef)
        n_supp.append(support)
        sup_vec.append(sv)
        df.loc[i] = [c, d, a_tr, a]
        i = i + 1
df
Comment on the bias and variance of the SVC classifier with respect to C and gamma. Comment on the results overall in comparison to LinearSVC. What values would you choose?
We also see that the polynomial kernel behaves very erratically. The reason is that a change in cost or degree affects the entire polynomial decision surface, rather than having the localised effect that a change in gamma has with the rbf kernel; hence the poly kernel is less stable than rbf.
We see a bias-variance trade-off in the table. As the cost and degree increase, training accuracy rises while test accuracy falls, i.e. the model overfits: low bias and high variance. Interestingly, holding the cost constant and increasing the degree causes immediate overfitting. The cost behaves the same as it did for LinearSVC; the only difference is that the best model performance here is at C=100 and degree=2.
So as we increase the cost: bias decreases and variance increases.
So as we increase the degree: bias decreases and variance increases.
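For reference, the two kernels being compared can be checked against scikit-learn's pairwise helpers (a small sketch; the points X, Y and the parameter values are mine). The polynomial kernel (gamma * <x, y> + coef0)^degree is a global function of the dot product, while the rbf kernel exp(-gamma * ||x - y||^2) decays with distance, which is the "localised" behaviour noted above.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel

X = np.array([[1.0, 2.0], [0.0, 1.0]])
Y = np.array([[2.0, 0.5]])
gamma, coef0, degree = 0.5, 1.0, 3

# polynomial: (gamma * <x, y> + coef0) ** degree
manual_poly = (gamma * X @ Y.T + coef0) ** degree
assert np.allclose(manual_poly,
                   polynomial_kernel(X, Y, degree=degree, gamma=gamma, coef0=coef0))

# rbf: exp(-gamma * ||x - y||^2)
sqdist = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
manual_rbf = np.exp(-gamma * sqdist)
assert np.allclose(manual_rbf, rbf_kernel(X, Y, gamma=gamma))

print(manual_poly.ravel(), manual_rbf.ravel())
```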
pd.DataFrame(coefficient[20]) # dual_coef_
pd.DataFrame(n_supp[20]) # n_support_
ind = 0
matplot.subplots(2,5, figsize=(24,10))
for i in range(len(n_supp[20])):
l1 = matplot.subplot(2, 5, i + 1)
sv_image = train_samp[sup_vec[20][ind:ind+n_supp[20][i]]][0]
l1.imshow(sv_image.reshape(28, 28), cmap=matplot.cm.RdBu)
l1.set_xticks(())
l1.set_yticks(())
l1.set_xlabel('Class %i vs All' % i)
ind = ind + n_supp[20][i]
matplot.suptitle('Support Vectors for Positive Classes')
matplot.show()
ind = n_supp[20][0]
matplot.subplots(2,5, figsize=(24,10))
for i in range(len(n_supp[20])-1):
l1 = matplot.subplot(2, 5, i + 1)
sv_image = train_samp[sup_vec[20][ind:ind+n_supp[20][i+1]]][100]
l1.imshow(sv_image.reshape(28, 28), cmap=matplot.cm.RdBu)
l1.set_xticks(())
l1.set_yticks(())
l1.set_xlabel('Class %i vs All' % i)
ind = ind + n_supp[20][i+1]
ind = 0
l1 = matplot.subplot(2, 5, 10)
sv_image = train_samp[sup_vec[20][ind:ind+n_supp[20][0]]][100]
l1.imshow(sv_image.reshape(28, 28), cmap=matplot.cm.RdBu)
l1.set_xticks(())
l1.set_yticks(())
l1.set_xlabel('Class 9 vs All')
matplot.suptitle('Support Vectors for Negative Classes')
matplot.show()
Linear SVC (best performance): 92 %
SVC rbf (best performance): 96.4 %
SVC poly (best performance): 94.5 %
Logistic regression (prev assignment): 89 %
Naive Bayes (prev assignment): 81 %
It is clear that the SVM with the 'rbf' kernel gives the best result among all these models.