Dhairya Kothari

Implement a multi-layer perceptron to classify the MNIST data that we have been working with all semester. Use MLPClassifier in sklearn.

In [1]:
from scipy.stats import mode
import numpy as np
#from mnist import MNIST
from time import time
import pandas as pd
import os
import matplotlib.pyplot as matplot
import matplotlib
%matplotlib inline

import random
matplot.rcdefaults()
from IPython.display import display, HTML
from itertools import chain
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
import seaborn as sb
from sklearn.model_selection import ParameterGrid
from sklearn.neural_network import MLPClassifier
import warnings
warnings.filterwarnings('ignore')
In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/')
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
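Note: the tensorflow.examples.tutorials module was removed in TensorFlow 2.x. If it is unavailable, a roughly equivalent 60,000/10,000 split can be rebuilt with scikit-learn's fetch_openml; a minimal sketch under that assumption (this yields the merged 60,000-image training set directly, so the concatenation step below is not needed):

# Alternative loader (assumption: TF 1.x tutorial module not available).
from sklearn.datasets import fetch_openml

X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X / 255.0                  # scale pixels to [0, 1], matching the TF loader
y = y.astype(np.int64)
train, test = X[:60000], X[60000:]
trlab, tslab = y[:60000], y[60000:]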
In [3]:
train = mnist.train.images
validation = mnist.validation.images
test = mnist.test.images

trlab = mnist.train.labels
vallab = mnist.validation.labels
tslab = mnist.test.labels

train = np.concatenate((train, validation), axis=0)
trlab = np.concatenate((trlab, vallab), axis=0)
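A quick shape check confirms the merge (55,000 training + 5,000 validation images, each flattened to 784 pixels):

print(train.shape, trlab.shape)  # expected: (60000, 784) (60000,)
print(test.shape, tslab.shape)   # expected: (10000, 784) (10000,)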
In [17]:
mlp = MLPClassifier()
mlp.fit(train, trlab)
Out[17]:
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)
In [19]:
accuracy_score(tslab, mlp.predict(test)) # Test Accuracy
Out[19]:
0.9778

(a) Find a good combination of parameter values for the MLPClassifier that provides the best accuracy on the 10,000 test images. Describe your parameter choices and why you believe these values are good choices.


We choose alpha and max_iter as the parameters to tune, and select the best combination from the sweep below.

According to the scikit-learn MLPClassifier documentation:
alpha is the L2 (ridge) regularization penalty parameter.
max_iter is the maximum number of iterations; the solver iterates until convergence (within tol) or until this limit is reached.
These parameters make sense to tune because the test accuracy visibly changes as we vary them.

In [4]:
i = 0
df = pd.DataFrame(columns = ['alpha','max_iter','train_acc','test_acc','train_time'])
for a in [0.00001,0.0001,0.001,0.01, 0.1, 1, 10]:
    for mi in [10,100,200,500,1000,2000]:
        st = time()
        mlp = MLPClassifier(alpha=a, max_iter=mi)
        mlp.fit(train, trlab)
        end = time() - st
        
        acc_tr = accuracy_score(trlab, mlp.predict(train)) # Train Accuracy
        acc = accuracy_score(tslab, mlp.predict(test)) # Test Accuracy
        df.loc[i] = [a,mi,acc_tr,acc,end]
        i=i+1
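Note: the same sweep could also be driven by the ParameterGrid utility imported above (but unused in the loop version); a minimal sketch of that variant:

# Sketch: the same hyperparameter sweep expressed with sklearn's ParameterGrid.
grid = ParameterGrid({'alpha': [0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10],
                      'max_iter': [10, 100, 200, 500, 1000, 2000]})
for params in grid:
    mlp = MLPClassifier(**params)
    # ...fit and score exactly as in the loop above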
In [5]:
df # Results
Out[5]:
alpha  max_iter  train_acc  test_acc  train_time (s)
0 0.00001 10.0 0.989533 0.9751 9.513362
1 0.00001 100.0 1.000000 0.9792 35.246778
2 0.00001 200.0 0.999950 0.9782 35.320454
3 0.00001 500.0 1.000000 0.9787 42.534170
4 0.00001 1000.0 0.999650 0.9764 35.734075
5 0.00001 2000.0 0.999983 0.9781 39.468011
6 0.00010 10.0 0.989417 0.9758 9.325813
7 0.00010 100.0 0.999900 0.9782 35.416230
8 0.00010 200.0 0.992400 0.9709 44.222664
9 0.00010 500.0 0.998633 0.9764 40.425065
10 0.00010 1000.0 0.994300 0.9728 38.569622
11 0.00010 2000.0 0.999983 0.9776 43.423535
12 0.00100 10.0 0.988950 0.9743 9.373941
13 0.00100 100.0 0.999950 0.9781 33.523193
14 0.00100 200.0 0.999917 0.9795 31.724409
15 0.00100 500.0 0.999633 0.9752 42.274478
16 0.00100 1000.0 0.999400 0.9778 31.284236
17 0.00100 2000.0 1.000000 0.9782 37.767680
18 0.01000 10.0 0.987850 0.9761 9.285706
19 0.01000 100.0 0.999000 0.9779 36.310609
20 0.01000 200.0 0.998700 0.9771 43.446597
21 0.01000 500.0 0.999167 0.9779 36.770835
22 0.01000 1000.0 0.999500 0.9796 40.686252
23 0.01000 2000.0 0.997367 0.9775 36.911208
24 0.10000 10.0 0.983750 0.9731 9.193460
25 0.10000 100.0 0.990917 0.9787 32.303950
26 0.10000 200.0 0.992667 0.9802 33.043918
27 0.10000 500.0 0.991667 0.9787 37.553917
28 0.10000 1000.0 0.992283 0.9779 33.461008
29 0.10000 2000.0 0.991833 0.9797 36.753789
30 1.00000 10.0 0.953550 0.9536 9.187445
31 1.00000 100.0 0.957850 0.9568 25.442695
32 1.00000 200.0 0.959267 0.9581 26.905587
33 1.00000 500.0 0.957533 0.9601 24.393637
34 1.00000 1000.0 0.958333 0.9589 25.306217
35 1.00000 2000.0 0.958250 0.9568 43.656156
36 10.00000 10.0 0.882517 0.8880 9.103221
37 10.00000 100.0 0.877850 0.8825 20.827414
38 10.00000 200.0 0.881017 0.8848 22.719449
39 10.00000 500.0 0.882133 0.8890 15.773970
40 10.00000 1000.0 0.880600 0.8878 15.246566
41 10.00000 2000.0 0.883083 0.8910 24.275589

(b) Using the parameters you found in part (a) present a plot that compares the test accuracy vs. the number of nodes in the hidden layer. Comment on your results.

As the results show, model number 26 gives the best test accuracy (0.9802), so we select the following parameters:
Alpha = 0.1
Max Iterations = 200
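
The same selection can be made programmatically from the results table; a small sanity-check sketch:

# Pick the row with the highest test accuracy rather than reading it off by eye.
best = df.loc[df['test_acc'].idxmax()]
print(best)  # expect alpha = 0.1, max_iter = 200.0, test_acc = 0.9802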

In [7]:
acc = []
acc_tr = []
timelog = []
for l in [10,20,50,100,200,500,1000]:
    t = time()
    mlp = MLPClassifier(alpha=0.1, max_iter=200, hidden_layer_sizes=(l,))
    mlp.fit(train, trlab)
    endt = time() - t
        
    a_tr = accuracy_score(trlab, mlp.predict(train)) # Train Accuracy
    a = accuracy_score(tslab, mlp.predict(test)) # Test Accuracy

    acc_tr.append(a_tr)
    acc.append(a)
    timelog.append(endt)
In [15]:
l = [10,20,50,100,200,500,1000]
N = len(l)
l2 = np.arange(N)
matplot.subplots(figsize=(10, 5))
matplot.plot(l2, acc, label="Testing Accuracy")
matplot.plot(l2, acc_tr, label="Training Accuracy")
matplot.xticks(l2,l)
matplot.grid(True)
matplot.xlabel("Hidden Layer Nodes")
matplot.ylabel("Accuracy")
matplot.legend()
matplot.title('Accuracy versus Nodes in the Hidden Layer for MLPClassifier', fontsize=12)
matplot.show()

Inferences from the graph:

  • Moving from left to right on this graph, bias decreases and variance increases
  • As the number of hidden nodes increases, accuracy increases
  • The improvement stops at 500 nodes, where overfitting starts to kick in
  • Test accuracy drops once the hidden layer grows beyond 500 nodes
  • This gives an optimum of 500 nodes for this problem
  • The next graph also shows that, as expected, training time grows with the number of nodes
In [16]:
l = [10,20,50,100,200,500,1000]
N = len(l)
l2 = np.arange(N)
matplot.subplots(figsize=(10, 5))
matplot.plot(l2, timelog, label="Training time in s")
matplot.xticks(l2,l)
matplot.grid(True)
matplot.xlabel("Hidden Layer Nodes")
matplot.ylabel("Time (s)")
matplot.legend()
matplot.title('Training Time versus Nodes in the Hidden Layer for MLPClassifier', fontsize=12)
matplot.show()

Final model parameters for highest test accuracy:
Alpha = 0.1
Max Iterations = 200
Hidden Layer Nodes = 500
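
A sketch of refitting the final model with these settings (note that random_state was never fixed above, so a re-run may differ slightly in the last decimals):

# Final model with the selected hyperparameters.
mlp = MLPClassifier(alpha=0.1, max_iter=200, hidden_layer_sizes=(500,))
mlp.fit(train, trlab)
print(accuracy_score(tslab, mlp.predict(test)))  # ~0.982 in the runs above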

(c) How does the accuracy of your MLP classifier compare to what you found with KNN, Naïve Bayes, Logistic Regression, and SVM on this data set? How does the training time of the MLP classifier compare to the others?

The accuracies listed below are for the best parameter settings found for each model.

Model     Accuracy    Training Time
MLP       ~98.2 %     ~180 s
kNN       ~97.17 %    24 minutes
NB        81.49 %     12 minutes
Log Reg   ~87.77 %    2 hrs
SVM       ~96.4 %     18 minutes

*Note: the training times should be taken with a grain of salt, as the earlier models were run on different systems and the figures reported are either average or best-case times for those models.

Even allowing for these inconsistencies in measurement, MLP is still far ahead of the other models in training time (by an order of magnitude or more) and also leads in accuracy.

Conclusion:

The MLP Classifier performs best overall, achieving the highest test accuracy (~98.2%) while training in under 3 minutes (~180 seconds).

------------------------------------------- X -------------------------------------------