Dhairya Kothari

Implement a multi-layer perceptron to classify the MNIST data that we have been working with all semester. Use MLPClassifier in sklearn.

In [1]:
from scipy.stats import mode
import numpy as np
#from mnist import MNIST
from time import time
import pandas as pd
import os
import matplotlib.pyplot as matplot
import matplotlib
%matplotlib inline

import random
matplot.rcdefaults()
from IPython.display import display, HTML
from itertools import chain
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
import seaborn as sb
from sklearn.model_selection import ParameterGrid
from sklearn.neural_network import MLPClassifier
import warnings
warnings.filterwarnings('ignore')
In [2]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data/')
Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
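Note: the tensorflow.examples.tutorials module was removed in TensorFlow 2.x. If it is unavailable, a roughly equivalent 60,000/10,000 split can be rebuilt with scikit-learn's fetch_openml; a minimal sketch under that assumption (this yields the merged 60,000-image training set directly, so the concatenation step below is not needed):

# Alternative loader (assumption: TF 1.x tutorial module not available).
from sklearn.datasets import fetch_openml

X, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X = X / 255.0                  # scale pixels to [0, 1], matching the TF loader
y = y.astype(np.int64)
train, test = X[:60000], X[60000:]
trlab, tslab = y[:60000], y[60000:]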
In [3]:
train = mnist.train.images
validation = mnist.validation.images
test = mnist.test.images

trlab = mnist.train.labels
vallab = mnist.validation.labels
tslab = mnist.test.labels

train = np.concatenate((train, validation), axis=0)
trlab = np.concatenate((trlab, vallab), axis=0)
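A quick shape check confirms the merge (55,000 training + 5,000 validation images, each flattened to 784 pixels):

print(train.shape, trlab.shape)  # expected: (60000, 784) (60000,)
print(test.shape, tslab.shape)   # expected: (10000, 784) (10000,)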
In [17]:
mlp = MLPClassifier()
mlp.fit(train, trlab)
Out[17]:
MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=(100,), learning_rate='constant',
       learning_rate_init=0.001, max_iter=200, momentum=0.9,
       nesterovs_momentum=True, power_t=0.5, random_state=None,
       shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
       verbose=False, warm_start=False)
In [19]:
accuracy_score(tslab, mlp.predict(test)) # Test Accuracy
Out[19]:
0.9778

(a) Find a good combination of parameter values for the MLPClassifier that provides the best accuracy on the 10,000 test images. Describe your parameter choices and why you believe these values are good choices.


We choose alpha and max_iter as the parameters to tune, and select the best combination from the sweep below.

According to the scikit-learn MLPClassifier documentation:
alpha is the L2 (ridge) regularization penalty parameter.
max_iter is the maximum number of iterations; the solver iterates until convergence (within tol) or until this limit is reached.
These parameters make sense to tune because the test accuracy visibly changes as we vary them.

In [4]:
i = 0
df = pd.DataFrame(columns = ['alpha','max_iter','train_acc','test_acc','train_time'])
for a in [0.00001,0.0001,0.001,0.01, 0.1, 1, 10]:
    for mi in [10,100,200,500,1000,2000]:
        st = time()
        mlp = MLPClassifier(alpha=a, max_iter=mi)
        mlp.fit(train, trlab)
        end = time() - st
        
        acc_tr = accuracy_score(trlab, mlp.predict(train)) # Train Accuracy
        acc = accuracy_score(tslab, mlp.predict(test)) # Test Accuracy
        df.loc[i] = [a,mi,acc_tr,acc,end]
        i=i+1
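Note: the same sweep could also be driven by the ParameterGrid utility imported above (but unused in the loop version); a minimal sketch of that variant:

# Sketch: the same hyperparameter sweep expressed with sklearn's ParameterGrid.
grid = ParameterGrid({'alpha': [0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10],
                      'max_iter': [10, 100, 200, 500, 1000, 2000]})
for params in grid:
    mlp = MLPClassifier(**params)
    # ...fit and score exactly as in the loop above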
In [5]:
df # Results
Out[5]:
alpha  max_iter  train_acc  test_acc  train_time (s)
0 0.00001 10.0 0.989533 0.9751 9.513362
1 0.00001 100.0 1.000000 0.9792 35.246778
2 0.00001 200.0 0.999950 0.9782 35.320454
3 0.00001 500.0 1.000000 0.9787 42.534170
4 0.00001 1000.0 0.999650 0.9764 35.734075
5 0.00001 2000.0 0.999983 0.9781 39.468011
6 0.00010 10.0 0.989417 0.9758 9.325813
7 0.00010 100.0 0.999900 0.9782 35.416230
8 0.00010 200.0 0.992400 0.9709 44.222664
9 0.00010 500.0 0.998633 0.9764 40.425065
10 0.00010 1000.0 0.994300 0.9728 38.569622
11 0.00010 2000.0 0.999983 0.9776 43.423535
12 0.00100 10.0 0.988950 0.9743 9.373941
13 0.00100 100.0 0.999950 0.9781 33.523193
14 0.00100 200.0 0.999917 0.9795 31.724409
15 0.00100 500.0 0.999633 0.9752 42.274478
16 0.00100 1000.0 0.999400 0.9778 31.284236
17 0.00100 2000.0 1.000000 0.9782 37.767680
18 0.01000 10.0 0.987850 0.9761 9.285706
19 0.01000 100.0 0.999000 0.9779 36.310609
20 0.01000 200.0 0.998700 0.9771 43.446597
21 0.01000 500.0 0.999167 0.9779 36.770835
22 0.01000 1000.0 0.999500 0.9796 40.686252
23 0.01000 2000.0 0.997367 0.9775 36.911208
24 0.10000 10.0 0.983750 0.9731 9.193460
25 0.10000 100.0 0.990917 0.9787 32.303950
26 0.10000 200.0 0.992667 0.9802 33.043918
27 0.10000 500.0 0.991667 0.9787 37.553917
28 0.10000 1000.0 0.992283 0.9779 33.461008
29 0.10000 2000.0 0.991833 0.9797 36.753789
30 1.00000 10.0 0.953550 0.9536 9.187445
31 1.00000 100.0 0.957850 0.9568 25.442695
32 1.00000 200.0 0.959267 0.9581 26.905587
33 1.00000 500.0 0.957533 0.9601 24.393637
34 1.00000 1000.0 0.958333 0.9589 25.306217
35 1.00000 2000.0 0.958250 0.9568 43.656156
36 10.00000 10.0 0.882517 0.8880 9.103221
37 10.00000 100.0 0.877850 0.8825 20.827414
38 10.00000 200.0 0.881017 0.8848 22.719449
39 10.00000 500.0 0.882133 0.8890 15.773970
40 10.00000 1000.0 0.880600 0.8878 15.246566
41 10.00000 2000.0 0.883083 0.8910 24.275589

(b) Using the parameters you found in part (a) present a plot that compares the test accuracy vs. the number of nodes in the hidden layer. Comment on your results.

As the results show, model number 26 gives the best test accuracy (0.9802), so we select the following parameters:
Alpha = 0.1
Max Iterations = 200
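
The same selection can be made programmatically from the results table; a small sanity-check sketch:

# Pick the row with the highest test accuracy rather than reading it off by eye.
best = df.loc[df['test_acc'].idxmax()]
print(best)  # expect alpha = 0.1, max_iter = 200.0, test_acc = 0.9802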

In [7]:
acc = []
acc_tr = []
timelog = []
for l in [10,20,50,100,200,500,1000]:
    t = time()
    mlp = MLPClassifier(alpha=0.1, max_iter=200, hidden_layer_sizes=(l,))
    mlp.fit(train, trlab)
    endt = time() - t
        
    a_tr = accuracy_score(trlab, mlp.predict(train)) # Train Accuracy
    a = accuracy_score(tslab, mlp.predict(test)) # Test Accuracy

    acc_tr.append(a_tr)
    acc.append(a)
    timelog.append(endt)
In [15]:
l = [10,20,50,100,200,500,1000]
N = len(l)
l2 = np.arange(N)
matplot.subplots(figsize=(10, 5))
matplot.plot(l2, acc, label="Testing Accuracy")
matplot.plot(l2, acc_tr, label="Training Accuracy")
matplot.xticks(l2,l)
matplot.grid(True)
matplot.xlabel("Hidden Layer Nodes")
matplot.ylabel("Accuracy")
matplot.legend()
matplot.title('Accuracy versus Nodes in the Hidden Layer for MLPClassifier', fontsize=12)
matplot.show()

Inferences from the graph:

  • Moving from left to right on this graph, bias decreases and variance increases
  • As the number of hidden nodes increases, accuracy increases
  • The improvement stops at 500 nodes, where overfitting starts to kick in
  • Test accuracy drops once the hidden layer grows beyond 500 nodes
  • This gives an optimum of 500 nodes for this problem
  • The next graph also shows that, as expected, training time grows with the number of nodes
In [16]:
l = [10,20,50,100,200,500,1000]
N = len(l)
l2 = np.arange(N)
matplot.subplots(figsize=(10, 5))
matplot.plot(l2, timelog, label="Training time in s")
matplot.xticks(l2,l)
matplot.grid(True)
matplot.xlabel("Hidden Layer Nodes")
matplot.ylabel("Time (s)")
matplot.legend()
matplot.title('Training Time versus Nodes in the Hidden Layer for MLPClassifier', fontsize=12)
matplot.show()

Final model parameters for highest test accuracy:
Alpha = 0.1
Max Iterations = 200
Hidden Layer Nodes = 500
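
A sketch of refitting the final model with these settings (note that random_state was never fixed above, so a re-run may differ slightly in the last decimals):

# Final model with the selected hyperparameters.
mlp = MLPClassifier(alpha=0.1, max_iter=200, hidden_layer_sizes=(500,))
mlp.fit(train, trlab)
print(accuracy_score(tslab, mlp.predict(test)))  # ~0.982 in the runs above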

(c) How does the accuracy of your MLP classifier compare to what you found with KNN, Naïve Bayes, Logistic Regression, and SVM on this data set? How does the training time of the MLP classifier compare to the others?

The accuracies listed below are for the best parameter settings found for each model.

Model     Accuracy    Training Time
MLP       ~98.2 %     ~180 s
kNN       ~97.17 %    24 minutes
NB        81.49 %     12 minutes
Log Reg   ~87.77 %    2 hrs
SVM       ~96.4 %     18 minutes

*Note: the training times should be taken with a grain of salt, as the earlier models were run on different systems and the figures reported are either average or best-case times for those models.

Even allowing for these inconsistencies in measurement, MLP is still far ahead of the other models in training time (by an order of magnitude or more) and also leads in accuracy.

Conclusion:

The MLP Classifier performs best overall, achieving the highest test accuracy (~98.2%) while training in under 3 minutes (~180 seconds).

------------------------------------------- X -------------------------------------------