SANTANDER BANK

PRODUCT RECOMMENDATION SYSTEM

Dhairya Kothari, Michigan Technological University

Rahul Gowla, Michigan Technological University

Santander Bank is a wholly owned subsidiary of the Spanish Santander Group. The bank supports its customers by personalizing product recommendations: it aims to improve customer satisfaction by recommending appropriate products based on previous purchases. With such a recommendation system, the bank can address each customer's individual needs. In this project, we concentrated on building a User Based Collaborative Filtering (UBCF) model because, in our assessment, it gives more accurate results than other methods. Our approach consists of collecting the data required for the analysis, preprocessing the data, implementing UBCF, and drawing conclusions. We evaluate the results by checking the accuracy of the model.

Concepts: Data mining → Recommender systems → Collaborative filtering → Using Jaccard distance

Key Words: Recommender systems, User Based Collaborative Filtering (UBCF), Jaccard distance, Product Recommender system, R programming


 

1 INTRODUCTION

Recommender systems carry significant business value because they drive revenue for the companies that deploy them. Companies such as Amazon and Netflix, among many others, use recommender systems because they produce relevant suggestions that save their customers considerable time.

In this project, we are given about 1.5 years of customer purchasing behavior for Santander Bank products. The challenge is to predict which products customers may buy in the future. Specifically, the data covers 2015-01-28 to 2016-05-28, and we need to predict which products those customers may purchase on 2016-06-28 (the last month). Because we are predicting products for customers, we treat this as a recommender system problem.

 

 

2 BACKGROUND

Depending on what needs to be predicted, a recommender system can be designed as a content-based system or a collaborative filtering system. Content-based systems examine the properties of items and recommend items with features similar to those a user already chose. Collaborative filtering systems, on the other hand, recommend items based on a similarity measure between items or between users. Because we need to predict the products that customers will purchase in the future, we build a User Based Collaborative Filtering model. We also deal briefly with association rules, which mine relations between variables: for example, if a customer buys product A, there is a high chance that the customer also buys product B. In general, UBCF finds users with the same behavior and recommends products based on that behavior. A distance is computed between users, and the users at minimum distance from one another are recommended the same products. To make this concrete, we walk through a simple example of how UBCF works. Figure 2 shows the items that each user purchased in the past.

 

Customer   Item 1   Item 2   Item 3   Item 4   Item 5   Item 6
A          X        X        X        -        X        X
B          X        -        X        X        X        X
C          -        -        X        X        X        -

 

Figure 2

 

In this example, user A and user B show similar behavior. Similarity can be computed with various methods; in our project we used Jaccard distance because it does not work with weights: it considers only two cases, yes if a product was purchased and no if it was not. Because we need to predict the product itself, it makes sense to use Jaccard distance over other methods.
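The comparison in this example can be checked with a short calculation. The sketch below is illustrative Python, not the R code used in the project. It reads the purchases in Figure 2 as sets of item numbers and assumes the usual definition of Jaccard similarity (shared items divided by items bought by either user), with distance equal to one minus similarity. The function and variable names are our own.

```python
# Purchases from Figure 2, written as sets of item numbers per customer.
purchases = {
    "A": {1, 2, 3, 5, 6},
    "B": {1, 3, 4, 5, 6},
    "C": {3, 4, 5},
}


def jaccard_similarity(a, b):
    """Number of shared items divided by number of items bought by either user."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def jaccard_distance(a, b):
    """Distance is one minus similarity, so identical baskets have distance 0."""
    return 1.0 - jaccard_similarity(a, b)


sim_ab = jaccard_similarity(purchases["A"], purchases["B"])  # 4 shared / 6 in union
sim_ac = jaccard_similarity(purchases["A"], purchases["C"])  # 2 shared / 6 in union

# The nearest neighbour of A is the other user at minimum Jaccard distance.
nearest_to_a = min(
    ("B", "C"),
    key=lambda user: jaccard_distance(purchases["A"], purchases[user]),
)

# Items the neighbour bought that A has not bought yet are candidate recommendations.
suggested_for_a = sorted(purchases[nearest_to_a] - purchases["A"])
print(nearest_to_a, suggested_for_a)  # B [4]
```

Under these assumptions, user B is closer to user A than user C is, so Item 4, which B bought and A did not, is the candidate recommendation for A.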

 

3 BUILDING THE MODEL

We implemented the model in the R programming language, using the recommenderlab package, which provides tools for building recommender systems. Development followed these steps:

 

3.1 Collecting the required amount of data for the analysis

3.2 Data Preprocessing

3.3 Implementing UBCF

3.4 Evaluation


 

3.1 COLLECTION OF DATASET

The data is provided by Kaggle (kaggle.com), a hub for datasets. Downloading the dataset was the first challenge: because it is huge (approx. 3 GB), it was hard both to download and to read. Excel cannot handle a file of this size. We tried many methods and finally found a function in R that can: fread, from the data.table package, which is designed for reading large datasets.
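The general idea behind handling a file this large is to avoid loading every column of every row at once. The sketch below is illustrative Python rather than the fread call used in the project: it streams a CSV row by row and keeps only the columns of interest. The column names and the tiny in-memory file are hypothetical stand-ins, not the real dataset schema.

```python
import csv
import io


def read_selected_columns(handle, keep):
    """Stream a CSV one row at a time, yielding only the columns listed in `keep`."""
    reader = csv.DictReader(handle)
    for row in reader:
        yield {name: row[name] for name in keep}


# Tiny in-memory stand-in for the multi-gigabyte file (hypothetical columns).
sample = io.StringIO(
    "month,user_id,age,segment,product_a\n"
    "2015-01-28,1001,23,x,1\n"
    "2015-01-28,1002,47,y,0\n"
)

rows = list(read_selected_columns(sample, ["month", "user_id", "product_a"]))
print(rows)
```

Because the reader is a generator, a real file could be filtered or aggregated while it is being read, without holding all rows in memory.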

 

 

3.2 DATA PREPROCESSING

We did not use all the variables in the dataset: we dropped those that were not significant and used only a few for prediction. The distribution of the age variable is interesting because it is bimodal, with one peak around ages 18 to 20 and another around ages 45 to 50 (Figure 3.2). The dataset has a total of 13,647,309 rows and 48 variables, which is very large. Because running our model on the entire dataset is computationally expensive, we sampled it. The user id column holds the unique ID that Santander Bank assigns to each customer. We found that on average there were 625,457 customers in each month and that each customer's behavior was recorded in every month. We therefore randomly sampled customers from one month and kept only those customers across all the months of the dataset. Specifically, we took a sample of 100,000 customers in each month so that the data would be balanced and not biased in one direction.
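The sampling step can be sketched as follows. This is illustrative Python, not the project's R code: it draws a random set of customer IDs from one reference month and then keeps every row belonging to those customers in all months. The record layout, the function name, and the fixed seed are assumptions made for the example, and the toy data is far smaller than the real dataset.

```python
import random


def sample_customers(records, reference_month, sample_size, seed=0):
    """Pick customer IDs at random from one month, then keep their rows in every month.

    `records` is a list of (month, user_id) tuples; a fixed seed keeps the draw repeatable.
    """
    ids_in_month = sorted({uid for month, uid in records if month == reference_month})
    rng = random.Random(seed)
    chosen = set(rng.sample(ids_in_month, min(sample_size, len(ids_in_month))))
    return [(month, uid) for month, uid in records if uid in chosen]


# Toy data: 10 customers, each observed in 3 monthly snapshots.
months = ["2015-01-28", "2015-02-28", "2015-03-28"]
records = [(month, uid) for month in months for uid in range(1, 11)]

sampled = sample_customers(records, reference_month="2015-01-28", sample_size=4)
sampled_ids = {uid for _, uid in sampled}
print(len(sampled_ids), len(sampled))  # 4 customers, 12 rows
```

Sampling IDs once and filtering all months by them gives every month the same customers, which matches the balanced design described above.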

 

Figure 3.2

3.3 IMPLEMENTING UBCF

In this project, we implemented the User Based Collaborative Filtering method. UBCF computes the similarity between users, finds the users with similar behavior, and recommends products based on their past purchases.

R provides a package called recommenderlab for building recommender systems. Using this package, we can easily build a UBCF model and specify the distance method among its parameters. Here we used Jaccard similarity to compute the similarity between users.
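To show the mechanics that such a model performs, the following is a minimal from-scratch sketch in Python. It is not the recommenderlab implementation and not the code used in this project. It assumes a simple scoring rule: find the k users most similar to the target by Jaccard similarity, then score each product the target does not own by the summed similarity of the neighbors who own it. The user IDs, product names, and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of purchased products."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, purchases, k=5, top_n=1):
    """Recommend up to `top_n` products for `target` from its k most similar users."""
    owned = purchases[target]
    # Rank all other users by similarity to the target and keep the k nearest.
    neighbours = sorted(
        ((jaccard(owned, items), user)
         for user, items in purchases.items() if user != target),
        reverse=True,
    )[:k]
    # Score each product the target lacks by the similarity of the neighbours owning it.
    scores = {}
    for similarity, user in neighbours:
        for item in purchases[user] - owned:
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


# Hypothetical users and products, for illustration only.
purchases = {
    "u1": {"savings", "credit_card"},
    "u2": {"savings", "credit_card", "payroll"},
    "u3": {"savings", "payroll"},
    "u4": {"mortgage", "pension"},
}

print(recommend("u1", purchases, k=2))  # ['payroll']
```

In this toy data, the two users nearest to u1 both hold the payroll product, so it receives the highest score. The parameter k plays the role of the number of nearest neighbors discussed in the evaluation.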

 

 

 


4 RESULTS/EVALUATION

Evaluating the results was a difficult task. Because our data is very large, we calculated traditional accuracy on samples. First, we treated the products purchased in May as testing data. This set contained more than 800,000 observations, which was too expensive to evaluate in full, so we drew 5,000 samples (chosen to represent the median points of their nearest data points, and so to approximate the complete dataset). Using 5 nearest neighbors, we computed traditional accuracy by comparing what we recommended with what the customer bought anyway, without the system. We obtained an accuracy of 22%, which we consider decent given that the model ran on only a small fraction of medians of the dataset. We expect that using 10 nearest neighbors with a much larger number of samples would increase accuracy further. Based on the model, we recommend products to customers.

We also suggest tracking system performance with more direct metrics, such as the number of clicks, the number of people who actually buy a product after the system is deployed, and the percentage increase in sales.
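One plausible way to compute such an accuracy figure is sketched below in Python. This is an assumption about the calculation, not the project's actual evaluation code: it counts a customer as a hit when at least one recommended product appears among the products that customer actually bought in the test month, and reports the share of hits. All names and the toy data are hypothetical.

```python
def hit_rate(recommended, purchased):
    """Share of evaluated customers with at least one recommended product actually bought.

    `recommended` maps user -> list of recommended products;
    `purchased` maps user -> set of products bought in the test month.
    """
    users = [user for user in recommended if user in purchased]
    if not users:
        return 0.0
    hits = sum(1 for user in users if set(recommended[user]) & purchased[user])
    return hits / len(users)


# Hypothetical recommendations and test-month purchases for four customers.
recommended = {
    "u1": ["payroll"],
    "u2": ["mortgage"],
    "u3": ["pension"],
    "u4": ["savings"],
}
purchased = {
    "u1": {"payroll"},
    "u2": {"credit_card"},
    "u3": set(),
    "u4": {"savings", "pension"},
}

accuracy = hit_rate(recommended, purchased)
print(accuracy)  # 0.5
```

Other definitions, such as precision over all recommended products, would give different numbers, so the metric should be stated alongside any reported accuracy.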

 

The main challenge in this project was the dataset. Because it was massive, we could not read it directly into R, and we had to do a lot of preprocessing. We were then able to recommend products to customers for the month of May with the accuracy reported above. We also examined the histogram of customer ages and observed that the 18 to 25 and 40 to 50 age groups hold the largest number of accounts and perform the largest number of transactions. Concentrating on customers in these age groups and recommending products to them is likely to boost the company's profits.

Because our dataset is quite large and the code takes significant time to execute, we have saved the rating matrix in RatingMatrix.csv (dimension: users × products) and the final recommendations for the test users in Recommendation.csv.

 

 

5 FUTURE WORK

Cold start: in this project, we recommended products to a customer based on purchase history, compared with other users with similar behavior. A new customer, however, has no history, so it is hard to recommend products to that customer. This is called the cold start problem. We did not address it in this project; it can be addressed in future work.

 

REFERENCES

[1] https://www.kaggle.com/c/santander-product-recommendation

[2] http://www.snet.tu-berlin.de/fileadmin/fg220/courses/SS11/snet-project/recommender-systems_asanov.pdf

[3] https://recsys.acm.org/recsys16/

[4] Markus Zanker and Markus Jessenitschnig. Collaborative feature-combination recommender exploiting explicit and implicit user feedback. University Klagenfurt, Intelligent Systems and Business Informatics Research Group, Klagenfurt, Austria.

[5] https://www.analyticsvidhya.com/blog/2016/01/xgboost-algorithm-easy-steps/

[6] Lior Rokach and Bracha Shapira. Recommender Systems Handbook.

[7] P. Resnick and H. R. Varian, “Recommender systems,” Commun. ACM, vol. 40, pp. 56–58, March 1997.