Cross selling prediction

Problem Statement and Business Goal

Our client is an insurance company that has provided Health Insurance to its customers. They need our help to find out whether the policyholders (customers) from the past year will also be interested in the Vehicle Insurance offered by the company.

Therefore, the aim of this project is to build a model that predicts whether a customer will be interested in Vehicle Insurance. Such a prediction helps the company plan its communication strategy accordingly, reach out to the customers most likely to respond, and optimise its business model and revenue.

The Dataset

The train.csv file contains the following variables:

Gender: Gender of the customer;
Age: Age of the customer;
Driving_License:
- 0 : Customer doesn't have a driving license (DL);
- 1 : Customer already has a DL.
Region_Code: Unique code for the region of the customer;
Previously_Insured:
- 0 : Customer doesn't have Vehicle Insurance;
- 1 : Customer already has Vehicle Insurance.
Vehicle_Age: Age of the Vehicle;
Vehicle_Damage:
- 0 : Customer's vehicle was not damaged in the past;
- 1 : Customer's vehicle was damaged in the past.
Annual_Premium: The amount the customer has to pay as premium in the year;
Policy_Sales_Channel: Anonymized code for the channel used to reach out to the customer, i.e. different agents, over mail, over phone, in person, etc.;
Vintage: Number of days the customer has been associated with the company;
Response:
- 0 : Customer isn't interested in Vehicle Insurance provided by the company;
- 1 : Customer is interested in Vehicle Insurance provided by the company.

The test.csv file, on the other hand, contains the same variables as train.csv except Response, which is the target variable of our problem, i.e. the variable to predict.
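As a minimal sketch of how a file with this layout can be read, the snippet below uses only Python's standard library to separate the Response column from the features and to measure how imbalanced the target is. The helper names (`load_train`, `imbalance_ratio`) and the four toy rows are hypothetical illustrations, not the notebook's code or the real data; in a notebook one would more typically load the file with pandas.

```python
import csv
import io
from collections import Counter

TARGET = "Response"  # target column named in the dataset description


def load_train(fileobj):
    """Read train.csv-style rows and split each one into features and target."""
    reader = csv.DictReader(fileobj)
    features, target = [], []
    for row in reader:
        target.append(int(row.pop(TARGET)))  # 0 = not interested, 1 = interested
        features.append(row)  # remaining columns are kept as raw strings
    return features, target


def imbalance_ratio(target):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(target)
    return max(counts.values()) / min(counts.values())


# Hypothetical toy data using a few of the documented columns.
toy = io.StringIO(
    "Gender,Age,Previously_Insured,Response\n"
    "Male,44,0,1\n"
    "Female,23,1,0\n"
    "Male,50,1,0\n"
    "Female,30,0,0\n"
)
X, y = load_train(toy)
print(Counter(y))          # Counter({0: 3, 1: 1})
print(imbalance_ratio(y))  # 3.0
```

For the real file, the same helper can be called as `with open("train.csv", newline="") as f: X, y = load_train(f)`. A ratio well above 1 is the class imbalance that the modelling part of the project has to deal with.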

Brief Summary

The first part of the project is dedicated to analyzing the relationships between the features and the target variable and to properly preparing the dataset for the second part of the project, which consists of building different models and choosing the best one based on precision and recall. In particular, we address the problem of class imbalance using different approaches, namely:

assigning different weights to the majority and minority classes;
deleting instances from the majority class (undersampling);
duplicating examples from the minority class (oversampling);
SMOTENC, i.e. an extension of the Synthetic Minority Oversampling Technique (SMOTE) that can also be used when some of the features are categorical.
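To make the first three strategies and the two evaluation metrics concrete, here is a minimal, dependency-free sketch. It is not the notebook's implementation, and all function names are hypothetical. The class weights follow the common "balanced" heuristic n_samples / (n_classes * class_count), the same formula scikit-learn uses for `class_weight="balanced"`. In practice one would more likely rely on scikit-learn for weighting and metrics and on imbalanced-learn's `RandomUnderSampler`, `RandomOverSampler` and `SMOTENC` classes for resampling. SMOTENC is left out here because its nearest-neighbour interpolation does not fit in a short sketch.

```python
import random
from collections import Counter


def balanced_class_weights(y):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def _group_by_class(X, y):
    """Map each label to the list of feature rows carrying that label."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return groups


def random_undersample(X, y, seed=0):
    """Randomly drop rows until every class is as small as the smallest one."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_min = min(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        X_out += rng.sample(rows, n_min)  # sampling without replacement
        y_out += [label] * n_min
    return X_out, y_out


def random_oversample(X, y, seed=0):
    """Randomly duplicate rows until every class is as large as the largest one."""
    rng = random.Random(seed)
    groups = _group_by_class(X, y)
    n_max = max(len(rows) for rows in groups.values())
    X_out, y_out = [], []
    for label, rows in groups.items():
        extra = rng.choices(rows, k=n_max - len(rows))  # sampling with replacement
        X_out += rows + extra
        y_out += [label] * n_max
    return X_out, y_out


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


X = [[i] for i in range(10)]  # hypothetical single-feature rows
y = [0] * 8 + [1] * 2         # 8 majority-class rows, 2 minority-class rows

print(balanced_class_weights(y))             # {0: 0.625, 1: 2.5}
print(Counter(random_undersample(X, y)[1]))  # Counter({0: 2, 1: 2})
print(Counter(random_oversample(X, y)[1]))   # Counter({0: 8, 1: 8})
print(precision_recall([1, 0, 1, 0], [1, 1, 0, 0]))  # (0.5, 0.5)
```

Whichever resampling strategy is used, it should be applied only to the training split, so that precision and recall are still measured on data with the original class distribution.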

NOTE: I suggest opening the notebook with nbviewer at this link: https://nbviewer.org/github/CrisLap/Cross-selling-project/blob/main/Cross%20Selling%20Project.ipynb

