End-to-end solution built with Azure Databricks and other Azure services (Synapse, Azure Functions, Logic Apps, Power BI) to predict churn and help retain customers.
The KKbox datasets used are from the Kaggle challenge: https://www.kaggle.com/c/kkbox-churn-prediction-challenge/data.
KKbox is a music streaming service.
The machine learning models are from the Databricks blog post: https://databricks.com/blog/2020/08/24/profit-driven-retention-management-with-machine-learning.html
Data has been manually copied from Kaggle to Azure Data Lake Storage.
In a real scenario, data would be sourced from transactional systems.
Use a Power BI dashboard to get meaningful insights and prevent customer churn.
- Azure Subscription
- Terraform (for deployment)
- PowerShell (for deployment)
- Databricks PowerShell module (for the Databricks deployment part): https://github.com/gbrueckl/Databricks.API.PowerShell
- You might need to bypass the execution policy to install this module
- https://docs.microsoft.com/en-us/powershell/module/microsoft.powershell.security/set-executionpolicy?view=powershell-7.1
- Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
- Install-Module -Name DatabricksPS
Run the following commands from the automatic deployment directory (from a PowerShell session):
- terraform init
- terraform plan
- terraform apply -auto-approve
Deployment will create all the resources that are part of the architecture.
In addition to creating resources, the deployment will also:
- Azure Databricks
  - Upload the notebooks to the Databricks workspace
- Azure Synapse Analytics
  - Upload the SQL scripts to Azure Synapse Analytics
  - Create the Linked Services
  - Create the Datasets
  - Create the Pipelines
- Azure Data Lake Storage
  - Create the directories where raw data and predictions will be saved
The resource-group .tf file in the resource-group directory contains the name of the resource group that will be created. By default, the resource group name will start with e2e-churn-demo-
- Create a cluster: https://docs.microsoft.com/en-us/azure/databricks/clusters/create
- Mount the storage (created during the deployment) in Databricks (see the sketch after this list)
  - You can use the notebook "Churn 00_Mount Storage" (uploaded automatically during the deployment) as a template
  - The example relies on Azure Key Vault and Databricks secret scopes: https://docs.microsoft.com/en-us/azure/databricks/security/secrets/secret-scopes
  - https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/adls-gen2/azure-datalake-gen2-sp-access
- Upload the KKbox dataset from Kaggle (https://www.kaggle.com/c/kkbox-churn-prediction-challenge/data), then run the notebook called "Load Data" (uploaded automatically during the deployment)
- Assign the Storage Blob Data Contributor role to the Synapse workspace (e.g. e2e-churn-demo-3v8nqm) on the storage account (e.g. e2echurndemostor3v8nqm); the random suffix will differ for each deployment
- Assign the Storage Blob Data Contributor role to the Databricks workspace (e.g. e2e-churn-demo-workspace-3v8nqm) on the same storage account
- For Databricks, you will have to create a service principal; it will be used to access the data lake
  - https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/adls-gen2/azure-datalake-gen2-sp-access
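Below is a minimal sketch of the mount step in a Python notebook, following the ADLS Gen2 service principal approach from the documentation linked above. The secret scope name (churn-demo), secret names, tenant id, container and storage account names are placeholders for your own deployment; the "Churn 00_Mount Storage" notebook remains the reference.

```python
# Minimal sketch: mount ADLS Gen2 in Databricks with a service principal.
# Scope, secret, container, tenant and account names below are placeholders.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id":
        dbutils.secrets.get(scope="churn-demo", key="sp-client-id"),
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="churn-demo", key="sp-client-secret"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@e2echurndemostor<suffix>.dfs.core.windows.net/",
    mount_point="/mnt/churn",
    extra_configs=configs,
)

# Quick check: the KKbox files uploaded from Kaggle should show up here.
display(dbutils.fs.ls("/mnt/churn"))
```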
When deploying the solution, SQL scripts are uploaded automatically.
Please run at least these two scripts, in this order, on the serverless pool:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/author-sql-script#run-your-sql-script
- Create Database Demo
- Create Master Key
  - Choose a master key encryption password and run the script against the Demo database
- Run any remaining SQL scripts
Make sure to select the serverless pool.
When deploying the solution, Linked Services are created automatically.
Please verify the connection for the following Linked Services:
- Serverless Synapse
- Enter the password (check the synapse.tf file) and test the connection
- Databricks
- Get a Databricks token: https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/authentication#--generate-a-personal-access-token
- Create a cluster in Databricks that will be used to run the Databricks notebooks: https://docs.microsoft.com/en-us/azure/databricks/clusters/create
- Enter this information and test the connection
An Azure Function is created automatically once the deployment is complete.
You will have to implement the function code yourself; in this case, bind the function to a storage queue (a sketch is given below).
In this architecture, we send a message to a queue (predictionChurning) to trigger a Logic App workflow. The Logic App sends an email with a Power BI dashboard attached containing information about customers likely to churn.
https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook
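A minimal sketch of what the function could look like in Python, assuming an HTTP trigger (as in the binding documentation linked above) and a storage queue output binding named msg declared in function.json and pointing at the predictionChurning queue; the binding name and message contents are illustrative:

```python
import azure.functions as func


def main(req: func.HttpRequest, msg: func.Out[str]) -> func.HttpResponse:
    # HTTP-triggered entry point (called, for example, from the Synapse
    # notification pipeline once the predictions have been written).
    # The "msg" queue output binding is assumed to be declared in
    # function.json and to target the predictionChurning queue.
    msg.set("churn predictions ready")

    # The message landing on the queue is what triggers the Logic App workflow.
    return func.HttpResponse("Message sent to predictionChurning", status_code=200)
```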
You can easily implement different kinds of logic with Logic Apps.
In this architecture, we use it to automatically refresh the Power BI report and then send an email with the dashboard attached.
https://docs.microsoft.com/en-us/azure/logic-apps/logic-apps-overview
You might need either Power BI Premium or Power BI Embedded for the Power BI part to work.
- You can create the model directly in Databricks or use the Churning Model Creation pipeline in Azure Synapse
- You will have to mount the storage and have the KKbox dataset in your data lake
- You can run the prediction notebook in Databricks or use the Churning predictions pipeline in Azure Synapse (a sketch of the prediction step follows this list)
- You can run the notification pipeline in Azure Synapse to trigger the Logic App workflow and receive the Power BI report automatically by email
- You can also visualize the report directly in Azure Synapse
- https://docs.microsoft.com/en-us/azure/synapse-analytics/quickstart-power-bi
- https://github.com/microsoft/MCW-Azure-Synapse-Analytics-and-AI
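As an illustration, here is a minimal sketch of the prediction step in a Databricks Python notebook, assuming the churn model has been logged to the MLflow Model Registry under the hypothetical name churn-model and that the data lake is mounted under /mnt/churn (the model name, stage, paths and column names are placeholders):

```python
import mlflow.pyfunc
from pyspark.sql import functions as F

# Load the trained churn model from the MLflow Model Registry as a Spark UDF.
# "churn-model" and the "Production" stage are placeholder names.
churn_udf = mlflow.pyfunc.spark_udf(spark, model_uri="models:/churn-model/Production")

# Feature table prepared by the earlier notebooks (path is illustrative).
features = spark.read.format("delta").load("/mnt/churn/features")
feature_cols = [c for c in features.columns if c != "msno"]  # msno = KKbox user id

# Score every subscriber and keep the churn probability next to the id.
predictions = features.withColumn(
    "churn_probability", churn_udf(*[F.col(c) for c in feature_cols])
)

# Persist predictions so the Synapse serverless pool and Power BI can read them.
predictions.select("msno", "churn_probability") \
    .write.mode("overwrite").parquet("/mnt/churn/predictions")
```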
- https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs
- https://databricks.com/blog/2020/08/24/profit-driven-retention-management-with-machine-learning.html
- https://docs.microsoft.com/en-us/azure/machine-learning/overview-what-is-azure-ml