
Project-1: Understanding Churn in SME Customers of Utility Company: Exploring the Role of Price Sensitivity and Developing a Predictive Model for Discount Incentive

Exploring churn in SME customers of Utility Company, we investigated the impact of price sensitivity and developed a predictive model to identify customers at risk. Our findings revealed the need for feature engineering and model optimization to improve churn prediction accuracy and guide targeted discount incentives.

The following steps are proposed for addressing this challenge.

Hypothesis Formulation:

The hypothesis we aim to test can be formulated as follows:

“Price changes significantly influence the likelihood of churn among SME customers of our client.”

Major Steps to Test the Hypothesis:

1. Data Collection:

The client has sent over some data which includes:

A description of the data is as follows:

client_data.csv

price_data.csv

2. Exploratory Data Analysis:

After a thorough analysis of the data, the following patterns were identified.

The following visualizations help us understand the data further.

To test our hypothesis that price sensitivity has a major influence on churn, we can create a correlation plot to see how closely churn is related to the price data.

The correlation plot shows higher correlations among the price sensitivity variables themselves, but their correlation with churn is very low overall. This indicates a weak linear relationship between price sensitivity and churn. For price sensitivity to be a major driver in predicting churn, we may need to engineer the features differently.
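To make the idea of a weak linear relationship concrete, the sketch below computes the Pearson correlation coefficient, the quantity a correlation plot typically displays, between a price column and a binary churn flag. It is a minimal illustration on made-up values, not the client data; the names `price_change` and `churn` are assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values: a price-change column and a 0/1 churn flag.
price_change = [0.02, 0.10, 0.05, 0.01, 0.08, 0.03, 0.07, 0.04]
churn = [0, 1, 0, 0, 0, 1, 0, 0]

r = pearson(price_change, churn)
print(f"correlation with churn: {r:.3f}")
```

A coefficient near zero only rules out a strong linear relationship; a non-linear dependence between price and churn could still exist, which is one reason to revisit the features.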

3. Feature Engineering:

We now have a good understanding of the data, and we can use it to further understand the business problem. We need to brainstorm and build features that uncover signals in the data that could inform the churn model.

Some features that were built are as follows.
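As one illustration of this kind of feature, the sketch below derives two time-based features from date fields: the number of months a customer has been active and the number of months since the contract was last updated. The field names (`activation_date`, `last_contract_update`) and the reference date are hypothetical and are not taken from the client files.

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months elapsed from start to end (0 if end precedes start)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return max(months, 0)


# Hypothetical customer record; the field names are assumptions.
customer = {
    "activation_date": date(2018, 3, 15),
    "last_contract_update": date(2020, 11, 1),
}
reference_date = date(2021, 1, 1)  # assumed snapshot date for the analysis

features = {
    "months_active": months_between(customer["activation_date"], reference_date),
    "months_since_update": months_between(
        customer["last_contract_update"], reference_date
    ),
}
print(features)
```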

4. Model Development:

We will use a Random Forest classifier as our model.

Some advantages of the random forest classifier include:

On the flip side, some disadvantages of the random forest classifier include:

5. Model Evaluation:

Let’s evaluate how well this trained model is able to predict the values of the test dataset.

We are going to use 3 metrics to evaluate performance:

The following are some interpretations of the results.

6. Interpretation and Recommendations:

Overall, we can identify clients that do not churn very accurately, but we cannot reliably predict the cases where clients do churn. A high percentage of clients who actually churned are being classified as non-churners (false negatives). This tells us that the current set of features is not discriminative enough to clearly distinguish churners from non-churners.

We need to go back to feature engineering and try to create more predictive features. We may also experiment with optimising the model's hyperparameters to improve performance.
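The pattern described above, high overall accuracy alongside many missed churners, can be reproduced with a small sketch. The labels below are illustrative only and are not the project's results; accuracy, precision, and recall are used here as an assumed choice of metrics.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for the positive class (1 = churn)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-churners kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative labels only: 90 non-churners and 10 churners.
y_true = [0] * 90 + [1] * 10
# A hypothetical model that flags just 1 of the 10 churners, with no false alarms.
y_pred = [0] * 90 + [1] + [0] * 9

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With an imbalanced target like this, accuracy stays high even though most churners are missed, so recall on the churn class is the number to watch when the goal is to target discount incentives.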

In the Random Forest classifier, we’re able to extract feature importances using the built-in method on the trained model. Let’s investigate the feature importance.
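A minimal sketch of this step, assuming scikit-learn's `RandomForestClassifier`, which exposes importances through its `feature_importances_` attribute after fitting. The data here is synthetic and the feature names are hypothetical; the target is deliberately constructed to depend mostly on the first column so the ranking is easy to read.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples = 500

# Hypothetical feature names, used only to label the synthetic columns.
feature_names = ["net_margin", "tenure_months", "price_change"]
X = rng.normal(size=(n_samples, len(feature_names)))

# Synthetic target driven mostly by the first column, plus some noise.
y = (X[:, 0] + 0.3 * rng.normal(size=n_samples) > 0.8).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances; they are normalised to sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name:>15}: {score:.3f}")
```

Impurity-based importances describe how much the model relied on each feature, not whether that feature causes churn, so they are best read as a ranking to guide further investigation.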

From this chart, we can observe the following points:

* Net margin and consumption over 12 months are top drivers of churn in this model.
* Margin on power subscription is also an influential driver.
* Time seems to be an influential factor, especially the number of months customers have been active, their tenure, and the number of months since they updated their contract.
* Our price sensitivity features are scattered throughout the ranking but are not the main drivers of customer churn.

The last observation is important because this relates back to our original hypothesis:

Is churn driven by the customers’ price sensitivity?

Based on the feature importances, price sensitivity is not a main driver of churn, but it is a weak contributor. However, more experimentation is needed to arrive at a conclusive result.