Predictive Machine Learning for Telco Customer Churn

29 May 2023

Project

by Muhammad Reyhan Arighy, Data Scientist

Business Understanding

Identification

The first stage is to identify the problems within the business context that require action or a solution. This involves evaluating the business to find the areas that need improvement in order to achieve its goals.

Market Research

Currently, telecommunications service customers have numerous options and can easily switch subscriptions from one service provider to another. Many customers frequently switch subscriptions due to the various promotions offered by different telecommunications service providers. The provider that offers the best service and the most competitive prices will be the customer's choice.

Marketing practitioners in this industry strive to prevent customers from switching to competing companies. Why is that? Because acquiring new customers is far more expensive than retaining loyal existing customers, so retaining existing customers takes priority over acquiring new ones. This aligns with Reichheld, F.F. and Sasser, W.E. (1990) Zero Defections: Quality Comes to Services. Harvard Business Review, 68, 105-111.

However, retaining customers is not easy either. A common approach is to offer special pricing packages or bonuses so that customers are not tempted to switch to a competitor. If such offers are extended to all existing customers, however, the cost becomes high, because in general only a small fraction of customers churn (unsubscribe). Loyal customers do not need special offers or bonuses, as they will likely stay even without them.

A more effective approach is to give special offers or bonuses only to the specific customers known to be likely to churn. Because the offers are targeted, the promotion cost is lower.

Churn prediction is an approach widely used in the industry to identify which customers are likely to unsubscribe and to understand the symptoms or signs that precede churn. By paying attention to these signs, the company can contact customers with a high likelihood of churning and offer them special pricing packages before they actually unsubscribe.

Case Analysis and Goals

In this case, a predictive model will be developed for a telecommunications company that offers internet services. Many customers switch to competitors because of more attractive pricing and services, which causes revenue loss for the company and is potentially tied to customer dissatisfaction. The company's management recognizes this issue and plans to launch promotional programs to reduce the churn rate. These programs will be offered only to customers deemed susceptible to churn, and Machine Learning is needed to identify this group effectively.

The goal of this predictive model is to generate a churn score for each customer, indicating whether they are predicted to unsubscribe. The model uses predictors based on customers' internet service usage patterns on the company's network. The predictions will then be translated into the actions described in the preceding paragraph in order to reduce churn, improve customer satisfaction, and increase the company's revenue and profitability.

How can the model discern customer usage patterns? An individual's past behavior is a useful reference for understanding their future behavior, and it is this behavior that will be analyzed in the available data. By identifying the signs that someone is likely to churn, the company can take measures to prevent them from actually unsubscribing.

Ultimately, the company's marketing division will use the model's results to offer special packages to customers with a "Yes" churn score, with the aim of preventing them from switching to competitors.

Data Understanding

Dataset Information

This section describes the data in more depth to understand its characteristics. The dataset is freely accessible for research purposes; please follow this link for the dataset source. Each row of the data consists of 11 columns, described as follows:

The dataset contains customer profiles, covering both customers who have unsubscribed and those who are still subscribed. Whether a customer churned in the past serves as the indicator used to predict whether current customers are likely to churn.

Exploratory Data Analysis

Monthly Charges and Tenure Relationship

tenure vs monthly charges based on customer churn

Based on the visualization, there appears to be a linear relationship between the Tenure and Monthly Charges features for churned customers, although the correlation is weak. In other words, among customers who churn, those with longer subscriptions tend to be charged higher fees. The possible explanations can be described as follows:

It is normal for the rate to rise as the subscription gets longer, because the promo period ends; this applies both to customers who churn and to those who don't.
As the subscription gets longer, the rate increase is not matched by a proportional increase in the perceived quality of service.
Customers whose promotion period has expired perceive the price increase as a loss. Behavioural economics describes this as loss aversion: people tend to weigh avoiding losses more heavily than seeking gains.
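As a sketch of how the weak linear relationship above could be quantified, the snippet below computes a Pearson correlation between tenure and monthly charges separately for churned and retained customers. It is kept dependency-free for clarity; the column names (`tenure`, `MonthlyCharges`, `Churn`, borrowed from the public IBM Telco Customer Churn dataset) and the sample rows are assumptions, not the project's actual code or data.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Raises ZeroDivisionError if either variable is constant within the group.
    return cov / (spread_x * spread_y)


def correlation_by_churn(rows: list[dict]) -> dict[str, float]:
    """Correlation of tenure vs. MonthlyCharges, computed per churn class."""
    result = {}
    for label in ("Yes", "No"):
        group = [r for r in rows if r["Churn"] == label]
        result[label] = pearson([r["tenure"] for r in group],
                                [r["MonthlyCharges"] for r in group])
    return result


# Hypothetical rows, for illustration only.
rows = [
    {"tenure": 1, "MonthlyCharges": 70.0, "Churn": "Yes"},
    {"tenure": 5, "MonthlyCharges": 75.0, "Churn": "Yes"},
    {"tenure": 12, "MonthlyCharges": 90.0, "Churn": "Yes"},
    {"tenure": 3, "MonthlyCharges": 60.0, "Churn": "No"},
    {"tenure": 30, "MonthlyCharges": 55.0, "Churn": "No"},
    {"tenure": 60, "MonthlyCharges": 65.0, "Churn": "No"},
]
print(correlation_by_churn(rows))
```

Computing the coefficient per churn class, rather than over the whole dataset, mirrors how the scatter plot is split by churn status.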

The solutions offered include:

Communicate the promo expiry date clearly to customers as the deadline approaches, so that the return to the normal price is not perceived as a unilateral decision.
Provide membership programs or follow-up promotions to maintain and strengthen the relationship between the company and its customers.
Increase the value of the services to improve retention, for example through better service quality, new features or advantages, and loyalty programs that give customers an incentive to keep using the company's services.

Internet Services

tenure and monthly charges distribution on internet service

Fiber optic service has a higher price range than DSL, because fiber optic uses more sophisticated technology and has higher installation costs. On the other hand, some customers do not use the internet service but are still charged a basic monthly fee to keep their current contract active. Several assumptions can be made about these customers:

The customer may have temporarily stopped using the internet service for some reason, such as its being too expensive.
The subscriber may have just registered but not yet activated the internet service.
The internet service may not be available at the customer's place of residence, so they use no service while still paying the monthly fee until the contract ends. Most churned customers in this group have a tenure under 7 months, which suggests that such customers do not renew once the contract ends, especially those on month-to-month contracts.
A customer may use a competitor's service while keeping the existing contract for some reason. Such customers can be called multi-channel consumers: they use many service channels without being loyal to one particular product. In the industry, customers with these characteristics are often targeted by marketing and retention strategies because of their potential to increase sales and loyalty.

In this case, the company must identify the factors that cause customers not to use the internet service, such as bills that are not proportional to the quality of service, and provide the right solution for their needs. This will help keep them as active customers and increase their satisfaction.

Among customers who use the internet service, most DSL churners have monthly bills in the range of 40 to 60, while most fiber optic churners pay between 80 and 100.

By number of users, fiber optic service ranks first, followed by DSL; the remaining customers use neither service. DSL users have a much lower churn ratio than fiber optic users. Special attention is therefore needed for fiber optic users to find out what causes their relatively large churn ratio.
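The churn ratio per internet service category can be computed with a simple group-and-count. This is a minimal sketch assuming `InternetService` and `Churn` columns with the category values shown (names borrowed from the public IBM Telco dataset); the rows are hypothetical.

```python
from collections import Counter


def churn_rate_by(rows: list[dict], feature: str,
                  target: str = "Churn", positive: str = "Yes") -> dict[str, float]:
    """Share of churned customers within each category of `feature`."""
    totals = Counter(r[feature] for r in rows)
    churned = Counter(r[feature] for r in rows if r[target] == positive)
    # Counter returns 0 for categories with no churned customers.
    return {category: churned[category] / totals[category] for category in totals}


# Hypothetical rows, for illustration only.
rows = (
    [{"InternetService": "Fiber optic", "Churn": c} for c in ("Yes", "Yes", "No", "No")]
    + [{"InternetService": "DSL", "Churn": c} for c in ("Yes", "No", "No", "No")]
    + [{"InternetService": "No", "Churn": "No"} for _ in range(2)]
)
print(churn_rate_by(rows, "InternetService"))
# {'Fiber optic': 0.5, 'DSL': 0.25, 'No': 0.0}
```

The same function applies unchanged to the other categorical features discussed below, such as the contract or paperless billing columns.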

As noted above, fiber optic rates tend to be higher than DSL rates because the technology is more sophisticated and more expensive. Even so, it still needs to be evaluated whether the quality provided is commensurate with the rates charged. In addition, alternatives such as DSL may be sufficient for some customers' needs and preferences. The solutions offered for this problem include:

Offer customers at risk of churning alternatives that suit their needs, so they don't have to switch to a competitor.
Explain the advantages and disadvantages of each service clearly, thoroughly, and objectively, so that customers are not disappointed by recommendations that miss their needs or feel like a hard sell.
Re-evaluate the rates charged for fiber optic services.

Additional Services

A brief look at the histogram above shows that, among customers who do not activate the internet service but remain subscribed, the number who stay is much higher than the number who churn, by a factor of up to 13. This pattern is identical across all additional-service features, because customers without internet service are always grouped in the No internet service class rather than in the Yes or No class of each feature.

Among customers who use the internet service, the proportion of churners is far greater for those who do not activate additional services than for those who do. In other words, most churned customers use the internet service without activating additional services, although some customers who do activate additional services still churn. The churn proportions can be described as follows:

In the online security feature, the proportions for the No and Yes classes are 66% and 15%, respectively.
In the online backup feature, the proportions for the No and Yes classes are 63% and 23%, respectively.
In the device protection feature, the proportions for the No and Yes classes are 58% and 27%, respectively.
In the tech support feature, the proportions for the No and Yes classes are 64% and 17%, respectively.

Based on these findings, there are several possible reasons why customers who do not use additional services have a very high churn tendency: they may be unaware that the additional services exist, unwilling to pay the additional cost, not feel a need for them, or face obstacles in activating them. Possible solutions include:

Offer more complete and attractive service packages at competitive prices, with discounts and special promotions, so that customers experience added value. With better service quality, retention will increase and customers will think twice before switching to a competitor.
Provide education and more detailed information about the benefits and advantages of additional services.
Improve the quality and reliability of additional services so that customers are satisfied and trust them enough to use them.
Evaluate the price of each additional service with reference to customer needs and preferences.

The company can use the findings above as evaluation material when preparing future action plans. Customers who don't churn still dominate every class of every observed feature, which is in line with the Market Research section: only a small portion of customers churn. The analysis therefore focuses on exploring insights related to churned customers.

The violin plots above give a general picture of the density, or distribution, of churned and retained customers for each class of the four features. The points below focus on identifying churned customers:

Customers who do not activate additional services, whether or not they use the internet service, tend to churn when their tenure is below 20 months. This is in line with the initial assumptions in the Monthly Charges and Tenure Relationship section, which supports their validity in explaining churn.
For customers who activate additional services, the tenure of churned customers is spread more widely, so their tendency to churn is harder to detect.
In terms of monthly charges, customers who do not use the internet service pay much less than others. The currency is not known, so this feature is expressed in units only. For these customers, the density is centered around 20 for both churned and retained customers.
Among those who do not activate additional services, most churned customers pay a monthly bill between 70 and 100 for each related additional feature.
Those who activate additional services have a wider range of monthly charges than those who do not, which makes churn among them harder to detect.

This interpretation can serve as a reference for predicting churned customers and for finding the right preventive solutions when preparing future action plans.

Additional Features

The visualization above shows that the No class of the dependents feature has a large share of churned customers: the number of churners is almost half the number of customers still subscribed. In the Yes class, by contrast, the ratio of churned to retained customers is about 1 to 7. This suggests that customers who have dependents tend to churn less than those who do not.

By tenure, subscribers without dependents tend to churn sooner than those with dependents. By monthly charges, churned and retained customers show slightly different, but not significantly different, distributions.

A possible solution is to offer discount programs or special promotions to customers with this profile. Psychologically, customers without dependents may feel less attached to their internet service because they have no dependents to consider in their decision, so they are freer to switch to a competitor if they are dissatisfied with the service.

The visualization above shows that the month-to-month class of the contract feature has a large share of churned customers: the number of churners exceeds half the number of customers still subscribed. For one-year contracts, the ratio of churned to retained customers is 1 to 16, and for two-year contracts it is 1 to 125. This suggests that customers on month-to-month contracts have a much greater propensity to churn than those on longer contracts.

By tenure, customers on month-to-month contracts tend to churn the fastest, followed by one-year and then two-year contracts. By monthly charges, two-year contracts have the highest central tendency, followed by one-year and month-to-month contracts.

A possible solution is to offer special services for one-year and two-year contracts, such as network maintenance guarantees, so that customers are comfortable committing to the service long term. Customers who feel assured of the service quality are less likely to take a month-to-month contract on a trial-and-error basis (so-called trial customers).

The visualization above shows that the Yes class of the paperless billing feature has a large share of churned customers: the number of churners is almost half the number of customers still subscribed. For paper-based billing, the ratio of churned to retained customers is 1 to 6. This suggests that customers who use paperless billing have a greater tendency to churn than those who use paper-based billing.
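The churned-to-retained ratios quoted in this section are easier to compare once converted into within-class churn rates, since a ratio of 1 to r means 1 churner out of every 1 + r customers. The sketch below uses only the ratios stated above; the helper name is illustrative.

```python
def ratio_to_churn_rate(churned: float, retained: float) -> float:
    """Convert a churned-to-retained ratio (e.g. 1 to 7) into a within-class churn rate."""
    return churned / (churned + retained)


# Ratios quoted in this section (churned : still subscribed).
ratios = {
    "dependents = Yes": (1, 7),
    "one-year contract": (1, 16),
    "two-year contract": (1, 125),
    "paper-based billing": (1, 6),
}
for label, (churned, retained) in ratios.items():
    print(f"{label}: {ratio_to_churn_rate(churned, retained):.1%}")
# dependents = Yes: 12.5%
# one-year contract: 5.9%
# two-year contract: 0.8%
# paper-based billing: 14.3%
```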

By tenure, customers using paper-based billing appear to churn sooner than those using paperless billing. By monthly charges, paperless billing customers show a tendency that is not significantly different from that of paper-based billing customers, with a narrower spread.

Possible reasons why paperless billing is associated with churn include security concerns about personal financial information and less attention paid to the amounts being billed. This can be addressed by building a closer relationship with customers so that they are informed of any rate changes, preventing the perception that the company has raised prices unilaterally and for no apparent reason. From a security standpoint, the company must also assure customers that their personal information, such as financial data, will not be misused and that it complies with the regulations and laws governing the security of customers' personal data.

Classification Model

Feature Importances

The three most influential features in predicting churn are tenure, contract, and the fiber optic category of internet service, in that order. The other features contribute very little to the final prediction.

This Shapley-based importance only measures the average contribution of each feature across all observations and predictions of the model, which went through resampling and hyperparameter tuning. As a result, it has limited ability to explain features that look unimportant on average but still contribute to the predicted probability of individual observations. This is expected, because some observations behave atypically and fall into edge cases, although such observations are few.
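The averaging described above can be sketched as the mean absolute Shapley value per feature, which is the convention used by bar-style SHAP summary plots. The Shapley matrix below is hypothetical; its last row shows how a feature that matters a lot for one atypical observation is diluted by the average.

```python
def mean_abs_importance(shap_matrix: list[list[float]],
                        feature_names: list[str]) -> dict[str, float]:
    """Global importance as the mean absolute Shapley value per feature,
    sorted from most to least important."""
    n_obs = len(shap_matrix)
    importance = {
        name: sum(abs(row[j]) for row in shap_matrix) / n_obs
        for j, name in enumerate(feature_names)
    }
    return dict(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical Shapley values: one row per observation, one column per feature.
features = ["tenure", "contract", "fiber_optic", "paperless_billing"]
shap_matrix = [
    [0.40, 0.25, 0.10, 0.00],
    [-0.30, -0.20, 0.05, 0.02],
    [0.35, 0.30, -0.08, -0.01],
    [-0.25, -0.15, 0.09, 0.25],  # atypical case: paperless_billing matters locally
]
print(mean_abs_importance(shap_matrix, features))
```

Taking absolute values keeps positive and negative contributions from cancelling out, but the mean still hides features that only matter for a handful of observations.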

Predictor and Target Relationship

The plot shows a clear relationship between feature values and their Shapley values, i.e. their contributions to the predicted churn probability. The explanation can be described as follows:

Higher tenure values (redder points) have smaller Shapley values, and lower tenure values have larger Shapley values. This means customers with short subscriptions, in other words new customers, have a high tendency to churn, in line with the insights from the Exploratory Data Analysis section. Even so, some cases are difficult to predict: in the graph, some tenure values, whether high, medium, or low, have Shapley values clustered around zero.
As the contract value increases (month-to-month, one-year, two-year), customers tend to keep subscribing, as shown by negative Shapley values. Negative values lower the overall predicted probability, so the probability of churn decreases. This is also in line with the Exploratory Data Analysis section. The Shapley values show a clear separation: the month-to-month category lies in the positive area, while the one-year and two-year categories lie in the negative area. In other words, the model treats month-to-month contracts as a driver of churn and the other contract types as drivers of loyalty.
The fiber optic internet service feature is categorical; customers who use fiber optic tend to churn, although the effect is relatively small. The company can use this as additional evaluation material to identify related factors, such as service quality and prices deemed inappropriate for the service.
Other features can be described in a similar way but will not be discussed in depth, since their contribution to a customer's churn probability is very small.

The explanation above is a generalization and does not have to hold for every individual observation. Next, the relationship between the Shapley values and the values of the monthly charges, tenure, and contract features is visualized for churned and retained customers.

shap and 3 most important features relationship

The subscription duration is directly related to the contract type: customers on month-to-month contracts have a greater tendency to churn than those on one-year and two-year contracts. In addition, as noted previously, customers who use fiber optic tend to churn, although this does not hold in every case.

In terms of model interpretability, the Shapley value measures the contribution of each feature to the prediction for each observation. It fairly distributes the difference between the model's prediction and its baseline (average) prediction among the features. The key idea is to consider all possible feature combinations and calculate the marginal contribution of each feature when it is added to a combination of other features. Taking a weighted average of these marginal contributions over all possible combinations gives the Shapley value of each feature.
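The enumeration described above can be written directly as a brute-force computation. This is a minimal sketch, not the project's code: it is exponential in the number of features, so libraries such as shap use faster or approximate algorithms in practice. The value function `toy_value` is hypothetical and stands in for the model's expected prediction when only a subset of features is known.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition of the other players.

    `value` maps a frozenset of players to a number (e.g. a model output).
    Cost grows exponentially with the number of players.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions S of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))  # marginal contribution
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical churn 'prediction' when only the features in `coalition` are known."""
    v = 0.20                        # baseline: no features known
    if "tenure" in coalition:
        v += 0.30
    if "contract" in coalition:
        v += 0.10
    if "fiber_optic" in coalition:
        v += 0.05
    if {"tenure", "contract"} <= coalition:
        v += 0.20                   # interaction, split equally by Shapley
    return v


features = ["tenure", "contract", "fiber_optic"]
phi = shapley_values(features, toy_value)
print({f: round(v, 3) for f, v in phi.items()})
# {'tenure': 0.4, 'contract': 0.2, 'fiber_optic': 0.05}
```

The values sum to 0.65, exactly the gap between the prediction with all features known (0.85) and the baseline (0.20). This additivity is why a sum of Shapley values near zero means a prediction close to the baseline.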

Model Limitation

As previously explained, the classification model performs well at predicting churned customers and generalizes well to the dataset. Even so, it has limitations in some cases that are difficult to predict. This was also noted in the Exploratory Data Analysis section; a number of observations that fall into edge cases are examined below.

Several criteria can be used to classify an observation as an edge case. One common approach is to select observations whose Shapley values, summed across all features, are close to zero.
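The selection rule above can be sketched as ranking observations by the absolute sum of their Shapley values. The default `k=15` mirrors the number of edge cases reported below; the optional `tol` cut-off and the sample matrix are hypothetical.

```python
from typing import Optional


def find_edge_cases(shap_matrix: list[list[float]], k: int = 15,
                    tol: Optional[float] = None) -> list[int]:
    """Indices of up to k observations whose summed Shapley values are closest to zero.

    The sum of an observation's Shapley values equals its prediction minus the
    model's base value, so a sum near zero means the features cancel out.
    """
    sums = [(i, sum(row)) for i, row in enumerate(shap_matrix)]
    if tol is not None:  # optional hard cut-off (hypothetical threshold)
        sums = [(i, s) for i, s in sums if abs(s) <= tol]
    sums.sort(key=lambda item: abs(item[1]))
    return [i for i, _ in sums[:k]]


# Hypothetical Shapley values (rows = observations, columns = features).
shap_matrix = [
    [0.30, -0.25],   # sum 0.05
    [0.50, 0.10],    # sum 0.60, a confident prediction
    [-0.20, 0.19],   # sum -0.01
    [0.00, 0.00],    # sum 0.00
]
print(find_edge_cases(shap_matrix, k=2))          # [3, 2]
print(find_edge_cases(shap_matrix, tol=0.001))    # [3]
```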

At least 15 observations were found that were considered the most extreme edge cases. In these cases, the model is likely to produce poor predictions, because the features that usually contribute the most offset one another instead of deciding the outcome. Five of these edge cases are examined next.

In the 5 edge cases above, each feature's effect on the prediction is shown by color: red indicates a positive contribution (toward churn) and blue a negative contribution (away from churn). The cumulative Shapley value is close to 0, which means that for these observations the model's prediction barely differs from its base value, i.e. it is little better than the original guess.

Companies can use this kind of analysis to detect edge cases, which require an additional, more in-depth approach beyond the predictions of the classification model.

Conclusions

The performance of the developed classification model is summarized in the following classification report.

Based on the recall of the positive class, the model identifies 93% of all customers who churn, although its precision is fairly low at 43%. This corresponds to an F1-score (the harmonic mean of precision and recall) of about 58% for the churn class. The model is deliberately focused on catching as many churned customers as possible rather than on prediction quality across all observations. This decision follows the business context formulated earlier: retaining customers with the potential to churn is far more important than correctly predicting customers who will not churn. It is also supported by Reichheld, F.F. and Sasser, W.E. (1990) Zero Defections: Quality Comes to Services. Harvard Business Review, 68, 105-111, on the importance of reducing the churn rate because of its impact on the company's revenue and profitability. The calculation can be described as follows.
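Assuming the 58% figure is the churn-class F1-score, it can be cross-checked from the reported precision and recall. Because those inputs are already rounded, the check only matches approximately.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.43, 0.93), 3))  # 0.588
```

The harmonic mean is pulled toward the lower of the two values, which is why a 93% recall paired with a 43% precision yields a score much closer to 43% than to 93%.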

If the company gives a special offer or bonus worth 10% of the average price of each internet service category to all customers, without using the model's predictions, the cost is calculated as follows:

The company would need to allocate a minimum budget of 6109 for a discount campaign covering all customers.
Of this, 4316 (71%) goes to customers who would not have churned, i.e. marketing cost with no return on investment (ROI).
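The budget rule above can be sketched as follows, under one reading of "10% of the average price in each category": every targeted customer receives 10% of the average `MonthlyCharges` of their `InternetService` category, with averages computed over all customers. Both the reading and the column names are assumptions, and the rows and prediction mask are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def discount_budget(rows: list[dict], discount: float = 0.10,
                    mask: Optional[list[bool]] = None) -> float:
    """Campaign budget: each targeted customer receives `discount` times the
    average MonthlyCharges of their InternetService category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:  # category averages computed over all customers (assumption)
        sums[r["InternetService"]] += r["MonthlyCharges"]
        counts[r["InternetService"]] += 1
    average = {c: sums[c] / counts[c] for c in sums}
    targeted = rows if mask is None else [r for r, m in zip(rows, mask) if m]
    return sum(discount * average[r["InternetService"]] for r in targeted)


# Hypothetical rows; `predicted_churn` stands in for the model's output.
rows = [
    {"InternetService": "DSL", "MonthlyCharges": 50.0},
    {"InternetService": "DSL", "MonthlyCharges": 70.0},
    {"InternetService": "Fiber optic", "MonthlyCharges": 90.0},
    {"InternetService": "Fiber optic", "MonthlyCharges": 110.0},
]
predicted_churn = [True, False, True, False]
print(discount_budget(rows))                         # untargeted: all customers
print(discount_budget(rows, mask=predicted_churn))   # targeted: predicted churners only
```

Under this reading, the untargeted budget is simply 10% of the total monthly charges, because each category average multiplied by its customer count gives that category's total.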

If the company gives the same offer only to the customers predicted to churn by the classification model, the cost is calculated as follows:

The company would need to allocate a minimum budget of 3330 for a discount campaign covering all customers predicted to churn.
About 50% of this spend reaches customers who actually churn, and the campaign covers 93% of all potentially churning customers.
This saves 2779, or 64% of the wasted (no-ROI) cost of the untargeted campaign.
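The percentages above follow directly from the three reported budget figures. The sketch below reproduces them; reading 4316 as the spend on customers who would not have churned is an interpretation of the "no ROI" figure.

```python
# Figures reported above (currency unknown, so plain units).
untargeted_budget = 6109   # discount offered to all customers
wasted_untargeted = 4316   # portion with no ROI (customers who would not churn)
targeted_budget = 3330     # discount offered only to predicted churners

waste_share = wasted_untargeted / untargeted_budget      # share of budget wasted
savings = untargeted_budget - targeted_budget            # budget saved by targeting
savings_vs_waste = savings / wasted_untargeted           # share of the waste avoided

print(f"waste share: {waste_share:.0%}")             # waste share: 71%
print(f"savings: {savings}")                          # savings: 2779
print(f"savings vs. waste: {savings_vs_waste:.0%}")  # savings vs. waste: 64%
```

The 64% is measured against the wasted 4316, not against the full untargeted budget; relative to the full 6109, the saving is about 45%.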

Based on these calculations, the model can save up to 64% of the funds that would be wasted (cost with no ROI) in a promotional campaign run without the machine learning model, while still identifying 93% of the customers who will potentially churn.

Even so, the model has a drawback: it cannot detect all churning customers, since 7% of them are missed. Continued development with input from domain experts is therefore needed so that the classification model produces even better predictions for the company's business interests.

Recommendations

Algorithm models must be continuously developed and evaluated to meet the needs and challenges of a dynamic business world. Possible steps include the following:

Data Requirements: New relevant features are needed to develop a more effective classification model, such as customer demographics, product usage traffic, transaction history, customer interactions and feedback, and others.
Model Deployment and Monitoring: The classification model must be tested systematically in the production environment, for example through A/B testing. Integration with existing systems, scalability requirements, and real-time prediction also need further discussion. It is therefore important to monitor the model continuously and retrain it periodically so that its performance remains accurate and relevant.
Customer Segmentation: The predictions can be extended to segment customers based on similarities in certain features. This adds insight and allows more flexible approaches and offers for the company's other products, which is expected to increase customer satisfaction.
Revenue Leveraging: Customers who do not use the service at all require a special approach. The company can make the most of this group with trial or limited-time programs that let customers experience the service quality, increasing the likelihood that they keep using it after the trial period.
Debugging: The model will always have drawbacks that the experts who handle it must mitigate. This includes understanding the limits of the classification model by continuously analyzing inaccurate predictions and using them to retrain the model, or trying other algorithms with different characteristics that can compensate for the main model's weaknesses.
Intervention Timing: Detecting churning customers is closely tied to the timing of intervention. It would be more suitable for the model to output churn probabilities, explained with tools such as Shapley values, rather than only binary labels. That way, the approach does not always have to be a special offer or bonus; understanding the dynamic behavior detected by the model allows a proportional response, such as solution-oriented, customer-care communication.
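A proportional response based on churn probabilities could look like the tiering below. This is a hypothetical sketch: the tier names and the 0.7 and 0.4 thresholds are illustrative assumptions that would need tuning against actual campaign costs, not values from the project.

```python
def intervention_tier(churn_probability: float,
                      offer_threshold: float = 0.7,
                      outreach_threshold: float = 0.4) -> str:
    """Map a predicted churn probability to a proportional intervention.
    The thresholds are hypothetical and would need tuning against business costs."""
    if not 0.0 <= churn_probability <= 1.0:
        raise ValueError("churn_probability must be in [0, 1]")
    if churn_probability >= offer_threshold:
        return "special offer or bonus"
    if churn_probability >= outreach_threshold:
        return "proactive customer-care contact"
    return "no intervention"


for p in (0.85, 0.55, 0.10):
    print(p, intervention_tier(p))
```

Reserving the costly offer for the highest tier keeps the promotional budget focused, while mid-risk customers still receive lower-cost, customer-care-oriented contact.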

By applying these recommendations, the classification model can become a valuable asset for the company in taking proactive, measurable steps to retain customers and increase their satisfaction. Every step should therefore be data-driven so that each evaluation and development process is more measurable.


The detailed code is available on GitHub.

Github