Segmentation: Cluster And Linear Discriminant Analysis

Nov 9, 2021

A Way to Better Targeting

Simply put, segmentation means dividing our target customers into groups with distinct characteristics so that we can approach each group as effectively as possible. The characteristics most commonly used for market segmentation in research are demographics, attitudes, values, habits, motivations, and lifestyle. The basic idea is to identify clusters of people who share similar characteristics. This knowledge can help us figure out to whom, how, where, and when we should offer our products or services. Segmentation is a very broad topic, so in this post we will only scratch the surface: we will guide the reader through the most important concepts, cover some good practices, and give a couple of tips on avoiding common pitfalls.

Key metrics: What do I need?

There is no straightforward recipe here. However, when planning a segmentation study you should consider category/brand usage frequency, media consumption habits, life stage, purchasing behavior, sources of information (what influences customers along their path to purchase), life values, and lifestyle. We find it especially useful to include the needs that people satisfy by purchasing a product or service. In our experience, needs-based segmentations tend to be informative and actionable, meaning that the insights gained from them can easily be translated into concrete strategies. Here is an example of a question assessing which needs a person satisfies by using telecom services:

"Which of the following aspects of the telecom service is the most important to you personally? 1) to get things done 2) to entertain me 3) to increase my social interactions 4) to feel less worried about people I care for 5) to organize life efficiently 6) to impress others 7) to learn 8) to keep up with the news 9) to have a sense of control 10) not to feel alone"

The main concepts: What should I know?

Clustering

Several analytical approaches can be used for segmentation, but cluster analysis is the most common one. Cluster analysis itself can be done with several algorithms, the most popular of which is K-means clustering. K-means partitions customers into clusters based on the set of questions we have chosen as the basis for the segmentation: each member of the sample is assigned to the cluster with the nearest centroid. A centroid is simply the average profile of a cluster, that is, the mean of each input question across the cluster's members. The procedure is iterative: respondents are assigned to their nearest centroid, the centroids are recomputed, and the two steps repeat until the assignments stop changing. This minimizes the sum of squared distances between respondents and their cluster centroids (only locally, so the result can depend on the starting centroids). The outcome of K-means clustering is each respondent's cluster membership and their distance from the cluster centroid.
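To make the assignment and update steps concrete, here is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not a production implementation: it assumes numeric answers on a common scale, uses randomly chosen respondents as starting centroids (the `seed` parameter is our own addition for reproducibility), and the six respondents are made-up data, not results from a real study.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's K-means. Returns (labels, centroids, distance to own centroid)."""
    rng = random.Random(seed)
    # Start from k distinct respondents picked at random as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each respondent joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of every question over its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    distances = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    return labels, centroids, distances


# Hypothetical ratings (1-5) of four needs for six respondents.
answers = [
    (5, 1, 2, 5), (4, 2, 1, 5), (5, 1, 1, 4),
    (1, 5, 5, 1), (2, 4, 5, 2), (1, 5, 4, 1),
]
labels, centroids, distances = kmeans(answers, k=2)
print(labels)
```

Because K-means only finds a local optimum, in practice you would run it from several starting points and keep the solution with the smallest total squared distance; many statistical packages offer this as an option.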

Using the results with new data

Once we have our segments ready, we can go one step further and conduct linear discriminant analysis (LDA) to determine which questions played the most important role in defining the segments. Once we know those defining questions, we can include them in future studies and reconstruct the segments from the LDA coefficients. Classifying respondents from this reduced set of questions is less accurate than the original clustering, but it still lets us approximate segment membership without asking respondents the whole set of questions.
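A minimal sketch of how segment membership could be reconstructed for new respondents, assuming the classic LDA setup in which all segments share one covariance matrix. It estimates a linear classification function per segment from the pooled within-segment covariance and assigns a respondent to the segment with the highest score. The helper names (`fit_lda`, `classify`) and the data are hypothetical; the pooled covariance matrix must be invertible, and a real analysis would rely on a statistics package rather than a hand-written matrix inversion.

```python
import math


def column_means(rows):
    """Mean of each question (column) over the given respondents."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (fine for a few questions)."""
    n = len(m)
    # Augment m with the identity matrix, then reduce the left half to the identity.
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]  # the right half now holds the inverse


def fit_lda(X, labels):
    """Linear classification function (weights, constant) for each segment."""
    groups = {}
    for x, lab in zip(X, labels):
        groups.setdefault(lab, []).append(x)
    n, p, g = len(X), len(X[0]), len(groups)
    means = {lab: column_means(rows) for lab, rows in groups.items()}
    # Pooled within-segment covariance matrix (assumed equal for all segments).
    S = [[0.0] * p for _ in range(p)]
    for lab, rows in groups.items():
        for x in rows:
            d = [xi - mi for xi, mi in zip(x, means[lab])]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j] / (n - g)
    S_inv = invert(S)
    functions = {}
    for lab, mu in means.items():
        w = [sum(S_inv[i][j] * mu[j] for j in range(p)) for i in range(p)]
        prior = len(groups[lab]) / n
        const = -0.5 * sum(wi * mi for wi, mi in zip(w, mu)) + math.log(prior)
        functions[lab] = (w, const)
    return functions


def classify(functions, x):
    """Assign x to the segment whose classification function scores highest."""
    def score(lab):
        w, const = functions[lab]
        return sum(wi * xi for wi, xi in zip(w, x)) + const
    return max(functions, key=score)


# Hypothetical answers to two key questions, with segments from an earlier clustering.
X = [(5, 1), (4, 2), (5, 2), (4, 1), (1, 5), (2, 4), (1, 4), (2, 5)]
labels = ["A"] * 4 + ["B"] * 4
functions = fit_lda(X, labels)
print(classify(functions, (4, 3)))  # a new respondent; prints A
```

Only the weights and constants need to be carried over to a follow-up study: scoring a respondent is then a weighted sum per segment.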

Challenges of segmentation

So far, so good: clustering doesn't sound that complicated. We decide on the variables we want to base our segmentation on, feed them into a clustering algorithm, and get our clusters. However, segmentations have a reputation for being somewhat challenging, for several reasons. First, segmentation results frequently serve as the pivot point for the entire brand strategy, so if the results are not in line with brand management's vision, they can cause some friction. Second, there is no single correct approach to segmentation; successful segmentations require both methodological expertise and domain knowledge. Third, the results are complex to interpret. For example, even the number of segments we settle on is a matter of interpretation. These are just some of the common causes of unfulfilled expectations and stakeholder dissatisfaction. "Report on the shelf" is a common term for the fate of many segmentation analyses. To keep our segmentation from meeting this fate, there are a few steps we can take.

Deciding on the number of segments

Firstly, we can be careful about how we decide on the number of segments. There are several computational procedures for assessing the optimal number of clusters. Interested readers can look up hierarchical clustering with Ward's method (available in most statistical software packages) or model-based clustering (available in the mclust R package). However, in our experience, no computational method can replace a thoughtful interpretation of several clustering options followed by choosing the most useful and interpretable one. Good solutions usually have 3 to 8 clusters. Choosing the number of segments by judgment may sound like cheating, but remember that you are not changing anything in your data. You are just trying to find the model that best fits your business reality.
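As an illustration of one of these computational aids, the sketch below runs naive agglomerative clustering with Ward's criterion, where merging two clusters costs the resulting increase in the within-cluster sum of squares. A large jump in cost when going from k to k - 1 clusters hints that k is a natural cut. The function names and data are hypothetical, the brute-force pair search is only suitable for toy examples, and the output is one input to the decision, not the decision itself.

```python
def ward_merge_costs(points):
    """Naive agglomerative clustering with Ward's criterion.

    Returns the cost of each successive merge, where the cost is the increase
    in the total within-cluster sum of squared distances caused by the merge.
    """
    clusters = [(1, list(p)) for p in points]  # (size, centroid) per cluster
    costs = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (na, ca), (nb, cb) = clusters[i], clusters[j]
                sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = na * nb / (na + nb) * sq  # Ward's increase in sum of squares
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (na, ca), (nb, cb) = clusters[i], clusters[j]
        merged = (na + nb, [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)])
        del clusters[j]  # j > i, so index i is still valid afterwards
        clusters[i] = merged
        costs.append(cost)
    return costs


def cost_by_k(points):
    """Map k to the cost of going from k clusters down to k - 1."""
    n = len(points)
    return {n - t: c for t, c in enumerate(ward_merge_costs(points))}


# Hypothetical ratings (1-5) of four needs for six respondents.
answers = [
    (5, 1, 2, 5), (4, 2, 1, 5), (5, 1, 1, 4),
    (1, 5, 5, 1), (2, 4, 5, 2), (1, 5, 4, 1),
]
table = cost_by_k(answers)
for k in sorted(table, reverse=True):
    print(f"{k} -> {k - 1} clusters: cost {table[k]:.2f}")
```

For this toy data the cost stays small until the final merge, which points to two clusters; with real survey data the jumps are rarely this clean.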

Interpreting the segments

After deciding on the number of clusters comes the even trickier part: interpretation. Unfortunately, there is no shortcut here. First, make a comprehensive cross-tabulation with cluster membership in the columns and questions in the rows. Note all relevant statistically significant differences between the segments (see our post about A/B testing). Based on these notes, start profiling your segments. Look for segments that are 1) distinctive (qualitatively different, not just points on a continuum) and 2) targetable (reachable by a market intervention). Organize a workshop with colleagues and brainstorm ideas. Do not be shy about sharing your dilemmas with clients; they can provide valuable insights and domain knowledge. The best segmentation studies arise from close collaboration between analysts (method experts) and clients (domain experts).
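A minimal sketch of the cross-tabulation step in plain Python: column percentages (answer options in rows, clusters in columns) plus a two-proportion z-test for comparing one answer between two clusters. The answers and cluster labels are invented, the z-test is just one common choice, and with many cells you would also need to account for multiple comparisons.

```python
import math
from collections import Counter


def crosstab(answers, labels):
    """Column percentages: one row per answer option, one column per cluster."""
    sizes = Counter(labels)
    counts = Counter(zip(answers, labels))
    return {
        option: {c: 100 * counts[(option, c)] / sizes[c] for c in sorted(sizes)}
        for option in sorted(set(answers))
    }


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for the difference between two independent proportions."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


# Hypothetical main need named by 100 respondents in each of two clusters.
labels = [0] * 100 + [1] * 100
answers = (["get things done"] * 70 + ["entertain"] * 30
           + ["get things done"] * 20 + ["entertain"] * 80)
for option, row in crosstab(answers, labels).items():
    print(option, row)
z = two_proportion_z(70, 100, 20, 100)
print(f"z = {z:.2f}")  # |z| > 1.96 means significant at the 5% level (two-sided)
```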

Communicating the segments and creating personas

Once you have interpreted the clusters, you need to communicate them effectively. You will end up with many details about demographics, attitudes, values, habits, motivations, and lifestyle, and you have to wrap them all up into a coherent story about the people in each cluster. Come up with cluster names that are concise, accurate, and memorable. Personas can also be handy. Personas are the best (most typical) representatives of the clusters. To identify them, look at the respondents with the smallest distance from the cluster centroid and read those respondents' characteristics. Give them concrete names (Mike, George, Melisa) and even search for images of people who match the characteristics you have in mind. It is a creative process meant to bring the results closer to your clients.
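Finding persona candidates is a small computation once cluster memberships are known. The sketch below, with a hypothetical function name and made-up data, recomputes each cluster's centroid and returns the respondents closest to it; it assumes numeric answers and Euclidean distance, matching the K-means setup described earlier.

```python
import math


def find_personas(points, labels, top=1):
    """For each cluster, indices of the `top` respondents closest to its centroid."""
    personas = {}
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        centroid = [sum(points[i][q] for i in members) / len(members)
                    for q in range(len(points[0]))]
        # Most typical respondents first: sort members by distance to the centroid.
        members.sort(key=lambda i: math.dist(points[i], centroid))
        personas[c] = members[:top]
    return personas


# Hypothetical answers to two questions and cluster labels from a previous run.
points = [(5, 1), (4, 2), (5, 2), (1, 5), (2, 4), (1, 4)]
labels = [0, 0, 0, 1, 1, 1]
print(find_personas(points, labels))  # prints {0: [2], 1: [5]}
```

Raising `top` gives a short list of candidates, which is useful when the single closest respondent has an unusual answer on a question that was not part of the clustering.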

The interpretation of the outcome: How should we read it?

Researchers frequently display segmentation results using bubble charts. A bubble chart shows the relative positions of the segments on the two most important discriminant functions. Without going into too much detail, discriminant functions are weighted combinations of the input questions, with the weights chosen so that the segments are separated as well as possible; each data point (a respondent, in our case) gets a score on each function. We use the first and second discriminant functions as the X and Y axes, respectively, and plot our clusters for visualization purposes. The centers of the bubbles show each cluster's best representative (the respondent with the shortest distance from the cluster centroid).

The area of each bubble usually represents the size of the segment (number of people), which serves as an approximation of the segment's market share.
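A sketch of how the bubble positions and sizes could be computed, following the description above: each bubble is centered on the discriminant scores of the segment's most typical respondent, and its size is the segment's share of the sample. The discriminant function coefficients are made-up numbers standing in for output from a statistical package, and constant terms are left out because they shift all scores equally and do not change the relative positions.

```python
import math


def bubble_chart_data(X, labels, functions):
    """Position and size of one bubble per segment.

    X: answers per respondent; labels: segment per respondent;
    functions: coefficient vectors of the first two discriminant functions.
    """
    bubbles = {}
    for seg in sorted(set(labels)):
        members = [row for row, lab in zip(X, labels) if lab == seg]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        # Best representative: the member closest to the segment centroid.
        typical = min(members, key=lambda row: math.dist(row, centroid))
        # Their scores on the two discriminant functions become the x and y coordinates.
        x, y = (sum(w * v for w, v in zip(f, typical)) for f in functions)
        bubbles[seg] = {"x": x, "y": y, "share": len(members) / len(X)}
    return bubbles


# Hypothetical answers to three questions and made-up discriminant coefficients.
X = [(5, 1, 2), (4, 2, 1), (5, 2, 2), (5, 2, 2), (1, 5, 4), (2, 4, 5), (1, 4, 4)]
labels = ["A", "A", "A", "A", "B", "B", "B"]
functions = [(0.5, -0.5, 0.0), (0.25, 0.5, -0.5)]
for seg, bubble in bubble_chart_data(X, labels, functions).items():
    print(seg, bubble)
```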

In our example dashboard, you can switch between market share and value share using the radio button in the top left corner. Value share is a segment's estimated share of the total typical spending on the category's products or services. We use value share to estimate the profitability of the segments. The difference between market share and value share is very important when choosing which segment to focus on. The choice of segment or segments should be in line with the broader business strategy, but as a rule of thumb, we should choose segments that are a) large enough and b) profitable enough.
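The comparison between market share and value share comes down to simple arithmetic. In this sketch the segment sizes and typical spending per person are invented, and value share is taken as each segment's number of people multiplied by its typical spend, divided by the total over all segments.

```python
def shares(segments):
    """Market share (people) and value share (spending) for each segment.

    segments: {name: (number_of_people, typical_spend_per_person)}
    """
    total_people = sum(n for n, _ in segments.values())
    total_spend = sum(n * spend for n, spend in segments.values())
    return {
        name: {"market_share": n / total_people,
               "value_share": n * spend / total_spend}
        for name, (n, spend) in segments.items()
    }


# Hypothetical segment sizes and typical spend per person.
segments = {"Segment 1": (400, 10.0), "Segment 2": (250, 40.0), "Segment 3": (350, 20.0)}
for name, s in shares(segments).items():
    print(f"{name}: market {s['market_share']:.0%}, value {s['value_share']:.0%}")
```

Here the first segment is the largest by headcount but the smallest by value, which is exactly the kind of difference worth checking before choosing a target segment.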

For all its advantages, the bubble chart has one important drawback: it is hard to judge differences in bubble size visually. Our advice is to always accompany the bubble chart with a simple side-by-side bar chart, which shows the magnitude of the differences quickly and precisely.

Key take-aways: How can I use it?

When you are discussing the strategy for a product or service, keep your personas in mind. They help you focus your efforts and address the needs of a specific target customer instead of relying on a vague intuition about the market structure.
Include the segments in other surveys. For example, when you run a concept or product test, include the key questions identified by the LDA and see how the product or concept resonates with your target audience.
If you have a customer database (from a CRM, for example), extrapolate the results of the segmentation study to it (data fusion) using the key questions identified by the LDA, assuming those questions are stored in the CRM. This lets you personalize your relationship with customers: offer what they need, in the way they want, through the channels they follow.
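A sketch of the data fusion step, assuming the classification function coefficients from the LDA have already been estimated. The coefficients, segment labels, and CRM field names below are hypothetical. Customers whose records lack any of the key questions are left unassigned rather than guessed.

```python
def assign_segments(records, functions, key_questions):
    """Assign a segment to each CRM record, or None if a key question is missing.

    functions: {segment: (weights, constant)}, weights ordered like key_questions.
    """
    assigned = []
    for record in records:
        if any(record.get(q) is None for q in key_questions):
            assigned.append(None)  # do not guess a segment from incomplete data
            continue
        x = [record[q] for q in key_questions]
        scores = {seg: sum(w * v for w, v in zip(weights, x)) + const
                  for seg, (weights, const) in functions.items()}
        assigned.append(max(scores, key=scores.get))
    return assigned


# Hypothetical CRM field names and classification-function coefficients.
key_questions = ["need_get_things_done", "need_entertainment"]
functions = {"A": ([13.5, 4.5], -34.4), "B": ([4.5, 13.5], -34.4)}
records = [
    {"customer_id": 1, "need_get_things_done": 5, "need_entertainment": 2},
    {"customer_id": 2, "need_get_things_done": 1, "need_entertainment": 5},
    {"customer_id": 3, "need_get_things_done": 4},  # key question not in CRM
]
print(assign_segments(records, functions, key_questions))  # prints ['A', 'B', None]
```

Keeping unassigned customers visible (rather than forcing them into the nearest segment) makes it easy to see how much of the database the fusion actually covers.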