cancel
Showing results for 
Search instead for 
Did you mean: 
Community_Admin
Community Team Member
Community Team Member

Analytical Need

Market Basket Analysis is a data mining technique that is widely used to identify consumer patterns, such as which items are purchased together, or how the purchase of one item affects the likelihood of another item being purchased.

One of the famous examples, even though it is almost certainly an urban legend, is the story about beers and diapers.  A supermarket chain discovered in its analysis that customers that bought diapers often bought beer as well, so they put the diapers close to beer coolers, and their sales increased dramatically.

Another widely used example of cross selling on the web with market basket analysis is Amazon.com’s use of “customers who bought book A also bought or took an interest in book B”.

Modeling Challenge

In our example, we will analyze the correlation between different products and product categories within a set of orders.

(Sample data files, ecdata and dash, attached below)

We will start by preparing our data model to support this type of analysis and then build a dashboard for our market basket analysis.

Solution

In our example we will use a Microsoft ‘Northwind’ database sample, which holds the sales transactions that occur between the “Northwind traders” company and its customers, as well as the purchase transactions between Northwind and its suppliers.

We will start by demonstrating how to adjust the data model to support market basket analysis by using a custom SQL to join a table to a duplication of itself.

Basic Terminology

  • Items are the objects that we are identifying associations between.
    In our example, an item is a product.
  • Transactions are instances of groups of items coexisting together.
    In our example, each order is a transaction.

Open the Northwind database in the Sisense ElastiCube Manager

Community_Admin_0-1634556348041.png

The ‘OrderDetails’ table holds the order transactions and the products that were purchased in each order.

To analyze if there are any associations between one product or a product category to another, we will create a self-join with the following conditions:

  • Transaction unique identifier = Transaction unique identifier
    In our example, Order ID = Order ID
  • Item <> Item
    In our example, Product ID <> Product ID

Click “Add Data”, and then select the “Custom SQL Expression”.

Community_Admin_1-1634556348058.png

A new window will open, enter the new table name (for example OrderDetailsForAnalysis), and click on the V icon.

Enter the self-join script in the table editor window:

SELECT o1.* ,o2.ProductID AS Product2ID, o2.Quantity AS Product2Quantity 

FROM [Order Details] o1, [Order Details] o2 

WHERE o1.OrderID = o2.OrderID AND o1.ProductID <> o2.ProductID

Community_Admin_2-1634556348016.png

Preview the results and save the table.

A new table is created with no connection to any other table.

Connect the table to the same tables that the original table (OrderDetails) was connected to.

In our example, join the new ‘OrderDetailsForAnalysis’ table with the ‘Products’ and ‘Orders’ tables.

If the item we are analyzing is part of a hierarchy, like in our case where products can be grouped into a category, and we want to use the hierarchy as part of the analysis, we should do one of the following:

  • Bring the hierarchy as part of our new custom table (add the 2nd product name and category name as part of the SQL)
  • Duplicate the hierarchy dimensions table/s, and join the tables with the 2nd item identifier.

In our example, we choose the 2nd option, and duplicate the ‘Products’ and ‘Category’ tables, and join Product2ID from the ‘OrderDetailsForAnalysis’ table with ProductID from the duplicate ‘Products’ table.

Community_Admin_3-1634556348055.png

We can disconnect the original ‘OrderDetails’ table and make it invisible.

Our data model is now ready. Build the cube and open the Sisense dashboard.

Analyze with Reports and Dashboards

You can easily visualize and analyze the correlation between one product to another or one category to another by creating a pivot table that will display the number of occurrences per pair.

Create a new widget, and add the following:

  • CategoryName from the 1st categories table
  • CategoryName from the 2nd categories table
  • Arithmetic function on a field (for example ‘Count All’ function the ‘Quantity’ field) from the transactions table

Click on the advanced configuration at the bottom left.

Drag the 2nd category name from the rows area into the column area.

You can also highlight some of the cells to highlight important information for the dashboard’s consumers. For example, mark the top correlation per each one of the different product categories.

Click on the color button of your measure.

A new window opens.

Click on the conditional tab, and set the formula to the max of occurrences.

Our product category correlation table is ready. Each number inside the table represents the number of individuals who have purchased both from the rows product category and the column product category.

Straight away, you can see that many of the categories are correlated with the beverage category, but beverage is the most correlated with confectionary and seafood.

Note, that since we are analyzing the product category and not the products themselves, we can have a situation where an individual purchased two different products from the same category, and this is why we have numbers greater than 0 where the rows and the columns category name is identical.

You can easily create different kinds of reports and graphs to visualize and assist you in your analysis.

Several examples are presented below:

  • Stacked column chart, that easily summarizes the contribution of each item (product/ category) to another
  • Tree map chart per each item (product/ category)
  • Pie chart that will display the correlated product/category distribution and can be affected by product/category filter selection within the dashboard

 Attachments

563 KB - Northwind.ecdata

.sdata (for Sisense Linux) for the NorthWind elasticube and the csv files for the source table are attached.

Rate this article:
Version history
Last update:
‎02-23-2024 10:50 AM
Updated by:
Contributors