Working with 2 fact tables which have a direct 1:M relationship, and fetching measures from both. A common example from the retail industry is Order Header and Order Line Items tables
A standard connection could create data duplication, if :
Any measures are taken from the Header table while a dim (Filter \ category) is taken from the Line Items table.
Or similarly, a widget containing measures from the Order Header & Line Items.
Creating a Consolidated Fact table:
Create a custom SQL table in which the two tables will be joined, so the new table is on the line item’s level, but containing all of the respective header record’s attributes.
The measures from the line items table will be brought untouched into the new tables, while measures from the Header table will be broken down and distributed to all the line items that it consists of (each line item will hold a part; The original value will be retrieved from the sum of all line items’ values).
The common distribution methods are:
Distribute the values equally between the rows
Distribute the values by each row’s proportional contribution to the header’s total
Example: We have the following orders tables:
Create a custom table based on the original Header and Line Items tables according to the desired distribution method above.
In this example, the SQL expression would look like this:
H.Freight / OT.[Items In Order] AS [Freight (Distributed Equally)],
/*Proportional Distribution (By Line Total) */
H.Freight*(LI.[Line Total] / OT.[Total Amount]) AS [Freight (Proportional)]
FROM [Order Header] H JOIN [Order Line Items] LI ON
H.[ORDER ID] = LI.[ORDER ID] JOIN
/*Calculate each order's total items and amount*/
(SELECT [Order ID], count(*) [Items In Order], sum ([Line Total]) [Total Amount]
FROM [Order Line Items]
GROUP BY [Order ID] ) OT
ON H.[Order ID] = OT.[Order ID]
Connect the new master table to all relevant dimensions and use it like any other fact.
Make sure to make both original tables invisible.
Image 1. Updated schema
Important to Note
This use cases’ data duplication can also be solved on the front-end by use of multi-pass aggregations. However, we highly recommend the modeling approach, as it provides a simpler solution for the dashboard designers and reduces the risk of miss-use.