cancel
Showing results for
Did you mean:

# Running Sum on Distinct Values

Community Team Member

## Analytical Need

A common use case is to know the number of new users in the system.
It could be that same user appear in various months but we want to count each unique user only once.
If we will use a simplistic approach of count distinct calculation on each month and then SUM the results (for a running sum), we will be including the same user more than once (for example, if the user appeared on January and also in March I would like to count him only once).

## Solution

As an example, let's look at the following sample set: this is the fact table called data.
1. Generate a custom table with the earliest date for each User and call it EarliestDate
2. Add a Key column from date and user.
3. Add a column flagEarliest with the value 1.
SELECT
d.[User]
,min(d.[Date])
,concat(tostring(min(d.[Date])),d.[User]) Key
,1 flagEarliest
FROM [Data] d
GROUP BY d.[User]
4. Add a similar Key to the data table from date and user.
Key = concat(tostring(date),user)
5. Add a custom table that will left join the tables: data and EarliestDate. Call it DataWithEarliestDate
SELECT
d.[Date]
,d.[User]
,e.[flagEarliest]
FROM [data] d
LEFT JOIN EarliestDate e
ON d.key = e.[Key]
Order By d.[Date]

6. Hide the data and EarliestDate tables. We will not be needing them on the dashboard.
7. The DataWithEarliestDate table should look similar to this:
8. We can now perform a simple SUM on the flagEarliest column and the result would be the Running Sum of the distinct users.
To see the above in action, download the dash file, the smodel and the CSV.
Version history
Last update:
‎03-02-2023 08:51 AM
Updated by:
Contributors
Community Toolbox

Recommended quick links to assist you in optimizing your community experience:

Product Feedback Forum: