Comparison of data changes between data loads

francoisvv · ‎07-24-2024

We are getting more and more requests from clients to be able to see the data changes between data refreshes. My idea was to firstly load both the old and new data into the same elasticube, and then union the data into single tables with a flag for "old" and "new", and then it will be easy to switch between the old and new data. However, any ideas on how to flag the data changes without having to write variable by variable comparisons and flagging? Looking forward to some creative ideas.

Liliia_DevX · ‎07-24-2024

@francoisvv hi!

Please check if the Accumulative build type could help to store the historical data in the easier way for you. Please refer to the 'Accumulate By' section on this guide: https://docs.sisense.com/main/SisenseLinux/elasticube-build-settings.htm#AccumulateBy

Best Regards,
Lily

francoisvv · ‎07-24-2024

Thanks so much for this valuable information. I just checked on our instances, and were able to enable it, however, from what I see, it does not take into consideration records that are unique. Do you perhaps know how it identifies the new records and is this done on a dataset level or can it be specified by a criteria, for example new records added or changed by an employee or subject?

Liliia_DevX · ‎07-25-2024

@francoisvv you're welcome! Hope it helps to achieve your goal 🙂

To make the accumulative build work properly, you need to select a specific integer or date column to act as an index, and if the index value is greater in a subsequent build, then data is accumulated for the table. Is there any id or date column available in your dataset that could be used as an index for accumulation?

Best Regards,
Lily

francoisvv · ‎07-26-2024

Hi Lily,

Thanks so much, I played around with it a bit, but I think my use case requires me to be able to dive a bit deeper into the changes itself. Although the accumulation does help, I want to be able to identify not only new / changed records in a dataset level, but by a row level, drilled down to a subject level. So if there is 10 subjects in the first data load, each with 2 records, and with the next load, 5 of the subjects have 2 new records loaded each, each of the new records should be flagged as new, and if 2 of the original subjects have had 3 variable values changed or updated, to be able to flag that record as changed based on all the fields that changed, not just a date. Hope this makes sense! I am busy playing around with flagging with SQL as I can use more defined group by values so will let you know what I manage with that.

Liliia_DevX · ‎07-26-2024

@francoisvv Got it! Thanks for the additional details. If you find a SQL solution for your use-case, please share it with us!

Please also take a look at the post following the link below. It provides an example on how to see the historical data per the dimension and not just the current state: https://community.sisense.com/t5/knowledge/slowly-changing-dimension/ta-p/8925

Best Regards,
Lily

DRay · ‎08-01-2024

Hello @francoisvv ,

I wanted to follow up to see if the solution offered by @Liliia_DevX worked for you.

If so, please click the 'Accept as Solution' button on the appropriate post, that way other users with the same questions can find the answer. If not, please let us know so that we can continue to help.

Thank you.

David Raynor (DRay)

Sisense Community

Comparison of data changes between data loads